Basic genomic regions exploration

Project: Example bumphunter.

Introduction

This report is meant to help explore a set of genomic regions and was generated using the regionReport (Collado-Torres, Jaffe, and Leek, 2015) package. While the report is rich, it is meant only to start the exploration of the results and to exemplify some of the code used to do so. If you need a more in-depth analysis for your specific data set, you might want to use the customCode argument.

Most plots were made using ggplot2 (Wickham, 2009).

Code setup

## knitrBootstrap and device chunk options
load_install('knitr')
opts_chunk$set(bootstrap.show.code = FALSE, dev = device)
if(!outputIsHTML) opts_chunk$set(bootstrap.show.code = FALSE, dev = device, echo = FALSE)
#### Libraries needed

## Bioconductor
load_install('bumphunter')
load_install('derfinder')
load_install('derfinderPlot')
load_install('GenomeInfoDb')
load_install('GenomicRanges')
load_install('ggbio')

## Transcription database to use by default
if(is.null(txdb)) {
    load_install('TxDb.Hsapiens.UCSC.hg19.knownGene')
    txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene
}

## CRAN
load_install('ggplot2')
if(!is.null(theme)) theme_set(theme)
load_install('grid')
load_install('gridExtra')
load_install('knitr')
load_install('RColorBrewer')
load_install('mgcv')
load_install('whisker')
load_install('DT')
load_install('devtools')

## Working behind the scenes
# load_install('knitcitations')
# load_install('rmarkdown')
## Optionally
# load_install('knitrBootstrap')

#### Code setup

## For ggplot
tmp <- regions
names(tmp) <- seq_len(length(tmp))
regions.df <- as.data.frame(tmp)
regions.df$width <- width(tmp)
rm(tmp)

## Special subsets: need at least 3 points for a density plot
keepChr <- table(regions.df$seqnames) > 2
regions.df.plot <- subset(regions.df, seqnames %in% names(keepChr[keepChr]))

if(hasSignificant) {
    ## Keep only those sig
    regions.df.sig <- regions.df[significantVar, ]
    keepChr <- table(regions.df.sig$seqnames) > 2
    regions.df.sig <- subset(regions.df.sig, seqnames %in% names(keepChr[keepChr]))
}

## Find which chrs are present in the data set
chrs <- levels(seqnames(regions))

## areaVar initialize
areaVar <- NULL
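
The "at least 3 points" filter used in the setup code can be tried on a toy table. The following is a minimal sketch in base R; the data frame and its values are hypothetical and only stand in for regions.df.

```r
# Toy stand-in for regions.df (hypothetical values)
regions_df <- data.frame(
    seqnames = factor(c('chr1', 'chr1', 'chr1', 'chr2', 'chr2', 'chr3')),
    width = c(10, 20, 30, 5, 8, 100)
)

# TRUE for chromosomes with more than 2 regions
keep_chr <- table(regions_df$seqnames) > 2

# Keep only the regions on chromosomes that passed the check
regions_plot <- subset(regions_df, seqnames %in% names(keep_chr[keep_chr]))
print(regions_plot)
```

Only the three chr1 rows remain, since chr2 and chr3 have too few regions for a density estimate.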

Quality checks

Region width

p2a <- ggplot(regions.df.plot, aes(x=log10(width), colour=seqnames)) +
    geom_line(stat='density') + labs(title='Density of region lengths') +
    xlab('Region width (log10)') + scale_colour_discrete(limits=chrs) +
    theme(legend.title=element_blank())
p2a

This plot shows the density of the region lengths for all regions.
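
As a rough illustration of what such a density curve summarizes, the sketch below computes one kernel density estimate of log10(width) per chromosome with base R's density(). The widths are made up, and the approach is only assumed to approximate what the plotting code displays.

```r
# Toy region widths in base pairs (hypothetical values)
regions_df <- data.frame(
    seqnames = c('chr1', 'chr1', 'chr1', 'chr2', 'chr2', 'chr2'),
    width = c(120, 340, 560, 90, 2000, 15000)
)

# One kernel density estimate of log10(width) per chromosome
dens_by_chr <- lapply(
    split(log10(regions_df$width), regions_df$seqnames),
    density
)

# Width (in bp) at which each chromosome's density curve peaks
peak_width <- sapply(dens_by_chr, function(d) 10^d$x[which.max(d$y)])
print(peak_width)
```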

for(i in seq_len(length(densityVars))) {
    densityVarName <- names(densityVars[i])
    densityVarName <- ifelse(is.null(densityVarName), densityVars[i], densityVarName)
    cat(knit_child(text = whisker.render(templateDensityInUse,
        list(varName = densityVars[i], densityVarName = densityVarName)),
        quiet = TRUE), sep = '\n')
}
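
The loop above derives a display name for each density variable and falls back to the variable itself when no name is available. A minimal base R sketch of that fallback follows; the helper display_names() is hypothetical, and its handling of partly named vectors is an assumption that goes beyond the loop shown here.

```r
# Density variables as passed in this report's call
density_vars <- c(Area = 'area', Value = 'value', `Cluster Length` = 'clusterL')

# Hypothetical helper: use the element name as the label, else the value itself
display_names <- function(vars) {
    labels <- names(vars)
    if (is.null(labels)) labels <- rep('', length(vars))
    missing_label <- is.na(labels) | labels == ''
    labels[missing_label] <- vars[missing_label]
    labels
}

named <- display_names(density_vars)
unnamed <- display_names(c('area', 'value'))
print(named)
print(unnamed)
```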

Area

p3aarea <- ggplot(regions.df.plot[is.finite(regions.df.plot[, 'area']), ], aes(x=area, colour=seqnames)) +
    geom_line(stat='density') + labs(title='Density of Area') +
    xlab('Area') + scale_colour_discrete(limits=chrs) +
    theme(legend.title=element_blank())
p3aarea

This plot shows the density of the Area for all regions.

Value

p3avalue <- ggplot(regions.df.plot[is.finite(regions.df.plot[, 'value']), ], aes(x=value, colour=seqnames)) +
    geom_line(stat='density') + labs(title='Density of Value') +
    xlab('Value') + scale_colour_discrete(limits=chrs) +
    theme(legend.title=element_blank())
p3avalue

This plot shows the density of the Value for all regions.

Cluster Length

p3aclusterL <- ggplot(regions.df.plot[is.finite(regions.df.plot[, 'clusterL']), ], aes(x=clusterL, colour=seqnames)) +
    geom_line(stat='density') + labs(title='Density of Cluster Length') +
    xlab('Cluster Length') + scale_colour_discrete(limits=chrs) +
    theme(legend.title=element_blank())
p3aclusterL

This plot shows the density of the Cluster Length for all regions.

Genomic overview

The following plots were made using ggbio (Yin, Cook, and Lawrence, 2012) which in turn uses ggplot2 (Wickham, 2009). For more details check plotOverview in derfinderPlot (Collado-Torres, Jaffe, and Leek, 2015).

P-values

This plot shows the genomic locations of the regions found in the analysis, with the significant regions highlighted and an area variable for the regions shown on top of each chromosome. It was skipped because there was no applicable variable.

Annotation

## Annotate regions with bumphunter
if(is.null(annotation)) {
    genes <- annotateTranscripts(txdb = txdb)
    annotation <- matchGenes(x = regions, subject = genes)
}
## Warning:   Calling species() on a TxDb object is *deprecated*.
##   Please use organism() instead.
## Make the plot
plotOverview(regions=regions, annotation=annotation, type='annotation', base_size=overviewParams$base_size, areaRel=overviewParams$areaRel, legend.position=c(0.97, 0.12))

This genomic overview plot shows the annotation region type for the regions as determined using bumphunter (Jaffe, Murakami, Lee, Leek, et al., 2012). Note that the regions are shown only if the annotation information is available. Below is a table of the actual number of results per annotation region type.

annoReg <- table(annotation$region, useNA='always')
annoReg.df <- data.frame(Region=names(annoReg), Count=as.vector(annoReg))
if(outputIsHTML) {
    kable(annoReg.df, format = 'markdown', align=rep('c', 3))
} else {
    kable(annoReg.df)
}
| Region | Count |
|:------:|:-----:|
| upstream | 10 |
| promoter | 0 |
| overlaps 5’ | 0 |
| inside | 0 |
| overlaps 3’ | 0 |
| close to 3’ | 0 |
| downstream | 5 |
| covers | 0 |
| NA | 0 |
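
The zero counts and the trailing NA row in this table follow from how table() treats factor levels and useNA = 'always'. A small base R sketch, assuming a toy factor with fewer levels than the real annotation, shows the same behavior:

```r
# Toy annotation column; the levels are a shortened, hypothetical subset
region <- factor(
    c(rep('upstream', 10), rep('downstream', 5)),
    levels = c('upstream', 'promoter', 'inside', 'downstream')
)

# Unused levels are counted as 0 and an NA row is always appended
anno <- table(region, useNA = 'always')
anno_df <- data.frame(Region = names(anno), Count = as.vector(anno))
print(anno_df)
```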

Annotation (significant)

This genomic overview plot shows the annotation region type for the statistically significant regions. Note that the regions are shown only if the annotation information is available. Plot skipped because there are no significant regions.

Best regions

Genomic states

Below is a table summarizing the number of genomic states per region as determined using derfinder (Collado-Torres, Frazee, Love, Irizarry, et al., 2015).

## Construct genomic state object
genomicState <- makeGenomicState(txdb = txdb, chrs = chrs, verbose = FALSE)
## 'select()' returned 1:1 mapping between keys and columns
## Annotate regions by genomic state
annotatedRegions <- annotateRegions(regions, genomicState$fullGenome, verbose = FALSE)

## Genomic states table
info <- do.call(rbind, lapply(annotatedRegions$countTable, function(x) { data.frame(table(x)) }))
colnames(info) <- c('Number of Overlapping States', 'Frequency')
info$State <- gsub('\\..*', '', rownames(info))
rownames(info) <- NULL
if(outputIsHTML) {
    kable(info, format = 'markdown', align=rep('c', 4))
} else {
    kable(info)
}
| Number of Overlapping States | Frequency | State |
|:----------------------------:|:---------:|:-----:|
| 0 | 15 | exon |
| 1 | 15 | intergenic |
| 0 | 15 | intron |
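
The table is built by tabulating each column of the count table and stacking the results. The sketch below reproduces that pattern in base R on a made-up count table with three regions; the values are hypothetical and chosen so that one state has more than one row, which is where stripping the row-name suffix matters.

```r
# Toy count table: one row per region, one column per genomic state
count_table <- data.frame(
    exon = c(0, 1, 0),
    intergenic = c(1, 0, 1),
    intron = c(0, 0, 0)
)

# Tabulate each column, then stack the per-state tables
info <- do.call(rbind, lapply(count_table, function(x) data.frame(table(x))))
colnames(info) <- c('Number of Overlapping States', 'Frequency')

# Row names look like 'exon.1', 'exon.2'; drop the suffix to recover the state
info$State <- gsub('\\..*', '', rownames(info))
rownames(info) <- NULL
print(info)
```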

The following is a Venn diagram showing how many regions overlap known exons, introns, and intergenic segments, none of them, or several of these groups.

## Venn diagram for all regions
venn <- vennRegions(annotatedRegions, counts.col = 'blue', 
    main = 'Regions overlapping genomic states')

Region information

Below is an interactive table with the top 15 regions (out of 15), shown without ranking because no p-value information was provided. Inf and -Inf are shown as 1e100 and -1e100 respectively. Use the search function to find your region of interest or sort by one of the columns.

## Add annotation information
regions.df <- cbind(regions.df, annotation)

## Rank by p-value (first pvalue variable supplied)
if(hasPvalueVars){
    topRegions <- head(regions.df[order(regions.df[, pvalueVars[1]], 
        decreasing = FALSE), ], nBestRegions)
    topRegions <- cbind(data.frame('pvalueRank' = seq_len(nrow(topRegions))), 
        topRegions)
} else {
    topRegions <- head(regions.df, nBestRegions)
}

## Clean up -Inf, Inf if present
## More details at https://github.com/ramnathv/rCharts/issues/259
replaceInf <- function(df, colsubset=seq_len(ncol(df))) {
    for(i in colsubset) {
        inf.idx <- !is.finite(df[, i])
        if(any(inf.idx)) {
            inf.sign <- sign(df[inf.idx, i])
            df[inf.idx, i] <- inf.sign * 1e100
        }
    }
    return(df)
}
topRegions <- replaceInf(topRegions, which(sapply(topRegions, function(x) {
    class(x) %in% c('numeric', 'integer')})))

## Make the table
greptext <- 'value$|area$|mean|log2FoldChange'
greppval <- 'pvalues$|qvalues$|fwer$'
if(hasPvalueVars) {
    greppval <- paste0(paste(pvalueVars, collapse = '$|'), '$|', greppval)
}
if(hasDensityVars) {
    greptext <- paste0(paste(densityVars, collapse = '$|'), '$|', greptext)
}

for(i in which(grepl(greppval, colnames(topRegions)))) topRegions[, i] <- format(topRegions[, i], scientific = TRUE)

if(outputIsHTML) {
    datatable(topRegions, options = list(pagingType='full_numbers', pageLength=10, scrollX='100%'), rownames = FALSE) %>% formatRound(which(grepl(greptext, colnames(topRegions))), digits)
} else {
    ## Only print the top part if your output is a PDF file
    df_top <- head(topRegions, 20)
    for(i in which(grepl(greptext, colnames(topRegions)))) df_top[, i] <- round(df_top[, i], digits)
    kable(df_top)
}
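
To see the infinite-value replacement and the p-value ranking in isolation, here is a minimal base R sketch on a toy table. The helper replace_inf() and the column names are hypothetical; unlike the report's replaceInf(), it tests with is.infinite() so that NA values are left untouched.

```r
# Hypothetical helper: swap Inf / -Inf for 1e100 / -1e100 in numeric columns
replace_inf <- function(df, cols = which(vapply(df, is.numeric, logical(1)))) {
    for (i in cols) {
        bad <- is.infinite(df[[i]])
        df[[i]][bad] <- sign(df[[i]][bad]) * 1e100
    }
    df
}

# Toy regions table (made-up values)
toy <- data.frame(
    region = c('a', 'b', 'c'),
    value = c(1.5, Inf, -Inf),
    pvalue = c(0.03, 0.001, 0.2)
)

cleaned <- replace_inf(toy)

# Rank by p-value, smallest first, and record the rank
ranked <- cleaned[order(cleaned$pvalue), ]
ranked <- cbind(pvalueRank = seq_len(nrow(ranked)), ranked)
print(ranked)
```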

Reproducibility

This report was generated in path /Users/lcollado/Dropbox/JHSPH/Code/regionReportSupp using the following call to renderReport():

## renderReport(regions = regions, project = "Example bumphunter", 
##     pvalueVars = NULL, densityVars = c(Area = "area", Value = "value", 
##         `Cluster Length` = "clusterL"), significantVar = NULL, 
##     outdir = "bumphunter-example", output = "index", device = "png", 
##     template = "/Users/lcollado/Dropbox/JHSPH/Code/regionReportSupp/bumphunter-example/regionReportBumphunter.Rmd")

Date the report was generated.

## [1] "2016-04-12 07:36:16 EDT"

Wallclock time spent generating the report.

## Time difference of 3.061 mins
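
A timestamp and a wallclock time like the ones above can be captured with base R alone. This is a sketch of one way to do it, not necessarily how the report template records them; the variable names are assumptions.

```r
# Record the start time before the work begins
start_time <- Sys.time()

# Placeholder for the actual report generation
invisible(sum(sqrt(seq_len(1e5))))

# Elapsed wallclock time in minutes, rounded to three digits
elapsed <- round(difftime(Sys.time(), start_time, units = 'mins'), digits = 3)

# Timestamp with time zone, in the same layout as the date shown above
generated_on <- format(Sys.time(), '%Y-%m-%d %H:%M:%S %Z')
print(generated_on)
print(elapsed)
```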

R session information.

## Session info -----------------------------------------------------------------------------------------------------------
##  setting  value                                    
##  version  R version 3.3.0 alpha (2016-03-23 r70368)
##  system   x86_64, darwin13.4.0                     
##  ui       AQUA                                     
##  language (EN)                                     
##  collate  en_US.UTF-8                              
##  tz       America/New_York                         
##  date     2016-04-12
## Packages ---------------------------------------------------------------------------------------------------------------
##  package                           * version  date       source                                   
##  acepack                             1.3-3.3  2014-11-24 CRAN (R 3.3.0)                           
##  annotate                            1.49.1   2016-02-06 Bioconductor                             
##  AnnotationDbi                     * 1.33.8   2016-04-10 Bioconductor                             
##  AnnotationHub                       2.3.16   2016-03-25 Bioconductor                             
##  backports                           1.0.2    2016-03-18 CRAN (R 3.3.0)                           
##  bibtex                              0.4.0    2014-12-31 CRAN (R 3.3.0)                           
##  Biobase                           * 2.31.3   2016-01-14 Bioconductor                             
##  BiocGenerics                      * 0.17.4   2016-04-07 Bioconductor                             
##  BiocInstaller                       1.21.4   2016-03-23 Bioconductor                             
##  BiocParallel                        1.5.21   2016-03-23 Bioconductor                             
##  BiocStyle                         * 1.99.0   2016-04-05 Bioconductor                             
##  biomaRt                             2.27.2   2016-01-14 Bioconductor                             
##  Biostrings                          2.39.12  2016-02-21 Bioconductor                             
##  biovizBase                          1.19.6   2016-04-06 Bioconductor                             
##  bitops                              1.0-6    2013-08-17 CRAN (R 3.3.0)                           
##  BSgenome                            1.39.4   2016-02-21 Bioconductor                             
##  bumphunter                        * 1.11.5   2016-03-29 Bioconductor                             
##  checkmate                           1.7.4    2016-04-08 CRAN (R 3.3.0)                           
##  cluster                             2.0.3    2015-07-21 CRAN (R 3.3.0)                           
##  codetools                           0.2-14   2015-07-15 CRAN (R 3.3.0)                           
##  colorspace                          1.2-6    2015-03-11 CRAN (R 3.3.0)                           
##  DBI                               * 0.3.1    2014-09-24 CRAN (R 3.3.0)                           
##  DEFormats                           0.99.8   2016-03-31 Bioconductor                             
##  derfinder                         * 1.5.30   2016-03-25 Bioconductor                             
##  derfinderHelper                     1.5.3    2016-03-23 Bioconductor                             
##  derfinderPlot                     * 1.5.7    2016-03-23 Bioconductor                             
##  DESeq2                              1.11.42  2016-04-10 Bioconductor                             
##  devtools                          * 1.10.0   2016-01-23 CRAN (R 3.3.0)                           
##  dichromat                           2.0-0    2013-01-24 CRAN (R 3.3.0)                           
##  digest                              0.6.9    2016-01-08 CRAN (R 3.3.0)                           
##  doRNG                               1.6      2014-03-07 CRAN (R 3.3.0)                           
##  DT                                * 0.1      2015-06-09 CRAN (R 3.3.0)                           
##  edgeR                               3.13.8   2016-04-08 Bioconductor                             
##  ensembldb                           1.3.19   2016-04-03 Bioconductor                             
##  evaluate                            0.8.3    2016-03-05 CRAN (R 3.3.0)                           
##  foreach                           * 1.4.3    2015-10-13 CRAN (R 3.3.0)                           
##  foreign                             0.8-66   2015-08-19 CRAN (R 3.3.0)                           
##  formatR                             1.3      2016-03-05 CRAN (R 3.3.0)                           
##  Formula                             1.2-1    2015-04-07 CRAN (R 3.3.0)                           
##  genefilter                          1.53.3   2016-03-23 Bioconductor                             
##  geneplotter                         1.49.0   2016-01-14 Bioconductor                             
##  GenomeInfoDb                      * 1.7.6    2016-01-29 Bioconductor                             
##  GenomicAlignments                   1.7.20   2016-02-25 Bioconductor                             
##  GenomicFeatures                   * 1.23.29  2016-04-05 Bioconductor                             
##  GenomicFiles                        1.7.9    2016-02-22 Bioconductor                             
##  GenomicRanges                     * 1.23.25  2016-03-31 Bioconductor                             
##  GGally                              1.0.1    2016-01-14 CRAN (R 3.3.0)                           
##  ggbio                             * 1.19.13  2016-04-03 Bioconductor                             
##  ggplot2                           * 2.1.0    2016-03-01 CRAN (R 3.3.0)                           
##  graph                               1.49.1   2016-01-14 Bioconductor                             
##  gridExtra                         * 2.2.1    2016-02-29 CRAN (R 3.3.0)                           
##  gtable                              0.2.0    2016-02-26 CRAN (R 3.3.0)                           
##  highr                               0.5.1    2015-09-18 CRAN (R 3.3.0)                           
##  Hmisc                               3.17-3   2016-04-03 CRAN (R 3.3.0)                           
##  htmltools                           0.3.5    2016-03-21 CRAN (R 3.3.0)                           
##  htmlwidgets                         0.6      2016-02-25 CRAN (R 3.3.0)                           
##  httpuv                              1.3.3    2015-08-04 CRAN (R 3.3.0)                           
##  httr                                1.1.0    2016-01-28 CRAN (R 3.3.0)                           
##  interactiveDisplayBase              1.9.0    2016-01-14 Bioconductor                             
##  IRanges                           * 2.5.43   2016-04-10 Bioconductor                             
##  iterators                         * 1.0.8    2015-10-13 CRAN (R 3.3.0)                           
##  jsonlite                            0.9.19   2015-11-28 CRAN (R 3.3.0)                           
##  knitcitations                       1.0.7    2015-10-28 CRAN (R 3.3.0)                           
##  knitr                             * 1.12.3   2016-01-22 CRAN (R 3.3.0)                           
##  knitrBootstrap                      1.0.0    2016-03-24 Github (jimhester/knitrBootstrap@cdaa4a9)
##  labeling                            0.3      2014-08-23 CRAN (R 3.3.0)                           
##  lattice                             0.20-33  2015-07-14 CRAN (R 3.3.0)                           
##  latticeExtra                        0.6-28   2016-02-09 CRAN (R 3.3.0)                           
##  limma                               3.27.14  2016-03-23 Bioconductor                             
##  locfit                            * 1.5-9.1  2013-04-20 CRAN (R 3.3.0)                           
##  lubridate                           1.5.6    2016-04-06 CRAN (R 3.3.0)                           
##  magrittr                            1.5      2014-11-22 CRAN (R 3.3.0)                           
##  markdown                            0.7.7    2015-04-22 CRAN (R 3.3.0)                           
##  Matrix                              1.2-4    2016-03-02 CRAN (R 3.3.0)                           
##  matrixStats                         0.50.1   2015-12-15 CRAN (R 3.3.0)                           
##  memoise                             1.0.0    2016-01-29 CRAN (R 3.3.0)                           
##  mgcv                              * 1.8-12   2016-03-03 CRAN (R 3.3.0)                           
##  mime                                0.4      2015-09-03 CRAN (R 3.3.0)                           
##  munsell                             0.4.3    2016-02-13 CRAN (R 3.3.0)                           
##  nlme                              * 3.1-126  2016-03-14 CRAN (R 3.3.0)                           
##  nnet                                7.3-12   2016-02-02 CRAN (R 3.3.0)                           
##  org.Hs.eg.db                      * 3.3.0    2016-04-11 Bioconductor                             
##  OrganismDbi                         1.13.6   2016-04-05 Bioconductor                             
##  pkgmaker                            0.22     2014-05-14 CRAN (R 3.3.0)                           
##  plyr                                1.8.3    2015-06-12 CRAN (R 3.3.0)                           
##  qvalue                              2.3.2    2016-01-14 Bioconductor                             
##  R6                                  2.1.2    2016-01-26 CRAN (R 3.3.0)                           
##  RBGL                                1.47.0   2016-01-14 Bioconductor                             
##  RColorBrewer                      * 1.1-2    2014-12-07 CRAN (R 3.3.0)                           
##  Rcpp                                0.12.4   2016-03-26 CRAN (R 3.3.0)                           
##  RCurl                               1.95-4.8 2016-03-01 CRAN (R 3.3.0)                           
##  RefManageR                          0.10.13  2016-04-04 CRAN (R 3.3.0)                           
##  regionReport                      * 1.5.47   2016-04-12 Bioconductor                             
##  registry                            0.3      2015-07-08 CRAN (R 3.3.0)                           
##  reshape                             0.8.5    2014-04-23 CRAN (R 3.3.0)                           
##  reshape2                            1.4.1    2014-12-06 CRAN (R 3.3.0)                           
##  RJSONIO                             1.3-0    2014-07-28 CRAN (R 3.3.0)                           
##  rmarkdown                           0.9.5    2016-02-22 CRAN (R 3.3.0)                           
##  rngtools                            1.2.4    2014-03-06 CRAN (R 3.3.0)                           
##  rpart                               4.1-10   2015-06-29 CRAN (R 3.3.0)                           
##  Rsamtools                           1.23.8   2016-04-10 Bioconductor                             
##  RSQLite                           * 1.0.0    2014-10-25 CRAN (R 3.3.0)                           
##  rtracklayer                         1.31.10  2016-04-07 Bioconductor                             
##  S4Vectors                         * 0.9.46   2016-04-07 Bioconductor                             
##  scales                              0.4.0    2016-02-26 CRAN (R 3.3.0)                           
##  shiny                               0.13.2   2016-03-28 CRAN (R 3.3.0)                           
##  stringi                             1.0-1    2015-10-22 CRAN (R 3.3.0)                           
##  stringr                             1.0.0    2015-04-30 CRAN (R 3.3.0)                           
##  SummarizedExperiment                1.1.23   2016-04-06 Bioconductor                             
##  survival                            2.38-3   2015-07-02 CRAN (R 3.3.0)                           
##  TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2    2016-03-24 Bioconductor                             
##  VariantAnnotation                   1.17.23  2016-04-07 Bioconductor                             
##  whisker                           * 0.3-2    2013-04-28 CRAN (R 3.3.0)                           
##  XML                                 3.98-1.4 2016-03-01 CRAN (R 3.3.0)                           
##  xtable                              1.8-2    2016-02-05 CRAN (R 3.3.0)                           
##  XVector                             0.11.8   2016-04-06 Bioconductor                             
##  yaml                                2.1.13   2014-06-12 CRAN (R 3.3.0)                           
##  zlibbioc                            1.17.1   2016-03-19 Bioconductor

Pandoc version used: 1.17.0.3.

Bibliography

This report was created with regionReport (Collado-Torres, Jaffe, and Leek, 2015) using rmarkdown (Allaire, Cheng, Xie, McPherson, et al., 2016) while knitr (Xie, 2014) and DT (Xie, 2015) were running behind the scenes. whisker (de Jonge, 2013) was used for creating templates for the pvalueVars and densityVars.

Citations made with knitcitations (Boettiger, 2015). The BibTeX file can be found here.

[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 0.9.5. 2016. URL: https://CRAN.R-project.org/package=rmarkdown.

[2] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.7. 2015. URL: https://CRAN.R-project.org/package=knitcitations.

[3] L. Collado-Torres, A. C. Frazee, M. I. Love, R. A. Irizarry, et al. “derfinder: Software for annotation-agnostic RNA-seq differential expression analysis”. In: bioRxiv (2015). DOI: 10.1101/015370. URL: http://www.biorxiv.org/content/early/2015/02/19/015370.abstract.

[4] L. Collado-Torres, A. E. Jaffe and J. T. Leek. derfinderPlot: Plotting functions for derfinder. https://github.com/leekgroup/derfinderPlot - R package version 1.5.7. 2015. URL: http://www.bioconductor.org/packages/derfinderPlot.

[5] L. Collado-Torres, A. E. Jaffe and J. T. Leek. “regionReport: Interactive reports for region-based analyses”. In: F1000Research 4 (2015), p. 105. DOI: 10.12688/f1000research.6379.1. URL: http://f1000research.com/articles/4-105/v1.

[6] E. de Jonge. whisker: mustache for R, logicless templating. R package version 0.3-2. 2013. URL: https://CRAN.R-project.org/package=whisker.

[7] A. E. Jaffe, P. Murakami, H. Lee, J. T. Leek, et al. “Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies”. In: International journal of epidemiology 41.1 (2012), pp. 200–209. DOI: 10.1093/ije/dyr238.

[8] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009. ISBN: 978-0-387-98140-6. URL: http://ggplot2.org.

[9] Y. Xie. DT: A Wrapper of the JavaScript Library ‘DataTables’. R package version 0.1. 2015. URL: https://CRAN.R-project.org/package=DT.

[10] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.

[11] T. Yin, D. Cook and M. Lawrence. “ggbio: an R package for extending the grammar of graphics for genomic data”. In: Genome Biology 13.8 (2012), p. R77.