Generate an HTML/PDF report exploring a set of genomic regions

This function generates an HTML report with quality checks, genome location exploration, and an interactive table with the results. Other output formats, such as PDF, are possible but lose the interactivity. Users can easily append to the report by providing an R Markdown file to customCode, or can customize the entire template by providing an R Markdown file to template.

renderReport(
  regions,
  project = "",
  pvalueVars = c(`P-values` = "pval"),
  densityVars = NULL,
  significantVar = mcols(regions)$pval <= 0.05,
  annotation = NULL,
  nBestRegions = 500,
  customCode = NULL,
  outdir = "regionExploration",
  output = "regionExploration",
  browse = interactive(),
  txdb = NULL,
  device = "png",
  densityTemplates = list(Pvalue = templatePvalueDensity, Common = templateDensity,
    Manhattan = templateManhattan),
  template = NULL,
  theme = NULL,
  digits = 2,
  ...
)

templatePvalueDensity

templateDensity

templateManhattan

templatePvalueHistogram

templateHistogram

Format

An object of class character of length 1.
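
Because each template is a single character string, it can be inspected and filled in like any other text. The sketch below uses only base R to mimic how the `{{{varName}}}` and `{{{densityVarName}}}` placeholders get substituted. `toy_template` and `fill_template()` are hypothetical names introduced for illustration, and `gsub()` is only a stand-in for whisker.render, which is what actually processes the templates.

```r
## A toy template in the same style as the package templates
## (hypothetical and much shorter than the real ones).
toy_template <- paste(
    "## {{{densityVarName}}}",
    "",
    "summary(mcols(regions)[['{{{varName}}}']])",
    sep = "\n"
)

## Hypothetical helper: replace each triple-mustache tag with its value.
## This is a base R stand-in for whisker.render(), not the package's code.
fill_template <- function(template, values) {
    for (tag in names(values)) {
        template <- gsub(
            paste0("{{{", tag, "}}}"), values[[tag]], template,
            fixed = TRUE
        )
    }
    template
}

## Fill the toy template for a variable named 'qvalues' titled 'Q-values'
filled <- fill_template(
    toy_template,
    list(varName = "qvalues", densityVarName = "Q-values")
)
cat(filled, "\n")
```

A custom template written this way would be supplied as one of the three elements of densityTemplates.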

Arguments

regions

The set of genomic regions of interest as a GRanges object. All sequence lengths must be provided.

project

The title of the project.

pvalueVars

The names of the variables with values between 0 and 1 to plot density values by chromosome and a table for commonly used cutoffs. Most commonly used to explore p-value distributions. If a named character vector is provided, the names are used in the plot titles.

densityVars

The names of variables to use for making density plots by chromosome. Commonly used to explore scores and other variables given by region. If a named character vector is provided, the names are used in the plot titles.

significantVar

A logical variable differentiating statistically significant regions from the rest. When provided, both types of regions are compared against each other to see differences in width, location, etc.

annotation

The output from matchGenes used on regions. Note that this can take time for a large set of regions so it's better to pre-compute this information and save it.

nBestRegions

The number of regions to include in the interactive table.

customCode

An absolute path to a child R Markdown file with code to be evaluated before the reproducibility section. It's useful for users who want to customize the report by adding conclusions derived from the data and/or further quality checks and plots.

outdir

The name of the output directory.

output

The name of the output HTML file (without the html extension).

browse

If TRUE, the HTML report is opened in your browser once it's completed.

txdb

The transcript database (TxDb) to use for identifying the closest genes via matchGenes. If NULL, TxDb.Hsapiens.UCSC.hg19.knownGene is used by default.

device

The graphical device used when knitting. See more at http://yihui.name/knitr/options (dev argument).

densityTemplates

A list of length 3 with templates for the p-value density plots (variables from pvalueVars), the continuous variables density plots (variables from densityVars), and Manhattan plots for the p-value variables (pvalueVars). These templates are processed by whisker.render. Check the default templates for more information. The densityTemplates argument is available for those users interested in customizing these plots. For example, to show histograms instead of density plots use templatePvalueHistogram and templateHistogram instead of templatePvalueDensity and templateDensity respectively.

template

Template file to use for the report. If not provided, will use the default file found in regionExploration/regionExploration.Rmd within the package source.

theme

A ggplot2 theme to use for the plots made with ggplot2.

digits

The number of digits to round to in the interactive table of the top nBestRegions. Note that p-values and adjusted p-values won't be rounded.

...

Arguments passed to other methods and/or advanced arguments. Advanced arguments:

overviewParams: A two element list with base_size and areaRel that control the text size for the genomic overview plots.
output_format: Either html_document, pdf_document or knitrBootstrap::bootstrap_document unless you modify the YAML template.
clean: Logical, whether to clean the results or not. Passed to render.
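
As an illustration of the customCode argument described above, the following sketch writes a small child R Markdown file and resolves its absolute path. The file name, the chunk label extraCheck and the file contents are hypothetical. It also assumes that the regions object is visible to the child document when the report is knit, as the default templates suggest.

```r
## Build the chunk fence programmatically to keep this example readable
fence <- strrep("`", 3)

## Hypothetical contents for the child R Markdown file
child_lines <- c(
    "# Conclusions",
    "",
    "Notes and extra quality checks derived from the data go here.",
    "",
    paste0(fence, "{r extraCheck}"),
    "summary(width(regions))", # assumes 'regions' is available when knitting
    fence
)

## Write the file to a temporary directory
child_path <- file.path(tempdir(), "customCode.Rmd")
writeLines(child_lines, child_path)

## customCode is documented as an absolute path
child_path <- normalizePath(child_path)
print(child_path)

## The path could then be supplied to renderReport(), for example:
## renderReport(regions, "Example run", customCode = child_path)
```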

Value

An HTML report with a basic exploration for the given set of genomic regions. See the example report at http://leekgroup.github.io/regionReport/reference/renderReport-example/regionExploration.html.

Details

Set output_format to 'knitrBootstrap::bootstrap_document' or 'pdf_document' if you want an HTML report styled by knitrBootstrap or a PDF report, respectively. If using knitrBootstrap, we recommend the version available only via GitHub at https://github.com/jimhester/knitrBootstrap, which has nicer features than the current version available via CRAN. You can also set output_format to 'html_document' for an HTML report styled by rmarkdown. The default is 'BiocStyle::html_document'.

If you modify the YAML front matter of template, you can use other values for output_format.

The HTML report styled with knitrBootstrap can be smaller in size than the 'html_document' report.
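
The output_format values mentioned above can be kept in one place with a small helper. This is a minimal sketch: pick_output_format() and its short style labels are hypothetical and not part of the package. Only the returned strings come from this help page.

```r
## Hypothetical helper: map a short label to one of the output_format
## values listed in this help page.
pick_output_format <- function(
    style = c("biocstyle", "rmarkdown", "bootstrap", "pdf")) {
    style <- match.arg(style)
    switch(style,
        biocstyle = "BiocStyle::html_document",
        rmarkdown = "html_document",
        bootstrap = "knitrBootstrap::bootstrap_document",
        pdf = "pdf_document"
    )
}

## Choose the PDF format
fmt <- pick_output_format("pdf")
print(fmt)

## The value could then be passed through '...', for example:
## renderReport(regions, "Example run", output_format = fmt, browse = FALSE)
```

Other values only make sense once the YAML front matter of the template has been modified, so the helper deliberately rejects anything outside these four labels.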

Author

Leonardo Collado-Torres

Examples


## Load derfinder for an example set of regions
library("derfinder")
regions <- genomeRegions$regions

## Assign chr length
library("GenomicRanges")
seqlengths(regions) <- c("chr21" = 48129895)

## The output will be saved in the 'renderReport-example' directory
dir.create("renderReport-example", showWarnings = FALSE, recursive = TRUE)

## Generate the HTML report
report <- renderReport(regions, "Example run",
    pvalueVars = c(
        "Q-values" = "qvalues", "P-values" = "pvalues"
    ), densityVars = c(
        "Area" = "area", "Mean coverage" = "meanCoverage"
    ),
    significantVar = regions$qvalues <= 0.05, nBestRegions = 20,
    outdir = "renderReport-example"
)
#> Writing 11 Bibtex entries ... 
#> OK
#> Results written to file 'renderReport-example/regionExploration.bib'
#> 
#> 
#> processing file: regionExploration.Rmd
#> 1/43                         
#> 2/43 [docSetup]              
#> 3/43                         
#> 4/43 [setup]                 
#> 5/43                         
#> 6/43 [pvaluePlots]           
#> 7/43                         
#> 8/43 [regLen]                
#> 9/43                         
#> 10/43 [regLen2]               
#> 11/43                         
#> 12/43 [densityPlots]          
#> 13/43                         
#> 14/43 [genomeOverview1]       
#> 15/43                         
#> 16/43 [manhattanPlots]        
#> 17/43                         
#> 18/43 [genomeOverview2]       
#> 19/43                         
#> 20/43 [annoReg]               
#> 21/43                         
#> 22/43 [genomeOverview3]       
#> 23/43                         
#> 24/43 [countTable]            
#> 25/43                         
#> 26/43 [vennDiagram]           
#> 27/43                         
#> 28/43 [vennDiagramSignificant]
#> 29/43                         
#> 30/43 [bestRegionInfo]        
#> 31/43                         
#> 32/43 [unnamed-chunk-1]       
#> 33/43                         
#> 34/43 [thecall]               
#> 35/43                         
#> 36/43 [reproducibility1]      
#> 37/43                         
#> 38/43 [reproducibility2]      
#> 39/43                         
#> 40/43 [reproducibility3]      
#> 41/43                         
#> 42/43 [bibliography]          
#> 43/43                         
#> output file: regionExploration.knit.md
#> /usr/bin/pandoc +RTS -K512m -RTS regionExploration.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output regionExploration.html --lua-filter /__w/_temp/Library/bookdown/rmarkdown/lua/custom-environment.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/table-classes.lua --embed-resources --standalone --wrap preserve --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /tmp/RtmpNRm6dJ/BiocStyle/template.html --no-highlight --variable highlightjs=1 --number-sections --variable theme=bootstrap --css /__w/_temp/Library/BiocStyle/resources/html/bioconductor.css --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpNRm6dJ/rmarkdown-str2d5e1cd0c13f.html --variable code_folding=hide --variable code_menu=1 
#> 
#> Output created: regionExploration.html

if (interactive()) {
    ## Browse the report
    browseURL(report)
}

## See the example report at
## http://leekgroup.github.io/regionReport/reference/renderReport-example/regionExploration.html


## Check the default templates, which are a useful starting point for users
## interested in customizing these plots.
## For p-value variables:
cat(regionReport::templatePvalueDensity)
#> 
#> ## {{{densityVarName}}}
#> 
#> ```{r pval-density-{{{varName}}}, fig.width=10, fig.height=10}
#> p1{{{varName}}} <- ggplot(regions.df.plot, aes(x={{{varName}}}, colour=seqnames)) +
#>     geom_line(stat='density') + xlim(0, 1) +
#>     labs(title='Density of {{{densityVarName}}}') + xlab('{{{densityVarName}}}') +
#>     scale_colour_discrete(limits=chrs) + theme(legend.title=element_blank())
#> p1{{{varName}}}
#> ```
#> 
#> 
#> ```{r 'pval-summary-{{{varName}}}'}
#> summary(mcols(regions)[['{{{varName}}}']])
#> ```
#> 
#> 
#> This is the numerical summary of the distribution of the {{{densityVarName}}}.
#> 
#> ```{r pval-tableSummary-{{{varName}}}, results='asis'}
#> {{{varName}}}table <- lapply(c(1e-04, 0.001, 0.01, 0.025, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5,
#>     0.6, 0.7, 0.8, 0.9, 1), function(x) {
#>     data.frame('Cut' = x, 'Count' = sum(mcols(regions)[['{{{varName}}}']] <= x))
#> })
#> {{{varName}}}table <- do.call(rbind, {{{varName}}}table)
#> if(outputIsHTML) {
#>     kable({{{varName}}}table, format = 'markdown', align = c('c', 'c'))
#> } else {
#>     kable({{{varName}}}table)
#> }
#> ```
#> 
#> This table shows the number of regions with {{{densityVarName}}} less or equal than some commonly used cutoff values.
#> 

## For continuous variables:
cat(regionReport::templateDensity)
#> 
#> ## {{{densityVarName}}}
#> 
#> ```{r density-{{{varName}}}, fig.width=14, fig.height=14, eval=hasSignificant, echo=hasSignificant}
#> xrange <- range(regions.df.plot[, '{{{varName}}}']) * c(0.95, 1.05)
#> p3a{{{varName}}} <- ggplot(regions.df.plot[is.finite(regions.df.plot[, '{{{varName}}}']), ], aes(x={{{varName}}}, colour=seqnames)) +
#>     geom_line(stat='density') + labs(title='Density of {{{densityVarName}}}') +
#>     xlab('{{{densityVarName}}}') + scale_colour_discrete(limits=chrs) +
#>     xlim(xrange) + theme(legend.title=element_blank())
#> p3b{{{varName}}} <- ggplot(regions.df.sig[is.finite(regions.df.sig[, '{{{varName}}}']), ], aes(x={{{varName}}}, colour=seqnames)) +
#>     geom_line(stat='density') +
#>     labs(title='Density of {{{densityVarName}}} (significant only)') +
#>     xlab('{{{densityVarName}}}') + scale_colour_discrete(limits=chrs) +
#>     xlim(xrange) + theme(legend.title=element_blank())
#> grid.arrange(p3a{{{varName}}}, p3b{{{varName}}})
#> ```
#> 
#> ```{r density-solo-{{{varName}}}, fig.width=10, fig.height=10, eval=!hasSignificant, echo=!hasSignificant}
#> p3a{{{varName}}} <- ggplot(regions.df.plot[is.finite(regions.df.plot[, '{{{varName}}}']), ], aes(x={{{varName}}}, colour=seqnames)) +
#>     geom_line(stat='density') + labs(title='Density of {{{densityVarName}}}') +
#>     xlab('{{{densityVarName}}}') + scale_colour_discrete(limits=chrs) +
#>     theme(legend.title=element_blank())
#> p3a{{{varName}}}
#> ```
#> 
#> This plot shows the density of the {{{densityVarName}}} for all regions. `r ifelse(hasSignificant, 'The bottom panel is restricted to significant regions.', '')`
#> 

## For Manhattan plots
cat(regionReport::templateManhattan)
#> 
#> ## Manhattan {{{densityVarName}}}
#> 
#> ```{r manhattan-{{{varName}}}, fig.width=10, fig.height=10, message = FALSE}
#> 
#> regions.manhattan <- regions
#> mcols(regions.manhattan)[['{{{varName}}}']] <- - log(mcols(regions.manhattan)[['{{{varName}}}']], base = 10)
#> pMan{{{varName}}} <- plotGrandLinear(regions.manhattan, aes(y = {{{varName}}}, colour = seqnames)) + theme(axis.text.x=element_text(angle=-90, hjust=0)) + ylab('-log10 {{{densityVarName}}}')
#> pMan{{{varName}}}
#> rm(regions.manhattan)
#> ```
#> 
#> This is a Manhattan plot for the {{{densityVarName}}} for all regions. A single dot is shown for each region, where higher values in the y-axis mean that the {{{densityVarName}}} are closer to zero.
#> 

##################################################
## bumphunter example mentioned in the vignette ##
##################################################

## Load bumphunter
library("bumphunter")

## Create data from the vignette
pos <- list(
    pos1 = seq(1, 1000, 35),
    pos2 = seq(2001, 3000, 35),
    pos3 = seq(1, 1000, 50)
)
chr <- rep(paste0("chr", c(1, 1, 2)), times = lengths(pos))
pos <- unlist(pos, use.names = FALSE)

## Find clusters
cl <- clusterMaker(chr, pos, maxGap = 300)

## Build simulated bumps
Indexes <- split(seq_along(cl), cl)
beta1 <- rep(0, length(pos))
for (i in seq_along(Indexes)) {
    ind <- Indexes[[i]]
    x <- pos[ind]
    z <- scale(x, median(x), max(x) / 12)
    beta1[ind] <- i * (-1)^(i + 1) * pmax(1 - abs(z)^3, 0)^3 ## multiply by i to vary size
}

## Build data
beta0 <- 3 * sin(2 * pi * pos / 720)
X <- cbind(rep(1, 20), rep(c(0, 1), each = 10))
set.seed(23852577)
error <- matrix(rnorm(20 * length(beta1), 0, 1), ncol = 20)
y <- t(X[, 1]) %x% beta0 + t(X[, 2]) %x% beta1 + error

## Perform bumphunting
tab <- bumphunter(y, X, chr, pos, cl, cutoff = .5)
#> [bumphunterEngine] Using a single core (backend: doSEQ, version: 1.5.2).
#> [bumphunterEngine] Computing coefficients.
#> [bumphunterEngine] Finding regions.
#> [bumphunterEngine] Found 15 bumps.

## Explore data
lapply(tab, head)
#> $table
#>     chr start  end      value       area cluster indexStart indexEnd  L clusterL
#> 10 chr1  2316 2631 -1.5814747 15.8147473       2         39       48 10       29
#> 7  chr2   451  551  1.5891293  4.7673878       3         68       70  3       20
#> 2  chr1   456  526  1.0678828  3.2036485       1         14       16  3       29
#> 5  chr1  2176 2211  0.7841794  1.5683589       2         35       36  2       29
#> 6  chr1  2841 2841  1.2010184  1.2010184       2         54       54  1       29
#> 4  chr1   771  771  0.7780902  0.7780902       1         23       23  1       29
#> 
#> $coef
#>             [,1]
#> [1,]  0.60960932
#> [2,] -0.09052769
#> [3,] -0.21482638
#> [4,]  0.13053755
#> [5,] -0.21723642
#> [6,]  0.39934961
#> 
#> $fitted
#>             [,1]
#> [1,]  0.60960932
#> [2,] -0.09052769
#> [3,] -0.21482638
#> [4,]  0.13053755
#> [5,] -0.21723642
#> [6,]  0.39934961
#> 
#> $pvaluesMarginal
#> [1] NA
#> 

library("GenomicRanges")

## Build GRanges with sequence lengths
regions <- GRanges(
    seqnames = tab$table$chr,
    IRanges(start = tab$table$start, end = tab$table$end),
    strand = "*", value = tab$table$value, area = tab$table$area,
    cluster = tab$table$cluster, L = tab$table$L, clusterL = tab$table$clusterL
)

## Assign chr lengths
seqlengths(regions) <- seqlengths(
    getChromInfoFromUCSC("hg19", as.Seqinfo = TRUE)
)[
    names(seqlengths(regions))
]

## Explore the regions
regions
#> GRanges object with 15 ranges and 5 metadata columns:
#>        seqnames    ranges strand |     value      area   cluster         L  clusterL
#>           <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric> <numeric> <integer>
#>    [1]     chr1 2316-2631      * | -1.581475  15.81475         2        10        29
#>    [2]     chr2   451-551      * |  1.589129   4.76739         3         3        20
#>    [3]     chr1   456-526      * |  1.067883   3.20365         1         3        29
#>    [4]     chr1 2176-2211      * |  0.784179   1.56836         2         2        29
#>    [5]     chr1      2841      * |  1.201018   1.20102         2         1        29
#>    ...      ...       ...    ... .       ...       ...       ...       ...       ...
#>   [11]     chr1       631      * |  0.618603  0.618603         1         1        29
#>   [12]     chr1         1      * |  0.609609  0.609609         1         1        29
#>   [13]     chr1      2911      * | -0.576423  0.576423         2         1        29
#>   [14]     chr2       251      * | -0.556160  0.556160         3         1        20
#>   [15]     chr1      2806      * | -0.521606  0.521606         2         1        29
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome

## Now create the report
report <- renderReport(regions, "Example bumphunter",
    pvalueVars = NULL,
    densityVars = c(
        "Area" = "area", "Value" = "value",
        "Cluster Length" = "clusterL"
    ), significantVar = NULL,
    output = "bumphunter-example", outdir = "bumphunter-example",
    device = "png"
)
#> Writing 11 Bibtex entries ... 
#> OK
#> Results written to file 'bumphunter-example/bumphunter-example.bib'
#> 
#> 
#> processing file: bumphunter-example.Rmd
#> 1/43                         
#> 2/43 [docSetup]              
#> 3/43                         
#> 4/43 [setup]                 
#> 5/43                         
#> 6/43 [pvaluePlots]           
#> 7/43                         
#> 8/43 [regLen]                
#> 9/43                         
#> 10/43 [regLen2]               
#> 11/43                         
#> 12/43 [densityPlots]          
#> 13/43                         
#> 14/43 [genomeOverview1]       
#> 15/43                         
#> 16/43 [manhattanPlots]        
#> 17/43                         
#> 18/43 [genomeOverview2]       
#> 19/43                         
#> 20/43 [annoReg]               
#> 21/43                         
#> 22/43 [genomeOverview3]       
#> 23/43                         
#> 24/43 [countTable]            
#> 25/43                         
#> 26/43 [vennDiagram]           
#> 27/43                         
#> 28/43 [vennDiagramSignificant]
#> 29/43                         
#> 30/43 [bestRegionInfo]        
#> 31/43                         
#> 32/43 [unnamed-chunk-1]       
#> 33/43                         
#> 34/43 [thecall]               
#> 35/43                         
#> 36/43 [reproducibility1]      
#> 37/43                         
#> 38/43 [reproducibility2]      
#> 39/43                         
#> 40/43 [reproducibility3]      
#> 41/43                         
#> 42/43 [bibliography]          
#> 43/43                         
#> output file: bumphunter-example.knit.md
#> /usr/bin/pandoc +RTS -K512m -RTS bumphunter-example.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output bumphunter-example.html --lua-filter /__w/_temp/Library/bookdown/rmarkdown/lua/custom-environment.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/latex-div.lua --lua-filter /usr/local/lib/R/site-library/rmarkdown/rmarkdown/lua/table-classes.lua --embed-resources --standalone --wrap preserve --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /tmp/RtmpNRm6dJ/BiocStyle/template.html --no-highlight --variable highlightjs=1 --number-sections --variable theme=bootstrap --css /__w/_temp/Library/BiocStyle/resources/html/bioconductor.css --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /tmp/RtmpNRm6dJ/rmarkdown-str2d5e4a7f550d.html --variable code_folding=hide --variable code_menu=1 
#> 
#> Output created: bumphunter-example.html

## See the example report at
## http://leekgroup.github.io/regionReport/reference/bumphunter-example/bumphunter-example.html