Contents

This report creates the CSV files with the candidate differentially expressed regions (DERs) found using derfinder (Collado-Torres, Frazee, Love, Irizarry, et al., 2015) via the single-base level approach on the BrainSpan data set. It also has Venn diagrams illustrating the overlap with known annotation.

1 Results

1.1 Generate CSV files

The following code shows how to generate the CSV files from the R objects.

## Setup
suppressMessages(library("GenomicRanges"))
suppressMessages(library("derfinder"))
suppressMessages(library("derfinderPlot"))

## Experiments
exps <- c('brainspan')

## Find the latest run from each experiment
run <- sapply(exps, function(exp) {
    runs <- dir(file.path('..', exp, 'derAnalysis'), pattern = 'run')
    runs[length(runs)]
})

## Load regions and annotation data
regions <- lapply(exps, function(exp) {
    load(file.path('..', exp, 'derAnalysis', run[exp], 'fullRegions.Rdata'))
    res <- fullRegions
    names(res) <- seq_len(length(res))
    return(res)
})

anno <- lapply(exps, function(exp) {
    load(file.path('..', exp, 'derAnalysis', run[exp],
        'fullAnnotatedRegions.Rdata'))
    
    ## Fix region labels: its intergenic, not intragenic
    if('intragenic' %in% names(fullAnnotatedRegions$countTable)) {
        ## Fix countTable
        names(fullAnnotatedRegions$countTable)[names(fullAnnotatedRegions$countTable) == 'intragenic'] <- 'intergenic'
    }
    res <- fullAnnotatedRegions
    return(res)
})

## Fix names
names(regions) <- names(anno) <- exps

## Perform check and create csv files
check <- vector("list", length(exps))
names(check) <- exps
for(exp in exps) {
    
    ## Peform check
    check[[exp]] <- identical(nrow(anno[[exp]]$countTable),
        length(regions[[exp]]))
    
    ## Export regions information into plain text
    write.csv(as.data.frame(regions[[exp]]), file = paste0("supplementaryFile", 
        which(exp == exps), ".csv"), quote = FALSE, row.names = FALSE)
    
    ## You can later read it in R using:
    # read.csv("supplementaryFile1.csv")
    
    ## Compress
    system(paste0('gzip supplementaryFile', which(exp == exps), '.csv'))
}

## Check that the rows match
unlist(check)
## brainspan 
##      TRUE

CSV file corresponds to experiment brainspan from run run4-v1.0.10.

1.2 Venn diagrams

The following Venn diagrams show how many candidate differentially expressed regions (DERs) overlap with an exon, intron, or intergenic regions of the UCSC hg19 knownGene annotation. By default, a minimum overlap of 20 base pairs is required to say that a candidate DER overlaps any of feature. The annotation overlap was done using the mergeResults() function from derfinder (Collado-Torres, Frazee, Love, Irizarry, et al., 2015) while the Venn diagrams were made using derfinderPlot (Collado-Torres, Jaffe, and Leek, 2015).

The first venn diagram one uses all the candidate DERs, the second one uses only the candidate DERs that had a significant q-value (by default less than 0.10), and the third uses only the candidate DERs that had a FWER adjusted p-value less than 0.05.

for(exp in exps) {
    ## Using all candidate DERs
    vennRegions(anno[[exp]], main=paste("\n", exp, "using UCSC.hg19.knownGene"),
        counts.col="blue")
    
    ## Using candidate DERs with a significant q-value
    if(sum(regions[[exp]]$significantQval == "TRUE") > 0)
    vennRegions(anno[[exp]], regions[[exp]]$significantQval == "TRUE", 
        main=paste("\n\n", exp, "using UCSC.hg19.knownGene\nRestricted to significant q-value candidate DERs"),
        counts.col = "blue")
        
    
    ## Using candidate DERs with a significant FWER adjusted p-value
    if(sum(regions[[exp]]$significantFWER == "TRUE") > 0)
    vennRegions(anno[[exp]], regions[[exp]]$significantFWER == "TRUE",
        main=paste("\n\n", exp, "using UCSC.hg19.knownGene\nRestricted to significant FWER adjusted p-value candidate DERs"),
        counts.col="blue")
}

2 Details

The supplementary files 1 contains the information for the candidate DERs found in CSV format. The CSV file contains the following columns:

  1. seqnames The chromosome name.
  2. start The position of the chromosome where the candidate DER begins.
  3. end The position of the chromosome where the candidate DER ends.
  4. width The width of the candidate DER.
  5. strand The strand of the candidate DER.
  6. value The mean of the F-statistics for the candidate DER.
  7. area The sum of the F-statistics for the candidate DER.
  8. indexStart Among the bases from the chromosome that passed the filtering step, the position where the candidate DER begins.
  9. indexEnd Among the bases from the chromosome that passed the filtering step, the position where the candidate DER ends.
  10. cluster The cluster number to which the candidate DER belongs to. Clusters were defined by chromosome and two candidate DERs belong to the same cluster if they are less than 3000 bp apart.
  11. clusterL The length of the cluster to which the candidate DER belongs to.
  12. meanCoverage The mean coverage among all samples for the candidate DER.
  13. *mean__G__* The mean coverage among the samples of group G for the candidate DER.
  14. *log2FoldChange__G1__.vs__G2__* The log2 fold change between the samples in G1 and the samples in G2 for the candidate DER.
  15. pvalues The p-value for the candidate DER. It is calculated empirically using the null candidate DERs (obtained via permutations) from all chromosomes.
  16. significant Whether the p-value is less than 0.05.
  17. qvalues The p-value adjusted to control the FDR by using the qvalue (with contributions from Andrew J. Bass, Dabney, and Robinson, 2015) function from the package with the same name.
  18. significantQval Whether the q-value is less than 0.10.
  19. name This and the following fields are computed using annotateNearest() from the bumphunter package. They were calculated using the UCSC hg19 annotation. name refers to the nearest gene.
  20. annotation RefSeq ID. Taken from the help page of bumphunter::annotateNearest().
  21. description A factor with levels upstream, promoter, overlaps 5’, inside intron, inside exon, covers exon(s), overlaps exon upstream, overlaps exon downstream, overlaps two exons, overlaps 3’, close to 3’, downstream, covers. Taken from the help page of bumphunter::annotateNearest().
  22. region A factor with levels upstream, promoter, overlaps 5’, inside, overlaps 3’, close to 3’, downstream, covers. Taken from the help page of bumphunter::annotateNearest().
  23. subregion A factor with levels inside intron, inside exon, covers exon(s), overlaps exon upstream, overlaps exon downstream, overlaps two exons. Taken from the help page of bumphunter::annotateNearest().
  24. insidedistance Distance past 5 prime end of gene. Taken from the help page of bumphunter::annotateNearest().
  25. exonnumber Which exon. Taken from the help page of bumphunter::annotateNearest().
  26. nexons Number of exons. Taken from the help page of bumphunter::annotateNearest().
  27. UTR A factor with levels inside transcription region, 5’ UTR, overlaps 5’ UTR, 3’ UTR, overlaps 3’ UTR, covers transcription region. Taken from the help page of bumphunter::annotateNearest().
  28. annoStrand + or -. Taken from the help page of bumphunter::annotateNearest().
  29. geneL The gene length. Taken from the help page of bumphunter::annotateNearest().
  30. codingL The coding length. Taken from the help page of bumphunter::annotateNearest().
  31. fwer The FWER adjusted p-value for the region.
  32. significantFWER Whether the FWER adjusted p-value is less than 0.05.

3 Reproducibility

Date the report was generated.

## [1] "2016-02-20 10:01:15 EST"

Wallclock time spent generating the report.

## Time difference of 1.475 mins

R session information.

## Session info -----------------------------------------------------------------------------------------------------------
##  setting  value                       
##  version  R version 3.2.2 (2015-08-14)
##  system   x86_64, darwin13.4.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_US.UTF-8                 
##  tz       America/New_York            
##  date     2016-02-20
## Packages ---------------------------------------------------------------------------------------------------------------
##  package              * version  date       source        
##  acepack                1.3-3.3  2014-11-24 CRAN (R 3.2.0)
##  AnnotationDbi          1.32.3   2015-12-24 Bioconductor  
##  bibtex                 0.4.0    2014-12-31 CRAN (R 3.2.0)
##  Biobase                2.30.0   2015-10-14 Bioconductor  
##  BiocGenerics         * 0.16.1   2015-11-06 Bioconductor  
##  BiocInstaller          1.20.1   2015-11-18 Bioconductor  
##  BiocParallel           1.4.3    2015-12-16 Bioconductor  
##  BiocStyle            * 1.8.0    2015-10-14 Bioconductor  
##  biomaRt                2.26.1   2015-11-23 Bioconductor  
##  Biostrings             2.38.3   2016-01-02 Bioconductor  
##  biovizBase             1.18.0   2015-10-14 Bioconductor  
##  bitops                 1.0-6    2013-08-17 CRAN (R 3.2.0)
##  BSgenome               1.38.0   2015-10-14 Bioconductor  
##  bumphunter             1.10.0   2015-10-14 Bioconductor  
##  Cairo                  1.5-9    2015-09-26 CRAN (R 3.2.0)
##  cluster                2.0.3    2015-07-21 CRAN (R 3.2.2)
##  codetools              0.2-14   2015-07-15 CRAN (R 3.2.2)
##  colorspace             1.2-6    2015-03-11 CRAN (R 3.2.0)
##  DBI                    0.3.1    2014-09-24 CRAN (R 3.2.0)
##  derfinder            * 1.5.19   2015-12-15 Bioconductor  
##  derfinderHelper        1.4.1    2015-11-03 Bioconductor  
##  derfinderPlot        * 1.4.1    2015-11-03 Bioconductor  
##  devtools               1.10.0   2016-01-23 CRAN (R 3.2.3)
##  dichromat              2.0-0    2013-01-24 CRAN (R 3.2.0)
##  digest                 0.6.9    2016-01-08 CRAN (R 3.2.3)
##  doRNG                  1.6      2014-03-07 CRAN (R 3.2.0)
##  evaluate               0.8      2015-09-18 CRAN (R 3.2.0)
##  foreach                1.4.3    2015-10-13 CRAN (R 3.2.0)
##  foreign                0.8-66   2015-08-19 CRAN (R 3.2.0)
##  formatR                1.2.1    2015-09-18 CRAN (R 3.2.0)
##  Formula                1.2-1    2015-04-07 CRAN (R 3.2.0)
##  futile.logger          1.4.1    2015-04-20 CRAN (R 3.2.0)
##  futile.options         1.0.0    2010-04-06 CRAN (R 3.2.0)
##  GenomeInfoDb         * 1.6.3    2016-01-26 Bioconductor  
##  GenomicAlignments      1.6.3    2016-01-06 Bioconductor  
##  GenomicFeatures        1.22.12  2016-01-28 Bioconductor  
##  GenomicFiles           1.6.2    2015-12-30 Bioconductor  
##  GenomicRanges        * 1.22.4   2016-01-30 Bioconductor  
##  GGally                 1.0.1    2016-01-14 CRAN (R 3.2.3)
##  ggbio                  1.18.3   2016-01-13 Bioconductor  
##  ggplot2                2.0.0    2015-12-18 CRAN (R 3.2.3)
##  graph                  1.48.0   2015-10-14 Bioconductor  
##  gridExtra              2.0.0    2015-07-14 CRAN (R 3.2.0)
##  gtable                 0.1.2    2012-12-05 CRAN (R 3.2.0)
##  Hmisc                  3.17-1   2015-12-18 CRAN (R 3.2.3)
##  htmltools              0.3      2015-12-29 CRAN (R 3.2.3)
##  httr                   1.1.0    2016-01-28 CRAN (R 3.2.3)
##  IRanges              * 2.4.6    2015-12-12 Bioconductor  
##  iterators              1.0.8    2015-10-13 CRAN (R 3.2.0)
##  knitcitations        * 1.0.7    2015-10-28 CRAN (R 3.2.0)
##  knitr                  1.12.3   2016-01-22 CRAN (R 3.2.3)
##  lambda.r               1.1.7    2015-03-20 CRAN (R 3.2.0)
##  lattice                0.20-33  2015-07-14 CRAN (R 3.2.2)
##  latticeExtra           0.6-26   2013-08-15 CRAN (R 3.2.0)
##  limma                  3.26.7   2016-01-28 Bioconductor  
##  locfit                 1.5-9.1  2013-04-20 CRAN (R 3.2.0)
##  lubridate              1.5.0    2015-12-03 CRAN (R 3.2.3)
##  magrittr               1.5      2014-11-22 CRAN (R 3.2.0)
##  Matrix                 1.2-3    2015-11-28 CRAN (R 3.2.2)
##  matrixStats            0.50.1   2015-12-15 CRAN (R 3.2.3)
##  memoise                1.0.0    2016-01-29 CRAN (R 3.2.3)
##  munsell                0.4.2    2013-07-11 CRAN (R 3.2.0)
##  nnet                   7.3-11   2015-08-30 CRAN (R 3.2.0)
##  OrganismDbi            1.12.1   2015-12-23 Bioconductor  
##  pkgmaker               0.22     2014-05-14 CRAN (R 3.2.0)
##  plyr                   1.8.3    2015-06-12 CRAN (R 3.2.1)
##  qvalue                 2.2.2    2016-01-08 Bioconductor  
##  R6                     2.1.2    2016-01-26 CRAN (R 3.2.3)
##  RBGL                   1.46.0   2015-10-14 Bioconductor  
##  RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.2.0)
##  Rcpp                   0.12.3   2016-01-10 CRAN (R 3.2.3)
##  RCurl                  1.95-4.7 2015-06-30 CRAN (R 3.2.1)
##  RefManageR             0.10.5   2016-01-02 CRAN (R 3.2.3)
##  registry               0.3      2015-07-08 CRAN (R 3.2.1)
##  reshape                0.8.5    2014-04-23 CRAN (R 3.2.0)
##  reshape2               1.4.1    2014-12-06 CRAN (R 3.2.0)
##  RJSONIO                1.3-0    2014-07-28 CRAN (R 3.2.0)
##  rmarkdown            * 0.9.2    2016-01-01 CRAN (R 3.2.3)
##  rngtools               1.2.4    2014-03-06 CRAN (R 3.2.0)
##  rpart                  4.1-10   2015-06-29 CRAN (R 3.2.2)
##  Rsamtools              1.22.0   2015-10-14 Bioconductor  
##  RSQLite                1.0.0    2014-10-25 CRAN (R 3.2.0)
##  rtracklayer            1.30.1   2015-10-22 Bioconductor  
##  S4Vectors            * 0.8.11   2016-01-29 Bioconductor  
##  scales                 0.3.0    2015-08-25 CRAN (R 3.2.0)
##  stringi                1.0-1    2015-10-22 CRAN (R 3.2.0)
##  stringr                1.0.0    2015-04-30 CRAN (R 3.2.0)
##  SummarizedExperiment   1.0.2    2016-01-01 Bioconductor  
##  survival               2.38-3   2015-07-02 CRAN (R 3.2.2)
##  VariantAnnotation      1.16.4   2015-12-09 Bioconductor  
##  XML                    3.98-1.3 2015-06-30 CRAN (R 3.2.0)
##  xtable                 1.8-0    2015-11-02 CRAN (R 3.2.0)
##  XVector                0.10.0   2015-10-14 Bioconductor  
##  yaml                   2.1.13   2014-06-12 CRAN (R 3.2.0)
##  zlibbioc               1.16.0   2015-10-14 Bioconductor

4 Bibliography

This report was generated using BiocStyle (Morgan, Oleś, and Huber, 2016) with knitr (Xie, 2014) and rmarkdown (Allaire, Cheng, Xie, McPherson, et al., 2016) running behind the scenes.

Citations made with knitcitations (Boettiger, 2015). Citation file: venn.bib.

[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 0.9.2. 2016. URL: http://CRAN.R-project.org/package=rmarkdown.

[2] J. D. S. with contributions from Andrew J. Bass, A. Dabney and D. Robinson. qvalue: Q-value estimation for false discovery rate control. R package version 2.2.2. 2015. URL: http://github.com/jdstorey/qvalue.

[3] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.7. 2015. URL: http://CRAN.R-project.org/package=knitcitations.

[4] L. Collado-Torres, A. C. Frazee, M. I. Love, R. A. Irizarry, et al. “derfinder: Software for annotation-agnostic RNA-seq differential expression analysis”. In: bioRxiv (2015). DOI: 10.1101/015370. URL: http://www.biorxiv.org/content/early/2015/02/19/015370.abstract.

[5] L. Collado-Torres, A. E. Jaffe and J. T. Leek. derfinderPlot: Plotting functions for derfinder. https://github.com/leekgroup/derfinderPlot - R package version 1.4.1. 2015. URL: http://www.bioconductor.org/packages/release/bioc/html/derfinderPlot.html.

[6] M. Morgan, A. Oleś and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 1.8.0. 2016. URL: https://github.com/Bioconductor/BiocStyle.

[7] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.