This report creates the CSV files with the candidate differentially expressed regions (DERs) found using derfinder
(Collado-Torres, Frazee, Jaffe, and Leek, 2014) on four data sets. It also has Venn diagrams illustrating the overlap with known annotation.
The following code shows how to generate the CSV files from the R
objects.
## Setup
suppressMessages(library("GenomicRanges"))
suppressMessages(library("derfinder"))
suppressMessages(library("limma"))
## Experiments
exps <- c('brainspan', 'simulation', 'hippo')
## Find the latest run from each experiment
run <- sapply(exps, function(exp) {
runs <- dir(file.path('..', exp, 'derAnalysis'), pattern = 'run')
runs[length(runs)]
})
## Load regions and annotation data
regions <- lapply(exps, function(exp) {
load(file.path('..', exp, 'derAnalysis', run[exp], 'fullRegions.Rdata'))
res <- fullRegions
names(res) <- seq_len(length(res))
return(res)
})
anno <- lapply(exps, function(exp) {
load(file.path('..', exp, 'derAnalysis', run[exp],
'fullAnnotatedRegions.Rdata'))
## Fix region labels: its intergenic, not intragenic
if('intragenic' %in% names(fullAnnotatedRegions$countTable)) {
## Fix countTable
names(fullAnnotatedRegions$countTable)[names(fullAnnotatedRegions$countTable) == 'intragenic'] <- 'intergenic'
## Fix annotation list
# fullAnnotatedRegions$annotationList <- GRangesList(lapply(fullAnnotatedRegions$annotationList, function(x) {
# idx <- x$theRegion == 'intragenic'
# if(sum(idx) > 0) {
# x$theRegion[idx] <- 'intergenic'
# }
# return(x)
# }))
#
# save(fullAnnotatedRegions, file = file.path('..', exp, 'derAnalysis',
# run[exp], 'fullAnnotatedRegions.Rdata'))
}
res <- fullAnnotatedRegions
return(res)
})
## Fix names
names(regions) <- names(anno) <- exps
## Perform check and create csv files
check <- vector("list", length(exps))
names(check) <- exps
for(exp in exps) {
## Peform check
check[[exp]] <- identical(nrow(anno[[exp]]$countTable),
length(regions[[exp]]))
## Export regions information into plain text
write.csv(as.data.frame(regions[[exp]]), file = paste0("supplementaryFile",
which(exp == exps), ".csv"), quote = FALSE, row.names = FALSE)
## You can later read them in R using:
# read.csv("supplementaryFile1.csv")
# read.csv("supplementaryFile2.csv")
# read.csv("supplementaryFile3.csv")
## Compress
system(paste0('gzip supplementaryFile', which(exp == exps), '.csv'))
}
## Check that the rows match
unlist(check)
## brainspan simulation hippo
## TRUE TRUE TRUE
CSV files correspond to experiments brainspan, simulation, hippo from runs run4-v1.0.10, run2-v1.0.10, run3-v1.0.10 respectively.
The following Venn diagrams show how many candidate differentially expressed regions (DERs) overlap with an exon, intron, or intergenic regions of the UCSC hg19 knownGene annotation. By default, a minimum overlap of 20 base pairs is required to say that a candidate DER overlaps any of feature. The annotation overlap was done using the mergeResults()
function from derfinder
(Collado-Torres, Frazee, Jaffe, and Leek, 2014) while the Venn diagrams were made using limma
(Ritchie, Phipson, Wu, Hu, et al., 2015).
A maximum of three Venn diagrams are shown per data set. The first one uses all the candidate DERs, the second one uses only the candidate DERs that had a significant q-value (by default less than 0.10), and the third uses only the candidate DERs that had a FWER adjusted p-value less than 0.05.
for(exp in exps) {
## Using all candidate DERs
vennDiagram(vennCounts(anno[[exp]]$countTable > 0), main=paste("\n", exp,
"using UCSC.hg19.knownGene"), counts.col="blue")
## Using candidate DERs with a significant q-value
if(sum(regions[[exp]]$significantQval == "TRUE") > 0)
vennDiagram(vennCounts(
anno[[exp]]$countTable[regions[[exp]]$significantQval == "TRUE",] > 0),
main=paste("\n\n", exp, "using UCSC.hg19.knownGene\nRestricted to significant q-value candidate DERs"
), counts.col="blue")
## Using candidate DERs with a significant FWER adjusted p-value
if(sum(regions[[exp]]$significantFWER == "TRUE") > 0)
vennDiagram(vennCounts(
anno[[exp]]$countTable[regions[[exp]]$significantFWER == "TRUE",] > 0),
main=paste("\n\n", exp, "using UCSC.hg19.knownGene\nRestricted to significant FWER adjusted p-value candidate DERs"
), counts.col="blue")
}
The supplementary files 1 through 3 files contain the information for the candidate DERs found in CSV format. The CSV files contain the following columns:
qvalue
(Dabney and Storey, 2015) function from the package with the same name.annotateNearest()
from the bumphunter
(Irizarry, Ayree, Hansen, and Hansen, 2015) package. They were calculated using the UCSC hg19 annotation. name refers to the nearest gene.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.bumphunter::annotateNearest()
.Date the report was generated.
## [1] "2015-01-20 14:29:06 EST"
Wallclock time spent generating the report.
## Time difference of 4.571 mins
R
session information.
## setting value
## version R version 3.1.1 Patched (2014-10-16 r66782)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz
## package * version date source
## acepack * 1.3-3.3 2013-05-03 CRAN (R 3.1.1)
## AnnotationDbi * 1.28.1 2014-10-29 Bioconductor
## base64enc * 0.1-2 2014-06-26 CRAN (R 3.1.0)
## BatchJobs * 1.5 2014-10-30 CRAN (R 3.1.1)
## BBmisc * 1.8 2014-10-30 CRAN (R 3.1.1)
## bibtex * 0.3-6 2013-07-29 CRAN (R 3.1.1)
## Biobase * 2.26.0 2014-10-15 Bioconductor
## BiocGenerics 0.12.1 2014-11-15 Bioconductor
## BiocParallel * 1.0.0 2014-10-15 Bioconductor
## biomaRt * 2.22.0 2014-10-15 Bioconductor
## Biostrings * 2.34.1 2014-12-13 Bioconductor
## bitops * 1.0-6 2013-08-17 CRAN (R 3.1.0)
## brew * 1.0-6 2011-04-13 CRAN (R 3.1.0)
## bumphunter * 1.6.0 2014-10-15 Bioconductor
## Cairo * 1.5-6 2014-06-26 CRAN (R 3.1.0)
## checkmate * 1.5.1 2014-12-14 CRAN (R 3.1.1)
## cluster * 1.15.3 2014-09-04 CRAN (R 3.1.1)
## codetools * 0.2-9 2014-08-21 CRAN (R 3.1.1)
## colorout 1.0-2 2014-11-04 local
## DBI * 0.3.1 2014-09-24 CRAN (R 3.1.1)
## derfinder 1.0.10 2014-11-23 Bioconductor
## derfinderHelper * 1.0.4 2014-11-05 Github (lcolladotor/derfinderHelper@27bcfe6)
## devtools * 1.7.0 2015-01-17 CRAN (R 3.1.1)
## digest * 0.6.8 2014-12-31 CRAN (R 3.1.1)
## doRNG * 1.6 2014-03-07 CRAN (R 3.1.0)
## evaluate * 0.5.5 2014-04-29 CRAN (R 3.1.0)
## fail * 1.2 2013-09-19 CRAN (R 3.1.0)
## foreach * 1.4.2 2014-04-11 CRAN (R 3.1.0)
## foreign * 0.8-61 2014-03-28 CRAN (R 3.1.1)
## formatR * 1.0 2014-08-25 CRAN (R 3.1.1)
## Formula * 1.2-0 2015-01-20 CRAN (R 3.1.1)
## GenomeInfoDb 1.2.4 2014-12-20 Bioconductor
## GenomicAlignments * 1.2.1 2014-11-05 Bioconductor
## GenomicFeatures * 1.18.3 2014-12-17 Bioconductor
## GenomicFiles * 1.2.0 2014-10-15 Bioconductor
## GenomicRanges 1.18.4 2015-01-08 Bioconductor
## Hmisc * 3.14-6 2014-11-22 CRAN (R 3.1.1)
## htmltools * 0.2.6 2014-09-08 CRAN (R 3.1.1)
## httr * 0.6.1 2015-01-01 CRAN (R 3.1.1)
## IRanges 2.0.1 2014-12-13 Bioconductor
## iterators * 1.0.7 2014-04-11 CRAN (R 3.1.0)
## knitcitations 1.0.5 2014-11-26 CRAN (R 3.1.1)
## knitr * 1.8 2014-11-11 CRAN (R 3.1.1)
## knitrBootstrap * 1.0.0 2014-11-19 Github (jimhester/knitrBootstrap@76c41f0)
## lattice * 0.20-29 2014-04-04 CRAN (R 3.1.1)
## latticeExtra * 0.6-26 2013-08-15 CRAN (R 3.1.0)
## limma 3.22.4 2015-01-16 Bioconductor
## locfit * 1.5-9.1 2013-04-20 CRAN (R 3.1.0)
## lubridate * 1.3.3 2013-12-31 CRAN (R 3.1.1)
## markdown * 0.7.4 2014-08-24 CRAN (R 3.1.1)
## Matrix * 1.1-4 2014-06-15 CRAN (R 3.1.1)
## matrixStats * 0.12.2 2014-12-07 CRAN (R 3.1.1)
## memoise * 0.2.1 2014-04-22 CRAN (R 3.1.0)
## mime * 0.2 2014-09-26 CRAN (R 3.1.1)
## nnet * 7.3-8 2014-03-28 CRAN (R 3.1.1)
## pkgmaker * 0.22 2014-05-14 CRAN (R 3.1.0)
## plyr * 1.8.1 2014-02-26 CRAN (R 3.1.0)
## qvalue * 1.40.0 2014-10-15 Bioconductor
## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.1.1)
## Rcpp * 0.11.3 2014-09-29 CRAN (R 3.1.1)
## RCurl * 1.95-4.5 2014-12-06 CRAN (R 3.1.1)
## RefManageR * 0.8.40 2014-10-29 CRAN (R 3.1.1)
## registry * 0.2 2012-01-24 CRAN (R 3.1.0)
## RJSONIO * 1.3-0 2014-07-28 CRAN (R 3.1.1)
## rmarkdown * 0.3.3 2014-09-17 CRAN (R 3.1.1)
## R.methodsS3 * 1.6.1 2014-01-05 CRAN (R 3.1.0)
## rngtools * 1.2.4 2014-03-06 CRAN (R 3.1.0)
## rpart * 4.1-8 2014-03-28 CRAN (R 3.1.1)
## Rsamtools * 1.18.2 2014-11-12 Bioconductor
## RSQLite * 1.0.0 2014-10-25 CRAN (R 3.1.1)
## rstudioapi * 0.2 2014-12-31 CRAN (R 3.1.1)
## rtracklayer * 1.26.2 2014-11-12 Bioconductor
## S4Vectors 0.4.0 2014-10-15 Bioconductor
## sendmailR * 1.2-1 2014-09-21 CRAN (R 3.1.1)
## stringr * 0.6.2 2012-12-06 CRAN (R 3.1.0)
## survival * 2.37-7 2014-01-22 CRAN (R 3.1.1)
## XML * 3.98-1.1 2013-06-20 CRAN (R 3.1.0)
## xtable * 1.7-4 2014-09-12 CRAN (R 3.1.1)
## XVector * 0.6.0 2014-10-15 Bioconductor
## yaml * 2.1.13 2014-06-12 CRAN (R 3.1.1)
## zlibbioc * 1.12.0 2014-10-15 Bioconductor
This report was generated using knitrBootstrap
(Hester, 2014) with knitr
(Xie, 2014) and rmarkdown
(Allaire, McPherson, Xie, Wickham, et al., 2014) running behind the scenes.
Citations made with knitcitations
(Boettiger, 2014).
[1] J. Allaire, J. McPherson, Y. Xie, H. Wickham, et al. rmarkdown: Dynamic Documents for R. R package version 0.3.3. 2014. URL: http://CRAN.R-project.org/package=rmarkdown.
[2] C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.5. 2014. URL: http://CRAN.R-project.org/package=knitcitations.
[3] L. Collado-Torres, A. C. Frazee, A. E. Jaffe and J. T. Leek. derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution. https://github.com/lcolladotor/derfinder - R package version 1.0.10. 2014. URL: http://www.bioconductor.org/packages/release/bioc/html/derfinder.html.
[4] A. Dabney and J. D. Storey. qvalue: Q-value estimation for false discovery rate control. R package version 1.40.0. 2015.
[5] J. Hester. knitrBootstrap: Knitr Bootstrap framework. R package version 1.0.0. 2014. URL: https://github.com/jimhester/.
[6] R. A. Irizarry, M. Ayree, K. D. Hansen and H. C. Hansen. bumphunter: Bump Hunter. R package version 1.6.0. 2015. URL: https://github.com/ririzarr/bumphunter.
[7] M. E. Ritchie, B. Phipson, D. Wu, Y. Hu, et al. “limma powers differential expression analyses for RNA-sequencing and microarray studies”. In: Nucleic Acids Research 43 (2015). URL: http://www.statsci.org/smyth/pubs/limmaPreprint.pdf.
[8] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.