This page describes the supplementary material for the derfinder counting paper. All the bash, R and R Markdown source files used to analyze the data for this project and to generate the HTML reports are available on this website. However, it is easier to view them at github.com/leekgroup/derCountSupp.
This section of the website describes the code and reports associated with the hippocampus and time-course data sets that are referred to in the paper.
There are 9 main bash scripts named _step1-*_ through _step9-*_.
There are also 3 optional bash scripts used when BAM files are available. These include one that uses HTSeq (see optional2-HTSeq.sh) and one that uses GenomicRanges to create the exon count tables (see optional3-summOv.sh and optional3-summOv.R).
A final bash script, run-all.sh, can be used to run the main 9 steps (or a subset of them).
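As a rough illustration of running the main steps or only a subset of them, a driver can loop over the step numbers and call each matching script in order. This is a minimal sketch and not the actual run-all.sh: the run_steps function, its arguments and the DRY_RUN variable are assumptions made for this example.

```bash
# Hypothetical driver sketch (not the real run-all.sh).
# Runs every script named step<N>-*.sh in DIR for N = FIRST..LAST, in order.
# With DRY_RUN=1 it only prints what it would run.
run_steps() {
    first=$1
    last=$2
    dir=${3:-.}
    i=$first
    while [ "$i" -le "$last" ]; do
        for script in "$dir"/step"$i"-*.sh; do
            # Skip the unexpanded pattern when no script matches this step
            [ -e "$script" ] || continue
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "would run: $script"
            else
                # Stop at the first failing step
                bash "$script" || { echo "failed: $script" >&2; return 1; }
            fi
        done
        i=$((i + 1))
    done
}
```

For example, `DRY_RUN=1 run_steps 3 5` would list the scripts for steps 3 to 5 in the current directory without running them.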
Each script shows at the top how it was used. Some of them generate small intermediate bash scripts, for example one script per chromosome for the analyzeChr step. Some steps have a companion R or R Markdown code file, either because the code is more involved or because that step generates an HTML file.
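The idea of generating one small job script per chromosome can be sketched as follows. This is only an illustration under stated assumptions: the write_chr_jobs function, the analyze-<chr>.sh file names and the analyze-chr.R call with its --chr option are hypothetical, and only the `#$ -cwd` and `#$ -N` lines are standard SGE directives.

```bash
# Hypothetical sketch: write one small SGE job script per chromosome.
# Usage: write_chr_jobs OUTDIR chr1 chr2 ...
write_chr_jobs() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    for chr in "$@"; do
        # Unquoted here-document: $chr is filled in now, \$(date) is kept
        # literal so it is evaluated when the generated job actually runs.
        cat > "$outdir/analyze-$chr.sh" <<EOF
#!/bin/bash
#$ -cwd
#$ -N analyze-$chr
echo "**** Job starts: \$(date) ****"
# analyze-chr.R and its --chr option are placeholders for this sketch
Rscript analyze-chr.R --chr "$chr"
echo "**** Job ends: \$(date) ****"
EOF
    done
}
```

Each generated file could then be submitted to the queue with qsub, for example `qsub jobs/analyze-chr21.sh`.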
The check-analysis-time.R script was useful for checking the progress of the step3-analyzeChr jobs and for detecting when a node in the cluster was having problems.
We expect that these scripts will be useful to derfinder users who want to automate the single base-level and/or expressed regions-level analyses for several data sets and/or have the jobs run automatically without having to check if each step has finished running.
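One common way to have SGE jobs run in sequence without manual checking is to submit each job with a hold on the previous one. The sketch below only prints the qsub commands it would use. It relies on the standard SGE options -N (job name) and -hold_jid (wait for the named job to finish); the print_chained_qsub function and the script names in the example are assumptions, and the text above does not state that the derfinder scripts use this exact mechanism.

```bash
# Hypothetical sketch: print qsub commands that chain job scripts so each
# one is held until the previous job (identified by its name) has finished.
print_chained_qsub() {
    prev=""
    for script in "$@"; do
        # Use the file name without .sh as the SGE job name
        name=$(basename "$script" .sh)
        if [ -z "$prev" ]; then
            echo "qsub -N $name $script"
        else
            echo "qsub -N $name -hold_jid $prev $script"
        fi
        prev=$name
    done
}
```

Printing the commands first makes it easy to review the dependency chain before piping the output to a shell on the cluster.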
Note that all bash scripts are tailored to the cluster we have access to, which administers job queues with Sun Grid Engine (SGE).
This HTML report contains basic information on the derfinder (Collado-Torres, Frazee, Love, Irizarry, et al., 2015) results from the Hippo data set. The report answers basic questions such as:
It also illustrates three clusters of candidate differentially expressed regions (DERs) from the single base-level analysis. You can view the report by following this link:
This HTML report has the code for loading the R data files and generating the CSV files. The report also has Venn diagrams showing the number of candidate DERs from the single base-level analysis that overlap known exons, introns and intergenic regions using the UCSC hg19 annotation. It also includes a detailed description of the columns in the CSV files.
View the venn report or its R Markdown source file venn.Rmd.
This HTML report has code for reading and processing the time and memory information for each job extracted with efficiency_analytics (Frazee, 2014). The report contains a detailed description of the analysis steps and tables summarizing the maximum memory and time for each analysis step if all the jobs for that particular step were running simultaneously. Finally, there is an interactive table with the timing results.
View the timing report or check the R Markdown file timing.Rmd.
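The per-step summary described above can be illustrated with a small shell sketch. It assumes one plausible reading that the text does not confirm: if all jobs of a step run at once, the step takes as long as its slowest job and needs the sum of the per-job memory. The summarize_steps function and the tab-separated input layout (step, job, hours, memory in GB) are hypothetical and are not the format produced by efficiency_analytics.

```bash
# Hypothetical sketch: summarize time and memory per analysis step.
# Input (stdin or files): tab-separated lines "step<TAB>job<TAB>hours<TAB>memory_gb"
# Output: "step<TAB>max_hours<TAB>total_memory_gb", sorted by step name.
summarize_steps() {
    awk -F '\t' '
        {
            # Jobs running at the same time: their memory use adds up
            mem[$1] += $4
            # The step lasts as long as its slowest job
            if (!($1 in hrs) || $3 + 0 > hrs[$1]) hrs[$1] = $3 + 0
        }
        END {
            for (s in mem) printf "%s\t%s\t%s\n", s, hrs[s], mem[s]
        }
    ' "$@" | sort
}
```

It can be fed a file, as in `summarize_steps jobs.tsv`, or read from a pipe.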
compareVsPNAS is an HTML report comparing 29 regions that were previously found to be differentially expressed (Zhou, Yuan, et al., 2011) versus the derfinder single base-level results. It also has code for identifying differentially expressed disjoint exons. The additional script counts-gene.R has the code for gene counting with summarizeOverlaps(). compareVsPNAS-gene compares the DESeq2 and edgeR-robust results against derfinder at the gene level with 40 total plots: 10 for each case of agreement/disagreement.
View the compareVsPNAS report or check the R Markdown file compareVsPNAS.Rmd run by the runComparison.sh script. Also view the compareVsPNAS-gene report and its linked R Markdown file compareVsPNAS-gene.Rmd.
The following R source files have the code for reproducing additional analyses described in the paper
These scripts also include other exploratory code.
Date this page was generated.
## [1] "2016-03-21 10:16:26 EDT"
Wallclock time spent generating the report.
## Time difference of 1.172 secs
R session information.
## Session info -----------------------------------------------------------------------------------------------------------
## setting value
## version R version 3.2.2 (2015-08-14)
## system x86_64, darwin13.4.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/New_York
## date 2016-03-21
## Packages ---------------------------------------------------------------------------------------------------------------
## package * version date source
## bibtex 0.4.0 2014-12-31 CRAN (R 3.2.0)
## BiocStyle * 1.8.0 2015-10-14 Bioconductor
## bitops 1.0-6 2013-08-17 CRAN (R 3.2.0)
## devtools 1.10.0 2016-01-23 CRAN (R 3.2.3)
## digest 0.6.9 2016-01-08 CRAN (R 3.2.3)
## evaluate 0.8 2015-09-18 CRAN (R 3.2.0)
## formatR 1.2.1 2015-09-18 CRAN (R 3.2.0)
## htmltools 0.3 2015-12-29 CRAN (R 3.2.3)
## httr 1.1.0 2016-01-28 CRAN (R 3.2.3)
## knitcitations * 1.0.7 2015-10-28 CRAN (R 3.2.0)
## knitr 1.12.3 2016-01-22 CRAN (R 3.2.3)
## lubridate 1.5.0 2015-12-03 CRAN (R 3.2.3)
## magrittr 1.5 2014-11-22 CRAN (R 3.2.0)
## memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)
## plyr 1.8.3 2015-06-12 CRAN (R 3.2.1)
## R6 2.1.2 2016-01-26 CRAN (R 3.2.3)
## Rcpp 0.12.3 2016-01-10 CRAN (R 3.2.3)
## RCurl 1.95-4.7 2015-06-30 CRAN (R 3.2.1)
## RefManageR 0.10.6 2016-02-15 CRAN (R 3.2.3)
## RJSONIO 1.3-0 2014-07-28 CRAN (R 3.2.0)
## rmarkdown * 0.9.2 2016-01-01 CRAN (R 3.2.3)
## stringi 1.0-1 2015-10-22 CRAN (R 3.2.0)
## stringr 1.0.0 2015-04-30 CRAN (R 3.2.0)
## XML 3.98-1.3 2015-06-30 CRAN (R 3.2.0)
## yaml 2.1.13 2014-06-12 CRAN (R 3.2.0)
You can view the source R Markdown file for this page at index.Rmd.
This report was generated using BiocStyle (Morgan, Oleś, and Huber, 2016) with knitr (Xie, 2014) and rmarkdown (Allaire, Cheng, Xie, McPherson, et al., 2016) running behind the scenes.
Citations were made with knitcitations (Boettiger, 2015). Citation file: index.bib.
[1] J. Allaire, J. Cheng, Y. Xie, J. McPherson, et al. rmarkdown: Dynamic Documents for R. R package version 0.9.2. 2016. URL: http://CRAN.R-project.org/package=rmarkdown.
[2] C. Boettiger. knitcitations: Citations for ‘Knitr’ Markdown Files. R package version 1.0.7. 2015. URL: http://CRAN.R-project.org/package=knitcitations.
[3] L. Collado-Torres, A. C. Frazee, M. I. Love, R. A. Irizarry, et al. “derfinder: Software for annotation-agnostic RNA-seq differential expression analysis”. In: bioRxiv (2015). DOI: 10.1101/015370. URL: http://www.biorxiv.org/content/early/2015/02/19/015370.abstract.
[4] A. Frazee. Efficiency analysis of Sun Grid Engine batch jobs. 2014. URL: http://dx.doi.org/10.6084/m9.figshare.878000.
[5] M. Morgan, A. Oleś and W. Huber. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 1.8.0. 2016. URL: https://github.com/Bioconductor/BiocStyle.
[6] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.
[7] Z. Zhou, Q. Yuan, et al. “Substance-specific and shared transcription and epigenetic changes in the human hippocampus chronically exposed to cocaine and alcohol”. In: Proceedings of the National Academy of Sciences of the United States of America 108.16 (2011), pp. 6626-6631.