This page describes the supplementary material for the derfinder software paper, which includes several HTML reports as well as code files for reproducing the results.
All the bash, R and R Markdown source files used to analyze the data for this project and to generate the HTML reports are available on this website. However, they are easier to browse at github.com/leekgroup/derSoftware.
There are 9 main bash scripts named _step1-*_ through _step9-*_.
There are also 3 optional bash scripts used when BAM files are available.
Among them, optional2-HTSeq.sh uses HTSeq, and optional3-summOv.sh (with its companion optional3-summOv.R) uses GenomicRanges to create the exon count tables.
A final bash script, run-all.sh, can be used to run the main 9 steps (or a subset of them).
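As a rough illustration of running only a subset of the main steps, the sketch below selects a range of step numbers and prints what would be launched. It is not the contents of run-all.sh: the function name `select_steps` and the dry-run output are assumptions made for this example, and the `step3-*.sh` pattern is printed literally rather than resolved to real file names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the actual run-all.sh.
# select_steps prints the step numbers in [first, last], one per line.
select_steps() {
    local first="${1:-1}" last="${2:-9}"
    if (( first < 1 || last > 9 || first > last )); then
        echo "usage: select_steps [first] [last], with 1 <= first <= last <= 9" >&2
        return 1
    fi
    seq "$first" "$last"
}

# Dry run: only print which step scripts would be launched.
for step in $(select_steps 3 5); do
    echo "would run: step${step}-*.sh"
done
```

Calling `select_steps` with no arguments lists all 9 steps, which corresponds to running the whole pipeline.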
All 13 bash scripts show how they were used at the top of the file. Some of them generate small intermediate bash scripts, for example one script per chromosome for the analyzeChr step. Some steps have a companion R or R Markdown file, either because the code is more involved or because that step generates an HTML file.
The check-analysis-time.R script was useful for checking the progress of the step3-analyzeChr jobs and for detecting when a node in the cluster was having problems.
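The idea of generating one small bash script per chromosome can be sketched as follows. The file names (`analyze-chr1.sh`), the `analyzeChr-placeholder.R` script and its `--chr` argument are all hypothetical and only stand in for whatever the real step scripts write out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write one small job script per chromosome.
# File names and the Rscript command line are illustrative only.
outdir="$(mktemp -d)"

write_chr_script() {
    local chr="$1" dir="$2"
    local script="${dir}/analyze-${chr}.sh"
    # Unquoted heredoc: ${chr} is expanded now, while the script is written.
    cat > "$script" <<EOF
#!/bin/bash
# Auto-generated job script for ${chr}
echo "**** Job starts for ${chr} ****"
Rscript analyzeChr-placeholder.R --chr "${chr}"
echo "**** Job ends for ${chr} ****"
EOF
    chmod +x "$script"
    echo "$script"
}

# Generate scripts for a few example chromosomes and print their paths.
for chr in chr1 chr2 chrX; do
    write_chr_script "$chr" "$outdir"
done
```

Keeping each generated script tiny makes it easy to resubmit a single chromosome if one job fails.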
We expect that these scripts will be useful to derfinder users who want to automate the single-base level and/or expressed-regions level analyses for several data sets and/or have the jobs run automatically without having to check if each step has finished running.
Note that all bash scripts are tailored to the cluster we have access to, which administers job queues with Sun Grid Engine (SGE).
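One common way to have SGE jobs run in sequence without manual checking is to make each job wait on the previous one with `qsub -hold_jid`. Whether the scripts on this site use that mechanism is an assumption; the sketch below only builds and prints such commands (nothing is submitted), and the job and script names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: chain steps on SGE so each waits for the previous one.
# Job and script names are invented; commands are printed, not submitted.
build_qsub() {
    local name="$1" script="$2" wait_for="${3:-}"
    local cmd="qsub -cwd -N ${name}"
    if [[ -n "$wait_for" ]]; then
        # -hold_jid delays this job until the named job has finished.
        cmd+=" -hold_jid ${wait_for}"
    fi
    echo "${cmd} ${script}"
}

prev=""
for step in 1 2 3; do
    name="demo-step${step}"
    build_qsub "$name" "step${step}-placeholder.sh" "$prev"
    prev="$name"
done
```

On a cluster with a different scheduler, only `build_qsub` would need to change.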
These HTML reports contain basic information on the derfinder (Collado-Torres, Frazee, Love, Irizarry, et al., 2015) results from the public data sets used (BrainSpan, Simulation, and Hippo). These reports answer basic questions such as:
They also illustrate three clusters of candidate differentially expressed regions (DERs) from the single-base level analysis. You can view the reports by following these links:
This HTML report has the code for loading the R data files and generating the CSV files. The report also has Venn diagrams showing the number of candidate DERs from the single-base level analysis that overlap known exons, introns and intergenic regions using the UCSC hg19 annotation. It also includes a detailed description of the columns in the CSV files.
View the venn report or its R Markdown source file venn.Rmd.
For each experiment, we made a simple comparison between the single-base level DERs and the expressed regions identified via regionMatrix(). Note that you would need to use limma, edgeR, DESeq, or another count-level differential expression package to determine which expressed regions are differentially expressed. In the simulation case we did so using limma, as described in the corresponding section. For some data sets we used more than one mean cutoff (shown in parentheses below) for determining the expressed regions.
These reports show how many bases are picked up in each of the approaches and different overlap comparisons. You can view the reports by following these links:
The code for generating the simulated RNA-seq reads and the chosen setup is described in the generateReads report. This report is generated by the R Markdown generateReads.Rmd file.
The code for aligning the reads to the genome with TopHat is in the run-paired-tophat.sh and makeBai.sh scripts.
There is also code for exporting the coverage data to BigWig files, which was necessary for a tutorial on how to use derfinder: derTutor. The code is available in the bigwig.sh script.
A thorough evaluation of the simulation results from the single-base level analysis is described in the evaluate report. Several comparisons are made at the gene, transcript and exonic segment levels.
The R Markdown source file for this report is evaluate.Rmd.
Similarly to the previous report, the expressed-regions level analysis evaluation is described in the evaluate-regionMatrix report. The code and language were slightly modified from the previous report.
The R Markdown source file for this report is evaluate-regionMatrix.Rmd.
We have several scripts and reports for comparing derfinder against DESeq2, edgeR-robust, and the original implementation of derfinder.
This first set of scripts was used to run the original implementation of derfinder.
The following scripts and reports show the comparison between these methods.
The R Markdown sources for these reports are counts-based.Rmd, all-exons.Rmd, counts-gene.Rmd and counts-gene-eval.Rmd; the last one compares derfinder against the gene-level counts-based methods.
This HTML report has code for reading and processing the time and memory information for each job, extracted with efficiency_analytics (Frazee, 2014). It includes several plots exploring the relationship between wall time and memory used by the cluster jobs, some of which make explicit the number of cores used by each job. The report contains a detailed description of the analysis steps shown in the plots. It also contains tables summarizing the maximum memory and time for each analysis step if all the jobs for that particular step were running simultaneously. Finally, there is an interactive table with the timing results.
View the timing report or check the R Markdown file timing.Rmd.
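To make the per-step summary concrete, here is a minimal sketch under one reading of "running simultaneously": memory is summed across a step's jobs and the step's time is that of its longest job. That reading, the three-column input format (step, wall hours, memory in GB) and the numbers are assumptions for illustration, not the computation in timing.Rmd.

```shell
#!/usr/bin/env bash
# Hypothetical input: one line per job with step name, wall hours, memory (GB).
summarize_jobs() {
    awk '
        {
            mem[$1] += $3                            # total memory if jobs overlap
            if (!($1 in wall) || $2 + 0 > wall[$1])  # longest job bounds the step
                wall[$1] = $2 + 0
        }
        END {
            # Output columns: step, longest wall time, summed memory.
            for (s in wall)
                printf "%s\t%.1f\t%.1f\n", s, wall[s], mem[s]
        }
    ' | sort
}

# Made-up numbers for two steps.
printf '%s\n' \
    "analyzeChr 2.5 10" \
    "analyzeChr 4.0 12" \
    "otherStep 0.5 8" | summarize_jobs
```

With this input, the analyzeChr row reports 4.0 hours and 22.0 GB, and the otherStep row reports 0.5 hours and 8.0 GB.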
The code used for generating the panels used in Figure 1 of the paper is available in the figure1.R file.
The code used for generating the panels used in Figure 2 of the paper is available in the figure2.R file.
The following R source files have the code for reproducing additional analyses described in the paper:
These scripts also include other exploratory code.
compareVsPNAS is an HTML report comparing 29 regions that were previously found to be differentially expressed (Zhou, Yuan, et al., 2011) versus the derfinder single-base level results. It also has code for identifying differentially expressed disjoint exons. The additional script counts-gene.R has the code for gene counting with summarizeOverlaps(). compareVsPNAS-gene compares the results from DESeq2 and edgeR-robust against derfinder at the gene level with 40 total plots: 10 for each case of agreement/disagreement.
View the compareVsPNAS report or check the R Markdown file compareVsPNAS.Rmd run by the runComparison.sh script. Also view the compareVsPNAS-gene report and its linked R Markdown file compareVsPNAS-gene.Rmd.
Date this page was generated.
## [1] "2015-04-13 15:58:54 EDT"
Wallclock time spent generating the report.
## Time difference of 1.591 secs
R session information.
##  setting  value                                             
##  version  R Under development (unstable) (2014-11-01 r66923)
##  system   x86_64, darwin10.8.0                              
##  ui       X11                                               
##  language (EN)                                              
##  collate  en_US.UTF-8                                       
##  tz       America/New_York
##  package        * version  date       source                                   
##  bibtex           0.4.0    2014-12-31 CRAN (R 3.2.0)                           
##  bitops           1.0.6    2013-08-17 CRAN (R 3.2.0)                           
##  devtools         1.6.1    2014-10-07 CRAN (R 3.2.0)                           
##  digest           0.6.8    2014-12-31 CRAN (R 3.2.0)                           
##  evaluate         0.5.5    2014-04-29 CRAN (R 3.2.0)                           
##  formatR          1.0      2014-08-25 CRAN (R 3.2.0)                           
##  htmltools        0.2.6    2014-09-08 CRAN (R 3.2.0)                           
##  httr             0.5      2014-09-02 CRAN (R 3.2.0)                           
##  knitcitations  * 1.0.4    2014-11-03 Github (cboettig/knitcitations@508de74)  
##  knitr            1.7      2014-10-13 CRAN (R 3.2.0)                           
##  knitrBootstrap   1.0.0    2014-11-03 Github (jimhester/knitrBootstrap@76c41f0)
##  lubridate        1.3.3    2013-12-31 CRAN (R 3.2.0)                           
##  markdown         0.7.4    2014-08-24 CRAN (R 3.2.0)                           
##  memoise          0.2.1    2014-04-22 CRAN (R 3.2.0)                           
##  plyr             1.8.1    2014-02-26 CRAN (R 3.2.0)                           
##  Rcpp             0.11.5   2015-03-06 CRAN (R 3.2.0)                           
##  RCurl            1.95.4.5 2014-12-28 CRAN (R 3.2.0)                           
##  RefManageR       0.8.40   2014-10-29 CRAN (R 3.2.0)                           
##  RJSONIO          1.3.0    2014-07-28 CRAN (R 3.2.0)                           
##  rmarkdown      * 0.3.3    2014-09-17 CRAN (R 3.2.0)                           
##  rstudioapi       0.3.1    2015-04-07 CRAN (R 3.2.0)                           
##  stringr          0.6.2    2012-12-06 CRAN (R 3.2.0)                           
##  XML              3.98.1.1 2013-06-20 CRAN (R 3.2.0)                           
##  yaml             2.1.13   2014-06-12 CRAN (R 3.2.0)
You can view the source R Markdown file for this page at index.Rmd.
This report was generated using knitrBootstrap (Hester, 2014) with knitr (Xie, 2014) and rmarkdown (Allaire, McPherson, Xie, Wickham, et al., 2014) running behind the scenes.
Citations were made with knitcitations (Boettiger, 2015).
[1] J. Allaire, J. McPherson, Y. Xie, H. Wickham, et al. rmarkdown: Dynamic Documents for R. R package version 0.3.3. 2014. URL: http://CRAN.R-project.org/package=rmarkdown.
[2] C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.4. 2015. URL: https://github.com/cboettig/knitcitations.
[3] L. Collado-Torres, A. C. Frazee, M. I. Love, R. A. Irizarry, et al. “derfinder: Software for annotation-agnostic RNA-seq differential expression analysis”. In: bioRxiv (2015). DOI: 10.1101/015370. URL: http://www.biorxiv.org/content/early/2015/02/19/015370.abstract.
[4] A. Frazee. Efficiency analysis of Sun Grid Engine batch jobs. 2014. URL: http://dx.doi.org/10.6084/m9.figshare.878000.
[5] J. Hester. knitrBootstrap: Knitr Bootstrap framework. R package version 1.0.0. 2014. URL: https://github.com/jimhester/.
[6] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014. URL: http://www.crcpress.com/product/isbn/9781466561595.
[7] Z. Zhou, Q. Yuan, et al. “Substance-specific and shared transcription and epigenetic changes in the human hippocampus chronically exposed to cocaine and alcohol”. In: Proceedings of the National Academy of Sciences of the United States of America 108.16 (2011), pp. 6626-6631.