This section has the code for running edgeR-robust
and DESeq2
on the simulation data set using the known exons as features.
This first code chunk loads the necessary data.
The following code performs the DESeq2 analysis. Code is based on edgeR_Robust supplementary code. The main change is that it has been modified for the multi-group scenario.
## user system elapsed
## 680.447 2.572 683.900
The following code performs the DESeq2 analysis. Code is based on edgeR_Robust supplementary code. The main change is that it has been modified for the multi-group scenario.
## user system elapsed
## 1555.359 9.853 1567.200
We can first compare the results by using the DERs as the query and the exons as the subject. The following output shows the comparison using all DERs and exploring the mismatched cases. Then its repeated using the DERs ≥ 20 bp and a minimum overlap of 20bp.
For the mismatched cases of non-significant DERs overlapping a significant exon, we check:
For the other mismatched case, we check:
## Overlaps sig DE exon
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 16671 14792 31463
## TRUE 255 260 515
## Sum 16926 15052 31978
## Overlaps sig DE exon (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 1127 1151 2278
## TRUE 255 260 515
## Sum 1382 1411 2793
## $n_overlaps
##
## 1 2
## 1131 20
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 23.00 29.00 30.22 36.00 62.00
##
## $ders_per_exon_table
##
## 1 2 3 4 5 6
## 775 109 27 12 5 4
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 49.00 60.00 67.99 77.00 182.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 290 11010 16860 302400
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 7859 26350 60970 44550 2106000
## DESeq2 BH
## DESeq2 Holm FALSE TRUE Sum
## FALSE 199844 14185 214029
## TRUE 4 387 391
## Sum 199848 14572 214420
## Overlaps sig DE exon (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2143 135 2278
## TRUE 433 82 515
## Sum 2576 217 2793
Most of the DERs are shorter than 20bp (91.27 percent), so we'll focus on the longer ones. The majority of the mismatches are from non significant DERs that overlap a significant exon.
As expected, when controlling the FWER instead of the FDR, most of the DE exons are no longer significant. Using FWER-controlled DE exons, most of the DERs 20bp or longer agree with the exons as not being significantly DE.
We can now repeat the comparison using the exons as the query and the DERs as the subject.
For the mismatched cases of non-significant exons overlapping a significant DER, we check:
For the other mismatched case, we check:
## Overlaps sig DER (FWER, min 20bp)
## Significant DE exon FALSE TRUE Sum
## FALSE 199497 98 199595
## TRUE 14338 232 14570
## Sum 213835 330 214165
## $n_overlaps
##
## 1 2 3 9
## 87 8 2 1
##
## $width_exon
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 51.0 154.8 655.0 1328.0 1899.0 9761.0
##
## $exons_per_der_table
##
## 1
## 118
## $width_exon
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.0 143.0 332.5 1160.0 1578.0 205000.0
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 6440 86290 57270 5353000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 54 526200 2054000 4818000 5809000 48690000 229
## Overlaps sig DER (FWER, min 20bp)
## Significant DE exon FALSE TRUE Sum
## FALSE 213513 261 213774
## TRUE 322 69 391
## Sum 213835 330 214165
From these results, we can see that derfinder
is more conservative.
Similar comparison using DERs as query and exons as subject with edgeR-robust
results.
## Overlaps sig DE exon (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 1256 1022 2278
## TRUE 263 252 515
## Sum 1519 1274 2793
## $n_overlaps
##
## 1 2
## 1006 16
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 24.00 30.00 30.46 36.00 58.00
##
## $ders_per_exon_table
##
## 1 2 3 4 5 6 19
## 677 95 24 9 4 4 1
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32.0 49.0 59.0 66.0 75.5 174.0
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 172 10680 16300 302400
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 4856 21090 50470 42110 2106000
## edger BH
## edgeR Holm FALSE TRUE Sum
## FALSE 197071 16827 213898
## TRUE 0 522 522
## Sum 197071 17349 214420
## Overlaps sig DE exon (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2155 123 2278
## TRUE 450 65 515
## Sum 2605 188 2793
The results are fairly similar to those from using DESeq2
.
Similar comparison using exons as query and DERs as subject with edgeR-robust
results.
## Overlaps sig DER (FWER, min 20bp)
## Significant DE exon FALSE TRUE Sum
## FALSE 196707 114 196821
## TRUE 17128 216 17344
## Sum 213835 330 214165
## $n_overlaps
##
## 1 2 3 4
## 105 7 1 1
##
## $width_exon
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 66.0 181.2 954.0 1396.0 1908.0 9761.0
##
## $exons_per_der_table
##
## 1 2
## 124 1
## $width_exon
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.0 125.0 211.5 941.3 1066.0 205000.0
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 148 14720 115400 89810 5353000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 54 612100 2251000 5073000 6170000 48690000 288
## Overlaps sig DER (FWER, min 20bp)
## Significant DE exon FALSE TRUE Sum
## FALSE 213371 272 213643
## TRUE 464 58 522
## Sum 213835 330 214165
While the DERs vs exons results are fairly similar between edgeR-robust
and DESeq2
, as shown below the number of mismatched cases is high compared to the number of cases both counts-based methods agree. This is also true when controlling the FWER to determine significance.
## edgeR vs DESeq2
addmargins(table('edgeR-robust (FDR)' = edger$sig, 'DESeq2 (FDR)' = deseq$sig))
## DESeq2 (FDR)
## edgeR-robust (FDR) FALSE TRUE Sum
## FALSE 194436 2635 197071
## TRUE 5412 11937 17349
## Sum 199848 14572 214420
## Control FWER
addmargins(table('edgeR-robust (FWER)' = edger_holm$sig, 'DESeq2 (FWER)' = deseq_holm$sig))
## DESeq2 (FWER)
## edgeR-robust (FWER) FALSE TRUE Sum
## FALSE 213832 66 213898
## TRUE 197 325 522
## Sum 214029 391 214420
## Only sig if both edgeR and DEseq2 say it is
both <- deseq
both$sig <- both$sig & edger$sig
## Same, for holm
both_holm <- deseq_holm
both_holm$sig <- both_holm$sig & edger_holm$sig
We can consider an exon to be DE only if both edgeR-robust
and DESeq2
find that its significantly DE. The next sections use this information.
## Overlaps sig DE exon (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 1326 952 2278
## TRUE 286 229 515
## Sum 1612 1181 2793
## $n_overlaps
##
## 1 2
## 936 16
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 24.00 30.00 30.49 36.00 58.00
##
## $ders_per_exon_table
##
## 1 2 3 4 5 6
## 647 89 21 9 4 4
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 50.00 60.00 67.59 76.75 182.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 9817 15010 302400
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 6497 23720 61260 43820 2106000
## Both BH
## Both Holm FALSE TRUE Sum
## FALSE 202479 11616 214095
## TRUE 4 321 325
## Sum 202483 11937 214420
## Overlaps sig DE exon (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2185 93 2278
## TRUE 456 59 515
## Sum 2641 152 2793
The trends observed previously are maintained in this comparison with a reduction of cases where the exon is DE. This is expected due to the non-perfect agreement between DESeq2
and edgeR-robust
.
We can now make plots to explore some DERs for each of the cases.
## $`540`
## NULL
##
## $`625`
## NULL
##
## $`518`
## NULL
##
## $`589`
## NULL
##
## $`610`
## NULL
##
## $`616`
## NULL
##
## $`533`
## NULL
##
## $`643`
## NULL
##
## $`696`
## NULL
##
## $`520`
## NULL
## pdf
## 2
## $`540`
## NULL
##
## $`625`
## NULL
##
## $`518`
## NULL
##
## $`589`
## NULL
##
## $`610`
## NULL
##
## $`616`
## NULL
##
## $`533`
## NULL
##
## $`643`
## NULL
##
## $`696`
## NULL
##
## $`520`
## NULL
As was shown with either DESeq2
or edgeR-robust
results, derfinder
is more conservative than the counts-based methods.
## Overlaps sig DER (FWER, min 20bp)
## Significant DE exon FALSE TRUE Sum
## FALSE 202105 125 202230
## TRUE 11730 205 11935
## Sum 213835 330 214165
## $n_overlaps
##
## 1 2 3 4 9
## 112 9 2 1 1
##
## $width_exon
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 51 180 890 1354 1905 9761
##
## $exons_per_der_table
##
## 1 2
## 147 1
## $width_exon
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21 139 321 1179 1595 205000
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 6672 89960 60730 5353000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 54 535600 2094000 4886000 5941000 48690000 199
## Overlaps sig DER (FWER, min 20bp)
## Significant DE exon FALSE TRUE Sum
## FALSE 213563 277 213840
## TRUE 272 53 325
## Sum 213835 330 214165
We can now visually explore some exons for each of the four cases.
## $`290397`
## NULL
##
## $`312032`
## NULL
##
## $`473746`
## NULL
##
## $`262083`
## NULL
##
## $`69671`
## NULL
##
## $`77501`
## NULL
##
## $`97002`
## NULL
##
## $`183187`
## NULL
##
## $`105322`
## NULL
##
## $`462539`
## NULL
## pdf
## 2
## $`290397`
## NULL
##
## $`312032`
## NULL
##
## $`473746`
## NULL
##
## $`262083`
## NULL
##
## $`69671`
## NULL
##
## $`77501`
## NULL
##
## $`97002`
## NULL
##
## $`183187`
## NULL
##
## $`105322`
## NULL
##
## $`462539`
## NULL
The code in this section is partially based on /home/epi/ajaffe/Lieber/Projects/RNAseq/HippoPublic/clean_previous_hits.R
.
First we find the regions of the genome corresponding to the genes of interest. That is, the genes from the original paper that were differentially expressed in the conditions we are analyzing.
## RN7SK SNORA73A
## "125050" "6080"
In this section, we make a plot for each gene showing the coverage data and whether derfinder
identified candidate DERs as described in the main text.
Date the report was generated.
## [1] "2015-03-14 04:52:25 EDT"
Wallclock time spent generating the report.
## Time difference of 3.266 hours
R
session information.
## setting value
## version R version 3.1.1 Patched (2014-10-16 r66782)
## system x86_64, linux-gnu
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz
## package * version date source
## acepack * 1.3-3.3 2013-05-03 CRAN (R 3.1.1)
## annotate * 1.44.0 2014-10-15 Bioconductor
## AnnotationDbi 1.28.1 2014-10-29 Bioconductor
## base64enc * 0.1-2 2014-06-26 CRAN (R 3.1.0)
## BatchJobs * 1.5 2014-10-30 CRAN (R 3.1.1)
## BBmisc * 1.9 2015-02-03 CRAN (R 3.1.1)
## Biobase 2.26.0 2014-10-15 Bioconductor
## BiocGenerics 0.12.1 2014-11-15 Bioconductor
## BiocParallel * 1.0.3 2015-02-09 Bioconductor
## biomaRt * 2.22.0 2014-10-15 Bioconductor
## Biostrings * 2.34.1 2014-12-13 Bioconductor
## biovizBase * 1.14.1 2014-12-14 Bioconductor
## bitops * 1.0-6 2013-08-17 CRAN (R 3.1.0)
## brew * 1.0-6 2011-04-13 CRAN (R 3.1.0)
## BSgenome * 1.34.1 2014-12-31 Bioconductor
## bumphunter * 1.6.0 2014-10-15 Bioconductor
## Cairo * 1.5-6 2014-06-26 CRAN (R 3.1.0)
## checkmate * 1.5.1 2014-12-14 CRAN (R 3.1.1)
## cluster * 1.15.3 2014-09-04 CRAN (R 3.1.1)
## codetools * 0.2-9 2014-08-21 CRAN (R 3.1.1)
## colorspace * 1.2-6 2015-03-11 CRAN (R 3.1.1)
## DBI 0.3.1 2014-09-24 CRAN (R 3.1.1)
## derfinder 1.0.10 2015-03-14 Bioconductor
## derfinderHelper 1.0.4 2014-11-05 Github (lcolladotor/derfinderHelper@27bcfe6)
## derfinderPlot 1.0.3 2015-02-22 Github (lcolladotor/derfinderPlot-release@d1dd4bd)
## DESeq2 1.6.3 2015-03-12 Bioconductor
## devtools * 1.7.0 2015-01-17 CRAN (R 3.1.1)
## dichromat * 2.0-0 2013-01-24 CRAN (R 3.1.0)
## digest * 0.6.8 2014-12-31 CRAN (R 3.1.1)
## doRNG * 1.6 2014-03-07 CRAN (R 3.1.0)
## edgeR 3.8.6 2015-03-12 Bioconductor
## evaluate * 0.5.5 2014-04-29 CRAN (R 3.1.0)
## fail * 1.2 2013-09-19 CRAN (R 3.1.0)
## foreach * 1.4.2 2014-04-11 CRAN (R 3.1.0)
## foreign * 0.8-61 2014-03-28 CRAN (R 3.1.1)
## formatR * 1.0 2014-08-25 CRAN (R 3.1.1)
## Formula * 1.2-0 2015-01-20 CRAN (R 3.1.1)
## genefilter * 1.48.1 2014-10-17 Bioconductor
## geneplotter * 1.44.0 2014-10-15 Bioconductor
## GenomeInfoDb 1.2.4 2014-12-20 Bioconductor
## GenomicAlignments * 1.2.2 2015-03-04 Bioconductor
## GenomicFeatures 1.18.3 2014-12-17 Bioconductor
## GenomicFiles * 1.2.1 2015-02-08 Bioconductor
## GenomicRanges 1.18.4 2015-01-08 Bioconductor
## GGally * 0.5.0 2014-12-02 CRAN (R 3.1.1)
## ggbio 1.14.0 2014-11-04 Bioconductor
## ggplot2 1.0.0 2014-05-21 CRAN (R 3.1.0)
## graph * 1.44.1 2014-12-10 Bioconductor
## gridExtra * 0.9.1 2012-08-09 CRAN (R 3.1.1)
## gtable * 0.1.2 2012-12-05 CRAN (R 3.1.0)
## Hmisc * 3.15-0 2015-02-16 CRAN (R 3.1.1)
## htmltools * 0.2.6 2014-09-08 CRAN (R 3.1.1)
## IRanges 2.0.1 2014-12-13 Bioconductor
## iterators * 1.0.7 2014-04-11 CRAN (R 3.1.0)
## knitr * 1.9 2015-01-20 CRAN (R 3.1.1)
## knitrBootstrap * 1.0.0 2014-11-19 Github (jimhester/knitrBootstrap@76c41f0)
## labeling * 0.3 2014-08-23 CRAN (R 3.1.1)
## lattice * 0.20-29 2014-04-04 CRAN (R 3.1.1)
## latticeExtra * 0.6-26 2013-08-15 CRAN (R 3.1.0)
## limma 3.22.7 2015-03-13 Bioconductor
## locfit * 1.5-9.1 2013-04-20 CRAN (R 3.1.0)
## markdown * 0.7.4 2014-08-24 CRAN (R 3.1.1)
## MASS * 7.3-35 2014-09-30 CRAN (R 3.1.1)
## Matrix * 1.1-4 2014-06-15 CRAN (R 3.1.1)
## matrixStats * 0.14.0 2015-02-14 CRAN (R 3.1.1)
## mime * 0.2 2014-09-26 CRAN (R 3.1.1)
## munsell * 0.4.2 2013-07-11 CRAN (R 3.1.0)
## nnet * 7.3-8 2014-03-28 CRAN (R 3.1.1)
## OrganismDbi * 1.8.1 2015-03-11 Bioconductor
## org.Hs.eg.db 3.0.0 2014-09-27 Bioconductor
## pkgmaker * 0.22 2014-05-14 CRAN (R 3.1.0)
## plyr 1.8.1 2014-02-26 CRAN (R 3.1.0)
## proto * 0.3-10 2012-12-22 CRAN (R 3.1.0)
## qvalue * 1.43.0 2015-03-06 Bioconductor
## RBGL * 1.42.0 2014-10-15 Bioconductor
## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.1.1)
## Rcpp 0.11.5 2015-03-06 CRAN (R 3.1.1)
## RcppArmadillo 0.4.650.1.1 2015-02-26 CRAN (R 3.1.1)
## RCurl * 1.95-4.5 2014-12-06 CRAN (R 3.1.1)
## registry * 0.2 2012-01-24 CRAN (R 3.1.0)
## reshape * 0.8.5 2014-04-23 CRAN (R 3.1.0)
## reshape2 1.4.1 2014-12-06 CRAN (R 3.1.1)
## rmarkdown 0.5.1 2015-01-26 CRAN (R 3.1.1)
## rngtools * 1.2.4 2014-03-06 CRAN (R 3.1.0)
## rpart * 4.1-8 2014-03-28 CRAN (R 3.1.1)
## Rsamtools * 1.18.3 2015-03-04 Bioconductor
## RSQLite 1.0.0 2014-10-25 CRAN (R 3.1.1)
## rstudioapi * 0.2 2014-12-31 CRAN (R 3.1.1)
## rtracklayer * 1.26.2 2014-11-12 Bioconductor
## S4Vectors 0.4.0 2014-10-15 Bioconductor
## scales 0.2.4 2014-04-22 CRAN (R 3.1.0)
## sendmailR * 1.2-1 2014-09-21 CRAN (R 3.1.1)
## stringr * 0.6.2 2012-12-06 CRAN (R 3.1.0)
## survival * 2.37-7 2014-01-22 CRAN (R 3.1.1)
## TxDb.Hsapiens.UCSC.hg19.knownGene 3.0.0 2014-09-29 Bioconductor
## VariantAnnotation * 1.12.9 2015-01-22 Bioconductor
## XML * 3.98-1.1 2013-06-20 CRAN (R 3.1.0)
## xtable * 1.7-4 2014-09-12 CRAN (R 3.1.1)
## XVector 0.6.0 2014-10-15 Bioconductor
## yaml * 2.1.13 2014-06-12 CRAN (R 3.1.1)
## zlibbioc * 1.12.0 2014-10-15 Bioconductor