This section has the code for running edgeR-robust
and DESeq2
on the simulation data set using the known genes as features.
This first code chunk loads the necessary data.
The following code performs the DESeq2 analysis. Code is based on edgeR_Robust supplementary code. The main change is that it has been modified for the multi-group scenario.
## user system elapsed
## 30.693 1.018 31.903
The following code performs the DESeq2 analysis. Code is based on edgeR_Robust supplementary code. The main change is that it has been modified for the multi-group scenario.
## user system elapsed
## 75.526 6.136 82.226
We can first compare the results by using the DERs as the query and the genes as the subject. The following output shows the comparison using all DERs and exploring the mismatched cases. Then its repeated using the DERs ≥ 20 bp and a minimum overlap of 20bp.
For the mismatched cases of non-significant DERs overlapping a significant gene, we check:
For the other mismatched case, we check:
## Overlaps sig DE gene
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 13416 18047 31463
## TRUE 288 227 515
## Sum 13704 18274 31978
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 1030 1248 2278
## TRUE 288 227 515
## Sum 1318 1475 2793
## $n_overlaps
##
## 1 2
## 1241 7
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 23.00 29.00 30.07 36.00 62.00
##
## $ders_per_gene_table
##
## 1 2 3 4 5 6 7 8 9 10 21
## 524 127 52 24 14 10 3 3 1 2 1
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 47.00 60.00 67.08 77.00 182.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 901 16810 16930 1385000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 6172 24420 83570 65550 2651000
## DESeq2 BH
## DESeq2 Holm FALSE TRUE Sum
## FALSE 14091 7414 21505
## TRUE 0 311 311
## Sum 14091 7725 21816
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2057 221 2278
## TRUE 442 73 515
## Sum 2499 294 2793
Most of the DERs are shorter than 20bp (91.27 percent), so we'll focus on the longer ones. The majority of the mismatches are from non significant DERs that overlap a significant gene.
As expected, when controlling the FWER instead of the FDR, most of the DE genes are no longer significant. Using FWER-controlled DE genes, most of the DERs 20bp or longer agree with the genes as not being significantly DE.
We can now repeat the comparison using the genes as the query and the DERs as the subject.
For the mismatched cases of non-significant genes overlapping a significant DER, we check:
For the other mismatched case, we check:
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 13988 103 14091
## TRUE 7550 175 7725
## Sum 21538 278 21816
## $n_overlaps
##
## 1 2 3 4 6 11
## 73 21 5 2 1 1
##
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 493 1456 3410 4318 7055 13630
##
## $genes_per_der_table
##
## 1 20
## 135 1
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 46 2372 3751 4705 5913 118100
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6161 30960 138100 118500 5384000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52 757800 2436000 5391000 6405000 48690000
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 21277 228 21505
## TRUE 261 50 311
## Sum 21538 278 21816
From these results, we can see that derfinder
is more conservative.
Similar comparison using DERs as query and genes as subject with edgeR-robust
results.
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 911 1367 2278
## TRUE 227 288 515
## Sum 1138 1655 2793
## $n_overlaps
##
## 1 2
## 1354 13
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 23.00 29.00 30.19 36.00 62.00
##
## $ders_per_gene_table
##
## 1 2 3 4 5 6 7 8 9 10 21
## 563 136 61 27 16 12 4 3 1 2 1
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32.00 46.00 58.00 64.83 74.00 174.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 6121 21320 20470 1385000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 69 7492 24510 74750 43900 2089000
## edger BH
## edgeR Holm FALSE TRUE Sum
## FALSE 13397 7887 21284
## TRUE 0 532 532
## Sum 13397 8419 21816
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2067 211 2278
## TRUE 442 73 515
## Sum 2509 284 2793
The results are fairly similar to those from using DESeq2
.
Similar comparison using genes as query and DERs as subject with edgeR-robust
results.
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 13325 72 13397
## TRUE 8213 206 8419
## Sum 21538 278 21816
## $n_overlaps
##
## 1 2 3 4
## 57 11 2 2
##
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 493 1579 3758 4619 7095 13630
##
## $genes_per_der_table
##
## 1 19
## 74 1
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 46 2282 3631 4514 5667 118100
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 5977 31080 141900 122200 5384000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52 737100 2360000 5379000 6422000 48690000
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 21059 225 21284
## TRUE 479 53 532
## Sum 21538 278 21816
While the DERs vs genes results are fairly similar between edgeR-robust
and DESeq2
, as shown below the number of mismatched cases is high compared to the number of cases both counts-based methods agree. This is also true when controlling the FWER to determine significance.
## edgeR vs DESeq2
addmargins(table('edgeR-robust (FDR)' = mcols(edger)$sig, 'DESeq2 (FDR)' = mcols(deseq)$sig))
## DESeq2 (FDR)
## edgeR-robust (FDR) FALSE TRUE Sum
## FALSE 12897 500 13397
## TRUE 1194 7225 8419
## Sum 14091 7725 21816
## Control FWER
addmargins(table('edgeR-robust (FWER)' = mcols(edger_holm)$sig, 'DESeq2 (FWER)' = mcols(deseq_holm)$sig))
## DESeq2 (FWER)
## edgeR-robust (FWER) FALSE TRUE Sum
## FALSE 21234 50 21284
## TRUE 271 261 532
## Sum 21505 311 21816
## Only sig if both edgeR and DEseq2 say it is
both <- deseq
mcols(both)$sig <- mcols(both)$sig & mcols(edger)$sig
## Same, for holm
both_holm <- deseq_holm
mcols(both_holm)$sig <- mcols(both_holm)$sig & mcols(edger_holm)$sig
We can consider an gene to be DE only if both edgeR-robust
and DESeq2
find that its significantly DE. The next sections use this information.
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 1066 1212 2278
## TRUE 289 226 515
## Sum 1355 1438 2793
## $n_overlaps
##
## 1 2
## 1205 7
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 23.00 29.00 30.16 36.00 62.00
##
## $ders_per_gene_table
##
## 1 2 3 4 5 6 7 8 9 10 21
## 510 121 52 23 14 9 3 3 1 2 1
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 47.00 60.00 67.16 77.00 182.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 887 16750 16900 1385000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 7309 27620 89360 73330 2651000
## Both BH
## Both Holm FALSE TRUE Sum
## FALSE 14591 6964 21555
## TRUE 0 261 261
## Sum 14591 7225 21816
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2109 169 2278
## TRUE 454 61 515
## Sum 2563 230 2793
The trends observed previously are maintained in this comparison with a reduction of cases where the gene is DE. This is expected due to the non-perfect agreement between DESeq2
and edgeR-robust
.
We can now make plots to explore some DERs for each of the cases.
## $`535`
## NULL
##
## $`540`
## NULL
##
## $`625`
## NULL
##
## $`594`
## NULL
##
## $`589`
## NULL
##
## $`574`
## NULL
##
## $`610`
## NULL
##
## $`616`
## NULL
##
## $`626`
## NULL
##
## $`533`
## NULL
## quartz_off_screen
## 2
## $`535`
## NULL
##
## $`540`
## NULL
##
## $`625`
## NULL
##
## $`594`
## NULL
##
## $`589`
## NULL
##
## $`574`
## NULL
##
## $`610`
## NULL
##
## $`616`
## NULL
##
## $`626`
## NULL
##
## $`533`
## NULL
As was shown with either DESeq2
or edgeR-robust
results, derfinder
is more conservative than the counts-based methods.
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 14487 104 14591
## TRUE 7051 174 7225
## Sum 21538 278 21816
## $n_overlaps
##
## 1 2 3 4 6 11
## 74 21 5 2 1 1
##
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 493 1462 3452 4333 7050 13630
##
## $genes_per_der_table
##
## 1 20
## 136 1
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 46 2348 3702 4627 5802 118100
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 5687 29570 137300 115400 5384000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52 757600 2441000 5455000 6478000 48690000
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 21321 234 21555
## TRUE 217 44 261
## Sum 21538 278 21816
We can now visually explore some genes (maximum 10kb long) for each of the four cases.
## $`401505`
## NULL
##
## $`1581`
## NULL
##
## $`26168`
## NULL
##
## $`55299`
## NULL
##
## $`4708`
## NULL
##
## $`2524`
## NULL
##
## $`112703`
## NULL
##
## $`100506746`
## NULL
##
## $`54989`
## NULL
##
## $`363`
## NULL
## quartz_off_screen
## 2
## $`401505`
## NULL
##
## $`1581`
## NULL
##
## $`26168`
## NULL
##
## $`55299`
## NULL
##
## $`4708`
## NULL
##
## $`2524`
## NULL
##
## $`112703`
## NULL
##
## $`100506746`
## NULL
##
## $`54989`
## NULL
##
## $`363`
## NULL
## $`2987`
## NULL
## quartz_off_screen
## 2
## $`84365`
## NULL
## quartz_off_screen
## 2
## $`6122`
## NULL
## quartz_off_screen
## 2
## $`1581`
## NULL
## quartz_off_screen
## 2
Date the report was generated.
## [1] "2016-02-20 15:50:54 EST"
Wallclock time spent generating the report.
## Time difference of 25.974 mins
R
session information.
## setting value
## version R version 3.2.2 (2015-08-14)
## system x86_64, darwin13.4.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/New_York
## date 2016-02-20
## package * version date source
## acepack 1.3-3.3 2014-11-24 CRAN (R 3.2.0)
## annotate 1.48.0 2015-10-14 Bioconductor
## AnnotationDbi * 1.32.3 2015-12-24 Bioconductor
## Biobase * 2.30.0 2015-10-14 Bioconductor
## BiocGenerics * 0.16.1 2015-11-06 Bioconductor
## BiocInstaller 1.20.1 2015-11-18 Bioconductor
## BiocParallel 1.4.3 2015-12-16 Bioconductor
## biomaRt 2.26.1 2015-11-23 Bioconductor
## Biostrings 2.38.4 2016-02-09 Bioconductor
## biovizBase 1.18.0 2015-10-14 Bioconductor
## bitops 1.0-6 2013-08-17 CRAN (R 3.2.0)
## BSgenome 1.38.0 2015-10-14 Bioconductor
## bumphunter 1.10.0 2015-10-14 Bioconductor
## cluster 2.0.3 2015-07-21 CRAN (R 3.2.2)
## codetools 0.2-14 2015-07-15 CRAN (R 3.2.2)
## colorspace 1.2-6 2015-03-11 CRAN (R 3.2.0)
## DBI 0.3.1 2014-09-24 CRAN (R 3.2.0)
## derfinder * 1.5.19 2015-12-15 Bioconductor
## derfinderHelper * 1.4.1 2015-11-03 Bioconductor
## derfinderPlot * 1.5.4 2016-02-20 Bioconductor
## DESeq2 * 1.10.1 2015-12-22 Bioconductor
## devtools 1.10.0 2016-01-23 CRAN (R 3.2.3)
## dichromat 2.0-0 2013-01-24 CRAN (R 3.2.0)
## digest 0.6.9 2016-01-08 CRAN (R 3.2.3)
## doRNG 1.6 2014-03-07 CRAN (R 3.2.0)
## edgeR * 3.12.0 2015-10-14 Bioconductor
## evaluate 0.8 2015-09-18 CRAN (R 3.2.0)
## foreach 1.4.3 2015-10-13 CRAN (R 3.2.0)
## foreign 0.8-66 2015-08-19 CRAN (R 3.2.0)
## formatR 1.2.1 2015-09-18 CRAN (R 3.2.0)
## Formula 1.2-1 2015-04-07 CRAN (R 3.2.0)
## futile.logger 1.4.1 2015-04-20 CRAN (R 3.2.0)
## futile.options 1.0.0 2010-04-06 CRAN (R 3.2.0)
## genefilter 1.52.1 2016-01-28 Bioconductor
## geneplotter 1.48.0 2015-10-14 Bioconductor
## GenomeInfoDb * 1.6.3 2016-01-26 Bioconductor
## GenomicAlignments 1.6.3 2016-01-06 Bioconductor
## GenomicFeatures * 1.22.13 2016-02-11 Bioconductor
## GenomicFiles 1.6.2 2015-12-30 Bioconductor
## GenomicRanges * 1.22.4 2016-01-30 Bioconductor
## GGally 1.0.1 2016-01-14 CRAN (R 3.2.3)
## ggbio 1.18.5 2016-02-13 Bioconductor
## ggplot2 2.0.0 2015-12-18 CRAN (R 3.2.3)
## graph 1.48.0 2015-10-14 Bioconductor
## gridExtra 2.0.0 2015-07-14 CRAN (R 3.2.0)
## gtable 0.1.2 2012-12-05 CRAN (R 3.2.0)
## Hmisc 3.17-1 2015-12-18 CRAN (R 3.2.3)
## htmltools 0.3 2015-12-29 CRAN (R 3.2.3)
## IRanges * 2.4.7 2016-02-09 Bioconductor
## iterators 1.0.8 2015-10-13 CRAN (R 3.2.0)
## knitr 1.12.3 2016-01-22 CRAN (R 3.2.3)
## knitrBootstrap 1.0.0 2015-05-19 Github (jimhester/knitrBootstrap@76c41f0)
## lambda.r 1.1.7 2015-03-20 CRAN (R 3.2.0)
## lattice 0.20-33 2015-07-14 CRAN (R 3.2.2)
## latticeExtra 0.6-28 2016-02-09 CRAN (R 3.2.3)
## limma * 3.26.8 2016-02-12 Bioconductor
## locfit 1.5-9.1 2013-04-20 CRAN (R 3.2.0)
## magrittr 1.5 2014-11-22 CRAN (R 3.2.0)
## markdown 0.7.7 2015-04-22 CRAN (R 3.2.0)
## Matrix 1.2-3 2015-11-28 CRAN (R 3.2.2)
## matrixStats 0.50.1 2015-12-15 CRAN (R 3.2.3)
## memoise 1.0.0 2016-01-29 CRAN (R 3.2.3)
## mime 0.4 2015-09-03 CRAN (R 3.2.0)
## munsell 0.4.3 2016-02-13 CRAN (R 3.2.3)
## nnet 7.3-12 2016-02-02 CRAN (R 3.2.3)
## OrganismDbi 1.12.1 2015-12-23 Bioconductor
## pkgmaker 0.22 2014-05-14 CRAN (R 3.2.0)
## plyr 1.8.3 2015-06-12 CRAN (R 3.2.1)
## qvalue 2.2.2 2016-01-08 Bioconductor
## RBGL 1.46.0 2015-10-14 Bioconductor
## RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.2.0)
## Rcpp * 0.12.3 2016-01-10 CRAN (R 3.2.3)
## RcppArmadillo * 0.6.500.4.0 2016-01-27 CRAN (R 3.2.3)
## RCurl 1.95-4.7 2015-06-30 CRAN (R 3.2.1)
## registry 0.3 2015-07-08 CRAN (R 3.2.1)
## reshape 0.8.5 2014-04-23 CRAN (R 3.2.0)
## reshape2 1.4.1 2014-12-06 CRAN (R 3.2.0)
## rmarkdown * 0.9.2 2016-01-01 CRAN (R 3.2.3)
## rngtools 1.2.4 2014-03-06 CRAN (R 3.2.0)
## rpart 4.1-10 2015-06-29 CRAN (R 3.2.2)
## Rsamtools 1.22.0 2015-10-14 Bioconductor
## RSQLite 1.0.0 2014-10-25 CRAN (R 3.2.0)
## rtracklayer 1.30.2 2016-02-06 Bioconductor
## S4Vectors * 0.8.11 2016-01-29 Bioconductor
## scales 0.3.0 2015-08-25 CRAN (R 3.2.0)
## stringi 1.0-1 2015-10-22 CRAN (R 3.2.0)
## stringr 1.0.0 2015-04-30 CRAN (R 3.2.0)
## SummarizedExperiment * 1.0.2 2016-01-01 Bioconductor
## survival 2.38-3 2015-07-02 CRAN (R 3.2.2)
## TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2 2015-10-15 Bioconductor
## VariantAnnotation 1.16.4 2015-12-09 Bioconductor
## XML 3.98-1.3 2015-06-30 CRAN (R 3.2.0)
## xtable 1.8-2 2016-02-05 CRAN (R 3.2.3)
## XVector 0.10.0 2015-10-14 Bioconductor
## yaml 2.1.13 2014-06-12 CRAN (R 3.2.0)
## zlibbioc 1.16.0 2015-10-14 Bioconductor