This section has the code for running edgeR-robust
and DESeq2
on the simulation data set using the known genes as features. It is based on compareVsPNAS.html, counts-based.html and counts-gene-eval.html.
This first code chunk loads the necessary data.
The following code performs the DESeq2 analysis. Code is based on edgeR_Robust supplementary code. The main change is that it has been modified for the multi-group scenario.
## user system elapsed
## 32.601 0.844 33.560
The following code performs the DESeq2 analysis. Code is based on edgeR_Robust supplementary code. The main change is that it has been modified for the multi-group scenario.
## user system elapsed
## 76.999 4.804 82.360
We can first compare the results by using the DERs as the query and the genes as the subject. The following output shows the comparison using all DERs and exploring the mismatched cases. Then its repeated using the DERs ≥ 20 bp and a minimum overlap of 20bp.
For the mismatched cases of non-significant DERs overlapping a significant gene, we check:
For the other mismatched case, we check:
## Overlaps sig DE gene
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 13285 18178 31463
## TRUE 285 230 515
## Sum 13570 18408 31978
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 1026 1252 2278
## TRUE 285 230 515
## Sum 1311 1482 2793
## $n_overlaps
##
## 1 2
## 1245 7
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 23.00 29.00 30.05 36.00 62.00
##
## $ders_per_gene_table
##
## 1 2 3 4 5 6 7 8 9 10 21
## 528 127 52 24 14 10 3 3 1 2 1
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 47.00 59.00 66.76 77.00 182.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 933 16980 16990 1385000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 5925 23620 83210 65500 2651000
## DESeq2 BH
## DESeq2 Holm FALSE TRUE Sum
## FALSE 14042 7463 21505
## TRUE 0 311 311
## Sum 14042 7774 21816
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2057 221 2278
## TRUE 442 73 515
## Sum 2499 294 2793
Most of the DERs are shorter than 20bp (91.27 percent), so we'll focus on the longer ones. The majority of the mismatches are from non significant DERs that overlap a significant gene.
As expected, when controlling the FWER instead of the FDR, most of the DE genes are no longer significant. Using FWER-controlled DE genes, most of the DERs 20bp or longer agree with the genes as not being significantly DE.
We can now repeat the comparison using the genes as the query and the DERs as the subject.
For the mismatched cases of non-significant genes overlapping a significant DER, we check:
For the other mismatched case, we check:
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 13940 102 14042
## TRUE 7598 176 7774
## Sum 21538 278 21816
## $n_overlaps
##
## 1 2 3 4 6 11
## 73 21 4 2 1 1
##
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 493 1488 3452 4354 7060 13630
##
## $genes_per_der_table
##
## 1 20
## 132 1
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 2386 3770 4722 5941 118100
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 6176 31160 138800 119100 5384000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52 759100 2443000 5405000 6433000 48690000
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 21277 228 21505
## TRUE 261 50 311
## Sum 21538 278 21816
From these results, we can see that derfinder
is more conservative.
Similar comparison using DERs as query and genes as subject with edgeR-robust
results.
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 911 1367 2278
## TRUE 227 288 515
## Sum 1138 1655 2793
## $n_overlaps
##
## 1 2
## 1354 13
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 23.00 29.00 30.19 36.00 62.00
##
## $ders_per_gene_table
##
## 1 2 3 4 5 6 7 8 9 10 21
## 563 136 61 27 16 12 4 3 1 2 1
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32.00 46.00 58.00 64.83 74.00 174.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 6121 21320 20470 1385000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 69 7492 24510 74750 43900 2089000
## edger BH
## edgeR Holm FALSE TRUE Sum
## FALSE 13397 7887 21284
## TRUE 0 532 532
## Sum 13397 8419 21816
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2067 211 2278
## TRUE 442 73 515
## Sum 2509 284 2793
The results are fairly similar to those from using DESeq2
.
Similar comparison using genes as query and DERs as subject with edgeR-robust
results.
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 13325 72 13397
## TRUE 8213 206 8419
## Sum 21538 278 21816
## $n_overlaps
##
## 1 2 3 4
## 57 11 2 2
##
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 493 1579 3758 4619 7095 13630
##
## $genes_per_der_table
##
## 1 19
## 74 1
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 46 2282 3631 4514 5667 118100
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 5977 31080 141900 122200 5384000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52 737100 2360000 5379000 6422000 48690000
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 21059 225 21284
## TRUE 479 53 532
## Sum 21538 278 21816
While the DERs vs genes results are fairly similar between edgeR-robust
and DESeq2
, as shown below the number of mismatched cases is high compared to the number of cases both counts-based methods agree. This is also true when controlling the FWER to determine significance.
## edgeR vs DESeq2
addmargins(table('edgeR-robust (FDR)' = mcols(edger)$sig, 'DESeq2 (FDR)' = mcols(deseq)$sig))
## DESeq2 (FDR)
## edgeR-robust (FDR) FALSE TRUE Sum
## FALSE 12877 520 13397
## TRUE 1165 7254 8419
## Sum 14042 7774 21816
## Control FWER
addmargins(table('edgeR-robust (FWER)' = mcols(edger_holm)$sig, 'DESeq2 (FWER)' = mcols(deseq_holm)$sig))
## DESeq2 (FWER)
## edgeR-robust (FWER) FALSE TRUE Sum
## FALSE 21234 50 21284
## TRUE 271 261 532
## Sum 21505 311 21816
## Only sig if both edgeR and DEseq2 say it is
both <- deseq
mcols(both)$sig <- mcols(both)$sig & mcols(edger)$sig
## Same, for holm
both_holm <- deseq_holm
mcols(both_holm)$sig <- mcols(both_holm)$sig & mcols(edger_holm)$sig
We can consider an gene to be DE only if both edgeR-robust
and DESeq2
find that its significantly DE. The next sections use this information.
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 1063 1215 2278
## TRUE 286 229 515
## Sum 1349 1444 2793
## $n_overlaps
##
## 1 2
## 1208 7
##
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.00 23.00 29.00 30.15 36.00 62.00
##
## $ders_per_gene_table
##
## 1 2 3 4 5 6 7 8 9 10 21
## 513 121 52 23 14 9 3 3 1 2 1
## $width_der
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 29.00 47.00 59.50 66.84 77.00 182.00
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 924 16930 16970 1385000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 7028 26650 89060 72640 2651000
## Both BH
## Both Holm FALSE TRUE Sum
## FALSE 14562 6993 21555
## TRUE 0 261 261
## Sum 14562 7254 21816
## Overlaps sig DE gene (min 20bp)
## Significant DER (FWER) FALSE TRUE Sum
## FALSE 2109 169 2278
## TRUE 454 61 515
## Sum 2563 230 2793
The trends observed previously are maintained in this comparison with a reduction of cases where the gene is DE. This is expected due to the non-perfect agreement between DESeq2
and edgeR-robust
.
We can now make plots to explore some DERs for each of the cases.
## $`535`
## NULL
##
## $`540`
## NULL
##
## $`625`
## NULL
##
## $`594`
## NULL
##
## $`589`
## NULL
##
## $`574`
## NULL
##
## $`610`
## NULL
##
## $`616`
## NULL
##
## $`626`
## NULL
##
## $`533`
## NULL
## pdf
## 2
## $`535`
## NULL
##
## $`540`
## NULL
##
## $`625`
## NULL
##
## $`594`
## NULL
##
## $`589`
## NULL
##
## $`574`
## NULL
##
## $`610`
## NULL
##
## $`616`
## NULL
##
## $`626`
## NULL
##
## $`533`
## NULL
As was shown with either DESeq2
or edgeR-robust
results, derfinder
is more conservative than the counts-based methods.
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 14459 103 14562
## TRUE 7079 175 7254
## Sum 21538 278 21816
## $n_overlaps
##
## 1 2 3 4 6 11
## 74 21 4 2 1 1
##
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 493 1507 3495 4369 7055 13630
##
## $genes_per_der_table
##
## 1 20
## 133 1
## $width_gene
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 54 2355 3709 4636 5822 118100
##
## $distance_nearest_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 5670 29530 137500 115400 5384000
##
## $distance_nearest_sig_sum
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 52 756600 2439000 5465000 6498000 48690000
## Overlaps sig DER (FWER, min 20bp)
## Significant DE gene FALSE TRUE Sum
## FALSE 21321 234 21555
## TRUE 217 44 261
## Sum 21538 278 21816
We can now visually explore some genes (maximum 10kb long) for each of the four cases.
## $`401505`
## NULL
##
## $`1581`
## NULL
##
## $`26168`
## NULL
##
## $`55299`
## NULL
##
## $`4708`
## NULL
##
## $`2524`
## NULL
##
## $`112703`
## NULL
##
## $`100506746`
## NULL
##
## $`54989`
## NULL
##
## $`363`
## NULL
## pdf
## 2
## $`401505`
## NULL
##
## $`1581`
## NULL
##
## $`26168`
## NULL
##
## $`55299`
## NULL
##
## $`4708`
## NULL
##
## $`2524`
## NULL
##
## $`112703`
## NULL
##
## $`100506746`
## NULL
##
## $`54989`
## NULL
##
## $`363`
## NULL
Date the report was generated.
## [1] "2015-03-30 21:24:43 EDT"
Wallclock time spent generating the report.
## Time difference of 29.945 mins
R
session information.
## setting value
## version R Under development (unstable) (2014-11-01 r66923)
## system x86_64, darwin10.8.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## tz America/New_York
## package * version date source
## acepack 1.3.3.3 2013-05-03 CRAN (R 3.2.0)
## annotate 1.45.4 2015-03-21 Bioconductor
## AnnotationDbi * 1.29.20 2015-03-19 Bioconductor
## Biobase * 2.27.3 2015-03-27 Bioconductor
## BiocGenerics * 0.13.10 2015-03-27 Bioconductor
## BiocParallel 1.1.21 2015-03-24 Bioconductor
## biomaRt 2.23.5 2014-11-22 Bioconductor
## Biostrings 2.35.12 2015-03-26 Bioconductor
## biovizBase 1.15.3 2015-03-30 Bioconductor
## bitops 1.0.6 2013-08-17 CRAN (R 3.2.0)
## BSgenome 1.35.20 2015-03-27 Bioconductor
## bumphunter 1.7.6 2015-03-13 Github (lcolladotor/bumphunter@37d10e7)
## cluster 2.0.1 2015-01-31 CRAN (R 3.2.0)
## codetools 0.2.11 2015-03-10 CRAN (R 3.2.0)
## colorout * 1.0.2 2014-11-03 local
## colorspace 1.2.6 2015-03-11 CRAN (R 3.2.0)
## DBI 0.3.1 2014-09-24 CRAN (R 3.2.0)
## derfinder * 1.1.17 2015-03-14 Github (lcolladotor/derfinder@3532e0c)
## derfinderHelper * 1.1.6 2015-03-15 Bioconductor
## derfinderPlot * 1.1.6 2015-03-14 Github (lcolladotor/derfinderPlot@1319754)
## DESeq2 * 1.7.45 2015-03-25 Bioconductor
## devtools 1.6.1 2014-10-07 CRAN (R 3.2.0)
## dichromat 2.0.0 2013-01-24 CRAN (R 3.2.0)
## digest 0.6.8 2014-12-31 CRAN (R 3.2.0)
## doRNG 1.6 2014-03-07 CRAN (R 3.2.0)
## edgeR * 3.9.14 2015-03-27 Bioconductor
## evaluate 0.5.5 2014-04-29 CRAN (R 3.2.0)
## foreach 1.4.2 2014-04-11 CRAN (R 3.2.0)
## foreign 0.8.63 2015-02-20 CRAN (R 3.2.0)
## formatR 1.0 2014-08-25 CRAN (R 3.2.0)
## Formula 1.2.0 2015-01-20 CRAN (R 3.2.0)
## futile.logger 1.4 2015-03-21 CRAN (R 3.2.0)
## futile.options 1.0.0 2010-04-06 CRAN (R 3.2.0)
## genefilter 1.49.2 2014-10-21 Bioconductor
## geneplotter 1.45.0 2014-10-14 Bioconductor
## GenomeInfoDb * 1.3.16 2015-03-27 Bioconductor
## GenomicAlignments 1.3.32 2015-03-18 Bioconductor
## GenomicFeatures * 1.19.36 2015-03-30 Bioconductor
## GenomicFiles 1.3.14 2015-03-07 Bioconductor
## GenomicRanges * 1.19.48 2015-03-27 Bioconductor
## GGally 0.4.8 2014-08-26 CRAN (R 3.2.0)
## ggbio 1.15.2 2015-03-24 Bioconductor
## ggplot2 1.0.0 2014-05-21 CRAN (R 3.2.0)
## graph 1.45.2 2015-03-01 Bioconductor
## gridExtra 0.9.1 2012-08-09 CRAN (R 3.2.0)
## gtable 0.1.2 2012-12-05 CRAN (R 3.2.0)
## Hmisc 3.14.5 2014-09-12 CRAN (R 3.2.0)
## htmltools 0.2.6 2014-09-08 CRAN (R 3.2.0)
## IRanges * 2.1.43 2015-03-07 Bioconductor
## iterators 1.0.7 2014-04-11 CRAN (R 3.2.0)
## knitr 1.7 2014-10-13 CRAN (R 3.2.0)
## knitrBootstrap 1.0.0 2014-11-03 Github (jimhester/knitrBootstrap@76c41f0)
## lambda.r 1.1.7 2015-03-20 CRAN (R 3.2.0)
## lattice 0.20.30 2015-02-22 CRAN (R 3.2.0)
## latticeExtra 0.6.26 2013-08-15 CRAN (R 3.2.0)
## limma * 3.23.11 2015-03-15 Bioconductor
## locfit 1.5.9.1 2013-04-20 CRAN (R 3.2.0)
## markdown 0.7.4 2014-08-24 CRAN (R 3.2.0)
## MASS 7.3.40 2015-03-21 CRAN (R 3.2.0)
## Matrix 1.1.5.1 2015-03-23 CRAN (R 3.2.0)
## matrixStats 0.14.0 2015-02-14 CRAN (R 3.2.0)
## mime 0.3 2015-03-29 CRAN (R 3.2.0)
## munsell 0.4.2 2013-07-11 CRAN (R 3.2.0)
## nnet 7.3.9 2015-02-11 CRAN (R 3.2.0)
## OrganismDbi 1.9.15 2015-03-30 Bioconductor
## pkgmaker 0.22 2014-05-14 CRAN (R 3.2.0)
## plyr 1.8.1 2014-02-26 CRAN (R 3.2.0)
## proto 0.3.10 2012-12-22 CRAN (R 3.2.0)
## qvalue 1.99.0 2015-03-30 Bioconductor
## RBGL 1.43.0 2014-10-14 Bioconductor
## RColorBrewer 1.1.2 2014-12-07 CRAN (R 3.2.0)
## Rcpp * 0.11.5 2015-03-06 CRAN (R 3.2.0)
## RcppArmadillo * 0.4.650.1.1 2015-02-26 CRAN (R 3.2.0)
## RCurl 1.95.4.5 2014-12-28 CRAN (R 3.2.0)
## registry 0.2 2012-01-24 CRAN (R 3.2.0)
## reshape 0.8.5 2014-04-23 CRAN (R 3.2.0)
## reshape2 1.4.1 2014-12-06 CRAN (R 3.2.0)
## rmarkdown 0.3.3 2014-09-17 CRAN (R 3.2.0)
## rngtools 1.2.4 2014-03-06 CRAN (R 3.2.0)
## rpart 4.1.9 2015-02-24 CRAN (R 3.2.0)
## Rsamtools 1.19.49 2015-03-27 Bioconductor
## RSQLite 1.0.0 2014-10-25 CRAN (R 3.2.0)
## rstudioapi 0.2 2014-12-31 CRAN (R 3.2.0)
## rtracklayer 1.27.10 2015-03-27 Bioconductor
## S4Vectors * 0.5.22 2015-03-06 Bioconductor
## scales 0.2.4 2014-04-22 CRAN (R 3.2.0)
## stringr 0.6.2 2012-12-06 CRAN (R 3.2.0)
## survival 2.38.1 2015-02-24 CRAN (R 3.2.0)
## TxDb.Hsapiens.UCSC.hg19.knownGene * 3.0.0 2014-09-26 Bioconductor
## VariantAnnotation 1.13.46 2015-03-26 Bioconductor
## XML 3.98.1.1 2013-06-20 CRAN (R 3.2.0)
## xtable 1.7.4 2014-09-12 CRAN (R 3.2.0)
## XVector * 0.7.4 2015-02-08 Bioconductor
## yaml 2.1.13 2014-06-12 CRAN (R 3.2.0)
## zlibbioc 1.13.3 2015-03-23 Bioconductor