December 9th, 2014

Pre-reqs

Docs

Why should I use derfinder?

  • You want to perform differential expression analysis at base-pair resolution
  • It's annotation-agnostic
  • It can handle your data set (it has been used with ~500 samples)
  • Others have obtained grants partly thanks to it

[Image: Tim's grant]

What will I need?

  • Most likely a high-performance computing environment, depending on the size of your data set
  • Familiarity with other Bioconductor packages such as GenomicRanges
  • Experience writing bash scripts can help, although it's not required

What does it do?

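  • It calculates an F-statistic at every base by comparing two nested linear models
  • Let \(y_{ij} = \log_2\left( \text{coverage}_{ij} + \text{scaling factor} \right)\) for position \(i\) and sample \(j\)
  • Alternative model: \[ y_{ij} = \alpha_i + \sum_{p=1}^{P} \beta_{ip} X_{jp} + \sum_{q=1}^{Q} \gamma_{iq} Z_{jq} + \epsilon_{ij} \] where \(X\) holds the \(P\) variables of interest and \(Z\) the \(Q\) adjustment variables
  • Null model: the same model without the \(\beta\) terms
  • F-statistics are given by: \[ F_i = \frac{(\text{RSS}_{0i} - \text{RSS}_{1i}) / (\text{df}_1 - \text{df}_0) }{ \text{offset} + \text{RSS}_{1i} / (n - \text{df}_1) } \] where \(\text{RSS}_{0i}\) and \(\text{RSS}_{1i}\) are the residual sums of squares of the null and alternative models at base \(i\), \(\text{df}_0\) and \(\text{df}_1\) are their numbers of parameters, and \(n\) is the number of samples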
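To make the formula concrete, here is a minimal base-R sketch that computes \(F_i\) for every base of a toy coverage matrix by fitting the null and alternative models with ordinary least squares. The helper name `fstatsSketch`, the toy data, and the scaling factor of 32 are illustrative assumptions; this is not derfinder's internal implementation, which is optimized for millions of bases.

```r
## Hypothetical helper: per-base F-statistics from nested linear models.
## y:    matrix with one row per base (i) and one column per sample (j)
## mod:  alternative model matrix (samples x df1)
## mod0: null model matrix (samples x df0), nested within mod
fstatsSketch <- function(y, mod, mod0, offset = 0) {
    n <- ncol(y)
    df1 <- ncol(mod)
    df0 <- ncol(mod0)
    ## Residual sum of squares of every row of y for a model matrix X
    rssFor <- function(X) {
        ## Hat matrix X (X'X)^{-1} X' projects each row of y onto the columns of X
        H <- X %*% solve(crossprod(X)) %*% t(X)
        resid <- y - y %*% H
        rowSums(resid^2)
    }
    rss1 <- rssFor(mod)
    rss0 <- rssFor(mod0)
    ((rss0 - rss1) / (df1 - df0)) / (offset + rss1 / (n - df1))
}

## Toy example: 5 bases, 3 groups of 10 samples each
set.seed(20141209)
group <- factor(rep(c('A', 'B', 'C'), each = 10))
mod <- model.matrix(~ group)
mod0 <- mod[, 1, drop = FALSE]  # intercept-only null model
coverage <- matrix(rpois(5 * 30, lambda = 10), nrow = 5)
y <- log2(coverage + 32)        # assumed scaling factor of 32
f <- fstatsSketch(y, mod, mod0)
f
```

With `offset = 0`, each value matches the F-statistic that `anova()` reports when comparing `lm(y[i, ] ~ 1)` with `lm(y[i, ] ~ group)`; a positive offset shrinks the statistics at bases where the residual variance is tiny.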

[Figure: fstats]

Main functions

[Figure: main functions flow chart]

Finding DERs (differentially expressed regions)

[Figure: analyzeChr chart]

Data

Data were simulated with polyester (Jaffe, Frazee, and Leek, 2014) for 3 groups with 10 samples per group and fold changes of 2x and \(\frac{1}{2}\)x.

  • 24 single-transcript genes
    • 12 DE: 4 per group (2 higher and 2 lower in that group, unchanged in the other groups)
    • 12 not DE
  • 36 two-transcript genes
    • 12 not DE
    • 12 with one transcript DE
    • 12 with both transcripts DE
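The design for the single-transcript genes can be written down as a fold-change matrix with one row per gene and one column per group. This is a hypothetical base-R sketch of that design, assuming the first 12 genes are the DE ones; the actual gene ordering and the polyester call used to generate the tutorial data are not shown here.

```r
## Hypothetical fold-change matrix for the 24 single-transcript genes
## (rows = genes, columns = groups A, B, C), following the design above
groups <- c('A', 'B', 'C')
fcSingle <- matrix(1, nrow = 24, ncol = 3, dimnames = list(NULL, groups))
for (g in seq_along(groups)) {
    ## Genes 4 * (g - 1) + 1:4 are DE only in group g
    idx <- 4 * (g - 1) + 1:4
    fcSingle[idx[1:2], g] <- 2    # 2 genes higher in this group
    fcSingle[idx[3:4], g] <- 1/2  # 2 genes lower in this group
}
## Genes 13 to 24 keep a fold change of 1 in every group (not DE)
head(fcSingle, 12)
```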

Reads were aligned with TopHat 2.0.13, and the coverage was saved in BigWig files.

Load data

library('derfinder')
library('TxDb.Hsapiens.UCSC.hg19.knownGene')

## Files
files <- paste0('http://lcolladotor.github.io/derTutor/data/sample', 
    1:30, '.bw')
names(files) <- paste0('sample', 1:30)

## Load data
system.time( fullCov <- fullCoverage(files, 'chr22', verbose = FALSE) )
##    user  system elapsed 
##   4.862   0.087   9.348
## Using Windows? Load the pre-computed coverage instead:
# load(url('http://lcolladotor.github.io/derTutor/data/fullCov.Rdata'))

Library size

## Calculate library size adjustment
system.time( collapsedFull <- collapseFullCoverage(fullCov) )
##    user  system elapsed 
##   5.290   0.115   5.411
lapply(collapsedFull[[1]], head)
## $values
## [1] 0 1 2 3 4 5
## 
## $weights
## [1] 51191562      441      300      196      159      209
sampleDepths <- sampleDepth(collapsedFull, probs = 1)
## 2014-12-09 10:56:08 sampleDepth: Calculating sample quantiles
## 2014-12-09 10:56:08 sampleDepth: Calculating sample adjustments
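The `values` and `weights` shown above form a frequency table of coverage: each distinct coverage value and the number of bases with that value. As an illustrative sketch (not derfinder's code), the same kind of summary can be built for a hypothetical toy coverage vector in base R; it keeps totals and quantiles recoverable without storing every base.

```r
## Hypothetical per-base coverage for one sample
toyCov <- c(0, 0, 0, 1, 0, 2, 2, 5, 0, 1)
## Frequency table: distinct coverage values and how many bases have each
tab <- table(toyCov)
collapsed <- list(values = as.numeric(names(tab)), weights = as.vector(tab))
collapsed
## Total coverage can be recovered from the collapsed form
sum(collapsed$values * collapsed$weights)
```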

Build models

## Create models
groupInfo <- factor(rep(c('A', 'B', 'C'), each = 10))
models <- makeModels(sampleDepths = sampleDepths, testvars = groupInfo)
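For intuition, the two model matrices can be approximated with `model.matrix()`. This sketch assumes the alternative model contains an intercept, the sample depth adjustment, and the group terms, while the null model contains only the intercept and the sample depth adjustment; that gives 4 and 2 columns, which is consistent with the theoretical F cutoff analyzeChr reports. The depth values and the `Toy` names are made-up placeholders.

```r
## Hypothetical stand-in for sampleDepths (one value per sample)
set.seed(1)
depthsToy <- rnorm(30, mean = 20, sd = 0.5)
groupToy <- factor(rep(c('A', 'B', 'C'), each = 10))
## Alternative model: intercept + depth adjustment + group effects
modToy <- model.matrix(~ depthsToy + groupToy)
## Null model: intercept + depth adjustment only (no group terms)
mod0Toy <- model.matrix(~ depthsToy)
dim(modToy)
dim(mod0Toy)
```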

Run analysis

## Filter Data
covData <- filterData(fullCov$chr22, cutoff = 0)
## 2014-12-09 10:56:09 filterData: originally there were 51304566 rows, now there are 114584 rows. Meaning that 99.78 percent was filtered.
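A base-R sketch of what `cutoff = 0` does on a hypothetical toy matrix, assuming bases are kept when their mean coverage across samples is above the cutoff. Because coverage is never negative, "mean above 0" is the same as "at least one sample above 0", so the sketch does not depend on which of these two rules is applied. The last line reproduces the percentage printed by filterData for chr22.

```r
## Hypothetical coverage: 4 bases (rows) x 3 samples (columns)
covToy <- matrix(c(0, 0, 0,
                   1, 0, 0,
                   0, 0, 0,
                   2, 3, 1), ncol = 3, byrow = TRUE)
cutoff <- 0
## Keep bases whose mean coverage across samples is above the cutoff
keep <- rowMeans(covToy) > cutoff
covToy[keep, , drop = FALSE]
## Percent of bases filtered out
100 * mean(!keep)
## Same arithmetic for the chr22 row counts reported by filterData()
round(100 * (1 - 114584 / 51304566), 2)
```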
## Run analysis for chr22
dir.create('chr22', showWarnings = FALSE)
system.time(
    res <- analyzeChr(chr = 'chr22', coverageInfo = covData, models = models, 
        cutoffFstat = 1e-03, cutoffPre = 0,
        nPermute = 100, seeds = seq_len(100) + 20141202, maxClusterGap = 3000,
        groupInfo = groupInfo, mc.cores = 1,
        lowMemDir = file.path(tempdir(), 'chr22', 'chunksDir'),
        writeOutput = TRUE, returnOutput = TRUE)
)
## 2014-12-09 10:56:09 analyzeChr: Pre-processing the coverage data
## 2014-12-09 10:56:21 analyzeChr: Calculating statistics
## 2014-12-09 10:56:21 calculateStats: calculating the F-statistics
## 2014-12-09 10:56:22 analyzeChr: Calculating pvalues
## 2014-12-09 10:56:22 analyzeChr: Using the following theoretical cutoff for the F-statistics 9.11630563808366
## 2014-12-09 10:56:22 calculatePvalues: identifying data segments
## 2014-12-09 10:56:22 findRegions: segmenting F-stats information
## 2014-12-09 10:56:22 findRegions: identifying candidate regions
## 2014-12-09 10:56:22 findRegions: identifying region clusters
## 2014-12-09 10:56:22 calculatePvalues: calculating F-statistics for permutation 1 and seed 20141203
## 2014-12-09 10:56:23 findRegions: segmenting F-stats information
## 2014-12-09 10:56:23 findRegions: identifying candidate regions
## 2014-12-09 10:56:23 calculatePvalues: calculating F-statistics for permutation 2 and seed 20141204
## 2014-12-09 10:56:24 findRegions: segmenting F-stats information
## 2014-12-09 10:56:24 findRegions: identifying candidate regions
## 2014-12-09 10:56:24 calculatePvalues: calculating F-statistics for permutation 3 and seed 20141205
## 2014-12-09 10:56:25 findRegions: segmenting F-stats information
## 2014-12-09 10:56:25 findRegions: identifying candidate regions
## 2014-12-09 10:56:25 calculatePvalues: calculating F-statistics for permutation 4 and seed 20141206
## 2014-12-09 10:56:26 findRegions: segmenting F-stats information
## 2014-12-09 10:56:26 findRegions: identifying candidate regions
## 2014-12-09 10:56:26 calculatePvalues: calculating F-statistics for permutation 5 and seed 20141207
## 2014-12-09 10:56:26 findRegions: segmenting F-stats information
## 2014-12-09 10:56:26 findRegions: identifying candidate regions
## 2014-12-09 10:56:26 calculatePvalues: calculating F-statistics for permutation 6 and seed 20141208
## 2014-12-09 10:56:27 findRegions: segmenting F-stats information
## 2014-12-09 10:56:27 findRegions: identifying candidate regions
## 2014-12-09 10:56:27 calculatePvalues: calculating F-statistics for permutation 7 and seed 20141209
## 2014-12-09 10:56:28 findRegions: segmenting F-stats information
## 2014-12-09 10:56:28 findRegions: identifying candidate regions
## 2014-12-09 10:56:28 calculatePvalues: calculating F-statistics for permutation 8 and seed 20141210
## 2014-12-09 10:56:28 findRegions: segmenting F-stats information
## 2014-12-09 10:56:28 findRegions: identifying candidate regions
## 2014-12-09 10:56:28 calculatePvalues: calculating F-statistics for permutation 9 and seed 20141211
## 2014-12-09 10:56:29 findRegions: segmenting F-stats information
## 2014-12-09 10:56:29 findRegions: identifying candidate regions
## 2014-12-09 10:56:29 calculatePvalues: calculating F-statistics for permutation 10 and seed 20141212
## 2014-12-09 10:56:30 findRegions: segmenting F-stats information
## 2014-12-09 10:56:30 findRegions: identifying candidate regions
## 2014-12-09 10:56:30 calculatePvalues: calculating F-statistics for permutation 11 and seed 20141213
## 2014-12-09 10:56:31 findRegions: segmenting F-stats information
## 2014-12-09 10:56:31 findRegions: identifying candidate regions
## 2014-12-09 10:56:31 calculatePvalues: calculating F-statistics for permutation 12 and seed 20141214
## 2014-12-09 10:56:31 findRegions: segmenting F-stats information
## 2014-12-09 10:56:31 findRegions: identifying candidate regions
## 2014-12-09 10:56:31 calculatePvalues: calculating F-statistics for permutation 13 and seed 20141215
## 2014-12-09 10:56:32 findRegions: segmenting F-stats information
## 2014-12-09 10:56:32 findRegions: identifying candidate regions
## 2014-12-09 10:56:32 calculatePvalues: calculating F-statistics for permutation 14 and seed 20141216
## 2014-12-09 10:56:33 findRegions: segmenting F-stats information
## 2014-12-09 10:56:33 findRegions: identifying candidate regions
## 2014-12-09 10:56:33 calculatePvalues: calculating F-statistics for permutation 15 and seed 20141217
## 2014-12-09 10:56:33 findRegions: segmenting F-stats information
## 2014-12-09 10:56:34 findRegions: identifying candidate regions
## 2014-12-09 10:56:34 calculatePvalues: calculating F-statistics for permutation 16 and seed 20141218
## 2014-12-09 10:56:34 findRegions: segmenting F-stats information
## 2014-12-09 10:56:34 findRegions: identifying candidate regions
## 2014-12-09 10:56:34 calculatePvalues: calculating F-statistics for permutation 17 and seed 20141219
## 2014-12-09 10:56:35 findRegions: segmenting F-stats information
## 2014-12-09 10:56:35 findRegions: identifying candidate regions
## 2014-12-09 10:56:35 calculatePvalues: calculating F-statistics for permutation 18 and seed 20141220
## 2014-12-09 10:56:36 findRegions: segmenting F-stats information
## 2014-12-09 10:56:36 findRegions: identifying candidate regions
## 2014-12-09 10:56:36 calculatePvalues: calculating F-statistics for permutation 19 and seed 20141221
## 2014-12-09 10:56:36 findRegions: segmenting F-stats information
## 2014-12-09 10:56:36 findRegions: identifying candidate regions
## 2014-12-09 10:56:36 calculatePvalues: calculating F-statistics for permutation 20 and seed 20141222
## 2014-12-09 10:56:37 findRegions: segmenting F-stats information
## 2014-12-09 10:56:37 findRegions: identifying candidate regions
## 2014-12-09 10:56:37 calculatePvalues: calculating F-statistics for permutation 21 and seed 20141223
## 2014-12-09 10:56:38 findRegions: segmenting F-stats information
## 2014-12-09 10:56:38 findRegions: identifying candidate regions
## 2014-12-09 10:56:38 calculatePvalues: calculating F-statistics for permutation 22 and seed 20141224
## 2014-12-09 10:56:39 findRegions: segmenting F-stats information
## 2014-12-09 10:56:39 findRegions: identifying candidate regions
## 2014-12-09 10:56:39 calculatePvalues: calculating F-statistics for permutation 23 and seed 20141225
## 2014-12-09 10:56:40 findRegions: segmenting F-stats information
## 2014-12-09 10:56:40 findRegions: identifying candidate regions
## 2014-12-09 10:56:40 calculatePvalues: calculating F-statistics for permutation 24 and seed 20141226
## 2014-12-09 10:56:40 findRegions: segmenting F-stats information
## 2014-12-09 10:56:40 findRegions: identifying candidate regions
## 2014-12-09 10:56:40 calculatePvalues: calculating F-statistics for permutation 25 and seed 20141227
## 2014-12-09 10:56:41 findRegions: segmenting F-stats information
## 2014-12-09 10:56:41 findRegions: identifying candidate regions
## 2014-12-09 10:56:41 calculatePvalues: calculating F-statistics for permutation 26 and seed 20141228
## 2014-12-09 10:56:42 findRegions: segmenting F-stats information
## 2014-12-09 10:56:42 findRegions: identifying candidate regions
## 2014-12-09 10:56:42 calculatePvalues: calculating F-statistics for permutation 27 and seed 20141229
## 2014-12-09 10:56:43 findRegions: segmenting F-stats information
## 2014-12-09 10:56:43 findRegions: identifying candidate regions
## 2014-12-09 10:56:43 calculatePvalues: calculating F-statistics for permutation 28 and seed 20141230
## 2014-12-09 10:56:44 findRegions: segmenting F-stats information
## 2014-12-09 10:56:44 findRegions: identifying candidate regions
## 2014-12-09 10:56:44 calculatePvalues: calculating F-statistics for permutation 29 and seed 20141231
## 2014-12-09 10:56:45 findRegions: segmenting F-stats information
## 2014-12-09 10:56:45 findRegions: identifying candidate regions
## 2014-12-09 10:56:45 calculatePvalues: calculating F-statistics for permutation 30 and seed 20141232
## 2014-12-09 10:56:45 findRegions: segmenting F-stats information
## 2014-12-09 10:56:45 findRegions: identifying candidate regions
## 2014-12-09 10:56:45 calculatePvalues: calculating F-statistics for permutation 31 and seed 20141233
## 2014-12-09 10:56:46 findRegions: segmenting F-stats information
## 2014-12-09 10:56:46 findRegions: identifying candidate regions
## 2014-12-09 10:56:46 calculatePvalues: calculating F-statistics for permutation 32 and seed 20141234
## 2014-12-09 10:56:47 findRegions: segmenting F-stats information
## 2014-12-09 10:56:47 findRegions: identifying candidate regions
## 2014-12-09 10:56:47 calculatePvalues: calculating F-statistics for permutation 33 and seed 20141235
## 2014-12-09 10:56:48 findRegions: segmenting F-stats information
## 2014-12-09 10:56:48 findRegions: identifying candidate regions
## 2014-12-09 10:56:48 calculatePvalues: calculating F-statistics for permutation 34 and seed 20141236
## 2014-12-09 10:56:49 findRegions: segmenting F-stats information
## 2014-12-09 10:56:49 findRegions: identifying candidate regions
## 2014-12-09 10:56:49 calculatePvalues: calculating F-statistics for permutation 35 and seed 20141237
## 2014-12-09 10:56:50 findRegions: segmenting F-stats information
## 2014-12-09 10:56:50 findRegions: identifying candidate regions
## 2014-12-09 10:56:50 calculatePvalues: calculating F-statistics for permutation 36 and seed 20141238
## 2014-12-09 10:56:51 findRegions: segmenting F-stats information
## 2014-12-09 10:56:51 findRegions: identifying candidate regions
## 2014-12-09 10:56:51 calculatePvalues: calculating F-statistics for permutation 37 and seed 20141239
## 2014-12-09 10:56:52 findRegions: segmenting F-stats information
## 2014-12-09 10:56:52 findRegions: identifying candidate regions
## 2014-12-09 10:56:52 calculatePvalues: calculating F-statistics for permutation 38 and seed 20141240
## 2014-12-09 10:56:53 findRegions: segmenting F-stats information
## 2014-12-09 10:56:53 findRegions: identifying candidate regions
## 2014-12-09 10:56:53 calculatePvalues: calculating F-statistics for permutation 39 and seed 20141241
## 2014-12-09 10:56:54 findRegions: segmenting F-stats information
## 2014-12-09 10:56:54 findRegions: identifying candidate regions
## 2014-12-09 10:56:54 calculatePvalues: calculating F-statistics for permutation 40 and seed 20141242
## 2014-12-09 10:56:55 findRegions: segmenting F-stats information
## 2014-12-09 10:56:55 findRegions: identifying candidate regions
## 2014-12-09 10:56:55 calculatePvalues: calculating F-statistics for permutation 41 and seed 20141243
## 2014-12-09 10:56:56 findRegions: segmenting F-stats information
## 2014-12-09 10:56:56 findRegions: identifying candidate regions
## 2014-12-09 10:56:56 calculatePvalues: calculating F-statistics for permutation 42 and seed 20141244
## 2014-12-09 10:56:57 findRegions: segmenting F-stats information
## 2014-12-09 10:56:57 findRegions: identifying candidate regions
## 2014-12-09 10:56:57 calculatePvalues: calculating F-statistics for permutation 43 and seed 20141245
## 2014-12-09 10:56:58 findRegions: segmenting F-stats information
## 2014-12-09 10:56:58 findRegions: identifying candidate regions
## 2014-12-09 10:56:58 calculatePvalues: calculating F-statistics for permutation 44 and seed 20141246
## 2014-12-09 10:56:59 findRegions: segmenting F-stats information
## 2014-12-09 10:56:59 findRegions: identifying candidate regions
## 2014-12-09 10:56:59 calculatePvalues: calculating F-statistics for permutation 45 and seed 20141247
## 2014-12-09 10:57:00 findRegions: segmenting F-stats information
## 2014-12-09 10:57:00 findRegions: identifying candidate regions
## 2014-12-09 10:57:00 calculatePvalues: calculating F-statistics for permutation 46 and seed 20141248
## 2014-12-09 10:57:01 findRegions: segmenting F-stats information
## 2014-12-09 10:57:01 findRegions: identifying candidate regions
## 2014-12-09 10:57:01 calculatePvalues: calculating F-statistics for permutation 47 and seed 20141249
## 2014-12-09 10:57:02 findRegions: segmenting F-stats information
## 2014-12-09 10:57:02 findRegions: identifying candidate regions
## 2014-12-09 10:57:02 calculatePvalues: calculating F-statistics for permutation 48 and seed 20141250
## 2014-12-09 10:57:03 findRegions: segmenting F-stats information
## 2014-12-09 10:57:03 findRegions: identifying candidate regions
## 2014-12-09 10:57:03 calculatePvalues: calculating F-statistics for permutation 49 and seed 20141251
## 2014-12-09 10:57:04 findRegions: segmenting F-stats information
## 2014-12-09 10:57:04 findRegions: identifying candidate regions
## 2014-12-09 10:57:04 calculatePvalues: calculating F-statistics for permutation 50 and seed 20141252
## 2014-12-09 10:57:05 findRegions: segmenting F-stats information
## 2014-12-09 10:57:05 findRegions: identifying candidate regions
## 2014-12-09 10:57:05 calculatePvalues: calculating F-statistics for permutation 51 and seed 20141253
## 2014-12-09 10:57:05 findRegions: segmenting F-stats information
## 2014-12-09 10:57:05 findRegions: identifying candidate regions
## 2014-12-09 10:57:05 calculatePvalues: calculating F-statistics for permutation 52 and seed 20141254
## 2014-12-09 10:57:06 findRegions: segmenting F-stats information
## 2014-12-09 10:57:06 findRegions: identifying candidate regions
## 2014-12-09 10:57:06 calculatePvalues: calculating F-statistics for permutation 53 and seed 20141255
## 2014-12-09 10:57:07 findRegions: segmenting F-stats information
## 2014-12-09 10:57:07 findRegions: identifying candidate regions
## 2014-12-09 10:57:07 calculatePvalues: calculating F-statistics for permutation 54 and seed 20141256
## 2014-12-09 10:57:07 findRegions: segmenting F-stats information
## 2014-12-09 10:57:07 findRegions: identifying candidate regions
## 2014-12-09 10:57:07 calculatePvalues: calculating F-statistics for permutation 55 and seed 20141257
## 2014-12-09 10:57:08 findRegions: segmenting F-stats information
## 2014-12-09 10:57:08 findRegions: identifying candidate regions
## 2014-12-09 10:57:08 calculatePvalues: calculating F-statistics for permutation 56 and seed 20141258
## 2014-12-09 10:57:09 findRegions: segmenting F-stats information
## 2014-12-09 10:57:09 findRegions: identifying candidate regions
## 2014-12-09 10:57:09 calculatePvalues: calculating F-statistics for permutation 57 and seed 20141259
## 2014-12-09 10:57:09 findRegions: segmenting F-stats information
## 2014-12-09 10:57:09 findRegions: identifying candidate regions
## 2014-12-09 10:57:09 calculatePvalues: calculating F-statistics for permutation 58 and seed 20141260
## 2014-12-09 10:57:10 findRegions: segmenting F-stats information
## 2014-12-09 10:57:10 findRegions: identifying candidate regions
## 2014-12-09 10:57:10 calculatePvalues: calculating F-statistics for permutation 59 and seed 20141261
## 2014-12-09 10:57:11 findRegions: segmenting F-stats information
## 2014-12-09 10:57:11 findRegions: identifying candidate regions
## 2014-12-09 10:57:11 calculatePvalues: calculating F-statistics for permutation 60 and seed 20141262
## 2014-12-09 10:57:11 findRegions: segmenting F-stats information
## 2014-12-09 10:57:11 findRegions: identifying candidate regions
## 2014-12-09 10:57:11 calculatePvalues: calculating F-statistics for permutation 61 and seed 20141263
## 2014-12-09 10:57:12 findRegions: segmenting F-stats information
## 2014-12-09 10:57:12 findRegions: identifying candidate regions
## 2014-12-09 10:57:12 calculatePvalues: calculating F-statistics for permutation 62 and seed 20141264
## 2014-12-09 10:57:13 findRegions: segmenting F-stats information
## 2014-12-09 10:57:13 findRegions: identifying candidate regions
## 2014-12-09 10:57:13 calculatePvalues: calculating F-statistics for permutation 63 and seed 20141265
## 2014-12-09 10:57:14 findRegions: segmenting F-stats information
## 2014-12-09 10:57:14 findRegions: identifying candidate regions
## 2014-12-09 10:57:14 calculatePvalues: calculating F-statistics for permutation 64 and seed 20141266
## 2014-12-09 10:57:15 findRegions: segmenting F-stats information
## 2014-12-09 10:57:15 findRegions: identifying candidate regions
## 2014-12-09 10:57:15 calculatePvalues: calculating F-statistics for permutation 65 and seed 20141267
## 2014-12-09 10:57:15 findRegions: segmenting F-stats information
## 2014-12-09 10:57:15 findRegions: identifying candidate regions
## 2014-12-09 10:57:15 calculatePvalues: calculating F-statistics for permutation 66 and seed 20141268
## 2014-12-09 10:57:16 findRegions: segmenting F-stats information
## 2014-12-09 10:57:16 findRegions: identifying candidate regions
## 2014-12-09 10:57:16 calculatePvalues: calculating F-statistics for permutation 67 and seed 20141269
## 2014-12-09 10:57:17 findRegions: segmenting F-stats information
## 2014-12-09 10:57:17 findRegions: identifying candidate regions
## 2014-12-09 10:57:17 calculatePvalues: calculating F-statistics for permutation 68 and seed 20141270
## 2014-12-09 10:57:17 findRegions: segmenting F-stats information
## 2014-12-09 10:57:17 findRegions: identifying candidate regions
## 2014-12-09 10:57:17 calculatePvalues: calculating F-statistics for permutation 69 and seed 20141271
## 2014-12-09 10:57:18 findRegions: segmenting F-stats information
## 2014-12-09 10:57:18 findRegions: identifying candidate regions
## 2014-12-09 10:57:18 calculatePvalues: calculating F-statistics for permutation 70 and seed 20141272
## 2014-12-09 10:57:19 findRegions: segmenting F-stats information
## 2014-12-09 10:57:19 findRegions: identifying candidate regions
## 2014-12-09 10:57:19 calculatePvalues: calculating F-statistics for permutation 71 and seed 20141273
## 2014-12-09 10:57:19 findRegions: segmenting F-stats information
## 2014-12-09 10:57:19 findRegions: identifying candidate regions
## 2014-12-09 10:57:19 calculatePvalues: calculating F-statistics for permutation 72 and seed 20141274
## 2014-12-09 10:57:20 findRegions: segmenting F-stats information
## 2014-12-09 10:57:20 findRegions: identifying candidate regions
## 2014-12-09 10:57:20 calculatePvalues: calculating F-statistics for permutation 73 and seed 20141275
## 2014-12-09 10:57:21 findRegions: segmenting F-stats information
## 2014-12-09 10:57:21 findRegions: identifying candidate regions
## 2014-12-09 10:57:21 calculatePvalues: calculating F-statistics for permutation 74 and seed 20141276
## 2014-12-09 10:57:22 findRegions: segmenting F-stats information
## 2014-12-09 10:57:22 findRegions: identifying candidate regions
## 2014-12-09 10:57:22 calculatePvalues: calculating F-statistics for permutation 75 and seed 20141277
## 2014-12-09 10:57:22 findRegions: segmenting F-stats information
## 2014-12-09 10:57:22 findRegions: identifying candidate regions
## 2014-12-09 10:57:22 calculatePvalues: calculating F-statistics for permutation 76 and seed 20141278
## 2014-12-09 10:57:23 findRegions: segmenting F-stats information
## 2014-12-09 10:57:23 findRegions: identifying candidate regions
## 2014-12-09 10:57:23 calculatePvalues: calculating F-statistics for permutation 77 and seed 20141279
## 2014-12-09 10:57:23 findRegions: segmenting F-stats information
## 2014-12-09 10:57:24 findRegions: identifying candidate regions
## 2014-12-09 10:57:24 calculatePvalues: calculating F-statistics for permutation 78 and seed 20141280
## 2014-12-09 10:57:24 findRegions: segmenting F-stats information
## 2014-12-09 10:57:24 findRegions: identifying candidate regions
## 2014-12-09 10:57:24 calculatePvalues: calculating F-statistics for permutation 79 and seed 20141281
## 2014-12-09 10:57:25 findRegions: segmenting F-stats information
## 2014-12-09 10:57:25 findRegions: identifying candidate regions
## 2014-12-09 10:57:25 calculatePvalues: calculating F-statistics for permutation 80 and seed 20141282
## 2014-12-09 10:57:26 findRegions: segmenting F-stats information
## 2014-12-09 10:57:26 findRegions: identifying candidate regions
## 2014-12-09 10:57:26 calculatePvalues: calculating F-statistics for permutation 81 and seed 20141283
## 2014-12-09 10:57:26 findRegions: segmenting F-stats information
## 2014-12-09 10:57:26 findRegions: identifying candidate regions
## 2014-12-09 10:57:26 calculatePvalues: calculating F-statistics for permutation 82 and seed 20141284
## 2014-12-09 10:57:27 findRegions: segmenting F-stats information
## 2014-12-09 10:57:27 findRegions: identifying candidate regions
## 2014-12-09 10:57:27 calculatePvalues: calculating F-statistics for permutation 83 and seed 20141285
## 2014-12-09 10:57:28 findRegions: segmenting F-stats information
## 2014-12-09 10:57:28 findRegions: identifying candidate regions
## 2014-12-09 10:57:28 calculatePvalues: calculating F-statistics for permutation 84 and seed 20141286
## 2014-12-09 10:57:28 findRegions: segmenting F-stats information
## 2014-12-09 10:57:28 findRegions: identifying candidate regions
## 2014-12-09 10:57:28 calculatePvalues: calculating F-statistics for permutation 85 and seed 20141287
## 2014-12-09 10:57:29 findRegions: segmenting F-stats information
## 2014-12-09 10:57:29 findRegions: identifying candidate regions
## 2014-12-09 10:57:29 calculatePvalues: calculating F-statistics for permutation 86 and seed 20141288
## 2014-12-09 10:57:30 findRegions: segmenting F-stats information
## 2014-12-09 10:57:30 findRegions: identifying candidate regions
## 2014-12-09 10:57:30 calculatePvalues: calculating F-statistics for permutation 87 and seed 20141289
## 2014-12-09 10:57:31 findRegions: segmenting F-stats information
## 2014-12-09 10:57:31 findRegions: identifying candidate regions
## 2014-12-09 10:57:31 calculatePvalues: calculating F-statistics for permutation 88 and seed 20141290
## 2014-12-09 10:57:31 findRegions: segmenting F-stats information
## 2014-12-09 10:57:31 findRegions: identifying candidate regions
## 2014-12-09 10:57:31 calculatePvalues: calculating F-statistics for permutation 89 and seed 20141291
## 2014-12-09 10:57:32 findRegions: segmenting F-stats information
## 2014-12-09 10:57:32 findRegions: identifying candidate regions
## 2014-12-09 10:57:32 calculatePvalues: calculating F-statistics for permutation 90 and seed 20141292
## 2014-12-09 10:57:33 findRegions: segmenting F-stats information
## 2014-12-09 10:57:33 findRegions: identifying candidate regions
## 2014-12-09 10:57:33 calculatePvalues: calculating F-statistics for permutation 91 and seed 20141293
## 2014-12-09 10:57:33 findRegions: segmenting F-stats information
## 2014-12-09 10:57:33 findRegions: identifying candidate regions
## 2014-12-09 10:57:33 calculatePvalues: calculating F-statistics for permutation 92 and seed 20141294
## 2014-12-09 10:57:34 findRegions: segmenting F-stats information
## 2014-12-09 10:57:34 findRegions: identifying candidate regions
## 2014-12-09 10:57:34 calculatePvalues: calculating F-statistics for permutation 93 and seed 20141295
## 2014-12-09 10:57:35 findRegions: segmenting F-stats information
## 2014-12-09 10:57:35 findRegions: identifying candidate regions
## 2014-12-09 10:57:35 calculatePvalues: calculating F-statistics for permutation 94 and seed 20141296
## 2014-12-09 10:57:35 findRegions: segmenting F-stats information
## 2014-12-09 10:57:35 findRegions: identifying candidate regions
## 2014-12-09 10:57:35 calculatePvalues: calculating F-statistics for permutation 95 and seed 20141297
## 2014-12-09 10:57:36 findRegions: segmenting F-stats information
## 2014-12-09 10:57:36 findRegions: identifying candidate regions
## 2014-12-09 10:57:36 calculatePvalues: calculating F-statistics for permutation 96 and seed 20141298
## 2014-12-09 10:57:37 findRegions: segmenting F-stats information
## 2014-12-09 10:57:37 findRegions: identifying candidate regions
## 2014-12-09 10:57:37 calculatePvalues: calculating F-statistics for permutation 97 and seed 20141299
## 2014-12-09 10:57:38 findRegions: segmenting F-stats information
## 2014-12-09 10:57:38 findRegions: identifying candidate regions
## 2014-12-09 10:57:38 calculatePvalues: calculating F-statistics for permutation 98 and seed 20141300
## 2014-12-09 10:57:38 findRegions: segmenting F-stats information
## 2014-12-09 10:57:38 findRegions: identifying candidate regions
## 2014-12-09 10:57:39 calculatePvalues: calculating F-statistics for permutation 99 and seed 20141301
## 2014-12-09 10:57:39 findRegions: segmenting F-stats information
## 2014-12-09 10:57:39 findRegions: identifying candidate regions
## 2014-12-09 10:57:39 calculatePvalues: calculating F-statistics for permutation 100 and seed 20141302
## 2014-12-09 10:57:40 findRegions: segmenting F-stats information
## 2014-12-09 10:57:40 findRegions: identifying candidate regions
## 2014-12-09 10:57:40 calculatePvalues: calculating the p-values
## 2014-12-09 10:57:40 analyzeChr: Annotating regions
## Matching regions to genes.
## nearestgene: loading bumphunter hg19 transcript database
## finding nearest transcripts...
## AnnotatingDone.
##    user  system elapsed 
## 101.531   8.735 110.403
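The theoretical cutoff in the log can be reproduced from `cutoffFstat = 1e-03` as an upper quantile of the F distribution. Assuming \(\text{df}_1 = 4\) (intercept, sample depth adjustment, and two group terms), \(\text{df}_0 = 2\) (intercept and sample depth adjustment), and \(n = 30\) samples, the degrees of freedom are \(\text{df}_1 - \text{df}_0 = 2\) and \(n - \text{df}_1 = 26\):

```r
cutoffFstat <- 1e-03
n <- 30
df1 <- 4
df0 <- 2
## Upper 0.1% quantile of F(df1 - df0, n - df1)
fCutoff <- qf(cutoffFstat, df1 - df0, n - df1, lower.tail = FALSE)
print(fCutoff, digits = 15)
```

For 2 numerator degrees of freedom the quantile also has a closed form, \(\frac{\nu}{2}\left(\alpha^{-2/\nu} - 1\right)\) with \(\nu = 26\) and \(\alpha = 10^{-3}\), which gives the same value.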

Explore results

names(res)
## [1] "timeinfo"     "optionsStats" "coveragePrep" "fstats"      
## [5] "regions"      "annotation"
dir('chr22')
## [1] "annotation.Rdata"   "coveragePrep.Rdata" "fstats.Rdata"      
## [4] "optionsStats.Rdata" "regions.Rdata"      "timeinfo.Rdata"
names(res$coveragePrep)
## [1] "coverageProcessed" "mclapplyIndex"     "position"         
## [4] "meanCoverage"      "groupMeans"
names(res$regions)
## [1] "regions"         "nullStats"       "nullWidths"      "nullPermutation"
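`res$regions$nullStats` holds region-level statistics from the permuted data. A common way to turn such a null distribution into per-region p-values is the empirical p-value: the number of null statistics at least as large as the observed one, plus one, divided by the number of null statistics plus one. The sketch below uses made-up areas and illustrates this generic technique; it is not necessarily derfinder's exact rule.

```r
## Hypothetical observed region areas and null areas from permutations
observedArea <- c(50, 12, 3)
nullArea <- c(1, 2, 2.5, 4, 5, 8, 10, 15, 20, 30)
## Empirical p-value with a +1 correction so no p-value is exactly 0
empiricalP <- function(x, nullStats) {
    (sum(nullStats >= x) + 1) / (length(nullStats) + 1)
}
pvals <- vapply(observedArea, empiricalP, numeric(1), nullStats = nullArea)
pvals
```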

Merge

## Genomic state
system.time(gs <- makeGenomicState(txdb=TxDb.Hsapiens.UCSC.hg19.knownGene, chrs='chr22'))
##    user  system elapsed 
##  21.116   1.128  22.262
## Merge results from different chromosomes
mergeResults('chr22', genomicState = gs$fullGenome, optionsStats = res$optionsStats)
## 2014-12-09 10:58:22 mergeResults: Saving options used
## 2014-12-09 10:58:22 Loading chromosome chr22
## 2014-12-09 10:58:22 mergeResults: calculating FWER
## 2014-12-09 10:58:22 mergeResults: Saving fullNullSummary
## 2014-12-09 10:58:22 mergeResults: Re-calculating the p-values
## 2014-12-09 10:58:22 mergeResults: Saving fullRegions
## 2014-12-09 10:58:23 mergeResults: assigning genomic states
## 2014-12-09 10:58:23 annotateRegions: counting
## 2014-12-09 10:58:23 annotateRegions: annotating
## 2014-12-09 10:58:23 mergeResults: Saving fullAnnotatedRegions
## 2014-12-09 10:58:23 mergeResults: Saving fullFstats
## 2014-12-09 10:58:23 mergeResults: Saving fullTime

Main results

dir(pattern = 'Rdata')
## [1] "fullAnnotatedRegions.Rdata" "fullFstats.Rdata"          
## [3] "fullNullSummary.Rdata"      "fullRegions.Rdata"         
## [5] "fullTime.Rdata"             "optionsMerge.Rdata"
load('fullRegions.Rdata')
class(fullRegions)
## [1] "GRanges"
## attr(,"package")
## [1] "GenomicRanges"
length(fullRegions)
## [1] 469

colnames(mcols(fullRegions))
##  [1] "value"              "area"               "indexStart"        
##  [4] "indexEnd"           "cluster"            "clusterL"          
##  [7] "meanCoverage"       "meanA"              "meanB"             
## [10] "meanC"              "log2FoldChangeBvsA" "log2FoldChangeCvsA"
## [13] "pvalues"            "significant"        "qvalues"           
## [16] "significantQval"    "name"               "annotation"        
## [19] "description"        "region"             "distance"          
## [22] "subregion"          "insidedistance"     "exonnumber"        
## [25] "nexons"             "UTR"                "annoStrand"        
## [28] "geneL"              "codingL"            "fwer"              
## [31] "significantFWER"
table(fullRegions$significantFWER)
## 
##  TRUE FALSE 
##   122   347
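The `fwer` column is computed by mergeResults() from the permutation null. One standard permutation-based family-wise error rate adjustment compares each observed statistic with the maximum statistic from each permutation. The sketch below illustrates that idea with made-up numbers and an assumed 0.05 threshold; it is a generic example rather than a transcription of derfinder's code.

```r
## Hypothetical maximum region area found in each of 5 permutations
maxNullArea <- c(6, 9, 14, 7, 30)
observedArea <- c(50, 12, 3)
## FWER-adjusted p-value: share of permutations whose maximum null
## statistic is at least as large as the observed statistic
fwer <- vapply(observedArea, function(x) mean(maxNullArea >= x), numeric(1))
significantFWER <- fwer < 0.05
fwer
significantFWER
```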

Need help?

Citing derfinder

## Citation info
citation('derfinder')
## 
## Collado-Torres L, Frazee AC, Jaffe AE and Leek JT (2014).
## _derfinder: Annotation-agnostic differential expression analysis
## of RNA-seq data at base-pair resolution_.
## https://github.com/lcolladotor/derfinder - R package version
## 1.1.14, <URL:
## http://www.bioconductor.org/packages/release/bioc/html/derfinder.html>.
## 
## Frazee AC, Sabunciyan S, Hansen KD, Irizarry RA and Leek JT
## (2014). "Differential expression analysis of RNA-seq data at
## single-base resolution." _Biostatistics_, *15 (3)*, pp. 413-426.
## <URL: http://dx.doi.org/10.1093/biostatistics/kxt053>, <URL:
## http://biostatistics.oxfordjournals.org/content/15/3/413.long>.

Reproducibility

Code for creating this page

## Create this page
library('rmarkdown')
render('index.Rmd')

## Clean up
file.remove('derTutorRef.bib')

## Extract the R code
library('knitr')
knit('index.Rmd', tangle = TRUE)

Date this tutorial was generated.

## [1] "2014-12-09 10:58:23 EST"

Wall-clock time spent running this tutorial.

## Time difference of 2.733 mins

R session information.

##  setting  value                                             
##  version  R Under development (unstable) (2014-11-01 r66923)
##  system   x86_64, darwin10.8.0                              
##  ui       X11                                               
##  language (EN)                                              
##  collate  en_US.UTF-8                                       
##  tz       America/New_York

##  package                           * version date       source                                 
##  AnnotationDbi                     * 1.29.10 2014-11-22 Bioconductor                           
##  BiocParallel                        1.1.9   2014-11-24 Bioconductor                           
##  bumphunter                          1.7.2   2014-11-19 Bioconductor                           
##  derfinder                         * 1.1.14  2014-11-22 Github (lcolladotor/derfinder@24f9fbb) 
##  derfinderHelper                     1.1.5   2014-11-05 Bioconductor                           
##  GenomeInfoDb                      * 1.3.7   2014-11-15 Bioconductor                           
##  GenomicAlignments                   1.3.14  2014-12-05 Bioconductor                           
##  GenomicFeatures                   * 1.19.6  2014-11-04 Bioconductor                           
##  GenomicFiles                        1.3.8   2014-11-12 Bioconductor                           
##  GenomicRanges                     * 1.19.20 2014-12-05 Bioconductor                           
##  Hmisc                               3.14.5  2014-09-12 CRAN (R 3.2.0)                         
##  IRanges                           * 2.1.28  2014-12-05 Bioconductor                           
##  knitcitations                     * 1.0.4   2014-11-03 Github (cboettig/knitcitations@508de74)
##  Matrix                              1.1.4   2014-06-15 CRAN (R 3.2.0)                         
##  qvalue                              1.41.0  2014-10-14 Bioconductor                           
##  rmarkdown                         * 0.3.3   2014-09-17 CRAN (R 3.2.0)                         
##  Rsamtools                           1.19.11 2014-11-26 Bioconductor                           
##  rtracklayer                         1.27.6  2014-11-26 Bioconductor                           
##  S4Vectors                         * 0.5.14  2014-12-05 Bioconductor                           
##  TxDb.Hsapiens.UCSC.hg19.knownGene * 3.0.0   2014-09-26 Bioconductor

Bibliography

This tutorial was generated using rmarkdown (Allaire, McPherson, Xie, Wickham, et al., 2014) and knitcitations (Boettiger, 2014).

[1] J. Allaire, J. McPherson, Y. Xie, H. Wickham, et al. rmarkdown: Dynamic Documents for R. R package version 0.3.3. 2014. URL: http://CRAN.R-project.org/package=rmarkdown.

[2] C. Boettiger. knitcitations: Citations for knitr markdown files. R package version 1.0.4. 2014. URL: https://github.com/cboettig/knitcitations.

[3] L. Collado-Torres, A. C. Frazee, A. E. Jaffe and J. T. Leek. derfinder: Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution. https://github.com/lcolladotor/derfinder - R package version 1.1.14. 2014. URL: http://www.bioconductor.org/packages/release/bioc/html/derfinder.html.

[4] A. E. Jaffe, A. C. Frazee and J. T. Leek. polyester: Simulate RNA-seq reads. R package version 1.1.0. 2014.