derfinderData is a small data package with information extracted from
BrainSpan (BrainSpan, 2011) for 24 samples restricted to chromosome 21.
The BigWig files in this package can then be used by other packages for
examples, such as in derfinder and derfinderPlot.
While you could download the data directly from BrainSpan (BrainSpan, 2011), this package helps in scenarios where that download runs into difficulties, such as the one described in this thread.
The following code builds the phenotype table included in derfinderData.
For each of two randomly selected structures, 12 samples were chosen: 6
fetal samples and 6 from adult individuals. For the fetal samples, the
age in post-conception weeks (PCW) is transformed into age in years by
age_in_years = (age_in_PCW - 40) / 52
In other data sets you might want to subtract 42 instead of 40 if some observations have PCW up to 42.
## Construct brainspanPheno table
brainspanPheno <- data.frame(
gender = c("F", "M", "M", "M", "F", "F", "F", "M", "F", "M", "M", "F", "M", "M", "M", "M", "F", "F", "F", "M", "F", "M", "M", "F"),
lab = c("HSB97.AMY", "HSB92.AMY", "HSB178.AMY", "HSB159.AMY", "HSB153.AMY", "HSB113.AMY", "HSB130.AMY", "HSB136.AMY", "HSB126.AMY", "HSB145.AMY", "HSB123.AMY", "HSB135.AMY", "HSB114.A1C", "HSB103.A1C", "HSB178.A1C", "HSB154.A1C", "HSB150.A1C", "HSB149.A1C", "HSB130.A1C", "HSB136.A1C", "HSB126.A1C", "HSB145.A1C", "HSB123.A1C", "HSB135.A1C"),
Age = c(-0.442307692307693, -0.365384615384615, -0.461538461538461, -0.307692307692308, -0.538461538461539, -0.538461538461539, 21, 23, 30, 36, 37, 40, -0.519230769230769, -0.519230769230769, -0.461538461538461, -0.461538461538461, -0.538461538461539, -0.519230769230769, 21, 23, 30, 36, 37, 40)
)
brainspanPheno$structure_acronym <- rep(c("AMY", "A1C"), each = 12)
brainspanPheno$structure_name <- rep(c("amygdaloid complex", "primary auditory cortex (core)"), each = 12)
brainspanPheno$file <- paste0("http://download.alleninstitute.org/brainspan/MRF_BigWig_Gencode_v10/bigwig/", brainspanPheno$lab, ".bw")
brainspanPheno$group <- factor(ifelse(brainspanPheno$Age < 0, "fetal", "adult"), levels = c("fetal", "adult"))
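The age transformation and the fetal/adult grouping used in the table can be checked with a small base R sketch. The helper name `pcw_to_years` and its `term` argument are hypothetical, introduced here only for illustration; the PCW values are obtained by inverting the formula on the first six `Age` entries of the table.

```r
## Hypothetical helper (not part of derfinderData): convert age in
## post-conception weeks (PCW) to age in years relative to birth.
## `term` is the PCW treated as birth: 40 here, 42 for some other data sets.
pcw_to_years <- function(age_in_pcw, term = 40) {
    (age_in_pcw - term) / 52
}

## PCW values recovered by inverting the formula on the six fetal AMY ages
pcw <- c(17, 21, 16, 24, 12, 12)
age <- c(pcw_to_years(pcw), 21, 23, 30, 36, 37, 40)

## Negative ages are prenatal; listing "fetal" first makes it the
## reference level of the factor
group <- factor(ifelse(age < 0, "fetal", "adult"), levels = c("fetal", "adult"))
table(group)
```

Fixing the level order explicitly matters for downstream models, where the first level is typically treated as the baseline.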
We can then save the phenotype information, which is included in derfinderData.
## Save pheno table
save(brainspanPheno, file = "brainspanPheno.RData")
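A saved `.RData` file can be checked by loading it back into a separate environment. The sketch below uses a two-row stand-in table and a temporary file, both of which are assumptions made for illustration rather than part of the package.

```r
## Toy stand-in for the phenotype table (two rows only)
pheno <- data.frame(lab = c("HSB97.AMY", "HSB130.AMY"), Age = c(-23 / 52, 21))

## Save to a temporary file, then load into a fresh environment so the
## restored copy does not overwrite the object in the workspace
rdata <- tempfile(fileext = ".RData")
save(pheno, file = rdata)
restored <- new.env()
load(rdata, envir = restored)

## Compare the restored object with the original
identical(pheno, restored$pheno)
```

Loading into a dedicated environment avoids the masking that occurs when an object of the same name already exists in the global environment.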
Here is what the data looks like:
library("knitr")
## Explore pheno
p <- brainspanPheno[, !colnames(brainspanPheno) %in% c("structure_acronym", "structure_name", "file")]
kable(p, format = "html", row.names = TRUE)
| | gender | lab | Age | group |
|---|---|---|---|---|
1 | F | HSB97.AMY | -0.4423077 | fetal |
2 | M | HSB92.AMY | -0.3653846 | fetal |
3 | M | HSB178.AMY | -0.4615385 | fetal |
4 | M | HSB159.AMY | -0.3076923 | fetal |
5 | F | HSB153.AMY | -0.5384615 | fetal |
6 | F | HSB113.AMY | -0.5384615 | fetal |
7 | F | HSB130.AMY | 21.0000000 | adult |
8 | M | HSB136.AMY | 23.0000000 | adult |
9 | F | HSB126.AMY | 30.0000000 | adult |
10 | M | HSB145.AMY | 36.0000000 | adult |
11 | M | HSB123.AMY | 37.0000000 | adult |
12 | F | HSB135.AMY | 40.0000000 | adult |
13 | M | HSB114.A1C | -0.5192308 | fetal |
14 | M | HSB103.A1C | -0.5192308 | fetal |
15 | M | HSB178.A1C | -0.4615385 | fetal |
16 | M | HSB154.A1C | -0.4615385 | fetal |
17 | F | HSB150.A1C | -0.5384615 | fetal |
18 | F | HSB149.A1C | -0.5192308 | fetal |
19 | F | HSB130.A1C | 21.0000000 | adult |
20 | M | HSB136.A1C | 23.0000000 | adult |
21 | F | HSB126.A1C | 30.0000000 | adult |
22 | M | HSB145.A1C | 36.0000000 | adult |
23 | M | HSB123.A1C | 37.0000000 | adult |
24 | F | HSB135.A1C | 40.0000000 | adult |
We can verify that this is indeed the information included in derfinderData.
## Rename our newly created pheno data
newPheno <- brainspanPheno
## Load the included data
library("derfinderData")
##
## Attaching package: 'derfinderData'
## The following object is masked _by_ '.GlobalEnv':
##
## brainspanPheno
## Verify
identical(newPheno, brainspanPheno)
## [1] TRUE
Using the phenotype information, you can use derfinder
to extract the base-level coverage information for chromosome 21 from
these samples. Then, you can export the data to BigWig files.
library("derfinder")
## Determine the files to use and fix the names
files <- brainspanPheno$file
names(files) <- gsub("\\.(AMY|A1C)$", "", brainspanPheno$lab)
## Load the data
system.time(fullCovAMY <- fullCoverage(
files = files[brainspanPheno$structure_acronym == "AMY"], chrs = "chr21"
))
# user system elapsed
# 4.505 0.178 37.676
system.time(fullCovA1C <- fullCoverage(
files = files[brainspanPheno$structure_acronym == "A1C"], chrs = "chr21"
))
# user system elapsed
# 2.968 0.139 27.704
## Write BigWig files
dir.create("AMY")
system.time(createBw(fullCovAMY, path = "AMY", keepGR = FALSE))
# user system elapsed
# 5.749 0.332 6.045
dir.create("A1C")
system.time(createBw(fullCovA1C, path = "A1C", keepGR = FALSE))
# user system elapsed
# 5.025 0.299 5.323
## Check that 12 files were created in each directory
all(c(length(dir("AMY")), length(dir("A1C"))) == 12)
# TRUE
## Save data for examples running on Windows
save(fullCovAMY, file = "fullCovAMY.RData")
save(fullCovA1C, file = "fullCovA1C.RData")
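The sample names attached to `files` are the `lab` values with the structure suffix stripped. A minimal sketch of that step, using an escaped and end-anchored pattern, is given below. For the identifiers in this table a looser pattern gives the same result; the identifier `"XAMY1.AMY"` is hypothetical and only shows where the two differ.

```r
## Lab identifiers as they appear in the phenotype table
lab <- c("HSB97.AMY", "HSB178.AMY", "HSB114.A1C")

## Escape the dot and anchor at the end so only the suffix is removed
sample_id <- sub("\\.(AMY|A1C)$", "", lab)
sample_id

## Hypothetical identifier showing why the escaping matters
gsub(".AMY|.A1C", "", "XAMY1.AMY")     # unescaped dot matches any character
sub("\\.(AMY|A1C)$", "", "XAMY1.AMY")  # only the trailing suffix is removed
```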
These BigWig files are available under extdata as shown below:
## Find AMY BigWigs
dir(system.file("extdata", "AMY", package = "derfinderData"))
## [1] "HSB113.bw" "HSB123.bw" "HSB126.bw" "HSB130.bw" "HSB135.bw" "HSB136.bw"
## [7] "HSB145.bw" "HSB153.bw" "HSB159.bw" "HSB178.bw" "HSB92.bw" "HSB97.bw"
## Find A1C BigWigs
dir(system.file("extdata", "A1C", package = "derfinderData"))
## [1] "HSB103.bw" "HSB114.bw" "HSB123.bw" "HSB126.bw" "HSB130.bw" "HSB135.bw"
## [7] "HSB136.bw" "HSB145.bw" "HSB149.bw" "HSB150.bw" "HSB154.bw" "HSB178.bw"
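To reuse these files in an example, it helps to have a vector of full paths named by sample. The following sketch shows one way to build it with base R. The helper `named_bigwigs` is hypothetical (it is not provided by derfinderData), and the demonstration runs on a temporary directory with empty placeholder files rather than on real BigWig data.

```r
## Hypothetical helper: list the BigWig files in a directory and name each
## path by its sample identifier (the file name without the .bw extension)
named_bigwigs <- function(bw_dir) {
    bw <- dir(bw_dir, pattern = "\\.bw$", full.names = TRUE)
    names(bw) <- tools::file_path_sans_ext(basename(bw))
    bw
}

## Demonstration on a temporary directory with empty placeholder files;
## the non-BigWig file is ignored by the pattern
demo_dir <- file.path(tempdir(), "AMY_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("HSB97.bw", "HSB113.bw", "notes.txt"))))
names(named_bigwigs(demo_dir))
```

With the package installed, calling `named_bigwigs(system.file("extdata", "AMY", package = "derfinderData"))` should give a vector that can be passed as the `files` argument of `fullCoverage()`, in the same way as the URLs used earlier.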
Code for creating the vignette
## Create the vignette
library("rmarkdown")
system.time(render("derfinderData.Rmd", "BiocStyle::html_document"))
## Extract the R code
library("knitr")
knit("derfinderData.Rmd", tangle = TRUE)
Date the vignette was generated.
## [1] "2023-05-07 05:34:26 UTC"
Wallclock time spent generating the vignette.
## Time difference of 1.357 secs
R session information.
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.3.0 (2023-04-21)
## os Ubuntu 22.04.2 LTS
## system x86_64, linux-gnu
## ui X11
## language en
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz UTC
## date 2023-05-07
## pandoc 2.19.2 @ /usr/local/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date (UTC) lib source
## backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
## bibtex 0.5.1 2023-01-26 [1] RSPM (R 4.3.0)
## BiocManager 1.30.20 2023-02-24 [2] CRAN (R 4.3.0)
## BiocStyle * 2.28.0 2023-04-25 [1] Bioconductor
## bookdown 0.33 2023-03-06 [1] RSPM (R 4.3.0)
## bslib 0.4.2 2022-12-16 [2] RSPM (R 4.3.0)
## cachem 1.0.8 2023-05-01 [2] RSPM (R 4.3.0)
## cli 3.6.1 2023-03-23 [2] RSPM (R 4.3.0)
## derfinderData * 2.19.0 2023-05-07 [1] Bioconductor
## desc 1.4.2 2022-09-08 [2] RSPM (R 4.3.0)
## digest 0.6.31 2022-12-11 [2] RSPM (R 4.3.0)
## evaluate 0.20 2023-01-17 [2] RSPM (R 4.3.0)
## fastmap 1.1.1 2023-02-24 [2] RSPM (R 4.3.0)
## fs 1.6.2 2023-04-25 [2] RSPM (R 4.3.0)
## generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
## glue 1.6.2 2022-02-24 [2] RSPM (R 4.3.0)
## highr 0.10 2022-12-22 [2] RSPM (R 4.3.0)
## htmltools 0.5.5 2023-03-23 [2] RSPM (R 4.3.0)
## httr 1.4.5 2023-02-24 [2] RSPM (R 4.3.0)
## jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.3.0)
## jsonlite 1.8.4 2022-12-06 [2] RSPM (R 4.3.0)
## knitr * 1.42 2023-01-25 [2] RSPM (R 4.3.0)
## lifecycle 1.0.3 2022-10-07 [2] RSPM (R 4.3.0)
## lubridate 1.9.2 2023-02-10 [1] RSPM (R 4.3.0)
## magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.3.0)
## memoise 2.0.1 2021-11-26 [2] RSPM (R 4.3.0)
## pkgdown 2.0.7 2022-12-14 [2] RSPM (R 4.3.0)
## plyr 1.8.8 2022-11-11 [1] CRAN (R 4.3.0)
## purrr 1.0.1 2023-01-10 [2] RSPM (R 4.3.0)
## R6 2.5.1 2021-08-19 [2] RSPM (R 4.3.0)
## ragg 1.2.5 2023-01-12 [2] RSPM (R 4.3.0)
## Rcpp 1.0.10 2023-01-22 [2] RSPM (R 4.3.0)
## RefManageR * 1.4.0 2022-09-30 [1] CRAN (R 4.3.0)
## rlang 1.1.1 2023-04-28 [2] RSPM (R 4.3.0)
## rmarkdown 2.21 2023-03-26 [2] RSPM (R 4.3.0)
## rprojroot 2.0.3 2022-04-02 [2] RSPM (R 4.3.0)
## sass 0.4.6 2023-05-03 [2] RSPM (R 4.3.0)
## sessioninfo * 1.2.2 2021-12-06 [2] RSPM (R 4.3.0)
## stringi 1.7.12 2023-01-11 [2] RSPM (R 4.3.0)
## stringr 1.5.0 2022-12-02 [2] RSPM (R 4.3.0)
## systemfonts 1.0.4 2022-02-11 [2] RSPM (R 4.3.0)
## textshaping 0.3.6 2021-10-13 [2] RSPM (R 4.3.0)
## timechange 0.2.0 2023-01-11 [1] RSPM (R 4.3.0)
## vctrs 0.6.2 2023-04-19 [2] RSPM (R 4.3.0)
## xfun 0.39 2023-04-20 [2] RSPM (R 4.3.0)
## xml2 1.3.4 2023-04-27 [2] RSPM (R 4.3.0)
## yaml 2.3.7 2023-01-23 [2] RSPM (R 4.3.0)
##
## [1] /__w/_temp/Library
## [2] /usr/local/lib/R/site-library
## [3] /usr/local/lib/R/library
##
## ──────────────────────────────────────────────────────────────────────────────
This vignette was generated using BiocStyle (Oleś, 2023) with knitr
(Xie, 2014) and rmarkdown (Allaire, Xie, Dervieux, McPherson, Luraschi,
Ushey, Atkins, Wickham, Cheng, Chang, and Iannone, 2023) running behind
the scenes.
Citations made with knitcitations.
[1] J. Allaire, Y. Xie, C. Dervieux, et al. rmarkdown: Dynamic Documents for R. R package version 2.21. 2023. URL: https://github.com/rstudio/rmarkdown.
[2] BrainSpan. “Atlas of the Developing Human Brain [Internet]. Funded by ARRA Awards 1RC2MH089921-01, 1RC2MH090047-01, and 1RC2MH089929-01.” 2011. URL: http://developinghumanbrain.org.
[3] A. Oleś. BiocStyle: Standard styles for vignettes and other Bioconductor documents. R package version 2.28.0. 2023. DOI: 10.18129/B9.bioc.BiocStyle. URL: https://bioconductor.org/packages/BiocStyle.
[4] Y. Xie. “knitr: A Comprehensive Tool for Reproducible Research in R”. In: Implementing Reproducible Computational Research. Ed. by V. Stodden, F. Leisch and R. D. Peng. ISBN 978-1466561595. Chapman and Hall/CRC, 2014.