Download data for a given SRA study id from the recount project

Download the gene- or exon-level RangedSummarizedExperiment-class objects provided by the recount project. Alternatively, download the counts, metadata, or file information for a given SRA study id. You can also download the sample bigWig files or the mean coverage bigWig file.

download_study(
  project,
  type = "rse-gene",
  outdir = project,
  download = TRUE,
  version = 2,
  ...
)

Arguments

project

A character vector with one SRA study id.

type

Specifies which files to download. The options are:

rse-gene: the gene-level RangedSummarizedExperiment-class object in a file named rse_gene.Rdata.
rse-exon: the exon-level RangedSummarizedExperiment-class object in a file named rse_exon.Rdata.
rse-jx: the exon-exon junction level RangedSummarizedExperiment-class object in a file named rse_jx.Rdata.
rse-tx: the transcript level RangedSummarizedExperiment-class object in a file named rse_tx.RData.
counts-gene: the gene-level counts in a tsv file named counts_gene.tsv.gz.
counts-exon: the exon-level counts in a tsv file named counts_exon.tsv.gz.
counts-jx: the exon-exon junction level counts in a tsv file named counts_jx.tsv.gz.
phenotype: the phenotype data for the study in a tsv file named after the study id (project.tsv, where project is the value of the project argument).
files-info: the files information for the given study (including md5sum hashes) in a tsv file named files_info.tsv.
samples: one bigWig file per sample in the study.
mean: one mean bigWig file for the samples in the study, with each sample normalized to a 40 million 100 bp library using the total coverage sum (area under the coverage curve, AUC) for the given sample.
all: all the above types. Note that this might take some time if the project has many samples. When using type = 'all', a small delay is added before each download request to avoid request issues.
rse-fc: the FANTOM-CAT/recount2 rse file described in Imada, Sanchez, et al., bioRxiv, 2019.
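
Since the files-info type includes md5sum hashes, a downloaded file can be checked against its expected hash. The following is a minimal sketch in base R, not part of recount: verify_md5() is a hypothetical helper, and because the column layout of files_info.tsv is not described here, the expected hash is passed in directly rather than read from that file.

```r
## Hypothetical helper (not part of recount): compare a file's md5sum
## against an expected hash, e.g. one copied from files_info.tsv.
verify_md5 <- function(path, expected) {
    stopifnot(file.exists(path))
    observed <- unname(tools::md5sum(path))
    identical(tolower(observed), tolower(expected))
}

## Self-contained demonstration using a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines("recount", tmp)
hash <- unname(tools::md5sum(tmp))

## A matching hash passes, a mismatched one does not
verify_md5(tmp, hash)
verify_md5(tmp, "not-a-real-hash")
```

In practice, path would point to a file inside outdir and expected would be the corresponding entry from files_info.tsv.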

outdir

The destination directory for the downloaded file(s). Alternatively, check the SciServer section in the vignette to see how to access all the recount data via an R Jupyter Notebook.

download

Whether to download the files (TRUE) or only return the download URLs (FALSE).
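
As a brief sketch of the URL-only mode, the following collects the URLs for several file types of one study. It assumes the recount package is installed and uses the study SRP009615 from the Examples section; the types vector is just an illustrative selection.

```r
library("recount")

## An illustrative selection of the documented types
types <- c("rse-gene", "rse-exon", "counts-gene", "phenotype", "files-info")

## With download = FALSE the URLs are returned instead of fetching the files
urls <- lapply(types, function(type) {
    download_study("SRP009615", type = type, download = FALSE)
})
names(urls) <- types
urls
```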

version

A single integer specifying which version of the files to download. Valid options are 1 and 2, as described in https://jhubiostatistics.shinyapps.io/recount/ under the documentation tab. Briefly, version 1 counts are based on reduced exons, while version 2 counts are based on disjoint exons. This argument mostly matters for the exon counts. Defaults to version 2 (disjoint exons). Use version = 1 for backward compatibility with exon counts prior to version 1.5.3 of the package.
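
To see how the version argument affects the exon-level files, one can compare the URLs returned for each version. This is a small sketch assuming the recount package is installed; it requests URLs only and uses the study SRP009615 from the Examples section.

```r
library("recount")

## Exon-level file based on reduced exons (version 1)
url_v1 <- download_study("SRP009615",
    type = "rse-exon", version = 1, download = FALSE
)

## Exon-level file based on disjoint exons (version 2, the default)
url_v2 <- download_study("SRP009615",
    type = "rse-exon", version = 2, download = FALSE
)

url_v1
url_v2
```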

...

Additional arguments passed to download() from the downloader package.

Value

Invisibly returns the URL(s) for the requested files, whether or not they were downloaded.

Details

Check http://stackoverflow.com/a/34383991 if you need to find the effective URLs. For example, http://duffel.rail.bio/recount/DRP000366/bw/mean_DRP000366.bw redirects to a link on SciServer.
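
One possible way to resolve an effective URL from R is sketched below. It assumes the httr package is installed and is not necessarily the approach given in the linked answer; effective_url() is a hypothetical helper, not a recount function. It relies on the url field of an httr response, which holds the address reached after any redirects.

```r
## Hypothetical helper (not part of recount): follow redirects with a
## HEAD request and return the final URL. Requires the httr package.
effective_url <- function(url) {
    response <- httr::HEAD(url)
    response$url
}

## Needs network access, so only run it in an interactive session
if (interactive()) {
    effective_url(
        "http://duffel.rail.bio/recount/DRP000366/bw/mean_DRP000366.bw"
    )
}
```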

Transcript quantifications are described in Fu et al., bioRxiv, 2018. https://www.biorxiv.org/content/10.1101/247346v2

FANTOM-CAT/recount2 quantifications are described in Imada, Sanchez, et al., bioRxiv, 2019. https://www.biorxiv.org/content/10.1101/659490v1

Author

Leonardo Collado-Torres

Examples

## Find the URL to download the RangedSummarizedExperiment for the
## Geuvadis consortium study.
url <- download_study("ERP001942", download = FALSE)

## See the actual URL
url
#> [1] "http://duffel.rail.bio/recount/v2/ERP001942/rse_gene.Rdata"
if (FALSE) { # \dontrun{
## Download the example data included in the package for study SRP009615

url2 <- download_study("SRP009615")
url2

## Load the data
load(file.path("SRP009615", "rse_gene.Rdata"))

## Compare the data
library("testthat")
expect_equivalent(rse_gene, rse_gene_SRP009615)
} # }