Identify expressed regions from the mean coverage for a given SRA project

This function uses the pre-computed mean coverage for a given SRA project to identify the expressed regions (ERs) for a given chromosome. It returns a GRanges-class object with the expressed regions as defined by findRegions.

expressed_regions(
  project,
  chr,
  cutoff,
  outdir = NULL,
  maxClusterGap = 300L,
  chrlen = NULL,
  verbose = TRUE,
  ...
)

Arguments

project: A character vector with one SRA study id.
chr: A character vector with the name of the chromosome.
cutoff: The base-pair level cutoff to use.
outdir: The directory that holds the file(s) previously downloaded with download_study. If outdir is specified but the files are missing, they are downloaded first. The default, NULL, reads the data from the web. We only recommend downloading the full data if you will use it several times.
maxClusterGap: This determines the maximum gap between candidate ERs.
chrlen: The chromosome length in base pairs. If it's NULL, the chromosome length is extracted from the Rail-RNA runs GitHub repository. Alternatively, check the SciServer section in the vignette to see how to access all the recount data via an R Jupyter Notebook.
verbose: If TRUE, basic status updates are printed along the way.
...: Additional arguments passed to download_study when outdir is specified but the required files are missing.
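
To make the roles of cutoff and maxClusterGap more concrete, here is a minimal base R sketch on a toy coverage vector. It is an illustration only and not the findRegions implementation: the coverage values, the strict "greater than" comparison, the gap definition (number of bases between two regions), and the names toy_regions and max_gap are all assumptions made for this example.

```r
## Toy per-base mean coverage for 13 positions (made-up values)
coverage <- c(0, 6, 7, 8, 0, 9, 9, 1, 0, 0, 0, 10, 12)
cutoff <- 5

## Run-length encode the "above cutoff" indicator (assumes a strict comparison)
above <- rle(coverage > cutoff)
ends <- cumsum(above$lengths)
starts <- ends - above$lengths + 1

## Keep only the runs that are above the cutoff: the toy "expressed regions"
toy_regions <- data.frame(
    start = starts[above$values],
    end = ends[above$values]
)

## Group regions into clusters: a new cluster starts whenever the gap to the
## previous region exceeds max_gap (hypothetical stand-in for maxClusterGap)
max_gap <- 2
gaps <- toy_regions$start[-1] - toy_regions$end[-nrow(toy_regions)] - 1
toy_regions$cluster <- cumsum(c(TRUE, gaps > max_gap))

toy_regions
```

In this sketch a higher cutoff yields fewer and shorter regions, while a larger max_gap merges more neighboring regions into the same cluster.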

Value

A GRanges-class object as created by findRegions.

Author

Leonardo Collado-Torres

Examples

## Define expressed regions for study SRP002001, chrY

## Workaround for https://github.com/lawremi/rtracklayer/issues/83
download_study("SRP002001", type = "mean")
#> 2024-12-12 22:14:56.338814 downloading file mean_SRP002001.bw to SRP002001/bw

regions <- expressed_regions("SRP002001", "chrY",
    cutoff = 5L,
    maxClusterGap = 3000L,
    outdir = "SRP002001"
)
#> 2024-12-12 22:14:57.458327 loadCoverage: loading BigWig file SRP002001/bw/mean_SRP002001.bw
#> 2024-12-12 22:14:57.50171 loadCoverage: applying the cutoff to the merged data
#> 2024-12-12 22:14:57.514551 filterData: originally there were 57227415 rows, now there are 57227415 rows. Meaning that 0 percent was filtered.
#> 2024-12-12 22:14:57.515705 findRegions: identifying potential segments
#> 2024-12-12 22:14:57.517884 findRegions: segmenting information
#> 2024-12-12 22:14:57.518217 .getSegmentsRle: segmenting with cutoff(s) 5
#> 2024-12-12 22:14:57.521274 findRegions: identifying candidate regions
#> 2024-12-12 22:14:58.378612 findRegions: identifying region clusters

if (FALSE) { # \dontrun{
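## Define the regions for multiple chrs
## (chrs is not defined in the original example; the names below are illustrative)
chrs <- c("chrX", "chrY")
regs <- sapply(chrs, expressed_regions, project = "SRP002001", cutoff = 5L)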

## You can then combine them into a single GRanges object if you want to
library("GenomicRanges")
single <- unlist(GRangesList(regs))
} # }
