This function uses the pre-computed mean coverage for a given SRA project to identify the expressed regions (ERs) for a given chromosome. It returns a GRanges-class object with the expressed regions as defined by findRegions.

expressed_regions(
  project,
  chr,
  cutoff,
  outdir = NULL,
  maxClusterGap = 300L,
  chrlen = NULL,
  verbose = TRUE,
  ...
)

Arguments

project

A character vector with one SRA study id.

chr

A character vector with the name of the chromosome.

cutoff

The base-pair level cutoff to use.

outdir

The destination directory for the downloaded file(s) that were previously downloaded with download_study. If the files are missing, but outdir is specified, they will get downloaded first. By default outdir is set to NULL which will use the data from the web. We only recommend downloading the full data if you will use it several times.

maxClusterGap

This determines the maximum gap between candidate ERs.

chrlen

The chromosome length in base pairs. If it's NULL, the chromosome length is extracted from the Rail-RNA runs GitHub repository. Alternatively check the SciServer section on the vignette to see how to access all the recount data via a R Jupyter Notebook.

verbose

If TRUE basic status updates will be printed along the way.

...

Additional arguments passed to download_study when outdir is specified but the required files are missing.

Value

A GRanges-class object as created by findRegions.

Author

Leonardo Collado-Torres

Examples

## Define expressed regions for study SRP002001, chrY

## Workaround for https://github.com/lawremi/rtracklayer/issues/83
download_study("SRP002001", type = "mean")
#> 2024-05-21 17:45:38.249273 downloading file mean_SRP002001.bw to SRP002001/bw

regions <- expressed_regions("SRP002001", "chrY",
    cutoff = 5L,
    maxClusterGap = 3000L,
    outdir = "SRP002001"
)
#> 2024-05-21 17:45:39.305727 loadCoverage: loading BigWig file SRP002001/bw/mean_SRP002001.bw
#> 2024-05-21 17:45:39.349249 loadCoverage: applying the cutoff to the merged data
#> 2024-05-21 17:45:39.363331 filterData: originally there were 57227415 rows, now there are 57227415 rows. Meaning that 0 percent was filtered.
#> 2024-05-21 17:45:39.364316 findRegions: identifying potential segments
#> 2024-05-21 17:45:39.366273 findRegions: segmenting information
#> 2024-05-21 17:45:39.366452 .getSegmentsRle: segmenting with cutoff(s) 5
#> 2024-05-21 17:45:39.369293 findRegions: identifying candidate regions
#> 2024-05-21 17:45:40.233994 findRegions: identifying region clusters

if (FALSE) {
## Define the regions for multiple chrs
regs <- sapply(chrs, expressed_regions, project = "SRP002001", cutoff = 5L)

## You can then combine them into a single GRanges object if you want to
library("GenomicRanges")
single <- unlist(GRangesList(regs))
}