R/expressed_regions.R
expressed_regions.Rd
This function uses the pre-computed mean coverage for a given SRA project to identify the expressed regions (ERs) for a given chromosome. It returns a GRanges-class object with the expressed regions as defined by findRegions.
expressed_regions(
project,
chr,
cutoff,
outdir = NULL,
maxClusterGap = 300L,
chrlen = NULL,
verbose = TRUE,
...
)
A character vector with one SRA study id.
A character vector with the name of the chromosome.
The base-pair level cutoff to use.
The destination directory for the downloaded file(s) that were
previously downloaded with download_study. If the files are missing,
but outdir
is specified, they will get downloaded first. By default
outdir
is set to NULL
which will use the data from the web.
We only recommend downloading the full data if you will use it several times.
This determines the maximum gap between candidate ERs.
The chromosome length in base pairs. If it's NULL
, the
chromosome length is extracted from the Rail-RNA runs GitHub repository.
Alternatively check the SciServer
section on the vignette to see
how to access all the recount data via a R Jupyter Notebook.
If TRUE
basic status updates will be printed along the
way.
Additional arguments passed to download_study when
outdir
is specified but the required files are missing.
A GRanges-class object as created by findRegions.
## Define expressed regions for study SRP002001, chrY
## Workaround for https://github.com/lawremi/rtracklayer/issues/83
download_study("SRP002001", type = "mean")
#> 2024-05-21 17:45:38.249273 downloading file mean_SRP002001.bw to SRP002001/bw
regions <- expressed_regions("SRP002001", "chrY",
cutoff = 5L,
maxClusterGap = 3000L,
outdir = "SRP002001"
)
#> 2024-05-21 17:45:39.305727 loadCoverage: loading BigWig file SRP002001/bw/mean_SRP002001.bw
#> 2024-05-21 17:45:39.349249 loadCoverage: applying the cutoff to the merged data
#> 2024-05-21 17:45:39.363331 filterData: originally there were 57227415 rows, now there are 57227415 rows. Meaning that 0 percent was filtered.
#> 2024-05-21 17:45:39.364316 findRegions: identifying potential segments
#> 2024-05-21 17:45:39.366273 findRegions: segmenting information
#> 2024-05-21 17:45:39.366452 .getSegmentsRle: segmenting with cutoff(s) 5
#> 2024-05-21 17:45:39.369293 findRegions: identifying candidate regions
#> 2024-05-21 17:45:40.233994 findRegions: identifying region clusters
if (FALSE) {
## Define the regions for multiple chrs
regs <- sapply(chrs, expressed_regions, project = "SRP002001", cutoff = 5L)
## You can then combine them into a single GRanges object if you want to
library("GenomicRanges")
single <- unlist(GRangesList(regs))
}