In preparation for a differential expression analysis, you have to choose how to scale the raw counts provided by the recount project. The raw counts are the sum of the base-level coverage, so the scaling has to take into account either the read length or simply the total coverage for the given sample (the default option). You might also want to scale further to account for the gene or exon lengths. If you prefer to calculate read counts without scaling, see the function read_counts.
scale_counts(
  rse,
  by = "auc",
  targetSize = 4e+07,
  L = 100,
  factor_only = FALSE,
  round = TRUE
)
rse: A RangedSummarizedExperiment-class object as downloaded with download_study.
by: Either auc or mapped_reads. If set to auc, it will scale the counts by the total coverage of the sample, that is, the area under the curve (AUC) of the coverage. If set to mapped_reads, it will scale the counts using the number of mapped reads, whether the library was paired-end or not, and the desired read length (L).
targetSize: The target library size in number of single end reads.
L: The target read length. Only used when by = 'mapped_reads', since it cancels out in the calculation when using by = 'auc'.
factor_only: Whether to only return the numeric scaling factor or to return a RangedSummarizedExperiment-class object with the counts scaled. If set to TRUE, you have to multiply the sample counts by this scaling factor.
round: Whether to round the counts to integers or not.
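The by = 'auc' option can be illustrated with a minimal base R sketch. It assumes, based on the note that L cancels out, that the per-sample factor reduces to targetSize divided by the sample's AUC. The auc values and variable names below are hypothetical, and this is not the package's own code.

```r
## Hypothetical per-sample AUC values (total base-level coverage per sample)
auc <- c(sampleA = 2e9, sampleB = 4e9, sampleC = 8e9)

## Target library size in single end reads (the documented default)
targetSize <- 4e7

## Assumed form of the by = "auc" factor: a target of targetSize reads of
## length L has total coverage targetSize * L, and converting coverage back
## to read counts divides by L, so L drops out of the ratio.
scale_factor <- targetSize / auc
scale_factor
```

Samples with more total coverage get a smaller factor, so after scaling every sample corresponds to roughly the same library size.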
If factor_only = TRUE, it returns a numeric vector with the scaling factor for each sample. If factor_only = FALSE, it returns a RangedSummarizedExperiment-class object with the counts already scaled.
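When factor_only = TRUE, the multiplication is left to you. A minimal base R sketch of that step follows, using a hypothetical toy matrix and hypothetical factors rather than a real RangedSummarizedExperiment-class object. It only illustrates the column-wise multiplication and rounding and is not taken from the package source.

```r
## Hypothetical raw counts (base-level coverage sums): 3 genes x 2 samples
counts <- matrix(
    c(1000, 2500, 400,
      3000, 5000, 1200),
    nrow = 3,
    dimnames = list(c("gene1", "gene2", "gene3"), c("sampleA", "sampleB"))
)

## Hypothetical per-sample factors, shaped like the factor_only = TRUE output
scale_factor <- c(sampleA = 0.02, sampleB = 0.01)

## Multiply each column (sample) by its own factor, then round to integers
## in the same spirit as round = TRUE
scaled <- round(sweep(counts, 2, scale_factor, `*`))
scaled
```

With a real object, the count matrix would come from the assay of the RangedSummarizedExperiment-class object rather than being typed in by hand.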
Rail-RNA (http://rail.bio) uses soft clipping when aligning, which is why we recommend using by = 'auc'.
If the reads are from a paired-end library, then the avg_read_length is the average fragment length. This is taken into account when using by = 'mapped_reads'.
## Load an example rse_gene object
rse_gene <- rse_gene_SRP009615
## Scale counts
rse <- scale_counts(rse_gene)
## Find the project used as an example
project_info <- abstract_search("GSE32465")
## See some summary information for this project
project_info
#> number_samples species
#> 340 12 human
#> abstract
#> 340 Summary: K562-shX cells are made in an effort to validate TFBS data and ChIP-seq antibodies in Myers lab (GSE32465). K562 cells are transduced with lentiviral vector having Tet-inducible shRNA targeting a transcription factor gene. Cells with stable integration of shRNA constructs are selected using puromycin in growth media. Doxycycline is added to the growth media to induce the expression of shRNA and a red fluorescent protein marker. A successful shRNA cell line shows at least a 70% reduction in expression of the target transcription factor as measured by qPCR. For identification, we designated these cell lines as K562-shX, where X is the transcription factor targeted by shRNA and K562 denotes the parent cell line. For example, K562-shATF3 cells are K562 derived cells selected for stable integration of shRNA targeting the transcription factor ATF3 gene and showed at least a 70% reduction in the expression of ATF3 gene when measured by qPCR. Cells growing without doxycycline (uninduced) are used as a control to measure the change in expression of target transcription factor gene after induction of shRNA using doxycycline. For detailed growth and culturing protocols for these cells please refer to http://hudsonalpha.org/myers-lab/protocols . To identify the potential downstream targets of the candidate transcription factor, analyze the mRNA expression profile of the uninduced and induced K562-shX using RNA-seq. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf Overall Design: Make K562-shX cells as described in the http://hudsonalpha.org/myers-lab/protocols . Measure the mRNA expression levels in uninduced K562-shX and induced K562-shX cells in two biological replicates using RNA-seq. Identify the potential downstream targets of the candidate transcription factor.
#> project
#> 340 SRP009615
## Use the following code to re-download this file
if (FALSE) {
    ## Download
    download_study(project_info$project)
    ## Load file
    load(file.path(project_info$project, "rse_gene.Rdata"))
    identical(rse_gene, rse_gene_SRP009615)
}