Compute a TPM matrix based on a RangedSummarizedExperiment object

For some analyses you might want to transform the counts into TPMs, which you can do with this function. It first computes gene-level RPKMs and then derives TPM values from them (see Details).

getTPM(rse, length_var = "bp_length", mapped_var = NULL)

Arguments

rse: A RangedSummarizedExperiment-class object as downloaded with download_study.
length_var: A length-1 character vector naming the column of rowData(rse) that holds the coding length. For gene-level objects from recount this is bp_length. If NULL, width(rowRanges(rse)) is used instead, which is appropriate for exon RSEs.
mapped_var: A length-1 character vector naming the column of colData(rse) that holds the number of mapped reads. For recount RSE objects this is mapped_read_count. If NULL (default), the column sums of the counts matrix are used. The intermediate RPKM values differ between the two choices because not all mapped reads map to exonic segments of the genome; however, since the TPM step divides each sample by its own RPKM total, this per-sample factor cancels and the final TPMs are the same either way.
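As a hypothetical illustration of how length_var and mapped_var enter the RPKM step that getTPM builds on, the base R sketch below applies the standard RPKM formula, counts / (length / 10^3) / (library size / 10^6), to a made-up matrix. The helper toy_rpkm, the lengths and the mapped-read totals are all invented for illustration; this is not recount's internal code.

```r
## Hypothetical toy data (3 genes x 2 samples); not taken from recount.
counts <- matrix(c(100, 200, 100,
                   30, 120, 50),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
## Stand-in for rowData(rse)$bp_length (exonic length in bp).
gene_length <- c(1000, 2000, 4000)

## Sketch of the RPKM step: counts / (length in kb) / (library size in millions).
## 'mapped' plays the role of mapped_var; by default it uses the column sums,
## mirroring the mapped_var = NULL behaviour.
toy_rpkm <- function(counts, len, mapped = colSums(counts)) {
    per_kb <- sweep(counts, 1, len / 1e3, "/")  # divide row i by its length in kb
    sweep(per_kb, 2, mapped / 1e6, "/")         # divide column j by its library size in millions
}

rpkm_colsums <- toy_rpkm(counts, gene_length)
## Hypothetical mapped-read totals larger than the column sums, as happens
## when some mapped reads fall outside exonic segments.
rpkm_mapped <- toy_rpkm(counts, gene_length, mapped = c(1000, 400))
rpkm_colsums
rpkm_mapped
```

Because each made-up mapped total exceeds the corresponding column sum, every value in rpkm_mapped is smaller than its counterpart in rpkm_colsums.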

Value

A matrix with the TPM values.

Details

For gene RSE objects, you will want length_var to point to the exonic-length column (bp_length for recount gene RSEs). Otherwise, with length_var = NULL, you will be adjusting for the total gene length instead of the total exonic sequence length of the gene.
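To see why the length column matters, the hypothetical sketch below computes single-sample TPMs with two made-up length vectors: one standing in for total gene length (exons plus introns) and one for total exonic length. All counts and lengths are invented, and tpm_one_sample is an illustrative helper, not part of recount.

```r
## Hypothetical counts for 3 genes in 1 sample.
counts <- c(gene1 = 100, gene2 = 200, gene3 = 100)

## Made-up lengths: gene2 has a long intron, so its total span is much
## larger than its exonic length.
exonic_length <- c(gene1 = 1000, gene2 = 2000, gene3 = 4000)
total_length  <- c(gene1 = 1200, gene2 = 20000, gene3 = 4500)

## TPM for a single sample: length-normalized counts rescaled to sum to 1e6.
## The library-size factor of RPKM is omitted because it cancels in TPM.
tpm_one_sample <- function(counts, len) {
    rate <- counts / len
    rate / sum(rate) * 1e6
}

tpm_exonic <- tpm_one_sample(counts, exonic_length)
tpm_total  <- tpm_one_sample(counts, total_length)
round(cbind(exonic = tpm_exonic, total = tpm_total))
```

Dividing by the inflated total length pushes gene2's TPM down and, since each sample must still sum to 10^6, pushes the other genes up.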

As noted in https://support.bioconductor.org/p/124265/, Sonali Arora et al. computed TPMs in https://www.biorxiv.org/content/10.1101/445601v2 using the formula: TPM = FPKM / (sum of FPKM over all genes/transcripts) * 10^6
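The formula above is applied separately to each sample (column). A minimal base R sketch on a made-up FPKM matrix; the helper name fpkm_to_tpm and the numbers are hypothetical:

```r
## Hypothetical RPKM/FPKM matrix (3 genes x 2 samples), made up for illustration.
rpkm <- matrix(c(250000, 250000, 62500,
                 100000, 300000, 50000),
               nrow = 3,
               dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

## TPM = FPKM / (sum of FPKM over all genes in the sample) * 10^6,
## computed independently for each column.
fpkm_to_tpm <- function(fpkm) {
    sweep(fpkm, 2, colSums(fpkm), "/") * 1e6
}

tpm <- fpkm_to_tpm(rpkm)
colSums(tpm)  # each sample sums to 1e6 (up to floating-point rounding)
```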

Arora et al. note in their code that the formula comes from https://doi.org/10.1093/bioinformatics/btp692, specifically Section 1.1.1, "Comparison to RPKM estimation", where the authors state an important assumption: "Under the assumption of uniformly distributed reads, we note that RPKM measures are estimates of ..."
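Writing the RPKM of gene i in sample j as c_ij / (L_i / 10^3) / (M_j / 10^6), where c_ij is the count, L_i the length and M_j the library size, and substituting it into the TPM formula above, the factors 10^3, 10^6 and M_j cancel: TPM_ij = (c_ij / L_i) / sum_k (c_kj / L_k) * 10^6. In other words, TPMs are per-base read rates rescaled to sum to 10^6 in each sample, so the choice of library size should not change them. The base R sketch below checks this numerically on made-up data; the symbols, helper functions and numbers are illustrative and not part of recount.

```r
## Hypothetical toy data (3 genes x 2 samples), invented for illustration.
counts <- matrix(c(100, 200, 100,
                   30, 120, 50),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
len <- c(1000, 2000, 4000)

## RPKM with an explicit library size per sample (the role of mapped_var).
rpkm <- function(counts, len, mapped) {
    sweep(sweep(counts, 1, len / 1e3, "/"), 2, mapped / 1e6, "/")
}

## TPM = FPKM / (column sum of FPKM) * 10^6.
to_tpm <- function(x) sweep(x, 2, colSums(x), "/") * 1e6

tpm_colsums <- to_tpm(rpkm(counts, len, mapped = colSums(counts)))
tpm_mapped  <- to_tpm(rpkm(counts, len, mapped = c(1000, 400)))  # made-up totals
tpm_direct  <- to_tpm(counts / len)  # per-base rates; library size never used

all.equal(tpm_colsums, tpm_mapped)  # TRUE
all.equal(tpm_colsums, tpm_direct)  # TRUE
```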

There's also a blog post by Harold Pimentel explaining the relationship between FPKM and TPM: https://haroldpimentel.wordpress.com/2014/05/08/what-the-fpkm-a-review-rna-seq-expression-units/.

References

https://www.biorxiv.org/content/10.1101/445601v2
https://arxiv.org/abs/1104.3889

Author

Sonali Arora, Leonardo Collado-Torres

Examples


## Compute the TPM matrix from the raw gene-level base-pair counts.
tpm <- getTPM(rse_gene_SRP009615)