As described in the recount workflow, the counts provided by the recount2 project are base-pair counts. You can scale them using scale_counts or compute the read counts using the area under coverage information (AUC). We use the AUC because Rail-RNA soft clips some reads.
read_counts(rse, use_paired_end = TRUE, round = FALSE)
A RangedSummarizedExperiment-class object as downloaded with download_study.
A logical vector. When TRUE
it uses the
paired-end flag (colData(rse)$paired_end
) to determine whether
the sample is paired-end or not. If it's paired-end, then it divides the
counts by 2 to return paired-end read counts instead of fragment counts.
When FALSE
, this information is ignored.
Whether to round the counts to integers or not.
Returns a RangedSummarizedExperiment-class object with the read counts.
Check the recount workflow for details about the counts provided by the recount2 project. Note that this function should not be used after scale_counts or it will return non-sensical results.
Collado-Torres L, Nellore A and Jaffe AE. recount workflow: Accessing over 70,000 human RNA-seq samples with Bioconductor version 1; referees: 1 approved, 2 approved with reservations. F1000Research 2017, 6:1558 doi: 10.12688/f1000research.12223.1.
## Difference between read counts and reads downloaded by Rail-RNA
colSums(assays(read_counts(rse_gene_SRP009615))$counts) / 1e6 -
colData(rse_gene_SRP009615)$reads_downloaded / 1e6
#> SRR387777 SRR387778 SRR387779 SRR387780 SRR389077 SRR389078 SRR389079
#> -8.839994 -12.892541 -7.536284 -6.497571 -13.239233 -15.816168 -14.775568
#> SRR389080 SRR389081 SRR389082 SRR389083 SRR389084
#> -6.274001 -5.054220 -2.328694 -5.265160 -4.421728
## Paired-end read counts vs fragment counts (single-end counts)
download_study("DRP000499")
#> 2024-05-21 17:45:49.634661 downloading file rse_gene.Rdata to DRP000499
load("DRP000499/rse_gene.Rdata")
colSums(assays(read_counts(rse_gene, use_paired_end = FALSE))$counts) /
colSums(assays(read_counts(rse_gene))$counts)
#> DRR001622 DRR001625 DRR001628 DRR001631 DRR001634 DRR001637 DRR001623 DRR001626
#> 2 2 2 2 2 2 2 2
#> DRR001629 DRR001632 DRR001635 DRR001638 DRR001642 DRR001624 DRR001627 DRR001630
#> 2 2 2 2 2 2 2 2
#> DRR001633 DRR001636 DRR001639 DRR001640 DRR001641
#> 2 2 2 2 2
## Difference between paired-end read counts vs paired-end reads downloaded
colSums(assays(read_counts(rse_gene))$counts) / 1e6 -
colData(rse_gene)$reads_downloaded / 1e6 / 2
#> DRR001622 DRR001625 DRR001628 DRR001631 DRR001634 DRR001637
#> -2.0348249 -0.6138274 -53.1563756 -9.8438273 -8.4707991 -7.3001527
#> DRR001623 DRR001626 DRR001629 DRR001632 DRR001635 DRR001638
#> -1.0163589 -0.9350952 -54.8523133 -15.8955635 -8.8314352 -8.0629589
#> DRR001642 DRR001624 DRR001627 DRR001630 DRR001633 DRR001636
#> -7.8494200 -1.5618029 -1.2820004 -16.5891652 -8.5003669 -12.5271479
#> DRR001639 DRR001640 DRR001641
#> -15.3775958 -9.8064514 -7.3778134