Shannon Ellis et al. (2017) predicted phenotypes based on expression data for
the samples in the recount2 project. Using this function you can add the
predictions to the colData() slot of a RangedSummarizedExperiment-class object.
add_predictions(rse, is_tcga = FALSE, version = "latest", verbose = TRUE)
rse: A RangedSummarizedExperiment-class object as downloaded with download_study(). If this argument is not specified, the function will return the full predictions table.
is_tcga: Set to TRUE only when rse is from TCGA. Otherwise set to FALSE (default).
version: The version number for the predicted phenotypes data. It has to match one of the available numbers at https://github.com/leekgroup/recount-website/blob/master/predictions/. Feel free to check if there is a newer version than the default. The version used is printed as part of the file name.
verbose: If TRUE, a message is printed stating where the predictions file is being downloaded to.
A RangedSummarizedExperiment-class object with the prediction columns appended to the colData() slot.
The predicted phenotypes are:
sex: male or female,
samplesource: cell_line or tissue,
tissue: tissue predicted based on 30 tissues in GTEx,
sequencingstrategy: single or paired end sequencing.
For each of the predicted phenotypes there are several columns, named with a prefix followed by the phenotype (for example reported_sex, predicted_sex and accuracy_sex), as described next:
reported_<phenotype>: NA when not available,
predicted_<phenotype>: NA when we did not predict, "Unassigned" when prediction was ambiguous,
accuracy_<phenotype>: accuracy is assigned per dataset based on comparison to samples for which we had reported phenotype information, so there are three distinct values per predictor (GTEx, TCGA, SRA) across all studies.
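Because a prediction can be NA (not predicted) or "Unassigned" (ambiguous), it may help to filter on both before using a predicted column. The following is a minimal base R sketch on a toy data frame: the object pred and all of its values are invented for illustration and are not real recount2 predictions. In the real colData() these columns are factors, for which the same comparisons should also work.

```r
# Toy stand-in for the sex prediction columns of colData();
# the values are made up for illustration only.
pred <- data.frame(
    run = c("run1", "run2", "run3", "run4"),
    reported_sex = c(NA, "female", NA, "male"),
    predicted_sex = c("female", "female", "Unassigned", NA),
    accuracy_sex = c(0.86, 0.86, 0.86, 0.86),
    stringsAsFactors = FALSE
)

# A prediction is usable when it is neither NA (not predicted)
# nor "Unassigned" (ambiguous).
usable <- !is.na(pred$predicted_sex) & pred$predicted_sex != "Unassigned"

# Among usable predictions, compare against the reported value
# for the samples where a reported value exists.
has_reported <- usable & !is.na(pred$reported_sex)
agree <- pred$predicted_sex[has_reported] == pred$reported_sex[has_reported]

# Number of samples with a usable prediction
sum(usable)

# Tabulate all predicted values, keeping NA as its own category
table(pred$predicted_sex, useNA = "ifany")
```

Keeping NA and "Unassigned" separate in the tabulation preserves the distinction between samples that were never predicted and samples whose prediction was ambiguous.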
If you use these predicted phenotypes please cite the Ellis et al bioRxiv pre-print available at https://www.biorxiv.org/content/early/2017/06/03/145656. See citation details with citation('recount').
Ellis et al, bioRxiv, 2017. https://www.biorxiv.org/content/early/2017/06/03/145656
## Add the predictions to an example rse_gene object
rse_gene <- add_predictions(rse_gene_SRP009615)
#> 2024-05-21 17:45:29.848915 downloading the predictions to /tmp/RtmpJLggZ6/PredictedPhenotypes_v0.0.06.rda
#> Loading objects:
#> PredictedPhenotypes
## Explore the predictions
colData(rse_gene)
#> DataFrame with 12 rows and 33 columns
#> project sample experiment run
#> <character> <character> <character> <character>
#> SRR387777 SRP009615 SRS281685 SRX110461 SRR387777
#> SRR387778 SRP009615 SRS281686 SRX110462 SRR387778
#> SRR387779 SRP009615 SRS281687 SRX110463 SRR387779
#> SRR387780 SRP009615 SRS281688 SRX110464 SRR387780
#> SRR389077 SRP009615 SRS282369 SRX111299 SRR389077
#> ... ... ... ... ...
#> SRR389080 SRP009615 SRS282372 SRX111302 SRR389080
#> SRR389081 SRP009615 SRS282373 SRX111303 SRR389081
#> SRR389082 SRP009615 SRS282374 SRX111304 SRR389082
#> SRR389083 SRP009615 SRS282375 SRX111305 SRR389083
#> SRR389084 SRP009615 SRS282376 SRX111306 SRR389084
#> read_count_as_reported_by_sra reads_downloaded
#> <integer> <integer>
#> SRR387777 30631853 30631853
#> SRR387778 37001306 37001306
#> SRR387779 40552001 40552001
#> SRR387780 32466352 32466352
#> SRR389077 27819603 27819603
#> ... ... ...
#> SRR389080 34856203 34856203
#> SRR389081 23351679 23351679
#> SRR389082 18144828 18144828
#> SRR389083 24417368 24417368
#> SRR389084 23060084 23060084
#> proportion_of_reads_reported_by_sra_downloaded paired_end
#> <numeric> <logical>
#> SRR387777 1 FALSE
#> SRR387778 1 FALSE
#> SRR387779 1 FALSE
#> SRR387780 1 FALSE
#> SRR389077 1 FALSE
#> ... ... ...
#> SRR389080 1 FALSE
#> SRR389081 1 FALSE
#> SRR389082 1 FALSE
#> SRR389083 1 FALSE
#> SRR389084 1 FALSE
#> sra_misreported_paired_end mapped_read_count auc
#> <logical> <integer> <numeric>
#> SRR387777 FALSE 28798572 1029494445
#> SRR387778 FALSE 33170281 1184877985
#> SRR387779 FALSE 37322762 1336528969
#> SRR387780 FALSE 29970735 1073178116
#> SRR389077 FALSE 24966859 893978355
#> ... ... ... ...
#> SRR389080 FALSE 32469994 1163527939
#> SRR389081 FALSE 21904197 781685955
#> SRR389082 FALSE 17199795 616048853
#> SRR389083 FALSE 22499386 806323346
#> SRR389084 FALSE 21957003 787795710
#> sharq_beta_tissue sharq_beta_cell_type biosample_submission_date
#> <character> <character> <character>
#> SRR387777 blood k562 2011-12-05T15:40:03...
#> SRR387778 blood k562 2011-12-05T15:40:03...
#> SRR387779 blood k562 2011-12-05T15:40:03...
#> SRR387780 blood k562 2011-12-05T15:40:03...
#> SRR389077 blood k562 2011-12-13T11:26:05...
#> ... ... ... ...
#> SRR389080 blood k562 2011-12-13T11:26:05...
#> SRR389081 blood k562 2011-12-13T11:26:05...
#> SRR389082 blood k562 2011-12-13T11:26:05...
#> SRR389083 blood k562 2011-12-13T11:26:05...
#> SRR389084 blood k562 2011-12-13T11:26:05...
#> biosample_publication_date biosample_update_date avg_read_length
#> <character> <character> <integer>
#> SRR387777 2011-12-07T09:29:59... 2014-08-27T04:18:20... 36
#> SRR387778 2011-12-07T09:29:59... 2014-08-27T04:18:21... 36
#> SRR387779 2011-12-07T09:29:59... 2014-08-27T04:18:21... 36
#> SRR387780 2011-12-07T09:29:59... 2014-08-27T04:18:22... 36
#> SRR389077 2011-12-13T11:26:06... 2014-08-27T04:22:14... 36
#> ... ... ... ...
#> SRR389080 2011-12-13T11:26:06... 2014-08-27T04:22:15... 36
#> SRR389081 2011-12-13T11:26:06... 2014-08-27T04:22:16... 36
#> SRR389082 2011-12-13T11:26:06... 2014-08-27T04:22:16... 36
#> SRR389083 2011-12-13T11:26:06... 2014-08-27T04:22:17... 36
#> SRR389084 2011-12-13T11:26:06... 2014-08-27T04:22:17... 36
#> geo_accession bigwig_file title
#> <character> <character> <character>
#> SRR387777 GSM836270 SRR387777.bw K562 cells with shRN..
#> SRR387778 GSM836271 SRR387778.bw K562 cells with shRN..
#> SRR387779 GSM836272 SRR387779.bw K562 cells with shRN..
#> SRR387780 GSM836273 SRR387780.bw K562 cells with shRN..
#> SRR389077 GSM847561 SRR389077.bw K562 cells with shRN..
#> ... ... ... ...
#> SRR389080 GSM847564 SRR389080.bw K562 cells with shRN..
#> SRR389081 GSM847565 SRR389081.bw K562 cells with shRN..
#> SRR389082 GSM847566 SRR389082.bw K562 cells with shRN..
#> SRR389083 GSM847567 SRR389083.bw K562 cells with shRN..
#> SRR389084 GSM847568 SRR389084.bw K562 cells with shRN..
#> characteristics
#> <CharacterList>
#> SRR387777 cells: K562,shRNA expression: no,treatment: Puromycin
#> SRR387778 cells: K562,shRNA expression: ye..,treatment: Puromycin..
#> SRR387779 cells: K562,shRNA expression: no,treatment: Puromycin
#> SRR387780 cells: K562,shRNA expression: ye..,treatment: Puromycin..
#> SRR389077 cell line: K562,shRNA expression: no..,treatment: Puromycin
#> ... ...
#> SRR389080 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#> SRR389081 cell line: K562,shRNA expression: no..,treatment: Puromycin
#> SRR389082 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#> SRR389083 cell line: K562,shRNA expression: no..,treatment: Puromycin
#> SRR389084 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#> reported_sex predicted_sex accuracy_sex reported_samplesource
#> <factor> <factor> <numeric> <factor>
#> SRR387777 NA female 0.862637 NA
#> SRR387778 NA female 0.862637 NA
#> SRR387779 NA female 0.862637 NA
#> SRR387780 NA female 0.862637 NA
#> SRR389077 NA female 0.862637 cell_line
#> ... ... ... ... ...
#> SRR389080 NA female 0.862637 cell_line
#> SRR389081 NA female 0.862637 cell_line
#> SRR389082 NA female 0.862637 cell_line
#> SRR389083 NA female 0.862637 cell_line
#> SRR389084 NA female 0.862637 cell_line
#> predicted_samplesource accuracy_samplesource reported_tissue
#> <factor> <numeric> <factor>
#> SRR387777 tissue NA Blood
#> SRR387778 tissue NA Blood
#> SRR387779 tissue NA Blood
#> SRR387780 cell_line 0.89235 Blood
#> SRR389077 cell_line 0.89235 Blood
#> ... ... ... ...
#> SRR389080 cell_line 0.89235 Blood
#> SRR389081 cell_line 0.89235 Blood
#> SRR389082 cell_line 0.89235 Blood
#> SRR389083 cell_line 0.89235 Blood
#> SRR389084 cell_line 0.89235 Blood
#> predicted_tissue accuracy_tissue reported_sequencingstrategy
#> <factor> <numeric> <factor>
#> SRR387777 Uterus 0.518825 SINGLE
#> SRR387778 Blood 0.518825 SINGLE
#> SRR387779 Salivary Gland 0.518825 SINGLE
#> SRR387780 Uterus 0.518825 SINGLE
#> SRR389077 Uterus 0.518825 SINGLE
#> ... ... ... ...
#> SRR389080 Salivary Gland 0.518825 SINGLE
#> SRR389081 Blood 0.518825 SINGLE
#> SRR389082 Blood 0.518825 SINGLE
#> SRR389083 Blood 0.518825 SINGLE
#> SRR389084 Blood 0.518825 SINGLE
#> predicted_sequencingstrategy accuracy_sequencingstrategy
#> <factor> <numeric>
#> SRR387777 SINGLE 0.908575
#> SRR387778 SINGLE 0.908575
#> SRR387779 SINGLE 0.908575
#> SRR387780 SINGLE 0.908575
#> SRR389077 SINGLE 0.908575
#> ... ... ...
#> SRR389080 SINGLE 0.908575
#> SRR389081 SINGLE 0.908575
#> SRR389082 SINGLE 0.908575
#> SRR389083 SINGLE 0.908575
#> SRR389084 SINGLE 0.908575
## Download all the latest predictions
PredictedPhenotypes <- add_predictions()
#> 2024-05-21 17:45:30.19145 downloading the predictions to /tmp/RtmpJLggZ6/PredictedPhenotypes_v0.0.06.rda
#> Loading objects:
#> PredictedPhenotypes
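In the colData() output printed above, the appended prediction columns all start with reported_, predicted_ or accuracy_, so they can be picked out by name. The following is a minimal base R sketch: prediction_columns() is a hypothetical helper (not part of recount), and col_names is a hand-copied subset of the column names printed above.

```r
# Subset of the column names printed by colData(rse_gene) above.
col_names <- c(
    "project", "sample", "experiment", "run",
    "read_count_as_reported_by_sra", "paired_end",
    "reported_sex", "predicted_sex", "accuracy_sex",
    "reported_samplesource", "predicted_samplesource",
    "accuracy_samplesource",
    "reported_tissue", "predicted_tissue", "accuracy_tissue",
    "reported_sequencingstrategy", "predicted_sequencingstrategy",
    "accuracy_sequencingstrategy"
)

# Hypothetical helper: keep only names that start with one of the
# three prediction prefixes.
prediction_columns <- function(x) {
    grep("^(reported|predicted|accuracy)_", x, value = TRUE)
}

pred_cols <- prediction_columns(col_names)
length(pred_cols)

# With the rse_gene object from the example, the same helper could be
# used to subset the sample metadata:
# colData(rse_gene)[, prediction_columns(colnames(colData(rse_gene)))]
```

Anchoring the pattern at the start of the name avoids matching unrelated metadata columns such as read_count_as_reported_by_sra.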