Shannon Ellis et al. (2017) predicted phenotypes from the expression data of the samples in the recount2 project. This function adds those predictions to the colData() slot of a RangedSummarizedExperiment-class object.

add_predictions(rse, is_tcga = FALSE, version = "latest", verbose = TRUE)

Arguments

rse

A RangedSummarizedExperiment-class object as downloaded with download_study. If this argument is not specified, the function returns the full predictions table.

is_tcga

Set to TRUE only when rse is from TCGA. Otherwise set to FALSE (default).

version

The version number of the predicted phenotypes data. It must match one of the versions available at https://github.com/leekgroup/recount-website/blob/master/predictions/. You can check there whether a newer version than the default is available. The version used is printed as part of the file name.

verbose

If TRUE, prints a message stating where the predictions file is being downloaded to.
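Because the version in use is part of the downloaded file name, it can be recovered from the path shown in the verbose message. The following is a minimal base R sketch, assuming the file name follows the PredictedPhenotypes_v<version>.rda pattern seen in the example output on this page. The helper name extract_predictions_version is hypothetical and is not part of recount.

```r
# Hypothetical helper (not part of recount): pull the version string out of a
# predictions file path, assuming the "PredictedPhenotypes_v<version>.rda"
# naming pattern seen in the example output.
extract_predictions_version <- function(path) {
    file_name <- basename(path)
    pattern <- "^PredictedPhenotypes_v(.+)\\.rda$"
    # Return NA when the file name does not follow the assumed pattern
    if (!grepl(pattern, file_name)) {
        return(NA_character_)
    }
    sub(pattern, "\\1", file_name)
}

# Path copied from the verbose message in the example output
extract_predictions_version("/tmp/RtmpJLggZ6/PredictedPhenotypes_v0.0.06.rda")
#> [1] "0.0.06"
```

Recording this string alongside an analysis makes it easier to request the same predictions later through the version argument.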

Value

A RangedSummarizedExperiment-class object with the prediction columns appended to the colData() slot. The predicted phenotypes are:

sex

male or female,

samplesource

cell_line or tissue,

tissue

tissue predicted based on 30 tissues in GTEx,

sequencingstrategy

single-end or paired-end sequencing.

For each of the predicted phenotypes there are several columns as described next:

reported_phenotype

NA when not available,

predicted_phenotype

NA when no prediction was made, "Unassigned" when the prediction was ambiguous,

accuracy_phenotype

accuracy is assigned per dataset (GTEx, TCGA, SRA), based on comparison with the samples for which reported phenotype information was available. There are therefore three distinct values per predictor across all studies.
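The three columns per phenotype can be combined to check how often the reported and predicted values agree. The following is a minimal base R sketch on a toy data frame. The values are made up for illustration and are not recount2 predictions, and treating "Unassigned" as a missing prediction is a choice made for this sketch, not something the function does.

```r
# Toy data frame mimicking the three columns produced for one phenotype.
# All values are invented for illustration; they are not recount2 output.
toy <- data.frame(
    reported_sex = factor(c("female", NA, "male", "male", "female")),
    predicted_sex = factor(c("female", "male", "Unassigned", NA, "male")),
    accuracy_sex = rep(0.9, 5)
)

# Work with character vectors so the factor levels do not get in the way
reported <- as.character(toy$reported_sex)
predicted <- as.character(toy$predicted_sex)

# Treat ambiguous predictions like missing ones (an assumption of this sketch)
predicted[predicted %in% "Unassigned"] <- NA

# Keep only samples that have both a reported and a usable predicted value
comparable <- !is.na(reported) & !is.na(predicted)

# Fraction of comparable samples where the prediction matches the report
agreement <- mean(reported[comparable] == predicted[comparable])
agreement
```

With real data the same steps would apply to the columns in colData(), for example reported_sex and predicted_sex.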

Details

If you use these predicted phenotypes, please cite the Ellis et al. bioRxiv preprint available at https://www.biorxiv.org/content/early/2017/06/03/145656. See the citation details with citation('recount').

References

Ellis et al., bioRxiv, 2017. https://www.biorxiv.org/content/early/2017/06/03/145656

Author

Leonardo Collado-Torres

Examples


## Add the predictions to an example rse_gene object
rse_gene <- add_predictions(rse_gene_SRP009615)
#> 2024-05-21 17:45:29.848915 downloading the predictions to /tmp/RtmpJLggZ6/PredictedPhenotypes_v0.0.06.rda
#> Loading objects:
#>   PredictedPhenotypes

## Explore the predictions
colData(rse_gene)
#> DataFrame with 12 rows and 33 columns
#>               project      sample  experiment         run
#>           <character> <character> <character> <character>
#> SRR387777   SRP009615   SRS281685   SRX110461   SRR387777
#> SRR387778   SRP009615   SRS281686   SRX110462   SRR387778
#> SRR387779   SRP009615   SRS281687   SRX110463   SRR387779
#> SRR387780   SRP009615   SRS281688   SRX110464   SRR387780
#> SRR389077   SRP009615   SRS282369   SRX111299   SRR389077
#> ...               ...         ...         ...         ...
#> SRR389080   SRP009615   SRS282372   SRX111302   SRR389080
#> SRR389081   SRP009615   SRS282373   SRX111303   SRR389081
#> SRR389082   SRP009615   SRS282374   SRX111304   SRR389082
#> SRR389083   SRP009615   SRS282375   SRX111305   SRR389083
#> SRR389084   SRP009615   SRS282376   SRX111306   SRR389084
#>           read_count_as_reported_by_sra reads_downloaded
#>                               <integer>        <integer>
#> SRR387777                      30631853         30631853
#> SRR387778                      37001306         37001306
#> SRR387779                      40552001         40552001
#> SRR387780                      32466352         32466352
#> SRR389077                      27819603         27819603
#> ...                                 ...              ...
#> SRR389080                      34856203         34856203
#> SRR389081                      23351679         23351679
#> SRR389082                      18144828         18144828
#> SRR389083                      24417368         24417368
#> SRR389084                      23060084         23060084
#>           proportion_of_reads_reported_by_sra_downloaded paired_end
#>                                                <numeric>  <logical>
#> SRR387777                                              1      FALSE
#> SRR387778                                              1      FALSE
#> SRR387779                                              1      FALSE
#> SRR387780                                              1      FALSE
#> SRR389077                                              1      FALSE
#> ...                                                  ...        ...
#> SRR389080                                              1      FALSE
#> SRR389081                                              1      FALSE
#> SRR389082                                              1      FALSE
#> SRR389083                                              1      FALSE
#> SRR389084                                              1      FALSE
#>           sra_misreported_paired_end mapped_read_count        auc
#>                            <logical>         <integer>  <numeric>
#> SRR387777                      FALSE          28798572 1029494445
#> SRR387778                      FALSE          33170281 1184877985
#> SRR387779                      FALSE          37322762 1336528969
#> SRR387780                      FALSE          29970735 1073178116
#> SRR389077                      FALSE          24966859  893978355
#> ...                              ...               ...        ...
#> SRR389080                      FALSE          32469994 1163527939
#> SRR389081                      FALSE          21904197  781685955
#> SRR389082                      FALSE          17199795  616048853
#> SRR389083                      FALSE          22499386  806323346
#> SRR389084                      FALSE          21957003  787795710
#>           sharq_beta_tissue sharq_beta_cell_type biosample_submission_date
#>                 <character>          <character>               <character>
#> SRR387777             blood                 k562    2011-12-05T15:40:03...
#> SRR387778             blood                 k562    2011-12-05T15:40:03...
#> SRR387779             blood                 k562    2011-12-05T15:40:03...
#> SRR387780             blood                 k562    2011-12-05T15:40:03...
#> SRR389077             blood                 k562    2011-12-13T11:26:05...
#> ...                     ...                  ...                       ...
#> SRR389080             blood                 k562    2011-12-13T11:26:05...
#> SRR389081             blood                 k562    2011-12-13T11:26:05...
#> SRR389082             blood                 k562    2011-12-13T11:26:05...
#> SRR389083             blood                 k562    2011-12-13T11:26:05...
#> SRR389084             blood                 k562    2011-12-13T11:26:05...
#>           biosample_publication_date  biosample_update_date avg_read_length
#>                          <character>            <character>       <integer>
#> SRR387777     2011-12-07T09:29:59... 2014-08-27T04:18:20...              36
#> SRR387778     2011-12-07T09:29:59... 2014-08-27T04:18:21...              36
#> SRR387779     2011-12-07T09:29:59... 2014-08-27T04:18:21...              36
#> SRR387780     2011-12-07T09:29:59... 2014-08-27T04:18:22...              36
#> SRR389077     2011-12-13T11:26:06... 2014-08-27T04:22:14...              36
#> ...                              ...                    ...             ...
#> SRR389080     2011-12-13T11:26:06... 2014-08-27T04:22:15...              36
#> SRR389081     2011-12-13T11:26:06... 2014-08-27T04:22:16...              36
#> SRR389082     2011-12-13T11:26:06... 2014-08-27T04:22:16...              36
#> SRR389083     2011-12-13T11:26:06... 2014-08-27T04:22:17...              36
#> SRR389084     2011-12-13T11:26:06... 2014-08-27T04:22:17...              36
#>           geo_accession  bigwig_file                  title
#>             <character>  <character>            <character>
#> SRR387777     GSM836270 SRR387777.bw K562 cells with shRN..
#> SRR387778     GSM836271 SRR387778.bw K562 cells with shRN..
#> SRR387779     GSM836272 SRR387779.bw K562 cells with shRN..
#> SRR387780     GSM836273 SRR387780.bw K562 cells with shRN..
#> SRR389077     GSM847561 SRR389077.bw K562 cells with shRN..
#> ...                 ...          ...                    ...
#> SRR389080     GSM847564 SRR389080.bw K562 cells with shRN..
#> SRR389081     GSM847565 SRR389081.bw K562 cells with shRN..
#> SRR389082     GSM847566 SRR389082.bw K562 cells with shRN..
#> SRR389083     GSM847567 SRR389083.bw K562 cells with shRN..
#> SRR389084     GSM847568 SRR389084.bw K562 cells with shRN..
#>                                                         characteristics
#>                                                         <CharacterList>
#> SRR387777         cells: K562,shRNA expression: no,treatment: Puromycin
#> SRR387778     cells: K562,shRNA expression: ye..,treatment: Puromycin..
#> SRR387779         cells: K562,shRNA expression: no,treatment: Puromycin
#> SRR387780     cells: K562,shRNA expression: ye..,treatment: Puromycin..
#> SRR389077   cell line: K562,shRNA expression: no..,treatment: Puromycin
#> ...                                                                 ...
#> SRR389080 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#> SRR389081   cell line: K562,shRNA expression: no..,treatment: Puromycin
#> SRR389082 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#> SRR389083   cell line: K562,shRNA expression: no..,treatment: Puromycin
#> SRR389084 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#>           reported_sex predicted_sex accuracy_sex reported_samplesource
#>               <factor>      <factor>    <numeric>              <factor>
#> SRR387777           NA        female     0.862637             NA       
#> SRR387778           NA        female     0.862637             NA       
#> SRR387779           NA        female     0.862637             NA       
#> SRR387780           NA        female     0.862637             NA       
#> SRR389077           NA        female     0.862637             cell_line
#> ...                ...           ...          ...                   ...
#> SRR389080           NA        female     0.862637             cell_line
#> SRR389081           NA        female     0.862637             cell_line
#> SRR389082           NA        female     0.862637             cell_line
#> SRR389083           NA        female     0.862637             cell_line
#> SRR389084           NA        female     0.862637             cell_line
#>           predicted_samplesource accuracy_samplesource reported_tissue
#>                         <factor>             <numeric>        <factor>
#> SRR387777              tissue                       NA           Blood
#> SRR387778              tissue                       NA           Blood
#> SRR387779              tissue                       NA           Blood
#> SRR387780              cell_line               0.89235           Blood
#> SRR389077              cell_line               0.89235           Blood
#> ...                          ...                   ...             ...
#> SRR389080              cell_line               0.89235           Blood
#> SRR389081              cell_line               0.89235           Blood
#> SRR389082              cell_line               0.89235           Blood
#> SRR389083              cell_line               0.89235           Blood
#> SRR389084              cell_line               0.89235           Blood
#>           predicted_tissue accuracy_tissue reported_sequencingstrategy
#>                   <factor>       <numeric>                    <factor>
#> SRR387777   Uterus                0.518825                      SINGLE
#> SRR387778   Blood                 0.518825                      SINGLE
#> SRR387779   Salivary Gland        0.518825                      SINGLE
#> SRR387780   Uterus                0.518825                      SINGLE
#> SRR389077   Uterus                0.518825                      SINGLE
#> ...                    ...             ...                         ...
#> SRR389080   Salivary Gland        0.518825                      SINGLE
#> SRR389081   Blood                 0.518825                      SINGLE
#> SRR389082   Blood                 0.518825                      SINGLE
#> SRR389083   Blood                 0.518825                      SINGLE
#> SRR389084   Blood                 0.518825                      SINGLE
#>           predicted_sequencingstrategy accuracy_sequencingstrategy
#>                               <factor>                   <numeric>
#> SRR387777                       SINGLE                    0.908575
#> SRR387778                       SINGLE                    0.908575
#> SRR387779                       SINGLE                    0.908575
#> SRR387780                       SINGLE                    0.908575
#> SRR389077                       SINGLE                    0.908575
#> ...                                ...                         ...
#> SRR389080                       SINGLE                    0.908575
#> SRR389081                       SINGLE                    0.908575
#> SRR389082                       SINGLE                    0.908575
#> SRR389083                       SINGLE                    0.908575
#> SRR389084                       SINGLE                    0.908575

## Download all the latest predictions
PredictedPhenotypes <- add_predictions()
#> 2024-05-21 17:45:30.19145 downloading the predictions to /tmp/RtmpJLggZ6/PredictedPhenotypes_v0.0.06.rda
#> Loading objects:
#>   PredictedPhenotypes