This function appends sample metadata to a RangedSummarizedExperiment-class object from the recount2 project. The sample metadata comes from curation efforts that are independent of the original recount2 project. Currently, the only source is the recount_brain project, described in more detail at http://lieberinstitute.github.io/recount-brain/.

add_metadata(rse, source = "recount_brain_v2", is_tcga = FALSE, verbose = TRUE)

Arguments

rse

A RangedSummarizedExperiment-class object as downloaded with download_study. If this argument is not specified, the function will return the raw metadata table.

source

A valid source name. The only supported options at this moment are recount_brain_v1 and recount_brain_v2.

is_tcga

Set to TRUE only when rse is from TCGA. Otherwise set to FALSE (default).

verbose

If TRUE, it will print a message reporting where the metadata file is being downloaded to.

Value

A RangedSummarizedExperiment-class object with the sample metadata columns appended to the colData() slot.
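
The behaviour described above (return the raw table when rse is missing, otherwise append columns and leave NA for samples that are not in the curated table) can be illustrated with a minimal base R sketch. It is not the package's implementation: append_metadata_sketch(), the run_id column, and both toy tables are hypothetical, and it assumes samples are matched on their run accession.

```r
## Toy stand-in for colData(rse): one row per sample, keyed by run accession.
samples <- data.frame(
    run = c("SRR0000001", "SRR0000002", "SRR0000003"),
    project = "SRP0000000",
    stringsAsFactors = FALSE
)

## Toy stand-in for a curated metadata table; all values are invented.
curated <- data.frame(
    run_id = c("SRR0000002", "SRR0000009"),
    tissue = c("cortex", "cerebellum"),
    sex = c("female", "male"),
    stringsAsFactors = FALSE
)

## Hypothetical helper, not part of recount.
append_metadata_sketch <- function(samples, curated, id_col = "run_id") {
    ## With no sample table supplied, hand back the curated table unchanged
    if (missing(samples)) {
        return(curated)
    }
    ## Row of the curated table for each sample; NA when the sample is absent
    map <- match(samples$run, curated[[id_col]])
    message("found ", sum(!is.na(map)), " out of ", length(map), " samples")
    ## Indexing rows with NA yields all-NA rows for unmatched samples
    extra <- curated[map, setdiff(colnames(curated), id_col), drop = FALSE]
    rownames(extra) <- NULL
    cbind(samples, extra)
}

annotated <- append_metadata_sketch(samples, curated)
raw_table <- append_metadata_sketch(curated = curated)
```

Under these assumptions, a study with no matching samples ends up with every appended column filled with NA, which is consistent with the "found 0 out of 12 samples" message in the Examples.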

Details

For source = "recount_brain_v1" and source = "recount_brain_v2", the metadata columns are described at http://lieberinstitute.github.io/recount-brain/. Alternatively, you can explore recount_brain_v2 interactively at https://jhubiostatistics.shinyapps.io/recount-brain/.
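
For manual exploration of the raw table (obtained by calling add_metadata() without rse), a few base R summaries are often enough for a first look. The sketch below runs on a small toy data frame rather than the real download: recount_brain_toy is hypothetical, its column names are borrowed from the colData() output in the Examples, and its values are invented.

```r
## Toy stand-in for the raw metadata table; values are invented.
recount_brain_toy <- data.frame(
    sra_study_s = c("SRP000001", "SRP000001", "SRP000002", "SRP000002"),
    disease = c("Control", "Control", NA, "Tumor"),
    sex = c("female", "male", "male", NA),
    stringsAsFactors = FALSE
)

## Number of samples per study
samples_per_study <- table(recount_brain_toy$sra_study_s)

## Fraction of non-missing entries in each column
completeness <- colMeans(!is.na(recount_brain_toy))

## Study by disease cross-tabulation, keeping missing values visible
by_disease <- table(
    recount_brain_toy$sra_study_s,
    recount_brain_toy$disease,
    useNA = "ifany"
)
```

The same calls should apply to the real table by swapping recount_brain_toy for the object returned by add_metadata(source = "recount_brain_v2").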

If you use the recount_brain data, please cite Razmara et al., bioRxiv, 2019: https://www.biorxiv.org/content/10.1101/618025v1. A bib file is available via citation('recount').

References

Razmara et al., bioRxiv, 2019. https://www.biorxiv.org/content/10.1101/618025v1

Author

Leonardo Collado-Torres

Examples


## Add the sample metadata to an example rse_gene object
rse_gene <- add_metadata(rse_gene_SRP009615, "recount_brain_v2")
#> 2023-05-07 05:53:09.705675 downloading the recount_brain metadata to /tmp/RtmpDcoigj/recount_brain_v2.Rdata
#> Loading objects:
#>   recount_brain
#> 2023-05-07 05:53:11.089727 found 0 out of 12 samples in the recount_brain metadata

## Explore the metadata
colData(rse_gene)
#> DataFrame with 12 rows and 85 columns
#>               project      sample  experiment         run
#>           <character> <character> <character> <character>
#> SRR387777   SRP009615   SRS281685   SRX110461   SRR387777
#> SRR387778   SRP009615   SRS281686   SRX110462   SRR387778
#> SRR387779   SRP009615   SRS281687   SRX110463   SRR387779
#> SRR387780   SRP009615   SRS281688   SRX110464   SRR387780
#> SRR389077   SRP009615   SRS282369   SRX111299   SRR389077
#> ...               ...         ...         ...         ...
#> SRR389080   SRP009615   SRS282372   SRX111302   SRR389080
#> SRR389081   SRP009615   SRS282373   SRX111303   SRR389081
#> SRR389082   SRP009615   SRS282374   SRX111304   SRR389082
#> SRR389083   SRP009615   SRS282375   SRX111305   SRR389083
#> SRR389084   SRP009615   SRS282376   SRX111306   SRR389084
#>           read_count_as_reported_by_sra reads_downloaded
#>                               <integer>        <integer>
#> SRR387777                      30631853         30631853
#> SRR387778                      37001306         37001306
#> SRR387779                      40552001         40552001
#> SRR387780                      32466352         32466352
#> SRR389077                      27819603         27819603
#> ...                                 ...              ...
#> SRR389080                      34856203         34856203
#> SRR389081                      23351679         23351679
#> SRR389082                      18144828         18144828
#> SRR389083                      24417368         24417368
#> SRR389084                      23060084         23060084
#>           proportion_of_reads_reported_by_sra_downloaded paired_end
#>                                                <numeric>  <logical>
#> SRR387777                                              1      FALSE
#> SRR387778                                              1      FALSE
#> SRR387779                                              1      FALSE
#> SRR387780                                              1      FALSE
#> SRR389077                                              1      FALSE
#> ...                                                  ...        ...
#> SRR389080                                              1      FALSE
#> SRR389081                                              1      FALSE
#> SRR389082                                              1      FALSE
#> SRR389083                                              1      FALSE
#> SRR389084                                              1      FALSE
#>           sra_misreported_paired_end mapped_read_count        auc
#>                            <logical>         <integer>  <numeric>
#> SRR387777                      FALSE          28798572 1029494445
#> SRR387778                      FALSE          33170281 1184877985
#> SRR387779                      FALSE          37322762 1336528969
#> SRR387780                      FALSE          29970735 1073178116
#> SRR389077                      FALSE          24966859  893978355
#> ...                              ...               ...        ...
#> SRR389080                      FALSE          32469994 1163527939
#> SRR389081                      FALSE          21904197  781685955
#> SRR389082                      FALSE          17199795  616048853
#> SRR389083                      FALSE          22499386  806323346
#> SRR389084                      FALSE          21957003  787795710
#>           sharq_beta_tissue sharq_beta_cell_type biosample_submission_date
#>                 <character>          <character>               <character>
#> SRR387777             blood                 k562    2011-12-05T15:40:03...
#> SRR387778             blood                 k562    2011-12-05T15:40:03...
#> SRR387779             blood                 k562    2011-12-05T15:40:03...
#> SRR387780             blood                 k562    2011-12-05T15:40:03...
#> SRR389077             blood                 k562    2011-12-13T11:26:05...
#> ...                     ...                  ...                       ...
#> SRR389080             blood                 k562    2011-12-13T11:26:05...
#> SRR389081             blood                 k562    2011-12-13T11:26:05...
#> SRR389082             blood                 k562    2011-12-13T11:26:05...
#> SRR389083             blood                 k562    2011-12-13T11:26:05...
#> SRR389084             blood                 k562    2011-12-13T11:26:05...
#>           biosample_publication_date  biosample_update_date avg_read_length
#>                          <character>            <character>       <integer>
#> SRR387777     2011-12-07T09:29:59... 2014-08-27T04:18:20...              36
#> SRR387778     2011-12-07T09:29:59... 2014-08-27T04:18:21...              36
#> SRR387779     2011-12-07T09:29:59... 2014-08-27T04:18:21...              36
#> SRR387780     2011-12-07T09:29:59... 2014-08-27T04:18:22...              36
#> SRR389077     2011-12-13T11:26:06... 2014-08-27T04:22:14...              36
#> ...                              ...                    ...             ...
#> SRR389080     2011-12-13T11:26:06... 2014-08-27T04:22:15...              36
#> SRR389081     2011-12-13T11:26:06... 2014-08-27T04:22:16...              36
#> SRR389082     2011-12-13T11:26:06... 2014-08-27T04:22:16...              36
#> SRR389083     2011-12-13T11:26:06... 2014-08-27T04:22:17...              36
#> SRR389084     2011-12-13T11:26:06... 2014-08-27T04:22:17...              36
#>           geo_accession  bigwig_file                  title
#>             <character>  <character>            <character>
#> SRR387777     GSM836270 SRR387777.bw K562 cells with shRN..
#> SRR387778     GSM836271 SRR387778.bw K562 cells with shRN..
#> SRR387779     GSM836272 SRR387779.bw K562 cells with shRN..
#> SRR387780     GSM836273 SRR387780.bw K562 cells with shRN..
#> SRR389077     GSM847561 SRR389077.bw K562 cells with shRN..
#> ...                 ...          ...                    ...
#> SRR389080     GSM847564 SRR389080.bw K562 cells with shRN..
#> SRR389081     GSM847565 SRR389081.bw K562 cells with shRN..
#> SRR389082     GSM847566 SRR389082.bw K562 cells with shRN..
#> SRR389083     GSM847567 SRR389083.bw K562 cells with shRN..
#> SRR389084     GSM847568 SRR389084.bw K562 cells with shRN..
#>                                                         characteristics
#>                                                         <CharacterList>
#> SRR387777         cells: K562,shRNA expression: no,treatment: Puromycin
#> SRR387778     cells: K562,shRNA expression: ye..,treatment: Puromycin..
#> SRR387779         cells: K562,shRNA expression: no,treatment: Puromycin
#> SRR387780     cells: K562,shRNA expression: ye..,treatment: Puromycin..
#> SRR389077   cell line: K562,shRNA expression: no..,treatment: Puromycin
#> ...                                                                 ...
#> SRR389080 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#> SRR389081   cell line: K562,shRNA expression: no..,treatment: Puromycin
#> SRR389082 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#> SRR389083   cell line: K562,shRNA expression: no..,treatment: Puromycin
#> SRR389084 cell line: K562,shRNA expression: ex..,treatment: Puromycin..
#>                 age age_units assay_type_s avgspotlen_l bioproject_s
#>           <logical> <logical>    <logical>    <logical>    <logical>
#> SRR387777        NA        NA           NA           NA           NA
#> SRR387778        NA        NA           NA           NA           NA
#> SRR387779        NA        NA           NA           NA           NA
#> SRR387780        NA        NA           NA           NA           NA
#> SRR389077        NA        NA           NA           NA           NA
#> ...             ...       ...          ...          ...          ...
#> SRR389080        NA        NA           NA           NA           NA
#> SRR389081        NA        NA           NA           NA           NA
#> SRR389082        NA        NA           NA           NA           NA
#> SRR389083        NA        NA           NA           NA           NA
#> SRR389084        NA        NA           NA           NA           NA
#>           biosample_s brain_bank brodmann_area cell_line center_name_s
#>             <logical>  <logical>     <logical> <logical>     <logical>
#> SRR387777          NA         NA            NA        NA            NA
#> SRR387778          NA         NA            NA        NA            NA
#> SRR387779          NA         NA            NA        NA            NA
#> SRR387780          NA         NA            NA        NA            NA
#> SRR389077          NA         NA            NA        NA            NA
#> ...               ...        ...           ...       ...           ...
#> SRR389080          NA         NA            NA        NA            NA
#> SRR389081          NA         NA            NA        NA            NA
#> SRR389082          NA         NA            NA        NA            NA
#> SRR389083          NA         NA            NA        NA            NA
#> SRR389084          NA         NA            NA        NA            NA
#>           clinical_stage_1 clinical_stage_2 consent_s development   disease
#>                  <logical>        <logical> <logical>   <logical> <logical>
#> SRR387777               NA               NA        NA          NA        NA
#> SRR387778               NA               NA        NA          NA        NA
#> SRR387779               NA               NA        NA          NA        NA
#> SRR387780               NA               NA        NA          NA        NA
#> SRR389077               NA               NA        NA          NA        NA
#> ...                    ...              ...       ...         ...       ...
#> SRR389080               NA               NA        NA          NA        NA
#> SRR389081               NA               NA        NA          NA        NA
#> SRR389082               NA               NA        NA          NA        NA
#> SRR389083               NA               NA        NA          NA        NA
#> SRR389084               NA               NA        NA          NA        NA
#>           disease_status experiment_s hemisphere insertsize_l instrument_s
#>                <logical>    <logical>  <logical>    <logical>    <logical>
#> SRR387777             NA           NA         NA           NA           NA
#> SRR387778             NA           NA         NA           NA           NA
#> SRR387779             NA           NA         NA           NA           NA
#> SRR387780             NA           NA         NA           NA           NA
#> SRR389077             NA           NA         NA           NA           NA
#> ...                  ...          ...        ...          ...          ...
#> SRR389080             NA           NA         NA           NA           NA
#> SRR389081             NA           NA         NA           NA           NA
#> SRR389082             NA           NA         NA           NA           NA
#> SRR389083             NA           NA         NA           NA           NA
#> SRR389084             NA           NA         NA           NA           NA
#>           library_name_s librarylayout_s libraryselection_s librarysource_s
#>                <logical>       <logical>          <logical>       <logical>
#> SRR387777             NA              NA                 NA              NA
#> SRR387778             NA              NA                 NA              NA
#> SRR387779             NA              NA                 NA              NA
#> SRR387780             NA              NA                 NA              NA
#> SRR389077             NA              NA                 NA              NA
#> ...                  ...             ...                ...             ...
#> SRR389080             NA              NA                 NA              NA
#> SRR389081             NA              NA                 NA              NA
#> SRR389082             NA              NA                 NA              NA
#> SRR389083             NA              NA                 NA              NA
#> SRR389084             NA              NA                 NA              NA
#>           loaddate_s  mbases_l  mbytes_l organism_s pathology platform_s
#>            <logical> <logical> <logical>  <logical> <logical>  <logical>
#> SRR387777         NA        NA        NA         NA        NA         NA
#> SRR387778         NA        NA        NA         NA        NA         NA
#> SRR387779         NA        NA        NA         NA        NA         NA
#> SRR387780         NA        NA        NA         NA        NA         NA
#> SRR389077         NA        NA        NA         NA        NA         NA
#> ...              ...       ...       ...        ...       ...        ...
#> SRR389080         NA        NA        NA         NA        NA         NA
#> SRR389081         NA        NA        NA         NA        NA         NA
#> SRR389082         NA        NA        NA         NA        NA         NA
#> SRR389083         NA        NA        NA         NA        NA         NA
#> SRR389084         NA        NA        NA         NA        NA         NA
#>                 pmi pmi_units preparation present_in_recount      race
#>           <logical> <logical>   <logical>          <logical> <logical>
#> SRR387777        NA        NA          NA                 NA        NA
#> SRR387778        NA        NA          NA                 NA        NA
#> SRR387779        NA        NA          NA                 NA        NA
#> SRR387780        NA        NA          NA                 NA        NA
#> SRR389077        NA        NA          NA                 NA        NA
#> ...             ...       ...         ...                ...       ...
#> SRR389080        NA        NA          NA                 NA        NA
#> SRR389081        NA        NA          NA                 NA        NA
#> SRR389082        NA        NA          NA                 NA        NA
#> SRR389083        NA        NA          NA                 NA        NA
#> SRR389084        NA        NA          NA                 NA        NA
#>           releasedate_s       rin sample_name_s sample_origin       sex
#>               <logical> <logical>     <logical>     <logical> <logical>
#> SRR387777            NA        NA            NA            NA        NA
#> SRR387778            NA        NA            NA            NA        NA
#> SRR387779            NA        NA            NA            NA        NA
#> SRR387780            NA        NA            NA            NA        NA
#> SRR389077            NA        NA            NA            NA        NA
#> ...                 ...       ...           ...           ...       ...
#> SRR389080            NA        NA            NA            NA        NA
#> SRR389081            NA        NA            NA            NA        NA
#> SRR389082            NA        NA            NA            NA        NA
#> SRR389083            NA        NA            NA            NA        NA
#> SRR389084            NA        NA            NA            NA        NA
#>           sra_sample_s sra_study_s tissue_site_1 tissue_site_2 tissue_site_3
#>              <logical>   <logical>     <logical>     <logical>     <logical>
#> SRR387777           NA          NA            NA            NA            NA
#> SRR387778           NA          NA            NA            NA            NA
#> SRR387779           NA          NA            NA            NA            NA
#> SRR387780           NA          NA            NA            NA            NA
#> SRR389077           NA          NA            NA            NA            NA
#> ...                ...         ...           ...           ...           ...
#> SRR389080           NA          NA            NA            NA            NA
#> SRR389081           NA          NA            NA            NA            NA
#> SRR389082           NA          NA            NA            NA            NA
#> SRR389083           NA          NA            NA            NA            NA
#> SRR389084           NA          NA            NA            NA            NA
#>           tumor_type viability Study_full drugName_full drug_info_full
#>            <logical> <logical>  <logical>     <logical>      <logical>
#> SRR387777         NA        NA         NA            NA             NA
#> SRR387778         NA        NA         NA            NA             NA
#> SRR387779         NA        NA         NA            NA             NA
#> SRR387780         NA        NA         NA            NA             NA
#> SRR389077         NA        NA         NA            NA             NA
#> ...              ...       ...        ...           ...            ...
#> SRR389080         NA        NA         NA            NA             NA
#> SRR389081         NA        NA         NA            NA             NA
#> SRR389082         NA        NA         NA            NA             NA
#> SRR389083         NA        NA         NA            NA             NA
#> SRR389084         NA        NA         NA            NA             NA
#>           drug_type_full full_260_280 count_file_identifier   Dataset
#>                <logical>    <logical>             <logical> <logical>
#> SRR387777             NA           NA                    NA        NA
#> SRR387778             NA           NA                    NA        NA
#> SRR387779             NA           NA                    NA        NA
#> SRR387780             NA           NA                    NA        NA
#> SRR389077             NA           NA                    NA        NA
#> ...                  ...          ...                   ...       ...
#> SRR389080             NA           NA                    NA        NA
#> SRR389081             NA           NA                    NA        NA
#> SRR389082             NA           NA                    NA        NA
#> SRR389083             NA           NA                    NA        NA
#> SRR389084             NA           NA                    NA        NA
#>           brodmann_ontology brodmann_synonyms brodmann_parents
#>                   <logical>         <logical>        <logical>
#> SRR387777                NA                NA               NA
#> SRR387778                NA                NA               NA
#> SRR387779                NA                NA               NA
#> SRR387780                NA                NA               NA
#> SRR389077                NA                NA               NA
#> ...                     ...               ...              ...
#> SRR389080                NA                NA               NA
#> SRR389081                NA                NA               NA
#> SRR389082                NA                NA               NA
#> SRR389083                NA                NA               NA
#> SRR389084                NA                NA               NA
#>           brodmann_parents_label disease_ontology    tissue tissue_ontology
#>                        <logical>        <logical> <logical>       <logical>
#> SRR387777                     NA               NA        NA              NA
#> SRR387778                     NA               NA        NA              NA
#> SRR387779                     NA               NA        NA              NA
#> SRR387780                     NA               NA        NA              NA
#> SRR389077                     NA               NA        NA              NA
#> ...                          ...              ...       ...             ...
#> SRR389080                     NA               NA        NA              NA
#> SRR389081                     NA               NA        NA              NA
#> SRR389082                     NA               NA        NA              NA
#> SRR389083                     NA               NA        NA              NA
#> SRR389084                     NA               NA        NA              NA
#>           tissue_synonyms tissue_parents tissue_parents_label
#>                 <logical>      <logical>            <logical>
#> SRR387777              NA             NA                   NA
#> SRR387778              NA             NA                   NA
#> SRR387779              NA             NA                   NA
#> SRR387780              NA             NA                   NA
#> SRR389077              NA             NA                   NA
#> ...                   ...            ...                  ...
#> SRR389080              NA             NA                   NA
#> SRR389081              NA             NA                   NA
#> SRR389082              NA             NA                   NA
#> SRR389083              NA             NA                   NA
#> SRR389084              NA             NA                   NA

## For a list of studies present in recount_brain check
## http://lieberinstitute.github.io/recount-brain/.
## recount_brain_v2 includes GTEx and TCGA brain samples in addition to the
## recount_brain_v1 data, plus ontology information.


## Obtain all the recount_brain_v2 metadata if you want to
## explore the metadata manually
recount_brain_v2 <- add_metadata(source = "recount_brain_v2")
#> 2023-05-07 05:53:11.249438 downloading the recount_brain metadata to /tmp/RtmpDcoigj/recount_brain_v2.Rdata
#> Loading objects:
#>   recount_brain