Reference data overview - PCGR

core statistics - release 20240621 (grch38)

Author

Sigve Nakken, PhD, MSc

Published

June 21, 2024

Introduction

This report gives an overview of the data contents within the reference data bundle that comes with the Personal Cancer Genome Reporter (PCGR), an interpretation tool for genomic aberrations, aiming to provide clinical decision support for precision cancer medicine.

This overview helps PCGR users understand what the tool can report on, and which knowledge resources it uses (and does not use) for interpretation.

Currently, the PCGR reference data bundle contains integrated datasets that inform on the following properties with respect to molecular cancer medicine:

  • Basic human gene/transcript annotations - identifiers, official symbols, gene names etc.
  • Human cancer gene annotations - known tumor suppressor and proto-oncogenes
  • Cancer phenotypes - main sites/tissues of human cancers and associated subtypes
  • Targeted anti-cancer agents - small molecule inhibitors/antibodies, their molecular targets, and tumor types they are indicated for (approved or early/late clinical development)
  • Known somatic DNA mutations - found previously in tumor samples, relative frequencies across tumor types
  • Mutational hotspots - sites of significantly frequent somatic mutations in tumor samples
  • Known germline DNA variants - allelic frequencies across populations
  • In silico variant effect predictions - assessment of damaging/tolerated effects of single nucleotide variants by multiple algorithms
  • Biomarkers - expression markers, fusions/translocations, and DNA aberrations that are associated with prognosis, diagnosis, or sensitivity/resistance to particular treatments
  • Protein domains
  • Mutational signatures - cancer type prevalence, associated aetiologies
  • Tumor gene expression patterns - cell lines, primary tumor samples, both early-onset (pediatric tumors), and adult tumors
Note

This report is provided for a specific release of the data bundle, as stated in the title banner.

Files, file sizes and MD5 checksums

  • The contents of the assembly-specific data bundle are organized into seven main file folders. The table below lists the file types, file sizes, and MD5 checksums of each file within the various folders.
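The MD5 checksums can be used to confirm that a downloaded file is intact. The following is a minimal Python sketch, not part of PCGR itself; the function names `md5sum` and `verify_file` are hypothetical, and the expected checksum would be taken from the file listing for the release in question.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large bundle files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_md5):
    """Compare a file's MD5 digest with the checksum listed for it."""
    return md5sum(path) == expected_md5.strip().lower()
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.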


Gene and transcript data

Data resources (with versions and licenses):

  • GENCODE - Human gene transcripts - release 46 (Free/open access)
  • UniProtKB - UniProt identifiers and accessions with cross-references to Ensembl - release 2024_03 (CC BY 4.0)
  • Ensembl BioMart - API for retrieval of gene and transcript cross-references (MANE, RefSeq) - release 112 (EMBL-EBI terms of use)
  • APPRIS - Principal transcript isoform annotation - release 2024-06-08 (Free/open access)

Numbers - genome level

Genes

63,085

Transcripts

254,060

Protein-coding genes

20,065

Protein-coding transcripts

171,410

Numbers - chromosome level

Chromosome Transcripts_total Transcripts_protein_coding
chr1 22690 15844
chr2 18925 11976
chr3 16017 10815
chr4 10390 6562
chr5 12038 7706
chr6 12004 7446
chr7 12264 7933
chr8 10299 6323
chr9 8801 5869
chr10 9170 5999
chr11 14941 11282
chr12 13772 10011
chr13 4726 2400
chr14 9237 6213
chr15 9134 5818
chr16 11797 8590
chr17 14739 11261
chr18 4789 2797
chr19 14453 11452
chr20 6073 3804
chr21 3302 1638
chr22 5341 3670
chrX 8107 5675
chrY 1014 313

Proto-oncogenes and tumor suppressor genes

Data resources (with versions and licenses):

Brief synopsis

  • Genes are annotated as proto-oncogenes or tumor suppressor genes if they are i) found in either of two curated resources: Cancer Gene Census (CGC Tier 1/2) or Network of Cancer Genes, or ii) predicted with the corresponding annotation in the CancerMine text mining resource. For oncogene/tumor suppressor candidates predicted exclusively by CancerMine, we require these to have support from at least 20 distinct publications in the literature.
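The annotation rule described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used to build the bundle: the function name and its parameters are hypothetical, and the rule is applied separately for each role (proto-oncogene or tumor suppressor).

```python
# Threshold stated in the synopsis for candidates supported by CancerMine only.
MIN_CANCERMINE_PUBLICATIONS = 20


def has_role(in_cgc, in_ncg, cancermine_publications):
    """Decide whether a gene is annotated with a given role.

    in_cgc / in_ncg: the role is recorded in Cancer Gene Census (Tier 1/2)
    or in Network of Cancer Genes, respectively.
    cancermine_publications: number of distinct publications supporting
    the role in CancerMine (0 if none).
    """
    if in_cgc or in_ncg:
        # Support from a curated resource is sufficient on its own.
        return True
    # Text-mined evidence alone must reach the publication threshold.
    return cancermine_publications >= MIN_CANCERMINE_PUBLICATIONS
```

For example, a gene absent from both curated resources with 19 supporting CancerMine publications would not be annotated, while one with 20 would.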



Tumor suppressor genes

360

Proto-oncogenes

372



Cancer predisposition genes

Data resources (with versions and licenses):

Brief synopsis

  • The cancer predisposition genes available for variant analysis and classification in CPSR are listed here as virtual panel zero, the complete collection of predisposition genes. They have been collected from the Cancer Gene Census, gene panels that target hereditary cancer conditions (Genomics England PanelApp), TCGA’s pan-cancer germline study, and curated (user-contributed) genes.

Variant data

Data resources (with versions and licenses):

  • TCGA - The Cancer Genome Atlas - release39_20231204 (Free/open access)
  • dbNSFP - A comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs - v4.5 (Free for non-commercial, academic use)
  • ClinVar - Public archive of reports of the relationships among human DNA variations and phenotypes - release 2024-06 (NCBI data usage policies)
  • dbMTS - A comprehensive database of putative human microRNA target site (MTS) SNVs and their functional predictions - v1.0 (Free/open access)
  • GWAS Catalog - The NHGRI-EBI Catalog of human genome-wide association studies - v20240520 (EMBL-EBI terms of use)
  • gnomAD - Genome Aggregation Database - non-cancer subset - v2.1.1 (CC0 1.0)

Brief synopsis

  • Variant datasets are used for interrogation of somatic variant frequency across tissues (TCGA), germline variant population frequencies (gnomAD) and clinical significance (ClinVar), or in silico assessment of variant functional effect (dbNSFP/dbMTS).


Variant numbers - total


ClinVar

2,888,805

dbNSFP

95,599,053

dbMTS

6,391,279

TCGA

2,206,832

GWAS Catalog

6,014

gnomAD - non-cancer subset (cancer genes)

1,018,737



Variant numbers - chromosome level

TCGA

Chromosome nAll nSNV nDeletion nInsertion nMNV
chr1 228,743 219,383 6,844 2,502 14
chr2 164,612 157,849 4,932 1,823 8
chr3 126,413 121,005 4,032 1,375 1
chr4 89,137 85,331 2,755 1,047 4
chr5 112,300 107,641 3,394 1,264 1
chr6 113,358 108,250 3,765 1,339 4
chr7 110,456 106,056 3,273 1,123 4
chr8 81,997 78,640 2,447 907 3
chr9 79,539 76,139 2,545 854 1
chr10 84,662 80,787 2,834 1,037 4
chr11 133,211 127,748 4,094 1,366 3
chr12 116,910 111,751 3,790 1,368 1
chr13 40,496 38,620 1,366 509 1
chr14 69,522 66,406 2,283 825 8
chr15 67,742 64,938 2,083 717 4
chr16 79,146 75,577 2,734 833 2
chr17 111,923 106,384 4,166 1,369 4
chr18 35,105 33,616 1,111 377 1
chr19 143,465 137,999 4,184 1,279 3
chr20 55,221 53,092 1,575 553 1
chr21 21,575 20,645 693 236 1
chr22 39,900 38,218 1,252 427 3
chrX 100,706 97,111 2,574 1,020 1
chrY 693 663 24 6 0

ClinVar

Chromosome nAll nSNV nDeletion nInsertion nMNV
chr1 255,181 237,808 11,068 5,558 747
chr2 270,552 246,763 15,128 7,806 855
chr3 155,659 143,793 7,470 3,945 451
chr4 100,119 93,069 4,486 2,262 302
chr5 144,877 132,557 7,796 4,035 489
chr6 129,301 118,605 6,788 3,515 393
chr7 139,990 129,402 6,657 3,479 452
chr8 99,186 91,854 4,677 2,320 335
chr9 128,555 118,561 6,088 3,512 394
chr10 107,160 98,709 5,199 2,895 357
chr11 171,694 157,817 8,917 4,412 548
chr12 128,909 118,870 6,409 3,234 396
chr13 60,526 52,284 5,424 2,593 225
chr14 92,124 85,578 4,180 2,125 241
chr15 102,834 94,225 5,441 2,827 341
chr16 152,353 140,578 7,530 3,693 552
chr17 179,217 160,391 12,131 5,979 716
chr18 50,565 46,690 2,347 1,390 138
chr19 155,892 146,233 5,874 3,271 514
chr20 59,137 55,302 2,360 1,302 173
chr21 33,962 31,216 1,690 949 107
chr22 62,901 57,990 3,107 1,580 224
chrX 108,016 96,259 7,783 3,708 266
chrY 95 83 8 4 0

dbNSFP

Chromosome nAll nSNV nDeletion nInsertion nMNV
chr1 9,645,829 9,645,829 0 0 0
chr2 7,083,654 7,083,654 0 0 0
chr3 5,540,153 5,540,153 0 0 0
chr4 3,815,852 3,815,852 0 0 0
chr5 4,425,867 4,425,867 0 0 0
chr6 4,836,504 4,836,504 0 0 0
chr7 4,489,261 4,489,261 0 0 0
chr8 3,244,915 3,244,915 0 0 0
chr9 3,838,098 3,838,098 0 0 0
chr10 3,726,476 3,726,476 0 0 0
chr11 5,586,754 5,586,754 0 0 0
chr12 5,052,139 5,052,139 0 0 0
chr13 1,740,027 1,740,027 0 0 0
chr14 3,011,585 3,011,585 0 0 0
chr15 3,351,238 3,351,238 0 0 0
chr16 4,040,167 4,040,167 0 0 0
chr17 5,501,629 5,501,629 0 0 0
chr18 1,506,525 1,506,525 0 0 0
chr19 6,221,200 6,221,200 0 0 0
chr20 2,252,415 2,252,415 0 0 0
chr21 957,673 957,673 0 0 0
chr22 1,992,402 1,992,402 0 0 0
chrX 3,548,544 3,548,544 0 0 0
chrY 190,146 190,146 0 0 0

dbMTS

Chromosome nAll nSNV nDeletion nInsertion nMNV
chr1 720,033 720,033 0 0 0
chr2 519,030 519,030 0 0 0
chr3 430,103 430,103 0 0 0
chr4 344,379 344,379 0 0 0
chr5 393,020 393,020 0 0 0
chr6 396,215 396,215 0 0 0
chr7 340,217 340,217 0 0 0
chr8 269,169 269,169 0 0 0
chr9 266,564 266,564 0 0 0
chr10 298,408 298,408 0 0 0
chr11 325,250 325,250 0 0 0
chr12 445,379 445,379 0 0 0
chr13 169,486 169,486 0 0 0
chr14 266,405 266,405 0 0 0
chr15 233,405 233,405 0 0 0
chr16 158,793 158,793 0 0 0
chr17 208,062 208,062 0 0 0
chr18 105,918 105,918 0 0 0
chr19 222,410 222,410 0 0 0
chr20 105,773 105,773 0 0 0
chr21 48,398 48,398 0 0 0
chr22 80,370 80,370 0 0 0
chrX 44,492 44,492 0 0 0
chrY 0 0 0 0 0

GWAS Catalog

Chromosome nAll nSNV nDeletion nInsertion nMNV
chr1 384 384 0 0 0
chr2 516 516 0 0 0
chr3 361 361 0 0 0
chr4 184 184 0 0 0
chr5 361 361 0 0 0
chr6 657 657 0 0 0
chr7 208 208 0 0 0
chr8 402 402 0 0 0
chr9 296 296 0 0 0
chr10 324 324 0 0 0
chr11 332 332 0 0 0
chr12 289 289 0 0 0
chr13 126 126 0 0 0
chr14 148 148 0 0 0
chr15 193 193 0 0 0
chr16 240 240 0 0 0
chr17 211 211 0 0 0
chr18 111 111 0 0 0
chr19 187 187 0 0 0
chr20 229 229 0 0 0
chr21 68 68 0 0 0
chr22 140 140 0 0 0
chrX 47 47 0 0 0
chrY 0 0 0 0 0

gnomAD - non-cancer subset (cancer genes)

Chromosome nAll nSNV nDeletion nInsertion nMNV
chr1 89,336 84,150 3,507 1,679 0
chr2 99,140 93,534 3,728 1,878 0
chr3 60,021 56,308 2,525 1,188 0
chr4 35,279 33,132 1,468 679 0
chr5 43,882 41,049 1,972 861 0
chr6 53,169 50,133 2,050 986 0
chr7 45,296 42,721 1,731 844 0
chr8 38,065 35,797 1,545 723 0
chr9 43,388 40,897 1,600 891 0
chr10 40,315 37,890 1,640 785 0
chr11 72,689 68,637 2,715 1,337 0
chr12 43,338 40,566 1,847 925 0
chr13 21,387 19,860 1,047 480 0
chr14 27,961 26,150 1,235 576 0
chr15 35,133 32,863 1,514 756 0
chr16 56,043 53,002 2,016 1,025 0
chr17 67,694 63,340 3,000 1,354 0
chr18 11,529 10,882 455 192 0
chr19 70,563 65,921 3,004 1,638 0
chr20 15,335 14,308 703 324 0
chr21 3,918 3,663 161 94 0
chr22 20,991 19,649 881 461 0
chrX 24,222 22,903 892 427 0
chrY 43 43 0 0 0
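
The chromosome-level counts can be cross-checked against the totals given earlier. The Python sketch below (helper names are hypothetical) parses a row in the layout used above, checks that nAll equals the sum of the four variant classes, and sums the per-chromosome nAll values of the table whose chr1 row reads 384, which add up to the GWAS Catalog total of 6,014.

```python
def parse_row(line):
    """Split a row such as 'chr1 228,743 219,383 6,844 2,502 14'."""
    chrom, *fields = line.split()
    # Counts use a comma as thousands separator.
    return chrom, [int(value.replace(",", "")) for value in fields]


def row_is_consistent(line):
    """True if nAll == nSNV + nDeletion + nInsertion + nMNV."""
    _, (n_all, n_snv, n_del, n_ins, n_mnv) = parse_row(line)
    return n_all == n_snv + n_del + n_ins + n_mnv


# nAll values for chr1..chr22, chrX, chrY, copied from the table above.
GWAS_N_ALL = [384, 516, 361, 184, 361, 657, 208, 402, 296, 324, 332, 289,
              126, 148, 193, 240, 211, 111, 187, 229, 68, 140, 47, 0]

print(row_is_consistent("chr1 228,743 219,383 6,844 2,502 14"))
print(sum(GWAS_N_ALL))  # 6014
```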

Drug data

Data resources (with versions and licenses):


Brief synopsis

  • A repository of targeted cancer drugs, organized according to primary tumor sites, has been established through the pharmOncoX R package, which combines drug data from Open Targets Platform, NCI Thesaurus and DGIdb with cancer type classifications from phenOncoX.

  • Compounds (and associated targets) are listed below for the various tumor types, where compounds highlighted in green are those approved or in late clinical development.

Note

Although we try to make drug listings as accurate as possible, drug misclassifications are likely to occur, and entries may be missing.

Note

While the focus here is on molecularly targeted cancer compounds, note that the data bundle is also shipped with a more complete set of drugs not shown here (chemotherapy drugs, drugs primarily indicated for other diseases etc.).

Targeted compounds

934

Drug targets

484


Targeted agents per tumor type


Biomarker data

Data resources (with versions and licenses):


Brief synopsis

  • Biomarkers in PCGR are retrieved from both CIViC and CGI (Cancer Genome Interpreter).

  • The biomarker data in PCGR is structured largely according to the CIViC knowledge model, in which

    • A particular genomic aberration (e.g. BRAF V600E) is associated with one or more clinical evidence items, each of which typically denotes a relationship between the variant and a therapeutic response (or prognosis/diagnosis) in a defined disease/cancer type context. The type of relationship for a given evidence item is known as the evidence type, and the strength of the evidence is graded by its evidence level.
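
The variant-to-evidence relationship can be pictured with a small data structure. The sketch below is illustrative only: the class and field names are hypothetical and do not reflect the actual schema of the PCGR bundle or of CIViC, and the example values are placeholders rather than entries copied from either resource.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvidenceItem:
    evidence_type: str              # e.g. "Predictive", "Prognostic", "Diagnostic"
    evidence_level: str             # strength of the evidence, e.g. "A"
    cancer_type: str                # disease context of the evidence
    therapy: Optional[str] = None   # only relevant for therapeutic response


@dataclass
class Biomarker:
    gene: str
    variant: str
    evidence_items: List[EvidenceItem] = field(default_factory=list)

    def evidence_types(self):
        """Distinct evidence types attached to this biomarker, sorted."""
        return sorted({item.evidence_type for item in self.evidence_items})


# Placeholder example: one aberration carrying two evidence items.
braf_v600e = Biomarker("BRAF", "V600E", [
    EvidenceItem("Predictive", "A", "Melanoma", therapy="Vemurafenib"),
    EvidenceItem("Prognostic", "B", "Colorectal cancer"),
])
```

Counting biomarker genes, variants, and evidence items, as in the value boxes below, then corresponds to counting distinct `gene` values, distinct `(gene, variant)` pairs, and `EvidenceItem` instances, respectively.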
Note

Given the disparate formatting notations of the CIViC and CGI resources, an attempt to merge their contents into an integrated biomarker source is not yet complete. Hence, aggregated numbers presented in the value boxes below may not be fully accurate due to overlapping entries among the two resources.

Note

Why are the numbers presented here different from the ones that can be seen on e.g. civicdb.org? Numbers presented here reflect accepted evidence, which has been subject to multiple post-processing and quality-control checks.


Biomarker genes - somatic

398

Biomarker variants - somatic

1499

Evidence items - somatic

3659

Biomarker genes - germline

64

Biomarker variants - germline

353

Evidence items - germline

830


Statistics - somatic biomarkers


Evidence items

Phenotype/disease data

Data resources (with versions and licenses):

Brief synopsis:

  • In order to cross-reference different knowledge resources that utilize different nomenclature for disease/cancer types, we have built an integrated resource (phenOncoX) that organizes cross-referenced phenotype terms across the major tumor types for different ontologies (OncoTree, Disease Ontology, EFO, ICD-10, MeSH).
  • In PCGR, tumor types are organized according to 32 primary sites/tissues, and these are populated with specific and cross-referenced phenotype terms that typically denote distinct subtypes of a major cancer type.

Other

Mutational hotspots

Data resources (with versions and licenses):


Hotspot genes

240

Amino acid hotspot variants

3311

Splice site hotspot variants

118

Mutational signatures

Data resources (with versions and licenses):


Brief synopsis

  • For each mutational signature in COSMIC (v3.4, SBS only), we have collected data on which tumor types the signatures have been observed in (signature attribution). This information is utilized to limit search space when signature re-fitting is performed for individual samples in PCGR.
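
Restricting the search space amounts to selecting, for a given tumor type, only those signatures that have been attributed to it. A minimal Python sketch follows, assuming a simple signature-to-tumor-type mapping; the table below is a toy example for illustration and is not the attribution data shipped with PCGR, and the function name is hypothetical.

```python
# Toy attribution table (illustrative only, NOT the contents of the bundle).
SIGNATURE_TUMOR_TYPES = {
    "SBS1": {"Lung", "Skin", "Breast"},
    "SBS4": {"Lung"},
    "SBS5": {"Lung", "Skin", "Breast"},
    "SBS7a": {"Skin"},
}


def candidate_signatures(tumor_type, attribution=SIGNATURE_TUMOR_TYPES):
    """Return the signatures to consider when re-fitting a sample
    of the given tumor type."""
    return sorted(
        signature
        for signature, tumor_types in attribution.items()
        if tumor_type in tumor_types
    )


print(candidate_signatures("Lung"))  # ['SBS1', 'SBS4', 'SBS5']
```

Fitting against a smaller, tumor-type-specific set of reference signatures reduces the risk of assigning mutations to signatures that are not expected in that tissue.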


Protein domains

  • PFAM/InterPro - Functional analysis of protein sequences and prediction of families and protein domains - v (CC0 1.0)

Expression data

Data resources (with versions and licenses):

  • DepMap - The Cancer Dependency Map - release 23Q4 (Free/open access)
  • TCGA - The Cancer Genome Atlas - release39_20231204 (Free/open access)
  • TreeHouse - The Treehouse Childhood Cancer Data Initiative - v11_2020 (Free/open access)

Brief synopsis:

  • Reference data on gene expression across tumor types is currently collected from DepMap (cancer cell lines), TCGA (adult, primary tumor samples), and the TreeHouse Childhood Cancer Data Initiative (primarily pediatric tumor samples).
  • Data from the three reference sources (TCGA, DepMap, TreeHouse) have been harmonized with respect to expression measures (TPM), gene annotation, and sample metadata. An overview of all samples is listed below.

TreeHouse data

Note that data on many of the samples publicly listed for the TreeHouse dataset has been excluded here due to either i) incomplete sample metadata, or ii) sample metadata that indicates adult-onset (> 30 years of age at diagnosis) rather than early-onset cancer. Only one sample per case has been included.
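
The exclusion criteria above can be expressed as a simple filter. The Python sketch below is an assumption-laden illustration: the metadata field names are hypothetical, and because the text does not say which sample is retained when a case has several, the sketch simply keeps the first one encountered.

```python
MAX_AGE_AT_DIAGNOSIS = 30  # cases older than this count as adult-onset

# Hypothetical metadata fields that must be present for a sample to be kept.
REQUIRED_FIELDS = ("case_id", "sample_id", "age_at_diagnosis", "primary_site")


def select_samples(samples):
    """Filter TreeHouse-like sample records (dicts) by the stated criteria."""
    kept = {}
    for sample in samples:
        # i) drop samples with incomplete metadata
        if any(sample.get(name) is None for name in REQUIRED_FIELDS):
            continue
        # ii) drop adult-onset cases
        if sample["age_at_diagnosis"] > MAX_AGE_AT_DIAGNOSIS:
            continue
        # keep a single sample per case (here: the first one seen)
        kept.setdefault(sample["case_id"], sample)
    return list(kept.values())
```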

Sample metadata



References

Alexandrov, Ludmil B, Jaegil Kim, Nicholas J Haradhvala, Mi Ni Huang, Alvin Wei Tian Ng, Yang Wu, Arnoud Boot, et al. 2020. “The Repertoire of Mutational Signatures in Human Cancer.” Nature 578 (7793): 94–101. http://dx.doi.org/10.1038/s41586-020-1943-3.
Apweiler, Rolf, Amos Bairoch, Cathy H Wu, Winona C Barker, Brigitte Boeckmann, Serenella Ferro, Elisabeth Gasteiger, et al. 2004. “UniProt: The Universal Protein Knowledgebase.” Nucleic Acids Res. 32 (Database issue): D115–9. http://dx.doi.org/10.1093/nar/gkh131.
Buniello, Annalisa, Jacqueline A L MacArthur, Maria Cerezo, Laura W Harris, James Hayhurst, Cinzia Malangone, Aoife McMahon, et al. 2019. “The NHGRI-EBI GWAS Catalog of Published Genome-Wide Association Studies, Targeted Arrays and Summary Statistics 2019.” Nucleic Acids Res. 47 (D1): D1005–12. http://dx.doi.org/10.1093/nar/gky1120.
Dienstmann, Rodrigo, In Sock Jang, Brian Bot, Stephen Friend, and Justin Guinney. 2015. “Database of Genomic Biomarkers for Cancer Drugs and Clinical Targetability in Solid Tumors.” Cancer Discov. 5 (2): 118–23. http://dx.doi.org/10.1158/2159-8290.CD-14-1118.
Durinck, Steffen, Paul T Spellman, Ewan Birney, and Wolfgang Huber. 2009. “Mapping Identifiers for the Integration of Genomic Datasets with the R/Bioconductor Package biomaRt.” Nat. Protoc. 4 (8): 1184–91. http://dx.doi.org/10.1038/nprot.2009.97.
Frankish, Adam, Mark Diekhans, Irwin Jungreis, Julien Lagarde, Jane E Loveland, Jonathan M Mudge, Cristina Sisu, et al. 2021. “GENCODE 2021.” Nucleic Acids Res. 49 (D1): D916–23. http://dx.doi.org/10.1093/nar/gkaa1087.
Griffith, Malachi, Obi L Griffith, Adam C Coffman, James Weible V, Josh F McMichael, Nicholas C Spies, James Koval, et al. 2013. “DGIdb: Mining the Druggable Genome.” Nat. Methods 10 (12): 1209–10. http://dx.doi.org/10.1038/nmeth.2689.
Griffith, Malachi, Nicholas C Spies, Kilannin Krysiak, Joshua F McMichael, Adam C Coffman, Arpad M Danos, Benjamin J Ainscough, et al. 2017. “CIViC Is a Community Knowledgebase for Expert Crowdsourcing the Clinical Interpretation of Variants in Cancer.” Nat. Genet. 49 (2): 170–74. http://dx.doi.org/10.1038/ng.3774.
Kanehisa, M, and S Goto. 2000. “KEGG: Kyoto Encyclopedia of Genes and Genomes.” Nucleic Acids Res. 28 (1): 27–30. http://dx.doi.org/10.1093/nar/28.1.27.
Kundra, Ritika, Hongxin Zhang, Robert Sheridan, Sahussapont Joseph Sirintrapun, Avery Wang, Angelica Ochoa, Manda Wilson, et al. 2021. “OncoTree: A Cancer Classification System for Precision Oncology.” JCO Clin Cancer Inform 5 (February): 221–30. http://dx.doi.org/10.1200/CCI.20.00108.
Landrum, Melissa J, Jennifer M Lee, George R Riley, Wonhee Jang, Wendy S Rubinstein, Deanna M Church, and Donna R Maglott. 2014. “ClinVar: Public Archive of Relationships Among Sequence Variation and Human Phenotype.” Nucleic Acids Res. 42 (Database issue): D980–5. http://dx.doi.org/10.1093/nar/gkt1113.
Lever, Jake, Eric Y Zhao, Jasleen Grewal, Martin R Jones, and Steven J M Jones. 2019. “CancerMine: A Literature-Mined Resource for Drivers, Oncogenes and Tumor Suppressors in Cancer.” Nat. Methods 16 (6): 505–7. http://dx.doi.org/10.1038/s41592-019-0422-y.
Li, Chang, Chengcheng Mou, Michael D Swartz, Bing Yu, Yongsheng Bai, Yicheng Tu, and Xiaoming Liu. 2020. “dbMTS: A Comprehensive Database of Putative Human microRNA Target Site SNVs and Their Functional Predictions.” Hum. Mutat. 41 (6): 1123–30. http://dx.doi.org/10.1002/humu.24020.
Liu, Xiaoming, Chang Li, Chengcheng Mou, Yibo Dong, and Yicheng Tu. 2020. “dbNSFP V4: A Comprehensive Database of Transcript-Specific Functional Predictions and Annotations for Human Nonsynonymous and Splice-Site SNVs.” Genome Med. 12 (1): 103. http://dx.doi.org/10.1186/s13073-020-00803-9.
Malone, James, Ele Holloway, Tomasz Adamusiak, Misha Kapushesky, Jie Zheng, Nikolay Kolesnikov, Anna Zhukova, Alvis Brazma, and Helen Parkinson. 2010. “Modeling Sample Variables with an Experimental Factor Ontology.” Bioinformatics 26 (8): 1112–18. http://dx.doi.org/10.1093/bioinformatics/btq099.
McLaren, William, Laurent Gil, Sarah E Hunt, Harpreet Singh Riat, Graham R S Ritchie, Anja Thormann, Paul Flicek, and Fiona Cunningham. 2016. “The Ensembl Variant Effect Predictor.” Genome Biol. 17 (1): 122. http://dx.doi.org/10.1186/s13059-016-0974-4.
Ochoa, David, Andrew Hercules, Miguel Carmona, Daniel Suveges, Asier Gonzalez-Uriarte, Cinzia Malangone, Alfredo Miranda, et al. 2021. “Open Targets Platform: Supporting Systematic Drug-Target Identification and Prioritisation.” Nucleic Acids Res. 49 (D1): D1302–10. http://dx.doi.org/10.1093/nar/gkaa1027.
Repana, Dimitra, Joel Nulsen, Lisa Dressler, Michele Bortolomeazzi, Santhilata Kuppili Venkata, Aikaterini Tourna, Anna Yakovleva, Tommaso Palmieri, and Francesca D Ciccarelli. 2019. “The Network of Cancer Genes (NCG): A Comprehensive Catalogue of Known and Candidate Cancer Genes from Cancer Sequencing Screens.” Genome Biol. 20 (1): 1. http://dx.doi.org/10.1186/s13059-018-1612-0.
Schriml, Lynn Marie, Cesar Arze, Suvarna Nadendla, Yu-Wei Wayne Chang, Mark Mazaitis, Victor Felix, Gang Feng, and Warren Alden Kibbe. 2012. “Disease Ontology: A Backbone for Disease Semantic Integration.” Nucleic Acids Res. 40 (Database issue): D940–6. http://dx.doi.org/10.1093/nar/gkr972.
Sioutos, Nicholas, Sherri de Coronado, Margaret W Haber, Frank W Hartel, Wen-Ling Shaiu, and Lawrence W Wright. 2007. “NCI Thesaurus: A Semantic Model Integrating Cancer-Related Clinical and Molecular Information.” J. Biomed. Inform. 40 (1): 30–43. http://dx.doi.org/10.1016/j.jbi.2006.02.013.
Sondka, Zbyslaw, Sally Bamford, Charlotte G Cole, Sari A Ward, Ian Dunham, and Simon A Forbes. 2018. “The COSMIC Cancer Gene Census: Describing Genetic Dysfunction Across All Human Cancers.” Nat. Rev. Cancer 18 (11): 696–705. http://dx.doi.org/10.1038/s41568-018-0060-1.
Tamborero, David, Carlota Rubio-Perez, Jordi Deu-Pons, Michael P Schroeder, Ana Vivancos, Ana Rovira, Ignasi Tusquets, et al. 2018. “Cancer Genome Interpreter Annotates the Biological and Clinical Relevance of Tumor Alterations.” Genome Med. 10 (1): 25. http://dx.doi.org/10.1186/s13073-018-0531-8.
The Cancer Genome Atlas Research Network, John N Weinstein, Eric A Collisson, Gordon B Mills, Kenna R Mills Shaw, Brad A Ozenberger, Kyle Ellrott, Ilya Shmulevich, Chris Sander, and Joshua M Stuart. 2013. “The Cancer Genome Atlas Pan-Cancer Analysis Project.” Nat. Genet. 45 (10): 1113–20. http://dx.doi.org/10.1038/ng.2764.

Author contributions

SN: Developed supporting R packages (pharmOncoX, phenOncoX, geneOncoX), created software to automatically build and update the reference datasets, wrote up the report.