This R Markdown document walks through a toy pipeline for analyzing the copy number of the colorectal cancer (CRC) genes used in de Sousa e Melo et al., 2017. For more details on using cBioPortal data in R or MATLAB, see http://www.cbioportal.org/cgds_r.jsp.
# Create CGDS object
library(cgdsr)      # cBioPortal CGDS client
library(tidyverse)  # select(), gather(), and the %>% pipe
mycgds <- CGDS("http://www.cbioportal.org/public-portal/")
test(mycgds)
## getCancerStudies... OK
## getCaseLists (1/2) ... OK
## getCaseLists (2/2) ... OK
## getGeneticProfiles (1/2) ... OK
## getGeneticProfiles (2/2) ... OK
## getClinicalData (1/1) ... OK
## getProfileData (1/6) ... OK
## getProfileData (2/6) ... OK
## getProfileData (3/6) ... OK
## getProfileData (4/6) ... OK
## getProfileData (5/6) ... OK
## getProfileData (6/6) ... OK
# Get list of cancer studies at server
lookAtAllTheseStudies <- getCancerStudies(mycgds)
# Get available case lists (collection of samples) for a given cancer study
# Look through list returned by getCancerStudies(mycgds) and select study.
mycancerstudy <- getCancerStudies(mycgds)[37,1] # "coadread_tcga_pub"
mycaselist <- getCaseLists(mycgds,mycancerstudy)[1,1] # "coadread_tcga_pub_3way_complete"
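Selecting the study by row position (`[37,1]`) only works while the server's study list keeps its current order. A minimal sketch of a more robust lookup by ID follows. The `studies` data frame is a toy stand-in for the output of `getCancerStudies(mycgds)`, the `cancer_study_id` column name is assumed to match that output, and `pick_id` is a hypothetical helper, not part of cgdsr.

```r
# Toy stand-in for getCancerStudies(mycgds); the rows are illustrative only
# and do not reproduce the server's full study list.
studies <- data.frame(
  cancer_study_id = c("brca_tcga_pub", "coadread_tcga_pub", "gbm_tcga_pub"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the requested ID if it occurs exactly once,
# otherwise stop with an informative error.
pick_id <- function(ids, wanted) {
  hit <- ids[ids == wanted]
  if (length(hit) != 1) {
    stop("Expected exactly one match for '", wanted, "', found ", length(hit))
  }
  hit
}

mycancerstudy <- pick_id(studies$cancer_study_id, "coadread_tcga_pub")
print(mycancerstudy)
```

With the real data, `studies` would be replaced by `getCancerStudies(mycgds)`; the same helper could be applied to the first column of `getCaseLists()` to pick the case list by ID.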
# Get available genetic profiles and descriptions
mygeneticprofiles <- getGeneticProfiles(mycgds,mycancerstudy)
head(select(mygeneticprofiles, genetic_profile_id),5) #Genetic profile IDs, 5/11 shown
## genetic_profile_id
## 1 coadread_tcga_pub_gistic
## 2 coadread_tcga_pub_rna_seq_mrna
## 3 coadread_tcga_pub_rna_seq_mrna_median_Zscores
## 4 coadread_tcga_pub_mrna
## 5 coadread_tcga_pub_mrna_median_Zscores
head(select(mygeneticprofiles, genetic_profile_description),5) #Genetic profile descriptions, 5/11 shown
## genetic_profile_description
## 1 Putative copy-number calls from GISTIC 2.0. Values: -2 = homozygous deletion; -1 = hemizygous deletion; 0 = neutral / no change; 1 = gain; 2 = high level amplification.
## 2 Expression levels (RNA Seq RPKM).
## 3 mRNA z-Scores (RNA Seq RPKM) compared to the expression distribution of each gene tumors that are diploid for this gene.
## 4 Expression levels (Agilent microarray).
## 5 mRNA z-Scores (Agilent microarray) compared to the expression distribution of each gene tumors that are diploid for this gene.
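The GISTIC profile description above defines five integer codes. A small lookup, sketched here in base R, can turn those codes into readable labels for tables or plots. `gistic_labels` and `label_gistic` are hypothetical names introduced for illustration; the label text is copied from the profile description.

```r
# Map GISTIC 2.0 integer calls to the labels given in the profile description.
gistic_labels <- c(
  "-2" = "homozygous deletion",
  "-1" = "hemizygous deletion",
  "0"  = "neutral / no change",
  "1"  = "gain",
  "2"  = "high level amplification"
)

# Hypothetical helper: look up labels by the character form of each code.
# Codes outside -2..2 (or NA) come back as NA.
label_gistic <- function(values) {
  unname(gistic_labels[as.character(values)])
}

print(label_gistic(c(0, -1, 2)))
```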
# Make a list of genes of interest
AKPS_genes <- c('APC', 'TP53', 'SMAD4', 'KRAS')
# Get data slices for a specified list of genes, genetic profile and case list
# Here, genetic profile selected is copy number (GISTIC values)
crcCopyNumber <- mygeneticprofiles[1,1] # reuse the profiles fetched above instead of querying the server again
print(crcCopyNumber)
## [1] "coadread_tcga_pub_gistic"
crcCopyData <- getProfileData(
  mycgds,
  AKPS_genes,
  crcCopyNumber,
  mycaselist
)
# Reshape from wide (one column per gene) to long (one row per sample and gene)
crcCopyData_tidy <- crcCopyData %>%
  gather(all_of(AKPS_genes), key = "gene", value = "copy_number_GISTIC_value")
# documentation
help('cgdsr')
help('CGDS')
We see that APC and KRAS are largely diploid in this CRC cohort (GISTIC = 0), while SMAD4 and TP53 show copy number variation a little over 50% of the time. When their copy number does change, APC, SMAD4, and TP53 tend to be hemizygously deleted (GISTIC = -1), whereas KRAS is more often amplified (GISTIC = 1, suggesting low-level gains). These data make sense, given that APC, SMAD4, and TP53 are tumor suppressors and KRAS is an oncogene.
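The per-gene fractions discussed above can be tabulated from the long-format data. The sketch below uses base R on a toy data frame with the same column names as `crcCopyData_tidy`; the values are invented for illustration and are not the TCGA results.

```r
# Toy long-format data with the same columns as crcCopyData_tidy.
# The values are made up for illustration only.
toy_tidy <- data.frame(
  gene = rep(c("APC", "KRAS"), each = 4),
  copy_number_GISTIC_value = c(0, 0, 0, -1, 0, 0, 1, 1)
)

# Count samples per gene and GISTIC value, then convert each row (gene)
# to fractions that sum to 1.
gistic_counts <- table(toy_tidy$gene, toy_tidy$copy_number_GISTIC_value)
gistic_fractions <- prop.table(gistic_counts, margin = 1)

print(round(gistic_fractions, 2))
```

Substituting `crcCopyData_tidy` for `toy_tidy` should give the corresponding fractions for the four AKPS genes, assuming the download above succeeded.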