R Markdown

This R Markdown document walks through a toy pipeline for analyzing gene copy number in the colorectal cancer (CRC) genes used in de Sousa e Melo et al., 2017. For more details on using cBioPortal data in R or MATLAB, see http://www.cbioportal.org/cgds_r.jsp.

# Create CGDS object
library(cgdsr)
library(tidyverse)
mycgds <- CGDS("http://www.cbioportal.org/public-portal/")

test(mycgds)
## getCancerStudies...  OK
## getCaseLists (1/2) ...  OK
## getCaseLists (2/2) ...  OK
## getGeneticProfiles (1/2) ...  OK
## getGeneticProfiles (2/2) ...  OK
## getClinicalData (1/1) ...  OK
## getProfileData (1/6) ...  OK
## getProfileData (2/6) ...  OK
## getProfileData (3/6) ...  OK
## getProfileData (4/6) ...  OK
## getProfileData (5/6) ...  OK
## getProfileData (6/6) ...  OK
# Get list of cancer studies at server
lookAtAllTheseStudies <- getCancerStudies(mycgds)

# Get available case lists (collection of samples) for a given cancer study
# Look through list returned by getCancerStudies(mycgds) and select study.

mycancerstudy <- getCancerStudies(mycgds)[37,1] # "coadread_tcga_pub" (the row index may change as the server adds studies)

mycaselist <- getCaseLists(mycgds,mycancerstudy)[1,1] # "coadread_tcga_pub_3way_complete"
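
Selecting a study by row position is fragile, because the position depends on what the server currently lists. A minimal sketch of selecting by identifier instead is given below. It runs on a toy data frame rather than a live query; the column name `cancer_study_id`, the placeholder rows, and the variable names `studies`, `wanted_id`, and `mycancerstudy_by_id` are assumptions made for illustration.

```r
# Toy stand-in for the data frame returned by getCancerStudies(mycgds).
# The column names are an assumption and the rows are placeholders.
studies <- data.frame(
  cancer_study_id = c("study_a", "coadread_tcga_pub", "study_b"),
  name = c("Placeholder A", "Placeholder CRC study", "Placeholder B"),
  stringsAsFactors = FALSE
)

# Select by identifier so the result does not depend on row order
wanted_id <- "coadread_tcga_pub"
match_row <- studies$cancer_study_id == wanted_id

# Fail loudly if the identifier is missing or duplicated
stopifnot(sum(match_row) == 1)

mycancerstudy_by_id <- studies$cancer_study_id[match_row]
print(mycancerstudy_by_id)
```

With a live connection, the same filter could presumably be applied to the result of `getCancerStudies(mycgds)`, provided the returned column is named as assumed here.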

# Get available genetic profiles and descriptions
mygeneticprofiles <- getGeneticProfiles(mycgds,mycancerstudy)
head(select(mygeneticprofiles, genetic_profile_id),5) #Genetic profile IDs, 5/11 shown
##                              genetic_profile_id
## 1                      coadread_tcga_pub_gistic
## 2                coadread_tcga_pub_rna_seq_mrna
## 3 coadread_tcga_pub_rna_seq_mrna_median_Zscores
## 4                        coadread_tcga_pub_mrna
## 5         coadread_tcga_pub_mrna_median_Zscores
head(select(mygeneticprofiles, genetic_profile_description),5) #Genetic profile descriptions, 5/11 shown
##                                                                                                                                                genetic_profile_description
## 1 Putative copy-number calls from GISTIC 2.0. Values: -2 = homozygous deletion; -1 = hemizygous deletion; 0 = neutral / no change; 1 = gain; 2 = high level amplification.
## 2                                                                                                                                        Expression levels (RNA Seq RPKM).
## 3                                                 mRNA z-Scores (RNA Seq RPKM) compared to the expression distribution of each gene tumors that are diploid for this gene.
## 4                                                                                                                                  Expression levels (Agilent microarray).
## 5                                           mRNA z-Scores (Agilent microarray) compared to the expression distribution of each gene tumors that are diploid for this gene.
# Make a list of genes of interest
AKPS_genes <- c('APC', 'TP53', 'SMAD4', 'KRAS')

# Get data slices for a specified list of genes, genetic profile and case list
#   Here, genetic profile selected is copy number (GISTIC values)
crcCopyNumber <- mygeneticprofiles[1,1] # reuse the profiles fetched above instead of querying the server again
print(crcCopyNumber)
## [1] "coadread_tcga_pub_gistic"
crcCopyData <- getProfileData(
  mycgds,
  AKPS_genes,
  crcCopyNumber,
  mycaselist
)

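# Reshape from wide (one column per gene) to long (one row per case-gene pair).
# pivot_longer() supersedes gather(); all_of() makes the use of an external
# character vector of column names explicit.
crcCopyData_tidy <- crcCopyData %>%
  pivot_longer(cols = all_of(AKPS_genes), names_to = "gene", values_to = "copy_number_GISTIC_value")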
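
The wide-to-long reshape discards row names. If the case IDs are stored as row names of the profile data (an assumption about what `getProfileData` returns), they can be moved into a column first. The sketch below illustrates this on a toy data frame; the case IDs, the values, and the names `toy_copy`, `toy_with_ids`, and `toy_tidy` are placeholders, not TCGA data.

```r
library(tidyr)
library(tibble)

# Toy stand-in for crcCopyData: rows are cases, columns are genes.
# Case IDs and GISTIC values are placeholders, not real results.
toy_copy <- data.frame(
  APC   = c(0, -1, 0),
  TP53  = c(-1, -1, 0),
  SMAD4 = c(-1, 0, 0),
  KRAS  = c(0, 1, 0),
  row.names = c("case_1", "case_2", "case_3")
)
genes <- c("APC", "TP53", "SMAD4", "KRAS")

# Move the row names into a regular column so they survive the reshape
toy_with_ids <- rownames_to_column(toy_copy, var = "case_id")

# One row per case-gene pair
toy_tidy <- pivot_longer(
  toy_with_ids,
  cols = all_of(genes),
  names_to = "gene",
  values_to = "copy_number_GISTIC_value"
)
print(toy_tidy)
```

Keeping `case_id` makes it possible to join the copy number calls to other per-case data later on.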

# documentation
help('cgdsr')
help('CGDS')

Histogram plot of our data

APC and KRAS are largely diploid in this CRC cohort (GISTIC = 0), while SMAD4 and TP53 show copy number variation a little over 50% of the time. When altered, APC, SMAD4, and TP53 are heterozygously deleted (GISTIC = -1), whereas KRAS is more often amplified (GISTIC = 1, a low-level gain). This pattern makes sense, given that APC, SMAD4, and TP53 are tumor suppressors and KRAS is an oncogene.
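
The plotting code is not part of this text. One way such a summary and plot could be produced is sketched below, assuming long-format data with the columns `gene` and `copy_number_GISTIC_value`. The data frame `toy_long` and its values are invented placeholders, so the numbers it yields say nothing about the cohort. Because GISTIC calls are discrete, the sketch uses a bar chart of counts per value rather than a binned histogram.

```r
library(dplyr)
library(ggplot2)

# Placeholder long-format data; the values are invented for illustration
# and are not TCGA results.
toy_long <- data.frame(
  gene = rep(c("APC", "KRAS"), each = 4),
  copy_number_GISTIC_value = c(0, 0, 0, -1, 0, 0, 1, 1)
)

# Fraction of cases per gene with a non-neutral call (GISTIC != 0)
altered <- toy_long %>%
  group_by(gene) %>%
  summarise(fraction_altered = mean(copy_number_GISTIC_value != 0, na.rm = TRUE))
print(altered)

# Count of cases at each GISTIC value, one panel per gene
p <- ggplot(toy_long, aes(x = factor(copy_number_GISTIC_value))) +
  geom_bar() +
  facet_wrap(~ gene) +
  labs(x = "GISTIC value", y = "Number of cases")
print(p)
```

Applied to `crcCopyData_tidy`, the `fraction_altered` column would give a direct check on statements such as "a little over 50% of the time".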