2019-05-24

The Cancer Genome Atlas (TCGA)

R and RStudio

TCGA genomic data

  • Pan-Cancer Atlas
  • RTCGAToolbox package
  • cgdsr package
  • GDC Data Portal

Pan-Cancer Atlas

Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images. (Cell Rep. 2018 Apr 3;23(1):181-193.e7)

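# Load the packages before using them
library(dplyr)
library(ggplot2)
# Read the tab-delimited TIL map table
Til_map <- read.delim('TILMap_TableS1.txt')
# Box plot of leukocyte fraction for each study
ggplot(Til_map, aes(x = Study, y = Leukocyte.Fraction)) +
  geom_boxplot()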
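
The box plot can be complemented by a per-study numeric summary. The following is a minimal sketch, assuming the `Study` and `Leukocyte.Fraction` columns used in the plot; `toy_til` and its values are hypothetical stand-ins for the real table, invented only so the example runs on its own.

```r
library(dplyr)

# Hypothetical toy data standing in for TILMap_TableS1.txt (values are invented)
toy_til <- data.frame(
  Study = c("BRCA", "BRCA", "BRCA", "LUAD", "LUAD", "SKCM"),
  Leukocyte.Fraction = c(0.10, 0.20, 0.30, 0.25, 0.35, NA)
)

# Median leukocyte fraction per study, ignoring missing values,
# sorted from the highest to the lowest median
til_summary <- toy_til %>%
  filter(!is.na(Leukocyte.Fraction)) %>%
  group_by(Study) %>%
  summarise(n = n(), median_fraction = median(Leukocyte.Fraction)) %>%
  arrange(desc(median_fraction))

print(til_summary)
```

Replacing `toy_til` with `Til_map` should give the same kind of summary for the real data, provided the column names match.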

RTCGAToolbox

library('RTCGAToolbox')
getFirehoseDatasets()[1:10]
##  [1] "ACC"      "BLCA"     "BRCA"     "CESC"     "CHOL"     "COADREAD"
##  [7] "COAD"     "DLBC"     "ESCA"     "FPPP"
getFirehoseRunningDates(last = NULL)[1:3]
## [1] "20160128" "20151101" "20150821"

Read data from Firehose

# Download BRCA clinical and normalized RNA-seq data from the 20160128 run
readData <- getFirehoseData(dataset = "BRCA",
                            runDate = "20160128",
                            forceDownload = TRUE,
                            Clinic = TRUE,
                            RNASeq2GeneNorm = TRUE)

Get clinical data

clin <- getData(readData, "clinical")
clin[1:2,1:2]
##              Composite Element REF years_to_birth
## tcga.5l.aat0                 value             42
## tcga.5l.aat1                 value             63

Get messenger RNA expression data

mRNA <- t(getData(readData, 'RNASeq2GeneNorm'))
mRNA[1:2, 1:2]
##                                  A1BG A1CF
## TCGA-3C-AAAU-01A-11R-A41B-07 197.0897    0
## TCGA-3C-AALI-01A-11R-A41B-07 237.3844    0
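
The expression rows are named by full sample barcodes, while the clinical rows shown earlier use lower-case, dot-separated patient identifiers. The following is a minimal sketch of deriving a matching key from the barcodes. It assumes the clinical row names are the first 12 barcode characters in lower case with dots, as the printed output suggests; `patient_id`, `sample_type`, and `clin_key` are illustrative names.

```r
# Barcodes copied from the expression matrix row names shown above
barcodes <- c("TCGA-3C-AAAU-01A-11R-A41B-07",
              "TCGA-3C-AALI-01A-11R-A41B-07")

# Characters 1-12 identify the patient (project, tissue source site, participant)
patient_id <- substr(barcodes, 1, 12)

# Characters 14-15 hold the sample type code ("01" is a primary solid tumor)
sample_type <- substr(barcodes, 14, 15)

# Assumed clinical row-name style: lower case, dots instead of hyphens
clin_key <- tolower(gsub("-", ".", patient_id, fixed = TRUE))

print(data.frame(barcode = barcodes, sample_type = sample_type, clin_key = clin_key))
```

With the real objects, something like `clin[clin_key, ]` could then pull the clinical rows for the samples in `mRNA`, after checking that every key is actually present in `rownames(clin)`.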

cgdsr

library(cgdsr)
# Create CGDS object
mycgds = CGDS("https://www.cbioportal.org/")

# Get list of cancer studies at server
getCancerStudies(mycgds)[4:5,1]
## [1] "all_stjude_2016" "laml_tcga_pub"

GDC Data Portal

SessionInfo

R version 3.6.0 (2019-04-26)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: pander(v.0.6.3), cgdsr(v.1.2.10) and RTCGAToolbox(v.2.14.0)

loaded via a namespace (and not attached): Rcpp(v.1.0.1), compiler(v.3.6.0), GenomeInfoDb(v.1.20.0), XVector(v.0.24.0), R.methodsS3(v.1.7.1), bitops(v.1.0-6), tools(v.3.6.0), zlibbioc(v.1.30.0), digest(v.0.6.18), evaluate(v.0.13), lattice(v.0.20-38), Matrix(v.1.2-17), DelayedArray(v.0.10.0), yaml(v.2.2.0), parallel(v.3.6.0), xfun(v.0.6), GenomeInfoDbData(v.1.2.1), stringr(v.1.4.0), knitr(v.1.22), RCircos(v.1.2.1), S4Vectors(v.0.22.0), IRanges(v.2.18.0), stats4(v.3.6.0), grid(v.3.6.0), Biobase(v.2.44.0), data.table(v.1.12.2), XML(v.3.98-1.19), survival(v.2.44-1.1), BiocParallel(v.1.18.0), rmarkdown(v.1.12), RJSONIO(v.1.3-1.1), limma(v.3.40.0), magrittr(v.1.5), matrixStats(v.0.54.0), htmltools(v.0.3.6), BiocGenerics(v.0.30.0), GenomicRanges(v.1.36.0), splines(v.3.6.0), RaggedExperiment(v.1.8.0), SummarizedExperiment(v.1.14.0), stringi(v.1.4.3), RCurl(v.1.95-4.12) and R.oo(v.1.22.0)