This vignette follows on from the Overview vignette and assumes that the user has already set up the SQLite database containing at least the CCLE data. It won’t work with the toy database!
Connect to the database and generate SQLiteConnection and dplyr connection objects for convenience.
dbpath <- '~/BigData/CellLineData/CancerCellLines.db'
#dbpath <- system.file('extdata/toy.db', package="CancerCellLines")
full_con <- setupSQLite(dbpath)
dplyr_con <- src_sqlite(full_con@dbname)
We are interested in looking at some important melanoma genes and the compounds that act through them. We can use the dplyr interface to easily populate a cell line vector with all of the melanoma cell lines.
#specify the genes
ex1_genes <- c('BRAF', 'NRAS', 'CRAF', 'TP53')
#get the melanoma cell lines
ex1_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% dplyr::filter(Site_primary=='skin') %>%
collect %>% as.data.frame
ex1_cell_lines <- ex1_cell_lines$CCLE_name
ex1_cell_lines[1:10]
## [1] "A101D_SKIN" "A2058_SKIN" "A375_SKIN" "BJHTERT_SKIN"
## [5] "C32_SKIN" "CHL1_SKIN" "CJM_SKIN" "COLO679_SKIN"
## [9] "COLO741_SKIN" "COLO783_SKIN"
#get BRAF and MEK inhibitors
ex1_drugs <- c('AZD6244','PLX4720','PD-0325901')
Next we can make data frames for the genes, drugs and cell lines that we’re interested in:
#make a tall frame
ex1_tall_df <- makeTallDataFrame(full_con, ex1_genes, ex1_cell_lines, ex1_drugs)
## Warning in makeTallDataFrame(full_con, ex1_genes, ex1_cell_lines,
## ex1_drugs): No response data for following cell lines: A101D_SKIN,
## BJHTERT_SKIN, CJM_SKIN, COLO783_SKIN, COLO792_SKIN, COLO800_SKIN,
## COLO818_SKIN, COLO829_SKIN, COLO849_SKIN, GRM_SKIN, HS600T_SKIN,
## HS688AT_SKIN, HS834T_SKIN, HS839T_SKIN, HS934T_SKIN, HS940T_SKIN,
## IGR1_SKIN, MELJUSO_SKIN, SH4_SKIN, SKMEL1_SKIN, SKMEL28_SKIN, SKMEL3_SKIN
ex1_tall_df
## Source: local data frame [680 x 5]
##
## CCLE_name ID Type original value
## (chr) (chr) (chr) (chr) (dbl)
## 1 A2058_SKIN BRAF affy 6.783971 6.783971
## 2 A375_SKIN BRAF affy 7.268306 7.268306
## 3 C32_SKIN BRAF affy 6.986622 6.986622
## 4 CHL1_SKIN BRAF affy 6.731626 6.731626
## 5 COLO679_SKIN BRAF affy 6.358811 6.358811
## 6 COLO741_SKIN BRAF affy 6.539123 6.539123
## 7 G361_SKIN BRAF affy 7.161463 7.161463
## 8 HMCB_SKIN BRAF affy 7.184167 7.184167
## 9 HS294T_SKIN BRAF affy 6.336898 6.336898
## 10 HS695T_SKIN BRAF affy 6.929732 6.929732
## .. ... ... ... ... ...
#convert this into a wide data frame
ex1_wide_df <- ex1_tall_df %>% makeWideFromTallDataFrame
ex1_wide_df
## Source: local data frame [40 x 18]
##
## CCLE_name AZD6244_resp PD-0325901_resp PLX4720_resp BRAF_affy
## (chr) (dbl) (dbl) (dbl) (dbl)
## 1 A2058_SKIN 6.460232 7.457248 6.155758 6.783971
## 2 A375_SKIN 7.089206 5.233161 6.692943 7.268306
## 3 C32_SKIN 6.007525 7.084126 5.793843 6.986622
## 4 CHL1_SKIN NA 5.093517 5.202432 6.731626
## 5 COLO679_SKIN 7.143573 5.209147 6.346865 6.358811
## 6 COLO741_SKIN 5.089469 7.678747 5.086532 6.539123
## 7 G361_SKIN 6.493421 7.978689 6.034205 7.161463
## 8 HMCB_SKIN 5.051563 5.046749 5.515497 7.184167
## 9 HS294T_SKIN 6.465442 7.284739 5.632942 6.336898
## 10 HS695T_SKIN 6.285137 7.307927 5.544855 6.929732
## .. ... ... ... ... ...
## Variables not shown: BRAF_cn (dbl), BRAF_cosmicclp (dbl), BRAF_hybcap
## (dbl), CRAF_cosmicclp (dbl), CRAF_hybcap (dbl), NRAS_affy (dbl), NRAS_cn
## (dbl), NRAS_cosmicclp (dbl), NRAS_hybcap (dbl), TP53_affy (dbl), TP53_cn
## (dbl), TP53_cosmicclp (dbl), TP53_hybcap (dbl)
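The tall-to-wide step can be illustrated without the database. The following is a minimal base R sketch, not the implementation of makeWideFromTallDataFrame: it takes four values copied from the outputs above, builds column names of the form ID_Type (a naming rule inferred from the printed wide data frame), and spreads them with reshape().

```r
# Four rows in the tall layout, values copied from the printed outputs above.
tall <- data.frame(
  CCLE_name = c("A2058_SKIN", "A375_SKIN", "A2058_SKIN", "A375_SKIN"),
  ID        = c("BRAF", "BRAF", "PLX4720", "PLX4720"),
  Type      = c("affy", "affy", "resp", "resp"),
  value     = c(6.783971, 7.268306, 6.155758, 6.692943),
  stringsAsFactors = FALSE
)

# Assumed naming rule: one column per feature, named <ID>_<Type>.
tall$feature <- paste(tall$ID, tall$Type, sep = "_")

# Spread to one row per cell line and one column per feature.
wide <- reshape(tall[, c("CCLE_name", "feature", "value")],
                idvar = "CCLE_name", timevar = "feature",
                direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix.
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

Each cell line ends up on a single row, which is the shape that modelling functions generally expect.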
#compare the drug activities
pairs(~AZD6244_resp+PLX4720_resp+`PD-0325901_resp`, ex1_wide_df)
Whilst the wide data frame is useful for modelling, the tall data frame is more useful for plotting since it’s in a tidy format (long and thin). Let’s make a heatmap using the built-in plotHeatmap function:
#make a heatmap!
plotHeatmap(ex1_tall_df)
Cell lines are plotted as rows and features as columns. The response data is always plotted on the left, with the most sensitive cell lines at the bottom in green and the least sensitive at the top in red. Affy and copy number data are plotted from blue (low) to red (high), whilst mutation data is plotted in light colours for wild type and dark colours for mutant.
We also have some degree of control over the order of the x and y axes. For example, if we want the cell lines to be ordered on the response to PLX4720, we can specify this:
plotHeatmap(ex1_tall_df, order_feature='PLX4720_resp')
## Using user specified feature to order cell lines
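Ordering rows by one feature amounts to sorting the cell lines on that feature's values and using the result as the factor level order for the plot axis. The sketch below shows the idea in base R with four PLX4720_resp values copied from the wide data frame output above. How plotHeatmap applies order_feature internally is not shown in this vignette, so the ascending direction used here is an assumption.

```r
# PLX4720 response for four cell lines, copied from the wide data frame output.
plx_resp <- c(A2058_SKIN = 6.155758,
              A375_SKIN  = 6.692943,
              C32_SKIN   = 5.793843,
              CHL1_SKIN  = 5.202432)

# Sort the cell lines by response (ascending; the direction is an assumption).
ordered_lines <- names(sort(plx_resp))

# A factor with these levels fixes the axis order in most plotting systems.
cell_line_axis <- factor(names(plx_resp), levels = ordered_lines)
levels(cell_line_axis)
```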
This time we are interested in how the expression and mutation status of EGFR interacts with the response to the EGFR inhibitor erlotinib. Let’s dive right in using the makeRespVsGeneticDataFrame function to make a data frame suitable for the plotRespVsGeneticHist and plotRespVsGeneticPoint functions:
#get all cell lines
ex2_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>%
collect %>% as.data.frame
ex2_cell_lines <- ex2_cell_lines$CCLE_name
#make a data frame for the affy analysis
df <- makeRespVsGeneticDataFrame(full_con, gene='EGFR',
cell_lines=ex2_cell_lines,
drug='Erlotinib',
data_types = 'affy',
drug_df = NULL)
#histogram of Erlotinib response coloured by EGFR expression
plotRespVsGeneticHist(df, 'affy', FALSE)
## Warning: Non Lab interpolation is deprecated
#scatter plot of EGFR expression vs Erlotinib response
plotRespVsGeneticPoint(df, 'affy', FALSE)
## Warning: Non Lab interpolation is deprecated
## Warning: Removed 228 rows containing missing values (stat_smooth).
## Warning: Removed 228 rows containing missing values (geom_point).
Now let’s do a similar analysis with PLX4720 and BRAF mutation status:
#make a data frame for the hybcap (mutation) analysis
df <- makeRespVsGeneticDataFrame(full_con, gene='BRAF',
cell_lines=ex2_cell_lines,
drug='PLX4720',
data_types = 'hybcap',
drug_df = NULL)
#histogram of PLX4720 response coloured by BRAF mutation status
plotRespVsGeneticHist(df, 'hybcap', FALSE)
#boxplot of PLX4720 response grouped by BRAF mutation status
plotRespVsGeneticPoint(df, 'hybcap', FALSE)
## Warning: Removed 326 rows containing non-finite values (stat_boxplot).
## Warning: Removed 326 rows containing missing values (geom_point).
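The comparison behind the mutation-status plot can be summarised numerically by grouping the response on the mutation flag. The sketch below uses made-up values, not CCLE data, and assumes the hybcap value is coded 0 for wild type and 1 for mutant. It also shows why cell lines with no response value are dropped, as reported in the warnings above.

```r
# Hypothetical toy data: response values and a 0/1 mutation flag.
# The 0 = wild type, 1 = mutant coding is an assumption.
toy <- data.frame(
  resp   = c(5.1, 5.3, NA, 6.8, 7.2, 6.9),
  hybcap = c(0, 0, 0, 1, 1, 1)
)

status <- factor(toy$hybcap, levels = c(0, 1),
                 labels = c("wild type", "mutant"))

# Median response per group, ignoring cell lines with no response data.
med <- tapply(toy$resp, status, median, na.rm = TRUE)

# Number of cell lines that a plot would drop for lack of a response value.
n_missing <- sum(is.na(toy$resp))
med
n_missing
```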
The GeneticVsGenetic suite of functions and plots allows genetic features to be compared against each other, rather than against a response variable. For example, looking at SMARCA4 in lung cancer:
#get lung cell lines
ex4_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% filter(Site_primary == 'lung') %>%
collect %>% as.data.frame
ex4_cell_lines <- ex4_cell_lines$CCLE_name
#make the data frame
gvg.df <- makeGeneticVsGeneticDataFrame(full_con,
cell_lines=ex4_cell_lines,
gene1='SMARCA4',
data_type1='hybcap',
gene2='SMARCA4',
data_type2='affy')
#view the data frame
head(gvg.df)
## Source: local data frame [6 x 12]
##
## CCLE_name gene1 feature_type1 feature_name1 feature_value1
## (chr) (chr) (chr) (chr) (dbl)
## 1 A549_LUNG SMARCA4 hybcap SMARCA4_hybcap 1
## 2 CAL12T_LUNG SMARCA4 hybcap SMARCA4_hybcap 1
## 3 DMS53_LUNG SMARCA4 hybcap SMARCA4_hybcap 1
## 4 DV90_LUNG SMARCA4 hybcap SMARCA4_hybcap 1
## 5 EPLC272H_LUNG SMARCA4 hybcap SMARCA4_hybcap 1
## 6 HCC1195_LUNG SMARCA4 hybcap SMARCA4_hybcap 1
## Variables not shown: feature_original1 (chr), gene2 (chr), feature_type2
## (chr), feature_name2 (chr), feature_value2 (dbl), feature_original2
## (chr), tissue (chr)
#do the plot
plotGeneticVsGeneticPoint(gvg.df)
#all in one go with axes swapped
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
gene2='SMARCA4', data_type2='hybcap') %>% plotGeneticVsGeneticPoint()
#two continuous
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
gene2='SMARCA4', data_type2='cn') %>% plotGeneticVsGeneticPoint()
## Warning: Non Lab interpolation is deprecated
#two discrete
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='hybcap',
gene2='KRAS', data_type2='hybcap') %>% plotGeneticVsGeneticPoint()
#also plot by cell line with one feature as the y axis and another as fill colour
#continuous + discrete
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
gene2='SMARCA4', data_type2='hybcap') %>% plotGeneticVsGeneticHist()
#continuous + continuous
makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines[1:25], gene1='SMARCA4', data_type1='affy',
gene2='SMARCA4', data_type2='cn') %>% plotGeneticVsGeneticHist(label_option = TRUE)
## Warning: Non Lab interpolation is deprecated
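For the two-discrete case, the information in the plot is essentially a 2 x 2 table of mutation calls. A minimal base R sketch is given here with invented 0/1 flags for six hypothetical cell lines. The values are not taken from CCLE, and the 0/1 coding is an assumption.

```r
# Hypothetical mutation flags for six cell lines (0 = wild type, 1 = mutant).
smarca4 <- c(1, 1, 0, 0, 0, 1)
kras    <- c(1, 0, 1, 0, 0, 1)

# Cross-tabulate the two features: rows are SMARCA4, columns are KRAS.
tab <- table(SMARCA4 = smarca4, KRAS = kras)
tab

# Count of cell lines mutant for both genes.
both_mutant <- tab["1", "1"]
```

A table like this could be passed to fisher.test() to check for co-occurrence, although with counts this small the result would not be informative.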
There is also a series of shiny* functions for interactive visualisations. Custom response data can be fed into the shinyRespVsGeneticApp function as follows:
dietlein_data_fn <- system.file("extdata", "Dietlein2014_supp_table_1.txt", package = "CancerCellLines")
dietlein_data <- read.table(dietlein_data_fn, header=TRUE, sep='\t', stringsAsFactors=FALSE)
head(dietlein_data)
dietlein_data <- dietlein_data %>%
filter(nchar(CCLE_name) > 1) %>%
transmute(unified_id=CCLE_name, compound_id='KU60648', endpoint='pGI50', original=GI50, value=9-log10(GI50))
head(dietlein_data)
full_con <- setupSQLite('~/BigData/CellLineData/CancerCellLines.db')
shinyRespVsGeneticApp(con=full_con, drug_df=dietlein_data)
Alternatively, if a custom dataset isn’t defined, the CCLE response data will be used:
shinyRespVsGeneticApp(con=full_con)
If you are just interested in the GeneticVsGenetic analysis functions then you can launch the shinyGeneticVsGeneticApp shiny app:
shinyGeneticVsGeneticApp(con=full_con)
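The transmute() call in the Dietlein example computes the pGI50 as 9-log10(GI50). This equals the usual definition, minus log10 of the molar concentration, only if GI50 is recorded in nanomolar. The units of the supplementary table are not stated in this vignette, so treat nanomolar as an assumption. A short worked check with made-up concentrations:

```r
# Hypothetical GI50 values, assumed to be in nanomolar (nM).
gi50_nM <- c(1, 100, 10000)

# Conversion used in the vignette's transmute() call.
pGI50 <- 9 - log10(gi50_nM)

# The textbook definition: negative log10 of the molar concentration.
pGI50_from_molar <- -log10(gi50_nM * 1e-9)

pGI50
all.equal(pGI50, pGI50_from_molar)
```

Higher pGI50 values correspond to lower GI50 concentrations, i.e. more potent growth inhibition.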
The RespVsResp functions compare the responses to two or more compounds across cell lines:
#get all cell lines
ex5_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>%
collect %>% as.data.frame
ex5_cell_lines <- ex5_cell_lines$CCLE_name
#make a data frame
df <- makeRespVsRespDataFrame(full_con,
cell_lines=ex5_cell_lines,
drugs=c('Erlotinib', 'AZD6244'),
tissue_info = 'ccle')
head(df)
## CCLE_name ID Type original value
## 1 1321N1_CENTRAL_NERVOUS_SYSTEM AZD6244 resp <NA> NA
## 2 22RV1_PROSTATE AZD6244 resp 6.374595234 5.195547
## 3 42MGBA_CENTRAL_NERVOUS_SYSTEM AZD6244 resp <NA> NA
## 4 5637_URINARY_TRACT AZD6244 resp 2.747059107 5.561132
## 5 639V_URINARY_TRACT AZD6244 resp 0.13161619 6.880691
## 6 697_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE AZD6244 resp <NA> NA
## native_id tissue subtype1
## 1 1321N1 central_nervous_system glioma
## 2 22Rv1 prostate carcinoma
## 3 42-MG-BA central_nervous_system glioma
## 4 5637 urinary_tract carcinoma
## 5 639-V urinary_tract carcinoma
## 6 697 haematopoietic_and_lymphoid_tissue lymphoid_neoplasm
## subtype2
## 1 astrocytoma
## 2 NS
## 3 astrocytoma_Grade_IV
## 4 NS
## 5 transitional_cell_carcinoma
## 6 acute_lymphoblastic_B_cell_leukaemia
#makes a wide data frame
wide.df <- df %>% makeWideFromRespVsRespDataFrame()
head(wide.df)
## CCLE_name native_id
## 1 1321N1_CENTRAL_NERVOUS_SYSTEM 1321N1
## 2 22RV1_PROSTATE 22Rv1
## 3 42MGBA_CENTRAL_NERVOUS_SYSTEM 42-MG-BA
## 4 5637_URINARY_TRACT 5637
## 5 639V_URINARY_TRACT 639-V
## 6 697_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE 697
## tissue subtype1
## 1 central_nervous_system glioma
## 2 prostate carcinoma
## 3 central_nervous_system glioma
## 4 urinary_tract carcinoma
## 5 urinary_tract carcinoma
## 6 haematopoietic_and_lymphoid_tissue lymphoid_neoplasm
## subtype2 AZD6244 Erlotinib
## 1 astrocytoma NA 6.106130
## 2 NS 5.195547 NA
## 3 astrocytoma_Grade_IV NA 5.800518
## 4 NS 5.561132 5.079719
## 5 transitional_cell_carcinoma 6.880691 NA
## 6 acute_lymphoblastic_B_cell_leukaemia NA 5.068745
#now do some plots
plotRespVsRespWaterfall(filter(df, grepl('Erlotinib', ID)))
## Loading required package: RColorBrewer
plotRespVsRespDensity(df)
## Warning: Removed 243 rows containing non-finite values (stat_density).
## Warning: Removed 233 rows containing non-finite values (stat_density).
plotRespVsRespPairs(df)
## Warning: Removed 334 rows containing missing values (stat_smooth).
## Warning: Removed 334 rows containing missing values (stat_smooth).
## Warning: Removed 334 rows containing missing values (geom_point).
## Warning: Removed 334 rows containing missing values (geom_point).
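The "Removed ... rows" warnings arise because many cell lines were screened against only one of the two compounds. The six rows printed by head(wide.df) above already show the pattern. The sketch below rebuilds those rows by hand (values copied from that output) and counts how many would survive in a pairwise plot.

```r
# The six rows shown by head(wide.df), response columns only.
wide_head <- data.frame(
  CCLE_name = c("1321N1_CENTRAL_NERVOUS_SYSTEM", "22RV1_PROSTATE",
                "42MGBA_CENTRAL_NERVOUS_SYSTEM", "5637_URINARY_TRACT",
                "639V_URINARY_TRACT", "697_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE"),
  AZD6244   = c(NA, 5.195547, NA, 5.561132, 6.880691, NA),
  Erlotinib = c(6.106130, NA, 5.800518, 5.079719, NA, 5.068745),
  stringsAsFactors = FALSE
)

# Missing values per compound.
n_missing <- colSums(is.na(wide_head[, c("AZD6244", "Erlotinib")]))

# Rows with a value for both compounds are the only ones a pairs plot can draw.
paired <- complete.cases(wide_head[, c("AZD6244", "Erlotinib")])
n_missing
wide_head$CCLE_name[paired]
```

Only one of these six cell lines has data for both compounds, so five would be dropped from a scatter plot of one response against the other.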
There is also a shiny app for this analysis:
shinyRespVsRespApp(con=full_con)
To do:
- GeneticVsGenetic
- RespVsResp
sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.9.5 (Mavericks)
##
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-2 CancerCellLines_0.6.6 reshape2_1.4.1
## [4] RSQLite_1.0.0 DBI_0.3.1 shiny_0.12.2
## [7] ggplot2_1.0.1 scales_0.3.0 tidyr_0.3.1
## [10] readr_0.2.2 readxl_0.1.0 dplyr_0.4.3
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.2 knitr_1.11 magrittr_1.5 MASS_7.3-45
## [5] munsell_0.4.2 xtable_1.8-0 colorspace_1.2-6 R6_2.1.1
## [9] stringr_1.0.0 plyr_1.8.3 tools_3.2.2 parallel_3.2.2
## [13] grid_3.2.2 gtable_0.1.2 htmltools_0.2.6 lazyeval_0.1.10
## [17] yaml_2.1.13 assertthat_0.1 digest_0.6.8 formatR_1.2.1
## [21] mime_0.4 evaluate_0.8 rmarkdown_0.8.1 labeling_0.3
## [25] stringi_1.0-1 httpuv_1.3.3 proto_0.3-10