Introduction

This vignette follows on from the Overview vignette and assumes that the user has already set up the SQLite database containing at least the CCLE data. It will not work with the toy database.

Setup

Connect to the database and generate SQLiteConnection and dplyr connection objects for convenience.

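    library(CancerCellLines)
    library(dplyr)

    #path to the full database (the toy database is not sufficient for this vignette)
    dbpath <- '~/BigData/CellLineData/CancerCellLines.db'
    #dbpath <- system.file('extdata/toy.db', package="CancerCellLines")
    full_con <- setupSQLite(dbpath)
    dplyr_con <- src_sqlite(full_con@dbname)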
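
Because every example below depends on the full database being present, it can help to fail early with a clear message if the file is missing. The helper in this sketch, check_db_path, is hypothetical and not part of CancerCellLines; it only uses base R and is demonstrated on a temporary file standing in for the real database.

```r
# Hypothetical helper, not part of CancerCellLines: stop early if the
# SQLite file is missing, otherwise return the expanded path.
check_db_path <- function(path) {
  full_path <- path.expand(path)
  if (!file.exists(full_path)) {
    stop("Database file not found: ", full_path, call. = FALSE)
  }
  full_path
}

# Demonstrate on a temporary file standing in for the real database
tmp_db <- tempfile(fileext = ".db")
file.create(tmp_db)
checked <- check_db_path(tmp_db)

# A missing file produces an informative error message
missing_result <- tryCatch(
  check_db_path(file.path(tempdir(), "no_such_file.db")),
  error = function(e) conditionMessage(e)
)
```

The returned path could then be passed on to setupSQLite as in the chunk above.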

Example 1: Melanoma heatmap with MEK and BRAF inhibitors

We are interested in looking at some important melanoma genes and the compounds that act through them. We can use the dplyr interface to populate a cell line vector with all of the cell lines whose primary site is skin, which we use here as the melanoma set.

    #specify the genes
    ex1_genes <- c('BRAF', 'NRAS', 'CRAF', 'TP53')
  
    #get the melanoma cell lines (all lines with primary site 'skin')
    ex1_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% dplyr::filter(Site_primary=='skin') %>%
       collect %>% as.data.frame
    ex1_cell_lines <- ex1_cell_lines$CCLE_name
    ex1_cell_lines[1:10]
##  [1] "A101D_SKIN"   "A2058_SKIN"   "A375_SKIN"    "BJHTERT_SKIN"
##  [5] "C32_SKIN"     "CHL1_SKIN"    "CJM_SKIN"     "COLO679_SKIN"
##  [9] "COLO741_SKIN" "COLO783_SKIN"
    #get BRAF and MEK inhibitors
    ex1_drugs <- c('AZD6244','PLX4720','PD-0325901')

Next we can make data frames for the genes, drugs and cell lines that we’re interested in:

    #make a tall frame
    ex1_tall_df <- makeTallDataFrame(full_con, ex1_genes, ex1_cell_lines, ex1_drugs)
## Warning in makeTallDataFrame(full_con, ex1_genes, ex1_cell_lines,
## ex1_drugs): No response data for following cell lines: A101D_SKIN,
## BJHTERT_SKIN, CJM_SKIN, COLO783_SKIN, COLO792_SKIN, COLO800_SKIN,
## COLO818_SKIN, COLO829_SKIN, COLO849_SKIN, GRM_SKIN, HS600T_SKIN,
## HS688AT_SKIN, HS834T_SKIN, HS839T_SKIN, HS934T_SKIN, HS940T_SKIN,
## IGR1_SKIN, MELJUSO_SKIN, SH4_SKIN, SKMEL1_SKIN, SKMEL28_SKIN, SKMEL3_SKIN
    ex1_tall_df
## Source: local data frame [680 x 5]
## 
##       CCLE_name    ID  Type original    value
##           (chr) (chr) (chr)    (chr)    (dbl)
## 1    A2058_SKIN  BRAF  affy 6.783971 6.783971
## 2     A375_SKIN  BRAF  affy 7.268306 7.268306
## 3      C32_SKIN  BRAF  affy 6.986622 6.986622
## 4     CHL1_SKIN  BRAF  affy 6.731626 6.731626
## 5  COLO679_SKIN  BRAF  affy 6.358811 6.358811
## 6  COLO741_SKIN  BRAF  affy 6.539123 6.539123
## 7     G361_SKIN  BRAF  affy 7.161463 7.161463
## 8     HMCB_SKIN  BRAF  affy 7.184167 7.184167
## 9   HS294T_SKIN  BRAF  affy 6.336898 6.336898
## 10  HS695T_SKIN  BRAF  affy 6.929732 6.929732
## ..          ...   ...   ...      ...      ...
    #convert this into a wide data frame
    ex1_wide_df <- ex1_tall_df %>% makeWideFromTallDataFrame
    ex1_wide_df
## Source: local data frame [40 x 18]
## 
##       CCLE_name AZD6244_resp PD-0325901_resp PLX4720_resp BRAF_affy
##           (chr)        (dbl)           (dbl)        (dbl)     (dbl)
## 1    A2058_SKIN     6.460232        7.457248     6.155758  6.783971
## 2     A375_SKIN     7.089206        5.233161     6.692943  7.268306
## 3      C32_SKIN     6.007525        7.084126     5.793843  6.986622
## 4     CHL1_SKIN           NA        5.093517     5.202432  6.731626
## 5  COLO679_SKIN     7.143573        5.209147     6.346865  6.358811
## 6  COLO741_SKIN     5.089469        7.678747     5.086532  6.539123
## 7     G361_SKIN     6.493421        7.978689     6.034205  7.161463
## 8     HMCB_SKIN     5.051563        5.046749     5.515497  7.184167
## 9   HS294T_SKIN     6.465442        7.284739     5.632942  6.336898
## 10  HS695T_SKIN     6.285137        7.307927     5.544855  6.929732
## ..          ...          ...             ...          ...       ...
## Variables not shown: BRAF_cn (dbl), BRAF_cosmicclp (dbl), BRAF_hybcap
##   (dbl), CRAF_cosmicclp (dbl), CRAF_hybcap (dbl), NRAS_affy (dbl), NRAS_cn
##   (dbl), NRAS_cosmicclp (dbl), NRAS_hybcap (dbl), TP53_affy (dbl), TP53_cn
##   (dbl), TP53_cosmicclp (dbl), TP53_hybcap (dbl)
    #compare the drug activities
    pairs(~AZD6244_resp+PLX4720_resp+`PD-0325901_resp`, ex1_wide_df)
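
The tall-to-wide reshaping can be pictured with base R alone. This is only an illustration on made-up values, not a description of how makeWideFromTallDataFrame is implemented. It assumes, as the column names in the output above suggest, that each wide column is named by joining ID and Type with an underscore.

```r
# Toy tall data frame with the same column layout as ex1_tall_df
# (values are made up for illustration)
tall <- data.frame(
  CCLE_name = c("A375_SKIN", "A375_SKIN", "C32_SKIN", "C32_SKIN"),
  ID        = c("BRAF", "PLX4720", "BRAF", "PLX4720"),
  Type      = c("affy", "resp", "affy", "resp"),
  value     = c(7.27, 6.69, 6.99, 5.79),
  stringsAsFactors = FALSE
)

# Build one feature name per row, e.g. BRAF_affy or PLX4720_resp
tall$feature <- paste(tall$ID, tall$Type, sep = "_")

# One row per cell line, one column per feature
wide <- reshape(tall[, c("CCLE_name", "feature", "value")],
                idvar = "CCLE_name", timevar = "feature",
                direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

Each cell line now occupies a single row, which is the shape that modelling functions generally expect.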

Whilst the wide data frame is useful for modelling, the tall data frame is more useful for plotting since it’s in a tidy format (long and thin). Let’s make a heatmap using the built-in plotHeatmap function:

    #make a heatmap!
    plotHeatmap(ex1_tall_df)

Cell lines are plotted as rows and features as columns. The response data are always plotted to the left, with the most sensitive cell lines at the bottom in green and the least sensitive at the top in red. Affy and copy number data are plotted from blue (low) to red (high), whilst mutation data are plotted as light colours for wild type and dark colours for mutant.

We also have some degree of control over the order of the x and y axes. For example, if we want the cell lines to be ordered on the response to PLX4720, we can specify this:

    plotHeatmap(ex1_tall_df, order_feature='PLX4720_resp')
## Using user specified feature to order cell lines
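
A common way to achieve this kind of ordering in a tidy data frame is to sort the cell lines by the chosen feature and store that order as factor levels, which plotting code then follows. The sketch below uses made-up values and shows the general technique only; it is an assumption, not a description of what plotHeatmap does internally.

```r
# Toy tall data: two features for three cell lines (values are made up)
toy <- data.frame(
  CCLE_name = rep(c("A375_SKIN", "C32_SKIN", "HMCB_SKIN"), times = 2),
  feature   = rep(c("PLX4720_resp", "BRAF_affy"), each = 3),
  value     = c(6.7, 5.8, 5.5, 7.3, 7.0, 7.2),
  stringsAsFactors = FALSE
)

# Take the ordering feature and sort cell lines by its value (ascending)
ord <- toy[toy$feature == "PLX4720_resp", ]
line_order <- ord$CCLE_name[order(ord$value)]

# Store the order as factor levels so that a plot axis would follow it
toy$CCLE_name <- factor(toy$CCLE_name, levels = line_order)
levels(toy$CCLE_name)
```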

Example 2: EGFR inhibitors vs EGFR mutation status or expression

This time we are interested in how the expression and mutation status of EGFR interacts with the response to the EGFR inhibitor, erlotinib. Let’s dive right in using the makeRespVsGeneticDataFrame function to make a data frame suitable for the plotRespVsGeneticHist and plotRespVsGeneticScatter functions:

    #get all cell lines
    ex2_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% 
       collect %>% as.data.frame
    ex2_cell_lines <- ex2_cell_lines$CCLE_name
    
    #make a data frame for the affy analysis
    df <- makeRespVsGeneticDataFrame(full_con, gene='EGFR',
                               cell_lines=ex2_cell_lines,
                               drug='Erlotinib',
                               data_types = 'affy',
                               drug_df = NULL) 
    
    #histogram of Erlotinib response coloured by EGFR expression
    plotRespVsGeneticHist(df, 'affy', FALSE)
## Warning: Non Lab interpolation is deprecated

    #scatter plot of EGFR expression vs Erlotinib response
    plotRespVsGeneticPoint(df, 'affy', FALSE)
## Warning: Non Lab interpolation is deprecated
## Warning: Removed 228 rows containing missing values (stat_smooth).
## Warning: Removed 228 rows containing missing values (geom_point).
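
The warnings about 228 removed rows most likely reflect cell lines that have expression data but no Erlotinib response, so they cannot be placed on the scatter plot. The sketch below, on made-up values with hypothetical column names, shows how such rows could be counted and how a rank correlation could be computed on the complete pairs only.

```r
# Made-up expression and response values; NA marks untested cell lines.
# Column names here are hypothetical, not those returned by the package.
toy <- data.frame(
  CCLE_name  = c("LINE_A", "LINE_B", "LINE_C", "LINE_D", "LINE_E"),
  expression = c(5.1, 7.8, 6.2, 9.0, 6.9),
  response   = c(5.0, NA, 5.4, 6.8, NA),
  stringsAsFactors = FALSE
)

# Rows usable for a scatter plot need both values present
keep <- complete.cases(toy[, c("expression", "response")])
n_dropped <- sum(!keep)

# Spearman correlation on the complete pairs only
rho <- cor(toy$expression[keep], toy$response[keep], method = "spearman")
```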

Example 3: BRAF inhibitors vs BRAF mutation status

Now let’s do a similar analysis with PLX4720 and BRAF mutation status:

    #make a data frame for the hybcap (mutation) analysis
    df <- makeRespVsGeneticDataFrame(full_con, gene='BRAF',
                               cell_lines=ex2_cell_lines,
                               drug='PLX4720',
                               data_types = 'hybcap',
                               drug_df = NULL) 
    
    #histogram of PLX4720 response coloured by BRAF mutation status
    plotRespVsGeneticHist(df, 'hybcap', FALSE)

    #PLX4720 response grouped by BRAF mutation status
    plotRespVsGeneticPoint(df, 'hybcap', FALSE)
## Warning: Removed 326 rows containing non-finite values (stat_boxplot).
## Warning: Removed 326 rows containing missing values (geom_point).
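
To put a number on the difference seen between mutant and wild-type cell lines, the response can be summarised per group and compared with a rank-based test. This is a minimal sketch on made-up values with hypothetical column names; it is not part of the package.

```r
# Made-up response values for wild-type (wt) and mutant (mut) cell lines
toy <- data.frame(
  mutation = c("wt", "wt", "wt", "mut", "mut", "mut", "wt"),
  response = c(5.0, 5.2, NA, 6.5, 7.1, 6.8, 5.1),
  stringsAsFactors = FALSE
)

# Drop cell lines without a response value, as the plots do
toy <- toy[!is.na(toy$response), ]

# Median response per mutation group
group_medians <- tapply(toy$response, toy$mutation, median)

# Wilcoxon rank-sum test comparing the two groups
test_result <- wilcox.test(response ~ mutation, data = toy)
group_medians
```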

Example 4: Comparing SMARCA4 expression in SMARCA4 mutated cell lines to wildtype

The GeneticVsGenetic suite of functions and plots allows genetic features to be compared against each other, rather than against a response variable. For example, looking at SMARCA4 in lung cancer:

    #get lung cell lines
    ex4_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% filter(Site_primary == 'lung') %>%
       collect %>% as.data.frame
    ex4_cell_lines <- ex4_cell_lines$CCLE_name

    #make the data frame
    gvg.df <- makeGeneticVsGeneticDataFrame(full_con, 
                                            cell_lines=ex4_cell_lines,
                                            gene1='SMARCA4',
                                            data_type1='hybcap',
                                            gene2='SMARCA4',
                                            data_type2='affy') 
    
    #view the data frame
    head(gvg.df)
## Source: local data frame [6 x 12]
## 
##       CCLE_name   gene1 feature_type1  feature_name1 feature_value1
##           (chr)   (chr)         (chr)          (chr)          (dbl)
## 1     A549_LUNG SMARCA4        hybcap SMARCA4_hybcap              1
## 2   CAL12T_LUNG SMARCA4        hybcap SMARCA4_hybcap              1
## 3    DMS53_LUNG SMARCA4        hybcap SMARCA4_hybcap              1
## 4     DV90_LUNG SMARCA4        hybcap SMARCA4_hybcap              1
## 5 EPLC272H_LUNG SMARCA4        hybcap SMARCA4_hybcap              1
## 6  HCC1195_LUNG SMARCA4        hybcap SMARCA4_hybcap              1
## Variables not shown: feature_original1 (chr), gene2 (chr), feature_type2
##   (chr), feature_name2 (chr), feature_value2 (dbl), feature_original2
##   (chr), tissue (chr)
    #do the plot
    plotGeneticVsGeneticPoint(gvg.df)

    #all in one go with axes swapped
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='hybcap') %>% plotGeneticVsGeneticPoint()

    #two continuous
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='cn') %>% plotGeneticVsGeneticPoint()
## Warning: Non Lab interpolation is deprecated

    #two discrete
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='hybcap',
                                            gene2='KRAS', data_type2='hybcap') %>% plotGeneticVsGeneticPoint()

    #also plot by cell line with one feature as the y axis and another as fill colour
    #continuous + discrete
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines, gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='hybcap') %>% plotGeneticVsGeneticHist()

    #continuous + continuous
    makeGeneticVsGeneticDataFrame(full_con, cell_lines=ex4_cell_lines[1:25], gene1='SMARCA4', data_type1='affy',
                                            gene2='SMARCA4', data_type2='cn') %>% plotGeneticVsGeneticHist(label_option = TRUE)
## Warning: Non Lab interpolation is deprecated

Interactive visualisations

There is also a series of shiny* functions for interactive visualisations. Custom response data can be fed into the shinyRespVsGeneticApp function as follows:

    dietlein_data_fn <- system.file("extdata", "Dietlein2014_supp_table_1.txt", package = "CancerCellLines")
    dietlein_data <- read.table(dietlein_data_fn, header=TRUE, sep='\t', stringsAsFactors=FALSE)
    head(dietlein_data)
    dietlein_data <- dietlein_data %>% 
      filter(nchar(CCLE_name) > 1) %>% 
      transmute(unified_id=CCLE_name, compound_id='KU60648', endpoint='pGI50', original=GI50, value=9-log10(GI50))     
    head(dietlein_data)

    full_con <- setupSQLite('~/BigData/CellLineData/CancerCellLines.db')
    shinyRespVsGeneticApp(con=full_con, drug_df=dietlein_data)
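
The value column above is computed as 9 - log10(GI50). Assuming the GI50 values are in nanomolar (the source does not state the units), this equals the negative log10 of the molar concentration, so 1000 nM (1 µM) corresponds to a pGI50 of 6. A small sketch of that conversion, with a hypothetical helper name:

```r
# Assumption: GI50 is given in nanomolar, so 9 - log10(GI50) equals
# -log10 of the concentration in molar. gi50_to_pgi50 is a hypothetical
# helper, not part of CancerCellLines.
gi50_to_pgi50 <- function(gi50_nM) {
  stopifnot(is.numeric(gi50_nM), all(gi50_nM > 0, na.rm = TRUE))
  9 - log10(gi50_nM)
}

# 1 nM, 1 uM and 10 uM expressed in nanomolar
example_gi50 <- c(1, 1000, 10000)
example_pgi50 <- gi50_to_pgi50(example_gi50)
example_pgi50
```

Higher pGI50 values therefore indicate more potent growth inhibition.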

Alternatively, if a custom dataset isn’t defined, the CCLE response data will be used:

    shinyRespVsGeneticApp(con=full_con)

If you are just interested in the GeneticVsGenetic analysis functions then you can launch the shinyGeneticVsGeneticApp shiny app:

    shinyGeneticVsGeneticApp(con=full_con)    

Example 5: Comparing response values

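The RespVsResp suite of functions allows the responses to different compounds to be compared against each other across cell lines. Here we compare Erlotinib and AZD6244: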

    #get all cell lines
    ex5_cell_lines <- dplyr_con %>% tbl('ccle_sampleinfo') %>% 
       collect %>% as.data.frame
    ex5_cell_lines <- ex5_cell_lines$CCLE_name
    
    #make a data frame
    df <- makeRespVsRespDataFrame(full_con, 
                               cell_lines=ex5_cell_lines,
                               drugs=c('Erlotinib', 'AZD6244'),
                               tissue_info = 'ccle')
    head(df)
##                                CCLE_name      ID Type    original    value
## 1          1321N1_CENTRAL_NERVOUS_SYSTEM AZD6244 resp        <NA>       NA
## 2                         22RV1_PROSTATE AZD6244 resp 6.374595234 5.195547
## 3          42MGBA_CENTRAL_NERVOUS_SYSTEM AZD6244 resp        <NA>       NA
## 4                     5637_URINARY_TRACT AZD6244 resp 2.747059107 5.561132
## 5                     639V_URINARY_TRACT AZD6244 resp  0.13161619 6.880691
## 6 697_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE AZD6244 resp        <NA>       NA
##   native_id                             tissue          subtype1
## 1    1321N1             central_nervous_system            glioma
## 2     22Rv1                           prostate         carcinoma
## 3  42-MG-BA             central_nervous_system            glioma
## 4      5637                      urinary_tract         carcinoma
## 5     639-V                      urinary_tract         carcinoma
## 6       697 haematopoietic_and_lymphoid_tissue lymphoid_neoplasm
##                               subtype2
## 1                          astrocytoma
## 2                                   NS
## 3                 astrocytoma_Grade_IV
## 4                                   NS
## 5          transitional_cell_carcinoma
## 6 acute_lymphoblastic_B_cell_leukaemia
    #makes a wide data frame
    wide.df <- df %>% makeWideFromRespVsRespDataFrame()
    head(wide.df)
##                                CCLE_name native_id
## 1          1321N1_CENTRAL_NERVOUS_SYSTEM    1321N1
## 2                         22RV1_PROSTATE     22Rv1
## 3          42MGBA_CENTRAL_NERVOUS_SYSTEM  42-MG-BA
## 4                     5637_URINARY_TRACT      5637
## 5                     639V_URINARY_TRACT     639-V
## 6 697_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE       697
##                               tissue          subtype1
## 1             central_nervous_system            glioma
## 2                           prostate         carcinoma
## 3             central_nervous_system            glioma
## 4                      urinary_tract         carcinoma
## 5                      urinary_tract         carcinoma
## 6 haematopoietic_and_lymphoid_tissue lymphoid_neoplasm
##                               subtype2  AZD6244 Erlotinib
## 1                          astrocytoma       NA  6.106130
## 2                                   NS 5.195547        NA
## 3                 astrocytoma_Grade_IV       NA  5.800518
## 4                                   NS 5.561132  5.079719
## 5          transitional_cell_carcinoma 6.880691        NA
## 6 acute_lymphoblastic_B_cell_leukaemia       NA  5.068745
    #now do some plots
    plotRespVsRespWaterfall(filter(df, grepl('Erlotinib', ID)))
## Loading required package: RColorBrewer

    plotRespVsRespDensity(df)
## Warning: Removed 243 rows containing non-finite values (stat_density).
## Warning: Removed 233 rows containing non-finite values (stat_density).

    plotRespVsRespPairs(df)
## Warning: Removed 334 rows containing missing values (stat_smooth).
## Warning: Removed 334 rows containing missing values (stat_smooth).
## Warning: Removed 334 rows containing missing values (geom_point).
## Warning: Removed 334 rows containing missing values (geom_point).

There is also a shiny app for this analysis:

    shinyRespVsRespApp(con=full_con)

Future directions

To do:
- GeneticVsGenetic
- RespVsResp

Session Info

    sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.9.5 (Mavericks)
## 
## locale:
## [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] RColorBrewer_1.1-2    CancerCellLines_0.6.6 reshape2_1.4.1       
##  [4] RSQLite_1.0.0         DBI_0.3.1             shiny_0.12.2         
##  [7] ggplot2_1.0.1         scales_0.3.0          tidyr_0.3.1          
## [10] readr_0.2.2           readxl_0.1.0          dplyr_0.4.3          
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2      knitr_1.11       magrittr_1.5     MASS_7.3-45     
##  [5] munsell_0.4.2    xtable_1.8-0     colorspace_1.2-6 R6_2.1.1        
##  [9] stringr_1.0.0    plyr_1.8.3       tools_3.2.2      parallel_3.2.2  
## [13] grid_3.2.2       gtable_0.1.2     htmltools_0.2.6  lazyeval_0.1.10 
## [17] yaml_2.1.13      assertthat_0.1   digest_0.6.8     formatR_1.2.1   
## [21] mime_0.4         evaluate_0.8     rmarkdown_0.8.1  labeling_0.3    
## [25] stringi_1.0-1    httpuv_1.3.3     proto_0.3-10