In this project, I aimed to mine the beatAML dataset (internal release, wave 2, June 2017) to identify potential anti-leukemia mechanisms of the BCL-2 inhibitor venetoclax.
beatAML overview
270 unique drugs screened. For each drug, there are 3 replicates.
1924 unique lab IDs.
1500 unique patient IDs.
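Each figure above is a count of distinct values in one column of the drug-screen table. Because there are more lab IDs than patient IDs, some patients contributed more than one specimen. The following is a minimal base-R sketch that assumes a long-format table with the hypothetical columns `drug`, `lab_id` and `patient_id`. The actual beatAML column names may differ, and the rows are toy values.

```r
# Toy screening table: one row per drug x specimen (hypothetical values)
screen <- data.frame(
  drug       = c("Venetoclax", "Venetoclax", "Venetoclax", "Sorafenib"),
  lab_id     = c("L1", "L2", "L3", "L1"),
  patient_id = c("P1", "P1", "P2", "P1"),
  stringsAsFactors = FALSE
)

# Count distinct values in each identifier column
n_unique <- sapply(screen[c("drug", "lab_id", "patient_id")],
                   function(x) length(unique(x)))
n_unique   # drug 2, lab_id 3, patient_id 2
```

Here patient P1 has two lab IDs (L1 and L2). This is the same pattern that makes the lab-ID count exceed the patient-ID count.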
Unique specimen types:
## # A tibble: 3 x 1
## specimen_type
## <fctr>
## 1 Bone Marrow Aspirate
## 2 Leukapheresis
## 3 Peripheral Blood
Specific diagnosis types:
## # A tibble: 64 x 1
## specific_diagnosis
## <fctr>
## 1 Acute megakaryoblastic leukaemia
## 2 Acute monoblastic and monocytic leukaemia
## 3 AML with mutated NPM1
## 4 Acute myeloid leukaemia, NOS
## 5 Acute myelomonocytic leukaemia
## 6 AML with myelodysplasia-related changes
## 7 AML without maturation
## 8 AML with inv(16)(p13.1q22) or t(16;16)(p13.1;q22); CBFB-MYH11
## 9 AML with maturation
## 10 AML with minimal differentiation
## # ... with 54 more rows
Unique drugs screened:
## # A tibble: 270 x 1
## drug
## <fctr>
## 1 Nilotinib
## 2 Flavopiridol
## 3 H-89
## 4 LY294002
## 5 GW-2580
## 6 PD98059
## 7 VX-745
## 8 Sunitinib
## 9 STO609
## 10 Sorafenib
## # ... with 260 more rows
## [1] 1136 21
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.004572 0.005215 0.006530 0.486500 0.018430 10.000000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.004572 0.007476 0.014480 1.337000 0.415800 10.000000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.004572 0.016350 0.333700 3.686000 10.000000 10.000000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.004572 0.086140 10.000000 6.025000 10.000000 10.000000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.008215 7.849000 10.000000 7.858000 10.000000 10.000000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -65.4 181.9 409.5 467.7 663.6 7506.0
## # A tibble: 867 x 1
## lab_id
## <chr>
## 1 14-00739
## 2 14-00752
## 3 14-00760
## 4 14-00765
## 5 14-00774
## 6 14-00781
## 7 14-00787
## 8 14-00789
## 9 14-00798
## 10 14-00801
## # ... with 857 more rows
## # A tibble: 119 x 3
## XRNAseqID LabID RNAseqID
## <chr> <chr> <chr>
## 1 X20.00051 16-00339 20-00051
## 2 X20.00058 16-00459 20-00058
## 3 X20.00059 16-00465 20-00059
## 4 X20.00050 16-00332 20-00050
## 5 X20.00052 16-00351 20-00052
## 6 X20.00057 16-00410 20-00057
## 7 X20.00053 16-00354 20-00053
## 8 X20.00056 16-00406 20-00056
## 9 X20.00060 16-00474 20-00060
## 10 X20.00049 16-00150 20-00049
## # ... with 109 more rows
## # A tibble: 73 x 1
## LabID
## <chr>
## 1 16-00339
## 2 16-00459
## 3 16-00465
## 4 16-00332
## 5 16-00351
## 6 16-00410
## 7 16-00354
## 8 16-00474
## 9 15-00742
## 10 16-00315
## # ... with 63 more rows
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 32.73 183.20 417.00 460.20 697.60 1220.00
## # A tibble: 18 x 1
## LabID
## <chr>
## 1 16-00073
## 2 16-01227
## 3 16-00867
## 4 16-01270
## 5 16-00540
## 6 16-00770
## 7 16-01061
## 8 16-01262
## 9 16-01103
## 10 16-01010
## 11 16-00951
## 12 16-00498
## 13 16-01219
## 14 16-00315
## 15 16-01185
## 16 16-00519
## 17 16-00525
## 18 16-00031
## # A tibble: 18 x 1
## LabID
## <chr>
## 1 16-00731
## 2 16-00339
## 3 16-00702
## 4 16-00836
## 5 16-00708
## 6 16-01216
## 7 16-01102
## 8 16-00538
## 9 16-00541
## 10 16-00627
## 11 16-00831
## 12 16-00771
## 13 16-00548
## 14 16-00815
## 15 16-01017
## 16 16-01049
## 17 16-00701
## 18 16-01121
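The two 18-specimen lists above look like the two extremes of the venetoclax response distribution among the 73 RNA-seq-matched specimens, since 18 is roughly a quarter of 73. The source does not state the selection rule. This base-R sketch therefore assumes that the 18 lowest response values define "sensitive" and the 18 highest define "resistant". The `response` vector and the helper `split_extremes()` are hypothetical, and the values are toy data rather than beatAML measurements.

```r
# Toy response values keyed by lab ID (hypothetical; not beatAML data)
set.seed(1)
response <- setNames(runif(73, min = 30, max = 1200),
                     sprintf("16-%05d", 1:73))

# Assumed rule: the n lowest values are "sensitive", the n highest "resistant"
split_extremes <- function(values, n) {
  ord <- order(values)   # indices in ascending order of response
  list(
    sensitive = names(values)[head(ord, n)],
    resistant = names(values)[tail(ord, n)]
  )
}

groups <- split_extremes(response, n = 18)
lengths(groups)
```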
## [1] "Venetoclax-sensitive specimens:"
## # A tibble: 18 x 12
## Venetoclax LabID patientId
## <chr> <chr> <int>
## 1 sensitive 16-00073 2443
## 2 sensitive 16-01227 4275
## 3 sensitive 16-00867 4042
## 4 sensitive 16-01270 4317
## 5 sensitive 16-00540 2721
## 6 sensitive 16-00770 4007
## 7 sensitive 16-01061 4197
## 8 sensitive 16-01262 4303
## 9 sensitive 16-01103 4197
## 10 sensitive 16-01010 4197
## 11 sensitive 16-00951 4075
## 12 sensitive 16-00498 2706
## 13 sensitive 16-01219 4271
## 14 sensitive 16-00315 1973
## 15 sensitive 16-01185 4252
## 16 sensitive 16-00519 2713
## 17 sensitive 16-00525 2715
## 18 sensitive 16-00031 2477
## # ... with 9 more variables: specificDxAtInclusion <fctr>,
## # karyotype <fctr>, ageAtSpecimenAcquisition <int>, specimenType <fctr>,
## # priorTreatmentTypes <fctr>, priorTreatmentRegimens <fctr>,
## # priorTreatmentStages <fctr>, percentBlastsBM <fctr>,
## # percentBlastsPB <fctr>
## [1] "Venetoclax-resistant specimens:"
## # A tibble: 18 x 12
## Venetoclax LabID patientId specificDxAtInclusion
## <chr> <chr> <int> <fctr>
## 1 resistant 16-00731 3990 Acute myeloid leukaemia, NOS
## 2 resistant 16-00339 NA <NA>
## 3 resistant 16-00702 3976 Unknown
## 4 resistant 16-00836 4039 AML with mutated NPM1
## 5 resistant 16-00708 3979 AML with myelodysplasia-related changes
## 6 resistant 16-01216 4263 AML with myelodysplasia-related changes
## 7 resistant 16-01102 4232 Unknown
## 8 resistant 16-00538 2694 Acute myeloid leukaemia, NOS
## 9 resistant 16-00541 NA <NA>
## 10 resistant 16-00627 2785 Unknown
## 11 resistant 16-00831 4038 Acute myeloid leukaemia, NOS
## 12 resistant 16-00771 4008 Acute myeloid leukaemia, NOS
## 13 resistant 16-00548 2119 AML with myelodysplasia-related changes
## 14 resistant 16-00815 4030 AML with mutated NPM1
## 15 resistant 16-01017 4043 Acute myeloid leukaemia, NOS
## 16 resistant 16-01049 4207 Acute myeloid leukaemia, NOS
## 17 resistant 16-00701 2747 Chronic myelomonocytic leukaemia
## 18 resistant 16-01121 4239 Unknown
## # ... with 8 more variables: karyotype <fctr>,
## # ageAtSpecimenAcquisition <int>, specimenType <fctr>,
## # priorTreatmentTypes <fctr>, priorTreatmentRegimens <fctr>,
## # priorTreatmentStages <fctr>, percentBlastsBM <fctr>,
## # percentBlastsPB <fctr>
## # A tibble: 18 x 3
## XRNAseqID LabID RNAseqID
## <chr> <chr> <chr>
## 1 X20.00068 16-00315 20-00068
## 2 X20.00076 16-00525 20-00076
## 3 X20.00095 16-00519 20-00095
## 4 X20.00317 16-00540 20-00317
## 5 X20.00312 16-00073 20-00312
## 6 X20.00350 16-00770 20-00350
## 7 X20.00420 16-00867 20-00420
## 8 X20.00417 16-00498 20-00417
## 9 X20.00449 16-00951 20-00449
## 10 X20.00456 16-01061 20-00456
## 11 X20.00453 16-01010 20-00453
## 12 X20.00492 16-01103 20-00492
## 13 X20.00513 16-01270 20-00513
## 14 X20.00508 16-01227 20-00508
## 15 X20.00490 16-00031 20-00490
## 16 X20.00511 16-01262 20-00511
## 17 X20.00499 16-01185 20-00499
## 18 X20.00505 16-01219 20-00505
## # A tibble: 18 x 3
## XRNAseqID LabID RNAseqID
## <chr> <chr> <chr>
## 1 X20.00051 16-00339 20-00051
## 2 X20.00335 16-00771 20-00335
## 3 X20.00340 16-00836 20-00340
## 4 X20.00322 16-00627 20-00322
## 5 X20.00327 16-00708 20-00327
## 6 X20.00331 16-00731 20-00331
## 7 X20.00316 16-00538 20-00316
## 8 X20.00338 16-00815 20-00338
## 9 X20.00325 16-00702 20-00325
## 10 X20.00318 16-00541 20-00318
## 11 X20.00324 16-00701 20-00324
## 12 X20.00352 16-00831 20-00352
## 13 X20.00348 16-00548 20-00348
## 14 X20.00454 16-01017 20-00454
## 15 X20.00463 16-01102 20-00463
## 16 X20.00491 16-01049 20-00491
## 17 X20.00504 16-01216 20-00504
## 18 X20.00495 16-01121 20-00495
## # A tibble: 63,677 x 19
## Symbol X20.00068 X20.00076 X20.00095 X20.00317 X20.00312 X20.00350
## <fctr> <int> <int> <int> <int> <int> <int>
## 1 TSPAN6 7 87 1 1 7 3
## 2 TNMD 0 0 0 0 0 0
## 3 DPM1 1272 1137 1624 1228 1150 918
## 4 SCYL3 1398 999 2134 1263 860 419
## 5 C1orf112 537 859 1710 428 442 88
## 6 FGR 2368 7039 1796 6662 10249 5241
## 7 CFH 53 163 137 4021 35 9
## 8 FUCA2 757 689 3848 2520 2337 546
## 9 GCLC 2131 1541 5039 4641 1449 897
## 10 NFYA 2831 4270 8549 2220 3073 1001
## # ... with 63,667 more rows, and 12 more variables: X20.00420 <int>,
## # X20.00417 <int>, X20.00449 <int>, X20.00456 <int>, X20.00453 <int>,
## # X20.00492 <int>, X20.00513 <int>, X20.00508 <int>, X20.00490 <int>,
## # X20.00511 <int>, X20.00499 <int>, X20.00505 <int>
## # A tibble: 63,677 x 19
## Symbol X20.00051 X20.00335 X20.00340 X20.00322 X20.00327 X20.00331
## <fctr> <int> <int> <int> <int> <int> <int>
## 1 TSPAN6 14 9 2 7 21 1
## 2 TNMD 0 0 0 1 0 0
## 3 DPM1 1640 598 1028 990 1503 1189
## 4 SCYL3 1057 199 841 548 679 1143
## 5 C1orf112 759 47 474 370 205 499
## 6 FGR 28388 13319 49688 57842 5305 37909
## 7 CFH 140 14 305 63 58 186
## 8 FUCA2 1458 769 2935 2784 1412 1609
## 9 GCLC 2100 628 2785 1523 2041 1725
## 10 NFYA 4902 626 2005 3397 1232 3426
## # ... with 63,667 more rows, and 12 more variables: X20.00316 <int>,
## # X20.00338 <int>, X20.00325 <int>, X20.00318 <int>, X20.00324 <int>,
## # X20.00352 <int>, X20.00348 <int>, X20.00454 <int>, X20.00463 <int>,
## # X20.00491 <int>, X20.00504 <int>, X20.00495 <int>
## # A tibble: 63,677 x 37
## Symbol X20.00068 X20.00076 X20.00095 X20.00317 X20.00312 X20.00350
## <fctr> <int> <int> <int> <int> <int> <int>
## 1 TSPAN6 7 87 1 1 7 3
## 2 TNMD 0 0 0 0 0 0
## 3 DPM1 1272 1137 1624 1228 1150 918
## 4 SCYL3 1398 999 2134 1263 860 419
## 5 C1orf112 537 859 1710 428 442 88
## 6 FGR 2368 7039 1796 6662 10249 5241
## 7 CFH 53 163 137 4021 35 9
## 8 FUCA2 757 689 3848 2520 2337 546
## 9 GCLC 2131 1541 5039 4641 1449 897
## 10 NFYA 2831 4270 8549 2220 3073 1001
## # ... with 63,667 more rows, and 30 more variables: X20.00420 <int>,
## # X20.00417 <int>, X20.00449 <int>, X20.00456 <int>, X20.00453 <int>,
## # X20.00492 <int>, X20.00513 <int>, X20.00508 <int>, X20.00490 <int>,
## # X20.00511 <int>, X20.00499 <int>, X20.00505 <int>, X20.00051 <int>,
## # X20.00335 <int>, X20.00340 <int>, X20.00322 <int>, X20.00327 <int>,
## # X20.00331 <int>, X20.00316 <int>, X20.00338 <int>, X20.00325 <int>,
## # X20.00318 <int>, X20.00324 <int>, X20.00352 <int>, X20.00348 <int>,
## # X20.00454 <int>, X20.00463 <int>, X20.00491 <int>, X20.00504 <int>,
## # X20.00495 <int>
##
## FALSE TRUE
## 41522 15116
TRUE gives the number of genes with zero counts across all specimens.
After removing low-expressing genes, 14462 of the starting 56638 genes are kept.
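The two filtering steps described above can be sketched in base R. The zero-count check corresponds to the FALSE/TRUE table. The source does not state the rule for removing low-expressing genes, so the counts-per-million (CPM) cutoff and the minimum number of specimens are assumptions. `keep_expressed()`, `cpm_cutoff` and `min_samples` are hypothetical names, and the 5 x 4 count matrix is toy data.

```r
# Toy count matrix: 5 genes x 4 specimens (made-up values)
counts <- matrix(c(      0,      0,       0,      0,   # never detected
                   1000000, 900000, 1100000, 950000,   # highly expressed
                        50,     40,      60,     55,   # moderately expressed
                         0,      1,       0,      0,   # sporadic, very low
                        20,      0,       0,     30),  # detected in 2 specimens
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:5), paste0("S", 1:4)))

# Step 1: flag genes with zero counts in every specimen
all_zero <- rowSums(counts) == 0
table(all_zero)

# Step 2: keep genes whose CPM exceeds cpm_cutoff in at least min_samples specimens
keep_expressed <- function(counts, cpm_cutoff = 1, min_samples = 2) {
  cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6
  rowSums(cpm > cpm_cutoff) >= min_samples
}

filtered <- counts[keep_expressed(counts), , drop = FALSE]
rownames(filtered)   # "gene2" "gene3" "gene5"
```

In the real data, `min_samples` would usually be tied to the size of the smaller comparison group (18 specimens here) rather than the toy value of 2.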
## [1] 1.116 1.127 1.150 1.155 1.140 0.774 1.269 1.156 1.258 1.107 1.276 1.177 0.972 0.747
## [15] 0.903 0.987 1.169 1.122 1.077 0.732 1.086 0.801 1.040 1.090 0.798 0.840 1.027 0.883
## [29] 0.689 0.974 1.040 0.894 0.787 1.003 0.947 1.182
## resistant sensitive
## 1 0 1
## 2 0 1
## 3 0 1
## 4 0 1
## 5 0 1
## 6 0 1
## 7 0 1
## 8 0 1
## 9 0 1
## 10 0 1
## 11 0 1
## 12 0 1
## 13 0 1
## 14 0 1
## 15 0 1
## 16 0 1
## 17 0 1
## 18 0 1
## 19 1 0
## 20 1 0
## 21 1 0
## 22 1 0
## 23 1 0
## 24 1 0
## 25 1 0
## 26 1 0
## 27 1 0
## 28 1 0
## 29 1 0
## 30 1 0
## 31 1 0
## 32 1 0
## 33 1 0
## 34 1 0
## 35 1 0
## 36 1 0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group_name
## [1] "contr.treatment"
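The design matrix printed above has one indicator column per group and no intercept. Its `assign` and `contrasts` attributes match what base R's `model.matrix()` attaches for a formula of the form `~ 0 + group_name`. The following minimal sketch reproduces that shape. It assumes a factor named `group_name` whose 36 entries list the 18 sensitive specimens first and the 18 resistant ones after them, which is the row order in the printout.

```r
# Group labels in column order: 18 sensitive, then 18 resistant
group_name <- factor(rep(c("sensitive", "resistant"), each = 18))

# No-intercept model: one indicator column per level of group_name
design <- model.matrix(~ 0 + group_name)

# Drop the "group_name" prefix so the columns read "resistant", "sensitive"
colnames(design) <- levels(group_name)
head(design, 3)
attr(design, "assign")   # 1 1
```

Without an intercept, each coefficient estimates one group's mean expression directly. This is why the comparison between groups has to be requested later as an explicit contrast.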
## resistant sensitive
## DPM1 -10.39 -10.58
## SCYL3 -11.00 -10.75
## C1orf112 -11.88 -11.46
## FGR -7.17 -9.33
## CFH -10.59 -10.11
## FUCA2 -9.94 -10.17
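The group coefficients above are on the natural-log scale, because edgeR's GLM coefficients are natural logs of expression proportions. The logFC columns in the test tables, by contrast, are log2. Applying the contrast `-1*resistant 1*sensitive` to a gene's coefficients and dividing by log(2) should therefore give its log2 fold change. The following worked check for FGR uses the rounded coefficients printed above.

```r
# Rounded FGR coefficients from the table above (natural-log scale)
fgr_coef <- c(resistant = -7.17, sensitive = -9.33)

# Contrast -1*resistant + 1*sensitive: sensitive relative to resistant
contrast <- c(resistant = -1, sensitive = 1)

# Convert the natural-log difference to a log2 fold change
log2_fc <- sum(contrast * fgr_coef) / log(2)
round(log2_fc, 2)   # about -3.12
```

The full results table reports an FGR logFC of -3.1125. The small gap comes from the two-decimal rounding of the input coefficients.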
## Coefficient: -1*resistant 1*sensitive
## logFC logCPM F PValue FDR
## SLC15A3 -4.35 5.99 82.9 5.69e-11 8.23e-07
## LMTK2 -2.32 6.15 69.0 5.54e-10 2.09e-06
## TCIRG1 -1.91 8.13 67.9 6.63e-10 2.09e-06
## MAPK13 -2.40 3.83 67.1 7.69e-10 2.09e-06
## TBC1D12 -5.15 2.88 67.3 7.71e-10 2.09e-06
## SNTB2 -2.69 2.92 65.9 9.52e-10 2.09e-06
## EFHD2 -1.75 8.52 65.6 1.01e-09 2.09e-06
## STX11 -3.07 6.86 63.1 1.66e-09 3.00e-06
## C5AR1 -4.79 7.86 62.3 1.95e-09 3.13e-06
## SEC14L1 -2.04 7.54 61.0 2.40e-09 3.45e-06
## [,1]
## -1 2943
## 0 8371
## 1 3148
## Coefficient: -1*resistant 1*sensitive
## logFC unshrunk.logFC logCPM PValue FDR
## SLC15A3 -4.35 -4.35 5.99 1.16e-09 1.67e-05
## TBC1D12 -5.15 -5.16 2.88 5.46e-09 3.95e-05
## MAFB -6.38 -6.38 7.92 1.41e-08 5.95e-05
## C5AR1 -4.79 -4.79 7.86 1.65e-08 5.95e-05
## RAPH1 -4.44 -4.45 2.85 4.38e-08 1.21e-04
## PDE4A -4.32 -4.33 4.63 5.02e-08 1.21e-04
## TMCC3 -4.03 -4.03 4.55 9.26e-08 1.91e-04
## RP11-288H12.3 -3.47 -3.47 1.91 1.20e-07 2.18e-04
## STX11 -3.07 -3.07 6.86 1.62e-07 2.26e-04
## HK3 -3.79 -3.79 7.18 1.76e-07 2.26e-04
## [,1]
## -1 478
## 0 13929
## 1 55
## rn logFC unshrunk.logFC logCPM PValue
## 1: DPM1 -0.2781 -0.2782 4.8097 1.00e+00
## 2: SCYL3 0.3609 0.3609 4.2523 9.95e-01
## 3: C1orf112 0.6196 0.6198 3.1360 7.90e-01
## 4: FGR -3.1125 -3.1125 8.7460 2.05e-07
## 5: CFH 0.6915 0.6916 5.0471 5.20e-01
## ---
## 14458: RP5-1074L1.4 0.6522 0.6525 2.6499 7.04e-01
## 14459: RP5-1065J22.8 0.2828 0.2837 0.0233 9.76e-01
## 14460: RP11-166O4.6 1.2874 1.2907 0.4697 5.24e-02
## 14461: RP11-548H3.1 0.0894 0.0895 0.6997 1.00e+00
## 14462: RP11-731C17.2 0.7693 0.7703 1.3287 5.33e-01
## [1] 14460 36
## [,1] [,2]
## [1,] "TEC" "C5AR1"
## [2,] "FAM216A" "SLC15A3"
## [3,] "GOLGA8N" "HNRNPLL"
## [4,] "RP11-439E19.10" "DMPK"
## [5,] "ZNF221" "RP11-288H12.4"
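The two-column output above lists gene pairs, which is the form a top-scoring-pairs (TSP) classifier takes. `switchBox` is among the attached packages, but the source does not show the training call. The following is therefore only a base-R sketch of how such pairs are commonly used for prediction. It assumes that each pair (first gene, second gene) votes for one class when the first gene is expressed above the second, and that the majority vote decides. The mapping of votes to `class_A`/`class_B`, the helper `ktsp_predict()`, and the expression values are all hypothetical. Only the first three gene pairs are taken from the output above.

```r
# First three gene pairs from the output above, in the same two-column layout
pairs <- rbind(c("TEC", "C5AR1"),
               c("FAM216A", "SLC15A3"),
               c("GOLGA8N", "HNRNPLL"))

# Toy expression profile for one specimen (made-up values)
profile <- c(TEC = 2.1, C5AR1 = 7.9, FAM216A = 3.0,
             SLC15A3 = 1.2, GOLGA8N = 0.5, HNRNPLL = 4.4)

# Each pair votes "first_higher" when its first gene exceeds its second gene;
# the label attached to each outcome is an assumption
ktsp_predict <- function(profile, pairs,
                         labels = c(first_higher = "class_A",
                                    second_higher = "class_B")) {
  votes <- profile[pairs[, 1]] > profile[pairs[, 2]]
  if (sum(votes) > length(votes) / 2) {
    labels[["first_higher"]]
  } else {
    labels[["second_higher"]]
  }
}

ktsp_predict(profile, pairs)   # "class_B" (1 of 3 pairs votes first_higher)
```

An odd number of pairs avoids tied votes. Rank comparisons within a single sample are also unaffected by between-sample normalization, which is a common reason for choosing TSP-style rules.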
CLEARY_LSC_UP
DICK_FUNCTIONAL_LSC
WANG_17
GOODELL_HSC
KEGG_TYROSINE
KEGG_CYS_MET_METABOLISM
KEGG_PYRUVATE
KEGG_PURINE
KEGG_PYRIMIDINE
GOODELL_MYELOID
NFKB_TARGETS
CEBPA_TARGETS
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.6
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] cowplot_0.9.1 switchBox_1.10.0 pROC_1.10.0 gplots_3.0.1
## [5] data.table_1.10.4-3 statmod_1.4.30 RColorBrewer_1.1-2 edgeR_3.16.5
## [9] Glimma_1.2.1 limma_3.30.13 knitr_1.17 BiocStyle_2.2.1
## [13] stringr_1.2.0 ggplot2_2.2.1 bindrcpp_0.2 dplyr_0.7.4
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.34.0 tidyr_0.7.2 bit64_0.9-7
## [4] splines_3.3.2 gtools_3.5.0 Formula_1.2-2
## [7] assertthat_0.2.0 stats4_3.3.2 latticeExtra_0.6-28
## [10] blob_1.1.0 yaml_2.1.16 RSQLite_2.0
## [13] backports_1.1.2 lattice_0.20-35 glue_1.2.0
## [16] digest_0.6.12 GenomicRanges_1.26.4 XVector_0.14.1
## [19] checkmate_1.8.5 colorspace_1.3-2 htmltools_0.3.6
## [22] Matrix_1.2-12 plyr_1.8.4 DESeq2_1.14.1
## [25] XML_3.98-1.9 pkgconfig_2.0.1 genefilter_1.56.0
## [28] zlibbioc_1.20.0 purrr_0.2.4 xtable_1.8-2
## [31] scales_0.5.0 gdata_2.18.0 BiocParallel_1.8.2
## [34] htmlTable_1.11.0 tibble_1.3.4 annotate_1.52.1
## [37] IRanges_2.8.2 SummarizedExperiment_1.4.0 nnet_7.3-12
## [40] BiocGenerics_0.20.0 lazyeval_0.2.1 survival_2.41-3
## [43] magrittr_1.5 memoise_1.1.0 evaluate_0.10.1
## [46] foreign_0.8-69 tools_3.3.2 S4Vectors_0.12.2
## [49] locfit_1.5-9.1 munsell_0.4.3 cluster_2.0.6
## [52] AnnotationDbi_1.36.2 GenomeInfoDb_1.10.3 caTools_1.17.1
## [55] rlang_0.1.4 grid_3.3.2 RCurl_1.95-4.8
## [58] rstudioapi_0.7 htmlwidgets_0.9 bitops_1.0-6
## [61] base64enc_0.1-3 labeling_0.3 rmarkdown_1.8
## [64] gtable_0.2.0 DBI_0.7 R6_2.2.2
## [67] gridExtra_2.3 bit_1.1-12 bindr_0.1
## [70] Hmisc_4.0-3 rprojroot_1.2 KernSmooth_2.23-15
## [73] stringi_1.1.6 parallel_3.3.2 Rcpp_0.12.14
## [76] geneplotter_1.52.0 rpart_4.1-11 acepack_1.4.1