Generate the overlapping drug list from L1000 and LTKB

Introduction

This analysis builds the list of drugs shared by two large databases: L1000 (LINCS) and LTKB.

About L1000 project

  • Summary: The Library of Integrated Cellular Signatures (LINCS) is an NIH program which funds the generation of perturbational profiles across multiple cell and perturbation types, as well as read-outs, at a massive scale. The LINCS Center for Transcriptomics at the Broad Institute uses the L1000 high-throughput gene-expression assay to build a Connectivity Map which seeks to enable the discovery of functional connections between drugs, genes and diseases through analysis of patterns induced by common gene-expression changes. These files represent L1000 data generated during the LINCS Pilot Phase (2012-2015), as well as profiles generated for more specific purposes, such as assay development and validation projects or testing custom compounds or non-standard cell lines (not part of the core LINCS cell lines). Note: Related GEO projects include (a) Additional L1000 and RNA-Seq data used to validate the assay and improve the inference model, available at GSE92743 (b) The LINCS “production phase” (also termed Phase II, 2015-2020) which is generating an additional cohort of L1000 data, available at GSE70138.

  • Overall design: LINCS aims to enable a functional understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents. The Broad Institute LINCS Center for Transcriptomics contributes to this collaborative effort by application of the Connectivity Map concept. In brief, the study design involves the generation of a compendium of transcriptional expression data from cultured human cells treated with small-molecule and genetic loss/gain of function perturbagens. These measurements are made using the L1000 high-throughput gene-expression assay that enables data generation at an unprecedented scale. The data are processed through a computational system that converts raw fluorescence intensities into differential gene expression signatures. The data at each stage of the pre-processing are available:
    • Level 1 (LXB) - raw, unprocessed flow cytometry data from Luminex scanners. One LXB file is generated for each well of a 384-well plate, and each file contains a fluorescence intensity value for every observed analyte in the well.
    • Level 2 (GEX) - gene expression values per 1,000 genes after deconvolution from Luminex beads.
    • Level 3 (Q2NORM) - gene expression profiles of both directly measured landmark transcripts plus inferred genes. Normalized using invariant set scaling followed by quantile normalization.
    • Level 4 (Z-SCORES) - signatures with differentially expressed genes computed by robust z-scores for each profile relative to control (PC relative to plate population as control; VC relative to vehicle control).
    • Level 5 (SIG) consists of the replicates, usually 3 per treatment, aggregated into a single differential expression vector derived from the weighted averages of the individual replicates.
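As a rough illustration of the Level 4 and Level 5 steps described above, the sketch below computes robust z-scores on made-up values and then collapses replicates with a weighted average. The function names, the toy numbers, and the correlation-based weighting are assumptions for illustration only; the description above states just that replicates are combined by weighted averaging.

```r
# Robust z-score: centre on the median and scale by the MAD.
# stats::mad() already applies the 1.4826 constant, so the result is on an SD-like scale.
# `control` defaults to the profile itself (plate population as control).
robust_z <- function(x, control = x) {
  (x - median(control)) / mad(control)
}

# Hypothetical helper: collapse replicate signatures (columns) into one vector.
# Weights are each replicate's mean Spearman correlation with the others
# (an assumed scheme, floored at 0.01 so no weight is negative).
collapse_replicates <- function(zmat) {
  rho <- cor(zmat, method = "spearman")
  diag(rho) <- NA
  w <- pmax(rowMeans(rho, na.rm = TRUE), 0.01)
  w <- w / sum(w)
  as.vector(zmat %*% w)
}

# Toy data: 5 genes x 3 replicates (made-up numbers).
set.seed(1)
raw <- matrix(rnorm(15, mean = 8), nrow = 5)
zmat <- apply(raw, 2, robust_z)        # per-replicate robust z-scores
signature <- collapse_replicates(zmat) # one value per gene
print(length(signature))
```

Passing a separate `control` vector to `robust_z()` corresponds to scoring against vehicle-control wells instead of the whole plate.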

About LTKB database

The Liver Toxicity Knowledge Base (LTKB) provides a centralized repository of information for drug-induced liver injury (DILI) study and predictive model development. The DILI classification data in LTKB could be a useful resource for developing biomarkers and predictive models, and for assessing data from emerging technologies such as in silico, high-throughput and high-content screening methodologies. In coming years, streamlining the prediction process by including predictive models for both DILI severity and DILI type in LTKB would help identify compounds with DILI potential earlier in drug development and risk assessment.

Data access

L1000 data access

L1000 perturbagen metadata is available both from GEO and through the API provided by LINCS. Here we start with GEO:

  1. Download the pert info table from the GEO database (GSE92742):
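file_url <- 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE92742&format=file&file=GSE92742%5FBroad%5FLINCS%5Fpert%5Finfo%2Etxt%2Egz'

# Download once; skip if the file is already present
if(!file.exists("pert.info.txt.gz")){
  download.file(file_url,"pert.info.txt.gz", mode = "wb")
}

# read.table() reads the gzipped file directly.
# Note: the GEO file appears to be tab-delimited; with the default whitespace
# separator, names containing spaces may be split, so sep = "\t" is worth
# considering (this might change the row count reported below).
L1000_perts <- read.table("pert.info.txt.gz", header=TRUE, fill = TRUE)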
dim(L1000_perts)
## [1] 51383     8

The table has 51383 rows, that is, 51383 perturbagen records in L1000.

However, this does not mean that they are all drugs. We will compare the table with the LTKB drug list later to address this.
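One way to check how many records are actually compounds is to tabulate the perturbagen type. The sketch below uses a small made-up table; it assumes the GEO file has a `pert_type` column in which small-molecule treatments are coded as `trt_cp`, which should be verified against the downloaded file before relying on it.

```r
# Toy stand-in for L1000_perts (all values are made up for illustration).
toy_perts <- data.frame(
  pert_id    = c("ID-0001", "ID-0002", "ID-0003", "ID-0004"),
  pert_iname = c("drugA", "drugB", "GENE1", "GENE2"),
  pert_type  = c("trt_cp", "trt_cp", "trt_sh", "trt_oe"),
  stringsAsFactors = FALSE
)

# Count records per perturbagen type.
type_counts <- table(toy_perts$pert_type)
print(type_counts)

# Keep only compound treatments (assumed code: "trt_cp").
compounds <- toy_perts[toy_perts$pert_type == "trt_cp", ]
print(nrow(compounds))
```

On the real table the same two lines would be applied to `L1000_perts`, provided the column parses cleanly.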

LTKB (DILIrank) data access

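library(xlsx)  # provides read.xlsx() with the sheetIndex argument

file_url <- "https://ars.els-cdn.com/content/image/1-s2.0-S1359644616300411-mmc2.xlsx"

# Download once; skip if the file is already present
if(!file.exists("DILIRank.xlsx")){
  download.file(file_url,"DILIRank.xlsx", mode = "wb")
}

LTKB_data <- read.xlsx("DILIRank.xlsx", sheetIndex = 1, header = TRUE)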
dim(LTKB_data)
## [1] 1036   16

Map LTKB drugs to LINCS

There are many ways to map drugs. A common way is by name.
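A minimal sketch of name-based matching on toy data is given here: names are lower-cased and trimmed before an exact join. The data frames, the identifiers in them, and the `normalize_name` helper are hypothetical; only the `LabelCompoundName` and `pert_iname` column names come from the code in this document.

```r
# Hypothetical helper: lower-case and strip surrounding whitespace.
normalize_name <- function(x) tolower(trimws(x))

# Toy tables (made-up rows).
toy_ltkb <- data.frame(
  LabelCompoundName = c("Trimethoprim", " Dapsone", "NotInL1000"),
  stringsAsFactors = FALSE
)
toy_l1000 <- data.frame(
  pert_id    = c("ID-0001", "ID-0002", "ID-0003"),
  pert_iname = c("trimethoprim", "dapsone", "aspirin"),
  stringsAsFactors = FALSE
)

# Build a shared, normalised key in both tables.
toy_ltkb$key  <- normalize_name(toy_ltkb$LabelCompoundName)
toy_l1000$key <- normalize_name(toy_l1000$pert_iname)

# Inner join: keeps only drugs present in both tables.
matched <- merge(toy_ltkb, toy_l1000, by = "key")
print(matched[, c("LabelCompoundName", "pert_id")])
```

Exact name matching can miss synonyms and salt forms, so the number of matches it returns is best read as a lower bound.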

First, we use the CLUE API provided by LINCS to look up each LTKB drug by name:

library(httr)      # GET(), content()
library(jsonlite)  # toJSON(), fromJSON()

# Accumulates one row per (LTKB drug, matching L1000 record)
result_matrix <- matrix(0, nrow=0, ncol=(ncol(LTKB_data)+3))
colnames(result_matrix) <- c(colnames(LTKB_data),"cell_id", "pert_id", "pert_iname")
count_drugs <- 0

Success_attempt <- 0
Failed_attempt <- 0

# User key read from an environment variable rather than hard-coded
# (the variable name CLUE_USER_KEY is our choice)
key <- Sys.getenv("CLUE_USER_KEY")

test_drug_num <- nrow(LTKB_data)
for (i in seq_len(test_drug_num)){
  # Lower-case the name and escape spaces for use in the URL
  drug_name <- tolower(LTKB_data$LabelCompoundName[i])
  drug_name <- gsub(pattern = " ", replacement = "%20", drug_name)

  url <- paste("https://api.clue.io/api/perts?filter={%22fields%22:[%22pert_id%22,%22pert_iname%22,%22cell_id%22],%22where%22:{%22pert_iname%22:%22",drug_name,"%22}}&user_key=",key,sep="")

  raw.data <- GET(url)
  if (raw.data$status_code == 200){
    Success_attempt <- Success_attempt+1
    # Round-trip through JSON so the parsed result becomes a data frame
    data <- jsonlite::fromJSON(toJSON(content(raw.data)))

    if (length(data)>0){
      count_drugs <- count_drugs +1
      # One output row per matching L1000 record
      for (j in seq_len(nrow(data))){
        tmp_result <- cbind(LTKB_data[i,],
                              data[j,c("cell_id", "pert_id", "pert_iname")])
        result_matrix <- rbind(result_matrix, tmp_result)
      }
    }
  }else{
    Failed_attempt <- Failed_attempt+1
  }
}
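The query string in the loop above is escaped by hand. A small sketch of an alternative is shown here: write the filter as plain JSON and let `utils::URLencode()` do the escaping. The helper name `build_clue_url`, the placeholder key, and the `https://` scheme are assumptions; the endpoint path and field names are copied from the loop.

```r
# Hypothetical helper: build the query URL for one drug name.
build_clue_url <- function(drug_name, user_key) {
  # JSON filter mirroring the hand-escaped string used in the loop.
  filter <- sprintf(
    '{"fields":["pert_id","pert_iname","cell_id"],"where":{"pert_iname":"%s"}}',
    tolower(drug_name)
  )
  paste0(
    "https://api.clue.io/api/perts?filter=",
    utils::URLencode(filter, reserved = TRUE),  # escapes quotes, braces, spaces
    "&user_key=", user_key
  )
}

# No request is sent here; the URL is only constructed and printed.
url <- build_clue_url("Valproic Acid", user_key = "YOUR_KEY")
print(url)
```

Encoding the whole filter also covers characters other than spaces, such as commas or parentheses, that can occur in drug names.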

In total, of the 1036 drugs in the LTKB database, 1000 were queried successfully, and 575 of those were mapped by name.
Note that, because of the query limit set by the LINCS team, the remaining 36 drugs could not be processed.

y_value <- c(count_drugs, Success_attempt-count_drugs, Failed_attempt)
barplot(y_value, main="Number of Drugs", xlab="Types", names.arg = c("Name_Matched","Not_Found","Failed"), col="firebrick1")
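The failed queries counted above might be recoverable by retrying after a pause. The sketch below shows one generic way to do that; the helper `with_retry`, its parameters, and the simulated request are hypothetical, and whether retrying actually gets around the LINCS query limit is an assumption that has not been checked here.

```r
# Hypothetical helper: call fn() up to max_tries times, pausing between attempts.
with_retry <- function(fn, max_tries = 3, wait_seconds = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fn(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (attempt < max_tries) Sys.sleep(wait_seconds * attempt)  # linear back-off
  }
  NULL  # every attempt failed
}

# Stand-in for a rate-limited request: fails twice, then succeeds.
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("rate limited")
  "ok"
}

outcome <- with_retry(flaky_request, max_tries = 5, wait_seconds = 0)
print(outcome)
print(calls)
```

In the mapping loop, `fn` would wrap the `GET(url)` call and raise an error when the status code is not 200.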

MCF7 Cell Line data

Here is a brief overview of the mapped drugs. Only perturbagens with MCF7 data are shown:

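# Keep the rows whose cell_id list includes MCF7.
# Columns: 1 = LTKB ID, 3 = compound name, 9 = DILI label, 19 = pert_iname, 17 = cell_id.
# Note: column 19 is pert_iname, so the "L1000-Pert-ID" column below holds
# perturbagen names; column 18 would give pert_id instead.
result_matrix_MCF7 <- matrix(0,nrow=0,ncol=5)
for (i in seq_len(nrow(result_matrix))){
  if ("MCF7" %in% unlist(result_matrix$cell_id[i]) ){
    result_matrix_MCF7 <- rbind(result_matrix_MCF7, result_matrix[i,c(1,3,9, 19, 17)])
  }
}
colnames(result_matrix_MCF7) <- c("LTKB_ID", "Mapped_Drug_Name", "DILI_Label", "L1000-Pert-ID", "Cell-ID")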
dim(result_matrix_MCF7)
## [1] 807   5
head(result_matrix_MCF7, n=20)
##    LTKB_ID    Mapped_Drug_Name         DILI_Label       L1000-Pert-ID
## 2  LT01842        trimethoprim vLess-DILI-Concern        trimethoprim
## 3  LT00036        tetracycline vLess-DILI-Concern        tetracycline
## 31 LT00036        tetracycline vLess-DILI-Concern        tetracycline
## 4  LT00289             dapsone vLess-DILI-Concern             dapsone
## 5  LT00166        pyrazinamide vLess-DILI-Concern        pyrazinamide
## 6  LT00098         fenofibrate vLess-DILI-Concern         fenofibrate
## 7  LT00013    cyclophosphamide vLess-DILI-Concern    cyclophosphamide
## 8  LT00068      chlorpromazine vLess-DILI-Concern      chlorpromazine
## 9  LT00335         doxorubicin vLess-DILI-Concern         doxorubicin
## 91 LT00335         doxorubicin vLess-DILI-Concern         doxorubicin
## 92 LT00335         doxorubicin vLess-DILI-Concern         doxorubicin
## 10 LT01225         clindamycin vLess-DILI-Concern         clindamycin
## 11 LT01716       pyrimethamine vLess-DILI-Concern       pyrimethamine
## 12 LT00429           oxaprozin vLess-DILI-Concern           oxaprozin
## 13 LT02041       dicloxacillin vLess-DILI-Concern       dicloxacillin
## 14 LT00059           captopril vLess-DILI-Concern           captopril
## 15 LT00393         doxycycline vLess-DILI-Concern         doxycycline
## 16 LT01167          cefadroxil vLess-DILI-Concern          cefadroxil
## 17 LT01433 hydrochlorothiazide vLess-DILI-Concern hydrochlorothiazide
## 18 LT01492          lisinopril vLess-DILI-Concern          lisinopril
##                                                                                              Cell-ID
## 2                                                                                          MCF7, PC3
## 3                                                    A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 31                                                                                         MCF7, PC3
## 4                                                    A375, VCAP, A549, HA1E, HCC515, HT29, MCF7, PC3
## 5                                                    A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 6                         MCF7, PC3, HA1E, HCC515, VCAP, A375, A549, ASC, HEPG2, HT29, NPC, PHH, SKB
## 7                                                    A375, HT29, MCF7, PC3, VCAP, A549, HA1E, HCC515
## 8  A549, HCC515, A375, HA1E, HEPG2, HT29, MCF7, PC3, VCAP, ASC, NPC, PHH, SKB, FIBRNPC, NEU, NEU.KCL
## 9                              A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, PHH, SKB
## 91                                                   A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 92                             HA1E, HCC515, PC3, VCAP, A375, A549, HT29, MCF7, ASC, HEPG2, PHH, SKB
## 10                                                                                         MCF7, PC3
## 11                        HA1E, HCC515, PC3, VCAP, A375, A549, ASC, HEPG2, HT29, MCF7, NPC, PHH, SKB
## 12                  A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, NPC, PHH, SKB, HUH7
## 13                                            A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP
## 14                                                   A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 15                                                   PC3, VCAP, A375, A549, HA1E, HCC515, HT29, MCF7
## 16                                                                                         MCF7, PC3
## 17                                                                                         MCF7, PC3
## 18                                                                                         MCF7, PC3
tmp_data <- unique(result_matrix_MCF7[,1:3])
dim(tmp_data)
## [1] 567   3
tmp_count <- table(tmp_data$DILI_Label)
barplot(tmp_count, col="green", main="MCF7 Cell Line statistics")
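The row-by-row loop used for this cell line can also be written as a vectorised filter. The sketch below does this on a toy table with a list-valued `cell_id` column; the helper `has_cell`, the toy rows, and the label `other-label` are made up for illustration.

```r
# Toy stand-in for result_matrix: cell_id is a list column, as in the loop above.
toy_result <- data.frame(
  LTKB_ID    = c("LT1", "LT1", "LT2", "LT3"),
  DILI_Label = c("vLess-DILI-Concern", "vLess-DILI-Concern",
                 "other-label", "other-label"),
  stringsAsFactors = FALSE
)
toy_result$cell_id <- list(c("MCF7", "PC3"), c("A375", "MCF7"), c("PC3"), c("MCF7"))

# Hypothetical helper: TRUE for rows whose cell_id list contains the given cell line.
has_cell <- function(df, cell) {
  vapply(df$cell_id, function(cells) cell %in% unlist(cells), logical(1))
}

# Filter once, then de-duplicate to count each drug a single time.
mcf7_rows  <- toy_result[has_cell(toy_result, "MCF7"), ]
mcf7_drugs <- unique(mcf7_rows[, c("LTKB_ID", "DILI_Label")])
print(table(mcf7_drugs$DILI_Label))
```

Because the cell line is a parameter, the same helper can be reused for any other cell line without copying the loop.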

PC3 Cell Line data

Similarly, we show the perturbagens that have PC3 data:

# Same filter as for MCF7, applied to PC3 (column 19 is pert_iname, not pert_id)
result_matrix_PC3 <- matrix(0,nrow=0,ncol=5)
for (i in seq_len(nrow(result_matrix))){
  if ("PC3" %in% unlist(result_matrix$cell_id[i]) ){
    result_matrix_PC3 <- rbind(result_matrix_PC3, result_matrix[i,c(1,3,9, 19, 17)])
  }
}
colnames(result_matrix_PC3) <- c("LTKB_ID", "Mapped_Drug_Name", "DILI_Label", "L1000-Pert-ID", "Cell-ID")
dim(result_matrix_PC3)
## [1] 808   5
head(result_matrix_PC3, n=20)
##    LTKB_ID    Mapped_Drug_Name         DILI_Label       L1000-Pert-ID
## 2  LT01842        trimethoprim vLess-DILI-Concern        trimethoprim
## 3  LT00036        tetracycline vLess-DILI-Concern        tetracycline
## 31 LT00036        tetracycline vLess-DILI-Concern        tetracycline
## 4  LT00289             dapsone vLess-DILI-Concern             dapsone
## 5  LT00166        pyrazinamide vLess-DILI-Concern        pyrazinamide
## 6  LT00098         fenofibrate vLess-DILI-Concern         fenofibrate
## 7  LT00013    cyclophosphamide vLess-DILI-Concern    cyclophosphamide
## 8  LT00068      chlorpromazine vLess-DILI-Concern      chlorpromazine
## 9  LT00335         doxorubicin vLess-DILI-Concern         doxorubicin
## 91 LT00335         doxorubicin vLess-DILI-Concern         doxorubicin
## 92 LT00335         doxorubicin vLess-DILI-Concern         doxorubicin
## 10 LT01225         clindamycin vLess-DILI-Concern         clindamycin
## 11 LT01716       pyrimethamine vLess-DILI-Concern       pyrimethamine
## 12 LT00429           oxaprozin vLess-DILI-Concern           oxaprozin
## 13 LT02041       dicloxacillin vLess-DILI-Concern       dicloxacillin
## 14 LT00059           captopril vLess-DILI-Concern           captopril
## 15 LT00393         doxycycline vLess-DILI-Concern         doxycycline
## 16 LT01167          cefadroxil vLess-DILI-Concern          cefadroxil
## 17 LT01433 hydrochlorothiazide vLess-DILI-Concern hydrochlorothiazide
## 18 LT01492          lisinopril vLess-DILI-Concern          lisinopril
##                                                                                              Cell-ID
## 2                                                                                          MCF7, PC3
## 3                                                    A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 31                                                                                         MCF7, PC3
## 4                                                    A375, VCAP, A549, HA1E, HCC515, HT29, MCF7, PC3
## 5                                                    A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 6                         MCF7, PC3, HA1E, HCC515, VCAP, A375, A549, ASC, HEPG2, HT29, NPC, PHH, SKB
## 7                                                    A375, HT29, MCF7, PC3, VCAP, A549, HA1E, HCC515
## 8  A549, HCC515, A375, HA1E, HEPG2, HT29, MCF7, PC3, VCAP, ASC, NPC, PHH, SKB, FIBRNPC, NEU, NEU.KCL
## 9                              A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, PHH, SKB
## 91                                                   A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 92                             HA1E, HCC515, PC3, VCAP, A375, A549, HT29, MCF7, ASC, HEPG2, PHH, SKB
## 10                                                                                         MCF7, PC3
## 11                        HA1E, HCC515, PC3, VCAP, A375, A549, ASC, HEPG2, HT29, MCF7, NPC, PHH, SKB
## 12                  A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, NPC, PHH, SKB, HUH7
## 13                                            A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP
## 14                                                   A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 15                                                   PC3, VCAP, A375, A549, HA1E, HCC515, HT29, MCF7
## 16                                                                                         MCF7, PC3
## 17                                                                                         MCF7, PC3
## 18                                                                                         MCF7, PC3
tmp_data <- unique(result_matrix_PC3[,1:3])
dim(tmp_data)
## [1] 567   3
tmp_count <- table(tmp_data$DILI_Label)
barplot(tmp_count, col="blue", main="PC3 Cell Line statistics")
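Both cell lines yield 567 unique drugs, while the row counts differ (807 for MCF7, 808 for PC3), so a natural follow-up is to check whether the two drug sets coincide. The sketch below shows the set operations on toy ID vectors; `LT09991` and `LT09992` are made-up identifiers, and on the real data the vectors would come from the `LTKB_ID` columns of the two result tables.

```r
# Toy ID vectors standing in for the unique LTKB IDs of each cell line.
mcf7_ids <- c("LT00036", "LT00289", "LT00166", "LT09991")
pc3_ids  <- c("LT00036", "LT00289", "LT00166", "LT09992")

both      <- intersect(mcf7_ids, pc3_ids)  # drugs profiled in both cell lines
mcf7_only <- setdiff(mcf7_ids, pc3_ids)    # drugs seen only with MCF7
pc3_only  <- setdiff(pc3_ids, mcf7_ids)    # drugs seen only with PC3

print(c(both = length(both),
        mcf7_only = length(mcf7_only),
        pc3_only = length(pc3_only)))
```

Equal set sizes alone do not show that the sets are identical, which is why the two `setdiff()` calls are included.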

Summary
