This analysis identifies the drugs shared between two large databases: the LINCS L1000 collection and the Liver Toxicity Knowledge Base (LTKB).
Summary: The Library of Integrated Cellular Signatures (LINCS) is an NIH program which funds the generation of perturbational profiles across multiple cell and perturbation types, as well as read-outs, at a massive scale. The LINCS Center for Transcriptomics at the Broad Institute uses the L1000 high-throughput gene-expression assay to build a Connectivity Map which seeks to enable the discovery of functional connections between drugs, genes and diseases through analysis of patterns induced by common gene-expression changes. These files represent L1000 data generated during the LINCS Pilot Phase (2012-2015), as well as profiles generated for more specific purposes, such as assay development and validation projects or testing custom compounds or non-standard cell lines (not part of the core LINCS cell lines). Note: Related GEO projects include (a) Additional L1000 and RNA-Seq data used to validate the assay and improve the inference model, available at GSE92743 (b) The LINCS “production phase” (also termed Phase II, 2015-2020) which is generating an additional cohort of L1000 data, available at GSE70138.
The LTKB provides a centralized repository of information for DILI study and predictive model development. The DILI classification data in LTKB could be a useful resource for developing biomarkers, predictive models and assessing data from emerging technologies such as in silico, high-throughput and high-content screening methodologies. In coming years, streamlining the prediction process by including DILI predictive models for both DILI severity and types in LTKB would enhance the identification of compounds with the DILI potential earlier in drug development and risk assessment.
The L1000 perturbagen annotation file can be downloaded from GEO (accession GSE92742) as follows:
file_url = 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE92742&format=file&file=GSE92742%5FBroad%5FLINCS%5Fpert%5Finfo%2Etxt%2Egz'
if(!file.exists("pert.info.txt.gz")){
download.file(file_url,"pert.info.txt.gz", mode = "wb")
}
L1000_perts <- read.table("pert.info.txt.gz", header=T, fill = TRUE)
dim(L1000_perts)
## [1] 51383 8
The L1000 annotation table contains 51383 perturbagen entries in total.
However, this does not mean they are all drugs. We compare the table against the LTKB drug list below to address this.
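One way to check how many entries are small-molecule treatments is to tabulate the perturbagen type. The sketch below runs on a toy table rather than the real file; it assumes the annotation table has a `pert_type` column in which compounds are coded as `trt_cp`, and the toy IDs and names are made up for illustration.

```r
# Toy stand-in for the L1000 annotation table (hypothetical rows).
# Assumption: the real table has a pert_type column and codes compounds as "trt_cp".
perts_toy <- data.frame(
  pert_id    = c("P1", "P2", "P3", "P4"),
  pert_iname = c("dapsone", "TP53", "captopril", "EGFR"),
  pert_type  = c("trt_cp", "trt_sh", "trt_cp", "trt_oe"),
  stringsAsFactors = FALSE
)

# Count entries per perturbagen type
type_counts <- table(perts_toy$pert_type)

# Keep only the compound treatments
compounds <- perts_toy[perts_toy$pert_type == "trt_cp", , drop = FALSE]
print(type_counts)
print(compounds$pert_iname)
```

Applied to `L1000_perts`, the same two lines would show how much of the table is compounds before any comparison with LTKB.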
The LTKB drug list (the DILIrank spreadsheet) is downloaded in the same way:
library(xlsx)  # provides read.xlsx()
file_url <- "https://ars.els-cdn.com/content/image/1-s2.0-S1359644616300411-mmc2.xlsx"
if(!file.exists("DILIRank.xlsx")){
download.file(file_url,"DILIRank.xlsx", mode = "wb")
}
LTKB_data <- read.xlsx("DILIRank.xlsx", sheetIndex = 1, header = TRUE)
dim(LTKB_data)
## [1] 1036 16
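With both tables in memory, a rough offline estimate of the overlap can be obtained by joining on normalized names. The following is a minimal sketch on toy data, not the procedure used in this report; it assumes the L1000 table exposes names in a `pert_iname` column, and `normalize_name` is a hypothetical helper.

```r
# Toy stand-ins for the two tables (hypothetical rows and IDs)
l1000_toy <- data.frame(
  pert_id    = c("P1", "P2", "P3"),
  pert_iname = c("dapsone", "Captopril", "geldanamycin"),
  stringsAsFactors = FALSE
)
ltkb_toy <- data.frame(
  LTKBID            = c("LT_A", "LT_B", "LT_C"),
  LabelCompoundName = c("Dapsone", "captopril ", "lisinopril"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lower-case and trim so trivial differences do not block a match
normalize_name <- function(x) trimws(tolower(x))

l1000_toy$name_key <- normalize_name(l1000_toy$pert_iname)
ltkb_toy$name_key  <- normalize_name(ltkb_toy$LabelCompoundName)

# Inner join: keeps only drugs present in both tables
overlap <- merge(ltkb_toy, l1000_toy, by = "name_key")
print(overlap[, c("LTKBID", "pert_id", "name_key")])
```

Exact name matching misses synonyms and salt forms, so a count obtained this way is a lower bound on the true overlap.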
There are many ways to map drugs between databases. A common one is matching by name.
First, we use the CLUE API provided by LINCS to search for each LTKB drug by name:
library(httr)      # GET(), content()
library(jsonlite)  # toJSON(), fromJSON()

# Accumulator: all LTKB columns plus the three fields requested from CLUE
result_matrix <- matrix(0, nrow=0, ncol=(ncol(LTKB_data)+3))
colnames(result_matrix) <- c(colnames(LTKB_data),"cell_id", "pert_id", "pert_iname")
count_drugs <- 0      # drugs with at least one name match
Success_attempt <- 0  # queries answered with HTTP status 200
Failed_attempt <- 0   # queries answered with any other status
test_drug_num <- dim(LTKB_data)[1]
key <- "e54f98445e0131a8fbfccbe463c0fcf4"
for (i in seq_len(test_drug_num)){
  # Lower-case the name and encode spaces so it can be placed in the URL
  drug_name <- tolower(LTKB_data$LabelCompoundName[i])
  drug_name <- gsub(pattern = " ", replacement = "%20", drug_name)
  url <- paste("api.clue.io/api/perts?filter={%22fields%22:[%22pert_id%22,%22pert_iname%22,%22cell_id%22],%22where%22:{%22pert_iname%22:%22",drug_name,"%22}}&user_key=",key,sep="")
  raw.data <- GET(url)
  if (raw.data$status_code == 200){
    Success_attempt <- Success_attempt+1
    # Round-trip through JSON to turn the parsed response into a data frame
    data <- jsonlite::fromJSON(toJSON(content(raw.data)))
    if (length(data)>0){
      count_drugs <- count_drugs+1
      # One output row per matching L1000 perturbagen
      for (j in seq_len(dim(data)[1])){
        tmp_result <- cbind(LTKB_data[i,],
                            data[j,c("cell_id", "pert_id", "pert_iname")])
        result_matrix <- rbind(result_matrix, tmp_result)
      }
    }
  }else{
    Failed_attempt <- Failed_attempt+1
  }
}
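The query URL above is assembled by hand with pre-encoded quotes. A less error-prone variant is to write the filter as plain JSON and let `URLencode` from base R escape it. This is a sketch only: `build_clue_url` is a hypothetical helper, the `https://` scheme is an assumption (the original URL has no scheme), and `YOUR_KEY` is a placeholder.

```r
# Hypothetical helper: build the CLUE /api/perts query for one drug name.
# Assumption: the endpoint accepts the same filter when it is fully percent-encoded.
build_clue_url <- function(drug_name, user_key) {
  # Note: a name containing a double quote would break this hand-written JSON
  filter <- sprintf(
    '{"fields":["pert_id","pert_iname","cell_id"],"where":{"pert_iname":"%s"}}',
    tolower(drug_name)
  )
  paste0(
    "https://api.clue.io/api/perts?filter=",
    utils::URLencode(filter, reserved = TRUE),  # escapes spaces, quotes, braces
    "&user_key=", user_key
  )
}

example_url <- build_clue_url("Folic Acid", "YOUR_KEY")
print(example_url)
```

Encoding the whole filter in one step also covers characters other than spaces, which the `gsub` call in the loop does not handle.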
In the end, of the 1036 drugs in the LTKB database, 1000 were queried successfully and 575 of those were mapped by name.
Note that the remaining 36 drugs could not be processed because of the query limit set by the LINCS team.
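One possible way to recover queries rejected by a rate limit is to pause and try again. The sketch below is not part of the original analysis: `fetch_with_retry` is a hypothetical helper, `fetch` stands for any zero-argument function that performs the request (for example a wrapper around `GET(url)`), and treating every non-200 status as retryable is an assumption.

```r
# Hypothetical helper: call fetch() until it returns status 200 or tries run out.
fetch_with_retry <- function(fetch, max_tries = 3, pause_sec = 1) {
  res <- NULL
  for (attempt in seq_len(max_tries)) {
    res <- fetch()
    if (res$status_code == 200) {
      return(res)
    }
    # Wait before the next attempt, but not after the last one
    if (attempt < max_tries) Sys.sleep(pause_sec)
  }
  res  # last response, still unsuccessful
}

# Stand-in for a real request: fails twice, then succeeds
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  list(status_code = if (calls < 3) 429 else 200)
}

res <- fetch_with_retry(fake_fetch, max_tries = 5, pause_sec = 0)
print(c(status = res$status_code, calls = calls))
```

In the query loop, wrapping the request as `fetch_with_retry(function() GET(url))` would give each drug several chances before it is counted as failed.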
y_value <- c(count_drugs, Success_attempt-count_drugs, Failed_attempt)
barplot(y_value, main="Number of Drugs", xlab="Types", names.arg = c("Name_Matched","Not_Found","Failed"), col="firebrick1")
Here is a brief overview of the mapped drugs. Only perturbagens with MCF7 data are shown:
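result_matrix_MCF7 <- matrix(0,nrow=0,ncol=5)
for (i in seq_len(nrow(result_matrix))){
  # cell_id holds a list of cell lines per row: keep rows that include MCF7
  if ("MCF7" %in% unlist(result_matrix$cell_id[i]) ){
    # columns 1, 3 and 9 come from LTKB; 19 is pert_iname and 17 is cell_id
    result_matrix_MCF7 <- rbind(result_matrix_MCF7, result_matrix[i,c(1,3,9, 19, 17)])
  }
}
# "L1000-Pert-ID" is filled from pert_iname (column 19), so it shows the L1000 name
colnames(result_matrix_MCF7) <- c("LTKB_ID", "Mapped_Drug_Name", "DILI_Label", "L1000-Pert-ID", "Cell-ID")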
dim(result_matrix_MCF7)
## [1] 807 5
head(result_matrix_MCF7, n=20)
## LTKB_ID Mapped_Drug_Name DILI_Label L1000-Pert-ID
## 2 LT01842 trimethoprim vLess-DILI-Concern trimethoprim
## 3 LT00036 tetracycline vLess-DILI-Concern tetracycline
## 31 LT00036 tetracycline vLess-DILI-Concern tetracycline
## 4 LT00289 dapsone vLess-DILI-Concern dapsone
## 5 LT00166 pyrazinamide vLess-DILI-Concern pyrazinamide
## 6 LT00098 fenofibrate vLess-DILI-Concern fenofibrate
## 7 LT00013 cyclophosphamide vLess-DILI-Concern cyclophosphamide
## 8 LT00068 chlorpromazine vLess-DILI-Concern chlorpromazine
## 9 LT00335 doxorubicin vLess-DILI-Concern doxorubicin
## 91 LT00335 doxorubicin vLess-DILI-Concern doxorubicin
## 92 LT00335 doxorubicin vLess-DILI-Concern doxorubicin
## 10 LT01225 clindamycin vLess-DILI-Concern clindamycin
## 11 LT01716 pyrimethamine vLess-DILI-Concern pyrimethamine
## 12 LT00429 oxaprozin vLess-DILI-Concern oxaprozin
## 13 LT02041 dicloxacillin vLess-DILI-Concern dicloxacillin
## 14 LT00059 captopril vLess-DILI-Concern captopril
## 15 LT00393 doxycycline vLess-DILI-Concern doxycycline
## 16 LT01167 cefadroxil vLess-DILI-Concern cefadroxil
## 17 LT01433 hydrochlorothiazide vLess-DILI-Concern hydrochlorothiazide
## 18 LT01492 lisinopril vLess-DILI-Concern lisinopril
## Cell-ID
## 2 MCF7, PC3
## 3 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 31 MCF7, PC3
## 4 A375, VCAP, A549, HA1E, HCC515, HT29, MCF7, PC3
## 5 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 6 MCF7, PC3, HA1E, HCC515, VCAP, A375, A549, ASC, HEPG2, HT29, NPC, PHH, SKB
## 7 A375, HT29, MCF7, PC3, VCAP, A549, HA1E, HCC515
## 8 A549, HCC515, A375, HA1E, HEPG2, HT29, MCF7, PC3, VCAP, ASC, NPC, PHH, SKB, FIBRNPC, NEU, NEU.KCL
## 9 A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, PHH, SKB
## 91 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 92 HA1E, HCC515, PC3, VCAP, A375, A549, HT29, MCF7, ASC, HEPG2, PHH, SKB
## 10 MCF7, PC3
## 11 HA1E, HCC515, PC3, VCAP, A375, A549, ASC, HEPG2, HT29, MCF7, NPC, PHH, SKB
## 12 A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, NPC, PHH, SKB, HUH7
## 13 A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP
## 14 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 15 PC3, VCAP, A375, A549, HA1E, HCC515, HT29, MCF7
## 16 MCF7, PC3
## 17 MCF7, PC3
## 18 MCF7, PC3
tmp_data <- unique.matrix(result_matrix_MCF7[,1:3])
dim(tmp_data)
## [1] 567 3
tmp_count <- table(tmp_data$DILI_Label)
barplot(tmp_count, col="green", main="MCF7 Cell Line statistics")
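The drop from 807 rows to 567 comes from drugs that match more than one L1000 perturbagen, so rows must be de-duplicated before counting drugs per DILI label. A toy illustration of that step follows; the IDs and rows are made up, and base `unique()` is used here in place of `unique.matrix`.

```r
# Toy table: the first drug matched two perturbagens, so it appears twice
toy <- data.frame(
  LTKB_ID          = c("LT_A", "LT_A", "LT_B", "LT_C"),
  Mapped_Drug_Name = c("drug_a", "drug_a", "drug_b", "drug_c"),
  DILI_Label       = c("vLess-DILI-Concern", "vLess-DILI-Concern",
                       "vMost-DILI-Concern", "vNo-DILI-Concern"),
  stringsAsFactors = FALSE
)

# Collapse repeated (ID, name, label) rows so each drug is counted once
unique_drugs <- unique(toy)

# Number of distinct drugs per DILI label
label_counts <- table(unique_drugs$DILI_Label)
print(label_counts)
```

Counting on the raw table instead would weight each drug by its number of L1000 matches.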
Similarly, we show the perturbagens that have PC3 data:
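result_matrix_PC3 <- matrix(0,nrow=0,ncol=5)
for (i in seq_len(nrow(result_matrix))){
  # cell_id holds a list of cell lines per row: keep rows that include PC3
  if ("PC3" %in% unlist(result_matrix$cell_id[i]) ){
    # columns 1, 3 and 9 come from LTKB; 19 is pert_iname and 17 is cell_id
    result_matrix_PC3 <- rbind(result_matrix_PC3, result_matrix[i,c(1,3,9, 19, 17)])
  }
}
# "L1000-Pert-ID" is filled from pert_iname (column 19), so it shows the L1000 name
colnames(result_matrix_PC3) <- c("LTKB_ID", "Mapped_Drug_Name", "DILI_Label", "L1000-Pert-ID", "Cell-ID")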
dim(result_matrix_PC3)
## [1] 808 5
head(result_matrix_PC3, n=20)
## LTKB_ID Mapped_Drug_Name DILI_Label L1000-Pert-ID
## 2 LT01842 trimethoprim vLess-DILI-Concern trimethoprim
## 3 LT00036 tetracycline vLess-DILI-Concern tetracycline
## 31 LT00036 tetracycline vLess-DILI-Concern tetracycline
## 4 LT00289 dapsone vLess-DILI-Concern dapsone
## 5 LT00166 pyrazinamide vLess-DILI-Concern pyrazinamide
## 6 LT00098 fenofibrate vLess-DILI-Concern fenofibrate
## 7 LT00013 cyclophosphamide vLess-DILI-Concern cyclophosphamide
## 8 LT00068 chlorpromazine vLess-DILI-Concern chlorpromazine
## 9 LT00335 doxorubicin vLess-DILI-Concern doxorubicin
## 91 LT00335 doxorubicin vLess-DILI-Concern doxorubicin
## 92 LT00335 doxorubicin vLess-DILI-Concern doxorubicin
## 10 LT01225 clindamycin vLess-DILI-Concern clindamycin
## 11 LT01716 pyrimethamine vLess-DILI-Concern pyrimethamine
## 12 LT00429 oxaprozin vLess-DILI-Concern oxaprozin
## 13 LT02041 dicloxacillin vLess-DILI-Concern dicloxacillin
## 14 LT00059 captopril vLess-DILI-Concern captopril
## 15 LT00393 doxycycline vLess-DILI-Concern doxycycline
## 16 LT01167 cefadroxil vLess-DILI-Concern cefadroxil
## 17 LT01433 hydrochlorothiazide vLess-DILI-Concern hydrochlorothiazide
## 18 LT01492 lisinopril vLess-DILI-Concern lisinopril
## Cell-ID
## 2 MCF7, PC3
## 3 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 31 MCF7, PC3
## 4 A375, VCAP, A549, HA1E, HCC515, HT29, MCF7, PC3
## 5 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 6 MCF7, PC3, HA1E, HCC515, VCAP, A375, A549, ASC, HEPG2, HT29, NPC, PHH, SKB
## 7 A375, HT29, MCF7, PC3, VCAP, A549, HA1E, HCC515
## 8 A549, HCC515, A375, HA1E, HEPG2, HT29, MCF7, PC3, VCAP, ASC, NPC, PHH, SKB, FIBRNPC, NEU, NEU.KCL
## 9 A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, PHH, SKB
## 91 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 92 HA1E, HCC515, PC3, VCAP, A375, A549, HT29, MCF7, ASC, HEPG2, PHH, SKB
## 10 MCF7, PC3
## 11 HA1E, HCC515, PC3, VCAP, A375, A549, ASC, HEPG2, HT29, MCF7, NPC, PHH, SKB
## 12 A375, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP, A549, ASC, NPC, PHH, SKB, HUH7
## 13 A375, A549, HA1E, HCC515, HEPG2, HT29, MCF7, PC3, VCAP
## 14 A375, A549, HA1E, HCC515, HT29, MCF7, PC3, VCAP
## 15 PC3, VCAP, A375, A549, HA1E, HCC515, HT29, MCF7
## 16 MCF7, PC3
## 17 MCF7, PC3
## 18 MCF7, PC3
tmp_data <- unique.matrix(result_matrix_PC3[,1:3])
dim(tmp_data)
## [1] 567 3
tmp_count <- table(tmp_data$DILI_Label)
barplot(tmp_count, col="blue", main="PC3 Cell Line statistics")
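The MCF7 and PC3 filters differ only in the cell-line name, so the loop could be folded into one reusable function. The sketch below runs on toy data; `filter_by_cell` is a hypothetical helper and it assumes `cell_id` is a list column with one character vector of cell lines per row.

```r
# Hypothetical helper: keep rows whose cell_id list contains the given cell line
filter_by_cell <- function(df, cell) {
  keep <- vapply(df$cell_id,
                 function(cells) cell %in% unlist(cells),
                 logical(1))
  df[keep, , drop = FALSE]
}

# Toy table with a list column (hypothetical IDs)
toy <- data.frame(LTKB_ID = c("LT_A", "LT_B", "LT_C"), stringsAsFactors = FALSE)
toy$cell_id <- list(c("MCF7", "PC3"), c("A375"), c("PC3", "HEPG2"))

mcf7_rows <- filter_by_cell(toy, "MCF7")
pc3_rows  <- filter_by_cell(toy, "PC3")
print(mcf7_rows$LTKB_ID)
print(pc3_rows$LTKB_ID)
```

Selecting rows with a logical index also avoids growing the result with `rbind` inside a loop, which gets slow as the table grows.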