This document describes the data processing of the human PDZ domain interaction predictions, covering both the sequence-based and the structure-based predictions.
1- Load all human domains
pdz_domains <- read.csv('~/Documents/Work/PDZ/POWCLUI/domains/HumanPOWNames.txt', header=FALSE)
pdz_domains <- unique(pdz_domains$V1)
## [1] "Number of PDZ domains is 218"
2- Get all sequence-based prediction results files
pdzpath <- '~/Documents/Work/PDZ/Human-seq/Seq/'
files <- list.files(pdzpath, full.names=TRUE)
## [1] "Number of sequence-based prediction files is 11"
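The code that printed the file count above is not shown. A minimal sketch of how such a listing and count might be produced is given below; it uses a temporary directory with made-up placeholder files instead of the real results folder, and the names `demo_path` and `demo_files` are assumptions for illustration only.

```r
# Hypothetical stand-in for the results folder: a temporary directory
# holding three empty placeholder files (file names are made up).
demo_path <- file.path(tempdir(), "pdz_seq_demo")
dir.create(demo_path, showWarnings = FALSE)
invisible(file.create(file.path(demo_path, c("part1.txt", "part2.txt", "part3.txt"))))

# List the files with their full paths, as in step 2
demo_files <- list.files(demo_path, full.names = TRUE)

# Stop early if nothing was found, so later steps do not fail obscurely
if (length(demo_files) == 0) stop("No prediction files found in ", demo_path)

# Report the count in the same style as the output above
print(paste("Number of sequence-based prediction files is", length(demo_files)))
```

Checking for an empty listing before reading matters here because a mistyped path makes `list.files` return an empty vector silently rather than raising an error.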
3- Read all sequence-based predicted PDZ PPIs
# Read every result file and stack the rows into one data frame
pdz_seq <- do.call(rbind, lapply(files, read.table, fill=TRUE, header=FALSE))
# Keep only the rows whose first column is a known human PDZ domain
pdz_seq <- pdz_seq[pdz_seq[[1]] %in% pdz_domains, ]
# Keep the first four columns and give them descriptive names
pdz_seq <- pdz_seq[, 1:4]
names(pdz_seq) <- c("DomainName", "BindingSiteSequence", "PeptideSequence", "DecisionScore")
## [1] "Number of sequence-based predicted PPIs is 11281248"
## [1] "Number of domains with predicted PPIs is 198"
| DomainName | BindingSiteSequence | PeptideSequence | DecisionScore |
|---|---|---|---|
| AHNAK2-1 | ASGYSVTGKQLSYELE | YCIGG | -1.4038920541288675 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | PNAEP | -0.9744675488251316 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | KHEYE | -1.0880877951821866 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | EVSYA | -0.869097214900627 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | PASAS | -1.0076030891653431 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | TFQRA | -1.6691679310600334 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | QEDDE | -0.89593758624212 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | TPVKN | -1.2779236501085798 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | QNQNL | -1.035166200227473 |
| AHNAK2-1 | ASGYSVTGKQLSYELE | KSSIC | -0.7628681618461656 |
Preview of the first 10 records of the sequence-based predicted PDZ PPIs.
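The counts reported in step 3 (218 domains loaded, 198 of them with predictions, so 20 without) come from code that is not shown. The sketch below illustrates how such counts, the list of domains lacking predictions, and the preview might be computed. It runs on a small toy data frame; the domain names `DOM-A` to `DOM-C`, the rounded scores, and the variable names are assumptions, not values from the real data.

```r
# Toy stand-ins for pdz_domains and pdz_seq (illustrative values only)
toy_domains <- c("DOM-A", "DOM-B", "DOM-C")
toy_seq <- data.frame(
  DomainName      = c("DOM-A", "DOM-A", "DOM-B"),
  PeptideSequence = c("YCIGG", "PNAEP", "KHEYE"),
  DecisionScore   = c(-1.40, -0.97, -1.09),
  stringsAsFactors = FALSE
)

# Total number of predicted interactions (one row per domain-peptide pair)
n_ppis <- nrow(toy_seq)

# Number of distinct domains that have at least one prediction
n_domains_with_ppis <- length(unique(toy_seq$DomainName))

# Domains that were loaded but have no prediction at all
missing_domains <- setdiff(toy_domains, toy_seq$DomainName)

print(paste("Number of sequence-based predicted PPIs is", n_ppis))
print(paste("Number of domains with predicted PPIs is", n_domains_with_ppis))
print(missing_domains)

# Preview of the first records (at most 10)
head(toy_seq, 10)
```

Applied to the real objects, `setdiff(pdz_domains, pdz_seq$DomainName)` would name the 20 domains that have no sequence-based prediction.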
4- Get all structure-based prediction results files
pdzpath <- '~/Documents/Work/PDZ/Human-seq/Struct/'
sfiles <- list.files(pdzpath, full.names=TRUE)
## [1] "Number of structure-based prediction files is 11"
5- Read all structure-based predicted PDZ PPIs
# Read every result file and stack the rows into one data frame
pdz_struct <- do.call(rbind, lapply(sfiles, read.table, fill=TRUE, header=FALSE))
# Keep only the rows whose first column is a known human PDZ domain
pdz_struct <- pdz_struct[pdz_struct[[1]] %in% pdz_domains, ]
# Keep the first three columns and give them descriptive names
pdz_struct <- pdz_struct[, 1:3]
names(pdz_struct) <- c("DomainName", "PeptideSequence", "DecisionScore")
## [1] "Number of structure-based predicted PPIs is 5266638"
## [1] "Number of domains with predicted PPIs is 213"
| DomainName | PeptideSequence | DecisionScore |
|---|---|---|
| DLG1-3 | FYKAI | -1.334127431816098 |
| DLG1-3 | WSVTQ | -1.8271362367520325 |
| DLG1-3 | VVDCM | -2.106712804560324 |
| DLG1-3 | KFWGT | -2.1204977196802006 |
| DLG1-3 | QADKV | -1.6264223331780503 |
| DLG1-3 | KVDSS | -1.7435554130347501 |
| DLG1-3 | GEKAM | -1.5968062021165985 |
| DLG1-3 | NQDNP | -1.573107366795604 |
| DLG1-3 | TKPSP | -2.113319129979189 |
| DLG1-3 | KVDSV | -0.31159228150511353 |
Preview of the first 10 records of the structure-based predicted PDZ PPIs.
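The two prediction sets cover different numbers of domains (198 for the sequence-based set, 213 for the structure-based set, out of 218). A possible way to see which domains are covered by which method is sketched below. This comparison is not part of the original processing; the toy domain names and all variable names are assumptions for illustration.

```r
# Toy stand-ins: the full domain list and the domains present in each set
toy_domains    <- c("DOM-A", "DOM-B", "DOM-C", "DOM-D")
seq_domains    <- c("DOM-A", "DOM-B")   # e.g. unique(pdz_seq$DomainName)
struct_domains <- c("DOM-B", "DOM-C")   # e.g. unique(pdz_struct$DomainName)

# One row per domain, flagging whether each method produced predictions
coverage <- data.frame(
  DomainName  = toy_domains,
  InSequence  = toy_domains %in% seq_domains,
  InStructure = toy_domains %in% struct_domains,
  stringsAsFactors = FALSE
)

# Domains covered by both methods, and domains covered by neither
both    <- intersect(seq_domains, struct_domains)
neither <- coverage$DomainName[!coverage$InSequence & !coverage$InStructure]

print(coverage)
print(both)
print(neither)
```

A table of this shape makes it easy to restrict any later comparison of decision scores to the domains that both methods cover.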