During the convergence check of an on-the-fly (OTF) string method calculation, we obtained 3D figures like this:
which represent scattered snapshots of the string over the iterations, from the initial position (blue) to the final iteration (red). By construction, we started the OTF calculations using the same protein framework and, allowing for some flexibility, this framework was kept relatively constant by constraining the rotations and translations of the protein with the collective variable module in NAMD. However, it is important to verify that the variations of the images during the calculations are caused by the optimization of the string and not by the motion of the protein. This document describes some tests performed to validate these ideas.
For this series of tests, the data were obtained using Tcl scripts in VMD. To capture the variation and its dependence on the protein framework we extracted, for each image, the coordinates of the center of mass (COM) of the ligand (DIH) and the protein (PNP) under two conditions: (1) as they appear in the output PDB files, and (2) after fitting the protein to the first frame.
The input files for this analysis were named as in the following examples:
-‘I2NP.dat’ represents the COM variations for image 2 ‘I2’, under condition (1) or non-fitted ‘N’ for the protein ‘P’;
-‘I7FL.dat’ represents the COM variations for image 7 ‘I7’, under condition (2) or fitted ‘F’ for the ligand ‘L’.
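The naming convention above can be captured in a small helper. This is only a sketch: `comFileName` and its arguments are hypothetical names introduced here for illustration and are not part of the original scripts.

```r
# Hypothetical helper (not part of the original scripts): builds the name of
# a COM file from the image index, the fitting condition and the molecule.
comFileName <- function(image, fitted = FALSE, molecule = c("L", "P")) {
  molecule <- match.arg(molecule)  # "L" = ligand, "P" = protein
  paste0("I", image, if (fitted) "F" else "N", molecule, ".dat")
}

comFileName(2, fitted = FALSE, molecule = "P")  # "I2NP.dat"
comFileName(7, fitted = TRUE, molecule = "L")   # "I7FL.dat"
```

Building the names in one place makes it easier to switch between the non-fitted and fitted conditions without editing several `paste` calls.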
For instance, the file ‘I1NL.dat’ looks like this:
exampleInput<-read.table("I1NL.dat")
head(exampleInput,4)
## V1 V2 V3
## 1 26.35094 44.20616 64.45627
## 2 26.34289 44.23058 64.53694
## 3 26.36228 44.04828 64.60131
## 4 26.39081 44.14964 64.46983
dim(exampleInput)
## [1] 115 3
The dimensions of this file show that these coordinates (V1-V3) were obtained from 115 PDB files during the OTF string method calculation.
From the previous figure we can see that the ligand showed practically no variation in its initial image (upper left corner) because we constrained it there; this is the starting point for the three represented curves. Therefore, the COM motions of the ligand are expected to be smaller than the COM motions of the protein at image 1, but this tendency should be inverted at other images, particularly images 5, 6, 7 and beyond.
To test this hypothesis we first write a function that creates a single data frame per image with the COM variation of both the ligand and the protein.
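# This function calculates how far the COM of the ligand (protein) moves away
# from its first frame during the iterations of the OTF string method calculation.
# Note: sum(sqrt(x^2)) adds the absolute coordinate differences
# (|dx| + |dy| + |dz|), so the values are L1 distances, not Euclidean ones.
# Input:  N  = index of the image
# Output: df = data frame with one row per frame after the first
getDistances <- function(N) {
  a1 <- read.table(paste("I", N, "NL", ".dat", sep = ""))  # ligand, non-fitted
  a2 <- read.table(paste("I", N, "NP", ".dat", sep = ""))  # protein, non-fitted
  d1 <- 0
  d2 <- 0
  # Both files are assumed to contain the same number of frames
  for (i in 2:dim(a1)[1]) {
    d1 <- c(d1, sum(sqrt((t(a1)[, 1] - t(a1)[, i])^2)))
    d2 <- c(d2, sum(sqrt((t(a2)[, 1] - t(a2)[, i])^2)))
  }
  # Drop the placeholder zero that stands for the reference frame
  Ligand <- as.vector(d1[2:dim(a1)[1]])
  Protein <- as.vector(d2[2:dim(a2)[1]])
  df <- data.frame(Ligand, Protein)
  return(df)
}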
For image 1, this gives:
df1<-getDistances(1)
head(df1,4)
## Ligand Protein
## 1 0.1131477 0.1665154
## 2 0.3142586 0.2189636
## 3 0.1099415 0.3025055
## 4 0.3847904 0.4101601
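As a cross-check, the first values of the Ligand column can be reproduced by hand from the four rows of ‘I1NL.dat’ printed earlier. The expression `sum(sqrt(x^2))` used in getDistances adds the absolute coordinate differences (an L1 distance) rather than taking the Euclidean norm, so the sketch below computes both for comparison. The names `com`, `delta`, `l1` and `euclid` are introduced here only for illustration.

```r
# First four frames of I1NL.dat, copied from the head() output shown earlier
com <- matrix(c(26.35094, 44.20616, 64.45627,
                26.34289, 44.23058, 64.53694,
                26.36228, 44.04828, 64.60131,
                26.39081, 44.14964, 64.46983),
              ncol = 3, byrow = TRUE)

# Displacement of frames 2..4 relative to frame 1 (one row per frame)
delta <- sweep(com[-1, , drop = FALSE], 2, com[1, ])

# What getDistances computes: sum of absolute coordinate differences
l1 <- rowSums(abs(delta))
# The Euclidean alternative, shown only for comparison
euclid <- sqrt(rowSums(delta^2))

l1      # approx. 0.11314 0.31426 0.10995
euclid
```

The L1 values agree with the first three rows of df1 up to the rounding of the printed coordinates. The Euclidean displacement is never larger than the L1 one, so switching metrics would rescale the numbers; whether it would change the ligand versus protein comparison is not tested here.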
For a better visualization we will create a multipanel plot from 4 images that uses the function getDistances just introduced.
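Images <- c(1, 5, 6, 7)
# Stacks the getDistances() output of several images into one data frame and
# adds an Image label column. Every image is assumed to have the same number
# of frames (N rows).
# Note: the label is the position within Images (Image01, Image02, ...), not
# the image index itself, so here Image02 stands for image 5.
multiDensPlotData <- function(Images) {
  d1 <- getDistances(Images[1])
  N <- dim(d1)[1]
  for (i in 2:length(Images)) {
    dt <- getDistances(Images[i])
    d1 <- rbind(d1, dt)
  }
  # Zero-padded labels for positions below 10, e.g. "Image01", "Image12"
  d1$Image <- rep(sprintf("Image%02d", seq_along(Images)), each = N)
  return(d1)
}
dfT <- multiDensPlotData(Images)
dfT$Image <- as.factor(dfT$Image)
summary(dfT)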
## Ligand Protein Image
## Min. :0.06818 Min. :0.08611 Image01:114
## 1st Qu.:0.48793 1st Qu.:1.26831 Image02:114
## Median :3.36667 Median :1.69385 Image03:114
## Mean :2.70484 Mean :1.67335 Image04:114
## 3rd Qu.:4.36353 3rd Qu.:2.12397
## Max. :5.25189 Max. :3.03212
As we can see from this summary, the motions of the ligand COM are larger than those of the protein, even though this data frame includes image 1, which is a special case because we fixed the motions of the ligand in this image with constraints. Note that the labels Image01 to Image04 number the selected images in order, so they stand for images 1, 5, 6 and 7. This effect can also be clearly seen in the following density plots of the selected images.
library(lattice)
densityplot(~ Ligand + Protein | Image, data = dfT, auto.key = TRUE)
Finally, we show box plots comparing the ligand and protein COM variability for all the images.
Images <- 1:21
dfT <- multiDensPlotData(Images)
dfT$Image <- as.factor(dfT$Image)
# Ligand boxes first, then the protein boxes drawn over the same axes
boxplot(Ligand ~ Image, data = dfT, col = "yellow", main = "COM variability per image", ylab = "Distances")
boxplot(Protein ~ Image, data = dfT, col = "green", add = TRUE)
legend(16, 5, c("Ligand", "Protein"), fill = c("yellow", "green"))
The data suggest that the motions of the protein COM are similar for all the images and that, in almost all cases, they are notably smaller than the corresponding motions of the ligand COM. Therefore it is likely that the observed 3D motions of the ligand COM are indeed caused by the OTF procedure.
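One way to put numbers on this visual comparison is to tabulate the per-image medians of both columns. The sketch below is only an illustration: `medianByImage` is a hypothetical helper, and the `toy` data frame contains made-up numbers that merely mimic the column layout returned by multiDensPlotData.

```r
# Hypothetical helper: per-image medians of the ligand and protein COM
# displacements, plus their ratio. Expects the columns Ligand, Protein
# and Image.
medianByImage <- function(df) {
  med <- aggregate(cbind(Ligand, Protein) ~ Image, data = df, FUN = median)
  med$Ratio <- med$Ligand / med$Protein  # > 1 means the ligand moves more
  med
}

# Toy input with made-up numbers, only to show the shape of the result
toy <- data.frame(
  Ligand  = c(0.1, 0.3, 0.2, 4.0, 4.4, 4.2),
  Protein = c(0.2, 0.4, 0.3, 1.5, 1.7, 1.9),
  Image   = factor(rep(c("Image01", "Image02"), each = 3))
)
medianByImage(toy)
```

Applied to the real data as `medianByImage(dfT)`, a ratio above 1 for an image would indicate that the ligand COM moved more than the protein COM in that image.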