1.- Overview

During the convergence check of an on-the-fly (OTF) string method calculation, we obtained 3D figures like this:

snapshot

which show the positions of the string images scattered over the iterations, from the initial position (blue) to the final iteration (red). By construction, we started the OTF calculations from the same protein framework and, while allowing some flexibility, kept this framework relatively constant by restraining the rotations and translations of the protein with the collective variables module in NAMD. However, it is important to check that the variations of the images during the calculations are caused by the optimization of the string and not by the motion of the protein. This document describes some tests performed to validate these ideas.

2.- Data

For this series of tests, the data were obtained using Tcl scripts in VMD. To capture the variation and its dependence on the protein framework we extracted, for each image, the coordinates of the center of mass (COM) of the ligand (DIH) and the protein (PNP) under two conditions: (1) as they appear in the output pdb files, and (2) after fitting the protein to the first frame.

As an example, we show below how the input files for this analysis were named:

-‘I2NP.dat’ represents the COM variations for image 2 ‘I2’, under condition (1) or non-fitted ‘N’ for the protein ‘P’;
-‘I7FL.dat’ represents the COM variations for image 7 ‘I7’, under condition (2) or fitted ‘F’ for the ligand ‘L’.
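
The naming convention above can be sketched as a small helper. The function name comFileName and its arguments are hypothetical and were not part of the original analysis; this is only an illustration of how the four file names per image are composed.

```r
# Hypothetical helper (not part of the original scripts): builds a file name
# following the convention I<image><N|F><L|P>.dat described above.
comFileName <- function(image, fitted, molecule) {
  stopifnot(molecule %in% c("L", "P"))     # L = ligand, P = protein
  condition <- if (fitted) "F" else "N"    # F = fitted, N = non-fitted
  paste0("I", image, condition, molecule, ".dat")
}

comFileName(2, FALSE, "P")   # "I2NP.dat"
comFileName(7, TRUE, "L")    # "I7FL.dat"
```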

As an example we can see the file ‘I1NL.dat’:

exampleInput<-read.table("I1NL.dat")
head(exampleInput,4)
##         V1       V2       V3
## 1 26.35094 44.20616 64.45627
## 2 26.34289 44.23058 64.53694
## 3 26.36228 44.04828 64.60131
## 4 26.39081 44.14964 64.46983
dim(exampleInput)
## [1] 115   3

The dimensions of this file show that these coordinates (V1-V3) were obtained from 115 pdb files written during the OTF string method calculation.

3.- Initial exploration

From the previous figure we can see that the ligand showed practically no variation in its initial image (upper left corner), because we restrained it there; this is the starting point for the three curves shown. Therefore, we expect the COM motions of the ligand to be smaller than the COM motions of the protein at image 1, but this tendency should be inverted at other images, such as images 5, 6 and 7.

To test this hypothesis we first define a function that creates a single data frame per image with the COM variation of both the ligand and the protein.

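#This function measures how far the COM of the ligand and of the protein moved away from
#their positions in the first pdb file during the iterations of the OTF string method calculation.
#It reads the non-fitted files (NL and NP). The measure is the sum of the absolute
#coordinate differences (|dx|+|dy|+|dz|), not the Euclidean norm.
#Input  N  = index of the image
#Output df = data frame with columns Ligand and Protein, one row per pdb file after the first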
getDistances<-function(N){
        a1<-read.table(paste("I",N,"NL",".dat",sep=""))
        a2<-read.table(paste("I",N,"NP",".dat",sep=""))
        d1<-0
        d2<-0
        for (i in 2:dim(a1)[1]){
             d1<-c(d1,sum(sqrt((t(a1)[,1]-t(a1)[,i])^2)))
             d2<-c(d2,sum(sqrt((t(a2)[,1]-t(a2)[,i])^2)))
        }
        Ligand<-as.vector(d1[2:dim(a1)[1]])
        Protein<-as.vector(d2[2:dim(a2)[1]])
        df<-data.frame(Ligand,Protein)
        return(df)
}

For image 1, this gives:

df1<-getDistances(1)
head(df1,4)
##      Ligand   Protein
## 1 0.1131477 0.1665154
## 2 0.3142586 0.2189636
## 3 0.1099415 0.3025055
## 4 0.3847904 0.4101601
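
The first three Ligand values printed above agree with the sum of absolute coordinate differences between row 1 and rows 2 to 4 of ‘I1NL.dat’, which is what getDistances computes. As a hedged sketch, the block below reproduces that measure from the four rows shown earlier and compares it with the straight-line (Euclidean) distance. The Euclidean column is an alternative measure offered for comparison, not the one used in this report, and the variable names are illustrative.

```r
# First four rows of I1NL.dat as printed earlier in this document (rounded values)
com <- matrix(c(26.35094, 44.20616, 64.45627,
                26.34289, 44.23058, 64.53694,
                26.36228, 44.04828, 64.60131,
                26.39081, 44.14964, 64.46983),
              ncol = 3, byrow = TRUE)

# Displacement of every later frame from the first frame
delta <- sweep(com[-1, , drop = FALSE], 2, com[1, ])

manhattan <- rowSums(abs(delta))     # measure used by getDistances
euclidean <- sqrt(rowSums(delta^2))  # alternative: straight-line distance

round(cbind(manhattan, euclidean), 4)
```

The Euclidean distance is never larger than the sum of absolute differences, so switching measures would rescale the values but should be checked separately before drawing conclusions from it.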

4.- Density plots

For better visualization, we create a multipanel plot for four images (1, 5, 6 and 7), using the function getDistances introduced above.

Images<-c(1,5,6,7)
multiDensPlotData<-function(Images){
        d1<-getDistances(Images[1])
        N<-dim(d1)[1]
        for (i in 2:length(Images)){
                dt<-getDistances(Images[i])
                d1<-rbind(d1,dt)
        }
        for (i in 1:length(Images)){
                if (i<10){
                        d1$Image[((i*N)-(N-1)):(i*N)]<-paste("Image","0",i,sep="")
                } else {
                        d1$Image[((i*N)-(N-1)):(i*N)]<-paste("Image",i,sep="")
                }
        }
        return(d1)
}
dfT<-multiDensPlotData(Images)
dfT$Image<-as.factor(dfT$Image)
summary(dfT)
##      Ligand           Protein            Image    
##  Min.   :0.06818   Min.   :0.08611   Image01:114  
##  1st Qu.:0.48793   1st Qu.:1.26831   Image02:114  
##  Median :3.36667   Median :1.69385   Image03:114  
##  Mean   :2.70484   Mean   :1.67335   Image04:114  
##  3rd Qu.:4.36353   3rd Qu.:2.12397                
##  Max.   :5.25189   Max.   :3.03212

As we can see from this summary, the motions of the ligand COM are larger than those of the protein, even though this data frame includes image 1, which is a special case because we fixed the motions of the ligand in this image with constraints. (The labels Image01 to Image04 number the entries of the vector Images in order, so they correspond to images 1, 5, 6 and 7.) This effect can also be clearly seen in the following density plots of the selected images.

library(lattice)
## Warning: package 'lattice' was built under R version 3.1.3
densityplot(~ Ligand + Protein | Image, data = dfT, auto.key = TRUE)
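
Because multiDensPlotData numbers the panels by position in Images, the panels for images 1, 5, 6 and 7 are titled Image01 to Image04. A minimal sketch of a variant that labels rows with the actual image number is given below. The names stackImages and toyReader are hypothetical; toyReader only stands in for getDistances so that the sketch runs without the .dat files, and its numbers are made up.

```r
# Hypothetical variant of multiDensPlotData: labels each row with the real image number.
# 'reader' is any function returning a data frame for an image index
# (getDistances in this document).
stackImages <- function(Images, reader) {
  parts <- lapply(Images, function(n) {
    df <- reader(n)
    df$Image <- sprintf("Image%02d", n)  # zero-padded so the labels sort correctly
    df
  })
  out <- do.call(rbind, parts)
  out$Image <- factor(out$Image)
  out
}

# Toy reader with made-up numbers, used only to make the sketch self-contained
toyReader <- function(n) data.frame(Ligand = c(0.1, 0.2) * n, Protein = c(0.3, 0.4))

dfToy <- stackImages(c(1, 5, 6, 7), toyReader)
levels(dfToy$Image)
```

With the real data, stackImages(c(1, 5, 6, 7), getDistances) would be the corresponding call. This version also does not require every image to have the same number of rows.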

5.- Boxplots

Finally, we show boxplots comparing the ligand and protein COM variations for all the images.

Images<-1:21
dfT<-multiDensPlotData(Images)
dfT$Image<-as.factor(dfT$Image)
boxplot(Ligand ~ Image, data=dfT,col="yellow",main="COM variability per image",ylab="Distances")
boxplot(Protein ~ Image, data=dfT,col="green",add=TRUE)
legend(16, 5, c("Ligand", "Protein"),fill = c("yellow", "green"))
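
Since the two sets of boxes are drawn on top of each other, a numeric per-image summary can complement the plot. The sketch below computes per-image medians with aggregate and flags the images where the ligand median exceeds the protein median. The data frame toy is a made-up stand-in for dfT, not a result of this analysis.

```r
# Toy stand-in for dfT (made-up numbers, not results from this analysis)
toy <- data.frame(
  Ligand  = c(0.1, 0.3, 3.0, 4.0),
  Protein = c(0.2, 0.4, 1.5, 1.7),
  Image   = factor(c("Image01", "Image01", "Image05", "Image05"))
)

# Median of each column per image
med <- aggregate(cbind(Ligand, Protein) ~ Image, data = toy, FUN = median)

# TRUE where the ligand COM moved more than the protein COM (by median)
med$LigandLarger <- med$Ligand > med$Protein
med
```

Replacing toy with dfT would give the same table for the 21 images.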

6.- Conclusions

The data suggest that the motions of the protein COM are similar across all the images and, in almost all cases, notably smaller than the corresponding motions of the ligand COM. Therefore, it is likely that the observed 3D motions of the ligand COM are indeed caused by the OTF procedure rather than by the motion of the protein.