1.- Overview

When performing ‘on the fly’ (OTF) string calculations in this project, the most interesting and direct output data are the strings (one string per iteration) formed by the images. Each image at a given iteration holds the values of the collective variables (CVs) at that moment. In this report we make a first exploration of these data, including a raw visualization of the strings and of their convergence.

2.- Data

In our case the CVs are the coordinates of the center of mass of the ligand (DIH) as it approaches the binding site of the protein PNP. Therefore, if a calculation for a binding path has N images and M iterations, each file string[i].dat (i = 1, ..., M) has N lines, and each line holds the CVs of one image at iteration i. As an example, we can load one of these string data files and look at its content.

string1<-read.table("string1.dat")
head(string1,4)
##        V1      V2      V3
## 1 26.3416 43.5485 64.3612
## 2 27.0886 43.2795 63.5288
## 3 27.5605 44.1171 63.1721
## 4 27.4592 45.0991 62.6164

We can see that in these files the x, y, z coordinates of the center of mass of the ligand (our CVs) are organized in three columns labeled V1, V2, and V3. In total we have 21 images (lines):

dim(string1)
## [1] 21  3
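
As a small convenience, the file layout described above can be checked at load time. The following is only a sketch: the helper `readString`, its `nImages` argument, and the column names x, y, z are assumptions introduced here, not part of the original analysis. The demo writes the four rows printed above to a temporary file so that the block runs on its own.

```r
# Hypothetical helper: read one string file, label the three CV columns
# x, y, z, and optionally check the expected number of images (lines).
readString <- function(path, nImages = NULL) {
    s <- read.table(path, col.names = c("x", "y", "z"))
    if (!is.null(nImages)) stopifnot(nrow(s) == nImages)
    s
}

# Demo on a temporary file holding the four rows shown above
demoFile <- tempfile(fileext = ".dat")
writeLines(c("26.3416 43.5485 64.3612",
             "27.0886 43.2795 63.5288",
             "27.5605 44.1171 63.1721",
             "27.4592 45.0991 62.6164"), demoFile)
demo <- readString(demoFile, nImages = 4)
print(demo)
```

For the real data the call would presumably be readString("string1.dat", nImages = 21).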

3.- 3D plot of the strings

First we load three strings into their respective data frames:

stringA<-read.table("string1.dat")
stringB <-read.table("string500.dat")
stringC<-read.table("string1891.dat")

3D plots in markdown require setting up some options, which are commented below:

library(knitr) #load the library to 'knit' a markdown doc in R.
library(rgl)   #load the 3D plotting library 'rgl'
source("hooks-extra.R")  #source some extra functions to insert interactive 3d scenes in html 
knit_hooks$set(webgl = hook_webgl) #define the hooks

Finally we can visualize the strings:

#plotting the strings
#lines3d always draws into the current scene, so it needs no 'add' argument
#string A
plot3d(stringA$V1, stringA$V2, stringA$V3,col="blue",type="s",size=.7,xlab="X",ylab="Y",zlab="Z")
lines3d(stringA$V1, stringA$V2, stringA$V3,col="blue")
#string B (add=TRUE draws the spheres into the existing scene)
plot3d(stringB$V1, stringB$V2, stringB$V3,col="green",type="s",size=.7,add=TRUE)
lines3d(stringB$V1, stringB$V2, stringB$V3,col="green")
#string C
plot3d(stringC$V1, stringC$V2, stringC$V3,col="red",type="s",size=.7,add=TRUE)
lines3d(stringC$V1, stringC$V2, stringC$V3,col="red")

4.- Convergence

Although the exploratory 3D visualization of some of the strings during the iterations of the OTF method is already informative (it gives us a quick snapshot of the evolution of these calculations), it is important to have complementary and, if possible, more precise measures. A better representation of the evolution of these calculations can be obtained by plotting a measure of convergence. While the accompanying Python functions of this work use a normalized version of the Fréchet distance http://en.wikipedia.org/wiki/Frechet_distance, here we build a simplified, non-normalized measure that is adequate for this exploratory purpose.
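
For reference, a discrete Fréchet distance between two strings can be sketched in R as follows. This is not the implementation used in the accompanying Python functions, and it is not normalized. It is only an illustration, using the standard dynamic-programming recurrence for the discrete variant, and the function name `discreteFrechet` is introduced here.

```r
# Discrete Frechet distance between two polylines P and Q,
# each given as a matrix or data frame with one point (image) per row.
discreteFrechet <- function(P, Q) {
    P <- as.matrix(P)
    Q <- as.matrix(Q)
    n <- nrow(P)
    m <- nrow(Q)
    # d[i, j]: Euclidean distance between point i of P and point j of Q
    d <- matrix(0, n, m)
    for (i in seq_len(n)) {
        for (j in seq_len(m)) {
            d[i, j] <- sqrt(sum((P[i, ] - Q[j, ])^2))
        }
    }
    # ca[i, j]: coupling distance for the prefixes P[1..i] and Q[1..j]
    ca <- matrix(0, n, m)
    ca[1, 1] <- d[1, 1]
    for (i in seq_len(n)[-1]) ca[i, 1] <- max(ca[i - 1, 1], d[i, 1])
    for (j in seq_len(m)[-1]) ca[1, j] <- max(ca[1, j - 1], d[1, j])
    for (i in seq_len(n)[-1]) {
        for (j in seq_len(m)[-1]) {
            ca[i, j] <- max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                            d[i, j])
        }
    }
    ca[n, m]
}

# Toy check: a straight 3-point path and a copy shifted by 1 along y
P <- rbind(c(0, 0, 0), c(1, 0, 0), c(2, 0, 0))
Q <- P + matrix(c(0, 1, 0), nrow = 3, ncol = 3, byrow = TRUE)
print(discreteFrechet(P, Q))
```

With the data frames of section 3, a call such as discreteFrechet(stringA, stringC) would compare the first and last strings.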

4.1.- Loading all strings in a single data frame

Unlike in the previous section, here we will use every string[i].dat file from our OTF string method calculations. To facilitate the calculation of the convergence, we first build a function that creates a single data frame containing all the strings. As each string has 3 variables (V1-V3), we stack these variables into a single column before adding this column to the final data frame. Here is the code:

#this function just creates the name of the file given its index.
#e.g., for index n=5, the file name should be string5.dat
buildName<-function(n){
    varName<-paste('string',n,'.dat',sep="")
    return(varName)
}

#This function uses the function 'buildName' and returns one data frame
#in which each column has the stacked V1-V3 values for a single string (e.g. string5.dat)
#the input value 'M' is the total number of string[i].dat files
pasteDF<-function(M){
    d1<-read.table(buildName(1))
    V1<-c(d1$V1,d1$V2,d1$V3)
    bigDataFrame<-data.frame(V1)
    #seq_len(M)[-1] is 2..M and is empty when M is 1 (2:M would count down)
    for (i in seq_len(M)[-1]){
        d<-read.table(buildName(i))   #named 'd' so that base::t is not masked
        v<-c(d$V1,d$V2,d$V3)
        s<-data.frame(v)
        names(s)[1]<-paste('V',i,sep="")
        bigDataFrame<-data.frame(bigDataFrame,s)
    }
    return(bigDataFrame)
}
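
Growing a data frame inside a loop copies it at every step, which can become slow with thousands of files. One possible alternative, sketched below, collects the columns in a list and converts once at the end. The name `pasteDF2` and its `dir` argument are assumptions introduced for this example, and the demo uses three small synthetic files in a temporary directory rather than real string data.

```r
# Hypothetical variant of pasteDF: build all columns first, convert once.
# 'dir' (assumed extra argument) is the folder that holds string1.dat ... stringM.dat
pasteDF2 <- function(M, dir = ".") {
    cols <- lapply(seq_len(M), function(i) {
        s <- read.table(file.path(dir, paste0("string", i, ".dat")))
        c(s$V1, s$V2, s$V3)   # stack V1, V2, V3 into one column
    })
    names(cols) <- paste0("V", seq_len(M))
    as.data.frame(cols)
}

# Demo with synthetic files: 3 'strings' of 2 images each
demoDir <- tempfile("strings")
dir.create(demoDir)
for (i in 1:3) {
    s <- matrix(i + (1:6) / 10, ncol = 3)
    write.table(s, file.path(demoDir, paste0("string", i, ".dat")),
                row.names = FALSE, col.names = FALSE)
}
demoDF <- pasteDF2(3, demoDir)
print(dim(demoDF))
```

The result has the same layout as the loop version: one column per string, with the V1, V2 and V3 values stacked in that order.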

Now we can use the previous functions to create a big data frame with all 1891 string[i].dat files.

df<-pasteDF(1891)
dim(df)
## [1]   63 1891

Notice that the new data frame has 63 rows (21*3, the V1-V3 coordinates of the 21 images of each string) and 1891 columns (one per string).
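
Because the three coordinates are stacked one after the other, any column can be turned back into an images-by-coordinates matrix. The sketch below shows this on a toy column with two images. The helper name `columnToString` is hypothetical and the toy values are made up for illustration.

```r
# Hypothetical helper: recover the N x 3 matrix of a string from column i.
# matrix() fills column by column, which matches the V1, V2, V3 stacking.
columnToString <- function(df, i) {
    m <- matrix(df[[i]], ncol = 3)
    colnames(m) <- c("V1", "V2", "V3")
    m
}

# Toy column: two images, (1, 10, 100) and (2, 20, 200)
toy <- data.frame(V1 = c(1, 2, 10, 20, 100, 200))
print(columnToString(toy, 1))
```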

4.2.- Calculating the convergence using the ‘big’ data frame

Now we present a simplified form of the convergence, wrapped up in a function named convDF. For each string it returns the Euclidean distance between that string and the first one:

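#convergence function: Euclidean distance between string i and string 1
convDF<-function(df){
    conv<-0   #distance of the first string to itself
    for (i in seq_len(ncol(df))[-1]){
        conv<-c(conv,sqrt(sum((df[[1]]-df[[i]])^2)))
    }
    return(conv)
}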
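
Written as a formula, the value that convDF returns for string i is the one below. The notation is introduced here for illustration only: x_k^{(i)} is the k-th entry of column i of the data frame, and N is the number of images (N = 21 here, so the sum runs over 63 entries).

```latex
c_i = \sqrt{\sum_{k=1}^{3N} \left( x_k^{(1)} - x_k^{(i)} \right)^2},
\qquad i = 2, \dots, M, \qquad c_1 = 0
```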

A plot of the convergence is now simply:

#convergence plot
plot(convDF(df),xlab="Iterations",ylab="Convergence",type="s")
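
The same curve can also be computed without an explicit loop. The version below is a sketch under the assumption that every column of the data frame is numeric and has the same length. The name `convVec` is introduced here, and the toy data frame is made up for the check.

```r
# Hypothetical loop-free variant of the convergence function.
convVec <- function(df) {
    m <- as.matrix(df)
    # m[, 1] is recycled down each column, so every column has the
    # first string subtracted from it before squaring and summing
    unname(sqrt(colSums((m - m[, 1])^2)))
}

# Toy check: the second 'string' is at distance 5 from the first, the third at 0
toy <- data.frame(V1 = c(0, 0), V2 = c(3, 4), V3 = c(0, 0))
print(convVec(toy))
```

Its first element is 0, like the leading 0 in the loop version, so it could be passed to the same plot call.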

5.- Conclusions

Given that the OTF string method calculation seems to have reached convergence, it is now appropriate to select structures from the final iterations for reparametrization to a larger number of images or for a direct calculation of the forces (and the associated free energy profile).