1.- Overview

To have a similar framework of reference for the of the fly (OTF) string method calculations, all the images were prepared using a protein framework with identical spatial orientation. Additionally, the orientation (and absolute position) of the protein was kept relatively constant but still with some flexibility by using the collective variables (CVs) module in namd simulations. In this report we check the effectivity of the selective namd CVs and parameters in keeping the protein framework relatively invariant.

2.- Data

For this exercise we will use as data the output trajectory CV files from namd simulations for each singleimage during MD simulations.

cv1<-read.table("cvTraj1")
head(cv1,2)
##    V1         V2 V3        V4 V5           V6 V7          V8 V9
## 1   0 0.11135711  ( 0.9996217  , -0.006724198  , 0.007942777  ,
## 2 100 0.08263418  ( 0.9996180  , -0.007951545  , 0.007439634  ,
##           V10 V11
## 1 -0.02545810   )
## 2 -0.02540209   )

We are particularly interested in the columns V4,V6,V8, and V10 wich represent the orientation as quaternions, that is, these four numbers represent a unit quaternion with the best fit rotations from a selected coordinates framework. We therefore will create a function that:

#build the name of the file
buildName<-function(n){
        a<-paste('cvTraj',n,sep="")
    a
}
#create the total data frame
getVars<-function(N){
        da<-read.table(buildName(1))
        sa<-data.frame(c(da[4],da[6],da[8],da[10]))
        dfa<-0
        for (i in 2:dim(sa)[1]){
                dfa<-c(dfa,sqrt(sum((sa[1,]-sa[i,])^2)))
        }
        dfa<-dfa[2:(dim(sa)[1])]
        for (i in 2:N){
                d<-read.table(buildName(i))
                s<-data.frame(c(d[4],d[6],d[8],d[10]))
                df<-0
                for (i in 2:dim(s)[1]){
                        df<-c(df,sqrt(sum((s[1,]-s[i,])^2)))
                }
                df<-df[2:(dim(s)[1])]
                dfa<-cbind(dfa,df)
        }
        dfa
}
dfT<-getVars(2)
head(dfT,3)
##              dfa          df
## [1,] 0.001327661 0.003503144
## [2,] 0.005254866 0.005733738
## [3,] 0.008991973 0.008196084

3.- Augmented Dickey-Fuller Test

The Augmented Dickey-Fuller Test (ADFT) is extensively used to test the stationarity of time series. This test has been documented elsewhere( http://en.wikipedia.org/wiki/Augmented_Dickey%E2%80%93Fuller_test) and we also created a python notebook on the corresponding theory (link here!!).

For the practical pourposes of this report is enough to say that for an input time series, ADFT returns an statistics; if this number is lower than the test critical values (99%,95%, and 90%) then the time series is stationary. Here we willonly compare with the first critical value (if stationarity is proven in this case, then is the same result is true for the other critocal values).

ADFT using “artificial” data

First we build two simple time series:

#Creating simple examples of stationary and non-stationary data
stationary<-rnorm(250)
nonStationary<-cumsum(stationary)
par(mfrow=c(1,2))
plot(stationary,main="Stationary Data",col="blue",type="l")
plot(nonStationary,main="Non-Stationary Data",col="red",type="l")

And now we apply ADFT to these time series

#Apply ADFT
library(urca)
## Warning: package 'urca' was built under R version 3.1.3
S <- ur.df(stationary,   type="none",lags=0)
NS<- ur.df(nonStationary,type="none",lags=0)
#extracting the test statistics and the first critical value(99%)
results<-c(slot(S,"teststat")[1],slot(S,"cval")[1],slot(NS,"teststat")[1],slot(NS,"cval")[1])
results
## [1] -15.1572245  -2.5800000  -0.4510948  -2.5800000

Is the test statistics for Stationary data (-15.1572245) < the first critical value (-2.58)?

## [1] TRUE

then the series is stationary.

Is the test statistics for Non-Stationary data (-0.4510948) < the first critical value (-2.58)?

## [1] FALSE

then the series is Non-Stationary.

ADFT using quaternion distance data

#Getting the quaternion distance data for a singe image
Image=1
da<-read.table(buildName(Image))
sa<-data.frame(c(da[4],da[6],da[8],da[10]))
dfa<-0
for (i in 2:dim(sa)[1]){
        dfa<-c(dfa,sqrt(sum((sa[1,]-sa[i,])^2)))
}
dfa<-dfa[2:(dim(sa)[1])]

#Applying ADFT
library(urca)
testResults<-ur.df(dfa,type="none",lags=0)
summary(testResults)
## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0072959 -0.0013415  0.0000736  0.0015173  0.0085751 
## 
## Coefficients:
##           Estimate Std. Error t value Pr(>|t|)    
## z.lag.1 -0.0041385  0.0009149  -4.523 6.16e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.002134 on 9998 degrees of freedom
## Multiple R-squared:  0.002042,   Adjusted R-squared:  0.001942 
## F-statistic: 20.46 on 1 and 9998 DF,  p-value: 6.16e-06
## 
## 
## Value of test-statistic is: -4.5232 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

As we can see the value of the ADFT statistics (-4.5232429) is smaller than the most conservative critical value at 99% (-2.58), therefore, the time series is stationary.

Now we will repeat this same calculation for every single image (this calculation is a bit more intensive):

#Getting the quaternion distance data for all images
Nimg=21 #total number of images
dfT<-getVars(Nimg)
FinalResults<-""
for (i in 1:Nimg){
        test<-ur.df(dfT[,i],type="none",lags=0)
        if (slot(test,"teststat")[1] < slot(test,"cval")[1]){
                FinalResults<-c(FinalResults,"Stationary")
        } else {
                FinalResults<-c(FinalResults,"Non-Stationary")
        }
}
FinalResults<-FinalResults[2:(Nimg+1)]
FinalResults
##  [1] "Stationary" "Stationary" "Stationary" "Stationary" "Stationary"
##  [6] "Stationary" "Stationary" "Stationary" "Stationary" "Stationary"
## [11] "Stationary" "Stationary" "Stationary" "Stationary" "Stationary"
## [16] "Stationary" "Stationary" "Stationary" "Stationary" "Stationary"
## [21] "Stationary"

4.- Conclusions

The results indicate that the quaternion distance time series for every image is stationary, therefore we can conclude that namd quaternion CV is working properly and that there are no signifficant rotations of the protein with respect to the first reference structure.