1 Introduction

Firstly you need to setup the R environment so that it is ready for data analysis.

## This clears the R workspace  
rm(list=ls())

## This installs the libraries for file import and R-markdown
library(knitr)
library(kableExtra)
library(tinytex)

# This is the library for the nonlinear model
library(nlme)


## This installs the plotting library
library(ggplot2)

The next step is loading and cleaning the data.

First things to note about the data is that there are 5 series labelled 101 to 105. There are measures of concentrations for these series at times 0 to 140 second in 7 second intervals. At 0 time all concentrations are set to zero but this is a sort of missing value because it does not form part of the time series. This would indicate the addition of a “substrate” to start the process that is being measured.

The data has a very large range which means that a plot of the raw values is not going to display a pattern very easily. For this reason the data needs to be transformed. There are two common transformations for the data either the logarithm or the reciprocal. As 0 is a value in the dataset to prevent infinite values we also need to add 1 to the values before taking the reciprocal or the logarithm.

\[\frac{1}{1+CONC}\]

data <- read.csv("Test1.csv")
str(data)

## 'data.frame':    105 obs. of  3 variables:
##  $ ID  : int  101 101 101 101 101 101 101 101 101 101 ...
##  $ TIME: int  0 7 14 21 28 35 42 49 56 63 ...
##  $ CONC: num  0 9020.6 3110.1 69.4 26.3 ...

data$ID <- as.factor(data$ID)

# Remove the time point 0 data
data$CONC[data$TIME==00]<- NA
data <- na.omit(data)

# As the data is across a wide range of values the data also needs to be transformed
# These are the logarithmic values
data$log_conc <- (log(1+data$CONC))
# This is the reciprocal values
data$recip <- 1/(1+data$CONC)

2 Exploratory Data Analysis

This is the most important step of data analysis as it helps to define the subsequent analysis that you will carry out. You need to carry out exploratory data analysis in order to check that the data is well behaved and so that you can refine your hypotheses for statistical testing.

## Scatterplots in the basic R graphics
plot(log_conc~TIME, data)

plot(recip~TIME, data)

## Boxplot in the basic R graphics
plot(log_conc~ID, data)

plot(recip~ID, data)

## These are the ggplot2 versions 
ggplot(data, aes(TIME, log_conc, colour=ID))+
  geom_point()

ggplot(data, aes(TIME, recip, colour=ID))+
  geom_point()

Time and concentration are associated and the reciprocal plot seems to the be best choice both from the shape of the curves and the distribution of data in the boxplots. It is also bounded by 0 and 1 which makes it easier to fit a curve to the data as it is clearly not linearly related.

2.1 Correlation Analysis

cor.test(data$TIME, data$recip, method="spearman")

## Warning in cor.test.default(data$TIME, data$recip, method = "spearman"): Cannot
## compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  data$TIME and data$recip
## S = 32189, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.8068495

cor.test(data$TIME, data$log_conc, method="spearman")

## Warning in cor.test.default(data$TIME, data$log_conc, method = "spearman"):
## Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  data$TIME and data$log_conc
## S = 301111, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.8068495

3 Statistical Tests

3.1 One Way ANOVA

The boxplots suggested that there is a difference between the reciprocal of the concentration in the different sample groups. As there are more than two groups we need to carry out a one-way ANOVA rather than t-tests.

The results of the one way ANOVA confirm that there is a difference between the five groups.

res.aov <- aov(recip~ID, data)
summary(res.aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## ID           4  4.363  1.0906   8.829 4.18e-06 ***
## Residuals   95 11.735  0.1235                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.2 The Nonlinear Model

Now that we have simple sigmoid curves for the different concentrations we can fit a nonlinear model which has the general form of:

\[y=\frac{1}{1+e^{-x}}\]

The actual form used in R to account for shifts of the maximum and the positions of the point of inflection as well as it steepness is:

\[y= \frac{Asym}{1+e^{\frac{xmid-x}{scal}}}\]

fit_101 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==101)
fit_102 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==102)
fit_103 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==103)
fit_104 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==104)
fit_105 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==105)
summary(fit_101)

## 
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## Asym  0.996716   0.002938  339.30   <2e-16 ***
## xmid 65.443040   0.183218  357.19   <2e-16 ***
## scal 12.380081   0.154053   80.36   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.006436 on 17 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 3.667e-06

summary(fit_102)

## 
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## Asym   0.67183    0.06671   10.07 1.40e-08 ***
## xmid 105.73545    7.51353   14.07 8.49e-11 ***
## scal  33.29934    2.84688   11.70 1.49e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01693 on 17 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 2.179e-06

summary(fit_103)

## 
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## Asym  0.985094   0.006997  140.78  < 2e-16 ***
## xmid 39.396734   0.475087   82.92  < 2e-16 ***
## scal  8.765785   0.413783   21.18 1.17e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02243 on 17 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 3.397e-06

summary(fit_104)

## 
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## Asym 9.998e-01  4.249e-04  2353.1   <2e-16 ***
## xmid 4.699e+01  2.149e-02  2186.5   <2e-16 ***
## scal 4.969e+00  1.872e-02   265.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001421 on 17 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 4.261e-06

summary(fit_105)

## 
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## Asym 9.919e-01  5.073e-03   195.5   <2e-16 ***
## xmid 1.053e+02  2.419e-01   435.1   <2e-16 ***
## scal 1.728e+01  1.316e-01   131.3   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.00272 on 17 degrees of freedom
## 
## Number of iterations to convergence: 0 
## Achieved convergence tolerance: 8.58e-06

The models can then be plotted and you can see that there is a good fit.

plot(recip~TIME, data)
TimeVec <- (seq(0,140,length.out=21))
lines(TimeVec, predict(fit_101, data.frame(TIME=TimeVec)),lty=2)
lines(TimeVec, predict(fit_102, data.frame(TIME=TimeVec)),lty=3)
lines(TimeVec, predict(fit_103, data.frame(TIME=TimeVec)),lty=4)
lines(TimeVec, predict(fit_104, data.frame(TIME=TimeVec)),lty=5)
lines(TimeVec, predict(fit_105, data.frame(TIME=TimeVec)),lty=6)

There is a good fit and so the last issue is interepreting them in the context of the original variable TIME and CONC.

\[\frac{1}{1+CONC}= \frac{Asym}{1+e^{\frac{xmid-TIME}{scal}}}\]

Asym is usually 1 as the sigmoid curves are bounded by 1 except for the data from 102. For the full sigmoid curves where the final value approaches 1.

\[CONC=e^{\frac{xmid-TIME}{scal}}\] Or more generally:

\[Asym(1+CONC)=1+e^{\frac{xmid-TIME}{scal}}\] Where the model parameters can be found in the following table:

models <- read.csv("Yuki_models.csv", header=TRUE) 
kable(models) %>%
  kable_styling()