Firstly you need to setup the R environment so that it is ready for data analysis.
## This clears the R workspace
rm(list=ls())
## This installs the libraries for file import and R-markdown
library(knitr)
library(kableExtra)
library(tinytex)
# This is the library for the nonlinear model
library(nlme)
## This installs the plotting library
library(ggplot2)
The next step is loading and cleaning the data.
First things to note about the data is that there are 5 series labelled 101 to 105. There are measures of concentrations for these series at times 0 to 140 second in 7 second intervals. At 0 time all concentrations are set to zero but this is a sort of missing value because it does not form part of the time series. This would indicate the addition of a “substrate” to start the process that is being measured.
The data has a very large range which means that a plot of the raw values is not going to display a pattern very easily. For this reason the data needs to be transformed. There are two common transformations for the data either the logarithm or the reciprocal. As 0 is a value in the dataset to prevent infinite values we also need to add 1 to the values before taking the reciprocal or the logarithm.
\[\frac{1}{1+CONC}\]
<- read.csv("Test1.csv")
data str(data)
## 'data.frame': 105 obs. of 3 variables:
## $ ID : int 101 101 101 101 101 101 101 101 101 101 ...
## $ TIME: int 0 7 14 21 28 35 42 49 56 63 ...
## $ CONC: num 0 9020.6 3110.1 69.4 26.3 ...
$ID <- as.factor(data$ID)
data
# Remove the time point 0 data
$CONC[data$TIME==00]<- NA
data<- na.omit(data)
data
# As the data is across a wide range of values the data also needs to be transformed
# These are the logarithmic values
$log_conc <- (log(1+data$CONC))
data# This is the reciprocal values
$recip <- 1/(1+data$CONC) data
This is the most important step of data analysis as it helps to define the subsequent analysis that you will carry out. You need to carry out exploratory data analysis in order to check that the data is well behaved and so that you can refine your hypotheses for statistical testing.
## Scatterplots in the basic R graphics
plot(log_conc~TIME, data)
plot(recip~TIME, data)
## Boxplot in the basic R graphics
plot(log_conc~ID, data)
plot(recip~ID, data)
## These are the ggplot2 versions
ggplot(data, aes(TIME, log_conc, colour=ID))+
geom_point()
ggplot(data, aes(TIME, recip, colour=ID))+
geom_point()
Time and concentration are associated and the reciprocal plot seems to the be best choice both from the shape of the curves and the distribution of data in the boxplots. It is also bounded by 0 and 1 which makes it easier to fit a curve to the data as it is clearly not linearly related.
cor.test(data$TIME, data$recip, method="spearman")
## Warning in cor.test.default(data$TIME, data$recip, method = "spearman"): Cannot
## compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: data$TIME and data$recip
## S = 32189, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.8068495
cor.test(data$TIME, data$log_conc, method="spearman")
## Warning in cor.test.default(data$TIME, data$log_conc, method = "spearman"):
## Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: data$TIME and data$log_conc
## S = 301111, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.8068495
The boxplots suggested that there is a difference between the reciprocal of the concentration in the different sample groups. As there are more than two groups we need to carry out a one-way ANOVA rather than t-tests.
The results of the one way ANOVA confirm that there is a difference between the five groups.
<- aov(recip~ID, data)
res.aov summary(res.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## ID 4 4.363 1.0906 8.829 4.18e-06 ***
## Residuals 95 11.735 0.1235
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now that we have simple sigmoid curves for the different concentrations we can fit a nonlinear model which has the general form of:
\[y=\frac{1}{1+e^{-x}}\]
The actual form used in R to account for shifts of the maximum and the positions of the point of inflection as well as it steepness is:
\[y= \frac{Asym}{1+e^{\frac{xmid-x}{scal}}}\]
<- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==101)
fit_101 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==102)
fit_102 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==103)
fit_103 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==104)
fit_104 <- nls(recip~SSlogis(TIME, Asym, xmid, scal), data, subset=ID==105)
fit_105 summary(fit_101)
##
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 0.996716 0.002938 339.30 <2e-16 ***
## xmid 65.443040 0.183218 357.19 <2e-16 ***
## scal 12.380081 0.154053 80.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.006436 on 17 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 3.667e-06
summary(fit_102)
##
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 0.67183 0.06671 10.07 1.40e-08 ***
## xmid 105.73545 7.51353 14.07 8.49e-11 ***
## scal 33.29934 2.84688 11.70 1.49e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01693 on 17 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 2.179e-06
summary(fit_103)
##
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 0.985094 0.006997 140.78 < 2e-16 ***
## xmid 39.396734 0.475087 82.92 < 2e-16 ***
## scal 8.765785 0.413783 21.18 1.17e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02243 on 17 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 3.397e-06
summary(fit_104)
##
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 9.998e-01 4.249e-04 2353.1 <2e-16 ***
## xmid 4.699e+01 2.149e-02 2186.5 <2e-16 ***
## scal 4.969e+00 1.872e-02 265.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.001421 on 17 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 4.261e-06
summary(fit_105)
##
## Formula: recip ~ SSlogis(TIME, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 9.919e-01 5.073e-03 195.5 <2e-16 ***
## xmid 1.053e+02 2.419e-01 435.1 <2e-16 ***
## scal 1.728e+01 1.316e-01 131.3 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00272 on 17 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 8.58e-06
The models can then be plotted and you can see that there is a good fit.
plot(recip~TIME, data)
<- (seq(0,140,length.out=21))
TimeVec lines(TimeVec, predict(fit_101, data.frame(TIME=TimeVec)),lty=2)
lines(TimeVec, predict(fit_102, data.frame(TIME=TimeVec)),lty=3)
lines(TimeVec, predict(fit_103, data.frame(TIME=TimeVec)),lty=4)
lines(TimeVec, predict(fit_104, data.frame(TIME=TimeVec)),lty=5)
lines(TimeVec, predict(fit_105, data.frame(TIME=TimeVec)),lty=6)
There is a good fit and so the last issue is interepreting them in the context of the original variable TIME and CONC.
\[\frac{1}{1+CONC}= \frac{Asym}{1+e^{\frac{xmid-TIME}{scal}}}\]
Asym is usually 1 as the sigmoid curves are bounded by 1 except for the data from 102. For the full sigmoid curves where the final value approaches 1.
\[CONC=e^{\frac{xmid-TIME}{scal}}\] Or more generally:
\[Asym(1+CONC)=1+e^{\frac{xmid-TIME}{scal}}\] Where the model parameters can be found in the following table:
<- read.csv("Yuki_models.csv", header=TRUE)
models kable(models) %>%
kable_styling()
Data | Asym | Xmid | Scal |
---|---|---|---|
101 | 0.9967 | 65.44 | 12.380 |
102 | 0.6718 | 105.70 | 33.300 |
103 | 0.9851 | 39.40 | 8.766 |
104 | 0.9998 | 46.99 | 4.969 |
105 | 0.9919 | 105.30 | 17.280 |