In growth curves, a plot of log(biomass) versus time typically follows a sigmoid shape with three phases: an asymptotic phase followed by a steep increasing phase and then an asymptotic phase again.
The two asymptotic regions of the curve are the “lag” and “stationary” phases of the curve. The lag phase is found at the beginning of the experiment when the population is adapting to a new environment. The stationary phase is found at the end of the experiment when the population has reached its maximum capacity.
However, scientists usually are interested in the steep central region of this curve where growth behavior is closer to an exponential growth. This steep central region is the exponential growth phase. By fitting the central region of this curve to a straight line, the growth rate of the population can be computed.
Traditionally this method often requires selecting by hand the points in the curve that belong to the exponential phase, a process that can be very tedious if we need to fit many curves.
Here we describe the process used in Ecophytor to select the exponential growth phase in a growth curve automatically. It’s important to notice that this selection is just a guideline and that by no means is a substitute for a well-trained eye.
Buchanan et. al (1997) fit a three-phase segmented (or broken-line) linear relationship to the data. In an ideal growth curve, this would yield two segments for the asymptotic phases and one segment for the exponential phase. However, experimental data not always includes both asymptotes. Moreover, experimental data sometimes includes a decay period – named death phase – at the end of the experiment in which the population decreases rather than increasing. Sometimes we can also observe a reduction in the population at the beginning of the experiment.
First the curve is trimmed to keep only a continuous segment of positive growth. The curve is trimmed starting by its end and start points. This way we always keep the central section and we avoid selecting a positive growth segment after the death phase in a very old culture.
Sometimes the data does not include one or both of the asymptotes. In this case, fitting a three-phase is not the most appropriate method. That’s why here we fit the curve to broken-line relationships using 1, 2 and three segments, and then we select the best fit. We limit the number of segments to three at most because after trimming the curve it’s reasonable to assume that the exponential and stationary phases are only phases in our remaining data.
These models are not nested. Therefore, we cannot rely on a likelihood ratio test to select the best one. We will use AIC instead.
We will use some example data. Let’s suppose that we have two species A and B that are grown at seven temperatures. There are three replicates for each temperature and species combination. Therefore, we have a growth curve for each Species, Temperature and Replicate (48 curves). The data is stored in an Excel spreadsheet.
For each growth curve, we have a daily biomass measure. Date and time are stored in column “datetime”, and the biomass measured in the “biomass” column. Instead of working with full dates we will use a simpler representation for our data: days since our first data point in the curve. We also compute the log of the biomass and store it in column “log.biomass”.
We can identify each curve by its Species, Temperature and Replicate identifier. We also compute log of the biomass.
We will start our analysis with an easy case, curve “A 10 1”. A curve that includes data for its lag phase, exponential and stationary phase and the three are easy to the identify.
Log biomass always increases, so we don’t need to remove any data.
As described in the methods section, we will fit 3 models to this data. First a regular linear model (one segment) and then two segmented linear models with 2 and 3 segments using the segmented function from the segmented package.
# Data is stored in a data.frame named growing.data
segments.1 <- lin.mod <- lm(log.biomass~days,growing.data[,c("days","log.biomass")])
summary(segments.1)
##
## Call:
## lm(formula = log.biomass ~ days, data = growing.data[, c("days",
## "log.biomass")])
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.15635 -0.10074 0.01130 0.05876 0.23414
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.24205 0.07143 17.39 1.22e-07 ***
## days 0.19993 0.01352 14.79 4.31e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1277 on 8 degrees of freedom
## Multiple R-squared: 0.9647, Adjusted R-squared: 0.9603
## F-statistic: 218.6 on 1 and 8 DF, p-value: 4.31e-07
# We need to provide to the segmented function with one init break point.
breaks <- 1
# Build a sequence of breaks+2 points including the first and last day in the data.
first.data.day <- growing.data$days[1]
last.data.day <- growing.data$days[nrow(growing.data)]
break.points <- seq(first.data.day,last.data.day,length.out=breaks+2)
# Remove the first and last days
break.points <- break.points[2:(length(break.points)-1)]
# Compute the 2-phase fit using the linear model as a starting point
segments.2 <- segmented(lin.mod,seg.Z = ~days,psi=break.points)
summary(segments.2)
##
## ***Regression Model with Segmented Relationship(s)***
##
## Call:
## segmented.lm(obj = lin.mod, seg.Z = ~days, psi = break.points)
##
## Estimated Break-Point(s):
## Est. St.Err
## 7.492 1.703
##
## t value for the gap-variable(s) V: 0
##
## Meaningful coefficients of the linear terms:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.20408 0.08202 14.680 6.28e-06 ***
## days 0.21433 0.02070 10.356 4.74e-05 ***
## U1.days -0.10270 0.09399 -1.093 NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.131 on 6 degrees of freedom
## Multiple R-Squared: 0.9721, Adjusted R-squared: 0.9582
##
## Convergence attained in 2 iterations with relative change 5.339068e-16
The code used here is the same than in the two segments, case changing breaks from 2 to three.
##
## ***Regression Model with Segmented Relationship(s)***
##
## Call:
## segmented.lm(obj = lin.mod, seg.Z = ~days, psi = break.points)
##
## Estimated Break-Point(s):
## Est. St.Err
## psi1.days 2.374 0.6351
## psi2.days 6.737 0.7574
##
## t value for the gap-variable(s) V: 0 0
##
## Meaningful coefficients of the linear terms:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.35616 0.06934 19.559 4.03e-05 ***
## days 0.07523 0.06437 1.169 0.307
## U1.days 0.20374 0.06904 2.951 NA
## U2.days -0.17800 0.05999 -2.967 NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07795 on 4 degrees of freedom
## Multiple R-Squared: 0.9934, Adjusted R-squared: 0.9852
##
## Convergence attained in 2 iterations with relative change -4.368524e-16
Although the model with 3 segments is clearly the best, we cannot rely only in visual inspection of if we are automatizing this in some way. So we will use another criteria to compare the models.
As explained in the methods section, these models are not nested and we cannot rely on a likelihood ratio test to select the best one. We will use AIC instead. As we can see in the next figure the model with 3 segments clearly beats the other two.
models.AIC <- data.frame(segments=c(1,2,3),AIC=c(AIC(segments.1),AIC(segments.2),AIC(segments.3)))
ggplot(models.AIC,aes(x=segments,y=AIC))+geom_line()+geom_point()
Once we have selected our best model, we need to choose the data range that spans the exponential growth phase.
# Clearly the exponential growth phase segmentshould have the highest slope. The coefficients of the segmented model are not the slopes of the segment, but the increment in slope with respect the previous segment. First we compute the actual slopes
slopes <- cumsum(coefficients(segments.3)[2:4])
# Which one is the highest?
exponential.phase.segment <- which.max(slopes)
# Now we define which are the intervals for each segment
first.point <- growing.data$days[1]
break.points <- segments.3$psi[,2]
last.point <- growing.data$days[nrow(growing.data)]
intervals <- c(first.point,break.points,last.point)
# Determine in which interval is included each point in the curve
which.interval <- findInterval(growing.data$days,intervals,rightmost.closed=TRUE,all.inside=TRUE)
# Select points in the exponential phase segment
exponential.data <- growing.data[which.interval==exponential.phase.segment,]