Exponential growth phase selection in Ecophytor

Introduction

In growth curves, a plot of log(biomass) versus time typically follows a sigmoid shape with three phases: an asymptotic phase followed by a steep increasing phase and then an asymptotic phase again.

The two asymptotic regions of the curve are the “lag” and “stationary” phases of the curve. The lag phase is found at the beginning of the experiment when the population is adapting to a new environment. The stationary phase is found at the end of the experiment when the population has reached its maximum capacity.

However, scientists usually are interested in the steep central region of this curve where growth behavior is closer to an exponential growth. This steep central region is the exponential growth phase. By fitting the central region of this curve to a straight line, the growth rate of the population can be computed.

Traditionally this method often requires selecting by hand the points in the curve that belong to the exponential phase, a process that can be very tedious if we need to fit many curves.

Here we describe the process used in Ecophytor to select the exponential growth phase in a growth curve automatically. It’s important to notice that this selection is just a guideline and that by no means is a substitute for a well-trained eye.

Method

Buchanan et. al (1997) fit a three-phase segmented (or broken-line) linear relationship to the data. In an ideal growth curve, this would yield two segments for the asymptotic phases and one segment for the exponential phase. However, experimental data not always includes both asymptotes. Moreover, experimental data sometimes includes a decay period – named death phase – at the end of the experiment in which the population decreases rather than increasing. Sometimes we can also observe a reduction in the population at the beginning of the experiment.

Dealing with population decays.

First the curve is trimmed to keep only a continuous segment of positive growth. The curve is trimmed starting by its end and start points. This way we always keep the central section and we avoid selecting a positive growth segment after the death phase in a very old culture.

Dealing with missing asymptotes.

Sometimes the data does not include one or both of the asymptotes. In this case, fitting a three-phase is not the most appropriate method. That’s why here we fit the curve to broken-line relationships using 1, 2 and three segments, and then we select the best fit. We limit the number of segments to three at most because after trimming the curve it’s reasonable to assume that the exponential and stationary phases are only phases in our remaining data.

These models are not nested. Therefore, we cannot rely on a likelihood ratio test to select the best one. We will use AIC instead.

The data

We will use some example data. Let’s suppose that we have two species A and B that are grown at seven temperatures. There are three replicates for each temperature and species combination. Therefore, we have a growth curve for each Species, Temperature and Replicate (48 curves). The data is stored in an Excel spreadsheet.

For each growth curve, we have a daily biomass measure. Date and time are stored in column “datetime”, and the biomass measured in the “biomass” column. Instead of working with full dates we will use a simpler representation for our data: days since our first data point in the curve. We also compute the log of the biomass and store it in column “log.biomass”.

We can identify each curve by its Species, Temperature and Replicate identifier. We also compute log of the biomass.

Selecting exponential growth phase. Lag and starionary phases present,no death phase. Easy case, curve “A 10 1”

We will start our analysis with an easy case, curve “A 10 1”. A curve that includes data for its lag phase, exponential and stationary phase and the three are easy to the identify.

Log biomass always increases, so we don’t need to remove any data.

As described in the methods section, we will fit 3 models to this data. First a regular linear model (one segment) and then two segmented linear models with 2 and 3 segments using the segmented function from the segmented package.

Regular linear model (one segment)

# Data is stored in a data.frame named growing.data

segments.1 <- lin.mod <- lm(log.biomass~days,growing.data[,c("days","log.biomass")])

summary(segments.1)

## 
## Call:
## lm(formula = log.biomass ~ days, data = growing.data[, c("days", 
##     "log.biomass")])
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.15635 -0.10074  0.01130  0.05876  0.23414 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.24205    0.07143   17.39 1.22e-07 ***
## days         0.19993    0.01352   14.79 4.31e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1277 on 8 degrees of freedom
## Multiple R-squared:  0.9647, Adjusted R-squared:  0.9603 
## F-statistic: 218.6 on 1 and 8 DF,  p-value: 4.31e-07

Two segments

# We need to provide to the segmented function with one init break point.

breaks <- 1

# Build a sequence of breaks+2 points including the first and last day in the data.
first.data.day <- growing.data$days[1]
last.data.day <- growing.data$days[nrow(growing.data)]

break.points <- seq(first.data.day,last.data.day,length.out=breaks+2)

# Remove the first and last days
break.points <- break.points[2:(length(break.points)-1)]

# Compute the 2-phase fit using the linear model as a starting point
segments.2 <- segmented(lin.mod,seg.Z = ~days,psi=break.points)

summary(segments.2)

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = lin.mod, seg.Z = ~days, psi = break.points)
## 
## Estimated Break-Point(s):
##    Est. St.Err 
##  7.492  1.703 
## 
## t value for the gap-variable(s) V:  0 
## 
## Meaningful coefficients of the linear terms:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.20408    0.08202  14.680 6.28e-06 ***
## days         0.21433    0.02070  10.356 4.74e-05 ***
## U1.days     -0.10270    0.09399  -1.093       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.131 on 6 degrees of freedom
## Multiple R-Squared: 0.9721,  Adjusted R-squared: 0.9582 
## 
## Convergence attained in 2 iterations with relative change 5.339068e-16

Three segments

The code used here is the same than in the two segments, case changing breaks from 2 to three.

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = lin.mod, seg.Z = ~days, psi = break.points)
## 
## Estimated Break-Point(s):
##             Est. St.Err
## psi1.days 2.374 0.6351
## psi2.days 6.737 0.7574
## 
## t value for the gap-variable(s) V:  0 0 
## 
## Meaningful coefficients of the linear terms:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.35616    0.06934  19.559 4.03e-05 ***
## days         0.07523    0.06437   1.169    0.307    
## U1.days      0.20374    0.06904   2.951       NA    
## U2.days     -0.17800    0.05999  -2.967       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07795 on 4 degrees of freedom
## Multiple R-Squared: 0.9934,  Adjusted R-squared: 0.9852 
## 
## Convergence attained in 2 iterations with relative change -4.368524e-16

Select the model

Although the model with 3 segments is clearly the best, we cannot rely only in visual inspection of if we are automatizing this in some way. So we will use another criteria to compare the models.

As explained in the methods section, these models are not nested and we cannot rely on a likelihood ratio test to select the best one. We will use AIC instead. As we can see in the next figure the model with 3 segments clearly beats the other two.

models.AIC <- data.frame(segments=c(1,2,3),AIC=c(AIC(segments.1),AIC(segments.2),AIC(segments.3)))

ggplot(models.AIC,aes(x=segments,y=AIC))+geom_line()+geom_point()

Select the segment with the highest slope.

Once we have selected our best model, we need to choose the data range that spans the exponential growth phase.

# Clearly the exponential growth phase segmentshould have the highest slope. The coefficients of the segmented model are not the slopes of the segment, but the increment in slope with respect the previous segment. First we compute the actual slopes

slopes <- cumsum(coefficients(segments.3)[2:4])

# Which one is the highest?

exponential.phase.segment <- which.max(slopes)

# Now we define which are the intervals for each segment
first.point <- growing.data$days[1]
break.points <- segments.3$psi[,2]
last.point <- growing.data$days[nrow(growing.data)]
intervals <- c(first.point,break.points,last.point)

# Determine in which interval is included each point in the curve

which.interval <- findInterval(growing.data$days,intervals,rightmost.closed=TRUE,all.inside=TRUE)
                  
# Select points in the exponential phase segment
exponential.data <- growing.data[which.interval==exponential.phase.segment,]

References

Buchanan, R.L., Whiting, R.C., Damert, W.C., 1997. When is simple good enough: a comparison of the Gompertz, Baranyi, and three-phase linear models for fitting bacterial growth curves. Food Microbiology 14, 313–326. doi:10.1006/fmic.1997.0125