Introduction

Seedling emergence is a crucial event in the life cycle of plants as it determines the success or failure of the successive developmental stages and hence the fate of the new plant to come into existence. Unlike crops, which have been selected for temporally uniform emergence, weeds exhibit extreme variability in timing and magnitude of emergence, rendering the prediction of weed emergence (and other phenological events) an insurmountable challenge. This temporal variability acts as a bet-hedging strategy allowing a fraction of weed seedlings to survive by avoiding exposure to control or other types of disturbance. A quantitative knowledge of weed emergence is therefore needed to decide about the proper timing of applying control measures including chemical (e.g. POST herbicides) and nonchemical (e.g. cultivation) methods. A seed in the soil undergoes a multiphasic, linear process to emerge as a seedling above the soil surface. Three major steps can be envisioned for this process as shown in Figure 1:

**Figure 1.** Phases of seedling emergence.

Figure 1. Phases of seedling emergence.

Most weed species exhibit variable degrees of dormancy where their seeds may not germinate over a wide range of normal environmental conditions that are otherwise permissive to germination. Temperature is deemed to be the primary factor regulating the seasonal change in dormancy level though some seeds, particularly the small ones, require light to provoke germination. Germination starts in non-dormant seeds through passive water uptake by the seed followed by embryo expansion giving rise to the protrusion of radicle (embryonic root). Water availability and temperature are known to be the most important factors determining both the extend and rapidity of germination (more details below). While the radicle grows downward and produces lateral roots to anchors the germinant (germinated seed) in the soil, the shoot elongates towards the soil surface to reach light. In dicots, the emergence of shoot (plumule) can be epigeous or hypogeous. In epigeous seedlings, the hypocotyl (stem-like axis below the cotyledon) elongates and emerges as hook: hypocotyl pulls the cotyledon and plumule above the ground once straightens. In hypogeous seedlings, the hypocotyl does not grow–leaving the cotyledons underground: epicotyl (stem-like axis above the cotyledon) elongates and pulls the plumule up into the air. In monocots, both radicle and plumule are enclosed within a sheath structure known as coleorhiza or coleoptile, respectively. Coleorhiza grows through the seed coat (pericarp) first followed by rapid elongation and penetration of radicle through coleorhiza. Following the emergence of radicle, coleoptile penetrates the seed coat and grows upward but shortly after reaching the soil surface it stops to grow. The plumule and first leaves then emerge through the coleoptile.

Models of seedling emergence differ in their degree of complexity as to how they incorporate the above processes. In the following sections, I will provide a brief summary of various approaches used by researchers to predict seedling emergence, the mathematical underpinnings of some more common models, and the R codes for executing these models.

Types of Seedling Emergence Models

We can classify emergence models in several ways. Based on the emergence phases described above, a range of models have been developed that vary in the number or combination of processes they deploy to predict emergence. AlomySys is perhaps the most detailed mechanistic model that incorporates all the three phases to simulate the emergence pattern in Alopecurus myosuroides Huds. (Colbach et al., 2006). The model developed by Vleeshouwers & Kropff (2000) for Chenopodium album has separate sub-models for germination and shoot elongation but lacks the dormancy component. The most common type of emergence model is monophasic where the emergence of seedling is modeled as a single process with no distinct account of the dormancy or germination dynamics (e.g. Leblanc et al., 2004; Leon et al., 2015).

Inputs of Seedling Emergence Models

Emergence models can also differ in type or number of inputs that they use to characterize emergence. We can categorize inputs into environmental and species-specific variables (or parameters). Soil (or air) temperature, moisture, texture and compaction, burial depth, and light represent some of the environmental inputs used in models of emergence: owing to their high predictive power, soil temperature and moisture are the most important and widely used variables in mergence modeling. Species-specific inputs may include base temperature, base water potential, seed size, shoot/root relative growth rate, etc. It should be noted, however, that these inputs are often referred to as “parameters” of the model when estimated as part of the model fitting procedure. While species-specific parameters are tightly linked to the biology of plant, models may include another set of parameters (coefficients) that have no direct link to the plant biology. A good example of this latter type of parameters is the “shape” parameter in many s-shaped or sigmoid regression models. Shape parameter gives flexibility to the model but is not informative in terms of the physiological mechanisms.

What Does the Model Predict?

Emergence models also vary in what they predict i.e. their outputs. Obviously, the type of output generated by a model depends on the processes implemented in that model. For example, if the seedling emergence model has separate submodels for germination and shoot/root elongation, then this model should be able to predict the seed germination in the soil in addition to the emergence of seedlings above the soil. However, clarification is needed as to what we mean with predicting “seedling emergence” (or seed germination alike). There exist two quantities for seedling emergence (and seed germination): 1- absolute number (magnitude) of seedlings emerged, and 2- rapidity of emergence i.e. how much emergence per unit of time (or the length of time required for emergence of a given number of seedlings: the shorter this time, the faster the emergence/germination). As seedbank size is enormously variable across space and also difficult to measure, most seedling emergence models convert the “number” of seedlings to a “relative” measure expressed as the percentage or proportion of emergence. The total emergence is fixed at 100% (or 1 if using proportion or fraction) and the model predicts the amount of time required to any given emergence percentage (fraction). As result, the model can be used in different regions regardless of the level of weed infestation (seedbank size). We essentially model “time-to-emergence” rather than “percent emergence”, though the latter is often used in non-linear regression analysis. In other word, a model only distributes these percentiles across time as shown in Figure 2.

**Figure 2.** Time (horizontal dashed lines) to a given emergence percentile (vertical dashed lines) assuming that time-to-emergence is normally distributed with the mean 25 days and standard deviation 8.5 days. In this hypothetical example, it takes ~5 days for the first percentile to emerge while for the 99^th^ percentile (i.e. 99% emergence) time-to-emergence is 45 days. The first, 50^th^ (median), and the 99^th^ percentiles are shown with solid orange circles and lines.

Figure 2. Time (horizontal dashed lines) to a given emergence percentile (vertical dashed lines) assuming that time-to-emergence is normally distributed with the mean 25 days and standard deviation 8.5 days. In this hypothetical example, it takes ~5 days for the first percentile to emerge while for the 99th percentile (i.e. 99% emergence) time-to-emergence is 45 days. The first, 50th (median), and the 99th percentiles are shown with solid orange circles and lines.

When percent emergence (probability of emergence) is considered as the dependent variable, the cumulative distribution function (CDF) of a distribution (e.g. Weibull) is fitted to the data whilst the quantile function (i.e. inverse cumulative distribution function) is used when the dependent variable is time-to-emergence. For example, if time to emergence, \(t\), has a Weibull distribution, the probability of emergence, \(p\), as a function of time can be described by the Weibull CDF: \[ p=F(t)=1-exp{\left(-\frac{t}{\sigma}\right)}^{\lambda} \] where \(F(t)\) represents the CDF, σ is the scale parameter (if t=σ then p = 0.63), and \(\lambda\) is a shape parameter. The above formulation (with various modifications) is widely used in modeling seedling emergence but we can also use the inverse of the above function (i.e. its quantile, \(Q(p)=F^{-1})\) to directly model time (i.e. t): \[t=Q(p)=\sigma(-log(1-p))^{\frac{1}{\lambda}}\]
**Figure 3.** Two methods for fitting models to emergence data. Often the emergence proportion is modeled as a function of time using the cumulative distribution function (CDF) of a distribution. Alternatively, the quintile function of the same distribution can be used to directly model time to emergence data. In this hypothetical example, the CDF (left) and quantile (right) functions of a normal distribution with the mean μ and standard deviation σ were used (values in the parenthesis are standard errors). Note that when emergence proportion (percentage) is used as the dependent variable, the unit of error (root mean square error, RMSE) is proportion (percentage) while when time is the dependent variable, the unit of error is time e.g. day or hour. To be able to compare the size of error between the two methods, the normalized RMSE (*NRMSE*), was estimated, which is independent of the unit of variable ($NRMSE=rac{RMSE}{y_{max}-y_{min}}$, where $y_{max}$ and $y_{min}$ are the observed maximum and minimum values for the dependent variable) . Solid circles are the observed values and lines are the fitted models.

Figure 3. Two methods for fitting models to emergence data. Often the emergence proportion is modeled as a function of time using the cumulative distribution function (CDF) of a distribution. Alternatively, the quintile function of the same distribution can be used to directly model time to emergence data. In this hypothetical example, the CDF (left) and quantile (right) functions of a normal distribution with the mean μ and standard deviation σ were used (values in the parenthesis are standard errors). Note that when emergence proportion (percentage) is used as the dependent variable, the unit of error (root mean square error, RMSE) is proportion (percentage) while when time is the dependent variable, the unit of error is time e.g. day or hour. To be able to compare the size of error between the two methods, the normalized RMSE (NRMSE), was estimated, which is independent of the unit of variable (\(NRMSE= rac{RMSE}{y_{max}-y_{min}}\), where \(y_{max}\) and \(y_{min}\) are the observed maximum and minimum values for the dependent variable) . Solid circles are the observed values and lines are the fitted models.

We essentially swap \(x\) (independent) and \(y\) (dependent) when using the above equations. The two approaches may provide similar, but almost never identical, estimates for model parameters (Figure 3). One of the main issues with the quantile formulation is the difficulty with the estimation of \(t\) when \(p = 0\) or \(p = 1\): \(t\) is either inestimable or infinite (positive or negative) at these probability values.

What Is This Document About?

In this tutorial we are going to learn how to develop various models for seed germination and emergence in response to temperature and water, as the two most pivotal factors regulating the extent and rate (rapidity) of germination/emergence. Although the majority of codes and example are directed towards modeling seed germination, the same concept and methods will also apply to modeling seedling emergence.

I will start with thermal time models, otherwise know as Growing Degree Day, GDD models, and then expand them to include the effects of water availability i.e. hydrothermal time model. For thermal time model, we first explore the germination/emergence responses to sub-optimal temperature range followed by models that account for the inhibition occurring at supra-optimal temperatures. The main objective of this modeling practice is to gain some insight into biological aspects of modeling rather than the statistical issues. For example, the use of cumulative germination data can result in underestimated standard errors for model parameters but we will not worry about this issue here. For an in depths discussion pertinent to statistical issues you may refer to the following papers:

Initializing R and Rquired Packages

Before we start with R programming, create a folder for saving your R files and all other data and figures to be used or produced during this practice. Assuming that you have created a folder named as “Seedling_Emergence_Worksop”, we first set our working directory by using:

setwd("Provide the path to you folder here")

Now every code that you execute will use this directory as the source for inputs and outputs. The two main R packages needed for this modeling practice are drc and drcSeedGerm, though several other packages are required for data manipulation. The most updated versions of the above two packages are available on github. To install them through github you need devtools package. So, we install this package first:

install.packages('devtools')
library(devtools)

Now we install the following packages:

install_github('doseResponse/drc')
install_github('OnofriAndreaPG/drcSeedGerm')
install.packages('tidyverse') 
install.packages('ggplot2')
install.packages('gridExtra')
install.packages('reshape2')
install.packages('cowplot')

And load these packages:

library(tidyverse)
library(drc)
library(ggplot2)
library(gridExtra)
library(drcSeedGerm)
library(reshape2)
library(cowplot)

Thermal Time Model: Sub-Optimal Temperatures

For this model we will use the germination data available for barley in drcSeedGerm package. In this experiment the germination of barley seeds were tested across 9 constant temperatures: 1, 3, 7, 10, 15, 25, 30, and 40 °C, each replicated three times. These temperatures include both sub- and supra-optimal ranges but we will use the temperatures \(\leq\) 15 \(^{\circ}\)C as we are developing a model for sub-optimal range (the use of 15 \(^{\circ}\)C doesn’t mean that this is the optimal temperature \(T_o\) but we know that it is within the sub-optimal range). These are the main steps that we need to undertake to develop a thermal time (GDD) model:

  1. Read the data into R;
  2. Extract the data from the sub-optimal range i.e. \(\leq\) 15 \(^{\circ}\)C;
  3. Fit a sigmoid type nonlinear regression model to cumulative germination data from each individual temperature. A good candidate nonlinear model can be the logistic function: \[ G(t)=\frac{G_{max}}{1+exp(b(log(t)-log(t_{50})))} \]
    • \(G(t)\) = cumulative germination over time, \(t\)
    • \(G_{max}\) = maximum germination as \(t\) approaches infinity
    • \(b\) = slope around the inflection point
    • \(t_{50}\) = time at which germination is half the \(G_{max}\)
  4. Use the regression model to estimate time to various germination percetages (we will use \(g\) to refer to these germination percentages, also called as percentile or germination fraction and use \(t_g\) to indicate the time to a given \(g\));
  5. Calculate germination rate (rapidity), \(GR\), which is simply the reciprocal of time to a given germination percentile i.e. \[GR=\frac{1}{t_g}\]
  6. Investigate the relationship between \(GR\) and temperature, \(T\), for each individual \(g\) using a linear regression model of the form: \[ GR=b(T-T_b) \]
    • \(T_b\) = the base temperature (unit \(^{\circ}\)C)
    • \(b\) is the slope (unit \(1/time\) \(^{\circ}\)C, so the reciprocal of \(b\) is thermal time, \(\theta\), with unit \(time\) \(^{\circ}\)C)
  7. Find the relationship between the parameters (i.e. \(b\) and \(T_b\)) of the linear model and \(g\);
  8. Build the final thermal time model by integrating models from step 6 and 7. The details for each of the above steps are discussed below.

Step 1. Read the data into R

Let’s read in the barley data and view its first 6 rows:

data(barley)
head(barley, n=6)
##   Dish Temp timeBef timeAf nSeeds nCum propCum
## 1    1    1       0      1      0    0       0
## 2    1    1       1      2      0    0       0
## 3    1    1       2      3      0    0       0
## 4    1    1       3      4      0    0       0
## 5    1    1       4      5      0    0       0
## 6    1    1       5      6      0    0       0

Step 2. Subset the Data

For this practice we will only work with Temp, timeBef and propCumForm columns which represent temperature, time, and cumulative germination (proportion). As we are only interested in the sub-optimal range, we will use the germination data from temperatures \(\leq\) 15 \(^{\circ}\)C only. subset function does the job:

barley_sub <- subset(barley, Temp<=15)

We will also use the mean values to make things more tractable and simple (Note: in real data analysis we must use replicates NOT the means). We use aggregate function to calculate the means.

barley_sub_mean <- aggregate(propCum~Temp+timeBef,mean, data=barley_sub)

The Temp column is a continuous variable but we need to convert it to a factor variable to be able to fit the logistic equation for each level of Temp's individual:

barley_sub_mean$Temp<-as.factor(barley_sub_mean$Temp)

Let’s plot these data to see how the germination curves look like (Note the use of expression for adding the celsius symbol to the legend’s title):

g1<- ggplot(barley_sub_mean,aes(x= timeBef, y=propCum*100, color=Temp)) +
  geom_line()
g2<- g1+labs(x= "Time (day)" , y="Cumulative germination (%)", color=expression("Temperature " (degree*C)))
theme_set(theme_gray())
g2

If interested in saving the figure, use ggsave function:

ggsave("Fig3-Thermal-SubOpt", plot = g2, units = "cm", width = 18, height = 12, device = "jpeg", dpi = 350)

Step 3. Fit Nonlinear Regression Model

Here we fit the logistic equation to the germination data using the drm function of drc package:

reg_model<- drm(propCum~timeBef,Temp, data=barley_sub_mean,fct = LL.3())
summary(reg_model)
## 
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)
## 
## Parameter estimates:
## 
##         Estimate  Std. Error  t-value   p-value    
## b:1  -16.9642479   1.9381125  -8.7530 9.383e-15 ***
## b:3  -12.1827225   1.1360422 -10.7238 < 2.2e-16 ***
## b:7   -9.3899890   0.9237101 -10.1655 < 2.2e-16 ***
## b:10  -3.2700300   0.2225531 -14.6933 < 2.2e-16 ***
## b:15  -7.0456780   1.6238670  -4.3388 2.851e-05 ***
## d:1    0.8780705   0.0099220  88.4977 < 2.2e-16 ***
## d:3    0.9262330   0.0085521 108.3044 < 2.2e-16 ***
## d:7    0.9536767   0.0076520 124.6308 < 2.2e-16 ***
## d:10   0.9522153   0.0086197 110.4695 < 2.2e-16 ***
## d:15   0.9433282   0.0069731 135.2807 < 2.2e-16 ***
## e:1   12.6084772   0.0885223 142.4328 < 2.2e-16 ***
## e:3    9.0140213   0.0808350 111.5113 < 2.2e-16 ***
## e:7    5.8667302   0.0737689  79.5285 < 2.2e-16 ***
## e:10   3.7777280   0.1040565  36.3046 < 2.2e-16 ***
## e:15   2.0619644   0.0421758  48.8898 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error:
## 
##  0.03417789 (130 degrees of freedom)

In the above snippet propCum is the dependent variable (germination) to be modeled as function of time (timeBef) for each level of temperature (Temp). fct determines the type of function to be used. The drc package has many in-built nonlinear regression functions that can be called in through fct statement of drm function. Here we use LL.3() which is identical to the logistic equation described above. For a list of all equations provided by drc use getMeanFunctions().

Note that drm uses labels b, d, and e to indicate \(b\), \(G_{max}\), and \(t_{50}\), respectively. Numbers that follow the letters indicate the temperature (Temp level) e.g. e:1 means \(t_{50}\) for Temp = 1 \(^{\circ}\)C.

To visually inspect the model fits, we extract the fitted values from the model object reg_model:

fits<-fitted(reg_model)

and then add the fitted values to the original dataframe:

barley_sub_mean$fits<-fits

Finally, we use ggplot to plot the fitted curves over the observed points:

g1<- ggplot(barley_sub_mean) +
  geom_point(aes(x= timeBef, y=propCum*100, color=Temp))+
  geom_line(aes(x= timeBef, y=fits*100, color=Temp))
g2<- g1 +labs(x= "Time (day)" , y="Cumulative germination (%)", color=expression("Temperature " (degree*C))) 
g2

Step 4. Calculate time to germination (\(t_g\))

The model parametrized above can predict germination percent, \(g\), as function of time, \(t\), but for estimating germination rate (how fast the germination proceeds, \(GR\)) we need to do reverse calculation i.e. estimating \(t\) as function of \(g\). The inverse form of logistic function can be written as: \[g(t)=t_{50} \left(\frac{G_{max}}{g} - 1\right)^{\frac{1}{b}}\] Fortunately, the drc package has already included the inverse function and we do not need to write any equations of our own. Rather we use the ED function. For example, for estimating time to 50% germination (absolute value NOT relative to \(G_{max}\)) we write:

ED(reg_model,0.5, type = "absolute")
## 
## Estimated effective doses
## 
##           Estimate Std. Error
## e:1:0.5  12.817954   0.099203
## e:3:0.5   9.132903   0.083332
## e:7:0.5   5.927789   0.073267
## e:10:0.5  3.895574   0.105979
## e:15:0.5  2.097473   0.044634

According to the above table, time to 50% germination is about 12.8 days when seeds were incubated at 1 \(^{\circ}\)C but reduced to 2.1 days at 15 \(^{\circ}\)C.

If we set tyep = relative, the ED function returns time to 50% germination relative to the maximum germination observed at the tested temperature, which is essentially identical to parameter \(e\) reported in the summary table.

ED(reg_model,50, type = "relative")
## 
## Estimated effective doses
## 
##          Estimate Std. Error
## e:1:50  12.608477   0.088522
## e:3:50   9.014021   0.080835
## e:7:50   5.866730   0.073769
## e:10:50  3.777728   0.104056
## e:15:50  2.061964   0.042176

For some reasons, not clear to myself, for relarive type we need to use percent values, i.e. 50, whilst for the type = absolute we must use proportion values i.e. 0.5!

Now let’s calculate \(t\) for several values of \(g\) e.g. from 0.1 (10% or 10th percentile) to 0.9 (90% or 90th percentile):

g<-seq(0.1,0.9,0.1)
tg<- ED(reg_model,g, type = "absolute", display = FALSE)

Step 5. Calculate \(GR\)

As described above germination rate, \(GR\), is the reciprocal of \(t_g\):

GR<-1/tg

Let’s take a look at GR data:

head(GR,n=6)
##           Estimate Std. Error
## e:1:0.1 0.08950776   6.521628
## e:1:0.2 0.08523029   9.187614
## e:1:0.3 0.08243832  11.406476
## e:1:0.4 0.08014968  11.788011
## e:1:0.5 0.07801557  10.080335
## e:1:0.6 0.07579649   7.754301

Step 5. Explore \(GR\)-temperature relationship

The GR table (above) needs to be reformatted for the analyses to be followed. As can be seen in the table, the temperature and \(g\) levels are embedded as the row names of this table. To convert row names to genuine column variables we first need to convert GR table to a dataframe:

GR<-as.data.frame(GR)

We then use stringr::str_split_fixed function to separate row names to its three components using “:” as the separator:

rn<- stringr::str_split_fixed(rownames(GR), ":", 3) 
head(rn,n=6)
##      [,1] [,2] [,3] 
## [1,] "e"  "1"  "0.1"
## [2,] "e"  "1"  "0.2"
## [3,] "e"  "1"  "0.3"
## [4,] "e"  "1"  "0.4"
## [5,] "e"  "1"  "0.5"
## [6,] "e"  "1"  "0.6"

Column 2 represent temperature levels while column 3 includes all \(g\) levels. We add these two columns to the original GR dataframe and then omit any possible Nan values. We also convert variable g to a factor column as our next analysis requires a factor format for this variable:

GR$Temp <- as.numeric(rn[,2])
GR$g<- as.factor(rn[,3])
GR<-na.omit(GR)

Let’s plot the \(GR\) values against temperature:

g1<- ggplot(GR,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line()
g2<- g1 +labs(x= "Temperature (C)" , y="Germination rate (1/tg)", color="Percentile" ) 
g2

As shown above, germination rate increases with temperature almost linearly. It seems that all germination percentiles are converging to a single point on x-axis (temperature): where they intercept this axis (so \(GR=0\)), represents the base temperature, \(T_b\). While \(T_b\) seems to be constant across all \(g\)s, the slope of lines are different, suggesting a model with a single x-axis intercept but variable slope for each \(g\) level should be a good candidate. This relationship can be explained by: \[GR_g=(T-T_b)b_g\] Note the use of subscript \(_g\) for parameter \(b\) to show that this parameter varies between germination percentiles \(g\). Here as we have nine levels of \(g\), there will be nine estimates for \(b\).

Although the above model is linear, we will use the nls function–which fits nonlinear models– for its versatility:

GR_model<-nls(Estimate~b[g]*(Temp-Tb), start=list(b=rep(0.1,9),Tb=0), data=GR)

To tell nls function that parameter \(b\) varies with factor g we use b[g]. In start we list the parameters and their potential initial values. Note that for \(b\) we need to provide nine initial estimates as per the number of the levels of g factor.

summary(GR_model)
## 
## Formula: Estimate ~ b[g] * (Temp - Tb)
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## b1  0.042100   0.002801  15.031  < 2e-16 ***
## b2  0.036229   0.002670  13.569 2.78e-15 ***
## b3  0.032834   0.002601  12.625 2.18e-14 ***
## b4  0.030300   0.002552  11.872 1.21e-13 ***
## b5  0.028142   0.002514  11.196 5.98e-13 ***
## b6  0.026116   0.002479  10.534 3.03e-12 ***
## b7  0.024021   0.002446   9.821 1.85e-11 ***
## b8  0.021544   0.002410   8.940 1.89e-10 ***
## b9  0.017400   0.002356   7.385 1.46e-08 ***
## Tb -0.496183   0.431737  -1.149    0.258    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04626 on 34 degrees of freedom
## 
## Number of iterations to convergence: 4 
## Achieved convergence tolerance: 1.606e-07

The estimated \(T_b\) is -0.496183 while \(b\)s vary from 0.042 for \(g=0.1\) to 0.017 for \(g=0.9\).

Let’s see how the above model fits the \(GR\) values by extracting the fitted values:

GR_fits <- fitted(GR_model)
GR$fits<-GR_fits

and plot them against temperature along with original \(GR\) points:

g1<- ggplot(GR) +
  geom_point(aes(x= Temp, y=Estimate, color=g))+
  geom_line(aes(x= Temp, y=fits, color=g))
g2<- g1+labs(x= "Temperature (C)" , y="Germination rate (1/tg)", color="Percentile" ) 
g2

As the estimated \(T_b\) is outside the range of temperature levels included in the study data, the fitted lines do not intercept this value on the x-axis. To see the full picture, we need to generate some new (kind of dummy) levels and predict the corresponding \(GR\) values for them. expand.grid is a great tool for generation of new data when coupled with predict function:

X <- expand.grid(Temp = seq(-0.496183, 16,2), g = unique(GR$g))
X$fits <- predict(GR_model,newdata=X)

Now we plot these expanded data:

g1<- ggplot(GR,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line(data=X,aes(x= Temp, y=fits, color=g))
g2<- g1+labs(x= "Temperature (C)" , y="Germination rate (1/tg)", color="Percentile" ) 
g2

While the model with fixed \(T_b\) and variable \(b_g\) seems adequate, we need to explore other contending models. If we adhere to a linear model, there exist two other models for comparison: - a model with variable \(T_b\) (so \(T_{b(g)}\)) but fixed \(b\) (model 2) - a model with both \(T_b\) and \(b\) being variable (model 3)

To fit model 2 we use:

GR_model2<-nls(Estimate~b*(Temp-Tb[g]), start=list(Tb=rep(0,9),b=0.03), data=GR)
summary(GR_model2)
## 
## Formula: Estimate ~ b * (Temp - Tb[g])
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## Tb1 -3.78884    1.18785  -3.190  0.00306 ** 
## Tb2 -2.30957    1.13295  -2.039  0.04933 *  
## Tb3 -1.45526    1.10380  -1.318  0.19618    
## Tb4 -0.81829    1.08339  -0.755  0.45527    
## Tb5 -0.27587    1.06697  -0.259  0.79754    
## Tb6  0.23373    1.05238   0.222  0.82557    
## Tb7  0.76176    1.03815   0.734  0.46812    
## Tb8  1.39047    1.02245   1.360  0.18280    
## Tb9  3.30118    1.11992   2.948  0.00575 ** 
## b    0.02915    0.00189  15.426  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06196 on 34 degrees of freedom
## 
## Number of iterations to convergence: 2 
## Achieved convergence tolerance: 3.513e-08

and plot the fitted values as:

X2 <- expand.grid(Temp = seq(-3.78884, 15.5,0.1), g = unique(GR$g))
X2$fits <- predict(GR_model2,newdata=X2)

g1<- ggplot(GR,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line(data=X2,aes(x= Temp, y=fits, color=g)) +
  ylim(0,0.7)
g2<- g1 +labs(x= "Temperature (C)" , y="Germination rate (1/tg)", color="Percentile" ) 
g2

As we used a fixed slope (\(b\) instead of \(b_g\)), all the fitted lines are parallel but intercept x-axis at different values i.e. each \(g\) has a unique \(T_b\). This phenomenon is very rare and most previous studies have demonstrated that \(T_b\) is invariant.

To fit model 3 we use:

GR_model3<-nls(Estimate~b[g]*(Temp-Tb[g]), start=list(Tb=rep(0,9), b=rep(0.03,9)), data=GR)
summary(GR_model3)
## 
## Formula: Estimate ~ b[g] * (Temp - Tb[g])
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## Tb1 -0.210385   0.973525  -0.216  0.83059    
## Tb2 -0.348493   1.160654  -0.300  0.76637    
## Tb3 -0.457576   1.306673  -0.350  0.72902    
## Tb4 -0.558237   1.442217  -0.387  0.70186    
## Tb5 -0.660273   1.581800  -0.417  0.67980    
## Tb6 -0.772291   1.739214  -0.444  0.66068    
## Tb7 -0.906249   1.936618  -0.468  0.64372    
## Tb8 -1.081592   2.227095  -0.486  0.63128    
## Tb9  0.110159   3.163799   0.035  0.97249    
## b1   0.043229   0.004709   9.180 1.22e-09 ***
## b2   0.036725   0.004709   7.799 2.84e-08 ***
## b3   0.032950   0.004709   6.997 1.98e-07 ***
## b4   0.030129   0.004709   6.398 8.88e-07 ***
## b5   0.027726   0.004709   5.888 3.29e-06 ***
## b6   0.025473   0.004709   5.410 1.14e-05 ***
## b7   0.023153   0.004709   4.917 4.18e-05 ***
## b8   0.020450   0.004709   4.343  0.00019 ***
## b9   0.018385   0.006005   3.062  0.00506 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05261 on 26 degrees of freedom
## 
## Number of iterations to convergence: 2 
## Achieved convergence tolerance: 4.291e-07

Note that model 3 has the largest number of parameters (18 parameters).

Let’s plot the fitted values for model 3:

X3 <- expand.grid(Temp = seq(-1.081592, 15.5,0.1), g = unique(GR$g))
X3$fits <- predict(GR_model3,newdata=X3)

g1<- ggplot(GR,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line(data=X3,aes(x= Temp, y=fits, color=g)) +
  ylim(0,0.7)
g2<- g1 +labs(x= "Temperature (C)" , y="Germination rate (1/tg)", color="Percentile" ) 
g2

But which model can better describe the variation in \(GR\) in relation to temperature? To answer this question we can look at the size of error across the models e.g. by comparing the AIC values:

AIC(GR_model,GR_model2,GR_model3)
##           df       AIC
## GR_model  11 -134.9386
## GR_model2 11 -109.2252
## GR_model3 19 -119.4364

The original model with fixed \(T_b\) has the smallest AIC suggesting that this model might be the best model. Another option for model selection is the F-test.

anova(GR_model,GR_model2)
## Analysis of Variance Table
## 
## Model 1: Estimate ~ b[g] * (Temp - Tb)
## Model 2: Estimate ~ b * (Temp - Tb[g])
##   Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
## 1     34   0.072768                         
## 2     34   0.130540  0      0

When the number of parameters are the same across two models, the model with a smaller error should be selected. As shown in the above table, the model with fixed \(T_b\) has a smaller error (0.07) than the model with variable \(T_b\) (0.13), so the model with fixed \(T_b\) outperforms model 2.

When the original model is compared with model 3:

anova(GR_model,GR_model3)
## Analysis of Variance Table
## 
## Model 1: Estimate ~ b[g] * (Temp - Tb)
## Model 2: Estimate ~ b[g] * (Temp - Tb[g])
##   Res.Df Res.Sum Sq Df     Sum Sq F value Pr(>F)
## 1     34   0.072768                             
## 2     26   0.071950  8 0.00081865   0.037      1

there is no significant difference between the two models in terms of the size of error (Pr(>F) = 1). Using the rule of parsimony, we should select the model that is less complex i.e. has fewer number of parameters. Our original model (fixed \(T_b\)) with 10 parameters is therefore preferred over the more complex model (with 18 parameters) as it provides the same level of performance as with the complex model whilst having much fewer parameters.

Step 7. Relating \(b\) to \(g\)

In the previous section, we found compelling evidence for the selection of constant \(T_b\) model. However, this model has still a large number of parameters equating 1+ number of \(g\) levels. Fortunately, \(b\) values vary across \(g\)s in a predictable manner. To explore this relationship, we we first need to extract the values of \(b\) from the model object GR_model:

parms <- coefficients(GR_model)
b <- parms[1:9]
kable(b,col.names = "b")
b
b1 0.0420997
b2 0.0362295
b3 0.0328335
b4 0.0302998
b5 0.0281420
b6 0.0261160
b7 0.0240211
b8 0.0215443
b9 0.0174002
gdd_g <- data.frame(b = b, g = g)

Now we plot \(b\) against \(g\):

g1<- ggplot(gdd_g,aes(x= g, y=b)) +
  geom_point() 
g2<- g1 +labs(x= "Germination fraction (g)" , y="Slope (b)") 
g2

Assuming that \(b\) is normally distributed among germination percentiles \(g\), the inverse (quantile) of a normal distribution can be used to describe \(b\) as function of \(g\): \[b(g)=\Phi^{-1}(1-g, \mu_b,\sigma_b)\] where \(\Phi^{-1}\) is the quantile function for the normal distribution, \(\mu_b\) is the slope for the median germination percentile with \(\sigma_b\) representing the standard deviation of \(b\). If we assume a logistic distribution, \(b\) as function of \(g\) can be described: \[b(g)=b_{50}\left(-\frac{g}{g-1}\right)^{\frac{1}{c}}\] We use nls to fit above models to the \(b\)\(g\) relation. For model based on a normal distribution of \(b\) we obtain (qnorm stands for the quantile form normal distribution):

b_normal<-nls(b~qnorm(1-g,mu,sigma), start=list(mu=0.03, sigma=0.02), data=gdd_g)
summary(b_normal)
## 
## Formula: b ~ qnorm(1 - g, mu, sigma)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## mu    0.0287429  0.0002704  106.31 1.72e-12 ***
## sigma 0.0092372  0.0003497   26.41 2.85e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0008111 on 7 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 7.386e-08

For the logistic model we obtain:

b_logistic<-nls(b~b50*(-g/(g - 1))^(1/c), start=list(b50=0.03, c=-2), data=gdd_g)
summary(b_logistic)
## 
## Formula: b ~ b50 * (-g/(g - 1))^(1/c)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## b50  0.0278860  0.0001503  185.58 3.48e-14 ***
## c   -5.2515999  0.1073227  -48.93 3.90e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0004275 on 7 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 4.117e-06

AIC comparison suggest that logistic-based model is a better fit as it has smaller AIC:

AIC(b_normal,b_logistic)
##            df        AIC
## b_normal    3  -98.82838
## b_logistic  3 -110.35650

The difference between the two models can be better examined when plotted:

X_gdd <- expand.grid(g=seq(0.09,0.91,0.01))
X_gdd$fits <- predict(b_normal,newdata = X_gdd)

X_gdd2 <- expand.grid(g=seq(0.09,0.91,0.01))
X_gdd2$fits <- predict(b_logistic,newdata = X_gdd2)


g1<- ggplot() +
  geom_line(data=X_gdd, aes(x= g, y=fits), color="#D7690A",linetype="longdash" )+
  geom_line(data=X_gdd2, aes(x= g, y=fits), color="#046C9A")+
  geom_point(data=gdd_g, aes(x= g, y=b), color="#398882") 
g2<- g1 +labs(x= "Germination percentile (g)" , y="Slope (b)") 
g2

As shown in the above figure, the normal distribution (dashed line) poorly fits the data.

Step 8. Putting all pieces together

Thus far, we have estimated the base temperature (\(T_b = -0.49\)) and modeled the relationship between \(g\) and \(b\) using a logistic equation with \(b_{50}=0.0279\) and \(c = -5.2516\). This means that we should be able to describe the germination rates of any percentiles by using as few as three parameters only. However, the estimation of \(T_b\) was done separately from that of the \(b\) following a two-step fitting procedure. With the knowledge gained through this process, we are now in a good position to model \(GR\) directly as a one step procedure. Let’s re-write the \(GR\) formula: \[GR=b(T-T_b)\] we also know that \(b\) varies among \(g\)s: \[b=f(g)\] where \(f\) function could be the quantile function of a logistic (or normal) distribution. Using logistic equation as the proper function for \(b\), we directly model \(GR\) as function of \(g\) and \(T\):

\[GR(g,T)=\overbrace{b_{50}\left(-\frac{g}{g-1}\right)^{\frac{1}{c}}}^{b_g}(T-T_b)\] Note that \(GR=0\) if \(T\leq T_b\). The above equation can be fitted to \(GR\) values directly for which we first write a function:

GR_logistic <- function (Temp,g,Tb,b50,c){
  TD <- (Temp-Tb)*(Temp>Tb) # (Temp>Tb) will make sure that TD = 0 if T<= Tb
  bg <- b50*(-g/(g - 1))^(1/c)
  gr_est <- TD*bg
  return(gr_est)
}

and then use nls to fit it to GR data:

GR2<- GR 
GR2$g <- as.numeric(as.character(GR2$g)) # add as.character to keep decimal during the conversion 
GR_model_lgst<-nls(Estimate~GR_logistic(Temp,g,Tb,b50,c), start=c(Tb = 0, b50 = 0.03, c = -5), data=GR2)

summary(GR_model_lgst)
## 
## Formula: Estimate ~ GR_logistic(Temp, g, Tb, b50, c)
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## Tb  -0.495972   0.395039  -1.256    0.216    
## b50  0.027887   0.001238  22.523  < 2e-16 ***
## c   -5.252190   0.517787 -10.144 9.64e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04228 on 41 degrees of freedom
## 
## Number of iterations to convergence: 3 
## Achieved convergence tolerance: 1.306e-06

And we plot the fitted values for better examination of the \(GR\) model:

GR2$logisticfit <- fitted(GR_model_lgst)

g1<- ggplot(GR2) +
  geom_point(aes(x=Temp, y=Estimate), color ="#398882" ) +
  geom_line(aes(x=Temp, y=logisticfit),color="#D7690A") +
  facet_wrap(~factor(g), nrow=3)
g2<- g1 +labs(x= expression("Temperature " ( degree*C)) , y="Germination rate (1/day)") 
g2

The \(GR\) fits look good but let’s plot them against \(g\):

GR2$logisticfit <- fitted(GR_model_lgst)

g1<- ggplot(GR2) +
  geom_point(aes(x=g, y=Estimate), color ="#398882" ) +
  geom_line(aes(x=g, y=logisticfit),color="#D7690A") +
  facet_wrap(~factor(Temp), nrow=3)
g2<- g1 +labs(x= "Germination percentile (g)" , y="Germination rate (1/day)") 
g2

Poor fits for temperatures 1 and 7 \(^{\circ}\)C are obvious.

But how shall we move from \(GR\) modeling to actual germination/emergence curves? We are more familiar with the latter way of presenting germination data. Also how can we make GDD based germination curves? The conversion is simple. Let’s take a glance at our \(GR\) data, which has been stored in dataframe GR2:

head(GR2, n=6)
##           Estimate Std. Error Temp   g       fits logisticfit
## e:1:0.1 0.08950776   6.521628    1 0.1 0.06298880  0.06338888
## e:1:0.2 0.08523029   9.187614    1 0.2 0.05420589  0.05431986
## e:1:0.3 0.08243832  11.406476    1 0.3 0.04912498  0.04902188
## e:1:0.4 0.08014968  11.788011    1 0.4 0.04533403  0.04506668
## e:1:0.5 0.07801557  10.080335    1 0.5 0.04210553  0.04171847
## e:1:0.6 0.07579649   7.754301    1 0.6 0.03907438  0.03861902

In this table Estimate is the actual \(GR\) values (with actual I mean that they were directly estimated from germination curves, so they are not actually actual), and logisticfit represent the fitted values. We first convert them to their time equivalent by using \(t_g=\frac{1}{GR}\):

GR2$lgsFit_time<- 1/GR2$logisticfit
head(GR2, n=6)
##           Estimate Std. Error Temp   g       fits logisticfit lgsFit_time
## e:1:0.1 0.08950776   6.521628    1 0.1 0.06298880  0.06338888    15.77564
## e:1:0.2 0.08523029   9.187614    1 0.2 0.05420589  0.05431986    18.40947
## e:1:0.3 0.08243832  11.406476    1 0.3 0.04912498  0.04902188    20.39905
## e:1:0.4 0.08014968  11.788011    1 0.4 0.04533403  0.04506668    22.18934
## e:1:0.5 0.07801557  10.080335    1 0.5 0.04210553  0.04171847    23.97020
## e:1:0.6 0.07579649   7.754301    1 0.6 0.03907438  0.03861902    25.89398

As shown in the above table we have the estimated time to germination (g) and hence can plot them against the actual germination data:

g1<- ggplot() +
  geom_point(data=barley_sub_mean, aes(x=timeBef, y=propCum*100, color =as.factor(Temp)) ) +
  geom_line(data=GR2, aes(x=lgsFit_time, y=g*100, color=as.factor(Temp)))
g2<- g1 +labs(x= "Time (day)" , y="Cumulative germination (%)") 
g2

The fits for temperature 1 \(^{\circ}\)C are really bad. For the moment, I ignore this poor performance of the model to show how we derive GDDs. Let’s reiterate the GDD formula: \[GDD = (T-T_b)t_g\] We have all the parameters (and variables) of the above model. Using lgsFit_time and estimated \(T_b\) (from GR_model_lgst model) we add GDD values to the GR2 dataframe:

Tb<- coef(GR_model_lgst)['Tb']
GR2$GDD <- (GR2$Temp-Tb)*GR2$lgsFit_time

and do the same for the original data (i.e. barley_sub_mean). Note that Temp column was a factor variable so we need to back-convert it to numeric to be able to apply mathematical operations on it:

barley_sub_mean$Temp <- as.numeric(as.character(barley_sub_mean$Temp))
barley_sub_mean$GDD <- (barley_sub_mean$Temp-Tb)*barley_sub_mean$timeBef

Now we plot germination data against GDD:

g1<- ggplot() +
  geom_point(data=barley_sub_mean, aes(x=GDD, y=propCum*100, color =as.factor(Temp)) ) +
  geom_line(data=GR2, aes(x=GDD, y=g*100, group=as.factor(Temp)), color="#D7690A")
g2<- g1 +labs(x= "GDD (degree day)" , y="Cumulative germination (%)", color="Temperature") 
g2

So why not fitting a GDD model directly to cumulative germination data? To this end, all we need to do is to replace time \(t\) in the logistic equation with \(GDD\): \[g(GDD)=\frac{1}{1+\left(\frac{GDD}{GDD_{50}}\right)^c}\] which after replacing GDD with its formula we obtain: \[g(t,T)=\frac{1}{1+ \left(\frac{(T-T_b)t_g}{GDD_{50}}\right)^c}\] Note that the above equation assumes that germination will finally reach 100%, which may not be true. To fit the above equation we need to write a function:

cumGerm_gdd <- function(t,Temp, Tb, gdd50,c_gdd) {
  GDD <- (Temp-Tb)*t*(Temp>Tb)
  cum_gr <- 1/(1+(GDD/gdd50)^c_gdd)
 return(cum_gr)
}

and then plug it into the nls function of R:

germ_gdd_fit<- nls(propCum~cumGerm_gdd(timeBef, Temp, Tb, gdd50, c_gdd), 
            start=c(Tb = -0.5, gdd50 = 100, c_gdd = -3), data=barley_sub_mean)
summary(germ_gdd_fit)
## 
## Formula: propCum ~ cumGerm_gdd(timeBef, Temp, Tb, gdd50, c_gdd)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## Tb     -2.8653     0.1934  -14.82   <2e-16 ***
## gdd50  52.6674     1.9397   27.15   <2e-16 ***
## c_gdd  -5.7736     0.4373  -13.20   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08672 on 142 degrees of freedom
## 
## Number of iterations to convergence: 15 
## Achieved convergence tolerance: 5.767e-06

Note the difference in the estimation of \(T_b = -2.86\) when we directly fit a GDD model as compared with the \(GR\)-based approach (\(T_b = -0.49\)). Let’s plot the fitted values:

Tb2 <- coef(germ_gdd_fit)['Tb']
barley_sub_mean$cumGerm_fits<- fitted(germ_gdd_fit)
barley_sub_mean$directGDD<- (barley_sub_mean$Temp-Tb2)*barley_sub_mean$timeBef

g1<-ggplot(barley_sub_mean, aes(x=directGDD, y=propCum*100))+
  geom_point(color = "#398882")+
  geom_line(aes(x=directGDD,y=cumGerm_fits*100), color="#D7690A")
g2<- g1+labs(x= "GDD (degree day"  , y = "Cumulative germination (%)" ) 
g2

The model fits well but as its asymptote is not constrained, it predicts 100% germination while the actual germination has never reached that level. We can overcome this problem in two ways: 1- modify the model to account for non-complete germination, or 2- normalize the data (see the Introduction for more details on normalizing germination/emergence data). The logistic equation can become truncated by replacing the numerator, currently fixed at 1, by a parameter, refereed to as \(G_{max}\): \[g(t,T)=\frac{G_{max}}{1+\left(\frac{(T-T_b)t_g}{GDD_{50}}\right)^c}\] We modify the previous R function:

cumGerm_gdd_Gmax <- function(t,Temp, Tb, Gmax, gdd50,c_gdd) {
  GDD <- (Temp-Tb)*t*(Temp>Tb)
  cum_gr <- (1/(1+(GDD/gdd50)^c_gdd))*Gmax
 return(cum_gr)
}

and then plug it into the nls:

germ_gdd_fit_Gmax<- nls(propCum~cumGerm_gdd_Gmax(timeBef, Temp, Tb, Gmax, gdd50, c_gdd), start=c(Tb = -1, Gmax = 0.95, gdd50 = 50, c_gdd = -3), data=barley_sub_mean)
summary(germ_gdd_fit_Gmax)
## 
## Formula: propCum ~ cumGerm_gdd_Gmax(timeBef, Temp, Tb, Gmax, gdd50, c_gdd)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Tb    -2.879315   0.157021  -18.34   <2e-16 ***
## Gmax   0.938917   0.008031  116.91   <2e-16 ***
## gdd50 51.315387   1.534740   33.44   <2e-16 ***
## c_gdd -7.523445   0.638380  -11.79   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07422 on 141 degrees of freedom
## 
## Number of iterations to convergence: 11 
## Achieved convergence tolerance: 6.574e-06

This mode provides better fit based on AIC:

AIC(germ_gdd_fit,germ_gdd_fit_Gmax)
##                   df       AIC
## germ_gdd_fit       4 -292.5969
## germ_gdd_fit_Gmax  5 -336.7775

Let’s plot the new fitted values:

Tb2 <- coef(germ_gdd_fit_Gmax)['Tb']
barley_sub_mean$cumGerm_fits<- fitted(germ_gdd_fit_Gmax)
barley_sub_mean$directGDD<- (barley_sub_mean$Temp-Tb2)*barley_sub_mean$timeBef

g1<-ggplot(barley_sub_mean, aes(x=directGDD, y=propCum*100))+
  geom_point(color = "#398882")+
  geom_line(aes(x=directGDD,y=cumGerm_fits*100), color="#D7690A")
g2<- g1 +labs(x= "GDD (degree day)"  , y = "Cumulative germination (%)" ) 
g2

The constrained model fits the data very well. Now we move forward to include supra-optimal range into our model.

Thermal Time Model: Sub- and Supr-Optimal Temperatures

For this part of the modeling practice we will use a different dataset named as TT-sub-supra.csv. We read the data:

tt_sub_supra <- read.csv('TT-sub-supra.csv')
head(tt_sub_supra, n=6)
##   temp timeBef propCum
## 1   15       0       0
## 2   20       0       0
## 3   25       0       0
## 4   30       0       0
## 5   35       0       0
## 6   15       1       0

The data includes germination observations from five temperatures and we don’t know what the optimal temperature is. As with previous practice, we fit a three-parameter logistic function to data from each temperature:

reg_sub_supra<- drm(propCum~timeBef,temp, fct = LL.3(), data=tt_sub_supra)
summary(reg_sub_supra)
## 
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)
## 
## Parameter estimates:
## 
##        Estimate Std. Error  t-value   p-value    
## b:15 -9.9227230  0.8318172 -11.9290 < 2.2e-16 ***
## b:20 -8.9146263  0.9685268  -9.2043 2.167e-13 ***
## b:25 -7.3694736  0.8084325  -9.1158 3.100e-13 ***
## b:30 -4.8878387  0.4995473  -9.7845 2.095e-14 ***
## b:35 -6.0808097  0.4990810 -12.1840 < 2.2e-16 ***
## d:15  0.9593854  0.0165889  57.8331 < 2.2e-16 ***
## d:20  0.9813314  0.0098996  99.1288 < 2.2e-16 ***
## d:25  0.9897807  0.0092404 107.1143 < 2.2e-16 ***
## d:30  0.9840231  0.0097896 100.5171 < 2.2e-16 ***
## d:35  0.9831142  0.0115994  84.7556 < 2.2e-16 ***
## e:15  8.1946106  0.0850265  96.3772 < 2.2e-16 ***
## e:20  3.6886635  0.0522969  70.5332 < 2.2e-16 ***
## e:25  2.3959818  0.0487726  49.1255 < 2.2e-16 ***
## e:30  2.2940371  0.0535537  42.8362 < 2.2e-16 ***
## e:35  4.1325305  0.0703331  58.7565 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error:
## 
##  0.03076315 (65 degrees of freedom)

Let’s visually inspect the model fits:

tt_sub_supra$fits<-fitted(reg_sub_supra) 

g1<- ggplot(tt_sub_supra) +
  geom_point(aes(x=timeBef, y=propCum*100), color ="#398882" ) +
  geom_line(aes(x=timeBef, y=fits*100),color="#D7690A") +
  facet_wrap(~factor(temp), nrow=2)
g2<- g1 +labs(x= "Time (day)" , y="Cumulative germination (%)") 
g2

All looks good so let’s move to the next step, which is the estimation of \(GR\) values for various germination percentiles, \(g\), for each level of temperature:

g<-seq(0.1,0.9,0.1)
GR_sub_supra<- 1/ED(reg_sub_supra,g, type = "absolute", display = FALSE)
head(GR_sub_supra,6) 
##           Estimate Std. Error
## e:15:0.1 0.1515717   8.194287
## e:15:0.2 0.1395938  10.542776
## e:15:0.3 0.1321113  12.220592
## e:15:0.4 0.1262264  12.575077
## e:15:0.5 0.1209940  11.435775
## e:15:0.6 0.1158882   9.486812

We use stringr::str_split_fixed function to separate row names to its three components using “:” as the separator:

GR_sub_supra<-as.data.frame(GR_sub_supra)

rn<- stringr::str_split_fixed(rownames(GR_sub_supra), ":", 3) 
GR_sub_supra$Temp <- as.numeric(rn[,2])
GR_sub_supra$g<- as.factor(rn[,3])
GR_sub_supra<-na.omit(GR_sub_supra)

Now let’s see how the \(GR-T\) curves look like:

g1<- ggplot(GR_sub_supra,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line()
g2<- g1 +labs(x= expression("Temperature " (degree*C)), y="Germination rate (1/tg)", color="Percentile" ) 
g2

It seems that the optimal temperature \(T_o\) is ~30 \(^{\circ}\)C for all \(g \leq0.5\) but \(T_o\) is cooler for higher germination percentiles (\(g>0.5\)). In general the \(GR-T\) relations resemble a triangle with the peak occurring at \(T_o\). The \(GR\) lines seem to converge and intercept x-axis (\(T\)) at both sides of the peak. These two intercepts are known as base temperature (\(T_b\), on the left side of the peak) and ceiling temperature (\(T_c\), on the right side of the peak). The convergence on the right side, \(T_c\), is not as clear as the one on the left side, \(T_b\), perhaps because fewer temperatures have been tested at the supra-optimal range. The other possibility is that \(T_c\) may not be fixed value but rather varies among seeds (\(g\)), so better be denoted as \(T_{c(g)}\). We will test whether \(T_c\) is constant or variable (i.e. \(T_{c(g)}\)). The model with fixed \(T_c\) was proposed by Washitani(1987) while Bradford (2002) prposed a variable \(T_c\); overall most studies support the latter model.

Model with fixed \(T_c\)

A triangle model to relate \(GR\) to \(T\) can be written as: \[\begin{equation} GR(T,g)=\begin{cases} \left(\frac{T-T_b}{T_o-T_b}\right)b_g & \text{if $T_b< T\leq T_o$}\\ \left (\frac{T_c-T}{T_c-T_o}\right)b_g & \text{if $T_o< T<T_c$}\\ 0 & \text{otherwise.} \end{cases} \end{equation}\]

where \(b_g\) denotes the maximum value of \(GR\) for the germination fraction \(g\). The respective R function for the above model is:

gr_sub_supra_model<- function(Temp,Tb,To,Tc, b) {
  gr <- ifelse(Temp<=To,((Temp-Tb)/(To-Tb))*b,((Tc-Temp)/(Tc-To))*b)*(Temp>Tb & Temp<Tc)
}

and then fit it to GR data using the nls function:

gr_washitani<-nls(Estimate~gr_sub_supra_model(Temp,Tb,To,Tc, b[g]), start=list(Tb = 10, To = 25, Tc = 40, b=c(rep(0.4,9))), data=GR_sub_supra)
summary(gr_washitani)
## 
## Formula: Estimate ~ gr_sub_supra_model(Temp, Tb, To, Tc, b[g])
## 
## Parameters:
##    Estimate Std. Error t value Pr(>|t|)    
## Tb 11.01378    0.27881   39.50   <2e-16 ***
## To 28.37712    0.15902  178.46   <2e-16 ***
## Tc 40.93912    0.35975  113.80   <2e-16 ***
## b1  0.72915    0.01417   51.45   <2e-16 ***
## b2  0.63846    0.01394   45.80   <2e-16 ***
## b3  0.58469    0.01381   42.33   <2e-16 ***
## b4  0.54404    0.01373   39.63   <2e-16 ***
## b5  0.50921    0.01366   37.29   <2e-16 ***
## b6  0.47651    0.01359   35.06   <2e-16 ***
## b7  0.44305    0.01353   32.74   <2e-16 ***
## b8  0.40485    0.01347   30.06   <2e-16 ***
## b9  0.35124    0.01339   26.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01836 on 33 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 4.316e-06

The estimated cardinal temperatures are: \(T_b=11.01\), \(T_o=28.37\), and \(T_c=40.94\) \(^{\circ}\)C. Let’s plot the fitted values for better inspection of the model:

X_gr<- expand.grid(Temp=seq(5,45,0.1), g = factor(g))
X_gr$fits<- predict(gr_washitani,newdata=X_gr)

GR_sub_supra$fits<-fitted(gr_washitani)

g1<- ggplot(GR_sub_supra,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line(data=X_gr, aes(x= Temp, y=fits, color=g))
g2<- g1 +labs(x= expression("Temperature " ( degree*C)), y="Germination rate (1/tg)", color="Percentile" ) 
g2

The model fits well, though we do some extrapolation for estimation of \(T_b\) and \(T_c\), particularly for the latter parameter.

Model with variable \(T_c\)

Now we explore Bradford’s model wherein the parameter \(T_c\) is variable across \(g\)s. We don’t need to make any changes in our triangle model gr_sub_supra_model but rather need to change Tc to Tc[g] and Tc = 40 to Tc = c(rep(40,9)) in the nls function:

gr_bradford<-nls(Estimate~gr_sub_supra_model(Temp,Tb,To,Tc[g], b[g]), start=list(Tb = 10, To = 25, Tc = c(rep(40,9)), b=c(rep(0.4,9))), data=GR_sub_supra)
summary(gr_bradford)
## 
## Formula: Estimate ~ gr_sub_supra_model(Temp, Tb, To, Tc[g], b[g])
## 
## Parameters:
##     Estimate Std. Error t value Pr(>|t|)    
## Tb  11.02615    0.31041   35.52   <2e-16 ***
## To  28.35458    0.17798  159.31   <2e-16 ***
## Tc1 41.50891    0.85658   48.46   <2e-16 ***
## Tc2 41.25319    0.92544   44.58   <2e-16 ***
## Tc3 41.08248    0.97435   42.16   <2e-16 ***
## Tc4 40.94202    1.01639   40.28   <2e-16 ***
## Tc5 40.81248    1.05659   38.63   <2e-16 ***
## Tc6 40.68194    1.09840   37.04   <2e-16 ***
## Tc7 40.53780    1.14595   35.38   <2e-16 ***
## Tc8 40.35721    1.20707   33.43   <2e-16 ***
## Tc9 40.06484    1.30735   30.65   <2e-16 ***
## b1   0.72360    0.01734   41.73   <2e-16 ***
## b2   0.63586    0.01720   36.96   <2e-16 ***
## b3   0.58373    0.01713   34.08   <2e-16 ***
## b4   0.54428    0.01708   31.87   <2e-16 ***
## b5   0.51042    0.01704   29.96   <2e-16 ***
## b6   0.47861    0.01700   28.16   <2e-16 ***
## b7   0.44602    0.01696   26.29   <2e-16 ***
## b8   0.40877    0.01693   24.15   <2e-16 ***
## b9   0.35640    0.01688   21.12   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02051 on 25 degrees of freedom
## 
## Number of iterations to convergence: 8 
## Achieved convergence tolerance: 1.832e-06

This model has nine additional parameters compared to Washitani’s model but doesn’t seem to reduce the size of model error as shown by an AIC test:

AIC(gr_washitani,gr_bradford)
##              df       AIC
## gr_washitani 13 -220.0388
## gr_bradford  21 -206.5734

or F-test:

anova(gr_washitani,gr_bradford)
## Analysis of Variance Table
## 
## Model 1: Estimate ~ gr_sub_supra_model(Temp, Tb, To, Tc, b[g])
## Model 2: Estimate ~ gr_sub_supra_model(Temp, Tb, To, Tc[g], b[g])
##   Res.Df Res.Sum Sq Df     Sum Sq F value Pr(>F)
## 1     33   0.011123                             
## 2     25   0.010514  8 0.00060919  0.1811 0.9915

But how the model with variable \(T_c\) looks like:

X_gr2<- expand.grid(Temp=seq(5,45,0.1), g = factor(g))
X_gr2$fits<- predict(gr_bradford,newdata=X_gr2)

g1<- ggplot(GR_sub_supra,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line(data=X_gr2, aes(x= Temp, y=fits, color=g))
g2<- g1+labs(x= expression("Temperature " ( degree*C)), y="Germination rate (1/tg)", color="Percentile" ) 
g2

As shown above, the lines tend to converge at supra-optimal range even when we allowed this parameter to freely vary across \(g\)s.

Model with variable \(T_o\)

Another parameter that we need to test for potential variability is \(T_o\): In my own studies with two grass weeds, I found that \(T_o\) is not a fixed value but rather differ with \(g\) (as well as water potential). To test a model with variable \(T_o\) we simply replace To with To[g] in the original nls function (note that \(T_c\) is kept constant):

gr_mesgaran<-nls(Estimate~gr_sub_supra_model(Temp,Tb,To[g],Tc, b[g]), start=list(Tb = 10, To = c(rep(25,9)), Tc = 40, b=c(rep(0.4,9))), data=GR_sub_supra)
summary(gr_mesgaran)
## 
## Formula: Estimate ~ gr_sub_supra_model(Temp, Tb, To[g], Tc, b[g])
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## Tb  11.001805   0.122261   89.99   <2e-16 ***
## To1 29.211266   0.120608  242.20   <2e-16 ***
## To2 28.812844   0.137251  209.93   <2e-16 ***
## To3 28.545077   0.149730  190.64   <2e-16 ***
## To4 28.323528   0.160915  176.02   <2e-16 ***
## To5 28.118216   0.172022  163.46   <2e-16 ***
## To6 27.910437   0.184032  151.66   <2e-16 ***
## To7 27.680176   0.198293  139.59   <2e-16 ***
## To8 27.391108   0.217687  125.83   <2e-16 ***
## To9 26.924866   0.252652  106.57   <2e-16 ***
## Tc  40.910867   0.156444  261.50   <2e-16 ***
## b1   0.719889   0.006403  112.42   <2e-16 ***
## b2   0.634825   0.006229  101.92   <2e-16 ***
## b3   0.583606   0.006137   95.10   <2e-16 ***
## b4   0.544486   0.006075   89.63   <2e-16 ***
## b5   0.510671   0.006028   84.72   <2e-16 ***
## b6   0.478682   0.005990   79.92   <2e-16 ***
## b7   0.445691   0.005959   74.79   <2e-16 ***
## b8   0.407695   0.005937   68.67   <2e-16 ***
## b9   0.353740   0.005936   59.59   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.008041 on 25 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 3.749e-06

The above table shows that \(T_o\) is becoming cooler with higher \(g\)s: for example, for the 10th percentile \(T_o=29.21\) \(^{\circ}\)C which decreases to \(26.92\) \(^{\circ}\)C for \(g=0.9\). A comparison with Washitani’s model suggest a better fit for the model with a variable \(T_o\)either using AIC test:

AIC(gr_washitani,gr_mesgaran) 
##              df       AIC
## gr_washitani 13 -220.0388
## gr_mesgaran  21 -290.8365

or the F-test:

anova(gr_washitani,gr_mesgaran) 
## Analysis of Variance Table
## 
## Model 1: Estimate ~ gr_sub_supra_model(Temp, Tb, To, Tc, b[g])
## Model 2: Estimate ~ gr_sub_supra_model(Temp, Tb, To[g], Tc, b[g])
##   Res.Df Res.Sum Sq Df    Sum Sq F value    Pr(>F)    
## 1     33  0.0111230                                   
## 2     25  0.0016164  8 0.0095066   18.38 1.108e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

and here is a plot depicting the fitted values:

X_gr3<- expand.grid(Temp=seq(5,45,0.1), g = factor(g))
X_gr3$fits<- predict(gr_mesgaran,newdata=X_gr3)

g1<- ggplot(GR_sub_supra,aes(x= Temp, y=Estimate, color=g)) +
  geom_point()+
  geom_line(data=X_gr3, aes(x= Temp, y=fits, color=g))
g2<- g1 +labs(x= expression("Temperature " ( degree*C)), y="Germination rate (1/tg)", color="Percentile" ) 
g2

Although the mode with variable \(T_o\) provided the best fit, for simplicity we will use Washitani’s model for the rest of the analysis.

Variation in \(b_g\) among seeds (\(g\))

For each \(g\) we have a specific \(b_g\) (maximum value of \(GR\)) for which we can develop a model and hence reduce the number of parameters. We first investigate the changes in parameter \(b_g\) as function of germination percentiles \(g\):

parms <- coef(gr_washitani)
b<- parms[4:12]
kable(b, col.names = "b_g")
b_g
b1 0.7291453
b2 0.6384649
b3 0.5846902
b4 0.5440433
b5 0.5092077
b6 0.4765103
b7 0.4430530
b8 0.4048545
b9 0.3512384

As before we fit a normal distribution to \(b_g\) values:

bg_normal<-nls(b~qnorm(1-g,b50,b_sd), start=c(b50=0.5, b_sd=0.1))
summary(bg_normal)
## 
## Formula: b ~ qnorm(1 - g, b50, b_sd)
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## b50  0.520134   0.004606   112.9 1.13e-12 ***
## b_sd 0.143557   0.005958    24.1 5.39e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01382 on 7 degrees of freedom
## 
## Number of iterations to convergence: 1 
## Achieved convergence tolerance: 2.464e-08

and a logistic distribution as well:

bg_logist<-nls(b~b50*(-g/(g - 1))^(1/b_sd), start=c(b50=0.5, b_sd=-2))
summary(bg_logist)
## 
## Formula: b ~ b50 * (-g/(g - 1))^(1/b_sd)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## b50   0.5083670  0.0004443  1144.1  < 2e-16 ***
## b_sd -6.0714826  0.0235991  -257.3 3.54e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.001279 on 7 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 6.491e-06

followed by an AIC or F-test for model selection:

AIC(bg_normal,bg_logist) 
##           df       AIC
## bg_normal  3 -47.79257
## bg_logist  3 -90.62624

which shows that the logistic model provides much better fits as shown by a very small AIC.

Plotting the data is always useful:

b_normal_fit<-fitted(bg_normal)
b_logist_fit<-fitted(bg_logist)
g1<-ggplot()+
  geom_point(aes(x=g,y=b), size=3)+
  geom_line(aes(x=g,y=b_normal_fit), color="#D7690A",linetype="longdash")+
  geom_line(aes(x=g,y=b_logist_fit), color="#046C9A")
g2<- g1 +labs(x= expression("Germination percentile " ( italic(g))), y=expression(paste(italic("b"[g]), " (1/time)")), 
                               color="Percentile" ) 
g2

As shown, the normal model (dashed line) is performing poorly compared to the logistic model (solid line).

Modeling \(GR\) directly

We can model \(GR\) across both sub- and supra-optimal temperatures by only using 5 parameters where three parameters represent cardinal thresholds (i.e. \(T_b\), \(T_o\), \(T_c\)) and two relate to the logistic model (i.e. \(b_{50}\) and \(b_{sd}\)). The one-step model for \(GR\) can be formulated as (note that we simply replace \(b_g\) in the original triangle model with its logistic equivalent): \[\begin{equation} GR(T,g)=\begin{cases} \left[\frac{T-T_b}{T_o-T_b}\right].\overbrace{\left(b_{50}(-\frac{g}{(g - 1)})^{\frac{1}{b_{sd}}}\right)}^{b_g} & \text{if $T_b< T\leq T_o$}\\ \left[\frac{T_c-T}{T_c-T_o}\right].\overbrace{\left(b_{50}(-\frac{g}{(g - 1)})^{\frac{1}{b_{sd}}}\right)}^{b_g} & \text{if $T_o< T<T_c$}\\ 0 & \text{otherwise.} \end{cases} \end{equation}\]

which in R function format can be written as:

GR_sub_supra_logistic <- function(Temp, g, Tb, To, Tc, b50, b_sd) {
 bg <- b50*(-g/(g - 1))^(1/b_sd)
 gT <- ifelse(Temp<=To,(Temp-Tb)/(To-Tb),(Tc-Temp)/(Tc-To))*(Temp>Tb & Temp<Tc)
 gr <- bg*gT
 return(gr)
}

and then fitted using nls (note that we need to convert g column to a numeric variable; the as.character is also used because when as.numeric is used directly it will eliminate the decimals!):

GR_sub_supra$g<-as.numeric(as.character(GR_sub_supra$g))
gr_lg<- nls(Estimate~GR_sub_supra_logistic(Temp, g, Tb, To, Tc, b50, b_sd), 
            start=c(Tb = 11, To = 28, Tc = 40, b50 = 0.5, b_sd = -3), data=GR_sub_supra)
summary(gr_lg)
## 
## Formula: Estimate ~ GR_sub_supra_logistic(Temp, g, Tb, To, Tc, b50, b_sd)
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## Tb   11.013814   0.253492   43.45   <2e-16 ***
## To   28.377100   0.144578  196.28   <2e-16 ***
## Tc   40.939093   0.327086  125.16   <2e-16 ***
## b50   0.508368   0.005342   95.17   <2e-16 ***
## b_sd -6.071480   0.220430  -27.54   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01669 on 40 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 3.736e-06

And here is the corresponding plot (for a better visualization, \(GR\) is plotted for each \(g\) separately using facet_wrap(~factor(g),nrow=3)):

X_gr_lg <- expand.grid(Temp = seq(5,45,0.1), g=g)
X_gr_lg$fits<- predict(gr_lg, newdata=X_gr_lg)

g1<-ggplot(GR_sub_supra, aes(x=Temp, y=Estimate))+
  geom_point()+
  geom_line(data=X_gr_lg, aes(x=Temp,y=fits), color="#D7690A")+
  facet_wrap(~factor(g),nrow=3)
g2<- g1 +labs(x= expression("Temperature " ( degree*C)), y="Germination rate (1/time)" ) 
g2

We now have all the required parameters to generate germination curves (i.e. cumulative germination over time). We first need to convert \(GR\) values to their time equivalent by using \(t_g=\frac{1}{GR}\):

GR_sub_supra$GR_lgs_fit<-fitted(gr_lg)
GR_sub_supra$time_lgs_fit<- 1/GR_sub_supra$GR_lgs_fit
head(GR_sub_supra, n=6)
##           Estimate Std. Error Temp   g      fits GR_lgs_fit time_lgs_fit
## e:15:0.1 0.1515717   8.194287   15 0.1 0.1673949  0.1675990     5.966621
## e:15:0.2 0.1395938  10.542776   15 0.2 0.1465768  0.1466444     6.819217
## e:15:0.3 0.1321113  12.220592   15 0.3 0.1342313  0.1341871     7.452278
## e:15:0.4 0.1262264  12.575077   15 0.4 0.1248998  0.1247689     8.014815
## e:15:0.5 0.1209940  11.435775   15 0.5 0.1169023  0.1167088     8.568336
## e:15:0.6 0.1158882   9.486812   15 0.6 0.1093958  0.1091693     9.160085

The estimate time data can then be overlaid over the actual germination points.

g1<- ggplot(tt_sub_supra) +
  geom_point(aes(x=timeBef, y=propCum*100, color =as.factor(temp) )) +
  geom_line(data=GR_sub_supra, aes(x=time_lgs_fit, y=g*100, color=as.factor(Temp)) )
g2<- g1 +labs(x= "Time (day)" , y="Cumulative germination (%)") 
g2

GDD model for Sub- and Supra-optimal ranges

The accumulation of heat increases with \(T\) over the sub-optimal range, reaches it maximum at \(T=T_o\) but then starts to decline when \(T>T_o\). The GDD formula can therefore be described by: \[\begin{equation} GDD=\begin{cases} \left[\frac{T-T_b}{T_o-T_b}\right].(T_o-T_b)t_g & \text{if $T_b< T\leq T_o$}\\ \left[\frac{T_c-T}{T_c-T_o}\right].(T_o-T_b)t_g & \text{if $T_o< T<T_c$}\\ 0 & \text{otherwise.} \end{cases} \end{equation}\]

Which its R function is given by:

GDD_sub_supra <- function(Temp,t,Tb, To, Tc, gdd_50, b_gdd) {
  gT <- ifelse(Temp<=To,(Temp-Tb)/(To-Tb),(Tc-Temp)/(Tc-To))*(Temp>Tb & Temp<Tc)
  GDD <- (To-Tb)*gT*t
  fit_lg <- 1/(1+(GDD/gdd_50)^b_gdd)
  return(fit_lg)
}

We fit GDD_sub_supra model to cumulative germination data directly:

GDD_sub_supra_cum <- nls(propCum~GDD_sub_supra(temp,timeBef,Tb, To, Tc, gdd_50,b_gdd), 
                         start=c(Tb = 10, To = 28, Tc = 40, gdd_50 =50, b_gdd = -3), data=tt_sub_supra)
summary(GDD_sub_supra_cum)
## 
## Formula: propCum ~ GDD_sub_supra(temp, timeBef, Tb, To, Tc, gdd_50, b_gdd)
## 
## Parameters:
##        Estimate Std. Error t value Pr(>|t|)    
## Tb      10.9335     0.1281   85.33   <2e-16 ***
## To      28.1284     0.2531  111.12   <2e-16 ***
## Tc      41.1380     0.4078  100.89   <2e-16 ***
## gdd_50  33.7355     0.8608   39.19   <2e-16 ***
## b_gdd   -7.0960     0.3340  -21.24   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03915 on 75 degrees of freedom
## 
## Number of iterations to convergence: 12 
## Achieved convergence tolerance: 7.668e-06

The estimated cardinal temperature are very similar to those obtained from a direct fit to \(GR\), but it may not be the case always. We plot the fitted germination:

tt_sub_supra$fit_gdd <- fitted(GDD_sub_supra_cum)

g1<-ggplot(tt_sub_supra, aes(x=timeBef, y=propCum*100))+
  geom_point()+
  geom_line(aes(x=timeBef,y=fit_gdd*100), color="#D7690A")+
  facet_wrap(~factor(temp),nrow=3)
g2<- g1 +labs(x= "Time (day)", y="Cumulative germination (%)" ) 
g2

The model describes the data very well. If interested in plotting cumulative germination against GDD, then some small changes in GDD_sub_supra function are needed:

GDD <- function(Temp,t,Tb, To, Tc, gdd_50, b_gdd) {
  gT <- ifelse(Temp<=To,(Temp-Tb)/(To-Tb),(Tc-Temp)/(Tc-To))*(Temp>Tb & Temp<Tc)
  GDD <- (To-Tb)*gT*t
  fit_lg <- 1/(1+(GDD/gdd_50)^b_gdd)
  out_gdd <- data.frame(GDD = GDD, Germ = fit_lg)
  return(out_gdd)
}

We then plug the parameter estimates of the GDD model into the above GDD function and create a new dataframe TT which includes the GDD, fitted germination, observed germination, and temperature variable (temp):

parms <- coef(GDD_sub_supra_cum)
TT<- GDD(tt_sub_supra$temp,tt_sub_supra$timeBef,parms[1],parms[2],parms[3],parms[4],parms[5])
TT$temp<-tt_sub_supra$temp
TT$propCum<-tt_sub_supra$propCum

Let’s plot the data and see how they look like:

g1<-ggplot(TT)+
  geom_point(aes(x=GDD, y=propCum*100, color=factor(temp)))+
  geom_line(aes(x=GDD,y=Germ*100))
g2<- g1 +labs(x= "GDD (°C day)", y="Cumulative germination (%)", color="Temperature (°C)")+
  theme(legend.position=c(0.85,0.25))
g2

Note that this model uses the unconstrained form of the logistic function, which assumes complete germination (100%). We can add an asymptote (<100%) to the model in the same manner as was described for the sub-optimal model.

Hydrotime model

The effect of water availability (measured in osmotic potential, \(\psi\)) on germination rate is often described by a linear function (Bradford, 1995; 2002): \[\frac{1}{t_g}=GR=\frac{(\psi-\psi_{b(g)})}{\theta}\] or \[\theta=(\psi-\psi_{b(g)})t_g\] where \(\psi_{b(g)}\) is the base water potential and \(\theta\) is the hydrotime constant (with unit MPa day or MPa hour). Unlike thermal time model, which all seeds have the same \(T_b\) but variable slope (\(b\) or thermal requirement), \(\psi_{b(g)}\) varies among seeds following a normal distribution whilst the slope (\(\theta\)) is constant. We use the following example to show the relationship between water potential and germination.

data("phalaris")
head(phalaris, n=6)
##   temp water Dish timeBef timeAf nViable nSeeds nCum propCum
## 1   12  -0.2    1       0     24      25      0    0    0.00
## 2   12  -0.2    1      24     48      25      0    0    0.00
## 3   12  -0.2    1      48     72      25      0    0    0.00
## 4   12  -0.2    1      72     96      25      0    0    0.00
## 5   12  -0.2    1      96    120      25      1    1    0.04
## 6   12  -0.2    1     120    144      25      1    2    0.08

In this dataset, water indicates the water potential at which the seed germination was tested. For this practice, we will only use the data from temp=16. Let’s estimate the \(GR\)s for various \(g\) values using the mean germination data:

phal16<- subset(phalaris, temp==16)
phal16_mean<-aggregate(propCum~water+timeBef, mean, data=phal16)
ht_reg<-drm(propCum~timeBef,as.factor(water), fct=LL.3(), data=phal16_mean)
summary(ht_reg)
## 
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)
## 
## Parameter estimates:
## 
##          Estimate Std. Error  t-value   p-value    
## b:-1    -4.409050   0.632203  -6.9741 3.145e-10 ***
## b:-0.8  -3.924200   0.339032 -11.5747 < 2.2e-16 ***
## b:-0.6  -2.663020   0.182306 -14.6074 < 2.2e-16 ***
## b:-0.4  -2.739350   0.176865 -15.4883 < 2.2e-16 ***
## b:-0.2  -3.154121   0.218213 -14.4543 < 2.2e-16 ***
## b:0     -2.783774   0.175465 -15.8651 < 2.2e-16 ***
## d:-1     0.328445   0.013318  24.6625 < 2.2e-16 ***
## d:-0.8   0.524702   0.012231  42.9001 < 2.2e-16 ***
## d:-0.6   0.688027   0.019564  35.1675 < 2.2e-16 ***
## d:-0.4   0.766469   0.021571  35.5317 < 2.2e-16 ***
## d:-0.2   0.671583   0.010474  64.1184 < 2.2e-16 ***
## d:0      0.769531   0.012800  60.1181 < 2.2e-16 ***
## e:-1   186.373868   7.141401  26.0977 < 2.2e-16 ***
## e:-0.8 175.815854   4.108002  42.7984 < 2.2e-16 ***
## e:-0.6 165.148737   5.592968  29.5279 < 2.2e-16 ***
## e:-0.4 181.129209   5.542450  32.6804 < 2.2e-16 ***
## e:-0.2 119.010099   2.737743  43.4701 < 2.2e-16 ***
## e:0    123.492373   2.871132  43.0117 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error:
## 
##  0.01928091 (102 degrees of freedom)

We plot the fitted germination data to ensure everything is fine:

phal16_mean$reg_fit<-fitted(ht_reg)
g1<- ggplot(phal16_mean) +
  geom_point(aes(x= timeBef, y=propCum*100, color=as.factor(water)))+
  geom_line(aes(x= timeBef, y=reg_fit*100, color=as.factor(water)))
g2<- g1 +labs(x= "Time (day)" , y="Cumulative germination (%)", color="Water potential (MPa)")
g2

We estimate the \(GR\) values using ED function of drc:

g<- seq(0.1, 0.9, 0.1)
gr_ht<-1/ED(ht_reg, g, type="absolute", display=FALSE)

gr_ht<- as.data.frame(gr_ht)
rn_ht<- stringr::str_split_fixed(rownames(gr_ht), ":", 3) 
gr_ht$water <- as.numeric(rn_ht[,2])
gr_ht$g<- as.factor(rn_ht[,3])
gr_ht<-na.omit(gr_ht)

We then plot the \(GR\) values again water:

g1<- ggplot(gr_ht) +
  geom_line(aes(x= water, y=Estimate, color=as.factor(g)))+
geom_point(aes(x= water, y=Estimate, color=as.factor(g)))
g2<- g1 +labs(x= "Water potential (MPa)" , y="Germination rate (1/time)", color="Germination percentile (g)")
g2

\(GR\) lines seem to be parallel i.e. having the same slope. Let’s fit the hydrotime model to these \(GR\)s:

ht_gr_mod<-nls(Estimate~b*(water-wb[g]), start=list(b=0.02, wb=rep(-0.5,7)), data=gr_ht)
summary(ht_gr_mod)
## 
## Formula: Estimate ~ b * (water - wb[g])
## 
## Parameters:
##       Estimate Std. Error t value Pr(>|t|)    
## b    0.0064330  0.0004923  13.067 6.20e-13 ***
## wb1 -2.2653886  0.1461400 -15.501 1.19e-14 ***
## wb2 -1.8121967  0.1148407 -15.780 7.82e-15 ***
## wb3 -1.5386073  0.0970652 -15.851 7.03e-15 ***
## wb4 -1.3397968  0.0943268 -14.204 9.17e-14 ***
## wb5 -1.1367164  0.0830876 -13.681 2.18e-13 ***
## wb6 -0.9150904  0.0828965 -11.039 2.60e-11 ***
## wb7 -0.6562471  0.1026218  -6.395 8.96e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0008779 on 26 degrees of freedom
## 
## Number of iterations to convergence: 2 
## Achieved convergence tolerance: 1.754e-08

In the above table, the reciprocal of b represents the hydrotime constant ($== = 155.45 $MPa h) meaning that 155.45 MPa h is needed for seed to germinate but some will do it fast because they have a more negative \(\psi_b\) (e.g. -2.26 for \(g=0.1\)) or slow if they have a more positive \(\psi_b\) (e.g. -0.65 for \(g=0.7\)). A plot of data shows That fits match the data very well:

X_wt<- expand.grid(water=seq(-2.3,0,0.01), g = unique(gr_ht$g))
X_wt$fits<-predict(ht_gr_mod, newdata = X_wt)

g1<- ggplot(gr_ht) +
  geom_point(aes(x= water, y=Estimate, color=as.factor(g)))+
  geom_line(data=X_wt, aes(x= water, y=fits, color=as.factor(g))) + ylim(0, 0.02)
g2<- g1 +labs(x= "Water potential (MPa)" , y="Germination rate (1/time)", color="Germination percentile (g)")
g2
## Warning: Removed 648 rows containing missing values (geom_path).

Assuming that the distribution of \(\psi_b\) among seeds is normal, we can directly fit the hydrotime model to cumulative germination. We first rearrange the hydrotime model for \(\psi_{b(g)}\): \[\psi_{b(g)}=\psi-\frac{\theta}{t_g}\] which if normally distributed we can write: \[g(\psi,t_g)=\Phi(\overbrace{\psi-\frac{\theta}{t_g}}^{\psi_{b(g)}},\psi_{b(50)},\sigma_{\psi})\] where \(\Phi\) is the CDF of a normal distribution. In R the CDF of normal distribution is given by pnorm function. We fit the above model to germination data using nls:

ht_cumGerm <- nls(propCum~pnorm(water-(theta/timeBef), wb_50, sig_wb), start = c(theta=150, wb_50 = -1, sig_wb = 0.2), data=phal16_mean)
summary(ht_cumGerm)
## 
## Formula: propCum ~ pnorm(water - (theta/timeBef), wb_50, sig_wb)
## 
## Parameters:
##         Estimate Std. Error t value Pr(>|t|)    
## theta  175.26636    7.85370   22.32   <2e-16 ***
## wb_50   -1.18736    0.03386  -35.07   <2e-16 ***
## sig_wb   0.96121    0.03571   26.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0432 on 117 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 2.249e-06

Our hydrotime model has three parameters:

  • theta (\(\theta\)) = hydrotime constant, a measure of rapidity of germination (the smaller the faster)
  • wb_50 (\(\psi_{b(50)}\)) = the median base water potential i.e. the water potential at or below which 50% of seed cannot germinate, a measure of tolerance to water stress (the more negative the more tolerant)
  • sig_wb(\(\sigma_{\psi}\)) = standard deviation for the normally distributed \(\psi_{b(g)}\) values, also a measure of germination uniformity (the smaller the more uniform). Now we plot the fitted values to examine the performance of model:
phal16_mean$ht_fits<-fitted(ht_cumGerm)

g1<- ggplot(phal16_mean) +
  geom_point(aes(x= timeBef, y=propCum*100, color=as.factor(water)))+
  geom_line(aes(x= timeBef, y=ht_fits*100, color=as.factor(water))) 
g2<- g1 +labs(x= "Time (h)" , y="Cumulative germination (%)", color="Water potential (MPa)")
g2

The overall performance is good but the model fails to properly describe the germination pattern in seeds incubated under maximum water stress of -1 MPa. Now that we know the estimates for the parameters, we can check how the distribution of \(\psi_{b(g)}\) looks like (dnorm stands for the probability distribution function, PDF, of normal distribution):

water <- seq(-4,2,0.05)
fit_gmax<-dnorm(water,coef(ht_cumGerm)["wb_50"],coef(ht_cumGerm)["sig_wb"])

and then can be plotted:

g1<- ggplot() +
    geom_line(aes(x= water, y=fit_gmax), color="#00998a")+
    geom_area(aes(x = water[water>=0], y=fit_gmax[water>=0]), fill="#00998a", alpha =0.5)
g2<- g1 +labs(x= "Water potential (MPa)" , y="Probability density")
g2

As you can see in the above figure, a considerable proportion of seeds has a positive base water potential (shaded area) meaning that they will not germinate even in pure (distilled, \(\psi=0\)) water. This shaded area represents the fraction of seeds that are dormant possessing a \(\psi_b\geq 0\)). To estimate the share of shaded area (i.e. proportion of dormant seeds, \(Q\)) we use: \[Q=1-\Phi(0,\psi_{b(50)},\sigma_{\psi})\] which in R language translates to:

Q <- 1-pnorm(0,coef(ht_cumGerm)["wb_50"],coef(ht_cumGerm)["sig_wb"])
Q
## [1] 0.1083638

The estimate \(Q\) suggest that ~10% of seeds in this seedlot are dormant.

Hydrothermal Time Model (HTT)

As with thermal time model, HTT models can be classified into two categories of sub-optimal and supra-optimal models. We start with a sub-optimal HTT and then modify it to include the supra-optimal temperatures.

HTT Model for Sub-optmil Temperature Range

The effect of [sub-optimal] temperature can be incorporated into a hydrotime model following Gummerson (1986) and Bradford (1995): \[\theta_{HT}=\left(\psi-\psi_{b(g)}\right)\left(T-T_b\right)t_g\] where \(\theta_{HT}\) is the hydrothermal time constant (with unit MPa \(^{\circ}\)C day) meaning that all germinable seed will germinate after a the same amount of hydrothermal time accumulation but as they have different base water potential values they will germinate at different time. The model also assumes that \(\psi_{b(g)}\) is independent of temperature and that \(T_b\) is independent of water potential. As before, we also assume that \(\psi_{b(g)}\) is normally distributed. To fit the above model to cumulative germination data we rearrange it for \(\psi_{b(g)}\): \[\psi_{b(g)}= \psi-\left(\frac{\theta_{HT}}{(T-T_b)t_g}\right)\] which if we hold by the assumption of a normal distribution, we obtain: \[g(\psi,T,t_g)=\Phi\left(\overbrace{\psi-\left(\frac{\theta_{HT}}{(T-T_b)t_g}\right)}^{\psi_{b(g)}},\psi_{b(50)},\sigma_{\psi}\right)\] Let’s include the data from other [sub-optimal] temperature levels of the phalaris data and fit the above HTT model:

phal_SubHTT<- subset(phalaris, temp<=16)
phal_SubHTT_mean <- aggregate(propCum~water+temp+timeBef,mean, data=phal_SubHTT)
head(phal_SubHTT_mean, n=6)
##   water temp timeBef propCum
## 1  -1.0    8       0       0
## 2  -0.8    8       0       0
## 3  -0.6    8       0       0
## 4  -0.4    8       0       0
## 5  -0.2    8       0       0
## 6   0.0    8       0       0

The above dataset includes three temperature levels (8, 12, and 16 \(^{\circ}\)C) crossed with six level of water potentials (-1.0, -0.8, -0.6, -0.4, -0.2, and 0 MPa). We first create a function for the HTT model as follows:

sub_HTT_fun <- function(water, temp, tg, theta_ht, Tb, wb_50, sigma_wb){
  wbg <- water-(theta_ht/((temp-Tb)*tg))
  CumGerm <- pnorm(wbg,wb_50, sigma_wb)
  return(CumGerm)
}

and then fit it to the data using nls:

sub_HTT<- nls(propCum~sub_HTT_fun(water, temp, timeBef, theta_ht, Tb, wb_50, sigma_wb), start = c(theta_ht = 2000, Tb = 0, wb_50 = -1.5, sigma_wb = 0.5), data=phal_SubHTT_mean)
summary(sub_HTT)
## 
## Formula: propCum ~ sub_HTT_fun(water, temp, timeBef, theta_ht, Tb, wb_50, 
##     sigma_wb)
## 
## Parameters:
##            Estimate Std. Error t value Pr(>|t|)    
## theta_ht 2503.65998  107.89261  23.205   <2e-16 ***
## Tb          1.89419    0.21875   8.659   <2e-16 ***
## wb_50      -1.25926    0.02635 -47.784   <2e-16 ***
## sigma_wb    0.93653    0.02388  39.221   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04729 on 356 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 1.031e-06

The estimated base temperature and base water potential for seed germination in this species are 1.9 \(^{\circ}\)C and -1.26 MPa respectively. Let’s visually examine the model’s fits:

phal_SubHTT_mean$htt_fits<-fitted(sub_HTT)

g1<- ggplot(phal_SubHTT_mean) +
  geom_point(aes(x= timeBef, y=propCum*100, color=as.factor(water)))+
  geom_line(aes(x= timeBef, y=htt_fits*100, color=as.factor(water)))+
  facet_wrap(~as.factor(temp), nrow=1)
g2<- g1 +labs(x= "Time (h)" , y="Cumulative germination (%)", color="Water potential \n(MPa)") +
    theme(legend.position=c(0.09,0.73), legend.title = element_text(size=10), legend.background = element_rect(fill="transparent"))
g2

Deviation from actual germination is clear for some temperature \(\times\) water treatments. The use of a normal distribution and a single \(\psi_{b(g)}\) might explain part of this lack of fit.

HTT Model for Sub- and Supra-Optimal Temperature Ranges

Several studies have shown that \(\psi_{b(g)}\) varies with temperature particularly at \(T>T_o\). In fact the slower germination rate (and extent) at supra-optimal ranges has been attributed to an upward shift in \(\psi_{b(g)}\). An increase in \(\psi_{b(g)}\) (i.e. more positive) produces an effect that is similar to that of a reduced water availability (i.e. more negative water potential in the medium). Further, as the \(\psi_{b(g)}\) become larger (more positive), more seeds will be shifted to values above 0 MPa, which means they will never germinate even if the water potential is 0 MPa. Observing an upward shift in \(\psi_{b(g)}\) at \(T>T_o\), Alvarado and Bradford (2000) suggested the following modified HTT model: \[\begin{equation} \theta_{HT}=\begin{cases} \left(\psi-\psi_{b(g)}\right)\left(T-T_b\right)t_g & \text{if $T<T_o$}\\ \left(\psi-\left[\psi_{b(g)_o}+k(T-T_o)\right]\right)\left(T-T_b\right)t_g & \text{if $T\geq T_o$} \end{cases} \end{equation}\]

where \(\psi_{b(g)_o}\) denotes the \(\psi_{b(g)}\) at \(T=T_o\), which will increase with supra-optimal \(T\) at a rate equal to \(k\), thus \(k\) represents the rate of increase in \(\psi_{b(g)}\) as supra-optimal \(T\) increases. Mesgaran et al. (2018), however, found that the upward shift in \(\psi_{b(g)}\) occurs over all temperatures above \(T_b\), i.e. at both sub- and supra-optimal, and suggested the following model: \[\theta_{HT}=\left(\psi-\left[\psi_{b(g)_{Tb}}+k(T-T_b)\right]\right)\left(T-T_b\right)t_g\] where \(\psi_{b(g)_{Tb}}\) denotes the base water potential when \(T=T_b\). Now we fit the two models to the whole dataset of phalaris, which includes both sub- and supra-optimal ranges. The respective R function for the Alvarado-Bradford model can be formulated as:

HTT_AB_fun <- function(water, Temp,tg,theta_ht, Tb, To, wb_o, k, sigma_wb) {
  wb_50<-ifelse(Temp<To,wb_o,wb_o+k*(Temp-To))
  wb_g<- water-(theta_ht/((Temp-Tb)*tg))
  cumgerm <- pnorm(wb_g, wb_50,sigma_wb)
  return(cumgerm)
}

We fit HTT_AB_fun function to mean germination data of phalaris dataset using nls:

phal_mean<- aggregate(propCum~water+temp+timeBef, mean, data=phalaris)

HTT_AB<-nls(propCum~HTT_AB_fun(water, temp,timeBef, theta_ht, Tb, To, wb_o, k, sigma_wb), start=c(theta_ht = 200, Tb=1, To=16, wb_o=-1, k=0.01, sigma_wb=0.4), data=phal_mean)
summary(HTT_AB)
## 
## Formula: propCum ~ HTT_AB_fun(water, temp, timeBef, theta_ht, Tb, To, 
##     wb_o, k, sigma_wb)
## 
## Parameters:
##            Estimate Std. Error t value Pr(>|t|)    
## theta_ht  2.157e+03  5.372e+01   40.16   <2e-16 ***
## Tb        3.086e+00  1.359e-01   22.70   <2e-16 ***
## To        1.228e+01  3.593e-01   34.17   <2e-16 ***
## wb_o     -1.322e+00  2.069e-02  -63.90   <2e-16 ***
## k         5.243e-02  1.411e-03   37.16   <2e-16 ***
## sigma_wb  9.398e-01  1.371e-02   68.56   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0405 on 714 degrees of freedom
## 
## Number of iterations to convergence: 8 
## Achieved convergence tolerance: 8.716e-07

The model suggests a \(T_o=12.3\) \(^\circ\)C at which the median base water potential is -1.3 MPa but increase with a slope \(k=0.052\). Investigating the model shows a reasonable fit to data:

phal_mean$fits_AB<-fitted(HTT_AB)

g1<- ggplot(phal_mean) +
  geom_point(aes(x= timeBef, y=propCum*100, color=as.factor(water)))+
  geom_line(aes(x= timeBef, y=fits_AB*100, color=as.factor(water)))+
  facet_wrap(~as.factor(temp), nrow=2)
g2<- g1 +labs(x= "Time (h)" , y="Cumulative germination (%)", color="Water potential \n(MPa)") +
    theme(legend.position="top", legend.title = element_text(size=10), legend.background = element_rect(fill="transparent"))
g2

Now we fit the HTT model with a \(\psi_{b(g)}\) that varies across all temperatures and compare it with Alvarado-Bradford’s model:

HTT_MBM_fun <- function(water, Temp,tg,theta_ht, Tb, wb_Tb, k, sigma_wb) {
  wb_50<-wb_Tb+k*(Temp-Tb)
  wb_g<- water-(theta_ht/((Temp-Tb)*tg))
  cumgerm <- pnorm(wb_g, wb_50,sigma_wb)
  return(cumgerm)
}

We fit HTT_MBM_fun to germination data:

HTT_MBM<-nls(propCum~HTT_MBM_fun(water, temp,timeBef, theta_ht, Tb, wb_Tb, k, sigma_wb), start=c(theta_ht = 2000, Tb=1, wb_Tb=-1, k=0.01, sigma_wb=0.4), data=phal_mean)
summary(HTT_MBM)
## 
## Formula: propCum ~ HTT_MBM_fun(water, temp, timeBef, theta_ht, Tb, wb_Tb, 
##     k, sigma_wb)
## 
## Parameters:
##            Estimate Std. Error t value Pr(>|t|)    
## theta_ht  1.960e+03  4.483e+01   43.73   <2e-16 ***
## Tb        4.148e+00  8.462e-02   49.02   <2e-16 ***
## wb_Tb    -1.757e+00  2.620e-02  -67.07   <2e-16 ***
## k         5.375e-02  1.214e-03   44.29   <2e-16 ***
## sigma_wb  9.398e-01  1.344e-02   69.92   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03973 on 715 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 4.486e-06

The AIC test shows that the latter model (Mesgaran’s) fits better:

AIC(HTT_AB,HTT_MBM)
##         df       AIC
## HTT_AB   7 -2566.004
## HTT_MBM  6 -2594.683

Further, Mesgaran’s model has one fewer parameter producing a smaller error (0.03973 vs. 0.0405), which strongly suggests that \(\psi_{b(g)}\) changes occur across all temperature but NOT only beyond \(T_o\). We plot the data to better inspect the fits:

phal_mean$fits_MBM<-fitted(HTT_MBM)

g1<- ggplot(phal_mean) +
  geom_point(aes(x= timeBef, y=propCum*100, color=as.factor(water)))+
  geom_line(aes(x= timeBef, y=fits_MBM*100, color=as.factor(water)))+
  facet_wrap(~as.factor(temp), nrow=2)
g2<- g1 +labs(x= "Time (h)" , y="Cumulative germination (%)", color="Water potential \n(MPa)") +
    theme(legend.position="top", legend.title = element_text(size=10), legend.background = element_rect(fill="transparent"))
g2

A robust test as to whether \(\psi_{b(g)}\) is changing over all \(T\) or \(T>T_o\) can be done by fitting a hydrotime model for each individual level of temperature and then exploring \(\psi_{b(g)}\) changes in response to \(T\).

HT_indvl<-nls(propCum~pnorm(water-(theta[as.factor(temp)]/timeBef),wb50[as.factor(temp)],s[as.factor(temp)]), start=list(theta=rep(100,6), wb50=rep(-1,6), s =rep(0.3,6)), data=phal_mean)
summary(HT_indvl)
## 
## Formula: propCum ~ pnorm(water - (theta[as.factor(temp)]/timeBef), wb50[as.factor(temp)], 
##     s[as.factor(temp)])
## 
## Parameters:
##         Estimate Std. Error t value Pr(>|t|)    
## theta1 486.16246   23.46029   20.72   <2e-16 ***
## theta2 253.29050   10.36309   24.44   <2e-16 ***
## theta3 175.26636    6.93234   25.28   <2e-16 ***
## theta4 143.23788    6.12458   23.39   <2e-16 ***
## theta5  88.23167    4.31374   20.45   <2e-16 ***
## theta6  73.26737    3.81691   19.20   <2e-16 ***
## wb501   -1.52589    0.06298  -24.23   <2e-16 ***
## wb502   -1.34525    0.03845  -34.99   <2e-16 ***
## wb503   -1.18736    0.02989  -39.73   <2e-16 ***
## wb504   -0.94577    0.02563  -36.91   <2e-16 ***
## wb505   -0.62505    0.01982  -31.53   <2e-16 ***
## wb506   -0.47747    0.01812  -26.35   <2e-16 ***
## s1       0.82777    0.03322   24.92   <2e-16 ***
## s2       0.98903    0.03482   28.41   <2e-16 ***
## s3       0.96121    0.03152   30.49   <2e-16 ***
## s4       0.97015    0.03166   30.64   <2e-16 ***
## s5       0.96799    0.03111   31.12   <2e-16 ***
## s6       0.87709    0.02709   32.37   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.03813 on 702 degrees of freedom
## 
## Number of iterations to convergence: 7 
## Achieved convergence tolerance: 8.389e-06

Now we plot wb50 against temp:

wb50<-coef(HT_indvl)[7:12]
theta<-coef(HT_indvl)[1:6]
temp <- unique(phal_mean$temp)

g1<- ggplot() +
  geom_line(aes(x= temp, y=wb50), linetype="dashed")+
  geom_point(aes(x= temp, y=wb50), size=2.5, color="#00998a")
g2<- g1 +labs(x= expression("Temperature " (degree*C)) , y=TeX("$\\psi_{b(g)}$ (MPa)"))
g2

It is very clear that \(\psi_{b(g)}\) changes linearly with \(T\) over both sub- and supra-optimal ranges.

In case interested in plotting changes in parameter \(\theta\) along with \(\psi_{b(g)}\) the use:

g3<- ggplot() +
  geom_line(aes(x= temp, y=theta), linetype="dashed")+
  geom_point(aes(x= temp, y=theta), size=2.5, color="#00998a")
g4<- g3 +labs(x= expression("Temperature " (degree*C)) , y=TeX("$\\theta$ (MPa hour)"))

theme_set(theme_gray())
plot_grid(g2, g4, labels = NULL)

Modeling Seedling Emergence

Almost all the concepts and mathematical models described above for seed germination can be applied to modeling seedling emergence. For seedling emergence, we often normalize the data (see Introduction) and use soil moisture and temperature to develop HTT models. However, almost all seedling emergence models that I have seen in the literature are not an actual HTT because they use a single value for base water potential (i.e. \(\psi_b\) not \(\psi_{b(g)}\)) which is not what we observed above and in many germination studies (see Bradford, 2002). Most seedling emergence HTT models use a form of constrained GDD model wherein the accumulation of heat (GDD) becomes nil when the soil moisture is below a single water potential level, \(\psi_b\), which is defined as the base water potential of the weed species. Mathematically the HTT model can be formulated as: \[ HTT=\sum_{t=1}^{n} (T_t-T_b)H \] where \(T_t\) is the mean daily or hourly temperature and \(H = \{0,1\}\) is a binary parameter whose value is determined by the soil moisture, \(\psi_s\). When soil water potential is higher than \(\psi_b\), then \(H =1\) but if the soil is dry so that \(\psi_s\leq \psi_b\), then \(H =0\) and hence there will be no heat accumulation. We use sum \(\sum\) as temperature is variable at each day, otherwise this is the same equation as with \(GDD=(T-T_b)t_g\) used for the analysis of seed germination. In most seedling emergence studies, the two physiological parameters (\(T_b\) and \(\psi_b\)) are derived from lab or controlled condition experiments rather than being estimated as part of the model fitting process. Following the calculation of GDD, a wide range of sigmoid models can be fitted to cumulative, normalized germination data. Three most widely used functions are shown below:

  • Log-logistic equation: \[ g = \frac{E_{max}}{1+{\left(\frac{HTT}{\sigma_L}\right)}^b} \]

    • \(E_{max}\) = maximum emergence,
    • \(\sigma_L\) = hydrothermal time to 50% of \(E_{max}\),
    • \(b\) = slope around inflection point.
  • Weibull equation: \[ g=E_{max}\left(1-exp\left(-k{\left(HTT-\mu_W\right)}^c\right)\right) \]

    • \(E_{max}\) = maximum emergence,
    • \(k\) = rate of increase in emergence,
    • \(\mu_W\) = hydrothermal time to initiation of emergence (lag phase),
    • \(c\) = shape parameter.
  • Weibull has other forms too: \[ g=E_{max}\left(1-exp\left(-{\left(\frac{HTT-\mu_W}{\sigma_W}\right)}^c\right)\right) \]

    • \(\mu_W\) = hydrothermal time to initiation of emergence (lag phase),
    • \(\sigma_W\) = the sum \(\mu_W + \sigma_W\) equals the HTT to \(\approx63%\) of \(E_{max}\).
  • Gompertz equation: \[ g=E_{max}\left(exp\left(-exp\left(-\left(\frac{HTT-\mu_G}{\sigma_G}\right)\right)\right)\right) \]

    • \(\mu_G\) = hydrothermal time to \(\approx36%\) of \(E_{max}\),
    • \(\sigma_G\) = scale parameter.

Here we use an example data (for barnyardgrass, Echinochloa crus-galli) to practice the calculation of HTT followed by fitting one of the above equations.

emerg<- read.csv('Emergence.csv')
head(emerg,n=6)
##   Site   Date DAP SoilTemp SoilWater Emergence
## 1    A 5/2/04   1      6.0       -10         0
## 2    A 5/3/04   2     10.4       -10         0
## 3    A 5/4/04   3      6.8       -10         0
## 4    A 5/5/04   4     14.6       -10         0
## 5    A 5/6/04   5      6.5       -10         0
## 6    A 5/7/04   6     10.4       -10         0
g1<- ggplot(emerg) +
  geom_line(aes(x= DAP, y=Emergence, color=Site))
g2<- g1 +labs(x= "Time (day)" , y="Emergence (%)")
g2

The data are from two different sites with varying level of infestation. We therefore need to normalize the emergence data (relative to maximum of each site) for each site:

emerg$norm_emerg = 100*emerg$Emergence / ave(emerg$Emergence, emerg$Site, FUN=max)

now if we plot them, the final emergence at both sites is always 100%:

g1<- ggplot(emerg) +
  geom_line(aes(x= DAP, y=norm_emerg, color=Site))
g2<- g1 +labs(x= "Time (day)" , y="Normalized emergence (%)")
g2

Now we write a function to calculate the HTT accumulation:

HTT_sum<-function(SoilTemp, SoilWater, Tb, H)
  {
  HTT<-(SoilTemp-Tb)*(SoilWater>H & SoilTemp>Tb)
  cumHTT<- cumsum(HTT)
return(cumHTT)
  }

And, a function for Weibull equations

Weibull<- function(cumHTT, mu_w, sigma_w, c)
  {
CumGerm<- 100*(1-exp(-((cumHTT-mu_w)/sigma_w)^c))*(cumHTT>mu_w)
return(CumGerm)
  }

and one for log-logistic equation:

Loglogistic<- function(cumHTT, mu_w, c){
CumGerm<- (100/(1+(cumHTT/mu_w)^c))*(cumHTT>0)
return(CumGerm)
}

We need to calculate the HTT for each site separately before we can fit any of the above models to emergence data. We use lapply function to apply our HTT_sum function to each level of Site variable:

Tb <- 10
H <- -0.1
emerg$cumHTT<- unlist(lapply(split(emerg,emerg$Site),function(x) HTT_sum(x[,4], x[,5], Tb, H)))

Note that in the HTT_sum function we used \(T_b = 10\) \(^\circ\)C and \(\psi_b=-0.1\) MPa which represent the base temperature and base water potential, respectively, for barnyardgrass. Now we have a new column, named as cumHTT in our dataset which will be used in nls function for fitting the log-logistic model to the emergence data:

HTT_logistic<-nls(norm_emerg~Loglogistic(cumHTT, mu_w, c),start=c(mu_w = 100,  c = -3), data=emerg)
summary(HTT_logistic)
## 
## Formula: norm_emerg ~ Loglogistic(cumHTT, mu_w, c)
## 
## Parameters:
##      Estimate Std. Error t value Pr(>|t|)    
## mu_w 79.70520    0.66662   119.6   <2e-16 ***
## c    -3.30639    0.07835   -42.2   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.685 on 192 degrees of freedom
## 
## Number of iterations to convergence: 5 
## Achieved convergence tolerance: 9.144e-07

The amount of HTT needed to 50% emergence is about 80 \(^\circ\)C MPa day. Here we plot the data:

emerg$logisticfit<-fitted(HTT_logistic)

g1<- ggplot(emerg) +
  geom_point(aes(x= cumHTT, y=norm_emerg, color=Site))+
  geom_line(aes(x= cumHTT, y=logisticfit),color="#5F6B61")
g2<- g1 +labs(x= TeX("HTT (degree MPa day)") , y="Normalized emergence (%)")
g2

And that’s how the model looks like if emergence is plotted against time (day since planting) rather than the cumulative HTT:

emerg$logisticfit<-fitted(HTT_logistic)

g1<- ggplot(emerg) +
  geom_point(aes(x= DAP, y=norm_emerg, color=Site))+
  geom_line(aes(x= DAP, y=logisticfit, color=Site))
g2<- g1 +labs(x= TeX("Time (day since planting") , y="Normalized emergence (%)")
g2

Although the two sites have different emergence pattern (emergence at site A is late), the HTT model could describe these patterns very well. We can fine tune the model by changing the assumed values for base temperature and base water potential and/or by adopting a different type of sigmoid model e.g. a Weibull. We may even combine two models (e.g. biweibull as in Leno et al., 2015) when the emergence is bimodal. I will include more advanced topics in this document in the near future: so stay tuned!

Refrences

Alvarado, V., & Bradford, K. J. (2002). A hydrothermal time model explains the cardinal temperatures for seed germination. Plant, Cell & Environment, 25(8), 1061-1069.

Bradford, K. J. (1995). Water relations in seed germination. Seed development and germination, 1(13), 351-396.

Bradford, K. J. (2002). Applications of hydrothermal time to quantifying and modeling seed germination and dormancy. Weed Science, 50(2), 248-260.

Colbach, N., Chauvel, B., Gauvrit, C., & Munier-Jolain, N. M. (2007). Construction and evaluation of ALOMYSYS modelling the effects of cropping systems on the blackgrass life-cycle: from seedling to seed production. Ecological Modelling, 201(3-4), 283-300.

Gummerson, R. J. (1986). The effect of constant temperatures and osmotic potentials on the germination of sugar beet. Journal of Experimental Botany, 37(6), 729-741.

Leblanc, M. L., Cloutier, D. C., Stewart, K. A., & Hamel, C. (2004). Calibration and validation of a common lambsquarters (Chenopodium album) seedling emergence model. Weed science, 52(1), 61-66.

Leon, R. G., Izquierdo, J., & González-Andújar, J. L. (2015). Characterization and modeling of itchgrass (Rottboellia cochinchinensis) biphasic seedling emergence patterns in the tropics. Weed Science, 63(3), 623-630.

Mesgaran, M. B., Onofri, A., Mashhadi, H. R., & Cousens, R. D. (2017). Water availability shifts the optimal temperatures for seed germination: A modelling approach. Ecological Modelling, 351, 87-95.

Vleeshouwers, L. M., & Kropff, M. J. (2000). Modelling field emergence patterns in arable weeds. New Phytologist, 148(3), 445-457.

Washitani, I. (1987). A convenient screening test system and a model for thermal germination responses of wild plant seeds: behaviour of model and real seeds in the system. Plant, Cell & Environment, 10(7), 587-598.