This chapter introduces the concept of modeling growth of a sample of individual fish. When there is a direct method to age a fish species, it is likely to have one or more datasets of paired sizes and ages. When ages are measured from a fishery sample, the form of the data is unsually nonlinear over time, usually measured in years.
Using basic R code, we will fit female growth data to a linear and then a non-linear model, to demonstrate the approach. We will evaluate growth of two groups (i.e., females and males) to expand on that approach, using both basic R code and an R package (‘fishmethods’).
We will begin by constrasting a linear and a nonlinear model of somatic growth. Both models are capable of demonstrating growth, defined here as increases in body size per unit time; however, growth over a lifetime will generally be non-linear, because allocations to reproductive growth causes somatic growth of immature fish to be faster than mature fish. This can be illustrated here using a well known data set of mean length at age data for Pacific hake (Kimura, 1980; Fish. Bull. 77: 765-776). Kimura reported an average size of females for each of 13 age classes as:
age = c(1,2,seq(3.3,13.3,1)) # see Table 1 at http://fishbull.noaa.gov/77-4/kimura.pdf
length=c(15.40, 28.03, 41.18, 46.20, 48.23, 50.26, 51.82, 54.27, 56.98, 58.93, 59.00, 60.91, 61.83)
Kimura_F <- data.frame(age, length)
Plotting the data (left, below, triangles) shows that growth is fastest among the youngest fish and slowest among the oldest fish.
l.fit<-lm(Kimura_F$length~Kimura_F$age); par(mfrow=c(1,2))
AIC.l.fit <- signif(AIC(l.fit), digits = 3)
plot(Kimura_F$age, Kimura_F$length, pch = 2, type="b", xlab = 'Age (years)', ylab = 'Length (cm)', xlim=range(Kimura_F$age), ylim=range(Kimura_F$length), main="Female hake")
abline(l.fit, lwd=3, lty=3); legend (0, 65, paste("AIC =", AIC.l.fit), bty = 'n')
hist(residuals(l.fit), xlab='Residuals', main='Quality check')
Although there is a strong linear correlation between age and length (r = 0.907), the predicted fit of a linear model (heavy dashed line, \(Y=a+bX\)) severly over- or underestimates size at most age classes. The value for the Akaike’s information criterion (AIC) has no context yet, but we will return to it later.
A histogram of the model residuals (right, above) shows a skewed distribution to suggest a poor fit between data and model. In sum, a simple linear model is not appropriate for this data set.
Instead, many growth modeling investigations use a nonlinear equation, particularly the von Bertalanffy growth model (VBGM). There are several forms of the VBGM but we will use a common one:
\[L_t = L_∞ (1 - e^{(-K(t-t_o)}),\]
where \(L_t\) is the length at age (t, years); \(L_∞\) is the average size at the maximum age; \(K\) is the Brody coefficient (yr\(^{-1}\)); and \(t_o\) is the age when the average size is zero.
To fit the data to the VBGM in R we need to specify the model and initialize some starting values for the parameters as part of the “nls” procedure. Assigned names and suitable inital values for each parameter are:\(L_∞\), (Linf) = the maximum size observed in the data, \(K\), (K) = a small value, say 0.2, and \(t_o\), (t0) = zero (see “list” statement, below).
VBGM <- Kimura_F$length ~ Linf * (1 - exp(-K*(Kimura_F$age-t0)))
VBGM_KF<-nls(VBGM, start = list (Linf = max(Kimura_F$length), K = 0.2, t0 = 0))
Type “summary(VBGM_KF)” in your console to print the model statement (formula), as well as the parameter estimates and their uncertainty:
##
## Formula: Kimura_F$length ~ Linf * (1 - exp(-K * (Kimura_F$age - t0)))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Linf 61.23331 1.21410 50.435 2.27e-13 ***
## K 0.29625 0.02874 10.308 1.20e-06 ***
## t0 -0.05727 0.17534 -0.327 0.751
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.697 on 10 degrees of freedom
##
## Number of iterations to convergence: 7
## Achieved convergence tolerance: 2.07e-06
Note that the standard errors of both \(L_∞\) and \(K\) are small relative to the predicted point values (the estimate). If you double the SE then you approximate a 95% confidence limit, at least at reasonably large sample sizes; in this case, a 95% confidence limit added or substracted from the point estimates does not overlap with zero. The table makes this point with the student’s t-test (P < 0. 001 for both parameters). Of course, biologically speaking, we expect to see \(L_∞\) and \(K\) as significantly greater than zero.
This is not true for \(t_o\), which cannot be said to be different than zero; however, that is fine, too, because \(t_o\) is interpreted as an embryonic age. It is generally expected that \(t_o\) should be close to zero for oviparous species with small eggs. If not, it probably indicates that sampling of young fishes was inadequate to fit this parameter without bias. When working with livebearing fishes, there is an alternative form of the VBGM that uses size at partuition, \(L_o\), instead of \(t_o\). This is not specificied here but mentioned an example of how the VBGM is quite flexible.
In terms of overall model fit, the nonlinear model follows the Female Pacific hake data more closely (below, left) than we saw previously for the linear model. The residuals are not only smaller but they are more evenly distributed, both positive and negative (below, right). At this small sample size, the residuals are not obviously different than normally distributed.
par(mfrow=c(1,2)); AIC.nl.fit <- signif(AIC(VBGM_KF), digits = 3);
plot(Kimura_F$length~Kimura_F$age, main="Female hake", xlab = 'Age (years)', ylab = 'Length (cm)')
lines(Kimura_F$age, coef(VBGM_KF)[1]*(1 - exp(-coef(VBGM_KF)[2]*(Kimura_F$age-coef(VBGM_KF)[3])))); legend (0, 65, paste("AIC =", AIC.nl.fit), bty = 'n')
hist(residuals(VBGM_KF), xlab='Residuals', main='Quality check')
This graphical impression that the non-linear VBGM fits better than the simple linear model is supported by a quick check of Akiake’s Informaiton Criterion. The AIC for the VBGM is 55.2 whereas the AIC for the linear model is much higher (i.e., \(\Delta\)AIC = 32.2).
Checking the model fit is important. Here, we have used three approaches: 1) graphic check (scatterplot of data with predicted model fit overlayed); 2) residual check (histogram plot); and 3) evaluation of model fit using an information criterion (AIC). There are others methods (e.g., qqplots, etc.). A new book by Derek Ogle on Introduction to Fisheries Analyses with R, due in late 2015, is a good place to start (https://fishr.wordpress.com/books/ifar/)
It is not unusual to have more than one group of paired size-age data, for example, samples from different sexes, years, or geographic areas. Kimura also reported an average size of male Pacific hake, so it is natural that we would want to test for sexually dimorphic growth, which occurs in many species. When sex-specific growth occurs, recognizing this will lower the unexplained variance in a model that pools the sexes together, so we want to check for this early on in analysis. As before, a simple plot of the data is an excellent start.
age = c(1,2,seq(3.3,11.3,1))
length=c(15.40, 26.93, 42.23, 44.59, 47.63, 49.67, 50.87, 52.30, 54.77, 56.43, 55.88)
Kimura_M <- data.frame(age, length)
plot(Kimura_F$age, Kimura_F$length, pch = 2, type="b", xlab = '', ylab = '',
xlim=range(Kimura_F$age), ylim=range(Kimura_F$length))
lines(Kimura_M$age, Kimura_M$length, pch=16, type="b", xlab = 'Age (years)',
ylab = 'Length (cm)', xlim=range(Kimura_F$age), ylim=range(Kimura_F$length))
legend (10, 30, c("Females", "Males"), pch = c(2, 16))
At about age 4 and for all older fish, females are larger on average than males. The question is whether it is appropriate to pool these data, and disregard sex, or whether we need to account for sex in the growth model. Expanding the VBGM model to consider sex-specific parameters is a bit tedious without an R package, but very instructure, so let’s begin there by combining the datasets and adding a dummy variable for each sex.
Kimura_B<-data.frame(rbind(cbind(Kimura_F, sexF=1, sexM=0), cbind(Kimura_M, sexF=0, sexM=1)))
attach(Kimura_B)
a<-60; b<-0.2; c<-0;
KB_Gen<-(nls(Kimura_B$length ~ sexF*LinfF*(1-exp(-KF*(Kimura_B$age-t0F))) +
sexM*LinfM*(1-exp(-KM*(Kimura_B$age-t0M))),
start = list (LinfF=a, KF=b, t0F=c, LinfM=a, KM=b, t0M=c)))
Type “summary(KB_Gen)” in your console to print the model statement (formula), as well as the complete list of parameters, a total of six.
##
## Formula: Kimura_B$length ~ sexF * LinfF * (1 - exp(-KF * (Kimura_B$age -
## t0F))) + sexM * LinfM * (1 - exp(-KM * (Kimura_B$age - t0M)))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## LinfF 61.23331 1.17098 52.293 < 2e-16 ***
## KF 0.29625 0.02772 10.687 3.18e-09 ***
## t0F -0.05727 0.16912 -0.339 0.739
## LinfM 55.97801 1.13801 49.189 < 2e-16 ***
## KM 0.38558 0.04097 9.411 2.26e-08 ***
## t0M 0.17134 0.14892 1.151 0.265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.637 on 18 degrees of freedom
##
## Number of iterations to convergence: 7
## Achieved convergence tolerance: 1.356e-06
This model is considered the most general model because it allows each of the 3 parameters to vary by sex. A complete hierarchy of of the VBGM includes 7 specifications: 3 with one parameter in common, 3 more with two parameters in common, and 1 with all 3 parameters in common. Here we will consider 4 of these 7 models to illustrate the process:
#This model has three common parameters, as done when we introducted the VBGM
KB_<-(nls(Kimura_B$length ~ sexF*Linf*(1-exp(-K*(Kimura_B$age-t0))) +
sexM*Linf*(1-exp(-K*(Kimura_B$age-t0))),
start = list(Linf=a, K=b, t0=c)))
#These are models with one common parameter: Linf, K, or t0
KB_KT<-(nls(Kimura_B$length ~ sexF*Linf*(1-exp(-KF*(Kimura_B$age-t0F))) +
sexM*Linf*(1-exp(-KM*(Kimura_B$age-t0M))),
start = list(Linf=a, KF=b, t0F=c, KM=b, t0M=c)))
KB_LT<-(nls(Kimura_B$length ~ sexF*LinfF*(1-exp(-K*(Kimura_B$age-t0F))) +
sexM*LinfM*(1-exp(-K*(Kimura_B$age-t0M))),
start = list(LinfF=a, K=b, t0F=c, LinfM=a, t0M=c)))
KB_LK<-(nls(Kimura_B$length ~ sexF*LinfF*(1-exp(-KF*(Kimura_B$age-t0))) +
sexM*LinfM*(1-exp(-KM*(Kimura_B$age-t0))),
start = list(LinfF=a, KF=b, t0=c, LinfM=a, KM=b)))
Which of the models is the best? Since these are all ‘nested’ within a hierarchy, we can evaluate each by a ratio measure. This was explored by Kimura (1980) and will be covered in the next section. An AIC approach is suitable as well and is presented here.
AIC(KB_Gen, KB_KT, KB_LT, KB_LK, KB_)
## df AIC
## KB_Gen 7 98.8562
## KB_KT 6 106.3427
## KB_LT 6 100.5881
## KB_LK 6 98.0853
## KB_ 4 104.9336
The AIC penalizes models that have more parameters, since having more parameters can be a mindless way to increase the fit between the data and a model. Thus, the degrees of freedom, which are generally the number of obvious model parameters plus an error parameter, have already been accounted for in model evaluation by AIC.
Focus on the AIC score. As a general recommendation, the model with the lowest AIC score is the best, but differences in scores < 2 are not particularly compelling. Here, the model without sex-specific parameters (KB_) was the second worse in terms of AIC score. Again, this argues against pooling the data between sexes when generate parameter estimates. The model with a common value for \(t_o\) (KB_LK) has the lowest AIC score but it is within 1 unit of the full model (KB_Gen). Technically, it is difficult to select between these two models based on the AIC score alone; however it could be argued that since the \(t_o\) values are not strongly different between sexes or between zero, the KB_LK model will be the most practical to use going forward. We will return the subject of model evaluation in the next section, too.
Writing out and comparing each individual model in R code is a hard slog. Kimura’s approach is already part of an R package called “fishmethods”" so we will continue our illustration using this package. If you do not already have this installed, then you need to install this package by typing the following into the prompt on your console: install.packages (“fishmethods”). Once you have this package installed, you will still need to call it to the working directory using the library command (see below). This package also has the Pacific hake data from Kimura (1980), so call that up using the data command (also below).
library (fishmethods)
data (Kimura)
As before, the question is whether it is appropriate to pool these data, and disregard sex, or whether we need to account for sex in the growth model. The package “fishmethods’ has a command, vblrt, to test for differences between nested VBGMs. The details of the R code can be found at http://cran.r-project.org/web/packages/fishmethods/fishmethods.pdf. Here, the default code by Gary Nelson, the package’s author, is used.
Kimura_LLR<-vblrt(len=Kimura$length, age = Kimura$age, group=Kimura$sex, error=2, select=1, plottype=2)
The sex-specific residuals are overlapping in all models. The model-specific residuals are also within a common range of mostly -3 to 3. Nothing seems particularly amiss.
Kimura_LLR$results
## tests hypothesis chisq df p
## 1 Ho vs H1 Linf1=Linf2 9.49 1 0.002
## 2 Ho vs H2 K1=K2 3.73 1 0.053
## 3 Ho vs H3 t01=t02 1.23 1 0.267
## 4 Ho vs H4 Linf1=Linf2,K1=K2,t01=t02 12.08 3 0.007
The ratio values are tested using a \(\chi^2\) stastistic to see if models that have one or more common parameters (H1-H4) are different than the general or full model (Ho) that allowed all three parameters to vary by sex. The strong ratio difference between the common model with no sex-specific parameters (H4) and the full model (Ho) can be interpreted biologically to mean there is sexually dimorphic growth. Thus, the model that holds all three parameters in common (H4) should be rejected.
In terms of evaluating each individual parameter, Kimura states: “[T]here is a significant difference (\(\alpha\) = 0.002) in \(L_∞\) between sexes, a borderline significant difference (\(\alpha\) = 0.05) in \(K\), but no significant difference in \(t_o\)·” (Kimura, 1980p. 773)
We can also evaluate different models using the residual sums of squares (RSS), which is a measure of the discrepancy between the data and each model. Small RSS values indicate a good fit of the model to the data.
data.frame(Hypothesis = c("Full", as.character(Kimura_LLR$results$hypothesis)), Model=Kimura_LLR$rss[,1],
RSS=signif(as.numeric(as.character(Kimura_LLR$rss[,2])), 3))
## Hypothesis Model RSS
## 1 Full Ho 48.2
## 2 Linf1=Linf2 H1 71.6
## 3 K1=K2 H2 56.3
## 4 t01=t02 H3 50.8
## 5 Linf1=Linf2,K1=K2,t01=t02 H4 79.8
As before, the full model (Ho), which has sex-specific estimates for each of the three parmeters, is the best fit to the data based on score alone. Interpret this table differently than the \(\chi^2\) table presented previously. The higher the RSS, the more likely that the ‘hypothesis’ of no difference will be rejected. The model that holds \(t_o\) constant (H3) is not strongly different than the full model (Ho), which suggests that if we just held \(t_o\) constant between sexes we would have a pretty decent model.
We should examine these two models more closely. The display is a bit different, where the default (i.e., Linf) is for females, and the contrast (i.e., ls) tabulates how much the male parameter differs from females.
Kimura_LLR$`model Ho` #the general model
##
## Formula: len ~ (Linf + ls * cat) * (1 - exp(-(K + ks * cat) * (age - (t0 +
## ts * cat))))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Linf 61.23328 1.17097 52.293 < 2e-16 ***
## ls -5.25527 1.63286 -3.218 0.00477 **
## K 0.29625 0.02772 10.687 3.18e-09 ***
## ks 0.08933 0.04947 1.806 0.08771 .
## t0 -0.05726 0.16912 -0.339 0.73883
## ts 0.22860 0.22534 1.014 0.32380
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.637 on 18 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 3.185e-06
Kimura_LLR$`model H3` #the model that has a single value for t0
##
## Formula: len ~ (Linf + ls * cat) * (1 - exp(-(K + ks * cat) * (age - t0)))
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Linf 60.76750 1.02367 59.362 < 2e-16 ***
## ls -4.31371 1.39327 -3.096 0.00595 **
## K 0.31322 0.02359 13.280 4.59e-11 ***
## ks 0.04765 0.02859 1.667 0.11196
## t0 0.05705 0.11290 0.505 0.61914
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.634 on 19 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 2.547e-06
Once a model is slected, then these types of printouts provide the parameters for your predictive function.
We have approached this data set from several different, complementary angles. We easily dismissed the linear model approach and highlight the VBGM. We concluded that sexually dimorphic growth was evident based on graphs and statistical inference. We have not explored alternatives to the VBGM, which is another subject entirely. Nonetheless, the VBGM is based on biological principles of growth, familar to most everyone, and a perfectly good place to start.
A non-linear model with sex-specific parameters was well justified with this dataset. Older females are about 5 cm larger, on average, than males, and this is likely be to important in fishery applications. Using this form of the VBGM, \(K\) is correlated with \(L_∞\) so it is not surprising that \(K\) is at least marginally different between sexes.
Nonetheless, choosing between the specific models, such as Ho and H3, may depend on the context of the study. Since there is no statistical or biological support that \(t_o\) varies between sexes, then the H3 model is probably adequate. In fact, if simplicity is a premium, it could be argued that \(t_o\) could become fixed at zero and the VBGM would become a 2-parameter model. On the other hand, if growth at the end of the first year is important, sex-specific values for \(t_o\) may be more predictive, at least based on this dataset. Then the question shifts to how representative this dataset is for future applications. That may depend on the sample coverage and the number of independent samples, which goes back to sample design, something we can only evaluate after the fact.