Here are some examples of fitting a Beverton-Holt model (a saturating hyperbolic model of the basic form \( y = x/(1+x) \), also known as Michaelis-Menten, Monod, Holling type II, … ?) to data. It can be done via any nonlinear/maximum-likelihood fitting routine, but with a bit of a trick you can also do it via glm(), which allows it to be put automatically into a ggplot-based figure …
The data and question come from Holly Kindsvater: they're from Hocking MD and Reimchen TE. 2009. Salmon species, density, and watershed size predict magnitude of marine enrichment in riparian food webs. Oikos 118:1307-1318 (Table 1). \( X \) is salmon spawning biomass (kg/m); \( Y \) is the \( \delta^{15}N \) index (isotopic signature). Other than that, I don't know anything about the biological context.
load("bevholt_ex.RData")
library(bbmle)
## Loading required package: stats4
Here's (one parameterization of) the Beverton-Holt function, in which \( a \) is the asymptote (\( \lim_{x \to \infty} f(x) \)) and \( b \) is the half-maximum point (the value of \( x \) at which \( f(x) = a/2 \)). (Since this model has two parameters and three places you could put them (\( \bullet x/(\bullet + \bullet x) \)), and each parameter could be specified as \( c \) or \( 1/c \), there are at least 8 ways to parameterize the model … alternatively, one could specify the asymptote, half-maximum, initial slope, or the reciprocal of any of them …)
## Beverton-Holt function: a = asymptote, b = half-maximum point
bevholt <- function(S, a, b) {
    S * a/(b + S)
}
Preliminary view of the data along with initial guess:
par(las = 1, bty = "l")
plot(Y ~ X, data = dat)
maxY <- max(dat$Y)
curve(bevholt(x, a = maxY, b = maxY), add = TRUE)
We might already guess that this fit will be fairly poorly determined because both the asymptote and the half-maximum will be large (i.e. the fit may be pretty close to linear over the range of the data …)
If we were just approaching this as a MLE/non-linear least squares problem we might run into trouble:
nls(Y ~ bevholt(X, a, b), start = list(a = maxY, b = maxY), data = dat)
## Error: singular gradient
If we use bbmle::mle2 it might be a little more robust, but we still run into convergence failures:
fit2 <- mle2(Y ~ dnorm(mean = bevholt(X, a, b), sd = exp(logsd)), data = dat,
start = list(a = maxY, b = maxY, logsd = 0))
## Warning: convergence failure: code=1 (iteration limit 'maxit' reached)
Increasing the number of iterations (e.g. control=list(maxit=10000))
or using other optimizers doesn't really help much; we'll see
why shortly.
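For example, something like the following (just a sketch, not evaluated here; fit2B is an illustrative name):
## re-fit with a much larger iteration limit and compare the coefficients
fit2B <- update(fit2, control = list(maxit = 10000))
coef(fit2B)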
coef(fit2)
## a b logsd
## 12.1243 164.4537 0.7619
We see that, as previously hinted, both the asymptote \( a \) and the half-maximum \( b \) are estimated as being well beyond the range of the data: the best fit is very close to a straight line, which would correspond to making both the asymptote and the half-maximum infinite …
par(las = 1, bty = "l")
plot(Y ~ X, data = dat)
with(as.list(coef(fit2)), curve(bevholt(x, a = a, b = b), lwd = 2, add = TRUE))
We can see a very slight curvature, but this is very close to straight – and we can't really tell whether a straight line would be better, since the parameters would have to be infinite and the optimizer can't really handle that. One alternative would be to fit a straight line and compare likelihoods:
fit2L <- mle2(Y ~ dnorm(mean = c + d * X, sd = exp(logsd)), data = dat, start = list(c = 0,
d = 0.01, logsd = 0))
fit2L0 <- mle2(Y ~ dnorm(mean = d * X, sd = exp(logsd)), data = dat, start = list(d = 0.01,
logsd = 0))
-logLik(fit2)
## 'log Lik.' 187.7 (df=3)
-logLik(fit2L)
## 'log Lik.' 183.1 (df=3)
-logLik(fit2L0)
## 'log Lik.' 186.2 (df=2)
It looks like the linear model is actually better (its negative log-likelihood is about 4.6 units lower).
However, the main reason for the improvement is that the linear model allows the intercept to be non-zero: if we force the intercept to be zero, the linear fit is only slightly better than the Beverton-Holt (but with one fewer parameter).
We could
also have used lm(), of course, but sometimes when comparing
different fitting methods it's hard to
figure out whether the log-likelihoods are really
using the same additive constants or not.
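As a quick sanity check, something like this (a sketch; fitL_lm is an illustrative name) should show that they agree here, up to optimization error, since logLik() for lm() also uses the full Gaussian log-likelihood with the ML estimate of the residual standard deviation:
## compare lm()'s log-likelihood with the equivalent mle2 fit (fit2L)
fitL_lm <- lm(Y ~ X, data = dat)
logLik(fitL_lm)
logLik(fit2L)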
We can also reparameterize the model so that at least one of the parameters is 0, rather than infinite, in the limit of a straight line. For example, if \( y=x/(ex+f) \), then \( e \) is the reciprocal of the asymptote and \( f \) is the reciprocal of the initial slope (it will turn out that this is a useful parameterization for other reasons as well):
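To connect this with the first parameterization, note that
\[
\frac{x}{ex+f} = \frac{(1/e)\,x}{x + f/e},
\]
so the asymptote is \( 1/e \), the half-maximum is reached at \( x = f/e \), and the initial slope (as \( x \to 0 \)) is \( 1/f \).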
fit3 <- mle2(Y ~ dnorm(mean = X/(e * X + f), sd = exp(logsd)), data = dat, start = list(e = 0.1,
f = 1, logsd = 1))
-logLik(fit3)
## 'log Lik.' 186.1 (df=3)
We get a log-likelihood that is very slightly better than the linear model through the origin, but the reciprocal-asymptote parameter is actually negative:
coef(fit3)
## e f logsd
## -0.02049 22.49721 0.74536
This may or may not make any biological sense (it implies that the function goes through a singularity at \( x=-f/e \approx 1098 \)).
We can constrain \( e \) to be \( \geq 0 \):
fit4 <- update(fit3, lower = c(e = 0, f = 0, logsd = -Inf), method = "L-BFGS-B")
Here's the comparison; it's worth keeping in mind that none of these curves is likely to be “significantly” different from the others in the everyday, non-statistical sense:
par(las = 1, bty = "l")
plot(Y ~ X, data = dat)
with(as.list(coef(fit2)), curve(bevholt(x, a = a, b = b), add = TRUE))
with(as.list(coef(fit2L)), curve(c + d * x, col = 2, add = TRUE))
with(as.list(coef(fit2L0)), curve(d * x, col = 4, add = TRUE))
with(as.list(coef(fit3)), curve(x/(e * x + f), col = 5, add = TRUE))
with(as.list(coef(fit4)), curve(x/(e * x + f), col = 6, add = TRUE))
(the red line is the linear fit that is allowed to have a non-zero intercept).
AICtab(fit2, fit2L, fit2L0, fit3, fit4)
## dAIC df
## fit2L 0.0 3
## fit2L0 4.1 2
## fit4 6.0 3
## fit3 6.0 3
## fit2 9.2 3
But what if we want confidence intervals? Getting confidence intervals on a nonlinear fit is a bit of a nuisance (we can use the delta method, which may under some circumstances be an unreliable approximation, or bootstrapping or MCMC, which may be nuisances to implement).
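For concreteness, here's a rough sketch of the delta-method calculation for the \( y=x/(ex+f) \) fit (fit3), with the gradient written out by hand; the variable names (xvec, e_hat, f_hat, …) are purely illustrative:
## delta-method sketch: Var(mu(x)) ~ g' V g, where g is the gradient of
## mu = x/(e*x + f) with respect to (e, f) and V is the parameter vcov matrix
xvec <- seq(min(dat$X), max(dat$X), length = 101)
e_hat <- coef(fit3)["e"]
f_hat <- coef(fit3)["f"]
V <- vcov(fit3)[c("e", "f"), c("e", "f")]
mu <- xvec/(e_hat * xvec + f_hat)
gr <- cbind(-xvec^2/(e_hat * xvec + f_hat)^2,  ## d mu/d e
            -xvec/(e_hat * xvec + f_hat)^2)    ## d mu/d f
se <- sqrt(rowSums((gr %*% V) * gr))
ci <- cbind(lwr = mu - 1.96 * se, upr = mu + 1.96 * se)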
In principle we can fit the last form of the Beverton-Holt shown above (\( y=x/(ex+f) \)) by using a Gaussian GLM with an inverse link and \( 1/x \) as the covariate, i.e. \( y \sim \text{Normal}(\mu) \) with \( 1/\mu = e+f\cdot(1/x) \):
g1 <- glm(Y ~ I(1/X), family = gaussian(link = "inverse"), data = dat)
However, this doesn't actually work:
coef(g1)
## (Intercept) I(1/X)
## 1.700 -2.041
It's OK if we start from the results of fit3:
coef(g2 <- glm(Y ~ I(1/X), family = gaussian(link = "inverse"), data = dat,
start = coef(fit3)[1:2]))
## (Intercept) I(1/X)
## -0.02112 22.56702
Or even from starting values similar to the ones we used for fit3:
coef(g2B <- glm(Y ~ I(1/X), family = gaussian(link = "inverse"), data = dat,
start = c(0.01, 1)))
## (Intercept) I(1/X)
## -0.02112 22.56697
The convenience of doing things this way is that now ggplot2 can automatically add the curve to a plot, along with a confidence ribbon:
library(ggplot2)
theme_set(theme_bw())
p1 <- ggplot(dat, aes(X, Y)) + theme_bw()
p1 + geom_point() + geom_smooth(method = "glm", family = gaussian(link = "inverse"),
formula = y ~ I(1/x), start = c(0.01, 1))
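A side note: in ggplot2 versions 2.0.0 and later, arguments meant for the model-fitting function (here family and start) have to be wrapped in method.args rather than passed directly to geom_smooth(), so the equivalent call would look something like this (untested):
p1 + geom_point() + geom_smooth(method = "glm", formula = y ~ I(1/x),
    method.args = list(family = gaussian(link = "inverse"), start = c(0.01, 1)))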
A couple of open questions: is the bad glm() fit a local maximum (we could check, e.g., via emdbook::slice())? Would glm2::glm2() do better with unspecified starting points? How about nls2 or minpack.lm? It turns out that glm2 gets the same bad answer as glm, although at least we get a warning message:
coef(g3 <- glm2::glm2(Y ~ I(1/X), family = gaussian(link = "inverse"), data = dat))
## (Intercept) I(1/X)
## 1.700 -2.041
-logLik(g3)
## 'log Lik.' 204.5 (df=3)
-logLik(g2)
## 'log Lik.' 186.1 (df=3)
What does the likelihood/SSQ surface look like?
## dnorm with sd profiled out
dnorm2 <- function(x, mean, log = FALSE) {
    ssq <- sum((x - mean)^2)
    dnorm(x, mean, sd = sqrt(ssq/length(x)), log = log)
}
## negative log-likelihood for y = x/(e*x + f), with sd profiled out via dnorm2
ff <- function(e, f, X = dat$X, Y = dat$Y) {
    mu <- e + f * (1/X)
    pred <- 1/mu
    -sum(dnorm2(Y, mean = pred, log = TRUE))
}
This agrees with the previous fit:
do.call(ff, as.list(coef(fit3)[1:2]))
## [1] 186.1
-logLik(fit3)
## 'log Lik.' 186.1 (df=3)
We can draw a likelihood surface to see what's going wrong with the GLM (or MLE) fits, picking a range that includes both the 'bad' and 'good' fits:
library(emdbook)
par(las = 1)
cc <- curve3d(ff(x, y), xlim = c(-0.5, 2), ylim = c(-5, 25), sys3d = "image",
useRaster = TRUE, col = gray((100:1)/100), xlab = "e", ylab = "f")
with(cc, contour(x, y, z, add = TRUE))
points(coef(g1)[1], coef(g1)[2], pch = 16, col = 4)
points(coef(g2)[1], coef(g2)[2], pch = 16, col = 5)
points(0, 0, col = 2, pch = 16)
points(0.1, 1, col = 6, pch = 16)
tred <- adjustcolor("red", alpha = 0.5)
curve(-x, add = TRUE, lwd = 3, col = tred)
curve(-114 * x, add = TRUE, lwd = 3, col = tred)
legend("topright", c("good fit", "bad fit", "GLM start", "MLE start"), pch = 16,
col = c(5, 4, 2, 6))
We know we're in trouble whenever \( 1/y = e+(f/x) \approx 0 \) for any value of \( x \) in the data set, which corresponds to \( f=-ex \). The range of \( x \) in the data is roughly 1 to 114: the red lines show the lines corresponding to these extreme values, which bracket exactly the region where we have trouble. If we start inside this zone it's likely that we'll end up somewhere arbitrary. By luck, the starting value for the MLE was just outside this zone …
So despite the general cleverness of GLM, it's not necessarily easier/more robust for this case. It is apparently possible to fit a GLM with inequality constraints; supposedly this is described on p. 445 of Venables and Ripley. That page isn't shown in Google Books, and I don't have a copy with me at the moment … but since this is apparently an example of fitting a GLM via direct maximization rather than IRLS, it may not be much better than what I've already shown above using bbmle::mle2.
From Holly again: \( X \)=salmon density (kg/m²), \( Y \)=number of salmon captured by brown bears:
dat2 <- setNames(read.csv("gende&quinn_1.csv", header = FALSE), c("X", "Y"))
par(las = 1, bty = "l")
plot(Y ~ X, data = dat2)
curve(x/((1/50) + 0.01 * x), add = TRUE)
Let's say this is roughly a straight line with a nearly infinite asymptote (\( e \approx 0 \)) and an initial slope of approximately 50 (\( f \approx 1/50 \)).
I tried to use method="L-BFGS-B" again to impose bounds, but
it complained, so I used bobyqa from the optimx/minqa packages:
library(optimx)
library(minqa)
d2fit1 <- mle2(Y ~ dnorm2(mean = X/(e * X + f)), data = dat2, start = list(e = 0.01,
f = 1/50), optimizer = "optimx", method = "bobyqa", lower = c(e = 0, f = 0))
Check out the likelihood surface (not really necessary; I had some initial confusion over whether \( f \) was the slope or the reciprocal slope, which had me plotting the likelihood surface to see what was going on …)
cc2 <- curve3d(ff(x, y, X = dat2$X, Y = dat2$Y), xlim = c(0, 0.2), ylim = c(0,
0.1), sys3d = "image", useRaster = TRUE, col = gray((100:1)/100), xlab = "e",
ylab = "f")
with(cc2, contour(x, y, z, add = TRUE))
points(coef(d2fit1)[1], coef(d2fit1)[2], col = 4, pch = 16)
points(0.01, 1/50, col = 2, pch = 16)
Check out the fit in base graphics:
par(las = 1, bty = "l")
plot(Y ~ X, data = dat2)
with(as.list(coef(d2fit1)), curve(x/(e * x + f), add = TRUE))
Since we now know where to tell glm() to start, we can get ggplot2 to do it:
ggplot(dat2, aes(X, Y)) + theme_bw() + geom_point() + geom_smooth(method = "glm",
formula = y ~ I(1/x), family = gaussian(link = "inverse"), start = coef(d2fit1)[1:2])
If we needed to generate the confidence intervals and plot them ourselves, we would use the approach used by glm()/ggplot(), which is: (1) compute the variances of the predicted values via \( x^\top V x \), where \( x \) is the model-matrix row of predictor variables for the desired prediction and \( V \) is the variance-covariance matrix of the parameters; (2) compute the confidence intervals of the predictions on the linear-predictor scale via a Normal approximation (i.e. \( \hat x \pm 1.96 \sigma \)); see an example of this general approach on the GLMM FAQ; (3) back-transform the confidence intervals to the raw scale; (4) use geom_ribbon() with no border line (colour=NA outside of aes()) and alpha set accordingly to add the ribbon to the plot. (This is essentially what we do by hand for the third data set below.)
Similarly difficult fitting was done by Poulsen et al.:
Poulsen, J. R., C. W. Osenberg, C. J. Clark, D. J. Levey, and B. M. Bolker. 2007. “Plants as Reef Fish: Fitting the Functional Form of Seedling Recruitment.” The American Naturalist 170 (2) (August): 167–183. doi:10.1086/518945. JSTOR.
If I were trying to implement an automatic (self-starting) algorithm that would succeed on as many noisy/small B-H data sets as possible, I would probably try some combination of more robust optimizers (e.g. via the nloptwrap package; an AD Model Builder solution would be pretty easy to implement for this simple case too) and the glm2 inverse-link fit, and use the best of those approaches.
It gets worse. Even if glm() reaches a plausible (positive-coefficient) answer, the confidence intervals can include implausible values.
dat3 <- setNames(read.csv("nagasaka.csv", header = FALSE), c("X", "Y"))
g3 <- ggplot(dat3, aes(X, Y)) + geom_point()
g3 + geom_smooth(method = "glm", formula = y ~ I(1/x), family = gaussian(link = "inverse"),
start = c(0.2, 0.01), fill = "blue", alpha = 0.2) + geom_smooth(method = "loess",
color = "red", fill = "red", alpha = 0.2) + coord_cartesian(ylim = c(-4,
10))
(g4 <- glm(Y ~ I(1/X), family = gaussian(link = "inverse"), data = dat3, start = c(0.2,
0.01)))
##
## Call: glm(formula = Y ~ I(1/X), family = gaussian(link = "inverse"),
## data = dat3, start = c(0.2, 0.01))
##
## Coefficients:
## (Intercept) I(1/X)
## 0.0907 473.8105
##
## Degrees of Freedom: 11 Total (i.e. Null); 10 Residual
## Null Deviance: 77.6
## Residual Deviance: 31.9 AIC: 51.8
pframe <- with(dat3, data.frame(X = seq(min(X), max(X), length = 101)))
pred <- predict(g4, newdata = pframe, se.fit = TRUE)
minval <- 1e-06
cc <- qnorm(0.975)
invlink <- g4$family$linkinv
pframe <- with(pred, transform(pframe, Y = invlink(fit), lwr = invlink(fit -
cc * se.fit), clwr = invlink(pmax(minval, fit - cc * se.fit)), upr = invlink(fit +
cc * se.fit)))
g3 + geom_line(data = pframe) + geom_ribbon(data = pframe, aes(ymin = lwr, ymax = upr),
colour = NA, alpha = 0.2) + geom_ribbon(data = pframe, aes(ymin = clwr,
ymax = upr), fill = "blue", colour = NA, alpha = 0.2) + coord_cartesian(ylim = c(-4,
10))
cc3 <- curve3d(ff(x, y, X = dat3$X, Y = dat3$Y), xlim = c(0, 0.8), ylim = c(100,
2500), sys3d = "image", useRaster = TRUE, n = c(71, 71), col = gray((100:1)/100),
xlab = "e", ylab = "f")
with(cc3, contour(x, y, z, add = TRUE, levels = min(cc3$z) + qchisq(c(0.9, 0.95,
0.975), 2)/2))
points(coef(g4)[1], coef(g4)[2], col = 4, pch = 16, cex = 2)
The figure shows the nominal 90%, 95% and 97.5% LRT confidence regions; as we will see, these are incorrect (they end up covering too little) because part of the area gets cut off at the zero boundaries, so we have to include more area within the feasible region.
Sample from a quasi-Bayesian posterior density (sample() automatically rescales its prob weights to sum to 1.0):
set.seed(101)
ss <- sample(seq_along(cc3$z), prob = exp(-cc3$z), size = 1000, replace = TRUE)
## Warning: Walker's alias method used: results are different from R < 2.2.0
ind2vals <- function(k) {
    ## convert a column-major linear index of cc3$z back to (e, f) values;
    ## rows of z correspond to x (= e), columns to y (= f)
    i <- (k - 1)%%nrow(cc3$z) + 1
    j <- (k - 1)%/%nrow(cc3$z) + 1
    c(cc3$x[i], cc3$y[j])
}
samp <- t(sapply(ss, ind2vals))
image(cc3, xlim = c(0, 0.8), ylim = c(100, 2500), useRaster = TRUE, col = gray((100:1)/100),
xlab = "e", ylab = "f")
with(cc3, contour(x, y, z, level = min(cc3$z) + qchisq(c(0.9, 0.95, 0.975),
2)/2, add = TRUE))
points(samp[, 1], samp[, 2], col = adjustcolor("red", alpha = 0.2), pch = 16,
cex = 1.5)
points(coef(g4)[1], coef(g4)[2], col = 4, pch = 16, cex = 2)
pframe2 <- pframe
allpred <- apply(samp, 1, function(cc) {
1/(cc[1] + cc[2]/pframe2$X)
})
pframe2[, c("lwr", "upr")] <- setNames(t(apply(allpred, 1, quantile, c(0.025,
0.975))), c("lwr", "upr"))
g3 + geom_line(data = pframe2) + geom_ribbon(data = pframe2, aes(ymin = lwr,
ymax = upr), colour = NA, alpha = 0.2) + coord_cartesian(ylim = c(-4, 10))
Alternatively we could do this by real MCMC, either with BUGS or with MCMCpack …
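For example, a minimal sketch (untested) with MCMCpack's random-walk Metropolis sampler, re-using the profiled log-likelihood ff() from above and working on the log scale so that \( e \) and \( f \) stay positive; the tuning values and the name logpost are just illustrative guesses:
library(MCMCpack)
## log-posterior (flat prior on the log-scale parameters) based on the profiled likelihood
logpost <- function(lp, X, Y) -ff(exp(lp[1]), exp(lp[2]), X = X, Y = Y)
m1 <- MCMCmetrop1R(logpost, theta.init = log(coef(g4)), X = dat3$X, Y = dat3$Y,
    mcmc = 20000, burnin = 2000, logfun = TRUE)
summary(exp(m1))  ## approximate posterior summaries back on the (e, f) scale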
dat4 <- na.omit(setNames(read.csv("Hocking_Reynolds_d15N_blueberry.csv", header = FALSE),
c("X", "Y")))
p4 <- qplot(X, Y, data = dat4) + geom_smooth(method = "glm", family = gaussian(link = "inverse"))
coef(glm(Y ~ X, data = dat4, family = gaussian(link = "inverse")))
## (Intercept) X
## 0.22934 -0.04653
d4fit1 <- mle2(Y ~ dnorm2(mean = X/(e * X + f)), data = dat4, start = list(e = 0.01,
f = 1/50), optimizer = "optimx", method = "bobyqa", lower = c(e = 1e-06,
f = 1e-06))
pframe <- with(dat4, data.frame(X = seq(min(X), max(X), length = 101)))
pframe$Y <- with(c(as.list(coef(d4fit1)), pframe), X/(e * X + f))
p4 + geom_line(data = pframe, colour = "red")
Since geom_smooth(method="glm",...) doesn't work here, we'd have to do something else to get the confidence intervals:
cc4 <- curve3d(ff(x, y, X = dat4$X, Y = dat4$Y), xlim = c(0.1, 0.3), ylim = c(1e-06,
0.03), sys3d = "image", useRaster = TRUE, n = c(71, 71), col = gray((100:1)/100),
xlab = "e", ylab = "f")
with(cc4, contour(x, y, z, add = TRUE, levels = min(cc4$z) + qchisq(c(0.9, 0.95,
0.975), 2)/2, labels = c(0.9, 0.95, 0.975)))
et cetera …