Remember to set your working directory before you start. You also need to load today’s packages.
library(arm)
library(car)
library(effects)
library(GGally)
library(ggeffects)
library(ggplot2)
library(ggthemes)
library(lme4)
library(MuMIn)
library(performance)
library(predictmeans)
glm()
We will briefly cover two common forms of generalized linear model: logistic regression and Poisson regression. This tutorial covers key points from Ch. 13 of Quinn & Keough (2024).
The dataset we will use for logistic regression was published by Blas et al. (2007). It contains two variables recorded in 34 European white storks (Ciconia ciconia):
- cort: plasma corticosterone (ng/ml) measured in nestling storks after experimentally induced stress
- surv: whether the bird died (0) or survived (1) within the next five years of the study
In this analysis, cort is the predictor variable and surv is the response variable. Because we have a continuous predictor and a binary response variable, a logistic regression is a reasonable choice. Note: because surv is a categorical variable, we may be tempted to convert it to a factor. However, for this logistic regression, leave it as numeric (0 and 1).
First, we will read the data, summarize it, and plot it.
d <- read.csv("Data/storks.csv")
summary(d)
## cort surv
## Min. :26.00 Min. :0.0000
## 1st Qu.:34.90 1st Qu.:0.0000
## Median :40.75 Median :0.0000
## Mean :40.72 Mean :0.3824
## 3rd Qu.:47.40 3rd Qu.:1.0000
## Max. :61.10 Max. :1.0000
Next, let’s plot the data. To remind ourselves that surv
is the response variable, we will use coord_flip() to
create horizontal boxplots.
ggplot(data = d, aes(x = factor(surv, labels=c("No", "Yes")), y = cort)) +
geom_boxplot(outliers = FALSE) +
geom_jitter(width = 0.1, height = 0, alpha = 0.5) +
labs(x = "Survival", y = "Corticosterone (ng/ml)") +
coord_flip() +
theme_few()
It appears that storks with higher corticosterone levels may be more likely to die.
Next, let’s fit the model. By default, when you specify family = binomial, glm() will use a logit link.[1] Then we will plot the binned residuals using the binnedplot() function from the arm package.
stork.glm <- glm(surv ~ cort, family = binomial, data = d, na.action = "na.fail")
x <- predict(stork.glm)
y <- resid(stork.glm)
binnedplot(x, y, main = NULL)
Ideally, ~95% of the binned residuals would be within the grey lines, but because we have a smallish sample (only 34 observations), the binned residual plot is difficult to interpret. We will assume it is okay.
Next we will perform model selection using the AICc.
dredge(stork.glm)
## Fixed term is "(Intercept)"
## Global model call: glm(formula = surv ~ cort, family = binomial, data = d, na.action = "na.fail")
## ---
## Model selection table
## (Intrc) cort df logLik AICc delta weight
## 2 2.7030 -0.0798 2 -20.698 45.8 0.00 0.687
## 1 -0.4796 1 -22.617 47.4 1.58 0.313
## Models ranked by AICc(x)
The top-ranked model contains cort as a term. (Although
some people might argue that cort should be excluded
because the intercept-only model has \(\Delta\)AICc<2.) We
will keep it in our model so we can continue with the tutorial…
Next, show the coefficient table.
S(stork.glm, header = FALSE)
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.70304 1.74725 1.547 0.1219
## cort -0.07980 0.04368 -1.827 0.0677 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 45.234 on 33 degrees of freedom
## Residual deviance: 41.396 on 32 degrees of freedom
##
## logLik df AIC BIC
## -20.70 2 45.40 48.45
##
## Number of Fisher Scoring iterations: 4
##
## Exponentiated Coefficients and Confidence Bounds
## Estimate 2.5 % 97.5 %
## (Intercept) 14.9250213 0.5721667 623.023934
## cort 0.9232967 0.8397103 1.000028
Overdispersion does not appear to be a problem: the residual deviance is 41.4 on 32 df, where a 1:1 ratio is expected. To more formally check for overdispersion, we can use the Pearson dispersion statistic. The easiest way to get this value is to refit the model using a quasi-binomial distribution, which estimates the dispersion parameter (rather than forcing it to be equal to 1 as in a binomial distribution). Values greater than 1.5 suggest potentially worrisome overdispersion.
stork.qglm <- glm(surv ~ cort, family = quasibinomial, d)
S(stork.qglm)$dispersion
## [1] 1.025242
Overdispersion is not a problem.[2]
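Equivalently, we could compute the Pearson dispersion statistic directly from the original binomial fit; a minimal sketch (the value should match the quasi-binomial estimate above):
# Pearson dispersion = sum of squared Pearson residuals / residual degrees of freedom
sum(residuals(stork.glm, type = "pearson")^2) / df.residual(stork.glm)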
Now we will calculate Tjur’s D, a pseudo-\(R^2\) value used as a measure of goodness
of fit for a logistic regression. It is equal to the difference in the
average predicted probability for the two groups. The
r2_tjur() function comes from the performance
package.
r2_tjur(stork.glm)
## Tjur's R2
## 0.1022956
On average, the predicted probability of survival among storks that actually survived is 10.2 percentage points higher than the predicted probability of survival among storks that died. This suggests that our model has low to moderate predictive ability.
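To see where this number comes from, we can compute Tjur’s D by hand from the fitted probabilities; a quick sketch:
# Difference in mean fitted probability between survivors and non-survivors
p.hat <- fitted(stork.glm)
mean(p.hat[d$surv == 1]) - mean(p.hat[d$surv == 0])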
The model suggests that for every unit increase in plasma corticosterone, the odds of survival decline multiplicatively by a factor of 0.92 (95% CI: 0.84-1.00); in other words, the odds of survival decrease by 8% (95% CI: 0%–16%).
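These numbers are just the exponentiated cort coefficient reported in the table above; to reproduce the odds ratio and the percent change yourself, a minimal sketch:
exp(coef(stork.glm)["cort"])               # odds ratio per 1 ng/ml increase in cort
100 * (exp(coef(stork.glm)["cort"]) - 1)   # percent change in the odds of survival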
Finally, we will plot the logistic regression. The easiest way to do
this is possibly with the ggpredict() function from the
ggeffects package.
pred <- ggpredict(stork.glm, terms = "cort [all]")
ggplot(pred, aes(x = x, y = predicted)) +
geom_point(data = d, aes(x = cort, y = surv), alpha = 0.5, size = 2) +
geom_line(colour = "blue") +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
labs(x = "Corticosterone (ng/ml)", y = "Probability of survival") +
theme_few()
If we wanted to use this model to make predictions, we could validate it and make predictions as in Tutorial 8.
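For example, here is a minimal prediction sketch for a hypothetical bird with a corticosterone level of 45 ng/ml (the value 45 is arbitrary and used only for illustration):
# Predicted probability of survival for a hypothetical cort value
predict(stork.glm, newdata = data.frame(cort = 45), type = "response")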
The dataset we will use for Poisson regression was published by Beckerman et al. (2017). It contains two variables recorded in 50 Soay sheep (Ovis aries):
- fitness: lifetime reproductive success, measured as the number of offspring
- body.size: a standardized measure of body mass (in kg) for each ewe
In this analysis, body.size is the predictor variable and fitness is the response variable. Because we have a continuous predictor and a response variable that is a count, a Poisson regression is a reasonable choice.
First, we will read the data, summarize it, and plot it.
d <- read.csv("Data/sheep.csv")
summary(d)
## fitness body.size
## Min. : 0.00 Min. :4.785
## 1st Qu.: 3.00 1st Qu.:6.628
## Median : 4.00 Median :7.129
## Mean : 4.52 Mean :7.100
## 3rd Qu.: 5.00 3rd Qu.:7.728
## Max. :14.00 Max. :8.595
Next, let’s plot the data. To see whether a straight line (blue) fits the data, we will compare it to a LOESS curve (in magenta).
ggplot(data = d, aes(x = body.size, y = fitness)) +
geom_point() +
geom_smooth(colour = "magenta") +
geom_smooth(method=lm) +
labs(x = "Standardized body mass (kg)", y = "Lifetime reproductive success") +
theme_few()
It appears that ewes with higher body mass have higher lifetime reproductive success, and that the relationship may not be linear.
Next, let’s fit the model and plot the residuals. Because the response variable is a count, it is reasonable to try a Poisson regression. By default, when you specify family = poisson, glm() will use a log link. When you plot the fitted model, R automatically uses deviance residuals (and standardized Pearson residuals in the residuals-vs-leverage panel) rather than raw residuals.
sheep.glm <- glm(fitness ~ body.size, family = poisson, data = d, na.action = "na.fail")
opar <- par(mfrow = c(2, 2))
plot(sheep.glm)
par(opar)
The residuals do not look too bad.
Next we will perform model selection using the AICc.
dredge(sheep.glm)
## Fixed term is "(Intercept)"
## Global model call: glm(formula = fitness ~ body.size, family = poisson, data = d,
## na.action = "na.fail")
## ---
## Model selection table
## (Int) bdy.siz df logLik AICc delta weight
## 2 -2.422 0.5409 2 -103.424 211.1 0.00 1
## 1 1.509 1 -121.944 246.0 34.87 0
## Models ranked by AICc(x)
The top-ranked model contains body.size as a term. The
model without this term has a \(\Delta\)AICc of 34.9,
so it is obviously far worse.
Next, show the coefficient table.
S(sheep.glm, header = FALSE)
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.42203 0.69432 -3.488 0.000486 ***
## body.size 0.54087 0.09316 5.806 6.41e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 85.081 on 49 degrees of freedom
## Residual deviance: 48.040 on 48 degrees of freedom
##
## logLik df AIC BIC
## -103.42 2 210.85 214.67
##
## Number of Fisher Scoring iterations: 4
##
## Exponentiated Coefficients and Confidence Bounds
## Estimate 2.5 % 97.5 %
## (Intercept) 0.08874154 0.02211508 0.3363099
## body.size 1.71750081 1.43477775 2.0672237
Overdispersion does not appear to be a problem: the residual deviance is 48.0 on 48 df, where a 1:1 ratio is expected. To more formally check for overdispersion, we can use the Pearson dispersion statistic. The easiest way to get this value is to refit the model using a quasi-Poisson distribution, which estimates the dispersion parameter (rather than forcing it to be equal to 1 as in a Poisson distribution). Values greater than 1.5 suggest potentially worrisome overdispersion.
sheep.qglm <- glm(fitness ~ body.size, family = quasipoisson, d)
S(sheep.qglm)$dispersion
## [1] 0.9026887
Overdispersion is not a problem. If anything, we appear to have slight underdispersion.[3]
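As a cross-check, the performance package (already loaded) provides check_overdispersion(), which reports the dispersion ratio along with a test for overdispersion; a minimal sketch:
# Dispersion ratio and overdispersion test for the Poisson fit
check_overdispersion(sheep.glm)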
Next, we might like to calculate something like an \(R^2\) value… Unfortunately, whereas we could use Tjur’s D as a pseudo-\(R^2\) value for a logistic regression, no easily interpretable pseudo-\(R^2\) value exists for a Poisson regression.
The model suggests that for every unit increase in standardized body mass, the expected number of offspring increases multiplicatively by a factor of 1.72 (95% CI: 1.43–2.07); in other words, it increases by 72% (95% CI: 43%–107%).
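Again, the 1.72 factor is the exponentiated body.size coefficient from the table above; to reproduce it and the percent change, a quick sketch:
exp(coef(sheep.glm)["body.size"])               # multiplicative change in expected offspring
100 * (exp(coef(sheep.glm)["body.size"]) - 1)   # percent change per unit increase in body size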
Finally, we will plot the Poisson regression. The easiest way to do
this is possibly with the ggpredict() function from the
ggeffects package.
pred <- ggpredict(sheep.glm, terms = "body.size [all]")
ggplot(pred, aes(x = x, y = predicted)) +
geom_point(data = d, aes(x = body.size, y = fitness), alpha = 0.5, size = 2) +
geom_line(colour = "blue") +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
labs(x = "Standardized body mass (kg)", y = "Lifetime reproductive success") +
theme_few()
If we wanted to use this model to make predictions, we could validate it and make predictions as in Tutorial 8.
lmer()
The owls.txt dataset (described in Zuur et al., 2009) contains information on vocal-begging behaviour of nestling barn owls when their parents brought prey back to the nest. The researchers’ primary interest was in whether chicks respond differently to their father than to their mother. The columns in the dataset are:
- Nest: the nest ID. In total, 599 observations were made on 27 different nests.
- FoodTreatment: “Deprived” nests had prey removed; “Satiated” nests had prey added. The experiment was conducted over two nights; nests that were deprived on the first night were satiated on the second, and vice versa.
- SexParent: sex of the arriving parent (female or male)
- ArrivalTime: time that the parent arrived at the nest, measured between 21.5 (= 9:30 PM) and 29.5 (= 5:30 AM)
- SiblingNegotiation: number of calls made by the owl chicks in the 30 seconds prior to parent arrival
- BroodSize: total number of chicks in the nest
- NegPerChick: the response variable, calculated as SiblingNegotiation/BroodSize
Read in the data and convert to factors where appropriate. Then inspect the output of the summary() command (which is not shown because it is too wide for this page).
d <- read.delim(file = "Data/owls.txt")
d$Nest <- factor(d$Nest)
d$FoodTreatment <- factor(d$FoodTreatment)
d$SexParent <- factor(d$SexParent)
summary(d)
To increase interpretability of our random-slope model, we will center the arrival time within each nest. To match the analysis in Zuur et al. (2009) and the paper from which they took this data, we will log-transform our response variable to reduce heteroskedasticity.
d$ArrivalTime.mean <- ave(d$ArrivalTime, d$Nest)              # mean arrival time within each nest
d$ArrivalTime.centered <- d$ArrivalTime - d$ArrivalTime.mean  # deviation from the nest mean
d$NegPerChick.log10 <- log10(d$NegPerChick + 1)               # add 1 before logging to handle zeros
Next, let’s check how the variables are related to each other with a scatterplot matrix.
# Columns 2, 3, 8, 9, 10 = FoodTreatment, SexParent, ArrivalTime.mean,
# ArrivalTime.centered, NegPerChick.log10
ggpairs(d[, c(2, 3, 8, 9, 10)]) +
theme_few()
Looks pretty good to me. At a glance, FoodTreatment appears to have an effect on chick begging, while SexParent probably does not. The effect of arrival time is unclear.
The scatterplot matrix did not show the nest effects because Nest contains too many levels. So, let’s plot a separate regression of NegPerChick.log10 on centered arrival time for each nest:
ggplot(data = d, aes(x = ArrivalTime.centered, y = NegPerChick.log10)) +
facet_wrap(~Nest) +
geom_point() +
geom_smooth(method = "lm") +
labs(x = "Arrival time (centered for each nest)",
y = expression(log[10]("Negotiations per chick"))) +
theme_few()
Does it look like there is substantial variation in slopes and intercepts among the different nests?
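If you would like a numeric answer rather than an eyeballed one, one option (a sketch, not part of the original analysis) is to fit a separate regression for each nest with lmList() from lme4 and examine the spread of the per-nest coefficients:
# Separate least-squares fit for each nest; a wide spread in the coefficients
# suggests that random intercepts and/or slopes are worth modelling
nest.fits <- lmList(NegPerChick.log10 ~ ArrivalTime.centered | Nest, data = d)
summary(coef(nest.fits))   # ranges of the per-nest intercepts and slopes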
Now we will begin our model-selection procedure.
We will start with a “beyond-optimal” model: a full model containing the maximal random-effects structure justified by the design. It also contains all fixed predictor variables and as many of their two-way and three-way interactions as possible (or that we believe might be important). There is usually no need to include interactions between four or more variables: they usually explain little and are difficult to interpret.
For us, our full model will be a “varying-intercepts-and-slopes”
model. (These are often called “random-slope” models.) This model will
allow each nest to differ in its average level of chick begging (i.e.,
the intercept varies among nests), and it allows the effect of
(centered) arrival time to vary in each nest (i.e., the slope varies
among nests). This model is specified in lmer() using
(ArrivalTime.centered | Nest).
We will also include all possible main effects, two-way interactions,
and three-way interactions between our predictor variables, using
(...)^3 notation. Notice that in the model below,
ArrivalTime.centered appears both as a fixed effect and as
a random effect. This is standard practice for random-slope models: The
fixed effect of ArrivalTime.centered estimates the average
effect across all nests. The random slope models deviations from that
average for each nest.
randomslope.lmer <- lmer(
NegPerChick.log10 ~ (SexParent + FoodTreatment + ArrivalTime.mean + ArrivalTime.centered)^3 +
(ArrivalTime.centered | Nest), data = d)
If you use plot() on an lmer object, it
will produce only a single residual plot for the residuals vs. fitted
values. The residplot() function in the
predictmeans package produces additional plots (including a
Q-Q plot for the random intercepts) that can be useful for assessing
whether the assumptions of the model are satisfied. We will colour the
data points in the “Residuals vs Fitted” plot based on the
FoodTreatment variable.
residplot(randomslope.lmer, group = "FoodTreatment")
We can also produce a Q-Q plot for the random slopes.
residplot(randomslope.lmer, group = "FoodTreatment", slope = TRUE)
We might also want to plot standardized residuals against each of our predictor variables. (Output not shown to save space.)
randomslope.resid <- residuals(randomslope.lmer, scaled = TRUE)
plot(randomslope.resid ~ SexParent, data = d)
plot(randomslope.resid ~ FoodTreatment, data = d)
plot(randomslope.resid ~ ArrivalTime.mean, data = d)
plot(randomslope.resid ~ ArrivalTime.centered, data = d)
plot(randomslope.resid ~ Nest, data = d)
In this case, our residuals are not too horrible. The straight line of residuals in the bottom row of plots arises from the lower boundary at NegPerChick = 0. For now, we will ignore any problems, but we might want to try other data transformations or modelling approaches, such as modelling the total number of calls from the nest with a (zero-inflated) Poisson distribution in a generalized linear mixed model, generalized estimating equation, generalized additive mixed model, etc.
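For instance, a very rough sketch of one such alternative (not pursued in this tutorial) is a Poisson GLMM for the total number of calls, with brood size as an offset; handling zero inflation would require another package (e.g., glmmTMB):
# Sketch only: Poisson GLMM for total calls, with log(BroodSize) as an offset
calls.glmm <- glmer(SiblingNegotiation ~ FoodTreatment + ArrivalTime.centered +
offset(log(BroodSize)) + (1 | Nest), family = poisson, data = d)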
We will first determine the optimal random-effects structure. To do
this, we will compare our “varying-intercepts-and-slopes” model to a
“varying-intercept” model. (These are often called “random-intercept”
models.) This model allows each nest to differ in its average level of
chick begging (i.e., the intercept varies among nests), but it does not
estimate the effect of (centered) arrival time for each nest. This model
is specified in lmer() using (1 | Nest).
We will also fit a fixed-effects model where the average level of
chick begging is assumed to be equal for all nests. This must be fitted
in lm(), not lmer() (which is for random- and
mixed-effects models only).
randomintercept.lmer <- lmer(
NegPerChick.log10 ~ (SexParent + FoodTreatment + ArrivalTime.mean + ArrivalTime.centered)^3 +
(1 | Nest), data = d)
fixedeffects.lm <- lm(
NegPerChick.log10 ~ (SexParent + FoodTreatment + ArrivalTime.mean + ArrivalTime.centered)^3,
data = d)
There are multiple ways to compare models that differ in their random effects; however, all must use REML (not ML). Currently, the preferred approach is to use likelihood ratio tests (LRTs) that have been corrected for “testing on the boundary” (i.e., we are testing against the possibility that variance components are equal to zero). This requires use of the following function to obtain a corrected p-value for the likelihood ratio test. The number of degrees of freedom for the test is the difference in the number of parameters estimated for each model.
REMLBoundaryTest <- function(fullerModel, reducedModel, df) {
  # Likelihood-ratio statistic computed from the REML log-likelihoods
  D <- -2 * c((logLik(reducedModel, REML = TRUE) - logLik(fullerModel, REML = TRUE)))
  # Boundary-corrected p-value: a 50:50 mixture of chi-squared distributions
  # with df and df - 1 degrees of freedom
  p.value <- (pchisq(D, df, lower.tail = FALSE) + pchisq(D, df - 1, lower.tail = FALSE)) / 2
  output <- c(D, df, p.value)
  names(output) <- c("test statistic", "df", "p-value")
  return(output)
}
Using this function, we will conduct two tests. First, we will test whether a random-intercept model is significantly better than a fixed-effects model. These models differ by one degree of freedom (for the random intercept).
REMLBoundaryTest(randomintercept.lmer, fixedeffects.lm, df = 1)
## test statistic df p-value
## 2.763626e+01 1.000000e+00 7.320434e-08
The test result suggests that the random-intercept model is significantly better than the fixed-effects model. Therefore, we will test whether the random-slope model is also a significant improvement over the random-intercept model. These models differ by two degrees of freedom (one for the random slope and one for the correlation between the random intercept and the random slope).[4]
REMLBoundaryTest(randomslope.lmer, randomintercept.lmer, df = 2)
## test statistic df p-value
## 1.495198e+01 2.000000e+00 3.384036e-04
The random-slope model is significantly better than the random-intercept model. The random-slope model appears to have the best random-effect structure among those tested.
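As a rough cross-check (a sketch, not the test used above), anova() with refit = FALSE compares the two REML fits with an uncorrected likelihood-ratio test; its p-value is conservative because it ignores the boundary problem:
anova(randomslope.lmer, randomintercept.lmer, refit = FALSE)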
Now that we have determined the optimal random effect structure, we
will next determine the optimal fixed-effects structure. To do this, we
need to use models fitted using ML, not REML (i.e.,
REML = FALSE). We will use MuMIn to dredge for
the best model, which requires that we tell it to fail if there are any
missing data in the variables used (i.e.,
na.action="na.fail").
randomslope.lmer.ml <- lmer(
NegPerChick.log10 ~ (SexParent + FoodTreatment + ArrivalTime.mean + ArrivalTime.centered)^3 +
(ArrivalTime.centered | Nest), data = d, REML = FALSE, na.action = "na.fail")
Rank models by AICc and save models with \(\Delta\)AICc < 4. You can also plot the table. (Because the model-selection table is too wide for this page, results are not shown).
(dd <- subset(dredge(randomslope.lmer.ml), delta < 4))
plot(dd)
If you run the code above, you will find that our best model contains terms for the intercept, centered arrival time, food treatment, and the interaction between centered arrival time and food treatment. Sex of the parent does not appear in the top model, although it does appear in the 2nd best model, which is almost tied with the best model (\(\Delta\)AICc = 0.11). Therefore, I would consider the importance of parental sex to be ambiguous; however, we will likely make better predictions by omitting it from the final model.
Now we can refit the top-ranked model using REML (and re-including any observations that had missing data).
bestmodel.lmer <- lmer(
NegPerChick.log10 ~ FoodTreatment * ArrivalTime.centered + (ArrivalTime.centered | Nest),
data = d)
Let’s check if our reduced model meets the assumptions, the same way we did for the full model above. (Output not shown to save space.)
residplot(bestmodel.lmer, group = "FoodTreatment")
residplot(bestmodel.lmer, slope = TRUE, group = "FoodTreatment")
bestmodel.resid <- residuals(bestmodel.lmer, scaled = TRUE)
plot(bestmodel.resid ~ FoodTreatment, data = d)
plot(bestmodel.resid ~ ArrivalTime.centered, data = d)
These plots look similar to the ones for the full model above, suggesting that simplifying the model did not negatively impact model adequacy.
The assumptions seem to be satisfied, so let’s look at the coefficient table and the variance components.
S(bestmodel.lmer, brief = TRUE)
##
## Estimates of Fixed Effects:
##                                              Estimate Std. Error        df    t value  Pr(>|t|)
## (Intercept)                                 4.115e-01  2.235e-02 2.934e+01  1.841e+01 1.153e-17
## FoodTreatmentSatiated                      -1.684e-01  1.968e-02 5.873e+02 -8.554e+00 1.035e-16
## ArrivalTime.centered                       -4.415e-02  9.468e-03 3.408e+01 -4.663e+00 4.649e-05
## FoodTreatmentSatiated:ArrivalTime.centered  2.316e-02  1.081e-02 5.509e+02  2.142e+00 3.266e-02
##
## Estimates of Random Effects (Covariance Components):
## Groups Name Std.Dev. Corr
## Nest (Intercept) 0.09099
## ArrivalTime.centered 0.03125 -0.48
## Residual 0.22499
##
## Number of obs: 599, groups: Nest, 27
##
## logLik df AIC BIC
## 2.52 8 10.96 46.13
And get the confidence intervals:
confint(bestmodel.lmer, oldNames = FALSE)
## 2.5 % 97.5 %
## sd_(Intercept)|Nest 0.057831380 0.13143076
## cor_ArrivalTime.centered.(Intercept)|Nest -0.998315638 0.15455504
## sd_ArrivalTime.centered|Nest 0.015236639 0.04851347
## sigma 0.211893295 0.23892094
## (Intercept) 0.367648012 0.45643566
## FoodTreatmentSatiated -0.207196073 -0.12942820
## ArrivalTime.centered -0.063199691 -0.02570132
## FoodTreatmentSatiated:ArrivalTime.centered 0.001712906 0.04446430
We can also obtain predicted values for the random effects (i.e., the intercept and slope for each nest). (Output not shown to save space.)
ranef(bestmodel.lmer)
And get the marginal R2 and conditional R2. Marginal R2 represents the proportion of variance in the response variable explained by the fixed effects only. Conditional R2 is the proportion of variance in the response variable explained by both the fixed effects and the random effects.
r.squaredGLMM(bestmodel.lmer)
## R2m R2c
## [1,] 0.1513617 0.3091264
Our final random-slope model explains 30.9% of the variance in chick
begging (fixed effects alone explain 15.1%) and contains
FoodTreatment, ArrivalTime.centered, and their
interaction as predictors.
Now, let’s plot the fixed effects in our top-ranked model using the
allEffects() function in the effects
package.
plot(allEffects(bestmodel.lmer),
main = NULL,
xlab = "Arrival time (centered for each nest)",
ylab = expression(log[10]("Negotiations per chick")))
Our results suggest that all else being equal, chicks beg less when they are food-satiated as opposed to food-deprived. On average, they beg less later in the night (relative to the centered arrival time for the nest). This effect of arrival time is more pronounced when chicks started as food-deprived. Therefore, the effect of arrival time likely represents chicks becoming satiated as parents provide them with more and more food during the night. There was variation among nests in the average amount of chick begging and in the change in chick begging throughout the night. This might be caused by variation in hunting success among nests.
The orthodont.csv dataset (originally distributed with the nlme package) contains information about the growth of 27 children (16 males, 11 females) from age 8 until age 14. Every two years, the distance between the pituitary and the pterygomaxillary fissure was measured for each child.
The dataset contains the following columns:
- distance: a numeric vector of distances from the pituitary to the pterygomaxillary fissure (mm), measured on x-ray images of the skull (the response variable)
- age: a numeric vector of ages of the subject (in years)
- Subject: a factor indicating the individual on which the measurement was made. Subjects are labelled F01 to F13 for the females and M01 to M16 for the males.
- Sex: a factor with levels Female and Male
Fit a linear mixed model that is appropriate for this dataset.
[1] To use a probit link, you can use family = binomial(link = probit).
[2] If we had overdispersion, we could use the quasi-binomial GLM, a beta-binomial GLM using the vglm() function in the VGAM package, or a generalized linear mixed model using the glmer() function in the lme4 package.
[3] If we had overdispersion, we could use the quasi-Poisson GLM, a negative-binomial GLM using the glm.nb() function in the MASS package, or a generalized linear mixed model using the glmer() or glmer.nb() functions in the lme4 package.
[4] To force the random intercept and random slope to be uncorrelated, you can use || instead of |. For instance, use (ArrivalTime.centered || Nest) instead of (ArrivalTime.centered | Nest).