GLMM Example

This document provides an example of how to analyze count data with generalized linear mixed models (GLMMs) in R. Specifically, this tutorial outlines how to deal with overdispersion and zero-inflation.

Data description

We will conduct a sample analysis of data that was collected at the Rocky Mountain Biological Laboratory (RMBL) in 2012. All ants and aphids were counted in aspen trees within plots at low and high elevations. Sampling occurred within three low elevation and three high elevation valleys, and within each valley there were three plots. The question that we will address is whether the abundance of ants within the aspen tree canopy varies with elevation.

Map of field sites located in three high and three low elevation valleys

How did I decide to use a GLMM?

First, I tried using a linear mixed model to evaluate whether the mean number of ants per tree differs between low vs. high elevations:

Response variable: Number of ants per tree
Fixed effects:
- Elevation (high vs. low)
- Time (because we spent different lengths of time surveying each tree)
Random effects:
- Valley nested within elevation
- Plot nested within valley

If you have equal sample sizes between groups, you could use the ‘aov()’ function. However, since I did not have equal sample sizes, I instead used the ‘lmer()’ function in the ‘lme4’ package for linear mixed models.

# define variables
Ants <- as.numeric(ant_abundance$Total_Ants)
Elevation <- as.factor(ant_abundance$Elevation)
Valley <- as.factor(ant_abundance$Valley)
Plot <- as.factor(ant_abundance$Plot)
Time <- as.numeric(as.character(ant_abundance$Search_time))

hist(Ants)

model <- lmer(Ants ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), data = ant_abundance) # You should only use the 'aov()' for one-way ANOVAs with equal sample sizes
summary(model)

## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## Ants ~ Elevation + Time + (1 | Elevation:Valley) + (1 | Valley:Plot)
##    Data: ant_abundance
## 
## REML criterion at convergence: 1579.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.8659 -0.4487 -0.2062  0.2090  6.6076 
## 
## Random effects:
##  Groups           Name        Variance Std.Dev.
##  Valley:Plot      (Intercept) 0.2467   0.4966  
##  Elevation:Valley (Intercept) 0.0000   0.0000  
##  Residual                     2.0965   1.4479  
## Number of obs: 432, groups:  Valley:Plot, 18; Elevation:Valley, 6
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  0.202821   0.213364   0.951
## ElevationLow 0.494774   0.280768   1.762
## Time         0.007673   0.002090   3.672
## 
## Correlation of Fixed Effects:
##             (Intr) ElvtnL
## ElevationLw -0.566       
## Time        -0.446 -0.096

Anova(model) # Tests for the significance of elevation

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: Ants
##             Chisq Df Pr(>Chisq)    
## Elevation  3.1054  1   0.078032 .  
## Time      13.4809  1   0.000241 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

shapiro.test(resid(model)) # Test for the assumption of normally distributed residuals

## 
##  Shapiro-Wilk normality test
## 
## data:  resid(model)
## W = 0.73921, p-value < 2.2e-16

qqnorm(resid(model)) # plot of the normal theoretical quantiles against the exponential data quantiles
qqline(resid(model)) # the residuals should fall along this line if they are normally distributed

From this model, the mean number of ants in aspen trees is marginally significantly greater at low (vs. high) elevations after accounting for the significant effect of time spent surveying each tree. However, the shapiro test and qqplot indicate that the residuals are not normally distributed. Thus, this test is not appropriate.

One option could be to transform the data, but this is generally not considered to be appropriate for count data (O’Hara & Kotze 2010). Instead, we will use a GLMM with a different distribution (i.e., Poisson, negative binomial, or quasi-Poisson) that better models the data.

GLMM analysis

Let’s visually inspect the data again:

plot(table(Ants), ylab = "Frequency", xlab = "Number of ants")

The majority of trees contained no ants – approximately 52% at low elevations and 72% at high elevations. If you don’t have random effects, you can use the ‘glm()’ function to analyze generalized linear models. However, in this case there are random effects, and we should instead use a generalized linear mixed model.

First, let’s try out a GLMM with a poisson distribution (https://en.wikipedia.org/wiki/Poisson_distribution), which we can build with the ‘glmmadmb()’ function. There are multiple packages that you could choose between for analyzing GLMMs (see Bolker et al. 2009 and http://glmm.wikidot.com/pkg-comparison for options). I used glmmADMB because it allows for multiple random effects, negative binomial distributions, and zero inflation and also uses Laplace approximation. However, ‘glmmadmb()’ tends to be slower than some other functions.

poismod <- glmmadmb(Ants ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), family = "poisson", data = ant_abundance)
summary(poismod)

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + Time + (1 | Elevation:Valley) + 
##     (1 | Valley:Plot), data = ant_abundance, family = "poisson")
## 
## AIC: 1119.2 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.31776    0.29398   -4.48  7.4e-06 ***
## ElevationLow  0.78758    0.39448    2.00    0.046 *  
## Time          0.00777    0.00134    5.80  6.7e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Elevation:Valley=6, Valley:Plot=18 
## Random effect variance(s):
## Group=Elevation:Valley
##             Variance StdDev
## (Intercept)  0.08388 0.2896
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.3522 0.5934
## 
## 
## Log-likelihood: -554.611

Similar to the linear mixed model, this model indicates that ants are more abundant at low elevations after accounting for search time. But how do we know whether a poisson distribution is appropriate for these data?

For a Poisson distribution, the variance is assumed to be equal to the mean. If the variance is greater than the mean, the data are considered to be “overdispersed,” and this should be accounted for in the model (see Ver Hoef & Boveng 2007). The data could be overdispersed due to high values, which increase the variance, or due to an excess number of zeros, which lower the mean.

To formally test for overdispersion, we can use the following function, which tests the null hypothesis of no overdispersion:

overdisp_fun <- function(model) {
  vpars <- function(m) {
    nrow(m)*(nrow(m)+1)/2
  } 
  model.df <- sum(sapply(VarCorr(model), vpars)) + length(fixef(model))
  rdf <- nrow(model.frame(model))-model.df
  rp <- residuals(model, type = "pearson") # computes pearson residuals
  Pearson.chisq <- sum(rp^2)
  prat <- Pearson.chisq/rdf
  pval <- pchisq(Pearson.chisq, df = rdf, lower.tail = FALSE)
  c(chisq = Pearson.chisq, ratio = prat, rdf = rdf, p = pval)
} 
overdisp_fun(poismod) # If the p-value is < 0.05, the data are overdispersed.

##         chisq         ratio           rdf             p 
##  1.434166e+03  3.358703e+00  4.270000e+02 4.971049e-109

The data are clearly overdispersed.

Negative binomial and quasi-Poisson models deal with overdispersion by including an “overdispersion parameter” to adjust the variance in relation to the mean. These models differ in how the variance relates to the mean. Alternatively, we can use zero-inflated models to account for overdispersion (by assuming that there are excess zeroes in the data caused by two separate processes).

As far as I can tell, it is standard to first account for overdispersion using negative binomial or quasi-Poisson distributions. If the data are still overdispersed using these two distributions, then we can also try accounting for zero-inflation in the model (set zeroInflation = T) or using a hurdle model (more details below).

nbinommod1 <- glmmadmb(Ants ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), family = "nbinom", data = ant_abundance) # the "nbinom" family is a negative binomial distribution
summary(nbinommod1)

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + Time + (1 | Elevation:Valley) + 
##     (1 | Valley:Plot), data = ant_abundance, family = "nbinom")
## 
## AIC: 999.2 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.16565    0.27037   -4.31  1.6e-05 ***
## ElevationLow  0.65443    0.34192    1.91  0.05562 .  
## Time          0.00834    0.00252    3.31  0.00094 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Elevation:Valley=6, Valley:Plot=18 
## Random effect variance(s):
## Group=Elevation:Valley
##             Variance StdDev
## (Intercept)  0.03751 0.1937
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.2477 0.4977
## 
## Negative binomial dispersion parameter: 0.64315 (std. err.: 0.11623)
## 
## Log-likelihood: -493.594

quasipoismod1 <- glmmadmb(Ants ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), family = "nbinom1", data = ant_abundance) # the "nbinom1" family is a quasi-poisson distribution
summary(quasipoismod1)

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + Time + (1 | Elevation:Valley) + 
##     (1 | Valley:Plot), data = ant_abundance, family = "nbinom1")
## 
## AIC: 984.1 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.15893    0.24955   -4.64  3.4e-06 ***
## ElevationLow  0.66643    0.30822    2.16    0.031 *  
## Time          0.00778    0.00177    4.40  1.1e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Elevation:Valley=6, Valley:Plot=18 
## Random effect variance(s):
## Group=Elevation:Valley
##             Variance  StdDev
## (Intercept)   0.0072 0.08485
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.2653  0.515
## 
## Negative binomial dispersion parameter: 2.4313 (std. err.: 0.27529)
## 
## Log-likelihood: -486.058

We can similarly test whether these models are overdispersed:

overdisp_fun(nbinommod1)

##        chisq        ratio          rdf            p 
## 5.191453e+02 1.215797e+00 4.270000e+02 1.471483e-03

overdisp_fun(quasipoismod1)

##        chisq        ratio          rdf            p 
## 9.471930e+02 2.218251e+00 4.270000e+02 1.828486e-41

Since both models are overdispersed, we could additionally make these models zero-inflated. However, when I do this, the models do not converge. To get the models to converge, I removed the random effect of valley from the model. I believe that the problem was that the models were over-fitted and that valley and plot were colinear.

nbinommod2 <- glmmadmb(Ants ~ Elevation + Time + (1|Valley:Plot), family = "nbinom", zeroInflation = T, data = ant_abundance)
summary(nbinommod2)

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + Time + (1 | Valley:Plot), 
##     data = ant_abundance, family = "nbinom", zeroInflation = T)
## 
## AIC: 999.3 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.16926    0.25392   -4.60  4.1e-06 ***
## ElevationLow  0.64831    0.31570    2.05    4e-02 *  
## Time          0.00848    0.00250    3.39    7e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Valley:Plot=18 
## Random effect variance(s):
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.2833 0.5323
## 
## Negative binomial dispersion parameter: 0.64276 (std. err.: 0.11615)
## Zero-inflation: 1.0002e-06  (std. err.:  1.659e-05 )
## 
## Log-likelihood: -493.659

quasipoismod2 <- glmmadmb(Ants ~ Elevation + Time + (1|Valley:Plot), family = "nbinom1", zeroInflation = T, data = ant_abundance)
summary(quasipoismod2)

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + Time + (1 | Valley:Plot), 
##     data = ant_abundance, family = "nbinom1", zeroInflation = T)
## 
## AIC: 983.1 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.09914    0.26758   -4.11  4.0e-05 ***
## ElevationLow  0.68485    0.31167    2.20    0.028 *  
## Time          0.00844    0.00197    4.28  1.9e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Valley:Plot=18 
## Random effect variance(s):
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.2878 0.5365
## 
## Negative binomial dispersion parameter: 2.2387 (std. err.: 0.29958)
## Zero-inflation: 0.10757  (std. err.:  0.095931 )
## 
## Log-likelihood: -485.526

overdisp_fun(nbinommod2)

##       chisq       ratio         rdf           p 
## 516.6948379   1.2072309 428.0000000   0.0020739

overdisp_fun(quasipoismod2)

##        chisq        ratio          rdf            p 
## 8.333882e+02 1.947169e+00 4.280000e+02 2.281506e-28

Now the models appear to be less over-dispersed. How do we decide whether the model with the negative binomial or quasi-Poisson distribution is best? One option could be to compare the AIC values of the models, although Ver Hoef & Boveng (2007) question this approach. QAIC may be more appropriate than AIC, although it is unclear whether either are very appropriate.

AICtab(nbinommod1, quasipoismod1, poismod) # I decided to compare the three models that contained the same random effects, so I didn't compare the AIC values of the zero-inflated models here.

##               dAIC  df
## quasipoismod1   0.0 6 
## nbinommod1     15.1 6 
## poismod       135.1 5

As expected, the poisson model fits poorly, whereas the quasi-Poisson model is the best fit. Another way to test which distribution is best (advocated for by Ver Hoef & Boveng 2007) is to plot the relationship between the variance and the group means.

For Poisson models, the variance equals the mean.
For quasi-Poisson models, the variance is directly proportional to the mean.
For negative binomial models, the variance is quadratic to the mean.

Since there were only two groups (high vs. low elevation), I instead plotted the the plot variances against the plot means.

grpVarL <- with(ant_abundance, tapply(log(Ants+1), list(Plot), var))
grpMeans <- with(ant_abundance, tapply(Ants, list(Plot), mean))
grpVars <- with(ant_abundance, tapply(Ants, list(Plot), var))
lm1 <- lm(grpVars~grpMeans-1)
n1s1 <- nls(grpVars~grpMeans*(1+grpMeans/k), start = list(k = 1))
plot(grpVars~grpMeans, xlab = "group means", ylab = "group variances") # plots lines for Poisson, NB, and quasi-poisson-- if slopes are much greater than Poisson, should try NB or QP instead
abline(c(0,1), lty = 2)
text(2, 2 ,"Poisson")
abline(lm1, col = 2)
text(0.5, 2, substitute(paste("QP: ", sigma^2 == x*mu), list(x = round(coef(lm1), 1))), col = 2)
curve(x*(1+x/coef(n1s1)), col = 4, add = T)
text(1.5, 3, substitute(paste("NB: ", k==x), list(x = round(coef(n1s1), 1))), col = 4)

Similar to our conclusions based on AIC values, the quasi-Poisson distribution appears to best fit the data on the graph. However, it is unclear to me whether it would be better to account for zero-inflation at the expense of removing one of the random effects from the model due to failure of the model to converge or to not account for zero-inflation. Either way, the results are almost exactly the same.

Hurdle model:

Instead of making the quasi-Poisson model zero-inflated, I could also account for overdispersion by using a hurdle model. There are some key conceptual differences between zero-inflated and hurdle models:

A zero-inflated model models the distribution of ants as a mixture of both Bernoulli probabilities (0 or 1) and a Poisson distribution. By creating the model this way, we can allow zeros to come from two different sources – there can be both structural and sampling zeros. Example: first you must decide whether you want to go to the store (0 or 1), and then you must decide how many things to buy at the store, although you could still decide not to buy anything.

In contrast, a hurdle model splits the analysis into one binary analysis (0 or 1) and one analysis of all non-zero count data. This analysis allows for only structural zeros. Example: first you must decide whether you want to go to the store (0 or 1), and then you decide how many things to buy at the store (>0), but you always buy something.

One of these analyses may be more appropriate than the other, depending on the questions you’re asking and what you know about your data. I decided to use a hurdle analysis in this situaiton.

First, I’ll create the binomial model (like a logistic regression) to determine whether the presence/absence of ants in a tree depends on elevation:

antpres <- mutate(ant_abundance, 
                  Ants_present = ifelse(ant_abundance$Total_Ants == "0", "0", "1")) # creates a variable for whether or not ants were present

Ants_pres <- as.factor(antpres$Ants_present)
Elevation <- as.factor(antpres$Elevation)
Plot <- as.factor(antpres$Plot)
Valley <- as.factor(antpres$Valley)
Tree_ID <- as.factor(antpres$Tree)
Time <- as.numeric(as.character(antpres$Search_time))

aggregate(Tree_ID ~ Ants_pres + Elevation, antpres, length) # number of trees where ants were present and absent at high and low elevations

##   Ants_pres Elevation Tree_ID
## 1         0      High     192
## 2         1      High      69
## 3         0       Low      88
## 4         1       Low      83

ant_hurdle <- glmmadmb(formula = Ants_pres ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), data = antpres, family = "binomial")
summary(ant_hurdle)

## 
## Call:
## glmmadmb(formula = Ants_pres ~ Elevation + Time + (1 | Elevation:Valley) + 
##     (1 | Valley:Plot), data = antpres, family = "binomial")
## 
## AIC: 520.5 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.62728    0.31606   -5.15  2.6e-07 ***
## ElevationLow  0.81250    0.38768    2.10  0.03610 *  
## Time          0.01123    0.00328    3.42  0.00062 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Elevation:Valley=6, Valley:Plot=18 
## Random effect variance(s):
## Group=Elevation:Valley
##              Variance   StdDev
## (Intercept) 1.025e-05 0.003202
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.4225   0.65
## 
## 
## Log-likelihood: -255.228

A significantly greater proportion of trees contained ants at low vs. high elevations after accounting for a significant effect of search time:

Now I’ll create the non-zero count model to determine whether the number of ants in a tree depends on elevation:

For this model we can again use Poisson, negative binomial, or quasi-Poisson distributions. However, they must be truncated-at-zero so that they do not allow for any zero values.

antspertree <- ant_abundance %>%
  filter(Total_aphids < 1 & Other_aphid_species != "Yes" & Total_Ants > 0)

Ants <- as.numeric(as.character(antspertree$Total_Ants))
Elevation <- as.factor(antspertree$Elevation)
Plot <- as.factor(antspertree$Plot)
Valley <- as.factor(antspertree$Valley)
Tree_num <- as.factor(antspertree$Tree)
Time <- as.numeric(as.character(antspertree$Search_time))

truncpoiss <- glmmadmb(formula = Ants ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), data = antspertree, family = "truncpoiss")
truncquasipois <- glmmadmb(formula = Ants ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), data = antspertree, family = "truncnbinom1")
truncnbinom <- glmmadmb(formula = Ants ~ Elevation + Time + (1|Valley:Plot), data = antspertree, family = "truncnbinom") # This model didn't work, so I removed valley.
AICtab(truncpoiss, truncquasipois, truncnbinom)

##                dAIC df
## truncquasipois  0.0 6 
## truncnbinom    40.8 5 
## truncpoiss     71.3 5

summary(truncquasipois)

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + Time + (1 | Elevation:Valley) + 
##     (1 | Valley:Plot), data = antspertree, family = "truncnbinom1")
## 
## AIC: 436.6 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept)   0.18744    0.21084    0.89    0.374  
## ElevationLow  0.15707    0.25455    0.62    0.537  
## Time          0.00407    0.00160    2.54    0.011 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=152, Elevation:Valley=6, Valley:Plot=17 
## Random effect variance(s):
## Group=Elevation:Valley
##              Variance   StdDev
## (Intercept) 2.108e-05 0.004592
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.1717 0.4144
## 
## Negative binomial dispersion parameter: 1.001 (std. err.: 0.00016971)
## 
## Log-likelihood: -212.287

Again, the truncated quasi-Poisson model is the best fitting, as was true for the mixture model above. The number of ants per tree does not depend on elevation after accounting for a significant effect of search time.

Interpretation

In summary, if you are working with count data and model residuals are not normally distributed, you should use a GL(M)M with a different distribution. It is important to choose the correct distribution, based on whether the data are overdispered and/or zero-inflated. In this example I conclude that ants are more abundant at low elevations. More specifically, a greater proportion of trees contained ants at low elevations, although the number of ants per tree did not depend on elevation. However, all models yielded qualitatively similar results, which is reassuring that these conclusions are correct.

Additional considerations

Choice of test statistics

Hypothesis testing with GLMMs is a somewhat contentious issue, and it is not necessarily clear to me which test statistic is best to use. Figure 1 in Bolker et al. (2009) provides some basic instructions for choosing test statistics. Here are some examples of how to conduct different hypothesis tests in R:

summary(ant_hurdle) # Wald Z test, only appropriate without overdispersion

## 
## Call:
## glmmadmb(formula = Ants_pres ~ Elevation + Time + (1 | Elevation:Valley) + 
##     (1 | Valley:Plot), data = antpres, family = "binomial")
## 
## AIC: 520.5 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.62728    0.31606   -5.15  2.6e-07 ***
## ElevationLow  0.81250    0.38768    2.10  0.03610 *  
## Time          0.01123    0.00328    3.42  0.00062 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Elevation:Valley=6, Valley:Plot=18 
## Random effect variance(s):
## Group=Elevation:Valley
##              Variance   StdDev
## (Intercept) 1.025e-05 0.003202
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.4225   0.65
## 
## 
## Log-likelihood: -255.228

Anova(ant_hurdle, type = 3) # Wald chi-square, only appropriate without overdispersion

## Analysis of Deviance Table (Type III tests)
## 
## Response: Ants_pres
##             Df   Chisq Pr(>Chisq)    
## (Intercept)  1 26.5085  2.624e-07 ***
## Elevation    1  4.3924  0.0361002 *  
## Time         1 11.7189  0.0006187 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Anova(ant_hurdle, type = 3, test.statistic = "F") # F test

## Analysis of Deviance Table (Type III tests)
## 
## Response: Ants_pres
##              Df       F    Pr(>F)    
## (Intercept)   1 26.5085 4.013e-07 ***
## Elevation     1  4.3924 0.0366884 *  
## Time          1 11.7189 0.0006784 ***
## Residuals   427                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These tests provide results that are all qualitatively very similar.

Offsets?

Offsets allow us to estimate rates or densities rather than counts. Offsets regress the response variable against the log of the factor you’re using for the offset. Offsets can be factors such as time or area. If I expect that the number of ants observed in a tree is directly proportional to the amount of time spent searching the tree, I could make time an offset.

Ants <- as.numeric(ant_abundance$Total_Ants)
Elevation <- as.factor(ant_abundance$Elevation)
Valley <- as.factor(ant_abundance$Valley)
Plot <- as.factor(ant_abundance$Plot)
Time <- as.numeric(as.character(ant_abundance$Search_time))

quasipoismod1 <- glmmadmb(Ants ~ Elevation + offset(log(Time)) + (1|Elevation:Valley) + (1|Valley:Plot), family = "nbinom1", data = ant_abundance)
summary(quasipoismod1)

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + offset(log(Time)) + (1 | 
##     Elevation:Valley) + (1 | Valley:Plot), data = ant_abundance, 
##     family = "nbinom1")
## 
## AIC: 982.9 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -4.587      0.230  -19.98   <2e-16 ***
## ElevationLow    0.542      0.309    1.75    0.079 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Elevation:Valley=6, Valley:Plot=18 
## Random effect variance(s):
## Group=Elevation:Valley
##              Variance   StdDev
## (Intercept) 1.053e-05 0.003244
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)    0.292 0.5404
## 
## Negative binomial dispersion parameter: 2.3625 (std. err.: 0.26023)
## 
## Log-likelihood: -486.468

quasipoismod2 <- glmmadmb(Ants ~ Elevation + Time + (1|Elevation:Valley) + (1|Valley:Plot), family = "nbinom1", data = ant_abundance)
summary(quasipoismod2) # Whether or not we include time as an offset affects the significance of elevation.

## 
## Call:
## glmmadmb(formula = Ants ~ Elevation + Time + (1 | Elevation:Valley) + 
##     (1 | Valley:Plot), data = ant_abundance, family = "nbinom1")
## 
## AIC: 984.1 
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.15893    0.24955   -4.64  3.4e-06 ***
## ElevationLow  0.66643    0.30822    2.16    0.031 *  
## Time          0.00778    0.00177    4.40  1.1e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Number of observations: total=432, Elevation:Valley=6, Valley:Plot=18 
## Random effect variance(s):
## Group=Elevation:Valley
##             Variance  StdDev
## (Intercept)   0.0072 0.08485
## Group=Valley:Plot
##             Variance StdDev
## (Intercept)   0.2653  0.515
## 
## Negative binomial dispersion parameter: 2.4313 (std. err.: 0.27529)
## 
## Log-likelihood: -486.058

AICtab(quasipoismod1, quasipoismod2) # The model with the offset appears to be better fitting.

##               dAIC df
## quasipoismod1 0.0  5 
## quasipoismod2 1.2  6