1 Introductory comments

Overdispersion, i.e., greater than binomial variation, is an issue for experimental data that examines insect mortality. I am not aware of published work that has taken account of a common pattern of change in which dispersion, defined here as the ratio of the estimated variance to binomial variance, have been highest at midrange exposures and/or mortalities. Abilities that have recently become available in the glmmTMB package for R facilitate modeling that can account for varying dispersion.

Detailed dispersion estimates inevitably depend on how they are modeled and how adjustments are made for them, adding complication and uncertainty to confidence intervals for lethal time or other such estimates. Dispersion estimates in cold storage data that I have examined have been high relative to what has been observed for experiments with methyl bromide. Large dispersion values have large implications for the design of future experiments.

1.1 Source of data

Data that are analyzed in what follows are from Dr Peter Follett (USDA, Hawaii), and are used with his permission. The data are available in the R package qra (“quantal response analysis”) that can be downloaded from Github.

There are 4 lifestages of each of two species, thus:

  • Species are: Mediterranean fruit fly (MedFly); and Melon fly (MelonFly)
  • Lifestages are: Egg, L1 (Larval 1), L2, L3
    • There is 1 replicate of L1, 3 replicates of others
  • Mortality is for varying times in coolstorage at 1.5 - 2\(^o\)C

To install the qra package, type, at the R command line and with a live internet connection:

devtools::install_github("jhmaindonald/qra")

Code that provides the data, in the form required, is:

HawCon <- as.data.frame(qra::HawCon)
HawCon <- within(HawCon, {
  trtGp <- paste0(CN,LifestageTrt, sep=":")
  trtGp <- gsub("Fly","",trtGp)
  trtGpRep <- paste0(CN,LifestageTrt,RepNumber)
  scTime <- scale(TrtTime)
})

The discussion that follows will assume a logit link. See Follett and Neven (2006) for background information on trends and issues in quarantine entomology. The reference for glmmTMB is Brooks et al. (2017). (This, however, does not discuss or mention glmmTMB’s abilities for modeling binomial-like data.) The ability to model the scale parameter as a fixed effect has come on stream very recently, and has to my knowledge not been much tested. Results given here are broadly consistent with the author’s earlier attempts, for this same dataset, to model dispersion effects.

2 Beta-binomial models

2.1 Notation

For present purposes, it is the scale parameter that is of interest. Take the location parameter \(\pi\), with observed value \(P\), to be the expected mortality. Then, in notation consistent with that used in glmmTMB: \[ \mbox{var}[P] = \frac{\pi(1-\pi)}{n}(1+(n-1)\rho), \; \mbox{where}\; \rho = (1+e^\eta)^{-1} \] Here \(\rho\) is the intra-class correlation, which measures the extent to which insects die in concert. It also has the role of scale parameter. The function glmmTMB() has provision to model variation in \(\eta\), and hence \(\rho\), as a function of one or more explanatory variables and/or factors.

The quantity \(1+(n-1)\rho\) is a close analogue of dispersion as defined for quasibinomial models, albeit now a function of \(n\). It will be termed dispersion in what follows, in line with the terminology used for quasibinomial models. The function sigma() returns (at least in version 0.2.0.9000) the exponentiated values of the coefficients of the linear predictor.

2.2 Fit initial glmmTMB models

We proceed to fit a glmmTMB model that allows for dispersion to change with time in coolstorage. Code used is:

suppressMessages(library(lme4))
suppressMessages(require(glmmTMB))
suppressMessages(library(bbmle))
abRho0.TMB <- glmmTMB(cbind(Dead,Live)~0+trtGp/TrtTime+(scTime|trtGpRep),
                      family=list(family="betabinomial",link="logit"),
                      data=HawCon)
abBin.glmer <- glmer(cbind(Dead,Live)~0+trtGp/TrtTime+(scTime|trtGpRep),
                      family=binomial,data=HawCon, nAGQ=0)
abRho2.TMB <- update(abRho0.TMB,
                    dispformula=~trtGp+splines::ns(TrtTime,2))
abRho2.TMBclog <- update(abRho2.TMB,
                      family=list(family="betabinomial",link="cloglog"))

The following extracts estimates of \(\rho\).

## Function to extract estimates of 'rho'
getRho <- function(obj){
  mm <- model.matrix(obj$modelInfo$allForm$dispformula,
                     data=obj$frame)
  fixdisp <- fixef(obj)[["disp"]]
  1/(1+exp(mm%*%fixdisp))
}

Figures 2.1A and B plot the estimates of \(\rho\), with the dispersion estimates in the right panel. Figures 2.1C and D show the corresponding plots when a complementary log-log model is fitted.

Intra-class correlation estimates 
and consequent dispersion estimates, from the beta binomial
model with logit link.
Panels A and B are for a logit link, while Panels C and D
are for a complementary log-log link.

Figure 2.1: Intra-class correlation estimates and consequent dispersion estimates, from the beta binomial model with logit link. Panels A and B are for a logit link, while Panels C and D are for a complementary log-log link.

Use of the AIC statistic to compare the two models suggests that a complementary log-log link should be preferred:

               dAIC df
abRho2.TMBclog  0.0 29
abRho2.TMB     16.3 29

2.3 LT99s and confidence intervals – model comparisons

Confidence intervals can be obtained either using Fieller’s formula (Morgan 1992), or, in principle, using a simulation approach. Where direct comparisons have been made between these approaches, they have given relatively simular results. A usable implementation of the simulation approach, which seems needed for the calculation of confidence intervals for LT99 differences, is not yet available for use with models that have been fitted using abilities in the glmmTMB package.

Figure 2.2 shows confidence intervals, as obtained using four different models. The second model is a repeat of the first, but using a complementary log-log rather than a logit link. The third and fourth models rely on assumptions that are clearly false.

  • Beta binomial (BB) model “BB-ab-Rho2”, random intercept and slope (‘ab’), varying scale (‘Rho2’)
    • A two degrees of freedom normal spline (‘Rho2’) was used to model the parameter that determines scale, and hence \(\rho\)
  • Beta binomial (BB) model “BB-ab-Rho2-clog”, random intercept and slope (‘ab’), varying scale (‘Rho2’), complementary log-log link
  • BB model “BB-ab-Rho0”, logit link, random intercept and slope (‘ab’), constant scale (‘Rho0’)
  • “Binomial-ab”, random intercept and slope (‘ab’)

Notice that the second set of LT99s are generally smaller (with a complementary log-log link) than the first set (with a logit link).

95 percent confidence intervals are compared between
alternative models for within replicate variation. 
The first set is for the beta-binomial
model of Figures 1A and B, with a logit link. The second
set is for the model of Figures 1C and D, which instead
used  a complementary log-log link. The third used a 
beta-binomial model, logit link, and assumed a constant
scale parameter.  The fourth assumed a binomial error family,
again with a logit link.

Figure 2.2: 95 percent confidence intervals are compared between alternative models for within replicate variation. The first set is for the beta-binomial model of Figures 1A and B, with a logit link. The second set is for the model of Figures 1C and D, which instead used a complementary log-log link. The third used a beta-binomial model, logit link, and assumed a constant scale parameter. The fourth assumed a binomial error family, again with a logit link.

3 Concluding remarks

Modeling dispersion effects in binomial-like data is a challenge. The abilities now available in glmmTMB allow more realistic modeling of insect mortality data, and to other data that offer similar analysis challenges. Models that do not account for dispersion effects that are a function of explanatory variables and/or factors will, with data of the type considered here, give confidence intervals for LT99s, or for other such statistics, that may be seriously in error. In another dataset that I have examined, with much larger numbers (a maximum of more than 4800 as opposed to a maximum of 120) the modeling of dispersion makes the difference between intervals that are clearly different, and intervals where there is substantial overlap.

References

Brooks, Mollie E., Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Maechler, and Benjamin M. Bolker. 2017. “glmmTMB Balances Speed and Flexibility Among Packages for Zero-Inflated Generalized Linear Mixed Modeling.” The R Journal 9 (2): 378–400. https://journal.r-project.org/archive/2017/RJ-2017-066/index.html.

Follett, Peter A., and Lisa G. Neven. 2006. “Current Trends in Quarantine Entomology.” Annual Review of Entomology 51 (1): 359–85. doi:10.1146/annurev.ento.49.061802.123314.

Morgan, B. J. T. 1992. Analysis of Quantal Response Data. Chapman & Hall.