Is an R package built on the Template Model Builder automatic differentiation engine, for fitting generalized linear mixed models and extensions.
General framework for fitting regression type models where the distribution of the response variable does not have to belong to the exponential family. It allows all the parameters of the distribution of the response variable to be modeled as linear/non-linear or smooth functions of the explanatory variables.
library(dplyr)
library(gamlss)
library(glmmTMB)
library(DHARMa)
library(ggplot2)
library(plotrix)
dat <- data.frame(y =c(rZINBI(100, mu = 10, sigma = .6, nu=0.1),rZINBI(100, mu = 5, sigma = .3, nu=.5)),sites =c(rep("a", 100), rep("b", 100)),year = rep(1:4, each = 10, times = 5),trans = rep(1:40, each = 5, times = 1), area=rNO(200,20))
head(dat)
## y sites year trans area
## 1 6 a 1 1 21.03874
## 2 3 a 1 1 20.84432
## 3 29 a 1 1 21.74034
## 4 1 a 1 1 19.00565
## 5 18 a 1 1 19.78353
## 6 2 a 1 2 19.40191
#see the data structure
sizetree(dat[2:4])
#plot the data
p <- ggplot(dat, aes(sites,y))
p + geom_jitter(aes(colour = year, size=area), width = 0.2, alpha=0.4)+stat_summary(fun.data = "mean_cl_boot",colour="purple")+
ylab("abundance")+xlab("sites")
Two distributions available the nbinom1 and the nbinom2, in which the variance increases linearly and quadratically with the mean (respectively).
nbinom1: sigma = 𝜇(1 + a), nbinom2: sigma = 𝜇(1 + 𝜇/b)
a, b = dispersion parameter
A description of the dispersion parameter for each distribution can be accessed by typing ?sigma.glmmTMB in R.
Fixed effect= site (dummy variable), random effect= year, zero-inflation= not included (zi). So we are interested - modeling the differences in the mean of the abundance between both sites, while adjusting for variations-correlation between transects (variance parameter).
In glmmTMB random effects are assumed to be Gaussian
m1 <- glmmTMB(y ~ sites + (1|trans),
zi=~0,
family=nbinom1, data=dat)
summary(m1)
## Family: nbinom1 ( log )
## Formula: y ~ sites + (1 | trans)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 1047.6 1060.8 -519.8 1039.6 196
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trans (Intercept) 0.06409 0.2532
## Number of obs: 200, groups: trans, 40
##
## Dispersion parameter for nbinom1 family (): 7.74
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.2494 0.1099 20.467 < 2e-16 ***
## sitesb -1.4311 0.1889 -7.577 3.55e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary of the model showed the std and the variance of the trans. The rest of the information is provided in a similar way to linear models and GLM in R.
# for model evaluation
m1_resi <- simulateResiduals(m1)
plot(m1_resi)
Same as model 1 but includes an offset to account varying levels of “effort” (area). Under a typical count model (Poisson or negative binomial):
log(𝜇) = 𝛽0 + 𝛽1𝑥1 + ⋯ + 𝛽𝑝𝑥𝑝
An offset variable (𝐴) is a variable whose 𝛽 coefficient is constrained to 1, allowing to model a rate (density) instead of the count:
log(𝜇) = 1 ∙ log(𝐴) + 𝛽0 + 𝛽1𝑥1 + ⋯ + 𝛽𝑝𝑥𝑝
which is equivalent to,
log(𝜇/𝐴) = 𝛽0 + 𝛽1𝑥1 + ⋯ + 𝛽𝑝𝑥𝑝
We need to specify a log function in the offset because the link of the nbinom1 is log
m2 <- glmmTMB(y ~ sites + (1|trans)+offset(log(area)),
zi=~0,
family=nbinom1, data=dat)
summary(m2)
## Family: nbinom1 ( log )
## Formula: y ~ sites + (1 | trans) + offset(log(area))
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 1048.0 1061.2 -520.0 1040.0 196
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trans (Intercept) 0.06654 0.258
## Number of obs: 200, groups: trans, 40
##
## Dispersion parameter for nbinom1 family (): 7.74
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7549 0.1105 -6.834 8.24e-12 ***
## sitesb -1.4298 0.1896 -7.542 4.64e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fixed effect =site, random effects= transect nested within year, zero-inflation= not included (zi). In this model we account for variations between years and for transects within years (see summary= trans:year).
m3 <- glmmTMB(y ~ sites + (1|year/trans)+offset(log(area)),
zi=~0,
family=nbinom1, data=dat)
summary(m3)
## Family: nbinom1 ( log )
## Formula: y ~ sites + (1 | year/trans) + offset(log(area))
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 1050.0 1066.5 -520.0 1040.0 195
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trans:year (Intercept) 6.654e-02 2.580e-01
## year (Intercept) 9.952e-10 3.155e-05
## Number of obs: 200, groups: trans:year, 40; year, 4
##
## Dispersion parameter for nbinom1 family (): 7.74
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7549 0.1105 -6.834 8.24e-12 ***
## sitesb -1.4298 0.1896 -7.542 4.64e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We include include a parameter for the zero inflation.
m4 <- glmmTMB(y ~ sites + (1|year/trans)+offset(log(area)),
zi=~1,
family=nbinom1, data=dat)
We include the variable sites as predictor for the zero inflation and for the dispersion parameters.
m5 <- glmmTMB(y ~ sites + (1|trans)+offset(log(area)),
zi=~sites,
disp=~sites,
family=nbinom1, data=dat)
summary(m5)
## Family: nbinom1 ( log )
## Formula: y ~ sites + (1 | trans) + offset(log(area))
## Zero inflation: ~sites
## Dispersion: ~sites
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 1023.8 1046.9 -504.9 1009.8 193
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trans (Intercept) 0.1478 0.3844
## Number of obs: 200, groups: trans, 40
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7338 0.1305 -5.623 1.87e-08 ***
## sitesb -0.6728 0.1727 -3.897 9.76e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.1117 0.9022 -3.449 0.000562 ***
## sitesb 3.3197 0.9250 3.589 0.000332 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8668 0.2241 8.331 <2e-16 ***
## sitesb -6.6273 45.0699 -0.147 0.883
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#a lot of families beyond exponential distributions
?gamlss.family
## starting httpd help server ... done
Two negative binomial distributions available in gamlss are the NBI and the NBII, in which the variance increases quadratically and linearly (respectively) with the mean.
Similar notation, but for random effect we include: re(random=~1|“factor”).
m1_gamlss <- gamlss(y ~ sites + re(random=~1|year),family=NBII, data=dat, trace = FALSE)
summary(m1_gamlss)
## ******************************************************************
## Family: c("NBII", "Negative Binomial type II")
##
## Call: gamlss(formula = y ~ sites + re(random = ~1 | year),
## family = NBII, data = dat, trace = FALSE)
##
## Fitting method: RS()
##
## ------------------------------------------------------------------
## Mu link function: log
## Mu Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.27486 0.09425 24.135 < 2e-16 ***
## sitesb -1.40231 0.17160 -8.172 3.55e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------------------------------
## Sigma link function: log
## Sigma Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.126 0.161 13.21 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------------------------------
## NOTE: Additive smoothing terms exist in the formulas:
## i) Std. Error for smoothers are for the linear effect only.
## ii) Std. Error for the linear terms maybe are not accurate.
## ------------------------------------------------------------------
## No. of observations in the fit: 200
## Degrees of Freedom for the fit: 2
## Residual Deg. of Freedom: 198
## at cycle: 3
##
## Global Deviance: 1041.6
## AIC: 1045.6
## SBC: 1052.197
## ******************************************************************
The summary of a gamlss model includes the coefficients for the mean (mu) and for the dispersion parameter (sigma) (just an intercept for sigma in this model). To extract the random component of a gamlss model we need the mu.coefSmo function.
m1_gamlss$mu.coefSmo
## [[1]]
## Linear mixed-effects model fit by maximum likelihood
## Data: Data
## Log-likelihood: -366.6986
## Fixed: fix.formula
## (Intercept)
## 5.993402e-10
##
## Random effects:
## Formula: ~1 | year
## (Intercept) Residual
## StdDev: 3.305458e-05 0.9999994
##
## Variance function:
## Structure: fixed weights
## Formula: ~W.var
## Number of Observations: 200
## Number of Groups: 4
Plot Residual Diagnostics for an GAMLSS Object
plot(m1_gamlss)
## ******************************************************************
## Summary of the Randomised Quantile Residuals
## mean = -0.01471342
## variance = 1.104109
## coef. of skewness = -0.4097675
## coef. of kurtosis = 2.691559
## Filliben correlation coefficient = 0.9897367
## ******************************************************************
m2_gamlss <- gamlss(y ~ sites + re(random=~1|year)+offset(log(area)),family=NBII, data=dat,trace = FALSE)
#summary(m2_gamlss)
m3_gamlss <- gamlss(y ~ sites + re(random=~1|year/trans)+offset(log(area)),family=NBII, data=dat, trace = FALSE)
#summary(m3_gamlss)
m4_gamlss <- gamlss(y ~ sites + re(random=~1|year/trans)+offset(log(area)),
sigma.formula = ~sites,
family=NBII, data=dat, trace = FALSE)
#summary(m4_gamlss)
ZINBI()
##
## GAMLSS Family: ZINBI Zero inflated negative binomial type I
## Link function for mu : log
## Link function for sigma: log
## Link function for nu : logit
m4_gamlss <- gamlss(y ~ sites + re(random=~1|year)+offset(log(area)),family=ZINBI, data=dat,trace = FALSE)
# include a predictor to the zero inflation (nu) and sigma parameters
m5_gamlss <- gamlss(y ~ sites + re(random=~1|year)+offset(log(area)),
nu.fo=~sites,
sigma.fo=~sites,
family=ZINBI, data=dat,n.cyc = 100, trace = FALSE)
summary(m5_gamlss)
## ******************************************************************
## Family: c("ZINBI", "Zero inflated negative binomial type I")
##
## Call: gamlss(formula = y ~ sites + re(random = ~1 | year) +
## offset(log(area)), sigma.formula = ~sites, nu.formula = ~sites,
## family = ZINBI, data = dat, n.cyc = 100, trace = FALSE)
##
## Fitting method: RS()
##
## ------------------------------------------------------------------
## Mu link function: log
## Mu Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.63953 0.09313 -6.867 8.54e-11 ***
## sitesb -0.74113 0.13831 -5.358 2.36e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------------------------------
## Sigma link function: log
## Sigma Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4536 0.2211 -2.051 0.0416 *
## sitesb -1.0433 0.5234 -1.993 0.0476 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------------------------------
## Nu link function: logit
## Nu Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.5504 0.5511 -4.628 6.72e-06 ***
## sitesb 2.7289 0.5899 4.626 6.78e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## ------------------------------------------------------------------
## NOTE: Additive smoothing terms exist in the formulas:
## i) Std. Error for smoothers are for the linear effect only.
## ii) Std. Error for the linear terms maybe are not accurate.
## ------------------------------------------------------------------
## No. of observations in the fit: 200
## Degrees of Freedom for the fit: 5
## Residual Deg. of Freedom: 195
## at cycle: 7
##
## Global Deviance: 1015.286
## AIC: 1025.286
## SBC: 1041.777
## ******************************************************************
#compare with model 5 from glmmTMB
summary(m5)
## Family: nbinom1 ( log )
## Formula: y ~ sites + (1 | trans) + offset(log(area))
## Zero inflation: ~sites
## Dispersion: ~sites
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 1023.8 1046.9 -504.9 1009.8 193
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trans (Intercept) 0.1478 0.3844
## Number of obs: 200, groups: trans, 40
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7338 0.1305 -5.623 1.87e-08 ***
## sitesb -0.6728 0.1727 -3.897 9.76e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Zero-inflation model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.1117 0.9022 -3.449 0.000562 ***
## sitesb 3.3197 0.9250 3.589 0.000332 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Dispersion model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8668 0.2241 8.331 <2e-16 ***
## sitesb -6.6273 45.0699 -0.147 0.883
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In cases that we have densities as a response variable we cant use discrete probability distributions (Poisson, negative binomial, etc).
One possible alternative is to use the Tweedie distribution. This is a a very flexible continuous distribution, allowing to adjust variables with high dispersion and a peak at zero.
densi<-dat$y/dat$area
m1_tweedie <- glmmTMB(densi ~ sites + (1|trans),
family=tweedie, data=dat)
summary(m1_tweedie)
## Family: tweedie ( log )
## Formula: densi ~ sites + (1 | trans)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 236.1 252.6 -113.0 226.1 195
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trans (Intercept) 2.159e-09 4.647e-05
## Number of obs: 200, groups: trans, 40
##
## Dispersion parameter for tweedie family (): 0.443
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7141 0.0860 -8.303 <2e-16 ***
## sitesb -1.4515 0.1685 -8.615 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m1_tweedie_resi <- simulateResiduals(m1_tweedie)
plot(m1_tweedie_resi)
An interesting aspect of glmmTMB its that it has the capability to simulate from a fitted model.
#simulate from the tweedie model
simo=simulate(m1_tweedie, seed=1,nsim =10)
data_sim<-cbind(dat,simo )
data_sim[1,]
## y sites year trans area sim_1 sim_2 sim_3 sim_4 sim_5
## 1 6 a 1 1 21.03874 0.2991762 0 0.6064395 0 0.1660954
## sim_6 sim_7 sim_8 sim_9 sim_10
## 1 0.7302457 0.1390962 1.084313 0 0.5485966
p <- ggplot(data_sim, aes(sites,y/area))
p + stat_summary(fun.data = "mean_cl_boot",colour="purple")+ylab("observed-density")+xlab("sites")
psim <- ggplot(data_sim, aes(sites,sim_1))
psim + stat_summary(fun.data = "mean_cl_boot",colour="purple")+ylab("sim-density")+xlab("sites")