DEM 7473 Week 4: Hierarchical Models for Non-Normal Responses - Generalized Linear Mixed Models (GLMMs)

In this example, I will use the ECLS-K 2011 data. In this example, I will illustrate how to fit Generalized Linear Mixed models to outcomes that are not continuous. I will illustrate two different methods of estimation, Penalized Quasi Likelihood using the glmmPQL() function in the MASS library and the Laplace approximation using the glmer() function in the lme4 library.

Both methods will be used to analyze a binary and a count data outcome. As has been discussed by lots of statisticians, PQL is known to give downwardly biased (too small) estimates for the variance components in both the Binomial and Poisson models, so we’ll need to be on the lookout for that in our example.

A third method of estimation called Adaptive Gaussian Quadrature is also available, but it not really implemented in R. Stata and SAS both allow this method. In general AGQ is more accurate, but it is much slower, as the number of random effects increases.

This Rpubs document provides an excellent section on the technical matters involved in fitting the models in R.

Data and recodes

First we load our data

library(haven)
library (car)
library(lme4)
library(sjstats)
library(lmerTest)
library(MuMIn)
library(MASS)
library(arm)

eclsk11<-read_sas("~/Google Drive/classes/dem7473/data/childk4p.sas7bdat")
names(eclsk11)<-tolower(names(eclsk11))

myvars<-c("x_chsex_r", "x1locale", "x_raceth_r", "x2povty", "x12par1ed_i","p1curmar", "x1htotal", "x1mscalk4","p1hscale","x2fsstat2", "x12sesl", "p2parct1", "p2parct2", "s1_id", "p2safepl","x2krceth","p1o2near", "x_distpov" , "w1c0", "w1p0", "w2p0", "w1c0str", "w1p0str", "w1c0psu", "w1p0psu", "x1height", "x1kage_r")
#subset the data
eclsk.sub<-eclsk11[,myvars]
rm(eclsk11); gc()

##           used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells 2170907 116.0    4532980  242.1   4532980  242.1
## Vcells 4160199  31.8  650069916 4959.7 784073503 5982.1

Next, I do some recoding of variables using a mixture of the ifelse() function and the Recode () function.

#recode our outcomes, the  first is the child's math standardized test score  in Kindergarten
eclsk.sub$math<-ifelse(eclsk.sub$x1mscalk4<0, NA, eclsk.sub$x1mscalk4)

#Second outcome is child's height for age, continuous outcome
eclsk.sub$height<-ifelse(eclsk.sub$x1height<=-7, NA, eclsk.sub$x1height)
eclsk.sub$age_yrs<-ifelse(eclsk.sub$x1kage_r<0, NA, round(eclsk.sub$x1kage_r/12, 0))
eclsk.sub<- eclsk.sub[is.na(eclsk.sub$age_yrs)==F, ]
table(eclsk.sub$age_yrs)

## 
##    4    5    6    7    8 
##   13 6292 9195  272    3

eclsk.sub$height_z<-ave(eclsk.sub$height, as.factor(eclsk.sub$age_yrs), FUN=scale)

#Household food insecurity, dichotomous outcome
eclsk.sub$foodinsec<-Recode(eclsk.sub$x2fsstat2, recodes="2:3=1; 1=0; else=NA")

#Child health assessment Excellent to poor , ordinal outcome
eclsk.sub$chhealth<-ifelse(eclsk.sub$p1hscale<0, NA, eclsk.sub$p1hscale)

#Number of ear infections after age 2 , count outcome
eclsk.sub$earinfect<-ifelse(eclsk.sub$p1o2near<0, NA, eclsk.sub$p1o2near)

#First we recode some Child characteristics
#Child's sex: recode as male =1
eclsk.sub$male<-Recode(eclsk.sub$x_chsex_r, recodes="1=1; 2=0; -9=NA")

#Recode race with white, non Hispanic as reference using dummy vars
eclsk.sub$hisp<-recode (eclsk.sub$x_raceth_r, recodes="3:4=1;-9=NA; else=0")
eclsk.sub$black<-recode (eclsk.sub$x_raceth_r, recodes="2=1;-9=NA; else=0")
eclsk.sub$asian<-recode (eclsk.sub$x_raceth_r, recodes="5=1;-9=NA; else=0")
eclsk.sub$nahn<-recode (eclsk.sub$x_raceth_r, recodes="6:7=1;-9=NA; else=0")
eclsk.sub$other<-recode (eclsk.sub$x_raceth_r, recodes="8=1;-9=NA; else=0")


#Then we recode some parent/mother characteristics
#Mother's education, recode as 2 dummys with HS = reference
eclsk.sub$lths<-Recode(eclsk.sub$x12par1ed_i, recodes = "0:2=1; 3:8=0; else = NA")
eclsk.sub$gths<-Recode(eclsk.sub$x12par1ed_i, recodes = "1:3=0; 4:8=1; else =NA") 

#marital status, recode as 2 dummys, ref= married
eclsk.sub$single<-Recode(eclsk.sub$p1curmar, recodes="4=1; -7:-9=NA; else=0")
eclsk.sub$notmar<-Recode(eclsk.sub$p1curmar, recodes="2:3=1; -7:-9=NA; else=0")


#Then we do some household level variables
#SES
eclsk.sub$hhses<-ifelse(eclsk.sub$x12sesl==-9, NA, scale(eclsk.sub$x12sesl))

#Urban school location = 1
eclsk.sub$urban<-Recode(eclsk.sub$x1locale, recodes = "1:3=1; 4=0; -1:-9=NA")

#poverty level in poverty = 1
eclsk.sub$pov<-Recode(eclsk.sub$x2povty , recodes ="1:2=1; 3=0; -9=NA")

#Household size
eclsk.sub$hhsize<-eclsk.sub$x1htotal

#school % minority student body
eclsk.sub$minorsch<-ifelse(eclsk.sub$x2krceth <0, NA, eclsk.sub$x2krceth/10)

#Unsafe neighborhood
eclsk.sub$unsafe<-Recode(eclsk.sub$p2safepl , recodes = "1:2='unsafe'; 3='safe'; else=NA", as.factor=T)

#school district poverty
eclsk.sub$dist_pov<-ifelse(eclsk.sub$x_distpov==-9, NA, scale(eclsk.sub$x_distpov))

Modeling

Model 1 - Binary response

The First model will fit a model that considers a binary response, if a child was living in a household considered to have low food security.

A research question corresponding to this could be:

“Do children living in households in poverty have higher risk of food insecurity than children living in households that are not in poverty, net of other demographic and socioeconomic factors?”

So the hypothesis of interest here involves us testing whether the effect of household poverty shows a $\beta$ coefficient different from zero.

If we were to write the symbolic form of this model it would be the logistic regression model: \[ln \frac {p_{ij}}{1-p{ij}} = \beta_{0j} + \sum {\beta_k x_{ik}} \] \[\beta_{0j} = \beta_0 + u_j\] \[u_j \sim N(0, \sigma^2_u)\]

hist(eclsk.sub$foodinsec)

#First using PQL
fit1.pql<-glmmPQL(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~1|s1_id, data=eclsk.sub, family = binomial)

## iteration 1

## iteration 2

## iteration 3

## iteration 4

summary(fit1.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | s1_id
##         (Intercept)  Residual
## StdDev:   0.4807976 0.9150427
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: foodinsec ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error    DF   t-value p-value
## (Intercept) -2.6038312 0.08119770 10486 -32.06780  0.0000
## male         0.1055669 0.05615575 10486   1.87989  0.0602
## hisp         0.2507215 0.07684178 10486   3.26283  0.0011
## black        0.0496467 0.09566373 10486   0.51897  0.6038
## asian       -0.2495778 0.15266365 10486  -1.63482  0.1021
## nahn         0.4659297 0.22648964 10486   2.05718  0.0397
## other        0.4049495 0.12708278 10486   3.18650  0.0014
## lths        -0.0423022 0.08513081 10486  -0.49691  0.6193
## gths         0.3496666 0.07784762 10486   4.49168  0.0000
## hhses       -1.3206545 0.05973797 10486 -22.10746  0.0000
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.375                                                        
## hisp  -0.357  0.012                                                 
## black -0.231 -0.010  0.373                                          
## asian -0.196  0.021  0.219  0.152                                   
## nahn  -0.115  0.009  0.142  0.108  0.065                            
## other -0.186  0.004  0.232  0.197  0.112  0.086                     
## lths  -0.231 -0.012 -0.160  0.044 -0.063  0.009  0.003              
## gths  -0.701  0.006  0.046 -0.053  0.022  0.001 -0.020  0.267       
## hhses  0.381 -0.011  0.175  0.225 -0.020  0.042  0.048  0.261 -0.486
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.9675577 -0.4396852 -0.2877323 -0.1671991 14.4104143 
## 
## Number of Observations: 11351
## Number of Groups: 856

#Using Laplace approximation
fit1.l<-glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = binomial, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit1.l, detail=T)

## glmer(formula = foodinsec ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)  -2.64     0.09  -29.75    0.00  
## male          0.10     0.06    1.71    0.09  
## hisp          0.28     0.08    3.36    0.00  
## black         0.05     0.10    0.53    0.59  
## asian        -0.26     0.16   -1.56    0.12  
## nahn          0.51     0.25    2.07    0.04  
## other         0.42     0.14    3.06    0.00  
## lths         -0.04     0.09   -0.39    0.70  
## gths          0.35     0.08    4.13    0.00  
## hhses        -1.34     0.06  -20.63    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.39    
##  Residual             1.00    
## ---
## number of obs: 11351, groups: s1_id, 856
## AIC = 7437.3, DIC = 6868.6
## deviance = 7142.0

#Using AGQ with 10  quadrature points
fit1.a<-glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = binomial, nAGQ = 10, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit1.a, detail=T)

## glmer(formula = foodinsec ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)), nAGQ = 10)
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)  -2.64     0.09  -29.67    0.00  
## male          0.10     0.06    1.71    0.09  
## hisp          0.28     0.08    3.36    0.00  
## black         0.05     0.10    0.53    0.59  
## asian        -0.26     0.16   -1.55    0.12  
## nahn          0.51     0.25    2.06    0.04  
## other         0.42     0.14    3.06    0.00  
## lths         -0.04     0.09   -0.39    0.69  
## gths          0.35     0.09    4.12    0.00  
## hhses        -1.34     0.07  -20.59    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.38    
##  Residual             1.00    
## ---
## number of obs: 11351, groups: s1_id, 856
## AIC = 7437.6, DIC = 6871.9
## deviance = 7143.7

The between school variances for the three methods are:

VarCorr(fit1.pql)[1,2]

## [1] "0.4807976"

VarCorr(fit1.l)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.38529

VarCorr(fit1.a)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.38366

In this case, the PQL model had a higher estimate of the between school variance. How do the regression effects differ by estimation method?

round(data.frame(PQL=fit1.pql$coefficients$fixed, Laplace=fixef(fit1.l), AGQ=fixef(fit1.a)), 3)

We see very close agreement between the three methods here, with some downward bias in the models estimated by PQL.

Model 2 - Poisson outcome

If we were to write the symbolic form of this model it would be the Poisson regression model: \[ln \left( y_{ij} \right) = ln \left( n_{ij} \right) + \beta_{0j} + \sum {\beta_k x_{ik}} \] \[\beta_{0j} = \beta_0 + u_j\] \[u_j \sim N(0, \sigma^2_u)\] and the optional offset term $ln \left( n_{ij} \right)$ to account for different population sizes or risk set sizes.

hist(eclsk.sub$earinfect)

#First using PQL
fit2.pql<-glmmPQL(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~1|s1_id, data=eclsk.sub, family =poisson)

## iteration 1

## iteration 2

## iteration 3

summary(fit2.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | s1_id
##         (Intercept) Residual
## StdDev:   0.1002443 1.369685
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: earinfect ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error   DF    t-value p-value
## (Intercept)  0.5037680 0.03421994 7078  14.721477  0.0000
## male         0.0986195 0.02448773 7078   4.027301  0.0001
## hisp        -0.2234249 0.03562337 7078  -6.271863  0.0000
## black       -0.5815350 0.05053346 7078 -11.507919  0.0000
## asian       -0.2070363 0.06471245 7078  -3.199327  0.0014
## nahn        -0.0275606 0.10252444 7078  -0.268820  0.7881
## other       -0.1382460 0.05641935 7078  -2.450329  0.0143
## lths        -0.1440775 0.05335497 7078  -2.700358  0.0069
## gths         0.0558431 0.03677558 7078   1.518484  0.1289
## hhses       -0.0131460 0.02023670 7078  -0.649611  0.5160
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.388                                                        
## hisp  -0.249 -0.001                                                 
## black -0.153 -0.002  0.182                                          
## asian -0.128  0.035  0.080  0.051                                   
## nahn  -0.089  0.014  0.088  0.059  0.028                            
## other -0.114 -0.002  0.126  0.090  0.059  0.045                     
## lths  -0.333 -0.018 -0.157  0.009 -0.029  0.000 -0.004              
## gths  -0.777 -0.001  0.041 -0.017  0.037  0.007 -0.013  0.296       
## hhses  0.233 -0.004  0.160  0.161 -0.114  0.059  0.038  0.154 -0.501
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.0553348 -0.8139818 -0.3079529  0.3594691  4.4828822 
## 
## Number of Observations: 7938
## Number of Groups: 851

#Using Laplace approximation
fit2.l<-glmer(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = poisson,control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit2.l, detail=T)

## glmer(formula = earinfect ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = poisson, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)   0.45     0.03   16.44    0.00  
## male          0.10     0.02    5.68    0.00  
## hisp         -0.18     0.03   -6.38    0.00  
## black        -0.56     0.04  -13.95    0.00  
## asian        -0.19     0.05   -3.87    0.00  
## nahn         -0.02     0.08   -0.22    0.82  
## other        -0.12     0.04   -2.89    0.00  
## lths         -0.15     0.04   -3.68    0.00  
## gths          0.05     0.03    1.88    0.06  
## hhses        -0.02     0.02   -1.23    0.22  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.26    
##  Residual             1.00    
## ---
## number of obs: 7938, groups: s1_id, 851
## AIC = 28728.8, DIC = -1154.1
## deviance = 13776.4

#Using AGQ with 10  quadrature points
fit2.a<-glmer(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = poisson, nAGQ = 10, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit2.a, detail=T)

## glmer(formula = earinfect ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = poisson, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)), nAGQ = 10)
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)   0.45     0.03   16.37    0.00  
## male          0.10     0.02    5.66    0.00  
## hisp         -0.18     0.03   -6.36    0.00  
## black        -0.56     0.04  -13.90    0.00  
## asian        -0.19     0.05   -3.85    0.00  
## nahn         -0.02     0.08   -0.22    0.82  
## other        -0.12     0.04   -2.87    0.00  
## lths         -0.15     0.04   -3.67    0.00  
## gths          0.05     0.03    1.88    0.06  
## hhses        -0.02     0.02   -1.22    0.22  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.26    
##  Residual             1.00    
## ---
## number of obs: 7938, groups: s1_id, 851
## AIC = 14749.9, DIC = 12823
## deviance = 13775.5

The between school variances for the three methods are:

VarCorr(fit2.pql)[1,2]

## [1] "0.1002443"

VarCorr(fit2.l)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.25797

VarCorr(fit2.a)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.25827

Here, we see a much smaller between school variance for the PQL fit method. How do the regression effects differ by estimation method?

round(data.frame(PQL=fit2.pql$coefficients$fixed, Laplace=fixef(fit2.l), AGQ=fixef(fit2.a)), 3)

The PQL fit has slightly different estimates of the fixed effects, but the substantive interpretation of all the parameters is consistent between the three fits.

Analytical note::

NOTE GLMM’s can take much longer to estimate than LMM’s, don’t be surprised if your model takes a long time to run. Also don’t be surprised if you see warning messages like:

Warning message: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = somenumber (tol = 0.001)

This is a warning message that the computer did not reach a satisfactory numerical solution for your model. This just means we may have to tweak your model a little to get R to converge. This Rpubs document provides an excellent section on the technical matters involved in fitting the models in R.

You may have to refit the model, meaning restart the optimization process where it stopped. This can be done using the refit() function in lme4. OR, and this is typically what I do, change the optimizing routine and the number of model iterations. A typical call would look like:

glmer(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = poisson,control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))

Where I change the defaulty optimizer and increase the iteration number, so the model will run longer if it needs it, and sometimes it will. You can change maxfun to be pretty large, now it’s 20,000, but you can make it larger if you’re having convergence issues.

Model 3 - Random slopes for both types of model

Here we fit models that consider both random intercepts and random slopes for the discrete outcomes. I only present the estimates from the PQL estimation and the Laplace approximation.

#Binomial
#First using PQL
fit3.pql<-glmmPQL(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~hhses|s1_id, data=eclsk.sub, family = binomial)

## iteration 1

## iteration 2

## iteration 3

## iteration 4

## iteration 5

## iteration 6

## iteration 7

## iteration 8

summary(fit3.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~hhses | s1_id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.6256147 (Intr)
## hhses       0.6291712 0.454 
## Residual    0.8351838       
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: foodinsec ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error    DF   t-value p-value
## (Intercept) -2.6274216 0.07780833 10486 -33.76787  0.0000
## male         0.1064495 0.05229296 10486   2.03564  0.0418
## hisp         0.2117791 0.07353615 10486   2.87993  0.0040
## black        0.0369259 0.09149280 10486   0.40359  0.6865
## asian       -0.2617305 0.14514686 10486  -1.80321  0.0714
## nahn         0.4011204 0.21256402 10486   1.88706  0.0592
## other        0.3932301 0.11870994 10486   3.31253  0.0009
## lths        -0.0649186 0.08070030 10486  -0.80444  0.4212
## gths         0.3658647 0.07219017 10486   5.06807  0.0000
## hhses       -1.3801997 0.06155565 10486 -22.42198  0.0000
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.365                                                        
## hisp  -0.352  0.014                                                 
## black -0.230 -0.010  0.361                                          
## asian -0.193  0.019  0.218  0.152                                   
## nahn  -0.114  0.010  0.142  0.105  0.067                            
## other -0.183  0.004  0.228  0.199  0.114  0.091                     
## lths  -0.211 -0.013 -0.144  0.042 -0.062  0.006  0.003              
## gths  -0.682  0.006  0.045 -0.053  0.019  0.001 -0.020  0.263       
## hhses  0.398 -0.009  0.159  0.196 -0.017  0.037  0.044  0.241 -0.447
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.0352333 -0.4773136 -0.3068905 -0.1623277  6.5093510 
## 
## Number of Observations: 11351
## Number of Groups: 856

#Using Laplace approximation
fit3.l<-glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(hhses|s1_id), data=eclsk.sub, family = binomial, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
#fit3l<-refit(fit3.l)
display(fit3.l, detail=T)

## glmer(formula = foodinsec ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (hhses | s1_id), data = eclsk.sub, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)  -2.64     0.09  -29.59    0.00  
## male          0.11     0.06    1.72    0.08  
## hisp          0.28     0.08    3.35    0.00  
## black         0.06     0.10    0.60    0.55  
## asian        -0.25     0.16   -1.53    0.13  
## nahn          0.51     0.25    2.09    0.04  
## other         0.43     0.14    3.09    0.00  
## lths         -0.04     0.09   -0.42    0.68  
## gths          0.35     0.09    4.15    0.00  
## hhses        -1.33     0.07  -20.18    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev. Corr  
##  s1_id    (Intercept) 0.34           
##           hhses       0.08     -1.00 
##  Residual             1.00           
## ---
## number of obs: 11351, groups: s1_id, 856
## AIC = 7440.7, DIC = 6889
## deviance = 7151.9

#this model may be overfit because of the -1 correlation

#Poisson
#First using PQL
fit4.pql<-glmmPQL(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~hhses|s1_id, data=eclsk.sub, family =poisson)

## iteration 1

## iteration 2

## iteration 3

## iteration 4

summary(fit4.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~hhses | s1_id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 0.08015014 (Intr)
## hhses       0.13210727 -0.084
## Residual    1.35888997       
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: earinfect ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error   DF    t-value p-value
## (Intercept)  0.5013610 0.03404547 7078  14.726216  0.0000
## male         0.0999822 0.02435778 7078   4.104733  0.0000
## hisp        -0.2223294 0.03541014 7078  -6.278691  0.0000
## black       -0.5804736 0.05026043 7078 -11.549316  0.0000
## asian       -0.2107166 0.06486472 7078  -3.248555  0.0012
## nahn        -0.0349371 0.10178071 7078  -0.343259  0.7314
## other       -0.1376583 0.05612187 7078  -2.452847  0.0142
## lths        -0.1476911 0.05332075 7078  -2.769862  0.0056
## gths         0.0579752 0.03658670 7078   1.584597  0.1131
## hhses       -0.0154868 0.02091239 7078  -0.740557  0.4590
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.388                                                        
## hisp  -0.246 -0.001                                                 
## black -0.150 -0.002  0.180                                          
## asian -0.125  0.035  0.079  0.050                                   
## nahn  -0.089  0.014  0.088  0.059  0.028                            
## other -0.114 -0.001  0.126  0.089  0.059  0.045                     
## lths  -0.328 -0.018 -0.152  0.009 -0.028  0.000 -0.003              
## gths  -0.780 -0.002  0.040 -0.017  0.035  0.008 -0.014  0.292       
## hhses  0.235 -0.002  0.157  0.157 -0.110  0.057  0.038  0.156 -0.490
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.1423532 -0.8109696 -0.3107447  0.3628364  4.5561754 
## 
## Number of Observations: 7938
## Number of Groups: 851

#Using Laplace approximation
fit4.l<-glmer(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses+(hhses|s1_id), data=eclsk.sub, family = poisson, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit4.l, detail=T )

## glmer(formula = earinfect ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (hhses | s1_id), data = eclsk.sub, 
##     family = poisson, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)   0.43     0.03   15.19    0.00  
## male          0.11     0.02    5.88    0.00  
## hisp         -0.18     0.03   -6.02    0.00  
## black        -0.56     0.04  -13.62    0.00  
## asian        -0.21     0.05   -4.05    0.00  
## nahn         -0.04     0.08   -0.55    0.58  
## other        -0.12     0.04   -2.73    0.01  
## lths         -0.19     0.04   -4.58    0.00  
## gths          0.07     0.03    2.54    0.01  
## hhses        -0.03     0.02   -1.51    0.13  
## 
## Error terms:
##  Groups   Name        Std.Dev. Corr  
##  s1_id    (Intercept) 0.24           
##           hhses       0.30     -0.22 
##  Residual             1.00           
## ---
## number of obs: 7938, groups: s1_id, 851
## AIC = 28594.8, DIC = -2407.4
## deviance = 13080.7

The between school variances for the three methods are:

#Results for the binomial
VarCorr(fit3.pql)[1:2,2] #first is Intercept Variance, second is slope variance

## (Intercept)       hhses 
## "0.6256147" "0.6291712"

attr(VarCorr(fit3.l)$s1_id, "stddev")

## (Intercept)       hhses 
##  0.33815241  0.08290273

#Results for the poisson
VarCorr(fit4.pql)[1:2,2] #first is Intercept Variance, second is slope variance

##  (Intercept)        hhses 
## "0.08015014" "0.13210727"

attr(VarCorr(fit4.l)$s1_id, "stddev")

## (Intercept)       hhses 
##   0.2356143   0.2960484

So the binomial models may be a tough comparison, because the fit3.l model may be over-fit, because of the -1 correlation. The Poisson models’ fit4.pql, fit4.l variance components are very different from one another as well, again with the PQL estimates being much lower than those produced from the Laplace approximation.

Another Method - Bayesian Estimation of GLMM’s

Bayesian methods are probably the best in this situation, where multiple random effects are present, and you get different results between type of estimation used. Bayesian methods use prior information to allow for more robust estimation of the variance terms in these models.

We will talk much more about these types of models in the upcoming weeks. Here are two illustrations, the first uses Stan to fit the model based on Hamiltonian Monte Carlo simulation, the second method uses an extension of the Laplace approximation to do full Bayesian estimation using R-INLA.

library(rstanarm)

## Loading required package: Rcpp

## rstanarm (Version 2.17.4, packaged: 2018-04-13 01:51:52 UTC)

## - Do not expect the default priors to remain the same in future rstanarm versions.

## Thus, R scripts should specify priors explicitly, even if they are just the defaults.

## - For execution on a local, multicore CPU with excess RAM we recommend calling

## options(mc.cores = parallel::detectCores())

## - Plotting theme set to bayesplot::theme_default().

## 
## Attaching package: 'rstanarm'

## The following object is masked from 'package:MuMIn':
## 
##     loo

## The following object is masked from 'package:sjstats':
## 
##     se

fit3.bayes<-stan_glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(hhses|s1_id), data=eclsk.sub, family = "binomial",algorithm = "meanfield")

## ------------------------------------------------------------
## EXPERIMENTAL ALGORITHM:
##   This procedure has not been thoroughly tested and may be unstable
##   or buggy. The interface is subject to change.
## ------------------------------------------------------------
## 
## 
## Rejecting initial value:
##   Log probability evaluates to log(0), i.e. negative infinity.
##   Stan can't start sampling from this initial value.
## 
## Gradient evaluation took 0.01 seconds
## 1000 transitions using 10 leapfrog steps per transition would take 100 seconds.
## Adjust your expectations accordingly!
## 
## 
## Begin eta adaptation.
## Iteration:   1 / 250 [  0%]  (Adaptation)
## Iteration:  50 / 250 [ 20%]  (Adaptation)
## Iteration: 100 / 250 [ 40%]  (Adaptation)
## Iteration: 150 / 250 [ 60%]  (Adaptation)
## Iteration: 200 / 250 [ 80%]  (Adaptation)
## Success! Found best value [eta = 1] earlier than expected.
## 
## Begin stochastic gradient ascent.
##   iter       ELBO   delta_ELBO_mean   delta_ELBO_med   notes 
##    100    -4e+003             1.000            1.000
##    200    -4e+003             0.520            1.000
##    300    -4e+003             0.348            0.039
##    400    -4e+003             0.265            0.039
##    500    -4e+003             0.212            0.016
##    600    -4e+003             0.177            0.016
##    700    -4e+003             0.153            0.005   MEDIAN ELBO CONVERGED
## 
## Drawing a sample of size 1000 from the approximate posterior... 
## COMPLETED.

## Setting 'QR' to TRUE can often be helpful when using one of the variational inference algorithms. See the documentation for the 'QR' argument.

print(fit3.bayes, digits=3)

## stan_glmer
##  family:       binomial [logit]
##  formula:      foodinsec ~ male + hisp + black + asian + nahn + other + lths + 
##     gths + hhses + (hhses | s1_id)
##  observations: 11351
## ------
##             Median MAD_SD
## (Intercept) -2.573  0.070
## male         0.008  0.053
## hisp         0.269  0.054
## black        0.079  0.084
## asian       -0.196  0.143
## nahn         0.531  0.262
## other        0.349  0.134
## lths        -0.062  0.088
## gths         0.341  0.080
## hhses       -1.291  0.055
## 
## Error terms:
##  Groups Name        Std.Dev. Corr 
##  s1_id  (Intercept) 0.2158        
##         hhses       0.1050   -0.47
## Num. levels: s1_id 856 
## 
## Sample avg. posterior predictive distribution of y:
##          Median MAD_SD
## mean_PPD 0.119  0.006 
## 
## ------
## For info on the priors used see help('prior_summary.stanreg').

VarCorr(fit3.bayes)

##  Groups Name        Std.Dev. Corr  
##  s1_id  (Intercept) 0.21576        
##         hhses       0.10505  -0.469

library(bayesplot)

## This is bayesplot version 1.6.0

## - Online documentation and vignettes at mc-stan.org/bayesplot

## - bayesplot theme set to bayesplot::theme_default()

##    * Does _not_ affect other ggplot2 plots

##    * See ?bayesplot_theme_set for details on theme setting

#plot(fit3.bayes)
posterior<-as.matrix(fit3.bayes)
mcmc_intervals(posterior, pars = c("male", "hisp", "black", "lths", "gths"), prob = .95)

mcmc_areas(
  posterior, 
  pars = c( "Sigma[s1_id:(Intercept),(Intercept)]","Sigma[s1_id:hhses,hhses]"),
  prob = 0.8, # 80% intervals
  prob_outer = 0.99, # 99%
  point_est = "mean"
)

Using INLA

#plot(fit3.bayes, plotfun="trace")

library(INLA)

## Loading required package: sp

## This is INLA_18.07.12 built 2018-07-12 11:05:18 UTC.
## See www.r-inla.org/contact-us for how to get help.

nwnsch<-table(eclsk.sub$s1_id)
eclsk.sub$sch<-rep(1:length(unique(eclsk.sub$s1_id)), nwnsch)
eclsk.sub$sch2<- eclsk.sub$sch
eclsk.sub<-eclsk.sub[is.na(eclsk.sub$s1_id)==F,]
N<-length(eclsk.sub$foodinsec)
fit3.inla<-inla(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+
                  f(sch, model="iid2d", n=2*N) +
                  f(sch2,hhses, copy="sch"), 
                data=eclsk.sub, family="binomial",
                Ntrials = rep(1,length(eclsk.sub$foodinsec) ),
                num.threads=2,control.inla=list(strategy="gaussian"))
summary(fit3.inla)

## 
## Call:
## c("inla(formula = foodinsec ~ male + hisp + black + asian + nahn + ",  "    other + lths + gths + hhses + f(sch, model = \"iid2d\", n = 2 * ",  "    N) + f(sch2, hhses, copy = \"sch\"), family = \"binomial\", data = eclsk.sub, ",  "    Ntrials = rep(1, length(eclsk.sub$foodinsec)), control.inla = list(strategy = \"gaussian\"), ",  "    num.threads = 2)")
## 
## Time used:
##  Pre-processing    Running inla Post-processing           Total 
##          2.3047        120.0018          0.9769        123.2835 
## 
## Fixed effects:
##                mean     sd 0.025quant 0.5quant 0.975quant    mode kld
## (Intercept) -2.5904 0.0852    -2.7577  -2.5904    -2.4233 -2.5904   0
## male         0.0996 0.0604    -0.0189   0.0996     0.2180  0.0996   0
## hisp         0.2962 0.0776     0.1438   0.2962     0.4485  0.2962   0
## black        0.0481 0.0965    -0.1413   0.0481     0.2374  0.0481   0
## asian       -0.2624 0.1597    -0.5759  -0.2624     0.0508 -0.2624   0
## nahn         0.5467 0.2372     0.0810   0.5467     1.0120  0.5467   0
## other        0.4356 0.1362     0.1682   0.4356     0.7028  0.4356   0
## lths        -0.0303 0.0905    -0.2080  -0.0303     0.1472 -0.0303   0
## gths         0.3456 0.0837     0.1813   0.3456     0.5097  0.3456   0
## hhses       -1.3285 0.0640    -1.4541  -1.3285    -1.2030 -1.3285   0
## 
## Random effects:
## Name   Model
##  sch   IID2D model 
## sch2   Copy 
## 
## Model hyperparameters:
##                                    mean     sd 0.025quant 0.5quant
## Precision for sch (component 1) 15.0968 4.0697     8.7060  14.5598
## Precision for sch (component 2)  3.8995 2.7711     0.6763   3.2400
## Rho1:2 for sch                  -0.0018 0.2393    -0.4606  -0.0032
##                                 0.975quant    mode
## Precision for sch (component 1)     24.580 13.5475
## Precision for sch (component 2)     11.022  1.8782
## Rho1:2 for sch                       0.462 -0.0091
## 
## Expected number of effective parameters(std dev): 67.33(13.51)
## Number of equivalent replicates : 168.65 
## 
## Marginal log-Likelihood:  -3781.60

1/fit3.inla$summary.hyperpar[,1]

## [1]    0.0662391    0.2564450 -551.5941138

m<-inla.tmarginal(function(x) 1/x, fit3.inla$marginals.hyperpar$'Precision for sch (component 1)')
m2<-inla.tmarginal(function(x) 1/x, fit3.inla$marginals.hyperpar$'Precision for sch (component 2)')
plot(m, type="l",main="Posterior distribution for between school variance")

plot(m2, type="l",main="Posterior distribution for between school slope variance in hhses effect")

DEM 7473 Week 4: Hierarchical Models for Non-Normal Responses - Generalized Linear Mixed Models (GLMMs)

Corey S. Sparks, PhD

September 24, 2018

Data and recodes

Modeling

Model 1 - Binary response

Model 2 - Poisson outcome

Analytical note::

Model 3 - Random slopes for both types of model

Another Method - Bayesian Estimation of GLMM’s

Using INLA