DEM 7903 Week 4: Basic Hierarchical Models

In this example, I will use the ECLS-K 2011 data. In this example, I will illustrate how to fit Generalized Linear Mixed models to outcomes that are not continuous. I will illustrate two different methods of estimation, Penalized Quasi Likelihood using the glmmPQL() function in the MASS library and the Laplace approximation using the glmer() function in the lme4 library.

Both methods will be used to analyze a binary and a count data outcome. As has been discussed by lots of statisticians, PQL is known to give downwardly biased (too small) estimates for the variance components in both the Binomial and Poisson models, so we’ll need to be on the lookout for that in our example. A third method of estimation called Adaptive Gaussian Quadrature is also available, but it not really implemented in R. Stata and SAS both allow this method. In general AGQ is more accurate, but it is much slower, as the number of random effects increases.

This Rpubs document provides an excellent section on the technical matters involved in fitting the models in R.

Data and recodes

First we load our data

load("~/Google Drive/dem7903_App_Hier/data/eclsk_11.Rdata")
names(eclsk11)<-tolower(names(eclsk11))
library (car)
library(lmtest)

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(arm)

## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: lme4
## 
## arm (Version 1.8-6, built: 2015-7-7)
## 
## Working directory is /Users/ozd504/Google Drive/dem7903_App_Hier/code/code14
## 
## 
## Attaching package: 'arm'
## 
## The following object is masked from 'package:car':
## 
##     logit

library(MASS)
#get out only the variables I'm going to use for this example

myvars<-c("x_chsex_r", "x1locale", "x_raceth_r", "x2povty", "x12par1ed_i","p1curmar", "x1htotal", "x1mscalk1","p1hscale","x2fsstat2", "x12sesl", "p2parct1", "p2parct2", "s1_id", "p2safepl","x2krceth","p1o2near", "x_distpov" , "w1c0", "w1p0", "w2p0", "w1c0str", "w1p0str", "w1c0psu", "w1p0psu", "x1height", "x1kage_r")
#subset the data
eclsk.sub<-eclsk11[,myvars]
rm(eclsk11); gc()

##           used (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells 1547021 82.7    2637877  140.9   1940317  103.7
## Vcells 2541636 19.4  221670528 1691.3 182928630 1395.7

Next, I do some recoding of variables using a mixture of the ifelse() function and the recode () function.

#recode our outcomes, the  first is the child's math standardized test score  in Kindergarten
eclsk.sub$math<-ifelse(eclsk.sub$x1mscalk1<0, NA, eclsk.sub$x1mscalk1)

#Second outcome is child's height for age, continuous outcome
eclsk.sub$height<-ifelse(eclsk.sub$x1height<=-7, NA, eclsk.sub$x1height)
eclsk.sub$age_yrs<-ifelse(eclsk.sub$x1kage_r<0, NA, round(eclsk.sub$x1kage_r/12, 0))
eclsk.sub<- eclsk.sub[is.na(eclsk.sub$age_yrs)==F, ]
table(eclsk.sub$age_yrs)

## 
##    4    5    6    7    8 
##   13 6292 9195  272    3

eclsk.sub$height_z<-ave(eclsk.sub$height, as.factor(eclsk.sub$age_yrs), FUN=scale)

#Household food insecurity, dichotomous outcome
eclsk.sub$foodinsec<-recode(eclsk.sub$x2fsstat2, recodes="2:3=1; 1=0; else=NA")

#Child health assessment Excellent to poor , ordinal outcome
eclsk.sub$chhealth<-ifelse(eclsk.sub$p1hscale<0, NA, eclsk.sub$p1hscale)

#Number of ear infections after age 2 , count outcome
eclsk.sub$earinfect<-ifelse(eclsk.sub$p1o2near<0, NA, eclsk.sub$p1o2near)

#First we recode some Child characteristics
#Child's sex: recode as male =1
eclsk.sub$male<-recode(eclsk.sub$x_chsex_r, recodes="1=1; 2=0; -9=NA")

#Recode race with white, non Hispanic as reference using dummy vars
eclsk.sub$hisp<-recode (eclsk.sub$x_raceth_r, recodes="3:4=1;-9=NA; else=0")
eclsk.sub$black<-recode (eclsk.sub$x_raceth_r, recodes="2=1;-9=NA; else=0")
eclsk.sub$asian<-recode (eclsk.sub$x_raceth_r, recodes="5=1;-9=NA; else=0")
eclsk.sub$nahn<-recode (eclsk.sub$x_raceth_r, recodes="6:7=1;-9=NA; else=0")
eclsk.sub$other<-recode (eclsk.sub$x_raceth_r, recodes="8=1;-9=NA; else=0")


#Then we recode some parent/mother characteristics
#Mother's education, recode as 2 dummys with HS = reference
eclsk.sub$lths<-recode(eclsk.sub$x12par1ed_i, recodes = "0:2=1; 3:8=0; else = NA")
eclsk.sub$gths<-recode(eclsk.sub$x12par1ed_i, recodes = "1:3=0; 4:8=1; else =NA") 

#marital status, recode as 2 dummys, ref= married
eclsk.sub$single<-recode(eclsk.sub$p1curmar, recodes="4=1; -7:-9=NA; else=0")
eclsk.sub$notmar<-recode(eclsk.sub$p1curmar, recodes="2:3=1; -7:-9=NA; else=0")


#Then we do some household level variables
#SES
eclsk.sub$hhses<-ifelse(eclsk.sub$x12sesl==-9, NA, scale(eclsk.sub$x12sesl))

#Urban school location = 1
eclsk.sub$urban<-recode(eclsk.sub$x1locale, recodes = "1:3=1; 4=0; -1:-9=NA")

#poverty level in poverty = 1
eclsk.sub$pov<-recode(eclsk.sub$x2povty , recodes ="1:2=1; 3=0; -9=NA")

#Household size
eclsk.sub$hhsize<-eclsk.sub$x1htotal

#school % minority student body
eclsk.sub$minorsch<-ifelse(eclsk.sub$x2krceth <0, NA, eclsk.sub$x2krceth/10)

#Unsafe neighborhood
eclsk.sub$unsafe<-recode(eclsk.sub$p2safepl , recodes = "1:2='unsafe'; 3='safe'; else=NA", as.factor.result = T)

#school district poverty
eclsk.sub$dist_pov<-ifelse(eclsk.sub$x_distpov==-9, NA, scale(eclsk.sub$x_distpov))

Modeling

Model 1 - Binary response

The First model will fit a model that considers a binary response, if a child was living in a household considered to have low food security. A research question corresponding to this could be:

“Do children living in households in poverty have lower math test scores than children living in households that are not in poverty, net of other demographic and socioeconomic factors?”

So the hypothesis of interest here involves us testing whether the effec of household poverty shows a \(\beta\) coefficient different from zero.

If we were to write the symbolic form of this model it would be the logistic regression model: \[ln \frac {p_{ij}}{1-p{ij}} = \beta_{0j} + \sum {\beta_k x_{ik}} \] \[\beta_{0j} = \beta_0 + u_j\] \[u_j \sim N(0, \sigma^2_u)\]

hist(eclsk.sub$foodinsec)

#First using PQL
fit1.pql<-glmmPQL(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~1|s1_id, data=eclsk.sub, family = binomial)

## iteration 1
## iteration 2
## iteration 3
## iteration 4

summary(fit1.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | s1_id
##         (Intercept)  Residual
## StdDev:   0.4803783 0.9150815
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: foodinsec ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error    DF   t-value p-value
## (Intercept) -2.6050699 0.08122193 10486 -32.07348  0.0000
## male         0.1054222 0.05615886 10486   1.87721  0.0605
## hisp         0.2538574 0.07686523 10486   3.30263  0.0010
## black        0.0512113 0.09567687 10486   0.53525  0.5925
## asian       -0.2506222 0.15267268 10486  -1.64157  0.1007
## nahn         0.4765600 0.22686039 10486   2.10068  0.0357
## other        0.4063227 0.12709641 10486   3.19696  0.0014
## lths        -0.0425406 0.08514602 10486  -0.49962  0.6174
## gths         0.3497332 0.07784848 10486   4.49249  0.0000
## hhses       -1.3201751 0.05974028 10486 -22.09858  0.0000
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.375                                                        
## hisp  -0.357  0.011                                                 
## black -0.232 -0.011  0.374                                          
## asian -0.197  0.021  0.219  0.152                                   
## nahn  -0.114  0.008  0.141  0.108  0.064                            
## other -0.186  0.003  0.232  0.197  0.112  0.086                     
## lths  -0.231 -0.012 -0.160  0.044 -0.064  0.011  0.003              
## gths  -0.701  0.006  0.047 -0.052  0.022  0.001 -0.020  0.267       
## hhses  0.381 -0.011  0.175  0.225 -0.020  0.043  0.048  0.261 -0.485
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.9674764 -0.4397293 -0.2876038 -0.1670240 14.3771436 
## 
## Number of Observations: 11351
## Number of Groups: 856

#Using Laplace approximation
fit1.l<-glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = binomial, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit1.l, detail=T)

## glmer(formula = foodinsec ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)  -2.65     0.09  -29.75    0.00  
## male          0.10     0.06    1.71    0.09  
## hisp          0.28     0.08    3.40    0.00  
## black         0.06     0.10    0.55    0.58  
## asian        -0.26     0.16   -1.56    0.12  
## nahn          0.52     0.25    2.11    0.04  
## other         0.42     0.14    3.07    0.00  
## lths         -0.04     0.09   -0.39    0.69  
## gths          0.35     0.08    4.13    0.00  
## hhses        -1.34     0.06  -20.62    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.38    
##  Residual             1.00    
## ---
## number of obs: 11351, groups: s1_id, 856
## AIC = 7436.6, DIC = 6869.2
## deviance = 7141.9

#Using AGQ with 10  quadrature points
fit1.a<-glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = binomial, nAGQ = 10, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit1.a, detail=T)

## glmer(formula = foodinsec ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)), nAGQ = 10)
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)  -2.65     0.09  -29.68    0.00  
## male          0.10     0.06    1.70    0.09  
## hisp          0.28     0.08    3.40    0.00  
## black         0.06     0.10    0.55    0.58  
## asian        -0.26     0.16   -1.56    0.12  
## nahn          0.52     0.25    2.11    0.04  
## other         0.43     0.14    3.07    0.00  
## lths         -0.04     0.09   -0.40    0.69  
## gths          0.35     0.09    4.12    0.00  
## hhses        -1.34     0.07  -20.58    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.38    
##  Residual             1.00    
## ---
## number of obs: 11351, groups: s1_id, 856
## AIC = 7436.8, DIC = 6872.5
## deviance = 7143.7

The between school variances for the three methods are:

VarCorr(fit1.pql)[1,2]

## [1] "0.4803783"

VarCorr(fit1.l)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.38477

VarCorr(fit1.a)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.38312

In this case, the PQL model had a higher estimate of the between school variance. How do the regression effects differ by estimation method?

round(data.frame(PQL=fit1.pql$coefficients$fixed, Laplace=fixef(fit1.l), AGQ=fixef(fit1.a)), 3)

##                PQL Laplace    AGQ
## (Intercept) -2.605  -2.646 -2.645
## male         0.105   0.105  0.105
## hisp         0.254   0.279  0.279
## black        0.051   0.056  0.056
## asian       -0.251  -0.257 -0.257
## nahn         0.477   0.517  0.518
## other        0.406   0.425  0.425
## lths        -0.043  -0.036 -0.037
## gths         0.350   0.351  0.351
## hhses       -1.320  -1.339 -1.340

We see very close agreement between the three methods here.

Model 2 - Poisson outcome

If we were to write the symbolic form of this model it would be the Poisson regression model: \[ln y_{ij} = \beta_{0j} + \sum {\beta_k x_{ik}} \] \[\beta_{0j} = \beta_0 + u_j\] \[u_j \sim N(0, \sigma^2_u)\]

hist(eclsk.sub$earinfect)

#First using PQL
fit2.pql<-glmmPQL(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~1|s1_id, data=eclsk.sub, family =poisson)

## iteration 1
## iteration 2
## iteration 3

summary(fit2.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~1 | s1_id
##         (Intercept) Residual
## StdDev:   0.1004049 1.369644
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: earinfect ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error   DF    t-value p-value
## (Intercept)  0.5037529 0.03421620 7078  14.722643  0.0000
## male         0.0986613 0.02448723 7078   4.029094  0.0001
## hisp        -0.2233753 0.03565462 7078  -6.264974  0.0000
## black       -0.5814188 0.05053470 7078 -11.505339  0.0000
## asian       -0.2101508 0.06470880 7078  -3.247638  0.0012
## nahn        -0.0176119 0.10253710 7078  -0.171761  0.8636
## other       -0.1381485 0.05641827 7078  -2.448648  0.0144
## lths        -0.1441546 0.05335304 7078  -2.701900  0.0069
## gths         0.0556993 0.03677151 7078   1.514741  0.1299
## hhses       -0.0130446 0.02023730 7078  -0.644580  0.5192
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.388                                                        
## hisp  -0.249 -0.001                                                 
## black -0.153 -0.002  0.183                                          
## asian -0.128  0.035  0.080  0.051                                   
## nahn  -0.088  0.015  0.088  0.059  0.028                            
## other -0.114 -0.002  0.126  0.090  0.059  0.045                     
## lths  -0.333 -0.018 -0.157  0.009 -0.028 -0.001 -0.004              
## gths  -0.777 -0.001  0.040 -0.017  0.038  0.005 -0.013  0.296       
## hhses  0.233 -0.003  0.161  0.161 -0.113  0.059  0.038  0.154 -0.501
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.0554594 -0.8139462 -0.3071237  0.3594061  4.4825089 
## 
## Number of Observations: 7938
## Number of Groups: 851

#Using Laplace approximation
fit2.l<-glmer(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = poisson,control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit2.l, detail=T)

## glmer(formula = earinfect ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = poisson, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)   0.45     0.03   16.44    0.00  
## male          0.10     0.02    5.68    0.00  
## hisp         -0.18     0.03   -6.38    0.00  
## black        -0.56     0.04  -13.94    0.00  
## asian        -0.20     0.05   -3.94    0.00  
## nahn         -0.01     0.08   -0.08    0.94  
## other        -0.12     0.04   -2.88    0.00  
## lths         -0.15     0.04   -3.68    0.00  
## gths          0.05     0.03    1.88    0.06  
## hhses        -0.02     0.02   -1.22    0.22  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.26    
##  Residual             1.00    
## ---
## number of obs: 7938, groups: s1_id, 851
## AIC = 28728.2, DIC = -1155
## deviance = 13775.6

#Using AGQ with 10  quadrature points
fit2.a<-glmer(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses+(1|s1_id), data=eclsk.sub, family = poisson, nAGQ = 10, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit2.a, detail=T)

## glmer(formula = earinfect ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (1 | s1_id), data = eclsk.sub, 
##     family = poisson, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)), nAGQ = 10)
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)   0.45     0.03   16.38    0.00  
## male          0.10     0.02    5.66    0.00  
## hisp         -0.18     0.03   -6.35    0.00  
## black        -0.56     0.04  -13.89    0.00  
## asian        -0.20     0.05   -3.93    0.00  
## nahn         -0.01     0.08   -0.08    0.94  
## other        -0.12     0.04   -2.87    0.00  
## lths         -0.15     0.04   -3.67    0.00  
## gths          0.05     0.03    1.87    0.06  
## hhses        -0.02     0.02   -1.21    0.22  
## 
## Error terms:
##  Groups   Name        Std.Dev.
##  s1_id    (Intercept) 0.26    
##  Residual             1.00    
## ---
## number of obs: 7938, groups: s1_id, 851
## AIC = 14749.3, DIC = 12822.1
## deviance = 13774.7

The between school variances for the three methods are:

VarCorr(fit2.pql)[1,2]

## [1] "0.1004049"

VarCorr(fit2.l)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.258

VarCorr(fit2.a)

##  Groups Name        Std.Dev.
##  s1_id  (Intercept) 0.2583

Here, we see a much smaller between school variance for the PQL fit method. How do the regression effects differ by estimation method?

round(data.frame(PQL=fit2.pql$coefficients$fixed, Laplace=fixef(fit2.l), AGQ=fixef(fit2.a)), 3)

##                PQL Laplace    AGQ
## (Intercept)  0.504   0.454  0.454
## male         0.099   0.104  0.104
## hisp        -0.223  -0.182 -0.182
## black       -0.581  -0.563 -0.563
## asian       -0.210  -0.197 -0.197
## nahn        -0.018  -0.006 -0.006
## other       -0.138  -0.122 -0.122
## lths        -0.144  -0.147 -0.147
## gths         0.056   0.052  0.052
## hhses       -0.013  -0.019 -0.019

The PQL fit has slightly different estimates of the fixed effects, but the substantive interpretation of all the parameters is consistent between the three fits.

Model 3 - Random slopes for both types of model

Here we fit models that consider both random intercepts and random slopes for the discrete outcomes. I only present the estimates from the PQL estimation and the Laplace approximation.

#Binomial
#First using PQL
fit3.pql<-glmmPQL(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~hhses|s1_id, data=eclsk.sub, family = binomial)

## iteration 1
## iteration 2
## iteration 3
## iteration 4
## iteration 5
## iteration 6
## iteration 7
## iteration 8

summary(fit3.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~hhses | s1_id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.6248303 (Intr)
## hhses       0.6285082 0.454 
## Residual    0.8353228       
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: foodinsec ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error    DF   t-value p-value
## (Intercept) -2.6285048 0.07783152 10486 -33.77173  0.0000
## male         0.1063680 0.05230062 10486   2.03378  0.0420
## hisp         0.2148508 0.07356848 10486   2.92042  0.0035
## black        0.0384170 0.09150989 10486   0.41981  0.6746
## asian       -0.2628260 0.14516465 10486  -1.81054  0.0702
## nahn         0.4113985 0.21295429 10486   1.93186  0.0534
## other        0.3945548 0.11873216 10486   3.32307  0.0009
## lths        -0.0651764 0.08072047 10486  -0.80743  0.4194
## gths         0.3659084 0.07219725 10486   5.06818  0.0000
## hhses       -1.3795864 0.06154919 10486 -22.41437  0.0000
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.365                                                        
## hisp  -0.353  0.013                                                 
## black -0.230 -0.010  0.361                                          
## asian -0.193  0.019  0.219  0.152                                   
## nahn  -0.113  0.009  0.142  0.105  0.066                            
## other -0.184  0.004  0.228  0.199  0.114  0.091                     
## lths  -0.210 -0.013 -0.144  0.041 -0.063  0.007  0.003              
## gths  -0.682  0.006  0.046 -0.053  0.019  0.000 -0.019  0.263       
## hhses  0.398 -0.009  0.159  0.196 -0.017  0.038  0.044  0.241 -0.447
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.0244803 -0.4775935 -0.3069269 -0.1623862  6.5184855 
## 
## Number of Observations: 11351
## Number of Groups: 856

#Using Laplace approximation
fit3.l<-glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(hhses|s1_id), data=eclsk.sub, family = binomial, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
#fit3l<-refit(fit3.l)
display(fit3.l, detail=T)

## glmer(formula = foodinsec ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (hhses | s1_id), data = eclsk.sub, 
##     family = binomial, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)  -2.64     0.09  -29.59    0.00  
## male          0.11     0.06    1.72    0.09  
## hisp          0.28     0.08    3.39    0.00  
## black         0.06     0.10    0.62    0.54  
## asian        -0.25     0.16   -1.54    0.12  
## nahn          0.52     0.25    2.13    0.03  
## other         0.43     0.14    3.10    0.00  
## lths         -0.04     0.09   -0.42    0.67  
## gths          0.35     0.09    4.15    0.00  
## hhses        -1.33     0.07  -20.18    0.00  
## 
## Error terms:
##  Groups   Name        Std.Dev. Corr  
##  s1_id    (Intercept) 0.34           
##           hhses       0.08     -1.00 
##  Residual             1.00           
## ---
## number of obs: 11351, groups: s1_id, 856
## AIC = 7440, DIC = 6889.7
## deviance = 7151.8

#this model may be overfit because of the -1 correlation

#Poisson
#First using PQL
fit4.pql<-glmmPQL(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses, random=~hhses|s1_id, data=eclsk.sub, family =poisson)

## iteration 1
## iteration 2
## iteration 3
## iteration 4

summary(fit4.pql)

## Linear mixed-effects model fit by maximum likelihood
##  Data: eclsk.sub 
##   AIC BIC logLik
##    NA  NA     NA
## 
## Random effects:
##  Formula: ~hhses | s1_id
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev     Corr  
## (Intercept) 0.08023761 (Intr)
## hhses       0.13208910 -0.081
## Residual    1.35887170       
## 
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt 
## Fixed effects: earinfect ~ male + hisp + black + asian + nahn + other + lths +      gths + hhses 
##                  Value  Std.Error   DF    t-value p-value
## (Intercept)  0.5013753 0.03404173 7078  14.728253  0.0000
## male         0.1000164 0.02435757 7078   4.106174  0.0000
## hisp        -0.2223876 0.03543863 7078  -6.275287  0.0000
## black       -0.5803778 0.05026021 7078 -11.547460  0.0000
## asian       -0.2139209 0.06485635 7078  -3.298381  0.0010
## nahn        -0.0251135 0.10179080 7078  -0.246716  0.8051
## other       -0.1375990 0.05612115 7078  -2.451821  0.0142
## lths        -0.1477419 0.05331835 7078  -2.770939  0.0056
## gths         0.0578320 0.03658274 7078   1.580855  0.1140
## hhses       -0.0153895 0.02091323 7078  -0.735874  0.4618
##  Correlation: 
##       (Intr) male   hisp   black  asian  nahn   other  lths   gths  
## male  -0.388                                                        
## hisp  -0.246 -0.001                                                 
## black -0.150 -0.002  0.180                                          
## asian -0.125  0.034  0.079  0.050                                   
## nahn  -0.088  0.015  0.088  0.059  0.028                            
## other -0.114 -0.001  0.126  0.089  0.059  0.045                     
## lths  -0.328 -0.018 -0.152  0.009 -0.027 -0.002 -0.003              
## gths  -0.780 -0.002  0.040 -0.017  0.036  0.006 -0.014  0.292       
## hhses  0.234 -0.002  0.158  0.157 -0.109  0.057  0.038  0.156 -0.490
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.1430601 -0.8108535 -0.3105119  0.3631434  4.5563264 
## 
## Number of Observations: 7938
## Number of Groups: 851

#Using Laplace approximation
fit4.l<-glmer(earinfect~male+hisp+black+asian+nahn+other+lths+gths+hhses+(hhses|s1_id), data=eclsk.sub, family = poisson, control=glmerControl(optimizer="bobyqa",optCtrl=list(maxfun=2e5)))
display(fit4.l, detail=T )

## glmer(formula = earinfect ~ male + hisp + black + asian + nahn + 
##     other + lths + gths + hhses + (hhses | s1_id), data = eclsk.sub, 
##     family = poisson, control = glmerControl(optimizer = "bobyqa", 
##         optCtrl = list(maxfun = 2e+05)))
##             coef.est coef.se z value Pr(>|z|)
## (Intercept)   0.43     0.03   15.20    0.00  
## male          0.11     0.02    5.89    0.00  
## hisp         -0.18     0.03   -6.03    0.00  
## black        -0.56     0.04  -13.62    0.00  
## asian        -0.21     0.05   -4.12    0.00  
## nahn         -0.03     0.08   -0.41    0.68  
## other        -0.12     0.04   -2.73    0.01  
## lths         -0.19     0.04   -4.58    0.00  
## gths          0.07     0.03    2.53    0.01  
## hhses        -0.03     0.02   -1.50    0.13  
## 
## Error terms:
##  Groups   Name        Std.Dev. Corr  
##  s1_id    (Intercept) 0.24           
##           hhses       0.30     -0.22 
##  Residual             1.00           
## ---
## number of obs: 7938, groups: s1_id, 851
## AIC = 28594.3, DIC = -2407.2
## deviance = 13080.5

The between school variances for the three methods are:

#Results for the binomial
VarCorr(fit3.pql)[1:2,2] #first is Intercept Variance, second is slope variance

## (Intercept)       hhses 
## "0.6248303" "0.6285082"

attr(VarCorr(fit3.l)$s1_id, "stddev")

## (Intercept)       hhses 
##  0.33719154  0.08360688

#Results for the poisson
VarCorr(fit4.pql)[1:2,2] #first is Intercept Variance, second is slope variance

##  (Intercept)        hhses 
## "0.08023761" "0.13208910"

attr(VarCorr(fit4.l)$s1_id, "stddev")

## (Intercept)       hhses 
##   0.2356173   0.2959358

So the binomial models may be a tough comparison, becuase the fit3.l model may be overfit, because of the -1 correlation. The Poisson models’ fit4.pql, fit4.l variance components are very different from one another as well, again with the PQL estimates being much lower than those produced from the Laplace approximation.

Another Method - Bayesian Estimation of GLMM’s

Bayesian methods are probably the best in this situation, where multiple random effects are present, and you get different results between type of estimation used. Bayesian methods use prior information to allow for more robust estimation of the variance terms in these models. We will talk much more about these types of models in the upcoming weeks. Here are two illustrations, the first uses Stan to fit the model based on Hamiltonian Monte Carlo simulation, the second method uses an extension of the Laplace approximation to do full Bayesian estimation using R-INLA.

library(rstanarm)

## Loading required package: Rcpp
## rstanarm (Version 2.9.0-1, packaged: 2016-01-09 18:39:57 UTC)
## - Do not expect the default priors to remain the same in future rstanarm versions.
## Thus, R scripts should specify priors explicitly, even if they are just the defaults.
## - For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores())
## 
## Attaching package: 'rstanarm'
## 
## The following object is masked from 'package:lme4':
## 
##     sigma

fit3.bayes<-stan_glmer(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+(hhses|s1_id), data=eclsk.sub, family = "binomial", chains=2, cores=2)
print(fit3.bayes, digits=3)

## stan_glmer(formula = foodinsec ~ male + hisp + black + asian + 
##     nahn + other + lths + gths + hhses + (hhses | s1_id), data = eclsk.sub, 
##     family = "binomial", chains = 2, cores = 2)
## 
## Estimates:
##             Median MAD_SD
## (Intercept) -2.645  0.086
## male         0.105  0.062
## hisp         0.280  0.082
## black        0.061  0.100
## asian       -0.259  0.174
## nahn         0.511  0.260
## other        0.423  0.139
## lths        -0.049  0.090
## gths         0.357  0.088
## hhses       -1.346  0.068
## 
## Error terms:
##  Groups Name        Std.Dev. Corr 
##  s1_id  (Intercept) 0.3160        
##         hhses       0.1876   -0.29
## Num. levels: s1_id 856 
## 
## Sample avg. posterior predictive 
## distribution of y (X = xbar):
##          Median MAD_SD
## mean_PPD 0.123  0.004

VarCorr(fit3.bayes)

##  Groups Name        Std.Dev. Corr  
##  s1_id  (Intercept) 0.31601        
##         hhses       0.18760  -0.292

plot(fit3.bayes)

## 'pars' not specified. Showing first 10 parameters by default.
## ci_level: 0.8 (80% intervals)
## outer_level: 0.95 (95% intervals)

plot(fit3.bayes, plotfun="dens")

## 'pars' not specified. Showing first 10 parameters by default.

plot(fit3.bayes, plotfun="trace")

## 'pars' not specified. Showing first 10 parameters by default.

library(INLA)

## Loading required package: sp
## Loading required package: splines

nwnsch<-table(eclsk.sub$s1_id)
eclsk.sub$sch<-rep(1:length(unique(eclsk.sub$s1_id)), nwnsch)
eclsk.sub$sch2<- eclsk.sub$sch
eclsk.sub<-eclsk.sub[is.na(eclsk.sub$s1_id)==F,]
N<-length(eclsk.sub$foodinsec)
fit3.inla<-inla(foodinsec~male+hisp+black+asian+nahn+other+lths+gths+hhses+f(sch, model="iid2d", n=2*N) + f(sch2,hhses, copy="sch"), data=eclsk.sub, family="binomial", Ntrials = rep(1,length(eclsk.sub$foodinsec) ), num.threads=2, control.inla=list(strategy="gaussian"))
summary(fit3.inla)

## 
## Call:
## c("inla(formula = foodinsec ~ male + hisp + black + asian + nahn + ",  "    other + lths + gths + hhses + f(sch, model = \"iid2d\", n = 2 * ",  "    N) + f(sch2, hhses, copy = \"sch\"), family = \"binomial\", data = eclsk.sub, ",  "    Ntrials = rep(1, length(eclsk.sub$foodinsec)), control.inla = list(strategy = \"gaussian\"), ",  "    num.threads = 2)")
## 
## Time used:
##  Pre-processing    Running inla Post-processing           Total 
##          1.2914        114.1847          2.8312        118.3073 
## 
## Fixed effects:
##                mean     sd 0.025quant 0.5quant 0.975quant    mode kld
## (Intercept) -2.5918 0.0852    -2.7591  -2.5918    -2.4246 -2.5918   0
## male         0.0994 0.0604    -0.0191   0.0994     0.2178  0.0994   0
## hisp         0.2992 0.0776     0.1468   0.2992     0.4515  0.2992   0
## black        0.0497 0.0965    -0.1397   0.0497     0.2391  0.0497   0
## asian       -0.2631 0.1597    -0.5765  -0.2631     0.0501 -0.2631   0
## nahn         0.5576 0.2375     0.0912   0.5576     1.0235  0.5576   0
## other        0.4370 0.1362     0.1696   0.4370     0.7042  0.4370   0
## lths        -0.0306 0.0905    -0.2083  -0.0306     0.1470 -0.0306   0
## gths         0.3457 0.0837     0.1814   0.3457     0.5098  0.3457   0
## hhses       -1.3279 0.0640    -1.4536  -1.3279    -1.2024 -1.3279   0
## 
## Random effects:
## Name   Model
##  sch   IID2D model 
## sch2   Copy 
## 
## Model hyperparameters:
##                                    mean     sd 0.025quant 0.5quant
## Precision for sch (component 1) 15.1022 4.0689     8.7056  14.5686
## Precision for sch (component 2)  3.9285 2.7649     0.6726   3.2820
## Rho1:2 for sch                   0.0001 0.2383    -0.4592   0.0001
##                                 0.975quant    mode
## Precision for sch (component 1)    24.5659 13.5562
## Precision for sch (component 2)    11.0077  1.8900
## Rho1:2 for sch                      0.4588  0.0002
## 
## Expected number of effective parameters(std dev): 67.28(13.51)
## Number of equivalent replicates : 168.76 
## 
## Marginal log-Likelihood:  -3781.21

1/fit3.inla$summary.hyperpar[,1]

## [1] 6.621541e-02 2.545498e-01 7.557525e+03

m<-inla.tmarginal(function(x) 1/x, fit3.inla$marginals.hyperpar$'Precision for sch (component 1)')
m2<-inla.tmarginal(function(x) 1/x, fit3.inla$marginals.hyperpar$'Precision for sch (component 2)')
plot(m, type="l",main="Posterior distribution for between school variance")

plot(m2, type="l",main="Posterior distribution for between school slope variance in hhses effect")

DEM 7903 Week 4: Basic Hierarchical Models - GLMMs

Data and recodes

Modeling

Model 1 - Binary response

Model 2 - Poisson outcome

Model 3 - Random slopes for both types of model

Another Method - Bayesian Estimation of GLMM’s