library(tidyverse) #for data manipulation
library(lme4) #for constructing generalised linear model
library(knitr) #for pretty tables
library(prLogistic) #to get the Thailand dataset
library(sjstats) #to get ICC
library(broom) #for tidying model output
library(ggeffects) #for plotting predicted probabilities

Introduction

An example of estimating an intracluster correlation coefficient from a random effects model.

Data are from a national survey of primary education in Thailand, including information for 8,582 sixth graders nested within 411 schools (Raudenbush & Bhumirat, 1992). The outcome binary variable “rgi” indicates whether a pupil has repeated a grade during primary education (0=no, 1=yes). The predictor variables are: child’s sex (0=male, 1=female) and child’s pre-primary education (0=no, 1=yes).

First, import the data.

df <- Thailand

Model

Now we construct a generalised linear mixed model with a logit link function for the log-odds of repeating repeating a year of school, and incorporating random effects for cluster of school, and the fixed effects of sex and whether the child went to preschool.

(Note, we use an adaptive Gaussian Hermite approximation of the likelihood with 10 integration points.)

g1 <- glmer(rgi ~ sex + pped + (1 | schoolid), data = df, family = binomial(link = "logit"),
          control = glmerControl(optimizer = "bobyqa"),
          nAGQ = 10)


Here is a summary of the model.

summary(g1)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 10) [glmerMod]
##  Family: binomial  ( logit )
## Formula: rgi ~ sex + pped + (1 | schoolid)
##    Data: df
## Control: glmerControl(optimizer = "bobyqa")
## 
##      AIC      BIC   logLik deviance df.resid 
##   6328.1   6356.4  -3160.1   6320.1     8578 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1692 -0.4121 -0.2507 -0.1695  6.0792 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  schoolid (Intercept) 1.632    1.278   
## Number of obs: 8582, groups:  schoolid, 411
## 
## Fixed effects:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.19732    0.09767 -22.496  < 2e-16 ***
## sex          0.54994    0.07037   7.815 5.52e-15 ***
## pped        -0.62407    0.09034  -6.908 4.91e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr) sex   
## sex  -0.432       
## pped -0.385 -0.004


Now estimate odds ratio and 95% confidence intervals.

cc <- confint(g1,parm="beta_")
## Computing profile confidence intervals ...
## Warning in if (parm == "theta_") {: the condition has length > 1 and only
## the first element will be used
## Warning in if (parm == "beta_") {: the condition has length > 1 and only
## the first element will be used
ctab <- cbind(oddsratio=fixef(g1),cc)
rtab <- exp(ctab)

kable(rtab,digits=3)
oddsratio 2.5 % 97.5 %
(Intercept) 0.111 0.091 0.134
sex 1.733 1.511 1.990
pped 0.536 0.449 0.639


And plot the marginal effects (note, doesn’t really make sense with binary variable, but would be useful with continous variables)

df$sex <- as.factor(df$sex)
df$pped <- as.factor(df$pped)

marginals <- tibble(predictors = c("sex", "pped"), fit = list(g1)) %>%
  mutate(marginal = purrr::map2(fit, predictors, ggpredict)) %>%
  select(-fit) %>%
  unnest()
## Note: uncertainty of the random effects parameters are not taken into account for confidence intervals.
## Note: uncertainty of the random effects parameters are not taken into account for confidence intervals.
ggplot(marginals, aes(x=x, y=predicted, fill=predictors)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha=0.25) +
  theme_bw() +
  facet_wrap(~predictors, scales="free") +
  ylab("Predicted probability")

Finally, estimate the intracluster correlation coefficient

tidy(icc(g1)) %>%
  kable()
names x
schoolid 0.3315984