Introduction

In this example, I will fit a hierarchical linear model to the ECLS-K 2011 data. The outcome of interest here is the child’s kindergarten math score. I will illustrate how to fit the basic multilevel model with random intercepts, a model with random intercepts and random slopes, and compare models with a likelihood ratio test. I also show how to extract the variance components of the models and form the intra class correlation coefficient. Lastly, I will visualize the implied regression lines for the random intercepts and slopes model, highlighting the variation in the effect of household poverty status on children’s math scores between schools in the data.

Rationale for random slopes

The rationale for including random intercepts in the models from last week was to account for unobserved heterogeneity at the group level in the overall mean of our outcome. This allowed there to be baseline differences in an outcome variable across the groups considered in our analysis. An example of this, if our outcome was obesity status, and our grouping variable were state of residence, we could effectively model the differences between the 50 states in our random intercept model.

The logic for why you would incorporate random slopes is less straightforward to most people, and it involves questioning a very strong, and commonly relied upon assumption in regression models: Is the relationship between my predictors and my outcome always the same? This is called the assumption of stationarity in the regression function. Last week, we saw this introduced by considering the *ANCOVA model, where the homogeneity of regression slopes was tested directly as a test of the parallel slopes assumption. The random slopes model allows us the freedom of examining the heterogeneity in the regression function across our groups, just as the ANCOVA model did. Thus our rationale for using it would be when we want to question the homogeneity of the regression assumption. This can be very productive for examining how our regression model works in the various group-level settings in our data. This approach is used to great effect in the literature, with these papers being examples(O׳Campo et al. 2015, Dahlin and Härkönen (2013), Bernardi and Radl (2014), Sparks (2015))

Random Slopes

The random intercepts model allowed the mean for each group to be different and to allow for inclusion of group level covariates
The random slopes model allows the relationship between the individual level covariates to vary between groups
In a way, this lets us see if the relationship between any of our x’s and our outcome is consistent across our higher level units

Random slope model equations

We expand the random effect model to include group-varying slopes by: $\text {E( Y)} = \beta_{0j}+ \beta_{1j} x$

$\beta_{0j} = \beta_0 + u_{1j}$

$\beta_{1j} = \beta + u_{2j}$

Where $\beta_0$ is the average intercept and $u_{1j}$ is the group-specific deviation in the intercept And where $\beta$ is the average slope, and $u_{2j}$ is the group-specific deviation in the average slope

*Now we have two random effects in our model, $u_1$ and $u_2$ + instead of having each of them come from independent Normal distributions We instead let $u_1$ and $u_2$ be Multivariate Normal random variables:

Multivariate Normal Distribution - Random Effects

Let $\mathbf{u}$ be the vector of $u_1$ and $u_2$

\[\mathbf{u} \text{~} \mathbf{MVN (0, \Sigma )}\] + Where $\Sigma$ is the variance-covariance matrix of the u’s \[\mathbf{\Sigma = \begin{bmatrix} \sigma_1&\sigma_{12} \\ \sigma_{21}&\sigma_{2} \end{bmatrix}}\]

And $\sigma_1$ and $\sigma_2$ being the variances of each of our random effects
Which is just like we saw in the random intercept model, only now we have 2 random effects

Graphical Representation of Random Slopes

Graphically the random intercepts and slopes can be seen as:

test

Variance components (again!)

Now we have a more complex model, and we have more variance components we can write the total variance as:

$\sigma^2$ = $\sigma^2 _{e}+\sigma^2 _{u1}+\sigma^2 _{u2}$

These are called the variance components of the random slope and intercept model

Data and recodes

First we load our data

library(haven)
library (car)
library(lme4)
library(sjstats)
library(lmerTest)
library(MuMIn)

eclsk11<-read_sas("~/Google Drive/classes/dem7473/data/childk4p.sas7bdat")
names(eclsk11)<-tolower(names(eclsk11))

#get out only the variables I'm going to use for this example

myvars<-c("x_chsex_r", "x1locale", "x_raceth_r", "x2povty", "x12par1ed_i","p1curmar", "x1htotal", "x1mscalk4",  "s1_id", "p2safepl","x2krceth" )
#subset the data
eclsk.sub<-eclsk11[,myvars]
rm(eclsk11); gc()

##           used  (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells 2127903 113.7    4376217  233.8   4376217  233.8
## Vcells 3817529  29.2  816540714 6229.8 783956429 5981.2

Next, I do some recoding of variables using a mixture of the ifelse() function and the recode () function.

#recode our outcomes, the  first is the child's math standardized test score  in Kindergarten
eclsk.sub$math<-ifelse(eclsk.sub$x1mscalk4<0, NA, eclsk.sub$x1mscalk4)

#First we recode some Child characteristics
#Child's sex: recode as male =1
eclsk.sub$male<-recode(eclsk.sub$x_chsex_r, recodes="1=1; 2=0; -9=NA")

#Recode race with white, non Hispanic as reference using dummy vars
eclsk.sub$hisp<-recode (eclsk.sub$x_raceth_r, recodes="3:4=1;-9=NA; else=0")
eclsk.sub$black<-recode (eclsk.sub$x_raceth_r, recodes="2=1;-9=NA; else=0")
eclsk.sub$asian<-recode (eclsk.sub$x_raceth_r, recodes="5=1;-9=NA; else=0")
eclsk.sub$nahn<-recode (eclsk.sub$x_raceth_r, recodes="6:7=1;-9=NA; else=0")
eclsk.sub$other<-recode (eclsk.sub$x_raceth_r, recodes="8=1;-9=NA; else=0")


#Then we recode some parent/mother characteristics
#Mother's education, recode as 2 dummys with HS = reference
eclsk.sub$lths<-recode(eclsk.sub$x12par1ed_i, recodes = "0:2=1; 3:8=0; else = NA")
eclsk.sub$gths<-recode(eclsk.sub$x12par1ed_i, recodes = "1:3=0; 4:8=1; else =NA") 

#marital status, recode as 2 dummys, ref= married
eclsk.sub$single<-recode(eclsk.sub$p1curmar, recodes="4=1; -7:-9=NA; else=0")
eclsk.sub$notmar<-recode(eclsk.sub$p1curmar, recodes="2:3=1; -7:-9=NA; else=0")


#Then we do some household level variables
#Urban school location = 1
eclsk.sub$urban<-recode(eclsk.sub$x1locale, recodes = "1:3=1; 4=0; -1:-9=NA")

#poverty level in poverty = 1
eclsk.sub$pov<-recode(eclsk.sub$x2povty , recodes ="1:2=1; 3=0; -9=NA")

#Household size
eclsk.sub$hhsize<-eclsk.sub$x1htotal

#school % minority student body
eclsk.sub$minorsch<-ifelse(eclsk.sub$x2krceth <0, NA, eclsk.sub$x2krceth/10)

#Unsafe neighborhood
eclsk.sub$unsafe<-Recode(eclsk.sub$p2safepl , recodes = "1:2='unsafe'; 3='safe'; else=NA", as.factor = T)

#Show the first few lines of the data
head(eclsk.sub)

Modeling

Model 1

The First model will fit a model that considers the individual and family level variables and a random intercept only. One may state a research question such as:

“Do children living in households in poverty have lower math test scores than children living in households that are not in poverty, net of other demographic and socioeconomic factors?”

So the hypothesis of interest here involves us testing whether the effect of household poverty shows a $\beta$ coefficient different from zero.

If we were to write the symbolic form of this model it would be: \[y_{ij} = \beta_{0j} + \sum {\beta_k x_{ik}} + e_{ij}\] \[\beta_{0j} = \beta_0 + u_j\] \[u_j \sim N(0, \sigma^2_u)\]

More on modeling

In general, if your outcome is continuous, it is advised to always scale it, and all continuous predictors to a common scale. In practice, this makes it easier for the computer to arrive at a numerical solution, it also can alleviate multicollinearity between predictors.

eclsk.sub$mathz<-scale(eclsk.sub$math, center = T, scale=T)
fit1<-lmer(mathz~male+hisp+black+asian+nahn+other+lths+gths+single+notmar+urban+pov+hhsize+(1|s1_id), data=eclsk.sub)

summary(fit1)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## mathz ~ male + hisp + black + asian + nahn + other + lths + gths +  
##     single + notmar + urban + pov + hhsize + (1 | s1_id)
##    Data: eclsk.sub
## 
## REML criterion at convergence: 27955
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.9453 -0.6842 -0.0821  0.5997  9.3797 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  s1_id    (Intercept) 0.08912  0.2985  
##  Residual             0.75180  0.8671  
## Number of obs: 10633, groups:  s1_id, 852
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  2.850e-01  4.257e-02  9.328e+03   6.693 2.31e-11 ***
## male         5.529e-02  1.726e-02  1.036e+04   3.203  0.00136 ** 
## hisp        -2.898e-01  2.717e-02  8.719e+03 -10.664  < 2e-16 ***
## black       -2.798e-01  3.398e-02  7.924e+03  -8.233  < 2e-16 ***
## asian        2.673e-01  3.889e-02  9.904e+03   6.874 6.62e-12 ***
## nahn        -1.477e-01  8.584e-02  1.062e+04  -1.721  0.08524 .  
## other        5.168e-02  4.038e-02  1.052e+04   1.280  0.20069    
## lths        -1.891e-01  3.414e-02  1.057e+04  -5.540 3.10e-08 ***
## gths         2.784e-01  2.377e-02  1.052e+04  11.711  < 2e-16 ***
## single      -1.990e-01  2.877e-02  1.049e+04  -6.918 4.85e-12 ***
## notmar      -2.032e-01  2.936e-02  1.040e+04  -6.920 4.80e-12 ***
## urban       -8.107e-03  1.030e-02  8.638e+02  -0.787  0.43148    
## pov         -2.760e-01  2.182e-02  1.061e+04 -12.649  < 2e-16 ***
## hhsize      -2.983e-02  6.943e-03  1.054e+04  -4.297 1.75e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##   vcov(x)     if you need it

The coefficient for pov in the model is -0.28, versus the standard error which is 0.021823, which gives a test of 1.061227410^{4}, which, if we use a z-statistic, yields a p-value of : < 1e-05. This tells us that, yes, children in households below the poverty line have significantly lower math scores than children not living in households in poverty.

Model 2

The second model considers both random intercepts and random slopes, in this case, I’m only considering a random slope for poverty status. This would imply the proposition:

“Do children living in poverty face a disadvantage in terms of their math test score equally in every school?”

If we were to write the symbolic form of this model it would be: \[y_{ij} = \beta_{0j} + \sum {\beta_k x_{ik}} + \beta_{j \ pov} \ pov_i + e_{ij}\] \[\beta_{0j} = \beta_0 + u_{1j}\] \[\beta_{j \ pov} = \beta_{pov} + u_{2j}\] \[\mathbf{u} \text{~} \mathbf{MVN (0, \Sigma )}\] \[\mathbf{\Sigma = \begin{bmatrix} \sigma_1&\sigma_{12} \\ \sigma_{21}&\sigma_{2} \end{bmatrix}}\]

the term $\beta_{j \ pov} \ pov_i$ is the effect of poverty for the $j^{th}$ school. Since we have two random effects in the model at the school level, it’s easier to model them as a multivariate normal distribution, which is how the lmer() function does it.

fit2<-lmer(mathz~male+hisp+black+asian+nahn+other+lths+gths+single+notmar+urban+pov+hhsize+(1+pov|s1_id), data = eclsk.sub)

summary(fit2)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## mathz ~ male + hisp + black + asian + nahn + other + lths + gths +  
##     single + notmar + urban + pov + hhsize + (1 + pov | s1_id)
##    Data: eclsk.sub
## 
## REML criterion at convergence: 27930.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1954 -0.6814 -0.0818  0.5974  9.3914 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  s1_id    (Intercept) 0.1199   0.3463        
##           pov         0.0494   0.2223   -0.65
##  Residual             0.7436   0.8623        
## Number of obs: 10633, groups:  s1_id, 852
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)  2.787e-01  4.296e-02  8.144e+03   6.488 9.22e-11 ***
## male         5.452e-02  1.723e-02  1.035e+04   3.164  0.00156 ** 
## hisp        -2.894e-01  2.690e-02  7.413e+03 -10.757  < 2e-16 ***
## black       -2.759e-01  3.362e-02  6.466e+03  -8.206 2.73e-16 ***
## asian        2.662e-01  3.908e-02  9.795e+03   6.812 1.02e-11 ***
## nahn        -1.486e-01  8.564e-02  1.047e+04  -1.735  0.08280 .  
## other        5.089e-02  4.033e-02  1.049e+04   1.262  0.20711    
## lths        -1.919e-01  3.393e-02  1.009e+04  -5.654 1.61e-08 ***
## gths         2.768e-01  2.373e-02  1.048e+04  11.666  < 2e-16 ***
## single      -1.975e-01  2.864e-02  1.023e+04  -6.898 5.59e-12 ***
## notmar      -2.018e-01  2.927e-02  1.032e+04  -6.895 5.69e-12 ***
## urban       -8.397e-03  1.033e-02  8.522e+02  -0.813  0.41639    
## pov         -2.798e-01  2.318e-02  1.078e+03 -12.070  < 2e-16 ***
## hhsize      -2.944e-02  6.915e-03  1.034e+04  -4.258 2.08e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Correlation matrix not shown by default, as p = 14 > 12.
## Use print(x, correlation=TRUE)  or
##   vcov(x)     if you need it

likelihood ratio between models

anova (fit1, fit2)

## refitting model(s) with ML (instead of REML)

In this case, fitting the random slope for poverty adds some information to the model, because when the two models are compared, the second model with the random effect for poverty fits the data better than the first model without this term in the model.

I interpret this as kids living in households below the poverty line face a systematic disadvantage, because of the fixed effect coefficient of -0.28, but the level of disadvantage varies by school. In other words, some schools may be better at offsetting the disadvantages faced by poor students.

Intra-class correlations

Here is the ICC from fits 1 & 2

icc(fit1)

## 
## Linear mixed model
##  Family: gaussian (identity)
## Formula: mathz ~ male + hisp + black + asian + nahn + other + lths + gths + single + notmar + urban + pov + hhsize + (1 | s1_id)
## 
##   ICC (s1_id): 0.105979

icc(fit2)

## Caution! ICC for random-slope-intercept models usually not meaningful. See 'Note' in `?icc`.

## 
## Linear mixed model
##  Family: gaussian (identity)
## Formula: mathz ~ male + hisp + black + asian + nahn + other + lths + gths + single + notmar + urban + pov + hhsize + (1 + pov | s1_id)
## 
##   ICC (s1_id): 0.138867

These results suggest that by including the random slope in fit2, we increase the correlation within schools. Again, I think that this has to do with selection of students into schools more than anything.

Test for random effect

We can also test for the “significance” of a particular random effect. This is the likelihood ratio test for removing a particular random effect from a model. This is a good guide for general whether to include particular random effects in a model.

ranova(fit1)

ranova(fit2)

more model selection

We can also calculate other metrics of model selection using the model.sel() function in the MuMIn library. This by default will calculate the AIC and $\delta$AIC for models

model.sel(fit1, fit2)

Again, using the AIC we see evidence for the random slope model over the random intercept alone model.

We can also calculate a fake $R^2$ for these models. Marginal $R_{GLMM}^2$ represents the variance explained by the fixed effects, and is defined as:

\[R_{GLMM}(m)^2 = (\sigma_{f}^2) / (\sigma_{f}^2 + \sigma_{\alpha}^2 + \sigma_{\epsilon}^2)\]

Conditional $R_{GLMM}(m)^2$ is interpreted as a variance explained by the entire model, including both fixed and random effects, and is calculated according to the equation:

\[R_{GLMM}(c)^2 = (\sigma_{f}^2 + \sigma_{\alpha}^2) / (\sigma_{f}^2 + \sigma_{\alpha}^2 + \sigma_{\epsilon}^2)\]

where $\sigma_{f}^2$ is the variance of the fixed effect components, $\sigma_{\alpha}^2$ is the variance of the random effects, and $ _{}^2$ is the “observation-level” variance.

r.squaredGLMM(fit1)

## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help
## page.

##            R2m      R2c
## [1,] 0.1608057 0.249743

r.squaredGLMM(fit2)

##            R2m       R2c
## [1,] 0.1610777 0.2580624

So in terms of fake $R^2$, the models are effectively the same, with the random slope model showing a very slight increase in prediction.

References for these functions:

Nakagawa, S., Schielzeth, H. (2013) A general and simple method for obtaining R² from Generalized Linear Mixed-effects Models. Methods in Ecology and Evolution 4: 133-142

Johnson, P.C.D. (2014) Extension of Nakagawa & Schielzeth’s R_GLMM² to random slopes models. Methods in Ecology and Evolution 5: 44-946

Nakagawa, S., Johnson, P.C.D., Schielzeth, H. (2017) The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14: 20170213.

Visualizations of effects

Sometimes it’s fun to plot things.

Now, let’s plot the regression lines for each school! This way we can visualize how the mean of each school varies, as well as the effect of poverty varying between schools.

#get the random effects
rancoefs2<-ranef(fit2)

#here are the first 10 schools random effects
head(rancoefs2$s1_id, n=10)

#Histograms of the random effects
par(mfrow=c(1,2))
hist(rancoefs2$s1_id[,"(Intercept)"],xlab= "Intercept Random Effect",  main = "Distribution of Random Intercepts")
hist(rancoefs2$s1_id[,"pov"], xlab= "Poverty Slope Random Effect", main = "Distribution of Random Slopes")

Here is the bi-variate plot of each school’s random effect for the intercept and the slope for poverty. This should show a negative trend, since in the output for the fit2 model, the correlation in the random effects was -0.6511989.

par(mfrow=c(1,1))
plot(rancoefs2$s1_id[,"(Intercept)"],rancoefs2$s1_id[,"pov"], main="Random effects for each school", xlab="Intercept", ylab="Poverty slope" )

Here, I plot the random slopes, to do this, I iterate over all schools, drawing lines for each one. The intercept of each line is the global intercept from fit2 + the random intercept component for each school. The slope for each line is the average slope, the fixed effect for poverty from fit2, plus the random slope component for each school.

par(mfrow=c(1,1))
fixef(fit2)

##  (Intercept)         male         hisp        black        asian 
##  0.278712381  0.054519278 -0.289380649 -0.275885361  0.266200396 
##         nahn        other         lths         gths       single 
## -0.148580454  0.050885839 -0.191865180  0.276782647 -0.197547328 
##       notmar        urban          pov       hhsize 
## -0.201845484 -0.008396928 -0.279812417 -0.029442369

summary(fixef(fit2)[1]+rancoefs2$s1_id) # I do this to get values for the range for the y axis.

##   (Intercept)           pov         
##  Min.   :-0.3982   Min.   :-0.2162  
##  1st Qu.: 0.1026   1st Qu.: 0.2216  
##  Median : 0.2618   Median : 0.2809  
##  Mean   : 0.2787   Mean   : 0.2787  
##  3rd Qu.: 0.4338   3rd Qu.: 0.3413  
##  Max.   : 1.2119   Max.   : 0.5907

plot(NULL, ylim=c(-1, 2), xlim=c(0,1),ylab="Math Score", xlab="HH Poverty Status") #You may have to change the ylim() for your outcome.
title (main="Regression Lines for each school from Random Slope and Intercept Model")
cols=sample(rainbow(n=25), size = dim(rancoefs2$s1_id)[1], replace = T)
for (i in 1: dim(rancoefs2$s1_id)[1]){
  
  abline(a=fixef(fit2)[1]+rancoefs2$s1_id[[1]][i], b=fixef(fit2)[13]+rancoefs2$s1_id[[2]][i], col=cols[i],lwd=.5 )
}

#this line shows the "average" effect of poverty.
abline(a=fixef(fit2)[1], b=fixef(fit2)[13], col=1, lwd=3)
legend("topright", col=1, lwd=5, legend="Average Effect of Poverty")

References

Bernardi, Fabrizio, and Jonas Radl. 2014. “The long-term consequences of parental divorce for children’s educational attainment.” DEMOGRAPHIC RESEARCH 30. doi:10.4054/DemRes.2014.30.61.

Dahlin, Johanna, and Juho Härkönen. 2013. “Cross-national differences in the gender gap in subjective health in Europe: Does country-level gender equality matter?” Social Science & Medicine 98 (December). Pergamon: 24–28. doi:10.1016/J.SOCSCIMED.2013.08.028.

O׳Campo, Patricia, Blair Wheaton, Rosane Nisenbaum, Richard H. Glazier, James R. Dunn, and Catharine Chambers. 2015. “The Neighbourhood Effects on Health and Well-being (NEHW) study.” Health & Place 31 (January). Pergamon: 65–74. doi:10.1016/J.HEALTHPLACE.2014.11.001.

Sparks, Corey. 2015. “An examination of disparities in cancer incidence in Texas using Bayesian random coefficient models.” PeerJ 3 (September). PeerJ Inc.: e1283. doi:10.7717/peerj.1283.

DEM 7473 - Week 3: Basic Hierarchical Models - Random Intercepts and Slopes

Corey S. Sparks, PhD

September 17, 2018