In this tutorial, I will be introducing the topic of hierarchical linear modeling (HLM), while also giving some simple examples of how to model data using this technique.
What is hierarchical linear modeling? First, it’s important to understand that this is a technique with many names, so it’s possible you’ve heard of this technique being used in other contexts without even realizing it. Other common names include multilevel modeling, mixed modeling, mixed effect modeling, nested modeling, random effects modeling, random parameter modeling, and many more. Sometimes these terms are used in slightly different contexts, but suffice it to say that all these terms refer to essentially the same technique.
The primary goal of HLM is to account for situations in which data exist within multiple levels in a dataset. A prototypical example would be a dataset of students in various classes, where each class contains multiple students. We could say students are “nested” within class. Let’s say we wanted to compute some regression in which we were investigating learning achievement at the student level. It is likely that the class the student is in will have some effect on the outcome. If we fail to account for the class variable, we are not modeling our data as well as possible, and more problematically, we are violating the assumption of independence of observations. Therefore, as responsible statisticians, we should find some way to account for class in our analysis.
How then, should we account for class in our analysis? We’ll explore a few possible techniques.
One possible solution for dealing with nested data is to aggregate within our level-2 variable. In our students-within-classes example, that would mean averaging the scores of all students in the same class.
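As a quick sketch of what that aggregation might look like in code (the students data frame here is entirely hypothetical, just for illustration):

# Hypothetical toy data: 12 students nested within 3 classes
students <- data.frame(
  class = rep(c("A", "B", "C"), each = 4),
  score = c(70, 75, 80, 85, 60, 65, 70, 75, 50, 55, 60, 65)
)

library(dplyr)

# Collapse to one row per class by averaging the student scores
class_means <- students %>%
  group_by(class) %>%
  summarise(mean_score = mean(score))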
This technique is simple and straightforward, but it has multiple (potentially large) limitations. The first problem is that we’re throwing away a lot of meaningful information and variance.
The second problem is that we’re cutting down on our sample size, perhaps dramatically. Imagine you had a dataset of 15 classes, each averaging 25 students. In this example, we’d have a sample size of 375 observations to work with. If we average within classes, we limit our sample to 15 classes. This is obviously problematic.
Lastly, there’s no guarantee that our relationship at level 2 is the same as at level 1. In fact, the relationships can run in opposite directions, a phenomenon called Simpson’s Paradox. Let me illustrate with a super simple simulated example:
Let’s say we have a dataset with 3 groups. Each group contains 3 observations of a variable x and a variable y. It looks like this:
group1 <- data.frame(x = 1:3, y = 4:6)
group2 <- data.frame(x = 2:4, y = 3:5)
group3 <- data.frame(x = 3:5, y = 2:4)
Now let’s compute the correlations between x and y within each group:
cor(group1$x, group1$y)
## [1] 1
cor(group2$x, group2$y)
## [1] 1
cor(group3$x, group3$y)
## [1] 1
The correlation in each group is a perfect correlation of +1.
Now let’s aggregate within each group:
x <- c(mean(group1$x), mean(group2$x), mean(group3$x))
y <- c(mean(group1$y), mean(group2$y), mean(group3$y))
And now let’s see how x and y correlate.
cor(x, y)
## [1] -1
Now they’re perfectly negatively correlated! So relying on a level 2 association to infer something about a level 1 relationship is obviously problematic.
In summary, in most cases, it’s best to avoid aggregating within higher levels.
A preferable solution is to include our nesting variable as a fixed effect in our model. If you have ever modeled a factor variable in a linear regression, you’ve used this technique, likely without even realizing it. To include a grouping variable as a fixed effect in a model, we dummy code the variable, and each level of the variable gets its own coefficient, representing an inflation or deflation of the intercept in comparison to a reference level.
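To see what that dummy coding looks like, here’s a tiny sketch with a made-up factor; model.matrix() exposes the design matrix that R builds behind the scenes:

# A toy factor with three levels (made-up data for illustration)
grp <- factor(c("A", "A", "B", "B", "C", "C"))

# "A" becomes the reference level; B and C each get an indicator
# column whose coefficient shifts the intercept up or down
model.matrix(~ grp)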
Let’s look at an example in R. First, load the lme4 package. (This package will be very important when we get into random effects modeling, but for now we’re just using it for a sample dataset.) As always, be sure to install the package first with install.packages("lme4") if it’s not already installed. We’ll also bring in the tidyverse for good measure.
library(lme4)
library(tidyverse)
We will be using an included dataset called “sleepstudy” for this simple example. This study records participants’ reaction times after varying numbers of days of sleep deprivation. So, in this case, observations (level 1) are nested within participants (level 2). (Note that in our earlier example, participants were the level 1 unit, and here they’re level 2. That’s ok! Nesting is a domain-general concept and varies based on the structure of the data.)
Let’s take a look at the structure of the data:
str(sleepstudy)
## 'data.frame': 180 obs. of 3 variables:
## $ Reaction: num 250 259 251 321 357 ...
## $ Days : num 0 1 2 3 4 5 6 7 8 9 ...
## $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
So we see that this is a very simple dataset, in which the data are already structured how we want them. We have a numeric measure of reaction time, a numeric measure of the days of sleep deprivation, and a factor variable of participants’ unique ID number.
We want to predict participants’ reaction time by the number of days of sleep deprivation. Let’s first model this without accounting for the nested structure of our data (IMPORTANT: This is not a good way of modeling the data, but we’re doing it for demonstration purposes):
model_no_nest <- lm(Reaction ~ Days, data = sleepstudy)
And let’s look at the output:
summary(model_no_nest)
##
## Call:
## lm(formula = Reaction ~ Days, data = sleepstudy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -110.848 -27.483 1.546 26.142 139.953
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 251.405 6.610 38.033 < 2e-16 ***
## Days 10.467 1.238 8.454 9.89e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47.71 on 178 degrees of freedom
## Multiple R-squared: 0.2865, Adjusted R-squared: 0.2825
## F-statistic: 71.46 on 1 and 178 DF, p-value: 9.894e-15
So, as we’d expect, we see that reaction time significantly slows with increased sleep deprivation. However, our R-squared isn’t that great, at about R² = .29. I bet we can improve that by accounting for the nested structure of our data.
Fortunately, we can account for nesting with a fixed effect quite easily in R, by just including our factorized Subject ID term:
model_fixed <- lm(Reaction ~ Days + Subject, data = sleepstudy)
And looking at the summary:
summary(model_fixed)
##
## Call:
## lm(formula = Reaction ~ Days + Subject, data = sleepstudy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -100.540 -16.389 -0.341 15.215 131.159
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 295.0310 10.4471 28.240 < 2e-16 ***
## Days 10.4673 0.8042 13.015 < 2e-16 ***
## Subject309 -126.9008 13.8597 -9.156 2.35e-16 ***
## Subject310 -111.1326 13.8597 -8.018 2.07e-13 ***
## Subject330 -38.9124 13.8597 -2.808 0.005609 **
## Subject331 -32.6978 13.8597 -2.359 0.019514 *
## Subject332 -34.8318 13.8597 -2.513 0.012949 *
## Subject333 -25.9755 13.8597 -1.874 0.062718 .
## Subject334 -46.8318 13.8597 -3.379 0.000913 ***
## Subject335 -92.0638 13.8597 -6.643 4.51e-10 ***
## Subject337 33.5872 13.8597 2.423 0.016486 *
## Subject349 -66.2994 13.8597 -4.784 3.87e-06 ***
## Subject350 -28.5311 13.8597 -2.059 0.041147 *
## Subject351 -52.0361 13.8597 -3.754 0.000242 ***
## Subject352 -4.7123 13.8597 -0.340 0.734300
## Subject369 -36.0992 13.8597 -2.605 0.010059 *
## Subject370 -50.4321 13.8597 -3.639 0.000369 ***
## Subject371 -47.1498 13.8597 -3.402 0.000844 ***
## Subject372 -24.2477 13.8597 -1.750 0.082108 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.99 on 161 degrees of freedom
## Multiple R-squared: 0.7277, Adjusted R-squared: 0.6973
## F-statistic: 23.91 on 18 and 161 DF, p-value: < 2.2e-16
First, let’s note the R-squared. This model fits our data much, much better than the model that did not account for nesting. When looking at the coefficients, we need to be careful about our interpretation. The intercept reflects the reaction time at 0 days of sleep deprivation for our reference subject. R automatically picked Subject 308 as the reference, but we could have manually specified one. The coefficient for each other subject represents an inflation or deflation of the intercept. So, for subject 350, for example, we can estimate a predicted reaction time of roughly 295 - 28.5 + 10.5*(days of sleep deprivation). Note also that the slope (i.e. the effect of sleep deprivation) is specified to be constant for each participant.
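We can verify that arithmetic with predict(); for example, subject 350 at 3 days of deprivation (the numbers just restate the coefficients from the summary above):

# Predicted reaction time for subject 350 after 3 days:
# roughly 295.03 - 28.53 + 10.47 * 3
predict(model_fixed, newdata = data.frame(Days = 3, Subject = "350"))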
We can plot this model to see what this prediction actually looks like:
sleepstudy <- sleepstudy %>%
mutate(fixedprediction = predict(model_fixed)) #generate predicted outcome (y-hat)
ggplot(data = sleepstudy, aes(x = Days, y = Reaction, color = Subject)) +
geom_point() +
geom_line(aes(y = fixedprediction)) +
ylim(c(150,500)) #Plot our predictions vs actual y
Here, we can clearly see that each participant has their own, independently estimated intercept, while every participant shares the same slope. If we wanted to allow for different slopes across the sample, we would need to specify interaction terms, as sketched below.
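A minimal sketch of that interaction model (we won’t dig into its output here):

# Days * Subject expands to Days + Subject + Days:Subject, so each
# subject gets its own intercept shift and its own slope adjustment
model_fixed_slopes <- lm(Reaction ~ Days * Subject, data = sleepstudy)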
Another good solution is to allow for random effects. This is the technique typically being referred to when one discusses “Hierarchical Linear Modeling” or “Multilevel Modeling.” In many ways, random effects modeling is similar to fixed effects modeling, but with some important differences. With this technique, rather than estimating a parameter separately for every level of the grouping variable, we assume the parameter varies across groups according to a distribution, and we estimate the parameters of that distribution.
To make this difference clearer, let’s return to our earlier example in R, but this time fit a model that specifies random effects.
Before we get deep into this example, let’s load some additional packages. lmerTest adds some statistics on top of base lme4 (notably p-values for fixed effects), MuMIn provides some other useful functions, and sjPlot builds a nice table visualization of model summary statistics.
library(lmerTest)
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
library(MuMIn)
library(sjPlot)
## Learn more about sjPlot with 'browseVignettes("sjPlot")'.
We will specify this model using the lmer() function from lme4. The notation is quite similar to lm() notation, but we add random effects syntax of the form (<effects> | <grouping variable>). The left side lists the terms that are allowed to vary (1 for the intercept, or a predictor for its slope), while the right side names the grouping variable they vary across. For now, we will specify only a random intercept, like so:
model_rand_int <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
To go through this command in detail: we are predicting reaction time from days of sleep deprivation, and we are specifying that the intercept varies across participants according to a distribution. The slope, however, is held constant across participants (much as in the fixed effects example).
Now let’s look at the output:
summary(model_rand_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (1 | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1786.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2257 -0.5529 0.0109 0.5188 4.2506
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1378.2 37.12
## Residual 960.5 30.99
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.4051 9.7467 22.8102 25.79 <2e-16 ***
## Days 10.4673 0.8042 161.0000 13.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.371
There’s a lot to take in here. Let’s start with the section marked “Fixed effects”. This is akin to the “Coefficients” section of the lm() models. The first thing to notice is that the intercept is very different here from our fixed effect model. Why is this? Recall that in the previous example, the intercept value represented the intercept for an arbitrary reference subject. Here, the intercept is the sample-wide intercept; you can think of it as the average intercept. So our interpretation here is very different.
You should also notice that we do not have coefficients for each group here like we did in the fixed effects model. This is because we are not estimating a parameter for each group. Instead, we are estimating a variance parameter for the distribution. To see this variance parameter, we can look at the “Random effects” section. The variance of the intercept is 1378.2 and the standard deviation is 37.12 (the standard deviation is simply the square root of the variance, so these carry the same information). So our model estimates that the mean intercept is 251.4, with a standard deviation of 37.12.
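If you want to pull those variance components out programmatically rather than reading them off the summary, lme4’s VarCorr() accessor does exactly this:

# Extract the random-effect variance components as a data frame:
# one row for the Subject intercept variance, one for the residual
as.data.frame(VarCorr(model_rand_int))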
While we don’t get fixed effect estimates for each group, we can see the random effects estimate for each group, like so:
ranef(model_rand_int)
## $Subject
## (Intercept)
## 308 40.783710
## 309 -77.849554
## 310 -63.108567
## 330 4.406442
## 331 10.216189
## 332 8.221238
## 333 16.500494
## 334 -2.996981
## 335 -45.282127
## 337 72.182686
## 349 -21.196249
## 350 14.111363
## 351 -7.862221
## 352 36.378425
## 369 7.036381
## 370 -6.362703
## 371 -3.294273
## 372 18.115747
##
## with conditional variances for "Subject"
To interpret this, we should compare each intercept coefficient to the sample-wide intercept (251.4). So Subject 308’s intercept, for example, is 251.4 + 40.8. Note, however, that this ranef() function does not give us any standard errors or p-values. That’s because we’re not estimating group-level parameters in the same way we were in fixed effects. If our primary interest is in estimating these parameters for a particular group or groups, we’re probably best served by a fixed effects model.
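That said, if you just want each subject’s combined intercept (the sample-wide intercept plus its random deviation), coef() does the addition for you:

# coef() combines the fixed effects with each subject's random
# effect, giving per-subject intercepts (e.g., 251.4 + 40.8 for 308)
coef(model_rand_int)$Subject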
Let’s look back at our earlier summary. One thing to note is that there is no R-squared value, which makes it difficult to assess the model fit. I will refrain from going too far into the weeds of the math here, but computing an R-squared is not straightforward for HLM in the way it is for standard regression. However, we can get a metric of R-squared proposed by Nakagawa and colleagues (2017) by calling r.squaredGLMM() from the MuMIn package:
r.squaredGLMM(model_rand_int)
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
## R2m R2c
## [1,] 0.2798856 0.7042555
This returns two values. The first represents the variance explained only by the fixed effects (called the marginal r-squared). The second represents the variance explained by both the fixed and random effects (called the conditional r-squared).
We can also get this information and more in a pretty table by calling tab_model() from the sjPlot package.
tab_model(model_rand_int)
## Argument 'df_method' is deprecated. Please use 'ci_method' instead.
Outcome: Reaction

| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 251.41 | 232.17 – 270.64 | <0.001 |
| Days | 10.47 | 8.88 – 12.05 | <0.001 |
| Random Effects | | | |
| σ² | 960.46 | | |
| τ00 Subject | 1378.18 | | |
| ICC | 0.59 | | |
| N Subject | 18 | | |
| Observations | 180 | | |
| Marginal R² / Conditional R² | 0.280 / 0.704 | | |
An additional metric provided here, which may be of some use, is the intraclass correlation coefficient (ICC). This value is simply the ratio of the variance of the random effect to the total variance (the variance of the random effect + the residual variance). If this value is high, this suggests that much of the variance is being explained by your grouping variable. If it is low, little of the variance is being explained by the grouping variable. Some will argue that if the ICC is sufficiently low, you need not account for the grouping at all, but be aware that this is controversial.
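To make that concrete, we can compute the ICC by hand from the variance components we extracted earlier:

vc <- as.data.frame(VarCorr(model_rand_int))
# ICC = intercept variance / (intercept variance + residual variance)
vc$vcov[1] / sum(vc$vcov)  # 1378.18 / (1378.18 + 960.46), roughly 0.59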
Let’s now plot our model results.
sleepstudy <- sleepstudy %>%
mutate(randintpred = predict(model_rand_int)) #Get y-hats
ggplot(data = sleepstudy, aes(x = Days, y = Reaction, color = Subject)) +
geom_point() +
geom_line(aes(y = randintpred)) +
ylim(c(150,500))
You should note that this plot does not look terribly different from the plot for our fixed effect model. This is typical: as long as each group is well represented and assumptions are not violated, fixed effects models and random effects models tend to yield similar results. One slight difference here, however, is that the extreme intercept values are pulled slightly toward the mean in the random effects model. This is due to a property of random effects models called “partial pooling”: information about the other groups is partially used to inform the estimate for a given group. In a fixed effects model, by contrast, there is no pooling, so each intercept estimate is independent. A random effects model can thus suggest that extreme values may be over- or underestimates by bringing in information from the other groups.
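To see this shrinkage numerically, we can line up the per-subject intercepts from the two models; a quick sketch (the no-pooling intercepts are reconstructed from model_fixed’s coefficients):

# No-pooling intercepts from the fixed effects model: the reference
# intercept plus each Subject dummy coefficient (0 for subject 308)
fe <- coef(model_fixed)
no_pool <- fe["(Intercept)"] + c(0, fe[grep("^Subject", names(fe))])

# Partially pooled intercepts from the random effects model
partial_pool <- coef(model_rand_int)$Subject[, "(Intercept)"]

# Extreme subjects are pulled toward the overall mean of 251.4
round(cbind(no_pool, partial_pool), 1)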
At this stage, one may be wondering in what contexts one should favor a random effects model over a fixed effects model. This is a long-running debate among statisticians, and I won’t resolve it here, but I can provide some basic guidance.
You should consider a fixed effects model if:

- your primary interest is in estimating the parameters of particular groups (recall that ranef() gives no standard errors or p-values);
- you have only a few groups, so a group-level variance would be poorly estimated;
- your groups exhaust the population you care about, rather than being a sample from it.

You should consider a random effects model if:

- your groups can reasonably be viewed as a sample from a larger population you want to generalize to;
- you have many groups, and estimating a coefficient for each would be wasteful;
- you want the benefits of partial pooling, or the group-level variance is itself of interest.
Now that you understand random effects models and when to use them, let’s see what else we can do with them.
In our previous random effects model we only specified a random intercept. However, we can easily specify a random slope as well:
model_rand_slope <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
This syntax is very similar to our earlier syntax; we just put Days on the left side of the bar to specify that the slope of Days is allowed to vary across subjects. (Note that (Days | Subject) implicitly expands to (1 + Days | Subject), so the random intercept is still included.)
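One related bit of syntax worth knowing: (Days | Subject) also estimates a correlation between the random intercept and the random slope. lme4’s double-bar shorthand constrains that correlation to zero, which is sometimes used to simplify a model:

# Random intercept and random slope for Days, with their
# correlation constrained to zero
model_uncorr <- lmer(Reaction ~ Days + (Days || Subject), data = sleepstudy)

We’ll stick with the correlated version here.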
Let’s take a look at the output.
summary(model_rand_slope)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4634 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.10 24.741
## Days 35.07 5.922 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 251.405 6.825 17.000 36.838 < 2e-16 ***
## Days 10.467 1.546 17.000 6.771 3.26e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
Here, just like before, the “Intercept” should be interpreted as the average intercept across the sample. Similarly “Days” should be interpreted as the average slope of Days across the sample.
We can also see that we have a new random effect. The variance of the slope of Days is 35.07 (SD = 5.92).
Just like before, we can also see our random effects estimates, but now with an additional column.
ranef(model_rand_slope)
## $Subject
## (Intercept) Days
## 308 2.2585509 9.1989758
## 309 -40.3987381 -8.6196806
## 310 -38.9604090 -5.4488565
## 330 23.6906196 -4.8143503
## 331 22.2603126 -3.0699116
## 332 9.0395679 -0.2721770
## 333 16.8405086 -0.2236361
## 334 -7.2326151 1.0745816
## 335 -0.3336684 -10.7521652
## 337 34.8904868 8.6282652
## 349 -25.2102286 1.1734322
## 350 -13.0700342 6.6142178
## 351 4.5778642 -3.0152621
## 352 20.8636782 3.5360011
## 369 3.2754656 0.8722149
## 370 -25.6129993 4.8224850
## 371 0.8070461 -0.9881562
## 372 12.3145921 1.2840221
##
## with conditional variances for "Subject"
And we can compute the R-squared:
r.squaredGLMM(model_rand_slope)
## R2m R2c
## [1,] 0.2786511 0.7992199
And finally plot the data:
sleepstudy <- sleepstudy %>%
mutate(randintslopepred = predict(model_rand_slope)) #New y-hats
ggplot(data = sleepstudy, aes(x = Days, y = Reaction, color = Subject)) +
geom_point() +
geom_line(aes(y = randintslopepred)) +
ylim(c(150,500))
This model looks quite different from our previous ones because we allow the effect of sleep deprivation to vary across participants. By plotting the data, we can see that sleep deprivation had very little effect for some participants (and strangely, perhaps a positive effect for one), while it had very large effects for others.
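We can confirm this from the model itself by combining the fixed slope with each subject’s random deviation:

# Per-subject slopes: fixed effect (10.47) plus each random deviation.
# Subject 335's slope works out slightly negative (10.47 - 10.75)
coef(model_rand_slope)$Subject["Days"]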
In all our models so far, we’ve been looking at only two levels of the data (observations nested within participants). Let’s say, hypothetically, that our participants were also drawn from a handful of schools. We could then model our data across three levels (observations nested within participants nested within schools). Let’s simulate some data to look at this.
school <- factor(c(rep(1,60), rep(2,60), rep(3,60)))
sleepstudy <- sleepstudy %>%
mutate(School = school) %>%
select(School, Subject, Reaction, Days)
glimpse(sleepstudy)
## Rows: 180
## Columns: 4
## $ School <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1~
## $ Subject <fct> 308, 308, 308, 308, 308, 308, 308, 308, 308, 308, 309, 309, 3~
## $ Reaction <dbl> 249.5600, 258.7047, 250.8006, 321.4398, 356.8519, 414.6901, 3~
## $ Days <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0~
So now each participant is in one of three schools (IMPORTANT NOTE: in a real world example, 3 schools would not be enough to estimate variance at this level, but this is only for a demonstration.)
There are a few ways of modeling a nested relationship, but I prefer the explicit syntax, like so:
model_three_level <- lmer(Reaction ~ Days + (1 | School) + (1 | Subject:School), data = sleepstudy)
## boundary (singular) fit: see ?isSingular
So what does this notation mean? First, we are predicting reaction time from days of sleep deprivation. Second, we are allowing the intercept to vary across schools (but not allowing the slope to vary). And third, we are allowing the intercept to vary across subjects, which we specify as nested within school.
We get a singularity warning here, suggesting the model is over-specified (unsurprising, given that our School variable is arbitrarily simulated and contributes essentially no variance), but we can still see the syntax we would use.
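As an aside, lme4 also accepts a shorthand for the same nested structure, which expands to exactly the explicit syntax above:

# (1 | School/Subject) expands to (1 | School) + (1 | School:Subject)
model_three_level_alt <- lmer(Reaction ~ Days + (1 | School/Subject),
                              data = sleepstudy)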
We can also specify a random slope here. Let’s say we wanted to specify a random slope of Days that varies by School, but not by Subject within School, we could specify this as follows:
model_three_level <- lmer(Reaction ~ Days + (Days | School) + (1 | Subject:School), data = sleepstudy)
## boundary (singular) fit: see ?isSingular
This tutorial covered a lot of ground. By this point, you should understand the basic problem of hierarchical data and why ignoring this hierarchy is problematic. You should further understand the fixed effects and random effects methods of dealing with this hierarchy, as well as when you might prefer one method over the other. Finally, you should practically understand how to model various random effects structures.
Good luck with your own modeling!