“Of course, the most rewarding part is the ‘Aha’ moment, the excitement of discovery and enjoyment of understanding something new – the feeling of being on top of a hill and having a clear view. But most of the time, doing mathematics for me is like being on a long hike with no trail and no end in sight.” -Maryam Mirzakhani
“I am in a charming state of confusion.” -Ada Lovelace
My goal in this workshop is to introduce linear mixed-effects modeling in a way that is accessible while still exposing attendees to the mathematical equations and formulas that make up the method. This is to familiarize learners with the appropriate terminology, in hopes of demystifying it for future exposures.
At the most basic level, Linear Mixed Effects (LME) models can be thought of as regression models with added components that account for variation in the intercept and/or slope parameters across individuals (repeated measures context) or across groups (multilevel/hierarchical context). In fact, the LME approach is just an extension of the general linear model (GLM), which is the model underlying regression. Whereas a basic linear model contains only fixed effects that reflect the mean intercept and mean slope(s), the LME approach includes both fixed effects and random effects, hence the name “mixed effects modeling.”
Good question! To answer it, I will first explain what a fixed effect is in the context of a simple regression model. The simplest linear regression model includes an intercept parameter and a slope parameter. The estimated intercept is the expected value of the outcome (y) variable when the predictor (x) variable has a value of 0. The estimated slope is the expected change in y for a single unit change in x. Both of these parameters are what we call “fixed,” meaning that they do not vary across individuals; every individual in the population has the same expected value for the intercept and slope. In other words, an estimated fixed effect parameter is interpreted as the expected value (mean) for the population. When reporting these findings, we use phrases like “The average individual …” or “On average …”. Of course, the expected value won’t perfectly predict the true value for each individual or unit. The difference between the expected value and the true value for each unit/person is called “residual error.”
A random effect is a parameter that is allowed to vary across units or individuals. In other words, rather than taking on a single fixed value, random effects are assumed to follow a distribution. Typically, we assume that the random effects follow a Normal distribution. I will discuss this in more detail later. Random effects can be added to a model to account for variation about an intercept or slope parameter. For example, a random intercept may be used in a repeated measures context to account for variation across individuals in their baseline levels of y. The model would still contain a fixed intercept, representing the mean intercept for all individuals. Each individual then gets their own estimated random effect (from the distribution of random effects), representing an adjustment from the mean that is unique to the individual.
Another great question!
You should consider this approach if your data violate the GLM assumption of independence. Let’s quickly review this assumption. Specifically, it refers to independence of the residual errors, essentially meaning that all observations are unrelated after you account for the predictors in the model. Your study may include data that violate this assumption. For example, if repeated measures are nested within individuals (or if individuals are nested within groups), we actually expect that observations belonging to the same individual and/or the same group will be correlated. Ignoring the correlated error structure, or the “clustering” in the data, may lead to biased standard errors. Recall that the test statistics for hypothesis tests of parameter estimates include the standard error in the denominator, so biased standard errors may lead to incorrect inferences. Adding random effects that are assigned to individuals or to groups is one way to address correlated error structures in your model.
You may also approach this decision from a more theoretical perspective. In a repeated measures context, you may be interested in individual differences when it comes to change across time, or individual differences in intraindividual change. This may be theoretical motivation for a random slope model. Or, perhaps you are a researcher doing group-level research and you have reason to believe that the effect of \(x\) on \(y\) varies across groups. This would also lead you to a random slope model. I will give more specific examples in Section 3.
As mentioned in Section 1, the LME model is an extension of a basic linear model. Recall the model for a simple linear regression:
\[ Y_{i} = \beta_{0} + \beta_{1}x_{i} + \epsilon_{i} \] The outcome \(Y_{i}\) has a subscript “\(i\)”, indicating that it is predicted for each individual. On the right side of the equation, we see that \(Y_{i}\) is predicted by the individual’s value on predictor variable \(x\). The variable \(x\) is a linear predictor of \(Y\). You might recognize that this equation resembles the equation of a line (\(Y = mx + b\)). In regression, the linear equation includes an intercept (\(\beta_{0}\)) and a slope (\(\beta_{1}\)) parameter, as well as a residual error term (\(\epsilon_{i}\)) which represents individual uniqueness that is not explained by the model. The uniquenesses, \(\epsilon_{i}\), are typically assumed to follow a Normal distribution with mean \(0\) and variance \(\sigma^2\), or in statistical terms, \(\epsilon_{i} \sim N(0, \sigma^2)\).
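To make this interpretation concrete, here is a quick worked example with made-up numbers (these are not estimates from any dataset in this workshop). Suppose \(\beta_{0} = 10\) and \(\beta_{1} = 2\). A person with \(x_{i} = 3\) then has an expected outcome of
\[ \hat{Y}_{i} = 10 + 2(3) = 16, \]
and if that person’s observed outcome is \(Y_{i} = 18\), their residual error is \(\epsilon_{i} = 18 - 16 = 2\).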
Importantly, the intercept and slope are both fixed in a basic linear regression model. You can recognize that the intercept and slope are fixed because they do not include a subscript \(i\) or \(j\) (as we will see later). The lack of a subscript indicates that these parameters each only take on a single value that is meant to represent the entire population intercept or slope, respectively.
Now let’s take a look at the most basic LME model in comparison to the regression model above:
\[ Y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + \epsilon_{ij} \]
Look familiar? The only difference between this equation and the linear regression equation is that this one contains more subscripts. (But this simple update will give us so much more information, as you’ll see!) For the purpose of defining the model, let’s assume for now that the subscript \(j\) represents a group of individuals. In the LME model, \(i\) can take on any value in \((1, ..., N)\), where \(N\) is the number of individuals in the study. The subscript \(j\) may take on values in \((1, ..., J)\), where \(J\) is the number of groups in the study. In this model, recognize that each group is allowed its own unique intercept and slope. You could read the entire model as: “The outcome value for person \(i\) in group \(j\) is equal to the intercept for group \(j\), plus the slope for group \(j\) multiplied by the x-value for person \(i\) in group \(j\), plus some error that cannot be explained by the model for person \(i\) in group \(j\).” The errors \(\epsilon_{ij}\) are typically assumed to be independently and identically distributed (\(iid\)), that is, \(\epsilon_{ij} \sim N(0, \sigma^2)\).
If we think of the above expression as Level 1 of the LME model, we could define Level 2 as:
\[ \beta_{0j} = \gamma_{00} + u_{0j}\] \[ \beta_{1j} = \gamma_{10} + u_{1j}\]
where \(\gamma_{00}\) is the average intercept for all groups, and \(u_{0j}\) is the unique addition to the intercept for group \(j\). Similarly, \(\gamma_{10}\) is the average slope for all groups, and \(u_{1j}\) is an addition to the slope that is unique to group \(j\). This LME model has two fixed effects, \(\gamma_{00}\) and \(\gamma_{10}\). The Greek symbol used for the fixed effects, \(\gamma\), is pronounced “gamma.” The model also has two random effects: \(u_{0j}\) is the random intercept, and \(u_{1j}\) is the random slope. It is also an option to include only a random intercept or only a random slope. More complex models are possible as well, but beyond the scope of this workshop.
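Substituting the Level 2 equations into the Level 1 equation gives the combined, single-equation form of the same model:
\[ Y_{ij} = \gamma_{00} + \gamma_{10}x_{ij} + u_{0j} + u_{1j}x_{ij} + \epsilon_{ij} \]
The first two terms make up the fixed part of the model, and the last three terms make up the random part. We will use this same substitution trick in Section 3.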
The random effects are typically assumed to follow a joint Normal distribution with mean \(0\) and covariance matrix G:
\[ G = \left[\begin{array} {rr} \tau_{00} & \tau_{01} \\ \tau_{10} & \tau_{11} \end{array}\right] \]
In simpler language, the random effects follow a joint Normal distribution with mean equal to 0, and a covariance matrix that includes some important parameters: \(\tau_{00}\) is the variance of the random intercepts, \(\tau_{11}\) is the variance of the random slopes, and \(\tau_{01} = \tau_{10}\) is the covariance between the random intercepts and the random slopes.
In this section, I provide a couple of brief fictional scenarios in which a researcher may want to use the LME approach.
The classic example of a hierarchical/multilevel design is the nesting structure of students within classrooms, classrooms within schools, schools within districts, and so on. Imagine a study on teacher burnout in junior high schools, in which researchers conduct a survey of perceived administrative support (PAS) and use PAS as a predictor of burnout. The researchers are studying a sample of junior high school teachers from the Sacramento region of Northern California. The sample includes 200 (\(N\)=200) teachers from 7 different schools (\(J\)=7).
To study the effect of PAS on teacher burnout, we must first recognize that observations from teachers within the same school are likely to be more similar to one another than observations from teachers at different schools. As mentioned in Section 1, ignoring this fact could result in biased standard errors. One solution would be to account for the clustering using an LME model.
In this example, Level 1 of the random-intercept random-slope model would look like this:
\[ Burnout_{teacher|school} = \beta_{0school} + \beta_{1school}PAS_{teacher|school} + \epsilon_{teacher|school} \] and Level 2 may be:
\[ \beta_{0school} = \gamma_{00} + u_{0school}\]
\[ \beta_{1school} = \gamma_{10} + u_{1school}\] Notice we could substitute the Level 2 equations in for the Level 1 coefficients to achieve:
\[ Burnout_{teacher|school} = (\gamma_{00} + u_{0school}) + (\gamma_{10} + u_{1school})PAS_{teacher|school} + \epsilon_{teacher|school} \] Let’s break down each term in the model: \(\gamma_{00}\) is the fixed intercept (the average burnout level, across schools, when PAS = 0); \(u_{0school}\) is the random intercept (a school-specific adjustment to that average intercept); \(\gamma_{10}\) is the fixed slope (the average effect of PAS on burnout across schools); \(u_{1school}\) is the random slope (a school-specific adjustment to that effect); and \(\epsilon_{teacher|school}\) is the residual error that is unique to each teacher.
By plugging values into this equation, you could quickly see that a school with a negative random intercept value will have a lower mean teacher burnout score than average, and a school with a positive random intercept will have a higher than average teacher burnout score.
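For example, with made-up numbers (not estimates from any real dataset): if the fixed intercept is \(\gamma_{00} = 3.5\) and a particular school’s random intercept is \(u_{0school} = -0.4\), then that school’s own intercept is
\[ \beta_{0school} = 3.5 + (-0.4) = 3.1, \]
so at PAS = 0, teachers at that school are expected to report burnout 0.4 points below the average school.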
When implementing the model using statistical software, you will get an estimate for the variance of each random effect, as well as the covariance between the random effects. I introduced this variance-covariance matrix, \(G\), in Section 2. In this example, a large random intercept variance would indicate a lot of variation in teacher burnout between schools. A large random slope variance would indicate that the effect of PAS on burnout varies greatly from school to school. A positive covariance between the random intercept and random slope would indicate that schools with higher average burnout levels also tend to have larger (more positive) slopes for PAS, relative to other schools.
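As a preview of the syntax covered in Section 4, a random-intercept, random-slope model like this one could be specified with lme4 roughly as follows. This is only a sketch: the data frame teacherdat and its columns Burnout, PAS, and School are hypothetical names, not data from this workshop.
library(lme4)
# Hypothetical data frame 'teacherdat' with columns Burnout, PAS, and School.
# (1 + PAS | School) requests a random intercept and a random slope for PAS,
# both allowed to vary across schools.
burnout_mod <- lmer(Burnout ~ PAS + (1 + PAS | School), data = teacherdat)
summary(burnout_mod)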
Repeated measures designs also necessarily lead to clustered data because multiple observations on the same measure are nested within an individual. For example, imagine a researcher is interested in studying the effectiveness of on-campus pet therapy in decreasing stress levels of college students. A sample of 150 college students was recruited to participate in a 5-week program during which they spent four hours per week engaging in pet therapy. Participants each completed a stress inventory at the start of the study and then again during each week of the intervention program, resulting in a total of 6 observations per individual. For the record, I fully believe in the therapeutic effect of dogs! Here is a photo of me and my dog, Fisher (yes, named after R.A. himself).
In this example, we may reasonably hypothesize that pet therapy has a similar effect on all individuals, but we may want to account for the fact that students will vary in their baseline levels of stress (imagine parallel decreasing lines representing decrease in stress across time for each individual). This would lead us to fit a random intercept, fixed slope model to the data. The Level 1 model is thus:
\[ Stress_{time|student} = \beta_{0student} + \beta_{1}Time_{time} + \epsilon_{time|student} \]
and Level 2 of the model is:
\[ \beta_{0student} = \gamma_{00} + u_{0student}\]
yielding the full model:
\[ Stress_{time|student} = (\gamma_{00} + u_{0student}) + \beta_{1}Time_{time} + \epsilon_{time|student} \] in which \(\gamma_{00}\) is the fixed intercept, or the population mean stress level at Time = 0. Notice that the random intercept \(u_{0student}\) is unique to the individual, and essentially acts like a deviation score from the population mean stress level. An individual with a negative value of \(u_{0student}\) has a lower than average mean stress level, and a person with a positive value of \(u_{0student}\) has a higher than average mean stress level. The variance of this random effect would be an indicator of how much students vary in their mean stress levels.
You could think about adding a random slope to this model, as well. A random slope would account for individual differences in change across the course of the pet therapy intervention program. Pop quiz: If you were to add a random slope to the above model and fit this model to data, you would get an estimate of the covariance between the random intercept and random slope (\(\tau_{01}\)). What would a negative value of \(\tau_{01}\) tell us?
To demonstrate the model, I use simulated data generated using the code below. The variable names correspond to the repeated measures example given in Section 3. I am using the lmer function from the lme4 package in R, but most widely available programs have LME functionality, including SAS, SPSS, and Stata.
library(lme4)     # for fitting the mixed-effects models below
library(dplyr)    # for the group_by()/summarise() descriptives
library(ggplot2)  # for the plots

set.seed(222)

N <- 150      # number of students
nobs <- 6     # observations per student (baseline + 5 weeks)
sigma <- 20   # SD of the residual errors
sig00 <- 10   # SD of the random intercepts
b0 <- 80      # fixed intercept (mean stress at Time = 0)
b1 <- -6      # fixed slope for Time

ID <- rep(1:N, each = nobs)
Time <- rep(0:5, times = N)
dat <- data.frame(ID, Time)

u0 <- rnorm(N, 0, sig00)              # one random intercept per student
dat$u0 <- rep(u0, each = nobs)        # repeat it across that student's rows
dat$err <- rnorm(N * nobs, 0, sigma)  # residual error for every observation

dat$Stress <- (b0 + dat$u0) + b1*dat$Time + dat$err
dat <- dat[, -c(3:4)]                 # drop the u0 and err columns
summary(dat)
## ID Time Stress
## Min. : 1.0 Min. :0.0 Min. : -4.232
## 1st Qu.: 38.0 1st Qu.:1.0 1st Qu.: 47.452
## Median : 75.5 Median :2.5 Median : 64.205
## Mean : 75.5 Mean :2.5 Mean : 64.953
## 3rd Qu.:113.0 3rd Qu.:4.0 3rd Qu.: 81.453
## Max. :150.0 Max. :5.0 Max. :147.064
names(dat)
## [1] "ID" "Time" "Stress"
group_by(dat, Time) %>%
summarise(
count = n(),
mean = mean(Stress, na.rm=TRUE),
sd = sd(Stress, na.rm=TRUE)
)
## # A tibble: 6 x 4
## Time count mean sd
## <int> <int> <dbl> <dbl>
## 1 0 150 77.4 22.5
## 2 1 150 78.6 22.8
## 3 2 150 70.9 22.9
## 4 3 150 61.1 20.5
## 5 4 150 51.6 22.9
## 6 5 150 50.2 19.9
ggplot(dat, aes(x = Time, y = Stress, group=ID, color = ID)) +
geom_line(alpha=.3) + labs(title="Stress Levels of Students Across Pet Therapy Intervention Program") +
theme(legend.position = "none")
# Plot trajectories for a random subset of 20 students
rando <- sample(unique(dat$ID), 20)
indplot <- subset(dat, ID %in% rando)
ggplot(indplot, aes(x=Time, y=Stress)) + geom_line(color="tomato") +
facet_wrap(~ID) +
labs(x="Time", y="Stress Score", title="Individual Stress Trajectories\nfor a Random Sample of Students") +
theme(axis.text.x=element_blank(), axis.ticks.x=element_blank())
There are two notable packages for running LME models in R. The nlme package (“nlme” = “nonlinear mixed effects”) was the first fully developed and popularized package for mixed-effects analysis. More recently, the lme4 package was developed. The two packages are very similar. Some differences are that nlme still has better functionality for nonlinear models, but lme4 is more advanced when it comes to estimation methods, and is therefore often more efficient.
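For comparison, here is a sketch of how the random-intercept model fit with lmer later in this section (model1) could be written in nlme syntax, assuming the same dat object simulated above (output not shown):
library(nlme)
# Random-intercept model in nlme syntax; method = "ML" requests maximum
# likelihood estimation (the analogue of REML = FALSE in lme4)
model1_nlme <- lme(Stress ~ Time, random = ~ 1 | ID, data = dat, method = "ML")
summary(model1_nlme)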
First let’s take a look at Stress levels as a function of Time to observe how Stress levels change with continued exposure to pet therapy. I fit a simple OLS linear regression model below:
model0 <- lm(Stress ~ Time, data=dat)
summary(model0)
##
## Call:
## lm(formula = Stress ~ Time, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -65.285 -16.382 -0.425 15.504 65.932
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.1328 1.3067 62.09 <2e-16 ***
## Time -6.4718 0.4316 -14.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.11 on 898 degrees of freedom
## Multiple R-squared: 0.2002, Adjusted R-squared: 0.1994
## F-statistic: 224.8 on 1 and 898 DF, p-value: < 2.2e-16
In the output, we see an estimate for each fixed effect in our regression model. The fixed intercept is estimated to be 81.1 and the fixed slope is estimated to be -6.5. Therefore, on average we expect that baseline stress scores are around 81, and the average college student (assuming our sample was representative, of course) would expect to see about a 6.5-point decrease in stress with each week of exposure to pet therapy.
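If you also want interval estimates for these fixed effects, base R’s confint() will compute confidence intervals from the fitted lm object (output not shown here):
# 95% confidence intervals for the intercept and the Time slope
confint(model0)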
Next let’s add in the random intercept to account for individual differences in baseline stress levels.
model1 <- lmer(Stress ~ Time + (1|ID), data=dat, REML=FALSE)
summary(model1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Stress ~ Time + (1 | ID)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 8132.9 8152.1 -4062.4 8124.9 896
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9317 -0.7365 -0.0144 0.6990 2.9661
##
## Random effects:
## Groups Name Variance Std.Dev.
## ID (Intercept) 5.607 2.368
## Residual 482.278 21.961
## Number of obs: 900, groups: ID, 150
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 81.1328 1.3121 61.84
## Time -6.4718 0.4286 -15.10
##
## Correlation of Fixed Effects:
## (Intr)
## Time -0.817
Notice that the estimates for the fixed effects look very similar to those in the OLS regression output. This is expected. However, the standard errors of the fixed effects may be biased in the OLS model, where we don’t account for the clustering of observations within individuals, a violation of the assumption of independence. This output also includes a ‘Random Effects’ table, which now shows the variance attributable to individual differences in stress as well as the residual variance.
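If you want to look at the estimated effects themselves, lme4 provides helper functions for pulling them out of the fitted model. A brief sketch (output not shown):
# Estimated fixed effects (the gamma parameters)
fixef(model1)
# Predicted random intercepts, one per student (the u_0 values)
head(ranef(model1)$ID)
# Student-specific coefficients: fixed effects plus each student's random effects
head(coef(model1)$ID)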
In practice, you may have reason to believe that individuals also vary in their response to pet therapy across time. We could add a random slope to our model to account for this. We fit Model 2 as such:
model2 <- lmer(Stress ~ Time + (1 + Time|ID), data=dat, REML=FALSE)
## boundary (singular) fit: see ?isSingular
summary(model2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Stress ~ Time + (1 + Time | ID)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 8135.3 8164.1 -4061.6 8123.3 894
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.83814 -0.73046 -0.01787 0.69831 3.01069
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## ID (Intercept) 34.9908 5.9153
## Time 0.9964 0.9982 -1.00
## Residual 473.2834 21.7551
## Number of obs: 900, groups: ID, 150
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 81.1328 1.3733 59.08
## Time -6.4718 0.4324 -14.97
##
## Correlation of Fixed Effects:
## (Intr)
## Time -0.825
## convergence code: 0
## boundary (singular) fit: see ?isSingular
The singular fit warning here is hardly a surprise, given that we have no reason to expect individual variation in the slope parameter. This example is to demonstrate that the random effects table now includes an additional estimated random effect for Time. This parameter represents individual variation in the slope, or in how stress changes in response to continued exposure to pet therapy. We also see a correlation between the random intercept and random slope. I won’t interpret the output given that the model had estimation issues and may be misspecified (i.e., it isn’t the right model for these data).
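If you did want to formally compare the random-intercept model with the random-intercept, random-slope model, a common approach is a likelihood ratio test, which lme4 carries out when you pass both nested models to anova(). A quick sketch, keeping in mind that model2 had a singular fit, so this is for demonstration only (output not shown):
# Likelihood ratio test comparing the two nested models (both fit with ML)
anova(model1, model2)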
This tutorial was a very brief introduction to linear mixed-effects models. There are so many more complex applications of this method, including adding more levels (time nested within student, students nested within schools, schools nested within districts, etc), adding predictors at Level 2 (i.e. adding predictors to explain some of the variance across groups), and more. I hope that you have learned something here today that will set you on your journey to continue learning about LME models.
If you have any questions specific to this tutorial, please feel free to send me an email at melissa.mcternan@bc.edu.