Corr in the output of mixed-effects models?

When fitting mixed-effects regression models, especially ones that try to "keep it maximal" (as per Barr et al., 2013), the random-effects section of the model output sometimes displays a column named Corr, where some rows contain numbers ranging from -1 to 1.
It’s easy to guess that Corr stands for correlation and that the numbers in the column are correlation coefficients. If there are multiple predictors and the random effects structure includes more than one of those terms (e.g., (1 + Effect_1 * Effect_2 | Subject)), we even get another clue for this from the way that the Corr column spreads out in the shape of a right triangle, much like in a correlation matrix.
Although we're bound to have encountered this at some point when working with or reading about mixed-effects models, I've found that there isn't much beginner-friendly material explaining what these numbers are - there are one-paragraph StackExchange answers and dense statistics papers, but not much in between that is both accessible and thorough.
So here are the notes on correlation parameters in linear mixed-effects models that I compiled for myself (assuming a basic knowledge of LMEMs), which I hope others may find useful as well.
For the purposes of this discussion, I have created a toy experiment dataset (the code used to generate it is attached at the bottom).
The dataset toydata has 1,920 rows with the following columns:
Subject: The subject ID, which ranges from 1 to 80
Item: The item ID, which ranges from 1 to 24
Condition: The experimental condition, which is either Control or Treatment
Response: A continuous response variable

toydata
## # A tibble: 1,920 x 4
## Subject Item Condition Response
## <fct> <fct> <fct> <dbl>
## 1 1 1 Control 226.
## 2 1 2 Treatment 300.
## 3 1 3 Control 239.
## 4 1 4 Treatment 262.
## 5 1 5 Control 241.
## 6 1 6 Treatment 264.
## 7 1 7 Control 237.
## 8 1 8 Treatment 230.
## 9 1 9 Control 229.
## 10 1 10 Treatment 283.
## # ... with 1,910 more rows
Imagine toydata to be the results from a very simple experiment. In this imaginary experiment, there are 80 subjects and each subject is tested on 24 items, resulting in a total of 1,920 trials/observations. This is a within-participant design, so each participant sees 12 of the items in the Control condition and the other 12 in the Treatment condition.
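As a quick sanity check of this design, we can count trials per subject and condition (a small sketch using dplyr, which the setup code at the bottom loads via the tidyverse); every Subject-Condition combination should appear 12 times:

# Each subject should contribute 12 Control trials and 12 Treatment trials
toydata %>%
  count(Subject, Condition) %>%
  head()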
Let’s say that with our toy data, we want to know whether Condition has an effect on Response. Our goal by using mixed-effects modeling is to isolate the effect of Condition on Response (fixed effect), while controlling for by-item and by-subject variations (random effects). So let’s fit a simple linear mixed-effects model with the maximal random effects structure, with random intercepts and slopes for Condition by Subject and Item:
model <- lmer(Response ~ Condition + (1+Condition|Subject) + (1+Condition|Item),
REML = FALSE, control = lmerControl('bobyqa'), data = toydata)
We can really quickly check model assumptions with the very convenient performance::check_model() function:
check_model(model)
Everything looks okay, so let’s look at the model output:
summary(model)
## Linear mixed model fit by maximum likelihood . t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula:
## Response ~ Condition + (1 + Condition | Subject) + (1 + Condition |
## Item)
## Data: toydata
## Control: lmerControl("bobyqa")
##
## AIC BIC logLik deviance df.resid
## 13267 13317 -6624 13249 1911
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.176 -0.627 -0.065 0.576 4.864
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 637.1 25.24
## ConditionTreatment 108.1 10.40 0.85
## Item (Intercept) 44.4 6.66
## ConditionTreatment 308.2 17.56 0.14
## Residual 37.4 6.11
## Number of obs: 1920, groups: Subject, 80; Item, 24
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 209.64 3.14 100.84 66.8 < 2e-16 ***
## ConditionTreatment 35.88 3.78 28.52 9.5 2.5e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr)
## CndtnTrtmnt 0.291
The t-statistic and p-value of ConditionTreatment in the fixed effects output suggest that our data would be extremely unlikely under the null hypothesis that Condition has no effect on Response (and further suggest a positive effect of the Treatment condition on Response compared to the Control condition). This is strong evidence in support of our hypothesis that Condition indeed has an effect on Response.
That’s great but we’re more interested in the random effects here, so let’s look at that.
We can isolate the random effects from the summary() object by extracting its varcor component:
summary(model)$varcor
## Groups Name Std.Dev. Corr
## Subject (Intercept) 25.24
## ConditionTreatment 10.40 0.85
## Item (Intercept) 6.66
## ConditionTreatment 17.56 0.14
## Residual 6.11
Let’s ignore the Corr column for a moment and talk about Std.Dev first.
The Std.Dev. values for the Subject random effects group suggest that the variation in the subject intercepts is fitted with a larger sd of 25.24 and that the variation in the subject slopes for Condition is fitted with a smaller sd of 10.4.
Let's plot the by-subject random effects for visualization:
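The original figure code isn't shown here, but a minimal sketch of one way to visualize this (using the ranef() output and ggplot2) could look like the following; the histograms share an x axis so the spreads are comparable:

# Not the original figure code; one way to sketch the by-subject variation:
# histograms of the fitted random intercepts and random slopes
ranef(model)$Subject %>%
  tibble::rownames_to_column("Subject") %>%
  tidyr::pivot_longer(-Subject, names_to = "Term", values_to = "Estimate") %>%
  ggplot(aes(x = Estimate)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ Term)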
We see a clear variation in the intercepts, and a subtler variation in the slopes. This is overall pretty consistent with the larger sd for subject intercepts and the smaller sd for subject slopes that we found earlier.
Let’s plot the by-item variation as well:
Here, we see the opposite: a clear variation in the slopes, and a subtler variation in the intercepts. Again, this is overall pretty consistent given the item random effects output: a larger sd for the item slopes and a smaller sd for item intercepts, as we found earlier.
Now that we've reviewed Std.Dev., let's talk about Corr, which is our main focus here.
Looking at the Corr column now, we see two numbers: 0.85 within the Subject random effects group and 0.14 within the Item random effects group.
## Groups Name Std.Dev. Corr
## Subject (Intercept) 25.24
## ConditionTreatment 10.40 0.85
## Item (Intercept) 6.66
## ConditionTreatment 17.56 0.14
## Residual 6.11
As you might have guessed, they have something to do with the correlation between random effects (intercept and slope) within each group (subject and item).
But if we extract the subject random effects, for example, and measure the correlation between subject intercepts and subject slopes, we get a slightly different number:
# Extract by-subject intercept and slope
ranef_subj <- ranef(model)$Subject
ranef_subj_intercepts <- ranef_subj$`(Intercept)`
ranef_subj_slopes <- ranef_subj$ConditionTreatment
# Calculate correlation
cor(ranef_subj_intercepts, ranef_subj_slopes)
## [1] 0.88
In fact, we get slightly different values for the standard deviation of the subject random intercepts and slopes as well:
# Calculate the standard deviation of by-subject intercepts and slopes
summarize_all(ranef_subj, sd)
## (Intercept) ConditionTreatment
## 1 25.3 10.18
So it looks like the model isn't just taking the random effects in our toydata dataset and calculating their variation and correlation. So then what IS it doing?
What we have to keep in mind when doing mixed effects modeling is that the model is FITTING the random effects in the data, rather than just DESCRIBING them. More specifically, the model is ESTIMATING population parameters that generated the sample of random effects that are seen in the data.
And in fact that’s exactly what we want to do. We don’t care about how individual subjects or items behave in our experiment, in the sense that we don’t care how fast John Doe presses a button, for example. We don’t want to predict John Doe’s behavior, but we do want to estimate, using data from John Doe, Jane Doe, Average Joe, and other participants from our experiment, the overall distribution of people’s idiosyncratic tendencies so that we can statistically control for them to get a better estimate for our fixed effects that we care about.
Take the random intercepts by subject for example. The model estimated the distribution of subject intercepts to follow a normal distribution with sd = 25.24, which is an estimate for the Population. The variation in subject intercepts in the data itself that we manually calculated above (25.3) is the Sample standard deviation. Of course, if the sample follows a normal distribution and if we also assume the population to be normally distributed, the sample variance should be the best estimate for the population variance. And indeed the two end up being very close, but there are some differences, in part due to noise, but also because the model is doing other calculations to fit the data.
So the numbers in the Std.Dev. column are the model's FIT for the VARIATION within each random effect.
With this, we can better understand the numbers in the Corr column: they are the model's FIT for the CORRELATION between random effects.
To go deeper into our discussion, let's plot the intercepts against the slopes for our 80 subjects:
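The plotting code isn't included in the original; a bare-bones version of this scatterplot (reusing ranef_subj from above) might look like this:

# Scatterplot of by-subject random intercepts against random slopes
ggplot(ranef_subj, aes(x = `(Intercept)`, y = ConditionTreatment)) +
  geom_point() +
  labs(x = "Subject intercept", y = "Subject slope")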
Recall that when we manually calculated the correlation between subject intercepts and subject slopes within our sample of 80 subjects in the data, we got 0.88. That is in fact what is shown by the plot above.
And as we discussed earlier, the numbers in the Std.Dev. column are the model’s fit for the variation within each random effect. So the model is saying that the variation for subject intercepts follows a normal distribution with mean = 0 and sd = 25.24. Likewise, the model is saying that the variation for subject slopes follows a normal distribution with mean = 0 and sd = 10.4.
So the model estimates these two distributions - one for subject slopes and one for subject intercepts - to capture the overall distribution of subject random effects.
This is illustrated below, where the normal curve at the top is the distribution of subject intercepts estimated by the model, and the normal curve to the right is the distribution of subject slopes estimated by the model. For every subject, their intercepts and slopes are assumed to be generated from these two parameters:
But is specifying a separate distribution for the intercepts and another for the slopes enough to capture the overall distribution of subject random effects?
One way to test this is to work backwards and generate some observations from the model’s parameters. The idea here is this: if the two normal distributions (one for subject intercept and one for subject slope) can sufficiently capture the distribution of the subject random effects, then sampling from them should yield a distribution that is in the shape of the actual distribution in our data.
Let’s draw 80 samples of subject intercepts and subject slopes from their respective distributions and then plot them together. Here’s one result:
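Here's a sketch of that sampling step (the object names are just for illustration), drawing the intercepts and slopes independently from the two fitted univariate normals:

# Independently draw 80 intercepts and 80 slopes from the model's
# estimated univariate normals (means of 0, sds of 25.24 and 10.4)
independent_sample <- tibble(
  Intercept = rnorm(80, mean = 0, sd = 25.24),
  Slope     = rnorm(80, mean = 0, sd = 10.4)
)

ggplot(independent_sample, aes(x = Intercept, y = Slope)) +
  geom_point()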
These points are consistent with what the two distributions predict - there are more points towards the center (the means of the distributions) and fewer points towards the corners (the tails of the distributions).
But this doesn’t look like the actual distribution of our subject random effects.
We can repeat this sampling procedure many times, but none of them look close to the distribution of the subject random effects in our data:
Well, what's missing here is the correlation between the intercepts and slopes, which I conveniently left out to demonstrate that specifying only the individual distributions for subject intercepts and subject slopes poorly captures the actual distribution of subject random effects in our data.
In more technical terms, treating the two random effects as independently sampled from their respective distributions fails to fit the data well because the two random effects are highly correlated. They should instead be treated as being jointly sampled from a bivariate distribution.
And that’s exactly what adding the correlation parameter does. Let’s break this down.
When we say that two variables are independently sampled from two distributions (as we just did above), then their joint distribution looks something like this, where most of the data is expected to fall within the grey shaded ellipse:
This is clearly a bad fit for the distribution of our subject random effects…
… because the distribution of the subject random effects actually takes the shape of a tilted ellipse instead (dotted outline):
In fact, we cannot generate any distribution of a tilted shape with just two independent distributions for each variable. We need to factor in covariation to capture the correlation between variables. Barr et al. (2013) illustrates this clearly in the supplementary materials to their paper. You can see from the plot below (originally Figure 1 on the linked page) that without the correlation parameter, you can only capture distributions that are symmetrical with respect to the axes (the darker ellipses). However, once you add in a correlation parameter (\(\rho\)), you can capture distributions that are in the “tilted” shape (the lighter ellipses) like the distribution of our highly correlated subject intercepts and subject slopes.
Here’s the model’s output again. Let’s keep focusing on the subject random effects for now.
## Groups Name Std.Dev. Corr
## Subject (Intercept) 25.24
## ConditionTreatment 10.40 0.85
## Item (Intercept) 6.66
## ConditionTreatment 17.56 0.14
## Residual 6.11
There are three parameters that the model estimated to capture the by-subject variation:
The variation (Std.Dev.) for subject intercept
The variation (Std.Dev.) for subject slope
The correlation (Corr) between subject intercept and subject slope.
With these three parameters, the model is defining a bivariate normal distribution over subject intercepts and subject slopes:
\[(S_{I}, S_{S})\ \sim\ N(0,\begin{bmatrix}s_{I}^2 & \rho\ s_{I} s_{S} \\ \rho\ s_{I} s_{S} & s_{S}^2 \end{bmatrix})\]
For the covariance matrix, we can substitute the standard deviation for the subject intercept \(s_{I}\) with 25.24, the standard deviation for the subject slope \(s_{S}\) with 10.4, and the correlation \(\rho\) with 0.85 to yield the following joint distribution:
\[(S_{I}, S_{S})\ \sim\ N(0,\begin{bmatrix}25.24^2 & 0.85\ \times\ 25.24\ \times\ 10.4 \\ 0.85\ \times\ 25.24\ \times\ 10.4 & 10.4^2 \end{bmatrix})\]
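You don't have to plug these numbers in by hand, by the way: the fitted covariance matrix for the Subject terms can be pulled straight out of the model with VarCorr(), and its standard deviations and correlation are stored as attributes:

# Fitted variance-covariance matrix of the by-subject random effects
VarCorr(model)$Subject

# Standard deviations and correlation stored as attributes
attr(VarCorr(model)$Subject, "stddev")
attr(VarCorr(model)$Subject, "correlation")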
If subject intercepts and subject slopes are jointly sampled from the above distribution, most observations should fall within the grey area:
Let’s again repeatedly sample from this new bivariate distribution (which you can do with MASS::mvrnorm()) to check:
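Roughly, each of those draws can be generated like this (a sketch; the covariance matrix is built from the fitted values above):

# Build the fitted covariance matrix and jointly draw 80 intercept-slope pairs
subj_cov <- matrix(c(25.24^2,             0.85 * 25.24 * 10.4,
                     0.85 * 25.24 * 10.4, 10.4^2),
                   ncol = 2)

joint_sample <- MASS::mvrnorm(80, mu = c(0, 0), Sigma = subj_cov)
colnames(joint_sample) <- c("Intercept", "Slope")

ggplot(as_tibble(joint_sample), aes(x = Intercept, y = Slope)) +
  geom_point()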
Like we expected, this new distribution generates observations of subject slopes and subject intercepts that are highly correlated. But more importantly, the distribution of subject random effects in our data looks like it could be one of these samples, meaning that this bivariate normal distribution fits our data well.
Good thing that we had the model estimate this parameter by specifying the random effects structure for subjects as (1 + Condition | Subject) in our model formula!
We can check by building another model without the correlation term between the subject random effects and comparing it with our original model. Removing a correlation term in lmer() is actually sort of tricky and just using the double bar syntax (||) doesn’t always work. I won’t go into the details of how to do that here, but there are good discussions of doing this using model.matrix() in this Rpubs post and Section 5.4 (also Appendix E) of Frossard and Renaud (2019).
The no_subj_cor_model below is a depleted model without the correlation parameter between the random intercepts and random slopes by subject. You can see that the subject group is missing a value in the Corr column.
no_subj_cor_model <- lmer(Response ~ Condition + (1+model.matrix(model)[,2]||Subject) + (1+Condition|Item),
REML = FALSE, control = lmerControl('bobyqa'), data = toydata)
no_subj_cor_model
## Linear mixed model fit by maximum likelihood ['lmerModLmerTest']
## Formula:
## Response ~ Condition + (1 + model.matrix(model)[, 2] || Subject) +
## (1 + Condition | Item)
## Data: toydata
## AIC BIC logLik deviance df.resid
## 13352 13396 -6668 13336 1912
## Random effects:
## Groups Name Std.Dev. Corr
## Subject (Intercept) 25.48
## Subject.1 model.matrix(model)[, 2] 10.53
## Item (Intercept) 6.68
## ConditionTreatment 17.66 0.14
## Residual 6.11
## Number of obs: 1920, groups: Subject, 80; Item, 24
## Fixed Effects:
## (Intercept) ConditionTreatment
## 209.6 35.9
Now let’s compare it to our original model using anova():
anova(no_subj_cor_model, model, test = 'Chisq')
## Data: toydata
## Models:
## no_subj_cor_model: Response ~ Condition + (1 + model.matrix(model)[, 2] || Subject) +
## no_subj_cor_model: (1 + Condition | Item)
## model: Response ~ Condition + (1 + Condition | Subject) + (1 + Condition |
## model: Item)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## no_subj_cor_model 8 13352 13396 -6668 13336
## model 9 13267 13317 -6624 13249 87.2 1 <2e-16
##
## no_subj_cor_model
## model ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The first thing to notice is that no_subj_cor_model has one less Df, or degrees of freedom, than model. This is because every correlation between random effects is an additional parameter that the model is estimating. So removing the correlation between random effects for no_subj_cor_model leaves it with 8 parameters, one less than the original model. There’s a good discussion of what parameters are specified by different lmer() formulas in this StackExchange thread.
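One quick way to see where those parameter counts come from (a sanity-check sketch) is to count the variance/covariance parameters ("theta") that lme4 estimates for each random effects structure; the residual standard deviation and the two fixed effects make up the rest:

# An unstructured 2x2 covariance term contributes 3 parameters (two sds + a correlation),
# a diagonal (zero-correlation) term only 2
length(getME(model, "theta"))              # 6 = 3 (Subject) + 3 (Item)
length(getME(no_subj_cor_model, "theta"))  # 5 = 2 (Subject) + 3 (Item)
# Adding the residual sd and the two fixed effects gives the Df of 9 and 8 above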
After performing this sanity check, the next thing to note is the very low number in Pr(>Chisq), telling us that the models are significantly different from one another. This might come off as weird if you’re only used to performing ANOVA comparisons to check whether a predictor is significant or not. In fact, there are no obvious differences between the output of no_subj_cor_model and the output of our original model (printed again below) other than the presence/absence of the correlation between subject random effects:
model
## Linear mixed model fit by maximum likelihood ['lmerModLmerTest']
## Formula:
## Response ~ Condition + (1 + Condition | Subject) + (1 + Condition |
## Item)
## Data: toydata
## AIC BIC logLik deviance df.resid
## 13267 13317 -6624 13249 1911
## Random effects:
## Groups Name Std.Dev. Corr
## Subject (Intercept) 25.24
## ConditionTreatment 10.40 0.85
## Item (Intercept) 6.66
## ConditionTreatment 17.56 0.14
## Residual 6.11
## Number of obs: 1920, groups: Subject, 80; Item, 24
## Fixed Effects:
## (Intercept) ConditionTreatment
## 209.6 35.9
But clearly something major is going on behind the curtains, so we turn to the last terms of interest - AIC and BIC, which are scores for model fit. The numbers are hard to interpret on their own, but they are useful when comparing models. Here, both the AIC and the BIC of no_subj_cor_model are higher than those of model, suggesting that no_subj_cor_model has a worse fit, and a significantly worse one at that.
More specifically, we know that the only meaningful difference between no_subj_cor_model and model is the correlation parameter for the subject random effects, so no_subj_cor_model must be capturing the subject random effects relatively poorly under its assumption that subject intercepts and subject slopes do not correlate with one another (i.e., that they are independent).
So let’s look at the random effects calculated by no_subj_cor_model and its poor attempt at fitting their distribution.
First, let’s plot the subject intercepts by subject slopes like we did for our original model:
You might notice that no_subj_cor_model calculates subject random effects that are very similar to those calculated by our original model. Here's a side-by-side comparison of the subject random effects from model and no_subj_cor_model:
This illustrates a very important point. Removing the correlation parameter does not change the calculation of the random effects (barring any serious convergence failures, of course). This shouldn't be surprising, because random effects, like fixed effects, speak to FACTS (in the frequentist sense) about how the data that we observe is generated. That is literally the case here, since I included these random effects explicitly in making toydata. But more importantly, the idea that there are random variations generated from underlying population-level parameters is an assumption that we are making when we use mixed-effects models.
The only meaningful difference between the two models here, then, is in their fit - i.e., how well each model captures the distribution of the subject random effects. We actually went over this above - we saw that our original model fits the subject random effects using a bivariate normal distribution that assumes a correlation, while no_subj_cor_model fits them using two univariate normal distributions, assuming no (i.e., zero) correlation.
Here’s a visual comparison of model fit, with the plot for model at the top and the plot for no_subj_cor_model at the bottom:
The term \(\rho = 0\) in the plot for no_subj_cor_model indicates that the subject intercepts and subject slopes are generated from this bivariate distribution:
\[(S_{I}, S_{S})\ \sim\ N(0,\begin{bmatrix}s_{I}^2 & 0 \\ 0 & s_{S}^2 \end{bmatrix})\]
Which is the same as sampling from these two univariate distributions:
\[S_I \sim N(0, s_I^2)\]
\[S_S \sim N(0, s_S^2)\]
Now, looking at the plots, no_subj_cor_model (bottom) clearly fits the distribution of the subject random effects poorly compared to our original model (top), and that appears to be what's driving the significant decrease in fit that we found in the ANOVA comparison earlier. It seems that including the correlation parameter between the subject random intercepts and slopes is necessary to fit the data well.
The question of how "maximal" our models should be is very tricky, and especially so when it concerns the inclusion/exclusion of correlation parameters (see discussion here and here). But one thing is for sure: we should not be blindly fitting maximal models, especially if doing so leads to convergence issues (see Bates et al., 2015).
I’ll demonstrate one case here where it doesn’t seem like including a correlation parameter particularly improves model fit.
Let’s repeat the model comparison process we just did, except this time taking out the correlation parameter for item.
For context, here is the random effects output of model again:
## Groups Name Std.Dev. Corr
## Subject (Intercept) 25.24
## ConditionTreatment 10.40 0.85
## Item (Intercept) 6.66
## ConditionTreatment 17.56 0.14
## Residual 6.11
Again, there are three parameters that the model estimated to capture the by-item variation:
The variation (Std.Dev.) for item intercept
The variation (Std.Dev.) for item slope
The correlation (Corr) between item intercept and item slope.
And here is what the distribution of item random effects from model looks like:
Our model fitted a bivariate normal distribution with the standard deviation of item intercepts = 6.66, the standard deviation of item slopes = 17.56, and correlation = 0.14.
We can again visualize the fit of model to the distribution of the item random effects:
The model estimates a low correlation of 0.14, which is reflected in the small tilt of the ellipse. It looks like the model is capturing the distribution of the item random effects well. But is the correlation parameter really that necessary here?
Let’s make another depleted model, no_item_cor_model, with the correlation between item random effects removed:
no_item_cor_model <- lmer(Response ~ Condition + (1+Condition|Subject) + (1+model.matrix(model)[,2]||Item),
REML = FALSE, control = lmerControl('bobyqa'), data = toydata)
no_item_cor_model
## Linear mixed model fit by maximum likelihood ['lmerModLmerTest']
## Formula:
## Response ~ Condition + (1 + Condition | Subject) + (1 + model.matrix(model)[,
## 2] || Item)
## Data: toydata
## AIC BIC logLik deviance df.resid
## 13265 13310 -6625 13249 1912
## Random effects:
## Groups Name Std.Dev. Corr
## Subject (Intercept) 25.24
## ConditionTreatment 10.40 0.85
## Item (Intercept) 6.67
## Item.1 model.matrix(model)[, 2] 17.61
## Residual 6.11
## Number of obs: 1920, groups: Subject, 80; Item, 24
## Fixed Effects:
## (Intercept) ConditionTreatment
## 209.6 35.9
Again, the output of the depleted model printed here does not differ much from the output of model. We can also get a sense of this by visualizing the fit of no_item_cor_model to the distribution of item random effects:
We can see this even more clearly with a side-by-side comparison of model fit by model (blue) and no_item_cor_model (red):
It doesn't seem like there are big differences here, but we should run some statistics to be sure. So let's do another ANOVA:
anova(no_item_cor_model, model)
## Data: toydata
## Models:
## no_item_cor_model: Response ~ Condition + (1 + Condition | Subject) + (1 + model.matrix(model)[,
## no_item_cor_model: 2] || Item)
## model: Response ~ Condition + (1 + Condition | Subject) + (1 + Condition |
## model: Item)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## no_item_cor_model 8 13265 13310 -6625 13249
## model 9 13267 13317 -6624 13249 0.47 1 0.49
First, as a sanity check, we see that no_item_cor_model has one less Df than model, which is what we'd expect if no_item_cor_model indeed lacks the correlation parameter for the item random effects. Next, we see that the value of Pr(>Chisq) is very high, at 0.49, suggesting that the models are not significantly different. This is corroborated by the very small differences between the models' AIC and BIC values. These small differences are probably reducible to the fact that AIC and BIC penalize additional parameters (as a way of balancing model fit with model complexity). In fact, the difference in AIC between the models is approximately 2, which is what you'd expect if you added a redundant parameter with no additional explanatory power to the model.
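You can pull this difference out directly (the values match the anova() table above):

# The maximal model pays roughly a 2-point AIC penalty for its extra (redundant) parameter
AIC(model) - AIC(no_item_cor_model)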
In sum, we see that when modeling our dataset toydata, estimating a correlation parameter between the subject random effects improves model fit, while estimating a correlation parameter between the item random effects doesn't improve it as much.
My main goal here was simply to go over what the correlation parameter in mixed-effects models is, so the question of whether we should include certain correlation parameter(s) in our models is beyond the scope of this discussion (and should be handled on a case-by-case basis). It's also something that I'm in the process of learning, so I don't have a good answer to it yet. But my personal view (which may change later with more knowledge and experience) is to keep correlation parameters in the model unless they make the model fail to converge. So in the case of the correlation parameter for the item random effects in model that we discussed above, I'd personally keep it in, since we didn't run into any convergence issues fitting the maximal model. In general, if there's no meaningful difference either way, I err towards leaving the correlation parameter in. In other words, I try to keep the model as maximal as possible (Barr et al., 2013) without overparameterizing it (Bates et al., 2015).
I don’t think this view is really controversial, but the more challenging part of this is putting it into practice. We not only need to balance model fit with model complexity, but we also often need to navigate conflicts between important considerations from statistical theory and from whatever domain our research is in (linguistics, psychology, etc.).
To resolve this, a lot of people have worked on streamlined methods of reducing the complexity of the random effects structure in a statistically motivated way. One such example uses Principal Components Analysis (PCA), as suggested in Bates et al. (2015). In Section 3 of their paper, Bates and colleagues outline a procedure for iterative model reduction that involves PCA (now available in lme4 as the rePCA() function) to determine how many random effect terms are sufficient to capture the variance in the random effects. This is still not a perfect solution, of course, but it's a good next step for putting this knowledge into practice.
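For reference, the basic usage looks something like this (rePCA() takes the fitted model and returns one principal components analysis per grouping factor):

# PCA of the random-effects covariance matrices, one analysis per grouping factor;
# components with (near-)zero variance suggest an overparameterized structure
summary(rePCA(model))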
Anyways, that’s it for my notes. Here’s the code that generated the data. I’ll upload the full markdown file for this on my github later after I clean up some code.
###########
## Setup ##
###########
# Load Packages
library(MASS)
library(lme4)
library(lmerTest)   # needed for the Satterthwaite t-tests/p-values shown in summary() above
library(tidyverse)
library(performance)
# Set seed
set.seed(1234)
# Set number of participants and items
n_subjects <- 80
n_items <- 24
#################
## Make trials ##
#################
# Generate levels
Subject <- gl(n_subjects, n_items)
Item <- rep(gl(n_items, 1), n_subjects)
Condition <- factor(rep(c(rep(c("Control", "Treatment"), n_items/2),
rep(c("Treatment", "Control"), n_items/2)),
n_subjects/2))
# Treatment coding
Condition_coded <- ifelse(Condition == "Control", 0, 1)
# Combine into trials
Data <- tibble(Subject, Item, Condition, Condition_coded)
#############################
## Add Intercept and Slope ##
#############################
# Add intercept
Data$Intercept <- 200
# Add slope
Data$Slope <- ifelse(Data$Condition == "Treatment", 30, 0)
########################
## Add Random Effects ##
########################
# By-subject variation in intercept and slope (sampled from bivariate normal)
sd_subj_intercept <- 25
sd_subj_slope <- 10
subj_ranef_cor <- 0.8
subj_ranef <- mvrnorm(n_subjects,
# means of two normals are both 0
c("Intercept" = 0, "Slope" = 0),
# 2x2 covariance matrix
matrix(
c(sd_subj_intercept^2, subj_ranef_cor * sd_subj_intercept * sd_subj_slope,
subj_ranef_cor * sd_subj_intercept * sd_subj_slope, sd_subj_slope^2),
ncol = 2)
)
Data$Subj_intercept <- rep(subj_ranef[,"Intercept"], each = n_items)
Data$Subj_slope <- rep(subj_ranef[,"Slope"], each = n_items)
# By-item variation in intercept and slope (sampled independently)
Data$Item_intercept <- rep(rnorm(n_items, sd = 5), times = n_subjects)
Data$Item_slope <- rep(rnorm(n_items, sd = 15), times = n_subjects)
# Random noise
Data$Noise <- rnorm(nrow(Data), 0, 5) + rlnorm(nrow(Data), 0.5)
###########################
## Generate Observations ##
###########################
Data <- Data %>%
mutate(Response =
Intercept +
Slope * Condition_coded +
Subj_intercept +
Subj_slope * Condition_coded +
Item_intercept +
Item_slope * Condition_coded +
Noise)
#################
## Toy Dataset ##
#################
toydata <- Data %>%
select(Subject, Item, Condition, Response)