I’ll be exploring the differences between these three models:

subj_intercepts_mod <- lmer(rt ~ A + (1|Subject))
subjA_intercepts_mod <- lmer(rt ~ 1 + (1|Subject:A))
subj_slopes_mod <- lmer(rt ~ A + (A|Subject))

Granted, the second model is rarely encountered in practice. More common would be a nested structure (see this crossvalidated post).

lmer(rt ~ 1 + (1|Subject) + (1|Subject:A))
## the above random effects structure is often written as `(1|Subject/A)` the 
## same way `y ~ A + B + A:B` is usually written as `y ~ A * B`.

Even though the non-nested (1|Subject:A) situation is rare, I still find it informative as an example when exploring these models.

I’ll work with a modified version of the sleepstudy dataset from lme4.

library(lme4)
library(ggplot2)

library(dplyr)
library(reshape2)
library(broom)    # install_github("dgrtwo/broom")

library(stringr)

example <- sleepstudy %>%
  mutate(A = ifelse(Days < 5, -0.5, 0.5)) %>%
  select(Subject, A, Reaction)
head(example, n = 11)
##    Subject    A Reaction
## 1      308 -0.5    249.6
## 2      308 -0.5    258.7
## 3      308 -0.5    250.8
## 4      308 -0.5    321.4
## 5      308 -0.5    356.9
## 6      308  0.5    414.7
## 7      308  0.5    382.2
## 8      308  0.5    290.1
## 9      308  0.5    430.6
## 10     308  0.5    466.4
## 11     309 -0.5    222.7

I’ll start by plotting the means.

base_plot <- ggplot(example, aes(x = A, y = Reaction)) +
  stat_summary(aes(fill = factor(A)), fun.y = mean, geom = "bar", color = "black") + 
  scale_fill_manual(values = c("#66c2a5", "#8da0cb")) +  # colorbrewer2.org
  theme(legend.position = "none")
base_plot

plot of chunk unnamed-chunk-4

Part 1: subj_intercepts_mod

First I’ll fit an lmer model that allows the intercept to vary across subjects.

subj_intercepts_mod <- lmer(Reaction ~ A + (1|Subject), data = example)
# broom::tidy turns `fixef(subj_intercepts_mod)` into a data.frame
fixed_params <- tidy(subj_intercepts_mod, effects = "fixed")[,c("term", "estimate")]
fixed_params
##          term estimate
## 1 (Intercept)   298.51
## 2           A    53.76

Although it’s sometimes helpful to think about model parameters (and you can draw them easily with ggplot2::geom_abline), I find it more beneficial in a simple design like this to deal in estimates. I’ll write a little function to speed up this conversion that seems like overkill now but it will come in handy later.

# converts parameters of a `Reaction ~ (Intercept) + A` model into estimates,
# assumes A is a 0-centered, unit-weighted, dichotomous variable
convert_parameters_to_estimates <- function(tidy_frame, id_var = ".") {
  tidy_frame %>%
    dcast(as.formula(paste(id_var, "term", sep="~")), value.var = "estimate") %>%
    mutate(`-0.5` = `(Intercept)` - A/2, `0.5` = `(Intercept)` + A/2) %>%
    select(-`(Intercept)`, -A) %>%
    melt(idvars = id_var, measure.vars = c("-0.5", "0.5"),
         variable.name = "A", value.name = "Reaction") %>%
    mutate(A = as.numeric(as.character(A)))
}

fixed_estimates <- convert_parameters_to_estimates(fixed_params)[,c("A", "Reaction")]
fixed_estimates
##      A Reaction
## 1 -0.5    271.6
## 2  0.5    325.4
# sanity check
example %>% group_by(A) %>% summarize(Reaction = mean(Reaction)) %>%
  merge(., fixed_estimates, by = "A", suffixes = c("_mean", "_model"))
##      A Reaction_mean Reaction_model
## 1 -0.5         271.6          271.6
## 2  0.5         325.4          325.4

It’s possible to turn parameters from the model into estimates that make sense; now let’s do the same thing with random effects. How will the model’s random effect parameters, when converted to estimates, compare to the averages for each subject that we can calculate by hand?

random_params <- tidy(subj_intercepts_mod, effect = "random")
random_estimates <- convert_parameters_to_estimates(random_params, id_var = "level")

fixed_slopes_plot <- base_plot + 
  geom_point(data = random_estimates, shape = 17, size = 3) +
  geom_line(aes(group = level), data = random_estimates)
fixed_slopes_plot

plot of chunk unnamed-chunk-7

fixed_slopes_plot +
  stat_summary(aes(group = Subject), fun.y = mean,  # means from raw data
               geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)

plot of chunk unnamed-chunk-7

Of course the reason the two sets of points don’t line up is because we are only allowing the subject’s overall Reaction to vary, not the subject’s overall Reaction in each condition. Applying the same slope to each subject, this is the best we can do to account for variance:

base_plot + 
  geom_line(aes(group = level), data = random_estimates) +
  ## calculate mean Reaction by subject using `stat_summary`
  stat_summary(aes(x=0.0, y=Reaction, group=level), data = random_estimates, fun.y = mean,
               geom = "point", shape = 17, size = 3) +
  stat_summary(aes(x=0.0, group=Subject), fun.y = mean,
               geom = "point", shape = 19, size = 4, color = "#fc8d62", alpha = 0.6)

plot of chunk unnamed-chunk-8

## Well, **almost** accurate estimates of each subject's overall mean Reaction.
## Note that the random estimate means (triangles) are closer to the overall mean,
## reflecting that the model assumes each subject's mean is closer to the overall 
## average than it actually is---a fundamental "assumption" of a multilevel model.

Conclusion

subj_intercepts_mod <- lmer(Reaction ~ A + (1|Subject))

A model that allows intercepts to vary across subjects does just that: it does a great job of estimating overall Reaction for each subject, but it is limited in estimating the effect of A on Reaction.

Part 2: subjA_intercepts_mod

We’re looking for a way to capture the fact that all of the following by-subject lines don’t have the same slope.

subj_means_plot <- base_plot + 
  stat_summary(aes(group = Subject), fun.y = mean,  # means from raw data
               geom = "point", shape = 19, size = 4, color = "#fc8d62") +
  stat_summary(aes(group = Subject), fun.y = mean, 
               geom = "line", size = 1.2, color = "#fc8d62")
subj_means_plot

plot of chunk unnamed-chunk-10

One way to give the model some flexibility would be to “sever the connection” between the measurements on the left bar from those in the right bar.

example$SubAject <- with(example, paste(Subject, A, sep = ":"))
subjA_intercepts_mod <- lmer(Reaction ~ 1 + (1|SubAject), data = example)

Why make a new, hideously-named variable SubAject? Because if the model can’t understand the relationship between Subject and A, I shouldn’t be able to either! We’ve severed the connection between scores on the left and scores on the right, and given the model more flexibility to estimate the effects.

Of course, nothing is preventing me from prying apart in the model what I glued together in the data (it’s R, after all), so let’s take a look.

subjA_estimates <- tidy(subjA_intercepts_mod, effect = "random")

# split Subject:A into Subject and A
subjA_estimates[,c("Subject", "A")] <- as.data.frame(str_split_fixed(subjA_estimates$level, ":", 2))
subjA_estimates <- subjA_estimates %>%
  select(Subject, A, Reaction = estimate) %>%
  mutate(A = as.numeric(as.character(A))) # thanks R!

## It's interesting to note that prying apart the mashed variabl, leads to two 
## values for each subject even though I didn't specificy `A` as a "fixed 
## effect" in the model.
head(subjA_estimates, n = 3)
##   Subject    A Reaction
## 1     308 -0.5    288.3
## 2     308  0.5    389.7
## 3     309 -0.5    215.2
model_estimates_plot <- subj_means_plot +
  geom_point(aes(x = A-0.05), data = random_estimates, size = 3, shape = 17) +
  geom_point(aes(x = A+0.05), data = subjA_estimates, size = 3, shape = 15)

model_estimates_plot + annotate(geom = "text", x = -0.48, y = 55, size = 4,
  label = "I Am Legend:\ntriangles=subj_intercepts_mod\ncircles=**subj_means**\nsquares=subjA_intercepts_mod")

plot of chunk unnamed-chunk-12

(1|Subject:A) does indeed lead to better estimates of the subject means than (1|Subject), but in the process we “undid” the thing we really cared about— the average effect of A—and we only have a single fixed effect for intercept.

tidy(subjA_intercepts_mod, effects = "fixed")
##          term estimate std.error statistic
## 1 (Intercept)    298.5     8.333     35.82

Adding A in as a fixed factor would indeed get us a parameter, but what would does it tell you? In this context it would be as if you took each of your participants, cut them in two, and assigned each half to a different condition, and forgot that the two halves know each other!

subjA_fixed_mod <- lmer(Reaction ~ A + (1|SubAject), data = example)
tidy(subjA_fixed_mod, effects = "fixed")
##          term estimate std.error statistic
## 1 (Intercept)   298.51     7.088    42.114
## 2           A    53.76    14.176     3.792

Conclusion

Although non-nested random effects (e.g., (1|Subject:A)) are not particularly useful in this case, creating new grouping variables is demonstrably effective in giving the model more flexibility.

Part 3: subj_slopes_mod

The better solution is to allow intercepts and slopes to vary across subjects.

subj_slopes_mod <- lmer(Reaction ~ A + (A|Subject), data = example)
# the random intercept is implied, so `(A|Subject) == (1+A|Subject)`

subj_slopes_parameters <- tidy(subj_slopes_mod, effect = "random") %>% select(-group)
subj_slopes_estimates <- convert_parameters_to_estimates(subj_slopes_parameters, id_var = "level")

model_estimates_plot +
  geom_point(aes(x = A+0.1), data = subj_slopes_estimates, shape = 18, size = 4) +
  annotate(geom = "text", x = -0.48, y = 73, size = 4,
    label = "I Am Legend:\ncircles=subj_intercepts_mod\ntriangles=**subj_means**\nsquares=suAbj_intercepts_mod\ndiamonds=subj_slopes_mod")

plot of chunk unnamed-chunk-15

Note that we’ve added two parameters to our original model: one for random slopes, and another for the relationship (i.e., correlation) between random intercepts and random slopes.

anova(subj_intercepts_mod, subj_slopes_mod)
## refitting model(s) with ML (instead of REML)
## Data: example
## Models:
## subj_intercepts_mod: Reaction ~ A + (1 | Subject)
## subj_slopes_mod: Reaction ~ A + (A | Subject)
##                     Df  AIC  BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## subj_intercepts_mod  4 1833 1846   -913     1825                        
## subj_slopes_mod      6 1811 1830   -900     1799    26      2    2.3e-06
##                        
## subj_intercepts_mod    
## subj_slopes_mod     ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## You can remove the random correlation parameter by only specifying one
## random effect at a time: `(1|Subject) + (0+A|Subject)`

However, the more complicated model is still better than subjA_intercept_mod.

anova(subj_slopes_mod, subjA_intercepts_mod)
## refitting model(s) with ML (instead of REML)
## Data: example
## Models:
## subjA_intercepts_mod: Reaction ~ 1 + (1 | SubAject)
## subj_slopes_mod: Reaction ~ A + (A | Subject)
##                      Df  AIC  BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## subjA_intercepts_mod  3 1836 1845   -915     1830                        
## subj_slopes_mod       6 1811 1830   -900     1799  30.7      3    9.6e-07
##                         
## subjA_intercepts_mod    
## subj_slopes_mod      ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion

Fully specifying random intercepts and slopes capitalizes on what hierarchical mixed effects models are supposed to do: explain variance at multiple levels.