My question pertains to hierarchical linear modeling (mixed modeling) with lme4 in R. I have data from a series of independent studies comparing the mean response times (RTs) of older and younger adults on experiments that have two conditions (“repetition” and “switch”).

# load data
data <- read.csv("https://dl.dropboxusercontent.com/u/33530204/agingData.csv")

# look at the data
head(data)
##   study rtYounger rtOlder  condition
## 1    s1       315     399 repetition
## 2    s1       325     439     switch
## 3    s2      1193    1560 repetition
## 4    s2      1320    1685     switch
## 5    s3       671     965 repetition
## 6    s3      1220    2033     switch

The main analysis uses a so-called “Brinley plot”: for each study and condition, the mean RT of older adults is plotted against the mean RT of younger adults, and a regression line is fitted predicting the former from the latter. The theoretical question of interest is whether the data are best described by one regression line (i.e., the same fit for the “repetition” and “switch” conditions) or whether two are required (i.e., one regression line for “repetition” and a separate one for “switch”).

Below is a plot of the data with two regression lines (one for each condition).

# load packages
library(ggplot2)

# plot the data
p <- ggplot(data, aes(y = rtOlder, x = rtYounger, color = condition)) 
p <- p + geom_point()
p <- p + geom_smooth(method = lm, se = FALSE, fullrange = TRUE) 
p <- p + coord_cartesian(xlim = c(0, 4000), ylim = c(0, 4000)) 
p

Each data point represents the average response time for older adults plotted against the average response time for younger adults, for that condition, for that study.
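As a rough, non-hierarchical sketch of the one-line versus two-line comparison, the two fits can be written with plain `lm()`. The data frame name `brinley` and the model names are made up for illustration, and the data are only the six rows printed by `head(data)` above, not the full dataset. This ignores the clustering of conditions within studies, so it is a way to see what the plot is drawing rather than the analysis itself.

```r
# Only the six rows shown by head(data); the full dataset has more studies.
brinley <- data.frame(
  study     = rep(c("s1", "s2", "s3"), each = 2),
  rtYounger = c(315, 325, 1193, 1320, 671, 1220),
  rtOlder   = c(399, 439, 1560, 1685, 965, 2033),
  condition = factor(rep(c("repetition", "switch"), times = 3))
)

# One regression line shared by both conditions
fit_one <- lm(rtOlder ~ rtYounger, data = brinley)

# Separate intercept and slope per condition; these are the two lines
# that geom_smooth(method = lm) draws when colour is mapped to condition
fit_two <- lm(rtOlder ~ rtYounger * condition, data = brinley)

coef(fit_two)

# F test of the two extra condition terms (study clustering is ignored here)
anova(fit_one, fit_two)
```

With “repetition” as the reference level, the first two coefficients of `fit_two` are the intercept and slope of the “repetition” line, and the two condition terms are the shifts in intercept and slope for “switch”.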

Modeling the data

This section follows (largely verbatim) Verhaeghen (2014, p. 24) [1], who describes how to model these data using hierarchical linear modeling in order to assess whether one regression line is sufficient or two are needed.

The within-study level model represents the response times (RTs) of older adults as a function of the corresponding RTs of younger adults, and is given by

\[ RT_{Older, it} = \beta_{0t} + \beta_{1t} * RT_{Younger, it} + R_{it}\]

where \(RT_{Older, it}\) is the average response time of older adults from condition \(i\) in study \(t\), \(RT_{Younger, it}\) is the average response time of younger adults from condition \(i\) in study \(t\), \(\beta_{0t}\) is the intercept for study \(t\), \(\beta_{1t}\) is the slope relating older to younger RTs for study \(t\), and \(R_{it}\) is the residual for condition \(i\) in study \(t\).

The between-study level model represents each regression parameter as a function of the overall mean and each study’s unique effect as follows:

\[\beta_{0t} = \bar{\beta_{0}} + U_{0t} \]

\[\beta_{1t} = \bar{\beta_{1}} + U_{1t} \]

where \(\bar{\beta_{0}}\) is the average intercept across all studies, \(\bar{\beta_{1}}\) is the average slope across all studies, and \(U_{0t}\) and \(U_{1t}\) are the increments to the intercept and slope associated with study \(t\).
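
Treating \(U_{0t}\) and \(U_{1t}\) as additive increments, the two levels can be combined into a single equation by substituting the between-study expressions into the within-study model. This is only a rearrangement of the equations above, written out to separate the part shared by all studies from the study-specific part:

\[ RT_{Older, it} = \bar{\beta_{0}} + \bar{\beta_{1}} * RT_{Younger, it} + U_{0t} + U_{1t} * RT_{Younger, it} + R_{it}\]

The first two terms are the average line across studies, and the next two terms are study \(t\)'s deviation from that line.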

These equations reflect the “null model”. Condition effects are examined in the within-study level model by introducing a dummy variable that codes for condition (condition = 0 if “repetition”, condition = 1 if “switch”):

\[ RT_{Older, it} = \beta_{0t} + \beta_{1t} * RT_{Younger, it} + \alpha_{0t}(condition) + \alpha_{1t}(condition * RT_{Younger, it}) + R_{it}\]

Question

I assume this can easily be tested using lme4 in R, but how would I code the “null model”, and how would I code the final equation? The adequacy of the null model can then be assessed by comparing it with the final model; I assume this can be done with a likelihood-ratio test, or just by comparing AIC/BIC?
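
One possible starting point, offered as a minimal sketch rather than a confirmed answer, is shown below. It makes several assumptions that are not in Verhaeghen's description: only the intercept varies by study (`(1 | study)`), the condition terms are treated as fixed effects, and the data are just the six rows printed by `head(data)`, stored under the made-up name `brinley`. The random slope in the equations would be written `(1 + rtYounger | study)`, but if each study contributes only two rows, that asks for as many random effects as there are observations, which lme4 is expected to reject by default.

```r
library(lme4)

# Only the six rows shown by head(data); swap in the full dataset for real use.
brinley <- data.frame(
  study     = rep(c("s1", "s2", "s3"), each = 2),
  rtYounger = c(315, 325, 1193, 1320, 671, 1220),
  rtOlder   = c(399, 439, 1560, 1685, 965, 2033),
  condition = factor(rep(c("repetition", "switch"), times = 3))
)

# Null model: one overall line plus a study-specific intercept.
# ASSUMPTION: random intercept only; the random slope is left out.
# REML = FALSE gives ML fits, which is what a test on fixed effects needs.
m_null <- lmer(rtOlder ~ rtYounger + (1 | study),
               data = brinley, REML = FALSE)

# Condition model: adds the condition dummy and its interaction with
# rtYounger. ASSUMPTION: both condition terms are fixed effects.
m_cond <- lmer(rtOlder ~ rtYounger * condition + (1 | study),
               data = brinley, REML = FALSE)

# Likelihood-ratio test; the table also lists AIC and BIC for both models
anova(m_null, m_cond)
```

On a toy sample this small the fits may be reported as singular, so the output is only meaningful once the full set of studies is used.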

If you can help at all, please email me at grange.jim@gmail.com.

Many thanks!!


  1. Verhaeghen, P. (2014). The Elements of Cognitive Aging. Oxford University Press.