Basic repeated measures example

In my own personal lexicon I don't make much distinction between mixed models, repeated measures analyses, and longitudinal data analysis – the latter two are really special cases of the first, where the analysis involves experimental units measured repeatedly over time (as opposed to, say, a nested design where multiple measurements are taken, all at once, within a block). The special features of longitudinal/repeated measures analyses are:

we often want to treat time as a continuous covariate, allowing for smooth trends (linear, quadratic, spline) over time
we may want to account for autocorrelation in the errors at pairs of measurements depending on their distance apart in time

This is all fairly straightforward …

library("nlme")
library("ggplot2")
library("bbmle")
library("gridExtra")
theme_set(theme_bw())
ggplot(Orthodont, aes(x = age, y = distance, colour = Sex)) + geom_line(aes(group = Subject))

plot of chunk pkgs

m1 <- lme(distance ~ I(age - 11) * Sex, random = ~age | Subject, data = Orthodont)
summary(m1)

## Linear mixed-effects model fit by REML
##  Data: Orthodont 
##     AIC   BIC logLik
##   448.6 469.7 -216.3
## 
## Random effects:
##  Formula: ~age | Subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev Corr  
## (Intercept) 2.4055 (Intr)
## age         0.1803 -0.668
## Residual    1.3100       
## 
## Fixed effects: distance ~ I(age - 11) * Sex 
##                        Value Std.Error DF t-value p-value
## (Intercept)           24.969    0.4860 79   51.38  0.0000
## I(age - 11)            0.784    0.0860 79    9.12  0.0000
## SexFemale             -2.321    0.7614 25   -3.05  0.0054
## I(age - 11):SexFemale -0.305    0.1347 79   -2.26  0.0264
##  Correlation: 
##                       (Intr) I(g-11) SexFml
## I(age - 11)            0.102               
## SexFemale             -0.638 -0.065        
## I(age - 11):SexFemale -0.065 -0.638   0.102
## 
## Standardized Within-Group Residuals:
##       Min        Q1       Med        Q3       Max 
## -3.168078 -0.385939  0.007104  0.445155  3.849464 
## 
## Number of Observations: 108
## Number of Groups: 27

I think the DF column is actually wrong for age and age:SexFemale – these should probably be 25 as well.
comparison of SexFemale only make sense (i.e. matches the conclusion we get from glancing at the plot the males have higher distance than females) if we center the age variable

Even though we only have 4 observations per subject, we'll try to compute the autocorrelation:

plot(ACF(m1), alpha = 0.05)

plot of chunk unnamed-chunk-3

There is actually sig. (negative) autocorrelation at lag-1 …

Use autoregressive-order-1 model, with order of observation in the data as a covariate

m2 <- update(m1, correlation = corAR1(form = ~1 | Subject))
summary(m2$modelStruct)

## Random effects:
##  Formula: ~age | Subject
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev Corr  
## (Intercept) 3.0889 (Intr)
## age         0.2663 -0.831
## Residual    1.0000       
## 
## Correlation Structure: AR(1)
##  Formula: ~1 | Subject 
##  Parameter estimate(s):
##     Phi 
## -0.4733

Correlation coefficient of -0.47 between successive estimates.

(It's not normally necessary to pull bits out of the fit – could just use summary(m2) – doing it for compactness.

If we wanted we could test the difference by AIC: adding autocorrelation seems to be justified.

AICtab(m1, m2)  ## slightly prettier than AIC(m1,m2)

##    dAIC df
## m2 0.0  9 
## m1 1.8  8

We can plot the residuals (supposedly) corrected for the autocorrelation – but (even though the likelihood is better) the apparent autocorrelation is worse: too noisy??

plot(ACF(m2, resType = "normalized"), alpha = 0.05)

plot of chunk unnamed-chunk-6

Look at residuals by themselves

O2 <- transform(Orthodont, res = residuals(m2))
ggplot(O2, aes(x = age, y = res, colour = Sex)) + geom_line(aes(group = Subject))

plot of chunk unnamed-chunk-7

Autocorrelation pattern is fairly clear from this. Also clear that one subject is a little wonky:

subset(O2, abs(res) > 2.5)

##    distance age Subject  Sex    res
## 34     20.5  10     M09 Male -3.591
## 35     31.0  12     M09 Male  4.590
## 36     26.0  14     M09 Male -2.729
## 49     17.0   8     M13 Male -2.934

Taking this individual out does help with normality …

m3 <- update(m2, subset = Subject != "M09")
library("gridExtra")
grid.arrange(qqnorm(m2), qqnorm(m3), nrow = 1)

plot of chunk unnamed-chunk-9

Doesn't really seem to fix the ACF issue, though …

plot(ACF(m3, resType = "normalized"), alpha = 0.05)

plot of chunk unnamed-chunk-10

To do

Comparison with two-stage model (use lmList to fit linear regressions for each individual, then analyze slope and intercept parameters?)