In my own personal lexicon I don't make much distinction between mixed models, repeated measures analyses, and longitudinal data analysis – the latter two are really special cases of the first, where the analysis involves experimental units measured repeatedly over time (as opposed to, say, a nested design where multiple measurements are taken, all at once, within a block). The special features of longitudinal/repeated measures analyses are:
This is all fairly straightforward …
library("nlme")
library("ggplot2")
library("bbmle")
library("gridExtra")
theme_set(theme_bw())
ggplot(Orthodont, aes(x = age, y = distance, colour = Sex)) + geom_line(aes(group = Subject))
m1 <- lme(distance ~ I(age - 11) * Sex, random = ~age | Subject, data = Orthodont)
summary(m1)
## Linear mixed-effects model fit by REML
## Data: Orthodont
## AIC BIC logLik
## 448.6 469.7 -216.3
##
## Random effects:
## Formula: ~age | Subject
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 2.4055 (Intr)
## age 0.1803 -0.668
## Residual 1.3100
##
## Fixed effects: distance ~ I(age - 11) * Sex
## Value Std.Error DF t-value p-value
## (Intercept) 24.969 0.4860 79 51.38 0.0000
## I(age - 11) 0.784 0.0860 79 9.12 0.0000
## SexFemale -2.321 0.7614 25 -3.05 0.0054
## I(age - 11):SexFemale -0.305 0.1347 79 -2.26 0.0264
## Correlation:
## (Intr) I(g-11) SexFml
## I(age - 11) 0.102
## SexFemale -0.638 -0.065
## I(age - 11):SexFemale -0.065 -0.638 0.102
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.168078 -0.385939 0.007104 0.445155 3.849464
##
## Number of Observations: 108
## Number of Groups: 27
DF column is actually wrong for age and age:SexFemale – these should probably be 25 as well.SexFemale only make sense (i.e. matches the conclusion we get from glancing at the plot the males have higher distance than females) if we center the age variable Even though we only have 4 observations per subject, we'll try to compute the autocorrelation:
plot(ACF(m1), alpha = 0.05)
There is actually sig. (negative) autocorrelation at lag-1 …
Use autoregressive-order-1 model, with order of observation in the data as a covariate
m2 <- update(m1, correlation = corAR1(form = ~1 | Subject))
summary(m2$modelStruct)
## Random effects:
## Formula: ~age | Subject
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 3.0889 (Intr)
## age 0.2663 -0.831
## Residual 1.0000
##
## Correlation Structure: AR(1)
## Formula: ~1 | Subject
## Parameter estimate(s):
## Phi
## -0.4733
Correlation coefficient of -0.47 between successive estimates.
(It's not normally necessary to pull bits out of the fit – could just use summary(m2) – doing it for compactness.
If we wanted we could test the difference by AIC: adding autocorrelation seems to be justified.
AICtab(m1, m2) ## slightly prettier than AIC(m1,m2)
## dAIC df
## m2 0.0 9
## m1 1.8 8
We can plot the residuals (supposedly) corrected for the autocorrelation – but (even though the likelihood is better) the apparent autocorrelation is worse: too noisy??
plot(ACF(m2, resType = "normalized"), alpha = 0.05)
Look at residuals by themselves
O2 <- transform(Orthodont, res = residuals(m2))
ggplot(O2, aes(x = age, y = res, colour = Sex)) + geom_line(aes(group = Subject))
Autocorrelation pattern is fairly clear from this. Also clear that one subject is a little wonky:
subset(O2, abs(res) > 2.5)
## distance age Subject Sex res
## 34 20.5 10 M09 Male -3.591
## 35 31.0 12 M09 Male 4.590
## 36 26.0 14 M09 Male -2.729
## 49 17.0 8 M13 Male -2.934
Taking this individual out does help with normality …
m3 <- update(m2, subset = Subject != "M09")
library("gridExtra")
grid.arrange(qqnorm(m2), qqnorm(m3), nrow = 1)
Doesn't really seem to fix the ACF issue, though …
plot(ACF(m3, resType = "normalized"), alpha = 0.05)
lmList to fit linear regressions for each individual, then analyze slope and intercept parameters?)