A recent manuscript by Douglas Bates, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen (posted on arXiv) describes a strategy for determining parsimonious random effect structures for multilevel regression, as opposed to the maximal random effect structures recommended by Barr et al.. Here I am going to try out their recommended strategy on some of my own data.
Read in data based on a study of effects of transitional probability (TP) on novel word learning (Mirman et al., 2008):
WordLearnEx <- read.delim("http://www.danmirman.org/gca/WordLearnEx.txt")
summary(WordLearnEx)
## Subject TP Block Accuracy Correct
## Min. :244.0 High:280 Min. : 1.0 Min. :0.000 Min. :0.00
## 1st Qu.:319.5 Low :280 1st Qu.: 3.0 1st Qu.:0.667 1st Qu.:4.00
## Median :346.5 Median : 5.5 Median :0.833 Median :5.00
## Mean :344.4 Mean : 5.5 Mean :0.805 Mean :4.83
## 3rd Qu.:370.5 3rd Qu.: 8.0 3rd Qu.:1.000 3rd Qu.:6.00
## Max. :395.0 Max. :10.0 Max. :1.000 Max. :6.00
## Incorrect
## Min. :0.00
## 1st Qu.:0.00
## Median :1.00
## Mean :1.17
## 3rd Qu.:2.00
## Max. :6.00
Prepare data for analysis by making a second-order orthogonal polynomial from the Block values:
t <- poly(1:max(WordLearnEx$Block), 2)
WordLearnEx[, paste("ot", 1:2, sep="")] <- t[WordLearnEx$Block, 1:2]
Fit a maximal model:
m.max <- lmer(Accuracy ~ (ot1 + ot2)*TP +
(ot1 + ot2 | Subject), data=WordLearnEx, REML=FALSE)
summary(m.max)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Accuracy ~ (ot1 + ot2) * TP + (ot1 + ot2 | Subject)
## Data: WordLearnEx
##
## AIC BIC logLik deviance df.resid
## -332.6 -276.4 179.3 -358.6 547
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6182 -0.5358 0.1263 0.5665 2.6157
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 0.01076 0.10374
## ot1 0.01542 0.12419 -0.33
## ot2 0.00628 0.07925 -0.28 -0.82
## Residual 0.02456 0.15672
## Number of obs: 560, groups: Subject, 56
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.831486 0.021727 38.27
## ot1 0.287391 0.037788 7.61
## ot2 -0.167304 0.033188 -5.04
## TPLow -0.052961 0.030727 -1.72
## ot1:TPLow -0.001075 0.053441 -0.02
## ot2:TPLow 0.116455 0.046935 2.48
##
## Correlation of Fixed Effects:
## (Intr) ot1 ot2 TPLow o1:TPL
## ot1 -0.183
## ot2 -0.114 -0.229
## TPLow -0.707 0.129 0.081
## ot1:TPLow 0.129 -0.707 0.162 -0.183
## ot2:TPLow 0.081 0.162 -0.707 -0.114 -0.229
The model converged, so that eliminates one motivation for simplifying random effects. Also, none of the random effect correlations are 1 or -1, which would have been a sign of an overparameterized model. Also note that the ot2:TPLow effect (i.e., TP effect on quadratic term) is the key significant fixed effect (\(t=2.48, p = 0.013\)). It will be interesting to see whether this is different in a parsimonious model.
Now let’s try the RePsychLing procedure. First, if you need to install the package (you’ll need the devtools package):
devtools::install_github("dmbates/RePsychLing")
Now we can load the package:
library("RePsychLing")
The first step is to use rePCA to conduct a PCA of random effects to determine how many random effect parameters are needed to capture all of the variance:
summary(rePCA(m.max))
## $Subject
## Importance of components:
## [,1] [,2] [,3]
## Standard deviation 0.9195 0.6902 0
## Proportion of Variance 0.6396 0.3604 0
## Cumulative Proportion 0.6396 1.0000 1
Looks like two random effects terms are sufficient for capturing all of the variance in the random effects, which is quite a bit less than the maximal model’s 6 (3 variances and 3 correlations).
The second step is to use the new(?) zero correlation parameters random effects specification to remove the correlations. Instead using a pipe in the random effect specification (...|...), I will now use a double-pipe (...||...):
m.zcp <- lmer(Accuracy ~ (ot1 + ot2)*TP +
(ot1 + ot2 || Subject), data=WordLearnEx, REML=FALSE)
summary(m.zcp)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## Accuracy ~ (ot1 + ot2) * TP + ((1 | Subject) + (0 + ot1 | Subject) +
## (0 + ot2 | Subject))
## Data: WordLearnEx
##
## AIC BIC logLik deviance df.resid
## -329.0 -285.8 174.5 -349.0 550
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4466 -0.5392 0.1451 0.5781 2.5777
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 0.0106457 0.10318
## Subject.1 ot1 0.0122195 0.11054
## Subject.2 ot2 0.0001099 0.01048
## Residual 0.0256140 0.16004
## Number of obs: 560, groups: Subject, 56
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.831486 0.021718 38.29
## ot1 0.287391 0.036759 7.82
## ot2 -0.167304 0.030310 -5.52
## TPLow -0.052961 0.030714 -1.72
## ot1:TPLow -0.001075 0.051984 -0.02
## ot2:TPLow 0.116455 0.042865 2.72
##
## Correlation of Fixed Effects:
## (Intr) ot1 ot2 TPLow o1:TPL
## ot1 0.000
## ot2 0.000 0.000
## TPLow -0.707 0.000 0.000
## ot1:TPLow 0.000 -0.707 0.000 0.000
## ot2:TPLow 0.000 0.000 -0.707 0.000 0.000
anova(m.max, m.zcp)
## Data: WordLearnEx
## Models:
## m.zcp: Accuracy ~ (ot1 + ot2) * TP + ((1 | Subject) + (0 + ot1 | Subject) +
## m.zcp: (0 + ot2 | Subject))
## m.max: Accuracy ~ (ot1 + ot2) * TP + (ot1 + ot2 | Subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## m.zcp 10 -329.04 -285.76 174.52 -349.04
## m.max 13 -332.63 -276.37 179.32 -358.63 9.5928 3 0.02236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This seems interesting: although rePCA indicated that 2 random effects would be enough, removing the correlation parameters and dropping down to the 3 random effect variances is a significant decrement in model fit.
These trial-level data (error trials have already been removed) from a pilot study of phonological neighborhood density effects in younger and older adults.
density <- read.delim("http://www.danmirman.org/gca/Density.txt")
summary(density)
## Subject Group Item Density RT
## S103 : 115 old :1494 bag : 30 High:1397 Min. : 440
## S122 : 109 young:1370 blonde : 30 Low :1467 1st Qu.: 1161
## S105 : 107 chain : 30 Median : 1536
## S102 : 105 chair : 30 Mean : 1730
## S108 : 102 child : 30 3rd Qu.: 2040
## S110 : 102 couch : 30 Max. :12753
## (Other):2224 (Other):2684
To get main effect estimates for the fixed effects and avoid problems in the random effect syntax, I will create a deviance code for the Density and Group factors:
density$DensityDev <- ifelse(density$Density == "High", 0.5, -0.5)
density$GroupDev <- ifelse(density$Group == "old", 0.5, -0.5)
Fit the maximal model:
pnd.max <- lmer(RT ~ DensityDev*GroupDev + (DensityDev | Subject) + (GroupDev | Item),
data=density, REML=FALSE)
summary(pnd.max)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula:
## RT ~ DensityDev * GroupDev + (DensityDev | Subject) + (GroupDev |
## Item)
## Data: density
##
## AIC BIC logLik deviance df.resid
## 46245.1 46310.7 -23111.6 46223.1 2853
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8354 -0.5429 -0.1105 0.3643 12.3008
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Item (Intercept) 74015 272.1
## GroupDev 35651 188.8 1.00
## Subject (Intercept) 153306 391.5
## DensityDev 5564 74.6 1.00
## Residual 541954 736.2
## Number of obs: 2864, groups: Item, 120; Subject, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1753.51 77.16 22.725
## DensityDev 116.68 58.85 1.982
## GroupDev 650.47 147.00 4.425
## DensityDev:GroupDev 144.55 70.93 2.038
##
## Correlation of Fixed Effects:
## (Intr) DnstyD GropDv
## DensityDev 0.217
## GroupDev -0.023 -0.015
## DnstyDv:GrD -0.024 0.398 0.379
This model also converged, so that’s not a motivation for simplifying the random effect structure. However, for both subjects and items, the random effect correlations were 1.00, suggesting that the model is overparameterized. Also, the main effect of Density and the Density-by-Group interactions are right near the critical 2.0 threshold, so it will be interesting to see whether simplifying the random effect structure affects these results.
Step 1: use rePCA to assess how many random effect parameters are needed:
summary(rePCA(pnd.max))
## $Item
## Importance of components:
## [,1] [,2]
## Standard deviation 0.4498 0
## Proportion of Variance 1.0000 0
## Cumulative Proportion 1.0000 1
##
## $Subject
## Importance of components:
## [,1] [,2]
## Standard deviation 0.5414 0
## Proportion of Variance 1.0000 0
## Cumulative Proportion 1.0000 1
Seems like one would do the trick. Before dropping all the way down to just a random intercepts model, let’s try removing the random effect correlations:
pnd.zcp <- lmer(RT ~ DensityDev*GroupDev + (DensityDev || Subject) + (GroupDev || Item),
data=density, REML=FALSE)
anova(pnd.zcp, pnd.max)
## Data: density
## Models:
## pnd.zcp: RT ~ DensityDev * GroupDev + ((1 | Subject) + (0 + DensityDev |
## pnd.zcp: Subject)) + ((1 | Item) + (0 + GroupDev | Item))
## pnd.max: RT ~ DensityDev * GroupDev + (DensityDev | Subject) + (GroupDev |
## pnd.max: Item)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## pnd.zcp 9 46272 46325 -23127 46254
## pnd.max 11 46245 46311 -23112 46223 30.527 2 2.35e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(pnd.zcp)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: RT ~ DensityDev * GroupDev + ((1 | Subject) + (0 + DensityDev |
## Subject)) + ((1 | Item) + (0 + GroupDev | Item))
## Data: density
##
## AIC BIC logLik deviance df.resid
## 46271.6 46325.3 -23126.8 46253.6 2855
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0179 -0.5342 -0.1052 0.3690 12.4007
##
## Random effects:
## Groups Name Variance Std.Dev.
## Item GroupDev 33086 181.90
## Item.1 (Intercept) 79032 281.13
## Subject DensityDev 5018 70.84
## Subject.1 (Intercept) 152735 390.81
## Residual 541035 735.55
## Number of obs: 2864, groups: Item, 120; Subject, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1756.46 77.32 22.716
## DensityDev 117.63 60.18 1.955
## GroupDev 628.76 146.69 4.286
## DensityDev:GroupDev 136.43 70.06 1.947
##
## Correlation of Fixed Effects:
## (Intr) DnstyD GropDv
## DensityDev 0.001
## GroupDev -0.062 0.000
## DnstyDv:GrD 0.000 -0.021 0.004
Once again we get a substantial decrease in model fit when removing supposedly unnecessary random effect estimates. This is particularly surprising considering that the correlations that were removed were 1.00, so they really didn’t seem like they were capturing anything important.
Out of curiousity, let’s try the intercepts-only model suggested by rePCA:
pnd.int <- lmer(RT ~ DensityDev*GroupDev + (1 | Subject) + (1 | Item),
data=density, REML=FALSE)
anova(pnd.int, pnd.zcp, pnd.max)
## Data: density
## Models:
## pnd.int: RT ~ DensityDev * GroupDev + (1 | Subject) + (1 | Item)
## pnd.zcp: RT ~ DensityDev * GroupDev + ((1 | Subject) + (0 + DensityDev |
## pnd.zcp: Subject)) + ((1 | Item) + (0 + GroupDev | Item))
## pnd.max: RT ~ DensityDev * GroupDev + (DensityDev | Subject) + (GroupDev |
## pnd.max: Item)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## pnd.int 7 46273 46315 -23130 46259
## pnd.zcp 9 46272 46325 -23127 46254 5.5473 2 0.06243 .
## pnd.max 11 46245 46311 -23112 46223 30.5274 2 2.35e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Even this reduction produces are marginal decrease in model fit. Also, checking back on the fixed effect estimates:
summary(pnd.int)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: RT ~ DensityDev * GroupDev + (1 | Subject) + (1 | Item)
## Data: density
##
## AIC BIC logLik deviance df.resid
## 46273.2 46314.9 -23129.6 46259.2 2857
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0999 -0.5418 -0.1055 0.3677 12.5441
##
## Random effects:
## Groups Name Variance Std.Dev.
## Item (Intercept) 78386 280.0
## Subject (Intercept) 152732 390.8
## Residual 550642 742.1
## Number of obs: 2864, groups: Item, 120; Subject, 30
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1755.77 77.30 22.713
## DensityDev 117.30 58.66 2.000
## GroupDev 624.51 145.74 4.285
## DensityDev:GroupDev 136.46 55.90 2.441
##
## Correlation of Fixed Effects:
## (Intr) DnstyD GropDv
## DensityDev 0.001
## GroupDev -0.062 0.000
## DnstyDv:GrD 0.000 -0.021 0.005
Looks like the t-values inched up, consistent with the Barr et al. (2013) claim that non-maximal random effect structures are (somewhat) less conservative. However, the t-value or the interaction term was lowest in the zero correlation parameters model, so maybe it’s not that simple.