A recent manuscript by Douglas Bates, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen (posted on arXiv) describes a strategy for determining parsimonious random effect structures for multilevel regression, as opposed to the maximal random effect structures recommended by Barr et al.. Here I am going to try out their recommended strategy on some of my own data.

Example 1: Word learning

Read in data based on a study of effects of transitional probability (TP) on novel word learning (Mirman et al., 2008):

WordLearnEx <- read.delim("http://www.danmirman.org/gca/WordLearnEx.txt")
summary(WordLearnEx)
##     Subject         TP          Block         Accuracy        Correct    
##  Min.   :244.0   High:280   Min.   : 1.0   Min.   :0.000   Min.   :0.00  
##  1st Qu.:319.5   Low :280   1st Qu.: 3.0   1st Qu.:0.667   1st Qu.:4.00  
##  Median :346.5              Median : 5.5   Median :0.833   Median :5.00  
##  Mean   :344.4              Mean   : 5.5   Mean   :0.805   Mean   :4.83  
##  3rd Qu.:370.5              3rd Qu.: 8.0   3rd Qu.:1.000   3rd Qu.:6.00  
##  Max.   :395.0              Max.   :10.0   Max.   :1.000   Max.   :6.00  
##    Incorrect   
##  Min.   :0.00  
##  1st Qu.:0.00  
##  Median :1.00  
##  Mean   :1.17  
##  3rd Qu.:2.00  
##  Max.   :6.00

Prepare data for analysis by making a second-order orthogonal polynomial from the Block values:

t <- poly(1:max(WordLearnEx$Block), 2)
WordLearnEx[, paste("ot", 1:2, sep="")] <- t[WordLearnEx$Block, 1:2]

Fit a maximal model:

m.max <- lmer(Accuracy ~ (ot1 + ot2)*TP + 
               (ot1 + ot2 | Subject), data=WordLearnEx, REML=FALSE)
summary(m.max)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: Accuracy ~ (ot1 + ot2) * TP + (ot1 + ot2 | Subject)
##    Data: WordLearnEx
## 
##      AIC      BIC   logLik deviance df.resid 
##   -332.6   -276.4    179.3   -358.6      547 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.6182 -0.5358  0.1263  0.5665  2.6157 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr       
##  Subject  (Intercept) 0.01076  0.10374             
##           ot1         0.01542  0.12419  -0.33      
##           ot2         0.00628  0.07925  -0.28 -0.82
##  Residual             0.02456  0.15672             
## Number of obs: 560, groups:  Subject, 56
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  0.831486   0.021727   38.27
## ot1          0.287391   0.037788    7.61
## ot2         -0.167304   0.033188   -5.04
## TPLow       -0.052961   0.030727   -1.72
## ot1:TPLow   -0.001075   0.053441   -0.02
## ot2:TPLow    0.116455   0.046935    2.48
## 
## Correlation of Fixed Effects:
##           (Intr) ot1    ot2    TPLow  o1:TPL
## ot1       -0.183                            
## ot2       -0.114 -0.229                     
## TPLow     -0.707  0.129  0.081              
## ot1:TPLow  0.129 -0.707  0.162 -0.183       
## ot2:TPLow  0.081  0.162 -0.707 -0.114 -0.229

The model converged, so that eliminates one motivation for simplifying random effects. Also, none of the random effect correlations are 1 or -1, which would have been a sign of an overparameterized model. Also note that the ot2:TPLow effect (i.e., TP effect on quadratic term) is the key significant fixed effect (\(t=2.48, p = 0.013\)). It will be interesting to see whether this is different in a parsimonious model.

Now let’s try the RePsychLing procedure. First, if you need to install the package (you’ll need the devtools package):

devtools::install_github("dmbates/RePsychLing")

Now we can load the package:

library("RePsychLing")

The first step is to use rePCA to conduct a PCA of random effects to determine how many random effect parameters are needed to capture all of the variance:

summary(rePCA(m.max))
## $Subject
## Importance of components:
##                          [,1]   [,2] [,3]
## Standard deviation     0.9195 0.6902    0
## Proportion of Variance 0.6396 0.3604    0
## Cumulative Proportion  0.6396 1.0000    1

Looks like two random effects terms are sufficient for capturing all of the variance in the random effects, which is quite a bit less than the maximal model’s 6 (3 variances and 3 correlations).

The second step is to use the new(?) zero correlation parameters random effects specification to remove the correlations. Instead using a pipe in the random effect specification (...|...), I will now use a double-pipe (...||...):

m.zcp <- lmer(Accuracy ~ (ot1 + ot2)*TP + 
               (ot1 + ot2 || Subject), data=WordLearnEx, REML=FALSE)
summary(m.zcp)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## Accuracy ~ (ot1 + ot2) * TP + ((1 | Subject) + (0 + ot1 | Subject) +  
##     (0 + ot2 | Subject))
##    Data: WordLearnEx
## 
##      AIC      BIC   logLik deviance df.resid 
##   -329.0   -285.8    174.5   -349.0      550 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4466 -0.5392  0.1451  0.5781  2.5777 
## 
## Random effects:
##  Groups    Name        Variance  Std.Dev.
##  Subject   (Intercept) 0.0106457 0.10318 
##  Subject.1 ot1         0.0122195 0.11054 
##  Subject.2 ot2         0.0001099 0.01048 
##  Residual              0.0256140 0.16004 
## Number of obs: 560, groups:  Subject, 56
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  0.831486   0.021718   38.29
## ot1          0.287391   0.036759    7.82
## ot2         -0.167304   0.030310   -5.52
## TPLow       -0.052961   0.030714   -1.72
## ot1:TPLow   -0.001075   0.051984   -0.02
## ot2:TPLow    0.116455   0.042865    2.72
## 
## Correlation of Fixed Effects:
##           (Intr) ot1    ot2    TPLow  o1:TPL
## ot1        0.000                            
## ot2        0.000  0.000                     
## TPLow     -0.707  0.000  0.000              
## ot1:TPLow  0.000 -0.707  0.000  0.000       
## ot2:TPLow  0.000  0.000 -0.707  0.000  0.000
anova(m.max, m.zcp)
## Data: WordLearnEx
## Models:
## m.zcp: Accuracy ~ (ot1 + ot2) * TP + ((1 | Subject) + (0 + ot1 | Subject) + 
## m.zcp:     (0 + ot2 | Subject))
## m.max: Accuracy ~ (ot1 + ot2) * TP + (ot1 + ot2 | Subject)
##       Df     AIC     BIC logLik deviance  Chisq Chi Df Pr(>Chisq)  
## m.zcp 10 -329.04 -285.76 174.52  -349.04                           
## m.max 13 -332.63 -276.37 179.32  -358.63 9.5928      3    0.02236 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This seems interesting: although rePCA indicated that 2 random effects would be enough, removing the correlation parameters and dropping down to the 3 random effect variances is a significant decrement in model fit.

Example 2: Crossed random effects of subjects and items

These trial-level data (error trials have already been removed) from a pilot study of phonological neighborhood density effects in younger and older adults.

density <- read.delim("http://www.danmirman.org/gca/Density.txt")
summary(density)
##     Subject       Group           Item      Density           RT       
##  S103   : 115   old  :1494   bag    :  30   High:1397   Min.   :  440  
##  S122   : 109   young:1370   blonde :  30   Low :1467   1st Qu.: 1161  
##  S105   : 107                chain  :  30               Median : 1536  
##  S102   : 105                chair  :  30               Mean   : 1730  
##  S108   : 102                child  :  30               3rd Qu.: 2040  
##  S110   : 102                couch  :  30               Max.   :12753  
##  (Other):2224                (Other):2684

To get main effect estimates for the fixed effects and avoid problems in the random effect syntax, I will create a deviance code for the Density and Group factors:

density$DensityDev <- ifelse(density$Density == "High", 0.5, -0.5)
density$GroupDev <- ifelse(density$Group == "old", 0.5, -0.5)

Fit the maximal model:

pnd.max <- lmer(RT ~ DensityDev*GroupDev + (DensityDev | Subject) + (GroupDev | Item), 
                data=density, REML=FALSE)
summary(pnd.max)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: 
## RT ~ DensityDev * GroupDev + (DensityDev | Subject) + (GroupDev |  
##     Item)
##    Data: density
## 
##      AIC      BIC   logLik deviance df.resid 
##  46245.1  46310.7 -23111.6  46223.1     2853 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.8354 -0.5429 -0.1105  0.3643 12.3008 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Item     (Intercept)  74015   272.1        
##           GroupDev     35651   188.8    1.00
##  Subject  (Intercept) 153306   391.5        
##           DensityDev    5564    74.6    1.00
##  Residual             541954   736.2        
## Number of obs: 2864, groups:  Item, 120; Subject, 30
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          1753.51      77.16  22.725
## DensityDev            116.68      58.85   1.982
## GroupDev              650.47     147.00   4.425
## DensityDev:GroupDev   144.55      70.93   2.038
## 
## Correlation of Fixed Effects:
##             (Intr) DnstyD GropDv
## DensityDev   0.217              
## GroupDev    -0.023 -0.015       
## DnstyDv:GrD -0.024  0.398  0.379

This model also converged, so that’s not a motivation for simplifying the random effect structure. However, for both subjects and items, the random effect correlations were 1.00, suggesting that the model is overparameterized. Also, the main effect of Density and the Density-by-Group interactions are right near the critical 2.0 threshold, so it will be interesting to see whether simplifying the random effect structure affects these results.

Step 1: use rePCA to assess how many random effect parameters are needed:

summary(rePCA(pnd.max))
## $Item
## Importance of components:
##                          [,1] [,2]
## Standard deviation     0.4498    0
## Proportion of Variance 1.0000    0
## Cumulative Proportion  1.0000    1
## 
## $Subject
## Importance of components:
##                          [,1] [,2]
## Standard deviation     0.5414    0
## Proportion of Variance 1.0000    0
## Cumulative Proportion  1.0000    1

Seems like one would do the trick. Before dropping all the way down to just a random intercepts model, let’s try removing the random effect correlations:

pnd.zcp <- lmer(RT ~ DensityDev*GroupDev + (DensityDev || Subject) + (GroupDev || Item), 
                data=density, REML=FALSE)
anova(pnd.zcp, pnd.max)
## Data: density
## Models:
## pnd.zcp: RT ~ DensityDev * GroupDev + ((1 | Subject) + (0 + DensityDev | 
## pnd.zcp:     Subject)) + ((1 | Item) + (0 + GroupDev | Item))
## pnd.max: RT ~ DensityDev * GroupDev + (DensityDev | Subject) + (GroupDev | 
## pnd.max:     Item)
##         Df   AIC   BIC logLik deviance  Chisq Chi Df Pr(>Chisq)    
## pnd.zcp  9 46272 46325 -23127    46254                             
## pnd.max 11 46245 46311 -23112    46223 30.527      2   2.35e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(pnd.zcp)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: RT ~ DensityDev * GroupDev + ((1 | Subject) + (0 + DensityDev |  
##     Subject)) + ((1 | Item) + (0 + GroupDev | Item))
##    Data: density
## 
##      AIC      BIC   logLik deviance df.resid 
##  46271.6  46325.3 -23126.8  46253.6     2855 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0179 -0.5342 -0.1052  0.3690 12.4007 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  Item      GroupDev     33086   181.90  
##  Item.1    (Intercept)  79032   281.13  
##  Subject   DensityDev    5018    70.84  
##  Subject.1 (Intercept) 152735   390.81  
##  Residual              541035   735.55  
## Number of obs: 2864, groups:  Item, 120; Subject, 30
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          1756.46      77.32  22.716
## DensityDev            117.63      60.18   1.955
## GroupDev              628.76     146.69   4.286
## DensityDev:GroupDev   136.43      70.06   1.947
## 
## Correlation of Fixed Effects:
##             (Intr) DnstyD GropDv
## DensityDev   0.001              
## GroupDev    -0.062  0.000       
## DnstyDv:GrD  0.000 -0.021  0.004

Once again we get a substantial decrease in model fit when removing supposedly unnecessary random effect estimates. This is particularly surprising considering that the correlations that were removed were 1.00, so they really didn’t seem like they were capturing anything important.

Out of curiousity, let’s try the intercepts-only model suggested by rePCA:

pnd.int <- lmer(RT ~ DensityDev*GroupDev + (1 | Subject) + (1 | Item), 
                data=density, REML=FALSE)
anova(pnd.int, pnd.zcp, pnd.max)
## Data: density
## Models:
## pnd.int: RT ~ DensityDev * GroupDev + (1 | Subject) + (1 | Item)
## pnd.zcp: RT ~ DensityDev * GroupDev + ((1 | Subject) + (0 + DensityDev | 
## pnd.zcp:     Subject)) + ((1 | Item) + (0 + GroupDev | Item))
## pnd.max: RT ~ DensityDev * GroupDev + (DensityDev | Subject) + (GroupDev | 
## pnd.max:     Item)
##         Df   AIC   BIC logLik deviance   Chisq Chi Df Pr(>Chisq)    
## pnd.int  7 46273 46315 -23130    46259                              
## pnd.zcp  9 46272 46325 -23127    46254  5.5473      2    0.06243 .  
## pnd.max 11 46245 46311 -23112    46223 30.5274      2   2.35e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Even this reduction produces are marginal decrease in model fit. Also, checking back on the fixed effect estimates:

summary(pnd.int)
## Linear mixed model fit by maximum likelihood  ['lmerMod']
## Formula: RT ~ DensityDev * GroupDev + (1 | Subject) + (1 | Item)
##    Data: density
## 
##      AIC      BIC   logLik deviance df.resid 
##  46273.2  46314.9 -23129.6  46259.2     2857 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0999 -0.5418 -0.1055  0.3677 12.5441 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Item     (Intercept)  78386   280.0   
##  Subject  (Intercept) 152732   390.8   
##  Residual             550642   742.1   
## Number of obs: 2864, groups:  Item, 120; Subject, 30
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          1755.77      77.30  22.713
## DensityDev            117.30      58.66   2.000
## GroupDev              624.51     145.74   4.285
## DensityDev:GroupDev   136.46      55.90   2.441
## 
## Correlation of Fixed Effects:
##             (Intr) DnstyD GropDv
## DensityDev   0.001              
## GroupDev    -0.062  0.000       
## DnstyDv:GrD  0.000 -0.021  0.005

Looks like the t-values inched up, consistent with the Barr et al. (2013) claim that non-maximal random effect structures are (somewhat) less conservative. However, the t-value or the interaction term was lowest in the zero correlation parameters model, so maybe it’s not that simple.