Acknowledgements: This tutorial is based on the tutorial by Gabriela K Hajduk

Terms:

Fixed effect: main predictor (e.g., body length, test)

Random effect: source of systematic noise (e.g., mountain range, individual dragon) that is not important for our hypothesis

Levels: different values of a predictor (e.g., trial 1 vs 2)

Interaction: combinations of 2 predictors making different predictions for the dependent variable

Linear regression, without random effects

Hypothesis 1: higher body length –> higher test score

lm.bl <- lm(score ~ bodyLength, data = dragons)
summary(lm.bl)
## 
## Call:
## lm(formula = score ~ bodyLength, data = dragons)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -101.151  -30.292    9.781   15.755  139.128 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 20.92057   14.65522   1.428    0.154    
## bodyLength   0.34729    0.07256   4.786 1.88e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 44.59 on 1438 degrees of freedom
## Multiple R-squared:  0.01568,    Adjusted R-squared:  0.01499 
## F-statistic: 22.91 on 1 and 1438 DF,  p-value: 1.876e-06

Data interpretation and reporting:

How do we interpret the data?

ggplot(data = dragons, 
       mapping = aes(x = bodyLength, y = score)) + 
  geom_smooth(method='lm')
## `geom_smooth()` using formula = 'y ~ x'

ggplot(data = dragons, 
       mapping = aes(x = bodyLength, y = score)) + 
  geom_smooth(method='lm') +
  geom_abline(intercept = 20.92057, slope = 0.34729, 
              color = "red")
## `geom_smooth()` using formula = 'y ~ x'

Reporting results:

Body length is a significant predictor of test scores, with dragons that have longer bodies scoring higher (\(\beta\) = 0.35, 95% \(CI\) [0.20, 0.49], \(p < .001\)).

Some useful functions:

coef(lm.bl): get only beta estimates

coef(summary(lm.bl)): get beta estimates, p-value, etc.

report(lm.bl): get a (lengthy) APA report.

confint: get CI (default is 95%). Or you can report SE. Use confint.merMod for lmer/glmer.

force_decimals and force_p: additional aesthetics stuff, automatically format values to APA.

Now onto other models…

When we facet out mountainRange and test, we saw that the effect is different between tests 1 and 2. We also saw that the effect is different between mountain ranges.

ggplot(data = dragons, 
       mapping = aes(x = bodyLength, y = score, colour = mountainRange)) + 
  geom_point(size = 2, alpha = .3) +
  geom_smooth(method='lm') +
  facet_grid(test~mountainRange) +
  theme_classic() + 
  theme(legend.position = "none")
## `geom_smooth()` using formula = 'y ~ x'

Considering random effects

Relationship between body length and score might differ based on which mountain range a dragon comes from. And the overall effect we found of body length on score might be driven by some mountain ranges where the effect is very strong.

Relationship between body length and score might also differ based on the individual dragon (i.e., some dragons are better at all tests.)

Syntax: (1|random effect)

lmer.base <- lmer(score ~ bodyLength + (1|mountainRange) + (1|pid), data = dragons)
## boundary (singular) fit: see help('isSingular')
summary(lmer.base)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ bodyLength + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 15021.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2751 -0.6674  0.2031  0.3593  3.1240 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  pid           (Intercept)    0.000  0.000  
##  mountainRange (Intercept)    8.696  2.949  
##  Residual                  1981.747 44.517  
## Number of obs: 1440, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept) 18.47702   17.72650 18.47986   1.042 0.310700    
## bodyLength   0.35942    0.08771 18.68057   4.098 0.000631 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## bodyLength -0.996
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(lmer.base, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##               Chisq Df Pr(>Chisq)    
## (Intercept)  1.0865  1     0.2973    
## bodyLength  16.7937  1  4.167e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coef(summary(lmer.base))
##               Estimate  Std. Error       df  t value     Pr(>|t|)
## (Intercept) 18.4770218 17.72650380 18.47986 1.042339 0.3106996953
## bodyLength   0.3594248  0.08770712 18.68057 4.098011 0.0006313981

Yes – after adding the random effect of mountain range, there is still an effect of body length on score.

Note: Singular fit

Not a big deal for this demonstration, but if you see this in your actual analysis, it’s a sign that your predictor structure is too complex and is overfitting the data. Consider dropping random effects.

Note: Common random effects:

Individual participant: if a participant sees multiple trials, there will be dependencies between their responses – they might intrinsically be better or worse at the task.

Individual trial: a trial might intrinsically be easier or more difficult for all participants.

I always include both of these random effects in my analysis. In this demonstration, we will be including random participant effect, but we don’t have multiple trials so we won’t be including that.

What is Anova doing?

It’s running a Wald chi-squared test – p-values will be similar to a likelihood ratio test comparing against a uniformly informative model (no fixed effects, no random effects!) But these are different tests.

lm.uni <- lm(score ~ 1 , data = dragons)
#summary(lm.uni)
#Anova(lm.uni, type = 3)
anova(lmer.base, lm.uni, type = 3, refit = FALSE)
## Warning in anova.merMod(lmer.base, lm.uni, type = 3, refit = FALSE): some
## models fit with REML = TRUE, some not
## Data: dragons
## Models:
## lm.uni: score ~ 1
## lmer.base: score ~ bodyLength + (1 | mountainRange) + (1 | pid)
##           npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## lm.uni       2 15048 15059 -7522.2    15044                         
## lmer.base    5 15032 15058 -7510.8    15022 22.842  3  4.357e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding a second predictor

Let’s say we have a second hypothesis:

Hypothesis 2: higher body length –> higher test score, scores for test 2 > test 1

Checking the effect of test:

lmer.test <- lmer(score ~ test + (1|mountainRange) + (1|pid), data = dragons )
## boundary (singular) fit: see help('isSingular')
summary(lmer.test)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ test + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 14978.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0958 -0.5653  0.0019  0.4473  3.4375 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  pid           (Intercept)    0.00   0.000  
##  mountainRange (Intercept)   29.29   5.412  
##  Residual                  1931.43  43.948  
## Number of obs: 1440, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  101.885      2.772   16.498  36.754  < 2e-16 ***
## testtest_2   -14.290      2.837 1430.000  -5.037 5.32e-07 ***
## testtest_3   -18.859      2.837 1430.000  -6.648 4.22e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr) tstt_2
## testtest_2 -0.512       
## testtest_3 -0.512  0.500
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(lmer.test, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                Chisq Df Pr(>Chisq)    
## (Intercept) 1350.838  1  < 2.2e-16 ***
## test          48.108  2  3.577e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Add both fixed effects:

lmer.bl_test <- lmer(score ~ bodyLength + test + (1|mountainRange) + (1|pid), data = dragons)
## boundary (singular) fit: see help('isSingular')
summary(lmer.bl_test)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 14966.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.1336 -0.5674  0.0128  0.4442  3.3524 
## 
## Random effects:
##  Groups        Name        Variance  Std.Dev. 
##  pid           (Intercept) 5.513e-11 7.425e-06
##  mountainRange (Intercept) 9.031e+00 3.005e+00
##  Residual                  1.920e+03 4.381e+01
## Number of obs: 1440, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##               Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)   29.38397   17.68361   19.29598   1.662 0.112744    
## bodyLength     0.36013    0.08712   19.18554   4.134 0.000554 ***
## testtest_2   -14.29003    2.82808 1429.68868  -5.053 4.91e-07 ***
## testtest_3   -18.85880    2.82808 1429.68868  -6.668 3.68e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr) bdyLng tstt_2
## bodyLength -0.992              
## testtest_2 -0.080  0.000       
## testtest_3 -0.080  0.000  0.500
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
lmer.anova.bl_test <- Anova(lmer.bl_test, type = 3)
lmer.anova.bl_test 
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##               Chisq Df Pr(>Chisq)    
## (Intercept)  2.7611  1    0.09658 .  
## bodyLength  17.0893  1  3.566e-05 ***
## test        48.4063  2  3.081e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lmer.compare.bl_test <- anova(lmer.bl_test, lmer.base, type = 3, refit = FALSE)
lmer.compare.bl_test
## Data: dragons
## Models:
## lmer.base: score ~ bodyLength + (1 | mountainRange) + (1 | pid)
## lmer.bl_test: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
##              npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## lmer.base       5 15032 15058 -7510.8    15022                         
## lmer.bl_test    7 14980 15017 -7483.2    14966 55.217  2  1.023e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is an effect of body length and test (after controlling for random effects). And adding test as a fixed effect results in a model that better explains the variance in the data.

Post-hoc comparisons

How do we interpret the ‘test’ effect? Use emmeans.

lmer.contr.bl_test <- lmer.bl_test %>% 
  emmeans(specs = pairwise ~ test,
          type = "response") %>%
  pluck("contrasts") %>%
  summary()
  
lmer.contr.bl_test
##  contrast        estimate   SE  df t.ratio p.value
##  test_1 - test_2    14.29 2.83 958   5.053  <.0001
##  test_1 - test_3    18.86 2.83 958   6.668  <.0001
##  test_2 - test_3     4.57 2.83 958   1.616  0.2395
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: tukey method for comparing a family of 3 estimates

Scores in test 1 is significantly higher than scores in test 2, and also significantly higher than test 3. But there is no significant differences between test 1 and 3.

Reporting results:

A model that includes both body length and test as predictors showed that they both are significant predictors of test scores, with dragons that have longer bodies scoring higher (\(\beta\) = 0.36, 95% \(CI\) [0.19, 0.53], \(p < .001\)). There is an effect of test (\(\chi^2\)(2) = 48.41, \(p < .001\)).

Post-hoc pairwise comparisons between the three test types with Tukey corrections showed that scores in Test 1 is significantly higher than Test 2 (\(t\) = 5.05, \(p < .001\)) and Test 3 (\(t\) = 6.67, \(p < .001\)). Scores in Test 2 was not significantly different from Test 3 (\(t\) = 1.62, \(p = .240\)).

A likelihood ratio test shows that the model with both body length and test as predictors explained significantly more variation in the observed data compared to the base model with only body length as predictor (\(\chi^2\)(2) = 55.22, \(p < .001\)).

Note: ANOVA:

Anova() or car::Anova(): compare the full model with a uniformly informative model (all predictors removed) –> this allows us to get the p-value to see whether a predictor (e.g., body length) is significant.

anova(): compare 2 models –> whether 1 model explains significantly more variance in the data compared to the other.

See also: https://www.bookdown.org/rwnahhas/RMPH/mlr-distinctions.html

Adding an interaction effect

So now we know that both body length and test is a significant predictor of the data. Is there an interaction effect – does body length predict scores in a similar manner across both tests?

Syntax: fixed_eff1 * fixed_eff2 includes both interaction and main effects. fixed_eff1 : fixed_eff2 to suppress main effects and only test interaction.

lmer.bl_test_int <- lmer(score ~ bodyLength * test + (1|mountainRange) + (1|pid), 
                         data = dragons)
## boundary (singular) fit: see help('isSingular')
summary(lmer.bl_test_int)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 14960.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2524 -0.5721  0.0021  0.4294  3.2704 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  pid           (Intercept)    0.000  0.000  
##  mountainRange (Intercept)    9.086  3.014  
##  Residual                  1909.311 43.696  
## Number of obs: 1440, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##                         Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)           -1.116e+00  2.687e+01  9.985e+01  -0.042 0.966935    
## bodyLength             5.116e-01  1.330e-01  1.011e+02   3.848 0.000209 ***
## testtest_2             7.906e+01  3.517e+01  1.428e+03   2.248 0.024747 *  
## testtest_3            -2.078e+01  3.517e+01  1.428e+03  -0.591 0.554757    
## bodyLength:testtest_2 -4.637e-01  1.742e-01  1.428e+03  -2.663 0.007842 ** 
## bodyLength:testtest_3  9.545e-03  1.742e-01  1.428e+03   0.055 0.956300    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) bdyLng tstt_2 tstt_3 bdL:_2
## bodyLength  -0.996                            
## testtest_2  -0.655  0.653                     
## testtest_3  -0.655  0.653  0.500              
## bdyLngth:_2  0.653 -0.655 -0.997 -0.498       
## bdyLngth:_3  0.653 -0.655 -0.498 -0.997  0.500
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(lmer.bl_test_int, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                   Chisq Df Pr(>Chisq)    
## (Intercept)      0.0017  1  0.9668519    
## bodyLength      14.8043  1  0.0001193 ***
## test             8.9721  2  0.0112648 *  
## bodyLength:test  9.6508  2  0.0080235 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmer.bl_test_int, lmer.bl_test, type = 3, refit = FALSE)
## Data: dragons
## Models:
## lmer.bl_test: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
## lmer.bl_test_int: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
##                  npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## lmer.bl_test        7 14980 15017 -7483.2    14966                       
## lmer.bl_test_int    9 14978 15026 -7480.2    14960 6.0273  2    0.04911 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is an interaction effect!

Interpreting estimates of interactions requires some math, which we won’t go into. But I recommend this post if you’re interested!

Quick plot (note that this doesn’t take into account variances explained by the random effects) that demonstrates this:

Red line: test_1; Blue line: test_2; Green line: test_3

ggplot() + 
  geom_smooth(data = dragons %>%
                filter(test == "test_1"), 
              mapping = aes(x = bodyLength, y = score), 
              method='lm', 
              color = "red") +
  
  geom_smooth(data = dragons %>%
                filter(test == "test_2"), 
              mapping = aes(x = bodyLength, y = score), 
              method='lm', 
              color = "blue") +
  
  geom_smooth(data = dragons %>%
                filter(test == "test_3"), 
              mapping = aes(x = bodyLength, y = score), 
              method='lm', 
              color = "green") +
  theme_classic() 
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'
## `geom_smooth()` using formula = 'y ~ x'

Interaction effect vs. 2 predictors

Let’s look at a subset of the data with only scores for test 1 and 3.

lmer.bl_test_13 <- lmer(score ~ bodyLength + test + (1|mountainRange) + (1|pid), 
                            data = dragons %>%
                              filter(test != "test_2"))
## boundary (singular) fit: see help('isSingular')
summary(lmer.bl_test_13)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
##    Data: dragons %>% filter(test != "test_2")
## 
## REML criterion at convergence: 9809.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4652 -0.1759 -0.0033  0.0531  3.5476 
## 
## Random effects:
##  Groups        Name        Variance  Std.Dev. 
##  pid           (Intercept) 3.800e-10 1.949e-05
##  mountainRange (Intercept) 5.634e+00 2.374e+00
##  Residual                  1.610e+03 4.013e+01
## Number of obs: 960, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  -1.41476   18.25883  15.33431  -0.077    0.939    
## bodyLength    0.51312    0.09014  15.30095   5.692 3.96e-05 ***
## testtest_3  -18.85880    2.59020 950.66254  -7.281 6.95e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr) bdyLng
## bodyLength -0.994       
## testtest_3 -0.071  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(lmer.bl_test_13, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##              Chisq Df Pr(>Chisq)    
## (Intercept)  0.006  1     0.9382    
## bodyLength  32.402  1  1.254e-08 ***
## test        53.010  1  3.318e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lmer.bl_test_int_13 <- lmer(score ~ bodyLength * test + (1|mountainRange) + (1|pid), 
                            data = dragons %>%
                              filter(test != "test_2"))
## boundary (singular) fit: see help('isSingular')
summary(lmer.bl_test_int_13)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
##    Data: dragons %>% filter(test != "test_2")
## 
## REML criterion at convergence: 9811.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4675 -0.1759 -0.0037  0.0535  3.5428 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  pid           (Intercept)    0.00   0.000  
##  mountainRange (Intercept)    5.62   2.371  
##  Residual                  1611.88  40.148  
## Number of obs: 960, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##                         Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)            -0.447916  24.350698  47.544616  -0.018 0.985401    
## bodyLength              0.508317   0.120542  47.939601   4.217 0.000109 ***
## testtest_3            -20.780373  32.318533 949.663116  -0.643 0.520388    
## bodyLength:testtest_3   0.009545   0.160019 949.663116   0.060 0.952447    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) bdyLng tstt_3
## bodyLength  -0.997              
## testtest_3  -0.664  0.662       
## bdyLngth:_3  0.661 -0.664 -0.997
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(lmer.bl_test_int_13, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                   Chisq Df Pr(>Chisq)    
## (Intercept)      0.0003  1     0.9853    
## bodyLength      17.7824  1  2.477e-05 ***
## test             0.4134  1     0.5202    
## bodyLength:test  0.0036  1     0.9524    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is an effect of test, but not when an interaction with body length is added. However, a likelihood ratio test shows that the model without the interaction better explains the data.

anova(lmer.bl_test_int_13, lmer.bl_test_13, type = 3, refit = FALSE)
## Data: dragons %>% filter(test != "test_2")
## Models:
## lmer.bl_test_13: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
## lmer.bl_test_int_13: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
##                     npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)
## lmer.bl_test_13        6 9821.4 9850.6 -4904.7   9809.4                    
## lmer.bl_test_int_13    7 9825.3 9859.3 -4905.6   9811.3     0  1          1

Generalized linear mixed-effect models

If DVs are measures where the residual error does not follow a normal distribution –> use glmer instead. This allows you to specify a link function that “link” the response to a linear predictor.

A common use is with a binary outcome (e.g., success / failure, 2AFC, etc.), coded as 0/1. This is essentially a logistic regression. In our example dataset, this is the case with test_4.

#removed pid random effect because each dragon is only measured once for test_4
glmer.test4 <- glmer(score ~ bodyLength + (1|mountainRange), 
                     data = dragons_test4,
                     family = "binomial")
## boundary (singular) fit: see help('isSingular')
summary(glmer.test4)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: score ~ bodyLength + (1 | mountainRange)
##    Data: dragons_test4
## 
##      AIC      BIC   logLik deviance df.resid 
##    654.4    666.9   -324.2    648.4      477 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.4269 -1.1456  0.7600  0.8543  0.9935 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  mountainRange (Intercept) 0        0       
## Number of obs: 480, groups:  mountainRange, 8
## 
## Fixed effects:
##              Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  2.331149   1.168484   1.995   0.0460 *
## bodyLength  -0.009854   0.005773  -1.707   0.0878 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##            (Intr)
## bodyLength -0.997
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(glmer.test4, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##              Chisq Df Pr(>Chisq)  
## (Intercept) 3.9801  1    0.04604 *
## bodyLength  2.9142  1    0.08780 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Different structures of random effects

Random slopes

Think of random effects as ‘grouping’ variables that determine dependencies of data. They can affect either the intercept or the slope of the best-fit line.

Consider the scenario where we are predicting score based on body length, with mountain range as the grouping variable.

Intercept: the projected score when body length = 0, for each mountain range. You can also think of this as the ‘baseline’ performance, even though this is not very accurate.

Slope: the change in score with a fixed change in body length, for each mountain range.

If we fit a best-fit line by each mountain range, the intercept and/or slope of each line might be different. We might want to account for this in our random effect structure.

Syntax:

Intercept random effect: (1|rand_eff)

Slope random effect – need to include an intercept random effect: (1 + fixed_eff_whose_slope_vary | rand_eff). Essentially, we are allowing the fixed effect’s slope to vary based on the grouping variable.

# we let all main effects and the interaction effect vary in both slope and intercept based on mountainRange.
lmer.bl_test_int_slope <- lmer(score ~ bodyLength * test + (1 + (bodyLength * test)|mountainRange) + (1|pid), data = dragons)
## boundary (singular) fit: see help('isSingular')
## Warning: Model failed to converge with 4 negative eigenvalues: -7.1e-01
## -1.3e+00 -1.4e+00 -3.2e+05
summary(lmer.bl_test_int_slope)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: 
## score ~ bodyLength * test + (1 + (bodyLength * test) | mountainRange) +  
##     (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 14951.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.52925 -0.55372  0.00112  0.45972  3.14159 
## 
## Random effects:
##  Groups        Name                  Variance  Std.Dev. Corr                   
##  pid           (Intercept)           1.628e-03  0.04034                        
##  mountainRange (Intercept)           2.095e+03 45.76748                        
##                bodyLength            4.880e-02  0.22090 -1.00                  
##                testtest_2            2.062e+03 45.41169  0.01 -0.01            
##                testtest_3            7.360e+03 85.79297 -0.07  0.07 -0.03      
##                bodyLength:testtest_2 5.899e-02  0.24287 -0.02  0.02 -1.00 -0.01
##                bodyLength:testtest_3 1.901e-01  0.43601  0.04 -0.04  0.00 -1.00
##  Residual                            1.884e+03 43.40870                        
##       
##       
##       
##       
##       
##       
##       
##   0.04
##       
## Number of obs: 1440, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##                       Estimate Std. Error       df t value Pr(>|t|)  
## (Intercept)            -8.9973    31.8843   4.4620  -0.282   0.7904  
## bodyLength              0.5457     0.1559   4.8064   3.501   0.0184 *
## testtest_2             96.8812    40.2111   6.7594   2.409   0.0481 *
## testtest_3            -27.8376    49.9662  13.3054  -0.557   0.5867  
## bodyLength:testtest_2  -0.5536     0.2022   7.8922  -2.738   0.0259 *
## bodyLength:testtest_3   0.0238     0.2489  14.3827   0.096   0.9251  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) bdyLng tstt_2 tstt_3 bdL:_2
## bodyLength  -0.998                            
## testtest_2  -0.474  0.479                     
## testtest_3  -0.449  0.451  0.292              
## bdyLngth:_2  0.462 -0.471 -0.997 -0.300       
## bdyLngth:_3  0.436 -0.440 -0.300 -0.998  0.312
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(lmer.bl_test_int_slope, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                   Chisq Df Pr(>Chisq)    
## (Intercept)      0.0796  1  0.7778006    
## bodyLength      12.2559  1  0.0004638 ***
## test             7.5385  2  0.0230688 *  
## bodyLength:test  8.4944  2  0.0143039 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmer.bl_test_int_slope, lmer.bl_test_int, type = 3, refit = FALSE)
## Data: dragons
## Models:
## lmer.bl_test_int: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
## lmer.bl_test_int_slope: score ~ bodyLength * test + (1 + (bodyLength * test) | mountainRange) + (1 | pid)
##                        npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)
## lmer.bl_test_int          9 14978 15026 -7480.2    14960                     
## lmer.bl_test_int_slope   29 15010 15162 -7475.8    14952 8.7126 20      0.986

Yes, there is still an effect of body length, test and interaction effect when allowing for a random slope. But it does not explain a significantly larger variance in the data.

Nesting random effects

There are 3 sites (a, b, c) within each mountain range. Maybe the dragons in each have a different intercept (again, intercept = roughly baseline performance).

Wrong syntax: (1|mountainRange) + (1|site) –> crossed structure, assume that all sites ‘A’ across mountain ranges have the same intercept. This model thinks that there are only 3 random site intercepts (in fact, there are 3 sites * 8 mountain ranges = 24 site intercepts).

Correct syntax: (1|mountainRange/site) or (1|mountainRange) + (1|mountainRange:site)

Note:

Make your life easier by just naming the sites differently from the beginning (e.g. mountainRange_A)

lmer.bl_test_int_site <- lmer(score ~ bodyLength * test + (1|mountainRange/site) + (1|pid), data = dragons)
## boundary (singular) fit: see help('isSingular')
summary(lmer.bl_test_int_site)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: score ~ bodyLength * test + (1 | mountainRange/site) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 14957.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.2972 -0.5778 -0.0013  0.4557  3.1654 
## 
## Random effects:
##  Groups             Name        Variance  Std.Dev. 
##  pid                (Intercept) 2.663e-07 5.161e-04
##  site:mountainRange (Intercept) 2.374e+01 4.872e+00
##  mountainRange      (Intercept) 1.212e+00 1.101e+00
##  Residual                       1.894e+03 4.352e+01
## Number of obs: 1440, groups:  
## pid, 480; site:mountainRange, 24; mountainRange, 8
## 
## Fixed effects:
##                         Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)            3.500e+00  2.757e+01  6.583e+01   0.127 0.899366    
## bodyLength             4.887e-01  1.365e-01  6.629e+01   3.580 0.000648 ***
## testtest_2             7.906e+01  3.503e+01  1.412e+03   2.257 0.024162 *  
## testtest_3            -2.078e+01  3.503e+01  1.412e+03  -0.593 0.553127    
## bodyLength:testtest_2 -4.637e-01  1.734e-01  1.412e+03  -2.674 0.007592 ** 
## bodyLength:testtest_3  9.545e-03  1.734e-01  1.412e+03   0.055 0.956120    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) bdyLng tstt_2 tstt_3 bdL:_2
## bodyLength  -0.997                            
## testtest_2  -0.635  0.633                     
## testtest_3  -0.635  0.633  0.500              
## bdyLngth:_2  0.633 -0.635 -0.997 -0.498       
## bdyLngth:_3  0.633 -0.635 -0.498 -0.997  0.500
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')
Anova(lmer.bl_test_int_site, type = 3)
## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                   Chisq Df Pr(>Chisq)    
## (Intercept)      0.0161  1  0.8989795    
## bodyLength      12.8178  1  0.0003433 ***
## test             9.0463  2  0.0108546 *  
## bodyLength:test  9.7305  2  0.0077097 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lmer.bl_test_int_site, lmer.bl_test_int, type = 3, refit = FALSE)
## Data: dragons
## Models:
## lmer.bl_test_int: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
## lmer.bl_test_int_site: score ~ bodyLength * test + (1 | mountainRange/site) + (1 | pid)
##                       npar   AIC   BIC  logLik deviance  Chisq Df Pr(>Chisq)  
## lmer.bl_test_int         9 14978 15026 -7480.2    14960                       
## lmer.bl_test_int_site   10 14977 15030 -7478.7    14957 2.9318  1    0.08685 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Not an improvement over the interaction model without the nested random effect!