Acknowledgements

This tutorial is based on the tutorial by Gabriela K Hajduk

The code and data can be downloaded here.

A published version of the notebook is here.

Set up

Download R and R Studio.

Load libraries that we might use. Uncomment the line to install packages if necessary.

knitr::opts_chunk$set(echo = TRUE)
#install.packages(c('tidyverse', 'purrr', 'ggplot2', 'lme4', 'car'))
library('tidyverse')
library("purrr")
library('ggplot2')
library('lme4')
library('car')

Load data

There are different functions you can use for different data types. .csv is the most common file type to store your data, but we’re using .RData for this tutorial.

Recording data and making your life easier

Long format shows multiple measurements for the same subject on different lines.

Wide format shows multiple measurements for the same subject on the same line.

It’s relatively easy to switch between the 2 formats. But in my experience, long format is more useful, for e.g., mixed-effects models (the standard analysis tool) requires long format. Therefore, I record my data in the long format.

Converting long to wide and vice versa

dragons_long_to_wide <- dragons_long %>%
  pivot_wider(
    names_from = test, values_from = score
  )
head(dragons_long_to_wide)

## # A tibble: 6 × 6
##   pid                               mountainRange site  bodyLength test_1 test_2
##   <chr>                             <fct>         <fct>      <dbl>  <dbl>  <dbl>
## 1 2c98c3d0-6f27-4e71-b717-b67b6c0c… Bavarian      a           166.   83.9   88.3
## 2 66bff560-fdc3-48cc-a9a5-99452b00… Bavarian      a           168.   85.8   59.5
## 3 75be5f72-097e-42ee-863a-ab968991… Bavarian      a           166.   84.2   44.0
## 4 64ccd831-70c6-43bc-9d00-30be2377… Bavarian      a           168.   84.9   58.9
## 5 56fd0730-5cde-4717-b045-13b9f6f9… Bavarian      a           170.   86.4   19.5
## 6 c8e97ce4-86fd-470d-a27d-e61c70b5… Bavarian      a           169.   85.1   90.7

dragons_wide_to_long <- dragons_wide %>%
  pivot_longer(c(test_1, test_2), names_to = "test", values_to = "score")
head(dragons_wide_to_long)

## # A tibble: 6 × 6
##   pid                                 mountainRange site  bodyLength test  score
##   <chr>                               <fct>         <fct>      <dbl> <chr> <dbl>
## 1 2c98c3d0-6f27-4e71-b717-b67b6c0c29… Bavarian      a           166. test…  83.9
## 2 2c98c3d0-6f27-4e71-b717-b67b6c0c29… Bavarian      a           166. test…  88.3
## 3 66bff560-fdc3-48cc-a9a5-99452b000d… Bavarian      a           168. test…  85.8
## 4 66bff560-fdc3-48cc-a9a5-99452b000d… Bavarian      a           168. test…  59.5
## 5 75be5f72-097e-42ee-863a-ab968991d7… Bavarian      a           166. test…  84.2
## 6 75be5f72-097e-42ee-863a-ab968991d7… Bavarian      a           166. test…  44.0

Data Manipulation

Useful to save an original version before you manipulate data, especially if you haven’t set up and test your analysis pipeline beforehand.

dragons <- dragons_long

Example of base R:

dragons_baseR <- dragons

#`$` selects a column / makes the new column
dragons_baseR$new_score <- dragons$score + 50

head(dragons_baseR)

## # A tibble: 6 × 7
##   pid                       mountainRange site  bodyLength test  score new_score
##   <chr>                     <fct>         <fct>      <dbl> <chr> <dbl>     <dbl>
## 1 2c98c3d0-6f27-4e71-b717-… Bavarian      a           166. test…  83.9     134. 
## 2 2c98c3d0-6f27-4e71-b717-… Bavarian      a           166. test…  88.3     138. 
## 3 66bff560-fdc3-48cc-a9a5-… Bavarian      a           168. test…  85.8     136. 
## 4 66bff560-fdc3-48cc-a9a5-… Bavarian      a           168. test…  59.5     109. 
## 5 75be5f72-097e-42ee-863a-… Bavarian      a           166. test…  84.2     134. 
## 6 75be5f72-097e-42ee-863a-… Bavarian      a           166. test…  44.0      94.0

Examples of dplyr:

Useful functions:

filter()

mutate()

select()

summarise() + group_by()

Logic:

& for and

| for or

Pipe:

%>%

How you link functions / actions together. output <- input %>% action %>% action %>% ...

dplyr <- dragons %>% 
  filter(site == 'a' & mountainRange == 'Julian') %>% # this line filters out dragons from the Julian mountain range that are in site a
  mutate(mountainRange = ifelse(as.character(mountainRange) == 'Julian', 'Julien', mountainRange)) # this line changes all the mountain range recorded as 'Julian' to 'Julien'

# toasted bread <- bread %>% cut %>% toast

head(dplyr)

## # A tibble: 6 × 6
##   pid                                mountainRange site  bodyLength test   score
##   <chr>                              <chr>         <fct>      <dbl> <chr>  <dbl>
## 1 49aeee5f-bb4a-4759-b01d-53000a19a… Julien        a           211. test… 107.  
## 2 49aeee5f-bb4a-4759-b01d-53000a19a… Julien        a           211. test…  50.9 
## 3 df6a731d-152f-4bae-823a-96e3ffd46… Julien        a           216. test… 110.  
## 4 df6a731d-152f-4bae-823a-96e3ffd46… Julien        a           216. test…  68.8 
## 5 fe788f56-2260-4317-a2dc-2bfb96e3e… Julien        a           203. test… 102.  
## 6 fe788f56-2260-4317-a2dc-2bfb96e3e… Julien        a           203. test…   8.68

Calculate the mean of the 2 scores for each dragon.

Without grouping, the mean score is calculated across all observations –> not what we want.

dragons_mean <- dragons %>%
  mutate(mean = mean(score))
head(dragons_mean)

## # A tibble: 6 × 7
##   pid                           mountainRange site  bodyLength test  score  mean
##   <chr>                         <fct>         <fct>      <dbl> <chr> <dbl> <dbl>
## 1 2c98c3d0-6f27-4e71-b717-b67b… Bavarian      a           166. test…  83.9  75.9
## 2 2c98c3d0-6f27-4e71-b717-b67b… Bavarian      a           166. test…  88.3  75.9
## 3 66bff560-fdc3-48cc-a9a5-9945… Bavarian      a           168. test…  85.8  75.9
## 4 66bff560-fdc3-48cc-a9a5-9945… Bavarian      a           168. test…  59.5  75.9
## 5 75be5f72-097e-42ee-863a-ab96… Bavarian      a           166. test…  84.2  75.9
## 6 75be5f72-097e-42ee-863a-ab96… Bavarian      a           166. test…  44.0  75.9

With grouping, we get the mean of the 2 tests for each dragon

dragons <- dragons %>% 
  group_by(pid) %>%
  mutate(mean = mean(score))

head(dragons)

## # A tibble: 6 × 7
## # Groups:   pid [3]
##   pid                           mountainRange site  bodyLength test  score  mean
##   <chr>                         <fct>         <fct>      <dbl> <chr> <dbl> <dbl>
## 1 2c98c3d0-6f27-4e71-b717-b67b… Bavarian      a           166. test…  83.9  86.1
## 2 2c98c3d0-6f27-4e71-b717-b67b… Bavarian      a           166. test…  88.3  86.1
## 3 66bff560-fdc3-48cc-a9a5-9945… Bavarian      a           168. test…  85.8  72.7
## 4 66bff560-fdc3-48cc-a9a5-9945… Bavarian      a           168. test…  59.5  72.7
## 5 75be5f72-097e-42ee-863a-ab96… Bavarian      a           166. test…  84.2  64.1
## 6 75be5f72-097e-42ee-863a-ab96… Bavarian      a           166. test…  44.0  64.1

Grouping with summary: get the mean scores for each test, by each mountain range

dragons_mean_mountainRange <- dragons %>%
  group_by(test, mountainRange) %>%
  summarise(mean = mean(score))

## `summarise()` has grouped output by 'test'. You can override using the
## `.groups` argument.

dragons_mean_mountainRange

## # A tibble: 16 × 3
## # Groups:   test [2]
##    test   mountainRange  mean
##    <chr>  <fct>         <dbl>
##  1 test_1 Bavarian       91.3
##  2 test_1 Central       106. 
##  3 test_1 Emmental      107. 
##  4 test_1 Julian        111. 
##  5 test_1 Ligurian      107. 
##  6 test_1 Maritime      101. 
##  7 test_1 Sarntal       102. 
##  8 test_1 Southern       91.1
##  9 test_2 Bavarian       52.2
## 10 test_2 Central        47.6
## 11 test_2 Emmental       42.4
## 12 test_2 Julian         52.9
## 13 test_2 Ligurian       46.7
## 14 test_2 Maritime       54.5
## 15 test_2 Sarntal        46.4
## 16 test_2 Southern       57.3

Standardize (z-score) scores: Note the combination of base R and dplyr.

dragons <- dragons %>%
  mutate(score_z = (score - mean(dragons$score)) / sd(dragons$score))

head(dragons)

## # A tibble: 6 × 8
## # Groups:   pid [3]
##   pid                   mountainRange site  bodyLength test  score  mean score_z
##   <chr>                 <fct>         <fct>      <dbl> <chr> <dbl> <dbl>   <dbl>
## 1 2c98c3d0-6f27-4e71-b… Bavarian      a           166. test…  83.9  86.1   0.237
## 2 2c98c3d0-6f27-4e71-b… Bavarian      a           166. test…  88.3  86.1   0.368
## 3 66bff560-fdc3-48cc-a… Bavarian      a           168. test…  85.8  72.7   0.294
## 4 66bff560-fdc3-48cc-a… Bavarian      a           168. test…  59.5  72.7  -0.491
## 5 75be5f72-097e-42ee-8… Bavarian      a           166. test…  84.2  64.1   0.245
## 6 75be5f72-097e-42ee-8… Bavarian      a           166. test…  44.0  64.1  -0.953

Visualization

Bar plot:

ggplot(data = dragons,
       mapping = aes(x = mountainRange, y = score)) + 
  stat_summary(fun.data = "mean_cl_boot", 
               geom = "bar") + 
  stat_summary(fun.data = "mean_cl_boot", 
               geom = "pointrange") + 
  theme_classic()

Violin + jitter plot:

ggplot(data = dragons,
       mapping = aes(x = mountainRange, y = score)) + 
  geom_violin(aes(fill = mountainRange)) +
  geom_jitter(height = 0, # jitter horizontally, not vertically 
              alpha = 0.5, # transparency
              aes(color = test)) + 
  theme_classic() + 
  guides(fill = "none")

ggplot(data = dragons %>% filter(test == "test_1"),
       mapping = aes(x = mountainRange, y = score)) + 
  geom_violin(aes(fill = mountainRange)) +
  geom_jitter(height = 0, 
              alpha = 0.5) + 
  theme_classic() + 
  theme(legend.position = "none")

Scatter plot + best fit line:

ggplot(data = dragons, 
       mapping = aes(x = bodyLength, y = score)) + 
  geom_smooth(method = 'lm') + 
  geom_point(aes(color = mountainRange), 
             size = 2) +
  theme_classic()

## `geom_smooth()` using formula = 'y ~ x'

Facets into grouping variables:

ggplot(data = dragons, 
       mapping = aes(x = bodyLength, y = score, colour = mountainRange)) + 
  geom_point(size = 2, alpha = .3) +
  geom_smooth(method='lm') +
  facet_grid(~mountainRange) +
  theme_classic() + 
  theme(legend.position = "none")

## `geom_smooth()` using formula = 'y ~ x'

ggplot(data = dragons, 
       mapping = aes(x = bodyLength, y = score, colour = test)) + 
  geom_point(size = 2, alpha = .3) +
  geom_smooth(method='lm') +
  facet_grid(~test) +
  theme_classic() + 
  theme(legend.position = "none")

## `geom_smooth()` using formula = 'y ~ x'

#Another example of facet_grid
ggplot(data = dragons, 
       mapping = aes(x = bodyLength, y = score, colour = mountainRange)) + 
  geom_point(size = 2, alpha = .3) +
  geom_smooth(method='lm') +
  facet_grid(test~mountainRange) +
  theme_classic() + 
  theme(legend.position = "none")

## `geom_smooth()` using formula = 'y ~ x'

Data Analysis

Terms:

Fixed effect: main predictor (e.g., body length, test)

Random effect: source of systematic noise (e.g., mountain range, individual dragon)

Levels: different values of a predictor (e.g., trial 1 vs 2)

Interaction: combinations of 2 predictors making different predictions for the dependent variable

Hypothesis 1: higher body length –> higher test score

Not controlling for random effects

lm.bl <- lm(score ~ bodyLength, data = dragons)
summary(lm.bl)

## 
## Call:
## lm(formula = score ~ bodyLength, data = dragons)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -79.55 -24.95  17.10  26.48  37.32 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 37.55276   13.44816   2.792  0.00534 **
## bodyLength   0.19069    0.06659   2.864  0.00428 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.41 on 958 degrees of freedom
## Multiple R-squared:  0.008488,   Adjusted R-squared:  0.007454 
## F-statistic: 8.202 on 1 and 958 DF,  p-value: 0.004277

There is an effect of body length.

But when we facet out mountainRange and test, we saw that the effect is different between tests 1 and 2. We also saw that for some mountain ranges, this isn’t the case.

Considering random effects

Relationship between body length and score might differ based on which mountain range a dragon comes from. And the overall effect we found of body length on score might be driven by some mountain ranges where the effect is very strong.

Relationship between body length and score might also differ based on the individual dragon (i.e., some dragons are better at all tests.)

Syntax: (1|random effect)

lmer.base <- lmer(score ~ bodyLength + (1|mountainRange) + (1|pid), data = dragons)

## boundary (singular) fit: see help('isSingular')

summary(lmer.base)

## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ bodyLength + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 9461.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3808 -0.7466  0.5118  0.7925  1.1170 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  pid           (Intercept)    0      0.00   
##  mountainRange (Intercept)    0      0.00   
##  Residual                  1116     33.41   
## Number of obs: 960, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 37.55276   13.44816   2.792
## bodyLength   0.19069    0.06659   2.864
## 
## Correlation of Fixed Effects:
##            (Intr)
## bodyLength -0.997
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Anova(lmer.base, type = 3)

## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##              Chisq Df Pr(>Chisq)   
## (Intercept) 7.7975  1   0.005232 **
## bodyLength  8.2016  1   0.004185 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes – after adding the random effect of mountain range, there is still an effect of body length on score.

Note: Singular fit

Not a big deal for this demonstration, but if you see this in your actual analysis, it’s a sign that your predictor structure is too complex and is overfitting the data. Consider dropping random effects.

Note: Common random effects:

Individual participant: if a participant sees multiple trials, there will be dependencies between their responses – they might intrinsically be better or worse at the task.

Individual trial: a trial might intrinsically be easier or more difficult for all participants.

I always include both of these random effects in my analysis. In this demonstration, we will be including random participant effect, but we don’t have multiple trials so we won’t be including that.

Adding a second predictor

Let’s say we have a second hypothesis:

Hypothesis 2: higher body length –> higher test score, scores for test 2 > test 1

Checking the effect of test:

lmer.test <- lmer(score ~ test + (1|mountainRange) + (1|pid), data = dragons)

## boundary (singular) fit: see help('isSingular')

summary(lmer.test)

## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ test + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 8582.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.47364 -0.50218  0.02854  0.44364  2.36916 
## 
## Random effects:
##  Groups        Name        Variance Std.Dev.
##  pid           (Intercept)   0.000   0.000  
##  mountainRange (Intercept)   5.837   2.416  
##  Residual                  446.407  21.128  
## Number of obs: 960, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  101.885      1.288   79.09
## testtest_2   -51.885      1.364  -38.04
## 
## Correlation of Fixed Effects:
##            (Intr)
## testtest_2 -0.529
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Anova(lmer.test)

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: score
##       Chisq Df Pr(>Chisq)    
## test 1447.3  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Add both fixed effects:

lmer.bl_test <- lmer(score ~ bodyLength + test + (1|mountainRange) + (1|pid), data = dragons)

## boundary (singular) fit: see help('isSingular')

summary(lmer.bl_test)

## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 8570.5
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.59747 -0.40623  0.03411  0.33284  2.55997 
## 
## Random effects:
##  Groups        Name        Variance  Std.Dev. 
##  pid           (Intercept) 1.183e-13 3.439e-07
##  mountainRange (Intercept) 1.534e+00 1.239e+00
##  Residual                  4.412e+02 2.101e+01
## Number of obs: 960, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  61.60849    9.55251   6.449
## bodyLength    0.20006    0.04716   4.242
## testtest_2  -51.88521    1.35590 -38.266
## 
## Correlation of Fixed Effects:
##            (Intr) bdyLng
## bodyLength -0.994       
## testtest_2 -0.071  0.000
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Anova(lmer.bl_test, type = 3)

## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                Chisq Df Pr(>Chisq)    
## (Intercept)   41.596  1  1.123e-10 ***
## bodyLength    17.996  1  2.213e-05 ***
## test        1464.314  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lmer.bl_test, lmer.base, type = 3, refit = FALSE)

## Data: dragons
## Models:
## lmer.base: score ~ bodyLength + (1 | mountainRange) + (1 | pid)
## lmer.bl_test: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
##              npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## lmer.base       5 9471.1 9495.4 -4730.5   9461.1                         
## lmer.bl_test    6 8582.5 8611.8 -4285.3   8570.5 890.55  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is an effect of body length and test (after controlling for random effects). And adding test as a fixed effect results in a model that better explains the variance in the data.

Note: ANOVA:

Anova() or car::Anova(): compare the full model with a reduced model where a predictor is removed –> this allows us to get the p-value to see whether a predictor (e.g., body length) is significant.

anova(): compare 2 models –> whether 1 model explains significantly more variance in the data compared to the other.

Adding an interaction effect

So now we know that both body length and test is a significant predictor of the data. Do they predict the data in the same way?

Syntax: fixed_eff1 * fixed_eff2

lmer.bl_test_int <- lmer(score ~ bodyLength * test + (1|mountainRange) + (1|pid), data = dragons)

## boundary (singular) fit: see help('isSingular')

summary(lmer.bl_test_int)

## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 8518
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.56955 -0.08974 -0.00160  0.09905  2.53328 
## 
## Random effects:
##  Groups        Name        Variance  Std.Dev. 
##  pid           (Intercept) 1.684e-11 4.103e-06
##  mountainRange (Intercept) 1.767e+00 1.329e+00
##  Residual                  4.165e+02 2.041e+01
## Number of obs: 960, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           -0.75632   12.51795  -0.060
## bodyLength             0.50985    0.06196   8.228
## testtest_2            72.11810   16.42886   4.390
## bodyLength:testtest_2 -0.61596    0.08134  -7.572
## 
## Correlation of Fixed Effects:
##             (Intr) bdyLng tstt_2
## bodyLength  -0.997              
## testtest_2  -0.656  0.654       
## bdyLngth:_2  0.654 -0.656 -0.997
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Anova(lmer.bl_test_int, type = 3)

## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                   Chisq Df Pr(>Chisq)    
## (Intercept)      0.0037  1     0.9518    
## bodyLength      67.7023  1  < 2.2e-16 ***
## test            19.2696  1  1.135e-05 ***
## bodyLength:test 57.3394  1  3.667e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lmer.bl_test_int, lmer.bl_test, type = 3, refit = FALSE)

## Data: dragons
## Models:
## lmer.bl_test: score ~ bodyLength + test + (1 | mountainRange) + (1 | pid)
## lmer.bl_test_int: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
##                  npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## lmer.bl_test        6 8582.5 8611.8 -4285.3   8570.5                         
## lmer.bl_test_int    7 8532.0 8566.1 -4259.0   8518.0 52.552  1   4.19e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes!

Different structures of random effects

Random slopes

Think of random effects as ‘grouping’ variables that determine dependencies of data. They can affect either the intercept or the slope of the best-fit line.

Consider the scenario where we are predicting score based on body length, with mountain range as the grouping variable.

Intercept: the projected score when body length = 0, for each mountain range. You can also think of this as the ‘baseline’ performance, even though this is not very accurate.

Slope: the change in score with a fixed change in body length, for each mountain range.

If we fit a best-fit line by each mountain range, the intercept and/or slope of each line might be different. We might want to account for this in our random effect structure.

Syntax:

Intercept random effect: (1|rand_eff)

Slope random effect – need to include an intercept random effect: (1 + slope_rand_eff | intercept_rand_eff). This is usually the same as the fixed effect (essentially, we are allowing the fixed effect’s slope to vary based on the grouping variable.)

lmer.bl_test_int_slope <- lmer(score ~ bodyLength * test + (1 + (bodyLength * test)|mountainRange) + (1|pid), data = dragons)

## boundary (singular) fit: see help('isSingular')

summary(lmer.bl_test_int_slope)

## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## score ~ bodyLength * test + (1 + (bodyLength * test) | mountainRange) +  
##     (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 8509.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.70151 -0.10835 -0.00134  0.11802  2.69517 
## 
## Random effects:
##  Groups        Name                  Variance  Std.Dev. Corr             
##  pid           (Intercept)           1.484e-04  0.01218                  
##  mountainRange (Intercept)           2.719e+02 16.48835                  
##                bodyLength            6.565e-03  0.08103 -1.00            
##                testtest_2            2.180e+02 14.76378  0.86 -0.86      
##                bodyLength:testtest_2 5.070e-03  0.07120 -0.83  0.83 -0.98
##  Residual                            4.104e+02 20.25823                  
## Number of obs: 960, groups:  pid, 480; mountainRange, 8
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           -2.18460   13.53461  -0.161
## bodyLength             0.51456    0.06668   7.717
## testtest_2            60.21359   19.87466   3.030
## bodyLength:testtest_2 -0.55872    0.09827  -5.686
## 
## Correlation of Fixed Effects:
##             (Intr) bdyLng tstt_2
## bodyLength  -0.997              
## testtest_2  -0.381  0.385       
## bdyLngth:_2  0.387 -0.394 -0.996
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Anova(lmer.bl_test_int_slope, type = 3)

## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                   Chisq Df Pr(>Chisq)    
## (Intercept)      0.0261  1   0.871772    
## bodyLength      59.5446  1  1.196e-14 ***
## test             9.1789  1   0.002448 ** 
## bodyLength:test 32.3285  1  1.302e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lmer.bl_test_int_slope, lmer.bl_test_int, type = 3, refit = FALSE)

## Data: dragons
## Models:
## lmer.bl_test_int: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
## lmer.bl_test_int_slope: score ~ bodyLength * test + (1 + (bodyLength * test) | mountainRange) + (1 | pid)
##                        npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## lmer.bl_test_int          7 8532.0 8566.1 -4259.0   8518.0                     
## lmer.bl_test_int_slope   16 8541.6 8619.5 -4254.8   8509.6 8.4153  9     0.4929

Yes, there is still an effect of body length, test and interaction effect when allowing for a random slope. But it does not explain a significantly larger variance in the data.

Nesting random effects

There are 3 sites within each mountain range. Maybe the dragons in each have a different intercept (remember, intercept = roughly baseline performance).

Wrong syntax: (1|mountainRange) + (1|site) –> crossed structure, assume that all sites ‘A’ across mountain ranges have the same intercept. This model thinks that there are only 3 random site intercepts (in fact, there are 3 sites * 8 mountain ranges = 24 site intercepts).

Correct syntax: (1|mountainRange/site) or (1|mountainRange) + (1|mountainRange:site)

Note:

Make your life easier by just naming the sites differently from the beginning (e.g. mountainRange_A)

lmer.bl_test_int_site <- lmer(score ~ bodyLength * test + (1|mountainRange/site) + (1|pid), data = dragons)

## boundary (singular) fit: see help('isSingular')

summary(lmer.bl_test_int_site)

## Linear mixed model fit by REML ['lmerMod']
## Formula: score ~ bodyLength * test + (1 | mountainRange/site) + (1 | pid)
##    Data: dragons
## 
## REML criterion at convergence: 8518
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.56955 -0.08974 -0.00160  0.09905  2.53328 
## 
## Random effects:
##  Groups             Name        Variance  Std.Dev. 
##  pid                (Intercept) 9.073e-08 3.012e-04
##  site:mountainRange (Intercept) 0.000e+00 0.000e+00
##  mountainRange      (Intercept) 1.767e+00 1.329e+00
##  Residual                       4.165e+02 2.041e+01
## Number of obs: 960, groups:  pid, 480; site:mountainRange, 24; mountainRange, 8
## 
## Fixed effects:
##                       Estimate Std. Error t value
## (Intercept)           -0.75619   12.51790  -0.060
## bodyLength             0.50985    0.06196   8.228
## testtest_2            72.11810   16.42886   4.390
## bodyLength:testtest_2 -0.61596    0.08134  -7.572
## 
## Correlation of Fixed Effects:
##             (Intr) bdyLng tstt_2
## bodyLength  -0.997              
## testtest_2  -0.656  0.654       
## bdyLngth:_2  0.654 -0.656 -0.997
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Anova(lmer.bl_test_int_site, type = 3)

## Analysis of Deviance Table (Type III Wald chisquare tests)
## 
## Response: score
##                   Chisq Df Pr(>Chisq)    
## (Intercept)      0.0036  1     0.9518    
## bodyLength      67.7027  1  < 2.2e-16 ***
## test            19.2696  1  1.135e-05 ***
## bodyLength:test 57.3394  1  3.667e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lmer.bl_test_int_site, lmer.bl_test_int, type = 3, refit = FALSE)

## Data: dragons
## Models:
## lmer.bl_test_int: score ~ bodyLength * test + (1 | mountainRange) + (1 | pid)
## lmer.bl_test_int_site: score ~ bodyLength * test + (1 | mountainRange/site) + (1 | pid)
##                       npar  AIC    BIC logLik deviance Chisq Df Pr(>Chisq)
## lmer.bl_test_int         7 8532 8566.1  -4259     8518                    
## lmer.bl_test_int_site    8 8534 8572.9  -4259     8518     0  1          1

Significant, but not an improvement over the interaction model!

R Tutorial

2023-08-07