I wrote this vignette to help others who, like me, struggle with 3-way interactions!
First, load the packages: Load packages:
library(tidyverse)
library(brms)
library(sjstats)
library(bayesplot)
library(tidybayes)
library(ggeffects)
library(sjmisc)
library(emmeans)
This is a model that I ran for my PhD analysis. The response is the F2 at the 20% timepoint of a diphthong. F2 was normalized and it was standardized before being entered into the model. The predictors are:
duration of the vowel: log-transformed and standardized
age: adolescent (the intercept) or child
sex: male (the intercept) or female
vowel: GOAT (the intercept) or LOT. The purpose of this is to see not just what GOAT is doing across different populations, but to see within a population, where is their GOAT in relation to their LOT? It is a kind of sanity-check (or so I thought…)
following segment: intercept = open syllable; there are several other levels.
model <- read_rds("models/goat_ch_f220_v1.rds")
The model formula is as follows:
model[[1]]
normF2_20z ~ LogDurZ + age * sex * vowel + age * sex * fol_segF1F2 + (1 + LogDurZ + vowel + fol_segF1F2 | participant) + (1 + age + sex + age:sex | word)
The priors look like this:
model[[6]]
prior class coef group resp dpar nlpar bound
1 normal(0, 1) b
2 b agechild
3 b agechild:fol_segF1F2coronal
4 b agechild:fol_segF1F2labial
5 b agechild:fol_segF1F2nasal
6 b agechild:fol_segF1F2other
7 b agechild:fol_segF1F2velar
8 b agechild:sexF
9 b agechild:sexF:fol_segF1F2coronal
10 b agechild:sexF:fol_segF1F2labial
11 b agechild:sexF:fol_segF1F2nasal
12 b agechild:sexF:fol_segF1F2other
13 b agechild:sexF:fol_segF1F2velar
14 b agechild:sexF:vowellot
15 b agechild:vowellot
16 b fol_segF1F2coronal
17 b fol_segF1F2labial
18 b fol_segF1F2nasal
19 b fol_segF1F2other
20 b fol_segF1F2velar
21 b LogDurZ
22 b sexF
23 b sexF:fol_segF1F2coronal
24 b sexF:fol_segF1F2labial
25 b sexF:fol_segF1F2nasal
26 b sexF:fol_segF1F2other
27 b sexF:fol_segF1F2velar
28 b sexF:vowellot
29 b vowellot
30 normal(0, 1) Intercept
31 lkj_corr_cholesky(2) L
32 L participant
33 L word
34 student_t(3, 0, 10) sd
35 sd participant
36 sd fol_segF1F2coronal participant
37 sd fol_segF1F2labial participant
38 sd fol_segF1F2nasal participant
39 sd fol_segF1F2other participant
40 sd fol_segF1F2velar participant
41 sd Intercept participant
42 sd LogDurZ participant
43 sd vowellot participant
44 sd word
45 sd agechild word
46 sd agechild:sexF word
47 sd Intercept word
48 sd sexF word
49 student_t(3, 0, 10) sigma
For female children (age=child and sex=female), what is the difference in F2 between their GOAT and their LOT?
tidy_stan(model, prob=0.95)
[34m
# Summary Statistics of Stan-Model
[39m estimate std.error HDI(95%) ratio rhat mcse
Intercept -0.08 0.17 [-0.43 0.26] 0.17 1.01 0.01
LogDurZ -0.16 0.02 [-0.20 -0.12] 0.77 1.00 0.00
agechild 0.30 0.28 [-0.20 0.88] 0.23 1.01 0.01
sexF 0.58 0.27 [ 0.03 1.07] 0.20 1.01 0.01
vowellot -1.29 0.44 [-2.08 -0.42] 0.67 1.00 0.01
fol_segF1F2coronal -0.12 0.14 [-0.40 0.16] 0.43 1.00 0.00
fol_segF1F2labial -0.30 0.33 [-0.95 0.38] 0.76 1.00 0.01
fol_segF1F2nasal -0.24 0.18 [-0.59 0.13] 0.40 1.00 0.00
fol_segF1F2other 0.34 0.44 [-0.57 1.09] 0.67 1.00 0.01
fol_segF1F2velar -0.06 0.28 [-0.58 0.52] 0.70 1.00 0.01
agechild.sexF -0.69 0.38 [-1.41 0.08] 0.23 1.01 0.01
agechild.vowellot -0.97 0.59 [-2.14 0.17] 0.80 1.00 0.01
sexF.vowellot 0.12 0.56 [-0.94 1.22] 0.53 1.00 0.01
agechild.fol_segF1F2coronal 0.26 0.22 [-0.16 0.73] 0.70 1.00 0.00
agechild.fol_segF1F2labial 0.13 0.44 [-0.79 1.00] 0.76 1.00 0.01
agechild.fol_segF1F2nasal -0.29 0.26 [-0.80 0.23] 0.65 1.00 0.01
agechild.fol_segF1F2other 0.44 0.57 [-0.70 1.56] 0.78 1.00 0.01
agechild.fol_segF1F2velar -0.66 0.51 [-1.65 0.37] 1.03 1.00 0.01
sexF.fol_segF1F2coronal -0.20 0.19 [-0.60 0.17] 0.53 1.00 0.00
sexF.fol_segF1F2labial 0.06 0.57 [-1.04 1.14] 0.96 1.00 0.01
sexF.fol_segF1F2nasal 0.02 0.24 [-0.42 0.52] 0.49 1.00 0.01
sexF.fol_segF1F2other -0.91 0.55 [-2.01 0.16] 0.58 1.00 0.01
sexF.fol_segF1F2velar 0.15 0.38 [-0.56 0.93] 0.95 1.00 0.01
agechild.sexF.vowellot 0.55 0.71 [-0.86 1.95] 1.06 1.00 0.01
agechild.sexF.fol_segF1F2coronal 0.47 0.29 [-0.08 1.04] 0.69 1.00 0.01
agechild.sexF.fol_segF1F2labial -0.21 0.63 [-1.36 1.02] 0.94 1.00 0.01
agechild.sexF.fol_segF1F2nasal 0.38 0.33 [-0.26 1.03] 0.56 1.00 0.01
agechild.sexF.fol_segF1F2other 0.56 0.70 [-0.81 1.97] 1.06 1.00 0.01
agechild.sexF.fol_segF1F2velar 0.53 0.70 [-0.78 1.87] 1.42 1.00 0.01
In the model summary, the posterior for agechild:sexF:vowellot has median 0.55. This means that for female children, LOT has a higher F2 by 0.55 standard deviations than GOAT, right? But wait, that doesn’t make sense. As linguists, we know that LOT is really back.
This is what we see when we look at marginal_effects:
f2_conditions <- make_conditions(model, "age")
model_marg <- marginal_effects(
model, "sex:vowel", conditions = f2_conditions
)
plot(model_marg, plot = FALSE)[[1]] +
ggplot2::theme_minimal() +
#ggplot2::scale_colour_manual(values=face_fleece_cols) +
#ggplot2::labs(y="Normalized Log-TL z-score") +
ggplot2::theme(legend.title = element_blank(),
strip.text = element_text(size = rel(2)),
legend.text = element_text(size = rel(2)),
axis.title = element_text(size = rel(2)),
axis.text = element_text(size = rel(2)))
The plot thickens! The marginal effects plot suggests that every group – male adolescents, female adolescents, male children and female children respectively – is predicted to have a lower F2 in LOT than in GOAT. How can that be the case? It seems to directly contradict what the tidy_stan summary said…
Let’s look into this further by calling the marginal effects table:
model_marg[[1]] %>% as_tibble() %>% select(cond__, effect1__, effect2__, estimate__, se__, lower__, upper__)
And let’s call the tidy_stan summary again - just the bits we’re interested in:
m_summary <- tidy_stan(model, prob=0.95)
m_summary[c(1, 3:5, 11:13, 24),]
[34m
# Summary Statistics of Stan-Model
[39m estimate std.error HDI(95%) ratio rhat mcse
Intercept -0.08 0.17 [-0.43 0.26] 0.17 1.01 0.01
agechild 0.30 0.28 [-0.20 0.88] 0.23 1.01 0.01
sexF 0.58 0.27 [ 0.03 1.07] 0.20 1.01 0.01
vowellot -1.29 0.44 [-2.08 -0.42] 0.67 1.00 0.01
agechild.sexF -0.69 0.38 [-1.41 0.08] 0.23 1.01 0.01
agechild.vowellot -0.97 0.59 [-2.14 0.17] 0.80 1.00 0.01
sexF.vowellot 0.12 0.56 [-0.94 1.22] 0.53 1.00 0.01
agechild.sexF.vowellot 0.55 0.71 [-0.86 1.95] 1.06 1.00 0.01
Leaving out the coefficients and interactions for duration and following segment, our model has the following formula:
\[y = Intercept + \beta_{1}age + \beta_{2}sex + \beta_{3}vowel + \\ \beta_{4}age:sex + \beta_{5}age:vowel + \beta_{6}sex:vowel +\\ \beta_{7}age:sex:vowel \]
If we wanted to know the prediction, for example, of y (i.e. F2) for adolescent females saying GOAT, then age=0, sex=1, and vowel=0. So we get: \[ y = Intercept + \beta_{1}age*0 + \beta_{2}sex*1 + \beta_{3}vowel*0 + \\ \beta_{4}age:sex*0 + \beta_{5}age:vowel*0 + \beta_{6}sex:vowel*0 +\\ \beta_{7}age:sex:vowel*0 \] And if you get rid of all the 0 coefficients, you get: \[ y = Intercept + \beta_{2}sex*1 \] For female children saying GOAT, our formula is: \[ y = Intercept + \beta_{1}age*1 + \beta_{2}sex*1 + \beta_{3}vowel*0 + \\ \beta_{4}age:sex*1 + \beta_{5}age:vowel*0 + \beta_{6}sex:vowel*0 +\\ \beta_{7}age:sex:vowel*0 \] And that is what we have to sum together.
What about the prediction for female children’s LOT? To find that out, we need to “switch on” every coefficient – because if you like, age=TRUE, sex=TRUE, vowel=TRUE – we need all of them! \[y = Intercept + \beta_{1}age*1 + \beta_{2}sex*1 + \beta_{3}vowel*1 + \\ \beta_{4}age:sex*1 + \beta_{5}age:vowel*1 + \beta_{6}sex:vowel*1 +\\ \beta_{7}age:sex:vowel*1 \] And then we can sub in our actual values from the model output: \[y = -0.08 + 0.3 + 0.58 + -1.29 + \\ -0.69 + -0.97 + 0.12 +\\ 0.55 \]
And we can work that out now:
(f_ch_LOT <- -0.08 + #intercept
0.3 + #age=child
0.58 + #sex=F
-1.29 + #vowel=LOT
-0.69 + #age=child & sex=F
-0.97 + # age=child & vowel=LOT
0.12 + #sex=F & vowel=LOT
0.55) # age=child & sex=F & vowel=LOT
[1] -1.48
And you’ll notice that this is more or less the same estimate as:
model_marg[[1]][8,]
sex vowel age cond__ normF2_20z LogDurZ fol_segF1F2 participant word
8 F lot child age = child -4.775782e-17 1.162093e-16 open NA NA
effect1__ effect2__ estimate__ se__ lower__ upper__
8 F lot -1.49878 1.025578 -3.423493 0.5021353
What is the prediction for female children’s GOAT? It is:
(f_ch_GOAT <- -0.08 + #intercept
0.3 + #age=child
0.58 + #sex=F
#-1.29 + #vowel=LOT
-0.69) #age=child & sex=F
[1] 0.11
#-0.97 + # age=child & vowel=LOT
# 0.12 + #sex=F & vowel=LOT
#0.55 # age=child & sex=F & vowel=LOT
And this corresponds to:
model_marg[[1]][7,]
sex vowel age cond__ normF2_20z LogDurZ fol_segF1F2 participant word
7 F goat child age = child -4.775782e-17 1.162093e-16 open NA NA
effect1__ effect2__ estimate__ se__ lower__ upper__
7 F goat 0.1159604 0.2346242 -0.3463444 0.5832559
What is the difference between the two?
f_ch_GOAT-f_ch_LOT
[1] 1.59
Let’s check this with emmeans:
emmeans(model, pairwise ~ vowel | age + sex)
$emmeans
age = adolescent, sex = M:
vowel emmean lower.HPD upper.HPD
goat -0.145 -0.487 0.2404
lot -1.443 -2.175 -0.6875
age = child, sex = M:
vowel emmean lower.HPD upper.HPD
goat 0.141 -0.392 0.6359
lot -2.140 -3.185 -1.1637
age = adolescent, sex = F:
vowel emmean lower.HPD upper.HPD
goat 0.288 -0.206 0.8133
lot -0.888 -1.836 0.0916
age = child, sex = F:
vowel emmean lower.HPD upper.HPD
goat 0.178 -0.459 0.7598
lot -1.436 -3.056 0.2599
HPD interval probability: 0.95
$contrasts
age = adolescent, sex = M:
contrast estimate lower.HPD upper.HPD
goat - lot 1.29 0.418 2.08
age = child, sex = M:
contrast estimate lower.HPD upper.HPD
goat - lot 2.27 1.089 3.40
age = adolescent, sex = F:
contrast estimate lower.HPD upper.HPD
goat - lot 1.17 0.178 2.39
age = child, sex = F:
contrast estimate lower.HPD upper.HPD
goat - lot 1.63 -0.411 3.47
HPD interval probability: 0.95
Our estimate was 1.59; emmeans gives the estimate of 1.63. Most of the numbers are very slightly different, e.g. emmeans’ prediction for female children’s GOAT is 0.178, not the 0.11 that we calculated. This could be down to any number of things - different point estimates being calculated, the rounding to 2dp that tidy_stan does. But I would say these outputs are basically the same.
And voilà! That is how we interpret our 3-way interaction =)