The slope of the regression line between the continuous variable and the dependent variable is different across the levels of the categorical variable.
\[y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 (X_i Z_i) + \epsilon_i\]
z = categorical predictor
x = continuous predictor
B0 (intercept) = value of y (dv) when x and z are zero. When the continuous variable is 0 and the reference level of the categorical variable.
B1 = effect of x (continuous IV) when z = 0 (reference group).
B2 = difference in intercept between z = 0 and z = 1, when x = 0.
B3 = Difference in slopes across levels of Z. This means we have two different lines that we are comparing.
library(carData)
salary <- carData::Salaries
library(ggplot2)
ggplot(salary, aes(x = yrs.service, y = salary, color = sex)) +
geom_smooth(method = "lm", se = F) +
geom_point()
## `geom_smooth()` using formula = 'y ~ x'
salary$rank <- relevel(salary$rank, ref = "Prof")
mod.1 <- lm(salary ~ yrs.service*sex, salary)
summary(mod.1)
##
## Call:
## lm(formula = salary ~ yrs.service * sex, data = salary)
##
## Residuals:
## Min 1Q Median 3Q Max
## -80381 -20258 -3727 16353 102536
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 82068.5 7568.7 10.843 < 2e-16 ***
## yrs.service 1637.3 523.0 3.130 0.00188 **
## sexMale 20128.6 7991.1 2.519 0.01217 *
## yrs.service:sexMale -931.7 535.2 -1.741 0.08251 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28420 on 393 degrees of freedom
## Multiple R-squared: 0.1266, Adjusted R-squared: 0.1199
## F-statistic: 18.98 on 3 and 393 DF, p-value: 1.622e-11
There is a difference in the differences between groups in the first IV across levels of the second IV.
48 rats were randomly assigned both one of three poisons and one of four possible treatments. The experimenters then measured their survival time in tens of hours. A total of 12 groups, each with 4 replicates.
library(faraway)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.4 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(interactions)
rats <- faraway::rats
rats %>%
group_by(poison, treat) %>%
summarize(
`Survival Time` = mean(time)
)
## `summarise()` has grouped output by 'poison'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 3
## # Groups: poison [3]
## poison treat `Survival Time`
## <fct> <fct> <dbl>
## 1 I A 0.412
## 2 I B 0.88
## 3 I C 0.568
## 4 I D 0.61
## 5 II A 0.32
## 6 II B 0.815
## 7 II C 0.375
## 8 II D 0.668
## 9 III A 0.21
## 10 III B 0.335
## 11 III C 0.235
## 12 III D 0.325
mod.1 <- lm(time ~ poison*treat, rats)
summary(mod.1)
##
## Call:
## lm(formula = time ~ poison * treat, data = rats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32500 -0.04875 0.00500 0.04312 0.42500
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.41250 0.07457 5.532 2.94e-06 ***
## poisonII -0.09250 0.10546 -0.877 0.3862
## poisonIII -0.20250 0.10546 -1.920 0.0628 .
## treatB 0.46750 0.10546 4.433 8.37e-05 ***
## treatC 0.15500 0.10546 1.470 0.1503
## treatD 0.19750 0.10546 1.873 0.0692 .
## poisonII:treatB 0.02750 0.14914 0.184 0.8547
## poisonIII:treatB -0.34250 0.14914 -2.297 0.0276 *
## poisonII:treatC -0.10000 0.14914 -0.671 0.5068
## poisonIII:treatC -0.13000 0.14914 -0.872 0.3892
## poisonII:treatD 0.15000 0.14914 1.006 0.3212
## poisonIII:treatD -0.08250 0.14914 -0.553 0.5836
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1491 on 36 degrees of freedom
## Multiple R-squared: 0.7335, Adjusted R-squared: 0.6521
## F-statistic: 9.01 on 11 and 36 DF, p-value: 1.986e-07
cat_plot(mod.1,
pred = treat,
modx = poison)
library(emmeans)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
# Fit model with interaction
mod.1 <- lm(time ~ poison*treat, rats)
# Compare levels of factor1 at each level of factor2
emmeans_result <- emmeans(mod.1, specs = pairwise ~ poison | treat)
# Or compare all combinations
emmeans_result <- emmeans(mod.1, specs = ~ poison:treat)
pairs(emmeans_result, adjust = "tukey")
## contrast estimate SE df t.ratio p.value
## I A - II A 0.0925 0.105 36 0.877 0.9990
## I A - III A 0.2025 0.105 36 1.920 0.7395
## I A - I B -0.4675 0.105 36 -4.433 0.0041
## I A - II B -0.4025 0.105 36 -3.817 0.0221
## I A - III B 0.0775 0.105 36 0.735 0.9998
## I A - I C -0.1550 0.105 36 -1.470 0.9394
## I A - II C 0.0375 0.105 36 0.356 1.0000
## I A - III C 0.1775 0.105 36 1.683 0.8641
## I A - I D -0.1975 0.105 36 -1.873 0.7673
## I A - II D -0.2550 0.105 36 -2.418 0.4203
## I A - III D 0.0875 0.105 36 0.830 0.9994
## II A - III A 0.1100 0.105 36 1.043 0.9953
## II A - I B -0.5600 0.105 36 -5.310 0.0003
## II A - II B -0.4950 0.105 36 -4.694 0.0020
## II A - III B -0.0150 0.105 36 -0.142 1.0000
## II A - I C -0.2475 0.105 36 -2.347 0.4646
## II A - II C -0.0550 0.105 36 -0.522 1.0000
## II A - III C 0.0850 0.105 36 0.806 0.9995
## II A - I D -0.2900 0.105 36 -2.750 0.2439
## II A - II D -0.3475 0.105 36 -3.295 0.0791
## II A - III D -0.0050 0.105 36 -0.047 1.0000
## III A - I B -0.6700 0.105 36 -6.353 <.0001
## III A - II B -0.6050 0.105 36 -5.737 0.0001
## III A - III B -0.1250 0.105 36 -1.185 0.9869
## III A - I C -0.3575 0.105 36 -3.390 0.0635
## III A - II C -0.1650 0.105 36 -1.565 0.9106
## III A - III C -0.0250 0.105 36 -0.237 1.0000
## III A - I D -0.4000 0.105 36 -3.793 0.0235
## III A - II D -0.4575 0.105 36 -4.338 0.0054
## III A - III D -0.1150 0.105 36 -1.090 0.9933
## I B - II B 0.0650 0.105 36 0.616 1.0000
## I B - III B 0.5450 0.105 36 5.168 0.0005
## I B - I C 0.3125 0.105 36 2.963 0.1619
## I B - II C 0.5050 0.105 36 4.789 0.0015
## I B - III C 0.6450 0.105 36 6.116 <.0001
## I B - I D 0.2700 0.105 36 2.560 0.3379
## I B - II D 0.2125 0.105 36 2.015 0.6809
## I B - III D 0.5550 0.105 36 5.263 0.0004
## II B - III B 0.4800 0.105 36 4.552 0.0030
## II B - I C 0.2475 0.105 36 2.347 0.4646
## II B - II C 0.4400 0.105 36 4.172 0.0085
## II B - III C 0.5800 0.105 36 5.500 0.0002
## II B - I D 0.2050 0.105 36 1.944 0.7252
## II B - II D 0.1475 0.105 36 1.399 0.9563
## II B - III D 0.4900 0.105 36 4.646 0.0023
## III B - I C -0.2325 0.105 36 -2.205 0.5569
## III B - II C -0.0400 0.105 36 -0.379 1.0000
## III B - III C 0.1000 0.105 36 0.948 0.9979
## III B - I D -0.2750 0.105 36 -2.608 0.3126
## III B - II D -0.3325 0.105 36 -3.153 0.1087
## III B - III D 0.0100 0.105 36 0.095 1.0000
## I C - II C 0.1925 0.105 36 1.825 0.7938
## I C - III C 0.3325 0.105 36 3.153 0.1087
## I C - I D -0.0425 0.105 36 -0.403 1.0000
## I C - II D -0.1000 0.105 36 -0.948 0.9979
## I C - III D 0.2425 0.105 36 2.300 0.4949
## II C - III C 0.1400 0.105 36 1.328 0.9696
## II C - I D -0.2350 0.105 36 -2.228 0.5413
## II C - II D -0.2925 0.105 36 -2.774 0.2336
## II C - III D 0.0500 0.105 36 0.474 1.0000
## III C - I D -0.3750 0.105 36 -3.556 0.0426
## III C - II D -0.4325 0.105 36 -4.101 0.0104
## III C - III D -0.0900 0.105 36 -0.853 0.9992
## I D - II D -0.0575 0.105 36 -0.545 1.0000
## I D - III D 0.2850 0.105 36 2.703 0.2656
## II D - III D 0.3425 0.105 36 3.248 0.0881
##
## P value adjustment: tukey method for comparing a family of 12 estimates
This data set gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn.
data(warpbreaks)
warpbreaks %>%
group_by(wool, tension) %>%
summarize(
`Number of Breaks` = mean(breaks)
)
## `summarise()` has grouped output by 'wool'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 3
## # Groups: wool [2]
## wool tension `Number of Breaks`
## <fct> <fct> <dbl>
## 1 A L 44.6
## 2 A M 24
## 3 A H 24.6
## 4 B L 28.2
## 5 B M 28.8
## 6 B H 18.8
mod.2 <- lm(breaks ~ wool*tension, warpbreaks)
summary(mod.2)
##
## Call:
## lm(formula = breaks ~ wool * tension, data = warpbreaks)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.5556 -6.8889 -0.6667 7.1944 25.4444
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 44.556 3.647 12.218 2.43e-16 ***
## woolB -16.333 5.157 -3.167 0.002677 **
## tensionM -20.556 5.157 -3.986 0.000228 ***
## tensionH -20.000 5.157 -3.878 0.000320 ***
## woolB:tensionM 21.111 7.294 2.895 0.005698 **
## woolB:tensionH 10.556 7.294 1.447 0.154327
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.94 on 48 degrees of freedom
## Multiple R-squared: 0.3778, Adjusted R-squared: 0.3129
## F-statistic: 5.828 on 5 and 48 DF, p-value: 0.0002772
# Fit model with interaction
mod.2 <- lm(breaks ~ tension*wool, warpbreaks)
# Compare levels of factor1 at each level of factor2
emmeans_result <- emmeans(mod.2, specs = pairwise ~ tension | wool)
# Or compare all combinations
emmeans_result <- emmeans(mod.2, specs = ~ tension:wool)
pairs(emmeans_result, adjust = "tukey")
## contrast estimate SE df t.ratio p.value
## L A - M A 20.556 5.16 48 3.986 0.0030
## L A - H A 20.000 5.16 48 3.878 0.0041
## L A - L B 16.333 5.16 48 3.167 0.0302
## L A - M B 15.778 5.16 48 3.059 0.0398
## L A - H B 25.778 5.16 48 4.998 0.0001
## M A - H A -0.556 5.16 48 -0.108 1.0000
## M A - L B -4.222 5.16 48 -0.819 0.9627
## M A - M B -4.778 5.16 48 -0.926 0.9377
## M A - H B 5.222 5.16 48 1.013 0.9115
## H A - L B -3.667 5.16 48 -0.711 0.9797
## H A - M B -4.222 5.16 48 -0.819 0.9627
## H A - H B 5.778 5.16 48 1.120 0.8706
## L B - M B -0.556 5.16 48 -0.108 1.0000
## L B - H B 9.444 5.16 48 1.831 0.4561
## M B - H B 10.000 5.16 48 1.939 0.3919
##
## P value adjustment: tukey method for comparing a family of 6 estimates
cat_plot(mod.2,
pred = wool,
modx = tension)
y = B0 + B1X1 + B2Z1 + B3X1Z2 + Error
y = body mass
x = cat predictor sex
z = cat predictor #2 island
xz = their product/the interaction
Sex | Island | x | z | xz |
---|---|---|---|---|
Female | Biscoe | 0 | 0 | 0 |
Female | Dream | 0 | 1 | 0 |
Male | Biscoe | 1 | 0 | 0 |
Male | Dream | 1 | 1 | 1 |
library(palmerpenguins)
##
## Attaching package: 'palmerpenguins'
## The following objects are masked from 'package:datasets':
##
## penguins, penguins_raw
penguin <- penguins %>%
filter(island != 'Torgersen')
penguin$island <- droplevels(penguin$island)
mod.3 <- lm(body_mass_g ~ sex*island, penguin)
summary(mod.3)
##
## Call:
## lm(formula = body_mass_g ~ sex * island, data = penguin)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1554.52 -230.85 80.62 395.48 1195.48
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4319.37 62.46 69.150 <2e-16 ***
## sexmale 785.14 87.54 8.969 <2e-16 ***
## islandDream -873.06 94.97 -9.193 <2e-16 ***
## sexmale:islandDream -244.36 133.47 -1.831 0.0682 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 558.7 on 282 degrees of freedom
## (6 observations deleted due to missingness)
## Multiple R-squared: 0.5422, Adjusted R-squared: 0.5373
## F-statistic: 111.3 on 3 and 282 DF, p-value: < 2.2e-16
Intercept: value of y (body mass) when x and z are 0 (female- biscoe island). The expected body mass of a female penguin from biscoe island is 4319.37 grams.
B1 (SexMale Coef): Difference between levels of X (sex) when z = 0 (biscoe). The difference in body mass between Biscoe females and males is 785.14 (with males weighing more). With dummy coding, slopes for marginal effects refer to Group - reference group (intercept). 4319.37 + 785.14
B2 (islandDream coef) = Difference between levels of z (island) when x (sex) = 0 (female). The difference in body mass between biscoe and dream island females is -873.06 (they weigh less). 4319.37 -873.06
B3 (interaction coef): Difference between levels of x (sex) across levels of z (island). The difference between body mass for biscoe and dream island penguins differs by -244.36 between female and male penguins.
round(coef(mod.3),0)
## (Intercept) sexmale islandDream sexmale:islandDream
## 4319 785 -873 -244
y = B0 + B1X1 + B2Z1 + B3X1Z2 + Error
y-hat = 4319 + 785X - 873Z - 244XZ
** What is the estimated mean Y (body mass) for each group?**
Sex | Island | x | z | xz |
---|---|---|---|---|
Female | Biscoe | 0 | 0 | 0 |
Female | Dream | 0 | 1 | 0 |
Male | Biscoe | 1 | 0 | 0 |
Male | Dream | 1 | 1 | 1 |
print(female.biscoe <- 4319 + 785*0 - 873*0 - 244*0)
## [1] 4319
print(female.dream <- 4319 + 785*0 - 873*1 - 244*0)
## [1] 3446
print(male.biscoe <- 4319 + 785*1 - 873*0 - 244*0)
## [1] 5104
print(male.dream <- 4319 + 785*1 - 873*1 - 244*1)
## [1] 3987
cat_plot(mod.3,
pred = sex,
modx = island)
The slope of the regression line between IV1 and the DV changes as the values of the second IV change.
rebel <- carData::Chirot
summary(lm(intensity ~ commerce*tradition + inequality, rebel))
##
## Call:
## lm(formula = intensity ~ commerce * tradition + inequality, data = rebel)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96093 -0.48563 -0.08182 0.43344 2.42595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.796058 10.637269 1.203 0.2394
## commerce -0.873442 0.356804 -2.448 0.0212 *
## tradition -0.195827 0.127774 -1.533 0.1370
## inequality 2.565560 1.921109 1.335 0.1929
## commerce:tradition 0.011245 0.004155 2.706 0.0116 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.089 on 27 degrees of freedom
## Multiple R-squared: 0.672, Adjusted R-squared: 0.6235
## F-statistic: 13.83 on 4 and 27 DF, p-value: 2.928e-06
This turns the IVs into z-scores. With a z-score, 0 = to the mean and 1 = sd.
library(dplyr)
rebel_c <- rebel %>%
mutate(
commerce_C = scale(commerce),
tradition_C = scale(tradition))
summary(lm(intensity ~ commerce_C*tradition_C, rebel_c))
##
## Call:
## lm(formula = intensity ~ commerce_C * tradition_C, data = rebel_c)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.96824 -0.60224 -0.04482 0.46919 2.49589
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1321 0.2025 -0.652 0.5194
## commerce_C 0.9869 0.2160 4.569 9e-05 ***
## tradition_C 0.4006 0.2090 1.917 0.0655 .
## commerce_C:tradition_C 0.4650 0.1897 2.451 0.0207 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.104 on 28 degrees of freedom
## Multiple R-squared: 0.6504, Adjusted R-squared: 0.6129
## F-statistic: 17.36 on 3 and 28 DF, p-value: 1.45e-06
Mean centered interpretations: - Intercept: Romanian counties with average tradition and average commercialization of agriculture are predicted to have a rebellion intensity of -0.1321. - Commerce_centered: For counties with an average tradition score (0 = average tradition score), for every one unit increase in commercialization it is predicted that rebellion intensity increases by 0.9869. - Tradition_centered: For counties with average commercialization (0), for every one unit increase in traditionalism, rebellion intensity increases by 0.4. - Interaction: For every one unit increase in commercialization, the relationship between traditionalism and rebellion intensity increases by an additional 0.4650.
Simple Slopes: Regression of the outcome Y on a predictor X at a specific value of the interacting variable Z.
How do we select the values of Z? - This was simple when Z was a category - Now, we have to find reasonable values within the continuous range that we can plot - Assessing the effect of X on Y at the following 3 levels of Z: High, Avg, Low. High/Low = + and - 1 SD of the mean.
SD for traditionalism 3.861575
sd(rebel$tradition)
## [1] 3.861575
This means with our mean centered data, we will have 3 different lines at these values of Z: 0 (mean), +3.86 & -3.86.
library(interactions)
mod.c <- lm(intensity ~ commerce_C*tradition_C, data = rebel_c)
probe.int <- probe_interaction(mod.c,
pred = commerce_C,
modx = tradition_C,
cond.int = T,
interval = T,
jnplot = T)
probe.int
## JOHNSON-NEYMAN INTERVAL
##
## When tradition_C is OUTSIDE the interval [-14.74, -0.81], the slope of
## commerce_C is p < .05.
##
## Note: The range of observed values of tradition_C is [-2.12, 1.74]
## SIMPLE SLOPES ANALYSIS
##
## When tradition_C = -1.000000e+00 (- 1 SD):
##
## Est. S.E. t val. p
## --------------------------- ------- ------ -------- ------
## Slope of commerce_C 0.52 0.32 1.61 0.12
## Conditional intercept -0.53 0.29 -1.86 0.07
##
## When tradition_C = 1.258976e-15 (Mean):
##
## Est. S.E. t val. p
## --------------------------- ------- ------ -------- ------
## Slope of commerce_C 0.99 0.22 4.57 0.00
## Conditional intercept -0.13 0.20 -0.65 0.52
##
## When tradition_C = 1.000000e+00 (+ 1 SD):
##
## Est. S.E. t val. p
## --------------------------- ------ ------ -------- ------
## Slope of commerce_C 1.45 0.24 5.94 0.00
## Conditional intercept 0.27 0.30 0.91 0.37
probe.int$simslopes$jnplot
probe.int$simslopes$slopes
## Value of tradition_C Est. S.E. 2.5% 97.5% t val.
## 1 -1.000000e+00 0.5219192 0.3248416 -0.1434886 1.187327 1.606688
## 2 1.258976e-15 0.9869270 0.2159900 0.5444916 1.429362 4.569318
## 3 1.000000e+00 1.4519348 0.2444679 0.9511651 1.952705 5.939164
## p
## 1 1.193430e-01
## 2 8.995365e-05
## 3 2.155770e-06
Y = (INTERCEPT) + (slope1X) + (Slope2Z) + (Slope3XZ) Y = (-0.1321) + (0.9869X) + (0.4006Z) + (0.4650XZ)
Simple slopes in the equation Y = (-0.1321) + (0.9869X) + (0.4006 * 3.86) + (0.4650XZ) Y = (-0.1321) + (0.9869X) + (0.4006 -3.86) + (0.4650XZ) Y = (-0.1321) + (0.9869X) + (0.4006 0) + (0.4650X*Z)
Ordinal, magnitude interaction: They do not cross, but they will increase in magnitude at certain levels
Cross-over / disordinal interaction: The slopes will cross over into an X.