Categorical x Continuous Variable

The slope of the regression line between the continuous variable and the dependent variable is different across the levels of the categorical variable.

\[y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \beta_3 (X_i Z_i) + \epsilon_i\]

library(carData)
salary <- carData::Salaries
library(ggplot2)
ggplot(salary, aes(x = yrs.service, y = salary, color = sex)) +
  geom_smooth(method = "lm", se = F) +
  geom_point()
## `geom_smooth()` using formula = 'y ~ x'

salary$rank <- relevel(salary$rank, ref = "Prof")
mod.1 <- lm(salary ~ yrs.service*sex, salary)
summary(mod.1)
## 
## Call:
## lm(formula = salary ~ yrs.service * sex, data = salary)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -80381 -20258  -3727  16353 102536 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          82068.5     7568.7  10.843  < 2e-16 ***
## yrs.service           1637.3      523.0   3.130  0.00188 ** 
## sexMale              20128.6     7991.1   2.519  0.01217 *  
## yrs.service:sexMale   -931.7      535.2  -1.741  0.08251 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28420 on 393 degrees of freedom
## Multiple R-squared:  0.1266, Adjusted R-squared:  0.1199 
## F-statistic: 18.98 on 3 and 393 DF,  p-value: 1.622e-11

Categorical x Categorical Variable

There is a difference in the differences between groups in the first IV across levels of the second IV.

Rats Example

48 rats were randomly assigned both one of three poisons and one of four possible treatments. The experimenters then measured their survival time in tens of hours. A total of 12 groups, each with 4 replicates.

library(faraway)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.4     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(interactions)
rats <- faraway::rats
rats %>%
  group_by(poison, treat) %>%
  summarize(
    `Survival Time` = mean(time)
  )
## `summarise()` has grouped output by 'poison'. You can override using the
## `.groups` argument.
## # A tibble: 12 × 3
## # Groups:   poison [3]
##    poison treat `Survival Time`
##    <fct>  <fct>           <dbl>
##  1 I      A               0.412
##  2 I      B               0.88 
##  3 I      C               0.568
##  4 I      D               0.61 
##  5 II     A               0.32 
##  6 II     B               0.815
##  7 II     C               0.375
##  8 II     D               0.668
##  9 III    A               0.21 
## 10 III    B               0.335
## 11 III    C               0.235
## 12 III    D               0.325
mod.1 <- lm(time ~ poison*treat, rats)
summary(mod.1)
## 
## Call:
## lm(formula = time ~ poison * treat, data = rats)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32500 -0.04875  0.00500  0.04312  0.42500 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       0.41250    0.07457   5.532 2.94e-06 ***
## poisonII         -0.09250    0.10546  -0.877   0.3862    
## poisonIII        -0.20250    0.10546  -1.920   0.0628 .  
## treatB            0.46750    0.10546   4.433 8.37e-05 ***
## treatC            0.15500    0.10546   1.470   0.1503    
## treatD            0.19750    0.10546   1.873   0.0692 .  
## poisonII:treatB   0.02750    0.14914   0.184   0.8547    
## poisonIII:treatB -0.34250    0.14914  -2.297   0.0276 *  
## poisonII:treatC  -0.10000    0.14914  -0.671   0.5068    
## poisonIII:treatC -0.13000    0.14914  -0.872   0.3892    
## poisonII:treatD   0.15000    0.14914   1.006   0.3212    
## poisonIII:treatD -0.08250    0.14914  -0.553   0.5836    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1491 on 36 degrees of freedom
## Multiple R-squared:  0.7335, Adjusted R-squared:  0.6521 
## F-statistic:  9.01 on 11 and 36 DF,  p-value: 1.986e-07
cat_plot(mod.1,
         pred = treat,
         modx = poison)

library(emmeans)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
# Fit model with interaction  
mod.1 <- lm(time ~ poison*treat, rats) 
  
# Compare levels of factor1 at each level of factor2  
emmeans_result <- emmeans(mod.1, specs = pairwise ~ poison | treat)  
  
# Or compare all combinations  
emmeans_result <- emmeans(mod.1, specs = ~ poison:treat)  
pairs(emmeans_result, adjust = "tukey")  
##  contrast      estimate    SE df t.ratio p.value
##  I A - II A      0.0925 0.105 36   0.877  0.9990
##  I A - III A     0.2025 0.105 36   1.920  0.7395
##  I A - I B      -0.4675 0.105 36  -4.433  0.0041
##  I A - II B     -0.4025 0.105 36  -3.817  0.0221
##  I A - III B     0.0775 0.105 36   0.735  0.9998
##  I A - I C      -0.1550 0.105 36  -1.470  0.9394
##  I A - II C      0.0375 0.105 36   0.356  1.0000
##  I A - III C     0.1775 0.105 36   1.683  0.8641
##  I A - I D      -0.1975 0.105 36  -1.873  0.7673
##  I A - II D     -0.2550 0.105 36  -2.418  0.4203
##  I A - III D     0.0875 0.105 36   0.830  0.9994
##  II A - III A    0.1100 0.105 36   1.043  0.9953
##  II A - I B     -0.5600 0.105 36  -5.310  0.0003
##  II A - II B    -0.4950 0.105 36  -4.694  0.0020
##  II A - III B   -0.0150 0.105 36  -0.142  1.0000
##  II A - I C     -0.2475 0.105 36  -2.347  0.4646
##  II A - II C    -0.0550 0.105 36  -0.522  1.0000
##  II A - III C    0.0850 0.105 36   0.806  0.9995
##  II A - I D     -0.2900 0.105 36  -2.750  0.2439
##  II A - II D    -0.3475 0.105 36  -3.295  0.0791
##  II A - III D   -0.0050 0.105 36  -0.047  1.0000
##  III A - I B    -0.6700 0.105 36  -6.353  <.0001
##  III A - II B   -0.6050 0.105 36  -5.737  0.0001
##  III A - III B  -0.1250 0.105 36  -1.185  0.9869
##  III A - I C    -0.3575 0.105 36  -3.390  0.0635
##  III A - II C   -0.1650 0.105 36  -1.565  0.9106
##  III A - III C  -0.0250 0.105 36  -0.237  1.0000
##  III A - I D    -0.4000 0.105 36  -3.793  0.0235
##  III A - II D   -0.4575 0.105 36  -4.338  0.0054
##  III A - III D  -0.1150 0.105 36  -1.090  0.9933
##  I B - II B      0.0650 0.105 36   0.616  1.0000
##  I B - III B     0.5450 0.105 36   5.168  0.0005
##  I B - I C       0.3125 0.105 36   2.963  0.1619
##  I B - II C      0.5050 0.105 36   4.789  0.0015
##  I B - III C     0.6450 0.105 36   6.116  <.0001
##  I B - I D       0.2700 0.105 36   2.560  0.3379
##  I B - II D      0.2125 0.105 36   2.015  0.6809
##  I B - III D     0.5550 0.105 36   5.263  0.0004
##  II B - III B    0.4800 0.105 36   4.552  0.0030
##  II B - I C      0.2475 0.105 36   2.347  0.4646
##  II B - II C     0.4400 0.105 36   4.172  0.0085
##  II B - III C    0.5800 0.105 36   5.500  0.0002
##  II B - I D      0.2050 0.105 36   1.944  0.7252
##  II B - II D     0.1475 0.105 36   1.399  0.9563
##  II B - III D    0.4900 0.105 36   4.646  0.0023
##  III B - I C    -0.2325 0.105 36  -2.205  0.5569
##  III B - II C   -0.0400 0.105 36  -0.379  1.0000
##  III B - III C   0.1000 0.105 36   0.948  0.9979
##  III B - I D    -0.2750 0.105 36  -2.608  0.3126
##  III B - II D   -0.3325 0.105 36  -3.153  0.1087
##  III B - III D   0.0100 0.105 36   0.095  1.0000
##  I C - II C      0.1925 0.105 36   1.825  0.7938
##  I C - III C     0.3325 0.105 36   3.153  0.1087
##  I C - I D      -0.0425 0.105 36  -0.403  1.0000
##  I C - II D     -0.1000 0.105 36  -0.948  0.9979
##  I C - III D     0.2425 0.105 36   2.300  0.4949
##  II C - III C    0.1400 0.105 36   1.328  0.9696
##  II C - I D     -0.2350 0.105 36  -2.228  0.5413
##  II C - II D    -0.2925 0.105 36  -2.774  0.2336
##  II C - III D    0.0500 0.105 36   0.474  1.0000
##  III C - I D    -0.3750 0.105 36  -3.556  0.0426
##  III C - II D   -0.4325 0.105 36  -4.101  0.0104
##  III C - III D  -0.0900 0.105 36  -0.853  0.9992
##  I D - II D     -0.0575 0.105 36  -0.545  1.0000
##  I D - III D     0.2850 0.105 36   2.703  0.2656
##  II D - III D    0.3425 0.105 36   3.248  0.0881
## 
## P value adjustment: tukey method for comparing a family of 12 estimates

Wool Example

This data set gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn.

data(warpbreaks)
warpbreaks %>%
  group_by(wool, tension) %>%
  summarize(
    `Number of Breaks` = mean(breaks)
  )
## `summarise()` has grouped output by 'wool'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 3
## # Groups:   wool [2]
##   wool  tension `Number of Breaks`
##   <fct> <fct>                <dbl>
## 1 A     L                     44.6
## 2 A     M                     24  
## 3 A     H                     24.6
## 4 B     L                     28.2
## 5 B     M                     28.8
## 6 B     H                     18.8
mod.2 <- lm(breaks ~ wool*tension, warpbreaks)
summary(mod.2)
## 
## Call:
## lm(formula = breaks ~ wool * tension, data = warpbreaks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.5556  -6.8889  -0.6667   7.1944  25.4444 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      44.556      3.647  12.218 2.43e-16 ***
## woolB           -16.333      5.157  -3.167 0.002677 ** 
## tensionM        -20.556      5.157  -3.986 0.000228 ***
## tensionH        -20.000      5.157  -3.878 0.000320 ***
## woolB:tensionM   21.111      7.294   2.895 0.005698 ** 
## woolB:tensionH   10.556      7.294   1.447 0.154327    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.94 on 48 degrees of freedom
## Multiple R-squared:  0.3778, Adjusted R-squared:  0.3129 
## F-statistic: 5.828 on 5 and 48 DF,  p-value: 0.0002772
# Fit model with interaction  
mod.2 <- lm(breaks ~ tension*wool, warpbreaks) 
  
# Compare levels of factor1 at each level of factor2  
emmeans_result <- emmeans(mod.2, specs = pairwise ~ tension | wool)  
  
# Or compare all combinations  
emmeans_result <- emmeans(mod.2, specs = ~ tension:wool)  
pairs(emmeans_result, adjust = "tukey")  
##  contrast  estimate   SE df t.ratio p.value
##  L A - M A   20.556 5.16 48   3.986  0.0030
##  L A - H A   20.000 5.16 48   3.878  0.0041
##  L A - L B   16.333 5.16 48   3.167  0.0302
##  L A - M B   15.778 5.16 48   3.059  0.0398
##  L A - H B   25.778 5.16 48   4.998  0.0001
##  M A - H A   -0.556 5.16 48  -0.108  1.0000
##  M A - L B   -4.222 5.16 48  -0.819  0.9627
##  M A - M B   -4.778 5.16 48  -0.926  0.9377
##  M A - H B    5.222 5.16 48   1.013  0.9115
##  H A - L B   -3.667 5.16 48  -0.711  0.9797
##  H A - M B   -4.222 5.16 48  -0.819  0.9627
##  H A - H B    5.778 5.16 48   1.120  0.8706
##  L B - M B   -0.556 5.16 48  -0.108  1.0000
##  L B - H B    9.444 5.16 48   1.831  0.4561
##  M B - H B   10.000 5.16 48   1.939  0.3919
## 
## P value adjustment: tukey method for comparing a family of 6 estimates
cat_plot(mod.2,
         pred = wool,
         modx = tension)

Penguins Example

y = B0 + B1X1 + B2Z1 + B3X1Z2 + Error

y = body mass

x = cat predictor sex

z = cat predictor #2 island

xz = their product/the interaction

Sex Island x z xz
Female Biscoe 0 0 0
Female Dream 0 1 0
Male Biscoe 1 0 0
Male Dream 1 1 1
library(palmerpenguins)
## 
## Attaching package: 'palmerpenguins'
## The following objects are masked from 'package:datasets':
## 
##     penguins, penguins_raw
penguin <- penguins %>%
  filter(island != 'Torgersen')

penguin$island <- droplevels(penguin$island)
mod.3 <- lm(body_mass_g ~ sex*island, penguin)
summary(mod.3)
## 
## Call:
## lm(formula = body_mass_g ~ sex * island, data = penguin)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1554.52  -230.85    80.62   395.48  1195.48 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4319.37      62.46  69.150   <2e-16 ***
## sexmale               785.14      87.54   8.969   <2e-16 ***
## islandDream          -873.06      94.97  -9.193   <2e-16 ***
## sexmale:islandDream  -244.36     133.47  -1.831   0.0682 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 558.7 on 282 degrees of freedom
##   (6 observations deleted due to missingness)
## Multiple R-squared:  0.5422, Adjusted R-squared:  0.5373 
## F-statistic: 111.3 on 3 and 282 DF,  p-value: < 2.2e-16

Intercept: value of y (body mass) when x and z are 0 (female- biscoe island). The expected body mass of a female penguin from biscoe island is 4319.37 grams.

B1 (SexMale Coef): Difference between levels of X (sex) when z = 0 (biscoe). The difference in body mass between Biscoe females and males is 785.14 (with males weighing more). With dummy coding, slopes for marginal effects refer to Group - reference group (intercept). 4319.37 + 785.14

B2 (islandDream coef) = Difference between levels of z (island) when x (sex) = 0 (female). The difference in body mass between biscoe and dream island females is -873.06 (they weigh less). 4319.37 -873.06

B3 (interaction coef): Difference between levels of x (sex) across levels of z (island). The difference between body mass for biscoe and dream island penguins differs by -244.36 between female and male penguins.

round(coef(mod.3),0)
##         (Intercept)             sexmale         islandDream sexmale:islandDream 
##                4319                 785                -873                -244

y = B0 + B1X1 + B2Z1 + B3X1Z2 + Error

y-hat = 4319 + 785X - 873Z - 244XZ

** What is the estimated mean Y (body mass) for each group?**

Sex Island x z xz
Female Biscoe 0 0 0
Female Dream 0 1 0
Male Biscoe 1 0 0
Male Dream 1 1 1
print(female.biscoe <- 4319 + 785*0 - 873*0 - 244*0)
## [1] 4319
print(female.dream <- 4319 + 785*0 - 873*1 - 244*0)
## [1] 3446
print(male.biscoe <- 4319 + 785*1 - 873*0 - 244*0)
## [1] 5104
print(male.dream <- 4319 + 785*1 - 873*1 - 244*1)
## [1] 3987
cat_plot(mod.3,
         pred = sex,
         modx = island)

Continous x Continous Variable

The slope of the regression line between IV1 and the DV changes as the values of the second IV change.

rebel <- carData::Chirot
summary(lm(intensity ~ commerce*tradition + inequality, rebel))
## 
## Call:
## lm(formula = intensity ~ commerce * tradition + inequality, data = rebel)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96093 -0.48563 -0.08182  0.43344  2.42595 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        12.796058  10.637269   1.203   0.2394  
## commerce           -0.873442   0.356804  -2.448   0.0212 *
## tradition          -0.195827   0.127774  -1.533   0.1370  
## inequality          2.565560   1.921109   1.335   0.1929  
## commerce:tradition  0.011245   0.004155   2.706   0.0116 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.089 on 27 degrees of freedom
## Multiple R-squared:  0.672,  Adjusted R-squared:  0.6235 
## F-statistic: 13.83 on 4 and 27 DF,  p-value: 2.928e-06

Mean centering the predictors

This turns the IVs into z-scores. With a z-score, 0 = to the mean and 1 = sd.

library(dplyr)
rebel_c <- rebel %>%
  mutate(
  commerce_C = scale(commerce),
  tradition_C = scale(tradition))
summary(lm(intensity ~ commerce_C*tradition_C, rebel_c))
## 
## Call:
## lm(formula = intensity ~ commerce_C * tradition_C, data = rebel_c)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.96824 -0.60224 -0.04482  0.46919  2.49589 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -0.1321     0.2025  -0.652   0.5194    
## commerce_C               0.9869     0.2160   4.569    9e-05 ***
## tradition_C              0.4006     0.2090   1.917   0.0655 .  
## commerce_C:tradition_C   0.4650     0.1897   2.451   0.0207 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.104 on 28 degrees of freedom
## Multiple R-squared:  0.6504, Adjusted R-squared:  0.6129 
## F-statistic: 17.36 on 3 and 28 DF,  p-value: 1.45e-06

Mean centered interpretations: - Intercept: Romanian counties with average tradition and average commercialization of agriculture are predicted to have a rebellion intensity of -0.1321. - Commerce_centered: For counties with an average tradition score (0 = average tradition score), for every one unit increase in commercialization it is predicted that rebellion intensity increases by 0.9869. - Tradition_centered: For counties with average commercialization (0), for every one unit increase in traditionalism, rebellion intensity increases by 0.4. - Interaction: For every one unit increase in commercialization, the relationship between traditionalism and rebellion intensity increases by an additional 0.4650.

Plotting Continuous x Continuous Interactions

Simple Slopes: Regression of the outcome Y on a predictor X at a specific value of the interacting variable Z.

How do we select the values of Z? - This was simple when Z was a category - Now, we have to find reasonable values within the continuous range that we can plot - Assessing the effect of X on Y at the following 3 levels of Z: High, Avg, Low. High/Low = + and - 1 SD of the mean.

SD for traditionalism 3.861575

sd(rebel$tradition)
## [1] 3.861575

This means with our mean centered data, we will have 3 different lines at these values of Z: 0 (mean), +3.86 & -3.86.

library(interactions)
mod.c <- lm(intensity ~ commerce_C*tradition_C, data = rebel_c)
probe.int <- probe_interaction(mod.c, 
                  pred = commerce_C,
                  modx = tradition_C,
                  cond.int = T,
                  interval = T,
                  jnplot = T)
probe.int 
## JOHNSON-NEYMAN INTERVAL
## 
## When tradition_C is OUTSIDE the interval [-14.74, -0.81], the slope of
## commerce_C is p < .05.
## 
## Note: The range of observed values of tradition_C is [-2.12, 1.74]

## SIMPLE SLOPES ANALYSIS
## 
## When tradition_C = -1.000000e+00 (- 1 SD): 
## 
##                                Est.   S.E.   t val.      p
## --------------------------- ------- ------ -------- ------
## Slope of commerce_C            0.52   0.32     1.61   0.12
## Conditional intercept         -0.53   0.29    -1.86   0.07
## 
## When tradition_C =  1.258976e-15 (Mean): 
## 
##                                Est.   S.E.   t val.      p
## --------------------------- ------- ------ -------- ------
## Slope of commerce_C            0.99   0.22     4.57   0.00
## Conditional intercept         -0.13   0.20    -0.65   0.52
## 
## When tradition_C =  1.000000e+00 (+ 1 SD): 
## 
##                               Est.   S.E.   t val.      p
## --------------------------- ------ ------ -------- ------
## Slope of commerce_C           1.45   0.24     5.94   0.00
## Conditional intercept         0.27   0.30     0.91   0.37

probe.int$simslopes$jnplot

probe.int$simslopes$slopes
##   Value of tradition_C      Est.      S.E.       2.5%    97.5%   t val.
## 1        -1.000000e+00 0.5219192 0.3248416 -0.1434886 1.187327 1.606688
## 2         1.258976e-15 0.9869270 0.2159900  0.5444916 1.429362 4.569318
## 3         1.000000e+00 1.4519348 0.2444679  0.9511651 1.952705 5.939164
##              p
## 1 1.193430e-01
## 2 8.995365e-05
## 3 2.155770e-06

Y = (INTERCEPT) + (slope1X) + (Slope2Z) + (Slope3XZ) Y = (-0.1321) + (0.9869X) + (0.4006Z) + (0.4650XZ)

Simple slopes in the equation Y = (-0.1321) + (0.9869X) + (0.4006 * 3.86) + (0.4650XZ) Y = (-0.1321) + (0.9869X) + (0.4006 -3.86) + (0.4650XZ) Y = (-0.1321) + (0.9869X) + (0.4006 0) + (0.4650X*Z)

Ordinal, magnitude interaction: They do not cross, but they will increase in magnitude at certain levels

Cross-over / disordinal interaction: The slopes will cross over into an X.