In this report we compare two survey‑weighted wage models for Illinois 2000:
- Model 1 (interaction only) includes only the education_attainment:race_ethnicity term (plus standard controls), omitting separate education or race main effects.
- Model 2 (main effects plus interaction) includes both main effects (education_attainment, race_ethnicity) and their interaction (*), alongside the same controls.

A final section then illustrates the same logic on a real-world birthweight ∼ smoking × race example.
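# Packages used in this report
# (design2000 is assumed to be the survey design object built earlier)
library(survey)      # svyglm()
library(broom)       # tidy()
library(dplyr)       # %>%, filter()
library(knitr)       # kable()
library(kableExtra)  # kbl(), kable_styling()
library(emmeans)     # emmeans()
# Model 1: Education:Race interaction only (no main effects)
mod2000 <- svyglm(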
incwage_inflation ~ education_attainment:race_ethnicity
+ uhrswork + wkswork2
+ ind + occ + age + chicago_dummy,
design = design2000
)
# Filter tidy model output (remove ind and occ terms)
model_tidy_filtered <- tidy(mod2000, conf.int = TRUE) %>%
filter(!grepl("^ind", term), !grepl("^occ", term))
# Create a model equation string
model_equation <- "incwage_inflation ~ education_attainment:race_ethnicity + uhrswork + wkswork2 + ind + occ + age + chicago_dummy"
# Create table with equation as header
kbl(model_tidy_filtered,
caption = paste("Regression Model Results:\n", model_equation),
digits = 3,
booktabs = TRUE,
col.names = c("Term", "Estimate", "Std. Error", "Statistic", "P-value", "Conf. Low", "Conf. High")) %>%
kable_styling(full_width = FALSE, position = "center")
Term | Estimate | Std. Error | Statistic | P-value | Conf. Low | Conf. High |
---|---|---|---|---|---|---|
(Intercept) | 24894223555867.996 | 157418434806062.156 | 0.158 | 0.874 | -283642297184131.250 | 333430744295867.250 |
uhrswork | 1230.764 | 31.891 | 38.592 | 0.000 | 1168.258 | 1293.270 |
age | 900.623 | 15.238 | 59.103 | 0.000 | 870.756 | 930.490 |
chicago_dummy1 | -2459.179 | 500.431 | -4.914 | 0.000 | -3440.013 | -1478.345 |
education_attainmentLess than High School:race_ethnicityWhite (non-Hispanic or Latino) | -24894223477285.125 | 157415906522578.531 | -0.158 | 0.874 | -333425788839660.188 | 283637341885089.938 |
education_attainmentHigh School Diploma:race_ethnicityWhite (non-Hispanic or Latino) | -24894223472238.305 | 157426320013545.750 | -0.158 | 0.874 | -333446199038003.688 | 283657752093527.062 |
education_attainmentSome College:race_ethnicityWhite (non-Hispanic or Latino) | -24894223466668.320 | 157427085714059.062 | -0.158 | 0.874 | -333447699787873.188 | 283659252854536.562 |
education_attainmentBachelor’s Degree:race_ethnicityWhite (non-Hispanic or Latino) | -24894223441582.031 | 157419309464055.656 | -0.158 | 0.874 | -333432458491182.250 | 283644011608018.125 |
education_attainmentMaster’s Degree or Higher:race_ethnicityWhite (non-Hispanic or Latino) | -24894223413543.625 | 157418461729768.938 | -0.158 | 0.874 | -333430796923390.500 | 283642350096303.250 |
education_attainmentLess than High School:race_ethnicityHispanic or Latino | -24894223478645.219 | 157416433497210.000 | -0.158 | 0.874 | -333426821699208.125 | 283638374741917.625 |
education_attainmentHigh School Diploma:race_ethnicityHispanic or Latino | -24894223472768.824 | 157447277589078.938 | -0.158 | 0.874 | -333487275405772.375 | 283698828460234.750 |
education_attainmentSome College:race_ethnicityHispanic or Latino | -24894223467725.809 | 157427349782227.375 | -0.158 | 0.874 | -333448217356482.312 | 283659770421030.688 |
education_attainmentBachelor’s Degree:race_ethnicityHispanic or Latino | -24894223458217.449 | 157418030016239.281 | -0.158 | 0.874 | -333429950819450.562 | 283641503903015.688 |
education_attainmentMaster’s Degree or Higher:race_ethnicityHispanic or Latino | -24894223443730.332 | 157417914640856.812 | -0.158 | 0.874 | -333429724671860.688 | 283641277784400.062 |
education_attainmentLess than High School:race_ethnicityBlack (non-Hispanic or Latino) | -24894223470886.422 | 157424464650228.656 | -0.158 | 0.874 | -333442562567115.875 | 283654115625343.000 |
education_attainmentHigh School Diploma:race_ethnicityBlack (non-Hispanic or Latino) | -24894223471490.832 | 157419713415945.250 | -0.158 | 0.874 | -333433250257527.188 | 283644803314545.562 |
education_attainmentSome College:race_ethnicityBlack (non-Hispanic or Latino) | -24894223466551.203 | 157424209972735.875 | -0.158 | 0.874 | -333442063400737.562 | 283653616467635.188 |
education_attainmentBachelor’s Degree:race_ethnicityBlack (non-Hispanic or Latino) | -24894223455880.551 | 157437316621185.094 | -0.158 | 0.874 | -333467752120335.875 | 283679305208574.750 |
education_attainmentMaster’s Degree or Higher:race_ethnicityBlack (non-Hispanic or Latino) | -24894223441036.793 | 157419309464055.656 | -0.158 | 0.874 | -333432458490637.000 | 283644011608563.375 |
education_attainmentLess than High School:race_ethnicityOther (non-Hispanic or Latino) | -24894223482969.691 | 157416206904170.812 | -0.158 | 0.874 | -333426377586374.250 | 283637930620434.875 |
education_attainmentHigh School Diploma:race_ethnicityOther (non-Hispanic or Latino) | -24894223475445.223 | 157418016978118.844 | -0.158 | 0.874 | -333429925282261.438 | 283641478331370.938 |
education_attainmentSome College:race_ethnicityOther (non-Hispanic or Latino) | -24894223471128.570 | 157417517908790.250 | -0.158 | 0.874 | -333428947113510.312 | 283640500171253.188 |
education_attainmentBachelor’s Degree:race_ethnicityOther (non-Hispanic or Latino) | -24894223455165.410 | 157437391450233.188 | -0.158 | 0.874 | -333467898782838.250 | 283679451872507.375 |
education_attainmentMaster’s Degree or Higher:race_ethnicityOther (non-Hispanic or Latino) | -24894223440831.969 | 157416577445372.844 | -0.158 | 0.874 | -333427103796491.625 | 283638656914827.625 |
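When we exclude the main effects, the model does not fail outright, but the estimates are unusable: the intercept is about +2.49 × 10^13, each of the 20 interaction coefficients is about -2.49 × 10^13, and every one of them has p = 0.874. Offsetting values of this size are the signature of a rank-deficient design. The formula keeps the intercept, yet education_attainment:race_ethnicity on its own creates an indicator for every one of the 20 education-by-race cells, and those 20 indicators sum to the intercept column. How can we prevent this, or should we simply include the main effects (as in Model 2 below)?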
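The collinearity described above can be reproduced on a small simulated data set. This is a minimal sketch, not the report's model: `toy`, `edu`, `race`, and `cell_id` are hypothetical names, and plain `lm()` stands in for `svyglm()`.

```r
# Toy data: 2 education levels x 2 race groups, 10 observations per cell
set.seed(42)
toy <- expand.grid(
  edu     = factor(c("low", "high")),
  race    = factor(c("a", "b")),
  cell_id = 1:10
)
toy$y <- rnorm(nrow(toy), mean = 50)

# Interaction-only formula WITH an intercept: one indicator per cell plus
# the intercept, so the cell indicators add up to the intercept column
X_bad <- model.matrix(~ edu:race, data = toy)
ncol(X_bad)     # number of columns
qr(X_bad)$rank  # rank is one less than the number of columns

# Dropping the intercept gives a full-rank "cell means" coding
X_ok <- model.matrix(~ 0 + edu:race, data = toy)
ncol(X_ok)
qr(X_ok)$rank

# The cell-means fit and the main-effects-plus-interaction fit describe
# the same model, so their fitted values agree
fit_cell <- lm(y ~ 0 + edu:race, data = toy)
fit_full <- lm(y ~ edu * race, data = toy)
all.equal(fitted(fit_cell), fitted(fit_full))
```

In the report's own formula the analogous change would be to write `0 + education_attainment:race_ethnicity` in place of the bare interaction. That variant has not been run on `design2000` here, so treat it as an untested suggestion. Including the main effects, as Model 2 does, fits the same cells and yields coefficients that are easier to read as gaps and returns.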
# Model 2: Education * Race interaction
mod2000b <- svyglm(
incwage_inflation ~ education_attainment * race_ethnicity +
uhrswork + wkswork2 +
ind + occ + age + chicago_dummy,
design = design2000
)
# Filter tidy model output (remove ind and occ terms)
model2_tidy_filtered <- tidy(mod2000b, conf.int = TRUE) %>%
filter(!grepl("^ind", term), !grepl("^occ", term))
# Create model equation string
model2_equation <- "incwage_inflation ~ education_attainment * race_ethnicity + uhrswork + wkswork2 + ind + occ + age + chicago_dummy"
# Display regression results table
kbl(model2_tidy_filtered,
caption = paste("Model 2 Results:\n", model2_equation),
digits = 3,
booktabs = TRUE,
col.names = c("Term", "Estimate", "Std. Error", "Statistic", "P-value", "Conf. Low", "Conf. High")) %>%
kable_styling(full_width = FALSE, position = "center")
Term | Estimate | Std. Error | Statistic | P-value | Conf. Low | Conf. High |
---|---|---|---|---|---|---|
(Intercept) | 78553.604 | 5970.008 | 13.158 | 0.000 | 66852.526 | 90254.682 |
education_attainmentHigh School Diploma | 5047.160 | 592.361 | 8.520 | 0.000 | 3886.147 | 6208.173 |
education_attainmentSome College | 10617.119 | 615.753 | 17.243 | 0.000 | 9410.258 | 11823.981 |
education_attainmentBachelor’s Degree | 35703.208 | 838.982 | 42.555 | 0.000 | 34058.823 | 37347.593 |
education_attainmentMaster’s Degree or Higher | 63741.123 | 1334.627 | 47.759 | 0.000 | 61125.285 | 66356.962 |
race_ethnicityHispanic or Latino | -1360.121 | 718.242 | -1.894 | 0.058 | -2767.859 | 47.616 |
race_ethnicityBlack (non-Hispanic or Latino) | 6398.267 | 1271.590 | 5.032 | 0.000 | 3905.979 | 8890.554 |
race_ethnicityOther (non-Hispanic or Latino) | -5684.876 | 1297.491 | -4.381 | 0.000 | -8227.929 | -3141.824 |
uhrswork | 1230.812 | 31.961 | 38.510 | 0.000 | 1168.170 | 1293.455 |
age | 900.622 | 15.235 | 59.115 | 0.000 | 870.761 | 930.482 |
chicago_dummy1 | -2458.981 | 500.471 | -4.913 | 0.000 | -3439.893 | -1478.069 |
education_attainmentHigh School Diploma:race_ethnicityHispanic or Latino | 829.021 | 924.551 | 0.897 | 0.370 | -983.078 | 2641.119 |
education_attainmentSome College:race_ethnicityHispanic or Latino | 301.862 | 1028.685 | 0.293 | 0.769 | -1714.338 | 2318.062 |
education_attainmentBachelor’s Degree:race_ethnicityHispanic or Latino | -15275.258 | 1888.524 | -8.088 | 0.000 | -18976.722 | -11573.795 |
education_attainmentMaster’s Degree or Higher:race_ethnicityHispanic or Latino | -28826.559 | 4609.893 | -6.253 | 0.000 | -37861.845 | -19791.274 |
education_attainmentHigh School Diploma:race_ethnicityBlack (non-Hispanic or Latino) | -5651.528 | 1366.415 | -4.136 | 0.000 | -8329.671 | -2973.386 |
education_attainmentSome College:race_ethnicityBlack (non-Hispanic or Latino) | -6281.893 | 1383.763 | -4.540 | 0.000 | -8994.036 | -3569.750 |
education_attainmentBachelor’s Degree:race_ethnicityBlack (non-Hispanic or Latino) | -20696.615 | 1747.421 | -11.844 | 0.000 | -24121.520 | -17271.709 |
education_attainmentMaster’s Degree or Higher:race_ethnicityBlack (non-Hispanic or Latino) | -33890.941 | 3072.765 | -11.029 | 0.000 | -39913.489 | -27868.393 |
education_attainmentHigh School Diploma:race_ethnicityOther (non-Hispanic or Latino) | 2477.261 | 1829.004 | 1.354 | 0.176 | -1107.545 | 6062.068 |
education_attainmentSome College:race_ethnicityOther (non-Hispanic or Latino) | 1224.228 | 1604.974 | 0.763 | 0.446 | -1921.484 | 4369.939 |
education_attainmentBachelor’s Degree:race_ethnicityOther (non-Hispanic or Latino) | -7898.584 | 1977.978 | -3.993 | 0.000 | -11775.376 | -4021.792 |
education_attainmentMaster’s Degree or Higher:race_ethnicityOther (non-Hispanic or Latino) | -21603.137 | 3187.711 | -6.777 | 0.000 | -27850.978 | -15355.297 |
Note: When an interaction is in the model, each main‐effect coefficient represents the effect only at the reference level of the other variable.
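The note can be read directly off the model equation. As a sketch, with symbols introduced here purely for illustration (they do not appear in the output): let j index education (j = 1 is Less than High School), k index race (k = 1 is White), and let x collect the controls.

\[
\mathbb{E}[\text{income} \mid j, k, \mathbf{x}]
= \beta_0 + \beta_j + \gamma_k + \delta_{jk} + \mathbf{x}^\top \boldsymbol{\theta},
\qquad
\beta_1 = \gamma_1 = \delta_{1k} = \delta_{j1} = 0 .
\]

Holding the controls fixed, the return to credential j within race k is

\[
\mathbb{E}[\text{income} \mid j, k, \mathbf{x}] - \mathbb{E}[\text{income} \mid 1, k, \mathbf{x}]
= \beta_j + \delta_{jk},
\]

which reduces to \(\beta_j\) only for the reference race (k = 1). By the same argument the race gap at education level j is \(\gamma_k + \delta_{jk}\), which reduces to \(\gamma_k\) only at Less than High School.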
Intercept ((Intercept) = 78,553.60)
Meaning: Predicted income when every factor is at its reference level (White, Less than HS, and the reference categories of the controls) and every numeric control equals 0. It is therefore not one of the cell means in the emmeans table.

HS Diploma (education_attainmentHigh School Diploma)
Estimate: +5,047.16 (p < .001)
Meaning: Among Whites, earning a HS diploma adds $5,047.16 versus Less than HS.
Check: 62,121.87 - 57,074.71 = 5,047.16

Some College (education_attainmentSome College)
Estimate: +10,617.12 (p < .001)
Meaning: Among Whites, Some College adds $10,617.12 versus Less than HS.
Check: 67,691.83 - 57,074.71 = 10,617.12

Bachelor's Degree (education_attainmentBachelor's Degree)
Estimate: +35,703.21 (p < .001)
Meaning: Among Whites, a Bachelor’s adds $35,703.21 versus Less than HS.
Check: 92,777.92 - 57,074.71 = 35,703.21

Master's Degree or Higher (education_attainmentMaster's Degree or Higher)
Estimate: +63,741.12 (p < .001)
Meaning: Among Whites, a Master’s+ adds $63,741.12 versus Less than HS.
Check: 120,815.83 - 57,074.71 = 63,741.12

Hispanic or Latino (race_ethnicityHispanic or Latino)
Estimate: -1,360.12 (p = .058)
Meaning: Among those with Less than HS, Hispanics earn $1,360.12 less than Whites, a gap that is not significant at the 5% level.
Check: 55,714.59 - 57,074.71 = -1,360.12
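Each Hispanic interaction term shows how the Hispanic return to a given credential differs from the White return to the same credential. Equivalently, it shows how the Hispanic-White gap at that education level differs from the gap at Less than HS.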
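HS Diploma, Hispanic (…High School Diploma:race_ethnicityHispanic or Latino)
Estimate: +829.02 (n.s.)
Meaning: The Hispanic HS premium is $829.02 above the White HS premium, a difference that is not statistically significant.
Check: (61,590.77 - 55,714.59) - (62,121.87 - 57,074.71) = 5,876.18 - 5,047.16 = 829.02

Some College, Hispanic (…Some College:race_ethnicityHispanic or Latino)
Estimate: +301.86 (n.s.)

Bachelor's Degree, Hispanic (…Bachelor's Degree:race_ethnicityHispanic or Latino)
Estimate: -15,275.26 (p < .001)

Master's Degree or Higher, Hispanic (…Master's Degree or Higher:race_ethnicityHispanic or Latino)
Estimate: -28,826.56 (p < .001)
Meaning: The Hispanic Master’s+ return is $28,826.56 below the White Master’s+ return.
Check: (90,629.15 - 55,714.59) - (120,815.83 - 57,074.71) = 34,914.56 - 63,741.12 = -28,826.56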
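The Some College and Bachelor’s interaction terms for Hispanic workers (+301.86 and -15,275.26) can be checked the same way. The arithmetic below uses only the predicted means from the emmeans post-hoc table:

\[
\begin{aligned}
\text{Some College:}\quad
& (66{,}633.57 - 55{,}714.59) - (67{,}691.83 - 57{,}074.71) \\
& = 10{,}918.98 - 10{,}617.12 = 301.86 \\[4pt]
\text{Bachelor's:}\quad
& (76{,}142.54 - 55{,}714.59) - (92{,}777.92 - 57{,}074.71) \\
& = 20{,}427.95 - 35{,}703.21 = -15{,}275.26
\end{aligned}
\]

So the Hispanic return to Some College is essentially the same as the White return, while the Hispanic return to a Bachelor’s degree is about $15,275 lower.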
Take‑away:
- The education main effects show the return to each credential for Whites (reference race).
- The race main effect shows the race gap at Less than HS (reference education).
- The interaction terms quantify how the Hispanic return to each credential differs from the White return, i.e. a difference in differences between cell means.
- All checks use the predicted means shown in the emmeans post-hoc table below.
# emmeans post-hoc table for Model 2
emm2000b <- emmeans(
mod2000b,
~ education_attainment * race_ethnicity,
nuisance = c("ind", "occ", "age", "uhrswork", "wkswork2", "chicago_dummy")
)
tidy(emm2000b, conf.int = TRUE) %>%
kable(
caption = "Model 2: Predicted Income by Education and Race (2000), emmeans Post-Hoc Results",
digits = 2,
format.args = list(big.mark = ",")
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
education_attainment | race_ethnicity | estimate | std.error | df | conf.low | conf.high | statistic | p.value |
---|---|---|---|---|---|---|---|---|
Less than High School | White (non-Hispanic or Latino) | 57,074.71 | 748.33 | 181,458 | 55,608.00 | 58,541.42 | 76.27 | 0 |
High School Diploma | White (non-Hispanic or Latino) | 62,121.87 | 575.64 | 181,458 | 60,993.62 | 63,250.12 | 107.92 | 0 |
Some College | White (non-Hispanic or Latino) | 67,691.83 | 574.50 | 181,458 | 66,565.83 | 68,817.83 | 117.83 | 0 |
Bachelor’s Degree | White (non-Hispanic or Latino) | 92,777.92 | 738.84 | 181,458 | 91,329.80 | 94,226.04 | 125.57 | 0 |
Master’s Degree or Higher | White (non-Hispanic or Latino) | 120,815.83 | 1,226.07 | 181,458 | 118,412.76 | 123,218.91 | 98.54 | 0 |
Less than High School | Hispanic or Latino | 55,714.59 | 675.74 | 181,458 | 54,390.16 | 57,039.02 | 82.45 | 0 |
High School Diploma | Hispanic or Latino | 61,590.77 | 787.51 | 181,458 | 60,047.26 | 63,134.28 | 78.21 | 0 |
Some College | Hispanic or Latino | 66,633.57 | 886.77 | 181,458 | 64,895.52 | 68,371.63 | 75.14 | 0 |
Bachelor’s Degree | Hispanic or Latino | 76,142.54 | 1,749.97 | 181,458 | 72,712.63 | 79,572.44 | 43.51 | 0 |
Master’s Degree or Higher | Hispanic or Latino | 90,629.15 | 4,448.81 | 181,458 | 81,909.58 | 99,348.73 | 20.37 | 0 |
Less than High School | Black (non-Hispanic or Latino) | 63,472.98 | 1,248.94 | 181,458 | 61,025.09 | 65,920.86 | 50.82 | 0 |
High School Diploma | Black (non-Hispanic or Latino) | 62,868.61 | 696.08 | 181,458 | 61,504.31 | 64,232.91 | 90.32 | 0 |
Some College | Black (non-Hispanic or Latino) | 67,808.20 | 705.81 | 181,458 | 66,424.83 | 69,191.57 | 96.07 | 0 |
Bachelor’s Degree | Black (non-Hispanic or Latino) | 78,479.57 | 1,184.30 | 181,458 | 76,158.38 | 80,800.76 | 66.27 | 0 |
Master’s Degree or Higher | Black (non-Hispanic or Latino) | 93,323.16 | 2,650.64 | 181,458 | 88,127.97 | 98,518.35 | 35.21 | 0 |
Less than High School | Other (non-Hispanic or Latino) | 51,389.83 | 1,281.76 | 181,458 | 48,877.62 | 53,902.05 | 40.09 | 0 |
High School Diploma | Other (non-Hispanic or Latino) | 58,914.26 | 1,378.02 | 181,458 | 56,213.37 | 61,615.15 | 42.75 | 0 |
Some College | Other (non-Hispanic or Latino) | 63,231.18 | 1,051.22 | 181,458 | 61,170.82 | 65,291.54 | 60.15 | 0 |
Bachelor’s Degree | Other (non-Hispanic or Latino) | 79,194.46 | 1,511.72 | 181,458 | 76,231.53 | 82,157.39 | 52.39 | 0 |
Master’s Degree or Higher | Other (non-Hispanic or Latino) | 93,527.82 | 2,722.86 | 181,458 | 88,191.08 | 98,864.56 | 34.35 | 0 |
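The hand checks in the walkthrough cover only the Hispanic terms. As a rough sketch, the same difference-in-differences arithmetic can be run for all three non-reference groups at once. The numbers below are typed in from the emmeans table above. The object names (`emm`, `returns`, `interaction_terms`) and the shortened row and column labels are illustrative and not part of the original analysis.

```r
# Estimated marginal means copied from the emmeans table
# (rows = education level, columns = race/ethnicity)
emm <- cbind(
  White    = c(57074.71, 62121.87, 67691.83, 92777.92, 120815.83),
  Hispanic = c(55714.59, 61590.77, 66633.57, 76142.54,  90629.15),
  Black    = c(63472.98, 62868.61, 67808.20, 78479.57,  93323.16),
  Other    = c(51389.83, 58914.26, 63231.18, 79194.46,  93527.82)
)
rownames(emm) <- c("Less than HS", "HS Diploma", "Some College",
                   "Bachelor's", "Master's+")

# Race main effects: gap to White at the reference education level
race_gaps <- emm["Less than HS", -1] - emm["Less than HS", "White"]

# Return to each credential within each race:
# every row minus that race's "Less than HS" row
returns <- sweep(emm[-1, ], 2, emm[1, ], FUN = "-")

# Interaction terms: each race's return minus the White return
interaction_terms <- returns[, -1] - returns[, "White"]

round(race_gaps, 2)
round(interaction_terms, 2)
```

Because the table values are rounded to two decimals, the results agree with the Model 2 coefficients only to within about a cent.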
library(plotly)
# Build the plotting data from the emmeans grid.
# as.data.frame() on an emmGrid gives the columns emmean, SE, df, lower.CL, upper.CL.
plot_df <- as.data.frame(emm2000b)
# 1) Subset to just White & Hispanic or Latino, and pre-format the CI text
# (hover templates cannot do arithmetic such as y - error, so build the label here)
plot_df_sub <- plot_df %>%
filter(race_ethnicity %in% c(
"White (non-Hispanic or Latino)",
"Hispanic or Latino"
)) %>%
mutate(ci_label = paste0(
"[ ", formatC(lower.CL, format = "f", digits = 0, big.mark = ","),
", ", formatC(upper.CL, format = "f", digits = 0, big.mark = ","), " ]"
))
# 2) Interactive Plotly chart
plot_ly(
data = plot_df_sub,
x = ~education_attainment,
y = ~emmean,
color = ~race_ethnicity,
# brewer.pal() needs n >= 3, so request 3 colours and keep the first 2
colors = RColorBrewer::brewer.pal(3, "Set1")[1:2],
type = 'scatter',
mode = 'lines+markers',
error_y = list(
type = "data",
symmetric = FALSE,
array = ~upper.CL - emmean, # bar length above the point
arrayminus = ~emmean - lower.CL, # bar length below the point
thickness = 1.5,
width = 5
),
text = ~race_ethnicity,
customdata = ~ci_label,
hovertemplate = paste(
"<b>%{text}</b><br>",
"Education: %{x}<br>",
"Predicted Income: $%{y:,.0f}<br>",
"95% CI: %{customdata}",
"<extra></extra>"
)
) %>%
layout(
title = list(
text = "<b>Interactive Predicted Income by Education & Race</b>",
font = list(size = 20)
),
xaxis = list(
title = "Education Level",
tickangle = -45,
tickfont = list(size = 12),
titlefont = list(size = 14)
),
yaxis = list(
title = "Predicted Inflation‑Adjusted Income",
tickfont = list(size = 12),
titlefont = list(size = 14)
),
legend = list(
title = list(text = "<b>Race/Ethnicity</b>"),
orientation = "h",
x = 0.3,
y = -0.2
)
)
The birthweight example below follows the University of Zurich guide on interpreting interactions: https://www.ebpi.uzh.ch/dam/jcr%3A5764104b-a3b3-451d-828d-34bed6c804fb/InteractionsStataR20170622.pdf?utm_source=chatgpt.com
library(MASS)
data(birthwt)
birthwt$smoke <- factor(birthwt$smoke, 0:1, c("non-smoker", "smoker"))
birthwt$race <- factor(birthwt$race, 1:3, c("white", "black", "other"))
birthwt$nonwhite <- birthwt$race != "white"
birthwt$nonwhite <- factor(as.numeric(birthwt$nonwhite), 0:1, c("white", "nonwhite"))
head(birthwt[, c("bwt", "low", "smoke", "nonwhite", "age", "lwt")])
## bwt low smoke nonwhite age lwt
## 85 2523 0 non-smoker nonwhite 19 182
## 86 2551 0 non-smoker nonwhite 33 155
## 87 2557 0 smoker white 20 105
## 88 2594 0 smoker white 21 108
## 89 2600 0 smoker white 18 107
## 91 2622 0 non-smoker nonwhite 21 124
# Fit the model
m3 <- lm(bwt ~ smoke * nonwhite, data = birthwt)
# 1) Tidy the m3 output (with confidence intervals)
model3_tidy <- tidy(m3, conf.int = TRUE)
# 2) (Optional) If you wanted to remove any terms, you'd filter here.
# But in this simple model we'll keep all four terms.
# e.g. model3_tidy <- model3_tidy %>% filter(term != "(Intercept)")
# 3) Display via kable
model3_tidy %>%
kbl(
caption = "Model Results:\n bwt ~ smoke * nonwhite",
digits = 3,
booktabs = TRUE,
col.names = c(
"Term", "Estimate", "Std. Error", "t value",
"P‑value", "Conf. Low", "Conf. High"
)
) %>%
kable_styling(full_width = FALSE, position = "center")
Term | Estimate | Std. Error | t value | P‑value | Conf. Low | Conf. High |
---|---|---|---|---|---|---|
(Intercept) | 3428.750 | 102.726 | 33.378 | 0.000 | 3226.086 | 3631.414 |
smokesmoker | -601.904 | 139.577 | -4.312 | 0.000 | -877.270 | -326.537 |
nonwhitenonwhite | -604.243 | 130.737 | -4.622 | 0.000 | -862.170 | -346.316 |
smokesmoker:nonwhitenonwhite | 419.488 | 217.086 | 1.932 | 0.055 | -8.795 | 847.770 |
Reference group: White non-smokers

Intercept ((Intercept) = 3 428.7 g)
Estimated mean birthweight for white mothers who do not smoke (all dummies = 0).

Main effect of smoking (smokesmoker = -601.9 g)
The effect of smoking among white mothers: smokers’ babies weigh on average 601.9 g less than the babies of white non-smokers.

Main effect of non-white (nonwhitenonwhite = -604.2 g)
The effect of non-white race among non-smokers: babies of non-white mothers weigh on average 604.2 g less than babies of white mothers.

Interaction (smokesmoker:nonwhitenonwhite = +419.5 g, p = 0.055)
The “extra” adjustment when both conditions hold. For non-white smokers, the combined main-effect penalties (-601.9 g for smoking, -604.2 g for non-white) are partially offset by +419.5 g, yielding a net
\[
3\,428.7
- 601.9
- 604.2
+ 419.5
= 2\,642.1\;\text{g}.
\]
This confirms that, once an interaction is in the model, each main-effect coefficient is interpreted at the reference level of the other variable: the smoking effect is “for whites,” the race effect is “for non-smokers,” and the interaction is the additional departure for non-white smokers.
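As a short worked illustration using only the rounded coefficients reported above, the smoking effect within each race group is

\[
\begin{aligned}
\text{white mothers:}\quad & -601.9\ \text{g} \\
\text{non-white mothers:}\quad & -601.9 + 419.5 = -182.4\ \text{g}
\end{aligned}
\]

so the estimated smoking penalty is considerably smaller among non-white mothers. The interaction itself has p = 0.055, so this difference falls just short of the conventional 5% threshold.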
We verify these cell means using the emmeans package:
# install.packages("emmeans") # if you haven’t already
library(emmeans)
# Compute estimated marginal means for each smoke × nonwhite cell
emm1 <- emmeans(
m3,
~ smoke * nonwhite
)
# View the table of EMMs with standard errors and 95% CIs
tidy(emm1, conf.int = TRUE) %>%
kable(
caption = "Model 3: Predicted Birthweight by Smoking Status and Race",
digits = 2,
format.args = list(big.mark = ",")
) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
smoke | nonwhite | estimate | std.error | df | conf.low | conf.high | statistic | p.value |
---|---|---|---|---|---|---|---|---|
non-smoker | white | 3,428.75 | 102.73 | 185 | 3,226.09 | 3,631.41 | 33.38 | 0 |
smoker | white | 2,826.85 | 94.49 | 185 | 2,640.42 | 3,013.27 | 29.92 | 0 |
non-smoker | nonwhite | 2,824.51 | 80.87 | 185 | 2,664.97 | 2,984.05 | 34.93 | 0 |
smoker | nonwhite | 2,642.09 | 145.28 | 185 | 2,355.48 | 2,928.70 | 18.19 | 0 |
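Because this model has one parameter per smoke × nonwhite cell, the four estimated marginal means are simply the raw group means, and each can be rebuilt from the coefficients. The sketch below checks that in base R. It works on a fresh copy of the data (`bw`, a name introduced here) so that it does not depend on the recoding done earlier, and it refits the same formula under the illustrative name `fit`.

```r
library(MASS)  # ships with R; provides the birthwt data

# Fresh copy of the packaged data, recoded as in the report
bw <- MASS::birthwt
bw$smoke    <- factor(bw$smoke, 0:1, c("non-smoker", "smoker"))
bw$nonwhite <- factor(as.numeric(bw$race != 1), 0:1, c("white", "nonwhite"))

fit <- lm(bwt ~ smoke * nonwhite, data = bw)
b   <- unname(coef(fit))  # order: intercept, smoke, nonwhite, interaction

# Raw mean birthweight in each smoke x nonwhite cell
cell_means <- tapply(bw$bwt, list(bw$smoke, bw$nonwhite), mean)

# Rebuild the same four cell means from the coefficients
rebuilt <- rbind(
  "non-smoker" = c(white = b[1],        nonwhite = b[1] + b[3]),
  "smoker"     = c(white = b[1] + b[2], nonwhite = sum(b))
)

# The two 2 x 2 tables should agree up to floating-point error
all.equal(as.vector(cell_means), as.vector(rebuilt))

# The interaction coefficient is the difference in smoking effects
did <- (cell_means["smoker", "nonwhite"] - cell_means["non-smoker", "nonwhite"]) -
  (cell_means["smoker", "white"] - cell_means["non-smoker", "white"])
all.equal(did, b[4])
```

This equivalence holds only because no other covariates are in the model. Once controls such as age or lwt are added, the estimated marginal means are adjusted predictions and no longer equal the raw cell means.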