Wage Model by Education * Race

Introduction

In this report we compare two survey‑weighted models of inflation‑adjusted wage income (incwage_inflation) for Illinois in 2000:

Model 1: Interaction‑only. Includes only the education_attainment:race_ethnicity term plus the standard controls (uhrswork, wkswork2, ind, occ, age, chicago_dummy), with no separate education or race main effects.
Model 2: Full interaction. Includes both main effects (education_attainment, race_ethnicity) and their interaction, written education_attainment * race_ethnicity, alongside the same controls.

A final section illustrates the same logic on a real‑world example, birthweight ~ smoking × race.
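In R's formula syntax the two specifications differ only in how the education and race terms are written: `:` adds just the product term, while `*` expands to both main effects plus the product. A quick way to confirm what a formula contains is to ask `terms()` for its term labels. The sketch below reuses the variable names from this report only as symbols; no data are needed.

```r
# Model 1 formula: product term only
f1 <- incwage_inflation ~ education_attainment:race_ethnicity
# Model 2 formula: `*` expands to main effects + interaction
f2 <- incwage_inflation ~ education_attainment * race_ethnicity

# Returns "education_attainment:race_ethnicity"
attr(terms(f1), "term.labels")
# Returns "education_attainment", "race_ethnicity",
#         "education_attainment:race_ethnicity"
attr(terms(f2), "term.labels")
```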

Model 1: Interaction‑Only

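library(survey)      # svyglm()
library(broom)       # tidy()
library(dplyr)       # %>%, filter(), mutate()
library(kableExtra)  # kbl(), kable_styling()
library(emmeans)     # emmeans(), used for the post-hoc tables

# Model 1: interaction term only, no education or race main effects.
# Assumes design2000 (the svydesign object for the Illinois 2000 sample)
# was created earlier.
mod2000 <- svyglm(
  incwage_inflation ~ education_attainment:race_ethnicity +
    uhrswork + wkswork2 +
    ind + occ + age + chicago_dummy,
  design = design2000
)
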
# Filter tidy model output (remove ind and occ terms)
model_tidy_filtered <- tidy(mod2000, conf.int = TRUE) %>%
  filter(!grepl("^ind", term), !grepl("^occ", term))

# Create a model equation string
model_equation <- "incwage_inflation ~ education_attainment:race_ethnicity + uhrswork + wkswork2 + ind + occ + age + chicago_dummy"

# Create table with equation as header
kbl(model_tidy_filtered,
    caption = paste("Regression Model Results:\n", model_equation),
    digits = 3,
    booktabs = TRUE,
    col.names = c("Term", "Estimate", "Std. Error", "Statistic", "P-value", "Conf. Low", "Conf. High")) %>%
  kable_styling(full_width = FALSE, position = "center")

Regression Model Results: incwage_inflation ~ education_attainment:race_ethnicity + uhrswork + wkswork2 + ind + occ + age + chicago_dummy
Term	Estimate	Std. Error	Statistic	P-value	Conf. Low	Conf. High
(Intercept)	24894223555867.996	157418434806062.156	0.158	0.874	-283642297184131.250	333430744295867.250
uhrswork	1230.764	31.891	38.592	0.000	1168.258	1293.270
age	900.623	15.238	59.103	0.000	870.756	930.490
chicago_dummy1	-2459.179	500.431	-4.914	0.000	-3440.013	-1478.345
education_attainmentLess than High School:race_ethnicityWhite (non-Hispanic or Latino)	-24894223477285.125	157415906522578.531	-0.158	0.874	-333425788839660.188	283637341885089.938
education_attainmentHigh School Diploma:race_ethnicityWhite (non-Hispanic or Latino)	-24894223472238.305	157426320013545.750	-0.158	0.874	-333446199038003.688	283657752093527.062
education_attainmentSome College:race_ethnicityWhite (non-Hispanic or Latino)	-24894223466668.320	157427085714059.062	-0.158	0.874	-333447699787873.188	283659252854536.562
education_attainmentBachelor’s Degree:race_ethnicityWhite (non-Hispanic or Latino)	-24894223441582.031	157419309464055.656	-0.158	0.874	-333432458491182.250	283644011608018.125
education_attainmentMaster’s Degree or Higher:race_ethnicityWhite (non-Hispanic or Latino)	-24894223413543.625	157418461729768.938	-0.158	0.874	-333430796923390.500	283642350096303.250
education_attainmentLess than High School:race_ethnicityHispanic or Latino	-24894223478645.219	157416433497210.000	-0.158	0.874	-333426821699208.125	283638374741917.625
education_attainmentHigh School Diploma:race_ethnicityHispanic or Latino	-24894223472768.824	157447277589078.938	-0.158	0.874	-333487275405772.375	283698828460234.750
education_attainmentSome College:race_ethnicityHispanic or Latino	-24894223467725.809	157427349782227.375	-0.158	0.874	-333448217356482.312	283659770421030.688
education_attainmentBachelor’s Degree:race_ethnicityHispanic or Latino	-24894223458217.449	157418030016239.281	-0.158	0.874	-333429950819450.562	283641503903015.688
education_attainmentMaster’s Degree or Higher:race_ethnicityHispanic or Latino	-24894223443730.332	157417914640856.812	-0.158	0.874	-333429724671860.688	283641277784400.062
education_attainmentLess than High School:race_ethnicityBlack (non-Hispanic or Latino)	-24894223470886.422	157424464650228.656	-0.158	0.874	-333442562567115.875	283654115625343.000
education_attainmentHigh School Diploma:race_ethnicityBlack (non-Hispanic or Latino)	-24894223471490.832	157419713415945.250	-0.158	0.874	-333433250257527.188	283644803314545.562
education_attainmentSome College:race_ethnicityBlack (non-Hispanic or Latino)	-24894223466551.203	157424209972735.875	-0.158	0.874	-333442063400737.562	283653616467635.188
education_attainmentBachelor’s Degree:race_ethnicityBlack (non-Hispanic or Latino)	-24894223455880.551	157437316621185.094	-0.158	0.874	-333467752120335.875	283679305208574.750
education_attainmentMaster’s Degree or Higher:race_ethnicityBlack (non-Hispanic or Latino)	-24894223441036.793	157419309464055.656	-0.158	0.874	-333432458490637.000	283644011608563.375
education_attainmentLess than High School:race_ethnicityOther (non-Hispanic or Latino)	-24894223482969.691	157416206904170.812	-0.158	0.874	-333426377586374.250	283637930620434.875
education_attainmentHigh School Diploma:race_ethnicityOther (non-Hispanic or Latino)	-24894223475445.223	157418016978118.844	-0.158	0.874	-333429925282261.438	283641478331370.938
education_attainmentSome College:race_ethnicityOther (non-Hispanic or Latino)	-24894223471128.570	157417517908790.250	-0.158	0.874	-333428947113510.312	283640500171253.188
education_attainmentBachelor’s Degree:race_ethnicityOther (non-Hispanic or Latino)	-24894223455165.410	157437391450233.188	-0.158	0.874	-333467898782838.250	283679451872507.375
education_attainmentMaster’s Degree or Higher:race_ethnicityOther (non-Hispanic or Latino)	-24894223440831.969	157416577445372.844	-0.158	0.874	-333427103796491.625	283638656914827.625

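When the main effects are excluded, the model does not stop with an error, but its estimates are unusable: the intercept is about +2.49 × 10^13 and every education × race coefficient is about -2.49 × 10^13, all with standard errors near 1.57 × 10^14 and p = 0.874. The cause is perfect collinearity. Without the main effects, R codes education_attainment:race_ethnicity as a full set of 20 cell indicators (5 education levels × 4 race/ethnicity groups). Every person falls in exactly one cell, so those 20 columns add up to the intercept column and the individual coefficients are not identified. Only combinations of them are: for White, Less than HS, 24,894,223,555,867.996 - 24,894,223,477,285.125 ≈ 78,583, close to the Model 2 intercept of 78,553.60, and the uhrswork, age and chicago_dummy1 coefficients are nearly identical in the two models. How can we prevent this? Or should we just include the main effects, as in Model 2 below?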
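The problem and two possible fixes can be reproduced on simulated data. This is a minimal sketch under stated assumptions: the data frame `toy` and its columns `edu`, `race`, `ind`, `hrs` and `wage` are hypothetical stand-ins for the Illinois variables, and it uses plain `model.matrix()`/`lm()` rather than `svyglm()`. Since `svyglm()` builds its design matrix from the same formula rules, the column counts should carry over, but that has not been checked on design2000. Note that `lm()` would normally report the redundant coefficient as NA; in the survey fit above the dependency was evidently not dropped, which is why the estimates exploded.

```r
# Hypothetical toy data standing in for the Illinois sample:
# edu ~ education_attainment, race ~ race_ethnicity, ind ~ a factor control
set.seed(1)
n   <- 600
toy <- data.frame(
  edu  = factor(sample(c("LTHS", "HS", "BA"), n, replace = TRUE),
                levels = c("LTHS", "HS", "BA")),
  race = factor(sample(c("White", "Hispanic"), n, replace = TRUE),
                levels = c("White", "Hispanic")),
  ind  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  hrs  = runif(n, 20, 60)
)
toy$wage <- 20000 + 500 * toy$hrs + rnorm(n, sd = 5000)

# Number of columns vs. rank of the design matrix R builds from a formula
cols_vs_rank <- function(f) {
  X <- model.matrix(f, data = toy)
  c(columns = ncol(X), rank = qr(X)$rank)
}

# Model 1 style: intercept + all 6 cell dummies, so one column is redundant
cols_vs_rank(~ edu:race + hrs + ind)      # 10 columns, rank 9
# Dropping the intercept does not help once a factor control is present:
# R then gives ind full dummy coding, which collides with the cell dummies
cols_vs_rank(~ 0 + edu:race + hrs + ind)  # 10 columns, rank 9

# Fix 1: main effects + interaction (Model 2 style)
cols_vs_rank(~ edu * race + hrs + ind)    # 9 columns, rank 9
# Fix 2: one combined 6-level factor; its first level, LTHS.White, is the reference
toy$cell <- interaction(toy$edu, toy$race)
cols_vs_rank(~ cell + hrs + ind)          # 9 columns, rank 9

# Both fixes span the same column space, so the fitted values match
fit_full <- lm(wage ~ edu * race + hrs + ind, data = toy)
fit_cell <- lm(wage ~ cell + hrs + ind, data = toy)
all.equal(fitted(fit_full), fitted(fit_cell))  # TRUE
```

For the report's own model, Fix 2 would mean adding a combined factor to the design, e.g. with survey's `update()` method for design objects, `update(design2000, edu_race = interaction(education_attainment, race_ethnicity))` (edu_race is a hypothetical name), and using it in place of education_attainment:race_ethnicity. Either fix should reproduce Model 2's fitted values; the two differ only in what the education × race coefficients are contrasts against.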

Model 2: Main Effects + Interaction

# Model 2: Education * Race interaction
mod2000b <- svyglm(
  incwage_inflation ~ education_attainment * race_ethnicity +
                     uhrswork + wkswork2 +
                     ind + occ + age + chicago_dummy,
  design = design2000
)

# Filter tidy model output (remove ind and occ terms)
model2_tidy_filtered <- tidy(mod2000b, conf.int = TRUE) %>%
  filter(!grepl("^ind", term), !grepl("^occ", term))

# Create model equation string
model2_equation <- "incwage_inflation ~ education_attainment * race_ethnicity + uhrswork + wkswork2 + ind + occ + age + chicago_dummy"

# Display regression results table
kbl(model2_tidy_filtered,
    caption = paste("Model 2 Results:\n", model2_equation),
    digits = 3,
    booktabs = TRUE,
    col.names = c("Term", "Estimate", "Std. Error", "Statistic", "P-value", "Conf. Low", "Conf. High")) %>%
  kable_styling(full_width = FALSE, position = "center")

Model 2 Results: incwage_inflation ~ education_attainment * race_ethnicity + uhrswork + wkswork2 + ind + occ + age + chicago_dummy
Term	Estimate	Std. Error	Statistic	P-value	Conf. Low	Conf. High
(Intercept)	78553.604	5970.008	13.158	0.000	66852.526	90254.682
education_attainmentHigh School Diploma	5047.160	592.361	8.520	0.000	3886.147	6208.173
education_attainmentSome College	10617.119	615.753	17.243	0.000	9410.258	11823.981
education_attainmentBachelor’s Degree	35703.208	838.982	42.555	0.000	34058.823	37347.593
education_attainmentMaster’s Degree or Higher	63741.123	1334.627	47.759	0.000	61125.285	66356.962
race_ethnicityHispanic or Latino	-1360.121	718.242	-1.894	0.058	-2767.859	47.616
race_ethnicityBlack (non-Hispanic or Latino)	6398.267	1271.590	5.032	0.000	3905.979	8890.554
race_ethnicityOther (non-Hispanic or Latino)	-5684.876	1297.491	-4.381	0.000	-8227.929	-3141.824
uhrswork	1230.812	31.961	38.510	0.000	1168.170	1293.455
age	900.622	15.235	59.115	0.000	870.761	930.482
chicago_dummy1	-2458.981	500.471	-4.913	0.000	-3439.893	-1478.069
education_attainmentHigh School Diploma:race_ethnicityHispanic or Latino	829.021	924.551	0.897	0.370	-983.078	2641.119
education_attainmentSome College:race_ethnicityHispanic or Latino	301.862	1028.685	0.293	0.769	-1714.338	2318.062
education_attainmentBachelor’s Degree:race_ethnicityHispanic or Latino	-15275.258	1888.524	-8.088	0.000	-18976.722	-11573.795
education_attainmentMaster’s Degree or Higher:race_ethnicityHispanic or Latino	-28826.559	4609.893	-6.253	0.000	-37861.845	-19791.274
education_attainmentHigh School Diploma:race_ethnicityBlack (non-Hispanic or Latino)	-5651.528	1366.415	-4.136	0.000	-8329.671	-2973.386
education_attainmentSome College:race_ethnicityBlack (non-Hispanic or Latino)	-6281.893	1383.763	-4.540	0.000	-8994.036	-3569.750
education_attainmentBachelor’s Degree:race_ethnicityBlack (non-Hispanic or Latino)	-20696.615	1747.421	-11.844	0.000	-24121.520	-17271.709
education_attainmentMaster’s Degree or Higher:race_ethnicityBlack (non-Hispanic or Latino)	-33890.941	3072.765	-11.029	0.000	-39913.489	-27868.393
education_attainmentHigh School Diploma:race_ethnicityOther (non-Hispanic or Latino)	2477.261	1829.004	1.354	0.176	-1107.545	6062.068
education_attainmentSome College:race_ethnicityOther (non-Hispanic or Latino)	1224.228	1604.974	0.763	0.446	-1921.484	4369.939
education_attainmentBachelor’s Degree:race_ethnicityOther (non-Hispanic or Latino)	-7898.584	1977.978	-3.993	0.000	-11775.376	-4021.792
education_attainmentMaster’s Degree or Higher:race_ethnicityOther (non-Hispanic or Latino)	-21603.137	3187.711	-6.777	0.000	-27850.978	-15355.297

Interpretation of Main Effects (Conditional on Reference Levels)

Note: When an interaction is in the model, each main‐effect coefficient represents the effect only at the reference level of the other variable.
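The note can be made concrete by writing Model 2 out. Let \(\mu(e, r)\) be the predicted wage for education level \(e\) and race/ethnicity group \(r\) at some fixed values \(\mathbf{x}\) of the controls, with \(\alpha\), \(\gamma\) and \(\delta\) the education, race and interaction coefficients (this notation is introduced here for illustration; it does not appear in the model output). Setting the reference-level coefficients to zero gives:

\[
\begin{aligned}
\mu(e, r) &= \beta_0 + \alpha_e + \gamma_r + \delta_{er} + \mathbf{x}^\top\boldsymbol{\theta},
\qquad \alpha_{\text{LTHS}} = \gamma_{\text{White}} = \delta_{\text{LTHS},r} = \delta_{e,\text{White}} = 0,\\
\alpha_e &= \mu(e, \text{White}) - \mu(\text{LTHS}, \text{White}),\\
\gamma_r &= \mu(\text{LTHS}, r) - \mu(\text{LTHS}, \text{White}),\\
\delta_{er} &= \bigl[\mu(e, r) - \mu(\text{LTHS}, r)\bigr] - \bigl[\mu(e, \text{White}) - \mu(\text{LTHS}, \text{White})\bigr].
\end{aligned}
\]

Because \(\mathbf{x}^\top\boldsymbol{\theta}\) is the same in every \(\mu(e, r)\), it drops out of each difference, so these identities hold whatever control values the cell means are evaluated at.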

Intercept
- Value: 78,553.60
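- Meaning: Predicted income for a White individual with Less than HS when every control is at zero or its reference category (0 usual hours, age 0, outside Chicago, reference industry and occupation). No real worker has these values, so the intercept is an extrapolation and differs from the emmeans cell mean for this group (57,074.71), which averages over the controls.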
High School Diploma (education_attainmentHigh School Diploma)
- Estimate: +5,047.16 (p < .001)
- Meaning: Among Whites, earning a HS diploma adds $5,047.16 versus Less than HS.
- Check:
```
62,121.87 - 57,074.71 = 5,047.16
```
Some College (education_attainmentSome College)
- Estimate: +10,617.12 (p < .001)
- Meaning: Among Whites, Some College adds $10,617.12 versus Less than HS.
- Check:
```
67,691.83 - 57,074.71 = 10,617.12
```
Bachelor’s Degree (education_attainmentBachelor’s Degree)
- Estimate: +35,703.21 (p < .001)
- Meaning: Among Whites, a Bachelor’s adds $35,703.21 versus Less than HS.
- Check:
```
92,777.92 - 57,074.71 = 35,703.21
```
Master’s Degree or Higher (education_attainmentMaster’s Degree or Higher)
- Estimate: +63,741.12 (p < .001)
- Meaning: Among Whites, a Master’s+ adds $63,741.12 versus Less than HS.
- Check:
```
120,815.83 - 57,074.71 = 63,741.12
```
Hispanic or Latino Main Effect (race_ethnicityHispanic or Latino)
- Estimate: –1,360.12 (p = .058)
- Meaning: Among those with Less than HS, Hispanic workers' predicted income is $1,360.12 lower than White workers'; the gap is not significant at the 5% level (p = .058).
- Check:
```
55,714.59 - 57,074.71 = -1,360.12
```

Interpretation of Interaction with Hispanic or Latino

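Each interaction term shows how the Hispanic–White gap at a given education level differs from the gap at Less than HS; equivalently, how the Hispanic return to that credential (relative to Less than HS) differs from the White return.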

HS Diploma × Hispanic (…High School Diploma:race_ethnicityHispanic or Latino)
- Estimate: +829.02 (n.s.)
- Meaning: The Hispanic HS premium is $829 above the White HS premium.
- Check:
```
(61,590.77 - 55,714.59) - (62,121.87 - 57,074.71)
= 5,876.18 - 5,047.16 = 829.02
```
Some College × Hispanic
- Estimate: +301.86 (n.s.)
- Meaning: Hispanic Some College premium is $301.86 above the White Some College premium.
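- Check: (66,633.57 - 55,714.59) - (67,691.83 - 57,074.71) = 10,918.98 - 10,617.12 = 301.86
Bachelor’s × Hispanic
- Estimate: –15,275.26 (p < .001)
- Meaning: Hispanic Bachelor’s return is $15,275.26 below the White Bachelor’s return.
- Check: (76,142.54 - 55,714.59) - (92,777.92 - 57,074.71) = 20,427.95 - 35,703.21 = -15,275.26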
Master’s+ × Hispanic
- Estimate: –28,826.56 (p < .001)
- Meaning: Hispanic Master’s+ return is $28,826.56 below the White Master’s+ return.
- Check:
```
(90,629.15 - 55,714.59) - (120,815.83 - 57,074.71)
= 34,914.56 - 63,741.12 = -28,826.56
```

Take‑away:
- The education main effects show the return to each credential for Whites (reference race).
- The race main effect shows the race gap at Less than HS (reference education).
- The interaction terms quantify how the Hispanic return to each credential differs from the White return, i.e. the difference in slopes.
- All checks use the predicted means from the emmeans post‑hoc table.
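The take-away can be checked for all education levels at once. The sketch below types in the rounded Model 2 coefficients and the White and Hispanic or Latino emmeans values printed in this report (rather than pulling them from mod2000b and emm2000b, which would need the survey data) and rebuilds two quantities: the return to each credential within each group (education main effect, plus the interaction for Hispanics) and the Hispanic–White gap at each education level (race main effect plus interaction). Both should match the emmeans differences to within rounding.

```r
# Model 2 coefficients copied from the table above (White and Less than HS
# are the reference levels); typed in, not extracted from mod2000b
edu_levels <- c("Less than HS", "HS Diploma", "Some College", "Bachelor's", "Master's+")
b_edu  <- setNames(c(0, 5047.160, 10617.119, 35703.208, 63741.123), edu_levels)
b_hisp <- -1360.121                                   # Hispanic main effect
b_int  <- setNames(c(0, 829.021, 301.862, -15275.258, -28826.559), edu_levels)

# emmeans cell means (White and Hispanic or Latino rows of the emmeans table)
emm_white <- c(57074.71, 62121.87, 67691.83, 92777.92, 120815.83)
emm_hisp  <- c(55714.59, 61590.77, 66633.57, 76142.54, 90629.15)

# Return to each credential vs Less than HS, by group
white_return <- b_edu            # education main effects alone
hisp_return  <- b_edu + b_int    # main effects plus interaction

# Hispanic - White gap at each education level
hisp_gap <- b_hisp + b_int       # race main effect plus interaction

# Coefficient-based rows next to emmeans-based rows
round(rbind(
  white_return_coef = white_return,
  white_return_emm  = emm_white - emm_white[1],
  hisp_return_coef  = hisp_return,
  hisp_return_emm   = emm_hisp - emm_hisp[1],
  hisp_gap_coef     = hisp_gap,
  hisp_gap_emm      = emm_hisp - emm_white
), 2)
```

With the fitted model available, coef(mod2000b) could supply the same coefficients instead of typing them in.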

Estimated Marginal Means & Plot (Predicted Income by Education and Race, confirming the results above)

# emmeans post-hoc table for Model 2
emm2000b <- emmeans(
  mod2000b,
  ~ education_attainment * race_ethnicity,
  nuisance = c("ind", "occ", "age", "uhrswork", "wkswork2", "chicago_dummy")
)

tidy(emm2000b, conf.int = TRUE) %>%
  kable(
    caption = "Model 2: Predicted Income by Education and Race (2000), emmeans Post-Hoc Results",
    digits = 2,
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Model 2: Predicted Income by Education and Race (2000), emmeans Post-Hoc Results
education_attainment	race_ethnicity	estimate	std.error	df	conf.low	conf.high	statistic
Less than High School	White (non-Hispanic or Latino)	57,074.71	748.33	181,458	55,608.00	58,541.42	76.27
High School Diploma	White (non-Hispanic or Latino)	62,121.87	575.64	181,458	60,993.62	63,250.12	107.92
Some College	White (non-Hispanic or Latino)	67,691.83	574.50	181,458	66,565.83	68,817.83	117.83
Bachelor’s Degree	White (non-Hispanic or Latino)	92,777.92	738.84	181,458	91,329.80	94,226.04	125.57
Master’s Degree or Higher	White (non-Hispanic or Latino)	120,815.83	1,226.07	181,458	118,412.76	123,218.91	98.54
Less than High School	Hispanic or Latino	55,714.59	675.74	181,458	54,390.16	57,039.02	82.45
High School Diploma	Hispanic or Latino	61,590.77	787.51	181,458	60,047.26	63,134.28	78.21
Some College	Hispanic or Latino	66,633.57	886.77	181,458	64,895.52	68,371.63	75.14
Bachelor’s Degree	Hispanic or Latino	76,142.54	1,749.97	181,458	72,712.63	79,572.44	43.51
Master’s Degree or Higher	Hispanic or Latino	90,629.15	4,448.81	181,458	81,909.58	99,348.73	20.37
Less than High School	Black (non-Hispanic or Latino)	63,472.98	1,248.94	181,458	61,025.09	65,920.86	50.82
High School Diploma	Black (non-Hispanic or Latino)	62,868.61	696.08	181,458	61,504.31	64,232.91	90.32
Some College	Black (non-Hispanic or Latino)	67,808.20	705.81	181,458	66,424.83	69,191.57	96.07
Bachelor’s Degree	Black (non-Hispanic or Latino)	78,479.57	1,184.30	181,458	76,158.38	80,800.76	66.27
Master’s Degree or Higher	Black (non-Hispanic or Latino)	93,323.16	2,650.64	181,458	88,127.97	98,518.35	35.21
Less than High School	Other (non-Hispanic or Latino)	51,389.83	1,281.76	181,458	48,877.62	53,902.05	40.09
High School Diploma	Other (non-Hispanic or Latino)	58,914.26	1,378.02	181,458	56,213.37	61,615.15	42.75
Some College	Other (non-Hispanic or Latino)	63,231.18	1,051.22	181,458	61,170.82	65,291.54	60.15
Bachelor’s Degree	Other (non-Hispanic or Latino)	79,194.46	1,511.72	181,458	76,231.53	82,157.39	52.39
Master’s Degree or Higher	Other (non-Hispanic or Latino)	93,527.82	2,722.86	181,458	88,191.08	98,864.56	34.35

Plot

library(plotly)
library(dplyr)

# Convert the emmeans grid to a plain data.frame; its columns are
# education_attainment, race_ethnicity, emmean, SE, df, lower.CL, upper.CL
plot_df <- as.data.frame(emm2000b)

# Keep only White and Hispanic or Latino, and drop the unused race levels
plot_df_sub <- plot_df %>%
  filter(race_ethnicity %in% c(
    "White (non-Hispanic or Latino)",
    "Hispanic or Latino"
  )) %>%
  droplevels() %>%
  # Pre-format the CI for the hover label (hovertemplate cannot do arithmetic)
  mutate(ci_text = paste0(
    "$", formatC(lower.CL, format = "f", digits = 0, big.mark = ","),
    " to $", formatC(upper.CL, format = "f", digits = 0, big.mark = ",")
  ))

# Interactive Plotly chart
plot_ly(
  data = plot_df_sub,
  x = ~education_attainment,
  y = ~emmean,
  color = ~race_ethnicity,
  colors = RColorBrewer::brewer.pal(3, "Set1")[1:2],  # brewer.pal() needs n >= 3
  type = "scatter",
  mode = "lines+markers",
  error_y = list(
    type       = "data",
    array      = ~upper.CL - emmean,  # distance above the point
    arrayminus = ~emmean - lower.CL,  # distance below the point
    thickness  = 1.5,
    width      = 5
  ),
  customdata = ~ci_text,
  text = ~race_ethnicity,
  hovertemplate = paste0(
    "<b>%{text}</b><br>",
    "Education: %{x}<br>",
    "Predicted Income: $%{y:,.0f}<br>",
    "95% CI: %{customdata}",
    "<extra></extra>"
  )
) %>%
  layout(
    title = list(
      text = "<b>Interactive Predicted Income by Education & Race</b>",
      font = list(size = 20)
    ),
    xaxis = list(
      title     = list(text = "Education Level", font = list(size = 14)),
      tickangle = -45,
      tickfont  = list(size = 12)
    ),
    yaxis = list(
      title    = list(text = "Predicted Inflation‑Adjusted Income", font = list(size = 14)),
      tickfont = list(size = 12)
    ),
    legend = list(
      title       = list(text = "<b>Race/Ethnicity</b>"),
      orientation = "h",
      x           = 0.3,
      y           = -0.2
    )
  )

Real‑World Example: Birthweight ~ Smoking * Race

Reference: University of Zurich (EBPI) handout on interpreting interactions: https://www.ebpi.uzh.ch/dam/jcr%3A5764104b-a3b3-451d-828d-34bed6c804fb/InteractionsStataR20170622.pdf

library(MASS)
data(birthwt)
birthwt$smoke <- factor(birthwt$smoke, 0:1, c("non-smoker", "smoker"))
birthwt$race <- factor(birthwt$race, 1:3, c("white", "black", "other"))
birthwt$nonwhite <- birthwt$race != "white"
birthwt$nonwhite <- factor(as.numeric(birthwt$nonwhite), 0:1, c("white", "nonwhite"))
head(birthwt[, c("bwt", "low", "smoke", "nonwhite", "age", "lwt")])

##     bwt low      smoke nonwhite age lwt
## 85 2523   0 non-smoker nonwhite  19 182
## 86 2551   0 non-smoker nonwhite  33 155
## 87 2557   0     smoker    white  20 105
## 88 2594   0     smoker    white  21 108
## 89 2600   0     smoker    white  18 107
## 91 2622   0 non-smoker nonwhite  21 124

# Fit the model
m3 <- lm(bwt ~ smoke * nonwhite, data = birthwt)

# 1) Tidy the m3 output (with confidence intervals)
model3_tidy <- tidy(m3, conf.int = TRUE)

# 2) (Optional) If you wanted to remove any terms, you'd filter here.
#    But in this simple model we'll keep all four terms.
#    e.g. model3_tidy <- model3_tidy %>% filter(term != "(Intercept)")

# 3) Display via kable
model3_tidy %>%
  kbl(
    caption   = "Model Results:\n bwt ~ smoke * nonwhite",
    digits    = 3,
    booktabs  = TRUE,
    col.names = c(
      "Term", "Estimate", "Std. Error", "t value",
      "P‑value", "Conf. Low", "Conf. High"
    )
  ) %>%
  kable_styling(full_width = FALSE, position = "center")

Model Results: bwt ~ smoke * nonwhite
Term	Estimate	Std. Error	t value	P‑value	Conf. Low	Conf. High
(Intercept)	3428.750	102.726	33.378	0.000	3226.086	3631.414
smokesmoker	-601.904	139.577	-4.312	0.000	-877.270	-326.537
nonwhitenonwhite	-604.243	130.737	-4.622	0.000	-862.170	-346.316
smokesmoker:nonwhitenonwhite	419.488	217.086	1.932	0.055	-8.795	847.770

Reference group: White non‑smokers

Intercept ((Intercept) = 3,428.75 g)
Estimated mean birthweight for white mothers who do not smoke (all dummies = 0).
Main effect of smoking (smokesmoker = –601.9 g)
The effect of smoking among white mothers: smokers’ babies weigh on average 601.9 g less than their non‑smoking counterparts.
Main effect of non‑white (nonwhitenonwhite = –604.2 g)
The effect of non‑white race among non‑smokers: non‑white mothers’ babies weigh on average 604.2 g less than white non‑smokers.
Interaction (smokesmoker:nonwhitenonwhite = +419.5 g, p = 0.055)
The additional adjustment that applies only when both conditions hold. For non‑white smokers, the two main‑effect penalties (–601.9 g for smoking, –604.2 g for non‑white race) are partly offset by +419.5 g, giving a predicted mean of
\[ 3\,428.750 - 601.904 - 604.243 + 419.488 = 2\,642.091\;\text{g}. \]
Equivalently, the smoking effect among non‑white mothers is –601.9 + 419.5 = –182.4 g, compared with –601.9 g among white mothers, although the interaction itself is only marginally significant (p = 0.055). This shows that every coefficient in an interaction model is interpreted relative to the reference levels: the smoking effect is “for whites,” the race effect is “for non‑smokers,” and the interaction is the additional departure for non‑white smokers.
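The same arithmetic can be checked directly in R. Because the 2 × 2 model has four coefficients for four cells (it is saturated), the cell predictions rebuilt from the coefficients should equal the raw group means in the data. This sketch reloads a fresh copy of `MASS::birthwt` and repeats the recoding used above (race code 1 = white, 2 and 3 = non‑white).

```r
# Fresh copy of the data, so the recoding starts from the original race codes
birthwt <- MASS::birthwt
birthwt$smoke    <- factor(birthwt$smoke, 0:1, c("non-smoker", "smoker"))
birthwt$nonwhite <- factor(as.integer(birthwt$race != 1), 0:1, c("white", "nonwhite"))

m3 <- lm(bwt ~ smoke * nonwhite, data = birthwt)
b  <- coef(m3)

# Cell means rebuilt from the coefficients (reference: white non-smokers)
from_coef <- c(
  white_non    = b[["(Intercept)"]],
  white_smk    = b[["(Intercept)"]] + b[["smokesmoker"]],
  nonwhite_non = b[["(Intercept)"]] + b[["nonwhitenonwhite"]],
  nonwhite_smk = sum(b)  # all four coefficients apply
)

# Raw cell means computed directly from the data
raw <- with(birthwt, tapply(bwt, list(nonwhite, smoke), mean))
from_data <- c(
  white_non    = raw["white", "non-smoker"],
  white_smk    = raw["white", "smoker"],
  nonwhite_non = raw["nonwhite", "non-smoker"],
  nonwhite_smk = raw["nonwhite", "smoker"]
)

# Saturated model: coefficient-based and raw cell means agree
all.equal(from_coef, from_data)

# Smoking effect among non-white mothers = main effect + interaction
smk_nonwhite <- b[["smokesmoker"]] + b[["smokesmoker:nonwhitenonwhite"]]
smk_nonwhite  # about -182.4 g (2,642.09 - 2,824.51 in the emmeans table)
```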

We verify these cell means using the emmeans package:

# install.packages("emmeans")  # if you haven’t already
library(emmeans)

# Compute estimated marginal means for each smoke × nonwhite cell
emm1 <- emmeans(
  m3,
  ~ smoke * nonwhite
)

# View the table of EMMs with standard errors and 95% CIs

tidy(emm1, conf.int = TRUE) %>%
  kable(
    caption = "Model 3 (m3): Predicted BWT by Smoking Status and Race",
    digits = 2,
    format.args = list(big.mark = ",")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))

Model 3 (m3): Predicted BWT by Smoking Status and Race
smoke	nonwhite	estimate	std.error	df	conf.low	conf.high	statistic
non-smoker	white	3,428.75	102.73	185	3,226.09	3,631.41	33.38
smoker	white	2,826.85	94.49	185	2,640.42	3,013.27	29.92
non-smoker	nonwhite	2,824.51	80.87	185	2,664.97	2,984.05	34.93
smoker	nonwhite	2,642.09	145.28	185	2,355.48	2,928.70	18.19