At ages 25–35, homeownership fell substantially from the Boomer cohort
to the GenZ cohort. The race × cohort extension
(RaceCohort_2026-04-26.Rmd) showed that the gap widened most for
non-Black households (because Black ownership was already chronically
low under Boomers).
This analysis asks a different question: how much of the cohort gap is “endowments” — GenZ has worse FAI, lower real income, less marriage, etc. — and how much is “coefficients” — the same FAI translates to a worse ownership probability for GenZ than it did for Boomers?
The standard tool is the Oaxaca–Blinder decomposition. Because the outcome is binary (own/rent), I use the Linear Probability Model (LPM) version, which is the convention in labor economics and which gives marginal effects directly comparable across groups. A Fairlie (non-linear) check is in Section 4 as robustness.
Section 5 does a complementary decomposition: instead of decomposing the ownership gap, I decompose the FAI gap itself into a “price growth” component and an “income stagnation” component.
library(data.table)
library(dplyr)
library(ggplot2)
library(lmtest)
library(sandwich)
library(broom)
in_dir <- "~/Desktop/DA6833/FinancialNihilism"
model_dt <- readRDS(file.path(in_dir, "model_dt.rds"))
setDT(model_dt)
stopifnot(levels(model_dt$GENERATION) ==
c("Boomer", "GenX", "Millennial", "GenZ"))
cat("Rows loaded :", nrow(model_dt), "\n")
## Rows loaded : 3150831
cat("Cohort sizes:\n"); print(model_dt[, .N, by = GENERATION])
## Cohort sizes:
## GENERATION N
## <fctr> <int>
## 1: Boomer 724067
## 2: GenX 684986
## 3: Millennial 1602049
## 4: GenZ 139729
gap_table <- model_dt[, .(
own_rate = round(weighted.mean(own, w), 4),
mean_FAI = round(weighted.mean(FAI_pi, w, na.rm = TRUE), 3),
n = .N
), by = GENERATION][order(GENERATION)]
gap_table
## GENERATION own_rate mean_FAI n
## <fctr> <num> <num> <int>
## 1: Boomer 0.5702 -0.119 724067
## 2: GenX 0.5588 -0.067 684986
## 3: Millennial 0.5210 0.123 1602049
## 4: GenZ 0.4671 0.246 139729
boomer_rate <- gap_table[GENERATION == "Boomer", own_rate]
genz_rate <- gap_table[GENERATION == "GenZ", own_rate]
cat("\nBoomer 25-35 ownership rate :", boomer_rate, "\n")
##
## Boomer 25-35 ownership rate : 0.5702
cat("GenZ 25-35 ownership rate :", genz_rate, "\n")
## GenZ 25-35 ownership rate : 0.4671
cat("Raw gap to decompose :", round(boomer_rate - genz_rate, 4), "\n")
## Raw gap to decompose : 0.1031
Of that raw gap, how much is “GenZ has different X” vs. “GenZ gets a different return on the same X”?
I subset to the two extreme cohorts (Boomer, GenZ) so the decomposition is between two well-defined groups. GenX and Millennial are dropped here — Section 3 generalizes to all four cohorts.
The covariates decomposed over are the same ones used in m1_pi:
FAI_pi, MARST, RACE_cat, HISPAN, EDUC_cat. Factor variables
are expanded to dummies so means and coefficients are computed on the
same expanded matrix.
ob_dt <- model_dt[GENERATION %in% c("Boomer", "GenZ") & !is.na(FAI_pi)]
ob_dt[, GENERATION := droplevels(GENERATION)]
X_full <- model.matrix(
~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat,
data = ob_dt
)
intercept_col <- which(colnames(X_full) == "(Intercept)")
ob_dt$.row <- seq_len(nrow(ob_dt))
boom_idx <- ob_dt[GENERATION == "Boomer", .row]
genz_idx <- ob_dt[GENERATION == "GenZ", .row]
X_boom <- X_full[boom_idx, , drop = FALSE]
X_genz <- X_full[genz_idx, , drop = FALSE]
y_boom <- ob_dt$own[boom_idx]
y_genz <- ob_dt$own[genz_idx]
w_boom <- ob_dt$w[boom_idx]
w_genz <- ob_dt$w[genz_idx]
cat("Boomer rows :", length(y_boom), "\n")
## Boomer rows : 724067
cat("GenZ rows :", length(y_genz), "\n")
## GenZ rows : 139729
# Weighted LPM, separately for each cohort
lpm_boom <- lm.wfit(X_boom, y_boom, w = w_boom)
lpm_genz <- lm.wfit(X_genz, y_genz, w = w_genz)
beta_boom <- coef(lpm_boom)
beta_genz <- coef(lpm_genz)
# Pooled LPM (Neumark 1988): use pooled β as the reference β*
pool_fit <- lm.wfit(X_full, ob_dt$own, w = ob_dt$w)
beta_star <- coef(pool_fit)
# Weighted means of X within each group
xbar_boom <- colSums(X_boom * w_boom) / sum(w_boom)
xbar_genz <- colSums(X_genz * w_genz) / sum(w_genz)
# Sanity check: predicted means should equal observed weighted means
pred_boom <- sum(xbar_boom * beta_boom)
pred_genz <- sum(xbar_genz * beta_genz)
ybar_boom <- weighted.mean(y_boom, w_boom)
ybar_genz <- weighted.mean(y_genz, w_genz)
cat("Boomer mean own (data) :", round(ybar_boom, 4), "\n")
## Boomer mean own (data) : 0.5702
cat("Boomer mean own (LPM) :", round(pred_boom, 4), "\n")
## Boomer mean own (LPM) : 0.5702
cat("GenZ mean own (data) :", round(ybar_genz, 4), "\n")
## GenZ mean own (data) : 0.4671
cat("GenZ mean own (LPM) :", round(pred_genz, 4), "\n")
## GenZ mean own (LPM) : 0.4671
Ȳ_Boom - Ȳ_GenZ = (X̄_Boom - X̄_GenZ)·β* + [X̄_Boom·(β_Boom - β*) + X̄_GenZ·(β* - β_GenZ)]
The first term is the explained (endowments) component: GenZ would own less than Boomers even if both groups got the same returns, simply because their characteristics are worse. The second term is the unexplained (coefficients) component: even with identical characteristics, GenZ’s mapping from X to ownership differs from Boomers’.
delta_X <- xbar_boom - xbar_genz
explained <- sum(delta_X * beta_star)
unexplain <- sum(xbar_boom * (beta_boom - beta_star)) +
sum(xbar_genz * (beta_star - beta_genz))
raw_gap <- ybar_boom - ybar_genz
twofold <- data.table(
component = c("Raw gap (Boomer - GenZ)",
"Explained (endowments)",
"Unexplained (coefficients)",
"Sum (sanity check)"),
value = c(raw_gap, explained, unexplain, explained + unexplain),
share_of_gap = c(1,
explained / raw_gap,
unexplain / raw_gap,
(explained + unexplain) / raw_gap)
)
twofold[, value := round(value, 4)]
twofold[, share_of_gap := round(share_of_gap, 3)]
twofold
## component value share_of_gap
## <char> <num> <num>
## 1: Raw gap (Boomer - GenZ) 0.1031 1.000
## 2: Explained (endowments) 0.1201 1.165
## 3: Unexplained (coefficients) -0.0170 -0.165
## 4: Sum (sanity check) 0.1031 1.000
For each covariate column k:
Dummy columns are aggregated back to their factor name for readability.
explained_k <- delta_X * beta_star
unexplain_k <- xbar_boom * (beta_boom - beta_star) +
xbar_genz * (beta_star - beta_genz)
col_names <- names(explained_k)
group <- fcase(
col_names == "(Intercept)", "(Intercept)",
col_names == "FAI_pi", "FAI_pi",
startsWith(col_names, "MARST"), "MARST",
startsWith(col_names, "RACE_cat"), "RACE_cat",
startsWith(col_names, "HISPAN"), "HISPAN",
startsWith(col_names, "EDUC_cat"), "EDUC_cat"
)
per_var <- data.table(
variable = group,
explained = explained_k,
unexplained = unexplain_k
)[, .(explained = sum(explained),
unexplained = sum(unexplained)), by = variable]
per_var[, total := explained + unexplained]
per_var[, share_of_gap_explained := round(explained / raw_gap, 3)]
per_var[, share_of_gap_unexplained := round(unexplained / raw_gap, 3)]
per_var[, c("explained", "unexplained", "total") :=
.(round(explained, 4), round(unexplained, 4), round(total, 4))]
per_var[order(-abs(total))]
## variable explained unexplained total share_of_gap_explained
## <char> <num> <num> <num> <num>
## 1: (Intercept) 0.0000 0.0801 0.0801 0.000
## 2: MARST 0.0623 -0.1128 -0.0505 0.604
## 3: FAI_pi 0.0384 -0.0020 0.0364 0.372
## 4: EDUC_cat -0.0057 0.0407 0.0349 -0.055
## 5: RACE_cat 0.0065 -0.0024 0.0040 0.063
## 6: HISPAN 0.0187 -0.0205 -0.0018 0.181
## share_of_gap_unexplained
## <num>
## 1: 0.777
## 2: -1.094
## 3: -0.020
## 4: 0.394
## 5: -0.024
## 6: -0.199
Block bootstrap by year-cluster: resample years with replacement,
refit on each replicate, recompute explained/unexplained. The
pre-built X_full matrix is reused inside the loop to avoid
redundant model.matrix() calls.
set.seed(2026)
years <- sort(unique(ob_dt$YEAR))
B <- 200
year_rows <- split(seq_len(nrow(ob_dt)), ob_dt$YEAR)
gen_lab <- ob_dt$GENERATION
y_vec <- ob_dt$own
w_vec <- ob_dt$w
boot_out <- vector("list", B)
for (b in seq_len(B)) {
yr_sample <- sample(years, size = length(years), replace = TRUE)
idx <- unlist(year_rows[as.character(yr_sample)], use.names = FALSE)
Xb <- X_full[idx, , drop = FALSE]
yb <- y_vec[idx]; wb <- w_vec[idx]; gb_lab <- gen_lab[idx]
bidx <- which(gb_lab == "Boomer")
gidx <- which(gb_lab == "GenZ")
if (length(bidx) < 50 || length(gidx) < 50) { rm(Xb, yb, wb, gb_lab); next }
fb <- tryCatch(lm.wfit(Xb[bidx, , drop = FALSE], yb[bidx], w = wb[bidx]),
error = function(e) NULL)
fg <- tryCatch(lm.wfit(Xb[gidx, , drop = FALSE], yb[gidx], w = wb[gidx]),
error = function(e) NULL)
fp <- tryCatch(lm.wfit(Xb, yb, w = wb), error = function(e) NULL)
if (is.null(fb) || is.null(fg) || is.null(fp)) {
rm(Xb, yb, wb, gb_lab); next
}
bbeta <- coef(fb); gbeta <- coef(fg); sbeta <- coef(fp)
bbeta[is.na(bbeta)] <- 0; gbeta[is.na(gbeta)] <- 0; sbeta[is.na(sbeta)] <- 0
xb_b <- colSums(Xb[bidx, , drop = FALSE] * wb[bidx]) / sum(wb[bidx])
xg_b <- colSums(Xb[gidx, , drop = FALSE] * wb[gidx]) / sum(wb[gidx])
boot_out[[b]] <- c(
explained = sum((xb_b - xg_b) * sbeta),
unexplained = sum(xb_b * (bbeta - sbeta)) + sum(xg_b * (sbeta - gbeta))
)
rm(Xb, yb, wb, gb_lab, bidx, gidx, fb, fg, fp,
bbeta, gbeta, sbeta, xb_b, xg_b)
if (b %% 20 == 0) gc(verbose = FALSE)
}
boot_dt <- rbindlist(lapply(boot_out, function(x)
if (is.null(x)) NULL else as.data.table(as.list(x))))
boot_summary <- data.table(
n_reps = nrow(boot_dt),
se_explained = sd(boot_dt$explained),
ci_lo_exp = quantile(boot_dt$explained, 0.025),
ci_hi_exp = quantile(boot_dt$explained, 0.975),
se_unexplain = sd(boot_dt$unexplained),
ci_lo_unx = quantile(boot_dt$unexplained, 0.025),
ci_hi_unx = quantile(boot_dt$unexplained, 0.975)
)
print(round(boot_summary, 4))
## n_reps se_explained ci_lo_exp ci_hi_exp se_unexplain ci_lo_unx ci_hi_unx
## <num> <num> <num> <num> <num> <num> <num>
## 1: 175 0.0152 0.0969 0.1532 0.0168 -0.0497 0.0078
Boomer→GenX, GenX→Millennial, Millennial→GenZ. This shows whether the explained/unexplained mix is stable across the chain or whether one transition is structurally distinct from the others.
do_ob <- function(g_old, g_new) {
d <- model_dt[GENERATION %in% c(g_old, g_new) & !is.na(FAI_pi)]
d[, GENERATION := droplevels(GENERATION)]
Xf <- model.matrix(
~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat, data = d
)
i_o <- which(d$GENERATION == g_old)
i_n <- which(d$GENERATION == g_new)
fb <- lm.wfit(Xf[i_o, , drop = FALSE], d$own[i_o], w = d$w[i_o])
fg <- lm.wfit(Xf[i_n, , drop = FALSE], d$own[i_n], w = d$w[i_n])
fp <- lm.wfit(Xf, d$own, w = d$w)
bb <- coef(fb); gb <- coef(fg); sb <- coef(fp)
bb[is.na(bb)] <- 0; gb[is.na(gb)] <- 0; sb[is.na(sb)] <- 0
xo <- colSums(Xf[i_o, , drop = FALSE] * d$w[i_o]) / sum(d$w[i_o])
xn <- colSums(Xf[i_n, , drop = FALSE] * d$w[i_n]) / sum(d$w[i_n])
raw <- weighted.mean(d$own[i_o], d$w[i_o]) -
weighted.mean(d$own[i_n], d$w[i_n])
exp_t <- sum((xo - xn) * sb)
unx_t <- sum(xo * (bb - sb)) + sum(xn * (sb - gb))
data.table(transition = paste(g_old, "->", g_new),
raw_gap = raw, explained = exp_t, unexplained = unx_t,
share_explained = exp_t / raw, share_unexplained = unx_t / raw)
}
pair_table <- rbind(
do_ob("Boomer", "GenX"),
do_ob("GenX", "Millennial"),
do_ob("Millennial", "GenZ"),
do_ob("Boomer", "GenZ")
)
pair_table[, c("raw_gap", "explained", "unexplained") :=
.(round(raw_gap, 4), round(explained, 4), round(unexplained, 4))]
pair_table[, c("share_explained", "share_unexplained") :=
.(round(share_explained, 3), round(share_unexplained, 3))]
pair_table
## transition raw_gap explained unexplained share_explained
## <char> <num> <num> <num> <num>
## 1: Boomer -> GenX 0.0114 0.0400 -0.0286 3.517
## 2: GenX -> Millennial 0.0378 0.0409 -0.0031 1.082
## 3: Millennial -> GenZ 0.0539 0.0375 0.0164 0.696
## 4: Boomer -> GenZ 0.1031 0.1201 -0.0170 1.165
## share_unexplained
## <num>
## 1: -2.517
## 2: -0.082
## 3: 0.304
## 4: -0.165
Linear probability is fine for marginal effects in the interior, but it can predict outside [0, 1] and it doesn’t respect the binary nature of the outcome. The Fairlie (1999, 2005) extension simulates counterfactual probabilities from a logit. The headline Boomer-vs-GenZ gap is decomposed here and compared against the LPM result in Section 2a.
ob_dt2 <- model_dt[GENERATION %in% c("Boomer", "GenZ") & !is.na(FAI_pi)]
ob_dt2[, GENERATION := droplevels(GENERATION)]
# Fit logit on Boomers (the "advantaged" group) — Fairlie's convention
logit_boom <- glm(
own ~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat,
data = ob_dt2[GENERATION == "Boomer"],
weights = w, family = binomial(link = "logit")
)
set.seed(2026)
genz_d <- ob_dt2[GENERATION == "GenZ"]
boom_d <- ob_dt2[GENERATION == "Boomer"]
# GenZ predicted at Boomer coefficients (used for reference)
p_genz_boomerbeta <- predict(logit_boom, newdata = genz_d, type = "response")
mean_genz_boomerbeta <- weighted.mean(p_genz_boomerbeta, genz_d$w)
# Fairlie X-swap: for each GenZ obs, draw a random Boomer's characteristics
# and predict under Boomer coefficients. Average over 20 draws for stability.
needed_cols <- c("FAI_pi", "MARST", "RACE_cat", "HISPAN", "EDUC_cat")
boom_X <- boom_d[, ..needed_cols]
n_draws <- 20
n_genz <- nrow(genz_d)
p_swap_X <- numeric(n_genz)
for (d in seq_len(n_draws)) {
idx <- sample.int(nrow(boom_X), size = n_genz, replace = TRUE)
p_swap_X <- p_swap_X +
predict(logit_boom, newdata = boom_X[idx], type = "response")
if (d %% 5 == 0) gc(verbose = FALSE)
}
p_swap_X <- p_swap_X / n_draws
mean_genz_boomerX <- weighted.mean(p_swap_X, genz_d$w)
rm(boom_X)
mean_boom <- weighted.mean(boom_d$own, boom_d$w)
mean_genz <- weighted.mean(genz_d$own, genz_d$w)
# Fairlie partition:
# Explained = how much GenZ ownership rises when given Boomer X
# Unexplained = remaining gap after X equalization
fairlie_explained <- mean_genz_boomerX - mean_genz
fairlie_unexplained <- mean_boom - mean_genz_boomerX
fairlie_total <- mean_boom - mean_genz
cat("Fairlie decomposition (binary, Boomer beta as reference):\n")
## Fairlie decomposition (binary, Boomer beta as reference):
cat(" Total gap (data) :", round(fairlie_total, 4), "\n")
## Total gap (data) : 0.1031
cat(" Explained (X swap) :", round(fairlie_explained, 4),
" share:", round(fairlie_explained / fairlie_total, 3), "\n")
## Explained (X swap) : 0.1074 share: 1.041
cat(" Unexplained :", round(fairlie_unexplained, 4),
" share:", round(fairlie_unexplained / fairlie_total, 3), "\n")
## Unexplained : -0.0043 share: -0.041
cat(" Sum :",
round(fairlie_explained + fairlie_unexplained, 4), "\n")
## Sum : 0.1031
The Oaxaca decomposition above shows that endowments account for the ownership gap, with FAI as a primary contributor. This section opens up FAI itself: since FAI_pi is a scaled version of price_to_income_hh = valueh_real / hhinc_real, a worse FAI reflects either higher real home values, lower real household income, or both.
Using the additive property of logarithms, the gap in log(P/I) between Boomer and GenZ households decomposes exactly into a price-growth component and an income component: Δ log(P/I) = Δ log(P) − Δ log(I)
fai_cols <- intersect(c("price_to_income_hh", "valueh_real",
"hhinc_real", "VALUEH", "HHINCOME"),
names(model_dt))
print(fai_cols)
## [1] "price_to_income_hh" "valueh_real" "hhinc_real"
## [4] "VALUEH" "HHINCOME"
req_cols <- c("price_to_income_hh", "valueh_real", "hhinc_real")
if (all(req_cols %in% names(model_dt))) {
fai_input <- model_dt[GENERATION %in% c("Boomer", "GenZ") &
!is.na(price_to_income_hh) &
!is.na(valueh_real) & !is.na(hhinc_real)]
fai_input[, GENERATION := droplevels(GENERATION)]
fai_input[, log_pi := log(price_to_income_hh)]
fai_input[, log_p := log(valueh_real)]
fai_input[, log_i := log(hhinc_real)]
decomp <- fai_input[, .(
mean_log_pi = weighted.mean(log_pi, w),
mean_log_p = weighted.mean(log_p, w),
mean_log_i = weighted.mean(log_i, w)
), by = GENERATION]
decomp
d_log_pi <- decomp[GENERATION == "GenZ", mean_log_pi] -
decomp[GENERATION == "Boomer", mean_log_pi]
d_log_p <- decomp[GENERATION == "GenZ", mean_log_p] -
decomp[GENERATION == "Boomer", mean_log_p]
d_log_i <- decomp[GENERATION == "GenZ", mean_log_i] -
decomp[GENERATION == "Boomer", mean_log_i]
fai_dec <- data.table(
component = c("Δ log(P/I) total",
"Δ log(P) price growth",
"Δ log(I) income (sign flipped)"),
value = c(d_log_pi, d_log_p, -d_log_i),
share = c(1, d_log_p / d_log_pi, -d_log_i / d_log_pi)
)
fai_dec[, value := round(value, 4)]
fai_dec[, share := round(share, 3)]
fai_dec
}
Real home values rose substantially between the Boomer and GenZ observation windows, while real household incomes also grew but by less. The price-growth component therefore accounts for more than the total gap in log(P/I), with income growth partially — but not fully — offsetting it.
Year clustering. With only 8 survey years, the cluster-bootstrap CIs are based on a small number of clusters. Results should be interpreted accordingly.
Composition of cohorts in the sample. Boomers aged 25–35 are observed primarily in 1980 and 1990; GenZ aged 25–35 is observed in 2024 only. The decomposition compares 25–35 year-olds across calendar time, which is the design intent, but cohort and macroeconomic conditions at prime buying age are necessarily collinear.
Selection into the sample. GenZ at 25–35 has an unusually high share still living with parents and not appearing as their own household unit. If those non-householders are systematically lower-income, the decomposition understates the true FAI gap for GenZ.