Structural Drivers of Financial Nihilism — Oaxaca-Blinder Decomposition

Why did the Boomer–GenZ ownership gap (ages 25–35) widen?

Olivia Shipley

2026-04-26

At ages 25–35, homeownership fell substantially from the Boomer cohort to the GenZ cohort. The race × cohort extension (RaceCohort_2026-04-26.Rmd) showed that the gap widened most for non-Black households (because Black ownership was already chronically low under Boomers).

This analysis asks a different question: how much of the cohort gap is “endowments” — GenZ has worse FAI, lower real income, less marriage, etc. — and how much is “coefficients” — the same FAI translates to a worse ownership probability for GenZ than it did for Boomers?

The standard tool is the Oaxaca–Blinder decomposition. Because the outcome is binary (own/rent), I use the Linear Probability Model (LPM) version, which is the convention in labor economics and which gives marginal effects directly comparable across groups. A Fairlie (non-linear) check is in Section 4 as robustness.

Section 5 does a complementary decomposition: instead of decomposing the ownership gap, I decompose the FAI gap itself into a “price growth” component and an “income stagnation” component.

0. Setup

library(data.table)
library(dplyr)
library(ggplot2)
library(lmtest)
library(sandwich)
library(broom)
in_dir   <- "~/Desktop/DA6833/FinancialNihilism"
model_dt <- readRDS(file.path(in_dir, "model_dt.rds"))
setDT(model_dt)

stopifnot(levels(model_dt$GENERATION) ==
          c("Boomer", "GenX", "Millennial", "GenZ"))

cat("Rows loaded :", nrow(model_dt), "\n")
## Rows loaded : 3150831
cat("Cohort sizes:\n"); print(model_dt[, .N, by = GENERATION])
## Cohort sizes:
##    GENERATION       N
##        <fctr>   <int>
## 1:     Boomer  724067
## 2:       GenX  684986
## 3: Millennial 1602049
## 4:       GenZ  139729

1. The headline gap to be decomposed

gap_table <- model_dt[, .(
  own_rate = round(weighted.mean(own, w), 4),
  mean_FAI = round(weighted.mean(FAI_pi, w, na.rm = TRUE), 3),
  n        = .N
), by = GENERATION][order(GENERATION)]
gap_table
##    GENERATION own_rate mean_FAI       n
##        <fctr>    <num>    <num>   <int>
## 1:     Boomer   0.5702   -0.119  724067
## 2:       GenX   0.5588   -0.067  684986
## 3: Millennial   0.5210    0.123 1602049
## 4:       GenZ   0.4671    0.246  139729
boomer_rate <- gap_table[GENERATION == "Boomer",  own_rate]
genz_rate   <- gap_table[GENERATION == "GenZ",    own_rate]
cat("\nBoomer 25-35 ownership rate :", boomer_rate, "\n")
## 
## Boomer 25-35 ownership rate : 0.5702
cat("GenZ   25-35 ownership rate :", genz_rate,   "\n")
## GenZ   25-35 ownership rate : 0.4671
cat("Raw gap to decompose        :", round(boomer_rate - genz_rate, 4), "\n")
## Raw gap to decompose        : 0.1031

Of that raw gap, how much is “GenZ has different X” vs. “GenZ gets a different return on the same X”?

2. Oaxaca-Blinder via Linear Probability Model

I subset to the two extreme cohorts (Boomer, GenZ) so the decomposition is between two well-defined groups. GenX and Millennial are dropped here — Section 3 generalizes to all four cohorts.

The covariates decomposed over are the same ones used in m1_pi: FAI_pi, MARST, RACE_cat, HISPAN, EDUC_cat. Factor variables are expanded to dummies so means and coefficients are computed on the same expanded matrix.

ob_dt <- model_dt[GENERATION %in% c("Boomer", "GenZ") & !is.na(FAI_pi)]
ob_dt[, GENERATION := droplevels(GENERATION)]

X_full <- model.matrix(
  ~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat,
  data = ob_dt
)

intercept_col <- which(colnames(X_full) == "(Intercept)")

ob_dt$.row <- seq_len(nrow(ob_dt))
boom_idx <- ob_dt[GENERATION == "Boomer", .row]
genz_idx <- ob_dt[GENERATION == "GenZ",   .row]

X_boom <- X_full[boom_idx, , drop = FALSE]
X_genz <- X_full[genz_idx, , drop = FALSE]
y_boom <- ob_dt$own[boom_idx]
y_genz <- ob_dt$own[genz_idx]
w_boom <- ob_dt$w[boom_idx]
w_genz <- ob_dt$w[genz_idx]

cat("Boomer rows :", length(y_boom), "\n")
## Boomer rows : 724067
cat("GenZ   rows :", length(y_genz), "\n")
## GenZ   rows : 139729
# Weighted LPM, separately for each cohort
lpm_boom <- lm.wfit(X_boom, y_boom, w = w_boom)
lpm_genz <- lm.wfit(X_genz, y_genz, w = w_genz)

beta_boom <- coef(lpm_boom)
beta_genz <- coef(lpm_genz)

# Pooled LPM (Neumark 1988): use pooled β as the reference β*
pool_fit  <- lm.wfit(X_full, ob_dt$own, w = ob_dt$w)
beta_star <- coef(pool_fit)

# Weighted means of X within each group
xbar_boom <- colSums(X_boom * w_boom) / sum(w_boom)
xbar_genz <- colSums(X_genz * w_genz) / sum(w_genz)

# Sanity check: predicted means should equal observed weighted means
pred_boom <- sum(xbar_boom * beta_boom)
pred_genz <- sum(xbar_genz * beta_genz)
ybar_boom <- weighted.mean(y_boom, w_boom)
ybar_genz <- weighted.mean(y_genz, w_genz)

cat("Boomer mean own (data) :", round(ybar_boom, 4), "\n")
## Boomer mean own (data) : 0.5702
cat("Boomer mean own (LPM)  :", round(pred_boom, 4), "\n")
## Boomer mean own (LPM)  : 0.5702
cat("GenZ   mean own (data) :", round(ybar_genz, 4), "\n")
## GenZ   mean own (data) : 0.4671
cat("GenZ   mean own (LPM)  :", round(pred_genz, 4), "\n")
## GenZ   mean own (LPM)  : 0.4671

2a. Twofold decomposition (pooled-β reference)

Ȳ_Boom - Ȳ_GenZ = (X̄_Boom - X̄_GenZ)·β* + [X̄_Boom·(β_Boom - β*) + X̄_GenZ·(β* - β_GenZ)]

The first term is the explained (endowments) component: GenZ would own less than Boomers even if both groups got the same returns, simply because their characteristics are worse. The second term is the unexplained (coefficients) component: even with identical characteristics, GenZ’s mapping from X to ownership differs from Boomers’.

delta_X    <- xbar_boom - xbar_genz
explained  <- sum(delta_X * beta_star)
unexplain  <- sum(xbar_boom * (beta_boom - beta_star)) +
              sum(xbar_genz * (beta_star  - beta_genz))

raw_gap <- ybar_boom - ybar_genz

twofold <- data.table(
  component = c("Raw gap (Boomer - GenZ)",
                "Explained (endowments)",
                "Unexplained (coefficients)",
                "Sum (sanity check)"),
  value = c(raw_gap, explained, unexplain, explained + unexplain),
  share_of_gap = c(1,
                   explained / raw_gap,
                   unexplain / raw_gap,
                   (explained + unexplain) / raw_gap)
)
twofold[, value        := round(value, 4)]
twofold[, share_of_gap := round(share_of_gap, 3)]
twofold
##                     component   value share_of_gap
##                        <char>   <num>        <num>
## 1:    Raw gap (Boomer - GenZ)  0.1031        1.000
## 2:     Explained (endowments)  0.1201        1.165
## 3: Unexplained (coefficients) -0.0170       -0.165
## 4:         Sum (sanity check)  0.1031        1.000

2b. Per-variable contributions

For each covariate column k:

Dummy columns are aggregated back to their factor name for readability.

explained_k  <- delta_X * beta_star
unexplain_k  <- xbar_boom * (beta_boom - beta_star) +
                xbar_genz * (beta_star  - beta_genz)

col_names <- names(explained_k)
group <- fcase(
  col_names == "(Intercept)",          "(Intercept)",
  col_names == "FAI_pi",               "FAI_pi",
  startsWith(col_names, "MARST"),      "MARST",
  startsWith(col_names, "RACE_cat"),   "RACE_cat",
  startsWith(col_names, "HISPAN"),     "HISPAN",
  startsWith(col_names, "EDUC_cat"),   "EDUC_cat"
)

per_var <- data.table(
  variable    = group,
  explained   = explained_k,
  unexplained = unexplain_k
)[, .(explained   = sum(explained),
      unexplained = sum(unexplained)), by = variable]

per_var[, total := explained + unexplained]
per_var[, share_of_gap_explained   := round(explained   / raw_gap, 3)]
per_var[, share_of_gap_unexplained := round(unexplained / raw_gap, 3)]
per_var[, c("explained", "unexplained", "total") :=
          .(round(explained, 4), round(unexplained, 4), round(total, 4))]
per_var[order(-abs(total))]
##       variable explained unexplained   total share_of_gap_explained
##         <char>     <num>       <num>   <num>                  <num>
## 1: (Intercept)    0.0000      0.0801  0.0801                  0.000
## 2:       MARST    0.0623     -0.1128 -0.0505                  0.604
## 3:      FAI_pi    0.0384     -0.0020  0.0364                  0.372
## 4:    EDUC_cat   -0.0057      0.0407  0.0349                 -0.055
## 5:    RACE_cat    0.0065     -0.0024  0.0040                  0.063
## 6:      HISPAN    0.0187     -0.0205 -0.0018                  0.181
##    share_of_gap_unexplained
##                       <num>
## 1:                    0.777
## 2:                   -1.094
## 3:                   -0.020
## 4:                    0.394
## 5:                   -0.024
## 6:                   -0.199

2c. Cluster-robust SEs on the totals

Block bootstrap by year-cluster: resample years with replacement, refit on each replicate, recompute explained/unexplained. The pre-built X_full matrix is reused inside the loop to avoid redundant model.matrix() calls.

set.seed(2026)
years     <- sort(unique(ob_dt$YEAR))
B         <- 200

year_rows <- split(seq_len(nrow(ob_dt)), ob_dt$YEAR)
gen_lab   <- ob_dt$GENERATION
y_vec     <- ob_dt$own
w_vec     <- ob_dt$w

boot_out  <- vector("list", B)

for (b in seq_len(B)) {
  yr_sample <- sample(years, size = length(years), replace = TRUE)
  idx <- unlist(year_rows[as.character(yr_sample)], use.names = FALSE)

  Xb <- X_full[idx, , drop = FALSE]
  yb <- y_vec[idx]; wb <- w_vec[idx]; gb_lab <- gen_lab[idx]

  bidx <- which(gb_lab == "Boomer")
  gidx <- which(gb_lab == "GenZ")
  if (length(bidx) < 50 || length(gidx) < 50) { rm(Xb, yb, wb, gb_lab); next }

  fb <- tryCatch(lm.wfit(Xb[bidx, , drop = FALSE], yb[bidx], w = wb[bidx]),
                 error = function(e) NULL)
  fg <- tryCatch(lm.wfit(Xb[gidx, , drop = FALSE], yb[gidx], w = wb[gidx]),
                 error = function(e) NULL)
  fp <- tryCatch(lm.wfit(Xb, yb, w = wb), error = function(e) NULL)
  if (is.null(fb) || is.null(fg) || is.null(fp)) {
    rm(Xb, yb, wb, gb_lab); next
  }

  bbeta <- coef(fb); gbeta <- coef(fg); sbeta <- coef(fp)
  bbeta[is.na(bbeta)] <- 0; gbeta[is.na(gbeta)] <- 0; sbeta[is.na(sbeta)] <- 0

  xb_b <- colSums(Xb[bidx, , drop = FALSE] * wb[bidx]) / sum(wb[bidx])
  xg_b <- colSums(Xb[gidx, , drop = FALSE] * wb[gidx]) / sum(wb[gidx])

  boot_out[[b]] <- c(
    explained   = sum((xb_b - xg_b) * sbeta),
    unexplained = sum(xb_b * (bbeta - sbeta)) + sum(xg_b * (sbeta - gbeta))
  )

  rm(Xb, yb, wb, gb_lab, bidx, gidx, fb, fg, fp,
     bbeta, gbeta, sbeta, xb_b, xg_b)
  if (b %% 20 == 0) gc(verbose = FALSE)
}

boot_dt <- rbindlist(lapply(boot_out, function(x)
  if (is.null(x)) NULL else as.data.table(as.list(x))))

boot_summary <- data.table(
  n_reps       = nrow(boot_dt),
  se_explained = sd(boot_dt$explained),
  ci_lo_exp    = quantile(boot_dt$explained,   0.025),
  ci_hi_exp    = quantile(boot_dt$explained,   0.975),
  se_unexplain = sd(boot_dt$unexplained),
  ci_lo_unx    = quantile(boot_dt$unexplained, 0.025),
  ci_hi_unx    = quantile(boot_dt$unexplained, 0.975)
)
print(round(boot_summary, 4))
##    n_reps se_explained ci_lo_exp ci_hi_exp se_unexplain ci_lo_unx ci_hi_unx
##     <num>        <num>     <num>     <num>        <num>     <num>     <num>
## 1:    175       0.0152    0.0969    0.1532       0.0168   -0.0497    0.0078

3. Pairwise decomposition across all cohort transitions

Boomer→GenX, GenX→Millennial, Millennial→GenZ. This shows whether the explained/unexplained mix is stable across the chain or whether one transition is structurally distinct from the others.

do_ob <- function(g_old, g_new) {
  d <- model_dt[GENERATION %in% c(g_old, g_new) & !is.na(FAI_pi)]
  d[, GENERATION := droplevels(GENERATION)]
  Xf <- model.matrix(
    ~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat, data = d
  )
  i_o <- which(d$GENERATION == g_old)
  i_n <- which(d$GENERATION == g_new)
  fb  <- lm.wfit(Xf[i_o, , drop = FALSE], d$own[i_o], w = d$w[i_o])
  fg  <- lm.wfit(Xf[i_n, , drop = FALSE], d$own[i_n], w = d$w[i_n])
  fp  <- lm.wfit(Xf,                     d$own,       w = d$w)
  bb <- coef(fb); gb <- coef(fg); sb <- coef(fp)
  bb[is.na(bb)] <- 0; gb[is.na(gb)] <- 0; sb[is.na(sb)] <- 0
  xo <- colSums(Xf[i_o, , drop = FALSE] * d$w[i_o]) / sum(d$w[i_o])
  xn <- colSums(Xf[i_n, , drop = FALSE] * d$w[i_n]) / sum(d$w[i_n])
  raw <- weighted.mean(d$own[i_o], d$w[i_o]) -
         weighted.mean(d$own[i_n], d$w[i_n])
  exp_t <- sum((xo - xn) * sb)
  unx_t <- sum(xo * (bb - sb)) + sum(xn * (sb - gb))
  data.table(transition = paste(g_old, "->", g_new),
             raw_gap = raw, explained = exp_t, unexplained = unx_t,
             share_explained = exp_t / raw, share_unexplained = unx_t / raw)
}

pair_table <- rbind(
  do_ob("Boomer",     "GenX"),
  do_ob("GenX",       "Millennial"),
  do_ob("Millennial", "GenZ"),
  do_ob("Boomer",     "GenZ")
)
pair_table[, c("raw_gap", "explained", "unexplained") :=
             .(round(raw_gap, 4), round(explained, 4), round(unexplained, 4))]
pair_table[, c("share_explained", "share_unexplained") :=
             .(round(share_explained, 3), round(share_unexplained, 3))]
pair_table
##            transition raw_gap explained unexplained share_explained
##                <char>   <num>     <num>       <num>           <num>
## 1:     Boomer -> GenX  0.0114    0.0400     -0.0286           3.517
## 2: GenX -> Millennial  0.0378    0.0409     -0.0031           1.082
## 3: Millennial -> GenZ  0.0539    0.0375      0.0164           0.696
## 4:     Boomer -> GenZ  0.1031    0.1201     -0.0170           1.165
##    share_unexplained
##                <num>
## 1:            -2.517
## 2:            -0.082
## 3:             0.304
## 4:            -0.165

4. Robustness: Fairlie (binary outcome) decomposition

Linear probability is fine for marginal effects in the interior, but it can predict outside [0, 1] and it doesn’t respect the binary nature of the outcome. The Fairlie (1999, 2005) extension simulates counterfactual probabilities from a logit. The headline Boomer-vs-GenZ gap is decomposed here and compared against the LPM result in Section 2a.

ob_dt2 <- model_dt[GENERATION %in% c("Boomer", "GenZ") & !is.na(FAI_pi)]
ob_dt2[, GENERATION := droplevels(GENERATION)]

# Fit logit on Boomers (the "advantaged" group) — Fairlie's convention
logit_boom <- glm(
  own ~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat,
  data = ob_dt2[GENERATION == "Boomer"],
  weights = w, family = binomial(link = "logit")
)

set.seed(2026)
genz_d  <- ob_dt2[GENERATION == "GenZ"]
boom_d  <- ob_dt2[GENERATION == "Boomer"]

# GenZ predicted at Boomer coefficients (used for reference)
p_genz_boomerbeta    <- predict(logit_boom, newdata = genz_d, type = "response")
mean_genz_boomerbeta <- weighted.mean(p_genz_boomerbeta, genz_d$w)

# Fairlie X-swap: for each GenZ obs, draw a random Boomer's characteristics
# and predict under Boomer coefficients. Average over 20 draws for stability.
needed_cols <- c("FAI_pi", "MARST", "RACE_cat", "HISPAN", "EDUC_cat")
boom_X      <- boom_d[, ..needed_cols]
n_draws     <- 20
n_genz      <- nrow(genz_d)
p_swap_X    <- numeric(n_genz)
for (d in seq_len(n_draws)) {
  idx <- sample.int(nrow(boom_X), size = n_genz, replace = TRUE)
  p_swap_X <- p_swap_X +
    predict(logit_boom, newdata = boom_X[idx], type = "response")
  if (d %% 5 == 0) gc(verbose = FALSE)
}
p_swap_X <- p_swap_X / n_draws
mean_genz_boomerX <- weighted.mean(p_swap_X, genz_d$w)
rm(boom_X)

mean_boom <- weighted.mean(boom_d$own, boom_d$w)
mean_genz <- weighted.mean(genz_d$own, genz_d$w)

# Fairlie partition:
#   Explained   = how much GenZ ownership rises when given Boomer X
#   Unexplained = remaining gap after X equalization
fairlie_explained   <- mean_genz_boomerX  - mean_genz
fairlie_unexplained <- mean_boom          - mean_genz_boomerX
fairlie_total       <- mean_boom - mean_genz

cat("Fairlie decomposition (binary, Boomer beta as reference):\n")
## Fairlie decomposition (binary, Boomer beta as reference):
cat("  Total gap (data)        :", round(fairlie_total, 4), "\n")
##   Total gap (data)        : 0.1031
cat("  Explained (X swap)      :", round(fairlie_explained, 4),
    "  share:", round(fairlie_explained / fairlie_total, 3), "\n")
##   Explained (X swap)      : 0.1074   share: 1.041
cat("  Unexplained             :", round(fairlie_unexplained, 4),
    "  share:", round(fairlie_unexplained / fairlie_total, 3), "\n")
##   Unexplained             : -0.0043   share: -0.041
cat("  Sum                     :",
    round(fairlie_explained + fairlie_unexplained, 4), "\n")
##   Sum                     : 0.1031

5. Decomposing the FAI gap itself

The Oaxaca decomposition above shows that endowments account for the ownership gap, with FAI as a primary contributor. This section opens up FAI itself: since FAI_pi is a scaled version of price_to_income_hh = valueh_real / hhinc_real, a worse FAI reflects either higher real home values, lower real household income, or both.

Using the additive property of logarithms, the gap in log(P/I) between Boomer and GenZ households decomposes exactly into a price-growth component and an income component: Δ log(P/I) = Δ log(P) − Δ log(I)

fai_cols <- intersect(c("price_to_income_hh", "valueh_real",
                        "hhinc_real", "VALUEH", "HHINCOME"),
                      names(model_dt))
print(fai_cols)
## [1] "price_to_income_hh" "valueh_real"        "hhinc_real"        
## [4] "VALUEH"             "HHINCOME"
req_cols <- c("price_to_income_hh", "valueh_real", "hhinc_real")
if (all(req_cols %in% names(model_dt))) {
  fai_input <- model_dt[GENERATION %in% c("Boomer", "GenZ") &
                        !is.na(price_to_income_hh) &
                        !is.na(valueh_real) & !is.na(hhinc_real)]
  fai_input[, GENERATION := droplevels(GENERATION)]

  fai_input[, log_pi := log(price_to_income_hh)]
  fai_input[, log_p  := log(valueh_real)]
  fai_input[, log_i  := log(hhinc_real)]

  decomp <- fai_input[, .(
    mean_log_pi = weighted.mean(log_pi, w),
    mean_log_p  = weighted.mean(log_p,  w),
    mean_log_i  = weighted.mean(log_i,  w)
  ), by = GENERATION]
  decomp

  d_log_pi <- decomp[GENERATION == "GenZ",   mean_log_pi] -
              decomp[GENERATION == "Boomer", mean_log_pi]
  d_log_p  <- decomp[GENERATION == "GenZ",   mean_log_p]  -
              decomp[GENERATION == "Boomer", mean_log_p]
  d_log_i  <- decomp[GENERATION == "GenZ",   mean_log_i]  -
              decomp[GENERATION == "Boomer", mean_log_i]

  fai_dec <- data.table(
    component = c("Δ log(P/I)  total",
                  "Δ log(P)    price growth",
                  "Δ log(I)    income (sign flipped)"),
    value     = c(d_log_pi, d_log_p, -d_log_i),
    share     = c(1, d_log_p / d_log_pi, -d_log_i / d_log_pi)
  )
  fai_dec[, value := round(value, 4)]
  fai_dec[, share := round(share, 3)]
  fai_dec
}

Real home values rose substantially between the Boomer and GenZ observation windows, while real household incomes also grew but by less. The price-growth component therefore accounts for more than the total gap in log(P/I), with income growth partially — but not fully — offsetting it.

6. Caveats and limitations

  1. Year clustering. With only 8 survey years, the cluster-bootstrap CIs are based on a small number of clusters. Results should be interpreted accordingly.

  2. Composition of cohorts in the sample. Boomers aged 25–35 are observed primarily in 1980 and 1990; GenZ aged 25–35 is observed in 2024 only. The decomposition compares 25–35 year-olds across calendar time, which is the design intent, but cohort and macroeconomic conditions at prime buying age are necessarily collinear.

  3. Selection into the sample. GenZ at 25–35 has an unusually high share still living with parents and not appearing as their own household unit. If those non-householders are systematically lower-income, the decomposition understates the true FAI gap for GenZ.