At ages 25–35, homeownership fell substantially from the Boomer cohort to the GenZ cohort. The race × cohort extension (RaceCohort_2026-04-26.Rmd) showed that the gap widened most for non-Black households (because Black ownership was already chronically low under Boomers).

This analysis asks a different question: how much of the cohort gap is “endowments” — GenZ has worse FAI, lower real income, less marriage, etc. — and how much is “coefficients” — the same FAI translates to a worse ownership probability for GenZ than it did for Boomers?

The standard tool is the Oaxaca–Blinder decomposition. Because the outcome is binary (own/rent), I use the Linear Probability Model (LPM) version, which is the convention in labor economics and which gives marginal effects directly comparable across groups. A Fairlie (non-linear) check is in Section 4 as robustness.

Section 5 does a complementary decomposition: instead of decomposing the ownership gap, I decompose the FAI gap itself into a “price growth” component and an “income stagnation” component.

0. Setup

library(data.table)
library(dplyr)
library(ggplot2)
library(lmtest)
library(sandwich)
library(broom)

in_dir   <- "~/Desktop/DA6833/FinancialNihilism"
model_dt <- readRDS(file.path(in_dir, "model_dt.rds"))
setDT(model_dt)

stopifnot(levels(model_dt$GENERATION) ==
          c("Boomer", "GenX", "Millennial", "GenZ"))

cat("Rows loaded :", nrow(model_dt), "\n")

## Rows loaded : 3150831

cat("Cohort sizes:\n"); print(model_dt[, .N, by = GENERATION])

## Cohort sizes:

##    GENERATION       N
##        <fctr>   <int>
## 1:     Boomer  724067
## 2:       GenX  684986
## 3: Millennial 1602049
## 4:       GenZ  139729

1. The headline gap to be decomposed

gap_table <- model_dt[, .(
  own_rate = round(weighted.mean(own, w), 4),
  mean_FAI = round(weighted.mean(FAI_pi, w, na.rm = TRUE), 3),
  n        = .N
), by = GENERATION][order(GENERATION)]
gap_table

##    GENERATION own_rate mean_FAI       n
##        <fctr>    <num>    <num>   <int>
## 1:     Boomer   0.5702   -0.119  724067
## 2:       GenX   0.5588   -0.067  684986
## 3: Millennial   0.5210    0.123 1602049
## 4:       GenZ   0.4671    0.246  139729

boomer_rate <- gap_table[GENERATION == "Boomer",  own_rate]
genz_rate   <- gap_table[GENERATION == "GenZ",    own_rate]
cat("\nBoomer 25-35 ownership rate :", boomer_rate, "\n")

## 
## Boomer 25-35 ownership rate : 0.5702

cat("GenZ   25-35 ownership rate :", genz_rate,   "\n")

## GenZ   25-35 ownership rate : 0.4671

cat("Raw gap to decompose        :", round(boomer_rate - genz_rate, 4), "\n")

## Raw gap to decompose        : 0.1031

Of that raw gap, how much is “GenZ has different X” vs. “GenZ gets a different return on the same X”?

2. Oaxaca-Blinder via Linear Probability Model

I subset to the two extreme cohorts (Boomer, GenZ) so the decomposition is between two well-defined groups. GenX and Millennial are dropped here — Section 3 generalizes to all four cohorts.

The covariates decomposed over are the same ones used in m1_pi: FAI_pi, MARST, RACE_cat, HISPAN, EDUC_cat. Factor variables are expanded to dummies so means and coefficients are computed on the same expanded matrix.

ob_dt <- model_dt[GENERATION %in% c("Boomer", "GenZ") & !is.na(FAI_pi)]
ob_dt[, GENERATION := droplevels(GENERATION)]

X_full <- model.matrix(
  ~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat,
  data = ob_dt
)

intercept_col <- which(colnames(X_full) == "(Intercept)")

ob_dt$.row <- seq_len(nrow(ob_dt))
boom_idx <- ob_dt[GENERATION == "Boomer", .row]
genz_idx <- ob_dt[GENERATION == "GenZ",   .row]

X_boom <- X_full[boom_idx, , drop = FALSE]
X_genz <- X_full[genz_idx, , drop = FALSE]
y_boom <- ob_dt$own[boom_idx]
y_genz <- ob_dt$own[genz_idx]
w_boom <- ob_dt$w[boom_idx]
w_genz <- ob_dt$w[genz_idx]

cat("Boomer rows :", length(y_boom), "\n")

## Boomer rows : 724067

cat("GenZ   rows :", length(y_genz), "\n")

## GenZ   rows : 139729

# Weighted LPM, separately for each cohort
lpm_boom <- lm.wfit(X_boom, y_boom, w = w_boom)
lpm_genz <- lm.wfit(X_genz, y_genz, w = w_genz)

beta_boom <- coef(lpm_boom)
beta_genz <- coef(lpm_genz)

# Pooled LPM (Neumark 1988): use pooled β as the reference β*
pool_fit  <- lm.wfit(X_full, ob_dt$own, w = ob_dt$w)
beta_star <- coef(pool_fit)

# Weighted means of X within each group
xbar_boom <- colSums(X_boom * w_boom) / sum(w_boom)
xbar_genz <- colSums(X_genz * w_genz) / sum(w_genz)

# Sanity check: predicted means should equal observed weighted means
pred_boom <- sum(xbar_boom * beta_boom)
pred_genz <- sum(xbar_genz * beta_genz)
ybar_boom <- weighted.mean(y_boom, w_boom)
ybar_genz <- weighted.mean(y_genz, w_genz)

cat("Boomer mean own (data) :", round(ybar_boom, 4), "\n")

## Boomer mean own (data) : 0.5702

cat("Boomer mean own (LPM)  :", round(pred_boom, 4), "\n")

## Boomer mean own (LPM)  : 0.5702

cat("GenZ   mean own (data) :", round(ybar_genz, 4), "\n")

## GenZ   mean own (data) : 0.4671

cat("GenZ   mean own (LPM)  :", round(pred_genz, 4), "\n")

## GenZ   mean own (LPM)  : 0.4671

2a. Twofold decomposition (pooled-β reference)

Ȳ_Boom - Ȳ_GenZ = (X̄_Boom - X̄_GenZ)·β* + [X̄_Boom·(β_Boom - β*) + X̄_GenZ·(β* - β_GenZ)]

The first term is the explained (endowments) component: GenZ would own less than Boomers even if both groups got the same returns, simply because their characteristics are worse. The second term is the unexplained (coefficients) component: even with identical characteristics, GenZ’s mapping from X to ownership differs from Boomers’.

delta_X    <- xbar_boom - xbar_genz
explained  <- sum(delta_X * beta_star)
unexplain  <- sum(xbar_boom * (beta_boom - beta_star)) +
              sum(xbar_genz * (beta_star  - beta_genz))

raw_gap <- ybar_boom - ybar_genz

twofold <- data.table(
  component = c("Raw gap (Boomer - GenZ)",
                "Explained (endowments)",
                "Unexplained (coefficients)",
                "Sum (sanity check)"),
  value = c(raw_gap, explained, unexplain, explained + unexplain),
  share_of_gap = c(1,
                   explained / raw_gap,
                   unexplain / raw_gap,
                   (explained + unexplain) / raw_gap)
)
twofold[, value        := round(value, 4)]
twofold[, share_of_gap := round(share_of_gap, 3)]
twofold

##                     component   value share_of_gap
##                        <char>   <num>        <num>
## 1:    Raw gap (Boomer - GenZ)  0.1031        1.000
## 2:     Explained (endowments)  0.1201        1.165
## 3: Unexplained (coefficients) -0.0170       -0.165
## 4:         Sum (sanity check)  0.1031        1.000

2b. Per-variable contributions

For each covariate column k:

explained_k = (X̄_Boom,k - X̄_GenZ,k) · β*_k
unexplained_k = X̄_Boom,k · (β_Boom,k - β*_k) + X̄_GenZ,k · (β*_k - β_GenZ,k)

Dummy columns are aggregated back to their factor name for readability.

explained_k  <- delta_X * beta_star
unexplain_k  <- xbar_boom * (beta_boom - beta_star) +
                xbar_genz * (beta_star  - beta_genz)

col_names <- names(explained_k)
group <- fcase(
  col_names == "(Intercept)",          "(Intercept)",
  col_names == "FAI_pi",               "FAI_pi",
  startsWith(col_names, "MARST"),      "MARST",
  startsWith(col_names, "RACE_cat"),   "RACE_cat",
  startsWith(col_names, "HISPAN"),     "HISPAN",
  startsWith(col_names, "EDUC_cat"),   "EDUC_cat"
)

per_var <- data.table(
  variable    = group,
  explained   = explained_k,
  unexplained = unexplain_k
)[, .(explained   = sum(explained),
      unexplained = sum(unexplained)), by = variable]

per_var[, total := explained + unexplained]
per_var[, share_of_gap_explained   := round(explained   / raw_gap, 3)]
per_var[, share_of_gap_unexplained := round(unexplained / raw_gap, 3)]
per_var[, c("explained", "unexplained", "total") :=
          .(round(explained, 4), round(unexplained, 4), round(total, 4))]
per_var[order(-abs(total))]

##       variable explained unexplained   total share_of_gap_explained
##         <char>     <num>       <num>   <num>                  <num>
## 1: (Intercept)    0.0000      0.0801  0.0801                  0.000
## 2:       MARST    0.0623     -0.1128 -0.0505                  0.604
## 3:      FAI_pi    0.0384     -0.0020  0.0364                  0.372
## 4:    EDUC_cat   -0.0057      0.0407  0.0349                 -0.055
## 5:    RACE_cat    0.0065     -0.0024  0.0040                  0.063
## 6:      HISPAN    0.0187     -0.0205 -0.0018                  0.181
##    share_of_gap_unexplained
##                       <num>
## 1:                    0.777
## 2:                   -1.094
## 3:                   -0.020
## 4:                    0.394
## 5:                   -0.024
## 6:                   -0.199

2c. Cluster-robust SEs on the totals

Block bootstrap by year-cluster: resample years with replacement, refit on each replicate, recompute explained/unexplained. The pre-built X_full matrix is reused inside the loop to avoid redundant model.matrix() calls.

set.seed(2026)
years     <- sort(unique(ob_dt$YEAR))
B         <- 200

year_rows <- split(seq_len(nrow(ob_dt)), ob_dt$YEAR)
gen_lab   <- ob_dt$GENERATION
y_vec     <- ob_dt$own
w_vec     <- ob_dt$w

boot_out  <- vector("list", B)

for (b in seq_len(B)) {
  yr_sample <- sample(years, size = length(years), replace = TRUE)
  idx <- unlist(year_rows[as.character(yr_sample)], use.names = FALSE)

  Xb <- X_full[idx, , drop = FALSE]
  yb <- y_vec[idx]; wb <- w_vec[idx]; gb_lab <- gen_lab[idx]

  bidx <- which(gb_lab == "Boomer")
  gidx <- which(gb_lab == "GenZ")
  if (length(bidx) < 50 || length(gidx) < 50) { rm(Xb, yb, wb, gb_lab); next }

  fb <- tryCatch(lm.wfit(Xb[bidx, , drop = FALSE], yb[bidx], w = wb[bidx]),
                 error = function(e) NULL)
  fg <- tryCatch(lm.wfit(Xb[gidx, , drop = FALSE], yb[gidx], w = wb[gidx]),
                 error = function(e) NULL)
  fp <- tryCatch(lm.wfit(Xb, yb, w = wb), error = function(e) NULL)
  if (is.null(fb) || is.null(fg) || is.null(fp)) {
    rm(Xb, yb, wb, gb_lab); next
  }

  bbeta <- coef(fb); gbeta <- coef(fg); sbeta <- coef(fp)
  bbeta[is.na(bbeta)] <- 0; gbeta[is.na(gbeta)] <- 0; sbeta[is.na(sbeta)] <- 0

  xb_b <- colSums(Xb[bidx, , drop = FALSE] * wb[bidx]) / sum(wb[bidx])
  xg_b <- colSums(Xb[gidx, , drop = FALSE] * wb[gidx]) / sum(wb[gidx])

  boot_out[[b]] <- c(
    explained   = sum((xb_b - xg_b) * sbeta),
    unexplained = sum(xb_b * (bbeta - sbeta)) + sum(xg_b * (sbeta - gbeta))
  )

  rm(Xb, yb, wb, gb_lab, bidx, gidx, fb, fg, fp,
     bbeta, gbeta, sbeta, xb_b, xg_b)
  if (b %% 20 == 0) gc(verbose = FALSE)
}

boot_dt <- rbindlist(lapply(boot_out, function(x)
  if (is.null(x)) NULL else as.data.table(as.list(x))))

boot_summary <- data.table(
  n_reps       = nrow(boot_dt),
  se_explained = sd(boot_dt$explained),
  ci_lo_exp    = quantile(boot_dt$explained,   0.025),
  ci_hi_exp    = quantile(boot_dt$explained,   0.975),
  se_unexplain = sd(boot_dt$unexplained),
  ci_lo_unx    = quantile(boot_dt$unexplained, 0.025),
  ci_hi_unx    = quantile(boot_dt$unexplained, 0.975)
)
print(round(boot_summary, 4))

##    n_reps se_explained ci_lo_exp ci_hi_exp se_unexplain ci_lo_unx ci_hi_unx
##     <num>        <num>     <num>     <num>        <num>     <num>     <num>
## 1:    175       0.0152    0.0969    0.1532       0.0168   -0.0497    0.0078

3. Pairwise decomposition across all cohort transitions

Boomer→GenX, GenX→Millennial, Millennial→GenZ. This shows whether the explained/unexplained mix is stable across the chain or whether one transition is structurally distinct from the others.

do_ob <- function(g_old, g_new) {
  d <- model_dt[GENERATION %in% c(g_old, g_new) & !is.na(FAI_pi)]
  d[, GENERATION := droplevels(GENERATION)]
  Xf <- model.matrix(
    ~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat, data = d
  )
  i_o <- which(d$GENERATION == g_old)
  i_n <- which(d$GENERATION == g_new)
  fb  <- lm.wfit(Xf[i_o, , drop = FALSE], d$own[i_o], w = d$w[i_o])
  fg  <- lm.wfit(Xf[i_n, , drop = FALSE], d$own[i_n], w = d$w[i_n])
  fp  <- lm.wfit(Xf,                     d$own,       w = d$w)
  bb <- coef(fb); gb <- coef(fg); sb <- coef(fp)
  bb[is.na(bb)] <- 0; gb[is.na(gb)] <- 0; sb[is.na(sb)] <- 0
  xo <- colSums(Xf[i_o, , drop = FALSE] * d$w[i_o]) / sum(d$w[i_o])
  xn <- colSums(Xf[i_n, , drop = FALSE] * d$w[i_n]) / sum(d$w[i_n])
  raw <- weighted.mean(d$own[i_o], d$w[i_o]) -
         weighted.mean(d$own[i_n], d$w[i_n])
  exp_t <- sum((xo - xn) * sb)
  unx_t <- sum(xo * (bb - sb)) + sum(xn * (sb - gb))
  data.table(transition = paste(g_old, "->", g_new),
             raw_gap = raw, explained = exp_t, unexplained = unx_t,
             share_explained = exp_t / raw, share_unexplained = unx_t / raw)
}

pair_table <- rbind(
  do_ob("Boomer",     "GenX"),
  do_ob("GenX",       "Millennial"),
  do_ob("Millennial", "GenZ"),
  do_ob("Boomer",     "GenZ")
)
pair_table[, c("raw_gap", "explained", "unexplained") :=
             .(round(raw_gap, 4), round(explained, 4), round(unexplained, 4))]
pair_table[, c("share_explained", "share_unexplained") :=
             .(round(share_explained, 3), round(share_unexplained, 3))]
pair_table

##            transition raw_gap explained unexplained share_explained
##                <char>   <num>     <num>       <num>           <num>
## 1:     Boomer -> GenX  0.0114    0.0400     -0.0286           3.517
## 2: GenX -> Millennial  0.0378    0.0409     -0.0031           1.082
## 3: Millennial -> GenZ  0.0539    0.0375      0.0164           0.696
## 4:     Boomer -> GenZ  0.1031    0.1201     -0.0170           1.165
##    share_unexplained
##                <num>
## 1:            -2.517
## 2:            -0.082
## 3:             0.304
## 4:            -0.165

4. Robustness: Fairlie (binary outcome) decomposition

Linear probability is fine for marginal effects in the interior, but it can predict outside [0, 1] and it doesn’t respect the binary nature of the outcome. The Fairlie (1999, 2005) extension simulates counterfactual probabilities from a logit. The headline Boomer-vs-GenZ gap is decomposed here and compared against the LPM result in Section 2a.

ob_dt2 <- model_dt[GENERATION %in% c("Boomer", "GenZ") & !is.na(FAI_pi)]
ob_dt2[, GENERATION := droplevels(GENERATION)]

# Fit logit on Boomers (the "advantaged" group) — Fairlie's convention
logit_boom <- glm(
  own ~ FAI_pi + MARST + RACE_cat + HISPAN + EDUC_cat,
  data = ob_dt2[GENERATION == "Boomer"],
  weights = w, family = binomial(link = "logit")
)

set.seed(2026)
genz_d  <- ob_dt2[GENERATION == "GenZ"]
boom_d  <- ob_dt2[GENERATION == "Boomer"]

# GenZ predicted at Boomer coefficients (used for reference)
p_genz_boomerbeta    <- predict(logit_boom, newdata = genz_d, type = "response")
mean_genz_boomerbeta <- weighted.mean(p_genz_boomerbeta, genz_d$w)

# Fairlie X-swap: for each GenZ obs, draw a random Boomer's characteristics
# and predict under Boomer coefficients. Average over 20 draws for stability.
needed_cols <- c("FAI_pi", "MARST", "RACE_cat", "HISPAN", "EDUC_cat")
boom_X      <- boom_d[, ..needed_cols]
n_draws     <- 20
n_genz      <- nrow(genz_d)
p_swap_X    <- numeric(n_genz)
for (d in seq_len(n_draws)) {
  idx <- sample.int(nrow(boom_X), size = n_genz, replace = TRUE)
  p_swap_X <- p_swap_X +
    predict(logit_boom, newdata = boom_X[idx], type = "response")
  if (d %% 5 == 0) gc(verbose = FALSE)
}
p_swap_X <- p_swap_X / n_draws
mean_genz_boomerX <- weighted.mean(p_swap_X, genz_d$w)
rm(boom_X)

mean_boom <- weighted.mean(boom_d$own, boom_d$w)
mean_genz <- weighted.mean(genz_d$own, genz_d$w)

# Fairlie partition:
#   Explained   = how much GenZ ownership rises when given Boomer X
#   Unexplained = remaining gap after X equalization
fairlie_explained   <- mean_genz_boomerX  - mean_genz
fairlie_unexplained <- mean_boom          - mean_genz_boomerX
fairlie_total       <- mean_boom - mean_genz

cat("Fairlie decomposition (binary, Boomer beta as reference):\n")

## Fairlie decomposition (binary, Boomer beta as reference):

cat("  Total gap (data)        :", round(fairlie_total, 4), "\n")

##   Total gap (data)        : 0.1031

cat("  Explained (X swap)      :", round(fairlie_explained, 4),
    "  share:", round(fairlie_explained / fairlie_total, 3), "\n")

##   Explained (X swap)      : 0.1074   share: 1.041

cat("  Unexplained             :", round(fairlie_unexplained, 4),
    "  share:", round(fairlie_unexplained / fairlie_total, 3), "\n")

##   Unexplained             : -0.0043   share: -0.041

cat("  Sum                     :",
    round(fairlie_explained + fairlie_unexplained, 4), "\n")

##   Sum                     : 0.1031

5. Decomposing the FAI gap itself

The Oaxaca decomposition above shows that endowments account for the ownership gap, with FAI as a primary contributor. This section opens up FAI itself: since FAI_pi is a scaled version of price_to_income_hh = valueh_real / hhinc_real, a worse FAI reflects either higher real home values, lower real household income, or both.

Using the additive property of logarithms, the gap in log(P/I) between Boomer and GenZ households decomposes exactly into a price-growth component and an income component: Δ log(P/I) = Δ log(P) − Δ log(I)

fai_cols <- intersect(c("price_to_income_hh", "valueh_real",
                        "hhinc_real", "VALUEH", "HHINCOME"),
                      names(model_dt))
print(fai_cols)

## [1] "price_to_income_hh" "valueh_real"        "hhinc_real"        
## [4] "VALUEH"             "HHINCOME"

req_cols <- c("price_to_income_hh", "valueh_real", "hhinc_real")
if (all(req_cols %in% names(model_dt))) {
  fai_input <- model_dt[GENERATION %in% c("Boomer", "GenZ") &
                        !is.na(price_to_income_hh) &
                        !is.na(valueh_real) & !is.na(hhinc_real)]
  fai_input[, GENERATION := droplevels(GENERATION)]

  fai_input[, log_pi := log(price_to_income_hh)]
  fai_input[, log_p  := log(valueh_real)]
  fai_input[, log_i  := log(hhinc_real)]

  decomp <- fai_input[, .(
    mean_log_pi = weighted.mean(log_pi, w),
    mean_log_p  = weighted.mean(log_p,  w),
    mean_log_i  = weighted.mean(log_i,  w)
  ), by = GENERATION]
  decomp

  d_log_pi <- decomp[GENERATION == "GenZ",   mean_log_pi] -
              decomp[GENERATION == "Boomer", mean_log_pi]
  d_log_p  <- decomp[GENERATION == "GenZ",   mean_log_p]  -
              decomp[GENERATION == "Boomer", mean_log_p]
  d_log_i  <- decomp[GENERATION == "GenZ",   mean_log_i]  -
              decomp[GENERATION == "Boomer", mean_log_i]

  fai_dec <- data.table(
    component = c("Δ log(P/I)  total",
                  "Δ log(P)    price growth",
                  "Δ log(I)    income (sign flipped)"),
    value     = c(d_log_pi, d_log_p, -d_log_i),
    share     = c(1, d_log_p / d_log_pi, -d_log_i / d_log_pi)
  )
  fai_dec[, value := round(value, 4)]
  fai_dec[, share := round(share, 3)]
  fai_dec
}

Real home values rose substantially between the Boomer and GenZ observation windows, while real household incomes also grew but by less. The price-growth component therefore accounts for more than the total gap in log(P/I), with income growth partially — but not fully — offsetting it.

Structural Drivers of Financial Nihilism — Oaxaca-Blinder Decomposition

Why did the Boomer–GenZ ownership gap (ages 25–35) widen?

Olivia Shipley

2026-04-26