Time is discrete, \(t=0,1,\dots,T\). Each individual \(i\) has gender \(g\in\{m,f\}\) and education \(e\). There are 3 employment sectors, \[ \mathcal S \equiv \{\text{Subsistence }(SUB),\ \text{Private }(PRI),\ \text{Public }(PUB)\}, \] and a home alternative \(H\) (non-employment).

The state at time \(t\) for individual \(i\) is: \[ \Xi_{it}=\Big(g,e,\ r_{it},\ \mathbf{\varepsilon}_i,\ F_{it},\ M_{it}\Big), \] where:

Assume \(F_{it}\) evolves exogenously via a Markov process \(Q(F_{t+1}\mid F_t)\) that is excluded from wages and excluded from offer arrival (it only affects utility).

Wages (Fixed Conditional on Permanent Skills)

If \(i\) works in sector \(s\in\mathcal S\) at \(t\), the (log) wage is: \[ \boxed{ \ln W_{it}^g(s) = \beta_{0,g}(s)\;+\;\beta_1\,\text{educ}_i\;\;+\;\varepsilon_{is}. } \] - Excluding work experience for now.

Distribution of permanent skills. For tractability and mover-based identification, assume a multivariate Normal by gender: \[ \mathbf{\varepsilon}_i \mid g \sim \mathcal N\big(\mathbf{0},\ \Sigma_g\big),\qquad \Sigma_g= \begin{pmatrix} \sigma^2_{1,g} & \rho_{12,g} \,\, \sigma_{1,g}\,\, \sigma_{2,g} & \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} \\ \rho_{12,g} \,\, \sigma_{1,g} \,\, \sigma_{2,g} & \sigma^2_{2,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g}\\ \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g} & \sigma^2_{3,g} \end{pmatrix}. \]

(Alternative: independent Fréchet components; results below do not require closed-form shares.)

For \(\Sigma_g\) to be a positive semidefinite (ie a valid covariance matrix), the corr matrix must also satisfy \[ 1 + 2 \, \rho_{12,g}\, \rho_{13,g}\, \rho_{23,g} - \, \rho_{12,g}^2 - \, \rho_{13,g}^2 - \, \rho_{23,g}^2 \] So for various pars values, make sure you check this condition.

Offers: One Sector per Period, Home Always Available

Each period, at most one employment offer arrives. Let \(O_{it}\in\{S,G,P,\varnothing\}\) denote the sector offered at \(t\) (or \(\varnothing\) if no offer). Define arrival probabilities conditional on observables \(X_{it}\) (age, past sector, etc.):

\[ \Pr\big(O_{it}=s\mid X_{it},g\big)=\lambda^g_s(X_{it}),\quad s\in\{S,G,P\},\qquad \Pr(O_{it}=\varnothing\mid X_{it},g)=1-\sum_{s}\lambda^g_s(X_{it}). \] Key friction: If \(r_{it}=s\) (currently attached to \(s\)), the agent can continue in sector \(s\) only if \(O_{it}=s\). Otherwise, they can’t remain employed in \(s\) this period.

Home \(H\) is always available (no offer needed).

In other words, agents compare at most one employment sector vs home choice each period.

Preferences

Let per-period utility be quasi-separable in (log) wages and non-pecuniary sector tastes shifted by fertility:

\[ u_{it}(s) = \beta_u\,\ln W_{it}^g(s) \;+\; \gamma_s\,F_{it},\qquad s\in\{SUB,PRI,PUB\}, \] with a home utility \(u_{it}(H)= b_0 \;+\; \gamma_H\,F_{it},\qquad b_0\in\mathbb R.\)

Dynamic Program

Timing Within Period: At the start of period \(t\) given state \(\Xi_{it}\):

  1. Offer draw: \(O_{it}\in\{S,G,P,\varnothing\}\) is realized from \(\{\lambda^g_s(X_{it})\}\).

  2. Choice set: \(\mathcal C_{it}=\{H\}\cup\{O_{it}\}\,\) (home plus the one offered sector if any).

  3. Decision: choose \(d_{it}\in\mathcal C_{it}\) to maximize current utility plus continuation.

  4. Experience update given choice; then \(F_{it}\) evolves to \(F_{it+1}\sim Q(\cdot\mid F_{it})\).

Let \(V_t(\Xi)\) be the value function. Define the sector-specific continuation value if offered \(s\):

\[ \mathcal V_t(s;\Xi) \equiv u_t(s) \;+\; \beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=s\big], \]

and the home continuation value: \[ \mathcal V_t(H;\Xi) \equiv u_t(H)\;+\;\beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=H\big], \] with discount factor \(\beta\in(0,1)\).

Given the one-offer constraint, \[ V_t(\Xi) \;=\; \underbrace{\Pr(O_t=\varnothing\mid X,g)}_{\text{no offer}} \cdot \max\{\mathcal V_t(H;\Xi)\} \;+\; \sum_{s\in\{S,G,P\}} \underbrace{\Pr(O_t=s\mid X,g)}_{\lambda^g_s(X)} \cdot \max\{\mathcal V_t(H;\Xi),\ \mathcal V_t(s;\Xi)\}. \] The law of motion inside the expectations updates experience per choice and draws \(F_{t+1}\) exogenously.

stayers vs. movers:

If \(r_t=s\) and \(O_t=s\), the agent may stay in \(s\) by comparing \(\mathcal V_t(s;\Xi)\) to \(\mathcal V_t(H;\Xi)\).

If \(O_t\ne s\) (or \(O_t=\varnothing\)), staying in \(s\) is infeasible; the agent compares only \(H\) and the offered \(O_t\) (if any).

Movers are observed when \(O_t=s'\ne r_t\) and \(\mathcal V_t(s';\Xi)\ge \mathcal V_t(H;\Xi)\).

Normalizations and Interpretability

To interpret gender wedges as skill price differences:

Observables and Moments: Given panel data on \(\{s_{it},\ \ln W_{it}\ \text{if } s_{it}\in\mathcal S,\ F_{it},\ e,\ \mathbf{x}_{it}\}\).

Define moments (by gender and also by education/experience if applicable):

Identification (Heuristic)

a. Education and Experience: \((\beta_1,\beta_2)\)

  • Within-sector cross-sectional variation in education identifies \(\beta_1\).

  • Within-person wage growth for stayers in \(s\) identifies \(\beta_2\):

\[ \Delta \ln W_{it}^g(s) \;=\; \beta_2\cdot \Delta x_{it}(s),\quad \text{since }\varepsilon_{is}\ \text{is permanent}. \]

b. Sector Skill Prices by Gender: \(\beta_{0,g}(s)\) (relative)

Using movers and the one-offer structure:

\[ \mathbb E\big[\ln W_{i}(s') - \ln W_{i}(s)\mid r_t=s,\ s_{t+1}=s',\ g\big] = \underbrace{\beta_{0,g}(s')-\beta_{0,g}(s)}_{\text{price difference}} + \beta_2\,\Delta x_i + \underbrace{\mathbb E[\varepsilon_{is'}-\varepsilon_{is}\mid \text{move } s\!\to\! s']}_{\text{selection term}}, \]

where the selection term arises because movers are selected on \(\mathbf{\varepsilon}\) given the acceptance rule and offers.

Given \(\Sigma_g\) (below) and the offer process (below), the selection term is pinned down in the simulated model; the residual identifies \(\beta_{0,g}(s')-\beta_{0,g}(s)\) up to the base-sector normalization.

Gender wedges. With centered \(\varepsilon\), the cross-gender difference \(\beta_{0,f}(s)-\beta_{0,m}(s)\) is interpretable as discrimination (skill price) in sector \(s\).

c. Permanent Skill Distribution: \(\Sigma_g\)

  • Within-sector residual dispersion of \(\ln W\) (net of \(\beta_1,\beta_2\)) identifies variances \(\sigma^2_{gs}\).

  • Mover patterns (who moves where; gains upon moving) identify covariances \(\rho_{g,ss}\): strong positive correlation implies individuals who are strong in \(s\) are also strong in \(s'\) (affecting acceptance when offered \(s'\)).

  • \(\Sigma_g\) cannot mimic within-person, event-timed changes tied to fertility (see below), which helps separate it from \(\gamma\).

d. Fertility Taste Shifters: \(\gamma_s\) (Exclusion)

Fertility \(F_{it}\) enters utility only (not wages/offers). Around fertility events:

  • Shares and transitions shift due to acceptance, not arrival:

\[ \frac{\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k+1)} {\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k)} \approx \frac{\text{acceptance}(s'|F=k+1)}{\text{acceptance}(s'|F=k)}. \]

  • These within-person changes identify \(\gamma_s\) because \(\Sigma_g\) is permanent and \(\lambda^g_s(\cdot)\) is excluded from \(F\).

If public-sector transitions and shares jump for women at childbirth while wages/offer proxies do not, then \(\gamma_G>0\) (family-friendly public sector) is identified—separate from \(\beta_{0,g}(s)\) and \(\Sigma_g\).

e. Offer Process: \(\lambda^g_s(X)\)

Given \(\gamma\) (acceptance shifts at fertility) and wage-side parameters, the levels of transitions identify the arrival probabilities:

\[ \Pr(s_{t+1}=s'\mid s_t=s) = \underbrace{\lambda^g_{s'}(X_{t+1})}_{\text{arrival}} \times \underbrace{\Pr(\text{accept } s'\mid O_{t+1}=s',\Xi_{t+1};\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g)}_{\text{acceptance}}. \]

Fertility-timed contrasts net out acceptance shifts, isolating \(\lambda^g_{s'}(\cdot)\) from the remaining variation in transitions.

f. Prices and Endowments

  • With appropriate normalization e.g. \(\mathbb E[\varepsilon_{is}\mid g]=0\), mean gender differences within a sector load on \(\beta_{0,f}(s)-\beta_{0,m}(s)\) (instead of \(\varepsilon\)).

  • Relative female advantage in public (vs private) can arise from either

    • higher \(\beta_{0,f}(PUB)-\beta_{0,m}(PUB)\) (skill price advantages), or

    • higher \(\sigma_{3,f}\) or favorable \(\rho_{23,f}\) (women’s endowment more aligned with \(PUB\) ie comparative advantages).

  • Will use mover wage gaps and fertility-timed acceptance shifts to separate these channels: \(\beta_0\) affects wage levels and mover gaps; \(\gamma\) affects acceptance around fertility; \(\Sigma_g\) affects dispersion and who moves, not event-timed shifts.

Estimation Method (SMM/SML Skeleton)

  1. Parameter blocks: \(\Theta=\{\beta_u,\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g,\lambda\}\) with normalizations (\(\beta_{0,m}(PRI)=0\), centered \(\varepsilon\)).

  2. Simulate offer draws \(O_{it}\) from \(\lambda^g_s(X_{it})\) over observed \(X_{it}\) paths.

  3. Solve choices period-by-period (myopic or with continuation via forward simulation) given \(F_{it}\), wages, and one-offer constraint.

  4. Match moments: sector shares, transitions (overall and by fertility states), within-sector wage means/variances, and mover wage gaps.

  5. Over-ID checks: out-of-sample moments (e.g., wage tails, acceptance at non-fertility events).

Possible counterfactuals to run using the estimated model:

Discussion (model mechanisms, identification, etc.)

This dynamic Roy-with-offers model hardwires the “one-offer-per-period” friction, forcing agents to compare one employment sector against home each period (home always feasible; staying in the same job requires a fresh offer). Wages depend only on permanent sector-specific skills and observables; there are no transitory wage shocks. Fertility is an observed, exogenous shifter of non-pecuniary sector utility only.

Identification separates:

The combination of these aspects of the data allows to separate the role of skill prices, skill comparative advantage and offer probabilities.

For example: The avg log wage difference for women who move from public to private is around -0.46 and -0.09 for women and men, respectively. Can use this information in conjunction with other aspects of the data. In order to distinguish the roles of skill price gaps vs. comparative advange/endowmentstructure, we use many data oments at once, nost jst te mean waegs ormean log wage differences for stayers and movers. In the state-depdendent offer model specified above, each block pushes differnet patterns in the data:

  1. Variance and shape of mover gains, not just the mean.

    • Price differences shift the mean of \(\Delta w\) almost uniformly, the variance and skew of \(\Delta w \mid s \rightarrow s',g\) are driven by the distribution of \(\varepsilon_{is'}- \varepsilon_{is}\) under selection.

    • Matching \(Var(\Delta w)\) by origin-destination-gender pins down \(\sigma_{gs}, \sigma_{gs'}\) and especially \(\rho_{g,ss'}\). A big mean gap with small variance points to \(\Delta \beta_{0,g}\); a big variance/left tail points to comparative advantage sorting.

  2. Rank-rank (or quantile-quantile) patterns for movers. - Regress post-move residual-wage rank in s’ on pre-move residual rank in s for movers of gender g. The slope identifies \(\rho_{g,ss'}\).

    • If prices drive the gap, the rank mapping is near 1 (high earners in PUB remain high in PRI).
    • If comparative advantage drives it, the slope flattens (or even inverts in tails): high PUB earners are not high in PRI.
  3. Stayer-leaver contrasts within origin sectors.

    • Compare residual wages (after removing \(\beta_{0,g}(s) + \beta_1 educ\) of stayers vs leavers from \(s\), by gender.

    • Strong negative selection of leavers (leavers much lower residuals in \(s\) than stayers) is evidence of comparative advantage. Pure price gaps do not create differential stayer-leaver sorting strength.

  4. Bidirectional flows: symmetry tests.

    • Jointly match \(PUB \rightarrow PRI\) and \(PRI \rightarrow PUB\) rates and gains by gender.

    • A price gap \(\Delta \beta_{0,g}(PRI,PUB)\) pushes both directions’ means (opposite sign, similar magnitude after accounting for selection).

    • Comparative advantage shows asymmetric selection and different dispersion across the two directions.

  5. Within-sector dispersion (pooled), by gender.

    • \(Var(\text{residual } lnW \mid s,g)\) pins down \(\sigma_{gs}^2\).

    • If women’s residual dispersion is notably different in \(PRI\) vs. \(PUB\), you can’t match both dispersions and the mover distributions with a pure price story; you’re forced to adjust \(\Sigma_g\) (including \(\rho_{23,g}\)).

  6. Event-time around fertility (the main exclusion):

    • Fertility shifts acceptance via \(\gamma_s\), not wages. After fertility events:
      • If sector shares shift and mean residual wages within sector change via composition, while mover gains’ mean doesn’t shift systematically, that’s a cceptance/selection, not prices.
      • Prices \(\beta_{0,g}\) are time-invariant here, so fertility timed movements help you attribute composition changes to \(\gamma\) and \(\Sigma_g\). not \(\beta_0\).
  7. State-dependent offer matrix moments.

    • The added tweak (offers depend on current sector) gives extra moments: staying hazards \(P(s \rightarrow s)\), directional flows \(P(PUB \rightarrow PRI)\) vs \(P(PRI \rightarrow PUB)\) by gender and fertility.

    • These depend strongly on \(\lambda_{s' \mid s,g}\) and \(\gamma\) but not on \(\beta_0\). Matching these pins down frictions and acceptance,leaving \(\beta_0\) to be identified by wage levels and mover means once selection is accounted for.

Practically, how does the estimator tell them apart?

In the structural SMM:

No single moment is decisive; the joint fit forces a decomposition: If you try to explain the women’s -0.46 mean with only a large \(\Delta \beta_{0,f}(PRI,PUB)\), you will misfit - the vairance of \(\Delta w\), - the mover rank-rank slope, (iii) stayer-leaver gaps, or (iv) the reverse \(PRI \rightarrow PUB\) flow so the optimizer shifts weight into \(\Sigma_f\) (comparative advantage) until all moments line up. Conversely, if you try to do it only with \(\Sigma_f\), you will miss within-sector mean levels and pooled wage gaps unless \(\beta_{0,f}(P)- \beta_{0,f}(G)\) adjusts.

Two concrete diagnostics:

Sources of gender differences and discussion of which moments move. What identifies prices \(\beta_0\), vs. endowments (\(\Sigma_g\)), and separates both from offers \(\lambda\) and tastes \(\gamma\). How to read the scenario results (ie what identifies what)

COMPARATIVE STATICS: SPECIFICATIONS

## =========================================================
## Dynamic Roy with One-Sector, STATE-DEPENDENT Offers
## (Experience removed; permanent sector skills only)
## ---------------------------------------------------------
## - 3 employment sectors
## - Home H always available
## - One offer per period (or none)
## - Offer arrival PROBABILITIES depend on CURRENT SECTOR (state)
## - Wages fixed per individual-sector (permanent epsilon + observables)
## - Fertility shifts utility (not wages, not offers)
## - Finite horizon, solved by backward induction
## =========================================================

set.seed(42)
suppressPackageStartupMessages({
  library(dplyr); library(tidyr); library(ggplot2); library(kableExtra)
})

## ----------------------------
## 1) Model primitives
## ----------------------------

T_per <- 5
N     <- 1000
HOME    <- "HME"
sectors <- c("SUB","PRI","PUB")
R_states <- c(HOME, sectors)  
state_levels <- c("HME","SUB","PRI","PUB")


RUN_GENDER <- "fem"   # "men", "fem", "both"
if (RUN_GENDER == "both") {
  genders <- c("men", "fem")
} else if (RUN_GENDER == "men") {
  genders <- c("men")
} else if (RUN_GENDER == "fem") {
  genders <- c("fem")
}
## ---- State-dependent (current-sector) offer process.
## tilt the offered sector toward the CURRENT r_t by adding a "stay bonus"
#stay_bonus <- 0.20  # extra mass for re-offer from current sector
## Keep "none" prob equal to the original base-none, and renormalize across S,G,P.
##  A1) Build conditional offer matrices with a given base + stay_bonus
  ## returns a list: probs[[r]] is a named vector over c(S,G,P,None) for origin r ∈ {H,S,G,P}
build_lambda_conditional <- function(lambda, bonus) {
  out <- list()
  base_sum <- sum(lambda)
  #print(base_sum)
  #print(paste("base_sum:", base_sum));
  
  # From HOME: use base probabilities (no tilt)
  out[[HOME]] <- c(lambda, None = 1 - base_sum)

  # From each sector: add stay bonus to that sector, renormalize offers to sum to base_sum
  for (r in sectors) {
    prob <- lambda
    prob[r] <- prob[r] + bonus
    prob <- prob * (base_sum / sum(prob))  # renormalize to sum to base_sum
    out[[r]] <- c(prob, None = 1 - sum(prob))
  }
  out
  #print(paste("out:", out[["PUB"]] ));
}


  cov_matrix <- function(sig, rho) {
    k <- length(sig)
    if (k < 2) stop("Length of 'sig' must be at least 2.")
    if (any(sig <= 0)) stop("'sig' must contain strictly positive values.")
  
    # correlation matrix from scalar rho
    cor_matrix_scalar <- function(rho, k) {
      if (rho <= -1 || rho >= 1)
        stop("Scalar rho must be strictly between -1 and 1.")
      M <- matrix(rho, k, k)
      diag(M) <- 1
      M
  }

  # correlation matrix from vector rho
  cor_matrix_vector <- function(rhos, k) {
    needed <- k*(k-1)/2
    if (length(rhos) != needed)
      stop(sprintf("For k=%d, rho vector must be length %d.", k, needed))
    if (any(rhos <= -1 | rhos >= 1))
      stop("All correlations must be strictly between -1 and 1.")

    M <- matrix(1, k, k)
    M[upper.tri(M)] <- rhos
    M[lower.tri(M)] <- t(M)[lower.tri(M)]
    M
  }

  # choose scalar or vector case
  R <- if (length(rho) == 1) {
    cor_matrix_scalar(rho, k)
  } else {
    cor_matrix_vector(rho, k)
  }

  # Σ = D R D
  D <- diag(sig)
  D %*% R %*% D
}

# Check for positive semidefiniteness
is_valid_corr <- function(rho) {
  # rho = c(r12, r13, r23)
  if (length(rho) != 3) stop("rho must be a vector of length 3")

  r12 <- rho[1]
  r13 <- rho[2]
  r23 <- rho[3]

  R <- matrix(c(
    1,    r12,  r13,
    r12,  1,    r23,
    r13,  r23,  1
  ), nrow = 3, byrow = TRUE)

  lambda_min <- min(eigen(R)$values)
  valid <- lambda_min > -1e-8

  if (!valid) {
    warning(sprintf(
      "Invalid correlation triple: smallest eigenvalue = %.4g",
      lambda_min
    ))
  }

  # Return nicely formatted result
  return(
    data.frame(
      r12 = r12,
      r13 = r13,
      r23 = r23,
      valid = valid
    )
  )
}


# Examples
rho <- c(0.3, 0.4, 0.2)
#is_valid_corr(rho)     # TRUE

rho <- c(0.9, 0.9, -0.9)    
#is_valid_corr(rho)     # FALSE

rho <- c(1, -1, 1)    
#is_valid_corr(rho)     # FALSE

rho <- c(0.7, -0.5, -0.2) 
#is_valid_corr(rho)     # TRUE

  
  
  

rmvnorm <- function(n, mu, Sigma) {
  Z <- matrix(rnorm(n * length(mu)), nrow = n)
  L <- chol(Sigma)
  sweep(Z %*% L, 2, mu, `+`)
}
  ## Fertility: exogenous Markov
  F_states <- 0:2
  Q_F <- matrix(0, 3, 3, dimnames=list(F_states, F_states))
  Q_F["0","0"] <- 0.95; Q_F["0","1"] <- 0.05; Q_F["0","2"] <- 0.00
  Q_F["1","1"] <- 0.92; Q_F["1","2"] <- 0.08
  Q_F["2","2"] <- 1.00

  #Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
  #Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05   # marriage probability
  #Q_M["1","1"] <- 1.00                         # married stays married


  ## Markov transition Q_M(m'|m)
  ## Example: small probability of marriage when single; absorbing once married.
  M_states <- 0:1          # 0 = single, 1 = married
  Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
  Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05   # Pr(marry)
  Q_M["1","1"] <- 1.00                         # married absorbing
  ## (can freely change this to allow divorce.)

  ## Marital-status taste shifters (parallel to gamma for fertility)
  ## delta_s_m  where s ∈ {HME, SUB, PRI, PUB} and m ∈ {0,1}
  ## These are non-pecuniary utility additions.
   delta_men <- c(HME=0, SUB=0, PRI=0, PUB=0)   # men
   delta_fem <- c(HME=0, SUB=0, PRI=0, PUB=0)   # women


  ## Draw next M
  #draw_next_M <- function(M_cur){
  #   sample(M_states, size=1, prob = Q_M[as.character(M_cur),])
  #}

  p_educ  <- 0.5   # Pr(educ=1)
  beta_u <- 1.0    # Utility scale and home intercept
  beta   <- 0.95   # discount factor
  beta1  <- 0.15   # Education return (log-wage)
  
  ## Skill prices (log-wage intercepts) by gender & sector (as before)
  beta0 <- tibble::tribble(
    ~g, ~s, ~par,
    "men","SUB",  0.15,#-0.20,
    "men","PRI",  0.15,#0.30,
    "men","PUB",  0.15,
    "fem","SUB",  0.15,#-0.20,
    "fem","PRI",  0.15,#-0.20,
    "fem","PUB",  0.15
  )
  
  #sig <- c(0.30, 0.30, 0.10)
  #rho <- c(0.20, 0.10, 0.40)   # (12, 13, 23)
  sig <- c(0.30, 0.30, 0.30)
  rho <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
  #is_valid_corr(rho)     
  sig0 <- c(0.30, 0.30, 0.30)
  rho0 <- c(0.0, 0.0, 0.0)   # (12, 13, 23)

  ## Fertility shifters (non-pecuniary) by sector incl. home (as before)
  gamma_m <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
  gamma_f <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
  
  ## ---- Base (memoryless) offer probs from previous code (for comparability)
  lambda_m <- c(SUB = 0.30, PRI = 0.30, PUB= 0.30)  # men
  lambda_f <- c(SUB = 0.30, PRI = 0.30, PUB = 0.30)  # women
  stay_bonus <- 0.90
  test <- build_lambda_conditional(lambda_m,stay_bonus)




## Scenario parameter builders
make_scenario <- function(scn){
  out <- list(label = scn)
    out$beta0_df <- beta0
    out$Sigma_m <- cov_matrix(sig0, rho0)
    out$Sigma_f <- cov_matrix(sig0, rho0)
    out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus)  # same as men
    out$gamma_m <- gamma_m
    out$gamma_f <- gamma_f
    out$redraw_eps <- TRUE
    
  if (scn == "BASELINE"){
    out$beta0_df <- beta0
    out$Sigma_m <- cov_matrix(sig, rho)
    out$Sigma_f <- cov_matrix(sig, rho)
    out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus)  # same as men
    out$gamma_m <- gamma_m
    out$gamma_f <- gamma_f
    out$redraw_eps <- TRUE
  } 
  else if (scn == "S1_prices"){
    ## Men vs women differ ONLY in beta0 (prices); same Sigma, same offers, same gamma
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~  -0.20,
      g=="men" & s=="PRI" ~  0.30,
      g=="men" & s=="PUB" ~  0.15,
      g=="fem" & s=="SUB" ~  -0.20,
      g=="fem" & s=="PRI" ~  0.15,
      g=="fem" & s=="PUB" ~  0.30     
    ))
    out$beta0_df <- beta0
    sig <- c(0.30, 0.30, 0.30)
    rho <- c(-0.20, -0.20, -0.20)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_m <- cov_matrix(sig, rho)
    sig <- c(0.30, 0.30, 0.30)
    rho <- c(-0.10, -0.10, -0.10)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
    out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus)  # same as men
    out$gamma_m <- gamma_m
    out$gamma_f <- gamma_f
    out$redraw_eps <- TRUE
  } 
  else if (scn == "S2_endoA"){
    ## Men vs women differ ONLY in Sigma (eps distribution); same prices, offers, gamma
    ## Make women's (G,P) more negatively correlated to illustrate comparative advantage
    out$beta0_df <- beta0
    
    sig <- c(0.30, 0.30, 0.30)
    rho <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_m <- cov_matrix(sig, rho)
    
    sig <- c(0.30, 0.30, 0.30)
    rho <- c(0.10, 0.10, 0.10)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
    
    out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus)  # same as men
    out$gamma_m <- gamma_m
    out$gamma_f <- gamma_f
    out$redraw_eps <- TRUE  # need to redraw eps given new Sigma_f
  } 

  else if (scn == "S2_endoB"){
    ## Men vs women differ ONLY in Sigma (eps distribution); same prices, offers, gamma
    ## Make women's (G,P) more negatively correlated to illustrate comparative advantage
    out$beta0_df <- beta0
    
    sig <- c(0.30, 0.30, 0.30)
    rho <- c(0.10, 0.10, 0.30)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_m <- cov_matrix(sig, rho)
    
    sig <- c(0.30, 0.30, 0.30)
    rho <- c(0.10, 0.10, -0.30)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
    
    out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus)  # same as men
    out$gamma_m <- gamma_m
    out$gamma_f <- gamma_f
    out$redraw_eps <- TRUE  # need to redraw eps given new Sigma_f
  } 

  else if (scn == "S3_offers"){
    ## Men vs women differ ONLY in offer arrival (state-dependent); same prices, same Sigma, same gamma
    out$beta0_df <- beta0
    out$Sigma_m <- cov_matrix(sig, rho)
    out$Sigma_f <- cov_matrix(sig, rho)
    out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
    lam <- c(SUB=0.10, PRI= 0.10, PUB=0.70) 
    out$lambda_f_cond <- build_lambda_conditional(lam, stay_bonus) 
    out$gamma_m <- gamma_m
    out$gamma_f <- gamma_f
    out$redraw_eps <- TRUE
  } 
  else if (scn == "S4_fertility"){
    ## Men vs women differ ONLY in gamma_s (fertility tastes); same prices, Sigma, offers
    out$beta0_df <- beta0
    out$Sigma_m  <- cov_matrix(sig, rho)
    out$Sigma_f  <- cov_matrix(sig, rho)
    out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus)  # same as men
    out$gamma_m <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.00) 
    out$gamma_f <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.60)  # make public more family-friendly for women
    out$redraw_eps <- TRUE
  } 
  else if (scn == "B0"){
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~  0.10,
      g=="men" & s=="PRI" ~  0.10,
      g=="men" & s=="PUB" ~  0.10,
      g=="fem" & s=="SUB" ~  0.10,
      g=="fem" & s=="PRI" ~  0.10,
      g=="fem" & s=="PUB" ~  0.10     ))
      sig <- c(0.30, 0.30, 0.30)
      rho <- c(-0.30, -0.30, -0.30)   # (12, 13, 23)
      is_valid_corr(rho)     
      out$Sigma_f <- cov_matrix(sig, rho)
  }
  else if (scn == "B1"){
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~  0.10,
      g=="men" & s=="PRI" ~  0.10,
      g=="men" & s=="PUB" ~  0.10,
      g=="fem" & s=="SUB" ~  0.10,
      g=="fem" & s=="PRI" ~  0.10,
      g=="fem" & s=="PUB" ~  0.10     ))
      sig <- c(0.30, 0.30, 0.30)
      rho <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
      is_valid_corr(rho)     
      out$Sigma_f <- cov_matrix(sig, rho)
  }
  else if (scn == "B2"){
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~  0.10,
      g=="men" & s=="PRI" ~  0.10,
      g=="men" & s=="PUB" ~  0.10,
      g=="fem" & s=="SUB" ~  0.10,
      g=="fem" & s=="PRI" ~  0.10,
      g=="fem" & s=="PUB" ~  0.10     ))
      sig <- c(0.30, 0.30, 0.30)
      rho <- c(0.30, 0.30, 0.30)   # (12, 13, 23)
      is_valid_corr(rho)     
      out$Sigma_f <- cov_matrix(sig, rho)
  }

  else if (scn == "C0"){
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~  0.10,
      g=="men" & s=="PRI" ~  0.10,
      g=="men" & s=="PUB" ~  0.10,
      g=="fem" & s=="SUB" ~  0.10,
      g=="fem" & s=="PRI" ~  0.10,
      g=="fem" & s=="PUB" ~  0.25     ))
      sig <- c(0.30, 0.30, 0.30)
      rho <- c(-0.30, -0.30, -0.30)   # (12, 13, 23)
      is_valid_corr(rho)     
      out$Sigma_f <- cov_matrix(sig, rho)
  }
  else if (scn == "C1"){
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~  0.10,
      g=="men" & s=="PRI" ~  0.10,
      g=="men" & s=="PUB" ~  0.10,
      g=="fem" & s=="SUB" ~  0.10,
      g=="fem" & s=="PRI" ~  0.10,
      g=="fem" & s=="PUB" ~  0.25     ))
      sig <- c(0.30, 0.30, 0.30)
      rho <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
      is_valid_corr(rho)     
      out$Sigma_f <- cov_matrix(sig, rho)
  }
  else if (scn == "C2"){
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~  0.10,
      g=="men" & s=="PRI" ~  0.10,
      g=="men" & s=="PUB" ~  0.10,
      g=="fem" & s=="SUB" ~  0.10,
      g=="fem" & s=="PRI" ~  0.10,
      g=="fem" & s=="PUB" ~  0.25     ))
      sig <- c(0.30, 0.30, 0.30)
      rho <- c(0.30, 0.30, 0.30)   # (12, 13, 23)
      is_valid_corr(rho)     
      out$Sigma_f <- cov_matrix(sig, rho)
  }
  else if (scn == "FR0"){
    out$gamma_m <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.00) 
    out$gamma_f <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.60)  # make public more family-friendly for women
  }
  else if (scn == "FR1"){
    out$gamma_m <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.00) 
    out$gamma_f <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=1.00)  # make public more family-friendly for women
  }
  else if (scn == "S0"){
    sig <- c(0.10, 0.10, 0.30)
    rho <- c(0.0, 0.0, -0.30)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
  } 
  else if (scn == "S1"){
    sig <- c(0.10, 0.10, 0.30)
    rho <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
  } 
  else if (scn == "S2"){
    sig <- c(0.10, 0.10, 0.30)
    rho <- c(0.0, 0.0, 0.30)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
  } 
  else if (scn == "S3"){
    sig <- c(0.10, 0.10, 0.30)
    rho <- c(0.0, 0.0, -0.30)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
  } 
  else if (scn == "S4"){
    sig <- c(0.10, 0.10, 0.30)
    rho <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)
  } 
  else if (scn == "S5"){
    sig <- c(0.40, 0.40, 0.40)
    rho <- c(0.30, 0.30, 0.30)   # (12, 13, 23)
    is_valid_corr(rho)     
    out$Sigma_f <- cov_matrix(sig, rho)

  } 
  else {
    stop("Unknown scenario label")
  }
  out
}

Solution and Simulation Programs

## ---------------------------------------------------------
## Block A: Scenario helpers (parameter overrides + wrappers)
## ---------------------------------------------------------

## Keep the same N, T_per, genders, sectors, HOME, beta1, beta_u, beta, Q_F, etc.
## We'll reuse your global g_vec, educ_vec for comparability across scenarios.

##  A2) Draw epsilons by gender given Sigma (keeps g_vec fixed)
draw_eps_by_gender <- function(g_vec, Sigma_m, Sigma_f){
  eps_mat <- matrix(NA_real_, nrow=length(g_vec), ncol=length(sectors),
                    dimnames=list(NULL, sectors))
  idx_m <- which(g_vec=="men"); idx_f <- which(g_vec=="fem")
  if (length(idx_m)>0) eps_mat[idx_m,] <- rmvnorm(length(idx_m), c(0,0,0), Sigma_m)
  if (length(idx_f)>0) eps_mat[idx_f,] <- rmvnorm(length(idx_f), c(0,0,0), Sigma_f)
  eps_mat
}

##  A3) Precompute per-person log wages for all sectors
## A3) Precompute per-person log wages for all sectors (robust, base-R)
## Precompute per-person log wages by sector (fixed forever)
precompute_logW_all <- function(g_vec, educ_vec, eps_mat, beta0_df){
  stopifnot(is.matrix(eps_mat),
            nrow(eps_mat) == length(g_vec),
            all(colnames(eps_mat) %in% sectors))
  
  Nloc <- length(g_vec)
  logW_all <- matrix(NA_real_, nrow = Nloc, ncol = length(sectors),
                     dimnames = list(NULL, sectors))
  
  # helper: fetch scalar beta0 for (g,s)
  get_beta0 <- function(gg, ss){
    idx <- (beta0_df$g == gg) & (beta0_df$s == ss)
    b <- beta0_df$par[idx]
    if (length(b) != 1L) stop(sprintf("beta0_df mismatch for g=%s, s=%s (found %d rows)", gg, ss, length(b)))
    b
  }
  
  for (ss in sectors){
    col <- match(ss, sectors)
    beta0_vec <- vapply(seq_len(Nloc), function(i) get_beta0(g_vec[i], ss), numeric(1))
    logW_all[, col] <- beta0_vec + beta1 * educ_vec + eps_mat[, col]
  }
  logW_all
}

## ----------------------------
##  A4) Solve V for one person *given scenario-specific objects*
## 4) Value Function (Backward Induction, person-specific)
## ----------------------------
## State: (t, r ∈ {H,S,G,P}, F ∈ {0,1,2})
## V_{T+1}(.) = 0. For t=T,...,1 compute V_t(r,F) = E_{O|r} [ max{ V_H, V_emp(O) } ]
## where V_H = u_home(F) + β E_F' V_{t+1}(H, F'), and
##       V_emp(s) = u_employed(logW_s, s, F) + β E_F' V_{t+1}(s, F').
solve_value_person_scn <- function(i, logW_all, g_vec, gamma_m,gamma_f,
                                   lambda_m_cond, lambda_f_cond){
  g <- g_vec[i]
  lambda_cond <- if (g=="men") lambda_m_cond else lambda_f_cond
  gamma <- if (g=="men") gamma_m else gamma_f
  delta <- if (g=="men") delta_men else delta_fem
  logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
  t_names <- as.character(1:(T_per+1)) # time index: 1..(T+1); V[t = T+1, ., .] = 0 (terminal)
  #V <- array(0, dim=c(T_per+1, length(R_states), length(F_states)),
  #           dimnames=list(t=t_names, r=R_states, F=as.character(F_states)))
  V <- array(
  0,
  dim=c(T_per+1, length(R_states), length(F_states), length(M_states)),
  dimnames=list(
    t=t_names,
    r=R_states,
    F=as.character(F_states),
    M=as.character(M_states)
  )
)
  
  # backward induction: tt = T, ..., 1
for (tt in T_per:1){
  for (r in R_states){
    p_offers <- lambda_cond[[r]]
    for (Fcur in F_states){
      for (Mcur in M_states){

        ## continuation if choose Home
        EV_next_HME <- sum(
          Q_F[as.character(Fcur), ] *
          V[as.character(tt+1), HOME, , as.character(Mcur)]
        )

        V_emp <- c(SUB=-Inf, PRI=-Inf, PUB=-Inf)

        for (s in sectors){
          EV_next_s <- sum(
            Q_F[as.character(Fcur), ] *
            V[as.character(tt+1), s, , as.character(Mcur)]
          )
          V_emp[s] <- beta_u * logW_i[s] +
                      ifelse(Fcur >= 1, gamma[s], 0) +
                      ifelse(Mcur == 1, delta[s], 0) +
                      beta * EV_next_s
        }

        V_HME <- 0 +
                 ifelse(Fcur >= 1, gamma["HME"], 0) +
                 ifelse(Mcur == 1, delta["HME"], 0) +
                 beta * EV_next_HME

        EV <- p_offers["None"] * V_HME +
              sum(p_offers[sectors] *
                  pmax(V_HME, V_emp[sectors]))

        V[as.character(tt), r, as.character(Fcur), as.character(Mcur)] <- EV

      }  # end Mcur
    }  # end Fcur
  }  # end r
}  # end tt

}











## ----------------------------
## 5) Simulation of optimal paths
## ----------------------------
##  A5) Simulate optimal paths *given* V_all and scenario objects
## Policy: Given (t, r, F) and realized offer O:
## choose O if V_emp(O) >= V_H; else choose Home.
## Offers drawn from λ_{·|r,g}; F evolves via Q_F.
simulate_paths_scn <- function(
  V_all, logW_all, g_vec, F0, gamma_m, gamma_f,
  lambda_m_cond, lambda_f_cond
){
  # helper: draw one offer given gender and current attachment r
  draw_offer_cond <- function(g, r){
    probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
    draw  <- sample(c(sectors, "None"), size = 1, prob = probs)
    if (draw == "None") NA_character_ else draw
  }

  out_list <- vector("list", length = N)

  for (i in seq_len(N)) {
    g_i    <- g_vec[i]
    educ_i <- educ_vec[i]
    V_i    <- V_all[[i]]
    logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)

    # start-of-panel state for person i
    r_cur  <- HOME
    F_cur  <- F0[i]
    M_cur  <- M0[i]

    # storage for this person's path
    rows <- vector("list", length = T_per)

    for (tt in 1:T_per) {
      # gender-specific gamma for THIS person
      gamma <- if (g_i == "men") gamma_m else gamma_f

      # draw an offer conditional on current attachment r_cur
      offer <- draw_offer_cond(g_i, r_cur)

      # continuation value if choose Home now
      EV_next_HME <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), HOME, ])
      V_HME <- 0 + ifelse(F_cur >= 1, gamma[["HME"]], 0) + beta * EV_next_HME

      # default choice is Home
      choice <- HOME
      employed <- 0L
      sector   <- NA_character_
      logW_chosen <- NA_real_

      # if there is an offer, compare employed value vs home
      if (!is.na(offer)) {
        s <- offer
        EV_next_s <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), s, ])
        V_emp_off <- beta_u * logW_i[s] +
                     ifelse(F_cur >= 1, gamma[[s]], 0) +
                     beta * EV_next_s

        if (V_emp_off >= V_HME) {
          choice <- s
          employed <- 1L
          sector <- s
          logW_chosen <- logW_i[s]
        }
      }

      # record this period
      rows[[tt]] <- data.frame(
        i = i, t = tt, g = g_i, educ = educ_i, F = F_cur, M=M_cur,      
        r = r_cur, offer = offer, choice = choice,
        employed = employed, sector = sector, logW = logW_chosen,
        stringsAsFactors = FALSE
      )

      # move to next period
      r_cur <- choice
      if (tt < T_per) {
        F_cur <- draw_next_F(F_cur)
        M_cur <- draw_next_M(M_cur)
      }

    }

    out_list[[i]] <- do.call(rbind, rows)
  }

  dt <- do.call(rbind, out_list)
  rownames(dt) <- NULL
  dt
}

Run baseline/each scenario and save output

## We'll keep the same g_vec, educ_vec, F0 across scenarios (comparability)
g_vec <- sample(genders, N, replace=TRUE)
educ_vec <- rbinom(N, 1, p_educ)
F0 <- sample(F_states, size=N, replace=TRUE, prob=c(0.8, 0.18, 0.02))
draw_next_F <- function(F){ sample(F_states, 1, prob=Q_F[as.character(F), ]) }
M0 <- sample(M_states, size=N, replace=TRUE, prob=c(0.90, 0.10))
draw_next_M <- function(M){
  sample(M_states, 1, prob = Q_M[as.character(M), ])
}


run_scenario <- function(scn_label){
  scn <- make_scenario(scn_label)

  
  ## Draw epsilons (only if needed)
  if (scn$redraw_eps){
    eps_mat_scn <- draw_eps_by_gender(g_vec, scn$Sigma_m, scn$Sigma_f)
  } else {
    ## reuse the eps_mat from your baseline if present; else draw with base Sigmas
    if (exists("eps_mat")) {
      eps_mat_scn <- eps_mat
    } else {
      eps_mat_scn <- draw_eps_by_gender(g_vec, Sigma_m, Sigma_f)
    }
  }
  
  logW_all_scn <- precompute_logW_all(g_vec, educ_vec, eps_mat_scn, scn$beta0_df)
  
  ## Solve V for each person
  ## Solve V for everyone (vectorized over i)
  ## (This is the heavy step: 5000 people × (T=10 × r=4 × F=3) is very feasible.)
  V_all_scn <- lapply(1:N, function(i)
    solve_value_person_scn(i, logW_all_scn, g_vec, scn$gamma_m,scn$gamma_f,
                           scn$lambda_m_cond, scn$lambda_f_cond))
  
  ## Simulate paths
  dt_scn <- simulate_paths_scn(V_all_scn, logW_all_scn, g_vec, F0, scn$gamma_m,scn$gamma_f,
                               scn$lambda_m_cond, scn$lambda_f_cond)
  dt_scn$scenario <- scn_label


  ## === Diagnostics ===
  
  ## (1) Pooled log-wage densities by sector×gender
  wages_pool <- dt_scn |>
    dplyr::filter(!is.na(sector), !is.na(logW)) |>
    dplyr::mutate(sector=factor(sector, levels=sectors))
  
  ## (2) Movers: Δ logW by origin→dest, gender
  moves <- dt_scn |>
    dplyr::group_by(i) |>
    dplyr::arrange(t, .by_group = TRUE) |>
    dplyr::mutate(next_logW = dplyr::lead(logW),
                  next_sector = dplyr::lead(sector),
                  employed_next = dplyr::lead(employed)) |>
    dplyr::ungroup() |>
    dplyr::filter(employed==1, employed_next==1,
                  !is.na(sector), !is.na(next_sector)) |>
    dplyr::mutate(orig=sector, dest=next_sector,
                  dlogW = next_logW - logW)
  
  ## (3) Residuals: remove sector×gender×educ mean (proxy for unobs)
  dt_resid <- dt_scn |>
    dplyr::filter(!is.na(sector), !is.na(logW)) |>
    dplyr::group_by(g, sector, educ) |>
    dplyr::mutate(resid = logW - mean(logW)) |>
    dplyr::ungroup()
  
  ## (4) Rank–rank slopes for G→P movers (by gender)
  rr <- dt_resid |>
    dplyr::select(i,t,g,sector,resid) |>
    dplyr::group_by(i) |>
    dplyr::arrange(t,.by_group=TRUE) |>
    dplyr::mutate(next_resid = dplyr::lead(resid),
                  next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    dplyr::filter(sector=="PUB", next_sector=="PRI") |>
    dplyr::group_by(g) |>
    dplyr::summarise(rank_slope = {
      x <- rank(resid, ties.method="average")/dplyr::n()
      y <- rank(next_resid, ties.method="average")/dplyr::n()
      if (length(x)>5) coef(lm(y ~ x))[2] else NA_real_
    }, .groups="drop")
  rr$scenario <- scn_label
  
  ## (5) Stayer–leaver residual gap in origin sector G (by gender)
sl <- dt_resid |>
  # 1. Sort each person's panel properly
  dplyr::group_by(i) |>
  dplyr::arrange(t, .by_group = TRUE) |>

  # 2. Next-period sector
  dplyr::mutate(next_sector = dplyr::lead(sector)) |>
  dplyr::ungroup() |>

  # 3. Keep only observations where current sector is PUB
  dplyr::filter(sector == "PUB") |>

  # 4. Drop final rows with no next observation
  dplyr::filter(!is.na(next_sector)) |>

  # 5. Stayer vs. leaver classification
  dplyr::mutate(
    status = dplyr::case_when(
      next_sector == "PUB" ~ "stayer",
      next_sector != "PUB" ~ "leaver"
    )
  ) |>

  # 6. Compute mean residuals by gender × status
  dplyr::group_by(g, status) |>
  dplyr::summarise(
    mean_resid = mean(resid, na.rm = TRUE),
    .groups = "drop"
  ) |>

  # 7. Reshape wide: stayer, leaver columns
  tidyr::pivot_wider(
    names_from = status,
    values_from = mean_resid,
    values_fill = NA
  ) |>

  # 8. Compute gap safely
  dplyr::mutate(
    gap = stayer - leaver,
    scenario = scn_label
  )

  
  ## (6) Staying hazards by current sector (state dependence proxy)
  haz <- dt_scn |>
    dplyr::group_by(i) |>
    dplyr::arrange(t,.by_group=TRUE) |>
    dplyr::mutate(next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    dplyr::filter(!is.na(sector)) |>
    dplyr::mutate(stay = as.integer(sector==next_sector)) |>
    dplyr::group_by(g, sector) |>
    dplyr::summarise(stay_hazard = mean(stay, na.rm=TRUE), .groups="drop") |>
    dplyr::mutate(scenario=scn_label)
  
  list(dt=dt_scn,
       wages_pool=wages_pool,
       moves=moves,
       rr=rr,
       sl=sl,
       haz=haz)
}

## Actually run all four scenarios
#scen_labels <- c("BASELINE","S1_prices","S2_endoA","S2_endoB","S3_offers","S4_fertility")
#results <- lapply(scen_labels, run_scenario)
#names(results) <- scen_labels

#compstats_sigmas <- c("S0","S1","S2","S3","S4","S5")
#results <- lapply(compstats_sigmas, run_scenario)
#names(results) <- compstats_sigmas

scenario_list <- c("B0","B1","B2","C0","C1","C2","FR0","FR1")
#scenario_list <- c("S0","S1","S2")

results <- lapply(scenario_list, run_scenario)
names(results) <- scenario_list



## Bind outputs for plotting / tables
wages_all <- dplyr::bind_rows(lapply(results, `[[`, "wages_pool"), .id="scenario_id")
moves_all <- dplyr::bind_rows(lapply(results, `[[`, "moves"), .id="scenario_id")
rr_all    <- dplyr::bind_rows(lapply(results, `[[`, "rr"), .id="scenario_id")
sl_all    <- dplyr::bind_rows(lapply(results, `[[`, "sl"), .id="scenario_id")
haz_all   <- dplyr::bind_rows(lapply(results, `[[`, "haz"), .id="scenario_id")


## Combine scenarios into one panel
to_factor <- function(x) factor(x, levels = c("SUB","PRI","PUB"))
panel_all <- dplyr::bind_rows(
  lapply(results, \(res) res$dt |> dplyr::mutate(sector = to_factor(sector))),
  .id = "scenario_id"
)


# pretty printing each component
#  temp <- make_scenario("BASELINE")
#for (r in names(temp$lambda_f_cond)) {
#  cat("\nFrom", r, ":\n")
#  print(round(temp$lambda_f_cond[[r]], 3))
#}

COMPARATIVE STATICS: OUTPUT

(i) Employment Transitions by Gender

# Build transitions: state_t (at t) -> state_t1 (at t+1)
transitions_all <- panel_all |>
  dplyr::group_by(scenario_id, g, i) |>
  dplyr::arrange(t, .by_group = TRUE) |>
  dplyr::mutate(
    state_t  = factor(ifelse(is.na(sector), "HME", as.character(sector)),
                      levels = state_levels),
    state_t1 = dplyr::lead(state_t)
  ) |>
  dplyr::ungroup() |>
  dplyr::filter(!is.na(state_t1)) |>
  dplyr::count(scenario_id, g, state_t, state_t1, name = "n") |>
  dplyr::group_by(scenario_id, g, state_t) |>
  dplyr::mutate(share = n / sum(n)) |>
  dplyr::ungroup()

# (Optional) pretty wide matrices per scenario x gender
transitions_wide <- transitions_all |>
  tidyr::pivot_wider(
    id_cols  = c(scenario_id, g, state_t),
    names_from = state_t1,
    values_from = share,
    values_fill = 0
  ) |>
  dplyr::arrange(scenario_id, g, factor(state_t, levels = state_levels))





knitr::kable(
  transitions_wide,
  caption = "Transition Matrix by Scenario",
  align = "c",
  digits = 2,
  format = "html"
) %>%
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    font_size = 13
  ) %>%
  kableExtra::collapse_rows(
    columns = 1:2,    # scenario_id + state_t
    valign = "top"
  )
Transition Matrix by Scenario
scenario_id g state_t HME SUB PRI PUB
B0 fem HME 0.48 0.18 0.17 0.18
SUB 0.21 0.60 0.09 0.09
PRI 0.22 0.10 0.59 0.09
PUB 0.20 0.11 0.09 0.60
B1 HME 0.52 0.16 0.15 0.17
SUB 0.20 0.61 0.10 0.09
PRI 0.18 0.10 0.60 0.11
PUB 0.21 0.10 0.09 0.60
B2 HME 0.60 0.13 0.15 0.12
SUB 0.17 0.61 0.12 0.10
PRI 0.18 0.13 0.58 0.11
PUB 0.17 0.13 0.14 0.57
C0 HME 0.46 0.15 0.18 0.21
SUB 0.18 0.61 0.09 0.12
PRI 0.20 0.10 0.60 0.11
PUB 0.21 0.09 0.12 0.59
C1 HME 0.48 0.16 0.13 0.23
SUB 0.15 0.60 0.12 0.13
PRI 0.18 0.12 0.58 0.12
PUB 0.18 0.10 0.11 0.61
C2 HME 0.54 0.14 0.12 0.20
SUB 0.16 0.62 0.11 0.12
PRI 0.16 0.11 0.60 0.13
PUB 0.19 0.12 0.11 0.59
FR0 HME 0.48 0.16 0.16 0.20
SUB 0.16 0.61 0.10 0.12
PRI 0.17 0.11 0.59 0.13
PUB 0.16 0.10 0.12 0.62
FR1 HME 0.44 0.19 0.17 0.21
SUB 0.18 0.59 0.10 0.13
PRI 0.16 0.10 0.61 0.13
PUB 0.16 0.10 0.11 0.63
#knitr::kable(
#  transitions_all,
#  caption = "Transition Matrix by Scenario (Wide)",
#  align = "c",
#  digits = 2,
#) %>%
#  format = "html"
#  kableExtra::kable_styling(
#    bootstrap_options = c("striped", "hover", "condensed"),
#    full_width = FALSE,
#    font_size = 13
#  ) %>%
#  kableExtra::collapse_rows(
#    columns = 1:2,    # scenario_id + state_t
#    valign = "top"
#  )
























# Heatmap per scenario & gender
p_trans <- ggplot(transitions_all,
                  aes(x = state_t1, y = state_t, fill = share)) +
  geom_tile() +
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            size = 3) +
  scale_x_discrete(limits = state_levels) +
  scale_y_discrete(limits = rev(state_levels)) +
  facet_grid(g ~ scenario_id) +
  scale_fill_viridis_c(option = "E") + 
  labs(title = "t → t+1 Employment Transitions by Scenario and Gender",
       x = "State at t+1", y = "State at t", fill = "Share") +
  theme_minimal()
#print(p_trans)
p_trans_clean <- ggplot(transitions_all,
    aes(x = state_t1, y = state_t, fill = share)) +
  geom_tile(color = "grey80") +
  scale_fill_viridis_c(option = "C", direction = -1,
                       limits = c(0, 1),
                       guide = guide_colorbar(barheight = 8)) +
  geom_text(aes(label = scales::percent(share, accuracy = 1)),
            color = "white", fontface = "bold", size = 4) +
  scale_x_discrete(limits = state_levels) +
  scale_y_discrete(limits = rev(state_levels)) +
  facet_grid(g ~ scenario_id) +
  labs(title = "t → t+1 Employment Transitions",
       x = "State at t+1", y = "State at t") +
  theme_minimal(base_size = 13) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.spacing = unit(1, "lines")
  )

print(p_trans_clean)
library(gt)

transitions_table <- transitions_wide |>
  gt(groupname_col = "scenario_id", rowname_col = "state_t") |>
  fmt_percent(columns = -c(scenario_id, g, state_t), decimals = 1) |>
  tab_spanner(label = "States at t+1",
              columns = state_levels) |>
  tab_stubhead(label = "State at t") |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_stub()
  ) |>
  tab_options(
    table.font.size = 14,
    data_row.padding = px(3),
    table.width = pct(100)
  ) |>
  opt_row_striping()

print(transitions_table)
p_bubble <- ggplot(transitions_all,
    aes(x = state_t1, y = state_t)) +
  geom_point(aes(size = share, fill = share), shape = 21, color = "black") +
  scale_size(range = c(1, 20)) +
  scale_fill_viridis_c(option = "D") +
  geom_text(aes(label = scales::percent(share, 1)), vjust = -1, size = 3.5) +
  facet_grid(g ~ scenario_id) +
  theme_minimal(base_size = 13) +
  labs(title = "Transition Matrix (Bubble Plot)",
       x = "State at t+1", y = "State at t", size = "Share")

print(p_bubble)

(ii) Employment Transitions by Kids-Change: Baseline vs. S4. Only Fem.

# Build transitions with fertility-change bins

#selected_scenarios <- c("B0", "B1", "B2", "C0", "C1","C2")
#selected_scenarios <- c("BASELINE", "S3_offers", "S4_fertility", "S5_offers_fertility")
# assumes: selected_scenarios, state_levels, panel_all, and helpers add_states_and_kids/summarise_transitions exist
# we'll define a slim variant that builds the 2-bin flag and then reuses the same summarise logic

add_states_and_kids2 <- function(df) {
  df %>%
    arrange(t, .by_group = TRUE) %>%
    mutate(
      state_t  = factor(ifelse(is.na(sector), "HME", as.character(sector)), levels = state_levels),
      state_t1 = dplyr::lead(state_t),
      F_bin_t  = ifelse(F >= 1, "kids", "no"),
      F_bin_t1 = dplyr::lead(F_bin_t),
      kid_change2 = dplyr::case_when(
        F_bin_t == "no"   & F_bin_t1 == "no"   ~ "no-change",
        F_bin_t == "kids" & F_bin_t1 == "kids" ~ "no-change",
        F_bin_t == "no"   & F_bin_t1 == "kids" ~ "change",
        TRUE ~ NA_character_
      ),
      # explicitly set the factor order
      kid_change2 = factor(kid_change2, levels = c("no-change", "change"))
    )
}

summarise_transitions2 <- function(df) {
  df %>%
    dplyr::ungroup() %>%
    dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
    dplyr::count(scenario_id, g, kid_change2, state_t, state_t1, name = "n") %>%
    tidyr::complete(scenario_id, g, kid_change2, state_t, state_t1, fill = list(n = 0)) %>%
    dplyr::mutate(
      state_t  = factor(state_t,  levels = state_levels),
      state_t1 = factor(state_t1, levels = state_levels)
    ) %>%
    dplyr::group_by(scenario_id, g, kid_change2, state_t) %>%
    dplyr::mutate(share = n / sum(n)) %>%
    dplyr::ungroup()
}



# ---- build transitions (women in the two scenarios) with 2-bin fertility flag
transitions_kid2 <- panel_all %>%
  dplyr::filter(scenario_id %in% scenario_list, g == "fem") %>%
  dplyr::group_by(scenario_id, g, i) %>%
  add_states_and_kids2() %>%
  summarise_transitions2()

# ---- heatmap
#p_trans_kid2 <- ggplot(transitions_kid2,
#  aes(x = state_t1, y = state_t, fill = share)) +
#  geom_tile() +
#  geom_text(aes(label = scales::percent(share, accuracy = 1)), size = 3) +
#  scale_x_discrete(limits = state_levels) +
#  scale_y_discrete(limits = rev(state_levels)) +
#  scale_fill_viridis_c(option = "E") + 
#  facet_grid(scenario_id ~ g + kid_change2) +  # scenarios as rows; columns = gender × change bin
#  labs(title = "t → t+1 Employment Transitions by Scenario, Gender (only Fem), and Kids-Change (2 bins)",
#       x = "State at t+1", y = "State at t", fill = "Share") +
#  theme_minimal()
#print(p_trans_kid2)



library(kableExtra)


to_wide2 <- function(trans_long) {
  trans_long %>%
    tidyr::pivot_wider(
      id_cols = c(scenario_id, g, kid_change2, state_t),
      names_from = state_t1,
      values_from = share,
      values_fill = 0
    ) %>%
    dplyr::arrange(scenario_id, g, kid_change2, state_t)
}
# ---- wide matrix (optional)
transitions_kid2_wide <- to_wide2(transitions_kid2)
#print(transitions_kid2_wide, n = 20)


# Remove gender and reorder rows as requested
transitions_kid2_wide_fem <- transitions_kid2_wide %>%
  dplyr::filter(g == "fem") %>%             # keep only females
  dplyr::select(-g) %>%                     # drop gender column
  dplyr::arrange(
    factor(scenario_id, levels = scenario_list),
    factor(state_t,    levels = c("HME","SUB","PRI","PUB")),
    factor(kid_change2, levels = c("no-change","change"))
  )

knitr::kable(
  transitions_kid2_wide_fem,
  caption = "Transition Matrix by Scenario × Kids-Change (Fem)",
  align = "c",
  digits = 2,
  format = "html"
) %>%
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    font_size = 13
  ) %>%
  kableExtra::collapse_rows(
    columns = 1:2,    # scenario_id + state_t
    valign = "top"
  )
Transition Matrix by Scenario × Kids-Change (Fem)
scenario_id kid_change2 state_t HME SUB PRI PUB
B0 no-change HME 0.47 0.18 0.17 0.18
change HME 0.53 0.12 0.16 0.19
no-change SUB 0.21 0.60 0.09 0.10
change SUB 0.18 0.70 0.06 0.06
no-change PRI 0.22 0.10 0.59 0.09
change PRI 0.22 0.06 0.66 0.06
no-change PUB 0.21 0.11 0.09 0.60
change PUB 0.14 0.14 0.14 0.59
B1 no-change HME 0.52 0.16 0.16 0.17
change HME 0.46 0.18 0.12 0.24
no-change SUB 0.20 0.60 0.10 0.09
change SUB 0.09 0.68 0.09 0.14
no-change PRI 0.19 0.10 0.60 0.11
change PRI 0.16 0.10 0.63 0.12
no-change PUB 0.20 0.10 0.09 0.60
change PUB 0.27 0.09 0.09 0.55
B2 no-change HME 0.61 0.12 0.15 0.12
change HME 0.54 0.19 0.14 0.13
no-change SUB 0.17 0.61 0.12 0.10
change SUB 0.23 0.60 0.10 0.07
no-change PRI 0.18 0.12 0.59 0.11
change PRI 0.10 0.21 0.52 0.17
no-change PUB 0.17 0.13 0.14 0.57
change PUB 0.14 0.11 0.18 0.57
C0 no-change HME 0.46 0.15 0.18 0.21
change HME 0.44 0.17 0.22 0.17
no-change SUB 0.18 0.62 0.09 0.11
change SUB 0.21 0.54 0.11 0.14
no-change PRI 0.20 0.09 0.60 0.11
change PRI 0.21 0.14 0.54 0.11
no-change PUB 0.21 0.09 0.11 0.59
change PUB 0.23 0.09 0.17 0.51
C1 no-change HME 0.48 0.16 0.13 0.22
change HME 0.52 0.15 0.03 0.30
no-change SUB 0.16 0.60 0.11 0.13
change SUB 0.08 0.69 0.15 0.08
no-change PRI 0.18 0.12 0.58 0.12
change PRI 0.21 0.08 0.59 0.13
no-change PUB 0.18 0.10 0.11 0.61
change PUB 0.23 0.11 0.03 0.63
C2 no-change HME 0.53 0.14 0.13 0.20
change HME 0.57 0.10 0.10 0.24
no-change SUB 0.16 0.62 0.11 0.12
change SUB 0.03 0.58 0.19 0.19
no-change PRI 0.16 0.11 0.60 0.13
change PRI 0.12 0.12 0.68 0.08
no-change PUB 0.19 0.12 0.11 0.59
change PUB 0.15 0.15 0.15 0.55
FR0 no-change HME 0.48 0.16 0.16 0.20
change HME 0.33 0.15 0.26 0.26
no-change SUB 0.16 0.61 0.10 0.12
change SUB 0.14 0.54 0.14 0.17
no-change PRI 0.17 0.11 0.59 0.13
change PRI 0.06 0.11 0.64 0.19
no-change PUB 0.16 0.10 0.12 0.62
change PUB 0.20 0.17 0.06 0.57
FR1 no-change HME 0.44 0.19 0.17 0.20
change HME 0.39 0.12 0.20 0.29
no-change SUB 0.18 0.59 0.10 0.13
change SUB 0.10 0.58 0.19 0.13
no-change PRI 0.16 0.11 0.61 0.12
change PRI 0.15 0.03 0.68 0.15
no-change PUB 0.16 0.09 0.11 0.63
change PUB 0.18 0.13 0.16 0.53

(iii) Staying hazards by sector

# --- Staying hazards by sector × (kid-change 2 bins) × scenario × gender
haz_kid2 <- panel_all %>%
  dplyr::filter(scenario_id %in% scenario_list) %>%
  dplyr::group_by(scenario_id, g, i) %>%
  add_states_and_kids2() %>%
  dplyr::ungroup() %>%
  dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
  # focus on employed origins only (SUB/PRI/PUB). Drop H unless you want "stay at home" hazards
  dplyr::filter(state_t %in% sectors) %>%
  dplyr::group_by(scenario_id, g, kid_change2, sector = state_t) %>%
  dplyr::summarise(
    stay_hazard = mean(state_t1 == sector),
    n = dplyr::n(),
    .groups = "drop"
  ) %>%
  dplyr::mutate(sector = factor(sector, levels = sectors))



# --- Filter for women, public sector only ---
haz_pub <- haz_kid2 %>%
  filter(g == "fem", sector == "PUB") %>%
  mutate(
    scenario_id = factor(scenario_id, levels = scenario_list),
    kid_change2 = factor(kid_change2, levels = c("no-change", "change"))
  )

# --- Nice spacing: create a composite x variable with spacing between scenarios ---
# One way: treat scenario as x-axis, kid_change2 as "fill"/dodged subgroup
# That automatically generates: S0(no-change/change), S1(no-change/change), ...
p_pub <- ggplot(haz_pub, aes(x = scenario_id, y = stay_hazard,
                             fill = kid_change2)) +
  geom_col(position = position_dodge(width = 0.7), width = 0.6) +
  scale_fill_manual(
    values = c("no-change" = "gray", "change" = "pink"),
    labels = c("No change", "No Kids → Kids")
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "Staying Hazard in the Public Sector (PUB)\nWomen Only, by Scenario and Kids-Change",
    subtitle = "Two bars per scenario: no-change vs kids-change",
    x = "Scenario",
    y = "Pr(stay in Public sector at t+1)",
    fill = "Fertility transition"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = 12),
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 13),
    legend.position = "right"
  )

print(p_pub)

(iv) Within-sector log-wage densities (by gender) — pins down β0 vs Σ_g

## Compute means for each sector × scenario × gender
mean_df <- wages_all |>
  dplyr::group_by(scenario_id, sector, g) |>
  dplyr::summarise(mean_logW = mean(logW, na.rm = TRUE), .groups = "drop")

p1 <- ggplot(wages_all, aes(x = logW, color = g, fill = g)) +
  
  ## Density with semi-transparent fill
  geom_density(alpha = 0.15, linewidth = 0.9) +
  
  ## Vertical dashed lines at the mean
  geom_vline(
    data = mean_df,
    aes(xintercept = mean_logW, color = g),
    linetype = "dashed",
    linewidth = 0.7,
    alpha = 0.8
  ) +
  
  ## Muted academic colors
  scale_color_manual(
    values = c(
      "men" = "#4E79A7",
      "fem" = "darkgrey"
    )
  ) +
  scale_fill_manual(
    values = c(
      "men" = "#4E79A7",
      "fem" = "darkgrey"
    )
  ) +
  
  facet_grid(
    sector ~ scenario_id,
    scales = "free_y",
    switch = "y"
  ) +
  
  labs(
    title = "Log-Wage Density by Sector and Scenario",
    x = "Log wage",
    y = "Density",
    color = "Gender",
    fill = "Gender"
  ) +
  
  theme_minimal(base_size = 14) +
  theme(
    strip.background = element_rect(fill = "grey90", color = NA),
    strip.text = element_text(face = "bold", size = 12),
    
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "grey85", linewidth = 0.3),
    
    axis.text = element_text(size = 11),
    axis.title = element_text(size = 13),
    
    legend.position = "bottom",
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 11),
    
    plot.title = element_text(size = 16, face = "bold", hjust = 0)
  )

print(p1)

## Wage distributions by sector × gender × fertility × scenario (pooled)
#wages_pool <- panel_all |>
#  dplyr::filter(!is.na(sector), !is.na(logW)) |>
#  dplyr::mutate(F_bin = ifelse(F >= 1, "kids>=1", "no kids"),
#                gF    = paste(g, F_bin),
#                sector = to_factor(sector))
#p_dens <- ggplot(wages_pool, aes(x = logW, color = gF)) +
#  geom_density() +
#  facet_grid(sector ~ scenario_id, scales = "free_y") +
#  labs(title = "Log-wage distributions by sector × scenario (pooled)",
#       x = "log wage", y = "density", color = "Gender × Fertility") +
#  theme_minimal()
#print(p_dens)

## Mean log-wages over time by sector × gender × fertility × scenario
#wage_means <- panel_all |>
#  dplyr::filter(!is.na(sector), !is.na(logW)) |>
#  dplyr::mutate(F_bin = ifelse(F >= 1, "kids>=1", "no kids"),
#                sector = to_factor(sector)) |>
#  dplyr::group_by(scenario_id, t, g, F_bin, sector) |>
#  dplyr::summarise(mean_logW = mean(logW), .groups = "drop")

#p_wmean <- ggplot(wage_means, aes(x = t, y = mean_logW, color = F_bin)) +
#  geom_line() +
#  facet_grid(rows = vars(g), cols = vars(scenario_id, sector)) +
#  labs(title = "Mean log wages over time by scenario",
#       x = "Period", y = "Mean log wage", color = "Fertility") +
#  theme_minimal()
#print(p_wmean)

(v) Wage transitions for stayers and movers

library(gt)

## 2) Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage
moves_keep <- moves_all |> dplyr::filter(orig %in% c("PUB","PRI"), dest %in% c("PUB","PRI"), orig!=dest)

#moves_plot <- moves_keep |>
#  dplyr::filter((orig == "PRI" & dest == "PUB") |
#                (orig == "PUB" & dest == "PRI")) |>
#  dplyr::mutate(move = paste(orig, dest, sep = "→"))

moves_plot <- moves_keep |>
  dplyr::filter((orig == "PRI" & dest == "PUB") ) |>
  dplyr::mutate(move = paste(orig, dest, sep = "→"))


p2 <- ggplot(moves_plot, aes(x = g, y = dlogW, fill = g)) +
  geom_hline(yintercept = 0, linetype = 2, linewidth = 0.4, color = "grey40") +
  geom_boxplot(outlier.alpha = 0.20, width = 0.6, color = "black") +
  facet_grid(move ~ scenario_id) +
  scale_fill_manual(values = c("men" = "#4E79A7", "fem" = "lightgray")) +
  labs(
    title = "Mover Gains (Δ log wage) by Origin→Destination, Gender, and Scenario",
    x = "Gender",
    y = "Δ log wage",
    fill = "Gender"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    strip.text = element_text(face = "bold", size = 13),
    panel.grid.minor = element_blank(),
    panel.grid.major.x = element_blank(),
    axis.text.x = element_text(size = 12),
    legend.position = "bottom",
    plot.title = element_text(size = 16, face = "bold")
  )
print(p2)

## 3) Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)
rr_table <- rr_all |>
  dplyr::arrange(g, scenario_id) |>
  gt(rowname_col = "scenario_id", groupname_col = "g") |>
  tab_header(
    title = md("**Rank–Rank Slope for PUB → PRI Movers**"),
    subtitle = md(
      "A lower slope indicates stronger comparative advantage: \
      it measures how strongly a worker’s within-PUB sector rank \
      predicts their rank in PRIVATE sector upon moving."
    )
  ) |>
  fmt_number(columns = dplyr::everything(), decimals = 3) |> ## Format slope values nicely
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_stub() ## Styling
  ) |>
  opt_row_striping() |>
  tab_options(
    table.width = pct(100),
    data_row.padding = px(4),
    table.font.size = 13
  )
rr_table
Rank–Rank Slope for PUB → PRI Movers
A lower slope indicates stronger comparative advantage: it measures how strongly a worker’s within-PUB sector rank predicts their rank in PRIVATE sector upon moving.
rank_slope scenario
fem
B0 0.014 B0
B1 0.114 B1
B2 0.120 B2
C0 −0.212 C0
C1 0.066 C1
C2 0.064 C2
FR0 0.008 FR0
FR1 0.053 FR1
## 4) Stayer–leaver residual gap in G (higher gap ⇒ stronger selection on ε in G)
sl_table <- sl_all |>
  dplyr::arrange(g, scenario) |>
  gt(rowname_col = "scenario", groupname_col = "g") |>
  tab_header(
    title = md("**Stayer–Leaver Residual Gap in Origin Sector PUB**"),
    subtitle = md(
      "Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: \
      ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector."
    )
  ) |>
  fmt_number(columns = c(stayer, leaver, gap), decimals = 3) |> ## Format columns nicely
  opt_row_striping() |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_stub() ## Styling
  ) |>
  tab_options(
    table.width = pct(100),
    table.font.size = 13,
    data_row.padding = px(4)
  )
sl_table
Stayer–Leaver Residual Gap in Origin Sector PUB
Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector.
scenario_id leaver stayer gap
fem
B0 B0 −0.040 0.006 0.046
B1 B1 −0.007 0.005 0.013
B2 B2 0.011 0.002 −0.008
C0 C0 −0.013 −0.002 0.011
C1 C1 0.008 −0.008 −0.016
C2 C2 0.024 −0.010 −0.033
FR0 FR0 0.005 0.003 −0.003
FR1 FR1 0.008 −0.001 −0.008
## Wage transitions for consecutive employment (by scenario)
transitions <- panel_all |>
  filter(g == "fem") %>%
  dplyr::group_by(scenario_id, i) |>
  dplyr::arrange(t, .by_group = TRUE) |>
  dplyr::mutate(next_logW   = dplyr::lead(logW),
                next_sector = dplyr::lead(sector),
                next_F      = dplyr::lead(F),
                employed_next = dplyr::lead(employed)) |>
  dplyr::ungroup() |>
  dplyr::filter(employed == 1, employed_next == 1) |>
  dplyr::mutate(dlogW = next_logW - logW,
                F_trans = dplyr::case_when(
                  F == 0 & next_F == 0   ~ "no kids -> no kids",
                  F == 0 & next_F >= 1   ~ "no kids -> kids",
                  F >= 1 & next_F >= 1   ~ "kids -> kids",
                  F >= 1 & next_F == 0   ~ "kids -> no kids"
                ))

dlogW_stats <- transitions |>
  dplyr::mutate(orig = sector, dest = next_sector) |>
  dplyr::group_by(scenario_id, g, orig, dest, F_trans) |>
  dplyr::summarise(mean_dlogW = mean(dlogW, na.rm = TRUE),
                   n = n(), .groups = "drop") |>
  dplyr::mutate(orig = to_factor(orig), dest = to_factor(dest)) |>
  dplyr::arrange(scenario_id, g, orig, dest, F_trans)

print(head(dlogW_stats, 20))
## # A tibble: 20 × 7
##    scenario_id g     orig  dest  F_trans            mean_dlogW     n
##    <chr>       <chr> <fct> <fct> <chr>                   <dbl> <int>
##  1 B0          fem   SUB   SUB   kids -> kids          0         146
##  2 B0          fem   SUB   SUB   no kids -> kids       0          23
##  3 B0          fem   SUB   SUB   no kids -> no kids    0         378
##  4 B0          fem   SUB   PRI   kids -> kids          0.0325     30
##  5 B0          fem   SUB   PRI   no kids -> kids      -0.156       2
##  6 B0          fem   SUB   PRI   no kids -> no kids    0.0298     52
##  7 B0          fem   SUB   PUB   kids -> kids         -0.00170    17
##  8 B0          fem   SUB   PUB   no kids -> kids       0.286       2
##  9 B0          fem   SUB   PUB   no kids -> no kids   -0.0326     66
## 10 B0          fem   PRI   SUB   kids -> kids          0.0546     28
## 11 B0          fem   PRI   SUB   no kids -> kids      -0.0872      2
## 12 B0          fem   PRI   SUB   no kids -> no kids    0.0111     62
## 13 B0          fem   PRI   PRI   kids -> kids          0         143
## 14 B0          fem   PRI   PRI   no kids -> kids       0          21
## 15 B0          fem   PRI   PRI   no kids -> no kids    0         377
## 16 B0          fem   PRI   PUB   kids -> kids          0.00922    23
## 17 B0          fem   PRI   PUB   no kids -> kids      -0.0231      2
## 18 B0          fem   PRI   PUB   no kids -> no kids   -0.0138     56
## 19 B0          fem   PUB   SUB   kids -> kids          0.0895     26
## 20 B0          fem   PUB   SUB   no kids -> kids      -0.0436      5
p_dlog <- dlogW_stats |>
  dplyr::filter(n >= 2) |>
  ggplot(aes(x = interaction(orig, dest, sep = "→"),
             y = mean_dlogW, fill = F_trans)) +
  geom_col(position = "dodge") +
  facet_grid(rows = vars(g), cols = vars(scenario_id)) +
  labs(title = "Mean Δ log wage for consecutive employment by scenario",
       x = "Origin → Destination sector", y = "Mean Δ log wage",
       fill = "Fertility transition") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p_dlog)

(vi) Emp and Wage Profiles

## --- Sector shares over time by gender × scenario ---
shares <- panel_all |>
  dplyr::filter(!is.na(sector)) |>
  dplyr::count(scenario_id, t, g, sector, name = "n") |>
  dplyr::group_by(scenario_id, t, g) |>
  dplyr::mutate(share = n / sum(n)) |>
  dplyr::ungroup()

p_shares <- ggplot(shares, aes(x = t, y = share, color = g)) +
  geom_line() +
  facet_grid(sector ~ scenario_id, scales = "free_y") +
  labs(title = "Employment shares over time by sector × scenario",
       x = "Period", y = "Share", color = "Gender") +
  theme_minimal()

print(p_shares)

## --- Log wages by time × sector × scenario × gender ---
logwages <- panel_all |>
  dplyr::filter(!is.na(sector), !is.na(w)) |>
  dplyr::mutate(logw = log(w)) |>
  dplyr::group_by(scenario_id, t, g, sector) |>
  dplyr::summarise(mean_logw = mean(logw), .groups = "drop") |>
  dplyr::mutate(sector = factor(sector, levels = sectors))

p_logw <- ggplot(logwages, aes(x = t, y = mean_logw, color = g)) +
  geom_line() +
  facet_grid(sector ~ scenario_id, scales = "free_y") +
  labs(
    title = "Mean log wages over time by sector × scenario",
    x = "Period",
    y = "Mean log wage",
    color = "Gender"
  ) +
  theme_minimal()

print(p_logw)
#How does the model estimation do with tihs niformation or how does it interpret it. A big female mean loss moving from public to private (e.g. -0.46 vs men's -0.09) could come from either a larger price disadvantage in P for women, or stronger negative ($\varepsilon_{iP}- \varepsilon_{iG}$) among the women who move. What separate prices from comparative advantage: To separate them, we use the following: *


#*In order to understand the factors that give rise to these observed gender differences (ie distinguish the roles of price gaps ($\Delta \beta_{0,g}$) vs. comparative advantage/endowment st ructure ($\varepsilon$ means/variances/covariances) ), we use many data moments at once, not just the mean wages or mean log wage differences for stayers and movers. In the state-dependent offer model specified above, each block pushes different patterns in the data:*