1 MODEL SETUP

Time is discrete, \(t=0,1,\dots,T\). Each individual \(i\) has gender \(g\in\{m,f\}\) and education \(e\). There are 3 employment sectors, \[ \mathcal S \equiv \{\text{Subsistence }(SUB),\ \text{Private }(PRI),\ \text{Public }(PUB)\}, \] and a home alternative \(H\) (non-employment).

The state at time \(t\) for individual \(i\) is: \[ \Xi_{it}=\Big(g,e,\ r_{it},\ \mathbf{\varepsilon}_i,\ F_{it},\ M_{it}\Big), \] where:

Assume \(F_{it}\) evolves exogenously via a Markov process \(Q(F_{t+1}\mid F_t)\) that is excluded from wages and excluded from offer arrival (it only affects utility).

1.1 Wages (Fixed Conditional on Permanent Skills)

If \(i\) works in sector \(s\in\mathcal S\) at \(t\), the (log) wage is: \[ \boxed{ \ln W_{it}^g(s) = \beta_{0,g}(s)\;+\;\beta_1\,\text{educ}_i\;\;+\;\varepsilon_{is}. } \] Excluding work experience for now.

  • \(\beta_{0,g}(s)\): skill price (gender- and sector-specific; may capture discrimination).

  • \(\beta_1\): return to education (common).

  • \(\beta_2\): return to sector-specific experience if we have experience (common).

  • \(\varepsilon_{is}\): permanent sector ability (drawn at entry; no transitory wage shocks; no new wage draws).

Distribution of permanent skills. For tractability and mover-based identification, assume a multivariate Normal by gender: \[ \mathbf{\varepsilon}_i \mid g \sim \mathcal N\big(\mathbf{0},\ \Sigma_g\big),\qquad \Sigma_g= \begin{pmatrix} \sigma^2_{1,g} & \rho_{12,g} \,\, \sigma_{1,g}\,\, \sigma_{2,g} & \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} \\ \rho_{12,g} \,\, \sigma_{1,g} \,\, \sigma_{2,g} & \sigma^2_{2,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g}\\ \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g} & \sigma^2_{3,g} \end{pmatrix}. \]

(Alternative: independent Fréchet components; results below do not require closed-form shares.)

For \(\Sigma_g\) to be a positive semidefinite (ie a valid covariance matrix), the corr matrix must also satisfy \[ 1 + 2 \, \rho_{12,g}\, \rho_{13,g}\, \rho_{23,g} - \, \rho_{12,g}^2 - \, \rho_{13,g}^2 - \, \rho_{23,g}^2 \] So for various pars values, make sure you check this condition.

1.2 Offers: One Sector per Period, Home Always Available

Each period, at most one employment offer arrives. Let \(O_{it}\in\{S,G,P,\varnothing\}\) denote the sector offered at \(t\) (or \(\varnothing\) if no offer). Define arrival probabilities conditional on observables \(X_{it}\) (age, past sector, etc.):

\[ \Pr\big(O_{it}=s\mid X_{it},g\big)=\lambda^g_s(X_{it}),\quad s\in\{S,G,P\},\qquad \Pr(O_{it}=\varnothing\mid X_{it},g)=1-\sum_{s}\lambda^g_s(X_{it}). \] Key friction: If \(r_{it}=s\) (currently attached to \(s\)), the agent can continue in sector \(s\) only if \(O_{it}=s\). Otherwise, they can’t remain employed in \(s\) this period.

Home \(H\) is always available (no offer needed).

In other words, agents compare at most one employment sector vs home choice each period.

1.3 Preferences

Let per-period utility be quasi-separable in (log) wages and non-pecuniary sector tastes shifted by fertility (F) and marital status (M):

\[ u_{it}(s) = \beta_u\,\ln W_{it}^g(s) \;+\; \gamma_s^g\,F_{it} \;+\; \delta_s^g\,M_{it},\qquad s\in\{SUB,PRI,PUB\}, \] with a home utility \(u_{it}(H)= b_0 \;+\; \gamma_H\,F_{it} \;+\; \delta_H\,M_{it},\qquad b_0\in\mathbb R.\)

  • \(\beta_u>0\) scales log-wage into utils.

  • \(\gamma_s\) and \(\delta_s\) are sector-specific preference shifters and they differ by gender (exogenous; do not affect wages or offers).

  • No additional idiosyncratic taste shocks are included (only permanent wage heterogeneity). This focuses identification on fertility as an exclusion.

1.4 Dynamic Program

Timing Within Period: At the start of period \(t\) given state \(\Xi_{it}\):

  1. Offer draw: \(O_{it}\in\{S,G,P,\varnothing\}\) is realized from \(\{\lambda^g_s(X_{it})\}\).

  2. Choice set: \(\mathcal C_{it}=\{H\}\cup\{O_{it}\}\,\) (home plus the one offered sector if any).

  3. Decision: choose \(d_{it}\in\mathcal C_{it}\) to maximize current utility plus continuation.

  4. Experience update given choice; then \(F_{it}\) evolves to \(F_{it+1}\sim Q(\cdot\mid F_{it})\).

Let \(V_t(\Xi)\) be the value function. Define the sector-specific continuation value if offered \(s\):

\[ \mathcal V_t(s;\Xi) \equiv u_t(s) \;+\; \beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=s\big], \]

and the home continuation value: \[ \mathcal V_t(H;\Xi) \equiv u_t(H)\;+\;\beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=H\big], \] with discount factor \(\beta\in(0,1)\).

Given the one-offer constraint, \[ V_t(\Xi) \;=\; \underbrace{\Pr(O_t=\varnothing\mid X,g)}_{\text{no offer}} \cdot \max\{\mathcal V_t(H;\Xi)\} \;+\; \sum_{s\in\{S,G,P\}} \underbrace{\Pr(O_t=s\mid X,g)}_{\lambda^g_s(X)} \cdot \max\{\mathcal V_t(H;\Xi),\ \mathcal V_t(s;\Xi)\}. \] The law of motion inside the expectations updates experience per choice and draws \(F_{t+1}\) exogenously.

stayers vs. movers:

If \(r_t=s\) and \(O_t=s\), the agent may stay in \(s\) by comparing \(\mathcal V_t(s;\Xi)\) to \(\mathcal V_t(H;\Xi)\).

If \(O_t\ne s\) (or \(O_t=\varnothing\)), staying in \(s\) is infeasible; the agent compares only \(H\) and the offered \(O_t\) (if any).

Movers are observed when \(O_t=s'\ne r_t\) and \(\mathcal V_t(s';\Xi)\ge \mathcal V_t(H;\Xi)\).

1.5 Normalizations and Interpretability

To interpret gender wedges as skill price differences:

  • Fix a base sector \(s_0\) (e.g., Private) and set \(\beta_{0,m}(s_0)=0\).

  • Center permanent skills by gender–sector: \(\mathbb E[\varepsilon_{is}\mid g]=0\).
    Then \(\beta_{0,f}(s)-\beta_{0,m}(s)\) is a price/discrimination wedge in sector \(s\).

  • Optionally fix \(b_0=0\) or \(\beta_u=1\) for utility scale.

Observables and Moments: Given panel data on \(\{s_{it},\ \ln W_{it}\ \text{if } s_{it}\in\mathcal S,\ F_{it},\ e,\ \mathbf{x}_{it}\}\).

Define moments (by gender and also by education/experience if applicable):

  • Sector shares: \(\pi_{gs}(t)=\Pr(s_{it}=s\mid g,t)\).

  • Transitions: \(P_g(s'|s)=\Pr(s_{it+1}=s'\mid s_{it}=s,g)\).

  • Within-sector wages: \(\bar w_{gs}(t)=\mathbb E[\ln W_{it}\mid s_{it}=s,g,t]\), and dispersion \(\sigma^2_{gs}(t)\).

  • Mover wage gaps: \(\mathbb E[\Delta \ln W_{i} \mid r_t=s,\ s_{t+1}=s',\ g]\).

  • Fertility responses: event-study \(\Delta \pi_{gs}\) and \(\Delta P_g(s'|s)\) around \(\Delta F_{it}\).

2 Identification (Heuristic)

2.1 a. Sector Skill Prices by Gender: \(\beta_{0,g}(s)\) (relative)

Using movers and the one-offer structure:

\[ \mathbb E\big[\ln W_{i}(s') - \ln W_{i}(s)\mid r_t=s,\ s_{t+1}=s',\ g\big] = \underbrace{\beta_{0,g}(s')-\beta_{0,g}(s)}_{\text{price difference}} + \beta_2\,\Delta x_i + \underbrace{\mathbb E[\varepsilon_{is'}-\varepsilon_{is}\mid \text{move } s\!\to\! s']}_{\text{selection term}}, \]

where the selection term arises because movers are selected on \(\mathbf{\varepsilon}\) given the acceptance rule and offers.

Given \(\Sigma_g\) (below) and the offer process (below), the selection term is pinned down in the simulated model; the residual identifies \(\beta_{0,g}(s')-\beta_{0,g}(s)\) up to the base-sector normalization.

Gender wedges. With centered \(\varepsilon\), the cross-gender difference \(\beta_{0,f}(s)-\beta_{0,m}(s)\) is interpretable as discrimination (skill price) in sector \(s\).

2.2 b. Permanent Skill Distribution: \(\Sigma_g\)

  • Within-sector residual dispersion of \(\ln W\) (net of \(\beta_1,\beta_2\)) identifies variances \(\sigma^2_{gs}\).

  • Mover patterns (who moves where; gains upon moving) identify covariances \(\rho_{g,ss}\): strong positive correlation implies individuals who are strong in \(s\) are also strong in \(s'\) (affecting acceptance when offered \(s'\)).

  • \(\Sigma_g\) cannot mimic within-person, event-timed changes tied to fertility (see below), which helps separate it from \(\gamma\).

2.3 c. Fertility Taste Shifters: \(\gamma_s\) (Exclusion)

Fertility \(F_{it}\) enters utility only (not wages/offers). Around fertility events:

  • Shares and transitions shift due to acceptance, not arrival:

\[ \frac{\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k+1)} {\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k)} \approx \frac{\text{acceptance}(s'|F=k+1)}{\text{acceptance}(s'|F=k)}. \]

  • These within-person changes identify \(\gamma_s\) because \(\Sigma_g\) is permanent and \(\lambda^g_s(\cdot)\) is excluded from \(F\).

If public-sector transitions and shares jump for women at childbirth while wages/offer proxies do not, then \(\gamma_G>0\) (family-friendly public sector) is identified—separate from \(\beta_{0,g}(s)\) and \(\Sigma_g\).

2.4 d. Offer Process: \(\lambda^g_s(X)\)

Given \(\gamma\) (acceptance shifts at fertility) and wage-side parameters, the levels of transitions identify the arrival probabilities:

\[ \Pr(s_{t+1}=s'\mid s_t=s) = \underbrace{\lambda^g_{s'}(X_{t+1})}_{\text{arrival}} \times \underbrace{\Pr(\text{accept } s'\mid O_{t+1}=s',\Xi_{t+1};\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g)}_{\text{acceptance}}. \]

Fertility-timed contrasts net out acceptance shifts, isolating \(\lambda^g_{s'}(\cdot)\) from the remaining variation in transitions.

2.5 e. Prices and Endowments

  • With appropriate normalization e.g. \(\mathbb E[\varepsilon_{is}\mid g]=0\), mean gender differences within a sector load on \(\beta_{0,f}(s)-\beta_{0,m}(s)\) (instead of \(\varepsilon\)).

  • Relative female advantage in public (vs private) can arise from either

    • higher \(\beta_{0,f}(PUB)-\beta_{0,m}(PUB)\) (skill price advantages), or

    • higher \(\sigma_{3,f}\) or favorable \(\rho_{23,f}\) (women’s endowment more aligned with \(PUB\) ie comparative advantages).

  • Will use mover wage gaps and fertility-timed acceptance shifts to separate these channels: \(\beta_0\) affects wage levels and mover gaps; \(\gamma\) affects acceptance around fertility; \(\Sigma_g\) affects dispersion and who moves, not event-timed shifts.

3 Estimation Method (SMM/SML Skeleton)

  1. Parameter blocks: \(\Theta=\{\beta_u,\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g,\lambda\}\) with normalizations (\(\beta_{0,m}(PRI)=0\), centered \(\varepsilon\)).

  2. Simulate offer draws \(O_{it}\) from \(\lambda^g_s(X_{it})\) over observed \(X_{it}\) paths.

  3. Solve choices period-by-period (myopic or with continuation via forward simulation) given \(F_{it}\), wages, and one-offer constraint.

  4. Match moments: sector shares, transitions (overall and by fertility states), within-sector wage means/variances, and mover wage gaps.

  5. Over-ID checks: out-of-sample moments (e.g., wage tails, acceptance at non-fertility events).

Possible counterfactuals to run using the estimated model:

4 Discussion (model mechanisms, identification, etc.)

This dynamic Roy-with-offers model hardwires the “one-offer-per-period” friction, forcing agents to compare one employment sector against home each period (home always feasible; staying in the same job requires a fresh offer). Wages depend only on permanent sector-specific skills and observables; there are no transitory wage shocks. Fertility is an observed, exogenous shifter of non-pecuniary sector utility only.

Identification separates:

The combination of these aspects of the data allows to separate the role of skill prices, skill comparative advantage and offer probabilities.

For example: The avg log wage difference for women who move from public to private is around -0.46 and -0.09 for women and men, respectively. Can use this information in conjunction with other aspects of the data. In order to distinguish the roles of skill price gaps vs. comparative advange/endowmentstructure, we use many data oments at once, nost jst te mean waegs ormean log wage differences for stayers and movers. In the state-depdendent offer model specified above, each block pushes differnet patterns in the data:

  1. Variance and shape of mover gains, not just the mean.

    • Price differences shift the mean of \(\Delta w\) almost uniformly, the variance and skew of \(\Delta w \mid s \rightarrow s',g\) are driven by the distribution of \(\varepsilon_{is'}- \varepsilon_{is}\) under selection.

    • Matching \(Var(\Delta w)\) by origin-destination-gender pins down \(\sigma_{gs}, \sigma_{gs'}\) and especially \(\rho_{g,ss'}\). A big mean gap with small variance points to \(\Delta \beta_{0,g}\); a big variance/left tail points to comparative advantage sorting.

  2. Rank-rank (or quantile-quantile) patterns for movers. - Regress post-move residual-wage rank in s’ on pre-move residual rank in s for movers of gender g. The slope identifies \(\rho_{g,ss'}\).

    • If prices drive the gap, the rank mapping is near 1 (high earners in PUB remain high in PRI).
    • If comparative advantage drives it, the slope flattens (or even inverts in tails): high PUB earners are not high in PRI.
  3. Stayer-leaver contrasts within origin sectors.

    • Compare residual wages (after removing \(\beta_{0,g}(s) + \beta_1 educ\) of stayers vs leavers from \(s\), by gender.

    • Strong negative selection of leavers (leavers much lower residuals in \(s\) than stayers) is evidence of comparative advantage. Pure price gaps do not create differential stayer-leaver sorting strength.

  4. Bidirectional flows: symmetry tests.

    • Jointly match \(PUB \rightarrow PRI\) and \(PRI \rightarrow PUB\) rates and gains by gender.

    • A price gap \(\Delta \beta_{0,g}(PRI,PUB)\) pushes both directions’ means (opposite sign, similar magnitude after accounting for selection).

    • Comparative advantage shows asymmetric selection and different dispersion across the two directions.

  5. Within-sector dispersion (pooled), by gender.

    • \(Var(\text{residual } lnW \mid s,g)\) pins down \(\sigma_{gs}^2\).

    • If women’s residual dispersion is notably different in \(PRI\) vs. \(PUB\), you can’t match both dispersions and the mover distributions with a pure price story; you’re forced to adjust \(\Sigma_g\) (including \(\rho_{23,g}\)).

  6. Event-time around fertility (the main exclusion):

    • Fertility shifts acceptance via \(\gamma_s\), not wages. After fertility events:
      • If sector shares shift and mean residual wages within sector change via composition, while mover gains’ mean doesn’t shift systematically, that’s a cceptance/selection, not prices.
      • Prices \(\beta_{0,g}\) are time-invariant here, so fertility timed movements help you attribute composition changes to \(\gamma\) and \(\Sigma_g\). not \(\beta_0\).
  7. State-dependent offer matrix moments.

    • The added tweak (offers depend on current sector) gives extra moments: staying hazards \(P(s \rightarrow s)\), directional flows \(P(PUB \rightarrow PRI)\) vs \(P(PRI \rightarrow PUB)\) by gender and fertility.

    • These depend strongly on \(\lambda_{s' \mid s,g}\) and \(\gamma\) but not on \(\beta_0\). Matching these pins down frictions and acceptance,leaving \(\beta_0\) to be identified by wage levels and mover means once selection is accounted for.

Practically, how does the estimator tell them apart?

In the structural SMM:

No single moment is decisive; the joint fit forces a decomposition: If you try to explain the women’s -0.46 mean with only a large \(\Delta \beta_{0,f}(PRI,PUB)\), you will misfit - the vairance of \(\Delta w\), - the mover rank-rank slope, (iii) stayer-leaver gaps, or (iv) the reverse \(PRI \rightarrow PUB\) flow so the optimizer shifts weight into \(\Sigma_f\) (comparative advantage) until all moments line up. Conversely, if you try to do it only with \(\Sigma_f\), you will miss within-sector mean levels and pooled wage gaps unless \(\beta_{0,f}(P)- \beta_{0,f}(G)\) adjusts.

Two concrete diagnostics:

Sources of gender differences and discussion of which moments move. What identifies prices \(\beta_0\), vs. endowments (\(\Sigma_g\)), and separates both from offers \(\lambda\) and tastes \(\gamma\). How to read the scenario results (ie what identifies what)

4.1 IDENTIFICATION OF GENDER DIFFERENCES

We want to distinguish (and stress-test potential observational equivalences between):

  1. Gender differences in sector-specific skill prices \(\beta_{0,g}(s)\) (e.g. a female public-sector premium).
  2. Gender differences in comparative advantage in permanent skills \(\Sigma_g\) (e.g. women have stronger PUB-specific advantage or lower cross-sector correlation).
  3. Gender differences in family-friendliness/child-related utility \(\gamma_g(s)\) (e.g. PUB becomes more attractive for women when kids arrive).
  4. Gender differences in offer arrival rates \(\lambda_{g}(s)\) (e.g. women receive relatively more PUB offers).

To identify each mechanism, we want scenarios where only one block of parameters differs by gender at a time, plus a few “confusability” scenarios that intentionally generate similar public-sector shares with different underlying channels.


5 MODEL SOUTION, SIMULATION, AND COUNTERFACTUALS

5.1 Baseline

  • Baseline can be chosen acccording to what comparisons are needed.

  • If baseline is chosen so that men and women are identical, it makes every other scenario interpretable as a one-dimensional deviation.

  • \(\beta_{0,m}(s)=\beta_{0,f}(s)\) for all \(s\).

  • \(\Sigma_m=\Sigma_f\) with moderate dispersion and zero correlation (no comparative advantage by construction).

  • \(\lambda_m=\lambda_f\) (identical offer rates).

  • \(\gamma_m=\gamma_f=\mathbf{0}\) (no family-friendliness).

  • If baseline is chosen so that they differ in various dimensions, then counterfactuals where each dimension is equated one at a time or in a sequence by adding each element in a sequence, then thecontribution each compoennt can be quantified.


5.2 Scenario list

5.2.1 Scenario CF1_price_gap

Pure “prices” (skill-price intercepts) channel

  • Skill prices are changed so that men have a PRI sector premium \(\beta_{0,m}(PRI) > \beta_{0,m}(s)\) for all s.

  • Skill prices are changed so that women have PUB sector premium \(\beta_{0,f}(PUB) > \beta_{0,f}(s)\) for all s.

  • All else identical to baseline

-Answers the following question: Can the model attribute higher female PUB employment AND a smaller PUB wage gap to a price difference?


5.2.2 Scenario CF2_disp

Can a higher upper tail of women’s PUB wages (and greater persistence) can be attributed to \(\sigma_{f}(\text{PUB})\).

Check by changing only women’s PUB skill dispersion (higher \(\sigma\) in PUB). Correlation is not canged

5.2.3 Scenario CF3_compadv

Answers the following question: Can sorting into PUB and PUB wage moments can be generated by selection on permanent skills without any price premium.

Check by changing only the correlation structure for women (stronger comparative advantage).

Dispersion of women’s sector-specific skill distribution remains unchanged Correlation between women’s skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)

5.2.4 Scenario CF4_strong_compadv

(We can also combine dispersion + negative correlation if we want even sharper comparative advantages. See below.) STRONG COMP ADV Dispersion of women’s sector-specific skill distribution for only the public sector (i.e. sigma) is increased Correlation between women’s skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)

``dispersion + negative correlation’’ produces STRONGER comparative advantage because: Comparative advantage in the Roy setup comes from relative rankings across sectors, not just levels. Two ingredients sharpen sorting:

  1. Higher dispersion in one sector (higher \(\sigma\))

    • \(\rightarrow\)creates a thicker right tail

    • \(\rightarrow\) more extreme specialists.

  2. Negative cross-sector correlation (\(\rho<0\))

    • \(\rightarrow\) being good in PUB implies being bad in PRI (and vice versa)

    • \(\rightarrow\) stronger specialization

    • \(\rightarrow\) stronger selection

    • \(\rightarrow\) lower mobility

    • \(\rightarrow\) steeper sorting moments.

  3. When we combine both:

    • \(\rightarrow\) more mass of workers who are extremely PUB-specializaed and unlikely to leave

    • \(\rightarrow\) stronger persistence, bigger stayer-leaver gap, flatter rank-rank slope.

Concrete parameter example

Men (benchmark, weak comparative advantage)

sig <- c(0.30, 0.30, 0.30)

rho <- c(0.00, 0.00, 0.00) # no correlation \(\rightarrow\) weak specialization

  • This gives similar dispersion in all sectors.

  • Skills independent \(\rightarrow\) people can be good everywhere \(\rightarrow\) weak sorting.

Women (strong comparative advantage in PUB)

sig <- c(0.30, 0.30, 0.50) # higher dispersion in PUB

rho <- c(-0.30, -0.40, -0.35) # negative correlations across sectors

  • \(\sigma(PUB)=0.50\) \(\rightarrow\) bigger right tail \(\rightarrow\) some people are very good in PUB.

  • \(\rho<0\) \(\rightarrow\) PUB-skill negatively related to PRI/SUB skill.

  • Workers who are in PUB tend to be weak in PRI \(\rightarrow\) strong specialization.

Compared to the ``equal \(\sigma\) and zero \(\rho\) case’’:

- PUB emp share (women) *rises slightly (more specialization)*. 

- PUB staying hazard *rises strongly*

- Stayer-leaver residual gap (PUB) *increases a lot*.

- Rank-rank slope PUB $\rightarrow$ PRI *falls sharply (stronger comparative adv.)*. 

- Mover gains aare *asymmetric, more dispersion*. 

- Wage density (PUB) gives rise to a fatter upper tail. 

The key diagnostic is: Rank–rank slope falls + stayer–leaver gap rises = covariance structure, not prices.

5.2.5 Scenario CF5_family

Can higher female PUB shares can arise without a wage premium, and whether the fertility-conditioned moments pick it up.

Check by changing only \(\gamma_f(\text{PUB})>0\) (women value PUB more when fertility changes / kids arrive). This is the same idea as FR0/FR1.

Pure “family-friendliness” channel Women’s child utility for the public sector is increased

5.2.6 Scenario CF6_offers

Can female PUB employment be driven by the search environment without changing wages directly (except through selection)?

Check by changing only women’s offer arrival rates (higher \(\lambda_f\) in PUB, compensating reductions elsewhere to keep overall offer intensity similar). Women’s offer arrival rate from the PUB sector is increased

5.2.7 Scenario EQ1_PRICES

Equate skill prices between men and women

5.2.8 Scenario EQ2_ENDOWMENTS

Equate the distributin of skills between men and women

5.2.9 Scenario EQ3_OFFERS

Equate arrival rates of offers between men and women

5.2.10 Scenario EQ4_FAMILY

Equate sector-specific child utilities between men and women


5.2.11 Some potential “same shares, different mechanism” stress tests

These are deliberately redundant in aggregate public-sector shares but should differ in wage distributions, mover gains, and fertility splits. They help show that your moment set can tell the mechanisms apart:

  1. Skill prices vs. Family stuff: Compare CF1_price_gap vs CF5_family.

Intuition:

  • Price gap should shift PUB wages (level shift in PUB log-wage density, mover gains PRI→PUB), and affect both “kids-change” and “no-change” groups similarly.

  • Family friendliness should mainly show up in fertility-conditioned transitions and staying hazards, with smaller wage shifts.

  1. Skill prices vs. Offers

Compare CF1_price_gap vs CF6_offers.

Intuition:

- Offers reallocate sector shares through entry/flows, but do not create
  a direct wage intercept shift. Wage changes are composition-driven.

- Price changes shift wages even holding composition fixed.
  1. Comparative Advantages vs. Offers

Compare CF4_strong_compadv vs CF6_offers.

Intuition:

  • Comparative advantage should show up strongly in rank–rank slopes and stayer–leaver residual gaps (selection moments).

  • Offer-rate shifts should show up mostly in transitions and sector shares, with weaker rank–rank implications.

5.3 BASIC DEFINITIONS AND FUNCTIONS FOR THE SOLUTION/SIMULATION

HOME    <- "HME"
sectors <- c("SUB","PRI","PUB")
R_states <- c(HOME, sectors)  
state_levels <- c("HME","SUB","PRI","PUB")


library(gt)

## ---- State-dependent (current-sector) offer process.
## tilt the offered sector toward the CURRENT r_t by adding a "stay bonus"
#stay_bonus <- 0.20  # extra mass for re-offer from current sector
## Keep "none" prob equal to the original base-none, and renormalize across S,G,P.
##  A1) Build conditional offer matrices with a given base + stay_bonus
## returns a list: probs[[r]] is a named vector over c(S,G,P,None) for origin r ∈ {H,S,G,P}
build_lambda_conditional <- function(lambda, bonus) {
  out <- list()
  base_sum <- sum(lambda)
  #print(base_sum)
  #print(paste("base_sum:", base_sum));
  
  # From HOME: use base probabilities (no tilt)
  out[[HOME]] <- c(lambda, None = 1 - base_sum)
  
  # From each sector: add stay bonus to that sector, renormalize offers to sum to base_sum
  for (r in sectors) {
    prob <- lambda
    prob[r] <- prob[r] + bonus
    prob <- prob * (base_sum / sum(prob))  # renormalize to sum to base_sum
    out[[r]] <- c(prob, None = 1 - sum(prob))
  }
  out
  #print(paste("out:", out[["PUB"]] ));
}


cov_matrix <- function(sig, rho) {
  k <- length(sig)
  if (k < 2) stop("Length of 'sig' must be at least 2.")
  if (any(sig <= 0)) stop("'sig' must contain strictly positive values.")
  
  # correlation matrix from scalar rho
  cor_matrix_scalar <- function(rho, k) {
    if (rho <= -1 || rho >= 1)
      stop("Scalar rho must be strictly between -1 and 1.")
    M <- matrix(rho, k, k)
    diag(M) <- 1
    M
  }
  
  # correlation matrix from vector rho
  cor_matrix_vector <- function(rhos, k) {
    needed <- k*(k-1)/2
    if (length(rhos) != needed)
      stop(sprintf("For k=%d, rho vector must be length %d.", k, needed))
    if (any(rhos <= -1 | rhos >= 1))
      stop("All correlations must be strictly between -1 and 1.")
    
    M <- matrix(1, k, k)
    M[upper.tri(M)] <- rhos
    M[lower.tri(M)] <- t(M)[lower.tri(M)]
    M
  }
  
  # choose scalar or vector case
  R <- if (length(rho) == 1) {
    cor_matrix_scalar(rho, k)
  } else {
    cor_matrix_vector(rho, k)
  }
  
  # Σ = D R D
  D <- diag(sig)
  D %*% R %*% D
}

# Check for positive semidefiniteness
is_valid_corr <- function(rho) {
  # rho = c(r12, r13, r23)
  if (length(rho) != 3) stop("rho must be a vector of length 3")
  
  r12 <- rho[1]
  r13 <- rho[2]
  r23 <- rho[3]
  
  R <- matrix(c(
    1,    r12,  r13,
    r12,  1,    r23,
    r13,  r23,  1
  ), nrow = 3, byrow = TRUE)
  
  lambda_min <- min(eigen(R)$values)
  valid <- lambda_min > -1e-8
  
  if (!valid) {
    warning(sprintf(
      "Invalid correlation triple: smallest eigenvalue = %.4g",
      lambda_min
    ))
  }
  
  # Return nicely formatted result
  return(
    data.frame(
      r12 = r12,
      r13 = r13,
      r23 = r23,
      valid = valid
    )
  )
}


# Examples
rho <- c(0.3, 0.4, 0.2)
#is_valid_corr(rho)     # TRUE
rho <- c(0.9, 0.9, -0.9)    
#is_valid_corr(rho)     # FALSE
rho <- c(1, -1, 1)    
#is_valid_corr(rho)     # FALSE
rho <- c(0.7, -0.5, -0.2) 
#is_valid_corr(rho)     # TRUE





rmvnorm <- function(n, mu, Sigma) {
  Z <- matrix(rnorm(n * length(mu)), nrow = n)
  L <- chol(Sigma)
  sweep(Z %*% L, 2, mu, `+`)
}

5.4 CHOOSE WHICH BASELINE AND COUNTERFACTUAL SCENARIOS TO RUN

  • Key parameters that vary across counterfactual scenarios:

    • Sector- and gender-specific skill prices.

    • Joint distribution of sector-specific skills (gender-specific).

    • Sector- and gender-specific non-pecuniary utilities which differ by kids and marital status.

    • Arrival rates of offers (gender-specific).

  • See the function make_scenario to see what parameter values each scenario scn corresponds to.

  • In the code block below, you can change:

    • Baseline parameter values

    • Parameter values for each counterfactual scenario

    • Scenario list to compare

IN ORDER TO RUN THE CODE WITH ANY CHANGES YOU’D LIKE TO IMPLEMENT, OPEN THE CORRESPONDING .RMD FILE IN RSTUDIO, MAKE YOUR CHANGES, SAVE AND THEN CLICK ON KNIT AT THE TOP. THIS WILL RECOMPILE EVERYTHING AND CREATE A NEW HTML OR PDF FILE, WHICH SHOULD AUTOMATICALLY APPEAR ON VIEWER ON THE RIGHT PANEL.

    #Choose the number of periods 
    T_per <- 10
    
    #Choose the number of people to simulate (has to be at least 2-3K)
    N     <- 4000
    
    #Choose baseline
    #STARTSAME: Everything the same between men and women
    #STARTDIFF: Everything different between men and women
    #You can choose the value of WHICH_SCENARIO (whether you want to start from a base where
    #men and woen are equal or not)
    WHICH_SCENARIO <- "STARTSAME" #other alternatives: STARTDIFF
    
    
    #Depending on whether WHICH_SCENARIO is STARTSAME or STARTDIFF, assign parameter values for the baseline accordingly
    #You can change the parameter values for each (i.e. can change these base values beta0, sigma0 etc.) 
    if (WHICH_SCENARIO == "STARTSAME"){
        beta0 <- tibble::tribble(
          ~g, ~s, ~par,
          "men","SUB",  0.20,
          "men","PRI",  0.20,
          "men","PUB",  0.20,
          "fem","SUB",  0.20,
          "fem","PRI",  0.20,
          "fem","PUB",  0.20
        )
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        beta0 <- tibble::tribble(
          ~g, ~s, ~par,
          "men","SUB", 0.20,
          "men","PRI", 0.40,
          "men","PUB", 0.20,
          "fem","SUB", 0.20,
          "fem","PRI", 0.20,
          "fem","PUB", 0.40
          )
      }
      
      
     if (WHICH_SCENARIO == "STARTSAME"){
        sig_m0 <- c(0.30, 0.30, 0.30)
        rho_m0 <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
        sig_f0 <- c(0.30, 0.30, 0.30)
        rho_f0 <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        # men at base: no correlation → weak specializat-0.40,-0.35) # (12, 13, 23)
        sig_m0 <- c(0.30,0.50,0.30)
        rho_m0 <- c(0.0,0.0,0.0)      # (12, 13, 23)
        # women at base: strong comp. adv. i.e.higher dispersion inf PUB and negative corr across sectors
        sig_f0 <- c(0.30,0.30,0.30)
        rho_f0 <- c(0.0,0.0,-0.40) # (12, 13, 23)
    }
    #  is_valid_corr(rho_m0)
    #  is_valid_corr(rho_f0)
      
      
      
      ## Fertility shifters (non-pecuniary) by sector incl. home (as before)
    if (WHICH_SCENARIO == "STARTSAME"){
        gamma_m0 <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
        gamma_f0 <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        gamma_m0 <- c(HME=0, SUB=0, PRI=0, PUB=0)
        gamma_f0 <- c(HME=0, SUB=0, PRI=0, PUB=0.5)
    }
      
      
    if (WHICH_SCENARIO == "STARTSAME"){
        lambda_m0 <- c(SUB = 0.30, PRI = 0.30, PUB= 0.30)  # men
        lambda_f0 <- c(SUB = 0.30, PRI = 0.30, PUB = 0.30)  # women
        stay_bonus0 <- 0.40
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        lambda_m0 <- c(SUB=0.30, PRI=0.45, PUB=0.30)
        lambda_f0 <- c(SUB=0.30, PRI=0.30, PUB=0.45)
        stay_bonus0 <- 0.40
      }
    

      #Depending on whether WHICH_SCENARIO is STARTSAME or STARTDIFF, pick scenario_list
      #scenario_list is the list of counterfactuals that the code will run and compare
      #If WHICH_SCENARIO IS STARTSAME, then the scenario list consist of scenarios where each 
      #parameter group is changed one at a time. 
      #If WHICH_SCENARIO IS STARTDIFF, then the scenario list consist of scenarios where each 
      #parameter group is equated one at a time. 
      #You can change the specififc parameter values thateach scenario corresponds to (see  
      # make_scenario below)
     if (WHICH_SCENARIO == "STARTSAME"){
        scenario_list <- c(
          "BASELINE",
          "CF1_price_gap",
          "CF2_disp",
          "CF3_compadv",
          "CF4_strong_compadv",
          "CF5_family",
          "CF6_offers"
        )
      } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        scenario_list <- c(
          "BASELINE",
          "EQ1_PRICES",
          "EQ2_ENDOWMENTS",
          "EQ3_OFFERS",
          "EQ4_FAMILY"
        )
      }
    



## Scenario parameter builders
## Define the parameter values for each scenario
make_scenario <- function(scn){
  out <- list(label = scn)
  
  #First assign the baseline parameters to the vector `out`
  out$beta0_df <- beta0
  out$Sigma_m <- cov_matrix(sig_m0, rho_m0)
  out$Sigma_f <- cov_matrix(sig_f0, rho_f0)
  out$lambda_m_cond <- build_lambda_conditional(lambda_m0, stay_bonus0)
  out$lambda_f_cond <- build_lambda_conditional(lambda_f0, stay_bonus0) 
  out$gamma_m <- gamma_m0
  out$gamma_f <- gamma_f0
  
  
  if (scn == "BASELINE"){
    #No need to do anything here. out wil just be wht's defined above (ie baseline)
  }
  
  else if (scn == "CF1_price_gap"){
    #Skill prices are changed so that men have a private sector premium $\(\beta_{0,g}(s)\).
    #Skill prices are changed so that women have a public sector premium $\(\beta_{0,g}(s)\). 
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~ 0.20,
      g=="men" & s=="PRI" ~ 0.40,
      g=="men" & s=="PUB" ~ 0.20,
      g=="fem" & s=="SUB" ~ 0.20,
      g=="fem" & s=="PRI" ~ 0.20,
      g=="fem" & s=="PUB" ~ 0.40
    ))
  }
  

  else if (scn == "CF2_disp"){
    #Dispersion of women's sector-specific skill distribution for only the public sector (i.e. sigma) is increased
    #Correlation remains unchanged  
    sig_f <- c(0.30, 0.30, 0.50)  # higher PUB dispersion for women
    rho_f <- rho_f0
    out$Sigma_f <- cov_matrix(sig_f, rho_f)
  }
  
    
  else if (scn == "CF3_compadv"){
    #Dispersion of women's sector-specific skill distribution remains unchanged
    #Correlation between women's skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
    sig_f <- sig_f0 #c(0.30, 0.30, 0.30)
    rho_f <- c(0.0, 0.0, -0.40)
    is_valid_corr(rho_f)
    out$Sigma_f <- cov_matrix(sig_f, rho_f)
  }
  
  else if (scn == "CF4_strong_compadv"){
    #Dispersion of women's sector-specific skill distribution for only the public sector (i.e. sigma) is increased
    #Correlation between women's skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
    sig_f <- c(0.30, 0.30, 0.50)  # higher PUB dispersion for women
    rho_f <- c(0.0, 0.0, -0.40)
    out$Sigma_f <- cov_matrix(sig_f, rho_f)
  }
  
  else if (scn == "CF5_family"){
    #Women's child  utility for the public sector is increased
    out$gamma_f <- c(HME=0.00, SUB=0.00, PRI=0.00, PUB=1.00)
  }
  
  else if (scn == "CF6_offers"){
    #Women's offer arrival rate from the PUB sector is increased
    lambda_f <- c(SUB=0.20, PRI=0.20, PUB=0.80)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus0)
  } 
  
  else if (scn == "EQ1_PRICES"){
    #Women's sector specific skill prices are set equal to that of men's
    #out$beta0_df <- out$beta0_df |>
    #  dplyr::group_by(s) |>
    #  dplyr::mutate(par = mean(par)) |>
    #  dplyr::ungroup()
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~ 0.20,
      g=="men" & s=="PRI" ~ 0.40,
      g=="men" & s=="PUB" ~ 0.20,
      g=="fem" & s=="SUB" ~ 0.20,
      g=="fem" & s=="PRI" ~ 0.40,
      g=="fem" & s=="PUB" ~ 0.20
    ))
  }
  
  else if (scn == "EQ2_ENDOWMENTS"){
    #Women's correlation matrix is set equal to that o fmen
    out$Sigma_f <- out$Sigma_m
  }
  
  else if (scn == "EQ3_OFFERS"){
    #Women's offer arrival rates are set equal to that of men 
    out$lambda_f_cond <- out$lambda_m_cond
  }
  
  else if (scn == "EQ4_FAMILY"){
    #Women's sector-specific child utilty for all sectors is set equal to that of men 
    out$gamma_f <- out$gamma_m
  }
  
  
  else {
    stop("Unknown scenario label")
  }
  out
}

#Choose what gender to do the simulations for.
#This will usually just be "both" unless you want to only simulate women or men
RUN_GENDER <- "both"   # other alternatives: "men", "fem", "both"
if (RUN_GENDER == "both") {
  genders <- c("men", "fem")
} else if (RUN_GENDER == "men") {
  genders <- c("men")
} else if (RUN_GENDER == "fem") {
  genders <- c("fem")
}

5.5 SOLUTION AND SIMULATION SUBROUTINES

Dynamic Roy Model with state-dependent offers (experience removed; permanent sector skills only)

  • 3 employment sectors

  • Home alternative always available

  • One offer per period (or none)

  • Offer arrival probabilities depend on current sector

  • Wages fixed per individual-sector (permanent epsilon + observables)

  • Fertility and marriage shifts preferences (not wages, not offers)

  • Finite horizon, solved by backward induction

## =========================================================
## Dynamic Roy with state-dependent offers
## (Experience removed; permanent sector skills only)
## ---------------------------------------------------------
## - 3 employment sectors
## - Home H always available
## - One offer per period (or none)
## - Offer arrival PROBABILITIES depend on CURRENT SECTOR (state)
## - Wages fixed per individual-sector (permanent epsilon + observables)
## - Fertility shifts utility (not wages, not offers)
## - Finite horizon, solved by backward induction
## =========================================================

set.seed(42)
suppressPackageStartupMessages({
  library(dplyr); library(tidyr); library(ggplot2); library(kableExtra)
})


## =========================================================
## COMMON RANDOM NUMBERS (CRN): master uniforms
## =========================================================

#set.seed(42)

# Max number of offer categories (SUB,PRI,PUB,None)
K_offer <- length(sectors) + 1  

# Uniforms for: wages (eps), offers, fertility, marriage
U_eps  <- matrix(runif(N * length(sectors)), N, length(sectors))
U_F    <- matrix(runif(N * T_per), N, T_per)
U_M    <- matrix(runif(N * T_per), N, T_per)
U_off  <- matrix(runif(N * T_per), N, T_per)
U_initM <- runif(N)
U_initF <- runif(N)
U_gender <- runif(N)
U_educ   <- runif(N)


## =========================================================
## FIXED PARAMETERS AND TRANSITION MATRICES
## =========================================================

## Fertility: exogenous Markov
F_states <- 0:5
Q_F <- matrix(0, 6, 6, dimnames=list(F_states, F_states))
Q_F["0","0"] <- 0.95;  Q_F["0","1"] <- 0.05; Q_F["0","2"] <- 0.00; Q_F["0","3"] <- 0.00; Q_F["0","4"] <- 0.00; Q_F["0","5"] <- 0.00
Q_F["1","0"] <- 0.00;  Q_F["1","1"] <- 0.96; Q_F["1","2"] <- 0.04; Q_F["1","3"] <- 0.00; Q_F["1","4"] <- 0.00; Q_F["1","5"] <- 0.00
Q_F["2","0"] <- 0.00;  Q_F["2","1"] <- 0.00; Q_F["2","2"] <- 0.97; Q_F["2","3"] <- 0.03; Q_F["2","4"] <- 0.00; Q_F["2","5"] <- 0.00
Q_F["3","0"] <- 0.00;  Q_F["3","1"] <- 0.00; Q_F["3","2"] <- 0.00; Q_F["3","3"] <- 0.98; Q_F["3","4"] <- 0.02; Q_F["3","5"] <- 0.00
Q_F["4","0"] <- 0.00;  Q_F["4","1"] <- 0.00; Q_F["4","2"] <- 0.00; Q_F["4","3"] <- 0.00; Q_F["4","4"] <- 0.99; Q_F["4","5"] <- 0.01
Q_F["5","0"] <- 0.00;  Q_F["5","1"] <- 0.00; Q_F["5","2"] <- 0.00; Q_F["5","3"] <- 0.00; Q_F["5","4"] <- 0.00; Q_F["5","5"] <- 1.00

#Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
#Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05   # marriage probability
#Q_M["1","1"] <- 1.00                         # married stays married


## Markov transition Q_M(m'|m)
## Example: small probability of marriage when single; absorbing once married.
M_states <- 0:1          # 0 = single, 1 = married
Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05   # Pr(marry)
Q_M["1","1"] <- 1.00                         # married absorbing
## (can freely change this to allow divorce.)

## Marital-status taste shifters (parallel to gamma for fertility)
## delta_s_m  where s ∈ {HME, SUB, PRI, PUB} and m ∈ {0,1}
## These are non-pecuniary utility additions.
delta_men <- c(HME=0, SUB=0, PRI=0, PUB=0)   # men
delta_fem <- c(HME=0, SUB=0, PRI=0, PUB=0)   # women


## Draw next M
#draw_next_M <- function(M_cur){
#   sample(M_states, size=1, prob = Q_M[as.character(M_cur),])
#}

p_educ  <- 0.5   # Pr(educ=1)
beta_u <- 1.0    # Utility scale and home intercept
beta   <- 0.95   # discount factor
beta1  <- 0.15   # Education return (log-wage)



## ---------------------------------------------------------
## START SOLVING ETC. 
## ---------------------------------------------------------

## Keep the same N, T_per, genders, sectors, HOME, beta1, beta_u, beta, Q_F, etc.
## We'll reuse global g_vec, educ_vec for comparability across scenarios.

draw_eps_by_gender_CRN <- function(g_vec, Sigma_m, Sigma_f){
  Nloc <- length(g_vec)
  eps_mat <- matrix(NA_real_, Nloc, length(sectors),
                    dimnames=list(NULL, sectors))
  
  L_m <- chol(Sigma_m)
  L_f <- chol(Sigma_f)
  
  Z <- qnorm(U_eps)   # SAME underlying shocks across scenarios
  
  for (i in seq_len(Nloc)){
    if (g_vec[i] == "men"){
      eps_mat[i, ] <- Z[i, ] %*% L_m
    } else {
      eps_mat[i, ] <- Z[i, ] %*% L_f
    }
  }
  eps_mat
}

##  A3) Precompute per-person log wages for all sectors
## A3) Precompute per-person log wages for all sectors (robust, base-R)
## Precompute per-person log wages by sector (fixed forever)
precompute_logW_all <- function(g_vec, educ_vec, eps_mat, beta0_df){
  stopifnot(is.matrix(eps_mat),
            nrow(eps_mat) == length(g_vec),
            all(colnames(eps_mat) %in% sectors))
  
  Nloc <- length(g_vec)
  logW_all <- matrix(NA_real_, nrow = Nloc, ncol = length(sectors),
                     dimnames = list(NULL, sectors))
  
  # helper: fetch scalar beta0 for (g,s)
  get_beta0 <- function(gg, ss){
    idx <- (beta0_df$g == gg) & (beta0_df$s == ss)
    b <- beta0_df$par[idx]
    if (length(b) != 1L) stop(sprintf("beta0_df mismatch for g=%s, s=%s (found %d rows)", gg, ss, length(b)))
    b
  }
  
  for (ss in sectors){
    col <- match(ss, sectors)
    beta0_vec <- vapply(seq_len(Nloc), function(i) get_beta0(g_vec[i], ss), numeric(1))
    logW_all[, col] <- beta0_vec + beta1 * educ_vec + eps_mat[, col]
  }
  logW_all
}

## ----------------------------
##  A4) Solve V for one person *given scenario-specific objects*
## 4) Value Function (Backward Induction, person-specific)
## ----------------------------
## State: (t, r ∈ {H,S,G,P}, F ∈ {0,1,2})
## V_{T+1}(.) = 0. For t=T,...,1 compute V_t(r,F) = E_{O|r} [ max{ V_H, V_emp(O) } ]
## where V_H = u_home(F) + β E_F' V_{t+1}(H, F'), and
##       V_emp(s) = u_employed(logW_s, s, F) + β E_F' V_{t+1}(s, F').
solve_value_person_scn <- function(i, logW_all, g_vec, gamma_m,gamma_f,
                                   lambda_m_cond, lambda_f_cond){
  g <- g_vec[i]
  lambda_cond <- if (g=="men") lambda_m_cond else lambda_f_cond
  gamma <- if (g=="men") gamma_m else gamma_f
  delta <- if (g=="men") delta_men else delta_fem
  logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
  t_names <- as.character(1:(T_per+1)) # time index: 1..(T+1); V[t = T+1, ., .] = 0 (terminal)
  #V <- array(0, dim=c(T_per+1, length(R_states), length(F_states)),
  #           dimnames=list(t=t_names, r=R_states, F=as.character(F_states)))
  V <- array(
    0,
    dim=c(T_per+1, length(R_states), length(F_states), length(M_states)),
    dimnames=list(
      t=t_names,
      r=R_states,
      F=as.character(F_states),
      M=as.character(M_states)
    )
  )
  
  # backward induction: tt = T, ..., 1
  for (tt in T_per:1){
    for (r in R_states){
      p_offers <- lambda_cond[[r]]
      for (Fcur in F_states){
        for (Mcur in M_states){
          
          ## continuation if choose Home
          EV_next_HME <- sum(
            Q_F[as.character(Fcur), ] *
              V[as.character(tt+1), HOME, , as.character(Mcur)]
          )
          
          V_emp <- c(SUB=-Inf, PRI=-Inf, PUB=-Inf)
          
          for (s in sectors){
            EV_next_s <- sum(
              Q_F[as.character(Fcur), ] *
                V[as.character(tt+1), s, , as.character(Mcur)]
            )
            V_emp[s] <- beta_u * logW_i[s] +
              Fcur * gamma[s] +    #ifelse(Fcur >= 1, gamma[s], 0) +
              ifelse(Mcur == 1, delta[s], 0) +
              beta * EV_next_s
          }
          
          V_HME <- 0 +
            Fcur * gamma["HME"] +   #ifelse(Fcur >= 1, gamma["HME"], 0) +
            ifelse(Mcur == 1, delta["HME"], 0) +
            beta * EV_next_HME
          
          EV <- p_offers["None"] * V_HME +
            sum(p_offers[sectors] *
                  pmax(V_HME, V_emp[sectors]))
          
          V[as.character(tt), r, as.character(Fcur), as.character(Mcur)] <- EV
          
        }  # end Mcur
      }  # end Fcur
    }  # end r
  }  # end tt
  
}




## ----------------------------
## Simulation of optimal paths
## ----------------------------
##  Simulate optimal paths *given* V_all and scenario objects
## Policy: Given (t, r, F) and realized offer O:
## choose O if V_emp(O) >= V_H; else choose Home.
## Offers drawn from λ_{·|r,g}; F evolves via Q_F.
simulate_paths_scn <- function(
    V_all, logW_all, g_vec, F0, gamma_m, gamma_f,
    lambda_m_cond, lambda_f_cond
){
  
  # helper: draw one offer given gender and current attachment r
  #draw_offer_cond <- function(g, r){
  #  probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
  #  draw  <- sample(c(sectors, "None"), size = 1, prob = probs)
  #  if (draw == "None") NA_character_ else draw
  #}
  
  draw_offer_cond_CRN <- function(g, r, i, t){
    probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
    u <- U_off[i, t]
    idx <- which(u <= cumsum(probs))[1]
    draw <- names(probs)[idx]
    if (draw == "None") NA_character_ else draw
  }
  
  out_list <- vector("list", length = N)
  
  for (i in seq_len(N)) {
    g_i    <- g_vec[i]
    educ_i <- educ_vec[i]
    V_i    <- V_all[[i]]
    logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
    
    # start-of-panel state for person i
    r_cur  <- HOME
    F_cur  <- F0[i]
    M_cur  <- M0[i]
    
    # storage for this person's path
    rows <- vector("list", length = T_per)
    
    for (tt in 1:T_per) {
      # gender-specific gamma for THIS person
      gamma <- if (g_i == "men") gamma_m else gamma_f
      
      # draw an offer conditional on current attachment r_cur
      #offer <- draw_offer_cond(g_i, r_cur)
      offer <- draw_offer_cond_CRN(g_i, r_cur, i, tt)
      
      # continuation value if choose Home now
      EV_next_HME <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), HOME, ])
      #V_HME <- 0 + ifelse(F_cur >= 1, gamma[["HME"]], 0) + beta * EV_next_HME
      V_HME <- 0 + F_cur * gamma[["HME"]] + beta * EV_next_HME
      
      # default choice is Home
      choice <- HOME
      employed <- 0L
      sector   <- NA_character_
      logW_chosen <- NA_real_
      
      # if there is an offer, compare employed value vs home
      if (!is.na(offer)) {
        s <- offer
        EV_next_s <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), s, ])
        V_emp_off <- beta_u * logW_i[s] +
          F_cur * gamma[[s]] + #ifelse(F_cur >= 1, gamma[[s]], 0) +
          beta * EV_next_s
        
        if (V_emp_off >= V_HME) {
          choice <- s
          employed <- 1L
          sector <- s
          logW_chosen <- logW_i[s]
        }
      }
      
      # record this period
      rows[[tt]] <- data.frame(
        i = i, t = tt, g = g_i, educ = educ_i, F = F_cur, M=M_cur,      
        r = r_cur, offer = offer, choice = choice,
        employed = employed, sector = sector, logW = logW_chosen,
        stringsAsFactors = FALSE
      )
      
      # move to next period
      r_cur <- choice
      #if (tt < T_per) {
      #  F_cur <- draw_next_F(F_cur)
      #  M_cur <- draw_next_M(M_cur)
      #}
      if (tt < T_per) {
        F_cur <- draw_next_F_CRN(F_cur, i, tt)
        M_cur <- draw_next_M_CRN(M_cur, i, tt)
      }
      
      
    }
    
    out_list[[i]] <- do.call(rbind, rows)
  }
  
  dt <- do.call(rbind, out_list)
  rownames(dt) <- NULL
  dt
}

5.6 RUN BASELINE AND COUNTERFACTUAL SCENARIOS AND SAVE OUTPUT

## We'll keep the same g_vec, educ_vec, F0 across scenarios (comparability)
#g_vec <- sample(genders, N, replace=TRUE)
#educ_vec <- rbinom(N, 1, p_educ)
#F0 <- sample(F_states, size=N, replace=TRUE, prob=c(0.8, 0.18, 0.02))
#M0 <- sample(M_states, size=N, replace=TRUE, prob=c(0.90, 0.10))
#draw_next_F <- function(F){ sample(F_states, 1, prob=Q_F[as.character(F), ]) }
#draw_next_M <- function(M){ sample(M_states, 1, prob=Q_M[as.character(M),])}


## ---- deterministic draws from master uniforms ----
g_vec <- ifelse(U_gender <= 0.5, "men", "fem")
educ_vec <- as.integer(U_educ <= p_educ)
F0 <- sapply(U_initF, function(u)
  F_states[ which(u <= cumsum(c(0.8, 0.18, 0.02)))[1] ])
M0 <- sapply(U_initM, function(u)
  M_states[ which(u <= cumsum(c(0.90, 0.10)))[1] ])

draw_next_F_CRN <- function(F_cur, i, t){
  p <- Q_F[as.character(F_cur), ]
  u <- U_F[i, t]
  F_states[ which(u <= cumsum(p))[1] ]
}

draw_next_M_CRN <- function(M_cur, i, t){
  p <- Q_M[as.character(M_cur), ]
  u <- U_M[i, t]
  M_states[ which(u <= cumsum(p))[1] ]
}


run_scenario <- function(scn_label){
  scn <- make_scenario(scn_label)
  
  
  ## Draw epsilons (only if needed)
#  if (scn$redraw_eps){
#    eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
#  } else {
#    ## reuse the eps_mat from your baseline if present; else draw with base Sigmas
#    if (exists("eps_mat")) {
#      eps_mat_scn <- eps_mat
#    } else {
#      #eps_mat_scn <- draw_eps_by_gender(g_vec, Sigma_m, Sigma_f)
#      eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
#    }
#  }
  
  eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
  logW_all_scn <- precompute_logW_all(g_vec, educ_vec, eps_mat_scn, scn$beta0_df)
  
  ## Solve V for each person
  ## Solve V for everyone (vectorized over i)
  ## (This is the heavy step: 5000 people × (T=10 × r=4 × F=3) is very feasible.)
  V_all_scn <- lapply(1:N, function(i)
    solve_value_person_scn(i, logW_all_scn, g_vec, scn$gamma_m,scn$gamma_f,
                           scn$lambda_m_cond, scn$lambda_f_cond))
  
  ## Simulate paths
  dt_scn <- simulate_paths_scn(V_all_scn, logW_all_scn, g_vec, F0, scn$gamma_m,scn$gamma_f,
                               scn$lambda_m_cond, scn$lambda_f_cond)
  dt_scn$scenario <- scn_label
  
  
  ## === Diagnostics ===
  
  ## (1) Pooled log-wage densities by sector×gender
  wages_pool <- dt_scn |>
    dplyr::filter(!is.na(sector), !is.na(logW)) |>
    dplyr::mutate(sector=factor(sector, levels=sectors))
  
  ## (2) Movers: Δ logW by origin→dest, gender
  moves <- dt_scn |>
    dplyr::group_by(i) |>
    dplyr::arrange(t, .by_group = TRUE) |>
    dplyr::mutate(next_logW = dplyr::lead(logW),
                  next_sector = dplyr::lead(sector),
                  employed_next = dplyr::lead(employed)) |>
    dplyr::ungroup() |>
    dplyr::filter(employed==1, employed_next==1,
                  !is.na(sector), !is.na(next_sector)) |>
    dplyr::mutate(orig=sector, dest=next_sector,
                  dlogW = next_logW - logW)
  
  ## (3) Residuals: remove sector×gender×educ mean (proxy for unobs)
  dt_resid <- dt_scn |>
    dplyr::filter(!is.na(sector), !is.na(logW)) |>
    dplyr::group_by(g, sector, educ) |>
    dplyr::mutate(resid = logW - mean(logW)) |>
    dplyr::ungroup()
  
  ## (4) Rank–rank slopes for G→P movers (by gender)
  rr <- dt_resid |>
    dplyr::select(i,t,g,sector,resid) |>
    dplyr::group_by(i) |>
    dplyr::arrange(t,.by_group=TRUE) |>
    dplyr::mutate(next_resid = dplyr::lead(resid),
                  next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    dplyr::filter(sector=="PUB", next_sector=="PRI") |>
    dplyr::group_by(g) |>
    dplyr::summarise(rank_slope = {
      x <- rank(resid, ties.method="average")/dplyr::n()
      y <- rank(next_resid, ties.method="average")/dplyr::n()
      if (length(x)>5) coef(lm(y ~ x))[2] else NA_real_
    }, .groups="drop")
  rr$scenario <- scn_label
  
  ## (5) Stayer–leaver residual gap in origin sector G (by gender)
  sl <- dt_resid |>
    # 1. Sort each person's panel properly
    dplyr::group_by(i) |>
    dplyr::arrange(t, .by_group = TRUE) |>
    
    # 2. Next-period sector
    dplyr::mutate(next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    
    # 3. Keep only observations where current sector is PUB
    dplyr::filter(sector == "PUB") |>
    
    # 4. Drop final rows with no next observation
    dplyr::filter(!is.na(next_sector)) |>
    
    # 5. Stayer vs. leaver classification
    dplyr::mutate(
      status = dplyr::case_when(
        next_sector == "PUB" ~ "stayer",
        next_sector != "PUB" ~ "leaver"
      )
    ) |>
    
    # 6. Compute mean residuals by gender × status
    dplyr::group_by(g, status) |>
    dplyr::summarise(
      mean_resid = mean(resid, na.rm = TRUE),
      .groups = "drop"
    ) |>
    
    # 7. Reshape wide: stayer, leaver columns
    tidyr::pivot_wider(
      names_from = status,
      values_from = mean_resid,
      values_fill = NA
    ) |>
    
    # 8. Compute gap safely
    dplyr::mutate(
      gap = stayer - leaver,
      scenario = scn_label
    )
  
  
  ## (6) Staying hazards by current sector (state dependence proxy)
  haz <- dt_scn |>
    dplyr::group_by(i) |>
    dplyr::arrange(t,.by_group=TRUE) |>
    dplyr::mutate(next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    dplyr::filter(!is.na(sector)) |>
    dplyr::mutate(stay = as.integer(sector==next_sector)) |>
    dplyr::group_by(g, sector) |>
    dplyr::summarise(stay_hazard = mean(stay, na.rm=TRUE), .groups="drop") |>
    dplyr::mutate(scenario=scn_label)
  
  list(dt=dt_scn,
       wages_pool=wages_pool,
       moves=moves,
       rr=rr,
       sl=sl,
       haz=haz)
}




# Run the scenarios included in the scenario_list and save the output in `results`    
results <- lapply(scenario_list, run_scenario)
names(results) <- scenario_list




## Bind outputs for plotting / tables
wages_all <- dplyr::bind_rows(lapply(results, `[[`, "wages_pool"), .id="scenario_id")
moves_all <- dplyr::bind_rows(lapply(results, `[[`, "moves"), .id="scenario_id")
rr_all    <- dplyr::bind_rows(lapply(results, `[[`, "rr"), .id="scenario_id")
sl_all    <- dplyr::bind_rows(lapply(results, `[[`, "sl"), .id="scenario_id")
haz_all   <- dplyr::bind_rows(lapply(results, `[[`, "haz"), .id="scenario_id")


## Combine scenarios into one panel
to_factor <- function(x) factor(x, levels = c("SUB","PRI","PUB"))
panel_all <- dplyr::bind_rows(
  lapply(results, \(res) res$dt |> dplyr::mutate(sector = to_factor(sector))),
  .id = "scenario_id"
)

5.7 COMPARE KEY MOMENTS ACROSS SCENARIOS

5.7.1 (i) Employment Transitions by Gender

transitions_all <- panel_all |>
  dplyr::group_by(scenario_id, g, i) |>
  dplyr::arrange(t, .by_group = TRUE) |>
  dplyr::mutate(
    state_t  = factor(ifelse(is.na(sector), "HME", as.character(sector)),
                      levels = state_levels),
    state_t1 = dplyr::lead(state_t)
  ) |>
  dplyr::ungroup() |>
  dplyr::filter(!is.na(state_t1)) |>
  dplyr::count(scenario_id, g, state_t, state_t1, name = "n") |>
  dplyr::group_by(scenario_id, g, state_t) |>
  dplyr::mutate(share = n / sum(n)) |>
  dplyr::ungroup()







# --- Step 1: wide by destination state (same as you already do)
transitions_wide <- transitions_all |>
  tidyr::pivot_wider(
    id_cols  = c(scenario_id, g, state_t),
    names_from = state_t1,
    values_from = share,
    values_fill = 0
  )

# --- Step 2: widen again by gender 
transitions_wide_both <- transitions_wide %>%
  dplyr::arrange(
    factor(scenario_id, levels = scenario_list),
    factor(state_t, levels = c("HME","SUB","PRI","PUB")),
    g
  ) %>%
  tidyr::pivot_wider(
    names_from = g,
    values_from = -c(scenario_id, state_t, g),
    names_glue = "{.value}_{g}"
  )

tbl <- knitr::kable(
  transitions_wide_both,
  caption = "Transition Matrix by Scenario (Women vs Men)",
  align = "c",
  digits = 2,
  format = "html"
) %>%
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    font_size = 13
  ) %>%
  kableExtra::collapse_rows(
    columns = 1,   # only scenario_id now (state_t should NOT be collapsed)
    valign = "top"
  )

kableExtra::save_kable(tbl, "table_trans.png", zoom = 2, density = 300)
tbl
Transition Matrix by Scenario (Women vs Men)
scenario_id state_t HME_fem HME_men SUB_fem SUB_men PRI_fem PRI_men PUB_fem PUB_men
BASELINE HME 0.42 0.43 0.21 0.18 0.20 0.18 0.18 0.20
SUB 0.19 0.19 0.49 0.47 0.17 0.17 0.16 0.17
PRI 0.18 0.19 0.17 0.16 0.49 0.48 0.15 0.17
PUB 0.18 0.19 0.16 0.16 0.17 0.16 0.48 0.50
CF1_price_gap HME 0.37 0.37 0.19 0.18 0.19 0.25 0.25 0.20
SUB 0.16 0.15 0.49 0.47 0.17 0.20 0.19 0.17
PRI 0.15 0.19 0.17 0.16 0.49 0.49 0.18 0.17
PUB 0.18 0.16 0.16 0.16 0.18 0.19 0.47 0.50
CF2_disp HME 0.45 0.43 0.21 0.18 0.20 0.18 0.14 0.20
SUB 0.21 0.19 0.49 0.47 0.17 0.17 0.13 0.17
PRI 0.21 0.19 0.17 0.16 0.49 0.48 0.13 0.17
PUB 0.18 0.19 0.16 0.16 0.18 0.16 0.48 0.50
CF3_compadv HME 0.40 0.43 0.20 0.18 0.20 0.18 0.19 0.20
SUB 0.18 0.19 0.49 0.47 0.17 0.17 0.16 0.17
PRI 0.19 0.19 0.17 0.16 0.49 0.48 0.15 0.17
PUB 0.19 0.19 0.17 0.16 0.17 0.16 0.48 0.50
CF4_strong_compadv HME 0.42 0.43 0.21 0.18 0.21 0.18 0.15 0.20
SUB 0.21 0.19 0.49 0.47 0.17 0.17 0.13 0.17
PRI 0.21 0.19 0.17 0.16 0.49 0.48 0.13 0.17
PUB 0.19 0.19 0.16 0.16 0.16 0.16 0.48 0.50
CF5_family HME 0.38 0.43 0.20 0.18 0.19 0.18 0.22 0.20
SUB 0.17 0.19 0.49 0.47 0.17 0.17 0.17 0.17
PRI 0.17 0.19 0.17 0.16 0.49 0.48 0.17 0.17
PUB 0.18 0.19 0.16 0.16 0.18 0.16 0.48 0.50
CF6_offers HME 0.56 0.43 0.13 0.18 0.15 0.18 0.16 0.20
SUB 0.12 0.19 0.45 0.47 0.12 0.17 0.31 0.17
PRI 0.12 0.19 0.13 0.16 0.45 0.48 0.30 0.17
PUB 0.06 0.19 0.12 0.16 0.12 0.16 0.70 0.50
#Now do by kids
add_states_and_kids2 <- function(df) {
  df %>%
    arrange(t, .by_group = TRUE) %>%
    mutate(
      state_t  = factor(ifelse(is.na(sector), "HME", as.character(sector)),
                        levels = state_levels),
      state_t1 = dplyr::lead(state_t),
      
      # Next-period fertility level
      F_next = dplyr::lead(F),
      
      # Define change ONLY when fertility strictly increases
      kid_change2 = dplyr::case_when(
        !is.na(F_next) & F != 0 & F_next != F ~ "change",
        !is.na(F_next) & F != 0 & F_next == F ~ "no-change",
        TRUE                                  ~ NA_character_
      ),
      kid_change2 = factor(kid_change2, levels = c("no-change", "change"))
    )
}


summarise_transitions2 <- function(df) {
  df %>%
    dplyr::ungroup() %>%
    dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
    dplyr::count(scenario_id, g, kid_change2, state_t, state_t1, name = "n") %>%
    tidyr::complete(scenario_id, g, kid_change2, state_t, state_t1, fill = list(n = 0)) %>%
    dplyr::mutate(
      state_t  = factor(state_t,  levels = state_levels),
      state_t1 = factor(state_t1, levels = state_levels)
    ) %>%
    dplyr::group_by(scenario_id, g, kid_change2, state_t) %>%
    dplyr::mutate(share = n / sum(n)) %>%
    dplyr::ungroup()
}

5.7.2 (ii) Transitions by Fertility Change

transitions_kid2 <- panel_all %>%
  dplyr::filter(scenario_id %in% scenario_list) %>%   # ← keep BOTH genders
  dplyr::group_by(scenario_id, g, i) %>%
  add_states_and_kids2() %>%
  summarise_transitions2()



library(kableExtra)


to_wide2 <- function(trans_long) {
  trans_long %>%
    tidyr::pivot_wider(
      id_cols = c(scenario_id, g, kid_change2, state_t),
      names_from = state_t1,
      values_from = share,
      values_fill = 0
    ) %>%
    dplyr::arrange(scenario_id, g, kid_change2, state_t)
}


transitions_kid2_wide <- to_wide2(transitions_kid2)

# Keep both genders and pivot gender into columns
transitions_kid2_wide_both <- transitions_kid2_wide %>%
  dplyr::arrange(
    factor(scenario_id, levels = scenario_list),
    factor(state_t, levels = c("HME","SUB","PRI","PUB")),
    factor(kid_change2, levels = c("no-change","change")),
    g
  ) %>%
  tidyr::pivot_wider(
    names_from = g,
    values_from = -c(scenario_id, state_t, kid_change2, g),
    names_glue = "{.value}_{g}"
  )

tbl <- knitr::kable(
  transitions_kid2_wide_both,
  caption = "Transition Matrix by Scenario × Kids-Change (Fem vs Men)",
  align = "c",
  digits = 2,
  format = "html"
) %>%
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    font_size = 13
  ) %>%
  kableExtra::collapse_rows(
    columns = 1:2,   # scenario_id + state_t
    valign = "top"
  )
kableExtra::save_kable(tbl, "table_trans_bykidschge.png", zoom = 2, density = 300)
tbl
Transition Matrix by Scenario × Kids-Change (Fem vs Men)
scenario_id kid_change2 state_t HME_fem HME_men SUB_fem SUB_men PRI_fem PRI_men PUB_fem PUB_men
BASELINE no-change HME 0.43 0.43 0.20 0.18 0.21 0.19 0.17 0.19
change HME 0.47 0.36 0.25 0.19 0.11 0.23 0.18 0.22
no-change SUB 0.17 0.18 0.49 0.49 0.17 0.18 0.16 0.16
change SUB 0.19 0.15 0.43 0.39 0.21 0.20 0.17 0.26
no-change PRI 0.18 0.19 0.17 0.15 0.51 0.50 0.14 0.17
change PRI 0.12 0.18 0.22 0.22 0.58 0.39 0.08 0.21
no-change PUB 0.17 0.18 0.17 0.15 0.18 0.16 0.48 0.51
change PUB 0.22 0.15 0.10 0.16 0.15 0.21 0.53 0.49
CF1_price_gap no-change HME 0.37 0.37 0.19 0.18 0.20 0.27 0.24 0.19
change HME 0.48 0.33 0.22 0.19 0.06 0.30 0.24 0.19
no-change SUB 0.14 0.14 0.49 0.49 0.17 0.21 0.19 0.16
change SUB 0.17 0.14 0.44 0.38 0.21 0.22 0.17 0.26
no-change PRI 0.15 0.19 0.17 0.15 0.50 0.50 0.17 0.16
change PRI 0.10 0.15 0.22 0.24 0.58 0.38 0.10 0.23
no-change PUB 0.17 0.15 0.17 0.15 0.19 0.18 0.47 0.51
change PUB 0.20 0.12 0.10 0.15 0.17 0.24 0.52 0.49
CF2_disp no-change HME 0.46 0.43 0.21 0.18 0.21 0.19 0.13 0.19
change HME 0.54 0.36 0.23 0.19 0.11 0.23 0.12 0.22
no-change SUB 0.20 0.18 0.49 0.49 0.17 0.18 0.13 0.16
change SUB 0.21 0.15 0.43 0.39 0.21 0.20 0.15 0.26
no-change PRI 0.20 0.19 0.17 0.15 0.51 0.50 0.13 0.17
change PRI 0.12 0.18 0.22 0.22 0.58 0.39 0.08 0.21
no-change PUB 0.16 0.18 0.17 0.15 0.18 0.16 0.49 0.51
change PUB 0.21 0.15 0.12 0.16 0.15 0.21 0.52 0.49
CF3_compadv no-change HME 0.40 0.43 0.20 0.18 0.22 0.19 0.18 0.19
change HME 0.49 0.36 0.22 0.19 0.12 0.23 0.17 0.22
no-change SUB 0.17 0.18 0.49 0.49 0.17 0.18 0.17 0.16
change SUB 0.19 0.15 0.44 0.39 0.21 0.20 0.15 0.26
no-change PRI 0.18 0.19 0.17 0.15 0.51 0.50 0.14 0.17
change PRI 0.10 0.18 0.22 0.22 0.59 0.39 0.08 0.21
no-change PUB 0.17 0.18 0.17 0.15 0.17 0.16 0.49 0.51
change PUB 0.23 0.15 0.12 0.16 0.15 0.21 0.50 0.49
CF4_strong_compadv no-change HME 0.42 0.43 0.21 0.18 0.23 0.19 0.14 0.19
change HME 0.48 0.36 0.22 0.19 0.14 0.23 0.16 0.22
no-change SUB 0.19 0.18 0.49 0.49 0.17 0.18 0.14 0.16
change SUB 0.23 0.15 0.43 0.39 0.21 0.20 0.13 0.26
no-change PRI 0.20 0.19 0.17 0.15 0.51 0.50 0.12 0.17
change PRI 0.13 0.18 0.22 0.22 0.58 0.39 0.07 0.21
no-change PUB 0.18 0.18 0.17 0.15 0.16 0.16 0.49 0.51
change PUB 0.24 0.15 0.11 0.16 0.17 0.21 0.48 0.49
CF5_family no-change HME 0.32 0.43 0.18 0.18 0.20 0.19 0.30 0.19
change HME 0.46 0.36 0.22 0.19 0.06 0.23 0.26 0.22
no-change SUB 0.13 0.18 0.49 0.49 0.17 0.18 0.21 0.16
change SUB 0.16 0.15 0.45 0.39 0.22 0.20 0.18 0.26
no-change PRI 0.14 0.19 0.17 0.15 0.51 0.50 0.19 0.17
change PRI 0.09 0.18 0.22 0.22 0.57 0.39 0.12 0.21
no-change PUB 0.18 0.18 0.17 0.15 0.19 0.16 0.47 0.51
change PUB 0.20 0.15 0.10 0.16 0.18 0.21 0.52 0.49
CF6_offers no-change HME 0.58 0.43 0.13 0.18 0.16 0.19 0.14 0.19
change HME 0.52 0.36 0.07 0.19 0.20 0.23 0.22 0.22
no-change SUB 0.11 0.18 0.44 0.49 0.12 0.18 0.32 0.16
change SUB 0.11 0.15 0.47 0.39 0.08 0.20 0.33 0.26
no-change PRI 0.13 0.19 0.14 0.15 0.46 0.50 0.27 0.17
change PRI 0.06 0.18 0.23 0.22 0.45 0.39 0.26 0.21
no-change PUB 0.06 0.18 0.12 0.15 0.12 0.16 0.70 0.51
change PUB 0.09 0.15 0.06 0.16 0.18 0.21 0.67 0.49

5.7.3 (iii) Staying Hazard in Public Sector

haz_kid2 <- panel_all %>%
  dplyr::group_by(scenario_id, g, i) %>%
  add_states_and_kids2() %>%
  dplyr::ungroup() %>%
  dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
  dplyr::filter(state_t %in% sectors) %>%
  dplyr::group_by(scenario_id, g, kid_change2, sector = state_t) %>%
  dplyr::summarise(stay_hazard = mean(state_t1 == sector), .groups="drop")
# --- WOMEN ---
haz_pub_fem <- haz_kid2 %>%
  dplyr::filter(g == "fem", sector == "PUB")

p_women <- ggplot(haz_pub_fem,
                  aes(x = scenario_id, y = stay_hazard, fill = kid_change2)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Staying Hazard in Public Sector (Women)",
       x = "Scenario", y = "Stay probability") +
  theme_minimal()

ggsave("fig_stayhaz_fem.png", dpi = 300, width = 10, height = 8)

#print(p_women)                                                    
# --- MEN ---
haz_pub_men <- haz_kid2 %>%
  dplyr::filter(g != "fem", sector == "PUB")

p_men <- ggplot(haz_pub_men,
                aes(x = scenario_id, y = stay_hazard, fill = kid_change2)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Staying Hazard in Public Sector (Men)",
       x = "Scenario", y = "Stay probability") +
  theme_minimal()
ggsave("fig_stayhaz_men.png", dpi = 300, width = 10, height = 8)
#print(p_men)                                                    
gridExtra::grid.arrange(p_women, p_men, ncol = 1)


5.7.4 (iv) Log‑Wage Densities by Sector and Gender

mean_df <- wages_all |>
  dplyr::group_by(scenario_id, sector, g) |>
  dplyr::summarise(mean_logW = mean(logW, na.rm=TRUE), .groups="drop")

plot <- ggplot(wages_all, aes(x = logW, color = g, fill = g)) +
  geom_density(alpha = 0.2) +
  geom_vline(data = mean_df,
             aes(xintercept = mean_logW, color = g),
             linetype="dashed") +
  facet_grid(sector ~ scenario_id, scales="free_y") +
  theme_minimal() +
  labs(title = "Log Wage Density by Sector",
       x = "Log wage", y = "Density")
#print(plot)
#ggsave("log_wage_density.png", plot = plot,width = 10, height = 8, dpi = 300)
ggsave("fig_logw_density.png", dpi = 300, width = 10, height = 8)
print(plot)


5.7.5 (v) Mover Gains (\(\Delta\) log wage PRI \(\rightarrow\) PUB)

Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage

moves_plot <- moves_all |>
  dplyr::filter(orig == "PRI", dest == "PUB")

plot <- ggplot(moves_plot, aes(x = g, y = dlogW, fill = g)) +
  geom_boxplot(outlier.alpha = 0.2) +
  geom_hline(yintercept = 0, linetype=2) +
  facet_grid(. ~ scenario_id) +
  theme_minimal() +
  labs(title = "Mover Gains PRI → PUB",
       x = "Gender", y = "Δ log wage")
ggsave("fig_movergain1.png", dpi = 300, width = 10, height = 8)
print(plot)


5.7.6 (vi) Mover Gains (\(\Delta\) log wage PUB \(\rightarrow\) PRI)

Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage

moves_plot2 <- moves_all |>
  dplyr::filter(orig == "PUB", dest == "PRI")

plot <- ggplot(moves_plot2, aes(x = g, y = dlogW, fill = g)) +
  geom_boxplot(outlier.alpha = 0.2) +
  geom_hline(yintercept = 0, linetype=2) +
  facet_grid(. ~ scenario_id) +
  theme_minimal() +
  labs(title = "Mover Gains PUB → PRI",
       x = "Gender", y = "Δ log wage")
ggsave("fig_movergain2.png", dpi = 300, width = 10, height = 8)
print(plot)

5.7.7 (vii) Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)

Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)

rr_table <- rr_all |>
  dplyr::arrange(g, scenario_id) |>
  gt(rowname_col = "scenario_id", groupname_col = "g") |>
  tab_header(
    title = md("**Rank–Rank Slope for PUB → PRI Movers**"),
    subtitle = md(
      "A lower slope indicates stronger comparative advantage: \
      it measures how strongly a worker’s within-PUB sector rank \
      predicts their rank in PRIVATE sector upon moving."
    )
  ) |>
  fmt_number(columns = dplyr::everything(), decimals = 3) |> ## Format slope values nicely
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_stub() ## Styling
  ) |>
  opt_row_striping() |>
  tab_options(
    table.width = pct(100),
    data_row.padding = px(4),
    table.font.size = 13
  )
#rr_table

gtsave(rr_table, "table_ranks.png")      # PNG
gtsave(rr_table, "table_ranks.pdf")      # PDF 
rr_table
Rank–Rank Slope for PUB → PRI Movers
A lower slope indicates stronger comparative advantage: it measures how strongly a worker’s within-PUB sector rank predicts their rank in PRIVATE sector upon moving.
rank_slope scenario
fem
BASELINE 0.020 BASELINE
CF1_price_gap 0.003 CF1_price_gap
CF2_disp 0.040 CF2_disp
CF3_compadv −0.197 CF3_compadv
CF4_strong_compadv −0.160 CF4_strong_compadv
CF5_family 0.006 CF5_family
CF6_offers −0.039 CF6_offers
men
BASELINE −0.069 BASELINE
CF1_price_gap −0.031 CF1_price_gap
CF2_disp −0.069 CF2_disp
CF3_compadv −0.069 CF3_compadv
CF4_strong_compadv −0.069 CF4_strong_compadv
CF5_family −0.069 CF5_family
CF6_offers −0.069 CF6_offers

5.7.8 (viii) Stayer–leaver residual gap in G (higher gap ⇒ stronger selection on ε in G)

sl_table <- sl_all |>
  dplyr::arrange(g, scenario) |>
  gt(rowname_col = "scenario", groupname_col = "g") |>
  tab_header(
    title = md("**Stayer–Leaver Residual Gap in Origin Sector PUB**"),
    subtitle = md(
      "Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: \
      ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector."
    )
  ) |>
  fmt_number(columns = c(stayer, leaver, gap), decimals = 3) |> ## Format columns nicely
  opt_row_striping() |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_stub() ## Styling
  ) |>
  tab_options(
    table.width = pct(100),
    table.font.size = 13,
    data_row.padding = px(4)
  )

gtsave(sl_table, "table_stayleave_haz.png")      # PNG
gtsave(sl_table, "table_stayleave_haz.pdf")      # PDF 
sl_table
Stayer–Leaver Residual Gap in Origin Sector PUB
Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector.
scenario_id leaver stayer gap
fem
BASELINE BASELINE −0.001 0.001 0.002
CF1_price_gap CF1_price_gap −0.008 0.005 0.014
CF2_disp CF2_disp −0.002 −0.003 −0.001
CF3_compadv CF3_compadv −0.012 0.006 0.017
CF4_strong_compadv CF4_strong_compadv −0.012 0.002 0.014
CF5_family CF5_family −0.005 0.006 0.011
CF6_offers CF6_offers 0.001 0.001 0.000
men
BASELINE BASELINE 0.007 −0.002 −0.009
CF1_price_gap CF1_price_gap 0.006 −0.002 −0.008
CF2_disp CF2_disp 0.007 −0.002 −0.009
CF3_compadv CF3_compadv 0.007 −0.002 −0.009
CF4_strong_compadv CF4_strong_compadv 0.007 −0.002 −0.009
CF5_family CF5_family 0.007 −0.002 −0.009
CF6_offers CF6_offers 0.007 −0.002 −0.009

5.7.9 (ix) Mean Δ log wage for consecutive employment

Wage transitions for consecutive employment (by scenario)

transitions <- panel_all |>
  filter(g == "fem") %>%
  dplyr::group_by(scenario_id, i) |>
  dplyr::arrange(t, .by_group = TRUE) |>
  dplyr::mutate(next_logW   = dplyr::lead(logW),
                next_sector = dplyr::lead(sector),
                next_F      = dplyr::lead(F),
                employed_next = dplyr::lead(employed)) |>
  dplyr::ungroup() |>
  dplyr::filter(employed == 1, employed_next == 1) |>
  dplyr::mutate(dlogW = next_logW - logW,
                F_trans = dplyr::case_when(
                  F == 0 & next_F == 0   ~ "no kids -> no kids",
                  F == 0 & next_F >= 1   ~ "no kids -> kids",
                  F >= 1 & next_F >= 1   ~ "kids -> kids",
                  F >= 1 & next_F == 0   ~ "kids -> no kids"
                ))


dlogW_stats <- transitions |>
  dplyr::mutate(orig = sector, dest = next_sector) |>
  
  # >>> keep only PUB→PRI or PRI→PUB transitions <<<
  dplyr::filter(
    (orig == "PUB" & dest == "PRI") |
      (orig == "PRI" & dest == "PUB")
  ) |>
  dplyr::group_by(scenario_id, g, orig, dest, F_trans) |>
  dplyr::summarise(mean_dlogW = mean(dlogW, na.rm = TRUE),
                   n = n(), .groups = "drop") |>
  dplyr::mutate(orig = to_factor(orig), dest = to_factor(dest)) |>
  dplyr::arrange(scenario_id, g, orig, dest, F_trans)
#print(head(dlogW_stats, 20))

p_dlog <- dlogW_stats |>
  dplyr::filter(n >= 2) |>
  ggplot(aes(x = interaction(orig, dest, sep = "→"),
             y = mean_dlogW, fill = F_trans)) +
  geom_col(position = "dodge") +
  facet_grid(rows = vars(g), cols = vars(scenario_id)) +
  labs(title = "Mean Δ log wage for consecutive employment by scenario",
       x = "Origin → Destination sector", y = "Mean Δ log wage",
       fill = "Fertility transition") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("fig_wtrans_kids.png", dpi = 300, width = 10, height = 8)
print(p_dlog)

5.7.10 (x) Sector Employment Shares Over Time

shares <- panel_all |>
  dplyr::filter(!is.na(sector)) |>
  dplyr::count(scenario_id, t, g, sector, name="n") |>
  dplyr::group_by(scenario_id, t, g) |>
  dplyr::mutate(share = n / sum(n)) |>
  dplyr::ungroup()

plot <- ggplot(shares,
               aes(x = t, y = share, color = sector, linetype = g)) +
  geom_line() +
  facet_grid(sector ~ scenario_id, scales="free_y") +
  theme_minimal() +
  labs(title = "Sector Employment Shares Over Time",
       x = "Time", y = "Share")
ggsave("fig_empshares.png", dpi = 300, width = 10, height = 8)
print(plot)

5.7.11 (xi) Mean Log Wages Over Time by Sector

logwages <- panel_all |>
  dplyr::filter(!is.na(sector), !is.na(logW)) |>
  dplyr::group_by(scenario_id, t, g, sector) |>
  dplyr::summarise(mean_logw = mean(logW), .groups="drop")

plot <- ggplot(logwages,
               aes(x = t, y = mean_logw, color = sector, linetype = g)) +
  geom_line() +
  facet_grid(sector ~ scenario_id, scales="free_y") +
  theme_minimal() +
  labs(title = "Mean Log Wages Over Time",
       x = "Time", y = "Mean log wage")
ggsave("fig_wage.png", dpi = 300, width = 10, height = 8)
print(plot)