1 MODEL SETUP
2 Identification (Heuristic)
3 Estimation Method (SMM/SML Skeleton)
4 Discussion (model mechanisms, identification, etc.)
- 4.1 IDENTIFICATION OF GENDER DIFFERENCES
5 MODEL SOUTION, SIMULATION, AND COUNTERFACTUALS

1 MODEL SETUP

Time is discrete, \(t=0,1,\dots,T\). Each individual \(i\) has gender \(g\in\{m,f\}\) and education \(e\). There are 3 employment sectors, \[ \mathcal S \equiv \{\text{Subsistence }(SUB),\ \text{Private }(PRI),\ \text{Public }(PUB)\}, \] and a home alternative \(H\) (non-employment).

The state at time \(t\) for individual \(i\) is: \[ \Xi_{it}=\Big(g,e,\ r_{it},\ \mathbf{\varepsilon}_i,\ F_{it},\ M_{it}\Big), \] where:

\(r_{it}\in \mathcal S\cup\{H\}\) is last period’s chosen alternative (current “attachment”),
\(\mathbf{\varepsilon}_i=(\varepsilon_{i,sub},\varepsilon_{i,pri},\varepsilon_{i,pub})\) are permanent sector-specific skills,
\(F_{it}\in\{0,1,2,\dots\}\) is an exogenous fertility index (e.g., number or age-weighted count of children).
\(M_{it}\in\{0,1 \dots\}\) is exogenous marital status.

Assume \(F_{it}\) evolves exogenously via a Markov process \(Q(F_{t+1}\mid F_t)\) that is excluded from wages and excluded from offer arrival (it only affects utility).

1.1 Wages (Fixed Conditional on Permanent Skills)

If \(i\) works in sector \(s\in\mathcal S\) at \(t\), the (log) wage is: \[ \boxed{ \ln W_{it}^g(s) = \beta_{0,g}(s)\;+\;\beta_1\,\text{educ}_i\;\;+\;\varepsilon_{is}. } \] Excluding work experience for now.

\(\beta_{0,g}(s)\): skill price (gender- and sector-specific; may capture discrimination).
\(\beta_1\): return to education (common).
\(\beta_2\): return to sector-specific experience if we have experience (common).
\(\varepsilon_{is}\): permanent sector ability (drawn at entry; no transitory wage shocks; no new wage draws).

Distribution of permanent skills. For tractability and mover-based identification, assume a multivariate Normal by gender: \[ \mathbf{\varepsilon}_i \mid g \sim \mathcal N\big(\mathbf{0},\ \Sigma_g\big),\qquad \Sigma_g= \begin{pmatrix} \sigma^2_{1,g} & \rho_{12,g} \,\, \sigma_{1,g}\,\, \sigma_{2,g} & \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} \\ \rho_{12,g} \,\, \sigma_{1,g} \,\, \sigma_{2,g} & \sigma^2_{2,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g}\\ \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g} & \sigma^2_{3,g} \end{pmatrix}. \]

(Alternative: independent Fréchet components; results below do not require closed-form shares.)

For \(\Sigma_g\) to be a positive semidefinite (ie a valid covariance matrix), the corr matrix must also satisfy \[ 1 + 2 \, \rho_{12,g}\, \rho_{13,g}\, \rho_{23,g} - \, \rho_{12,g}^2 - \, \rho_{13,g}^2 - \, \rho_{23,g}^2 \] So for various pars values, make sure you check this condition.

1.2 Offers: One Sector per Period, Home Always Available

Each period, at most one employment offer arrives. Let \(O_{it}\in\{S,G,P,\varnothing\}\) denote the sector offered at \(t\) (or \(\varnothing\) if no offer). Define arrival probabilities conditional on observables \(X_{it}\) (age, past sector, etc.):

\[ \Pr\big(O_{it}=s\mid X_{it},g\big)=\lambda^g_s(X_{it}),\quad s\in\{S,G,P\},\qquad \Pr(O_{it}=\varnothing\mid X_{it},g)=1-\sum_{s}\lambda^g_s(X_{it}). \] Key friction: If \(r_{it}=s\) (currently attached to \(s\)), the agent can continue in sector \(s\) only if \(O_{it}=s\). Otherwise, they can’t remain employed in \(s\) this period.

Home \(H\) is always available (no offer needed).

In other words, agents compare at most one employment sector vs home choice each period.

1.3 Preferences

Let per-period utility be quasi-separable in (log) wages and non-pecuniary sector tastes shifted by fertility (F) and marital status (M):

\[ u_{it}(s) = \beta_u\,\ln W_{it}^g(s) \;+\; \gamma_s^g\,F_{it} \;+\; \delta_s^g\,M_{it},\qquad s\in\{SUB,PRI,PUB\}, \] with a home utility \(u_{it}(H)= b_0 \;+\; \gamma_H\,F_{it} \;+\; \delta_H\,M_{it},\qquad b_0\in\mathbb R.\)

\(\beta_u>0\) scales log-wage into utils.
\(\gamma_s\) and \(\delta_s\) are sector-specific preference shifters and they differ by gender (exogenous; do not affect wages or offers).
No additional idiosyncratic taste shocks are included (only permanent wage heterogeneity). This focuses identification on fertility as an exclusion.

1.4 Dynamic Program

Timing Within Period: At the start of period \(t\) given state \(\Xi_{it}\):

Offer draw: \(O_{it}\in\{S,G,P,\varnothing\}\) is realized from \(\{\lambda^g_s(X_{it})\}\).
Choice set: \(\mathcal C_{it}=\{H\}\cup\{O_{it}\}\,\) (home plus the one offered sector if any).
Decision: choose \(d_{it}\in\mathcal C_{it}\) to maximize current utility plus continuation.
Experience update given choice; then \(F_{it}\) evolves to \(F_{it+1}\sim Q(\cdot\mid F_{it})\).

Let \(V_t(\Xi)\) be the value function. Define the sector-specific continuation value if offered \(s\):

\[ \mathcal V_t(s;\Xi) \equiv u_t(s) \;+\; \beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=s\big], \]

and the home continuation value: \[ \mathcal V_t(H;\Xi) \equiv u_t(H)\;+\;\beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=H\big], \] with discount factor \(\beta\in(0,1)\).

Given the one-offer constraint, \[ V_t(\Xi) \;=\; \underbrace{\Pr(O_t=\varnothing\mid X,g)}_{\text{no offer}} \cdot \max\{\mathcal V_t(H;\Xi)\} \;+\; \sum_{s\in\{S,G,P\}} \underbrace{\Pr(O_t=s\mid X,g)}_{\lambda^g_s(X)} \cdot \max\{\mathcal V_t(H;\Xi),\ \mathcal V_t(s;\Xi)\}. \] The law of motion inside the expectations updates experience per choice and draws \(F_{t+1}\) exogenously.

stayers vs. movers:

If \(r_t=s\) and \(O_t=s\), the agent may stay in \(s\) by comparing \(\mathcal V_t(s;\Xi)\) to \(\mathcal V_t(H;\Xi)\).

If \(O_t\ne s\) (or \(O_t=\varnothing\)), staying in \(s\) is infeasible; the agent compares only \(H\) and the offered \(O_t\) (if any).

Movers are observed when \(O_t=s'\ne r_t\) and \(\mathcal V_t(s';\Xi)\ge \mathcal V_t(H;\Xi)\).

1.5 Normalizations and Interpretability

To interpret gender wedges as skill price differences:

Fix a base sector \(s_0\) (e.g., Private) and set \(\beta_{0,m}(s_0)=0\).
Center permanent skills by gender–sector: \(\mathbb E[\varepsilon_{is}\mid g]=0\).
Then \(\beta_{0,f}(s)-\beta_{0,m}(s)\) is a price/discrimination wedge in sector \(s\).
Optionally fix \(b_0=0\) or \(\beta_u=1\) for utility scale.

Observables and Moments: Given panel data on \(\{s_{it},\ \ln W_{it}\ \text{if } s_{it}\in\mathcal S,\ F_{it},\ e,\ \mathbf{x}_{it}\}\).

Define moments (by gender and also by education/experience if applicable):

Sector shares: \(\pi_{gs}(t)=\Pr(s_{it}=s\mid g,t)\).
Transitions: \(P_g(s'|s)=\Pr(s_{it+1}=s'\mid s_{it}=s,g)\).
Within-sector wages: \(\bar w_{gs}(t)=\mathbb E[\ln W_{it}\mid s_{it}=s,g,t]\), and dispersion \(\sigma^2_{gs}(t)\).
Mover wage gaps: \(\mathbb E[\Delta \ln W_{i} \mid r_t=s,\ s_{t+1}=s',\ g]\).
Fertility responses: event-study \(\Delta \pi_{gs}\) and \(\Delta P_g(s'|s)\) around \(\Delta F_{it}\).

2 Identification (Heuristic)

2.1 a. Sector Skill Prices by Gender: \(\beta_{0,g}(s)\) (relative)

Using movers and the one-offer structure:

\[ \mathbb E\big[\ln W_{i}(s') - \ln W_{i}(s)\mid r_t=s,\ s_{t+1}=s',\ g\big] = \underbrace{\beta_{0,g}(s')-\beta_{0,g}(s)}_{\text{price difference}} + \beta_2\,\Delta x_i + \underbrace{\mathbb E[\varepsilon_{is'}-\varepsilon_{is}\mid \text{move } s\!\to\! s']}_{\text{selection term}}, \]

where the selection term arises because movers are selected on \(\mathbf{\varepsilon}\) given the acceptance rule and offers.

Given \(\Sigma_g\) (below) and the offer process (below), the selection term is pinned down in the simulated model; the residual identifies \(\beta_{0,g}(s')-\beta_{0,g}(s)\) up to the base-sector normalization.

Gender wedges. With centered \(\varepsilon\), the cross-gender difference \(\beta_{0,f}(s)-\beta_{0,m}(s)\) is interpretable as discrimination (skill price) in sector \(s\).

2.2 b. Permanent Skill Distribution: \(\Sigma_g\)

Within-sector residual dispersion of \(\ln W\) (net of \(\beta_1,\beta_2\)) identifies variances \(\sigma^2_{gs}\).
Mover patterns (who moves where; gains upon moving) identify covariances \(\rho_{g,ss}\): strong positive correlation implies individuals who are strong in \(s\) are also strong in \(s'\) (affecting acceptance when offered \(s'\)).
\(\Sigma_g\) cannot mimic within-person, event-timed changes tied to fertility (see below), which helps separate it from \(\gamma\).

2.3 c. Fertility Taste Shifters: \(\gamma_s\) (Exclusion)

Fertility \(F_{it}\) enters utility only (not wages/offers). Around fertility events:

Shares and transitions shift due to acceptance, not arrival:

\[ \frac{\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k+1)} {\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k)} \approx \frac{\text{acceptance}(s'|F=k+1)}{\text{acceptance}(s'|F=k)}. \]

These within-person changes identify \(\gamma_s\) because \(\Sigma_g\) is permanent and \(\lambda^g_s(\cdot)\) is excluded from \(F\).

If public-sector transitions and shares jump for women at childbirth while wages/offer proxies do not, then \(\gamma_G>0\) (family-friendly public sector) is identified—separate from \(\beta_{0,g}(s)\) and \(\Sigma_g\).

2.4 d. Offer Process: \(\lambda^g_s(X)\)

Given \(\gamma\) (acceptance shifts at fertility) and wage-side parameters, the levels of transitions identify the arrival probabilities:

\[ \Pr(s_{t+1}=s'\mid s_t=s) = \underbrace{\lambda^g_{s'}(X_{t+1})}_{\text{arrival}} \times \underbrace{\Pr(\text{accept } s'\mid O_{t+1}=s',\Xi_{t+1};\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g)}_{\text{acceptance}}. \]

Fertility-timed contrasts net out acceptance shifts, isolating \(\lambda^g_{s'}(\cdot)\) from the remaining variation in transitions.

2.5 e. Prices and Endowments

With appropriate normalization e.g. \(\mathbb E[\varepsilon_{is}\mid g]=0\), mean gender differences within a sector load on \(\beta_{0,f}(s)-\beta_{0,m}(s)\) (instead of \(\varepsilon\)).
Relative female advantage in public (vs private) can arise from either
- higher \(\beta_{0,f}(PUB)-\beta_{0,m}(PUB)\) (skill price advantages), or
- higher \(\sigma_{3,f}\) or favorable \(\rho_{23,f}\) (women’s endowment more aligned with \(PUB\) ie comparative advantages).
Will use mover wage gaps and fertility-timed acceptance shifts to separate these channels: \(\beta_0\) affects wage levels and mover gaps; \(\gamma\) affects acceptance around fertility; \(\Sigma_g\) affects dispersion and who moves, not event-timed shifts.

3 Estimation Method (SMM/SML Skeleton)

Parameter blocks: \(\Theta=\{\beta_u,\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g,\lambda\}\) with normalizations (\(\beta_{0,m}(PRI)=0\), centered \(\varepsilon\)).
Simulate offer draws \(O_{it}\) from \(\lambda^g_s(X_{it})\) over observed \(X_{it}\) paths.
Solve choices period-by-period (myopic or with continuation via forward simulation) given \(F_{it}\), wages, and one-offer constraint.
Match moments: sector shares, transitions (overall and by fertility states), within-sector wage means/variances, and mover wage gaps.
Over-ID checks: out-of-sample moments (e.g., wage tails, acceptance at non-fertility events).

Possible counterfactuals to run using the estimated model:

No discrimination: set \(\beta_{0,f}(s)=\beta_{0,m}(s)\) for all \(s\), keep \(\Sigma_g,\gamma,\lambda\) fixed, recompute outcomes.
No family-friendliness differential: set

\[ \gamma_{PUB}=\gamma_{PRI}=\gamma_{SUB} \]

(or \(\gamma_s=0\)), examine how fertility no longer tilts acceptance.
Offer equalization: set \(\lambda^f_s(X)=\lambda^m_s(X)\), keep other primitives.

4 Discussion (model mechanisms, identification, etc.)

This dynamic Roy-with-offers model hardwires the “one-offer-per-period” friction, forcing agents to compare one employment sector against home each period (home always feasible; staying in the same job requires a fresh offer). Wages depend only on permanent sector-specific skills and observables; there are no transitory wage shocks. Fertility is an observed, exogenous shifter of non-pecuniary sector utility only.

Identification separates:

Prices \(\beta_{0,g}(s)\) via mover wage gaps and wage levels,
Endowments \(\Sigma_g\) via dispersion and selection patterns,
Offers \(\lambda\) via transition levels net of acceptance,
Family-friendliness \(\gamma\) via within-person,fertility-timed acceptance shifts.

The combination of these aspects of the data allows to separate the role of skill prices, skill comparative advantage and offer probabilities.

For example: The avg log wage difference for women who move from public to private is around -0.46 and -0.09 for women and men, respectively. Can use this information in conjunction with other aspects of the data. In order to distinguish the roles of skill price gaps vs. comparative advange/endowmentstructure, we use many data oments at once, nost jst te mean waegs ormean log wage differences for stayers and movers. In the state-depdendent offer model specified above, each block pushes differnet patterns in the data:

Variance and shape of mover gains, not just the mean.
- Price differences shift the mean of \(\Delta w\) almost uniformly, the variance and skew of \(\Delta w \mid s \rightarrow s',g\) are driven by the distribution of \(\varepsilon_{is'}- \varepsilon_{is}\) under selection.
- Matching \(Var(\Delta w)\) by origin-destination-gender pins down \(\sigma_{gs}, \sigma_{gs'}\) and especially \(\rho_{g,ss'}\). A big mean gap with small variance points to \(\Delta \beta_{0,g}\); a big variance/left tail points to comparative advantage sorting.
Rank-rank (or quantile-quantile) patterns for movers. - Regress post-move residual-wage rank in s’ on pre-move residual rank in s for movers of gender g. The slope identifies \(\rho_{g,ss'}\).
- If prices drive the gap, the rank mapping is near 1 (high earners in PUB remain high in PRI).
- If comparative advantage drives it, the slope flattens (or even inverts in tails): high PUB earners are not high in PRI.
Stayer-leaver contrasts within origin sectors.
- Compare residual wages (after removing \(\beta_{0,g}(s) + \beta_1 educ\) of stayers vs leavers from \(s\), by gender.
- Strong negative selection of leavers (leavers much lower residuals in \(s\) than stayers) is evidence of comparative advantage. Pure price gaps do not create differential stayer-leaver sorting strength.
Bidirectional flows: symmetry tests.
- Jointly match \(PUB \rightarrow PRI\) and \(PRI \rightarrow PUB\) rates and gains by gender.
- A price gap \(\Delta \beta_{0,g}(PRI,PUB)\) pushes both directions’ means (opposite sign, similar magnitude after accounting for selection).
- Comparative advantage shows asymmetric selection and different dispersion across the two directions.
Within-sector dispersion (pooled), by gender.
- \(Var(\text{residual } lnW \mid s,g)\) pins down \(\sigma_{gs}^2\).
- If women’s residual dispersion is notably different in \(PRI\) vs. \(PUB\), you can’t match both dispersions and the mover distributions with a pure price story; you’re forced to adjust \(\Sigma_g\) (including \(\rho_{23,g}\)).
Event-time around fertility (the main exclusion):
- Fertility shifts acceptance via \(\gamma_s\), not wages. After fertility events:
  - If sector shares shift and mean residual wages within sector change via composition, while mover gains’ mean doesn’t shift systematically, that’s a cceptance/selection, not prices.
  - Prices \(\beta_{0,g}\) are time-invariant here, so fertility timed movements help you attribute composition changes to \(\gamma\) and \(\Sigma_g\). not \(\beta_0\).
State-dependent offer matrix moments.
- The added tweak (offers depend on current sector) gives extra moments: staying hazards \(P(s \rightarrow s)\), directional flows \(P(PUB \rightarrow PRI)\) vs \(P(PRI \rightarrow PUB)\) by gender and fertility.
- These depend strongly on \(\lambda_{s' \mid s,g}\) and \(\gamma\) but not on \(\beta_0\). Matching these pins down frictions and acceptance,leaving \(\beta_0\) to be identified by wage levels and mover means once selection is accounted for.

Practically, how does the estimator tell them apart?

In the structural SMM:

We normalize \(E[\varepsilon_{is} \mid g]=0\) by gender and sector. and set a base \(\beta_{0,m}(s_0)=0\). Then mean within-sector wages by gender identify relative \(\beta_{0,g}(s)\), while dispersions and rank-rank moments identify \(\Sigma_g\) (including \(\rho_{g,ss'}\)).
We match simultaneously:
- within-sector means/variances of residual wages by g,s
- full distribution (or bins) of \(\Delta w\) for movers by (origin-destination-gender).
- rank-rankslopes among movers,
- stayer-leaver residual gaps within sectors,
- bidirectional flow rates and mover shares,
- event-time (fertility) changes in shares/transitions,
- state-dependent staying hazards \(P(s \rightarrow s)\) vs exits

No single moment is decisive; the joint fit forces a decomposition: If you try to explain the women’s -0.46 mean with only a large \(\Delta \beta_{0,f}(PRI,PUB)\), you will misfit - the vairance of \(\Delta w\), - the mover rank-rank slope, (iii) stayer-leaver gaps, or (iv) the reverse \(PRI \rightarrow PUB\) flow so the optimizer shifts weight into \(\Sigma_f\) (comparative advantage) until all moments line up. Conversely, if you try to do it only with \(\Sigma_f\), you will miss within-sector mean levels and pooled wage gaps unless \(\beta_{0,f}(P)- \beta_{0,f}(G)\) adjusts.

Two concrete diagnostics:

Rank-rank plot for movers \(PUB \rightarrow PRI\) (separately by gender): Compute residual ranks in PUB before move and residual ranks in PRI after move. A much flatter slope for women than men \(\rightarrow\) stronger negative \(\rho_{23,f}\) / comparartive advantage channel.
Stayer-leaver residual gap in G (by gender): \(E[\tilde{w} \mid \text{stay in PUB}] - E[\tilde{w} \mid \text{leave PUB} ]\). A larger gap for women indicates stronger selection on \(\varepsilon\) among female leavers, again pointing to endowments rather than prices.

Sources of gender differences and discussion of which moments move. What identifies prices \(\beta_0\), vs. endowments (\(\Sigma_g\)), and separates both from offers \(\lambda\) and tastes \(\gamma\). How to read the scenario results (ie what identifies what)

Scenario 1 (prices only): Gender gaps come only from \(\beta_{0,g}(s)\). You should see parallel shifts in within-sector wage means, similar dispersion across genders, and mover \(\`Delta log W\) distributions shifting mainly in means. Rank-rank slopes (rr) and stayer-leaver gaps (sl) should be similar across genders - little evidence of comparative advantage differences.
Scenario 2 (endowments only): Prices equalized; genders differ only in \(\Sigma_g\) (variances/covariances for the joint distribution of sector-specific skill endowments). Expect different dispersions, different shapes/variances of mover gains, flatter female rank-rank slope \(PUB \rightarrow PRI\) if \(\rho_{23,g}\) is lower (rr), and larger stayer-leaver residual gap for the gender with stronger selection (sl). Means won’t shift much without price differences.
Scenario 3 (offers only): Prices and \(\Sigma\) equal; only offer matrices differ (state persistence). Within-sector wages look similar, mover gains distributions similar, but staying hazards differ by gender and sector. Sector shares (not shown but trivial to add) will reflect opportunity, not wages.
Scenario 4 (fertility tastes only): Prices, \(\Sigma\) offers equal; only \(\gamma\) differs. Around births you will see gender-specific shifts into the more family-friendly sector (public), with composition changes in p1/p2 but no shifts in underlying prices.

4.1 IDENTIFICATION OF GENDER DIFFERENCES

We want to distinguish (and stress-test potential observational equivalences between):

Gender differences in sector-specific skill prices \(\beta_{0,g}(s)\) (e.g. a female public-sector premium).
Gender differences in comparative advantage in permanent skills \(\Sigma_g\) (e.g. women have stronger PUB-specific advantage or lower cross-sector correlation).
Gender differences in family-friendliness/child-related utility \(\gamma_g(s)\) (e.g. PUB becomes more attractive for women when kids arrive).
Gender differences in offer arrival rates \(\lambda_{g}(s)\) (e.g. women receive relatively more PUB offers).

To identify each mechanism, we want scenarios where only one block of parameters differs by gender at a time, plus a few “confusability” scenarios that intentionally generate similar public-sector shares with different underlying channels.

5 MODEL SOUTION, SIMULATION, AND COUNTERFACTUALS

For each subsection, click on show to see corresponding code.
The program first specifies number of periods, simulation size etc. and then functional forms.
It then assigns some parameter values which are loosely calibrated to make wage and employment transitions to look somewhat reasonable. Such calibrated values would be the starting values in the actual estimation but here we’re just using it to demonstrate how the model works.
Each scenario refers to a different combination of parameter values to see which moments are most responsive to which parameters, for identification of key parameters.

5.1 Baseline

Baseline can be chosen acccording to what comparisons are needed.
If baseline is chosen so that men and women are identical, it makes every other scenario interpretable as a one-dimensional deviation.
\(\beta_{0,m}(s)=\beta_{0,f}(s)\) for all \(s\).
\(\Sigma_m=\Sigma_f\) with moderate dispersion and zero correlation (no comparative advantage by construction).
\(\lambda_m=\lambda_f\) (identical offer rates).
\(\gamma_m=\gamma_f=\mathbf{0}\) (no family-friendliness).
If baseline is chosen so that they differ in various dimensions, then counterfactuals where each dimension is equated one at a time or in a sequence by adding each element in a sequence, then thecontribution each compoennt can be quantified.

5.2 Scenario list

5.2.1 Scenario `CF1_price_gap`

Pure “prices” (skill-price intercepts) channel

Skill prices are changed so that men have a PRI sector premium \(\beta_{0,m}(PRI) > \beta_{0,m}(s)\) for all s.
Skill prices are changed so that women have PUB sector premium \(\beta_{0,f}(PUB) > \beta_{0,f}(s)\) for all s.
All else identical to baseline

-Answers the following question: Can the model attribute higher female PUB employment AND a smaller PUB wage gap to a price difference?

5.2.2 Scenario `CF2_disp`

Can a higher upper tail of women’s PUB wages (and greater persistence) can be attributed to \(\sigma_{f}(\text{PUB})\).

Check by changing only women’s PUB skill dispersion (higher \(\sigma\) in PUB). Correlation is not canged

5.2.3 Scenario `CF3_compadv`

Answers the following question: Can sorting into PUB and PUB wage moments can be generated by selection on permanent skills without any price premium.

Check by changing only the correlation structure for women (stronger comparative advantage).

Dispersion of women’s sector-specific skill distribution remains unchanged Correlation between women’s skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)

5.2.4 Scenario `CF4_strong_compadv`

(We can also combine dispersion + negative correlation if we want even sharper comparative advantages. See below.) STRONG COMP ADV Dispersion of women’s sector-specific skill distribution for only the public sector (i.e. sigma) is increased Correlation between women’s skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)

``dispersion + negative correlation’’ produces STRONGER comparative advantage because: Comparative advantage in the Roy setup comes from relative rankings across sectors, not just levels. Two ingredients sharpen sorting:

Higher dispersion in one sector (higher \(\sigma\))
- \(\rightarrow\)creates a thicker right tail
- \(\rightarrow\) more extreme specialists.
Negative cross-sector correlation (\(\rho<0\))
- \(\rightarrow\) being good in PUB implies being bad in PRI (and vice versa)
- \(\rightarrow\) stronger specialization
- \(\rightarrow\) stronger selection
- \(\rightarrow\) lower mobility
- \(\rightarrow\) steeper sorting moments.
When we combine both:
- \(\rightarrow\) more mass of workers who are extremely PUB-specializaed and unlikely to leave
- \(\rightarrow\) stronger persistence, bigger stayer-leaver gap, flatter rank-rank slope.

Concrete parameter example

Men (benchmark, weak comparative advantage)

sig <- c(0.30, 0.30, 0.30)

rho <- c(0.00, 0.00, 0.00) # no correlation \(\rightarrow\) weak specialization

This gives similar dispersion in all sectors.
Skills independent \(\rightarrow\) people can be good everywhere \(\rightarrow\) weak sorting.

Women (strong comparative advantage in PUB)

sig <- c(0.30, 0.30, 0.50) # higher dispersion in PUB

rho <- c(-0.30, -0.40, -0.35) # negative correlations across sectors

\(\sigma(PUB)=0.50\) \(\rightarrow\) bigger right tail \(\rightarrow\) some people are very good in PUB.
\(\rho<0\) \(\rightarrow\) PUB-skill negatively related to PRI/SUB skill.
Workers who are in PUB tend to be weak in PRI \(\rightarrow\) strong specialization.

Compared to the ``equal \(\sigma\) and zero \(\rho\) case’’:

- PUB emp share (women) *rises slightly (more specialization)*. 

- PUB staying hazard *rises strongly*

- Stayer-leaver residual gap (PUB) *increases a lot*.

- Rank-rank slope PUB $\rightarrow$ PRI *falls sharply (stronger comparative adv.)*. 

- Mover gains aare *asymmetric, more dispersion*. 

- Wage density (PUB) gives rise to a fatter upper tail.

The key diagnostic is: Rank–rank slope falls + stayer–leaver gap rises = covariance structure, not prices.

5.2.5 Scenario `CF5_family`

Can higher female PUB shares can arise without a wage premium, and whether the fertility-conditioned moments pick it up.

Check by changing only \(\gamma_f(\text{PUB})>0\) (women value PUB more when fertility changes / kids arrive). This is the same idea as FR0/FR1.

Pure “family-friendliness” channel Women’s child utility for the public sector is increased

5.2.6 Scenario `CF6_offers`

Can female PUB employment be driven by the search environment without changing wages directly (except through selection)?

Check by changing only women’s offer arrival rates (higher \(\lambda_f\) in PUB, compensating reductions elsewhere to keep overall offer intensity similar). Women’s offer arrival rate from the PUB sector is increased

5.2.7 Scenario `EQ1_PRICES`

Equate skill prices between men and women

5.2.8 Scenario `EQ2_ENDOWMENTS`

Equate the distributin of skills between men and women

5.2.9 Scenario `EQ3_OFFERS`

Equate arrival rates of offers between men and women

5.2.10 Scenario `EQ4_FAMILY`

Equate sector-specific child utilities between men and women

5.2.11 Some potential “same shares, different mechanism” stress tests

These are deliberately redundant in aggregate public-sector shares but should differ in wage distributions, mover gains, and fertility splits. They help show that your moment set can tell the mechanisms apart:

Skill prices vs. Family stuff: Compare CF1_price_gap vs CF5_family.

Intuition:

Price gap should shift PUB wages (level shift in PUB log-wage density, mover gains PRI→PUB), and affect both “kids-change” and “no-change” groups similarly.
Family friendliness should mainly show up in fertility-conditioned transitions and staying hazards, with smaller wage shifts.

Skill prices vs. Offers

Compare CF1_price_gap vs CF6_offers.

Intuition:

- Offers reallocate sector shares through entry/flows, but do not create
  a direct wage intercept shift. Wage changes are composition-driven.

- Price changes shift wages even holding composition fixed.

Comparative Advantages vs. Offers

Compare CF4_strong_compadv vs CF6_offers.

Intuition:

Comparative advantage should show up strongly in rank–rank slopes and stayer–leaver residual gaps (selection moments).
Offer-rate shifts should show up mostly in transitions and sector shares, with weaker rank–rank implications.

5.3 BASIC DEFINITIONS AND FUNCTIONS FOR THE SOLUTION/SIMULATION

HOME    <- "HME"
sectors <- c("SUB","PRI","PUB")
R_states <- c(HOME, sectors)  
state_levels <- c("HME","SUB","PRI","PUB")


library(gt)

## ---- State-dependent (current-sector) offer process.
## tilt the offered sector toward the CURRENT r_t by adding a "stay bonus"
#stay_bonus <- 0.20  # extra mass for re-offer from current sector
## Keep "none" prob equal to the original base-none, and renormalize across S,G,P.
##  A1) Build conditional offer matrices with a given base + stay_bonus
## returns a list: probs[[r]] is a named vector over c(S,G,P,None) for origin r ∈ {H,S,G,P}
build_lambda_conditional <- function(lambda, bonus) {
  out <- list()
  base_sum <- sum(lambda)
  #print(base_sum)
  #print(paste("base_sum:", base_sum));
  
  # From HOME: use base probabilities (no tilt)
  out[[HOME]] <- c(lambda, None = 1 - base_sum)
  
  # From each sector: add stay bonus to that sector, renormalize offers to sum to base_sum
  for (r in sectors) {
    prob <- lambda
    prob[r] <- prob[r] + bonus
    prob <- prob * (base_sum / sum(prob))  # renormalize to sum to base_sum
    out[[r]] <- c(prob, None = 1 - sum(prob))
  }
  out
  #print(paste("out:", out[["PUB"]] ));
}


cov_matrix <- function(sig, rho) {
  k <- length(sig)
  if (k < 2) stop("Length of 'sig' must be at least 2.")
  if (any(sig <= 0)) stop("'sig' must contain strictly positive values.")
  
  # correlation matrix from scalar rho
  cor_matrix_scalar <- function(rho, k) {
    if (rho <= -1 || rho >= 1)
      stop("Scalar rho must be strictly between -1 and 1.")
    M <- matrix(rho, k, k)
    diag(M) <- 1
    M
  }
  
  # correlation matrix from vector rho
  cor_matrix_vector <- function(rhos, k) {
    needed <- k*(k-1)/2
    if (length(rhos) != needed)
      stop(sprintf("For k=%d, rho vector must be length %d.", k, needed))
    if (any(rhos <= -1 | rhos >= 1))
      stop("All correlations must be strictly between -1 and 1.")
    
    M <- matrix(1, k, k)
    M[upper.tri(M)] <- rhos
    M[lower.tri(M)] <- t(M)[lower.tri(M)]
    M
  }
  
  # choose scalar or vector case
  R <- if (length(rho) == 1) {
    cor_matrix_scalar(rho, k)
  } else {
    cor_matrix_vector(rho, k)
  }
  
  # Σ = D R D
  D <- diag(sig)
  D %*% R %*% D
}

# Check for positive semidefiniteness
is_valid_corr <- function(rho) {
  # rho = c(r12, r13, r23)
  if (length(rho) != 3) stop("rho must be a vector of length 3")
  
  r12 <- rho[1]
  r13 <- rho[2]
  r23 <- rho[3]
  
  R <- matrix(c(
    1,    r12,  r13,
    r12,  1,    r23,
    r13,  r23,  1
  ), nrow = 3, byrow = TRUE)
  
  lambda_min <- min(eigen(R)$values)
  valid <- lambda_min > -1e-8
  
  if (!valid) {
    warning(sprintf(
      "Invalid correlation triple: smallest eigenvalue = %.4g",
      lambda_min
    ))
  }
  
  # Return nicely formatted result
  return(
    data.frame(
      r12 = r12,
      r13 = r13,
      r23 = r23,
      valid = valid
    )
  )
}


# Examples
rho <- c(0.3, 0.4, 0.2)
#is_valid_corr(rho)     # TRUE
rho <- c(0.9, 0.9, -0.9)    
#is_valid_corr(rho)     # FALSE
rho <- c(1, -1, 1)    
#is_valid_corr(rho)     # FALSE
rho <- c(0.7, -0.5, -0.2) 
#is_valid_corr(rho)     # TRUE





rmvnorm <- function(n, mu, Sigma) {
  Z <- matrix(rnorm(n * length(mu)), nrow = n)
  L <- chol(Sigma)
  sweep(Z %*% L, 2, mu, `+`)
}

5.4 CHOOSE WHICH BASELINE AND COUNTERFACTUAL SCENARIOS TO RUN

Key parameters that vary across counterfactual scenarios:
- Sector- and gender-specific skill prices.
- Joint distribution of sector-specific skills (gender-specific).
- Sector- and gender-specific non-pecuniary utilities which differ by kids and marital status.
- Arrival rates of offers (gender-specific).
See the function make_scenario to see what parameter values each scenario scn corresponds to.
In the code block below, you can change:
- Baseline parameter values
- Parameter values for each counterfactual scenario
- Scenario list to compare

IN ORDER TO RUN THE CODE WITH ANY CHANGES YOU’D LIKE TO IMPLEMENT, OPEN THE CORRESPONDING .RMD FILE IN RSTUDIO, MAKE YOUR CHANGES, SAVE AND THEN CLICK ON KNIT AT THE TOP. THIS WILL RECOMPILE EVERYTHING AND CREATE A NEW HTML OR PDF FILE, WHICH SHOULD AUTOMATICALLY APPEAR ON VIEWER ON THE RIGHT PANEL.

    #Choose the number of periods 
    T_per <- 10
    
    #Choose the number of people to simulate (has to be at least 2-3K)
    N     <- 4000
    
    #Choose baseline
    #STARTSAME: Everything the same between men and women
    #STARTDIFF: Everything different between men and women
    #You can choose the value of WHICH_SCENARIO (whether you want to start from a base where
    #men and woen are equal or not)
    WHICH_SCENARIO <- "STARTSAME" #other alternatives: STARTDIFF
    
    
    #Depending on whether WHICH_SCENARIO is STARTSAME or STARTDIFF, assign parameter values for the baseline accordingly
    #You can change the parameter values for each (i.e. can change these base values beta0, sigma0 etc.) 
    if (WHICH_SCENARIO == "STARTSAME"){
        beta0 <- tibble::tribble(
          ~g, ~s, ~par,
          "men","SUB",  0.20,
          "men","PRI",  0.20,
          "men","PUB",  0.20,
          "fem","SUB",  0.20,
          "fem","PRI",  0.20,
          "fem","PUB",  0.20
        )
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        beta0 <- tibble::tribble(
          ~g, ~s, ~par,
          "men","SUB", 0.20,
          "men","PRI", 0.40,
          "men","PUB", 0.20,
          "fem","SUB", 0.20,
          "fem","PRI", 0.20,
          "fem","PUB", 0.40
          )
      }
      
      
     if (WHICH_SCENARIO == "STARTSAME"){
        sig_m0 <- c(0.30, 0.30, 0.30)
        rho_m0 <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
        sig_f0 <- c(0.30, 0.30, 0.30)
        rho_f0 <- c(0.0, 0.0, 0.0)   # (12, 13, 23)
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        # men at base: no correlation → weak specializat-0.40,-0.35) # (12, 13, 23)
        sig_m0 <- c(0.30,0.50,0.30)
        rho_m0 <- c(0.0,0.0,0.0)      # (12, 13, 23)
        # women at base: strong comp. adv. i.e.higher dispersion inf PUB and negative corr across sectors
        sig_f0 <- c(0.30,0.30,0.30)
        rho_f0 <- c(0.0,0.0,-0.40) # (12, 13, 23)
    }
    #  is_valid_corr(rho_m0)
    #  is_valid_corr(rho_f0)
      
      
      
      ## Fertility shifters (non-pecuniary) by sector incl. home (as before)
    if (WHICH_SCENARIO == "STARTSAME"){
        gamma_m0 <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
        gamma_f0 <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        gamma_m0 <- c(HME=0, SUB=0, PRI=0, PUB=0)
        gamma_f0 <- c(HME=0, SUB=0, PRI=0, PUB=0.5)
    }
      
      
    if (WHICH_SCENARIO == "STARTSAME"){
        lambda_m0 <- c(SUB = 0.30, PRI = 0.30, PUB= 0.30)  # men
        lambda_f0 <- c(SUB = 0.30, PRI = 0.30, PUB = 0.30)  # women
        stay_bonus0 <- 0.40
    } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        lambda_m0 <- c(SUB=0.30, PRI=0.45, PUB=0.30)
        lambda_f0 <- c(SUB=0.30, PRI=0.30, PUB=0.45)
        stay_bonus0 <- 0.40
      }
    

      #Depending on whether WHICH_SCENARIO is STARTSAME or STARTDIFF, pick scenario_list
      #scenario_list is the list of counterfactuals that the code will run and compare
      #If WHICH_SCENARIO IS STARTSAME, then the scenario list consist of scenarios where each 
      #parameter group is changed one at a time. 
      #If WHICH_SCENARIO IS STARTDIFF, then the scenario list consist of scenarios where each 
      #parameter group is equated one at a time. 
      #You can change the specififc parameter values thateach scenario corresponds to (see  
      # make_scenario below)
     if (WHICH_SCENARIO == "STARTSAME"){
        scenario_list <- c(
          "BASELINE",
          "CF1_price_gap",
          "CF2_disp",
          "CF3_compadv",
          "CF4_strong_compadv",
          "CF5_family",
          "CF6_offers"
        )
      } else if (WHICH_SCENARIO == "STARTDIFFERENT"){
        scenario_list <- c(
          "BASELINE",
          "EQ1_PRICES",
          "EQ2_ENDOWMENTS",
          "EQ3_OFFERS",
          "EQ4_FAMILY"
        )
      }
    



## Scenario parameter builders
## Define the parameter values for each scenario
make_scenario <- function(scn){
  out <- list(label = scn)
  
  #First assign the baseline parameters to the vector `out`
  out$beta0_df <- beta0
  out$Sigma_m <- cov_matrix(sig_m0, rho_m0)
  out$Sigma_f <- cov_matrix(sig_f0, rho_f0)
  out$lambda_m_cond <- build_lambda_conditional(lambda_m0, stay_bonus0)
  out$lambda_f_cond <- build_lambda_conditional(lambda_f0, stay_bonus0) 
  out$gamma_m <- gamma_m0
  out$gamma_f <- gamma_f0
  
  
  if (scn == "BASELINE"){
    #No need to do anything here. out wil just be wht's defined above (ie baseline)
  }
  
  else if (scn == "CF1_price_gap"){
    #Skill prices are changed so that men have a private sector premium $\(\beta_{0,g}(s)\).
    #Skill prices are changed so that women have a public sector premium $\(\beta_{0,g}(s)\). 
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~ 0.20,
      g=="men" & s=="PRI" ~ 0.40,
      g=="men" & s=="PUB" ~ 0.20,
      g=="fem" & s=="SUB" ~ 0.20,
      g=="fem" & s=="PRI" ~ 0.20,
      g=="fem" & s=="PUB" ~ 0.40
    ))
  }
  

  else if (scn == "CF2_disp"){
    #Dispersion of women's sector-specific skill distribution for only the public sector (i.e. sigma) is increased
    #Correlation remains unchanged  
    sig_f <- c(0.30, 0.30, 0.50)  # higher PUB dispersion for women
    rho_f <- rho_f0
    out$Sigma_f <- cov_matrix(sig_f, rho_f)
  }
  
    
  else if (scn == "CF3_compadv"){
    #Dispersion of women's sector-specific skill distribution remains unchanged
    #Correlation between women's skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
    sig_f <- sig_f0 #c(0.30, 0.30, 0.30)
    rho_f <- c(0.0, 0.0, -0.40)
    is_valid_corr(rho_f)
    out$Sigma_f <- cov_matrix(sig_f, rho_f)
  }
  
  else if (scn == "CF4_strong_compadv"){
    #Dispersion of women's sector-specific skill distribution for only the public sector (i.e. sigma) is increased
    #Correlation between women's skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
    sig_f <- c(0.30, 0.30, 0.50)  # higher PUB dispersion for women
    rho_f <- c(0.0, 0.0, -0.40)
    out$Sigma_f <- cov_matrix(sig_f, rho_f)
  }
  
  else if (scn == "CF5_family"){
    #Women's child  utility for the public sector is increased
    out$gamma_f <- c(HME=0.00, SUB=0.00, PRI=0.00, PUB=1.00)
  }
  
  else if (scn == "CF6_offers"){
    #Women's offer arrival rate from the PUB sector is increased
    lambda_f <- c(SUB=0.20, PRI=0.20, PUB=0.80)
    out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus0)
  } 
  
  else if (scn == "EQ1_PRICES"){
    #Women's sector specific skill prices are set equal to that of men's
    #out$beta0_df <- out$beta0_df |>
    #  dplyr::group_by(s) |>
    #  dplyr::mutate(par = mean(par)) |>
    #  dplyr::ungroup()
    out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
      g=="men" & s=="SUB" ~ 0.20,
      g=="men" & s=="PRI" ~ 0.40,
      g=="men" & s=="PUB" ~ 0.20,
      g=="fem" & s=="SUB" ~ 0.20,
      g=="fem" & s=="PRI" ~ 0.40,
      g=="fem" & s=="PUB" ~ 0.20
    ))
  }
  
  else if (scn == "EQ2_ENDOWMENTS"){
    #Women's correlation matrix is set equal to that o fmen
    out$Sigma_f <- out$Sigma_m
  }
  
  else if (scn == "EQ3_OFFERS"){
    #Women's offer arrival rates are set equal to that of men 
    out$lambda_f_cond <- out$lambda_m_cond
  }
  
  else if (scn == "EQ4_FAMILY"){
    #Women's sector-specific child utilty for all sectors is set equal to that of men 
    out$gamma_f <- out$gamma_m
  }
  
  
  else {
    stop("Unknown scenario label")
  }
  out
}

#Choose what gender to do the simulations for.
#This will usually just be "both" unless you want to only simulate women or men
RUN_GENDER <- "both"   # other alternatives: "men", "fem", "both"
if (RUN_GENDER == "both") {
  genders <- c("men", "fem")
} else if (RUN_GENDER == "men") {
  genders <- c("men")
} else if (RUN_GENDER == "fem") {
  genders <- c("fem")
}

5.5 SOLUTION AND SIMULATION SUBROUTINES

Dynamic Roy Model with state-dependent offers (experience removed; permanent sector skills only)

3 employment sectors
Home alternative always available
One offer per period (or none)
Offer arrival probabilities depend on current sector
Wages fixed per individual-sector (permanent epsilon + observables)
Fertility and marriage shifts preferences (not wages, not offers)
Finite horizon, solved by backward induction

## =========================================================
## Dynamic Roy with state-dependent offers
## (Experience removed; permanent sector skills only)
## ---------------------------------------------------------
## - 3 employment sectors
## - Home H always available
## - One offer per period (or none)
## - Offer arrival PROBABILITIES depend on CURRENT SECTOR (state)
## - Wages fixed per individual-sector (permanent epsilon + observables)
## - Fertility shifts utility (not wages, not offers)
## - Finite horizon, solved by backward induction
## =========================================================

set.seed(42)
suppressPackageStartupMessages({
  library(dplyr); library(tidyr); library(ggplot2); library(kableExtra)
})


## =========================================================
## COMMON RANDOM NUMBERS (CRN): master uniforms
## =========================================================

#set.seed(42)

# Max number of offer categories (SUB,PRI,PUB,None)
K_offer <- length(sectors) + 1  

# Uniforms for: wages (eps), offers, fertility, marriage
U_eps  <- matrix(runif(N * length(sectors)), N, length(sectors))
U_F    <- matrix(runif(N * T_per), N, T_per)
U_M    <- matrix(runif(N * T_per), N, T_per)
U_off  <- matrix(runif(N * T_per), N, T_per)
U_initM <- runif(N)
U_initF <- runif(N)
U_gender <- runif(N)
U_educ   <- runif(N)


## =========================================================
## FIXED PARAMETERS AND TRANSITION MATRICES
## =========================================================

## Fertility: exogenous Markov
F_states <- 0:5
Q_F <- matrix(0, 6, 6, dimnames=list(F_states, F_states))
Q_F["0","0"] <- 0.95;  Q_F["0","1"] <- 0.05; Q_F["0","2"] <- 0.00; Q_F["0","3"] <- 0.00; Q_F["0","4"] <- 0.00; Q_F["0","5"] <- 0.00
Q_F["1","0"] <- 0.00;  Q_F["1","1"] <- 0.96; Q_F["1","2"] <- 0.04; Q_F["1","3"] <- 0.00; Q_F["1","4"] <- 0.00; Q_F["1","5"] <- 0.00
Q_F["2","0"] <- 0.00;  Q_F["2","1"] <- 0.00; Q_F["2","2"] <- 0.97; Q_F["2","3"] <- 0.03; Q_F["2","4"] <- 0.00; Q_F["2","5"] <- 0.00
Q_F["3","0"] <- 0.00;  Q_F["3","1"] <- 0.00; Q_F["3","2"] <- 0.00; Q_F["3","3"] <- 0.98; Q_F["3","4"] <- 0.02; Q_F["3","5"] <- 0.00
Q_F["4","0"] <- 0.00;  Q_F["4","1"] <- 0.00; Q_F["4","2"] <- 0.00; Q_F["4","3"] <- 0.00; Q_F["4","4"] <- 0.99; Q_F["4","5"] <- 0.01
Q_F["5","0"] <- 0.00;  Q_F["5","1"] <- 0.00; Q_F["5","2"] <- 0.00; Q_F["5","3"] <- 0.00; Q_F["5","4"] <- 0.00; Q_F["5","5"] <- 1.00

#Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
#Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05   # marriage probability
#Q_M["1","1"] <- 1.00                         # married stays married


## Markov transition Q_M(m'|m)
## Example: small probability of marriage when single; absorbing once married.
M_states <- 0:1          # 0 = single, 1 = married
Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05   # Pr(marry)
Q_M["1","1"] <- 1.00                         # married absorbing
## (can freely change this to allow divorce.)

## Marital-status taste shifters (parallel to gamma for fertility)
## delta_s_m  where s ∈ {HME, SUB, PRI, PUB} and m ∈ {0,1}
## These are non-pecuniary utility additions.
delta_men <- c(HME=0, SUB=0, PRI=0, PUB=0)   # men
delta_fem <- c(HME=0, SUB=0, PRI=0, PUB=0)   # women


## Draw next M
#draw_next_M <- function(M_cur){
#   sample(M_states, size=1, prob = Q_M[as.character(M_cur),])
#}

p_educ  <- 0.5   # Pr(educ=1)
beta_u <- 1.0    # Utility scale and home intercept
beta   <- 0.95   # discount factor
beta1  <- 0.15   # Education return (log-wage)



## ---------------------------------------------------------
## START SOLVING ETC. 
## ---------------------------------------------------------

## Keep the same N, T_per, genders, sectors, HOME, beta1, beta_u, beta, Q_F, etc.
## We'll reuse global g_vec, educ_vec for comparability across scenarios.

draw_eps_by_gender_CRN <- function(g_vec, Sigma_m, Sigma_f){
  Nloc <- length(g_vec)
  eps_mat <- matrix(NA_real_, Nloc, length(sectors),
                    dimnames=list(NULL, sectors))
  
  L_m <- chol(Sigma_m)
  L_f <- chol(Sigma_f)
  
  Z <- qnorm(U_eps)   # SAME underlying shocks across scenarios
  
  for (i in seq_len(Nloc)){
    if (g_vec[i] == "men"){
      eps_mat[i, ] <- Z[i, ] %*% L_m
    } else {
      eps_mat[i, ] <- Z[i, ] %*% L_f
    }
  }
  eps_mat
}

##  A3) Precompute per-person log wages for all sectors
## A3) Precompute per-person log wages for all sectors (robust, base-R)
## Precompute per-person log wages by sector (fixed forever)
precompute_logW_all <- function(g_vec, educ_vec, eps_mat, beta0_df){
  stopifnot(is.matrix(eps_mat),
            nrow(eps_mat) == length(g_vec),
            all(colnames(eps_mat) %in% sectors))
  
  Nloc <- length(g_vec)
  logW_all <- matrix(NA_real_, nrow = Nloc, ncol = length(sectors),
                     dimnames = list(NULL, sectors))
  
  # helper: fetch scalar beta0 for (g,s)
  get_beta0 <- function(gg, ss){
    idx <- (beta0_df$g == gg) & (beta0_df$s == ss)
    b <- beta0_df$par[idx]
    if (length(b) != 1L) stop(sprintf("beta0_df mismatch for g=%s, s=%s (found %d rows)", gg, ss, length(b)))
    b
  }
  
  for (ss in sectors){
    col <- match(ss, sectors)
    beta0_vec <- vapply(seq_len(Nloc), function(i) get_beta0(g_vec[i], ss), numeric(1))
    logW_all[, col] <- beta0_vec + beta1 * educ_vec + eps_mat[, col]
  }
  logW_all
}

## ----------------------------
##  A4) Solve V for one person *given scenario-specific objects*
## 4) Value Function (Backward Induction, person-specific)
## ----------------------------
## State: (t, r ∈ {H,S,G,P}, F ∈ {0,1,2})
## V_{T+1}(.) = 0. For t=T,...,1 compute V_t(r,F) = E_{O|r} [ max{ V_H, V_emp(O) } ]
## where V_H = u_home(F) + β E_F' V_{t+1}(H, F'), and
##       V_emp(s) = u_employed(logW_s, s, F) + β E_F' V_{t+1}(s, F').
solve_value_person_scn <- function(i, logW_all, g_vec, gamma_m,gamma_f,
                                   lambda_m_cond, lambda_f_cond){
  g <- g_vec[i]
  lambda_cond <- if (g=="men") lambda_m_cond else lambda_f_cond
  gamma <- if (g=="men") gamma_m else gamma_f
  delta <- if (g=="men") delta_men else delta_fem
  logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
  t_names <- as.character(1:(T_per+1)) # time index: 1..(T+1); V[t = T+1, ., .] = 0 (terminal)
  #V <- array(0, dim=c(T_per+1, length(R_states), length(F_states)),
  #           dimnames=list(t=t_names, r=R_states, F=as.character(F_states)))
  V <- array(
    0,
    dim=c(T_per+1, length(R_states), length(F_states), length(M_states)),
    dimnames=list(
      t=t_names,
      r=R_states,
      F=as.character(F_states),
      M=as.character(M_states)
    )
  )
  
  # backward induction: tt = T, ..., 1
  for (tt in T_per:1){
    for (r in R_states){
      p_offers <- lambda_cond[[r]]
      for (Fcur in F_states){
        for (Mcur in M_states){
          
          ## continuation if choose Home
          EV_next_HME <- sum(
            Q_F[as.character(Fcur), ] *
              V[as.character(tt+1), HOME, , as.character(Mcur)]
          )
          
          V_emp <- c(SUB=-Inf, PRI=-Inf, PUB=-Inf)
          
          for (s in sectors){
            EV_next_s <- sum(
              Q_F[as.character(Fcur), ] *
                V[as.character(tt+1), s, , as.character(Mcur)]
            )
            V_emp[s] <- beta_u * logW_i[s] +
              Fcur * gamma[s] +    #ifelse(Fcur >= 1, gamma[s], 0) +
              ifelse(Mcur == 1, delta[s], 0) +
              beta * EV_next_s
          }
          
          V_HME <- 0 +
            Fcur * gamma["HME"] +   #ifelse(Fcur >= 1, gamma["HME"], 0) +
            ifelse(Mcur == 1, delta["HME"], 0) +
            beta * EV_next_HME
          
          EV <- p_offers["None"] * V_HME +
            sum(p_offers[sectors] *
                  pmax(V_HME, V_emp[sectors]))
          
          V[as.character(tt), r, as.character(Fcur), as.character(Mcur)] <- EV
          
        }  # end Mcur
      }  # end Fcur
    }  # end r
  }  # end tt
  
}




## ----------------------------
## Simulation of optimal paths
## ----------------------------
##  Simulate optimal paths *given* V_all and scenario objects
## Policy: Given (t, r, F) and realized offer O:
## choose O if V_emp(O) >= V_H; else choose Home.
## Offers drawn from λ_{·|r,g}; F evolves via Q_F.
simulate_paths_scn <- function(
    V_all, logW_all, g_vec, F0, gamma_m, gamma_f,
    lambda_m_cond, lambda_f_cond
){
  
  # helper: draw one offer given gender and current attachment r
  #draw_offer_cond <- function(g, r){
  #  probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
  #  draw  <- sample(c(sectors, "None"), size = 1, prob = probs)
  #  if (draw == "None") NA_character_ else draw
  #}
  
  draw_offer_cond_CRN <- function(g, r, i, t){
    probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
    u <- U_off[i, t]
    idx <- which(u <= cumsum(probs))[1]
    draw <- names(probs)[idx]
    if (draw == "None") NA_character_ else draw
  }
  
  out_list <- vector("list", length = N)
  
  for (i in seq_len(N)) {
    g_i    <- g_vec[i]
    educ_i <- educ_vec[i]
    V_i    <- V_all[[i]]
    logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
    
    # start-of-panel state for person i
    r_cur  <- HOME
    F_cur  <- F0[i]
    M_cur  <- M0[i]
    
    # storage for this person's path
    rows <- vector("list", length = T_per)
    
    for (tt in 1:T_per) {
      # gender-specific gamma for THIS person
      gamma <- if (g_i == "men") gamma_m else gamma_f
      
      # draw an offer conditional on current attachment r_cur
      #offer <- draw_offer_cond(g_i, r_cur)
      offer <- draw_offer_cond_CRN(g_i, r_cur, i, tt)
      
      # continuation value if choose Home now
      EV_next_HME <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), HOME, ])
      #V_HME <- 0 + ifelse(F_cur >= 1, gamma[["HME"]], 0) + beta * EV_next_HME
      V_HME <- 0 + F_cur * gamma[["HME"]] + beta * EV_next_HME
      
      # default choice is Home
      choice <- HOME
      employed <- 0L
      sector   <- NA_character_
      logW_chosen <- NA_real_
      
      # if there is an offer, compare employed value vs home
      if (!is.na(offer)) {
        s <- offer
        EV_next_s <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), s, ])
        V_emp_off <- beta_u * logW_i[s] +
          F_cur * gamma[[s]] + #ifelse(F_cur >= 1, gamma[[s]], 0) +
          beta * EV_next_s
        
        if (V_emp_off >= V_HME) {
          choice <- s
          employed <- 1L
          sector <- s
          logW_chosen <- logW_i[s]
        }
      }
      
      # record this period
      rows[[tt]] <- data.frame(
        i = i, t = tt, g = g_i, educ = educ_i, F = F_cur, M=M_cur,      
        r = r_cur, offer = offer, choice = choice,
        employed = employed, sector = sector, logW = logW_chosen,
        stringsAsFactors = FALSE
      )
      
      # move to next period
      r_cur <- choice
      #if (tt < T_per) {
      #  F_cur <- draw_next_F(F_cur)
      #  M_cur <- draw_next_M(M_cur)
      #}
      if (tt < T_per) {
        F_cur <- draw_next_F_CRN(F_cur, i, tt)
        M_cur <- draw_next_M_CRN(M_cur, i, tt)
      }
      
      
    }
    
    out_list[[i]] <- do.call(rbind, rows)
  }
  
  dt <- do.call(rbind, out_list)
  rownames(dt) <- NULL
  dt
}

5.6 RUN BASELINE AND COUNTERFACTUAL SCENARIOS AND SAVE OUTPUT

## We'll keep the same g_vec, educ_vec, F0 across scenarios (comparability)
#g_vec <- sample(genders, N, replace=TRUE)
#educ_vec <- rbinom(N, 1, p_educ)
#F0 <- sample(F_states, size=N, replace=TRUE, prob=c(0.8, 0.18, 0.02))
#M0 <- sample(M_states, size=N, replace=TRUE, prob=c(0.90, 0.10))
#draw_next_F <- function(F){ sample(F_states, 1, prob=Q_F[as.character(F), ]) }
#draw_next_M <- function(M){ sample(M_states, 1, prob=Q_M[as.character(M),])}


## ---- deterministic draws from master uniforms ----
g_vec <- ifelse(U_gender <= 0.5, "men", "fem")
educ_vec <- as.integer(U_educ <= p_educ)
F0 <- sapply(U_initF, function(u)
  F_states[ which(u <= cumsum(c(0.8, 0.18, 0.02)))[1] ])
M0 <- sapply(U_initM, function(u)
  M_states[ which(u <= cumsum(c(0.90, 0.10)))[1] ])

draw_next_F_CRN <- function(F_cur, i, t){
  p <- Q_F[as.character(F_cur), ]
  u <- U_F[i, t]
  F_states[ which(u <= cumsum(p))[1] ]
}

draw_next_M_CRN <- function(M_cur, i, t){
  p <- Q_M[as.character(M_cur), ]
  u <- U_M[i, t]
  M_states[ which(u <= cumsum(p))[1] ]
}


run_scenario <- function(scn_label){
  scn <- make_scenario(scn_label)
  
  
  ## Draw epsilons (only if needed)
#  if (scn$redraw_eps){
#    eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
#  } else {
#    ## reuse the eps_mat from your baseline if present; else draw with base Sigmas
#    if (exists("eps_mat")) {
#      eps_mat_scn <- eps_mat
#    } else {
#      #eps_mat_scn <- draw_eps_by_gender(g_vec, Sigma_m, Sigma_f)
#      eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
#    }
#  }
  
  eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
  logW_all_scn <- precompute_logW_all(g_vec, educ_vec, eps_mat_scn, scn$beta0_df)
  
  ## Solve V for each person
  ## Solve V for everyone (vectorized over i)
  ## (This is the heavy step: 5000 people × (T=10 × r=4 × F=3) is very feasible.)
  V_all_scn <- lapply(1:N, function(i)
    solve_value_person_scn(i, logW_all_scn, g_vec, scn$gamma_m,scn$gamma_f,
                           scn$lambda_m_cond, scn$lambda_f_cond))
  
  ## Simulate paths
  dt_scn <- simulate_paths_scn(V_all_scn, logW_all_scn, g_vec, F0, scn$gamma_m,scn$gamma_f,
                               scn$lambda_m_cond, scn$lambda_f_cond)
  dt_scn$scenario <- scn_label
  
  
  ## === Diagnostics ===
  
  ## (1) Pooled log-wage densities by sector×gender
  wages_pool <- dt_scn |>
    dplyr::filter(!is.na(sector), !is.na(logW)) |>
    dplyr::mutate(sector=factor(sector, levels=sectors))
  
  ## (2) Movers: Δ logW by origin→dest, gender
  moves <- dt_scn |>
    dplyr::group_by(i) |>
    dplyr::arrange(t, .by_group = TRUE) |>
    dplyr::mutate(next_logW = dplyr::lead(logW),
                  next_sector = dplyr::lead(sector),
                  employed_next = dplyr::lead(employed)) |>
    dplyr::ungroup() |>
    dplyr::filter(employed==1, employed_next==1,
                  !is.na(sector), !is.na(next_sector)) |>
    dplyr::mutate(orig=sector, dest=next_sector,
                  dlogW = next_logW - logW)
  
  ## (3) Residuals: remove sector×gender×educ mean (proxy for unobs)
  dt_resid <- dt_scn |>
    dplyr::filter(!is.na(sector), !is.na(logW)) |>
    dplyr::group_by(g, sector, educ) |>
    dplyr::mutate(resid = logW - mean(logW)) |>
    dplyr::ungroup()
  
  ## (4) Rank–rank slopes for G→P movers (by gender)
  rr <- dt_resid |>
    dplyr::select(i,t,g,sector,resid) |>
    dplyr::group_by(i) |>
    dplyr::arrange(t,.by_group=TRUE) |>
    dplyr::mutate(next_resid = dplyr::lead(resid),
                  next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    dplyr::filter(sector=="PUB", next_sector=="PRI") |>
    dplyr::group_by(g) |>
    dplyr::summarise(rank_slope = {
      x <- rank(resid, ties.method="average")/dplyr::n()
      y <- rank(next_resid, ties.method="average")/dplyr::n()
      if (length(x)>5) coef(lm(y ~ x))[2] else NA_real_
    }, .groups="drop")
  rr$scenario <- scn_label
  
  ## (5) Stayer–leaver residual gap in origin sector G (by gender)
  sl <- dt_resid |>
    # 1. Sort each person's panel properly
    dplyr::group_by(i) |>
    dplyr::arrange(t, .by_group = TRUE) |>
    
    # 2. Next-period sector
    dplyr::mutate(next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    
    # 3. Keep only observations where current sector is PUB
    dplyr::filter(sector == "PUB") |>
    
    # 4. Drop final rows with no next observation
    dplyr::filter(!is.na(next_sector)) |>
    
    # 5. Stayer vs. leaver classification
    dplyr::mutate(
      status = dplyr::case_when(
        next_sector == "PUB" ~ "stayer",
        next_sector != "PUB" ~ "leaver"
      )
    ) |>
    
    # 6. Compute mean residuals by gender × status
    dplyr::group_by(g, status) |>
    dplyr::summarise(
      mean_resid = mean(resid, na.rm = TRUE),
      .groups = "drop"
    ) |>
    
    # 7. Reshape wide: stayer, leaver columns
    tidyr::pivot_wider(
      names_from = status,
      values_from = mean_resid,
      values_fill = NA
    ) |>
    
    # 8. Compute gap safely
    dplyr::mutate(
      gap = stayer - leaver,
      scenario = scn_label
    )
  
  
  ## (6) Staying hazards by current sector (state dependence proxy)
  haz <- dt_scn |>
    dplyr::group_by(i) |>
    dplyr::arrange(t,.by_group=TRUE) |>
    dplyr::mutate(next_sector = dplyr::lead(sector)) |>
    dplyr::ungroup() |>
    dplyr::filter(!is.na(sector)) |>
    dplyr::mutate(stay = as.integer(sector==next_sector)) |>
    dplyr::group_by(g, sector) |>
    dplyr::summarise(stay_hazard = mean(stay, na.rm=TRUE), .groups="drop") |>
    dplyr::mutate(scenario=scn_label)
  
  list(dt=dt_scn,
       wages_pool=wages_pool,
       moves=moves,
       rr=rr,
       sl=sl,
       haz=haz)
}




# Run the scenarios included in the scenario_list and save the output in `results`    
results <- lapply(scenario_list, run_scenario)
names(results) <- scenario_list




## Bind outputs for plotting / tables
wages_all <- dplyr::bind_rows(lapply(results, `[[`, "wages_pool"), .id="scenario_id")
moves_all <- dplyr::bind_rows(lapply(results, `[[`, "moves"), .id="scenario_id")
rr_all    <- dplyr::bind_rows(lapply(results, `[[`, "rr"), .id="scenario_id")
sl_all    <- dplyr::bind_rows(lapply(results, `[[`, "sl"), .id="scenario_id")
haz_all   <- dplyr::bind_rows(lapply(results, `[[`, "haz"), .id="scenario_id")


## Combine scenarios into one panel
to_factor <- function(x) factor(x, levels = c("SUB","PRI","PUB"))
panel_all <- dplyr::bind_rows(
  lapply(results, \(res) res$dt |> dplyr::mutate(sector = to_factor(sector))),
  .id = "scenario_id"
)

5.7 COMPARE KEY MOMENTS ACROSS SCENARIOS

5.7.1 (i) Employment Transitions by Gender

transitions_all <- panel_all |>
  dplyr::group_by(scenario_id, g, i) |>
  dplyr::arrange(t, .by_group = TRUE) |>
  dplyr::mutate(
    state_t  = factor(ifelse(is.na(sector), "HME", as.character(sector)),
                      levels = state_levels),
    state_t1 = dplyr::lead(state_t)
  ) |>
  dplyr::ungroup() |>
  dplyr::filter(!is.na(state_t1)) |>
  dplyr::count(scenario_id, g, state_t, state_t1, name = "n") |>
  dplyr::group_by(scenario_id, g, state_t) |>
  dplyr::mutate(share = n / sum(n)) |>
  dplyr::ungroup()







# --- Step 1: wide by destination state (same as you already do)
transitions_wide <- transitions_all |>
  tidyr::pivot_wider(
    id_cols  = c(scenario_id, g, state_t),
    names_from = state_t1,
    values_from = share,
    values_fill = 0
  )

# --- Step 2: widen again by gender 
transitions_wide_both <- transitions_wide %>%
  dplyr::arrange(
    factor(scenario_id, levels = scenario_list),
    factor(state_t, levels = c("HME","SUB","PRI","PUB")),
    g
  ) %>%
  tidyr::pivot_wider(
    names_from = g,
    values_from = -c(scenario_id, state_t, g),
    names_glue = "{.value}_{g}"
  )

tbl <- knitr::kable(
  transitions_wide_both,
  caption = "Transition Matrix by Scenario (Women vs Men)",
  align = "c",
  digits = 2,
  format = "html"
) %>%
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    font_size = 13
  ) %>%
  kableExtra::collapse_rows(
    columns = 1,   # only scenario_id now (state_t should NOT be collapsed)
    valign = "top"
  )

kableExtra::save_kable(tbl, "table_trans.png", zoom = 2, density = 300)
tbl

Transition Matrix by Scenario (Women vs Men)
scenario_id	state_t	HME_fem	HME_men	SUB_fem	SUB_men	PRI_fem	PRI_men	PUB_fem	PUB_men
BASELINE	HME	0.42	0.43	0.21	0.18	0.20	0.18	0.18	0.20
	SUB	0.19	0.19	0.49	0.47	0.17	0.17	0.16	0.17
	PRI	0.18	0.19	0.17	0.16	0.49	0.48	0.15	0.17
	PUB	0.18	0.19	0.16	0.16	0.17	0.16	0.48	0.50
CF1_price_gap	HME	0.37	0.37	0.19	0.18	0.19	0.25	0.25	0.20
	SUB	0.16	0.15	0.49	0.47	0.17	0.20	0.19	0.17
	PRI	0.15	0.19	0.17	0.16	0.49	0.49	0.18	0.17
	PUB	0.18	0.16	0.16	0.16	0.18	0.19	0.47	0.50
CF2_disp	HME	0.45	0.43	0.21	0.18	0.20	0.18	0.14	0.20
	SUB	0.21	0.19	0.49	0.47	0.17	0.17	0.13	0.17
	PRI	0.21	0.19	0.17	0.16	0.49	0.48	0.13	0.17
	PUB	0.18	0.19	0.16	0.16	0.18	0.16	0.48	0.50
CF3_compadv	HME	0.40	0.43	0.20	0.18	0.20	0.18	0.19	0.20
	SUB	0.18	0.19	0.49	0.47	0.17	0.17	0.16	0.17
	PRI	0.19	0.19	0.17	0.16	0.49	0.48	0.15	0.17
	PUB	0.19	0.19	0.17	0.16	0.17	0.16	0.48	0.50
CF4_strong_compadv	HME	0.42	0.43	0.21	0.18	0.21	0.18	0.15	0.20
	SUB	0.21	0.19	0.49	0.47	0.17	0.17	0.13	0.17
	PRI	0.21	0.19	0.17	0.16	0.49	0.48	0.13	0.17
	PUB	0.19	0.19	0.16	0.16	0.16	0.16	0.48	0.50
CF5_family	HME	0.38	0.43	0.20	0.18	0.19	0.18	0.22	0.20
	SUB	0.17	0.19	0.49	0.47	0.17	0.17	0.17	0.17
	PRI	0.17	0.19	0.17	0.16	0.49	0.48	0.17	0.17
	PUB	0.18	0.19	0.16	0.16	0.18	0.16	0.48	0.50
CF6_offers	HME	0.56	0.43	0.13	0.18	0.15	0.18	0.16	0.20
	SUB	0.12	0.19	0.45	0.47	0.12	0.17	0.31	0.17
	PRI	0.12	0.19	0.13	0.16	0.45	0.48	0.30	0.17
	PUB	0.06	0.19	0.12	0.16	0.12	0.16	0.70	0.50

#Now do by kids
add_states_and_kids2 <- function(df) {
  df %>%
    arrange(t, .by_group = TRUE) %>%
    mutate(
      state_t  = factor(ifelse(is.na(sector), "HME", as.character(sector)),
                        levels = state_levels),
      state_t1 = dplyr::lead(state_t),
      
      # Next-period fertility level
      F_next = dplyr::lead(F),
      
      # Define change ONLY when fertility strictly increases
      kid_change2 = dplyr::case_when(
        !is.na(F_next) & F != 0 & F_next != F ~ "change",
        !is.na(F_next) & F != 0 & F_next == F ~ "no-change",
        TRUE                                  ~ NA_character_
      ),
      kid_change2 = factor(kid_change2, levels = c("no-change", "change"))
    )
}


summarise_transitions2 <- function(df) {
  df %>%
    dplyr::ungroup() %>%
    dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
    dplyr::count(scenario_id, g, kid_change2, state_t, state_t1, name = "n") %>%
    tidyr::complete(scenario_id, g, kid_change2, state_t, state_t1, fill = list(n = 0)) %>%
    dplyr::mutate(
      state_t  = factor(state_t,  levels = state_levels),
      state_t1 = factor(state_t1, levels = state_levels)
    ) %>%
    dplyr::group_by(scenario_id, g, kid_change2, state_t) %>%
    dplyr::mutate(share = n / sum(n)) %>%
    dplyr::ungroup()
}

5.7.2 (ii) Transitions by Fertility Change

transitions_kid2 <- panel_all %>%
  dplyr::filter(scenario_id %in% scenario_list) %>%   # ← keep BOTH genders
  dplyr::group_by(scenario_id, g, i) %>%
  add_states_and_kids2() %>%
  summarise_transitions2()



library(kableExtra)


to_wide2 <- function(trans_long) {
  trans_long %>%
    tidyr::pivot_wider(
      id_cols = c(scenario_id, g, kid_change2, state_t),
      names_from = state_t1,
      values_from = share,
      values_fill = 0
    ) %>%
    dplyr::arrange(scenario_id, g, kid_change2, state_t)
}


transitions_kid2_wide <- to_wide2(transitions_kid2)

# Keep both genders and pivot gender into columns
transitions_kid2_wide_both <- transitions_kid2_wide %>%
  dplyr::arrange(
    factor(scenario_id, levels = scenario_list),
    factor(state_t, levels = c("HME","SUB","PRI","PUB")),
    factor(kid_change2, levels = c("no-change","change")),
    g
  ) %>%
  tidyr::pivot_wider(
    names_from = g,
    values_from = -c(scenario_id, state_t, kid_change2, g),
    names_glue = "{.value}_{g}"
  )

tbl <- knitr::kable(
  transitions_kid2_wide_both,
  caption = "Transition Matrix by Scenario × Kids-Change (Fem vs Men)",
  align = "c",
  digits = 2,
  format = "html"
) %>%
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    font_size = 13
  ) %>%
  kableExtra::collapse_rows(
    columns = 1:2,   # scenario_id + state_t
    valign = "top"
  )
kableExtra::save_kable(tbl, "table_trans_bykidschge.png", zoom = 2, density = 300)
tbl

Transition Matrix by Scenario × Kids-Change (Fem vs Men)
scenario_id	kid_change2	state_t	HME_fem	HME_men	SUB_fem	SUB_men	PRI_fem	PRI_men	PUB_fem	PUB_men
BASELINE	no-change	HME	0.43	0.43	0.20	0.18	0.21	0.19	0.17	0.19
	change	HME	0.47	0.36	0.25	0.19	0.11	0.23	0.18	0.22
	no-change	SUB	0.17	0.18	0.49	0.49	0.17	0.18	0.16	0.16
	change	SUB	0.19	0.15	0.43	0.39	0.21	0.20	0.17	0.26
	no-change	PRI	0.18	0.19	0.17	0.15	0.51	0.50	0.14	0.17
	change	PRI	0.12	0.18	0.22	0.22	0.58	0.39	0.08	0.21
	no-change	PUB	0.17	0.18	0.17	0.15	0.18	0.16	0.48	0.51
	change	PUB	0.22	0.15	0.10	0.16	0.15	0.21	0.53	0.49
CF1_price_gap	no-change	HME	0.37	0.37	0.19	0.18	0.20	0.27	0.24	0.19
	change	HME	0.48	0.33	0.22	0.19	0.06	0.30	0.24	0.19
	no-change	SUB	0.14	0.14	0.49	0.49	0.17	0.21	0.19	0.16
	change	SUB	0.17	0.14	0.44	0.38	0.21	0.22	0.17	0.26
	no-change	PRI	0.15	0.19	0.17	0.15	0.50	0.50	0.17	0.16
	change	PRI	0.10	0.15	0.22	0.24	0.58	0.38	0.10	0.23
	no-change	PUB	0.17	0.15	0.17	0.15	0.19	0.18	0.47	0.51
	change	PUB	0.20	0.12	0.10	0.15	0.17	0.24	0.52	0.49
CF2_disp	no-change	HME	0.46	0.43	0.21	0.18	0.21	0.19	0.13	0.19
	change	HME	0.54	0.36	0.23	0.19	0.11	0.23	0.12	0.22
	no-change	SUB	0.20	0.18	0.49	0.49	0.17	0.18	0.13	0.16
	change	SUB	0.21	0.15	0.43	0.39	0.21	0.20	0.15	0.26
	no-change	PRI	0.20	0.19	0.17	0.15	0.51	0.50	0.13	0.17
	change	PRI	0.12	0.18	0.22	0.22	0.58	0.39	0.08	0.21
	no-change	PUB	0.16	0.18	0.17	0.15	0.18	0.16	0.49	0.51
	change	PUB	0.21	0.15	0.12	0.16	0.15	0.21	0.52	0.49
CF3_compadv	no-change	HME	0.40	0.43	0.20	0.18	0.22	0.19	0.18	0.19
	change	HME	0.49	0.36	0.22	0.19	0.12	0.23	0.17	0.22
	no-change	SUB	0.17	0.18	0.49	0.49	0.17	0.18	0.17	0.16
	change	SUB	0.19	0.15	0.44	0.39	0.21	0.20	0.15	0.26
	no-change	PRI	0.18	0.19	0.17	0.15	0.51	0.50	0.14	0.17
	change	PRI	0.10	0.18	0.22	0.22	0.59	0.39	0.08	0.21
	no-change	PUB	0.17	0.18	0.17	0.15	0.17	0.16	0.49	0.51
	change	PUB	0.23	0.15	0.12	0.16	0.15	0.21	0.50	0.49
CF4_strong_compadv	no-change	HME	0.42	0.43	0.21	0.18	0.23	0.19	0.14	0.19
	change	HME	0.48	0.36	0.22	0.19	0.14	0.23	0.16	0.22
	no-change	SUB	0.19	0.18	0.49	0.49	0.17	0.18	0.14	0.16
	change	SUB	0.23	0.15	0.43	0.39	0.21	0.20	0.13	0.26
	no-change	PRI	0.20	0.19	0.17	0.15	0.51	0.50	0.12	0.17
	change	PRI	0.13	0.18	0.22	0.22	0.58	0.39	0.07	0.21
	no-change	PUB	0.18	0.18	0.17	0.15	0.16	0.16	0.49	0.51
	change	PUB	0.24	0.15	0.11	0.16	0.17	0.21	0.48	0.49
CF5_family	no-change	HME	0.32	0.43	0.18	0.18	0.20	0.19	0.30	0.19
	change	HME	0.46	0.36	0.22	0.19	0.06	0.23	0.26	0.22
	no-change	SUB	0.13	0.18	0.49	0.49	0.17	0.18	0.21	0.16
	change	SUB	0.16	0.15	0.45	0.39	0.22	0.20	0.18	0.26
	no-change	PRI	0.14	0.19	0.17	0.15	0.51	0.50	0.19	0.17
	change	PRI	0.09	0.18	0.22	0.22	0.57	0.39	0.12	0.21
	no-change	PUB	0.18	0.18	0.17	0.15	0.19	0.16	0.47	0.51
	change	PUB	0.20	0.15	0.10	0.16	0.18	0.21	0.52	0.49
CF6_offers	no-change	HME	0.58	0.43	0.13	0.18	0.16	0.19	0.14	0.19
	change	HME	0.52	0.36	0.07	0.19	0.20	0.23	0.22	0.22
	no-change	SUB	0.11	0.18	0.44	0.49	0.12	0.18	0.32	0.16
	change	SUB	0.11	0.15	0.47	0.39	0.08	0.20	0.33	0.26
	no-change	PRI	0.13	0.19	0.14	0.15	0.46	0.50	0.27	0.17
	change	PRI	0.06	0.18	0.23	0.22	0.45	0.39	0.26	0.21
	no-change	PUB	0.06	0.18	0.12	0.15	0.12	0.16	0.70	0.51
	change	PUB	0.09	0.15	0.06	0.16	0.18	0.21	0.67	0.49

5.7.3 (iii) Staying Hazard in Public Sector

haz_kid2 <- panel_all %>%
  dplyr::group_by(scenario_id, g, i) %>%
  add_states_and_kids2() %>%
  dplyr::ungroup() %>%
  dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
  dplyr::filter(state_t %in% sectors) %>%
  dplyr::group_by(scenario_id, g, kid_change2, sector = state_t) %>%
  dplyr::summarise(stay_hazard = mean(state_t1 == sector), .groups="drop")

# --- WOMEN ---
haz_pub_fem <- haz_kid2 %>%
  dplyr::filter(g == "fem", sector == "PUB")

p_women <- ggplot(haz_pub_fem,
                  aes(x = scenario_id, y = stay_hazard, fill = kid_change2)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Staying Hazard in Public Sector (Women)",
       x = "Scenario", y = "Stay probability") +
  theme_minimal()

ggsave("fig_stayhaz_fem.png", dpi = 300, width = 10, height = 8)

#print(p_women)

# --- MEN ---
haz_pub_men <- haz_kid2 %>%
  dplyr::filter(g != "fem", sector == "PUB")

p_men <- ggplot(haz_pub_men,
                aes(x = scenario_id, y = stay_hazard, fill = kid_change2)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Staying Hazard in Public Sector (Men)",
       x = "Scenario", y = "Stay probability") +
  theme_minimal()
ggsave("fig_stayhaz_men.png", dpi = 300, width = 10, height = 8)
#print(p_men)

gridExtra::grid.arrange(p_women, p_men, ncol = 1)

5.7.4 (iv) Log‑Wage Densities by Sector and Gender

mean_df <- wages_all |>
  dplyr::group_by(scenario_id, sector, g) |>
  dplyr::summarise(mean_logW = mean(logW, na.rm=TRUE), .groups="drop")

plot <- ggplot(wages_all, aes(x = logW, color = g, fill = g)) +
  geom_density(alpha = 0.2) +
  geom_vline(data = mean_df,
             aes(xintercept = mean_logW, color = g),
             linetype="dashed") +
  facet_grid(sector ~ scenario_id, scales="free_y") +
  theme_minimal() +
  labs(title = "Log Wage Density by Sector",
       x = "Log wage", y = "Density")
#print(plot)
#ggsave("log_wage_density.png", plot = plot,width = 10, height = 8, dpi = 300)
ggsave("fig_logw_density.png", dpi = 300, width = 10, height = 8)
print(plot)

5.7.5 (v) Mover Gains (\(\Delta\) log wage PRI \(\rightarrow\) PUB)

Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage

moves_plot <- moves_all |>
  dplyr::filter(orig == "PRI", dest == "PUB")

plot <- ggplot(moves_plot, aes(x = g, y = dlogW, fill = g)) +
  geom_boxplot(outlier.alpha = 0.2) +
  geom_hline(yintercept = 0, linetype=2) +
  facet_grid(. ~ scenario_id) +
  theme_minimal() +
  labs(title = "Mover Gains PRI → PUB",
       x = "Gender", y = "Δ log wage")
ggsave("fig_movergain1.png", dpi = 300, width = 10, height = 8)
print(plot)

5.7.6 (vi) Mover Gains (\(\Delta\) log wage PUB \(\rightarrow\) PRI)

Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage

moves_plot2 <- moves_all |>
  dplyr::filter(orig == "PUB", dest == "PRI")

plot <- ggplot(moves_plot2, aes(x = g, y = dlogW, fill = g)) +
  geom_boxplot(outlier.alpha = 0.2) +
  geom_hline(yintercept = 0, linetype=2) +
  facet_grid(. ~ scenario_id) +
  theme_minimal() +
  labs(title = "Mover Gains PUB → PRI",
       x = "Gender", y = "Δ log wage")
ggsave("fig_movergain2.png", dpi = 300, width = 10, height = 8)
print(plot)

5.7.7 (vii) Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)

Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)

rr_table <- rr_all |>
  dplyr::arrange(g, scenario_id) |>
  gt(rowname_col = "scenario_id", groupname_col = "g") |>
  tab_header(
    title = md("**Rank–Rank Slope for PUB → PRI Movers**"),
    subtitle = md(
      "A lower slope indicates stronger comparative advantage: \
      it measures how strongly a worker’s within-PUB sector rank \
      predicts their rank in PRIVATE sector upon moving."
    )
  ) |>
  fmt_number(columns = dplyr::everything(), decimals = 3) |> ## Format slope values nicely
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_stub() ## Styling
  ) |>
  opt_row_striping() |>
  tab_options(
    table.width = pct(100),
    data_row.padding = px(4),
    table.font.size = 13
  )
#rr_table

gtsave(rr_table, "table_ranks.png")      # PNG
gtsave(rr_table, "table_ranks.pdf")      # PDF 
rr_table

	rank_slope	scenario
Rank–Rank Slope for PUB → PRI Movers
A lower slope indicates stronger comparative advantage: it measures how strongly a worker’s within-PUB sector rank predicts their rank in PRIVATE sector upon moving.
fem
BASELINE	0.020	BASELINE
CF1_price_gap	0.003	CF1_price_gap
CF2_disp	0.040	CF2_disp
CF3_compadv	−0.197	CF3_compadv
CF4_strong_compadv	−0.160	CF4_strong_compadv
CF5_family	0.006	CF5_family
CF6_offers	−0.039	CF6_offers
men
BASELINE	−0.069	BASELINE
CF1_price_gap	−0.031	CF1_price_gap
CF2_disp	−0.069	CF2_disp
CF3_compadv	−0.069	CF3_compadv
CF4_strong_compadv	−0.069	CF4_strong_compadv
CF5_family	−0.069	CF5_family
CF6_offers	−0.069	CF6_offers

5.7.8 (viii) Stayer–leaver residual gap in G (higher gap ⇒ stronger selection on ε in G)

sl_table <- sl_all |>
  dplyr::arrange(g, scenario) |>
  gt(rowname_col = "scenario", groupname_col = "g") |>
  tab_header(
    title = md("**Stayer–Leaver Residual Gap in Origin Sector PUB**"),
    subtitle = md(
      "Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: \
      ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector."
    )
  ) |>
  fmt_number(columns = c(stayer, leaver, gap), decimals = 3) |> ## Format columns nicely
  opt_row_striping() |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_stub() ## Styling
  ) |>
  tab_options(
    table.width = pct(100),
    table.font.size = 13,
    data_row.padding = px(4)
  )

gtsave(sl_table, "table_stayleave_haz.png")      # PNG
gtsave(sl_table, "table_stayleave_haz.pdf")      # PDF 
sl_table

	scenario_id	leaver	stayer	gap
Stayer–Leaver Residual Gap in Origin Sector PUB
Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector.
fem
BASELINE	BASELINE	−0.001	0.001	0.002
CF1_price_gap	CF1_price_gap	−0.008	0.005	0.014
CF2_disp	CF2_disp	−0.002	−0.003	−0.001
CF3_compadv	CF3_compadv	−0.012	0.006	0.017
CF4_strong_compadv	CF4_strong_compadv	−0.012	0.002	0.014
CF5_family	CF5_family	−0.005	0.006	0.011
CF6_offers	CF6_offers	0.001	0.001	0.000
men
BASELINE	BASELINE	0.007	−0.002	−0.009
CF1_price_gap	CF1_price_gap	0.006	−0.002	−0.008
CF2_disp	CF2_disp	0.007	−0.002	−0.009
CF3_compadv	CF3_compadv	0.007	−0.002	−0.009
CF4_strong_compadv	CF4_strong_compadv	0.007	−0.002	−0.009
CF5_family	CF5_family	0.007	−0.002	−0.009
CF6_offers	CF6_offers	0.007	−0.002	−0.009

5.7.9 (ix) Mean Δ log wage for consecutive employment

Wage transitions for consecutive employment (by scenario)

transitions <- panel_all |>
  filter(g == "fem") %>%
  dplyr::group_by(scenario_id, i) |>
  dplyr::arrange(t, .by_group = TRUE) |>
  dplyr::mutate(next_logW   = dplyr::lead(logW),
                next_sector = dplyr::lead(sector),
                next_F      = dplyr::lead(F),
                employed_next = dplyr::lead(employed)) |>
  dplyr::ungroup() |>
  dplyr::filter(employed == 1, employed_next == 1) |>
  dplyr::mutate(dlogW = next_logW - logW,
                F_trans = dplyr::case_when(
                  F == 0 & next_F == 0   ~ "no kids -> no kids",
                  F == 0 & next_F >= 1   ~ "no kids -> kids",
                  F >= 1 & next_F >= 1   ~ "kids -> kids",
                  F >= 1 & next_F == 0   ~ "kids -> no kids"
                ))


dlogW_stats <- transitions |>
  dplyr::mutate(orig = sector, dest = next_sector) |>
  
  # >>> keep only PUB→PRI or PRI→PUB transitions <<<
  dplyr::filter(
    (orig == "PUB" & dest == "PRI") |
      (orig == "PRI" & dest == "PUB")
  ) |>
  dplyr::group_by(scenario_id, g, orig, dest, F_trans) |>
  dplyr::summarise(mean_dlogW = mean(dlogW, na.rm = TRUE),
                   n = n(), .groups = "drop") |>
  dplyr::mutate(orig = to_factor(orig), dest = to_factor(dest)) |>
  dplyr::arrange(scenario_id, g, orig, dest, F_trans)
#print(head(dlogW_stats, 20))

p_dlog <- dlogW_stats |>
  dplyr::filter(n >= 2) |>
  ggplot(aes(x = interaction(orig, dest, sep = "→"),
             y = mean_dlogW, fill = F_trans)) +
  geom_col(position = "dodge") +
  facet_grid(rows = vars(g), cols = vars(scenario_id)) +
  labs(title = "Mean Δ log wage for consecutive employment by scenario",
       x = "Origin → Destination sector", y = "Mean Δ log wage",
       fill = "Fertility transition") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("fig_wtrans_kids.png", dpi = 300, width = 10, height = 8)
print(p_dlog)

5.7.10 (x) Sector Employment Shares Over Time

shares <- panel_all |>
  dplyr::filter(!is.na(sector)) |>
  dplyr::count(scenario_id, t, g, sector, name="n") |>
  dplyr::group_by(scenario_id, t, g) |>
  dplyr::mutate(share = n / sum(n)) |>
  dplyr::ungroup()

plot <- ggplot(shares,
               aes(x = t, y = share, color = sector, linetype = g)) +
  geom_line() +
  facet_grid(sector ~ scenario_id, scales="free_y") +
  theme_minimal() +
  labs(title = "Sector Employment Shares Over Time",
       x = "Time", y = "Share")
ggsave("fig_empshares.png", dpi = 300, width = 10, height = 8)
print(plot)

5.7.11 (xi) Mean Log Wages Over Time by Sector

logwages <- panel_all |>
  dplyr::filter(!is.na(sector), !is.na(logW)) |>
  dplyr::group_by(scenario_id, t, g, sector) |>
  dplyr::summarise(mean_logw = mean(logW), .groups="drop")

plot <- ggplot(logwages,
               aes(x = t, y = mean_logw, color = sector, linetype = g)) +
  geom_line() +
  facet_grid(sector ~ scenario_id, scales="free_y") +
  theme_minimal() +
  labs(title = "Mean Log Wages Over Time",
       x = "Time", y = "Mean log wage")
ggsave("fig_wage.png", dpi = 300, width = 10, height = 8)
print(plot)

A Dynamic Roy Model with Permanent Sector-Specific Skills and Fertility Shifters

1 MODEL SETUP

1.1 Wages (Fixed Conditional on Permanent Skills)

1.2 Offers: One Sector per Period, Home Always Available

1.3 Preferences

1.4 Dynamic Program

1.5 Normalizations and Interpretability

2 Identification (Heuristic)

2.1 a. Sector Skill Prices by Gender: \(\beta_{0,g}(s)\) (relative)

2.2 b. Permanent Skill Distribution: \(\Sigma_g\)

2.3 c. Fertility Taste Shifters: \(\gamma_s\) (Exclusion)

2.4 d. Offer Process: \(\lambda^g_s(X)\)

2.5 e. Prices and Endowments

3 Estimation Method (SMM/SML Skeleton)

4 Discussion (model mechanisms, identification, etc.)

4.1 IDENTIFICATION OF GENDER DIFFERENCES

5 MODEL SOUTION, SIMULATION, AND COUNTERFACTUALS

5.1 Baseline

5.2 Scenario list

5.2.1 Scenario CF1_price_gap

5.2.2 Scenario CF2_disp

5.2.3 Scenario CF3_compadv

5.2.4 Scenario CF4_strong_compadv

5.2.5 Scenario CF5_family

5.2.6 Scenario CF6_offers

5.2.7 Scenario EQ1_PRICES

5.2.8 Scenario EQ2_ENDOWMENTS

5.2.9 Scenario EQ3_OFFERS

5.2.10 Scenario EQ4_FAMILY

5.2.11 Some potential “same shares, different mechanism” stress tests

5.3 BASIC DEFINITIONS AND FUNCTIONS FOR THE SOLUTION/SIMULATION

5.4 CHOOSE WHICH BASELINE AND COUNTERFACTUAL SCENARIOS TO RUN

5.5 SOLUTION AND SIMULATION SUBROUTINES

5.6 RUN BASELINE AND COUNTERFACTUAL SCENARIOS AND SAVE OUTPUT

5.7 COMPARE KEY MOMENTS ACROSS SCENARIOS

5.7.1 (i) Employment Transitions by Gender

5.7.2 (ii) Transitions by Fertility Change

5.7.3 (iii) Staying Hazard in Public Sector

5.7.4 (iv) Log‑Wage Densities by Sector and Gender

5.7.5 (v) Mover Gains (\(\Delta\) log wage PRI \(\rightarrow\) PUB)

5.7.6 (vi) Mover Gains (\(\Delta\) log wage PUB \(\rightarrow\) PRI)

5.7.7 (vii) Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)

5.7.8 (viii) Stayer–leaver residual gap in G (higher gap ⇒ stronger selection on ε in G)

5.7.9 (ix) Mean Δ log wage for consecutive employment

5.7.10 (x) Sector Employment Shares Over Time

5.7.11 (xi) Mean Log Wages Over Time by Sector

5.2.1 Scenario `CF1_price_gap`

5.2.2 Scenario `CF2_disp`

5.2.3 Scenario `CF3_compadv`

5.2.4 Scenario `CF4_strong_compadv`

5.2.5 Scenario `CF5_family`

5.2.6 Scenario `CF6_offers`

5.2.7 Scenario `EQ1_PRICES`

5.2.8 Scenario `EQ2_ENDOWMENTS`

5.2.9 Scenario `EQ3_OFFERS`

5.2.10 Scenario `EQ4_FAMILY`