CF1_price_gapCF2_dispCF3_compadvCF4_strong_compadvCF5_familyCF6_offersEQ1_PRICESEQ2_ENDOWMENTSEQ3_OFFERSEQ4_FAMILYTime is discrete, \(t=0,1,\dots,T\). Each individual \(i\) has gender \(g\in\{m,f\}\) and education \(e\). There are 3 employment sectors, \[ \mathcal S \equiv \{\text{Subsistence }(SUB),\ \text{Private }(PRI),\ \text{Public }(PUB)\}, \] and a home alternative \(H\) (non-employment).
The state at time \(t\) for individual \(i\) is: \[ \Xi_{it}=\Big(g,e,\ r_{it},\ \mathbf{\varepsilon}_i,\ F_{it},\ M_{it}\Big), \] where:
\(r_{it}\in \mathcal S\cup\{H\}\) is last period’s chosen alternative (current “attachment”),
\(\mathbf{\varepsilon}_i=(\varepsilon_{i,sub},\varepsilon_{i,pri},\varepsilon_{i,pub})\) are permanent sector-specific skills,
\(F_{it}\in\{0,1,2,\dots\}\) is an exogenous fertility index (e.g., number or age-weighted count of children).
\(M_{it}\in\{0,1 \dots\}\) is exogenous marital status.
Assume \(F_{it}\) evolves exogenously via a Markov process \(Q(F_{t+1}\mid F_t)\) that is excluded from wages and excluded from offer arrival (it only affects utility).
If \(i\) works in sector \(s\in\mathcal S\) at \(t\), the (log) wage is: \[ \boxed{ \ln W_{it}^g(s) = \beta_{0,g}(s)\;+\;\beta_1\,\text{educ}_i\;\;+\;\varepsilon_{is}. } \] Excluding work experience for now.
\(\beta_{0,g}(s)\): skill price (gender- and sector-specific; may capture discrimination).
\(\beta_1\): return to education (common).
\(\beta_2\): return to sector-specific experience if we have experience (common).
\(\varepsilon_{is}\): permanent sector ability (drawn at entry; no transitory wage shocks; no new wage draws).
Distribution of permanent skills. For tractability and mover-based identification, assume a multivariate Normal by gender: \[ \mathbf{\varepsilon}_i \mid g \sim \mathcal N\big(\mathbf{0},\ \Sigma_g\big),\qquad \Sigma_g= \begin{pmatrix} \sigma^2_{1,g} & \rho_{12,g} \,\, \sigma_{1,g}\,\, \sigma_{2,g} & \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} \\ \rho_{12,g} \,\, \sigma_{1,g} \,\, \sigma_{2,g} & \sigma^2_{2,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g}\\ \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g} & \sigma^2_{3,g} \end{pmatrix}. \]
(Alternative: independent Fréchet components; results below do not require closed-form shares.)
For \(\Sigma_g\) to be a positive semidefinite (ie a valid covariance matrix), the corr matrix must also satisfy \[ 1 + 2 \, \rho_{12,g}\, \rho_{13,g}\, \rho_{23,g} - \, \rho_{12,g}^2 - \, \rho_{13,g}^2 - \, \rho_{23,g}^2 \] So for various pars values, make sure you check this condition.
Each period, at most one employment offer arrives. Let \(O_{it}\in\{S,G,P,\varnothing\}\) denote the sector offered at \(t\) (or \(\varnothing\) if no offer). Define arrival probabilities conditional on observables \(X_{it}\) (age, past sector, etc.):
\[ \Pr\big(O_{it}=s\mid X_{it},g\big)=\lambda^g_s(X_{it}),\quad s\in\{S,G,P\},\qquad \Pr(O_{it}=\varnothing\mid X_{it},g)=1-\sum_{s}\lambda^g_s(X_{it}). \] Key friction: If \(r_{it}=s\) (currently attached to \(s\)), the agent can continue in sector \(s\) only if \(O_{it}=s\). Otherwise, they can’t remain employed in \(s\) this period.
Home \(H\) is always available (no offer needed).
In other words, agents compare at most one employment sector vs home choice each period.
Let per-period utility be quasi-separable in (log) wages and non-pecuniary sector tastes shifted by fertility (F) and marital status (M):
\[ u_{it}(s) = \beta_u\,\ln W_{it}^g(s) \;+\; \gamma_s^g\,F_{it} \;+\; \delta_s^g\,M_{it},\qquad s\in\{SUB,PRI,PUB\}, \] with a home utility \(u_{it}(H)= b_0 \;+\; \gamma_H\,F_{it} \;+\; \delta_H\,M_{it},\qquad b_0\in\mathbb R.\)
\(\beta_u>0\) scales log-wage into utils.
\(\gamma_s\) and \(\delta_s\) are sector-specific preference shifters and they differ by gender (exogenous; do not affect wages or offers).
No additional idiosyncratic taste shocks are included (only permanent wage heterogeneity). This focuses identification on fertility as an exclusion.
Timing Within Period: At the start of period \(t\) given state \(\Xi_{it}\):
Offer draw: \(O_{it}\in\{S,G,P,\varnothing\}\) is realized from \(\{\lambda^g_s(X_{it})\}\).
Choice set: \(\mathcal C_{it}=\{H\}\cup\{O_{it}\}\,\) (home plus the one offered sector if any).
Decision: choose \(d_{it}\in\mathcal C_{it}\) to maximize current utility plus continuation.
Experience update given choice; then \(F_{it}\) evolves to \(F_{it+1}\sim Q(\cdot\mid F_{it})\).
Let \(V_t(\Xi)\) be the value function. Define the sector-specific continuation value if offered \(s\):
\[ \mathcal V_t(s;\Xi) \equiv u_t(s) \;+\; \beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=s\big], \]
and the home continuation value: \[ \mathcal V_t(H;\Xi) \equiv u_t(H)\;+\;\beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=H\big], \] with discount factor \(\beta\in(0,1)\).
Given the one-offer constraint, \[ V_t(\Xi) \;=\; \underbrace{\Pr(O_t=\varnothing\mid X,g)}_{\text{no offer}} \cdot \max\{\mathcal V_t(H;\Xi)\} \;+\; \sum_{s\in\{S,G,P\}} \underbrace{\Pr(O_t=s\mid X,g)}_{\lambda^g_s(X)} \cdot \max\{\mathcal V_t(H;\Xi),\ \mathcal V_t(s;\Xi)\}. \] The law of motion inside the expectations updates experience per choice and draws \(F_{t+1}\) exogenously.
stayers vs. movers:
If \(r_t=s\) and \(O_t=s\), the agent may stay in \(s\) by comparing \(\mathcal V_t(s;\Xi)\) to \(\mathcal V_t(H;\Xi)\).
If \(O_t\ne s\) (or \(O_t=\varnothing\)), staying in \(s\) is infeasible; the agent compares only \(H\) and the offered \(O_t\) (if any).
Movers are observed when \(O_t=s'\ne r_t\) and \(\mathcal V_t(s';\Xi)\ge \mathcal V_t(H;\Xi)\).
To interpret gender wedges as skill price differences:
Fix a base sector \(s_0\) (e.g., Private) and set \(\beta_{0,m}(s_0)=0\).
Center permanent skills by gender–sector: \(\mathbb E[\varepsilon_{is}\mid
g]=0\).
Then \(\beta_{0,f}(s)-\beta_{0,m}(s)\)
is a price/discrimination wedge in sector \(s\).
Optionally fix \(b_0=0\) or \(\beta_u=1\) for utility scale.
Observables and Moments: Given panel data on \(\{s_{it},\ \ln W_{it}\ \text{if } s_{it}\in\mathcal S,\ F_{it},\ e,\ \mathbf{x}_{it}\}\).
Define moments (by gender and also by education/experience if applicable):
Sector shares: \(\pi_{gs}(t)=\Pr(s_{it}=s\mid g,t)\).
Transitions: \(P_g(s'|s)=\Pr(s_{it+1}=s'\mid s_{it}=s,g)\).
Within-sector wages: \(\bar w_{gs}(t)=\mathbb E[\ln W_{it}\mid s_{it}=s,g,t]\), and dispersion \(\sigma^2_{gs}(t)\).
Mover wage gaps: \(\mathbb E[\Delta \ln W_{i} \mid r_t=s,\ s_{t+1}=s',\ g]\).
Fertility responses: event-study \(\Delta \pi_{gs}\) and \(\Delta P_g(s'|s)\) around \(\Delta F_{it}\).
Using movers and the one-offer structure:
\[ \mathbb E\big[\ln W_{i}(s') - \ln W_{i}(s)\mid r_t=s,\ s_{t+1}=s',\ g\big] = \underbrace{\beta_{0,g}(s')-\beta_{0,g}(s)}_{\text{price difference}} + \beta_2\,\Delta x_i + \underbrace{\mathbb E[\varepsilon_{is'}-\varepsilon_{is}\mid \text{move } s\!\to\! s']}_{\text{selection term}}, \]
where the selection term arises because movers are selected on \(\mathbf{\varepsilon}\) given the acceptance
rule and offers.
Given \(\Sigma_g\) (below) and the offer process (below), the selection term is pinned down in the simulated model; the residual identifies \(\beta_{0,g}(s')-\beta_{0,g}(s)\) up to the base-sector normalization.
Gender wedges. With centered \(\varepsilon\), the cross-gender difference \(\beta_{0,f}(s)-\beta_{0,m}(s)\) is interpretable as discrimination (skill price) in sector \(s\).
Within-sector residual dispersion of \(\ln W\) (net of \(\beta_1,\beta_2\)) identifies variances \(\sigma^2_{gs}\).
Mover patterns (who moves where; gains upon moving) identify covariances \(\rho_{g,ss}\): strong positive correlation implies individuals who are strong in \(s\) are also strong in \(s'\) (affecting acceptance when offered \(s'\)).
\(\Sigma_g\) cannot mimic within-person, event-timed changes tied to fertility (see below), which helps separate it from \(\gamma\).
Fertility \(F_{it}\) enters utility only (not wages/offers). Around fertility events:
\[ \frac{\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k+1)} {\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k)} \approx \frac{\text{acceptance}(s'|F=k+1)}{\text{acceptance}(s'|F=k)}. \]
If public-sector transitions and shares jump for women at childbirth while wages/offer proxies do not, then \(\gamma_G>0\) (family-friendly public sector) is identified—separate from \(\beta_{0,g}(s)\) and \(\Sigma_g\).
Given \(\gamma\) (acceptance shifts at fertility) and wage-side parameters, the levels of transitions identify the arrival probabilities:
\[ \Pr(s_{t+1}=s'\mid s_t=s) = \underbrace{\lambda^g_{s'}(X_{t+1})}_{\text{arrival}} \times \underbrace{\Pr(\text{accept } s'\mid O_{t+1}=s',\Xi_{t+1};\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g)}_{\text{acceptance}}. \]
Fertility-timed contrasts net out acceptance shifts, isolating \(\lambda^g_{s'}(\cdot)\) from the remaining variation in transitions.
With appropriate normalization e.g. \(\mathbb E[\varepsilon_{is}\mid g]=0\), mean gender differences within a sector load on \(\beta_{0,f}(s)-\beta_{0,m}(s)\) (instead of \(\varepsilon\)).
Relative female advantage in public (vs private) can arise from either
higher \(\beta_{0,f}(PUB)-\beta_{0,m}(PUB)\) (skill price advantages), or
higher \(\sigma_{3,f}\) or favorable \(\rho_{23,f}\) (women’s endowment more aligned with \(PUB\) ie comparative advantages).
Will use mover wage gaps and fertility-timed acceptance shifts to separate these channels: \(\beta_0\) affects wage levels and mover gaps; \(\gamma\) affects acceptance around fertility; \(\Sigma_g\) affects dispersion and who moves, not event-timed shifts.
Parameter blocks: \(\Theta=\{\beta_u,\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g,\lambda\}\) with normalizations (\(\beta_{0,m}(PRI)=0\), centered \(\varepsilon\)).
Simulate offer draws \(O_{it}\) from \(\lambda^g_s(X_{it})\) over observed \(X_{it}\) paths.
Solve choices period-by-period (myopic or with continuation via forward simulation) given \(F_{it}\), wages, and one-offer constraint.
Match moments: sector shares, transitions (overall and by fertility states), within-sector wage means/variances, and mover wage gaps.
Over-ID checks: out-of-sample moments (e.g., wage tails, acceptance at non-fertility events).
Possible counterfactuals to run using the estimated model:
No discrimination: set \(\beta_{0,f}(s)=\beta_{0,m}(s)\) for all \(s\), keep \(\Sigma_g,\gamma,\lambda\) fixed, recompute outcomes.
No family-friendliness differential: set
\[ \gamma_{PUB}=\gamma_{PRI}=\gamma_{SUB} \]
(or \(\gamma_s=0\)), examine how fertility no longer tilts acceptance.
Offer equalization: set \(\lambda^f_s(X)=\lambda^m_s(X)\), keep other primitives.
This dynamic Roy-with-offers model hardwires the “one-offer-per-period” friction, forcing agents to compare one employment sector against home each period (home always feasible; staying in the same job requires a fresh offer). Wages depend only on permanent sector-specific skills and observables; there are no transitory wage shocks. Fertility is an observed, exogenous shifter of non-pecuniary sector utility only.
Identification separates:
Prices \(\beta_{0,g}(s)\) via mover wage gaps and wage levels,
Endowments \(\Sigma_g\) via dispersion and selection patterns,
Offers \(\lambda\) via transition levels net of acceptance,
Family-friendliness \(\gamma\) via within-person,fertility-timed acceptance shifts.
The combination of these aspects of the data allows to separate the role of skill prices, skill comparative advantage and offer probabilities.
For example: The avg log wage difference for women who move from public to private is around -0.46 and -0.09 for women and men, respectively. Can use this information in conjunction with other aspects of the data. In order to distinguish the roles of skill price gaps vs. comparative advange/endowmentstructure, we use many data oments at once, nost jst te mean waegs ormean log wage differences for stayers and movers. In the state-depdendent offer model specified above, each block pushes differnet patterns in the data:
Variance and shape of mover gains, not just the mean.
Price differences shift the mean of \(\Delta w\) almost uniformly, the variance and skew of \(\Delta w \mid s \rightarrow s',g\) are driven by the distribution of \(\varepsilon_{is'}- \varepsilon_{is}\) under selection.
Matching \(Var(\Delta w)\) by origin-destination-gender pins down \(\sigma_{gs}, \sigma_{gs'}\) and especially \(\rho_{g,ss'}\). A big mean gap with small variance points to \(\Delta \beta_{0,g}\); a big variance/left tail points to comparative advantage sorting.
Rank-rank (or quantile-quantile) patterns for movers. - Regress post-move residual-wage rank in s’ on pre-move residual rank in s for movers of gender g. The slope identifies \(\rho_{g,ss'}\).
Stayer-leaver contrasts within origin sectors.
Compare residual wages (after removing \(\beta_{0,g}(s) + \beta_1 educ\) of stayers vs leavers from \(s\), by gender.
Strong negative selection of leavers (leavers much lower residuals in \(s\) than stayers) is evidence of comparative advantage. Pure price gaps do not create differential stayer-leaver sorting strength.
Bidirectional flows: symmetry tests.
Jointly match \(PUB \rightarrow PRI\) and \(PRI \rightarrow PUB\) rates and gains by gender.
A price gap \(\Delta \beta_{0,g}(PRI,PUB)\) pushes both directions’ means (opposite sign, similar magnitude after accounting for selection).
Comparative advantage shows asymmetric selection and different dispersion across the two directions.
Within-sector dispersion (pooled), by gender.
\(Var(\text{residual } lnW \mid s,g)\) pins down \(\sigma_{gs}^2\).
If women’s residual dispersion is notably different in \(PRI\) vs. \(PUB\), you can’t match both dispersions and the mover distributions with a pure price story; you’re forced to adjust \(\Sigma_g\) (including \(\rho_{23,g}\)).
Event-time around fertility (the main exclusion):
State-dependent offer matrix moments.
The added tweak (offers depend on current sector) gives extra moments: staying hazards \(P(s \rightarrow s)\), directional flows \(P(PUB \rightarrow PRI)\) vs \(P(PRI \rightarrow PUB)\) by gender and fertility.
These depend strongly on \(\lambda_{s' \mid s,g}\) and \(\gamma\) but not on \(\beta_0\). Matching these pins down frictions and acceptance,leaving \(\beta_0\) to be identified by wage levels and mover means once selection is accounted for.
Practically, how does the estimator tell them apart?
In the structural SMM:
We normalize \(E[\varepsilon_{is} \mid g]=0\) by gender and sector. and set a base \(\beta_{0,m}(s_0)=0\). Then mean within-sector wages by gender identify relative \(\beta_{0,g}(s)\), while dispersions and rank-rank moments identify \(\Sigma_g\) (including \(\rho_{g,ss'}\)).
We match simultaneously:
No single moment is decisive; the joint fit forces a decomposition: If you try to explain the women’s -0.46 mean with only a large \(\Delta \beta_{0,f}(PRI,PUB)\), you will misfit - the vairance of \(\Delta w\), - the mover rank-rank slope, (iii) stayer-leaver gaps, or (iv) the reverse \(PRI \rightarrow PUB\) flow so the optimizer shifts weight into \(\Sigma_f\) (comparative advantage) until all moments line up. Conversely, if you try to do it only with \(\Sigma_f\), you will miss within-sector mean levels and pooled wage gaps unless \(\beta_{0,f}(P)- \beta_{0,f}(G)\) adjusts.
Two concrete diagnostics:
Rank-rank plot for movers \(PUB \rightarrow PRI\) (separately by gender): Compute residual ranks in PUB before move and residual ranks in PRI after move. A much flatter slope for women than men \(\rightarrow\) stronger negative \(\rho_{23,f}\) / comparartive advantage channel.
Stayer-leaver residual gap in G (by gender): \(E[\tilde{w} \mid \text{stay in PUB}] - E[\tilde{w} \mid \text{leave PUB} ]\). A larger gap for women indicates stronger selection on \(\varepsilon\) among female leavers, again pointing to endowments rather than prices.
Sources of gender differences and discussion of which moments move. What identifies prices \(\beta_0\), vs. endowments (\(\Sigma_g\)), and separates both from offers \(\lambda\) and tastes \(\gamma\). How to read the scenario results (ie what identifies what)
Scenario 1 (prices only): Gender gaps come only from \(\beta_{0,g}(s)\). You should see parallel shifts in within-sector wage means, similar dispersion across genders, and mover \(\`Delta log W\) distributions shifting mainly in means. Rank-rank slopes (rr) and stayer-leaver gaps (sl) should be similar across genders - little evidence of comparative advantage differences.
Scenario 2 (endowments only): Prices equalized; genders differ only in \(\Sigma_g\) (variances/covariances for the joint distribution of sector-specific skill endowments). Expect different dispersions, different shapes/variances of mover gains, flatter female rank-rank slope \(PUB \rightarrow PRI\) if \(\rho_{23,g}\) is lower (rr), and larger stayer-leaver residual gap for the gender with stronger selection (sl). Means won’t shift much without price differences.
Scenario 3 (offers only): Prices and \(\Sigma\) equal; only offer matrices differ (state persistence). Within-sector wages look similar, mover gains distributions similar, but staying hazards differ by gender and sector. Sector shares (not shown but trivial to add) will reflect opportunity, not wages.
Scenario 4 (fertility tastes only): Prices, \(\Sigma\) offers equal; only \(\gamma\) differs. Around births you will see gender-specific shifts into the more family-friendly sector (public), with composition changes in p1/p2 but no shifts in underlying prices.
We want to distinguish (and stress-test potential observational equivalences between):
To identify each mechanism, we want scenarios where only one block of parameters differs by gender at a time, plus a few “confusability” scenarios that intentionally generate similar public-sector shares with different underlying channels.
For each subsection, click on show to see
corresponding code.
The program first specifies number of periods, simulation size etc. and then functional forms.
It then assigns some parameter values which are loosely calibrated to make wage and employment transitions to look somewhat reasonable. Such calibrated values would be the starting values in the actual estimation but here we’re just using it to demonstrate how the model works.
Each scenario refers to a different combination of
parameter values to see which moments are most responsive to which
parameters, for identification of key parameters.
Baseline can be chosen acccording to what comparisons are needed.
If baseline is chosen so that men and women are identical, it makes every other scenario interpretable as a one-dimensional deviation.
\(\beta_{0,m}(s)=\beta_{0,f}(s)\) for all \(s\).
\(\Sigma_m=\Sigma_f\) with moderate dispersion and zero correlation (no comparative advantage by construction).
\(\lambda_m=\lambda_f\) (identical offer rates).
\(\gamma_m=\gamma_f=\mathbf{0}\) (no family-friendliness).
If baseline is chosen so that they differ in various dimensions, then counterfactuals where each dimension is equated one at a time or in a sequence by adding each element in a sequence, then thecontribution each compoennt can be quantified.
CF1_price_gapPure “prices” (skill-price intercepts) channel
Skill prices are changed so that men have a PRI sector premium \(\beta_{0,m}(PRI) > \beta_{0,m}(s)\) for all s.
Skill prices are changed so that women have PUB sector premium \(\beta_{0,f}(PUB) > \beta_{0,f}(s)\) for all s.
All else identical to baseline
-Answers the following question: Can the model attribute higher female PUB employment AND a smaller PUB wage gap to a price difference?
CF2_dispCan a higher upper tail of women’s PUB wages (and greater persistence) can be attributed to \(\sigma_{f}(\text{PUB})\).
Check by changing only women’s PUB skill dispersion (higher \(\sigma\) in PUB). Correlation is not canged
CF3_compadvAnswers the following question: Can sorting into PUB and PUB wage moments can be generated by selection on permanent skills without any price premium.
Check by changing only the correlation structure for women (stronger comparative advantage).
Dispersion of women’s sector-specific skill distribution remains unchanged Correlation between women’s skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
CF4_strong_compadv(We can also combine dispersion + negative correlation if we want even sharper comparative advantages. See below.) STRONG COMP ADV Dispersion of women’s sector-specific skill distribution for only the public sector (i.e. sigma) is increased Correlation between women’s skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
``dispersion + negative correlation’’ produces STRONGER comparative advantage because: Comparative advantage in the Roy setup comes from relative rankings across sectors, not just levels. Two ingredients sharpen sorting:
Higher dispersion in one sector (higher \(\sigma\))
\(\rightarrow\)creates a thicker right tail
\(\rightarrow\) more extreme specialists.
Negative cross-sector correlation (\(\rho<0\))
\(\rightarrow\) being good in PUB implies being bad in PRI (and vice versa)
\(\rightarrow\) stronger specialization
\(\rightarrow\) stronger selection
\(\rightarrow\) lower mobility
\(\rightarrow\) steeper sorting moments.
When we combine both:
\(\rightarrow\) more mass of workers who are extremely PUB-specializaed and unlikely to leave
\(\rightarrow\) stronger persistence, bigger stayer-leaver gap, flatter rank-rank slope.
Concrete parameter example
Men (benchmark, weak comparative advantage)
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.00, 0.00, 0.00) # no correlation \(\rightarrow\) weak specialization
This gives similar dispersion in all sectors.
Skills independent \(\rightarrow\) people can be good everywhere \(\rightarrow\) weak sorting.
Women (strong comparative advantage in PUB)
sig <- c(0.30, 0.30, 0.50) # higher dispersion in PUB
rho <- c(-0.30, -0.40, -0.35) # negative correlations across sectors
\(\sigma(PUB)=0.50\) \(\rightarrow\) bigger right tail \(\rightarrow\) some people are very good in PUB.
\(\rho<0\) \(\rightarrow\) PUB-skill negatively related to PRI/SUB skill.
Workers who are in PUB tend to be weak in PRI \(\rightarrow\) strong specialization.
Compared to the ``equal \(\sigma\) and zero \(\rho\) case’’:
- PUB emp share (women) *rises slightly (more specialization)*.
- PUB staying hazard *rises strongly*
- Stayer-leaver residual gap (PUB) *increases a lot*.
- Rank-rank slope PUB $\rightarrow$ PRI *falls sharply (stronger comparative adv.)*.
- Mover gains aare *asymmetric, more dispersion*.
- Wage density (PUB) gives rise to a fatter upper tail.
The key diagnostic is: Rank–rank slope falls + stayer–leaver gap rises = covariance structure, not prices.
CF5_familyCan higher female PUB shares can arise without a wage premium, and whether the fertility-conditioned moments pick it up.
Check by changing only \(\gamma_f(\text{PUB})>0\) (women value
PUB more when fertility changes / kids arrive). This is the same idea as
FR0/FR1.
Pure “family-friendliness” channel Women’s child utility for the public sector is increased
CF6_offersCan female PUB employment be driven by the search environment without changing wages directly (except through selection)?
Check by changing only women’s offer arrival rates (higher \(\lambda_f\) in PUB, compensating reductions elsewhere to keep overall offer intensity similar). Women’s offer arrival rate from the PUB sector is increased
EQ1_PRICESEquate skill prices between men and women
EQ2_ENDOWMENTSEquate the distributin of skills between men and women
EQ3_OFFERSEquate arrival rates of offers between men and women
EQ4_FAMILYEquate sector-specific child utilities between men and women
HOME <- "HME"
sectors <- c("SUB","PRI","PUB")
R_states <- c(HOME, sectors)
state_levels <- c("HME","SUB","PRI","PUB")
library(gt)
## ---- State-dependent (current-sector) offer process.
## tilt the offered sector toward the CURRENT r_t by adding a "stay bonus"
#stay_bonus <- 0.20 # extra mass for re-offer from current sector
## Keep "none" prob equal to the original base-none, and renormalize across S,G,P.
## A1) Build conditional offer matrices with a given base + stay_bonus
## returns a list: probs[[r]] is a named vector over c(S,G,P,None) for origin r ∈ {H,S,G,P}
build_lambda_conditional <- function(lambda, bonus) {
out <- list()
base_sum <- sum(lambda)
#print(base_sum)
#print(paste("base_sum:", base_sum));
# From HOME: use base probabilities (no tilt)
out[[HOME]] <- c(lambda, None = 1 - base_sum)
# From each sector: add stay bonus to that sector, renormalize offers to sum to base_sum
for (r in sectors) {
prob <- lambda
prob[r] <- prob[r] + bonus
prob <- prob * (base_sum / sum(prob)) # renormalize to sum to base_sum
out[[r]] <- c(prob, None = 1 - sum(prob))
}
out
#print(paste("out:", out[["PUB"]] ));
}
cov_matrix <- function(sig, rho) {
k <- length(sig)
if (k < 2) stop("Length of 'sig' must be at least 2.")
if (any(sig <= 0)) stop("'sig' must contain strictly positive values.")
# correlation matrix from scalar rho
cor_matrix_scalar <- function(rho, k) {
if (rho <= -1 || rho >= 1)
stop("Scalar rho must be strictly between -1 and 1.")
M <- matrix(rho, k, k)
diag(M) <- 1
M
}
# correlation matrix from vector rho
cor_matrix_vector <- function(rhos, k) {
needed <- k*(k-1)/2
if (length(rhos) != needed)
stop(sprintf("For k=%d, rho vector must be length %d.", k, needed))
if (any(rhos <= -1 | rhos >= 1))
stop("All correlations must be strictly between -1 and 1.")
M <- matrix(1, k, k)
M[upper.tri(M)] <- rhos
M[lower.tri(M)] <- t(M)[lower.tri(M)]
M
}
# choose scalar or vector case
R <- if (length(rho) == 1) {
cor_matrix_scalar(rho, k)
} else {
cor_matrix_vector(rho, k)
}
# Σ = D R D
D <- diag(sig)
D %*% R %*% D
}
# Check for positive semidefiniteness
is_valid_corr <- function(rho) {
# rho = c(r12, r13, r23)
if (length(rho) != 3) stop("rho must be a vector of length 3")
r12 <- rho[1]
r13 <- rho[2]
r23 <- rho[3]
R <- matrix(c(
1, r12, r13,
r12, 1, r23,
r13, r23, 1
), nrow = 3, byrow = TRUE)
lambda_min <- min(eigen(R)$values)
valid <- lambda_min > -1e-8
if (!valid) {
warning(sprintf(
"Invalid correlation triple: smallest eigenvalue = %.4g",
lambda_min
))
}
# Return nicely formatted result
return(
data.frame(
r12 = r12,
r13 = r13,
r23 = r23,
valid = valid
)
)
}
# Examples
rho <- c(0.3, 0.4, 0.2)
#is_valid_corr(rho) # TRUE
rho <- c(0.9, 0.9, -0.9)
#is_valid_corr(rho) # FALSE
rho <- c(1, -1, 1)
#is_valid_corr(rho) # FALSE
rho <- c(0.7, -0.5, -0.2)
#is_valid_corr(rho) # TRUE
rmvnorm <- function(n, mu, Sigma) {
Z <- matrix(rnorm(n * length(mu)), nrow = n)
L <- chol(Sigma)
sweep(Z %*% L, 2, mu, `+`)
}
Key parameters that vary across counterfactual scenarios:
Sector- and gender-specific skill prices.
Joint distribution of sector-specific skills (gender-specific).
Sector- and gender-specific non-pecuniary utilities which differ by kids and marital status.
Arrival rates of offers (gender-specific).
See the function make_scenario to see what parameter
values each scenario scn corresponds to.
In the code block below, you can change:
Baseline parameter values
Parameter values for each counterfactual scenario
Scenario list to compare
IN ORDER TO RUN THE CODE WITH ANY CHANGES YOU’D LIKE TO
IMPLEMENT, OPEN THE CORRESPONDING .RMD FILE IN RSTUDIO,
MAKE YOUR CHANGES, SAVE AND THEN CLICK ON KNIT AT THE TOP.
THIS WILL RECOMPILE EVERYTHING AND CREATE A NEW HTML OR PDF FILE, WHICH
SHOULD AUTOMATICALLY APPEAR ON VIEWER ON THE RIGHT PANEL.
#Choose the number of periods
T_per <- 10
#Choose the number of people to simulate (has to be at least 2-3K)
N <- 4000
#Choose baseline
#STARTSAME: Everything the same between men and women
#STARTDIFF: Everything different between men and women
#You can choose the value of WHICH_SCENARIO (whether you want to start from a base where
#men and woen are equal or not)
WHICH_SCENARIO <- "STARTSAME" #other alternatives: STARTDIFF
#Depending on whether WHICH_SCENARIO is STARTSAME or STARTDIFF, assign parameter values for the baseline accordingly
#You can change the parameter values for each (i.e. can change these base values beta0, sigma0 etc.)
if (WHICH_SCENARIO == "STARTSAME"){
beta0 <- tibble::tribble(
~g, ~s, ~par,
"men","SUB", 0.20,
"men","PRI", 0.20,
"men","PUB", 0.20,
"fem","SUB", 0.20,
"fem","PRI", 0.20,
"fem","PUB", 0.20
)
} else if (WHICH_SCENARIO == "STARTDIFFERENT"){
beta0 <- tibble::tribble(
~g, ~s, ~par,
"men","SUB", 0.20,
"men","PRI", 0.40,
"men","PUB", 0.20,
"fem","SUB", 0.20,
"fem","PRI", 0.20,
"fem","PUB", 0.40
)
}
if (WHICH_SCENARIO == "STARTSAME"){
sig_m0 <- c(0.30, 0.30, 0.30)
rho_m0 <- c(0.0, 0.0, 0.0) # (12, 13, 23)
sig_f0 <- c(0.30, 0.30, 0.30)
rho_f0 <- c(0.0, 0.0, 0.0) # (12, 13, 23)
} else if (WHICH_SCENARIO == "STARTDIFFERENT"){
# men at base: no correlation → weak specializat-0.40,-0.35) # (12, 13, 23)
sig_m0 <- c(0.30,0.50,0.30)
rho_m0 <- c(0.0,0.0,0.0) # (12, 13, 23)
# women at base: strong comp. adv. i.e.higher dispersion inf PUB and negative corr across sectors
sig_f0 <- c(0.30,0.30,0.30)
rho_f0 <- c(0.0,0.0,-0.40) # (12, 13, 23)
}
# is_valid_corr(rho_m0)
# is_valid_corr(rho_f0)
## Fertility shifters (non-pecuniary) by sector incl. home (as before)
if (WHICH_SCENARIO == "STARTSAME"){
gamma_m0 <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
gamma_f0 <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
} else if (WHICH_SCENARIO == "STARTDIFFERENT"){
gamma_m0 <- c(HME=0, SUB=0, PRI=0, PUB=0)
gamma_f0 <- c(HME=0, SUB=0, PRI=0, PUB=0.5)
}
if (WHICH_SCENARIO == "STARTSAME"){
lambda_m0 <- c(SUB = 0.30, PRI = 0.30, PUB= 0.30) # men
lambda_f0 <- c(SUB = 0.30, PRI = 0.30, PUB = 0.30) # women
stay_bonus0 <- 0.40
} else if (WHICH_SCENARIO == "STARTDIFFERENT"){
lambda_m0 <- c(SUB=0.30, PRI=0.45, PUB=0.30)
lambda_f0 <- c(SUB=0.30, PRI=0.30, PUB=0.45)
stay_bonus0 <- 0.40
}
#Depending on whether WHICH_SCENARIO is STARTSAME or STARTDIFF, pick scenario_list
#scenario_list is the list of counterfactuals that the code will run and compare
#If WHICH_SCENARIO IS STARTSAME, then the scenario list consist of scenarios where each
#parameter group is changed one at a time.
#If WHICH_SCENARIO IS STARTDIFF, then the scenario list consist of scenarios where each
#parameter group is equated one at a time.
#You can change the specififc parameter values thateach scenario corresponds to (see
# make_scenario below)
if (WHICH_SCENARIO == "STARTSAME"){
scenario_list <- c(
"BASELINE",
"CF1_price_gap",
"CF2_disp",
"CF3_compadv",
"CF4_strong_compadv",
"CF5_family",
"CF6_offers"
)
} else if (WHICH_SCENARIO == "STARTDIFFERENT"){
scenario_list <- c(
"BASELINE",
"EQ1_PRICES",
"EQ2_ENDOWMENTS",
"EQ3_OFFERS",
"EQ4_FAMILY"
)
}
## Scenario parameter builders
## Define the parameter values for each scenario
make_scenario <- function(scn){
out <- list(label = scn)
#First assign the baseline parameters to the vector `out`
out$beta0_df <- beta0
out$Sigma_m <- cov_matrix(sig_m0, rho_m0)
out$Sigma_f <- cov_matrix(sig_f0, rho_f0)
out$lambda_m_cond <- build_lambda_conditional(lambda_m0, stay_bonus0)
out$lambda_f_cond <- build_lambda_conditional(lambda_f0, stay_bonus0)
out$gamma_m <- gamma_m0
out$gamma_f <- gamma_f0
if (scn == "BASELINE"){
#No need to do anything here. out wil just be wht's defined above (ie baseline)
}
else if (scn == "CF1_price_gap"){
#Skill prices are changed so that men have a private sector premium $\(\beta_{0,g}(s)\).
#Skill prices are changed so that women have a public sector premium $\(\beta_{0,g}(s)\).
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.20,
g=="men" & s=="PRI" ~ 0.40,
g=="men" & s=="PUB" ~ 0.20,
g=="fem" & s=="SUB" ~ 0.20,
g=="fem" & s=="PRI" ~ 0.20,
g=="fem" & s=="PUB" ~ 0.40
))
}
else if (scn == "CF2_disp"){
#Dispersion of women's sector-specific skill distribution for only the public sector (i.e. sigma) is increased
#Correlation remains unchanged
sig_f <- c(0.30, 0.30, 0.50) # higher PUB dispersion for women
rho_f <- rho_f0
out$Sigma_f <- cov_matrix(sig_f, rho_f)
}
else if (scn == "CF3_compadv"){
#Dispersion of women's sector-specific skill distribution remains unchanged
#Correlation between women's skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
sig_f <- sig_f0 #c(0.30, 0.30, 0.30)
rho_f <- c(0.0, 0.0, -0.40)
is_valid_corr(rho_f)
out$Sigma_f <- cov_matrix(sig_f, rho_f)
}
else if (scn == "CF4_strong_compadv"){
#Dispersion of women's sector-specific skill distribution for only the public sector (i.e. sigma) is increased
#Correlation between women's skills in PRI and PUB is decreased to a negative number (i.e. COMPARATIVE ADVANTAGE)
sig_f <- c(0.30, 0.30, 0.50) # higher PUB dispersion for women
rho_f <- c(0.0, 0.0, -0.40)
out$Sigma_f <- cov_matrix(sig_f, rho_f)
}
else if (scn == "CF5_family"){
#Women's child utility for the public sector is increased
out$gamma_f <- c(HME=0.00, SUB=0.00, PRI=0.00, PUB=1.00)
}
else if (scn == "CF6_offers"){
#Women's offer arrival rate from the PUB sector is increased
lambda_f <- c(SUB=0.20, PRI=0.20, PUB=0.80)
out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus0)
}
else if (scn == "EQ1_PRICES"){
#Women's sector specific skill prices are set equal to that of men's
#out$beta0_df <- out$beta0_df |>
# dplyr::group_by(s) |>
# dplyr::mutate(par = mean(par)) |>
# dplyr::ungroup()
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.20,
g=="men" & s=="PRI" ~ 0.40,
g=="men" & s=="PUB" ~ 0.20,
g=="fem" & s=="SUB" ~ 0.20,
g=="fem" & s=="PRI" ~ 0.40,
g=="fem" & s=="PUB" ~ 0.20
))
}
else if (scn == "EQ2_ENDOWMENTS"){
#Women's correlation matrix is set equal to that o fmen
out$Sigma_f <- out$Sigma_m
}
else if (scn == "EQ3_OFFERS"){
#Women's offer arrival rates are set equal to that of men
out$lambda_f_cond <- out$lambda_m_cond
}
else if (scn == "EQ4_FAMILY"){
#Women's sector-specific child utilty for all sectors is set equal to that of men
out$gamma_f <- out$gamma_m
}
else {
stop("Unknown scenario label")
}
out
}
#Choose what gender to do the simulations for.
#This will usually just be "both" unless you want to only simulate women or men
RUN_GENDER <- "both" # other alternatives: "men", "fem", "both"
if (RUN_GENDER == "both") {
genders <- c("men", "fem")
} else if (RUN_GENDER == "men") {
genders <- c("men")
} else if (RUN_GENDER == "fem") {
genders <- c("fem")
}
Dynamic Roy Model with state-dependent offers (experience removed; permanent sector skills only)
3 employment sectors
Home alternative always available
One offer per period (or none)
Offer arrival probabilities depend on current sector
Wages fixed per individual-sector (permanent epsilon + observables)
Fertility and marriage shifts preferences (not wages, not offers)
Finite horizon, solved by backward induction
## =========================================================
## Dynamic Roy with state-dependent offers
## (Experience removed; permanent sector skills only)
## ---------------------------------------------------------
## - 3 employment sectors
## - Home H always available
## - One offer per period (or none)
## - Offer arrival PROBABILITIES depend on CURRENT SECTOR (state)
## - Wages fixed per individual-sector (permanent epsilon + observables)
## - Fertility shifts utility (not wages, not offers)
## - Finite horizon, solved by backward induction
## =========================================================
set.seed(42)
suppressPackageStartupMessages({
library(dplyr); library(tidyr); library(ggplot2); library(kableExtra)
})
## =========================================================
## COMMON RANDOM NUMBERS (CRN): master uniforms
## =========================================================
#set.seed(42)
# Max number of offer categories (SUB,PRI,PUB,None)
K_offer <- length(sectors) + 1
# Uniforms for: wages (eps), offers, fertility, marriage
U_eps <- matrix(runif(N * length(sectors)), N, length(sectors))
U_F <- matrix(runif(N * T_per), N, T_per)
U_M <- matrix(runif(N * T_per), N, T_per)
U_off <- matrix(runif(N * T_per), N, T_per)
U_initM <- runif(N)
U_initF <- runif(N)
U_gender <- runif(N)
U_educ <- runif(N)
## =========================================================
## FIXED PARAMETERS AND TRANSITION MATRICES
## =========================================================
## Fertility: exogenous Markov
F_states <- 0:5
Q_F <- matrix(0, 6, 6, dimnames=list(F_states, F_states))
Q_F["0","0"] <- 0.95; Q_F["0","1"] <- 0.05; Q_F["0","2"] <- 0.00; Q_F["0","3"] <- 0.00; Q_F["0","4"] <- 0.00; Q_F["0","5"] <- 0.00
Q_F["1","0"] <- 0.00; Q_F["1","1"] <- 0.96; Q_F["1","2"] <- 0.04; Q_F["1","3"] <- 0.00; Q_F["1","4"] <- 0.00; Q_F["1","5"] <- 0.00
Q_F["2","0"] <- 0.00; Q_F["2","1"] <- 0.00; Q_F["2","2"] <- 0.97; Q_F["2","3"] <- 0.03; Q_F["2","4"] <- 0.00; Q_F["2","5"] <- 0.00
Q_F["3","0"] <- 0.00; Q_F["3","1"] <- 0.00; Q_F["3","2"] <- 0.00; Q_F["3","3"] <- 0.98; Q_F["3","4"] <- 0.02; Q_F["3","5"] <- 0.00
Q_F["4","0"] <- 0.00; Q_F["4","1"] <- 0.00; Q_F["4","2"] <- 0.00; Q_F["4","3"] <- 0.00; Q_F["4","4"] <- 0.99; Q_F["4","5"] <- 0.01
Q_F["5","0"] <- 0.00; Q_F["5","1"] <- 0.00; Q_F["5","2"] <- 0.00; Q_F["5","3"] <- 0.00; Q_F["5","4"] <- 0.00; Q_F["5","5"] <- 1.00
#Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
#Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05 # marriage probability
#Q_M["1","1"] <- 1.00 # married stays married
## Markov transition Q_M(m'|m)
## Example: small probability of marriage when single; absorbing once married.
M_states <- 0:1 # 0 = single, 1 = married
Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05 # Pr(marry)
Q_M["1","1"] <- 1.00 # married absorbing
## (can freely change this to allow divorce.)
## Marital-status taste shifters (parallel to gamma for fertility)
## delta_s_m where s ∈ {HME, SUB, PRI, PUB} and m ∈ {0,1}
## These are non-pecuniary utility additions.
delta_men <- c(HME=0, SUB=0, PRI=0, PUB=0) # men
delta_fem <- c(HME=0, SUB=0, PRI=0, PUB=0) # women
## Draw next M
#draw_next_M <- function(M_cur){
# sample(M_states, size=1, prob = Q_M[as.character(M_cur),])
#}
p_educ <- 0.5 # Pr(educ=1)
beta_u <- 1.0 # Utility scale and home intercept
beta <- 0.95 # discount factor
beta1 <- 0.15 # Education return (log-wage)
## ---------------------------------------------------------
## START SOLVING ETC.
## ---------------------------------------------------------
## Keep the same N, T_per, genders, sectors, HOME, beta1, beta_u, beta, Q_F, etc.
## We'll reuse global g_vec, educ_vec for comparability across scenarios.
draw_eps_by_gender_CRN <- function(g_vec, Sigma_m, Sigma_f){
Nloc <- length(g_vec)
eps_mat <- matrix(NA_real_, Nloc, length(sectors),
dimnames=list(NULL, sectors))
L_m <- chol(Sigma_m)
L_f <- chol(Sigma_f)
Z <- qnorm(U_eps) # SAME underlying shocks across scenarios
for (i in seq_len(Nloc)){
if (g_vec[i] == "men"){
eps_mat[i, ] <- Z[i, ] %*% L_m
} else {
eps_mat[i, ] <- Z[i, ] %*% L_f
}
}
eps_mat
}
## A3) Precompute per-person log wages for all sectors
## A3) Precompute per-person log wages for all sectors (robust, base-R)
## Precompute per-person log wages by sector (fixed forever)
precompute_logW_all <- function(g_vec, educ_vec, eps_mat, beta0_df){
stopifnot(is.matrix(eps_mat),
nrow(eps_mat) == length(g_vec),
all(colnames(eps_mat) %in% sectors))
Nloc <- length(g_vec)
logW_all <- matrix(NA_real_, nrow = Nloc, ncol = length(sectors),
dimnames = list(NULL, sectors))
# helper: fetch scalar beta0 for (g,s)
get_beta0 <- function(gg, ss){
idx <- (beta0_df$g == gg) & (beta0_df$s == ss)
b <- beta0_df$par[idx]
if (length(b) != 1L) stop(sprintf("beta0_df mismatch for g=%s, s=%s (found %d rows)", gg, ss, length(b)))
b
}
for (ss in sectors){
col <- match(ss, sectors)
beta0_vec <- vapply(seq_len(Nloc), function(i) get_beta0(g_vec[i], ss), numeric(1))
logW_all[, col] <- beta0_vec + beta1 * educ_vec + eps_mat[, col]
}
logW_all
}
## ----------------------------
## A4) Solve V for one person *given scenario-specific objects*
## 4) Value Function (Backward Induction, person-specific)
## ----------------------------
## State: (t, r ∈ {H,S,G,P}, F ∈ {0,1,2})
## V_{T+1}(.) = 0. For t=T,...,1 compute V_t(r,F) = E_{O|r} [ max{ V_H, V_emp(O) } ]
## where V_H = u_home(F) + β E_F' V_{t+1}(H, F'), and
## V_emp(s) = u_employed(logW_s, s, F) + β E_F' V_{t+1}(s, F').
solve_value_person_scn <- function(i, logW_all, g_vec, gamma_m,gamma_f,
lambda_m_cond, lambda_f_cond){
g <- g_vec[i]
lambda_cond <- if (g=="men") lambda_m_cond else lambda_f_cond
gamma <- if (g=="men") gamma_m else gamma_f
delta <- if (g=="men") delta_men else delta_fem
logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
t_names <- as.character(1:(T_per+1)) # time index: 1..(T+1); V[t = T+1, ., .] = 0 (terminal)
#V <- array(0, dim=c(T_per+1, length(R_states), length(F_states)),
# dimnames=list(t=t_names, r=R_states, F=as.character(F_states)))
V <- array(
0,
dim=c(T_per+1, length(R_states), length(F_states), length(M_states)),
dimnames=list(
t=t_names,
r=R_states,
F=as.character(F_states),
M=as.character(M_states)
)
)
# backward induction: tt = T, ..., 1
for (tt in T_per:1){
for (r in R_states){
p_offers <- lambda_cond[[r]]
for (Fcur in F_states){
for (Mcur in M_states){
## continuation if choose Home
EV_next_HME <- sum(
Q_F[as.character(Fcur), ] *
V[as.character(tt+1), HOME, , as.character(Mcur)]
)
V_emp <- c(SUB=-Inf, PRI=-Inf, PUB=-Inf)
for (s in sectors){
EV_next_s <- sum(
Q_F[as.character(Fcur), ] *
V[as.character(tt+1), s, , as.character(Mcur)]
)
V_emp[s] <- beta_u * logW_i[s] +
Fcur * gamma[s] + #ifelse(Fcur >= 1, gamma[s], 0) +
ifelse(Mcur == 1, delta[s], 0) +
beta * EV_next_s
}
V_HME <- 0 +
Fcur * gamma["HME"] + #ifelse(Fcur >= 1, gamma["HME"], 0) +
ifelse(Mcur == 1, delta["HME"], 0) +
beta * EV_next_HME
EV <- p_offers["None"] * V_HME +
sum(p_offers[sectors] *
pmax(V_HME, V_emp[sectors]))
V[as.character(tt), r, as.character(Fcur), as.character(Mcur)] <- EV
} # end Mcur
} # end Fcur
} # end r
} # end tt
}
## ----------------------------
## Simulation of optimal paths
## ----------------------------
## Simulate optimal paths *given* V_all and scenario objects
## Policy: Given (t, r, F) and realized offer O:
## choose O if V_emp(O) >= V_H; else choose Home.
## Offers drawn from λ_{·|r,g}; F evolves via Q_F.
simulate_paths_scn <- function(
V_all, logW_all, g_vec, F0, gamma_m, gamma_f,
lambda_m_cond, lambda_f_cond
){
# helper: draw one offer given gender and current attachment r
#draw_offer_cond <- function(g, r){
# probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
# draw <- sample(c(sectors, "None"), size = 1, prob = probs)
# if (draw == "None") NA_character_ else draw
#}
draw_offer_cond_CRN <- function(g, r, i, t){
probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
u <- U_off[i, t]
idx <- which(u <= cumsum(probs))[1]
draw <- names(probs)[idx]
if (draw == "None") NA_character_ else draw
}
out_list <- vector("list", length = N)
for (i in seq_len(N)) {
g_i <- g_vec[i]
educ_i <- educ_vec[i]
V_i <- V_all[[i]]
logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
# start-of-panel state for person i
r_cur <- HOME
F_cur <- F0[i]
M_cur <- M0[i]
# storage for this person's path
rows <- vector("list", length = T_per)
for (tt in 1:T_per) {
# gender-specific gamma for THIS person
gamma <- if (g_i == "men") gamma_m else gamma_f
# draw an offer conditional on current attachment r_cur
#offer <- draw_offer_cond(g_i, r_cur)
offer <- draw_offer_cond_CRN(g_i, r_cur, i, tt)
# continuation value if choose Home now
EV_next_HME <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), HOME, ])
#V_HME <- 0 + ifelse(F_cur >= 1, gamma[["HME"]], 0) + beta * EV_next_HME
V_HME <- 0 + F_cur * gamma[["HME"]] + beta * EV_next_HME
# default choice is Home
choice <- HOME
employed <- 0L
sector <- NA_character_
logW_chosen <- NA_real_
# if there is an offer, compare employed value vs home
if (!is.na(offer)) {
s <- offer
EV_next_s <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), s, ])
V_emp_off <- beta_u * logW_i[s] +
F_cur * gamma[[s]] + #ifelse(F_cur >= 1, gamma[[s]], 0) +
beta * EV_next_s
if (V_emp_off >= V_HME) {
choice <- s
employed <- 1L
sector <- s
logW_chosen <- logW_i[s]
}
}
# record this period
rows[[tt]] <- data.frame(
i = i, t = tt, g = g_i, educ = educ_i, F = F_cur, M=M_cur,
r = r_cur, offer = offer, choice = choice,
employed = employed, sector = sector, logW = logW_chosen,
stringsAsFactors = FALSE
)
# move to next period
r_cur <- choice
#if (tt < T_per) {
# F_cur <- draw_next_F(F_cur)
# M_cur <- draw_next_M(M_cur)
#}
if (tt < T_per) {
F_cur <- draw_next_F_CRN(F_cur, i, tt)
M_cur <- draw_next_M_CRN(M_cur, i, tt)
}
}
out_list[[i]] <- do.call(rbind, rows)
}
dt <- do.call(rbind, out_list)
rownames(dt) <- NULL
dt
}
## We'll keep the same g_vec, educ_vec, F0 across scenarios (comparability)
#g_vec <- sample(genders, N, replace=TRUE)
#educ_vec <- rbinom(N, 1, p_educ)
#F0 <- sample(F_states, size=N, replace=TRUE, prob=c(0.8, 0.18, 0.02))
#M0 <- sample(M_states, size=N, replace=TRUE, prob=c(0.90, 0.10))
#draw_next_F <- function(F){ sample(F_states, 1, prob=Q_F[as.character(F), ]) }
#draw_next_M <- function(M){ sample(M_states, 1, prob=Q_M[as.character(M),])}
## ---- deterministic draws from master uniforms ----
g_vec <- ifelse(U_gender <= 0.5, "men", "fem")
educ_vec <- as.integer(U_educ <= p_educ)
F0 <- sapply(U_initF, function(u)
F_states[ which(u <= cumsum(c(0.8, 0.18, 0.02)))[1] ])
M0 <- sapply(U_initM, function(u)
M_states[ which(u <= cumsum(c(0.90, 0.10)))[1] ])
draw_next_F_CRN <- function(F_cur, i, t){
p <- Q_F[as.character(F_cur), ]
u <- U_F[i, t]
F_states[ which(u <= cumsum(p))[1] ]
}
draw_next_M_CRN <- function(M_cur, i, t){
p <- Q_M[as.character(M_cur), ]
u <- U_M[i, t]
M_states[ which(u <= cumsum(p))[1] ]
}
run_scenario <- function(scn_label){
scn <- make_scenario(scn_label)
## Draw epsilons (only if needed)
# if (scn$redraw_eps){
# eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
# } else {
# ## reuse the eps_mat from your baseline if present; else draw with base Sigmas
# if (exists("eps_mat")) {
# eps_mat_scn <- eps_mat
# } else {
# #eps_mat_scn <- draw_eps_by_gender(g_vec, Sigma_m, Sigma_f)
# eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
# }
# }
eps_mat_scn <- draw_eps_by_gender_CRN(g_vec, scn$Sigma_m, scn$Sigma_f)
logW_all_scn <- precompute_logW_all(g_vec, educ_vec, eps_mat_scn, scn$beta0_df)
## Solve V for each person
## Solve V for everyone (vectorized over i)
## (This is the heavy step: 5000 people × (T=10 × r=4 × F=3) is very feasible.)
V_all_scn <- lapply(1:N, function(i)
solve_value_person_scn(i, logW_all_scn, g_vec, scn$gamma_m,scn$gamma_f,
scn$lambda_m_cond, scn$lambda_f_cond))
## Simulate paths
dt_scn <- simulate_paths_scn(V_all_scn, logW_all_scn, g_vec, F0, scn$gamma_m,scn$gamma_f,
scn$lambda_m_cond, scn$lambda_f_cond)
dt_scn$scenario <- scn_label
## === Diagnostics ===
## (1) Pooled log-wage densities by sector×gender
wages_pool <- dt_scn |>
dplyr::filter(!is.na(sector), !is.na(logW)) |>
dplyr::mutate(sector=factor(sector, levels=sectors))
## (2) Movers: Δ logW by origin→dest, gender
moves <- dt_scn |>
dplyr::group_by(i) |>
dplyr::arrange(t, .by_group = TRUE) |>
dplyr::mutate(next_logW = dplyr::lead(logW),
next_sector = dplyr::lead(sector),
employed_next = dplyr::lead(employed)) |>
dplyr::ungroup() |>
dplyr::filter(employed==1, employed_next==1,
!is.na(sector), !is.na(next_sector)) |>
dplyr::mutate(orig=sector, dest=next_sector,
dlogW = next_logW - logW)
## (3) Residuals: remove sector×gender×educ mean (proxy for unobs)
dt_resid <- dt_scn |>
dplyr::filter(!is.na(sector), !is.na(logW)) |>
dplyr::group_by(g, sector, educ) |>
dplyr::mutate(resid = logW - mean(logW)) |>
dplyr::ungroup()
## (4) Rank–rank slopes for G→P movers (by gender)
rr <- dt_resid |>
dplyr::select(i,t,g,sector,resid) |>
dplyr::group_by(i) |>
dplyr::arrange(t,.by_group=TRUE) |>
dplyr::mutate(next_resid = dplyr::lead(resid),
next_sector = dplyr::lead(sector)) |>
dplyr::ungroup() |>
dplyr::filter(sector=="PUB", next_sector=="PRI") |>
dplyr::group_by(g) |>
dplyr::summarise(rank_slope = {
x <- rank(resid, ties.method="average")/dplyr::n()
y <- rank(next_resid, ties.method="average")/dplyr::n()
if (length(x)>5) coef(lm(y ~ x))[2] else NA_real_
}, .groups="drop")
rr$scenario <- scn_label
## (5) Stayer–leaver residual gap in origin sector G (by gender)
sl <- dt_resid |>
# 1. Sort each person's panel properly
dplyr::group_by(i) |>
dplyr::arrange(t, .by_group = TRUE) |>
# 2. Next-period sector
dplyr::mutate(next_sector = dplyr::lead(sector)) |>
dplyr::ungroup() |>
# 3. Keep only observations where current sector is PUB
dplyr::filter(sector == "PUB") |>
# 4. Drop final rows with no next observation
dplyr::filter(!is.na(next_sector)) |>
# 5. Stayer vs. leaver classification
dplyr::mutate(
status = dplyr::case_when(
next_sector == "PUB" ~ "stayer",
next_sector != "PUB" ~ "leaver"
)
) |>
# 6. Compute mean residuals by gender × status
dplyr::group_by(g, status) |>
dplyr::summarise(
mean_resid = mean(resid, na.rm = TRUE),
.groups = "drop"
) |>
# 7. Reshape wide: stayer, leaver columns
tidyr::pivot_wider(
names_from = status,
values_from = mean_resid,
values_fill = NA
) |>
# 8. Compute gap safely
dplyr::mutate(
gap = stayer - leaver,
scenario = scn_label
)
## (6) Staying hazards by current sector (state dependence proxy)
haz <- dt_scn |>
dplyr::group_by(i) |>
dplyr::arrange(t,.by_group=TRUE) |>
dplyr::mutate(next_sector = dplyr::lead(sector)) |>
dplyr::ungroup() |>
dplyr::filter(!is.na(sector)) |>
dplyr::mutate(stay = as.integer(sector==next_sector)) |>
dplyr::group_by(g, sector) |>
dplyr::summarise(stay_hazard = mean(stay, na.rm=TRUE), .groups="drop") |>
dplyr::mutate(scenario=scn_label)
list(dt=dt_scn,
wages_pool=wages_pool,
moves=moves,
rr=rr,
sl=sl,
haz=haz)
}
# Run the scenarios included in the scenario_list and save the output in `results`
results <- lapply(scenario_list, run_scenario)
names(results) <- scenario_list
## Bind outputs for plotting / tables
wages_all <- dplyr::bind_rows(lapply(results, `[[`, "wages_pool"), .id="scenario_id")
moves_all <- dplyr::bind_rows(lapply(results, `[[`, "moves"), .id="scenario_id")
rr_all <- dplyr::bind_rows(lapply(results, `[[`, "rr"), .id="scenario_id")
sl_all <- dplyr::bind_rows(lapply(results, `[[`, "sl"), .id="scenario_id")
haz_all <- dplyr::bind_rows(lapply(results, `[[`, "haz"), .id="scenario_id")
## Combine scenarios into one panel
to_factor <- function(x) factor(x, levels = c("SUB","PRI","PUB"))
panel_all <- dplyr::bind_rows(
lapply(results, \(res) res$dt |> dplyr::mutate(sector = to_factor(sector))),
.id = "scenario_id"
)
transitions_all <- panel_all |>
dplyr::group_by(scenario_id, g, i) |>
dplyr::arrange(t, .by_group = TRUE) |>
dplyr::mutate(
state_t = factor(ifelse(is.na(sector), "HME", as.character(sector)),
levels = state_levels),
state_t1 = dplyr::lead(state_t)
) |>
dplyr::ungroup() |>
dplyr::filter(!is.na(state_t1)) |>
dplyr::count(scenario_id, g, state_t, state_t1, name = "n") |>
dplyr::group_by(scenario_id, g, state_t) |>
dplyr::mutate(share = n / sum(n)) |>
dplyr::ungroup()
# --- Step 1: wide by destination state (same as you already do)
transitions_wide <- transitions_all |>
tidyr::pivot_wider(
id_cols = c(scenario_id, g, state_t),
names_from = state_t1,
values_from = share,
values_fill = 0
)
# --- Step 2: widen again by gender
transitions_wide_both <- transitions_wide %>%
dplyr::arrange(
factor(scenario_id, levels = scenario_list),
factor(state_t, levels = c("HME","SUB","PRI","PUB")),
g
) %>%
tidyr::pivot_wider(
names_from = g,
values_from = -c(scenario_id, state_t, g),
names_glue = "{.value}_{g}"
)
tbl <- knitr::kable(
transitions_wide_both,
caption = "Transition Matrix by Scenario (Women vs Men)",
align = "c",
digits = 2,
format = "html"
) %>%
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
font_size = 13
) %>%
kableExtra::collapse_rows(
columns = 1, # only scenario_id now (state_t should NOT be collapsed)
valign = "top"
)
kableExtra::save_kable(tbl, "table_trans.png", zoom = 2, density = 300)
tbl
| scenario_id | state_t | HME_fem | HME_men | SUB_fem | SUB_men | PRI_fem | PRI_men | PUB_fem | PUB_men |
|---|---|---|---|---|---|---|---|---|---|
| BASELINE | HME | 0.42 | 0.43 | 0.21 | 0.18 | 0.20 | 0.18 | 0.18 | 0.20 |
| SUB | 0.19 | 0.19 | 0.49 | 0.47 | 0.17 | 0.17 | 0.16 | 0.17 | |
| PRI | 0.18 | 0.19 | 0.17 | 0.16 | 0.49 | 0.48 | 0.15 | 0.17 | |
| PUB | 0.18 | 0.19 | 0.16 | 0.16 | 0.17 | 0.16 | 0.48 | 0.50 | |
| CF1_price_gap | HME | 0.37 | 0.37 | 0.19 | 0.18 | 0.19 | 0.25 | 0.25 | 0.20 |
| SUB | 0.16 | 0.15 | 0.49 | 0.47 | 0.17 | 0.20 | 0.19 | 0.17 | |
| PRI | 0.15 | 0.19 | 0.17 | 0.16 | 0.49 | 0.49 | 0.18 | 0.17 | |
| PUB | 0.18 | 0.16 | 0.16 | 0.16 | 0.18 | 0.19 | 0.47 | 0.50 | |
| CF2_disp | HME | 0.45 | 0.43 | 0.21 | 0.18 | 0.20 | 0.18 | 0.14 | 0.20 |
| SUB | 0.21 | 0.19 | 0.49 | 0.47 | 0.17 | 0.17 | 0.13 | 0.17 | |
| PRI | 0.21 | 0.19 | 0.17 | 0.16 | 0.49 | 0.48 | 0.13 | 0.17 | |
| PUB | 0.18 | 0.19 | 0.16 | 0.16 | 0.18 | 0.16 | 0.48 | 0.50 | |
| CF3_compadv | HME | 0.40 | 0.43 | 0.20 | 0.18 | 0.20 | 0.18 | 0.19 | 0.20 |
| SUB | 0.18 | 0.19 | 0.49 | 0.47 | 0.17 | 0.17 | 0.16 | 0.17 | |
| PRI | 0.19 | 0.19 | 0.17 | 0.16 | 0.49 | 0.48 | 0.15 | 0.17 | |
| PUB | 0.19 | 0.19 | 0.17 | 0.16 | 0.17 | 0.16 | 0.48 | 0.50 | |
| CF4_strong_compadv | HME | 0.42 | 0.43 | 0.21 | 0.18 | 0.21 | 0.18 | 0.15 | 0.20 |
| SUB | 0.21 | 0.19 | 0.49 | 0.47 | 0.17 | 0.17 | 0.13 | 0.17 | |
| PRI | 0.21 | 0.19 | 0.17 | 0.16 | 0.49 | 0.48 | 0.13 | 0.17 | |
| PUB | 0.19 | 0.19 | 0.16 | 0.16 | 0.16 | 0.16 | 0.48 | 0.50 | |
| CF5_family | HME | 0.38 | 0.43 | 0.20 | 0.18 | 0.19 | 0.18 | 0.22 | 0.20 |
| SUB | 0.17 | 0.19 | 0.49 | 0.47 | 0.17 | 0.17 | 0.17 | 0.17 | |
| PRI | 0.17 | 0.19 | 0.17 | 0.16 | 0.49 | 0.48 | 0.17 | 0.17 | |
| PUB | 0.18 | 0.19 | 0.16 | 0.16 | 0.18 | 0.16 | 0.48 | 0.50 | |
| CF6_offers | HME | 0.56 | 0.43 | 0.13 | 0.18 | 0.15 | 0.18 | 0.16 | 0.20 |
| SUB | 0.12 | 0.19 | 0.45 | 0.47 | 0.12 | 0.17 | 0.31 | 0.17 | |
| PRI | 0.12 | 0.19 | 0.13 | 0.16 | 0.45 | 0.48 | 0.30 | 0.17 | |
| PUB | 0.06 | 0.19 | 0.12 | 0.16 | 0.12 | 0.16 | 0.70 | 0.50 |
#Now do by kids
add_states_and_kids2 <- function(df) {
df %>%
arrange(t, .by_group = TRUE) %>%
mutate(
state_t = factor(ifelse(is.na(sector), "HME", as.character(sector)),
levels = state_levels),
state_t1 = dplyr::lead(state_t),
# Next-period fertility level
F_next = dplyr::lead(F),
# Define change ONLY when fertility strictly increases
kid_change2 = dplyr::case_when(
!is.na(F_next) & F != 0 & F_next != F ~ "change",
!is.na(F_next) & F != 0 & F_next == F ~ "no-change",
TRUE ~ NA_character_
),
kid_change2 = factor(kid_change2, levels = c("no-change", "change"))
)
}
summarise_transitions2 <- function(df) {
df %>%
dplyr::ungroup() %>%
dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
dplyr::count(scenario_id, g, kid_change2, state_t, state_t1, name = "n") %>%
tidyr::complete(scenario_id, g, kid_change2, state_t, state_t1, fill = list(n = 0)) %>%
dplyr::mutate(
state_t = factor(state_t, levels = state_levels),
state_t1 = factor(state_t1, levels = state_levels)
) %>%
dplyr::group_by(scenario_id, g, kid_change2, state_t) %>%
dplyr::mutate(share = n / sum(n)) %>%
dplyr::ungroup()
}
transitions_kid2 <- panel_all %>%
dplyr::filter(scenario_id %in% scenario_list) %>% # ← keep BOTH genders
dplyr::group_by(scenario_id, g, i) %>%
add_states_and_kids2() %>%
summarise_transitions2()
library(kableExtra)
to_wide2 <- function(trans_long) {
trans_long %>%
tidyr::pivot_wider(
id_cols = c(scenario_id, g, kid_change2, state_t),
names_from = state_t1,
values_from = share,
values_fill = 0
) %>%
dplyr::arrange(scenario_id, g, kid_change2, state_t)
}
transitions_kid2_wide <- to_wide2(transitions_kid2)
# Keep both genders and pivot gender into columns
transitions_kid2_wide_both <- transitions_kid2_wide %>%
dplyr::arrange(
factor(scenario_id, levels = scenario_list),
factor(state_t, levels = c("HME","SUB","PRI","PUB")),
factor(kid_change2, levels = c("no-change","change")),
g
) %>%
tidyr::pivot_wider(
names_from = g,
values_from = -c(scenario_id, state_t, kid_change2, g),
names_glue = "{.value}_{g}"
)
tbl <- knitr::kable(
transitions_kid2_wide_both,
caption = "Transition Matrix by Scenario × Kids-Change (Fem vs Men)",
align = "c",
digits = 2,
format = "html"
) %>%
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
font_size = 13
) %>%
kableExtra::collapse_rows(
columns = 1:2, # scenario_id + state_t
valign = "top"
)
kableExtra::save_kable(tbl, "table_trans_bykidschge.png", zoom = 2, density = 300)
tbl
| scenario_id | kid_change2 | state_t | HME_fem | HME_men | SUB_fem | SUB_men | PRI_fem | PRI_men | PUB_fem | PUB_men |
|---|---|---|---|---|---|---|---|---|---|---|
| BASELINE | no-change | HME | 0.43 | 0.43 | 0.20 | 0.18 | 0.21 | 0.19 | 0.17 | 0.19 |
| change | HME | 0.47 | 0.36 | 0.25 | 0.19 | 0.11 | 0.23 | 0.18 | 0.22 | |
| no-change | SUB | 0.17 | 0.18 | 0.49 | 0.49 | 0.17 | 0.18 | 0.16 | 0.16 | |
| change | SUB | 0.19 | 0.15 | 0.43 | 0.39 | 0.21 | 0.20 | 0.17 | 0.26 | |
| no-change | PRI | 0.18 | 0.19 | 0.17 | 0.15 | 0.51 | 0.50 | 0.14 | 0.17 | |
| change | PRI | 0.12 | 0.18 | 0.22 | 0.22 | 0.58 | 0.39 | 0.08 | 0.21 | |
| no-change | PUB | 0.17 | 0.18 | 0.17 | 0.15 | 0.18 | 0.16 | 0.48 | 0.51 | |
| change | PUB | 0.22 | 0.15 | 0.10 | 0.16 | 0.15 | 0.21 | 0.53 | 0.49 | |
| CF1_price_gap | no-change | HME | 0.37 | 0.37 | 0.19 | 0.18 | 0.20 | 0.27 | 0.24 | 0.19 |
| change | HME | 0.48 | 0.33 | 0.22 | 0.19 | 0.06 | 0.30 | 0.24 | 0.19 | |
| no-change | SUB | 0.14 | 0.14 | 0.49 | 0.49 | 0.17 | 0.21 | 0.19 | 0.16 | |
| change | SUB | 0.17 | 0.14 | 0.44 | 0.38 | 0.21 | 0.22 | 0.17 | 0.26 | |
| no-change | PRI | 0.15 | 0.19 | 0.17 | 0.15 | 0.50 | 0.50 | 0.17 | 0.16 | |
| change | PRI | 0.10 | 0.15 | 0.22 | 0.24 | 0.58 | 0.38 | 0.10 | 0.23 | |
| no-change | PUB | 0.17 | 0.15 | 0.17 | 0.15 | 0.19 | 0.18 | 0.47 | 0.51 | |
| change | PUB | 0.20 | 0.12 | 0.10 | 0.15 | 0.17 | 0.24 | 0.52 | 0.49 | |
| CF2_disp | no-change | HME | 0.46 | 0.43 | 0.21 | 0.18 | 0.21 | 0.19 | 0.13 | 0.19 |
| change | HME | 0.54 | 0.36 | 0.23 | 0.19 | 0.11 | 0.23 | 0.12 | 0.22 | |
| no-change | SUB | 0.20 | 0.18 | 0.49 | 0.49 | 0.17 | 0.18 | 0.13 | 0.16 | |
| change | SUB | 0.21 | 0.15 | 0.43 | 0.39 | 0.21 | 0.20 | 0.15 | 0.26 | |
| no-change | PRI | 0.20 | 0.19 | 0.17 | 0.15 | 0.51 | 0.50 | 0.13 | 0.17 | |
| change | PRI | 0.12 | 0.18 | 0.22 | 0.22 | 0.58 | 0.39 | 0.08 | 0.21 | |
| no-change | PUB | 0.16 | 0.18 | 0.17 | 0.15 | 0.18 | 0.16 | 0.49 | 0.51 | |
| change | PUB | 0.21 | 0.15 | 0.12 | 0.16 | 0.15 | 0.21 | 0.52 | 0.49 | |
| CF3_compadv | no-change | HME | 0.40 | 0.43 | 0.20 | 0.18 | 0.22 | 0.19 | 0.18 | 0.19 |
| change | HME | 0.49 | 0.36 | 0.22 | 0.19 | 0.12 | 0.23 | 0.17 | 0.22 | |
| no-change | SUB | 0.17 | 0.18 | 0.49 | 0.49 | 0.17 | 0.18 | 0.17 | 0.16 | |
| change | SUB | 0.19 | 0.15 | 0.44 | 0.39 | 0.21 | 0.20 | 0.15 | 0.26 | |
| no-change | PRI | 0.18 | 0.19 | 0.17 | 0.15 | 0.51 | 0.50 | 0.14 | 0.17 | |
| change | PRI | 0.10 | 0.18 | 0.22 | 0.22 | 0.59 | 0.39 | 0.08 | 0.21 | |
| no-change | PUB | 0.17 | 0.18 | 0.17 | 0.15 | 0.17 | 0.16 | 0.49 | 0.51 | |
| change | PUB | 0.23 | 0.15 | 0.12 | 0.16 | 0.15 | 0.21 | 0.50 | 0.49 | |
| CF4_strong_compadv | no-change | HME | 0.42 | 0.43 | 0.21 | 0.18 | 0.23 | 0.19 | 0.14 | 0.19 |
| change | HME | 0.48 | 0.36 | 0.22 | 0.19 | 0.14 | 0.23 | 0.16 | 0.22 | |
| no-change | SUB | 0.19 | 0.18 | 0.49 | 0.49 | 0.17 | 0.18 | 0.14 | 0.16 | |
| change | SUB | 0.23 | 0.15 | 0.43 | 0.39 | 0.21 | 0.20 | 0.13 | 0.26 | |
| no-change | PRI | 0.20 | 0.19 | 0.17 | 0.15 | 0.51 | 0.50 | 0.12 | 0.17 | |
| change | PRI | 0.13 | 0.18 | 0.22 | 0.22 | 0.58 | 0.39 | 0.07 | 0.21 | |
| no-change | PUB | 0.18 | 0.18 | 0.17 | 0.15 | 0.16 | 0.16 | 0.49 | 0.51 | |
| change | PUB | 0.24 | 0.15 | 0.11 | 0.16 | 0.17 | 0.21 | 0.48 | 0.49 | |
| CF5_family | no-change | HME | 0.32 | 0.43 | 0.18 | 0.18 | 0.20 | 0.19 | 0.30 | 0.19 |
| change | HME | 0.46 | 0.36 | 0.22 | 0.19 | 0.06 | 0.23 | 0.26 | 0.22 | |
| no-change | SUB | 0.13 | 0.18 | 0.49 | 0.49 | 0.17 | 0.18 | 0.21 | 0.16 | |
| change | SUB | 0.16 | 0.15 | 0.45 | 0.39 | 0.22 | 0.20 | 0.18 | 0.26 | |
| no-change | PRI | 0.14 | 0.19 | 0.17 | 0.15 | 0.51 | 0.50 | 0.19 | 0.17 | |
| change | PRI | 0.09 | 0.18 | 0.22 | 0.22 | 0.57 | 0.39 | 0.12 | 0.21 | |
| no-change | PUB | 0.18 | 0.18 | 0.17 | 0.15 | 0.19 | 0.16 | 0.47 | 0.51 | |
| change | PUB | 0.20 | 0.15 | 0.10 | 0.16 | 0.18 | 0.21 | 0.52 | 0.49 | |
| CF6_offers | no-change | HME | 0.58 | 0.43 | 0.13 | 0.18 | 0.16 | 0.19 | 0.14 | 0.19 |
| change | HME | 0.52 | 0.36 | 0.07 | 0.19 | 0.20 | 0.23 | 0.22 | 0.22 | |
| no-change | SUB | 0.11 | 0.18 | 0.44 | 0.49 | 0.12 | 0.18 | 0.32 | 0.16 | |
| change | SUB | 0.11 | 0.15 | 0.47 | 0.39 | 0.08 | 0.20 | 0.33 | 0.26 | |
| no-change | PRI | 0.13 | 0.19 | 0.14 | 0.15 | 0.46 | 0.50 | 0.27 | 0.17 | |
| change | PRI | 0.06 | 0.18 | 0.23 | 0.22 | 0.45 | 0.39 | 0.26 | 0.21 | |
| no-change | PUB | 0.06 | 0.18 | 0.12 | 0.15 | 0.12 | 0.16 | 0.70 | 0.51 | |
| change | PUB | 0.09 | 0.15 | 0.06 | 0.16 | 0.18 | 0.21 | 0.67 | 0.49 |
haz_kid2 <- panel_all %>%
dplyr::group_by(scenario_id, g, i) %>%
add_states_and_kids2() %>%
dplyr::ungroup() %>%
dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
dplyr::filter(state_t %in% sectors) %>%
dplyr::group_by(scenario_id, g, kid_change2, sector = state_t) %>%
dplyr::summarise(stay_hazard = mean(state_t1 == sector), .groups="drop")
# --- WOMEN ---
haz_pub_fem <- haz_kid2 %>%
dplyr::filter(g == "fem", sector == "PUB")
p_women <- ggplot(haz_pub_fem,
aes(x = scenario_id, y = stay_hazard, fill = kid_change2)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "Staying Hazard in Public Sector (Women)",
x = "Scenario", y = "Stay probability") +
theme_minimal()
ggsave("fig_stayhaz_fem.png", dpi = 300, width = 10, height = 8)
#print(p_women)
# --- MEN ---
haz_pub_men <- haz_kid2 %>%
dplyr::filter(g != "fem", sector == "PUB")
p_men <- ggplot(haz_pub_men,
aes(x = scenario_id, y = stay_hazard, fill = kid_change2)) +
geom_col(position = "dodge") +
scale_y_continuous(labels = scales::percent_format()) +
labs(title = "Staying Hazard in Public Sector (Men)",
x = "Scenario", y = "Stay probability") +
theme_minimal()
ggsave("fig_stayhaz_men.png", dpi = 300, width = 10, height = 8)
#print(p_men)
gridExtra::grid.arrange(p_women, p_men, ncol = 1)
mean_df <- wages_all |>
dplyr::group_by(scenario_id, sector, g) |>
dplyr::summarise(mean_logW = mean(logW, na.rm=TRUE), .groups="drop")
plot <- ggplot(wages_all, aes(x = logW, color = g, fill = g)) +
geom_density(alpha = 0.2) +
geom_vline(data = mean_df,
aes(xintercept = mean_logW, color = g),
linetype="dashed") +
facet_grid(sector ~ scenario_id, scales="free_y") +
theme_minimal() +
labs(title = "Log Wage Density by Sector",
x = "Log wage", y = "Density")
#print(plot)
#ggsave("log_wage_density.png", plot = plot,width = 10, height = 8, dpi = 300)
ggsave("fig_logw_density.png", dpi = 300, width = 10, height = 8)
print(plot)
Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage
moves_plot <- moves_all |>
dplyr::filter(orig == "PRI", dest == "PUB")
plot <- ggplot(moves_plot, aes(x = g, y = dlogW, fill = g)) +
geom_boxplot(outlier.alpha = 0.2) +
geom_hline(yintercept = 0, linetype=2) +
facet_grid(. ~ scenario_id) +
theme_minimal() +
labs(title = "Mover Gains PRI → PUB",
x = "Gender", y = "Δ log wage")
ggsave("fig_movergain1.png", dpi = 300, width = 10, height = 8)
print(plot)
Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage
moves_plot2 <- moves_all |>
dplyr::filter(orig == "PUB", dest == "PRI")
plot <- ggplot(moves_plot2, aes(x = g, y = dlogW, fill = g)) +
geom_boxplot(outlier.alpha = 0.2) +
geom_hline(yintercept = 0, linetype=2) +
facet_grid(. ~ scenario_id) +
theme_minimal() +
labs(title = "Mover Gains PUB → PRI",
x = "Gender", y = "Δ log wage")
ggsave("fig_movergain2.png", dpi = 300, width = 10, height = 8)
print(plot)
Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)
rr_table <- rr_all |>
dplyr::arrange(g, scenario_id) |>
gt(rowname_col = "scenario_id", groupname_col = "g") |>
tab_header(
title = md("**Rank–Rank Slope for PUB → PRI Movers**"),
subtitle = md(
"A lower slope indicates stronger comparative advantage: \
it measures how strongly a worker’s within-PUB sector rank \
predicts their rank in PRIVATE sector upon moving."
)
) |>
fmt_number(columns = dplyr::everything(), decimals = 3) |> ## Format slope values nicely
tab_style(
style = cell_text(weight = "bold"),
locations = cells_stub() ## Styling
) |>
opt_row_striping() |>
tab_options(
table.width = pct(100),
data_row.padding = px(4),
table.font.size = 13
)
#rr_table
gtsave(rr_table, "table_ranks.png") # PNG
gtsave(rr_table, "table_ranks.pdf") # PDF
rr_table
| Rank–Rank Slope for PUB → PRI Movers | ||
| A lower slope indicates stronger comparative advantage: it measures how strongly a worker’s within-PUB sector rank predicts their rank in PRIVATE sector upon moving. | ||
| rank_slope | scenario | |
|---|---|---|
| fem | ||
| BASELINE | 0.020 | BASELINE |
| CF1_price_gap | 0.003 | CF1_price_gap |
| CF2_disp | 0.040 | CF2_disp |
| CF3_compadv | −0.197 | CF3_compadv |
| CF4_strong_compadv | −0.160 | CF4_strong_compadv |
| CF5_family | 0.006 | CF5_family |
| CF6_offers | −0.039 | CF6_offers |
| men | ||
| BASELINE | −0.069 | BASELINE |
| CF1_price_gap | −0.031 | CF1_price_gap |
| CF2_disp | −0.069 | CF2_disp |
| CF3_compadv | −0.069 | CF3_compadv |
| CF4_strong_compadv | −0.069 | CF4_strong_compadv |
| CF5_family | −0.069 | CF5_family |
| CF6_offers | −0.069 | CF6_offers |
sl_table <- sl_all |>
dplyr::arrange(g, scenario) |>
gt(rowname_col = "scenario", groupname_col = "g") |>
tab_header(
title = md("**Stayer–Leaver Residual Gap in Origin Sector PUB**"),
subtitle = md(
"Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: \
‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector."
)
) |>
fmt_number(columns = c(stayer, leaver, gap), decimals = 3) |> ## Format columns nicely
opt_row_striping() |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_stub() ## Styling
) |>
tab_options(
table.width = pct(100),
table.font.size = 13,
data_row.padding = px(4)
)
gtsave(sl_table, "table_stayleave_haz.png") # PNG
gtsave(sl_table, "table_stayleave_haz.pdf") # PDF
sl_table
| Stayer–Leaver Residual Gap in Origin Sector PUB | ||||
| Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector. | ||||
| scenario_id | leaver | stayer | gap | |
|---|---|---|---|---|
| fem | ||||
| BASELINE | BASELINE | −0.001 | 0.001 | 0.002 |
| CF1_price_gap | CF1_price_gap | −0.008 | 0.005 | 0.014 |
| CF2_disp | CF2_disp | −0.002 | −0.003 | −0.001 |
| CF3_compadv | CF3_compadv | −0.012 | 0.006 | 0.017 |
| CF4_strong_compadv | CF4_strong_compadv | −0.012 | 0.002 | 0.014 |
| CF5_family | CF5_family | −0.005 | 0.006 | 0.011 |
| CF6_offers | CF6_offers | 0.001 | 0.001 | 0.000 |
| men | ||||
| BASELINE | BASELINE | 0.007 | −0.002 | −0.009 |
| CF1_price_gap | CF1_price_gap | 0.006 | −0.002 | −0.008 |
| CF2_disp | CF2_disp | 0.007 | −0.002 | −0.009 |
| CF3_compadv | CF3_compadv | 0.007 | −0.002 | −0.009 |
| CF4_strong_compadv | CF4_strong_compadv | 0.007 | −0.002 | −0.009 |
| CF5_family | CF5_family | 0.007 | −0.002 | −0.009 |
| CF6_offers | CF6_offers | 0.007 | −0.002 | −0.009 |
Wage transitions for consecutive employment (by scenario)
transitions <- panel_all |>
filter(g == "fem") %>%
dplyr::group_by(scenario_id, i) |>
dplyr::arrange(t, .by_group = TRUE) |>
dplyr::mutate(next_logW = dplyr::lead(logW),
next_sector = dplyr::lead(sector),
next_F = dplyr::lead(F),
employed_next = dplyr::lead(employed)) |>
dplyr::ungroup() |>
dplyr::filter(employed == 1, employed_next == 1) |>
dplyr::mutate(dlogW = next_logW - logW,
F_trans = dplyr::case_when(
F == 0 & next_F == 0 ~ "no kids -> no kids",
F == 0 & next_F >= 1 ~ "no kids -> kids",
F >= 1 & next_F >= 1 ~ "kids -> kids",
F >= 1 & next_F == 0 ~ "kids -> no kids"
))
dlogW_stats <- transitions |>
dplyr::mutate(orig = sector, dest = next_sector) |>
# >>> keep only PUB→PRI or PRI→PUB transitions <<<
dplyr::filter(
(orig == "PUB" & dest == "PRI") |
(orig == "PRI" & dest == "PUB")
) |>
dplyr::group_by(scenario_id, g, orig, dest, F_trans) |>
dplyr::summarise(mean_dlogW = mean(dlogW, na.rm = TRUE),
n = n(), .groups = "drop") |>
dplyr::mutate(orig = to_factor(orig), dest = to_factor(dest)) |>
dplyr::arrange(scenario_id, g, orig, dest, F_trans)
#print(head(dlogW_stats, 20))
p_dlog <- dlogW_stats |>
dplyr::filter(n >= 2) |>
ggplot(aes(x = interaction(orig, dest, sep = "→"),
y = mean_dlogW, fill = F_trans)) +
geom_col(position = "dodge") +
facet_grid(rows = vars(g), cols = vars(scenario_id)) +
labs(title = "Mean Δ log wage for consecutive employment by scenario",
x = "Origin → Destination sector", y = "Mean Δ log wage",
fill = "Fertility transition") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggsave("fig_wtrans_kids.png", dpi = 300, width = 10, height = 8)
print(p_dlog)
logwages <- panel_all |>
dplyr::filter(!is.na(sector), !is.na(logW)) |>
dplyr::group_by(scenario_id, t, g, sector) |>
dplyr::summarise(mean_logw = mean(logW), .groups="drop")
plot <- ggplot(logwages,
aes(x = t, y = mean_logw, color = sector, linetype = g)) +
geom_line() +
facet_grid(sector ~ scenario_id, scales="free_y") +
theme_minimal() +
labs(title = "Mean Log Wages Over Time",
x = "Time", y = "Mean log wage")
ggsave("fig_wage.png", dpi = 300, width = 10, height = 8)
print(plot)