Time is discrete, \(t=0,1,\dots,T\). Each individual \(i\) has gender \(g\in\{m,f\}\) and education \(e\). There are 3 employment sectors, \[ \mathcal S \equiv \{\text{Subsistence }(SUB),\ \text{Private }(PRI),\ \text{Public }(PUB)\}, \] and a home alternative \(H\) (non-employment).
The state at time \(t\) for individual \(i\) is: \[ \Xi_{it}=\Big(g,e,\ r_{it},\ \mathbf{\varepsilon}_i,\ F_{it},\ M_{it}\Big), \] where:
\(\mathbf{x}_{it}=(x_{it}(SUB),x_{it}(PRI),x_{it}(PUB))\) are sector-specific experience stocks (years),
\(r_{it}\in \mathcal S\cup\{H\}\) is last period’s chosen alternative (current “attachment”),
\(\mathbf{\varepsilon}_i=(\varepsilon_{i,sub},\varepsilon_{i,pri},\varepsilon_{i,pub})\) are permanent sector-specific skills,
\(F_{it}\in\{0,1,2,\dots\}\) is an exogenous fertility index (e.g., number or age-weighted count of children).
\(M_{it}\in\{0,1 \dots\}\) is exogenous marital status.
Assume \(F_{it}\) evolves exogenously via a Markov process \(Q(F_{t+1}\mid F_t)\) that is excluded from wages and excluded from offer arrival (it only affects utility).
If \(i\) works in sector \(s\in\mathcal S\) at \(t\), the (log) wage is: \[ \boxed{ \ln W_{it}^g(s) = \beta_{0,g}(s)\;+\;\beta_1\,\text{educ}_i\;\;+\;\varepsilon_{is}. } \] - Excluding work experience for now.
\(\beta_{0,g}(s)\): skill price (gender- and sector-specific; may capture discrimination).
\(\beta_1\): return to education (common).
\(\beta_2\): return to sector-specific experience if we have experience (common).
\(\varepsilon_{is}\): permanent sector ability (drawn at entry; no transitory wage shocks; no new wage draws).
Distribution of permanent skills. For tractability and mover-based identification, assume a multivariate Normal by gender: \[ \mathbf{\varepsilon}_i \mid g \sim \mathcal N\big(\mathbf{0},\ \Sigma_g\big),\qquad \Sigma_g= \begin{pmatrix} \sigma^2_{1,g} & \rho_{12,g} \,\, \sigma_{1,g}\,\, \sigma_{2,g} & \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} \\ \rho_{12,g} \,\, \sigma_{1,g} \,\, \sigma_{2,g} & \sigma^2_{2,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g}\\ \rho_{13,g} \,\, \sigma_{1,g} \,\, \sigma_{3,g} & \rho_{23,g} \,\, \sigma_{2,g} \,\, \sigma_{3,g} & \sigma^2_{3,g} \end{pmatrix}. \]
(Alternative: independent Fréchet components; results below do not require closed-form shares.)
For \(\Sigma_g\) to be a positive semidefinite (ie a valid covariance matrix), the corr matrix must also satisfy \[ 1 + 2 \, \rho_{12,g}\, \rho_{13,g}\, \rho_{23,g} - \, \rho_{12,g}^2 - \, \rho_{13,g}^2 - \, \rho_{23,g}^2 \] So for various pars values, make sure you check this condition.
Each period, at most one employment offer arrives. Let \(O_{it}\in\{S,G,P,\varnothing\}\) denote the sector offered at \(t\) (or \(\varnothing\) if no offer). Define arrival probabilities conditional on observables \(X_{it}\) (age, past sector, etc.):
\[ \Pr\big(O_{it}=s\mid X_{it},g\big)=\lambda^g_s(X_{it}),\quad s\in\{S,G,P\},\qquad \Pr(O_{it}=\varnothing\mid X_{it},g)=1-\sum_{s}\lambda^g_s(X_{it}). \] Key friction: If \(r_{it}=s\) (currently attached to \(s\)), the agent can continue in sector \(s\) only if \(O_{it}=s\). Otherwise, they can’t remain employed in \(s\) this period.
Home \(H\) is always available (no offer needed).
In other words, agents compare at most one employment sector vs home choice each period.
Let per-period utility be quasi-separable in (log) wages and non-pecuniary sector tastes shifted by fertility:
\[ u_{it}(s) = \beta_u\,\ln W_{it}^g(s) \;+\; \gamma_s\,F_{it},\qquad s\in\{SUB,PRI,PUB\}, \] with a home utility \(u_{it}(H)= b_0 \;+\; \gamma_H\,F_{it},\qquad b_0\in\mathbb R.\)
\(\beta_u>0\) scales log-wage into utils.
\(\gamma_s\) are sector-specific fertility shifters (exogenous; do not affect wages or offers).
No additional idiosyncratic taste shocks are included (only permanent wage heterogeneity). This focuses identification on fertility as an exclusion.
Timing Within Period: At the start of period \(t\) given state \(\Xi_{it}\):
Offer draw: \(O_{it}\in\{S,G,P,\varnothing\}\) is realized from \(\{\lambda^g_s(X_{it})\}\).
Choice set: \(\mathcal C_{it}=\{H\}\cup\{O_{it}\}\,\) (home plus the one offered sector if any).
Decision: choose \(d_{it}\in\mathcal C_{it}\) to maximize current utility plus continuation.
Experience update given choice; then \(F_{it}\) evolves to \(F_{it+1}\sim Q(\cdot\mid F_{it})\).
Let \(V_t(\Xi)\) be the value function. Define the sector-specific continuation value if offered \(s\):
\[ \mathcal V_t(s;\Xi) \equiv u_t(s) \;+\; \beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=s\big], \]
and the home continuation value: \[ \mathcal V_t(H;\Xi) \equiv u_t(H)\;+\;\beta\,\mathbb E\big[V_{t+1}(\Xi_{t+1})\ \big|\ \Xi_t=\Xi,\ r_{t+1}=H\big], \] with discount factor \(\beta\in(0,1)\).
Given the one-offer constraint, \[ V_t(\Xi) \;=\; \underbrace{\Pr(O_t=\varnothing\mid X,g)}_{\text{no offer}} \cdot \max\{\mathcal V_t(H;\Xi)\} \;+\; \sum_{s\in\{S,G,P\}} \underbrace{\Pr(O_t=s\mid X,g)}_{\lambda^g_s(X)} \cdot \max\{\mathcal V_t(H;\Xi),\ \mathcal V_t(s;\Xi)\}. \] The law of motion inside the expectations updates experience per choice and draws \(F_{t+1}\) exogenously.
stayers vs. movers:
If \(r_t=s\) and \(O_t=s\), the agent may stay in \(s\) by comparing \(\mathcal V_t(s;\Xi)\) to \(\mathcal V_t(H;\Xi)\).
If \(O_t\ne s\) (or \(O_t=\varnothing\)), staying in \(s\) is infeasible; the agent compares only \(H\) and the offered \(O_t\) (if any).
Movers are observed when \(O_t=s'\ne r_t\) and \(\mathcal V_t(s';\Xi)\ge \mathcal V_t(H;\Xi)\).
To interpret gender wedges as skill price differences:
Fix a base sector \(s_0\) (e.g., Private) and set \(\beta_{0,m}(s_0)=0\).
Center permanent skills by gender–sector: \(\mathbb E[\varepsilon_{is}\mid
g]=0\).
Then \(\beta_{0,f}(s)-\beta_{0,m}(s)\)
is a price/discrimination wedge in sector \(s\).
Optionally fix \(b_0=0\) or \(\beta_u=1\) for utility scale.
Observables and Moments: Given panel data on \(\{s_{it},\ \ln W_{it}\ \text{if } s_{it}\in\mathcal S,\ F_{it},\ e,\ \mathbf{x}_{it}\}\).
Define moments (by gender and also by education/experience if applicable):
Sector shares: \(\pi_{gs}(t)=\Pr(s_{it}=s\mid g,t)\).
Transitions: \(P_g(s'|s)=\Pr(s_{it+1}=s'\mid s_{it}=s,g)\).
Within-sector wages: \(\bar w_{gs}(t)=\mathbb E[\ln W_{it}\mid s_{it}=s,g,t]\), and dispersion \(\sigma^2_{gs}(t)\).
Mover wage gaps: \(\mathbb E[\Delta \ln W_{i} \mid r_t=s,\ s_{t+1}=s',\ g]\).
Fertility responses: event-study \(\Delta \pi_{gs}\) and \(\Delta P_g(s'|s)\) around \(\Delta F_{it}\).
Within-sector cross-sectional variation in education identifies \(\beta_1\).
Within-person wage growth for stayers in \(s\) identifies \(\beta_2\):
\[ \Delta \ln W_{it}^g(s) \;=\; \beta_2\cdot \Delta x_{it}(s),\quad \text{since }\varepsilon_{is}\ \text{is permanent}. \]
Using movers and the one-offer structure:
\[ \mathbb E\big[\ln W_{i}(s') - \ln W_{i}(s)\mid r_t=s,\ s_{t+1}=s',\ g\big] = \underbrace{\beta_{0,g}(s')-\beta_{0,g}(s)}_{\text{price difference}} + \beta_2\,\Delta x_i + \underbrace{\mathbb E[\varepsilon_{is'}-\varepsilon_{is}\mid \text{move } s\!\to\! s']}_{\text{selection term}}, \]
where the selection term arises because movers are selected on \(\mathbf{\varepsilon}\) given the acceptance
rule and offers.
Given \(\Sigma_g\) (below) and the offer process (below), the selection term is pinned down in the simulated model; the residual identifies \(\beta_{0,g}(s')-\beta_{0,g}(s)\) up to the base-sector normalization.
Gender wedges. With centered \(\varepsilon\), the cross-gender difference \(\beta_{0,f}(s)-\beta_{0,m}(s)\) is interpretable as discrimination (skill price) in sector \(s\).
Within-sector residual dispersion of \(\ln W\) (net of \(\beta_1,\beta_2\)) identifies variances \(\sigma^2_{gs}\).
Mover patterns (who moves where; gains upon moving) identify covariances \(\rho_{g,ss}\): strong positive correlation implies individuals who are strong in \(s\) are also strong in \(s'\) (affecting acceptance when offered \(s'\)).
\(\Sigma_g\) cannot mimic within-person, event-timed changes tied to fertility (see below), which helps separate it from \(\gamma\).
Fertility \(F_{it}\) enters utility only (not wages/offers). Around fertility events:
\[ \frac{\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k+1)} {\Pr(s_{t+1}=s'\mid s_t=s, F_{t+1}=k)} \approx \frac{\text{acceptance}(s'|F=k+1)}{\text{acceptance}(s'|F=k)}. \]
If public-sector transitions and shares jump for women at childbirth while wages/offer proxies do not, then \(\gamma_G>0\) (family-friendly public sector) is identified—separate from \(\beta_{0,g}(s)\) and \(\Sigma_g\).
Given \(\gamma\) (acceptance shifts at fertility) and wage-side parameters, the levels of transitions identify the arrival probabilities:
\[ \Pr(s_{t+1}=s'\mid s_t=s) = \underbrace{\lambda^g_{s'}(X_{t+1})}_{\text{arrival}} \times \underbrace{\Pr(\text{accept } s'\mid O_{t+1}=s',\Xi_{t+1};\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g)}_{\text{acceptance}}. \]
Fertility-timed contrasts net out acceptance shifts, isolating \(\lambda^g_{s'}(\cdot)\) from the remaining variation in transitions.
With appropriate normalization e.g. \(\mathbb E[\varepsilon_{is}\mid g]=0\), mean gender differences within a sector load on \(\beta_{0,f}(s)-\beta_{0,m}(s)\) (instead of \(\varepsilon\)).
Relative female advantage in public (vs private) can arise from either
higher \(\beta_{0,f}(PUB)-\beta_{0,m}(PUB)\) (skill price advantages), or
higher \(\sigma_{3,f}\) or favorable \(\rho_{23,f}\) (women’s endowment more aligned with \(PUB\) ie comparative advantages).
Will use mover wage gaps and fertility-timed acceptance shifts to separate these channels: \(\beta_0\) affects wage levels and mover gaps; \(\gamma\) affects acceptance around fertility; \(\Sigma_g\) affects dispersion and who moves, not event-timed shifts.
Parameter blocks: \(\Theta=\{\beta_u,\beta_0,\beta_1,\beta_2,\gamma,\Sigma_g,\lambda\}\) with normalizations (\(\beta_{0,m}(PRI)=0\), centered \(\varepsilon\)).
Simulate offer draws \(O_{it}\) from \(\lambda^g_s(X_{it})\) over observed \(X_{it}\) paths.
Solve choices period-by-period (myopic or with continuation via forward simulation) given \(F_{it}\), wages, and one-offer constraint.
Match moments: sector shares, transitions (overall and by fertility states), within-sector wage means/variances, and mover wage gaps.
Over-ID checks: out-of-sample moments (e.g., wage tails, acceptance at non-fertility events).
Possible counterfactuals to run using the estimated model:
No discrimination: set \(\beta_{0,f}(s)=\beta_{0,m}(s)\) for all \(s\), keep \(\Sigma_g,\gamma,\lambda\) fixed, recompute outcomes.
No family-friendliness differential: set
\[ \gamma_{PUB}=\gamma_{PRI}=\gamma_{SUB} \]
(or \(\gamma_s=0\)), examine how fertility no longer tilts acceptance.
Offer equalization: set \(\lambda^f_s(X)=\lambda^m_s(X)\), keep other primitives.
This dynamic Roy-with-offers model hardwires the “one-offer-per-period” friction, forcing agents to compare one employment sector against home each period (home always feasible; staying in the same job requires a fresh offer). Wages depend only on permanent sector-specific skills and observables; there are no transitory wage shocks. Fertility is an observed, exogenous shifter of non-pecuniary sector utility only.
Identification separates:
Prices \(\beta_{0,g}(s)\) via mover wage gaps and wage levels,
Endowments \(\Sigma_g\) via dispersion and selection patterns,
Offers \(\lambda\) via transition levels net of acceptance,
Family-friendliness \(\gamma\) via within-person,fertility-timed acceptance shifts.
The combination of these aspects of the data allows to separate the role of skill prices, skill comparative advantage and offer probabilities.
For example: The avg log wage difference for women who move from public to private is around -0.46 and -0.09 for women and men, respectively. Can use this information in conjunction with other aspects of the data. In order to distinguish the roles of skill price gaps vs. comparative advange/endowmentstructure, we use many data oments at once, nost jst te mean waegs ormean log wage differences for stayers and movers. In the state-depdendent offer model specified above, each block pushes differnet patterns in the data:
Variance and shape of mover gains, not just the mean.
Price differences shift the mean of \(\Delta w\) almost uniformly, the variance and skew of \(\Delta w \mid s \rightarrow s',g\) are driven by the distribution of \(\varepsilon_{is'}- \varepsilon_{is}\) under selection.
Matching \(Var(\Delta w)\) by origin-destination-gender pins down \(\sigma_{gs}, \sigma_{gs'}\) and especially \(\rho_{g,ss'}\). A big mean gap with small variance points to \(\Delta \beta_{0,g}\); a big variance/left tail points to comparative advantage sorting.
Rank-rank (or quantile-quantile) patterns for movers. - Regress post-move residual-wage rank in s’ on pre-move residual rank in s for movers of gender g. The slope identifies \(\rho_{g,ss'}\).
Stayer-leaver contrasts within origin sectors.
Compare residual wages (after removing \(\beta_{0,g}(s) + \beta_1 educ\) of stayers vs leavers from \(s\), by gender.
Strong negative selection of leavers (leavers much lower residuals in \(s\) than stayers) is evidence of comparative advantage. Pure price gaps do not create differential stayer-leaver sorting strength.
Bidirectional flows: symmetry tests.
Jointly match \(PUB \rightarrow PRI\) and \(PRI \rightarrow PUB\) rates and gains by gender.
A price gap \(\Delta \beta_{0,g}(PRI,PUB)\) pushes both directions’ means (opposite sign, similar magnitude after accounting for selection).
Comparative advantage shows asymmetric selection and different dispersion across the two directions.
Within-sector dispersion (pooled), by gender.
\(Var(\text{residual } lnW \mid s,g)\) pins down \(\sigma_{gs}^2\).
If women’s residual dispersion is notably different in \(PRI\) vs. \(PUB\), you can’t match both dispersions and the mover distributions with a pure price story; you’re forced to adjust \(\Sigma_g\) (including \(\rho_{23,g}\)).
Event-time around fertility (the main exclusion):
State-dependent offer matrix moments.
The added tweak (offers depend on current sector) gives extra moments: staying hazards \(P(s \rightarrow s)\), directional flows \(P(PUB \rightarrow PRI)\) vs \(P(PRI \rightarrow PUB)\) by gender and fertility.
These depend strongly on \(\lambda_{s' \mid s,g}\) and \(\gamma\) but not on \(\beta_0\). Matching these pins down frictions and acceptance,leaving \(\beta_0\) to be identified by wage levels and mover means once selection is accounted for.
Practically, how does the estimator tell them apart?
In the structural SMM:
We normalize \(E[\varepsilon_{is} \mid g]=0\) by gender and sector. and set a base \(\beta_{0,m}(s_0)=0\). Then mean within-sector wages by gender identify relative \(\beta_{0,g}(s)\), while dispersions and rank-rank moments identify \(\Sigma_g\) (including \(\rho_{g,ss'}\)).
We match simultaneously:
No single moment is decisive; the joint fit forces a decomposition: If you try to explain the women’s -0.46 mean with only a large \(\Delta \beta_{0,f}(PRI,PUB)\), you will misfit - the vairance of \(\Delta w\), - the mover rank-rank slope, (iii) stayer-leaver gaps, or (iv) the reverse \(PRI \rightarrow PUB\) flow so the optimizer shifts weight into \(\Sigma_f\) (comparative advantage) until all moments line up. Conversely, if you try to do it only with \(\Sigma_f\), you will miss within-sector mean levels and pooled wage gaps unless \(\beta_{0,f}(P)- \beta_{0,f}(G)\) adjusts.
Two concrete diagnostics:
Rank-rank plot for movers \(PUB \rightarrow PRI\) (separately by gender): Compute residual ranks in PUB before move and residual ranks in PRI after move. A much flatter slope for women than men \(\rightarrow\) stronger negative \(\rho_{23,f}\) / comparartive advantage channel.
Stayer-leaver residual gap in G (by gender): \(E[\tilde{w} \mid \text{stay in PUB}] - E[\tilde{w} \mid \text{leave PUB} ]\). A larger gap for women indicates stronger selection on \(\varepsilon\) among female leavers, again pointing to endowments rather than prices.
Sources of gender differences and discussion of which moments move. What identifies prices \(\beta_0\), vs. endowments (\(\Sigma_g\)), and separates both from offers \(\lambda\) and tastes \(\gamma\). How to read the scenario results (ie what identifies what)
Scenario 1 (prices only): Gender gaps come only from \(\beta_{0,g}(s)\). You should see parallel shifts in within-sector wage means, similar dispersion across genders, and mover \(\Delta log W\) distributions shifting mainly in means. Rank-rank slopes (rr) and stayer-leaver gaps (sl) should be similar across genders - little evidence of comparative advantage differences.
Scenario 2 (endowments only): Prices equalized; genders differ only in \(\Sigma_g\) (variances/covariances for the joint distribution of sector-specific skill endowments). Expect different dispersions, different shapes/variances of mover gains, flatter female rank-rank slope \(PUB \rightarrow PRI\) if \(\rho_{23,g}\) is lower (rr), and larger stayer-leaver residual gap for the gender with stronger selection (sl). Means won’t shift much without price differences.
Scenario 3 (offers only): Prices and \(\Sigma\) equal; only offer matrices differ (state persistence). Within-sector wages look similar, mover gains distributions similar, but staying hazards differ by gender and sector. Sector shares (not shown but trivial to add) will reflect opportunity, not wages.
Scenario 4 (fertility tastes only): Prices, \(\Sigma\) offers equal; only \(\gamma\) differs. Around births you will see gender-specific shifts into the more family-friendly sector (public), with composition changes in p1/p2 but no shifts in underlying prices.
Click on show to see corresponding code.
The program first specifies basic run specifications such as number of periods, simulation size etc. and then functional forms.
It then assigns some parameter values which are loosely calibrated to make wage and employment transitions to look somewhat reasonable. Such calibrated values would be the starting values in the actual estimation but here we’re just using it to demonstrate how the model works.
Each scenario refers to a different combination of
parameter values to see which moments are most responsive to which
parameters, whichhelps to see what aspets ofthe data will be useful for
identification of key parameters.
## =========================================================
## Dynamic Roy with One-Sector, STATE-DEPENDENT Offers
## (Experience removed; permanent sector skills only)
## ---------------------------------------------------------
## - 3 employment sectors
## - Home H always available
## - One offer per period (or none)
## - Offer arrival PROBABILITIES depend on CURRENT SECTOR (state)
## - Wages fixed per individual-sector (permanent epsilon + observables)
## - Fertility shifts utility (not wages, not offers)
## - Finite horizon, solved by backward induction
## =========================================================
set.seed(42)
suppressPackageStartupMessages({
library(dplyr); library(tidyr); library(ggplot2); library(kableExtra)
})
## ----------------------------
## 1) Model primitives
## ----------------------------
T_per <- 5
N <- 1000
HOME <- "HME"
sectors <- c("SUB","PRI","PUB")
R_states <- c(HOME, sectors)
state_levels <- c("HME","SUB","PRI","PUB")
RUN_GENDER <- "fem" # "men", "fem", "both"
if (RUN_GENDER == "both") {
genders <- c("men", "fem")
} else if (RUN_GENDER == "men") {
genders <- c("men")
} else if (RUN_GENDER == "fem") {
genders <- c("fem")
}
## ---- State-dependent (current-sector) offer process.
## tilt the offered sector toward the CURRENT r_t by adding a "stay bonus"
#stay_bonus <- 0.20 # extra mass for re-offer from current sector
## Keep "none" prob equal to the original base-none, and renormalize across S,G,P.
## A1) Build conditional offer matrices with a given base + stay_bonus
## returns a list: probs[[r]] is a named vector over c(S,G,P,None) for origin r ∈ {H,S,G,P}
build_lambda_conditional <- function(lambda, bonus) {
out <- list()
base_sum <- sum(lambda)
#print(base_sum)
#print(paste("base_sum:", base_sum));
# From HOME: use base probabilities (no tilt)
out[[HOME]] <- c(lambda, None = 1 - base_sum)
# From each sector: add stay bonus to that sector, renormalize offers to sum to base_sum
for (r in sectors) {
prob <- lambda
prob[r] <- prob[r] + bonus
prob <- prob * (base_sum / sum(prob)) # renormalize to sum to base_sum
out[[r]] <- c(prob, None = 1 - sum(prob))
}
out
#print(paste("out:", out[["PUB"]] ));
}
cov_matrix <- function(sig, rho) {
k <- length(sig)
if (k < 2) stop("Length of 'sig' must be at least 2.")
if (any(sig <= 0)) stop("'sig' must contain strictly positive values.")
# correlation matrix from scalar rho
cor_matrix_scalar <- function(rho, k) {
if (rho <= -1 || rho >= 1)
stop("Scalar rho must be strictly between -1 and 1.")
M <- matrix(rho, k, k)
diag(M) <- 1
M
}
# correlation matrix from vector rho
cor_matrix_vector <- function(rhos, k) {
needed <- k*(k-1)/2
if (length(rhos) != needed)
stop(sprintf("For k=%d, rho vector must be length %d.", k, needed))
if (any(rhos <= -1 | rhos >= 1))
stop("All correlations must be strictly between -1 and 1.")
M <- matrix(1, k, k)
M[upper.tri(M)] <- rhos
M[lower.tri(M)] <- t(M)[lower.tri(M)]
M
}
# choose scalar or vector case
R <- if (length(rho) == 1) {
cor_matrix_scalar(rho, k)
} else {
cor_matrix_vector(rho, k)
}
# Σ = D R D
D <- diag(sig)
D %*% R %*% D
}
# Check for positive semidefiniteness
is_valid_corr <- function(rho) {
# rho = c(r12, r13, r23)
if (length(rho) != 3) stop("rho must be a vector of length 3")
r12 <- rho[1]
r13 <- rho[2]
r23 <- rho[3]
R <- matrix(c(
1, r12, r13,
r12, 1, r23,
r13, r23, 1
), nrow = 3, byrow = TRUE)
lambda_min <- min(eigen(R)$values)
valid <- lambda_min > -1e-8
if (!valid) {
warning(sprintf(
"Invalid correlation triple: smallest eigenvalue = %.4g",
lambda_min
))
}
# Return nicely formatted result
return(
data.frame(
r12 = r12,
r13 = r13,
r23 = r23,
valid = valid
)
)
}
# Examples
rho <- c(0.3, 0.4, 0.2)
#is_valid_corr(rho) # TRUE
rho <- c(0.9, 0.9, -0.9)
#is_valid_corr(rho) # FALSE
rho <- c(1, -1, 1)
#is_valid_corr(rho) # FALSE
rho <- c(0.7, -0.5, -0.2)
#is_valid_corr(rho) # TRUE
rmvnorm <- function(n, mu, Sigma) {
Z <- matrix(rnorm(n * length(mu)), nrow = n)
L <- chol(Sigma)
sweep(Z %*% L, 2, mu, `+`)
}
## Fertility: exogenous Markov
F_states <- 0:2
Q_F <- matrix(0, 3, 3, dimnames=list(F_states, F_states))
Q_F["0","0"] <- 0.95; Q_F["0","1"] <- 0.05; Q_F["0","2"] <- 0.00
Q_F["1","1"] <- 0.92; Q_F["1","2"] <- 0.08
Q_F["2","2"] <- 1.00
#Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
#Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05 # marriage probability
#Q_M["1","1"] <- 1.00 # married stays married
## Markov transition Q_M(m'|m)
## Example: small probability of marriage when single; absorbing once married.
M_states <- 0:1 # 0 = single, 1 = married
Q_M <- matrix(0, 2, 2, dimnames=list(M_states, M_states))
Q_M["0","0"] <- 0.95; Q_M["0","1"] <- 0.05 # Pr(marry)
Q_M["1","1"] <- 1.00 # married absorbing
## (can freely change this to allow divorce.)
## Marital-status taste shifters (parallel to gamma for fertility)
## delta_s_m where s ∈ {HME, SUB, PRI, PUB} and m ∈ {0,1}
## These are non-pecuniary utility additions.
delta_men <- c(HME=0, SUB=0, PRI=0, PUB=0) # men
delta_fem <- c(HME=0, SUB=0, PRI=0, PUB=0) # women
## Draw next M
#draw_next_M <- function(M_cur){
# sample(M_states, size=1, prob = Q_M[as.character(M_cur),])
#}
p_educ <- 0.5 # Pr(educ=1)
beta_u <- 1.0 # Utility scale and home intercept
beta <- 0.95 # discount factor
beta1 <- 0.15 # Education return (log-wage)
## Skill prices (log-wage intercepts) by gender & sector (as before)
beta0 <- tibble::tribble(
~g, ~s, ~par,
"men","SUB", 0.15,#-0.20,
"men","PRI", 0.15,#0.30,
"men","PUB", 0.15,
"fem","SUB", 0.15,#-0.20,
"fem","PRI", 0.15,#-0.20,
"fem","PUB", 0.15
)
#sig <- c(0.30, 0.30, 0.10)
#rho <- c(0.20, 0.10, 0.40) # (12, 13, 23)
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.0, 0.0, 0.0) # (12, 13, 23)
#is_valid_corr(rho)
sig0 <- c(0.30, 0.30, 0.30)
rho0 <- c(0.0, 0.0, 0.0) # (12, 13, 23)
## Fertility shifters (non-pecuniary) by sector incl. home (as before)
gamma_m <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
gamma_f <- c(HME = 0.00, SUB = 0.00, PRI = 0.00, PUB = 0.00)
## ---- Base (memoryless) offer probs from previous code (for comparability)
lambda_m <- c(SUB = 0.30, PRI = 0.30, PUB= 0.30) # men
lambda_f <- c(SUB = 0.30, PRI = 0.30, PUB = 0.30) # women
stay_bonus <- 0.90
test <- build_lambda_conditional(lambda_m,stay_bonus)
## Scenario parameter builders
make_scenario <- function(scn){
out <- list(label = scn)
out$beta0_df <- beta0
out$Sigma_m <- cov_matrix(sig0, rho0)
out$Sigma_f <- cov_matrix(sig0, rho0)
out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus) # same as men
out$gamma_m <- gamma_m
out$gamma_f <- gamma_f
out$redraw_eps <- TRUE
if (scn == "BASELINE"){
out$beta0_df <- beta0
out$Sigma_m <- cov_matrix(sig, rho)
out$Sigma_f <- cov_matrix(sig, rho)
out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus) # same as men
out$gamma_m <- gamma_m
out$gamma_f <- gamma_f
out$redraw_eps <- TRUE
}
else if (scn == "S1_prices"){
## Men vs women differ ONLY in beta0 (prices); same Sigma, same offers, same gamma
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ -0.20,
g=="men" & s=="PRI" ~ 0.30,
g=="men" & s=="PUB" ~ 0.15,
g=="fem" & s=="SUB" ~ -0.20,
g=="fem" & s=="PRI" ~ 0.15,
g=="fem" & s=="PUB" ~ 0.30
))
out$beta0_df <- beta0
sig <- c(0.30, 0.30, 0.30)
rho <- c(-0.20, -0.20, -0.20) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_m <- cov_matrix(sig, rho)
sig <- c(0.30, 0.30, 0.30)
rho <- c(-0.10, -0.10, -0.10) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus) # same as men
out$gamma_m <- gamma_m
out$gamma_f <- gamma_f
out$redraw_eps <- TRUE
}
else if (scn == "S2_endoA"){
## Men vs women differ ONLY in Sigma (eps distribution); same prices, offers, gamma
## Make women's (G,P) more negatively correlated to illustrate comparative advantage
out$beta0_df <- beta0
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.0, 0.0, 0.0) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_m <- cov_matrix(sig, rho)
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.10, 0.10, 0.10) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus) # same as men
out$gamma_m <- gamma_m
out$gamma_f <- gamma_f
out$redraw_eps <- TRUE # need to redraw eps given new Sigma_f
}
else if (scn == "S2_endoB"){
## Men vs women differ ONLY in Sigma (eps distribution); same prices, offers, gamma
## Make women's (G,P) more negatively correlated to illustrate comparative advantage
out$beta0_df <- beta0
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.10, 0.10, 0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_m <- cov_matrix(sig, rho)
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.10, 0.10, -0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus) # same as men
out$gamma_m <- gamma_m
out$gamma_f <- gamma_f
out$redraw_eps <- TRUE # need to redraw eps given new Sigma_f
}
else if (scn == "S3_offers"){
## Men vs women differ ONLY in offer arrival (state-dependent); same prices, same Sigma, same gamma
out$beta0_df <- beta0
out$Sigma_m <- cov_matrix(sig, rho)
out$Sigma_f <- cov_matrix(sig, rho)
out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
lam <- c(SUB=0.10, PRI= 0.10, PUB=0.70)
out$lambda_f_cond <- build_lambda_conditional(lam, stay_bonus)
out$gamma_m <- gamma_m
out$gamma_f <- gamma_f
out$redraw_eps <- TRUE
}
else if (scn == "S4_fertility"){
## Men vs women differ ONLY in gamma_s (fertility tastes); same prices, Sigma, offers
out$beta0_df <- beta0
out$Sigma_m <- cov_matrix(sig, rho)
out$Sigma_f <- cov_matrix(sig, rho)
out$lambda_m_cond <- build_lambda_conditional(lambda_m, stay_bonus)
out$lambda_f_cond <- build_lambda_conditional(lambda_f, stay_bonus) # same as men
out$gamma_m <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.00)
out$gamma_f <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.60) # make public more family-friendly for women
out$redraw_eps <- TRUE
}
else if (scn == "B0"){
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.10,
g=="men" & s=="PRI" ~ 0.10,
g=="men" & s=="PUB" ~ 0.10,
g=="fem" & s=="SUB" ~ 0.10,
g=="fem" & s=="PRI" ~ 0.10,
g=="fem" & s=="PUB" ~ 0.10 ))
sig <- c(0.30, 0.30, 0.30)
rho <- c(-0.30, -0.30, -0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "B1"){
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.10,
g=="men" & s=="PRI" ~ 0.10,
g=="men" & s=="PUB" ~ 0.10,
g=="fem" & s=="SUB" ~ 0.10,
g=="fem" & s=="PRI" ~ 0.10,
g=="fem" & s=="PUB" ~ 0.10 ))
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.0, 0.0, 0.0) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "B2"){
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.10,
g=="men" & s=="PRI" ~ 0.10,
g=="men" & s=="PUB" ~ 0.10,
g=="fem" & s=="SUB" ~ 0.10,
g=="fem" & s=="PRI" ~ 0.10,
g=="fem" & s=="PUB" ~ 0.10 ))
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.30, 0.30, 0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "C0"){
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.10,
g=="men" & s=="PRI" ~ 0.10,
g=="men" & s=="PUB" ~ 0.10,
g=="fem" & s=="SUB" ~ 0.10,
g=="fem" & s=="PRI" ~ 0.10,
g=="fem" & s=="PUB" ~ 0.25 ))
sig <- c(0.30, 0.30, 0.30)
rho <- c(-0.30, -0.30, -0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "C1"){
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.10,
g=="men" & s=="PRI" ~ 0.10,
g=="men" & s=="PUB" ~ 0.10,
g=="fem" & s=="SUB" ~ 0.10,
g=="fem" & s=="PRI" ~ 0.10,
g=="fem" & s=="PUB" ~ 0.25 ))
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.0, 0.0, 0.0) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "C2"){
out$beta0_df <- beta0 |> dplyr::mutate(par = dplyr::case_when(
g=="men" & s=="SUB" ~ 0.10,
g=="men" & s=="PRI" ~ 0.10,
g=="men" & s=="PUB" ~ 0.10,
g=="fem" & s=="SUB" ~ 0.10,
g=="fem" & s=="PRI" ~ 0.10,
g=="fem" & s=="PUB" ~ 0.25 ))
sig <- c(0.30, 0.30, 0.30)
rho <- c(0.30, 0.30, 0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "FR0"){
out$gamma_m <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.00)
out$gamma_f <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.60) # make public more family-friendly for women
}
else if (scn == "FR1"){
out$gamma_m <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=0.00)
out$gamma_f <- c( HME=0.00,SUB=0.00, PRI= 0.00, PUB=1.00) # make public more family-friendly for women
}
else if (scn == "S0"){
sig <- c(0.10, 0.10, 0.30)
rho <- c(0.0, 0.0, -0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "S1"){
sig <- c(0.10, 0.10, 0.30)
rho <- c(0.0, 0.0, 0.0) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "S2"){
sig <- c(0.10, 0.10, 0.30)
rho <- c(0.0, 0.0, 0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "S3"){
sig <- c(0.10, 0.10, 0.30)
rho <- c(0.0, 0.0, -0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "S4"){
sig <- c(0.10, 0.10, 0.30)
rho <- c(0.0, 0.0, 0.0) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else if (scn == "S5"){
sig <- c(0.40, 0.40, 0.40)
rho <- c(0.30, 0.30, 0.30) # (12, 13, 23)
is_valid_corr(rho)
out$Sigma_f <- cov_matrix(sig, rho)
}
else {
stop("Unknown scenario label")
}
out
}
## ---------------------------------------------------------
## Block A: Scenario helpers (parameter overrides + wrappers)
## ---------------------------------------------------------
## Keep the same N, T_per, genders, sectors, HOME, beta1, beta_u, beta, Q_F, etc.
## We'll reuse your global g_vec, educ_vec for comparability across scenarios.
## A2) Draw epsilons by gender given Sigma (keeps g_vec fixed)
draw_eps_by_gender <- function(g_vec, Sigma_m, Sigma_f){
eps_mat <- matrix(NA_real_, nrow=length(g_vec), ncol=length(sectors),
dimnames=list(NULL, sectors))
idx_m <- which(g_vec=="men"); idx_f <- which(g_vec=="fem")
if (length(idx_m)>0) eps_mat[idx_m,] <- rmvnorm(length(idx_m), c(0,0,0), Sigma_m)
if (length(idx_f)>0) eps_mat[idx_f,] <- rmvnorm(length(idx_f), c(0,0,0), Sigma_f)
eps_mat
}
## A3) Precompute per-person log wages for all sectors
## A3) Precompute per-person log wages for all sectors (robust, base-R)
## Precompute per-person log wages by sector (fixed forever)
precompute_logW_all <- function(g_vec, educ_vec, eps_mat, beta0_df){
stopifnot(is.matrix(eps_mat),
nrow(eps_mat) == length(g_vec),
all(colnames(eps_mat) %in% sectors))
Nloc <- length(g_vec)
logW_all <- matrix(NA_real_, nrow = Nloc, ncol = length(sectors),
dimnames = list(NULL, sectors))
# helper: fetch scalar beta0 for (g,s)
get_beta0 <- function(gg, ss){
idx <- (beta0_df$g == gg) & (beta0_df$s == ss)
b <- beta0_df$par[idx]
if (length(b) != 1L) stop(sprintf("beta0_df mismatch for g=%s, s=%s (found %d rows)", gg, ss, length(b)))
b
}
for (ss in sectors){
col <- match(ss, sectors)
beta0_vec <- vapply(seq_len(Nloc), function(i) get_beta0(g_vec[i], ss), numeric(1))
logW_all[, col] <- beta0_vec + beta1 * educ_vec + eps_mat[, col]
}
logW_all
}
## ----------------------------
## A4) Solve V for one person *given scenario-specific objects*
## 4) Value Function (Backward Induction, person-specific)
## ----------------------------
## State: (t, r ∈ {H,S,G,P}, F ∈ {0,1,2})
## V_{T+1}(.) = 0. For t=T,...,1 compute V_t(r,F) = E_{O|r} [ max{ V_H, V_emp(O) } ]
## where V_H = u_home(F) + β E_F' V_{t+1}(H, F'), and
## V_emp(s) = u_employed(logW_s, s, F) + β E_F' V_{t+1}(s, F').
solve_value_person_scn <- function(i, logW_all, g_vec, gamma_m,gamma_f,
lambda_m_cond, lambda_f_cond){
g <- g_vec[i]
lambda_cond <- if (g=="men") lambda_m_cond else lambda_f_cond
gamma <- if (g=="men") gamma_m else gamma_f
delta <- if (g=="men") delta_men else delta_fem
logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
t_names <- as.character(1:(T_per+1)) # time index: 1..(T+1); V[t = T+1, ., .] = 0 (terminal)
#V <- array(0, dim=c(T_per+1, length(R_states), length(F_states)),
# dimnames=list(t=t_names, r=R_states, F=as.character(F_states)))
V <- array(
0,
dim=c(T_per+1, length(R_states), length(F_states), length(M_states)),
dimnames=list(
t=t_names,
r=R_states,
F=as.character(F_states),
M=as.character(M_states)
)
)
# backward induction: tt = T, ..., 1
for (tt in T_per:1){
for (r in R_states){
p_offers <- lambda_cond[[r]]
for (Fcur in F_states){
for (Mcur in M_states){
## continuation if choose Home
EV_next_HME <- sum(
Q_F[as.character(Fcur), ] *
V[as.character(tt+1), HOME, , as.character(Mcur)]
)
V_emp <- c(SUB=-Inf, PRI=-Inf, PUB=-Inf)
for (s in sectors){
EV_next_s <- sum(
Q_F[as.character(Fcur), ] *
V[as.character(tt+1), s, , as.character(Mcur)]
)
V_emp[s] <- beta_u * logW_i[s] +
ifelse(Fcur >= 1, gamma[s], 0) +
ifelse(Mcur == 1, delta[s], 0) +
beta * EV_next_s
}
V_HME <- 0 +
ifelse(Fcur >= 1, gamma["HME"], 0) +
ifelse(Mcur == 1, delta["HME"], 0) +
beta * EV_next_HME
EV <- p_offers["None"] * V_HME +
sum(p_offers[sectors] *
pmax(V_HME, V_emp[sectors]))
V[as.character(tt), r, as.character(Fcur), as.character(Mcur)] <- EV
} # end Mcur
} # end Fcur
} # end r
} # end tt
}
## ----------------------------
## 5) Simulation of optimal paths
## ----------------------------
## A5) Simulate optimal paths *given* V_all and scenario objects
## Policy: Given (t, r, F) and realized offer O:
## choose O if V_emp(O) >= V_H; else choose Home.
## Offers drawn from λ_{·|r,g}; F evolves via Q_F.
simulate_paths_scn <- function(
V_all, logW_all, g_vec, F0, gamma_m, gamma_f,
lambda_m_cond, lambda_f_cond
){
# helper: draw one offer given gender and current attachment r
draw_offer_cond <- function(g, r){
probs <- if (g == "men") lambda_m_cond[[r]] else lambda_f_cond[[r]]
draw <- sample(c(sectors, "None"), size = 1, prob = probs)
if (draw == "None") NA_character_ else draw
}
out_list <- vector("list", length = N)
for (i in seq_len(N)) {
g_i <- g_vec[i]
educ_i <- educ_vec[i]
V_i <- V_all[[i]]
logW_i <- setNames(as.numeric(logW_all[i, ]), sectors)
# start-of-panel state for person i
r_cur <- HOME
F_cur <- F0[i]
M_cur <- M0[i]
# storage for this person's path
rows <- vector("list", length = T_per)
for (tt in 1:T_per) {
# gender-specific gamma for THIS person
gamma <- if (g_i == "men") gamma_m else gamma_f
# draw an offer conditional on current attachment r_cur
offer <- draw_offer_cond(g_i, r_cur)
# continuation value if choose Home now
EV_next_HME <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), HOME, ])
V_HME <- 0 + ifelse(F_cur >= 1, gamma[["HME"]], 0) + beta * EV_next_HME
# default choice is Home
choice <- HOME
employed <- 0L
sector <- NA_character_
logW_chosen <- NA_real_
# if there is an offer, compare employed value vs home
if (!is.na(offer)) {
s <- offer
EV_next_s <- sum(Q_F[as.character(F_cur), ] * V_i[as.character(tt + 1), s, ])
V_emp_off <- beta_u * logW_i[s] +
ifelse(F_cur >= 1, gamma[[s]], 0) +
beta * EV_next_s
if (V_emp_off >= V_HME) {
choice <- s
employed <- 1L
sector <- s
logW_chosen <- logW_i[s]
}
}
# record this period
rows[[tt]] <- data.frame(
i = i, t = tt, g = g_i, educ = educ_i, F = F_cur, M=M_cur,
r = r_cur, offer = offer, choice = choice,
employed = employed, sector = sector, logW = logW_chosen,
stringsAsFactors = FALSE
)
# move to next period
r_cur <- choice
if (tt < T_per) {
F_cur <- draw_next_F(F_cur)
M_cur <- draw_next_M(M_cur)
}
}
out_list[[i]] <- do.call(rbind, rows)
}
dt <- do.call(rbind, out_list)
rownames(dt) <- NULL
dt
}
## We'll keep the same g_vec, educ_vec, F0 across scenarios (comparability)
g_vec <- sample(genders, N, replace=TRUE)
educ_vec <- rbinom(N, 1, p_educ)
F0 <- sample(F_states, size=N, replace=TRUE, prob=c(0.8, 0.18, 0.02))
draw_next_F <- function(F){ sample(F_states, 1, prob=Q_F[as.character(F), ]) }
M0 <- sample(M_states, size=N, replace=TRUE, prob=c(0.90, 0.10))
draw_next_M <- function(M){
sample(M_states, 1, prob = Q_M[as.character(M), ])
}
run_scenario <- function(scn_label){
scn <- make_scenario(scn_label)
## Draw epsilons (only if needed)
if (scn$redraw_eps){
eps_mat_scn <- draw_eps_by_gender(g_vec, scn$Sigma_m, scn$Sigma_f)
} else {
## reuse the eps_mat from your baseline if present; else draw with base Sigmas
if (exists("eps_mat")) {
eps_mat_scn <- eps_mat
} else {
eps_mat_scn <- draw_eps_by_gender(g_vec, Sigma_m, Sigma_f)
}
}
logW_all_scn <- precompute_logW_all(g_vec, educ_vec, eps_mat_scn, scn$beta0_df)
## Solve V for each person
## Solve V for everyone (vectorized over i)
## (This is the heavy step: 5000 people × (T=10 × r=4 × F=3) is very feasible.)
V_all_scn <- lapply(1:N, function(i)
solve_value_person_scn(i, logW_all_scn, g_vec, scn$gamma_m,scn$gamma_f,
scn$lambda_m_cond, scn$lambda_f_cond))
## Simulate paths
dt_scn <- simulate_paths_scn(V_all_scn, logW_all_scn, g_vec, F0, scn$gamma_m,scn$gamma_f,
scn$lambda_m_cond, scn$lambda_f_cond)
dt_scn$scenario <- scn_label
## === Diagnostics ===
## (1) Pooled log-wage densities by sector×gender
wages_pool <- dt_scn |>
dplyr::filter(!is.na(sector), !is.na(logW)) |>
dplyr::mutate(sector=factor(sector, levels=sectors))
## (2) Movers: Δ logW by origin→dest, gender
moves <- dt_scn |>
dplyr::group_by(i) |>
dplyr::arrange(t, .by_group = TRUE) |>
dplyr::mutate(next_logW = dplyr::lead(logW),
next_sector = dplyr::lead(sector),
employed_next = dplyr::lead(employed)) |>
dplyr::ungroup() |>
dplyr::filter(employed==1, employed_next==1,
!is.na(sector), !is.na(next_sector)) |>
dplyr::mutate(orig=sector, dest=next_sector,
dlogW = next_logW - logW)
## (3) Residuals: remove sector×gender×educ mean (proxy for unobs)
dt_resid <- dt_scn |>
dplyr::filter(!is.na(sector), !is.na(logW)) |>
dplyr::group_by(g, sector, educ) |>
dplyr::mutate(resid = logW - mean(logW)) |>
dplyr::ungroup()
## (4) Rank–rank slopes for G→P movers (by gender)
rr <- dt_resid |>
dplyr::select(i,t,g,sector,resid) |>
dplyr::group_by(i) |>
dplyr::arrange(t,.by_group=TRUE) |>
dplyr::mutate(next_resid = dplyr::lead(resid),
next_sector = dplyr::lead(sector)) |>
dplyr::ungroup() |>
dplyr::filter(sector=="PUB", next_sector=="PRI") |>
dplyr::group_by(g) |>
dplyr::summarise(rank_slope = {
x <- rank(resid, ties.method="average")/dplyr::n()
y <- rank(next_resid, ties.method="average")/dplyr::n()
if (length(x)>5) coef(lm(y ~ x))[2] else NA_real_
}, .groups="drop")
rr$scenario <- scn_label
## (5) Stayer–leaver residual gap in origin sector G (by gender)
sl <- dt_resid |>
# 1. Sort each person's panel properly
dplyr::group_by(i) |>
dplyr::arrange(t, .by_group = TRUE) |>
# 2. Next-period sector
dplyr::mutate(next_sector = dplyr::lead(sector)) |>
dplyr::ungroup() |>
# 3. Keep only observations where current sector is PUB
dplyr::filter(sector == "PUB") |>
# 4. Drop final rows with no next observation
dplyr::filter(!is.na(next_sector)) |>
# 5. Stayer vs. leaver classification
dplyr::mutate(
status = dplyr::case_when(
next_sector == "PUB" ~ "stayer",
next_sector != "PUB" ~ "leaver"
)
) |>
# 6. Compute mean residuals by gender × status
dplyr::group_by(g, status) |>
dplyr::summarise(
mean_resid = mean(resid, na.rm = TRUE),
.groups = "drop"
) |>
# 7. Reshape wide: stayer, leaver columns
tidyr::pivot_wider(
names_from = status,
values_from = mean_resid,
values_fill = NA
) |>
# 8. Compute gap safely
dplyr::mutate(
gap = stayer - leaver,
scenario = scn_label
)
## (6) Staying hazards by current sector (state dependence proxy)
haz <- dt_scn |>
dplyr::group_by(i) |>
dplyr::arrange(t,.by_group=TRUE) |>
dplyr::mutate(next_sector = dplyr::lead(sector)) |>
dplyr::ungroup() |>
dplyr::filter(!is.na(sector)) |>
dplyr::mutate(stay = as.integer(sector==next_sector)) |>
dplyr::group_by(g, sector) |>
dplyr::summarise(stay_hazard = mean(stay, na.rm=TRUE), .groups="drop") |>
dplyr::mutate(scenario=scn_label)
list(dt=dt_scn,
wages_pool=wages_pool,
moves=moves,
rr=rr,
sl=sl,
haz=haz)
}
## Actually run all four scenarios
#scen_labels <- c("BASELINE","S1_prices","S2_endoA","S2_endoB","S3_offers","S4_fertility")
#results <- lapply(scen_labels, run_scenario)
#names(results) <- scen_labels
#compstats_sigmas <- c("S0","S1","S2","S3","S4","S5")
#results <- lapply(compstats_sigmas, run_scenario)
#names(results) <- compstats_sigmas
scenario_list <- c("B0","B1","B2","C0","C1","C2","FR0","FR1")
#scenario_list <- c("S0","S1","S2")
results <- lapply(scenario_list, run_scenario)
names(results) <- scenario_list
## Bind outputs for plotting / tables
wages_all <- dplyr::bind_rows(lapply(results, `[[`, "wages_pool"), .id="scenario_id")
moves_all <- dplyr::bind_rows(lapply(results, `[[`, "moves"), .id="scenario_id")
rr_all <- dplyr::bind_rows(lapply(results, `[[`, "rr"), .id="scenario_id")
sl_all <- dplyr::bind_rows(lapply(results, `[[`, "sl"), .id="scenario_id")
haz_all <- dplyr::bind_rows(lapply(results, `[[`, "haz"), .id="scenario_id")
## Combine scenarios into one panel
to_factor <- function(x) factor(x, levels = c("SUB","PRI","PUB"))
panel_all <- dplyr::bind_rows(
lapply(results, \(res) res$dt |> dplyr::mutate(sector = to_factor(sector))),
.id = "scenario_id"
)
# pretty printing each component
# temp <- make_scenario("BASELINE")
#for (r in names(temp$lambda_f_cond)) {
# cat("\nFrom", r, ":\n")
# print(round(temp$lambda_f_cond[[r]], 3))
#}
# Build transitions: state_t (at t) -> state_t1 (at t+1)
transitions_all <- panel_all |>
dplyr::group_by(scenario_id, g, i) |>
dplyr::arrange(t, .by_group = TRUE) |>
dplyr::mutate(
state_t = factor(ifelse(is.na(sector), "HME", as.character(sector)),
levels = state_levels),
state_t1 = dplyr::lead(state_t)
) |>
dplyr::ungroup() |>
dplyr::filter(!is.na(state_t1)) |>
dplyr::count(scenario_id, g, state_t, state_t1, name = "n") |>
dplyr::group_by(scenario_id, g, state_t) |>
dplyr::mutate(share = n / sum(n)) |>
dplyr::ungroup()
# (Optional) pretty wide matrices per scenario x gender
transitions_wide <- transitions_all |>
tidyr::pivot_wider(
id_cols = c(scenario_id, g, state_t),
names_from = state_t1,
values_from = share,
values_fill = 0
) |>
dplyr::arrange(scenario_id, g, factor(state_t, levels = state_levels))
knitr::kable(
transitions_wide,
caption = "Transition Matrix by Scenario",
align = "c",
digits = 2,
format = "html"
) %>%
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
font_size = 13
) %>%
kableExtra::collapse_rows(
columns = 1:2, # scenario_id + state_t
valign = "top"
)
| scenario_id | g | state_t | HME | SUB | PRI | PUB |
|---|---|---|---|---|---|---|
| B0 | fem | HME | 0.48 | 0.18 | 0.17 | 0.18 |
| SUB | 0.21 | 0.60 | 0.09 | 0.09 | ||
| PRI | 0.22 | 0.10 | 0.59 | 0.09 | ||
| PUB | 0.20 | 0.11 | 0.09 | 0.60 | ||
| B1 | HME | 0.52 | 0.16 | 0.15 | 0.17 | |
| SUB | 0.20 | 0.61 | 0.10 | 0.09 | ||
| PRI | 0.18 | 0.10 | 0.60 | 0.11 | ||
| PUB | 0.21 | 0.10 | 0.09 | 0.60 | ||
| B2 | HME | 0.60 | 0.13 | 0.15 | 0.12 | |
| SUB | 0.17 | 0.61 | 0.12 | 0.10 | ||
| PRI | 0.18 | 0.13 | 0.58 | 0.11 | ||
| PUB | 0.17 | 0.13 | 0.14 | 0.57 | ||
| C0 | HME | 0.46 | 0.15 | 0.18 | 0.21 | |
| SUB | 0.18 | 0.61 | 0.09 | 0.12 | ||
| PRI | 0.20 | 0.10 | 0.60 | 0.11 | ||
| PUB | 0.21 | 0.09 | 0.12 | 0.59 | ||
| C1 | HME | 0.48 | 0.16 | 0.13 | 0.23 | |
| SUB | 0.15 | 0.60 | 0.12 | 0.13 | ||
| PRI | 0.18 | 0.12 | 0.58 | 0.12 | ||
| PUB | 0.18 | 0.10 | 0.11 | 0.61 | ||
| C2 | HME | 0.54 | 0.14 | 0.12 | 0.20 | |
| SUB | 0.16 | 0.62 | 0.11 | 0.12 | ||
| PRI | 0.16 | 0.11 | 0.60 | 0.13 | ||
| PUB | 0.19 | 0.12 | 0.11 | 0.59 | ||
| FR0 | HME | 0.48 | 0.16 | 0.16 | 0.20 | |
| SUB | 0.16 | 0.61 | 0.10 | 0.12 | ||
| PRI | 0.17 | 0.11 | 0.59 | 0.13 | ||
| PUB | 0.16 | 0.10 | 0.12 | 0.62 | ||
| FR1 | HME | 0.44 | 0.19 | 0.17 | 0.21 | |
| SUB | 0.18 | 0.59 | 0.10 | 0.13 | ||
| PRI | 0.16 | 0.10 | 0.61 | 0.13 | ||
| PUB | 0.16 | 0.10 | 0.11 | 0.63 |
#knitr::kable(
# transitions_all,
# caption = "Transition Matrix by Scenario (Wide)",
# align = "c",
# digits = 2,
#) %>%
# format = "html"
# kableExtra::kable_styling(
# bootstrap_options = c("striped", "hover", "condensed"),
# full_width = FALSE,
# font_size = 13
# ) %>%
# kableExtra::collapse_rows(
# columns = 1:2, # scenario_id + state_t
# valign = "top"
# )
# Heatmap per scenario & gender
p_trans <- ggplot(transitions_all,
aes(x = state_t1, y = state_t, fill = share)) +
geom_tile() +
geom_text(aes(label = scales::percent(share, accuracy = 1)),
size = 3) +
scale_x_discrete(limits = state_levels) +
scale_y_discrete(limits = rev(state_levels)) +
facet_grid(g ~ scenario_id) +
scale_fill_viridis_c(option = "E") +
labs(title = "t → t+1 Employment Transitions by Scenario and Gender",
x = "State at t+1", y = "State at t", fill = "Share") +
theme_minimal()
#print(p_trans)
p_trans_clean <- ggplot(transitions_all,
aes(x = state_t1, y = state_t, fill = share)) +
geom_tile(color = "grey80") +
scale_fill_viridis_c(option = "C", direction = -1,
limits = c(0, 1),
guide = guide_colorbar(barheight = 8)) +
geom_text(aes(label = scales::percent(share, accuracy = 1)),
color = "white", fontface = "bold", size = 4) +
scale_x_discrete(limits = state_levels) +
scale_y_discrete(limits = rev(state_levels)) +
facet_grid(g ~ scenario_id) +
labs(title = "t → t+1 Employment Transitions",
x = "State at t+1", y = "State at t") +
theme_minimal(base_size = 13) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
panel.spacing = unit(1, "lines")
)
print(p_trans_clean)
library(gt)
transitions_table <- transitions_wide |>
gt(groupname_col = "scenario_id", rowname_col = "state_t") |>
fmt_percent(columns = -c(scenario_id, g, state_t), decimals = 1) |>
tab_spanner(label = "States at t+1",
columns = state_levels) |>
tab_stubhead(label = "State at t") |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_stub()
) |>
tab_options(
table.font.size = 14,
data_row.padding = px(3),
table.width = pct(100)
) |>
opt_row_striping()
print(transitions_table)
p_bubble <- ggplot(transitions_all,
aes(x = state_t1, y = state_t)) +
geom_point(aes(size = share, fill = share), shape = 21, color = "black") +
scale_size(range = c(1, 20)) +
scale_fill_viridis_c(option = "D") +
geom_text(aes(label = scales::percent(share, 1)), vjust = -1, size = 3.5) +
facet_grid(g ~ scenario_id) +
theme_minimal(base_size = 13) +
labs(title = "Transition Matrix (Bubble Plot)",
x = "State at t+1", y = "State at t", size = "Share")
print(p_bubble)
# Build transitions with fertility-change bins
#selected_scenarios <- c("B0", "B1", "B2", "C0", "C1","C2")
#selected_scenarios <- c("BASELINE", "S3_offers", "S4_fertility", "S5_offers_fertility")
# assumes: selected_scenarios, state_levels, panel_all, and helpers add_states_and_kids/summarise_transitions exist
# we'll define a slim variant that builds the 2-bin flag and then reuses the same summarise logic
add_states_and_kids2 <- function(df) {
df %>%
arrange(t, .by_group = TRUE) %>%
mutate(
state_t = factor(ifelse(is.na(sector), "HME", as.character(sector)), levels = state_levels),
state_t1 = dplyr::lead(state_t),
F_bin_t = ifelse(F >= 1, "kids", "no"),
F_bin_t1 = dplyr::lead(F_bin_t),
kid_change2 = dplyr::case_when(
F_bin_t == "no" & F_bin_t1 == "no" ~ "no-change",
F_bin_t == "kids" & F_bin_t1 == "kids" ~ "no-change",
F_bin_t == "no" & F_bin_t1 == "kids" ~ "change",
TRUE ~ NA_character_
),
# explicitly set the factor order
kid_change2 = factor(kid_change2, levels = c("no-change", "change"))
)
}
summarise_transitions2 <- function(df) {
df %>%
dplyr::ungroup() %>%
dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
dplyr::count(scenario_id, g, kid_change2, state_t, state_t1, name = "n") %>%
tidyr::complete(scenario_id, g, kid_change2, state_t, state_t1, fill = list(n = 0)) %>%
dplyr::mutate(
state_t = factor(state_t, levels = state_levels),
state_t1 = factor(state_t1, levels = state_levels)
) %>%
dplyr::group_by(scenario_id, g, kid_change2, state_t) %>%
dplyr::mutate(share = n / sum(n)) %>%
dplyr::ungroup()
}
# ---- build transitions (women in the two scenarios) with 2-bin fertility flag
transitions_kid2 <- panel_all %>%
dplyr::filter(scenario_id %in% scenario_list, g == "fem") %>%
dplyr::group_by(scenario_id, g, i) %>%
add_states_and_kids2() %>%
summarise_transitions2()
# ---- heatmap
#p_trans_kid2 <- ggplot(transitions_kid2,
# aes(x = state_t1, y = state_t, fill = share)) +
# geom_tile() +
# geom_text(aes(label = scales::percent(share, accuracy = 1)), size = 3) +
# scale_x_discrete(limits = state_levels) +
# scale_y_discrete(limits = rev(state_levels)) +
# scale_fill_viridis_c(option = "E") +
# facet_grid(scenario_id ~ g + kid_change2) + # scenarios as rows; columns = gender × change bin
# labs(title = "t → t+1 Employment Transitions by Scenario, Gender (only Fem), and Kids-Change (2 bins)",
# x = "State at t+1", y = "State at t", fill = "Share") +
# theme_minimal()
#print(p_trans_kid2)
library(kableExtra)
to_wide2 <- function(trans_long) {
trans_long %>%
tidyr::pivot_wider(
id_cols = c(scenario_id, g, kid_change2, state_t),
names_from = state_t1,
values_from = share,
values_fill = 0
) %>%
dplyr::arrange(scenario_id, g, kid_change2, state_t)
}
# ---- wide matrix (optional)
transitions_kid2_wide <- to_wide2(transitions_kid2)
#print(transitions_kid2_wide, n = 20)
# Remove gender and reorder rows as requested
transitions_kid2_wide_fem <- transitions_kid2_wide %>%
dplyr::filter(g == "fem") %>% # keep only females
dplyr::select(-g) %>% # drop gender column
dplyr::arrange(
factor(scenario_id, levels = scenario_list),
factor(state_t, levels = c("HME","SUB","PRI","PUB")),
factor(kid_change2, levels = c("no-change","change"))
)
knitr::kable(
transitions_kid2_wide_fem,
caption = "Transition Matrix by Scenario × Kids-Change (Fem)",
align = "c",
digits = 2,
format = "html"
) %>%
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
font_size = 13
) %>%
kableExtra::collapse_rows(
columns = 1:2, # scenario_id + state_t
valign = "top"
)
| scenario_id | kid_change2 | state_t | HME | SUB | PRI | PUB |
|---|---|---|---|---|---|---|
| B0 | no-change | HME | 0.47 | 0.18 | 0.17 | 0.18 |
| change | HME | 0.53 | 0.12 | 0.16 | 0.19 | |
| no-change | SUB | 0.21 | 0.60 | 0.09 | 0.10 | |
| change | SUB | 0.18 | 0.70 | 0.06 | 0.06 | |
| no-change | PRI | 0.22 | 0.10 | 0.59 | 0.09 | |
| change | PRI | 0.22 | 0.06 | 0.66 | 0.06 | |
| no-change | PUB | 0.21 | 0.11 | 0.09 | 0.60 | |
| change | PUB | 0.14 | 0.14 | 0.14 | 0.59 | |
| B1 | no-change | HME | 0.52 | 0.16 | 0.16 | 0.17 |
| change | HME | 0.46 | 0.18 | 0.12 | 0.24 | |
| no-change | SUB | 0.20 | 0.60 | 0.10 | 0.09 | |
| change | SUB | 0.09 | 0.68 | 0.09 | 0.14 | |
| no-change | PRI | 0.19 | 0.10 | 0.60 | 0.11 | |
| change | PRI | 0.16 | 0.10 | 0.63 | 0.12 | |
| no-change | PUB | 0.20 | 0.10 | 0.09 | 0.60 | |
| change | PUB | 0.27 | 0.09 | 0.09 | 0.55 | |
| B2 | no-change | HME | 0.61 | 0.12 | 0.15 | 0.12 |
| change | HME | 0.54 | 0.19 | 0.14 | 0.13 | |
| no-change | SUB | 0.17 | 0.61 | 0.12 | 0.10 | |
| change | SUB | 0.23 | 0.60 | 0.10 | 0.07 | |
| no-change | PRI | 0.18 | 0.12 | 0.59 | 0.11 | |
| change | PRI | 0.10 | 0.21 | 0.52 | 0.17 | |
| no-change | PUB | 0.17 | 0.13 | 0.14 | 0.57 | |
| change | PUB | 0.14 | 0.11 | 0.18 | 0.57 | |
| C0 | no-change | HME | 0.46 | 0.15 | 0.18 | 0.21 |
| change | HME | 0.44 | 0.17 | 0.22 | 0.17 | |
| no-change | SUB | 0.18 | 0.62 | 0.09 | 0.11 | |
| change | SUB | 0.21 | 0.54 | 0.11 | 0.14 | |
| no-change | PRI | 0.20 | 0.09 | 0.60 | 0.11 | |
| change | PRI | 0.21 | 0.14 | 0.54 | 0.11 | |
| no-change | PUB | 0.21 | 0.09 | 0.11 | 0.59 | |
| change | PUB | 0.23 | 0.09 | 0.17 | 0.51 | |
| C1 | no-change | HME | 0.48 | 0.16 | 0.13 | 0.22 |
| change | HME | 0.52 | 0.15 | 0.03 | 0.30 | |
| no-change | SUB | 0.16 | 0.60 | 0.11 | 0.13 | |
| change | SUB | 0.08 | 0.69 | 0.15 | 0.08 | |
| no-change | PRI | 0.18 | 0.12 | 0.58 | 0.12 | |
| change | PRI | 0.21 | 0.08 | 0.59 | 0.13 | |
| no-change | PUB | 0.18 | 0.10 | 0.11 | 0.61 | |
| change | PUB | 0.23 | 0.11 | 0.03 | 0.63 | |
| C2 | no-change | HME | 0.53 | 0.14 | 0.13 | 0.20 |
| change | HME | 0.57 | 0.10 | 0.10 | 0.24 | |
| no-change | SUB | 0.16 | 0.62 | 0.11 | 0.12 | |
| change | SUB | 0.03 | 0.58 | 0.19 | 0.19 | |
| no-change | PRI | 0.16 | 0.11 | 0.60 | 0.13 | |
| change | PRI | 0.12 | 0.12 | 0.68 | 0.08 | |
| no-change | PUB | 0.19 | 0.12 | 0.11 | 0.59 | |
| change | PUB | 0.15 | 0.15 | 0.15 | 0.55 | |
| FR0 | no-change | HME | 0.48 | 0.16 | 0.16 | 0.20 |
| change | HME | 0.33 | 0.15 | 0.26 | 0.26 | |
| no-change | SUB | 0.16 | 0.61 | 0.10 | 0.12 | |
| change | SUB | 0.14 | 0.54 | 0.14 | 0.17 | |
| no-change | PRI | 0.17 | 0.11 | 0.59 | 0.13 | |
| change | PRI | 0.06 | 0.11 | 0.64 | 0.19 | |
| no-change | PUB | 0.16 | 0.10 | 0.12 | 0.62 | |
| change | PUB | 0.20 | 0.17 | 0.06 | 0.57 | |
| FR1 | no-change | HME | 0.44 | 0.19 | 0.17 | 0.20 |
| change | HME | 0.39 | 0.12 | 0.20 | 0.29 | |
| no-change | SUB | 0.18 | 0.59 | 0.10 | 0.13 | |
| change | SUB | 0.10 | 0.58 | 0.19 | 0.13 | |
| no-change | PRI | 0.16 | 0.11 | 0.61 | 0.12 | |
| change | PRI | 0.15 | 0.03 | 0.68 | 0.15 | |
| no-change | PUB | 0.16 | 0.09 | 0.11 | 0.63 | |
| change | PUB | 0.18 | 0.13 | 0.16 | 0.53 |
# --- Staying hazards by sector × (kid-change 2 bins) × scenario × gender
haz_kid2 <- panel_all %>%
dplyr::filter(scenario_id %in% scenario_list) %>%
dplyr::group_by(scenario_id, g, i) %>%
add_states_and_kids2() %>%
dplyr::ungroup() %>%
dplyr::filter(!is.na(state_t1), !is.na(kid_change2)) %>%
# focus on employed origins only (SUB/PRI/PUB). Drop H unless you want "stay at home" hazards
dplyr::filter(state_t %in% sectors) %>%
dplyr::group_by(scenario_id, g, kid_change2, sector = state_t) %>%
dplyr::summarise(
stay_hazard = mean(state_t1 == sector),
n = dplyr::n(),
.groups = "drop"
) %>%
dplyr::mutate(sector = factor(sector, levels = sectors))
# --- Filter for women, public sector only ---
haz_pub <- haz_kid2 %>%
filter(g == "fem", sector == "PUB") %>%
mutate(
scenario_id = factor(scenario_id, levels = scenario_list),
kid_change2 = factor(kid_change2, levels = c("no-change", "change"))
)
# --- Nice spacing: create a composite x variable with spacing between scenarios ---
# One way: treat scenario as x-axis, kid_change2 as "fill"/dodged subgroup
# That automatically generates: S0(no-change/change), S1(no-change/change), ...
p_pub <- ggplot(haz_pub, aes(x = scenario_id, y = stay_hazard,
fill = kid_change2)) +
geom_col(position = position_dodge(width = 0.7), width = 0.6) +
scale_fill_manual(
values = c("no-change" = "gray", "change" = "pink"),
labels = c("No change", "No Kids → Kids")
) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
labs(
title = "Staying Hazard in the Public Sector (PUB)\nWomen Only, by Scenario and Kids-Change",
subtitle = "Two bars per scenario: no-change vs kids-change",
x = "Scenario",
y = "Pr(stay in Public sector at t+1)",
fill = "Fertility transition"
) +
theme_minimal(base_size = 14) +
theme(
panel.grid.minor = element_blank(),
axis.text.x = element_text(size = 12),
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 13),
legend.position = "right"
)
print(p_pub)
## Compute means for each sector × scenario × gender
mean_df <- wages_all |>
dplyr::group_by(scenario_id, sector, g) |>
dplyr::summarise(mean_logW = mean(logW, na.rm = TRUE), .groups = "drop")
p1 <- ggplot(wages_all, aes(x = logW, color = g, fill = g)) +
## Density with semi-transparent fill
geom_density(alpha = 0.15, linewidth = 0.9) +
## Vertical dashed lines at the mean
geom_vline(
data = mean_df,
aes(xintercept = mean_logW, color = g),
linetype = "dashed",
linewidth = 0.7,
alpha = 0.8
) +
## Muted academic colors
scale_color_manual(
values = c(
"men" = "#4E79A7",
"fem" = "darkgrey"
)
) +
scale_fill_manual(
values = c(
"men" = "#4E79A7",
"fem" = "darkgrey"
)
) +
facet_grid(
sector ~ scenario_id,
scales = "free_y",
switch = "y"
) +
labs(
title = "Log-Wage Density by Sector and Scenario",
x = "Log wage",
y = "Density",
color = "Gender",
fill = "Gender"
) +
theme_minimal(base_size = 14) +
theme(
strip.background = element_rect(fill = "grey90", color = NA),
strip.text = element_text(face = "bold", size = 12),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(color = "grey85", linewidth = 0.3),
axis.text = element_text(size = 11),
axis.title = element_text(size = 13),
legend.position = "bottom",
legend.title = element_text(size = 12),
legend.text = element_text(size = 11),
plot.title = element_text(size = 16, face = "bold", hjust = 0)
)
print(p1)
## Wage distributions by sector × gender × fertility × scenario (pooled)
#wages_pool <- panel_all |>
# dplyr::filter(!is.na(sector), !is.na(logW)) |>
# dplyr::mutate(F_bin = ifelse(F >= 1, "kids>=1", "no kids"),
# gF = paste(g, F_bin),
# sector = to_factor(sector))
#p_dens <- ggplot(wages_pool, aes(x = logW, color = gF)) +
# geom_density() +
# facet_grid(sector ~ scenario_id, scales = "free_y") +
# labs(title = "Log-wage distributions by sector × scenario (pooled)",
# x = "log wage", y = "density", color = "Gender × Fertility") +
# theme_minimal()
#print(p_dens)
## Mean log-wages over time by sector × gender × fertility × scenario
#wage_means <- panel_all |>
# dplyr::filter(!is.na(sector), !is.na(logW)) |>
# dplyr::mutate(F_bin = ifelse(F >= 1, "kids>=1", "no kids"),
# sector = to_factor(sector)) |>
# dplyr::group_by(scenario_id, t, g, F_bin, sector) |>
# dplyr::summarise(mean_logW = mean(logW), .groups = "drop")
#p_wmean <- ggplot(wage_means, aes(x = t, y = mean_logW, color = F_bin)) +
# geom_line() +
# facet_grid(rows = vars(g), cols = vars(scenario_id, sector)) +
# labs(title = "Mean log wages over time by scenario",
# x = "Period", y = "Mean log wage", color = "Fertility") +
# theme_minimal()
#print(p_wmean)
library(gt)
## 2) Δ logW for movers G→P and P→G (means + dispersion) — separates price vs comp. advantage
moves_keep <- moves_all |> dplyr::filter(orig %in% c("PUB","PRI"), dest %in% c("PUB","PRI"), orig!=dest)
#moves_plot <- moves_keep |>
# dplyr::filter((orig == "PRI" & dest == "PUB") |
# (orig == "PUB" & dest == "PRI")) |>
# dplyr::mutate(move = paste(orig, dest, sep = "→"))
moves_plot <- moves_keep |>
dplyr::filter((orig == "PRI" & dest == "PUB") ) |>
dplyr::mutate(move = paste(orig, dest, sep = "→"))
p2 <- ggplot(moves_plot, aes(x = g, y = dlogW, fill = g)) +
geom_hline(yintercept = 0, linetype = 2, linewidth = 0.4, color = "grey40") +
geom_boxplot(outlier.alpha = 0.20, width = 0.6, color = "black") +
facet_grid(move ~ scenario_id) +
scale_fill_manual(values = c("men" = "#4E79A7", "fem" = "lightgray")) +
labs(
title = "Mover Gains (Δ log wage) by Origin→Destination, Gender, and Scenario",
x = "Gender",
y = "Δ log wage",
fill = "Gender"
) +
theme_minimal(base_size = 14) +
theme(
strip.text = element_text(face = "bold", size = 13),
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
axis.text.x = element_text(size = 12),
legend.position = "bottom",
plot.title = element_text(size = 16, face = "bold")
)
print(p2)
## 3) Rank–rank slope for G→P movers (lower slope ⇒ stronger comparative advantage)
rr_table <- rr_all |>
dplyr::arrange(g, scenario_id) |>
gt(rowname_col = "scenario_id", groupname_col = "g") |>
tab_header(
title = md("**Rank–Rank Slope for PUB → PRI Movers**"),
subtitle = md(
"A lower slope indicates stronger comparative advantage: \
it measures how strongly a worker’s within-PUB sector rank \
predicts their rank in PRIVATE sector upon moving."
)
) |>
fmt_number(columns = dplyr::everything(), decimals = 3) |> ## Format slope values nicely
tab_style(
style = cell_text(weight = "bold"),
locations = cells_stub() ## Styling
) |>
opt_row_striping() |>
tab_options(
table.width = pct(100),
data_row.padding = px(4),
table.font.size = 13
)
rr_table
| Rank–Rank Slope for PUB → PRI Movers | ||
| A lower slope indicates stronger comparative advantage: it measures how strongly a worker’s within-PUB sector rank predicts their rank in PRIVATE sector upon moving. | ||
| rank_slope | scenario | |
|---|---|---|
| fem | ||
| B0 | 0.014 | B0 |
| B1 | 0.114 | B1 |
| B2 | 0.120 | B2 |
| C0 | −0.212 | C0 |
| C1 | 0.066 | C1 |
| C2 | 0.064 | C2 |
| FR0 | 0.008 | FR0 |
| FR1 | 0.053 | FR1 |
## 4) Stayer–leaver residual gap in G (higher gap ⇒ stronger selection on ε in G)
sl_table <- sl_all |>
dplyr::arrange(g, scenario) |>
gt(rowname_col = "scenario", groupname_col = "g") |>
tab_header(
title = md("**Stayer–Leaver Residual Gap in Origin Sector PUB**"),
subtitle = md(
"Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: \
‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector."
)
) |>
fmt_number(columns = c(stayer, leaver, gap), decimals = 3) |> ## Format columns nicely
opt_row_striping() |>
tab_style(
style = cell_text(weight = "bold"),
locations = cells_stub() ## Styling
) |>
tab_options(
table.width = pct(100),
table.font.size = 13,
data_row.padding = px(4)
)
sl_table
| Stayer–Leaver Residual Gap in Origin Sector PUB | ||||
| Higher gap ⇒ stronger selection on unobserved ability ε in sector PUB: ‘Stayer’ = worker remains in PUB next period, ‘Leaver’ = exits to another sector. | ||||
| scenario_id | leaver | stayer | gap | |
|---|---|---|---|---|
| fem | ||||
| B0 | B0 | −0.040 | 0.006 | 0.046 |
| B1 | B1 | −0.007 | 0.005 | 0.013 |
| B2 | B2 | 0.011 | 0.002 | −0.008 |
| C0 | C0 | −0.013 | −0.002 | 0.011 |
| C1 | C1 | 0.008 | −0.008 | −0.016 |
| C2 | C2 | 0.024 | −0.010 | −0.033 |
| FR0 | FR0 | 0.005 | 0.003 | −0.003 |
| FR1 | FR1 | 0.008 | −0.001 | −0.008 |
## Wage transitions for consecutive employment (by scenario)
transitions <- panel_all |>
filter(g == "fem") %>%
dplyr::group_by(scenario_id, i) |>
dplyr::arrange(t, .by_group = TRUE) |>
dplyr::mutate(next_logW = dplyr::lead(logW),
next_sector = dplyr::lead(sector),
next_F = dplyr::lead(F),
employed_next = dplyr::lead(employed)) |>
dplyr::ungroup() |>
dplyr::filter(employed == 1, employed_next == 1) |>
dplyr::mutate(dlogW = next_logW - logW,
F_trans = dplyr::case_when(
F == 0 & next_F == 0 ~ "no kids -> no kids",
F == 0 & next_F >= 1 ~ "no kids -> kids",
F >= 1 & next_F >= 1 ~ "kids -> kids",
F >= 1 & next_F == 0 ~ "kids -> no kids"
))
dlogW_stats <- transitions |>
dplyr::mutate(orig = sector, dest = next_sector) |>
dplyr::group_by(scenario_id, g, orig, dest, F_trans) |>
dplyr::summarise(mean_dlogW = mean(dlogW, na.rm = TRUE),
n = n(), .groups = "drop") |>
dplyr::mutate(orig = to_factor(orig), dest = to_factor(dest)) |>
dplyr::arrange(scenario_id, g, orig, dest, F_trans)
print(head(dlogW_stats, 20))
## # A tibble: 20 × 7
## scenario_id g orig dest F_trans mean_dlogW n
## <chr> <chr> <fct> <fct> <chr> <dbl> <int>
## 1 B0 fem SUB SUB kids -> kids 0 146
## 2 B0 fem SUB SUB no kids -> kids 0 23
## 3 B0 fem SUB SUB no kids -> no kids 0 378
## 4 B0 fem SUB PRI kids -> kids 0.0325 30
## 5 B0 fem SUB PRI no kids -> kids -0.156 2
## 6 B0 fem SUB PRI no kids -> no kids 0.0298 52
## 7 B0 fem SUB PUB kids -> kids -0.00170 17
## 8 B0 fem SUB PUB no kids -> kids 0.286 2
## 9 B0 fem SUB PUB no kids -> no kids -0.0326 66
## 10 B0 fem PRI SUB kids -> kids 0.0546 28
## 11 B0 fem PRI SUB no kids -> kids -0.0872 2
## 12 B0 fem PRI SUB no kids -> no kids 0.0111 62
## 13 B0 fem PRI PRI kids -> kids 0 143
## 14 B0 fem PRI PRI no kids -> kids 0 21
## 15 B0 fem PRI PRI no kids -> no kids 0 377
## 16 B0 fem PRI PUB kids -> kids 0.00922 23
## 17 B0 fem PRI PUB no kids -> kids -0.0231 2
## 18 B0 fem PRI PUB no kids -> no kids -0.0138 56
## 19 B0 fem PUB SUB kids -> kids 0.0895 26
## 20 B0 fem PUB SUB no kids -> kids -0.0436 5
p_dlog <- dlogW_stats |>
dplyr::filter(n >= 2) |>
ggplot(aes(x = interaction(orig, dest, sep = "→"),
y = mean_dlogW, fill = F_trans)) +
geom_col(position = "dodge") +
facet_grid(rows = vars(g), cols = vars(scenario_id)) +
labs(title = "Mean Δ log wage for consecutive employment by scenario",
x = "Origin → Destination sector", y = "Mean Δ log wage",
fill = "Fertility transition") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p_dlog)
## --- Sector shares over time by gender × scenario ---
shares <- panel_all |>
dplyr::filter(!is.na(sector)) |>
dplyr::count(scenario_id, t, g, sector, name = "n") |>
dplyr::group_by(scenario_id, t, g) |>
dplyr::mutate(share = n / sum(n)) |>
dplyr::ungroup()
p_shares <- ggplot(shares, aes(x = t, y = share, color = g)) +
geom_line() +
facet_grid(sector ~ scenario_id, scales = "free_y") +
labs(title = "Employment shares over time by sector × scenario",
x = "Period", y = "Share", color = "Gender") +
theme_minimal()
print(p_shares)
## --- Log wages by time × sector × scenario × gender ---
logwages <- panel_all |>
dplyr::filter(!is.na(sector), !is.na(w)) |>
dplyr::mutate(logw = log(w)) |>
dplyr::group_by(scenario_id, t, g, sector) |>
dplyr::summarise(mean_logw = mean(logw), .groups = "drop") |>
dplyr::mutate(sector = factor(sector, levels = sectors))
p_logw <- ggplot(logwages, aes(x = t, y = mean_logw, color = g)) +
geom_line() +
facet_grid(sector ~ scenario_id, scales = "free_y") +
labs(
title = "Mean log wages over time by sector × scenario",
x = "Period",
y = "Mean log wage",
color = "Gender"
) +
theme_minimal()
print(p_logw)
#How does the model estimation do with tihs niformation or how does it interpret it. A big female mean loss moving from public to private (e.g. -0.46 vs men's -0.09) could come from either a larger price disadvantage in P for women, or stronger negative ($\varepsilon_{iP}- \varepsilon_{iG}$) among the women who move. What separate prices from comparative advantage: To separate them, we use the following: *
#*In order to understand the factors that give rise to these observed gender differences (ie distinguish the roles of price gaps ($\Delta \beta_{0,g}$) vs. comparative advantage/endowment st ructure ($\varepsilon$ means/variances/covariances) ), we use many data moments at once, not just the mean wages or mean log wage differences for stayers and movers. In the state-dependent offer model specified above, each block pushes different patterns in the data:*