
Occupational Mobility and Optimal Transport

Simulation results for now — awaiting linked census microdata

Richard Martin
Ministry of Post Secondary Education and Future Skills

Disclaimer: The views expressed are those of the author and do not necessarily reflect those of the Government of British Columbia.

Labour markets are constantly evolving

Sources of structural change include:

technology
COVID-19
government policy
resource depletion

These shocks shift occupational labour demand.

Entry and exit may account for part of the adjustment, but typically some workers must move across occupations to meet changing demand.

What governs mobility between occupations?

How does occupational distance shape mobility?

It is natural to expect mobility to decline with occupational distance.
We therefore ask: which notion of occupational distance best rationalizes local mobility?

This requires:

Considering multiple notions of occupational distance
A method to compare model fit
Recognizing two mobility regimes
- Local mobility: gradual movement in occupational space
- Non-local mobility: jumps that bypass distance frictions

Distance should explain local transitions but fail to explain non-local moves.

Where we are going

Local mobility can be modeled using entropy-regularized optimal transport, which produces flows consistent with observed origin and destination margins.
This framework allows competing distance metrics to be compared in a clean “horse race.”
In simulations, when the data-generating process is known, the method recovers the correct distance structure.
This provides a framework for testing whether skill similarity or hierarchical structure better explains local mobility.

Two mobility regimes

Established workers mostly move locally in occupational space.
New credentials can act like wormholes, enabling jumps across distant occupations.

Implication:

Distance should rationalize local mobility.
If the model is correct, it should struggle with wormhole transitions.

Local Mobility

Gravity-style mobility models predict that flows increase with occupation size and decline with occupational distance.
This mirrors Newtonian gravity: \(F \propto \frac{m_1 m_2}{d^2}\).

Goal: separate the mass effect (occupation size) from the distance effect in order to identify the distance measure that best rationalizes local mobility.

What Defines Occupational Distance?

Two natural candidates:

Skill similarity: based on 161 O*NET dimensions.

Institutional hierarchy: based on the 5-digit NOC classification.

Digit	Level	Description
1	Broad Occupational Category	10 broad categories
2	TEER Category	6 levels
3	Sub-major Group	Occupational cluster
4	Minor Group	Narrow family
5	Unit Group	Detailed occupation

Gravity: standard model for local mobility

\[ P_{ij} = \exp(\alpha_i + \beta_j - \gamma C_{ij}) \]

\(P_{ij}\) is the share of workers moving from occupation \(i\) to \(j\)
\(\alpha_i\) origin fixed effect
\(\beta_j\) destination fixed effect
\(C_{ij}\) occupational distance
\(\gamma\) object of interest: sensitivity of flows to a proposed cost structure.

Gravity Models: An Unfair Horse Race

In horse racing, handicaps add weight to stronger horses, making the outcome less informative about the horses’ true speed.
If the friction specification \(C_{ij}\) is wrong — and it will be — fixed effects may absorb part of the misspecification.
We prefer misspecification to appear as lack of fit rather than being masked by parameter adjustment.
The goal of our horse race is to cleanly identify how well each friction rationalizes the observed flows.
Of course, winning the horse race does not imply the winner was particularly “fast”.

A Fair Horse Race

Entropy-regularized optimal transport enforces the observed marginals exactly:

\[ \sum_j P_{ij} = a_i, \qquad \sum_i P_{ij} = b_j \]

via iterative row and column rescaling.

Equilibrium employment levels are taken as given; we model the mobility flows consistent with those levels.
This corresponds to a decentralized random utility model where mobility utilities include Gumbel shocks, producing logit choice probabilities.
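The Gumbel-to-logit link can be checked by simulation. The sketch below assumes one origin choosing among three destinations with hypothetical systematic utilities `v`; think of them as \(g_j - C_{ij}\) for a fixed origin \(i\). It adds i.i.d. Gumbel shocks with scale `eps` and picks the best option. The resulting choice frequencies should match the closed-form logit shares \(\exp(v_j/\varepsilon) / \sum_k \exp(v_k/\varepsilon)\). All function names are illustrative.

```python
import math
import random


def gumbel(scale: float, rng: random.Random) -> float:
    """Draw a Gumbel(0, scale) shock by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -scale * math.log(-math.log(u))


def logit_probs(v, eps):
    """Closed-form logit choice probabilities: softmax(v / eps)."""
    top = max(v)
    w = [math.exp((x - top) / eps) for x in v]
    total = sum(w)
    return [x / total for x in w]


def simulated_choice_shares(v, eps, n_draws=100_000, seed=0):
    """Share of draws in which each option maximizes v_j plus a Gumbel shock."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [x + gumbel(eps, rng) for x in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]


# Hypothetical systematic utilities for one origin (e.g. g_j - C_ij).
v = [0.0, -0.5, -1.5]
print(logit_probs(v, 1.0))
print(simulated_choice_shares(v, 1.0))
```

The two printed vectors should agree up to simulation noise. Raising `eps` flattens both.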

Entropy-regularized optimal transport

\[ \mathcal{L} = \underbrace{\sum_{ij} P_{ij} C_{ij}}_{\text{mass}\times\text{distance}} + \varepsilon\times \underbrace{\sum_{ij} P_{ij}(\log P_{ij}-1)}_{\text{negative entropy}} + \sum_i f_i \underbrace{\left(a_i - \sum_j P_{ij}\right)}_{\text{origin constraint}} + \sum_j g_j \underbrace{\left(b_j - \sum_i P_{ij}\right)}_{\text{destination constraint}} \]

First-order conditions:

\[ \frac{\partial \mathcal{L}}{\partial P_{ij}} = C_{ij} + \varepsilon \log P_{ij} - f_i - g_j = 0. \]

Rearranging:

\[ \log P_{ij} = \frac{f_i}{\varepsilon} + \frac{g_j}{\varepsilon} - \frac{C_{ij}}{\varepsilon}. \]

Structure of the Solution

Exponentiating:

\[ P_{ij} = u_i\, e^{-C_{ij}/\varepsilon}\, v_j, \qquad u_i = e^{f_i/\varepsilon}, \quad v_j = e^{g_j/\varepsilon}. \]

Thus the optimal solution must be a row- and column-rescaling of the kernel \(e^{-C_{ij}/\varepsilon}\).
Given \(v\), there is a unique \(u\) that fixes the rows.
Given \(u\), there is a unique \(v\) that fixes the columns.
Sinkhorn alternates these corrections until both margins are satisfied.
Implemented using a custom log-domain Sinkhorn solver in R (safesink) to ensure numerical stability and dimensional alignment.
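The alternating corrections can be sketched in log-domain form. This is not the author's R solver (`safesink`). The function name `sinkhorn_log` and its arguments are hypothetical. The sketch updates the potentials \(f\) and \(g\) directly rather than \(u\) and \(v\). This keeps \(e^{-C_{ij}/\varepsilon}\) from overflowing or underflowing when \(C/\varepsilon\) is large. It assumes positive margins with equal totals.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def sinkhorn_log(a, b, C, eps, n_iter=1000, tol=1e-10):
    """Entropy-regularized OT via log-domain Sinkhorn updates.

    a, b: positive origin/destination margins with equal totals.
    C: cost matrix as a list of rows; eps: temperature (> 0).
    Returns P with P[i][j] = exp((f[i] + g[j] - C[i][j]) / eps).
    """
    n, m = len(a), len(b)
    f, g = [0.0] * n, [0.0] * m

    def plan():
        return [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(m)]
                for i in range(n)]

    for _ in range(n_iter):
        # Row correction: given g, choose f so that each row sums to a[i].
        f = [eps * (math.log(a[i]) - logsumexp([(g[j] - C[i][j]) / eps for j in range(m)]))
             for i in range(n)]
        # Column correction: given f, choose g so that each column sums to b[j].
        g = [eps * (math.log(b[j]) - logsumexp([(f[i] - C[i][j]) / eps for i in range(n)]))
             for j in range(m)]
        # Columns are now exact; stop once the rows are also (nearly) exact.
        if max(abs(sum(row) - ai) for row, ai in zip(plan(), a)) < tol:
            break
    return plan()


a = [0.5, 0.3, 0.2]                          # hypothetical origin shares
b = [0.4, 0.4, 0.2]                          # hypothetical destination shares
C = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]  # hypothetical costs
P = sinkhorn_log(a, b, C, eps=1.0)
print([round(sum(row), 6) for row in P])     # should reproduce a
print([round(sum(col), 6) for col in zip(*P)])  # should reproduce b
```

Only the margins are fitted. The log cross-ratio \(\log P_{ij} + \log P_{kl} - \log P_{il} - \log P_{kj}\) is fixed by the costs at \(-(C_{ij} + C_{kl} - C_{il} - C_{kj})/\varepsilon\). This is the "residual structure of flows" that the cost matrix has to explain.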

Result

Because the objective is strictly convex for \(\varepsilon > 0\), the optimal solution \(P^{\star}\) is unique.
If the Sinkhorn algorithm converges, it must converge to \(P^{\star}\).

\(P^{\star}\) matches the observed origin and destination totals and respects the relative friction structure implied by \(C_{ij}\).

Same structure as gravity, but…

Gravity estimates margins and frictions jointly.

Sinkhorn conditions on the margins, so frictions are identified from the residual structure of flows.

With a misspecified cost matrix, a gravity model merely measures sensitivity to the wrong notion of distance.
When the goal is to compare alternative cost matrices, conditioning on the margins allows cleaner identification of frictions.

Sinkhorn is the right tool for this job.

Interpretation of Temperature: \(\varepsilon\)

\(\varepsilon\) is the scale of random utility shocks (Gumbel noise) affecting mobility choices.

Low \(\varepsilon\)

Mobility concentrates on the lowest-cost transitions
Behavior approaches cost-minimizing reallocation (social planner)

High \(\varepsilon\)

Random shocks dominate cost differences
Mobility becomes diffuse and approaches the independent coupling \(P_{ij} = a_i b_j\) (margins expressed as shares)
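Both temperature limits show up on a toy example. The sketch assumes margins expressed as shares summing to one and a hypothetical 3×3 cost matrix with zero cost on the diagonal (staying put). It uses a plain, non-log Sinkhorn loop. That is adequate at these temperatures but can underflow and converge slowly when \(\varepsilon\) is very small.

```python
import math


def sinkhorn(a, b, C, eps, n_iter=5000):
    """Plain Sinkhorn: alternately rescale rows and columns of K = exp(-C / eps)."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


a = [0.5, 0.3, 0.2]                    # origin shares (sum to 1)
b = [0.4, 0.4, 0.2]                    # destination shares (sum to 1)
C = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # hypothetical costs, zero on the diagonal

for eps in (0.25, 1.0, 100.0):
    P = sinkhorn(a, b, C, eps)
    stay = sum(P[i][i] for i in range(3))  # mass on zero-cost moves
    gap = max(abs(P[i][j] - a[i] * b[j]) for i in range(3) for j in range(3))
    print(f"eps={eps:>6}: zero-cost mass={stay:.3f}, max|P - a*b|={gap:.4f}")
```

With these margins, at most \(\sum_i \min(a_i, b_i) = 0.9\) of the mass can sit on zero-cost moves. The low-\(\varepsilon\) plan approaches that bound. At high \(\varepsilon\) the plan approaches \(a_i b_j\), whose diagonal mass is \(\sum_i a_i b_i = 0.36\).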

The distance metrics

Skill distance: represents continuous skill transferability
Hierarchical distance: represents discrete barriers within the labour market.

Skill distance: Based on 161 O*NET measures

Hierarchical distance: derived from NOC taxonomy

All five digits match:
- distance = 0
First four digits match:
- distance = 1
First three digits match:
- distance = 2
First digit matches:
- distance = 3 + |ΔTEER|
Otherwise:
- distance = 9
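The distance rules above translate directly into a small function. The sketch assumes NOC 2021 codes stored as 5-character strings, with TEER read from the second digit as in the classification table. The function names and example codes are illustrative. Because \(|\Delta \text{TEER}|\) is at most 5, the largest within-category distance is 8. The cross-category value of 9 is therefore always the maximum.

```python
def teer(code: str) -> int:
    """TEER category: the second digit of a 5-digit NOC 2021 code."""
    return int(code[1])


def hier_distance(noc_a: str, noc_b: str) -> int:
    """Hierarchical distance between two 5-digit NOC codes, per the rules above."""
    for code in (noc_a, noc_b):
        if len(code) != 5 or not code.isdigit():
            raise ValueError(f"expected a 5-digit NOC code, got {code!r}")
    if noc_a == noc_b:                 # same unit group
        return 0
    if noc_a[:4] == noc_b[:4]:         # same minor group
        return 1
    if noc_a[:3] == noc_b[:3]:         # same sub-major group
        return 2
    if noc_a[0] == noc_b[0]:           # same broad category: penalize TEER gap
        return 3 + abs(teer(noc_a) - teer(noc_b))
    return 9                           # different broad category


print(hier_distance("21231", "22100"))  # 3 + |1 - 2| = 4
```

Applying `hier_distance` to every pair of occupations yields the matrix \(C_{hier}\).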

Skill and Hierarchical distances: similar but distinct

Simulation Design

Generate artificial flows using

\[ C = (1-w)C_{skill} + wC_{hier} \]

with

\[ \varepsilon = 1 \]

Then estimate models using:

\(C_{skill}\)
\(C_{hier}\)
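A minimal sketch of this design follows. It assumes toy 4×4 cost matrices (not the O*NET or NOC costs), shares summing to one, and a plain Sinkhorn loop. `mix`, `horse_race`, and the matrices are hypothetical. Flows are generated at \(\varepsilon = 1\) from the mixed cost. Each candidate is then fitted over a small temperature grid and scored by KL divergence. When \(w = 0\) or \(w = 1\), the correct candidate at \(\varepsilon = 1\) reproduces the generating flows, so its KL is zero up to rounding. This follows from the Sinkhorn solution being unique given the kernel and margins.

```python
import math


def sinkhorn(a, b, C, eps, n_iter=2000):
    """Plain Sinkhorn: rescale rows/columns of K = exp(-C / eps) to match a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def kl(P, Q):
    """KL(P || Q) summed over all cells; cells with P_ij = 0 contribute 0."""
    return sum(p * math.log(p / q)
               for row_p, row_q in zip(P, Q) for p, q in zip(row_p, row_q) if p > 0)


def mix(C1, C2, w):
    """Element-wise C = (1 - w) * C1 + w * C2."""
    return [[(1 - w) * x + w * y for x, y in zip(r1, r2)] for r1, r2 in zip(C1, C2)]


# Hypothetical example: skill cost grows with |i - j|;
# the hierarchy groups occupations {0, 1} and {2, 3}.
C_skill = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
C_hier = [[0, 1, 3, 3], [1, 0, 3, 3], [3, 3, 0, 1], [3, 3, 1, 0]]
a = [0.3, 0.2, 0.3, 0.2]      # origin shares
b = [0.25, 0.25, 0.25, 0.25]  # destination shares


def horse_race(w, eps_grid=(0.5, 1.0, 2.0)):
    """Generate flows at eps = 1 from the mixed cost, then score each candidate."""
    P_true = sinkhorn(a, b, mix(C_skill, C_hier, w), eps=1.0)
    return {name: {eps: kl(P_true, sinkhorn(a, b, C, eps)) for eps in eps_grid}
            for name, C in (("skill", C_skill), ("hier", C_hier))}


for w in (0.0, 0.5, 1.0):
    print(w, horse_race(w))
```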

Evaluating the Horse Race

We evaluate model fit using the Kullback–Leibler divergence between the observed transition matrix \(P\) and the predicted matrix \(\hat P\).
In our simulations \(P\) is the true transition matrix, so KL measures the divergence between the estimated model and the truth.

\[ KL(P \parallel \hat P) = \sum_{ij} P_{ij}\log\frac{P_{ij}}{\hat P_{ij}} \]

KL divergence equals the expected per-observation log-likelihood loss of the model relative to the truth
Lower values indicate better fit
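A worked illustration with hypothetical numbers: two occupations each hold half the workforce, and most workers stay put. A fitted model that ignores distance predicts the independent coupling \(\hat P_{ij} = a_i b_j\) with the same margins:

\[ P = \begin{pmatrix} 0.4 & 0.1 \\ 0.1 & 0.4 \end{pmatrix}, \qquad \hat P = \begin{pmatrix} 0.25 & 0.25 \\ 0.25 & 0.25 \end{pmatrix} \]

\[ KL(P \parallel \hat P) = 2(0.4)\log\frac{0.4}{0.25} + 2(0.1)\log\frac{0.1}{0.25} = 0.8\log 1.6 + 0.2\log 0.4 \approx 0.376 - 0.183 = 0.193 \text{ nats}, \]

A correctly specified model would give \(KL = 0\).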

Which distance metric best rationalizes the data?

Fitting at a temperature \(\varepsilon \neq 1\) introduces mild misspecification (the DGP uses \(\varepsilon = 1\)), so \(KL > 0\) even for the correct metric.
For robustness, we evaluate fit over a range of temperatures, \(\varepsilon \in [0.5, 2]\).
When the DGP matches one distance metric, that metric is the clear winner.
If flows are independent of distance, forcing distance to matter (low \(\varepsilon\)) increases the loss.
When the DGP mixes both structures, the race tightens, indicating both metrics contain useful information.

Destination Education Gating

Some occupations draw from very specific educational pipelines, while others recruit from a broad mix of credentials.

Calculate the overall workforce education distribution \(p_0\).
Measure the KL divergence of each occupation’s education distribution \(p\) from \(p_0\):

\[ KL(p \parallel p_0) \]

Remove the mechanical relationship with occupation size \(T\)

\[ \text{specificity} = \log(KL) - \widehat{E}[\log(KL) \mid \log(T)] \]
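The two steps can be sketched as follows. The sketch assumes occupation-by-education counts are available as a table. It also approximates \(\widehat{E}[\log(KL) \mid \log(T)]\) with an ordinary least-squares line. The slides do not name the estimator, and a nonparametric smoother would be a natural alternative. The function names and counts are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions; categories with p_k = 0 contribute 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def specificity(counts):
    """Size-corrected education specificity for each occupation.

    counts maps occupation -> worker counts by education category (same order).
    Assumption: E[log KL | log T] is approximated by an OLS line in log T.
    Every occupation must differ from p0 (KL > 0) so that log(KL) is defined.
    """
    occs = list(counts)
    n_cat = len(counts[occs[0]])
    totals = {o: sum(counts[o]) for o in occs}
    grand = sum(totals.values())
    # Overall workforce education distribution p0.
    p0 = [sum(counts[o][k] for o in occs) / grand for k in range(n_cat)]
    # log KL(p || p0) for each occupation, and log occupation size T.
    y = [math.log(kl([c / totals[o] for c in counts[o]], p0)) for o in occs]
    x = [math.log(totals[o]) for o in occs]
    # OLS of log KL on log T (with intercept); specificity is the residual.
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return {o: yi - (intercept + slope * xi) for o, xi, yi in zip(occs, x, y)}


# Hypothetical counts by three education categories.
counts = {
    "A": [80, 10, 10],
    "B": [30, 40, 30],
    "C": [5, 5, 190],
    "D": [100, 150, 50],
}
for occ, s in specificity(counts).items():
    print(occ, round(s, 3))
```

Positive values mark occupations whose education mix is more concentrated than their size alone would predict.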

Destination Gating

Why?

Run the cost-matrix “horse race” within each quartile of destination gating (size-corrected specificity).
Hypothesis: hierarchical distance better rationalizes high-gating destinations, while skill distance better rationalizes low-gating destinations.

Heterogeneity in Distance Structure

To the right we see the result of splitting the labour market into destination gating quartiles.
Treating the labour market as a whole would hide this structure.
One additional heterogeneity check would be to split on the basis of destination TEER.
Hierarchical distance may overstate cross-group frictions for TEERs 0 and 5.

Diagnostic Plots

Individual analysis: inter-census education group

Regress distance traveled on size-corrected KL divergence measures from the CIP–NOC table
- Education specificity
  → How concentrated are graduates from a program across occupations
  → Interpreted as likelihood the program acts as a wormhole entrance
- Destination gating
  → How concentrated are entrants to an occupation across programs
  → Interpreted as likelihood the occupation acts as a wormhole exit
Controls: 2-digit NOC origin fixed effects (2016 occupation)
→ Origin occupation may influence both education choice and distance traveled.

Hypothesis

More specific programs and more gated occupations should generate larger occupational jumps.

What we will do (once the linked data are available)

Race two distance metrics on two samples:
- one where distance should rationalize local mobility (pre-2016 attainment)
- one where it should not (inter-census attainment)
Test for heterogeneity in mobility frictions across the labour market:
- split sample by measure of destination gating and race the two metrics on each split.
Characterize the relationship between distance traveled and education/destination specificity among the inter-census attainment group