
Occupational Mobility and Optimal Transport

Simulation results for now — awaiting linked census microdata

Richard Martin
Ministry of Post-Secondary Education and Future Skills

Disclaimer: The views expressed are those of the author and do not necessarily reflect those of the Government of British Columbia.

Labour markets are constantly evolving

Sources of structural change include:

technology
COVID-19
government policy
resource depletion

These shocks shift occupational labour demand

Entry and exit may account for part of the adjustment, but typically some workers must move across occupations to meet changing demand

What governs mobility between occupations?

How does occupational distance shape mobility?

It is natural to expect mobility to decline with occupational distance
We therefore ask: which notion of occupational distance best rationalizes local mobility?

This requires:

Considering multiple notions of occupational distance
A method to compare model fit
Recognizing two mobility regimes
- Local mobility: gradual movement in occupational space
- Non-local mobility: jumps that bypass distance frictions

Distance should explain local transitions but fail to explain non-local moves

Where we are going

Local mobility can be modeled using entropy-regularized optimal transport, which produces flows consistent with observed origin and destination margins
This framework allows competing distance metrics to be compared in a clean “horse race”
In simulations, when the data-generating process is known, the method recovers the correct distance structure
This provides a framework for testing whether skill similarity or hierarchical structure better explains local mobility

Two mobility regimes

Workers whose highest educational attainment is unchanged between Censuses may move locally in occupational space
A change in highest educational attainment between Censuses can act like a wormhole, enabling jumps across distant occupations

Implication:

Distance should rationalize local mobility
If the model is correct, it should struggle with wormhole transitions

Local Mobility

Gravity-style mobility models predict that flows increase with occupation size and decline with occupational distance
This mirrors Newtonian gravity: \(F \propto \frac{m_1 m_2}{d^2}\)

Goal: separate the mass effect (occupation size) from the distance effect in order to identify the distance measure that best rationalizes local mobility

What Defines Occupational Distance?

Two natural candidates:

Skill similarity: based on 161 O*NET dimensions

Institutional hierarchy: based on the 5-digit NOC classification

Digit	Level	Description
1	Broad Occupational Category	10 broad categories
2	TEER Category	6 levels
3	Sub-major Group	Occupational cluster
4	Minor Group	Narrow family
5	Unit Group	Detailed occupation

Gravity: standard model for local mobility

\[ P_{ij} = \exp(\alpha_i + \beta_j - \gamma C_{ij}) \]

\(P_{ij}\) is the share of workers moving from occupation \(i\) to \(j\)
\(\alpha_i\) origin fixed effect
\(\beta_j\) destination fixed effect
\(C_{ij}\) occupational distance
\(\gamma\) object of interest: sensitivity of flows to a proposed cost structure
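A minimal sketch of how the gravity equation turns fixed effects and costs into predicted flows. The values of \(\alpha\), \(\beta\), \(\gamma\) and the three-occupation cost matrix are made up for illustration and are not estimates.

```python
import math

def gravity_shares(alpha, beta, gamma, cost):
    """Evaluate P_ij = exp(alpha_i + beta_j - gamma * C_ij) for every origin/destination pair."""
    return [
        [math.exp(a_i + b_j - gamma * c_ij) for b_j, c_ij in zip(beta, row)]
        for a_i, row in zip(alpha, cost)
    ]

# Hypothetical inputs: three occupations, symmetric costs, zero cost of staying put.
alpha = [0.0, 0.2, -0.1]   # origin fixed effects
beta = [0.1, 0.0, 0.3]     # destination fixed effects
gamma = 1.5                # sensitivity of flows to distance
cost = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]

P = gravity_shares(alpha, beta, gamma, cost)
for row in P:
    print([f"{p:.4f}" for p in row])
```

Holding the fixed effects constant, each additional unit of \(C_{ij}\) multiplies the predicted flow by \(e^{-\gamma}\). In estimation, \(\alpha\), \(\beta\), and \(\gamma\) are fitted jointly.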

Gravity Models: An Unfair Horse Race

In horse racing, handicaps add weight to stronger horses, making the outcome less informative about the horses’ true speed
If the friction specification \(C_{ij}\) is wrong — and it will be — fixed effects may absorb part of the misspecification
We prefer misspecification to appear as lack of fit rather than being masked by parameter adjustment
The goal of our horse race is to cleanly identify how well each friction rationalizes the observed flows
Of course, winning the horse race does not imply the winner was particularly “fast”

A Fair Horse Race

Entropy-regularized optimal transport enforces the observed marginals exactly:

\[ \sum_j P_{ij} = a_i, \qquad \sum_i P_{ij} = b_j \]

via iterative row and column rescaling

Equilibrium employment levels are taken as given; we model the mobility flows consistent with those levels
This corresponds to a decentralized random utility model where mobility utilities include Gumbel shocks, producing logit choice probabilities

Entropy-regularized optimal transport

\[ \mathcal{L} = \underbrace{\sum_{ij} P_{ij} C_{ij}}_{\text{mass}\times\text{distance}} + \varepsilon\times \underbrace{\sum_{ij} P_{ij}(\log P_{ij}-1)}_{\text{negative entropy}} + \sum_i f_i \underbrace{\left(a_i - \sum_j P_{ij}\right)}_{\text{origin constraint}} + \sum_j g_j \underbrace{\left(b_j - \sum_i P_{ij}\right)}_{\text{destination constraint}} \]

First-order conditions:

\[ \frac{\partial \mathcal{L}}{\partial P_{ij}} = C_{ij} + \varepsilon \log P_{ij} - f_i - g_j = 0 \]

Rearranging:

\[ \log P_{ij} = \frac{f_i}{\varepsilon} + \frac{g_j}{\varepsilon} - \frac{C_{ij}}{\varepsilon} \]

Structure of the Solution

Exponentiating:

\[ P_{ij} = u_i\, e^{-C_{ij}/\varepsilon}\, v_j, \qquad u_i = e^{f_i/\varepsilon}, \quad v_j = e^{g_j/\varepsilon} \]

Thus the optimal solution must be a row- and column-rescaling of the kernel \(e^{-C_{ij}/\varepsilon}\)
Given \(v\), there is a unique \(u\) that fixes the rows
Given \(u\), there is a unique \(v\) that fixes the columns
Sinkhorn alternates these corrections until both margins are satisfied
Implemented using a custom log-domain Sinkhorn solver in R to ensure numerical stability and dimensional alignment
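The alternating corrections can be sketched in Python. This is not the R solver used for the results. It is a minimal, dependency-free illustration that assumes margins `a` and `b` with equal totals. The sketch works with the potentials \(f\) and \(g\) (so \(u_i = e^{f_i/\varepsilon}\), \(v_j = e^{g_j/\varepsilon}\)) and uses log-sum-exp, which avoids the underflow that \(e^{-C_{ij}/\varepsilon}\) causes when \(\varepsilon\) is small. The margins and costs in the usage lines are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def sinkhorn_log(a, b, cost, eps, n_iter=1000, tol=1e-10):
    """Entropy-regularized OT via log-domain Sinkhorn.

    Returns P_ij = exp((f_i + g_j - C_ij) / eps) with row sums a and column sums b.
    """
    n, m = len(a), len(b)
    log_a = [math.log(x) for x in a]
    log_b = [math.log(x) for x in b]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        # Row step: given g, pick f so that sum_j P_ij = a_i.
        f = [eps * (log_a[i] - logsumexp([(g[j] - cost[i][j]) / eps for j in range(m)]))
             for i in range(n)]
        # Column step: given f, pick g so that sum_i P_ij = b_j.
        g = [eps * (log_b[j] - logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)]))
             for j in range(m)]
        # Columns are now exact; stop once the rows are also within tolerance.
        row_err = max(
            abs(sum(math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)) - a[i])
            for i in range(n)
        )
        if row_err < tol:
            break
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(m)] for i in range(n)]

a = [0.5, 0.3, 0.2]  # hypothetical origin shares
b = [0.4, 0.4, 0.2]  # hypothetical destination shares
cost = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
P = sinkhorn_log(a, b, cost, eps=1.0)
print([round(sum(row), 6) for row in P])        # row sums
print([round(sum(col), 6) for col in zip(*P)])  # column sums
```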

Result

Because the objective is strictly convex, the optimal solution \(P^{\star}\) is unique
If the Sinkhorn algorithm converges, it must converge to \(P^{\star}\)

\(P^{\star}\)

matches the observed origin and destination totals
respects the relative friction structure implied by \(C_{ij}\)

Same structure as gravity, but…

Gravity estimates margins and frictions jointly

Sinkhorn conditions on the margins, so frictions are identified from the residual structure of flows

With a misspecified cost matrix, a gravity model merely measures sensitivity to the wrong notion of distance
When the goal is to compare alternative cost matrices, conditioning on the margins allows cleaner identification of frictions

Sinkhorn is the right tool for this job

Interpretation of Temperature: \(\varepsilon\)

\(\varepsilon\) is the scale of random utility shocks (Gumbel noise) affecting mobility choices

Low \(\varepsilon\)

Mobility concentrates on the lowest-cost transitions
Behavior approaches cost-minimizing reallocation (social planner)

High \(\varepsilon\)

Random shocks dominate cost differences
Mobility becomes diffuse and approaches random choice, \(P_{ij} = a_i b_j\) (with the margins expressed as shares summing to one)
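To see both limits numerically, the sketch below solves one hypothetical three-occupation problem at a low, a moderate, and a high temperature. It uses a plain (non-log-domain) Sinkhorn loop, which is adequate at these temperatures. As \(\varepsilon\) rises, the expected transport cost rises and the flows approach the independence benchmark \(a_i b_j\).

```python
import math

def sinkhorn(a, b, cost, eps, n_iter=5000):
    """Plain Sinkhorn: alternately rescale rows and columns of the kernel K = exp(-C/eps)."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(u[i] * K[i][j] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]

# Hypothetical margins (shares summing to one) and a symmetric cost matrix.
a = [0.5, 0.3, 0.2]
b = [0.4, 0.4, 0.2]
cost = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]

results = {}
for eps in (0.2, 1.0, 100.0):
    P = sinkhorn(a, b, cost, eps)
    expected_cost = sum(P[i][j] * cost[i][j] for i in range(3) for j in range(3))
    gap = max(abs(P[i][j] - a[i] * b[j]) for i in range(3) for j in range(3))
    results[eps] = (expected_cost, gap)
    print(f"eps={eps}: expected cost {expected_cost:.4f}, max |P_ij - a_i b_j| {gap:.4f}")
```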

The distance metrics

Skill distance: represents continuous skill transferability
Hierarchical distance: represents discrete barriers within the labour market

Skill distance: based on 161 O*NET measures

Hierarchical distance: derived from NOC taxonomy

All five digits match:
- distance = 0
First four digits match:
- distance = 1
First three digits match:
- distance = 2
First digit matches:
- distance = 3 + |ΔTEER|
Otherwise:
- distance = 9
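The distance rules above map directly to a short function. This is a minimal sketch. It assumes codes are passed as 5-digit strings whose second digit is the TEER category, as in NOC 2021. The example codes are placeholders chosen to exercise each rule and do not refer to particular occupations.

```python
def hier_distance(noc_a: str, noc_b: str) -> int:
    """Hierarchical distance between two 5-digit NOC codes; the second digit is the TEER."""
    if len(noc_a) != 5 or len(noc_b) != 5 or not (noc_a + noc_b).isdigit():
        raise ValueError("expected two 5-digit NOC codes")
    if noc_a == noc_b:
        return 0                          # same unit group
    if noc_a[:4] == noc_b[:4]:
        return 1                          # same minor group
    if noc_a[:3] == noc_b[:3]:
        return 2                          # same sub-major group
    if noc_a[0] == noc_b[0]:
        return 3 + abs(int(noc_a[1]) - int(noc_b[1]))  # same broad category, TEER gap
    return 9                              # different broad category

codes = ["21231", "21232", "21211", "22100", "25000", "31100"]  # placeholder codes
C_hier = [[hier_distance(x, y) for y in codes] for x in codes]
for code, row in zip(codes, C_hier):
    print(code, row)
```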

Skill and hierarchical distances: similar but distinct

Simulation Design

Generate artificial flows using

\[ C = (1-w)C_{skill} + wC_{hier} \]

with

\[ \varepsilon = 1 \]

Then estimate models using:

\(C_{skill}\)
\(C_{hier}\)

Evaluating the Horse Race

We evaluate model fit using the Kullback–Leibler divergence between the observed transition matrix \(P\) and the predicted matrix \(\hat P\)
In our simulations \(P\) is the true transition matrix, so KL measures the divergence between the estimated model and the truth

\[ KL(P \parallel \hat P) = \sum_{ij} P_{ij}\log\frac{P_{ij}}{\hat P_{ij}} \]

KL divergence measures the expected log-likelihood loss, per worker, from using \(\hat P\) in place of \(P\)
Lower values indicate better fit
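The KL criterion is a single sum over cells. This minimal sketch assumes both matrices are normalized to sum to one. The 2×2 matrices are hypothetical: a mobility pattern with a strong diagonal, compared against a uniform independence prediction.

```python
import math

def kl_divergence(P, P_hat):
    """KL(P || P_hat) = sum_ij P_ij log(P_ij / P_hat_ij); cells with P_ij = 0 contribute 0."""
    total = 0.0
    for row, row_hat in zip(P, P_hat):
        for p, q in zip(row, row_hat):
            if p > 0:
                if q <= 0:
                    return math.inf  # the model rules out a flow that actually occurs
                total += p * math.log(p / q)
    return total

P = [[0.4, 0.1], [0.1, 0.4]]          # hypothetical "true" flows
P_hat = [[0.25, 0.25], [0.25, 0.25]]  # hypothetical model prediction
print(kl_divergence(P, P_hat))
```

Because \(\sum_{ij} P_{ij}\log P_{ij}\) does not depend on the model, ranking cost matrices by KL is equivalent to ranking them by the average log-likelihood \(\sum_{ij} P_{ij}\log \hat P_{ij}\).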

Which distance metric best rationalizes the data?

Estimating with a temperature \(\varepsilon \neq 1\) (the DGP value) introduces mild misspecification, so \(KL > 0\) even for the correct metric
For robustness, we evaluate fit over a range of temperatures, \(\varepsilon \in [0.5, 2]\)
When the DGP matches the distance metric, that metric is the clear winner
If flows are independent of distance, forcing distance to matter (low \(\varepsilon\)) increases the loss
When the DGP mixes both structures, the race tightens, indicating both metrics contain useful information
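A toy version of the simulation design and horse race can be sketched end to end. The skill positions `x`, the grouping `groups`, the margins, and the \(\varepsilon\) grid are all hypothetical. They stand in for the O*NET and NOC matrices only to show the mechanics. When \(w = 0\) or \(w = 1\) and the grid contains the true \(\varepsilon = 1\), the correct metric reproduces the simulated flows exactly, so its best KL is zero.

```python
import math

def sinkhorn(a, b, cost, eps, n_iter=500):
    """Plain Sinkhorn: rescale rows and columns of exp(-C/eps) to match margins a and b."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * len(a), [1.0] * len(b)
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(u[i] * K[i][j] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]

def kl(P, Q):
    """KL(P || Q) summed over all cells with P_ij > 0."""
    return sum(p * math.log(p / q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq) if p > 0)

# Hypothetical toy labour market: 1-D skill positions and a made-up occupational grouping.
x = [0.0, 0.4, 1.1, 1.5, 2.3, 3.0]
groups = [0, 0, 1, 2, 1, 2]
n = len(x)
C_skill = [[abs(x[i] - x[j]) for j in range(n)] for i in range(n)]
C_hier = [[0.0 if i == j else (1.0 if groups[i] == groups[j] else 3.0) for j in range(n)]
          for i in range(n)]
a = [0.25, 0.20, 0.15, 0.15, 0.15, 0.10]  # origin shares
b = [0.20, 0.20, 0.20, 0.15, 0.15, 0.10]  # destination shares
eps_grid = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]

def horse_race(w, eps_true=1.0):
    """Simulate flows from C = (1-w) C_skill + w C_hier, then report each metric's best KL."""
    C_true = [[(1 - w) * cs + w * ch for cs, ch in zip(rs, rh)]
              for rs, rh in zip(C_skill, C_hier)]
    P_true = sinkhorn(a, b, C_true, eps_true)
    return {name: min(kl(P_true, sinkhorn(a, b, C, eps)) for eps in eps_grid)
            for name, C in (("skill", C_skill), ("hier", C_hier))}

for w in (0.0, 0.5, 1.0):
    r = horse_race(w)
    print(f"w={w}: best KL skill={r['skill']:.4f}, hier={r['hier']:.4f}")
```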

Destination Education Gating

Some occupations draw from very specific educational pipelines, while others recruit from a broad mix of credentials

Calculate the overall workforce education distribution \(p_0\)
Measure the KL divergence of each occupation’s education distribution \(p\) from \(p_0\)

\[ KL(p \parallel p_0) \]

Remove the mechanical relationship with occupation size \(T\)

\[ \text{specificity} = \log(KL) - \widehat{E}[\log(KL) \mid \log(T)] \]
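A minimal sketch of the specificity measure. It assumes an occupation-by-education count table and uses a linear OLS fit of \(\log(KL)\) on \(\log(T)\) as a stand-in for \(\widehat{E}[\log(KL) \mid \log(T)]\). The slides do not specify that estimator, and the counts are invented. An occupation whose education mix exactly equals \(p_0\) would give \(KL = 0\) and an undefined log, so it would need separate handling.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions; terms with p_e = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def specificity(counts):
    """counts[k][e]: workers in occupation k with education level e (hypothetical layout).

    Returns log KL(p_k || p_0) minus its OLS fit on log T_k, one value per occupation.
    """
    totals = [sum(row) for row in counts]
    grand = sum(totals)
    p0 = [sum(row[e] for row in counts) / grand for e in range(len(counts[0]))]
    log_kl = [math.log(kl([c / t for c in row], p0)) for row, t in zip(counts, totals)]
    log_t = [math.log(t) for t in totals]
    # OLS of log KL on log T (linear stand-in for the conditional expectation).
    mx, my = sum(log_t) / len(log_t), sum(log_kl) / len(log_kl)
    slope = (sum((u - mx) * (y - my) for u, y in zip(log_t, log_kl))
             / sum((u - mx) ** 2 for u in log_t))
    intercept = my - slope * mx
    return [y - (intercept + slope * u) for u, y in zip(log_t, log_kl)]

# Invented counts: 4 occupations x 3 education levels.
counts = [
    [900, 80, 20],
    [30, 40, 30],
    [50, 300, 150],
    [10, 10, 180],
]
spec = specificity(counts)
print([round(s, 3) for s in spec])
```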

Destination Gating

Why?

Split destinations into quartiles of specificity (destination gating) and run the cost-matrix “horse race” within each quartile
Hypothesis: hierarchical distance better rationalizes high-gating destinations, while skill distance better rationalizes low-gating destinations

Heterogeneity in Distance Structure

The figure on the right shows the result of splitting the labour market into destination-gating quartiles
Treating the labour market as a whole would hide this structure
We could also split by destination TEER, as hierarchical distance may overstate cross-group frictions for TEERs 0 and 5

Diagnostic Plots

Individual analysis: the attainment increase group

Regress distance traveled on size-corrected KL divergence measures from the CIP–NOC table
- Education specificity
  → How concentrated are graduates from a program across occupations
  → Interpreted as likelihood the program acts as a wormhole entrance
- Destination gating
  → How concentrated are entrants to an occupation across programs
  → Interpreted as likelihood the occupation acts as a wormhole exit
Controls: 2-digit NOC origin fixed effects (2016 occupation)
→ Origin occupation may influence both education choice and distance traveled
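The regression can be sketched with the within (demeaning) transformation, which gives the same slope estimates as including origin fixed effects (Frisch–Waugh–Lovell). The data are synthetic, with hypothetical true coefficients of 0.8 and 0.5. The specificity regressor is deliberately correlated with the origin effect so that the fixed effects matter. The function names are illustrative and do not come from any package.

```python
import random

def demean_by_group(values, groups):
    """Subtract group means: the within transformation that absorbs group fixed effects."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def fe_ols_two_regressors(y, x1, x2, groups):
    """OLS of y on x1 and x2 with group fixed effects, via demeaning."""
    yw = demean_by_group(y, groups)
    x1w = demean_by_group(x1, groups)
    x2w = demean_by_group(x2, groups)
    s11 = sum(u * u for u in x1w)
    s22 = sum(v * v for v in x2w)
    s12 = sum(u * v for u, v in zip(x1w, x2w))
    s1y = sum(u * t for u, t in zip(x1w, yw))
    s2y = sum(v * t for v, t in zip(x2w, yw))
    det = s11 * s22 - s12 ** 2
    # Solve the 2x2 normal equations by Cramer's rule.
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Synthetic data: 40 hypothetical origin groups standing in for 2-digit NOC codes.
rng = random.Random(0)
n_obs, n_origins = 5000, 40
origin_effect = [rng.gauss(0.0, 2.0) for _ in range(n_origins)]
origin = [rng.randrange(n_origins) for _ in range(n_obs)]
edu_spec = [rng.gauss(0.0, 1.0) + 0.3 * origin_effect[g] for g in origin]
gating = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
distance = [origin_effect[g] + 0.8 * s + 0.5 * d + rng.gauss(0.0, 1.0)
            for g, s, d in zip(origin, edu_spec, gating)]

b_spec, b_gate = fe_ols_two_regressors(distance, edu_spec, gating, origin)
print(f"specificity: {b_spec:.3f}, gating: {b_gate:.3f}")
```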

Hypothesis

More specific programs and more gated occupations should generate larger occupational jumps

What we will do (once the linked data are available)

Race two distance metrics on two samples:
- one where distance should rationalize local mobility (highest educational attainment constant)
- one where it should not (highest educational attainment changed)
Test for heterogeneity in mobility frictions across the labour market:
- split sample by measure of destination gating and race the two metrics on each split
Characterize the relationship between distance traveled and education/destination specificity among workers whose highest educational attainment increased between Censuses