Overview

Purpose

This companion documents the construction, calibration, and diagnostic evaluation underlying the occupational mobility analysis. It provides technical detail on distance metrics, educational concentration measures, and transport calibration procedures that support the empirical tests presented in the main paper.

Contents

  • Skill distance: O*NET-based occupational skill vectors, dimensionality reduction via PCA, and the resulting continuous distance matrix.

  • Hierarchical distance: Taxonomy-based discrete distances derived from the NOC structure, including TEER-based vertical differentiation and comparison to skill geometry.

  • Destination education gating intensity: A size-adjusted measure of concentration in educational inflows into occupations, capturing heterogeneity in credential-linked entry barriers.

  • Origin education specificity: A size-adjusted measure of concentration in occupational outcomes across fields of study, capturing heterogeneity in supply-side specialization.

  • Transport calibration: Identification and normalization of the entropy regularization parameter (\(\varepsilon\)), including percentile-based anchoring across distance metrics.

  • Model evaluation: Diagnostic assessment of exponential structure, sensitivity to calibration choices, and robustness of substitution patterns.

Conceptual Structure

Destination gating and origin specificity characterize complementary dimensions of the same allocation problem:

  • Destination gating reflects how narrowly occupations draw from educational pipelines.

  • Origin specificity reflects how narrowly fields of study channel graduates into occupations.

Within the equilibrium transport framework, these indices capture heterogeneity in frictions shaping mobility across the occupational network.

This companion is designed to ensure transparency and reproducibility of all distance construction and calibration choices.

Skill Distance

Figure panels: Scree plot; O*NET skill distance

Hierarchical distance

Description

Hierarchical distances encode institutional and career-ladder barriers implied by occupational taxonomies:

  1. All five digits match: distance = 0
  2. First four digits match: distance = 1
  3. First three digits match: distance = 2
  4. First digit matches: distance = 3 + |ΔTEER|
  5. Otherwise: distance = 9
  • These distances are intentionally discrete and reflect steps in the NOC taxonomy.
  • In unregularized optimal transport, large blocks of ties can lead to many equally optimal flow allocations (mass splitting).
  • Entropic regularization (Sinkhorn) smooths the solution and makes the problem numerically stable.
  • The accompanying panels compare the internal structure of the taxonomy with the same hierarchical distances reordered by skill similarity.
  • The Spearman correlation between hierarchical and skill distances is 0.285, indicating that institutional proximity and skill similarity are positively but only modestly related.
  • The two metrics encode overlapping yet distinct occupational geometries.
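The five rules above can be sketched as a small function. This is an illustrative sketch rather than the code used in the analysis: the name `hierarchical_distance` is hypothetical, codes are assumed to be five-character strings, and the TEER level is assumed to be the second digit of the code (the NOC 2021 convention).

```python
def hierarchical_distance(code_a: str, code_b: str) -> int:
    """Discrete taxonomy distance between two five-digit NOC codes.

    Assumption: the second digit of each code is its TEER level.
    """
    if len(code_a) != 5 or len(code_b) != 5:
        raise ValueError("expected five-digit NOC codes")
    if code_a == code_b:                 # all five digits match
        return 0
    if code_a[:4] == code_b[:4]:         # first four digits match
        return 1
    if code_a[:3] == code_b[:3]:         # first three digits match
        return 2
    if code_a[0] == code_b[0]:           # same first digit only
        teer_gap = abs(int(code_a[1]) - int(code_b[1]))
        return 3 + teer_gap
    return 9                             # different first digit


# Hypothetical codes, used only to exercise each rule.
examples = [("21231", "21231"), ("21231", "21232"), ("21231", "21220"),
            ("21231", "22220"), ("21231", "31100")]
distances = [hierarchical_distance(a, b) for a, b in examples]
```

With TEER levels between 0 and 5, the fourth rule yields values from 3 to 8, so 9 remains reserved for pairs that share no first digit.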

Figure panels: Distance counts; Internal structure of taxonomy; Hierarchical distance reordered by skill similarity

Destination gating

High specificity indicates that few educational pathways lead into the occupation. Residualized specificity is approximately orthogonal to log occupation size (Pearson = -0.03, Spearman = 0.03).

Education specificity

High specificity indicates that few occupational pathways lead out of the field of study. Residualized specificity is approximately orthogonal to log occupation size (Pearson = 0.07, Spearman = -0.01).

Transport Calibration

Scale and Identification

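In entropic optimal transport, predicted flows take the following form, up to origin- and destination-specific scaling factors that enforce the marginal totals: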

\[ P_{ij} \propto \exp\left(-\frac{C_{ij}}{\varepsilon}\right). \]

Only the ratio \(C/\varepsilon\) is identified: multiplying both the cost matrix and \(\varepsilon\) by a common constant leaves the transport plan unchanged. Because the hierarchical and skill distance matrices are measured on different numerical scales, calibration must ensure that regularization reflects comparable substitution margins rather than arbitrary units.
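The scale invariance can be checked numerically. The sketch below uses a minimal NumPy Sinkhorn iteration written for illustration only; the function name `sinkhorn`, the toy cost matrix, and the uniform marginals are assumptions, not the implementation or data behind the reported results.

```python
import numpy as np


def sinkhorn(C, a, b, eps, n_iter=500):
    """Minimal Sinkhorn iteration: returns a plan with margins (a, b)."""
    K = np.exp(-C / eps)               # Gibbs kernel exp(-C_ij / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)              # rescale to match destination totals
        u = a / (K @ v)                # rescale to match origin totals
    return u[:, None] * K * v[None, :]


rng = np.random.default_rng(0)
C = rng.random((4, 5))                 # toy cost matrix (hypothetical)
a = np.full(4, 1 / 4)                  # origin shares
b = np.full(5, 1 / 5)                  # destination shares

P1 = sinkhorn(C, a, b, eps=0.5)
P2 = sinkhorn(10.0 * C, a, b, eps=5.0)  # cost and eps scaled by the same constant

same_plan = np.allclose(P1, P2)
```

Because the kernel depends on the cost only through `C / eps`, the two calls iterate on the same matrix and return the same plan up to floating-point rounding.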

Accordingly, we anchor \(\varepsilon\) to economically informative regions of each cost distribution.

Anchoring to the Informative Cost Region

The hierarchical distance contains a large mass of maximally distant pairs (distance = 9), which represent transitions the taxonomy treats as categorically distant. Because these pairs provide little information about substitution intensity among plausible transitions, we define the informative region as the set of non-maximal, non-zero hierarchical distances.

Within this region, we compute the 25th, 50th, and 75th percentiles of hierarchical cost. These conditional quantiles represent increasingly broad but still economically meaningful transition margins.

Each anchor value is then mapped to its unconditional percentile in the full hierarchical distribution. The skill distance matrix is calibrated by selecting cost values at these same unconditional percentiles. This procedure ensures that calibration aligns comparable substitution margins across metrics, rather than matching arbitrary numerical magnitudes.

Percentile-based anchoring depends only on the rank ordering of transition costs within each metric, so it is invariant to strictly increasing transformations of the cost scale.
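The anchoring procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `calibration_anchors` is hypothetical, the unconditional percentile is taken to be the share of pairs with cost at or below the anchor (the original convention for ties is not specified), and quantiles use NumPy's default linear interpolation.

```python
import numpy as np


def calibration_anchors(hier, skill, quantiles=(0.25, 0.50, 0.75), max_cost=9):
    """Map conditional hierarchical quantiles to skill-distance anchors."""
    hier = np.asarray(hier, dtype=float).ravel()
    skill = np.asarray(skill, dtype=float).ravel()
    # Informative region: non-zero, non-maximal hierarchical distances.
    informative = hier[(hier > 0) & (hier < max_cost)]
    rows = []
    for q in quantiles:
        h_anchor = np.quantile(informative, q)   # conditional quantile
        # Assumed convention: empirical CDF of the full distribution.
        pct = np.mean(hier <= h_anchor)
        s_anchor = np.quantile(skill, pct)       # same unconditional percentile
        rows.append((q, float(h_anchor), float(pct), float(s_anchor)))
    return rows


# Toy inputs (hypothetical): 5 informative pairs and 15 maximally distant pairs.
toy_hier = np.array([1, 2, 3, 4, 5] + [9] * 15)
toy_skill = np.arange(20.0)
anchors = calibration_anchors(toy_hier, toy_skill)
```

Each returned row has the same four fields as the anchor table: conditional quantile, hierarchical anchor, unconditional percentile, and skill anchor.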

Calibration Anchors Across Distance Metrics
Conditional Quantile   Hierarchical Anchor   Unconditional Percentile   Skill Anchor
0.25                   3.000                 0.040                      6.659
0.50                   4.000                 0.080                      8.104
0.75                   5.000                 0.104                      8.773

The median non-maximal hierarchical transition lies at approximately the 8th percentile of the full hierarchical distribution; the skill anchor is defined at this same percentile to ensure comparable substitution intensity.

Model Evaluation

This section evaluates the empirical performance of the competing cost structures.

We assess whether the relative fit of the hierarchical and skill distance matrices remains stable across a range of entropy regularization values (\(\varepsilon\)). Robustness to regularization ensures that conclusions reflect differences in underlying cost geometry rather than sensitivity to the scale of stochastic dispersion.

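Model performance is evaluated using the cross-entropy between observed and predicted bilateral flow matrices. For a fixed observed matrix, cross-entropy equals the Kullback–Leibler divergence plus the entropy of the observed flows, a constant that does not depend on the cost structure, so the two criteria rank models identically. Because origin and destination totals are imposed by construction, fit is assessed exclusively on the composition of flows.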
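A minimal sketch of this fit criterion is given below. The function name `flow_fit` and the toy matrices are hypothetical; the sketch assumes flows are normalized to shares, that predicted flows are strictly positive wherever observed flows are positive, and that zero observed cells contribute nothing (the usual 0 log 0 = 0 convention).

```python
import numpy as np


def flow_fit(observed, predicted):
    """Return (cross_entropy, kl_divergence) of predicted vs observed flows."""
    p = np.asarray(observed, dtype=float)
    q = np.asarray(predicted, dtype=float)
    p = p / p.sum()                    # observed flow shares
    q = q / q.sum()                    # predicted flow shares
    mask = p > 0                       # skip cells with no observed flow
    cross_entropy = -np.sum(p[mask] * np.log(q[mask]))
    entropy = -np.sum(p[mask] * np.log(p[mask]))
    return cross_entropy, cross_entropy - entropy


# Toy matrices (hypothetical) sharing the same row and column totals.
observed = np.array([[30.0, 10.0], [5.0, 55.0]])
pred_a = np.array([[28.0, 12.0], [7.0, 53.0]])
pred_b = np.array([[20.0, 20.0], [15.0, 45.0]])

ce_a, kl_a = flow_fit(observed, pred_a)
ce_b, kl_b = flow_fit(observed, pred_b)
```

The gap between cross-entropy and Kullback–Leibler divergence is the entropy of the observed flows, which is identical for both candidates, so comparing either quantity yields the same ranking.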

A distance metric is considered empirically supported if it consistently yields lower cross-entropy across calibration anchors and regularization values. Instability across \(\varepsilon\) would indicate that results depend on parameter tuning rather than structural differences in cost geometry.