This companion documents the construction, calibration, and diagnostic evaluation of the measures underlying the occupational mobility analysis. It provides technical detail on the distance metrics, educational concentration measures, and transport calibration procedures that support the empirical tests presented in the main paper.
- Skill distance: O*NET-based occupational skill vectors, dimensionality reduction via PCA, and the resulting continuous distance matrix.
- Hierarchical distance: Taxonomy-based discrete distances derived from the NOC structure, including TEER-based vertical differentiation and comparison to skill geometry.
- Destination education gating intensity: A size-adjusted measure of concentration in educational inflows into occupations, capturing heterogeneity in credential-linked entry barriers.
- Origin education specificity: A size-adjusted measure of concentration in occupational outcomes across fields of study, capturing heterogeneity in supply-side specialization.
- Transport calibration: Identification and normalization of the entropy regularization parameter (\(\varepsilon\)), including percentile-based anchoring across distance metrics.
- Model evaluation: Diagnostic assessment of exponential structure, sensitivity to calibration choices, and robustness of substitution patterns.
Destination gating and origin specificity characterize complementary dimensions of the same allocation problem:
- Destination gating reflects how narrowly occupations draw from educational pipelines.
- Origin specificity reflects how narrowly fields of study channel graduates into occupations.
Within the equilibrium transport framework, these indices capture heterogeneity in frictions shaping mobility across the occupational network.
This companion is designed to ensure transparency and reproducibility of all distance construction and calibration choices.
Hierarchical distances encode institutional and career-ladder barriers implied by occupational taxonomies.
In entropic optimal transport, predicted flows take the form
\[ P_{ij} \propto \exp\left(-\frac{C_{ij}}{\varepsilon}\right). \]
Only the ratio \(C/\varepsilon\) is identified: multiplying both the cost matrix and \(\varepsilon\) by a common constant leaves the transport plan unchanged. Because the hierarchical and skill distance matrices are measured on different numerical scales, calibration must ensure that regularization reflects comparable substitution margins rather than arbitrary units.
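The scale invariance can be checked numerically. The following is a minimal sketch, not the implementation used in the paper: the helper `sinkhorn_plan`, the random toy cost matrix, and the uniform marginals are all assumptions introduced for illustration. It solves the entropic transport problem by Sinkhorn scaling and compares plans obtained with the same and with different \(C/\varepsilon\) ratios.

```python
import numpy as np

def sinkhorn_plan(cost, row_marg, col_marg, eps, n_iter=2000):
    """Entropic OT plan P = diag(u) K diag(v) with K = exp(-cost / eps).

    Hypothetical helper for illustration; plain Sinkhorn scaling.
    """
    K = np.exp(-cost / eps)
    u = np.ones_like(row_marg)
    v = np.ones_like(col_marg)
    for _ in range(n_iter):
        u = row_marg / (K @ v)      # rescale rows toward origin totals
        v = col_marg / (K.T @ u)    # rescale columns toward destination totals
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
C = rng.uniform(0.0, 9.0, size=(5, 5))   # toy cost matrix (not real data)
a = np.full(5, 0.2)                      # assumed uniform origin shares
b = np.full(5, 0.2)                      # assumed uniform destination shares

P_base = sinkhorn_plan(C, a, b, eps=2.0)
P_scaled = sinkhorn_plan(10.0 * C, a, b, eps=20.0)  # same C / eps ratio
P_other = sinkhorn_plan(10.0 * C, a, b, eps=2.0)    # ratio changed tenfold

same_ratio_gap = np.abs(P_base - P_scaled).max()
diff_ratio_gap = np.abs(P_base - P_other).max()
print(f"max gap, same ratio:      {same_ratio_gap:.2e}")
print(f"max gap, different ratio: {diff_ratio_gap:.2e}")
```

In this toy setting the first gap is at the level of floating-point error, while the second is not, which is why \(\varepsilon\) has to be chosen relative to the scale of each cost matrix.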
Accordingly, we anchor \(\varepsilon\) to economically informative regions of each cost distribution.
The hierarchical distance contains a large mass of maximally distant pairs (distance = 9), which represent transitions the taxonomy treats as categorically distant. Because these pairs provide little information about substitution intensity among plausible transitions, we define the informative region as the set of non-maximal, non-zero hierarchical distances.
Within this region, we compute the 25th, 50th, and 75th percentiles of hierarchical cost. These conditional quantiles represent increasingly broad but still economically meaningful transition margins.
Each anchor value is then mapped to its unconditional percentile in the full hierarchical distribution. The skill distance matrix is calibrated by selecting cost values at these same unconditional percentiles. This procedure ensures that calibration aligns comparable substitution margins across metrics, rather than matching arbitrary numerical magnitudes.
Percentile-based anchoring depends only on the rank ordering of transition costs within each metric, so the matched percentiles are invariant to strictly increasing transformations of the cost scale.
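The anchoring steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's code: the function name `calibration_anchors` is hypothetical, the unconditional percentile is taken to be the share of pairs with cost at or below the anchor (the text does not specify the convention), and the inputs are toy data, not the distances used in the analysis.

```python
import numpy as np

def calibration_anchors(hier, skill, quantiles=(0.25, 0.50, 0.75)):
    """Map conditional hierarchical quantiles to skill-distance anchors.

    Hypothetical helper. Returns one tuple per quantile:
    (conditional quantile, hierarchical anchor,
     unconditional percentile, skill anchor).
    """
    hier = np.asarray(hier, dtype=float).ravel()
    skill = np.asarray(skill, dtype=float).ravel()

    # Informative region: non-zero, non-maximal hierarchical distances.
    informative = hier[(hier > 0) & (hier < hier.max())]

    rows = []
    for q in quantiles:
        h_anchor = np.quantile(informative, q)
        # Assumption: percentile = empirical CDF of the full distribution.
        p_uncond = np.mean(hier <= h_anchor)
        # Skill anchor: cost at the same unconditional percentile.
        s_anchor = np.quantile(skill, p_uncond)
        rows.append((q, float(h_anchor), float(p_uncond), float(s_anchor)))
    return rows

# Toy inputs: a large mass at the maximal distance (9) and a thin
# informative region, loosely mimicking the shape described in the text.
hier = np.array([3] * 4 + [4] * 4 + [5] * 4 + [9] * 188, dtype=float)
skill = np.linspace(0.0, 10.0, 201)

anchors = calibration_anchors(hier, skill)
for q, h, p, s in anchors:
    print(f"q={q:.2f}  hier={h:.3f}  pct={p:.3f}  skill={s:.3f}")
```

Other percentile conventions (strict inequality, mid-rank) would shift the matched skill anchors slightly when the hierarchical distribution has ties, so the choice is worth stating explicitly in any replication.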
Calibration Anchors Across Distance Metrics

| Conditional Quantile | Hierarchical Anchor | Unconditional Percentile | Skill Anchor |
|---|---|---|---|
| 0.25 | 3.000 | 0.040 | 6.659 |
| 0.50 | 4.000 | 0.080 | 8.104 |
| 0.75 | 5.000 | 0.104 | 8.773 |
The median of the non-maximal, non-zero hierarchical distances (4.000) lies at approximately the 8th percentile of the full hierarchical distribution; the skill anchor is defined at this same percentile to ensure comparable substitution intensity.
This section evaluates the empirical performance of the competing cost structures.
We assess whether the relative fit of the hierarchical and skill distance matrices remains stable across a range of entropy regularization values (\(\varepsilon\)). Robustness to regularization ensures that conclusions reflect differences in underlying cost geometry rather than sensitivity to the scale of stochastic dispersion.
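Model performance is evaluated using the cross-entropy between observed and predicted bilateral flow matrices. For ranking models this is equivalent to the Kullback–Leibler divergence, because the two differ only by the entropy of the observed flows, which does not depend on the model. Because origin and destination totals are imposed by construction, fit is assessed exclusively on the composition of flows.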
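A minimal sketch of such a fit statistic is given below. The function name `flow_fit`, the normalization of both matrices to shares, and the toy flow matrices are assumptions made for illustration; the sketch also assumes the predicted share is strictly positive wherever the observed flow is positive, which holds for entropic transport plans.

```python
import numpy as np

def flow_fit(observed, predicted):
    """Return (cross_entropy, kl_divergence) between flow-share matrices.

    Hypothetical helper. Both inputs are normalized to sum to one.
    Cells with zero observed flow contribute nothing to either statistic.
    """
    p = np.asarray(observed, dtype=float)
    q = np.asarray(predicted, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mask = p > 0
    cross_entropy = -np.sum(p[mask] * np.log(q[mask]))
    entropy = -np.sum(p[mask] * np.log(p[mask]))
    return cross_entropy, cross_entropy - entropy

# Toy observed flows and two candidate predictions with identical margins.
observed = np.array([[30.0, 10.0], [10.0, 50.0]])
pred_indep = np.outer([0.4, 0.6], [0.4, 0.6])       # independence benchmark
pred_close = np.array([[0.28, 0.12], [0.12, 0.48]])

ce_indep, kl_indep = flow_fit(observed, pred_indep)
ce_close, kl_close = flow_fit(observed, pred_close)
print(f"independence: CE={ce_indep:.4f}  KL={kl_indep:.4f}")
print(f"closer model: CE={ce_close:.4f}  KL={kl_close:.4f}")
```

Both predictions reproduce the same origin and destination totals, so the difference in the statistic reflects only how each allocates flows across cells.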
A distance metric is considered empirically supported if it consistently yields lower cross-entropy across calibration anchors and regularization values. Instability across \(\varepsilon\) would indicate that results depend on parameter tuning rather than structural differences in cost geometry.