Goal:

Quantify the causal effect employee/employer match quality has on wages.

An ideal experiment:

It is helpful to outline the ideal experiment that we would undertake in a world without constraints, which serves as a baseline for comparison to what we can actually achieve.

In the absolute best case we would have access to parallel universes, which we will call factual and counter-factual. Let the factual universe be the one we actually live in; each counter-factual universe is identical to it except for one difference: the observable match quality between a single employee and employer. If we observe a difference in wages between the factual and counter-factual universes, it must be due to the difference in match quality, because everything else is identical. Obviously we do not have access to parallel universes, but if we did, this would be how to do science: we would be able to identify individual causal effects.

Randomized controlled experiment:

Without access to parallel universes, the best we can do is a randomized controlled experiment. If match quality is 1) observable/quantifiable and 2) randomly allocated, then we can identify average causal effects by comparing the wages associated with different levels of match quality. Individual causal effects are not observable: we cannot say what would have happened to any given match had its quality been different. Nevertheless, a randomized controlled experiment does identify, on average, the causal effect of match quality on wages. Randomization ensures that match quality is independent of any confounding variables that could affect wages, so the estimated effect is not contaminated by selection bias.
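
To make the distinction between individual and average effects concrete, here is a sketch in potential-outcomes notation. The symbols are introduced here for illustration only, and match quality is simplified to a binary indicator: let \(M_i\in\{0,1\}\) denote a low or high quality match, let \(W_i(m)\) be the wage that match \(i\) would pay under match quality \(m\), and let \(W_i\) be the wage actually observed. The parallel-universe comparison is the individual effect \(W_i(1)-W_i(0)\), of which we only ever see one term. Random allocation recovers its average:

\[E[W_i \mid M_i=1]-E[W_i \mid M_i=0]=E[W_i(1)-W_i(0)]\quad\text{when } M_i \perp \big(W_i(0),W_i(1)\big)\]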

Using observational data:

Selection bias occurs when the variable of interest is endogenous, i.e. not randomly assigned. In our case, the quality of a match depends on attributes of both the employee and the employer. Suppose that high skill/ability workers are more likely to find a high quality match than low skill/ability workers. Skill/ability will then be correlated with match quality, and if we were to naively compare wages on the basis of match quality, the results would suffer from selection bias. Because skill/ability also raises wages directly, we would overstate the effect of match quality: it is confounded with skill/ability differences.

If the confounding variables are quantifiable/observable we can include them in our regression and lean on the conditional independence assumption: conditional on ability/skill, match quality is “as good as” randomly assigned. If this assumption is true then the relationship between match quality and wages can be interpreted as being causal.
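
The argument above can be illustrated with a small simulation. This is only a sketch on made-up data: the coefficients (0.8, 2.0), the sample size, and all variable names are assumptions chosen for illustration, and ability is treated as perfectly observed, which will not be true in our setting. The controlled estimate uses the Frisch-Waugh-Lovell result: regressing residualized wages on residualized match quality gives the same coefficient as a regression that includes ability as a control.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ols_fit(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    intercept, slope = ols_fit(x, y)
    return [b - intercept - slope * a for a, b in zip(x, y)]


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 20000
true_effect = 1.0        # assumed causal effect of match quality on wages

# Ability raises both match quality (endogeneity) and wages (confounding).
ability = [rng.gauss(0, 1) for _ in range(n)]
match = [0.8 * a + rng.gauss(0, 1) for a in ability]
wage = [true_effect * m + 2.0 * a + rng.gauss(0, 1)
        for m, a in zip(match, ability)]

# Naive comparison: regress wages on match quality alone.
_, naive = ols_fit(match, wage)

# Conditional comparison: partial ability out of both variables first.
_, controlled = ols_fit(residuals(ability, match), residuals(ability, wage))

print(f"naive estimate:      {naive:.2f}")
print(f"controlled estimate: {controlled:.2f}")
```

With these assumed parameters the naive estimate should land well above the true effect of 1, while the controlled estimate should be close to it.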

Challenges we face:

  1. Confounding variables are not directly observable: we will need proxies.
  2. Match quality is not directly observable: we will need a proxy.

Potential approach:

Creating proxies for the confounding variables (skills/abilities)

We lack information regarding the skills and abilities of individual workers. I propose that we create proxies for each individual’s skill/ability based on two observable variables: their field of study and their age.

From an individual’s field of study we can create a skill profile based on O*NET skills data and the CIP/NOC table. For example, suppose that for a given field of study there are only two NOCs that have positive employment counts. Let's say NOC1 has 10000 people employed and NOC2 has 5000 people employed. To keep things simple, suppose there is only one skill, with a skill score of 6 for NOC1 and 3 for NOC2. Given these example numbers, we can create a skill score for this field of study as the employment-weighted average of the NOC skill scores:

\[Skill_{fos}=\frac{10000}{10000+5000}\times6+\frac{5000}{10000+5000}\times3=5\]

Of course, skills are not determined solely by field of study: it is likely that skills are also picked up over time spent in the labour market. Again, this is not observable, but it could be proxied by age minus 25.
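
The employment-weighted skill score can be sketched in a few lines. The function name and the dictionary layout are hypothetical; in practice the counts would come from the CIP/NOC table and the scores from the O*NET skills data, and there would be one such calculation per skill.

```python
def field_of_study_skill(employment, skill_scores):
    """Employment-weighted average skill score for one field of study.

    employment   -- {occupation: number employed from this field of study}
    skill_scores -- {occupation: skill score for that occupation}
    """
    total = sum(employment.values())
    if total == 0:
        raise ValueError("field of study has no employment in any occupation")
    weighted = sum(count * skill_scores[noc] for noc, count in employment.items())
    return weighted / total


# The worked example from the text: two occupations, one skill.
employment = {"NOC1": 10000, "NOC2": 5000}
skill_scores = {"NOC1": 6.0, "NOC2": 3.0}

skill_fos = field_of_study_skill(employment, skill_scores)
print(skill_fos)  # 5.0
```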

A proxy for match quality

Match quality is also not observable, so it will also need a proxy. Here I have two ideas, and I think we should try both.

  1. The difference in skills between the field of study skill profile (described above) and the skill profile associated with the NOC. Using the example above, suppose that the individual was one of the 5000 working in NOC2. The proxy for their skill (based on their field of study) is 5, and the skill required for NOC2 is 3, so the proxy for their skill (mis)match is \(5-3=2\): they appear to be over-skilled for this occupation, based on their field of study. The advantage of this approach is that we will be able to see the breakdown of which skills matter the most/least in determining the effect of match quality on wages.

  2. A single measure of match quality based on the CIP/NOC table of employment counts: for a given NOC, the match quality of a worker is the proportion of those employed in that NOC who share the worker's educational background. For example, suppose that for the occupation nursing there are only two educational backgrounds with positive counts in the CIP/NOC table. For the sake of argument, suppose that 90,000 nurses have field of study \(A\) and 10,000 nurses have field of study \(B\). The proposed match metric in this case would be \(0.9\) for those with field of study \(A\) and \(0.1\) for those with field of study \(B\). That is, we would say that nurses with field of study \(A\) are well matched to their occupation, whereas nurses with field of study \(B\) are not. As an aside, note that according to this metric no university professors are well matched, as their fields of study are very diverse.
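
Both proxies are simple to compute once the inputs are assembled. The sketch below reproduces the two worked examples; the function names and data layout are hypothetical, and the single skill labelled "skill" stands in for the full set of skills in a real profile.

```python
def skill_mismatch(fos_profile, noc_profile):
    """Per-skill gap between a field-of-study profile and an occupation profile.

    Positive values suggest the worker is over-skilled for the occupation,
    negative values that they are under-skilled.
    """
    return {skill: fos_profile[skill] - noc_profile[skill] for skill in fos_profile}


def match_shares(counts_by_field):
    """Share of an occupation's workers coming from each field of study."""
    total = sum(counts_by_field.values())
    return {field: count / total for field, count in counts_by_field.items()}


# Proxy 1: field-of-study skill of 5 against a NOC2 requirement of 3.
mismatch = skill_mismatch({"skill": 5.0}, {"skill": 3.0})
print(mismatch)

# Proxy 2: nurses by field of study.
shares = match_shares({"A": 90000, "B": 10000})
print(shares)
```

The first proxy yields one value per skill for each worker, which is what allows a skill-by-skill breakdown, while the second collapses match quality into a single number per field of study and occupation pair.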

Bias due to noisy measurement:

Even if the conditional independence assumption is met, the fact that we are using proxies rather than actual values implies that our results will be biased… but there is not much we can do about this.
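
A small simulation gives a sense of how this bias can look in the simplest case. This is a sketch under strong assumptions that are not part of the plan above: a single regressor, no confounding, and classical measurement error (noise that is independent of the true value and of wages). All numbers and names are illustrative. In that special case the estimated coefficient is pulled toward zero; with noisy proxies for the confounders as well, the direction of the bias is less clear.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(1)   # fixed seed so the sketch is reproducible
n = 20000
true_effect = 2.0        # assumed effect of true match quality on wages

true_match = [rng.gauss(0, 1) for _ in range(n)]
wage = [true_effect * m + rng.gauss(0, 1) for m in true_match]

# The proxy equals the true value plus independent noise of the same variance.
proxy_match = [m + rng.gauss(0, 1) for m in true_match]

slope_true = ols_slope(true_match, wage)
slope_proxy = ols_slope(proxy_match, wage)

print(f"using true match quality: {slope_true:.2f}")
print(f"using the noisy proxy:    {slope_proxy:.2f}")
```

With equal signal and noise variances, the estimate based on the proxy should come out at roughly half the true effect.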

Econometric theory is like an exquisitely balanced French recipe, spelling out precisely with how many turns to mix the sauce, how many carats of spice to add, and for how many milliseconds to bake the mixture at exactly 474 degrees of temperature. But when the statistical cook turns to raw materials, he finds that hearts of cactus fruit are unavailable, so he substitutes chunks of cantaloupe; where the recipe calls for vermicelli he uses shredded wheat; and he substitutes green garment dye for curry, ping-pong balls for turtle’s eggs and, for Chalifougnac vintage 1883, a can of turpentine. (Stefan Valavanis)