The impact of Skill mis-match on labour market outcomes

TL;DR
Introduction
Assisting occupations:
Filtering the census data:
Where do fields of study lead?
The model:
Plot of regression results:
Out of sample fit:
What if a social planner could reallocate workers between occupations?
The takeaway:
Does skill mis-match influence whether someone works full time full year?
The model:
Plot of regression results:
Assessing the Logit Model:
Appendix:

source("shared_code.R")

TL;DR

People vary in how well matched their skills are to the skills required by their occupation. There are various reasons why people might be mis-matched. For instance, if there is an element of “winner takes all” in the labour market, one would expect that the highest quality workers and highest quality employers would end up with the highest quality matches: both the buyer and the seller have the luxury of being selective about who they match with. But being poorly matched does not necessarily indicate that a worker is of low ability.

People might choose to be poorly matched if they discover they do not enjoy the work in the field they trained for.
Poor match quality might be due to discrimination based on age, gender or ethnicity.

Regardless of the cause of poor match quality, we will want to control for these confounding variables. If we successfully control for all the factors that influence match quality, we can assume that match quality is “as good as” randomly assigned. The goal of this exercise is to investigate the impact of skill mis-match on

employment income of those who work full time full year: Going from the 10th percentile of skill mis-match (relatively well matched) to the 90th percentile of skill mis-match (relatively poorly matched) causes a 7.9% reduction in employment income for those working full time, full year.
the probability of working full time full year: Going from the 10th percentile of skill mis-match (relatively well matched) to the 90th percentile of skill mis-match (relatively poorly matched) causes a 5.4 percentage point reduction in the probability of being employed full time full year.

Introduction

We define skill mis-match to be the Euclidean distance between the skill profile of an occupation and the skills possessed by a worker. The skill profile of an occupation is relatively straightforward to derive from the O*NET skills and work activities. In contrast, there is no direct source of information regarding the skills of a worker. Here we presume that a worker’s skills are acquired during education, and that we can infer the skills acquired during education by looking at the relationship between occupations and field of study.

Statistics Canada Table: 98-10-0403-01 contains counts of workers by occupation and field of study. Suppose that we wanted to infer the average skill profile of someone whose field of study was Health and related fields. In the table below we show the top occupations (by count of workers) for the field of study Health and related fields. From this we derive a proportion, a truncated proportion (only proportions greater than .01), and an adjusted proportion to ensure the proportions we utilize sum to one. We then multiply the skill profile of each of the occupations by the adjusted proportion and then sum to create a weighted average skill profile for each field of study. At the extreme, if every person from a given field of study ended up in the same occupation, both skill profiles would be identical.

my_dt(cip2_noc5)
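The weighting steps described above can be sketched in base R as follows. This is a minimal illustration only: the counts and skill scores are made-up toy values, and the object names (`counts`, `skills`, `field_profile`) are hypothetical rather than the ones used in `shared_code.R`.

```r
# Toy counts of workers by occupation for one field of study (made-up numbers).
counts <- c(nurse = 600, care_aide = 300, pharmacist = 95, clerk = 5)

# Toy skill profiles: one row per occupation, one column per skill (made-up scores).
skills <- rbind(
  nurse      = c(critical_thinking = 4.0, service_orientation = 4.5),
  care_aide  = c(critical_thinking = 3.0, service_orientation = 4.0),
  pharmacist = c(critical_thinking = 4.5, service_orientation = 3.5),
  clerk      = c(critical_thinking = 2.5, service_orientation = 3.0)
)

prop <- counts / sum(counts)                # raw proportion
trunc_prop <- ifelse(prop > 0.01, prop, 0)  # keep only proportions greater than .01
adj_prop <- trunc_prop / sum(trunc_prop)    # rescale so the kept proportions sum to one

# Weighted average skill profile for the field of study:
# each occupation's row is multiplied by its adjusted proportion, then rows are summed.
field_profile <- colSums(skills * adj_prop)
field_profile
```

In this toy example the clerk row is dropped by the truncation, so the field profile is a weighted average of the remaining three occupations.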

We undertake a similar exercise to create a weighted average skill profile of the 25 aggregate occupations based on the 5 digit NOC skill profiles and weights based on the counts of workers.

Once we have the skill profiles associated with each of the fields of study and occupations, we can measure the distance between them. To begin with, we scale the skills and work activities to have a mean of zero and a standard deviation of one. Next we perform principal component analysis and retain only the first five principal components. Finally we compute the Euclidean distance between each of the occupations and fields of study based on the first five principal components. The distances are plotted below, where colour indicates distance. The most striking pattern is that for “Assisting occupations” the distance is large across all fields of study, i.e. the yellow stripe in the bottom row.

 heatmaply::heatmaply(dist_mat)
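The three steps (scale, principal components, Euclidean distance) can be sketched in base R as below. The profiles here are random toy numbers, and the names `profiles` and `toy_dist_mat` are hypothetical; we assume occupations and fields of study are stacked as rows of one matrix before scaling, which may differ from how `dist_mat` was actually built.

```r
set.seed(1)
# Toy skill profiles: 6 rows (3 occupations, 3 fields of study) by 8 skills (random numbers).
profiles <- matrix(rnorm(6 * 8), nrow = 6,
                   dimnames = list(c(paste0("noc_", 1:3), paste0("cip_", 1:3)), NULL))

scaled <- scale(profiles)                             # mean zero, sd one for each skill
pca <- prcomp(scaled, center = FALSE, scale. = FALSE) # data are already scaled
scores <- pca$x[, 1:5]                                # retain the first five components

# Euclidean distance between every pair of rows, then keep occupation-by-field pairs.
all_dist <- as.matrix(dist(scores, method = "euclidean"))
toy_dist_mat <- all_dist[paste0("noc_", 1:3), paste0("cip_", 1:3)]
toy_dist_mat
```

Each cell of `toy_dist_mat` is the distance between one occupation and one field of study, which is the quantity the heatmap colours.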

Assisting occupations:

Why is the NOC group “Assisting Occupations…” poorly matched? Based on the NOC-CIP differences, they appear to be generically over-skilled. The skills they apparently have (based on their field of study) exceed the skills required in this broad set of occupations.

plt <- cip_noc_diff|>
  filter(census_noc_code==14)|>
  select(contains("noc-cip"))|>
  pivot_longer(cols=everything())|>
  mutate(name=str_sub(name, start=8),
         value=-value)|>
  group_by(name)|>
  summarize(value=mean(value))|>
  ggplot(aes(value, 
             fct_reorder(name,value),
             text=paste0("Skill: ",
                         name,
                         "\n Skill surplus: ",
                         round(value,2)
                         )
             ))+
  geom_col()+
  labs(x=NULL, 
       y=NULL, 
       title='Skill surplus for Assisting occupations, care providers, student monitors, crossing guards...')+
  theme_minimal(base_size = 8)

plotly::ggplotly(plt, tooltip = "text")

Filtering the census data:

The census public use micro data file contains a sample of roughly 1 million Canadians. We perform the following filtering of the data prior to analysis of the impact of skill mis-match on the employment income of those who work full year full time.

colnames(filtering_info) <- c("Filter applied","Observations left")
my_dt(filtering_info)

Note that the proportion of Canadians working full year full time is likely lower than normal, given COVID. This filtering leaves us with 20122 observations, 90% of which are randomly allocated to a training set, with the remaining 10% going to a testing set.

Where do fields of study lead?

Using our filtered dataset, we can look at which fields of study are associated with which occupations. Note that it is the dispersion of occupations associated with a given field of study that leads to our measure of skill mis-match. For example, if there were a field of study where every single person ended up in the same occupation, their skill mis-match (distance) would be zero. In contrast, the larger the set of destination occupations, the more likely it is that the skills developed during education do not exactly match the skills required in one of the resulting occupations.

filtered|>
  group_by(`NOC vs. Admin and financ...:`, `CIP vs. Agriculture...:`)|>
  count()|>
  ggplot(aes(axis1= `CIP vs. Agriculture...:`, axis2 = `NOC vs. Admin and financ...:`, y = n)) +
  geom_alluvium(aes(fill = `CIP vs. Agriculture...:`)) +
  geom_stratum() +
  ggfittext::geom_fit_text(stat = "stratum", aes(label = after_stat(stratum)), width = 1/3, min.size = 1) +
  labs(fill="Fields of Study")+
  theme_void()+
  theme(legend.position = "bottom")

The model:

So what we are after is the causal impact of a mis-match in skills (proxied by distance) on employment income. Given that we are using observational data, we must lean on the Conditional Independence Assumption: i.e. we must assume that distance is “as good as randomly assigned” when we condition on all the variables that influence both distance and employment income. We fit the model

\[\log(Employment~Income)=Age+Highest~Degree+Language+Gender+Ethnicity+Occupation+Field~of~Study+distance+\mu\]

The plot below shows (most of the) regression results: Language and Gender estimates can be found in the regression table in the appendix. From the results you can see that:

Employment income increases rapidly with age and then levels off.
Techy fields of study have higher levels of employment income, whereas humanities and education are low.
Employment income increases significantly with highest degree attained.
Employment income tends to be lower for non-whites.
Employment income is highest for health occupations and lowest in sales.

But the main result we are after is the monetary penalty associated with a skill mis-match: $\beta_{distance}~=~$-0.008, which can be interpreted as “A one unit increase in distance causes a $100*(\exp(\beta_{distance})-1)~=~$-0.8% change in employment income.” In terms of its economic relevance, going from the 10th percentile of distance 3.71 (relatively well matched) to the 90th percentile of distance 13.33 (relatively poorly matched) causes a -7.9% change in employment income, ceteris paribus.
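The conversion from a log-income coefficient to a percentage change can be reproduced with a few lines of base R. This sketch uses the rounded coefficient and percentiles quoted in the text; with the rounded value -0.008 the 10th-to-90th percentile effect comes out at roughly -7.4%, so the reported -7.9% presumably reflects the unrounded coefficient (an assumption on our part, since the unrounded value is not shown).

```r
beta_distance <- -0.008   # rounded coefficient from the regression table
d10 <- 3.71               # 10th percentile of distance
d90 <- 13.33              # 90th percentile of distance

# Percent change in income for a one unit increase in distance.
pct_one_unit <- 100 * (exp(beta_distance) - 1)

# Percent change in income for moving from the 10th to the 90th percentile.
pct_10_to_90 <- 100 * (exp(beta_distance * (d90 - d10)) - 1)

round(c(one_unit = pct_one_unit, p10_to_p90 = pct_10_to_90), 2)
```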

Plot of regression results:

ggplot(mod1_coef, aes(estimate,
                     reorder_within(level, within=variable, by=estimate),
                     xmin=conf.low,
                     xmax=conf.high))+
  geom_vline(xintercept = 0, col="grey", lty=2)+
  geom_point(size=.5)+
  geom_errorbarh(height=0)+
  facet_wrap(~variable, scales = "free")+
  scale_y_reordered()+
  scale_x_continuous(labels=scales::percent)+
  labs(x=NULL,
       y=NULL)+
  theme_minimal()

Out of sample fit:

Note that the model only explains about 30% of the variation in in-sample employment income, but even this might be overly optimistic. It is always a good idea to investigate the model’s performance out of sample, to make sure that we have not over-fit the model. Over-fitting occurs when the model fits the in-sample data well, but does not perform well out of sample. Below we look at how well the model performed out of sample (using the 10% of the sample that we held back). The model does a fairly decent job of predicting employment income when it is low, but does not do a good job of explaining employment income in excess of $300,000. Note that the root mean squared error in sample is 0.444 whereas it is 0.437 out of sample: i.e. there is no evidence of over-fitting.

ggplot(test_w_pred, aes(exp(prediction), exp(log_income)))+
  geom_abline(slope = 1, intercept = 0, col="white", lwd=2)+
  geom_point(alpha=.1)+
  scale_x_continuous(trans="log10", labels=scales::dollar)+
  scale_y_continuous(trans="log10", labels = scales::dollar)+
  labs(x="Prediction",
       y="Actual",
       title="Test set prediction errors")
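The in-sample versus out-of-sample comparison boils down to computing the root mean squared error on two sets of rows. Here is a minimal sketch on simulated data; the helper `rmse()` and the toy data frame are hypothetical stand-ins, not the objects used to produce the numbers quoted above.

```r
# Root mean squared error between actual and predicted values (both in logs).
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Toy example with simulated data; names and numbers are illustrative only.
set.seed(42)
toy <- data.frame(x = rnorm(1000))
toy$log_income <- 10 + 0.5 * toy$x + rnorm(1000, sd = 0.4)
in_train <- seq_len(1000) <= 900   # first 900 rows train, last 100 rows test

fit <- lm(log_income ~ x, data = toy[in_train, ])

rmse_in  <- rmse(toy$log_income[in_train],  predict(fit, toy[in_train, ]))
rmse_out <- rmse(toy$log_income[!in_train], predict(fit, toy[!in_train, ]))
c(in_sample = rmse_in, out_of_sample = rmse_out)
```

If the out-of-sample error were much larger than the in-sample error, that would point to over-fitting.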

What if a social planner could reallocate workers between occupations?

Above we identified how much employment income would be expected to change for an individual who went from being relatively well matched (10th percentile of distance) to relatively poorly matched (90th percentile of distance): a -7.9% change in employment income, ceteris paribus. However, this does not give an indication of the social welfare benefits of reducing mis-match, as it is possible that increasing the match quality of a given individual may worsen the match quality of whoever they displaced when they shifted occupations. Next we perform a hypothetical exercise, where a social planner attempts to improve the average match quality by iteratively selecting two people at random, swapping their occupations, and comparing the original mis-match (distance) with the swapped mis-match (distance). If the swapped distance is significantly lower than the original distance the workers swap occupations, otherwise they remain unchanged. We repeat this 100,000 times. We then compare the predicted employment income based on observed occupations (factual) with the predicted employment income based on the swapped occupations (counterfactual).

The simulation code ↓

set.seed(123)
num_sim <- 100000 #social planner gives up after this many tries
cutoff <- .5 #only swap occupations if proportional improvement is greater than this
results <- tibble(sim = 1:num_sim, #initialize a dataframe to store simulation results
                  nocs = vector(length = num_sim, mode = "list"),
                  mean_distance = NA_real_)

for(i in 1:nrow(results)){
  if(i==1){#in the first iteration
    original <- test_stripped|> # test stripped contains only id, NOC and CIP
      left_join(cip_noc_diff,
                by = c("NOC21"= "census_noc_code", "CIP2021"="census_cip_code"))|> #adds in the distances
      mutate(prob=distance/sum(distance))#used below for drawing two observations to swap occupations
  }else{
    original <- test_stripped|>
      select(-NOC21)|> #get rid of the original allocation of NOCs
      bind_cols(NOC21=results$nocs[[i-1]])|> #add in the NOCs from the previous iteration
      left_join(cip_noc_diff,
                by = c("NOC21"= "census_noc_code",
                       "CIP2021"="census_cip_code"))|>#add distances (using last iterations NOCs)
      mutate(prob=distance/sum(distance)) #used below for drawing two observations to swap occupations
  }
  changepoints <- sample(original$id,
                         size=2,
                         replace = FALSE,
                         prob = original$prob) #two observations where we will swap NOCs; id doubles as the row index in replace() below
  new <- original|> #take the original (for this iteration) data, then
    mutate(NOC21= replace(NOC21,
                          changepoints,
                          NOC21[rev(changepoints)]))|> #swap the NOCs for two of the observations
    select(-distance)|> #distance is now wrong given the swap
    left_join(cip_noc_diff,
              by = c("NOC21"= "census_noc_code",
                     "CIP2021"="census_cip_code"))#add in the correct distances
  proportion_improvement <- (sum(original$distance)-sum(new$distance))/mean(original$distance) #reduction in total distance, in units of the mean distance
  if(proportion_improvement>cutoff){#if significant reduction in distance
    results$nocs[[i]] <- new$NOC21 #save the swapped NOCs for use in next iteration
    results$mean_distance[[i]] <- mean(new$distance) #save the new average distance
  }else{#if the swap did not reduce the distance by more than the cutoff
    results$nocs[[i]] <- original$NOC21 #save the unchanged NOCs for use in next iteration
    results$mean_distance[[i]] <- mean(original$distance) #save the original distance
  }
  print(paste(scales::percent(i/num_sim, accuracy = 1), "complete")) #how much longer do I have to wait?
}

ggplot(simulation_results, aes(sim, mean_distance))+
  geom_line()+
  scale_x_continuous(labels=scales::comma)+
  labs(x="Simulation Number",
       y="Average skill gap distance",
       title="Shuffling occupations can reduce the average skill gap."
       )

What proportion of people had their occupations swapped?

table(noc_change$unchanged)/nrow(noc_change)


  changed      same 
0.3807267 0.6192733

So obviously this is a ridiculously unrealistic intervention: if 38% of the labour market changed occupations it would be hugely disruptive. The point of the simulation is to figure out what the upper bound is on an intervention attempting to improve skill matching in the labour market: in the best case scenario where there are no adjustment costs and it is cool to displace more than a third of the labour market, what improvement in the distribution of employment income can be attained? To answer this question we take the model that was fit on the training data, and make two predictions: one based on actual occupations (and distances), and one based on the intervention (swapped occupations and distances). First off, summary statistics:

compare_swap|>
  group_by(data)|>
  summarise(mean_income=scales::dollar(mean(exp(predictions))),
            sd_income=scales::dollar(sd(exp(predictions))))|>
  DT::datatable(rownames = FALSE)

From the summary statistics we can infer that any risk neutral individual behind the veil of ignorance would prefer the distribution of employment income based on the swapped occupations: the mean is (slightly) higher. How about for risk averse individuals? Next, let’s take a look at density plots:

ggplot(compare_swap, aes(exp(predictions), fill=data))+
  geom_density(alpha=.25)+
  scale_x_continuous(labels = scales::dollar)+
  labs(x="Predicted Employment Income")

The reduction in dispersion is apparent, with the increase in level being more subtle.

Looks like it is possible that the swapped distribution of predicted employment income second order stochastically dominates (SOSD) the original distribution of predicted employment income. Why do we care about SOSD? What it implies is that any risk averse or risk neutral person behind the veil of ignorance would prefer the distribution of employment income based on the swapped occupations.

ggplot(compare_swap, aes(exp(predictions), colour = data)) +
  stat_ecdf()+
  scale_x_continuous(labels=scales::dollar)+
  annotate(geom = "text", x=60000, y=.22, label="a")+
  annotate(geom = "text", x=100000, y=.85, label="b")+
  labs(x="Predicted Employment Income",
       title="Second order stochastic dominance",
       subtitle="Given the area between the curves a is larger than area b, swapped SOSD original")
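The area comparison in the plot can also be checked numerically. Distribution A second order stochastically dominates B when the running integral of the gap between the two empirical CDFs, F_B minus F_A, never goes negative. The function below is a minimal base R sketch of that check; `sosd()` is a hypothetical helper, not part of the original analysis, and the incomes are toy values.

```r
# Returns TRUE if sample `a` second order stochastically dominates sample `b`,
# i.e. the running integral of (F_b - F_a) is never negative.
sosd <- function(a, b, tol = 1e-9) {
  grid <- sort(unique(c(a, b)))          # every point where either ECDF jumps
  gap <- ecdf(b)(grid) - ecdf(a)(grid)   # F_b - F_a, constant between grid points
  widths <- diff(grid)                   # length of each interval between jumps
  running_area <- cumsum(gap[-length(gap)] * widths)
  all(running_area >= -tol)
}

# Toy check: same mean, less spread, so the first sample should dominate.
narrow <- c(40000, 50000, 60000)
wide   <- c(20000, 50000, 80000)
sosd(narrow, wide)
sosd(wide, narrow)
```

For the toy samples the first call returns `TRUE` and the second `FALSE`. Applied to the two groups of `exp(predictions)` it would give a direct yes or no answer instead of an eyeball comparison of areas a and b.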

The takeaway:

By costlessly displacing over a third of the labour market the social planner was able to marginally improve social outcomes, in the sense that any risk neutral or risk averse individual behind the veil of ignorance would prefer the government to intervene. In reality, switching occupations is not without cost, so a more realistic intervention would likely focus on those entering the labour market. Thus the above exercise puts an upper bound on what we can hope to achieve by improving labour market skill matching.

Does skill mis-match influence whether someone works full time full year?

In the analysis above we only considered those who were working full year full time in 2020: here we look at whether skills mis-match causes a difference in the odds of working full year full time. We perform the following filtering of the data prior to analyzing the impact of skills mis-match on the odds of working full year full time.

colnames(filtering_info2) <- c("Filter applied","Observations left")
my_dt(filtering_info2)

The model:

So what we are after is the causal impact of a mis-match in skills (proxied by distance) on the probability of being employed full year full time. Given that we are using observational data, we must lean on the Conditional Independence Assumption: i.e. we must assume that distance is “as good as randomly assigned” when we condition on all the variables that influence both distance and whether employed full time full year. We fit the model

\[logit(Employed~FT~FY)=Age+Highest~Degree+Language+Gender+Ethnicity+Occupation+Field~of~Study+distance+\mu\]

The plot below shows (most of the) regression results: Language and Gender estimates can be found in the regression table in the appendix. From the results you can see that the probability of working full year full time is:

significantly higher for every age group when compared to 20-24 year olds.
higher, though not significantly so in the regression table, for techy fields of study, and significantly lower for arts and humanities.
significantly higher for most levels of highest degree attained, with exceptions such as Medicine, dentistry…
significantly higher for Filipino, and significantly lower for West Asian, Japanese and Korean.
significantly higher for professional occupations, and significantly lower for a broad range of occupations.

But the main result we are after is how distance affects the probability of being employed full time full year: the probability of working full year full time decreases by 0.5 percentage points for every one unit increase in distance. In terms of its economic relevance, going from the 10th percentile of distance 3.71 (relatively well matched) to the 90th percentile of distance 13.67 (relatively poorly matched) causes a 5.4 percentage point reduction in the probability of being employed full time full year, ceteris paribus.

Plot of regression results:

margins_mod2|>
  mutate(variable=str_remove_all(variable, "`"),
         level= if_else(is.na(level), variable, level),
         level=str_trunc(level, 50)
         )|>
  filter(!variable %in% c("(Intercept)",
                          "Language vs. not english",
                          "Gender vs. Woman+"
                          ))|>
  arrange(variable, level)|>
  ggplot(aes(AME,
            reorder_within(level, within=variable, by=AME),
                     xmin=lower,
                     xmax=upper))+
  geom_vline(xintercept = 0, col="grey", lty=2)+
  geom_point(size=.5)+
  geom_errorbarh(height=0)+
  facet_wrap(~variable, scales = "free")+
  scale_y_reordered()+
  scale_x_continuous(labels=scales::percent)+
  labs(x=NULL,
       y=NULL)+
  theme_minimal()

Assessing the Logit Model:

We use the logit model to form predictions based on the test data. These predictions are converted to probabilities of being employed full time full year, and then rounded to either zero or one. We then can look at the out of sample prediction accuracy of the model via a confusion matrix:

confusion

                   prediction
full_time_full_year    0    1
                  0  610  812
                  1  340 1736

From the confusion matrix we can see the model predicts full time full year employment with 67% accuracy: (610 + 1736) / 3498. We can test the correspondence between the observed and predicted using Pearson’s Chi-squared test with Yates’ continuity correction, i.e. what is the probability that we would get an association this strong between prediction and actual if the null hypothesis (no relationship) is true? \[p \approx 6.36 \times 10^{-67}\] Let’s compare this result to the null model, where we randomly assign either a zero or a one to each observation in the test set using the probabilities from the training set.

null_confusion

                   null_prediction
full_time_full_year    0    1
                  0  589  833
                  1  827 1249

The null model predicts full time full year employment with 53% accuracy: (589 + 1249) / 3498. Again, we can use Pearson’s Chi-squared test with Yates’ continuity correction to assess the probability that we would get a result this extreme if the null hypothesis is true: p = 0.367.
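Both accuracy figures and both p-values can be recomputed directly from the two confusion matrices as printed. The sketch below re-enters the counts by hand, so the object names (`confusion_tbl`, `null_tbl`, `accuracy()`) are our own and not those in the source code.

```r
# Confusion matrices copied from the output above (rows = actual, columns = predicted).
confusion_tbl <- matrix(c(610,  812,
                          340, 1736),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(actual = c("0", "1"), prediction = c("0", "1")))
null_tbl <- matrix(c(589,  833,
                     827, 1249),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(actual = c("0", "1"), prediction = c("0", "1")))

# Accuracy is the share of observations on the diagonal.
accuracy <- function(tbl) sum(diag(tbl)) / sum(tbl)
acc_model <- accuracy(confusion_tbl)
acc_null  <- accuracy(null_tbl)

# Pearson's Chi-squared test with Yates' continuity correction (applied to 2x2 tables).
p_model <- chisq.test(confusion_tbl, correct = TRUE)$p.value
p_null  <- chisq.test(null_tbl, correct = TRUE)$p.value

c(acc_model = acc_model, acc_null = acc_null, p_model = p_model, p_null = p_null)
```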

Appendix:

For those of you who like regression tables… Note that the logit results are coefficients on the log-odds scale (not probabilities, and not odds ratios until exponentiated).
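As a quick illustration of reading a logit coefficient, the sketch below converts the distance coefficient from column (2) into an odds ratio, and then into a change in probability at an assumed baseline. The baseline of 0.6 is purely illustrative and is not taken from the data.

```r
logit_coef <- -0.026            # distance coefficient from column (2) of the table
odds_ratio <- exp(logit_coef)   # multiplicative change in the odds per unit of distance

# Converting to a probability needs a baseline; 0.6 here is an assumed value.
p0 <- 0.6
p1 <- plogis(qlogis(p0) + logit_coef)

c(odds_ratio = odds_ratio, change_in_probability = p1 - p0)
```

The change in probability depends on the baseline chosen, which is why the plot in the main text reports average marginal effects rather than raw coefficients.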

stargazer::stargazer(mod1, mod2, type = "html",
          se = list(robust_se1, robust_se2))


	Dependent variable:

	log_income	full_time_full_year
	OLS	logistic
	(1)	(2)

`Age vs. 20-24:`25 to 29 years	0.182^***	0.956^***
	(0.021)	(0.066)

`Age vs. 20-24:`30 to 34 years	0.345^***	1.201^***
	(0.021)	(0.065)

`Age vs. 20-24:`35 to 39 years	0.442^***	1.286^***
	(0.021)	(0.066)

`Age vs. 20-24:`40 to 44 years	0.507^***	1.526^***
	(0.022)	(0.069)

`Age vs. 20-24:`45 to 49 years	0.535^***	1.638^***
	(0.022)	(0.069)

`Age vs. 20-24:`50 to 54 years	0.561^***	1.677^***
	(0.022)	(0.070)

`Age vs. 20-24:`55 to 59 years	0.523^***	1.463^***
	(0.022)	(0.070)

`Age vs. 20-24:`60 to 64 years	0.511^***	1.125^***
	(0.024)	(0.075)

`Age vs. 20-24:`65 to 69 years	0.489^***	0.419^***
	(0.033)	(0.089)

`Age vs. 20-24:`Unknown	0.424^***	0.751^***
	(0.106)	(0.270)

`Degree vs. non-apprentice:`Apprenticeship certificate	0.075^***	0.117
	(0.022)	(0.076)

`Degree vs. non-apprentice:`Less than 1 year College	0.029	0.112
	(0.021)	(0.075)

`Degree vs. non-apprentice:`1-2 years of College	0.096^***	0.254^***
	(0.020)	(0.070)

`Degree vs. non-apprentice:`More than 2 years of College	0.108^***	0.201^***
	(0.022)	(0.077)

`Degree vs. non-apprentice:`University certificate or diploma	0.091^***	0.222^***
	(0.023)	(0.080)

`Degree vs. non-apprentice:`Bachelor’s degree	0.220^***	0.332^***
	(0.020)	(0.069)

`Degree vs. non-apprentice:`University diploma above bachelor level	0.272^***	0.237^**
	(0.027)	(0.095)

`Degree vs. non-apprentice:`Medicine, dentistry, veterinary, optometry	0.319^***	0.033
	(0.070)	(0.162)

`Degree vs. non-apprentice:`Master’s degree	0.296^***	0.203^***
	(0.022)	(0.077)

`Degree vs. non-apprentice:`PhD	0.473^***	0.450^***
	(0.036)	(0.131)

`Degree vs. non-apprentice:`Unknown	0.182^**	-0.265
	(0.079)	(0.217)

`Language vs. not english:`English	0.148^***	0.224^***
	(0.011)	(0.039)

`Language vs. not english:`Unknown	-0.404	-0.710
	(0.256)	(0.526)

`Gender vs. Woman+:`Man+	0.187^***	0.514^***
	(0.008)	(0.030)

`Ethnicity vs. White:`South Asian	-0.067^***	0.062
	(0.014)	(0.050)

`Ethnicity vs. White:`Chinese	-0.037^***	0.006
	(0.012)	(0.046)

`Ethnicity vs. White:`Black	-0.138^***	-0.167
	(0.039)	(0.139)

`Ethnicity vs. White:`Filipino	-0.167^***	0.276^***
	(0.015)	(0.063)

`Ethnicity vs. White:`Arab	-0.001	-0.270
	(0.064)	(0.199)

`Ethnicity vs. White:`Latin American	-0.104^***	-0.192^*
	(0.028)	(0.100)

`Ethnicity vs. White:`Southeast Asian	-0.083^***	-0.172
	(0.031)	(0.120)

`Ethnicity vs. White:`West Asian	-0.195^***	-0.387^***
	(0.031)	(0.103)

`Ethnicity vs. White:`Korean	-0.163^***	-0.421^***
	(0.030)	(0.102)

`Ethnicity vs. White:`Japanese	-0.034	-0.389^***
	(0.043)	(0.135)

`Ethnicity vs. White:`Other population groups, n.i.e.	-0.139^***	-0.575^***
	(0.053)	(0.206)

`Ethnicity vs. White:`Other multiple population groups	-0.054^***	0.042
	(0.019)	(0.071)

`Ethnicity vs. White:`Indigenous peoples	-0.060^***	-0.166^**
	(0.018)	(0.065)

`Ethnicity vs. White:`Unknown	-0.132^***	-0.148^**
	(0.020)	(0.069)

`NOC vs. Admin and financ...:`Administrative and financial support and supply chain logistics occupations	-0.101^***	-0.424^***
	(0.022)	(0.097)

`NOC vs. Admin and financ...:`Administrative occupations and transportation logistics occupations	-0.083^***	-0.269^***
	(0.020)	(0.092)

`NOC vs. Admin and financ...:`Assisting occupations in support of health services	-0.055^**	-0.509^***
	(0.026)	(0.109)

`NOC vs. Admin and financ...:`Assisting occupations, care providers, student monitors, crossing guards and related occupations in education and in legal and public protection	0.060	-1.202^***
	(0.042)	(0.147)

`NOC vs. Admin and financ...:`Frontline public protection services and paraprofessional occupations in legal, social, community, education services	0.030	-0.430^***
	(0.022)	(0.094)

`NOC vs. Admin and financ...:`General trades	-0.052^*	-0.549^***
	(0.028)	(0.118)

`NOC vs. Admin and financ...:`Helpers and labourers and other transport drivers, operators and labourers	0.006	-0.727^***
	(0.038)	(0.126)

`NOC vs. Admin and financ...:`Mail and message distribution, other transport equipment operators and related maintenance workers	-0.095^**	-0.450^**
	(0.045)	(0.193)

`NOC vs. Admin and financ...:`Middle management occupations	0.304^***	0.413^***
	(0.022)	(0.095)

`NOC vs. Admin and financ...:`Occupations in natural resources, agriculture and related production	0.060	-1.379^***
	(0.054)	(0.154)

`NOC vs. Admin and financ...:`Occupations in processing, manufacturing and utilities	0.059^*	-0.415^***
	(0.032)	(0.119)

`NOC vs. Admin and financ...:`Occupations in sales and services	-0.063^*	-0.776^***
	(0.036)	(0.107)

`NOC vs. Admin and financ...:`Other occupations in art, culture and sport	-0.039	-1.633^***
	(0.049)	(0.166)

`NOC vs. Admin and financ...:`Professional and technical occupations in art, culture and sport	0.164^***	-0.518^***
	(0.028)	(0.112)

`NOC vs. Admin and financ...:`Professional occupations in business and finance	0.223^***	0.191^**
	(0.022)	(0.094)

`NOC vs. Admin and financ...:`Professional occupations in health	0.375^***	-0.295^***
	(0.024)	(0.101)

`NOC vs. Admin and financ...:`Professional occupations in law, education, social, community and government services	0.130^***	0.153^*
	(0.021)	(0.090)

`NOC vs. Admin and financ...:`Professional occupations in natural and applied sciences	0.278^***	0.432^***
	(0.023)	(0.101)

`NOC vs. Admin and financ...:`Retail sales and service supervisors and specialized occupations in sales and services	-0.059^*	-0.404^***
	(0.031)	(0.115)

`NOC vs. Admin and financ...:`Sales and service representatives and other customer and personal services occupations	-0.157^***	-1.168^***
	(0.025)	(0.090)

`NOC vs. Admin and financ...:`Sales and service support occupations	-0.260^***	-1.320^***
	(0.030)	(0.106)

`NOC vs. Admin and financ...:`Technical occupations in health	0.076^***	-0.541^***
	(0.027)	(0.111)

`NOC vs. Admin and financ...:`Technical occupations related to natural and applied sciences	0.030	-0.104
	(0.022)	(0.104)

`NOC vs. Admin and financ...:`Technical trades and transportation officers and controllers	0.106^***	-0.458^***
	(0.022)	(0.095)

`CIP vs. Agriculture...:`Architecture, engineering, and related trades	0.147^***	0.035
	(0.021)	(0.094)

`CIP vs. Agriculture...:`Business, management and public administration	0.083^***	-0.048
	(0.021)	(0.092)

`CIP vs. Agriculture...:`Education	-0.079^***	-0.150
	(0.024)	(0.104)

`CIP vs. Agriculture...:`Health and related fields	0.048^**	-0.212^**
	(0.023)	(0.097)

`CIP vs. Agriculture...:`Humanities	-0.078^***	-0.364^***
	(0.025)	(0.102)

`CIP vs. Agriculture...:`Mathematics, computer and information sciences	0.155^***	0.068
	(0.025)	(0.106)

`CIP vs. Agriculture...:`Personal, protective and transportation services	0.081^***	-0.239^**
	(0.027)	(0.106)

`CIP vs. Agriculture...:`Physical and life sciences and technologies	0.040	-0.028
	(0.025)	(0.106)

`CIP vs. Agriculture...:`Social and behavioural sciences and law	0.047^**	-0.197^**
	(0.022)	(0.094)

`CIP vs. Agriculture...:`Visual and performing arts, and communications technologies	-0.055^**	-0.410^***
	(0.027)	(0.108)

distance	-0.008^***	-0.026^***
	(0.001)	(0.005)

Constant	10.309^***	-0.806^***
	(0.039)	(0.149)


Observations	18,228	31,433
R²	0.301
Adjusted R²	0.298
Log Likelihood		-18,848.930
Akaike Inf. Crit.		37,845.860
Residual Std. Error	0.444 (df = 18154)
F Statistic	107.239^*** (df = 73; 18154)

Note:	^*p<0.1; ^**p<0.05; ^***p<0.01

The impact of Skill mis-match on labour market outcomes

Source code:

Richard Martin

2024-02-21
