Introduction

New registrations are a leading indicator for level one trades seats. The goal of this exercise is to produce a two-year forecast of level one trades seats based on:

  1. an exponential smoothing forecast of the “at risk” population: new registrations where, at the time of registration, the individual had not yet completed level one.

  2. the historical probability distribution of the delay between registration and completion of level one. This probability distribution is based on the Kaplan-Meier estimator of the survival function.

The data cover all individuals who have registered for the trades since 2016. For each individual we use the date of registration, the date that level one was completed (possibly missing), and the functional trades group associated with the registration.

Forecasting those at risk

To begin, we filter out all new registrants who, at the time of registration, had already completed level one. We then aggregate the remaining “at risk” registrations by month of registration, creating a time series of the arrival of “at risk” registrants. An automated exponential smoothing algorithm chooses the best model (in terms of AIC) for each of the functional trades groups. These models are used to create a two-year forecast of the arrival of new “at risk” registrants.
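The filtering, monthly aggregation, and smoothing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for the report: the records are hypothetical, the function names are made up for this sketch, and simple exponential smoothing with a grid search over the smoothing weight stands in for the automated, AIC-based model selection described above.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (registration date, level one completion date or None).
registrations = [
    (date(2023, 1, 5), None),
    (date(2023, 1, 20), date(2022, 11, 1)),  # completed before registering: not at risk
    (date(2023, 1, 25), date(2023, 9, 1)),
    (date(2023, 2, 3), None),
    (date(2023, 3, 9), date(2024, 1, 15)),
    (date(2023, 3, 30), None),
]

def at_risk_by_month(records):
    """Count registrations per (year, month), dropping prior completers."""
    counts = Counter()
    for registered, completed in records:
        # Assumption: "already completed" means a completion date on or
        # before the registration date.
        if completed is not None and completed <= registered:
            continue
        counts[(registered.year, registered.month)] += 1
    return dict(sorted(counts.items()))

def ses_sse(series, alpha):
    """Simple exponential smoothing: return (final level, sum of squared one-step errors)."""
    level = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - level) ** 2
        level = alpha * y + (1 - alpha) * level
    return level, sse

def ses_forecast(series, horizon, alphas=None):
    """Choose alpha by grid search on SSE and return a flat forecast."""
    alphas = alphas or [i / 20 for i in range(1, 20)]
    best_alpha = min(alphas, key=lambda a: ses_sse(series, a)[1])
    level, _ = ses_sse(series, best_alpha)
    return best_alpha, [level] * horizon

monthly = at_risk_by_month(registrations)
alpha, forecast = ses_forecast(list(monthly.values()), horizon=24)
```

Simple exponential smoothing has no trend or seasonal component, so its forecast is flat by construction; the automated procedure in the report can also select models with trend, seasonality, or multiplicative errors.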

A number of the series appear to be white noise, resulting in a flat forecast at (roughly) the most recent observed values. The shaded blue areas indicate 80% and 95% prediction intervals for the forecast. If errors are additive then the intervals have constant width, whereas multiplicative errors result in cone-shaped intervals. Three of the trades groups feature seasonality in the arrival of “at risk” registrants.

Survival Curves

Once we have an idea of the arrival rate of “at risk” individuals, the next step is to determine how long the delay between registration and completion of level one typically is. For this task we use survival analysis, where the terminology reflects its typical use: “survival” indicates the individual has yet to complete level one, and “death” indicates completion of level one. Using the historical data we derive survival curves with the Kaplan-Meier estimator. From the probability of survival (plotted below) and the hazard rate (the probability of death in a given period, conditional on survival up to that period) we can derive the joint probability of surviving up to a period and dying in it, which is the product of the two. This joint probability distribution can then be applied to the “at risk” population.
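As a rough illustration of how the hazard, survival, and joint probabilities relate, here is a minimal Kaplan-Meier sketch on discrete durations. The function name, the input layout, and the sample data are assumptions made for this example; the report's own estimates come from the full registration data.

```python
def kaplan_meier(durations, completed):
    """Kaplan-Meier estimate on discrete durations (e.g. months since registration).

    durations[i]: time from registration to completion, or to the end of the
                  data window if level one has not been completed (censored).
    completed[i]: True if level one was completed at durations[i].
    Returns one (t, hazard, survival_before_t, joint) tuple per completion time t.
    """
    rows = []
    survival = 1.0
    event_times = sorted({d for d, c in zip(durations, completed) if c})
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, c in zip(durations, completed) if c and d == t)
        hazard = events / at_risk
        joint = survival * hazard   # P(survive up to t and complete at t)
        rows.append((t, hazard, survival, joint))
        survival *= 1 - hazard      # survival just after t
    return rows

# Hypothetical sample: ten registrants, five of whom never complete (censored).
durations = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
completed = [True, True, True, True, True, False, False, False, False, False]
rows = kaplan_meier(durations, completed)
total_completion_probability = sum(joint for _, _, _, joint in rows)
```

In this made-up sample the joint probabilities sum to about 0.5, meaning half of the registrants are expected to complete level one at some point.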

The first thing to note is that only about 50 percent of those at risk appear to end up completing level one, and those who do typically complete it within two years of registration. The survival curves for most of the functional trades groups look similar, with the exception of “Aircraft and Rail trades” and “Industrial and Heavy equipment”. For “Aircraft and Rail trades”, only 3 out of 547 new registrants have completed level one. For “Industrial and Heavy equipment”, completion of level one follows registration almost immediately.

Toy Example:

Here we look at a simple example to see how to derive the joint probability distribution for completion dates, and how to apply these probabilities to the number of “at risk” individuals to derive expected level one completions by year.

Suppose that

  1. time is measured in years (rather than months),
  2. in 2020 there were 100 new registrants at risk of completing level one, whereas in 2021 there were 200, and
  3. the hazard rate is 0.3 for the first two years post registration, and 0 thereafter.

Year of Registration: 2020

                          2021   2022   2023
at risk cohort             100    100    100
hazard rate                0.3    0.3    0.0
probability of survival   1.00   0.70   0.49
joint probability         0.30   0.21   0.00
level one completions       30     21      0

Year of Registration: 2021

                          2022   2023   2024
at risk cohort             200    200    200
hazard rate                0.3    0.3    0.0
probability of survival   1.00   0.70   0.49
joint probability         0.30   0.21   0.00
level one completions       60     42      0

So in this example \(51\%\) of new registrants complete level one: \(30\%\) complete in the first year post registration, with the remaining \(21\%\) completing in the second year. In this simple two-cohort example, level one seat completions would be \(30\) in 2021 (only the 2020 cohort), \(21+60=81\) in 2022 (both cohorts), and \(42\) in 2023 (only the 2021 cohort).
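The toy calculation can be reproduced with a few lines of code. This is only a sketch of the arithmetic above; the function names are made up for illustration, and the timing convention (completions start the year after registration) is taken from the toy example.

```python
def joint_probabilities(hazards):
    """Joint probability of surviving up to each period and completing in it."""
    survival = 1.0
    joint = []
    for h in hazards:
        joint.append(survival * h)
        survival *= 1 - h
    return joint

def expected_completions(cohorts, hazards):
    """Spread each registration cohort over later years and sum by calendar year.

    cohorts: {registration year: number of at-risk registrants}
    hazards: hazard rate for years 1, 2, ... after registration
    """
    joint = joint_probabilities(hazards)
    totals = {}
    for reg_year, size in cohorts.items():
        for lag, p in enumerate(joint, start=1):
            year = reg_year + lag
            totals[year] = totals.get(year, 0.0) + size * p
    return dict(sorted(totals.items()))

completions = expected_completions({2020: 100, 2021: 200}, [0.3, 0.3, 0.0])
print({year: round(n) for year, n in completions.items()})
# {2021: 30, 2022: 81, 2023: 42, 2024: 0}
```

With monthly data the same summation applies, with one cohort per month of registration and one joint probability per month of delay.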

The seats forecasts:

Applying the above methodology we derive the following level one seats forecasts. Notable series include:

  1. “Structural…”, “Metal”, “Electrical” and “Automotive”, all of which show strong growth over the next two years even though their new registration forecasts are flat: what is driving this growth in level one seats is the recent spike in new registrations for these functional trades groups.
  2. “Aircraft and Rail trades”, where the seats forecast is less than one twentieth of a person per month.
  3. “Industrial…”, where the flat forecast for new registrations combines with the steep survival curve to yield a flat forecast for level one seats.

Back testing:

We can compare the model-based expected seats with the observed seat completions to get an idea of how well the model predicts historical seat completions. We begin by aggregating the data by year. Not surprisingly, prediction is more accurate for the more popular trades, in the same way that the proportion of heads from a million coin flips can be predicted more accurately than the outcome of a single coin flip. Note that forecast performance will likely be worse than the backcast performance: the backcast of level one seats is based on observed new registrations, whereas the forecast is based (partially) on a forecast of new registrations, which is a further source of error.
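The coin flip analogy can be made concrete with the standard error of a binomial proportion, \(\sqrt{p(1-p)/n}\), which shrinks as the number of trials grows. The snippet below is an illustrative sketch only; the function name is made up, and independent trials with a common probability are assumed, which real registrants only approximate.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of an observed proportion from n independent trials."""
    return math.sqrt(p * (1 - p) / n)

# Fair coin: the uncertainty in the proportion of heads falls with sqrt(n).
for n in (1, 100, 1_000_000):
    print(n, proportion_standard_error(0.5, n))
```

Under this assumption, a trade with 100 times as many at-risk registrants has roughly one tenth of the relative uncertainty in its completion rate.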

The results: