New registrations are a leading indicator for level one trades seats. The goal of this exercise is to produce a two-year forecast for level one trades seats based on:

- an exponential smoothing forecast of the “at risk” population: new registrations where, at the time of registration, the individual had not yet completed level one.
- the historic probability distribution of the delay between registration and completion of level one. This probability distribution is based on the Kaplan-Meier estimator of the survival function.

The data cover all individuals who have registered for the trades since 2016. For each individual we use the date of registration, the date that level one was completed (possibly missing), and the functional trades group associated with the registration.
To begin, we filter out all new registrants who, at the time of registration, had already completed level one. We then aggregate the remaining “at risk” registrations by month of registration, creating a time series for the arrival of “at risk” registrants. An automated exponential smoothing algorithm chooses the best model (the one with the lowest AIC) for each of the functional trades groups. These models are used to create a two-year forecast for the arrival of new “at risk” registrants.
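The filtering and aggregation step can be sketched as follows. This is a minimal illustration, not the original pipeline: the record layout, the sample rows, the group names, and the function name `at_risk_monthly_counts` are all hypothetical. It also assumes that a completion dated on the registration day still counts as “at risk”.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (registration date, level one completion date or None, trades group).
registrations = [
    (date(2016, 1, 15), None, "Electrical"),
    (date(2016, 1, 20), date(2015, 6, 1), "Electrical"),  # completed before registering
    (date(2016, 2, 3), date(2017, 3, 10), "Electrical"),
    (date(2016, 2, 9), None, "Welding"),
]


def at_risk_monthly_counts(records):
    """Count "at risk" registrations per (group, year, month) of registration."""
    counts = Counter()
    for registered, completed, group in records:
        # Assumption: "at risk" means no completion date, or a completion
        # date on or after the registration date.
        if completed is None or completed >= registered:
            counts[(group, registered.year, registered.month)] += 1
    return counts


monthly = at_risk_monthly_counts(registrations)
for key in sorted(monthly):
    print(key, monthly[key])
```

Each trades group then yields its own monthly series, which is the input to the smoothing model.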
A number of the series appear to be white noise, resulting in a flat forecast at (roughly) the most recent observed values. The shaded blue areas indicate the 80% and 95% forecast confidence intervals. If errors are additive, the confidence intervals are of roughly constant width, whereas multiplicative errors result in a cone-shaped confidence interval. Three of the trades groups feature seasonality in the arrival of “at risk” registrants.
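To see why a series without trend or seasonality yields a flat forecast, consider simple exponential smoothing, the most basic member of the exponential smoothing family. The sketch below is a simplified stand-in and not the automated AIC-based selection described above. It picks the smoothing weight by minimising the squared one-step errors over a coarse grid; the sample series and the function names are hypothetical.

```python
def ses_fit(series, alpha):
    """Return (final level, sum of squared one-step errors) for a given alpha."""
    level = series[0]  # assumption: initialise the level at the first observation
    sse = 0.0
    for y in series[1:]:
        error = y - level       # one-step-ahead forecast error
        sse += error ** 2
        level += alpha * error  # same as alpha * y + (1 - alpha) * level
    return level, sse


def ses_forecast(series, horizon, alphas=None):
    """Pick alpha on a grid by minimum SSE and return (alpha, flat forecast)."""
    if alphas is None:
        alphas = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    best_alpha = min(alphas, key=lambda a: ses_fit(series, a)[1])
    level, _ = ses_fit(series, best_alpha)
    # With no trend or seasonal component, every horizon gets the final level.
    return best_alpha, [level] * horizon


# Hypothetical monthly counts of "at risk" registrants for one trades group.
series = [40, 42, 38, 41, 39, 40, 43, 37]
alpha, forecast = ses_forecast(series, horizon=24)
print(alpha, forecast[0], forecast[-1])
```

The forecast is the same value at every horizon, a weighted average of past observations that leans towards recent values as the smoothing weight grows.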
Once we have an idea of the arrival rate of “at risk” individuals, the next step is to work out how long the delay between registration and completion of level one typically is. For this task we use survival analysis, where the terminology reflects its typical use: “survival” indicates the individual has yet to complete level one, and “death” indicates completion of level one. Using the historical data we derive survival curves with the Kaplan-Meier estimator. With the probability of survival (plotted below) and the hazard rate (the instantaneous probability of death conditional on survival thus far) we can derive the joint probability distribution of death, that is, the probability of surviving up to a given period and then dying in it (the survival probability multiplied by the hazard rate). This distribution can then be applied to the “at risk” population.
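A minimal discrete-time sketch of the Kaplan-Meier calculation may help fix ideas. It assumes durations are measured in whole periods since registration and that individuals who have not yet completed level one are right-censored at their last observed period. The sample data and the function name `kaplan_meier` are hypothetical.

```python
def kaplan_meier(durations, completed):
    """Kaplan-Meier estimate on discrete durations.

    durations: periods from registration to completion or to censoring.
    completed: True if level one was completed at that duration, False if censored.
    Returns (rows, final_survival). Each row holds the hazard, the survival
    probability at the start of the period, and their product (the joint
    probability of completing in that period).
    """
    rows = []
    survival = 1.0
    event_times = sorted({d for d, c in zip(durations, completed) if c})
    for t in event_times:
        n_at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, c in zip(durations, completed) if c and d == t)
        hazard = n_events / n_at_risk
        rows.append({"time": t, "hazard": hazard,
                     "survival": survival, "joint": survival * hazard})
        survival *= 1.0 - hazard  # survival carried into the next period
    return rows, survival


# Hypothetical data: 10 registrants, 5 complete level one, 5 are censored at period 3.
durations = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3]
completed = [True] * 5 + [False] * 5
rows, never_completed = kaplan_meier(durations, completed)
for row in rows:
    print(row)
print("still surviving after the last event:", never_completed)
```

The joint probabilities plus the final survival probability sum to one, so the joint column is a proper distribution over completion periods with the remaining mass on “never completes”.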
The first thing to note is that only about 50 percent of those at risk appear to complete level one, and those who do typically complete it within two years of registration. The survival curves for most of the functional trades groups look similar, with the exception of “Aircraft and Rail trades” and “Industrial and Heavy equipment”. For “Aircraft and Rail trades”, only 3 out of 547 new registrants have completed level one. For “Industrial and Heavy equipment”, completion of level one follows registration almost immediately.
Here we look at a simple example to see how to derive the joint probability distribution for completion dates, and how to apply these probabilities to the number of “at risk” individuals to derive level one completion dates.
Suppose that 100 “at risk” individuals register in 2020 and 200 register in 2021, and that both cohorts face the same hazard rates in the three years following registration. For the 2020 cohort:

| | 2021 | 2022 | 2023 |
|---|---|---|---|
| at risk cohort | 100 | 100 | 100 |
| hazard rate | 0.3 | 0.3 | 0.0 |
| probability of survival | 1.00 | 0.70 | 0.49 |
| joint probability | 0.30 | 0.21 | 0.00 |
| level one completions | 30 | 21 | 0 |

For the 2021 cohort:

| | 2022 | 2023 | 2024 |
|---|---|---|---|
| at risk cohort | 200 | 200 | 200 |
| hazard rate | 0.3 | 0.3 | 0.0 |
| probability of survival | 1.00 | 0.70 | 0.49 |
| joint probability | 0.30 | 0.21 | 0.00 |
| level one completions | 60 | 42 | 0 |

So in this example \(51\%\) of new registrants complete level one: \(30\%\) complete in the first year after registration, and the remaining \(21\%\) complete in the second year. In this simple two-cohort example, level one seat completions would be \(30\) in 2021 (only the 2020 cohort), \(21+60=81\) in 2022 (both cohorts), and \(42\) in 2023 (only the 2021 cohort).
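The arithmetic of the two-cohort example can be reproduced with a short sketch. The cohort sizes and hazard rates are those of the example; the function names are hypothetical, and the code assumes that the first hazard applies to the year after registration.

```python
from collections import defaultdict

hazards = [0.3, 0.3, 0.0]        # hazard rate in years 1, 2, 3 after registration
cohorts = {2020: 100, 2021: 200}  # registration year -> "at risk" cohort size


def joint_probabilities(hazards):
    """Probability of completing in each year: survival so far times hazard."""
    joint = []
    survival = 1.0
    for h in hazards:
        joint.append(survival * h)
        survival *= 1.0 - h
    return joint


def expected_completions(cohorts, joint):
    """Sum expected completions over cohorts, indexed by calendar year."""
    totals = defaultdict(float)
    for registration_year, size in cohorts.items():
        for lag, p in enumerate(joint, start=1):
            totals[registration_year + lag] += size * p
    return dict(sorted(totals.items()))


completions = expected_completions(cohorts, joint_probabilities(hazards))
for year, seats in completions.items():
    print(year, round(seats, 2))
```

The same loop extends to monthly cohorts and to forecast cohorts: each cohort contributes to later periods in proportion to the joint probabilities.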
Applying the above methodology we derive the following level one seats forecasts. Notable series include:
We can compare the model-based expected seats with the observed seat completions to get an idea of how well the model predicts historic seat completions. We begin by aggregating the data by year. Not surprisingly, prediction is more accurate for the more popular trades, just as the proportion of heads from a million coin flips can be predicted more accurately than the outcome of a single flip. Note that forecast performance will likely be worse than backcast performance: the backcast of level one seats is based on observed new registrations, whereas the forecast is based (partially) on a forecast of new registrations, which is a further source of error.
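The coin-flip intuition can be made concrete with a rough calculation. Assume (this is an illustrative assumption, not something the model above imposes) that each of \(n\) registrants completes level one independently with probability \(p\). The number of completions is then binomial, with mean \(np\) and standard deviation \(\sqrt{np(1-p)}\), so the error relative to the expected count is

\[
\frac{\sqrt{np(1-p)}}{np} = \sqrt{\frac{1-p}{np}}.
\]

With \(p = 0.51\), a cohort of \(n = 100\) gives a relative error of about \(0.098\), whereas \(n = 10{,}000\) gives about \(0.0098\): a hundredfold increase in cohort size shrinks the relative error tenfold.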