Forecasting Midterm (Final)

1. Introduction

In recent years, the U.S. labor market has largely flourished according to broad indicators such as the overall unemployment rate and GDP growth. However, these aggregate measures can mask the difficulties faced by particular demographic groups, such as young college graduates entering the workforce. Despite having earned a college degree, young adults can experience higher unemployment rates than the general population if their skills do not match the demands of employers. Compounding the problem, employment for young graduates is especially vulnerable during economic downturns or periods of labor market transition because they have less work experience. To better understand this challenge, this report focuses on the unemployment rate for college graduates aged 25 to 34. The primary objectives are to examine recent trends in unemployment for this group, apply time series forecasting methods to project future unemployment levels, and interpret what these forecasts suggest about the outlook for young professionals. Forecasting this labor market outcome is important because it informs policymakers, educators, and job seekers about the likely trajectory of employment opportunities and could help support successful transitions from college to career. Studies also suggest that college-educated adults disproportionately influence economic growth and consumption, a contribution that could diminish if unemployment in this group rises. The unemployment rate for college graduates aged 25 to 34 is therefore an essential indicator of the health of the labor market for entrants with higher education. It measures the percentage of people in the labor force aged 25 to 34 whose highest level of education is a bachelor’s degree who are actively seeking work but unable to find employment.
Analyzing this measure is important because it reveals the extent to which the economy is able to utilize skilled labor, potentially highlighting mismatches between educational attainment and labor market demand. The dataset used in this project spans 2000 to 2025, but the report focuses on the most recent 10 years (June 2015 onward) to keep the forecasts and analysis as relevant as possible. Although unemployment is influenced by macroeconomic forces such as business cycles, structural shifts, and policy changes, forecasting this indicator can still provide valuable insights. A rise in unemployment among young graduates can signal economic distress or inefficiencies in the labor market, while a decline often reflects strong job creation and economic stability. Either way, the unemployment rate of these skilled workers has significant implications for future economic activity, which makes it worth forecasting.

2. Literature Review

The most relevant research on unemployment among young college graduates comes from Oxford Economics, which analyzes a structural shift in white-collar hiring driven primarily by artificial intelligence [1]. At the same time, labor force participation rates for this demographic have not declined, which may signal a mismatch between graduates’ skills and employer needs. The NACE Job Outlook 2025 Spring Update uses a survey-based forecasting method, relying on employer-reported hiring intentions to project labor market conditions for recent graduates. While this method offers important insight into employer expectations, it is prone to bias in respondents’ outlooks and does not use time series modeling or historical employment trends, which may limit its predictive accuracy compared with data-driven forecasting techniques [2]. Brookings has also published research on “A New Approach to Forecasting Unemployment”, which forecasts the unemployment rate using “worker flows”, the rates at which “people are moving into and out of unemployment”. Because the unemployment rate tends to converge toward the level implied by these flows, this approach may yield more accurate forecasts than traditional models [3]. Most existing research on forecasting unemployment concerns the aggregate unemployment rate, and only a few studies focus on young college graduates. Even the research that does address this group relies largely on survey-based forecasting, which does not fully exploit historical data that is less susceptible to bias and personal judgment. This project attempts to fill these gaps by focusing solely on the unemployment rate of college graduates aged 25 to 34 and applying several time series models to produce statistically supported forecasts.
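The worker-flows idea can be illustrated with the standard two-state flow model. This is an assumption for illustration, not necessarily the exact specification used in [3]. Suppose a share s of employed workers loses their jobs each month and a share f of unemployed workers finds jobs. Then the unemployment rate evolves as u[t+1] = u[t] + s(1 - u[t]) - f u[t] and converges to the flow steady state u* = s / (s + f). The rates below are made-up values, not estimates for college graduates.

```r
# Two-state worker-flow model (illustrative sketch with hypothetical rates).
# s: monthly separation rate (share of employed workers who become unemployed)
# f: monthly job-finding rate (share of unemployed workers who find a job)
simulate_flows <- function(u0, s, f, months) {
  u <- numeric(months + 1)
  u[1] <- u0
  for (t in seq_len(months)) {
    # inflows from employment minus outflows from unemployment
    u[t + 1] <- u[t] + s * (1 - u[t]) - f * u[t]
  }
  u
}

s <- 0.01  # hypothetical separation rate
f <- 0.40  # hypothetical job-finding rate
u_star <- s / (s + f)  # flow steady-state unemployment rate

# Start well above the steady state and let the flows pull the rate toward it
path <- simulate_flows(u0 = 0.06, s = s, f = f, months = 24)
round(100 * c(start = path[1], month_24 = tail(path, 1), steady_state = u_star), 2)
# start 6.00, month_24 2.44, steady_state 2.44
```

Each month, the gap between the current rate and u* shrinks by a factor of (1 - s - f). For this reason, the flow-implied steady state indicates where the rate is heading.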

3. Methodology

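# Packages: fredr (FRED API client), tidyverse (data wrangling and plotting),
# and fpp3 (tsibble/fable time series forecasting tools)
library(fredr)
library(tidyverse)
library(fpp3)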

Data Preparation

Select monthly time series data and clean the dataset.

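# Set the FRED API key, then download the unemployment rate for college
# graduates (bachelor's degree) aged 25 to 34. The key is read from the
# FRED_API_KEY environment variable (e.g., set in .Renviron) instead of
# being hard-coded in the script.
fredr_set_key(Sys.getenv("FRED_API_KEY"))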

ur <- fredr(series_id = "CGBD2534")

glimpse(ur)

## Rows: 306
## Columns: 5
## $ date           <date> 2000-01-01, 2000-02-01, 2000-03-01, 2000-04-01, 2000-0…
## $ series_id      <chr> "CGBD2534", "CGBD2534", "CGBD2534", "CGBD2534", "CGBD25…
## $ value          <dbl> 2.1, 1.4, 1.6, 1.5, 1.9, 2.0, 2.1, 2.2, 2.0, 1.5, 1.5, …
## $ realtime_start <date> 2025-07-24, 2025-07-24, 2025-07-24, 2025-07-24, 2025-0…
## $ realtime_end   <date> 2025-07-24, 2025-07-24, 2025-07-24, 2025-07-24, 2025-0…

#Cleaning data by filtering for last 10 years, converting to
# time series object, and checking for missing values. 

ur_clean <- ur %>%
  filter(date >= as.Date("2015-06-01"))

ur_tsib <- ur_clean %>%
  mutate(date = yearmonth(date)) %>%
  as_tsibble(index = date)

colSums(is.na(ur_tsib))

##           date      series_id          value realtime_start   realtime_end 
##              0              0              0              0              0
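The is.na() check above only counts explicit missing values. A month that has no row at all would be an implicit gap and would not appear in those counts. The tsibble package provides has_gaps() and count_gaps() for this purpose. The base-R sketch below shows the same idea on a synthetic date vector, which stands in for ur_clean$date and has one month deliberately removed.

```r
# Base-R check for implicit gaps in a monthly date vector (sketch).
# Synthetic example: June 2015 to June 2025 with March 2016 removed.
dates <- seq(as.Date("2015-06-01"), as.Date("2025-06-01"), by = "month")
dates <- dates[dates != as.Date("2016-03-01")]

# Every month that should exist between the first and last observation
expected <- seq(min(dates), max(dates), by = "month")

# Months with no row at all (these never show up in is.na() counts)
missing_months <- expected[!expected %in% dates]
missing_months
```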

Model Development

- Split the data into training (80%) and test (20%) sets.

# Divide the data into training and test sets; the training set is the
# first 80% of rows (number of rows multiplied by 0.8, then rounded).

n <- nrow(ur_tsib)           
n_train <- round(0.8 * n) 

train_data <- ur_tsib %>%
  slice(1:n_train)

test_data <- ur_tsib %>%
  slice((n_train + 1): n)
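The split arithmetic can be checked directly. This sketch assumes the filtered series runs monthly from June 2015 to June 2025 with no gaps, as the 306 rows in the glimpse() output imply. Under that assumption, the test set has exactly 24 rows, which is why the 24-month forecast horizon used below lines up with it.

```r
# Worked example of the 80/20 split arithmetic (sketch).
# Assumes monthly data from June 2015 to June 2025 with no gaps.
dates <- seq(as.Date("2015-06-01"), as.Date("2025-06-01"), by = "month")

n <- length(dates)         # number of monthly observations
n_train <- round(0.8 * n)  # same rule as the report's code
n_test <- n - n_train

c(n = n, n_train = n_train, n_test = n_test)  # 121, 97, 24
# Last training month and first test month
format(dates[c(n_train, n_train + 1)], "%Y-%m")  # "2023-06" "2023-07"
```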

Build the following models:
- ETS Model: exponential smoothing (ETS) state space model with automatic selection of the error, trend, and seasonal components.

# Fit an ETS() model to the training data and produce a two-year
# (24-month) forecast for plotting.

ets_fit <- train_data %>%
  model(ETS(value))

ets_fc <- ets_fit %>%
  forecast(h = 24)

ets_fc %>%
  autoplot(train_data) +
  autolayer(test_data, value, col = "red") +
  labs(y= "Percentage", title="Unemployment Rate of Recent College Graduates (ETS)") +
  guides(colour = "none")
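ETS() selects among a family of exponential smoothing specifications. The simplest member is simple exponential smoothing, ETS(A,N,N), sketched below in base R with a fixed, hypothetical smoothing parameter alpha. ETS() itself estimates its parameters and may add trend or seasonal terms, so this only illustrates the mechanism and does not reproduce the fitted model. The specification chosen for train_data can be printed with report(ets_fit).

```r
# Simple exponential smoothing, ETS(A,N,N), with a fixed alpha (sketch).
# The level is a weighted average of the newest observation and the old
# level; every h-step forecast equals the final level (a flat line).
ses_forecast <- function(y, alpha, h) {
  level <- y[1]  # initialise the level at the first observation
  for (t in seq_along(y)[-1]) {
    level <- alpha * y[t] + (1 - alpha) * level
  }
  rep(level, h)
}

y <- c(2.0, 2.4, 2.2, 2.6, 2.5)  # hypothetical unemployment rates (%)
ses_forecast(y, alpha = 0.5, h = 3)  # 2.45 2.45 2.45
ses_forecast(y, alpha = 1, h = 3)    # 2.5 2.5 2.5 (the naive forecast)
```

With alpha = 1, the level is simply the last observation, so the forecast matches the naive method. Smaller values of alpha average over more of the history.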

- NAIVE Model: naive forecast that sets every future value equal to the last observed value.

# Fit a NAIVE() model to the training data and produce a two-year
# (24-month) forecast for plotting.

naive_fit <- train_data %>%
  model(Naive = NAIVE(value))

naive_fc <- naive_fit %>%
  forecast(h = 24)

naive_fc %>%
  autoplot(train_data) +
  autolayer(test_data, value, col = "orange") +
  labs(y= "Percentage", title="Unemployment Rate of Recent College Graduates (Naive)") +
  guides(colour = "none")
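The point forecasts from NAIVE() repeat the last observed value at every horizon. The base-R sketch below computes the same point forecasts on hypothetical values, without the prediction intervals that fable also provides.

```r
# Naive point forecast (sketch): every future value equals the last observation.
naive_forecast <- function(y, h) rep(tail(y, 1), h)

y_train <- c(2.1, 2.3, 2.2, 2.4)  # hypothetical training values (%)
naive_forecast(y_train, h = 3)    # 2.4 2.4 2.4
```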

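- SNAIVE Model: seasonal naive model that sets each future value equal to the last observed value from the same month of the year.

# Fit an SNAIVE() model to the training data and produce a two-year
# (24-month) forecast for plotting.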

snaive_fit <- train_data %>%
  model(SNaive = SNAIVE(value))

snaive_fc <- snaive_fit %>%
  forecast(h = 24)

snaive_fc %>%
  autoplot(train_data) +
  autolayer(test_data, value, col = "green") +
  labs(y= "Percentage", title="Unemployment Rate of Recent College Graduates (SNaive)") +
  guides(colour = "none")
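For monthly data, the seasonal naive method uses a period of m = 12. The sketch below, on hypothetical values, shows how the point forecasts are built. Each forecast month takes the last observed value from the same calendar month, so with a 24-month horizon the final observed year is repeated twice.

```r
# Seasonal naive point forecast (sketch) for a series with period m.
# The forecast for each future month is the last observed value from the
# same month; for h > m the final observed season is repeated.
snaive_forecast <- function(y, h, m = 12) {
  stopifnot(length(y) >= m)
  rep(tail(y, m), length.out = h)
}

y_train <- (1:24) / 10  # hypothetical: two years of monthly values
fc <- snaive_forecast(y_train, h = 24, m = 12)
fc[c(1, 12, 13, 24)]    # 1.3 2.4 1.3 2.4
```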

- Ensemble Model: combine the forecasts of the three models by simple averaging.

# Fit all three models at once so their forecasts can be compared.
models_fit <- train_data %>%
  model(
    ETS = ETS(value),
    Naive = NAIVE(value),
    SNaive = SNAIVE(value))

# Forecast each fitted model over the test period (24 months). Using
# nrow(test_data) keeps the horizon aligned with the test set.
all_fc <- models_fit %>%
  forecast(h = nrow(test_data))

#Creating an average of each model forecast by date and 
#collapsing the values into one summary via the summarise
#function.
ensemble_fc <- all_fc %>%
  as_tibble() %>%
  group_by(date) %>%
  summarise(.mean = mean(.mean),.model = "Ensemble")

#Adding ensemble forecast and picking needed variables via
#the select function.
all_fc_combined <- bind_rows(
  all_fc %>% 
  as_tibble() %>% 
  select(date, .mean, .model), ensemble_fc)

# Plot the training data (solid) and test data (dashed) in black, then
# overlay each model's point forecasts, including the ensemble mean.
# scale_color_manual() keeps each model's color consistent with the
# individual forecast plots above.
ggplot() +
  geom_line(data = train_data, aes(x = date, y = value), color = "black") +
  geom_line(data = test_data, aes(x = date, y = value), color = "black", linetype = "longdash") +
  geom_line(data = all_fc_combined, aes(x = date, y = .mean, color = .model)) +
  scale_color_manual(values = c(
    ETS = "red",
    Naive = "orange",
    SNaive = "green",
    Ensemble = "blue")) +
  labs(title = "Forecast Comparison: ETS, Naive, SNaive and Ensemble Models",
    x = "Date",
    y = "Percentage",
    color = "Model")

Model Evaluation

# Define evaluation function: root mean squared error, mean absolute
# error, and mean absolute percentage error of the point forecasts
evaluate <- function(model_name, predicted, actual) {
  tibble(
    Model = model_name,
    RMSE = sqrt(mean((predicted - actual)^2)),
    MAE = mean(abs(predicted - actual)),
    MAPE = mean(abs((predicted - actual) / actual)) * 100)
}

# Extract actuals
actuals <- test_data$value

# Evaluate models
evaluation_table <- bind_rows(
  evaluate("ETS",  filter(all_fc, .model == "ETS")$.mean, actuals),
  evaluate("Naive", filter(all_fc, .model == "Naive")$.mean, actuals),
  evaluate("SNaive", filter(all_fc, .model == "SNaive")$.mean, actuals),
  evaluate("Ensemble", ensemble_fc$.mean, actuals)
)

print(evaluation_table)

## # A tibble: 4 × 4
##   Model     RMSE   MAE  MAPE
##   <chr>    <dbl> <dbl> <dbl>
## 1 ETS      1.06  0.960  31.7
## 2 Naive    0.516 0.408  13.1
## 3 SNaive   0.679 0.567  17.6
## 4 Ensemble 0.674 0.574  18.2
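As a sanity check on the evaluate() function, the three metrics can be computed for a tiny hypothetical example where the answers are easy to verify by hand. The sketch returns a base-R data.frame instead of a tibble so that it needs no packages.

- RMSE squares the errors, so it penalizes large misses, like those in the ETS residuals, more heavily than MAE does.
- MAPE divides by the actual values, so it grows when the unemployment rate itself is low.

fabletools also offers accuracy(), which computes these measures directly from a fable and the test tsibble.

```r
# Same formulas as the report's evaluate() function, applied to toy numbers.
evaluate <- function(model_name, predicted, actual) {
  data.frame(
    Model = model_name,
    RMSE = sqrt(mean((predicted - actual)^2)),
    MAE  = mean(abs(predicted - actual)),
    MAPE = mean(abs((predicted - actual) / actual)) * 100
  )
}

actual    <- c(2.5, 3.0, 5.0)  # hypothetical actual rates (%)
predicted <- c(2.0, 3.0, 4.0)  # hypothetical forecasts (%)
res <- evaluate("Toy", predicted, actual)
res
# Errors are -0.5, 0, -1: RMSE = sqrt(1.25 / 3), MAE = 0.5,
# MAPE = mean(c(0.2, 0, 0.2)) * 100, about 13.33
```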

# Create residuals for each model
residuals_tbl <- all_fc_combined %>%
  left_join(test_data, by = "date") %>%
  mutate(residual = value - .mean)

# Residual plot
ggplot(residuals_tbl, aes(x = date, y = residual, color = .model)) +
  geom_line() +
  labs(title = "Residuals Plot for each Model", x = "Date", y = "Residual", color = "Model") +
  scale_color_manual(values = c("ETS" = "red", "Naive" = "orange", "SNaive" = "green", "Ensemble" = "blue")) +
  theme_minimal()
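The evaluation above scores each model on a single 24-month window, so the ranking can depend heavily on what happened in that one period. A rolling-origin evaluation repeats the forecast from many starting points and averages the errors. In fpp3, this is usually set up with tsibble's stretch_tsibble(). The base-R sketch below applies the idea to the naive method on hypothetical data.

```r
# Rolling-origin evaluation of the naive method (sketch).
# Each origin uses only data up to that point and scores a forecast
# made h steps ahead; the absolute errors are then averaged.
rolling_naive_mae <- function(y, first_origin, h) {
  origins <- first_origin:(length(y) - h)
  errors <- sapply(origins, function(o) {
    forecast <- y[o]   # naive: last observed value at the origin
    y[o + h] - forecast  # h-step-ahead forecast error
  })
  mean(abs(errors))
}

y <- c(2.0, 2.2, 2.1, 2.5, 2.4, 2.6)  # hypothetical monthly rates (%)
rolling_naive_mae(y, first_origin = 3, h = 1)  # (0.4 + 0.1 + 0.2) / 3
```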

4. Results and Analysis

Based on the evaluation metrics, the naive model outperformed all other models, with the lowest RMSE (0.52), MAE (0.41), and MAPE (13.10). The seasonal naive and ensemble models performed similarly and moderately well, with nearly identical RMSE (about 0.68 and 0.67) and MAE (about 0.57) values. Their MAPE values (17.61 and 18.17) indicate a somewhat larger percentage error than the naive model. The ETS model performed the worst on these metrics, with substantially higher RMSE (1.06), MAE (0.96), and MAPE (31.68). The residual plot supports this ranking. The ETS residuals swing to extremes that the other models’ residuals do not reach. The residuals for the seasonal naive and ensemble models stay close to one another throughout the plot and have a narrower range than those of the ETS model. The naive model’s residuals remain consistently small, reinforcing it as the best-performing model in this scenario.

The naive method performed best because of the structure of the dataset. The naive method sets all forecasts equal to the most recent observation. Since the unemployment rate of young graduates has been remarkably stable, it makes sense that the simplicity of the naive model worked to its advantage here. The seasonal naive method works like the naive method, except that each forecast equals the most recent observed value from the same season (here, the same month of the previous year). Because the data do not exhibit strong or consistent seasonality, the seasonal naive model was less accurate than its non-seasonal counterpart. The ETS model’s poor performance relative to the naive model is surprising, given that ETS is typically considered more flexible. The ETS model likely overfit the training data, capturing minor fluctuations that did not carry over into the test period, and it may have included trend or seasonal components that were not present in the observed data. Because the ensemble averages all three forecasts, its accuracy was pulled down by the seasonal naive model and, to a greater extent, by the ETS model.

5. Discussion

This report yielded several valuable insights about the forecasting process. The most surprising was the degree to which basic models, such as the naive model, can outperform more complex ones. The naive model outperformed the other models on every metric, which shows how strongly the characteristics of the data shape model accuracy. In this case, the stability of the data favored the naive model. The process also helped uncover characteristics of the data such as low volatility, weak seasonality, and short-term persistence. These traits are useful both for selecting the most accurate models and for understanding the dynamics of the series being forecast. Another notable insight is that a single underperforming model can derail the accuracy of an ensemble, at least when only a few models are combined. This should not be surprising, since the ensemble is simply an equally weighted average of the individual model forecasts, but the extent to which one underperforming model swayed the ensemble is significant.
While a simple model like the naive model performed well in this scenario, it cannot adjust to structural changes in the data. If a structural change were to dramatically alter the behavior of the series, the naive and seasonal naive models would no longer be reliable. Including external regressors in a multivariate model could improve accuracy in the event of a structural shift [4]. Another limitation is that the forecast horizon is relatively short, at just two years. Extending the horizon and comparing performance across different time periods could provide a more thorough evaluation that better reflects the underlying features of the data. Another potential improvement lies in the construction of the ensemble forecast. The ensemble was built as a simple average of the models, whereas weighting models by their historical accuracy might lead to better performance. Lastly, the unemployment rate can be misleading even for this narrowly defined group, since it does not account for discouraged workers, the duration of unemployment, or the distinction between full-time and part-time work.
Given the current stability and relatively low level of unemployment, along with the accuracy of the naive model, businesses hiring young graduates would be well advised to maintain their current hiring levels and overall staffing strategies. In this low-unemployment landscape, businesses would be prudent to invest more in employee retention and development than in outreach to new employees. Furthermore, because college graduates contribute disproportionately to the country’s overall consumption, a steady unemployment rate for this group suggests that consumption is unlikely to shift dramatically on this account. Finally, although unemployment is currently stable, it is worth considering the shock in the data caused by the COVID-19 pandemic, during which the unemployment rate for young graduates rose sharply and abruptly. For this reason, businesses should have a contingency plan for production and employment in case a future structural break occurs.
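The idea of weighting models by historical accuracy can be sketched with inverse-error weights, where each model's weight is proportional to 1 / RMSE so that more accurate models count more. The RMSE values and forecasts below are hypothetical placeholders. In practice, the weights would need to come from a validation period before the test set, because weights computed on the test set itself would leak test information into the ensemble.

```r
# Inverse-RMSE weighted ensemble (sketch with hypothetical inputs).
inverse_rmse_weights <- function(rmse) {
  w <- 1 / rmse
  w / sum(w)  # normalise so the weights sum to 1
}

rmse <- c(ETS = 1.0, Naive = 0.5, SNaive = 0.5)  # hypothetical validation RMSEs
forecasts <- rbind(  # rows: models, columns: forecast horizons
  ETS    = c(3.0, 3.2),
  Naive  = c(2.0, 2.0),
  SNaive = c(2.2, 2.4)
)

w <- inverse_rmse_weights(rmse)  # 0.2, 0.4, 0.4
# Multiply each model's row by its weight, then sum over models
weighted_fc <- colSums(w[rownames(forecasts)] * forecasts)
weighted_fc  # 2.28 2.40 (an equal-weight average would give 2.40 2.53)
```

Compared with the equal-weight average, the weighted ensemble sits closer to the more accurate models. This limits how far a single weak model can pull the combined forecast.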

6. Conclusion

This report was created to gain a better understanding of unemployment among young college graduates, something the overall unemployment rate alone cannot provide. To understand this variable and its future implications, four models were built, forecast, and visualized. The naive model performed best, with the lowest RMSE, MAE, and MAPE values and consistently small residuals. This suggests that recent unemployment values are good predictors of future values in the short run for this dataset. The other models performed noticeably worse, which suggests a lack of meaningful trend or seasonality in this dataset. Although these insights are significant, they are limited by the lack of external regressors, a forecast horizon of only two years, and a preferred model type that would likely falter in the face of structural change. Future research could benefit from including exogenous variables, further disaggregating the subpopulation by other factors, and applying the models to other periods with structural changes in the labor market.

References

[1] Oxford Economics. (2024, March 6). Educated but unemployed: A rising reality for US college grads. https://www.oxfordeconomics.com/resource/educated-but-unemployed-a-rising-reality-for-us-college-grads/

[2] National Association of Colleges and Employers. (2024, April 1). Hiring projections level off for the college class of 2025. https://www.naceweb.org/job-market/trends-and-predictions/hiring-projections-level-off-for-the-college-class-of-2025

[3] Sahm, C. (2021, August 31). An introduction to a new approach to forecasting unemployment. Brookings Institution. https://www.brookings.edu/articles/an-introduction-to-a-new-approach-to-forecasting-unemployment/

[4] Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts. https://otexts.com/fpp3/


Mick Pomer

2025-07-27