Introduction

Accurate forecasting of the unemployment rate plays a critical role in macroeconomic planning, policy formulation, and financial market expectations. Given the cyclical nature of labor market conditions, developing robust predictive models for monthly unemployment rates is essential for anticipating downturns, guiding fiscal and monetary responses, and supporting long-term economic stability.

This study focuses on forecasting the U.S. unemployment rate using multiple univariate time series models, including Exponential Smoothing (ETS), Naive, and Seasonal Naive (SNaive) approaches. An ensemble method that averages the forecasts of the individual models is also introduced to improve predictive accuracy through model diversification. The evaluation uses a fixed train-test split: models are trained on data through January 2010 and tested on data from February 2010 through December 2024. Forecast accuracy is assessed with standard error metrics (RMSE, MAE, MAPE, ME, and R²), and residual analysis is used to evaluate model diagnostics and appropriateness.

The dataset employed in this analysis contains monthly U.S. civilian unemployment rates from January 1948 through December 2024. The data are publicly available and seasonally adjusted, making them suitable for time series modeling without further preprocessing beyond structural conversion; because the published series is already seasonally adjusted, any remaining seasonality should be limited. The long historical coverage supports identification of trend and cyclical behavior, which is essential to effective model fitting and evaluation. The unemployment rate, a closely watched (and typically lagging) indicator of economic health, has applications ranging from labor market analysis to input in broader econometric and policy simulation models.

By systematically comparing traditional forecasting methods with ensemble techniques, this project aims to identify model strengths and limitations under different economic conditions and to improve the reliability of labor market projections for policy and planning contexts.

Literature Review

Recent studies have shown that combining traditional time series models can significantly improve forecast accuracy, particularly for economic indicators such as unemployment [1]. ETS models remain widely used thanks to their flexibility and automated structure selection [2]. Meanwhile, simple methods such as Naive and SNaive are still recommended as strong baselines [3]. However, few studies systematically compare these models on monthly unemployment data, especially with an ensemble approach evaluated across post-2008 economic shifts. This project addresses that gap by applying and evaluating four models on a long-run monthly U.S. unemployment series.

Methodology

Data and Package Preparation

Dataset: https://fred.stlouisfed.org/series/UNRATE
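
Since the series is hosted on FRED, it can also be pulled programmatically rather than read from a local Excel file. A minimal sketch using quantmod::getSymbols (assumes the quantmod package is installed and that internet access is available):

# Optional (sketch): download UNRATE directly from FRED
library(quantmod)
getSymbols("UNRATE", src = "FRED")  # creates an xts object named UNRATE in the workspace
head(UNRATE)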

library(readxl)
library(tsibble)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## 
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
library(fable)
## Loading required package: fabletools
library(fabletools)
library(feasts)
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(yardstick)
## 
## Attaching package: 'yardstick'
## The following object is masked from 'package:forecast':
## 
##     accuracy
## The following object is masked from 'package:fabletools':
## 
##     accuracy
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
raw_data <- read_excel("/Users/houjiayi/Desktop/UNRATE.xlsx")
colnames(raw_data) <- c("Date", "Rate")
head(raw_data)
## # A tibble: 6 × 2
##   Date                 Rate
##   <dttm>              <dbl>
## 1 1948-01-01 00:00:00   3.4
## 2 1948-02-01 00:00:00   3.8
## 3 1948-03-01 00:00:00   4  
## 4 1948-04-01 00:00:00   3.9
## 5 1948-05-01 00:00:00   3.5
## 6 1948-06-01 00:00:00   3.6
unrate_ts <- raw_data %>%
  mutate(Date = yearmonth(Date)) %>% 
  # convert to a monthly index; otherwise tsibble treats the data as daily
  # and warns about implicit gaps (fill_gaps())
  as_tsibble(index = Date) %>%
  filter(!is.na(Rate))
head(unrate_ts)
## # A tsibble: 6 x 2 [1M]
##       Date  Rate
##      <mth> <dbl>
## 1 1948 Jan   3.4
## 2 1948 Feb   3.8
## 3 1948 Mar   4  
## 4 1948 Apr   3.9
## 5 1948 May   3.5
## 6 1948 Jun   3.6
autoplot(unrate_ts, Rate) +
  geom_vline(xintercept = as.Date("2010-01-01"), 
             color = "red", linetype = "dashed", linewidth = 0.8) +
  labs(title = "Monthly Unemployment Rate",
       x = "Date", y = "Rate")

Model Development

Split data into train and test

# split into train (~80%) and test (~20%)
train_data <- unrate_ts %>%
  filter(Date <= yearmonth("2010 Jan"))

test_data <- unrate_ts %>%
  # start one month after the training window ends so the two sets do not
  # overlap and the forecast horizon lines up with the test dates
  filter(Date >= yearmonth("2010 Feb") & Date <= yearmonth("2024 Dec"))
cat("Train data range:", as.character(min(train_data$Date)), "to", as.character(max(train_data$Date)), "\n")
## Train data range: 1948 Jan to 2010 Jan
cat("Test data range:", as.character(min(test_data$Date)), "to", as.character(max(test_data$Date)), "\n")
## Test data range: 2010 Feb to 2024 Dec

Build models (ETS / Naive / SNaive / Ensemble)

ets_model <- train_data %>% model(ETS = ETS(Rate))
naive_model <- train_data %>% model(NAIVE = NAIVE(Rate))
snaive_model <- train_data %>% model(SNAIVE = SNAIVE(Rate))
h <- nrow(test_data)

ets_fc <- forecast(ets_model, h = h)
naive_fc <- forecast(naive_model, h = h)
snaive_fc <- forecast(snaive_model, h = h)

# Ensemble: simple equal-weight average of the three point forecasts
ensemble_model <- (ets_fc$.mean + naive_fc$.mean + snaive_fc$.mean) / 3

ensemble_forecast <- tibble(
  Date = test_data$Date,
  Ensemble = ensemble_model,
  Actual = test_data$Rate
)
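
As an aside, the same equal-weight ensemble can be expressed natively in fable by fitting all models in one mable and combining them with mutate(), which carries full forecast distributions rather than point forecasts only. A sketch (ens_fit and ens_fc are illustrative names):

# Sketch: fable's built-in model combination, equivalent to the manual average
ens_fit <- train_data %>%
  model(ETS = ETS(Rate), NAIVE = NAIVE(Rate), SNAIVE = SNAIVE(Rate)) %>%
  mutate(Ensemble = (ETS + NAIVE + SNAIVE) / 3)
ens_fc <- forecast(ens_fit, h = h)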

Model Evaluation

Evaluation metrics

evaluate <- function(model_name, predicted, actual) {
  tibble(
    Model = model_name,
    RMSE = sqrt(mean((predicted - actual)^2)),              # root mean squared error
    MAE = mean(abs(predicted - actual)),                    # mean absolute error
    MAPE = mean(abs((predicted - actual) / actual)) * 100,  # mean absolute percentage error
    ME = mean(predicted - actual),                          # mean error; > 0 means over-prediction
    # out-of-sample R^2 against the test-period mean; negative when the
    # forecasts do worse than simply predicting that mean
    R2 = 1 - sum((predicted - actual)^2) / sum((actual - mean(actual))^2)
  )
}

# aggregate all into a table
evaluation_table <- bind_rows(
  evaluate("ETS", ets_fc$.mean, test_data$Rate),
  evaluate("Naive", naive_fc$.mean, test_data$Rate),
  evaluate("SNaive", snaive_fc$.mean, test_data$Rate),
  evaluate("Ensemble", ensemble_forecast$Ensemble, ensemble_forecast$Actual)
)

print(evaluation_table)
## # A tibble: 4 × 6
##   Model     RMSE   MAE  MAPE    ME    R2
##   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS       4.46  3.99  90.0  3.87 -3.00
## 2 Naive     4.58  4.12  92.6  4.00 -3.22
## 3 SNaive    4.31  3.83  86.5  3.65 -2.74
## 4 Ensemble  4.44  3.97  89.7  3.84 -2.97
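
As a cross-check, fabletools provides its own accuracy routines; the RMSE, MAE, and MAPE it reports per model should agree with the table above (sign conventions for ME may differ, and the namespace is qualified because yardstick also exports accuracy()):

# Sketch: built-in accuracy metrics, computed against the full series
fabletools::accuracy(ets_fc, unrate_ts)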

Visualize results: actual vs. forecast

ets_df <- tibble(Date = test_data$Date, Predicted = ets_fc$.mean, Model = "ETS")
naive_df <- tibble(Date = test_data$Date, Predicted = naive_fc$.mean, Model = "Naive")
snaive_df <- tibble(Date = test_data$Date, Predicted = snaive_fc$.mean, Model = "SNaive")
ensemble_df <- tibble(Date = test_data$Date, Predicted = ensemble_forecast$Ensemble, Model = "Ensemble")

all_forecasts <- bind_rows(ets_df, naive_df, snaive_df, ensemble_df)

ggplot() +
  geom_line(data = ensemble_forecast, aes(x = Date, y = Actual), color = "black") +
  geom_line(data = all_forecasts, aes(x = Date, y = Predicted, color = Model)) +
  labs(title = "Actual vs Model Forecasts",
       x = "Date", y = "Unemployment Rate")

The test-window view alone may not convey the longer-run context, so the next plot overlays the complete actual series and restricts the forecast lines to January 2020 onward.

actual_all <- unrate_ts %>%
  select(Date, Actual = Rate)

plot_start <- yearmonth("2020 Jan") 
filtered_forecasts <- all_forecasts %>%
  filter(Date >= plot_start)

ggplot() +
  geom_line(data = actual_all, aes(x = Date, y = Actual), color = "black") +
  geom_line(data = filtered_forecasts, aes(x = Date, y = Predicted, color = Model)) +
  labs(title = "Actual vs Forecasts (Prediction from 2020 Jan)",
       x = "Date", y = "Unemployment Rate")

ETS model details

report(ets_model)
## Series: Rate 
## Model: ETS(A,Ad,N) 
##   Smoothing parameters:
##     alpha = 0.7081726 
##     beta  = 0.378187 
##     phi   = 0.800024 
## 
##   Initial states:
##      l[0]      b[0]
##  3.192505 0.3742712
## 
##   sigma^2:  0.0402
## 
##      AIC     AICc      BIC 
## 2538.777 2538.891 2566.457
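
For reference, ETS(A,Ad,N) denotes additive errors, a damped additive trend, and no seasonal component (consistent with the seasonally adjusted data), so the fitted model has the state-space form

$$
\begin{aligned}
y_t &= \ell_{t-1} + \phi b_{t-1} + \varepsilon_t \\
\ell_t &= \ell_{t-1} + \phi b_{t-1} + \alpha \varepsilon_t \\
b_t &= \phi b_{t-1} + \beta \varepsilon_t
\end{aligned}
$$

with forecasts $\hat{y}_{t+h|t} = \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\, b_t$. With the estimated damping $\phi \approx 0.80$, the trend contribution flattens quickly as the horizon grows, which explains the nearly flat long-horizon ETS forecast in the plots above.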

Residual plots (four models)

ets_resid <- residuals(ets_model)

autoplot(ets_resid, .vars = .resid) +
  labs(title = "ETS Residuals (Training Set)",
       x = "Date", y = "Residual")

naive_resid <- residuals(naive_model)

autoplot(naive_resid, .vars = .resid) +
  labs(title = "Naive Residuals (Training Set)",
       x = "Date", y = "Residual")

snaive_resid <- residuals(snaive_model)

autoplot(snaive_resid, .vars = .resid) +
  labs(title = "SNaive Residuals (Training Set)",
       x = "Date", y = "Residual")

# Ensemble residuals: average the in-sample fitted values of the three models
ets_fitted <- fitted(ets_model)
naive_fitted <- fitted(naive_model)
snaive_fitted <- fitted(snaive_model)

ensemble_resid_train <- tibble(
  Date = train_data$Date,
  Actual = train_data$Rate,
  Ensemble = (ets_fitted$.fitted + naive_fitted$.fitted + snaive_fitted$.fitted) / 3
) %>%
  # SNAIVE has no fitted values for the first 12 months (no prior season to
  # copy), so the first year of ensemble residuals is NA -- hence the
  # geom_line() warning below
  mutate(Residual = Actual - Ensemble)

ggplot(ensemble_resid_train, aes(x = Date, y = Residual)) +
  geom_line() +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Ensemble Residuals (Training Set)",
       x = "Date", y = "Residual")
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

Results and Analysis

Forecast Error Metrics

The summary table reports five commonly used forecast accuracy metrics (RMSE, MAE, MAPE, ME, R²) for the four models. The SNaive model achieves the lowest RMSE (4.31) and MAE (3.83), indicating slightly better absolute accuracy than the other models, and its MAPE (86.5) is likewise the most favorable in relative terms. The Ensemble model ranks second on RMSE and MAE, with ETS and Naive slightly behind. The large positive ME values (3.65 to 4.00) show that every model systematically over-predicts the unemployment rate: training ends near the post-recession peak, while the actual rate declined for most of the test window. All models also yield negative R² values, meaning their forecasts explain less variance than simply predicting the test-period mean; this reflects how hard the 2010-2024 window, with its long decline and COVID-19 spike, is to capture from pre-2010 data alone. Within that weak field, the Ensemble's R² (−2.97) is second only to SNaive's.

Forecast vs. actual

The forecast overlay plot shows how each model's forecast diverges from the actual unemployment path, most visibly after 2020. The Ensemble and ETS forecasts visually track the actual series more closely, whereas the SNaive and Naive forecasts tend to miss turning points in one direction or the other. The full-length comparison underscores that apparent model performance depends heavily on whether structural shocks such as COVID-19 fall inside the evaluation window.

Residual Plots

  • ETS Model: Residuals fluctuate closely around zero and appear homoscedastic over time. There is little evidence of systematic bias, making it a robust baseline.

  • SNaive Model: Similar to ETS, but with more pronounced variance in the early periods, likely reflecting the limits of a seasonal-copy rule on this series. Residuals nonetheless tighten over time.

  • Naive Model: Residuals show random scatter but with slightly larger dispersion. The absence of any seasonal adjustment leaves minor autocorrelation patterns, particularly visible in the early years (quantified in the Ljung-Box sketch after this list).

  • Ensemble Model: Unlike the individual models, residuals exhibit broader swings, reflecting the aggregation of diverse model structures. While variance is more visible in the early decades, the pattern stabilizes post-1980, suggesting the ensemble better adjusts to structural regime shifts.
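
A portmanteau test puts numbers on these visual impressions. A minimal sketch using the ljung_box feature from feasts (lag = 24, i.e., two years of monthly lags, is an arbitrary choice); small p-values indicate autocorrelation remaining in the residuals:

# Sketch: Ljung-Box test on the training residuals of each individual model
ets_resid %>% features(.resid, ljung_box, lag = 24)
naive_resid %>% features(.resid, ljung_box, lag = 24)
snaive_resid %>% features(.resid, ljung_box, lag = 24)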

Discussion

The forecasting process revealed several practical and methodological insights. First, traditional models like ETS and Naive continue to offer reliable benchmarks for unemployment rate forecasting, particularly due to their simplicity and interpretability. However, their performance tends to deteriorate in the presence of structural breaks or unusual shocks, as evidenced by the post-2020 deviations.

The ensemble approach, which combines predictions from multiple models, demonstrated greater adaptability to recent volatility. While it did not outperform the SNaive model on every metric, it provided more stable residual behavior and improved coherence with actual unemployment trends. This suggests that hybrid strategies may be more robust when facing rapidly evolving economic conditions.

Nonetheless, several limitations remain. All models struggled to capture extreme values during crisis periods, reflecting the difficulty of forecasting under unprecedented events like the COVID-19 pandemic. Furthermore, negative R² values highlight challenges in explaining high-variance future behavior using historical patterns alone. The reliance on univariate modeling also restricted the ability to account for leading indicators such as inflation, consumer sentiment, or labor force participation.

To improve future forecasting accuracy, incorporating exogenous variables and applying regime-switching or machine learning methods could be valuable. Moreover, validating models on rolling forecast windows rather than static test sets would better simulate real-time performance.
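
On the last point, tsibble's stretch_tsibble makes rolling-origin evaluation straightforward. A sketch under assumed settings (a 20-year initial window advancing a year at a time, chosen only to keep the number of refits manageable; slices whose 12-month horizon runs past the end of the data are dropped from scoring):

# Sketch: rolling-origin (time series) cross-validation
cv_fc <- unrate_ts %>%
  stretch_tsibble(.init = 240, .step = 12) %>%  # each .id adds 12 more months
  model(ETS = ETS(Rate), SNAIVE = SNAIVE(Rate)) %>%
  forecast(h = 12)
fabletools::accuracy(cv_fc, unrate_ts)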

From a business perspective, the results support the use of ensemble methods when developing labor market forecasts for policy planning or workforce analytics. While simple benchmarks remain useful, adopting flexible, multi-model frameworks can offer more resilient guidance, especially when managing economic uncertainty.

Conclusion

This study evaluated the forecasting performance of four time series models (ETS, Naive, SNaive, and an ensemble approach) on long-run monthly U.S. unemployment data. The results showed that while all models provided reasonable baseline predictions, the seasonal naive model yielded the lowest forecast errors across RMSE, MAE, and MAPE. The ensemble method also performed competitively, with residual behavior that stabilizes after the early decades of the training period.

The comparative analysis highlights the value of simple benchmarks as well as the potential gains from model integration. Residual diagnostics further underscored the challenges in modeling macroeconomic indicators, especially during periods of high volatility. Despite strong short-term tracking, none of the models adequately captured extreme downturns, pointing to a need for more flexible and context-aware approaches.

Future research could extend this work by incorporating exogenous predictors or exploring nonlinear methods such as neural networks or regime-switching models. Evaluating model performance across different economic regimes or demographic subgroups may also offer more granular insights. Ultimately, improving forecast accuracy for labor market indicators remains essential for informed policy and strategic planning.

References

[1] Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts. https://otexts.com/fpp3/

[2] Zhang, C., Wang, Y., & Su, L. (2022). Forecasting unemployment rate using hybrid and ensemble models. Economic Modelling, 110, 105826.

[3] Makridakis, S., Spiliotis, E., & Assimakopoulos, V. (2018). Statistical and Machine Learning forecasting methods: Concerns and ways forward. PLOS ONE, 13(3), e0194889.