This data set contains the monthly US unemployment rate from 1948 to 2021. None of the data is seasonally adjusted. The file also contains unemployment rates for subsets of the population, broken down by age group (from 16 up to 55 and over) and by sex (men and women). The data is collected by the US Bureau of Labor Statistics.
Project Objective:
# Select target columns
young_unemp <- df[,c(1,8)]
# Rename columns
new_names <- c("Date", "Rate")
colnames(young_unemp) <- new_names
# Choose time period (01/01/2000 - 11/01/2021)
unemp <- young_unemp[ c(625:887) , ]
# Turn "Date" to date format
unemp$Date <- as.Date(unemp$Date, format = "%m/%d/%Y")
# Training set (01/01/2000 - 05/01/2021); the last six months (06/2021 - 11/2021) are held out for testing
unemp_train <- young_unemp[c(625:881),]
# Turn "Date" to date format
unemp_train$Date <- as.Date(unemp_train$Date, format = "%m/%d/%Y")
# Create a time series object
unemp_TS <- ts(unemp_train$Rate, start = c(2000, 1), frequency = 12)
# Build ETS model
ets_model <- ets(unemp_TS)
# Compute seasonally differenced time series
seasonal_diff <- diff(unemp$Rate, lag = 12)
# Create a data frame with dates and seasonally differenced data
# (differencing at lag 12 drops the first 12 observations, so drop their dates too)
seasonal_diff_df <- data.frame(Date = unemp$Date[-(1:12)], Rate = seasonal_diff)
# Plot the seasonally differenced time series using ggplot2
ggplot(seasonal_diff_df, aes(x = Date, y = Rate)) +
  geom_line() +
  labs(title = "Seasonally Differenced", y = "") +
  theme_minimal()
Clearly, the seasonally differenced series is still not stationary, so further differencing is needed. The graph below shows the series after an additional first (non-seasonal) difference was applied on top of the seasonal difference.
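The two differencing steps can be sketched as follows. This is only an illustration on a synthetic monthly series; the series `y`, its parameters, and the seed are assumptions and not the unemployment data used in this report.

```r
# Synthetic monthly series with a trend and yearly seasonality (illustrative only)
set.seed(1)
n <- 120
y <- ts(0.05 * seq_len(n) + sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 0.2),
        start = c(2000, 1), frequency = 12)

# Step 1: seasonal difference (lag 12) removes the yearly pattern
seasonal_diff <- diff(y, lag = 12)

# Step 2: first difference of the seasonally differenced series removes remaining trend
double_diff <- diff(seasonal_diff, lag = 1)

# Each step shortens the series: 12 observations are lost, then 1 more
length(seasonal_diff)  # 108
length(double_diff)    # 107
```

The resulting `double_diff` corresponds to the d = 1, D = 1 part of a seasonal ARIMA specification.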
Based on the ACF plot, we should choose ARIMA(0,1,1)(0,1,1)[12], which indicates a first difference, a seasonal difference, and non-seasonal MA(1) and seasonal MA(1) components.
Alternatively, the PACF plot suggests ARIMA(0,1,0)(0,1,0)[12], indicating a first difference and a seasonal difference with no AR or MA terms.
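As a rough sketch of how such orders can be read off, the code below computes the ACF and PACF of a doubly differenced series and lists the lags that exceed the approximate 95% white-noise bound. The synthetic series, the seed, and the names `dd`, `sig_acf`, and `sig_pacf` are assumptions made for illustration; the report's own plots were produced from the unemployment data.

```r
# Synthetic monthly series: random walk plus a fixed yearly pattern (illustrative only)
set.seed(1)
n <- 120
y <- ts(cumsum(rnorm(n)) + rep(sin(2 * pi * (1:12) / 12), n / 12),
        start = c(2000, 1), frequency = 12)

# Seasonal difference followed by a first difference
dd <- diff(diff(y, lag = 12))

# Autocorrelation and partial autocorrelation up to 36 monthly lags
acf_vals <- acf(dd, lag.max = 36, plot = FALSE)
pacf_vals <- pacf(dd, lag.max = 36, plot = FALSE)

# Approximate 95% significance bound for a white-noise series
bound <- qnorm(0.975) / sqrt(length(dd))

# Lags (in months) whose correlation exceeds the bound; lag 0 is dropped from the ACF
sig_acf <- which(abs(acf_vals$acf[-1]) > bound)
sig_pacf <- which(abs(pacf_vals$acf) > bound)
sig_acf
sig_pacf
```

A common rule of thumb is that a significant ACF spike at lag 1 that then cuts off points to a non-seasonal MA(1) term, and a spike at lag 12 points to a seasonal MA(1) term.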
# Seasonal ARIMA model
# 1st ARIMA (R auto-selection)
auto_model <- auto.arima(unemployment_TS, stepwise = FALSE, approx = FALSE)
# 2nd ARIMA (based on ACF)
acf_model <- Arima(unemployment_TS, order = c(0,1,1), seasonal = c(0,1,1))
# 3rd ARIMA (based on PACF)
pacf_model <- Arima(unemployment_TS, order = c(0,1,0), seasonal = c(0,1,0))
ETS Model
## ETS(A,N,A)
##
## Call:
## ets(y = unemp_TS)
##
## Smoothing parameters:
## alpha = 0.9591
## gamma = 1e-04
##
## Initial states:
## l = 4.0069
## s = -0.165 -0.2986 -0.33 -0.3279 -0.0118 0.0927
## -0.0096 -0.1421 0.0759 0.2077 0.3628 0.546
##
## sigma: 0.7406
##
## AIC AICc BIC
## 1287.392 1289.383 1340.628
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.006717619 0.7201824 0.3121383 -0.2442364 4.933794 0.2741982
## ACF1
## Training set 0.01970026
R Selected ARIMA Model
## Series: unemployment_TS
## ARIMA(0,0,0)(1,0,1)[12] with zero mean
##
## Coefficients:
## sar1 sma1
## -0.3297 -0.5974
## s.e. 0.1445 0.1383
##
## sigma^2 = 0.5765: log likelihood = -290.38
## AIC=586.77 AICc=586.86 BIC=597.33
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.01323582 0.7562457 0.3301621 NaN Inf 0.4611419 -0.01654595
ACF ARIMA Model
## Series: unemployment_TS
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -1.0000 -1.0000
## s.e. 0.0182 0.0382
##
## sigma^2 = 1.159: log likelihood = -375.5
## AIC=756.99 AICc=757.1 BIC=767.4
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.003463716 1.043575 0.4298645 NaN Inf 0.6003975 -0.0131188
PACF ARIMA Model
## Series: unemployment_TS
## ARIMA(0,1,0)(0,1,0)[12]
##
## sigma^2 = 5.928: log likelihood = -547.18
## AIC=1096.37 AICc=1096.38 BIC=1099.83
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.001592079 2.37062 1.071222 NaN Inf 1.49619 -0.4568723
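In total, four models were created to forecast June 2021 to November 2021: one ETS model and three ARIMA models. On training-set error, the ETS model performs best, with the lowest MAE (0.312, against 0.330, 0.430, and 1.071 for the three ARIMA models) and the lowest RMSE (0.720). AICc values are not comparable between the ETS and ARIMA model classes, so AICc is only meaningful for ranking the ARIMA models, where the R-selected model has the lowest value (586.86). I therefore concluded that the ETS model is the best of the four. However, as we will see later in the forecast section, the overall performance of the ETS model and the R-selected ARIMA model is extremely similar.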
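The error measures in the model summaries are computed on the training set. A complementary check is to score each model on the held-out months. The sketch below does this on a synthetic series using the `forecast` package; the data, the seed, and the names `train`, `test`, and `test_mae` are assumptions for illustration and do not reproduce the results of this report.

```r
library(forecast)

# Synthetic monthly series: level + yearly seasonality + AR(1) noise (illustrative only)
set.seed(42)
n <- 126
y <- ts(5 + sin(2 * pi * seq_len(n) / 12) +
          as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.3)),
        start = c(2000, 1), frequency = 12)

# Hold out the last six months as a test set
train <- window(y, end = c(2009, 12))
test <- window(y, start = c(2010, 1))

# Fit one ETS and one automatically selected ARIMA model on the training data
ets_fit <- ets(train)
arima_fit <- auto.arima(train)

# Forecast the held-out horizon
ets_fc <- forecast(ets_fit, h = length(test))
arima_fc <- forecast(arima_fit, h = length(test))

# Out-of-sample MAE for each model
test_mae <- c(
  ETS = accuracy(ets_fc, test)["Test set", "MAE"],
  ARIMA = accuracy(arima_fc, test)["Test set", "MAE"]
)
test_mae
```

Unlike AICc, the test-set MAE is measured on the same scale for both model classes, so it can be compared directly between ETS and ARIMA.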
As mentioned earlier, the two models perform very similarly on the key metrics. Both models also did a good job of capturing the seasonality in the data and reproducing it in the forecasts.
It’s crucial to note that the forecast lines from each model did not completely capture the actual data from June to November 2021. However, the actual values for the target period fall well within the 95% prediction interval, so both models can be considered good models. Because it has the better evaluation metrics, the ETS model should, in my opinion, be chosen as the best model, as it is likely to be more accurate than the others.
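Whether the actual values fall inside the 95% interval can also be checked numerically rather than read off a plot. The following is a minimal sketch on synthetic data using the `forecast` package; the series, the seed, and the names `lower`, `upper`, and `inside` are assumptions for illustration only.

```r
library(forecast)

# Synthetic monthly series with yearly seasonality (illustrative only)
set.seed(7)
n <- 126
y <- ts(5 + sin(2 * pi * seq_len(n) / 12) + rnorm(n, sd = 0.3),
        start = c(2000, 1), frequency = 12)

# Hold out the last six months
train <- window(y, end = c(2009, 12))
test <- window(y, start = c(2010, 1))

# Forecast six months ahead and request only the 95% interval
fc <- forecast(ets(train), h = length(test), level = 95)

# With a single level, the bounds flatten to one value per forecast month
lower <- as.numeric(fc$lower)
upper <- as.numeric(fc$upper)

# TRUE where the actual value lies inside the 95% interval
inside <- as.numeric(test) >= lower & as.numeric(test) <= upper
data.frame(actual = as.numeric(test), lower = lower, upper = upper, inside = inside)
```

The share of `TRUE` values in `inside` gives a simple coverage figure for the forecast horizon.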