Data Information

Data Manipulation

# Load packages used below (ets, auto.arima and Arima come from forecast)
library(forecast)
library(ggplot2)

# Select target columns
young_unemp <- df[, c(1, 8)]

# Rename columns
colnames(young_unemp) <- c("Date", "Rate")

# Choose time period (01/01/2000 - 11/01/2021)
unemp <- young_unemp[625:887, ]

# Turn "Date" to date format
unemp$Date <- as.Date(unemp$Date, format = "%m/%d/%Y")

# Split training & test sets:
# train on 01/2000 - 05/2021 (rows 625:881), hold out 06/2021 - 11/2021 (rows 882:887)
unemp_train <- unemp[1:257, ]
unemp_test  <- unemp[258:263, ]

# Create a time series object from the training data
unemp_TS <- ts(unemp_train$Rate, start = c(2000, 1), frequency = 12)
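The row ranges above imply a particular split in calendar time, and it is easy to be off by one month. Below is a minimal sketch that checks the arithmetic with base R. The `rate` vector is made-up stand-in data (an assumption, not the real series); only its length and the start date matter here.

```r
# Hypothetical stand-in for the Rate column: 263 monthly values,
# the same length as rows 625:887 (January 2000 to November 2021).
set.seed(1)
rate <- round(runif(263, min = 3, max = 12), 1)

full_ts <- ts(rate, start = c(2000, 1), frequency = 12)

# Training window: the first 257 observations (rows 625:881)
train_ts <- window(full_ts, end = c(2021, 5))

# Test window: the remaining 6 observations (rows 882:887)
test_ts <- window(full_ts, start = c(2021, 6))

length(train_ts)  # 257
length(test_ts)   # 6
end(train_ts)     # 2021 5
start(test_ts)    # 2021 6
```

So a training set of 257 monthly observations starting in January 2000 ends in May 2021, which leaves June to November 2021 as the six-month forecast target.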

Data Visualisation

Models

ETS Models

# Build ETS model 

ets_model <- ets(unemp_TS)

ARIMA

# Compute seasonally differenced time series (lag 12) from the training series,
# so the time index is kept and the held-out months are not used
seasonal_diff <- diff(unemp_TS, lag = 12)

# Create a data frame with time index and seasonally differenced data
# (as.numeric() drops the ts class so ggplot2 can pick its scales)
seasonal_diff_df <- data.frame(
  Date = as.numeric(time(seasonal_diff)),
  Rate = as.numeric(seasonal_diff)
)

# Plot the seasonally differenced time series using ggplot2
ggplot(seasonal_diff_df, aes(x = Date, y = Rate)) +
  geom_line() +
  labs(title = "Seasonally Differenced", y = "") +
  theme_minimal()

The seasonally differenced series is still not stationary, so further differencing is needed. The graph below shows the series after a first (lag-1) difference was applied on top of the seasonal difference.

Based on the ACF plot, we should choose ARIMA(0,1,1)(0,1,1)[12], which indicates a first difference, a seasonal difference, a non-seasonal MA(1) component, and a seasonal MA(1) component.

Alternatively, the PACF plot suggests ARIMA(0,1,0)(0,1,0)[12]: a first difference and a seasonal difference, with no AR or MA terms.
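To make the differencing and the ACF/PACF reading concrete, here is a minimal base R sketch. The series `x` is synthetic (an assumption: a random-walk trend plus a fixed monthly pattern, standing in for the training series), so the lags it flags say nothing about the real data. It only shows the mechanics.

```r
# Synthetic monthly series, same length and start as the training data
set.seed(42)
n <- 257
trend <- cumsum(rnorm(n, sd = 0.2))
season <- rep(c(0.5, 0.4, 0.2, 0.1, -0.1, 0, 0.1, 0, -0.3, -0.3, -0.3, -0.2),
              length.out = n)
x <- ts(5 + trend + season + rnorm(n, sd = 0.1),
        start = c(2000, 1), frequency = 12)

# Seasonal difference (D = 1), then a first difference on top of it (d = 1)
d_seasonal <- diff(x, lag = 12)
d_both <- diff(d_seasonal, lag = 1)

length(d_seasonal)  # 245: the first 12 observations are lost
length(d_both)      # 244: one more observation is lost

# Sample ACF and PACF of the doubly differenced series
acf_vals <- acf(d_both, lag.max = 36, plot = FALSE)
pacf_vals <- pacf(d_both, lag.max = 36, plot = FALSE)

# Approximate 95% significance bound, as drawn on the usual ACF/PACF plots
bound <- qnorm(0.975) / sqrt(length(d_both))

# Lags (in months) outside the bound; acf() includes lag 0, pacf() does not
sig_acf <- which(abs(acf_vals$acf[-1]) > bound)
sig_pacf <- which(abs(pacf_vals$acf) > bound)
```

Significant ACF spikes at lag 1 and lag 12 that cut off afterwards are the usual hint for non-seasonal and seasonal MA(1) terms, which is the reasoning behind the ACF-based specification above.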

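# Seasonal ARIMA models
# unemployment_TS is the monthly series the ARIMA models are fit to
# (its definition is not shown in this section)

# 1st ARIMA (R auto-selection, exhaustive search without approximation)
auto_model <- auto.arima(unemployment_TS, stepwise = FALSE, approximation = FALSE)

# 2nd ARIMA (based on ACF)
acf_model  <- Arima(unemployment_TS, order = c(0, 1, 1), seasonal = c(0, 1, 1))

# 3rd ARIMA (based on PACF)
pacf_model <- Arima(unemployment_TS, order = c(0, 1, 0), seasonal = c(0, 1, 0))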

Model Selection

ETS Model

## ETS(A,N,A) 
## 
## Call:
##  ets(y = unemp_TS) 
## 
##   Smoothing parameters:
##     alpha = 0.9591 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 4.0069 
##     s = -0.165 -0.2986 -0.33 -0.3279 -0.0118 0.0927
##            -0.0096 -0.1421 0.0759 0.2077 0.3628 0.546
## 
##   sigma:  0.7406
## 
##      AIC     AICc      BIC 
## 1287.392 1289.383 1340.628 
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set 0.006717619 0.7201824 0.3121383 -0.2442364 4.933794 0.2741982
##                    ACF1
## Training set 0.01970026

R Selected ARIMA Model

## Series: unemployment_TS 
## ARIMA(0,0,0)(1,0,1)[12] with zero mean 
## 
## Coefficients:
##          sar1     sma1
##       -0.3297  -0.5974
## s.e.   0.1445   0.1383
## 
## sigma^2 = 0.5765:  log likelihood = -290.38
## AIC=586.77   AICc=586.86   BIC=597.33
## 
## Training set error measures:
##                       ME      RMSE       MAE MPE MAPE      MASE        ACF1
## Training set -0.01323582 0.7562457 0.3301621 NaN  Inf 0.4611419 -0.01654595

ACF ARIMA Model

## Series: unemployment_TS 
## ARIMA(0,1,1)(0,1,1)[12] 
## 
## Coefficients:
##           ma1     sma1
##       -1.0000  -1.0000
## s.e.   0.0182   0.0382
## 
## sigma^2 = 1.159:  log likelihood = -375.5
## AIC=756.99   AICc=757.1   BIC=767.4
## 
## Training set error measures:
##                        ME     RMSE       MAE MPE MAPE      MASE       ACF1
## Training set -0.003463716 1.043575 0.4298645 NaN  Inf 0.6003975 -0.0131188

PACF ARIMA Model

## Series: unemployment_TS 
## ARIMA(0,1,0)(0,1,0)[12] 
## 
## sigma^2 = 5.928:  log likelihood = -547.18
## AIC=1096.37   AICc=1096.38   BIC=1099.83
## 
## Training set error measures:
##                        ME    RMSE      MAE MPE MAPE    MASE       ACF1
## Training set -0.001592079 2.37062 1.071222 NaN  Inf 1.49619 -0.4568723

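In total, four models were fitted to forecast June 2021 to November 2021: one ETS model and three ARIMA models. Comparing the MAE and AICc values, the ETS model has the lowest training MAE (0.312, against 0.330, 0.430, and 1.071 for the three ARIMA models). Its AICc (1289.38) is higher than those of the ARIMA models, but information criteria are not comparable between the ETS and ARIMA model classes, so I based the choice on MAE and concluded that the ETS model is the best of the four. Among the ARIMA models, the one selected by auto.arima has both the lowest AICc (586.86) and the lowest MAE. However, as we will see later in the forecast section, the overall performance of the ETS model and the auto-selected ARIMA model is extremely similar.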
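For reference, the two metrics in this comparison can be computed by hand. The sketch below uses base R's `arima()` on the built-in `AirPassengers` data (a stand-in, not the unemployment series), and `aicc()` and `mae()` are hypothetical helpers written for this illustration. The AICc correction assumes the parameter count is the number of estimated coefficients plus one for the variance.

```r
# Stand-in monthly series from base R (assumption: illustration only)
y <- log(AirPassengers)

# Two candidates with the same differencing, as in the ACF- and PACF-based models
fit_ma <- arima(y, order = c(0, 1, 1),
                seasonal = list(order = c(0, 1, 1), period = 12))
fit_none <- arima(y, order = c(0, 1, 0),
                  seasonal = list(order = c(0, 1, 0), period = 12))

# AICc = AIC + 2k(k + 1) / (n - k - 1), where
# k = estimated coefficients + 1 (the innovation variance) and
# n = observations left after differencing
aicc <- function(fit) {
  k <- length(coef(fit)) + 1
  n <- fit$nobs
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

# Training MAE: mean absolute in-sample residual
mae <- function(fit) mean(abs(residuals(fit)))

comparison <- data.frame(
  model = c("ARIMA(0,1,1)(0,1,1)[12]", "ARIMA(0,1,0)(0,1,0)[12]"),
  AICc = c(aicc(fit_ma), aicc(fit_none)),
  MAE = c(mae(fit_ma), mae(fit_none))
)
comparison
```

AICc values like these are only meaningful between models fitted to the same data with the same differencing, whereas MAE can be compared across model classes as long as the models are fitted to the same series.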

Forecasts

Summary

As mentioned earlier, the two models perform very similarly on the key metrics. Both also capture the seasonality in the data and reproduce it in their forecasts.

It is important to note that the point forecasts from each model did not exactly match the actual data from June to November 2021. However, the actual values for that period fall well within the 95% prediction interval, so both models can be considered adequate. Because it has the lower MAE, in my opinion the ETS model should be chosen as the best model.
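The check described above, whether the held-out values fall inside the 95% interval, can be written out explicitly. This is a minimal sketch with base R's `arima()` and `predict()` on the built-in `AirPassengers` data, which stands in for the unemployment series (an assumption). The interval is the normal approximation, point forecast plus or minus 1.96 standard errors.

```r
# Stand-in data: hold out the last 6 months, as in the report
y <- log(AirPassengers)
train <- window(y, end = c(1960, 6))
test <- window(y, start = c(1960, 7))

fit <- arima(train, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Point forecasts and their standard errors for the held-out months
fc <- predict(fit, n.ahead = length(test))

# 95% interval under a normal approximation
z <- qnorm(0.975)
lower <- as.numeric(fc$pred - z * fc$se)
upper <- as.numeric(fc$pred + z * fc$se)
actual <- as.numeric(test)

# Share of held-out observations inside the interval
inside <- actual >= lower & actual <= upper
coverage <- mean(inside)

# Out-of-sample MAE of the point forecasts
test_mae <- mean(abs(actual - as.numeric(fc$pred)))
```

With only six held-out points, full coverage is weak evidence on its own, so the out-of-sample MAE is worth reporting alongside it.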
