rm(list = ls())
library(fpp3)
library(fredr)
fredr_set_key(Sys.getenv("fred_api_key"))
Week 2 Discussion
Part I
I first loaded the necessary fpp3 and fredr packages into R and set my FRED API key.
In the last discussion, I used the Global Economic Policy Uncertainty Index to explore the fpp3 package, specifically the kinds of visuals we can create for time series data. This week, I am building off of that data and focusing on the Economic Policy Uncertainty Index for the United States. It is a monthly dataset with some seasonality and an upward trend, which makes it a good candidate for discussions like these.
Economic Policy Uncertainty Index (epui)
epui <- fredr(series_id = "USEPUINDXM",
observation_start = as.Date("2010-01-01"),
observation_end = as.Date("2026-01-01"))
epui_ts <- epui |> mutate(month = yearmonth(date)) |>
select(-date) |>
as_tsibble(index = month,
key = series_id)
# Filter the data to be 80% for training and 20% for testing
Graphing the Time Series
epui_ts |> autoplot(value) +
labs(title = "Economic Policy Uncertainty Index for United States",
subtitle = "Monthly Data",
x = "Time (Months)",
y = "Index")Splitting the Time Series (80% Train and 20% Test)
train <- epui_ts |> filter_index(~"2022 Oct")
test <- epui_ts |> filter_index("2022 Nov" ~ .)
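As a quick sanity check on the split, we can compare the number of observations in each piece (a small sketch; the exact counts depend on how far the FRED series extends):
# Share of observations that ended up in the training set -- should be close to 80%
nrow(train) / (nrow(train) + nrow(test))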
Model
I specified each of the four simple methods that we looked at in class on the training dataset: the mean method, the naive method, the seasonal naive method, and the drift method.
train_fit <- train |>
fabletools::model(
mean = MEAN(value),
naive = NAIVE(value),
s_naive = SNAIVE(value),
drift = RW(value ~ drift())
)
Forecasts on the Test Series
I have used the fitted methods to generate forecasts for a horizon equal to the length of the test series. In this case, the test series contains 39 observations, so I have generated forecasts 39 months into the future.
We will then compare these forecasts to the actual observed values for the testing data.
test_fc <- train_fit |> forecast(h = nrow(test))
# Graph
test_fc |> autoplot(train) +
facet_wrap(~.model,
nrow = 2)
Point Estimates and Prediction Intervals
The point estimates and prediction intervals for the drift and naive methods are very similar. The drift method's point forecasts are approximately linear, increasing very slowly as the forecasts move into the future, while the naive forecasts stay flat at the last observed value. Both methods produce very wide prediction intervals.
The mean method has the narrowest prediction intervals, and its point estimates are simply the average of the training series values. It is also the only method whose intervals stay constant over time, suggesting it is the best of these four methods for long-term forecasts here. This is likely because the values did not fluctuate much in the training series, so the training mean is a reasonable barometer of where the values sit at any given time.
The seasonal naive method produces the most realistic-looking forecasts, since it carries forward the most recent year of seasonality, and it has the second-narrowest prediction intervals.
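To look at the intervals numerically rather than just visually, the forecast table can be expanded into explicit interval columns (a small sketch using fabletools' hilo(); 80% and 95% are the same levels shown by default in the plots):
# Add 80% and 95% prediction interval columns to the forecast table
test_fc |> hilo(level = c(80, 95))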
Accuracy Metrics
metrics <- accuracy(test_fc, test) |> select(.model, ME, RMSE, MAE, MPE, MAPE)
# Renaming the columns and rows for a cleaner table
colnames(metrics)[1] <- "Models"
metrics[1, 1] <- "Drift Method"
metrics[2, 1] <- "Mean Method"
metrics[3, 1] <- "Naive Method"
metrics[4, 1] <- "Seasonal Naive Method"
kableExtra::kable(metrics)
| Models | ME | RMSE | MAE | MPE | MAPE |
|---|---|---|---|---|---|
| Drift Method | -12.435616 | 80.30500 | 65.56040 | -26.6418671 | 44.11105 |
| Mean Method | 26.673039 | 84.86822 | 57.62065 | 0.4694083 | 30.41519 |
| Naive Method | -8.767138 | 81.04336 | 65.67328 | -24.3742916 | 43.33441 |
| Seasonal Naive Method | 6.540972 | 82.98794 | 62.24585 | -13.9516795 | 38.17094 |
Mean Error (ME)
As the name suggests, this metric is the average of the forecast errors on the testing data.
Positive mean errors indicate under-prediction, whereas negative mean errors indicate over-prediction by the models.
The seasonal naive method has the lowest mean error in absolute value, at around 6.5, indicating that its forecasts are the least biased of the four: its positive and negative errors cancel out the most over the test series.
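In symbols, writing the forecast error for test period t as the actual value minus the forecast:
$$e_t = y_t - \hat{y}_t, \qquad \text{ME} = \frac{1}{h}\sum_{t=1}^{h} e_t$$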
Root Mean Squared Error (RMSE)
The RMSE squares the forecast errors, which heavily penalizes large misses, averages them, and then takes the square root at the end.
The RMSE values are very similar across the four models, but the drift method has the lowest RMSE, at just above 80.
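Using the same forecast errors as above:
$$\text{RMSE} = \sqrt{\frac{1}{h}\sum_{t=1}^{h} e_t^2}$$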
Mean Absolute Error (MAE)
The MAE ignores the direction of the bias by taking the absolute value of each of the error terms.
The mean method has the lowest mean absolute error, with a value of about 57.6.
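Formally, it is the average of the absolute forecast errors:
$$\text{MAE} = \frac{1}{h}\sum_{t=1}^{h} |e_t|$$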
Mean Percentage Error (MPE)
This metric computes the average of the percentage errors, i.e. by how much, in percentage terms, each forecast differs from the actual value.
The mean method has the lowest mean percentage error in absolute value, about 0.47, which is very close to zero.
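With the percentage error for test period t defined relative to the actual value:
$$p_t = 100\,\frac{e_t}{y_t}, \qquad \text{MPE} = \frac{1}{h}\sum_{t=1}^{h} p_t$$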
Mean Absolute Percentage Error (MAPE)
This metric takes the absolute value of each percentage error and then averages them. It differs from the MPE because it ignores the sign of the percentage errors, so positive and negative errors cannot cancel each other out.
The mean method also has the lowest mean absolute percentage error, at about 30.4.
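Using the same percentage errors as in the MPE:
$$\text{MAPE} = \frac{1}{h}\sum_{t=1}^{h} |p_t|$$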
Conclusion
According to the accuracy metrics analyzed above, the mean method has the lowest errors on average across most of the accuracy metrics.
In fact, it has the lowest mean absolute error, mean percentage error, and mean absolute percentage error, although the drift method edges it out on RMSE and the seasonal naive method on ME.
Now, we will compare these metric values found here to what we find in Excel.
Part II
X-11 Decomposition
I used the X-11 method for decomposition since it supports both additive and multiplicative decomposition and uses more advanced machinery than the classical method, giving a more accurate decomposition of time series data.
First, I have performed multiplicative decomposition. The X-11 method comes from the X_13ARIMA_SEATS family of decomposition methods, available through the feasts package.
Multiplicative Decomposition
x11_mult <- epui_ts |>
model(x11 = feasts::X_13ARIMA_SEATS(value ~ x11())) |>
components()
x11_mult |> autoplot()
Additive Decomposition
For additive decomposition to work using the X-11 method, you must specify that you do not want the data to be transformed, as it would be for multiplicative decomposition, via transform(`function` = "none"). You must also specify mode = "add" within the x11() function to indicate that you want R to perform additive decomposition.
# additive
x11_add <- epui_ts |>
model(x11 = feasts::X_13ARIMA_SEATS(value ~ transform(`function` = "none")
+ x11(mode = "add"))) |>
components()
# Graph
x11_add |> autoplot()
In this case, it is clear that multiplicative decomposition is more appropriate than additive decomposition.
The trend-cycle component is much smoother, and the remainder component fluctuates more randomly, in the multiplicative decomposition than in the additive one. Also, the seasonal variation appears to grow with the level of the series, and the multiplicative method keeps the variance of the seasonal component much more constant.
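One way to check that claim about the seasonal variation is to plot the seasonal component from each decomposition month by month (a quick sketch, assuming the default X-11 component names trend, seasonal, and irregular returned by components()):
# Sub-series plots of the seasonal component from each decomposition
x11_mult |> gg_subseries(seasonal)
x11_add |> gg_subseries(seasonal)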
Given these results, I would consider transforming the data before specifying any forecasting models, in order to control for the variance in the seasonal component.
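A Box-Cox transformation is one option; here is a small sketch of choosing the transformation parameter with the Guerrero feature from feasts (the returned lambda_guerrero value would then be passed to box_cox() when modelling):
# Estimate a Box-Cox lambda for the EPUI series
epui_ts |> features(value, features = guerrero)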