Exercises 8.1, 8.5, 8.6, 8.7, 8.8 and 8.9 from Hyndman & Athanasopoulos, Forecasting: Principles and Practice, 3rd edition (https://otexts.com/fpp3/expsmooth-exercises.html)

# fpp3 attaches everything used below (tsibble, dplyr, tidyr, ggplot2,
# lubridate, fable, fabletools, feasts). fpp2 is deliberately not loaded
# alongside it, since the two attach conflicting forecast()/accuracy()
# implementations.
library(fpp3)

——————————————————————————–

(8.1) Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
pigs <- aus_livestock %>%
  filter(Animal == 'Pigs') %>%
  filter(State == 'Victoria')

(a) Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(α\) and \(ℓ_0\), and generate forecasts for the next four months.

#Simple exponential smoothing
fit <- pigs %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

# Optimal alpha = 0.3221247, l[0] = 100,646.6
report(fit)
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07
#Forecast 
fc <- fit %>%
  forecast(h = "4 months")

fc %>%
  autoplot(pigs) +
  geom_line(aes(y = .fitted), col="red", data = augment(fit)) +
  labs(title="Pigs Slaughtered in Victoria, Australia", subtitle = "4-Month Exponential Smoothing Forecast using ETS(A,N,N)")

Since there is no trend or seasonality within the data, simple exponential smoothing suffices here. The optimal value of \(α\) is 0.3221247 and \(ℓ_0\) is 100,646.6. Because \(α\) is closer to 0 than to 1, the forecast still places substantial weight on older observations rather than reacting only to the most recent ones. The chosen model is ETS(A,N,N): additive errors with no trend or seasonal components, so the point forecasts are flat.
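
To make the weighting concrete, SES forecasts are a weighted average of past observations with geometrically decaying weights \(α(1-α)^j\) at lag \(j\); a minimal sketch (the six lags shown are arbitrary):

# Weights SES places on the most recent six observations
alpha <- 0.3221247
round(alpha * (1 - alpha)^(0:5), 3)
# the latest month gets weight ~0.32; the weights then decay
# geometrically, so older months still carry meaningful weight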

(b) Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

# first forecast
first_fc <- (fit |> forecast(h = 1)) |> pull(.mean)

# standard deviation
s <- sd(augment(fit)$.resid)

# 95% prediction interval 
upper <- first_fc + 1.96 * s
lower <- first_fc - 1.96 * s
cat("95% Prediction Interval (Manual): [", lower, "," , upper, "] \n")
## 95% Prediction Interval (Manual): [ 76871.01 , 113502.1 ]
# R interval: hilo() returns a <hilo> column whose bounds can be
# extracted directly with $lower and $upper
interval <- fc %>% 
  hilo(95) %>% 
  pull('95%') %>% 
  head(1)

lower_bound <- interval$lower
upper_bound <- interval$upper

# Print the 95% prediction interval
cat("95% Prediction Interval (R): [", lower_bound, ", ", upper_bound, "]")
## 95% Prediction Interval (R): [ 76854.79 ,  113518.3 ]

The two intervals are very close; the manual one is slightly narrower because it uses the raw standard deviation of the residuals, while R derives its interval from the model's estimated variance.
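
A quick check of this (a sketch, using the sigma^2 reported by report() above and fabletools' glance()):

# Rebuild the interval from the model's estimated variance
s_model <- sqrt(glance(fit)$sigma2)
first_fc + c(-1.96, 1.96) * s_model  # reproduces R's interval up to rounding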

——————————————————————————–

(8.5) Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
#Filter data
Mexico_exports <- global_economy %>% 
  filter(Country == "Mexico") 

head(Mexico_exports)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country Code   Year         GDP Growth    CPI Imports Exports Population
##   <fct>   <fct> <dbl>       <dbl>  <dbl>  <dbl>   <dbl>   <dbl>      <dbl>
## 1 Mexico  MEX    1960 13040000000  NA    0.0129   11.7     8.51   38174112
## 2 Mexico  MEX    1961 14160000000   5.00 0.0131   10.6     8.41   39394126
## 3 Mexico  MEX    1962 15200000000   4.66 0.0133   10.1     8.57   40649588
## 4 Mexico  MEX    1963 16960000000   8.11 0.0133    9.95    8.32   41939880
## 5 Mexico  MEX    1964 20080000000  11.9  0.0137    9.85    7.63   43264272
## 6 Mexico  MEX    1965 21840000000   7.10 0.0141    9.52    7.63   44623043

(a) Plot the Exports series and discuss the main features of the data.

Mexico_exports %>%
  autoplot(Exports) +
  labs(y="% of GDP", title="Exports from Mexico")

I chose to analyze exports from Mexico. The overall trend has increased over time, with some random noise. Because the data are annual, we will not see seasonality, but we can see cyclicity: Mexico's exports hovered around 10% of GDP until 1980, rose through the 1980s, dropped for about three years around 1990, and then climbed to roughly 30% by 2010 as Mexico's economy grew.

(b) Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

#Simple exponential smoothing
fit_1 <- Mexico_exports %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

#Optimal values
report(fit_1)
## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9998997 
## 
##   Initial states:
##      l[0]
##  8.503378
## 
##   sigma^2:  4.8073
## 
##      AIC     AICc      BIC 
## 330.5385 330.9829 336.7198
#Forecast 
fc_1 <- fit_1 %>%
  forecast(h = "10 years")

#Plot data + forecast and fitted values
fc_1 %>%
  autoplot(Mexico_exports) +
  geom_line(aes(y = .fitted), col="red", data = augment(fit_1)) +
  labs(y="% of GDP", title="Exports from Mexico", subtitle = "10-Year Exponential Smoothing Forecast using ETS(A,N,N)")

(c) Compute the RMSE values for the training data.

accuracy(fit_1)
## # A tibble: 1 × 11
##   Country .model           .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <fct>   <chr>            <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mexico  "ETS(Exports ~ … Trai… 0.506  2.15  1.38  1.83  7.78 0.983 0.991 0.203

The Root Mean Square Error is 2.154425: the square root of the mean squared deviation between fitted and actual values. It is also worth noting that the estimated \(α\) is essentially 1 (0.9999), so this ETS(A,N,N) fit behaves almost like a naive forecast, tracking the most recent observation.

(d) Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit_2 <- Mexico_exports %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

#Forecast 
fc_2 <- fit_2 %>%
  forecast(h = "10 years")

#Plot data + forecast and fitted values
fc_2 %>%
  autoplot(Mexico_exports) +
  geom_line(aes(y = .fitted), col="red", data = augment(fit_2)) +
  labs(y="% of GDP", title="Exports from Mexico", subtitle = "10-Year Exponential Smoothing Forecast using ETS(A,A,N)")

accuracy(fit_2)
## # A tibble: 1 × 11
##   Country .model        .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <fct>   <chr>         <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mexico  "ETS(Exports… Trai… -0.00816  2.09  1.41 -1.97  8.68  1.01 0.964 0.203

Now that I have both an ETS(A,N,N) and an ETS(A,A,N) forecast, they look nearly indistinguishable side by side. The main difference is that the second model's additive trend makes its forecasts increase over the forecast horizon, with the prediction intervals shifting accordingly. The RMSE is 2.154425 for ETS(A,N,N) and 2.093999 for ETS(A,A,N), so the trended model fits the training data slightly better, consistent with the overall rising trend. However, the Mean Absolute Error increases slightly under ETS(A,A,N) (1.375397 vs 1.407705), as does the Mean Absolute Percentage Error (7.782879 vs 8.682648), which suggests the trend is neither strong nor consistent, as is visible at a glance. The ETS(A,A,N) model improves forecasting accuracy only marginally.
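
Because ETS(A,A,N) spends two extra parameters on the trend (\(β\) and \(b_0\)), an information criterion gives a fairer comparison than training RMSE; a quick sketch (output not shown):

# AICc/BIC penalise the extra trend parameters
bind_rows(glance(fit_1), glance(fit_2)) %>%
  select(.model, AICc, BIC)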

(e) Compare the forecasts from both methods. Which do you think is best?

# Plot forecasts 
Mexico_exports %>%
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
        AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))) %>%
  forecast(h = "10 years") %>%
  autoplot(Mexico_exports, level = NULL)

The ETS(A,A,N) method seems better since it captures the increasing trend in the data, while the ETS(A,N,N) method suggests exports (as a share of GDP) will stay flat. Given that the models have similar RMSE, MAE, MAPE, etc., I think it is better to choose the model that more accurately reflects the immediate trend.

(f) Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

################################################################################
##### Manual ##### 
# Forecast 1 ETS(A,N,N)

# first forecast 
first_fc_1 <- fc_1$.mean[1]

# RMSE of the training residuals, used as the standard error
RMSE_1 <- sqrt(mean(augment(fit_1)$.resid^2))
se_1 <- RMSE_1

# 95% prediction interval for first forecast
lower_bound_1m <- first_fc_1 - 1.96 * se_1
upper_bound_1m <- first_fc_1 + 1.96 * se_1

################################################################################
##### Manual ##### 
# Forecast 2 ETS(A,A,N)

# first forecast 
first_fc_2 <- fc_2$.mean[1]

# RMSE of the training residuals, used as the standard error
RMSE_2 <- sqrt(mean(augment(fit_2)$.resid^2))
se_2 <- RMSE_2

# 95% prediction interval for first forecast
lower_bound_2m <- first_fc_2 - 1.96 * se_2
upper_bound_2m <- first_fc_2 + 1.96 * se_2

################################################################################
##### R ##### 
# Forecast 1 ETS(A,N,N) ##### 
interval_1 <- fc_1 %>% 
  hilo(95) %>% 
  pull('95%') %>% 
  head(1)

# 95% prediction interval for first forecast (bounds extracted from <hilo>)
lower_bound_1r <- interval_1$lower
upper_bound_1r <- interval_1$upper

################################################################################
##### R ##### 
# Forecast 2 ETS(A,A,N)
interval_2 <- fc_2 %>% 
  hilo(95) %>% 
  pull('95%') %>% 
  head(1)

# 95% prediction interval for first forecast (bounds extracted from <hilo>)
lower_bound_2r <- interval_2$lower
upper_bound_2r <- interval_2$upper

################################################################################
# Results
cat("95% Prediction Interval for ETS(A,N,N) (Manual): [", lower_bound_1m, ",", upper_bound_1m, "]\n")
## 95% Prediction Interval for ETS(A,N,N) (Manual): [ 33.64367 , 42.08901 ]
cat("95% Prediction Interval for ETS(A,N,N) (R): [", lower_bound_1r, ", ", upper_bound_1r, "]\n")
## 95% Prediction Interval for ETS(A,N,N) (R): [ 33.569 ,  42.16368 ]
cat("95% Prediction Interval for ETS(A,A,N) (Manual): [", lower_bound_2m, ",", upper_bound_2m, "]\n")
## 95% Prediction Interval for ETS(A,A,N) (Manual): [ 34.278 , 42.48647 ]
cat("95% Prediction Interval for ETS(A,A,N) (R): [", lower_bound_2r, ", ", upper_bound_2r, "]")
## 95% Prediction Interval for ETS(A,A,N) (R): [ 34.12878 ,  42.63569 ]

——————————————————————————–

(8.6) Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

chinese_gdp <- global_economy %>% 
  filter(Country == 'China')

# Forecast
chinese_gdp %>% 
  model(ANN = ETS(GDP ~ error("A") + trend("N") + season("N")),  
        AAN = ETS(GDP ~ error("A") + trend("A") + season("N"))) %>%   
  forecast(h = 25) %>% 
  autoplot(chinese_gdp, level = NULL) +
  labs(title="Chinese GDP", subtitle = "25-Year Exponential Smoothing Forecast")

# Damped forecast, from 0.8 to 0.98
chinese_gdp %>% 
  model('ϕ = 0.8'  = ETS(GDP ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        'ϕ = 0.85' = ETS(GDP ~ error("A") + trend("Ad", phi = 0.85) + season("N")),
        'ϕ = 0.9'  = ETS(GDP ~ error("A") + trend("Ad", phi = 0.9) + season("N")),
        'ϕ = 0.98' = ETS(GDP ~ error("A") + trend("Ad", phi = 0.98) + season("N"))) %>% 
  forecast(h = 25) %>% 
  autoplot(chinese_gdp, level = NULL) +
  labs(title="Chinese GDP", subtitle = "25-Year Exponential Smoothing Damped Forecast")

# Box-Cox transformation parameter via Guerrero's method
lambda <- chinese_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)


# Holt's linear trend and Box-Cox variants (no seasonal component,
# so these are Holt's method rather than Holt-Winters)
fit_chinese_gdp <- chinese_gdp  %>%
  model("Holt's linear" = ETS(GDP ~ error("A") + trend("A") + season("N")),
        "Damped Holt's linear" = ETS(GDP ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        "Box-Cox" = ETS(box_cox(GDP,lambda) ~ error("A") + trend("A") + season("N")),
        "Damped Box-Cox" = ETS(box_cox(GDP,lambda) ~ error("A") + trend("Ad", phi = 0.8) + season("N")))

fc_chinese_gdp <- fit_chinese_gdp %>%
  forecast(h = 15)

fc_chinese_gdp %>%
  autoplot(chinese_gdp, level = NULL) +
  guides(colour = guide_legend(title = "Forecast")) +
  labs(title="Chinese GDP", subtitle = "15-Year Forecast: Box-Cox & Holt's Linear Methods")

Since this dataset shows an increasing trend but no seasonality, we use additive errors and non-seasonal methods. The ETS(A,A,N) method follows the growth trend that took off after 2000, while ETS(A,N,N) assumes no growth, so its forecasts stay at the 2017 value. Damping the trend lets the forecasts eventually level off, with intensity controlled by the \(ϕ\) coefficient. Holt's linear methods forecast reasonably on the original scale, whereas the Box-Cox variants predict explosive growth and are likely overfitting the steep rise in the most recent observations.
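
The damping intuition can be made precise: with a damped additive trend the h-step-ahead forecast is

\[ \hat{y}_{T+h|T} = \ell_T + (\phi + \phi^2 + \dots + \phi^h)\, b_T \;\to\; \ell_T + \frac{\phi}{1-\phi}\, b_T \quad \text{as } h \to \infty, \]

so with \(ϕ = 0.8\) the forecasts level off after roughly \(4b_T\) of additional growth, while \(ϕ = 0.98\) allows about \(49b_T\), consistent with the ordering of the damped forecasts in the plot above.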

——————————————————————————–

(8.7) Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
aus_production %>%
  autoplot(Gas) +
  labs(title="Australian Gas Production")

# Forecast 
aus_production %>%
  model(ETS(Gas ~ error("M") + trend("A") + season("M"))) %>%
  forecast(h = 25) %>%
  autoplot(aus_production, level = NULL) +
  labs(title="Australian Gas Production", subtitle = "25-Year Forecast: Multiplicative Holt-Winters Method")

# Damped
fit_aus_production <- aus_production  %>%
  model('Holt Winters' = ETS(Gas ~ error("M") + trend("A") + season("M")),
        'Damped Holt Winters: ϕ = 0.8' = ETS(Gas ~ error("M") + trend("Ad", phi = 0.8) + season("M")),
        'Damped Holt Winters: ϕ = 0.9' = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M")))

fc_aus_production <- fit_aus_production %>%
  forecast(h = 25)

fc_aus_production %>%
  autoplot(aus_production, level = NULL) +
  guides(colour = guide_legend(title = "Forecast")) +
  labs(title="Australian Gas Production", subtitle = "25-Year Forecast: Multiplicative Holt-Winters Methods")

This data shows an overall increasing trend, along with seasonality that appears to be multiplicative, so Holt-Winters methods are appropriate here. The series also looks like a candidate for a Box-Cox transformation to stabilise the fluctuations that begin after 1970. Multiplicative seasonality is needed because the size of the seasonal swings grows with the level of the series, especially after 1980. Whether to damp the trend is a judgment call: if we assume the level will keep rising at its recent rate, damping is unnecessary. However, it isn't always plausible for growth to continue unchecked, so damping gives a more conservative forecast in which the series (and with it the seasonal swings) keeps growing, just at a slower rate.
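
To make the damping question concrete, the one-step training accuracy of the three fits can be compared directly, and the Box-Cox idea above can be sketched as follows (the additive specification on the transformed scale is an assumption, not something fitted earlier; fable back-transforms the forecasts automatically):

# One-step training accuracy of the damped and undamped fits (output not shown)
accuracy(fit_aus_production)

# Guerrero's method suggests a Box-Cox lambda for the Gas series
lambda_gas <- aus_production %>%
  features(Gas, features = guerrero) %>%
  pull(lambda_guerrero)

# Holt-Winters with additive components on the transformed scale
aus_production %>%
  model(ETS(box_cox(Gas, lambda_gas) ~ error("A") + trend("A") + season("A"))) %>%
  forecast(h = 25) %>%
  autoplot(aus_production, level = NULL) +
  labs(title = "Australian Gas Production",
       subtitle = "25-Year Forecast on a Box-Cox Transformed Scale")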

——————————————————————————–

(8.8) Recall your retail time series data (from Exercise 7 in Section 2.10).
set.seed(22397)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

autoplot(myseries, Turnover) +
  labs(title = "Retail Turnover")

(a) Why is multiplicative seasonality necessary for this series?

This series requires multiplicative seasonality because the seasonal fluctuations grow with the level of the series: at the start of the series they are small, but towards the end they are much wider.
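
A quick way to sanity-check this (a minimal sketch): if the seasonality is multiplicative, a log transform should make the seasonal swings roughly constant in size.

# Seasonal amplitude should look roughly constant on the log scale
myseries %>%
  autoplot(log(Turnover)) +
  labs(title = "Retail Turnover (log scale)")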

(b) Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

# Forecast 
myseries %>%
  model(ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  forecast(h = 40) %>%  # 40 months (the series is monthly)
  autoplot(myseries, level = NULL) +
  labs(title="Retail Turnover", subtitle = "40-Month Forecast: Multiplicative Holt-Winters Method")

# Damped
fit_aus_retail <- myseries  %>%
  model('Holt Winters' = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        'Damped Holt Winters: ϕ = 0.8' = ETS(Turnover ~ error("M") + trend("Ad", phi = 0.8) + season("M")),
        'Damped Holt Winters: ϕ = 0.9' = ETS(Turnover ~ error("M") + trend("Ad", phi = 0.9) + season("M")))

fc_aus_retail <- fit_aus_retail %>%
  forecast(h = 40)

fc_aus_retail %>%
  autoplot(myseries, level = NULL) +
  guides(colour = guide_legend(title = "Forecast")) +
  labs(title="Retail Turnover", subtitle = "10-Year Forecast: Multiplicative Holt-Winters Methods")

(c) Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(fit_aus_retail)
## # A tibble: 3 × 12
##   State      Industry .model .type      ME  RMSE   MAE     MPE  MAPE  MASE RMSSE
##   <chr>      <chr>    <chr>  <chr>   <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 Western A… Newspap… Holt … Trai… -0.0918  2.54  1.75 -0.926   6.83 0.421 0.449
## 2 Western A… Newspap… Dampe… Trai…  0.0979  2.55  1.75 -0.0749  6.75 0.419 0.450
## 3 Western A… Newspap… Dampe… Trai…  0.0804  2.55  1.75 -0.163   6.75 0.419 0.450
## # ℹ 1 more variable: ACF1 <dbl>

The RMSE is lowest for the multiplicative Holt-Winters method and only slightly larger for the two damped variants. The three values are so close, however, that I would actually choose the damped method with ϕ = 0.8 for its lower Mean Absolute Percentage Error (MAPE) and Mean Absolute Scaled Error (MASE).

(d) Check that the residuals from the best method look like white noise.

myseries %>%
  model(ETS(Turnover ~ error("M") + trend("Ad", phi = 0.8) + season("M"))) %>%
  gg_tsresiduals() 
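
A portmanteau test complements the visual check; a minimal sketch using a Ljung-Box test (lag = 24, two seasonal periods for monthly data, is an assumed choice):

# Ljung-Box test on the innovation residuals; a large p-value is
# consistent with white noise
myseries %>%
  model(ETS(Turnover ~ error("M") + trend("Ad", phi = 0.8) + season("M"))) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24)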

(e) Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 7 in Section 5.11?

#### Find test set RMSE #### 

# Trained set
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

# Forecast remaining data
myseries_test_forecast <- myseries_train |>
  model(ETS(Turnover ~ error("M") + trend("Ad") + season("M"))) |>
  forecast(new_data = anti_join(myseries, myseries_train))

# RMSE = 8.116245
myseries_test_forecast %>% 
  accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "ETS(Tur… West… Newspap… Test   4.91  8.12  6.53  10.6  17.3  1.59  1.43 0.829
#### Snaive #### 

# Trained set
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

# Fit a seasonal naive model 
fit_snaive <- myseries_train %>%
  model(SNAIVE(Turnover))

# Produce forecasts for the test data
fc_snaive <- fit_snaive %>%
  forecast(new_data = anti_join(myseries, myseries_train))

# RMSE = 9.108427
fc_snaive %>%
  accuracy(myseries)
## # A tibble: 1 × 12
##   .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE(T… West… Newspap… Test   3.04  9.11  7.34  6.06  19.9  1.79  1.61 0.513

The test-set RMSE is 8.116245 when training on data up to the end of 2010. The seasonal naive method gives a test RMSE of 9.108427, which is worse than the multiplicative method. Overall, the Holt-Winters model outperforms the seasonal naive approach on this multiplicative seasonal series.
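
As an aside, both comparisons can be run in a single pipeline; a sketch equivalent to the two blocks above:

# Fit ETS and SNAIVE together, forecast the test period, and compare
myseries_train %>%
  model(ets = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
        snaive = SNAIVE(Turnover)) %>%
  forecast(new_data = anti_join(myseries, myseries_train)) %>%
  accuracy(myseries) %>%
  select(.model, RMSE, MAE, MAPE, MASE)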

——————————————————————————–

(8.9) For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
# Get optimal lambda
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Box Cox transform
myseries_boxcox <- myseries %>%
  mutate(box_cox = box_cox(Turnover, lambda))

# STL decomposition
stl_decomp <- myseries_boxcox %>%
  model(STL(box_cox ~ season(window = "periodic"), robust = TRUE)) %>%
  components()

# Extract seasonally adjusted component from STL decomposition
# (note: the decomposition was computed on the full series, so some
# test-period information leaks into what is treated as training data)
myseries$Turnover_sa <- stl_decomp$season_adjust

# Filter training and test datasets
myseries_train <- myseries %>%
  filter(year(Month) < 2011)

myseries_test <- myseries %>%
  filter(year(Month) >= 2011)

# Fit ETS model to seasonally adjusted training set
fit <- myseries_train %>%
  model(ETS(Turnover_sa ~ error("M") + trend("A") + season("M")))

# Residuals, ACF plot
fit %>% gg_tsresiduals()  +
  labs(title="Retail Turnover", subtitle = "Seasonal STL Decomposition after a Box-Cox Transform")

# Forecast on test data
fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))

# RMSE = 0.1693932
fc %>%
  accuracy(myseries_test)
## # A tibble: 1 × 12
##   .model   State Industry .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>    <chr> <chr>    <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "ETS(Tu… West… Newspap… Test  -0.125 0.169 0.145 -3.95  4.54   NaN   NaN 0.672