FPP Chapter 8 HW: Exponential Smoothing

Author

Alex Ptacek

library(tidyverse)
library(fpp3)

Question 8.1: Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.

Answer: Optimal alpha = 0.36. Optimal level = 95,487.5

vict_pigs <- aus_livestock |> 
  filter(State == "Victoria" & Animal == "Pigs")

#Find our optimal parameters
vict_pigs_fit <- vict_pigs |> 
  model(ETS(Count)) |> 
  report()

Series: Count 
Model: ETS(A,N,A) 
  Smoothing parameters:
    alpha = 0.3579401 
    gamma = 0.0001000139 

  Initial states:
    l[0]     s[0]    s[-1]     s[-2]     s[-3]     s[-4]     s[-5]   s[-6]
 95487.5 2066.235 6717.311 -2809.562 -1341.299 -7750.173 -8483.418 5610.89
     s[-7]    s[-8]     s[-9]   s[-10]   s[-11]
 -579.8107 1215.464 -2827.091 1739.465 6441.989

  sigma^2:  60742898

     AIC     AICc      BIC 
13545.38 13546.26 13610.24

vict_pigs_fit

# A mable: 1 x 3
# Key:     Animal, State [1]
  Animal State    `ETS(Count)`
  <fct>  <fct>         <model>
1 Pigs   Victoria <ETS(A,N,A)>

#Generate 4 months of forecasts. Filter year to see forecasts better.
vict_pigs_fit |> 
  forecast(h = 4) |> 
  autoplot(vict_pigs |> filter(year(Month) > 2010))

Compute a 95% prediction interval for the first forecast using ^y ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

#Lower interval
84425 - 1.96*sqrt(60742898)

[1] 69149.2

#Upper interval
84425 + 1.96*sqrt(60742898)

[1] 99700.8

#Manual calculations match first row of fable calculation
vict_pigs_fit |> 
  forecast(h = 4) |>
  hilo(level = 95) |> 
  mutate(lower = `95%`$lower, upper = `95%`$upper) |> 
  select(lower, .mean, upper)

# A tsibble: 4 x 4 [1M]
   lower  .mean   upper    Month
   <dbl>  <dbl>   <dbl>    <mth>
1 69149. 84425.  99700. 2019 Jan
2 68933. 85158. 101383. 2019 Feb
3 74446. 91567. 108688. 2019 Mar
4 72125. 90098. 108071. 2019 Apr

Question 8.5: Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data.

Answer: The Japanese Exports series starts off with an upward trend until 1987, where the Exports troughed for about 15 years before rising rapidly in 2001 and continuing a cyclic pattern. The cyclic pattern seems to be 1-3 years of rapid growth, followed by 2-5 years of plateau, and then 1-3 years of rapid decline.

#Removing 2017 because it has NA Exports for Japan
jap_exports <- global_economy |> 
  filter(Country == "Japan") |> 
  filter(Year != 2017)

jap_exports |> 
  autoplot(Exports) +
  labs(title = "Japanese Exports (1960-2016)")

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

jap_ANN <- jap_exports |> 
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

jap_ANN |> 
  forecast(h = 5) |> 
  autoplot(jap_exports) +
  ggtitle("Forecast 2017-2021")

Compute the RMSE values for the training data.

ANN_RMSE <- jap_ANN |> 
  accuracy() |> 
  pull(RMSE)
ANN_RMSE

[1] 1.263342

& e. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set. Compare the forecasts from both methods. Which do you think is best?

Answer: Plotting the AAN forecast, we can notice an upward trends vs. the ANN model, which had a nearly horizontal line. The RMSE is slightly lower in the ETS(A,A,N) model, but extremely close. The alpha value for AAN is lower than ANN, so the AAN model is giving higher weight to earlier observations. Beta, only given for AAN, is near 0, so it’s not as responsive to changes in y, and therefore captures the general upward trend of the time series. The AAN method seems more useful for this data, because we do see an overall upward trend in the series, while the Simple ANN method only accounts for the errors in the data.

jap_AAN <- jap_exports |> 
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

#Plot AAN forecast
jap_AAN |> 
  forecast(h = 5) |> 
  autoplot(jap_exports)

#Get RMSE fro AAN model
jap_AAN |>
  accuracy() |> 
  pull(RMSE)

[1] 1.259002

#Compare coefficients of ANN vs AAN
tidy(jap_ANN)

# A tibble: 2 × 4
  Country .model                                                  term  estimate
  <fct>   <chr>                                                   <chr>    <dbl>
1 Japan   "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"… alpha    0.941
2 Japan   "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"… l[0]    10.6

#Compare coefficients of ANN vs AAN
tidy(jap_AAN)

# A tibble: 4 × 4
  Country .model                                                  term  estimate
  <fct>   <chr>                                                   <chr>    <dbl>
1 Japan   "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"… alpha  9.09e-1
2 Japan   "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"… beta   1.00e-4
3 Japan   "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"… l[0]   1.05e+1
4 Japan   "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"… b[0]   1.05e-1

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

Answer: Overall, the intervals are nearly the same using RMSE vs. standard deviation with fable.

#Observe lower, mean, and upper levels of AAN prediction interval
jap_AAN |> 
  forecast(h = 5) |> 
  hilo(95) |> 
  mutate(lower = `95%`$lower, upper = `95%`$upper) |> 
  select(lower, .mean, upper)

# A tsibble: 5 x 4 [1Y]
  lower .mean upper  Year
  <dbl> <dbl> <dbl> <dbl>
1  13.8  16.4  18.9  2017
2  13.0  16.5  19.9  2018
3  12.4  16.6  20.7  2019
4  11.9  16.7  21.5  2020
5  11.5  16.8  22.1  2021

AAN_RMSE <- accuracy(jap_AAN) |> 
  pull(RMSE)

#AAN lower
15.6 - 1.96*AAN_RMSE

[1] 13.13236

#AAN upper
15.6 + 1.96*AAN_RMSE

[1] 18.06764

#Observe lower, mean, and upper levels of ANN prediction interval
jap_ANN |> 
  forecast(h = 5) |> 
  hilo(95) |> 
  mutate(lower = `95%`$lower, upper = `95%`$upper) |> 
  select(lower, .mean, upper)

# A tsibble: 5 x 4 [1Y]
  lower .mean upper  Year
  <dbl> <dbl> <dbl> <dbl>
1  13.7  16.2  18.7  2017
2  12.7  16.2  19.7  2018
3  12.0  16.2  20.4  2019
4  11.4  16.2  21.0  2020
5  10.8  16.2  21.6  2021

#ANN lower
16.2 - 1.96*ANN_RMSE

[1] 13.72385

#ANN upper
16.2 + 1.96*ANN_RMSE

[1] 18.67615

Question 8.6: Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

china_gdp <- global_economy |> 
  filter(Country == "China")

#MAN model is automatically picked
china_gdp |> 
  model(ETS(GDP)) |>  
  forecast(h = 15) |> 
  autoplot(china_gdp)

china_gdp |> 
  model(ETS(log(GDP))) |> 
  forecast(h = 15) |> 
  autoplot(china_gdp)

china_gdp |> 
  model(ETS(GDP ~ error("A") + trend("A") + season("N"))) |> 
  forecast(h = 15) |> 
  autoplot(china_gdp)

china_gdp |> 
  model(ETS(GDP ~ error("A") + trend("Ad") + season("N"))) |> 
  forecast(h = 15) |> 
  autoplot(china_gdp)

Question 8.7: Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Answer: The ETS function automatical picks an ETS(M,A,M) model. Multiplicative seasonality is necessary here because the variation is increasing relative to the level of Gas production. Additionally, we need to make sure the residuals of our fitted model are normally distributed and homoschodastic. This is achieved (mostly) with the MAM model, but not an AAA model. Based on the RMSE and AICc, the auto-fable-computed model (MAM) has better acuracy than a damped model (M,Ad,M).

#Original Gas Time Series
aus_production |> 
  autoplot(Gas)

#ETS automatically picks MAM model
auto_fit <- aus_production |> 
  model(ETS(Gas))
auto_fit |> 
  forecast(h = 4) |> 
  autoplot(aus_production)

#Observe homoschodasticity of MAM model
aus_production |> 
  model(ETS(Gas)) |> 
  gg_tsresiduals()

#Observe heteroschodasticity of AAA model
aus_production |> 
  model(ETS(Gas ~ error("A") + trend("A") + season("A"))) |> 
  gg_tsresiduals()

#Plot damped ETS model
damp_fit <- aus_production |> 
  model(ETS(Gas ~ error("M") + trend("Ad") + season("M")))
damp_fit |> 
  forecast(h = 4) |> 
  autoplot(aus_production)

#Get RMSE for models
accuracy(auto_fit) |> 
  pull(RMSE)

[1] 4.595113

#Get RMSE for models
accuracy(damp_fit) |> 
  pull(RMSE)

[1] 4.59184

#Get AICc for models
glance(auto_fit) |> 
  pull(AICc)

[1] 1681.794

#Get AICc for models
glance(damp_fit) |> 
  pull(AICc)

[1] 1685.091

Question 8.8: Recall your retail time series data (from Exercise 7 in Section 2.10).

set.seed(624)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

Why is multiplicative seasonality necessary for this series?

Answer: Multiplicative seasonality is necessary for this series because the variance is increasing relative to the level of Turnover.

myseries |> 
  autoplot(Turnover)

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

#Holt-Winters' Multiplicative method
mam_myseries <- myseries |> 
  model(ETS(Turnover ~ error("M") + trend("A") + season("M")))
mam_myseries |> 
  forecast() |> 
  autoplot(myseries)

#Damped version
madm_myseries <- myseries |> 
  model(ETS(Turnover ~ error("M") + trend("Ad") + season("M")))
madm_myseries |> 
  forecast() |> 
  autoplot(myseries)

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

Answer: The RMSE of one-step cross-validation is equivalent to the AIC, so we can simply check the AIC (or the AICc for small samples). I prefer the Damped version [ETS(M,Ad,M)] because is has lower AICc.

glance(mam_myseries) |> 
  pull(AICc)

[1] 4635.968

glance(madm_myseries) |> 
  pull(AICc)

[1] 4626.859

Check that the residuals from the best method look like white noise.

Answer: There are a couple spikes in the ACF plot of residuals, but they are not at seasonal spikes, so this looks like white noise. Nonetheless, I opted to run the ljung-box text and I confirmed that the residuals are white noise.

gg_tsresiduals(madm_myseries)

#Set lag = m*2 = 24
augment(madm_myseries) |> 
  features(.innov, ljung_box, lag = 24)

# A tibble: 1 × 5
  State           Industry               .model                lb_stat lb_pvalue
  <chr>           <chr>                  <chr>                   <dbl>     <dbl>
1 New South Wales Takeaway food services "ETS(Turnover ~ erro…    25.3     0.392

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

Answer: Yes! By fitting an auto-fable-computed ETS model, we decreased RMSE from 96.8 (SNAIVE) to 70.4 (Exponential Smoothing).

Seasonal Naive approach from 5.7

train_myseries <- myseries |> filter(year(Month) <= 2010)

fit_sn <- train_myseries |> model(SNAIVE(Turnover)) 

fit_sn |>
  forecast(h = "8 years") |> 
  accuracy(myseries)

# A tibble: 1 × 12
  .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 SNAIVE(T… New … Takeawa… Test   48.6  96.8  79.5  7.67  16.3  4.14  3.71 0.964

ETS Model

fit_ets <- train_myseries |> model(ETS(Turnover)) 

fit_ets |>
  forecast(h = "8 years") |> 
  accuracy(myseries)

# A tibble: 1 × 12
  .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ETS(Turn… New … Takeawa… Test  -59.3  70.4  60.9 -15.0  15.3  3.17  2.69 0.905

Question 8.9: For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

Answer: The resulting RMSE of 88.9 is inbetween our SNAIVE and ETS models’ RMSE, but still significantly worse than the ETS. This is likely due to the decomposition model using a seasonal naive model on the seasonal component (effectively, gamma = 1), while using ETS on the seasonally adjusted component. The ETS model is accounting for more of the variation in seasonality and using weighted parameters accordingly. Looking at our gamma coefficient for our ETS model, we can see it’s near-zero, so the model is effectively using a mean seasonality model for the seasonal component of `myseries`.

#Observe near-zero lambda, which means can take log of Turnover
myseries |> 
  features(Turnover, features = guerrero)

# A tibble: 1 × 3
  State           Industry               lambda_guerrero
  <chr>           <chr>                            <dbl>
1 New South Wales Takeaway food services         0.00214

fit_decomp <- train_myseries |> 
  model(decomposition_model(STL(log(Turnover)), ETS(season_adjust)))
fit_decomp |> 
  forecast(h = "8 years") |> 
  accuracy(myseries)

# A tibble: 1 × 12
  .model    State Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
  <chr>     <chr> <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 decompos… New … Takeawa… Test  -85.1  88.9  85.1 -19.9  19.9  4.43  3.40 0.734

tidy(fit_ets) |> 
  filter(term == "gamma")

# A tibble: 1 × 5
  State           Industry               .model        term  estimate
  <chr>           <chr>                  <chr>         <chr>    <dbl>
1 New South Wales Takeaway food services ETS(Turnover) gamma 0.000119