library(fpp3)

Introduction

In this document, we will be going through exercises 8.1, 8.5, 8.6, 8.7, 8.8, and 8.9 from Forecasting: Principles and Practice (3rd ed).

Exercises

8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

  • Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of alpha and l0, and generate forecasts for the next four months.
fit <- aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria") |>
  model(
    SES = ETS(Count ~ error("A") + trend("N") + season("N"))
    )
tidy(fit) |>
  select(term,estimate) |>
  mutate(estimate = round(estimate,4))
## # A tibble: 2 × 2
##   term    estimate
##   <chr>      <dbl>
## 1 alpha      0.322
## 2 l[0]  100647.

The optimal value of alpha is 0.3221 and the optimal value of l0 is 100646.6.

fit |>
  forecast(h = 4) -> fc 
fc |>
  select(Month, Count, .mean)
## # A fable: 4 x 3 [1M]
##      Month             Count  .mean
##      <mth>            <dist>  <dbl>
## 1 2019 Jan N(95187, 8.7e+07) 95187.
## 2 2019 Feb N(95187, 9.7e+07) 95187.
## 3 2019 Mar N(95187, 1.1e+08) 95187.
## 4 2019 Apr N(95187, 1.1e+08) 95187.

We get the same forecast of 95186.56 pigs slaughtered for each future month due to SES not taking into account increases or decreases through trend or seasonality.

  • Compute a 95% prediction interval for the first forecast using ^y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
fc |>
  slice(1) |>
  pull(.mean) -> y

augment(fit) |>
  pull(.resid) |>
  sd() -> s

cat("The 95% prediction interval for the first prediction goes between", y-1.96*s,"and", y+1.96*s)
## The 95% prediction interval for the first prediction goes between 76871.01 and 113502.1
fc |>
  slice(1) |>
  hilo(95) |>
  pull()
## <hilo[1]>
## [1] [76854.79, 113518.3]95

Our “manually” calculated prediction interval is about 30 counts narrower than the automatically calculated one. Such a small difference compared to the magnitude of the confidence interval values could be attributed to differences in rounding.

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Here we analyze China

global_economy |>
  filter(Code == "CHN") |>
  select(Exports) -> tsb
  • Plot the Exports series and discuss the main features of the data.

There does not seem to be seasonality contained within this data, which makes sense considering the data is yearly. There is an upward trend that seems to increase exponentially up until 2008 where the trend reverses.

autoplot(tsb,.vars = Exports)

  • Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
fit <- tsb|>
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N"))
    )
fc <- forecast(fit, h=5)
autoplot(fc,tsb)

  • Compute the RMSE values for the training data.
accuracy(fit) |>
  pull(RMSE) |>
  cat("is the RMSE of the ETS(ANN)")
## 1.900172 is the RMSE of the ETS(ANN)
  • Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
fit <- tsb|>
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
    )
fc <- forecast(fit, h=5)
autoplot(fc,tsb, level = NULL)

accuracy(fit) |>
  pull(RMSE) |>
  nth(2) |>
  cat("is the RMSE of the ETS(AAN)")
## 1.906436 is the RMSE of the ETS(AAN)
  • Compare the forecasts from both methods. Which do you think is best?

Although, ETS AAN has the lower RMSE, ETS ANN is likely the better method for forecasting here because the trend seems to be about to start recovering upward because of the economy starting to recover from the harsh drop in 2008 exports (until covid comes in). While AAN which takes account the recent downward trend forecasts a repeated lowering in exports.

  • Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

y^ +/- 2*RMSE can give us a 95% prediction interval.

RMSE <- accuracy(fit) |>
  pull(RMSE)
fc <- fc |>
  filter(Year == 2018)
fc$RMSE <- RMSE
fc |>
  hilo(95) |>
  mutate(upper_ci = .mean + 2*RMSE,
         lower_ci = .mean - 2*RMSE) |>
  select(.model, Exports, lower_ci, upper_ci, `95%`)
## # A tsibble: 2 x 6 [1Y]
## # Key:       .model [2]
##   .model    Exports lower_ci upper_ci                  `95%`  Year
##   <chr>      <dist>    <dbl>    <dbl>                 <hilo> <dbl>
## 1 ANN    N(20, 3.7)     16.0     23.6 [15.96718, 23.54756]95  2018
## 2 AAN    N(19, 3.9)     15.6     23.2 [15.49310, 23.23803]95  2018

Once more we have very similar 95% confidence intervals “manually” calculated compared to those that have been automatically calculated through R.

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

To begin with we plot the GDP over time, we see an exponential increase in trend but no seasonality. Thus, an AAN model is likely the best fit here. Damping is likely not useful for forecasting because we don’t see the trend decreasing.

global_economy |>
  filter(Code == "CHN") |>
  select(GDP) -> tsb

autoplot(tsb,.vars = GDP)

fit <- tsb|>
  model(
    AAN = ETS(GDP ~ error("A") + trend("A") + season("N")),
    AADN = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    MAN = ETS(GDP ~ error("M") + trend("A") + season("N")),
    MADN = ETS(GDP ~ error("M") + trend("Ad") + season("N")),
    BOXLOG = ETS(box_cox(GDP,0)),
    BOXSQRT = ETS(box_cox(GDP,0.5))
  )
fc <- forecast(fit, h=15)
autoplot(fc,tsb, level = NULL)

Here we have forecasts displayed with many different models which are either transformed or simply different forms of ETS. Box-Cox transforming the series with 0.5 seems to give us a forecast that best extends the overall exponential increasing trend of the GDP. While taking a lambda of 0 transformation makes the forecast tend towards a higher growth than what we would expect. The dampening trend options don’t quite fit the visual trend in our data, but if we think about how GDP is related to population growth and population growth is not sustainable forever these options might actually be the best ones to choose since they start leveling off. The non dampened AAN and MAN models seem to not fit how the data will grow in the future being more conservative with growth than the transformed data, but not conservative enough to truly level off. Yet, below we see the RMSE is the lowest for AAN which shows the dangers of not having separate data to test off of.

fit |>
  accuracy()
## # A tibble: 6 × 10
##   .model  .type              ME    RMSE     MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <chr>   <chr>           <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 AAN     Training      2.36e10 1.90e11 9.59e10 1.41   7.62 0.442 0.453  0.00905
## 2 AADN    Training      2.95e10 1.90e11 9.49e10 1.62   7.62 0.438 0.454 -0.00187
## 3 MAN     Training      4.13e10 2.00e11 9.66e10 2.11   7.72 0.446 0.477  0.268  
## 4 MADN    Training      4.65e10 2.00e11 9.70e10 2.32   7.79 0.447 0.476  0.236  
## 5 BOXLOG  Training     -3.48e10 2.86e11 1.25e11 0.744  7.21 0.577 0.683  0.654  
## 6 BOXSQRT Training      1.65e10 2.09e11 9.88e10 1.45   7.55 0.456 0.498  0.377

8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production |>
  select(Gas) -> tsb
autoplot(tsb, .vars=Gas)

Multiplicative seasonality is required here because seasonality exponentially increases as the trend does.

fit <- tsb |>
  model(
    MAM = ETS(Gas ~ error("M") + trend("A") + season("M")),
    MAdM = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )
fc <- forecast(fit,h=24)
autoplot(fc,tsb,level=NULL)

The forecasts seem to be only slightly effected by the dampening of trend here. Which makes it difficult to tell if one model is better than the other. Yet, the overall increasing trend does seem to be slowing down overtime which the dampened model would fit.

fit |>
  accuracy()
## # A tibble: 2 × 10
##   .model .type          ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 MAM    Training -0.115    4.60  3.02 0.199  4.08 0.542 0.606 -0.0131
## 2 MAdM   Training -0.00439  4.59  3.03 0.326  4.10 0.544 0.606 -0.0217

Taking a look at the RMSE we can see that dampening the trend has very slightly decreased it which could be a sign that it is a better model with this data.

8.8

Recall your retail time series data (from Exercise 7 in Section 2.10).

set.seed(1234567)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries_train <- myseries |>
  filter(year(Month) < 2011)
  • Why is multiplicative seasonality necessary for this series?

Multiplicative seasonality is required here because seasonality exponentially increases as the trend does.

autoplot(myseries_train, .var = Turnover)

  • Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
fit <- myseries_train |>
  model(
    MAM = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    MAdM = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )
fc <- forecast(fit,h=24)
autoplot(fc,myseries_train,level=NULL)

  • Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
fit |>
  accuracy()
## # A tibble: 2 × 12
##   State  Industry .model .type    ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <chr>  <chr>    <chr>  <chr> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Victo… Cafes, … MAM    Trai…  1.34  12.7  8.84 0.0736  3.08 0.320 0.339 0.222 
## 2 Victo… Cafes, … MAdM   Trai…  1.31  12.4  8.57 0.384   3.01 0.310 0.329 0.0640

The RMSE of the dampened model looks slightly better.

  • Check that the residuals from the best method look like white noise.
fit |>
  components() |>
  filter(.model == "MAdM") |>
  select(remainder) |>
  autoplot(.vars = remainder, na.rm = TRUE)

These residuals do end up looking like white noise with a mean around 0.

  • Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11? (The RMSE that we obtained with SNAIVE from 5.7 was 116)
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))
fc |> autoplot(myseries, level=NULL)

fc |> accuracy(myseries)
## # A tibble: 2 × 12
##   .model State    Industry .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr>    <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MAM    Victoria Cafes, … Test   15.4  67.1  55.9 1.03   7.05  2.02  1.78 0.920
## 2 MAdM   Victoria Cafes, … Test   10.2  74.8  62.4 0.207  7.92  2.26  1.99 0.931

Here our MAM model actually performs better than our MAdM model with an RMSE of 67.05 and beats out the seasonal naive model’s RMSE of 116.

8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda <- myseries_train |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

stl <- myseries_train |>
  model(
    STL(box_cox(Turnover, lambda))
  )

stl2 <- myseries |>
  model(
    STL(box_cox(Turnover, lambda))
  )

fit <- stl |>
  components() |>
  model(
    MAM = ETS(season_adjust ~ error("M") + trend("A") + season("M")),
    MAdM = ETS(season_adjust  ~ error("M") + trend("Ad") + season("M"))
  )

fit |>
  select(-.model) |>
  forecast(new_data = stl2 |> components() |> select(-.model) |> filter(year(Month) >= 2011)) -> fc

fc |>
  accuracy(stl2 |> components() |> select(-.model))
## # A tibble: 2 × 12
##   .model State Industry .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <chr>    <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 MAM    Vict… Cafes, … Test  -0.542  0.558 0.542 -3.98   3.98  1.96  1.63 0.811
## 2 MAdM   Vict… Cafes, … Test  -0.0928 0.342 0.300 -0.756  2.21  1.08  1.00 0.948

After transforming we notice that the MAPE we get from our predictions on the MAdM model is the lowest we’ve seen so far at 2.21% compared to our previous low of 7.05%. This shows that even despite the difference in scaled values we have, the model that serves best for forecasting is the one where we transform and decompose our initial series.