Chapter 8, Hyndman and Athanasopoulos

Q 8.1

Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

  1. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

First, taking a look at the raw data for the number of pigs slaughtered in Victoria, there appears to be no trend and no clearly identifiable seasonality. This suggests that simple exponential smoothing, i.e. an ETS(A,N,N) model, is a reasonable choice for forecasting this data.

library(fpp3)     # tsibble data, ETS(), forecast(), accuracy()
library(plotly)   # ggplotly() for interactive plots

Vic_pigs <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Pigs")

ggplotly(Vic_pigs %>%
  autoplot(Count) +
  labs(y = "Count", title = "Victoria counts of pigs slaughtered"))

Fitting a model with the ‘ETS()’ function, we set the error to “A” (additive), the trend to “N” (none) and the season to “N” (none). The report() function shows the optimal \(\alpha\) and \(\ell_0\) values: below, R estimates 0.3221 and 100646.6 respectively.

fit <- Vic_pigs %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))
report(fit)
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

Forecasting the next four months, the point forecast stays constant at 95,187: with no trend and no seasonal component, ETS(A,N,N) simply projects the final estimated level forward unchanged.

fc_pigs <- fit %>%
  forecast(h = 4)

fc_pigs$Count
## <distribution[4]>
## [1] N(95187, 8.7e+07) N(95187, 9.7e+07) N(95187, 1.1e+08) N(95187, 1.1e+08)

  2. Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

The residuals of the ETS model are obtained from the augment() function.

(res_pigs <- augment(fit))
## # A tsibble: 558 x 8 [1M]
## # Key:       Animal, State, .model [1]
##    Animal State    .model                   Month  Count .fitted  .resid  .innov
##    <fct>  <fct>    <chr>                    <mth>  <dbl>   <dbl>   <dbl>   <dbl>
##  1 Pigs   Victoria "ETS(Count ~ error(\~ 1972 Jul  94200 100647.  -6447.  -6447.
##  2 Pigs   Victoria "ETS(Count ~ error(\~ 1972 Aug 105700  98570.   7130.   7130.
##  3 Pigs   Victoria "ETS(Count ~ error(\~ 1972 Sep  96500 100867.  -4367.  -4367.
##  4 Pigs   Victoria "ETS(Count ~ error(\~ 1972 Oct 117100  99460.  17640.  17640.
##  5 Pigs   Victoria "ETS(Count ~ error(\~ 1972 Nov 104600 105142.   -542.   -542.
##  6 Pigs   Victoria "ETS(Count ~ error(\~ 1972 Dec 100500 104968.  -4468.  -4468.
##  7 Pigs   Victoria "ETS(Count ~ error(\~ 1973 Jan  94700 103529.  -8829.  -8829.
##  8 Pigs   Victoria "ETS(Count ~ error(\~ 1973 Feb  93900 100685.  -6785.  -6785.
##  9 Pigs   Victoria "ETS(Count ~ error(\~ 1973 Mar  93200  98499.  -5299.  -5299.
## 10 Pigs   Victoria "ETS(Count ~ error(\~ 1973 Apr  78000  96792. -18792. -18792.
## # ... with 548 more rows

The standard deviation of the residuals, 9344.67, is computed from the .resid column.

(sd_pigs <- sd(res_pigs$.resid))
## [1] 9344.666

The first point forecast produced by R is 95186.56.

(L_1 <- fc_pigs$.mean[1])
## [1] 95186.56

Having all of this allows for the calculation of the 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\).

(interval_pigs_high <- L_1 + (1.96 * sd_pigs))
## [1] 113502.1
(interval_pigs_low <- L_1 - (1.96 * sd_pigs))
## [1] 76871.01

The calculated 95% prediction interval runs from 76871.01 to 113502.1. Comparing that to the interval computed by the model, 76854.79 to 113518.3, the model's bounds are slightly wider than the hand-calculated ones. The difference arises because R builds the interval from the model's estimated variance (the \(\sigma^2\) in the report() output above) rather than from the sample standard deviation of the residuals.

interval_pigs_model <- hilo(fc_pigs)
interval_pigs_model$`95%`[1]
## <hilo[1]>
## [1] [76854.79, 113518.3]95
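
As a quick check, the hilo() interval can be reconstructed (approximately) from the model variance; a minimal sketch, reusing the rounded \(\sigma^2\) from the report(fit) output above:

# rebuild R's interval from the model variance rather than sd(.resid)
s_model <- sqrt(87480760)                 # sigma^2 as reported by report(fit), rounded
L_1 + c(-1, 1) * qnorm(0.975) * s_model   # roughly 76854 and 113519, in line with hilo()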

Q 8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

  1. Plot the Exports series and discuss the main features of the data.

Looking at exports from France, there is an upward trend but no seasonality from 1960 to 2017. Although there were periods of rapid growth and rapid decline, the overall trend was positive.

Fr_Exports <- global_economy %>%
  filter(Country == "France")

Fr_Exports %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "France exports")

  2. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

The forecast extends 10 years ahead, as shown in the plot below.

fit_Fr <-  Fr_Exports %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
 
fc_Fr <- fit_Fr %>%
  forecast(h = 10) 
  
(fc_Fr %>%
  autoplot(Fr_Exports) +
  labs(y = "% of GDP", title = "France exports forecast using ETS(A,N,N)"))

  3. Compute the RMSE values for the training data.

Computing the root mean squared error (RMSE) on the training data returns 1.152003.

accuracy(fit_Fr) %>%
  select(RMSE)
## # A tibble: 1 x 1
##    RMSE
##   <dbl>
## 1  1.15

  4. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

The RMSE of the trended model comes to 1.11996, slightly lower than the simpler model's 1.152003.

fit_Fr2 <-  Fr_Exports %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))
 
accuracy(fit_Fr2) %>%
  select(RMSE)
## # A tibble: 1 x 1
##    RMSE
##   <dbl>
## 1  1.12
fc_Fr2 <- fit_Fr2 %>%
  forecast(h = 10) 
 
(fc_Fr2 %>%
  autoplot(Fr_Exports) +
  labs(y = "% of GDP", title = "France exports forecast using ETS(A,A,N)"))

  5. Compare the forecasts from both methods. Which do you think is best?

Using an additive trend in the second forecast carries the overall upward trend into the forecast results, as seen in the second plot. The lower RMSE also favours the second model over the original ETS(A,N,N) forecast: the trended model accounts for the growth in exports, while the simple model does not account for any trend and projects a flat line.

  6. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

# model 1
res_Fr <- augment(fit_Fr)
sd_Fr <- sd(res_Fr$.resid)
L_1_Fr <- mean(fc_Fr$.mean[1])
interval_Fr_high <- L_1_Fr + (1.96 * sd_Fr)
interval_Fr_low <- L_1_Fr - (1.96 * sd_Fr)

# model 2
res_Fr2 <- augment(fit_Fr2)
sd_Fr2 <- sd(res_Fr2$.resid)
L_1_Fr2 <- mean(fc_Fr2$.mean[1])
interval_Fr2_high <- L_1_Fr2 + (1.96 * sd_Fr2)
interval_Fr2_low <- L_1_Fr2 - (1.96 * sd_Fr2)

paste("The 95% prediction interval for model 1 is", interval_Fr_low, "to", interval_Fr_high, "while for model 2 it is",interval_Fr2_low, "to", interval_Fr2_high,".")
## [1] "The 95% prediction interval for model 1 is 28.6745409558786 to 33.0890216655963 while for model 2 it is 28.9598961226832 to 33.3882664753598 ."
paste("The span of the model 1 is",interval_Fr_high-interval_Fr_low,"and for model 2 it is",interval_Fr2_high-interval_Fr2_low)
## [1] "The span of the model 1 is 4.41448070971769 and for model 2 it is 4.42837035267668"

The 95% prediction intervals produced using R are 28.5849 to 33.1796 for the first model and 28.8992 to 33.4490 for the second. The span for model 1 is 4.5957 and for model 2 it is 4.54986; both are slightly wider than the spans calculated by hand.

interval_Fr1_model <- hilo(fc_Fr)
interval_Fr1_model$`95%`[1]
## <hilo[1]>
## [1] [28.58393, 33.17963]95
interval_Fr2_model <- hilo(fc_Fr2)
interval_Fr2_model$`95%`[1]
## <hilo[1]>
## [1] [28.89915, 33.44901]95
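
As an aside, the question asks for intervals built from the RMSE rather than sd(.resid); the two differ only in their denominators (n versus n − 1), so the resulting intervals are nearly identical. A sketch using the training RMSE from accuracy():

rmse_Fr  <- accuracy(fit_Fr)$RMSE          # training RMSE, model 1
rmse_Fr2 <- accuracy(fit_Fr2)$RMSE         # training RMSE, model 2
L_1_Fr  + c(-1.96, 1.96) * rmse_Fr         # interval for ETS(A,N,N)
L_1_Fr2 + c(-1.96, 1.96) * rmse_Fr2        # interval for ETS(A,A,N)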

Q 8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

GDP in China shows a strong upward trend but no seasonality (the data are annual, so within-year seasonality cannot appear anyway). Therefore, in the models below the season parameter is set to “N” while the error and trend parameters are varied for comparison purposes.

Ch_GDP <- global_economy %>%
  filter(Country == "China")

Ch_GDP %>%
  autoplot(GDP) +
  labs(y = "$US", title = "China GDP")

Fitting the ETS model with several variations shows how different specifications affect forecasts of the same underlying data. Six variants are estimated: additive, multiplicative, additive damped with \(\phi = 0.8\), additive damped with \(\phi = 0.95\), multiplicative damped with \(\phi = 0.8\) and multiplicative damped with \(\phi = 0.95\).

The results are interesting. The undamped additive and multiplicative models have almost the same effect, continuing the projection rapidly towards infinity, which is not economically plausible even in a state-controlled economy such as China's. The models damped at \(\phi = 0.95\) begin to bend the curve downwards (away from infinity), but it is the models damped at \(\phi = 0.8\) that produce what economists would be more likely to accept as reasonable, with the curvature heading towards a levelling-out of the economy at some point in the future; the multiplicative damped model displays the most curvature. In a free-market economy one might expect the multiplicative model damped at \(\phi = 0.8\) to be the most acceptable, but for the state-controlled Chinese economy that may not be the preferred forecast, depending on party-level political agendas.

fit_Ch <-  Ch_GDP %>%
  model(
    additive = ETS(GDP ~ error("A") + trend("A") + season("N")),
    multiplicative = ETS(GDP ~ error("M") + trend("A") + season("N")),
    add_damped_0.8 = ETS(GDP ~ error("A") + trend("Ad",phi=0.8) + season("N")),
    add_damped_0.95 = ETS(GDP ~ error("A") + trend("Ad",phi=0.95) + season("N")),
    mult_damped_0.8 = ETS(GDP ~ error("M") + trend("Ad",phi=0.8) + season("N")),
    mult_damped_0.95 = ETS(GDP ~ error("M") + trend("Ad",phi=0.95) + season("N"))
    )
 
fc_Ch <- fit_Ch %>%
  forecast(h = 10) 
  
(fc_Ch %>%
  autoplot(Ch_GDP, level = NULL) +
  labs(y = "$US", title = "China GDP forecasts using various models"))
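
The exercise also asks what a Box-Cox transformation does. A minimal sketch, assuming a lambda chosen by the guerrero feature (lambda_Ch and fit_Ch_bc are new names introduced here): because the transformation sits inside the model formula, forecast() back-transforms automatically, and the transformation typically reins in the explosive growth in much the same way damping does.

# hypothetical Box-Cox variant; lambda_Ch comes from the guerrero method
lambda_Ch <- Ch_GDP %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit_Ch_bc <- Ch_GDP %>%
  model(box_cox = ETS(box_cox(GDP, lambda_Ch) ~ error("A") + trend("A") + season("N")))

fit_Ch_bc %>%
  forecast(h = 20) %>%              # long horizon, per the hint
  autoplot(Ch_GDP, level = NULL)    # forecasts are back-transformed to $US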

Q 8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Australian gas production is seasonal with increasing variance in the seasonality as time progresses. It also shows a clearly increasing trend overall.

Gas <- aus_production   # note: the tsibble is named Gas and also contains a Gas column

Gas %>%
  autoplot(Gas) +
  labs(y = "Petajoules", title = "Australian Gas production")

Multiplicative seasonality is necessary because the seasonal swings grow in proportion to the level of the series; a multiplicative seasonal component reproduces that increasing variance in the forecasts, while the additive trend continues the positive slope from the historical production levels. The damped models flatten the positive trend and appear to diminish the growth in the seasonal variance in the plots below.

fit_Gas <-  Gas %>%
  model(
    multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
    add_damped_0.8 = ETS(Gas ~ error("A") + trend("Ad",phi=0.8) + season("M")),
    add_damped_0.95 = ETS(Gas ~ error("A") + trend("Ad",phi=0.95) + season("M"))
    )
 
fc_Gas <- fit_Gas %>%
  forecast(h = 30) 
  
(fc_Gas %>%
  autoplot(Gas, level = NULL) +
  labs(y = "Petajoules", title = "Australian Gas production") +
  facet_grid(.model~.))
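
Rather than judging the damped trends from the plots alone, the training accuracy offers a rough numerical check on whether damping improves the fit (a proper comparison would hold out a test set):

fit_Gas %>%
  accuracy() %>%
  select(.model, RMSE, MAE)   # lower is better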

Q 8.8

Recall your retail time series data (from Exercise 8 in Section 2.10).

This again refers to the aus_retail dataset showing retail turnover from 1982 to 2019. As a refresher, the original series is shown in the below plot.

set.seed(54321)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(myseries,Turnover) +
  labs(title = "Turnover by Month")

  1. Why is multiplicative seasonality necessary for this series?

Multiplicative seasonality is necessary for this series because the size of the seasonal fluctuations increases with the level of the series over time; a multiplicative seasonal component scales with the level, so the forecasts preserve that growing variance.

  2. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

Again the undamped multiplicative-seasonal model carries more of the upward additive trend into the overall forecast, while the damped methods at the lower and upper limits that ETS() allows for \(\phi\) (0.8 and 0.98 respectively) show a considerably flatter trend.

fit_myseries <- myseries %>%
  model(
    # multiplicative seasonality with additive errors; Holt-Winters'
    # multiplicative method is often written ETS(M,A,M)
    multiplicative = ETS(Turnover ~ error("A") + trend("A") + season("M")),
    add_damped_0.8 = ETS(Turnover ~ error("A") + trend("Ad",phi=0.8) + season("M")),
    add_damped_0.98 = ETS(Turnover ~ error("A") + trend("Ad",phi=0.98) + season("M"))
    )
 
fc_myseries <- fit_myseries %>%
  forecast(h = 48) 
  
(fc_myseries %>%
  autoplot(myseries, level = NULL) +
  labs(y = "Turnover", title = "Turnover by Month") +
  facet_grid(.model~.))

  3. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

The RMSE comparison shows that the damped model with \(\phi = 0.8\) is empirically the best choice, with the lowest score. This is intuitive for a series such as turnover, where the variance may be increasing but the recent actuals show both declines and increases since 2010; removing the assumption of a constant upward trend therefore makes sense.

fit_myseries %>%
  accuracy() %>%
  select(.model,RMSE)
## # A tibble: 3 x 2
##   .model           RMSE
##   <chr>           <dbl>
## 1 multiplicative   3.02
## 2 add_damped_0.8   2.99
## 3 add_damped_0.98  3.11

  4. Check that the residuals from the best method look like white noise.

There is one large residual in 2010 that is visible on all three panels as an extreme outlier, and on the ACF plot three other autocorrelations fall just outside the significance bounds. The residual histogram looks approximately normal apart from a small number of outliers, so the residuals are close to, but not quite, white noise.

(bestfit_myseries_res <- myseries %>%
  model(add_damped_0.8 = ETS(Turnover ~ error("A") + trend("Ad",phi=0.8) + season("M"))) %>% 
    gg_tsresiduals())
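
A Ljung-Box test on the innovation residuals gives a formal check of the white-noise claim; a sketch, using lag = 24 (two seasonal periods) as a common rule of thumb for monthly data:

myseries %>%
  model(add_damped_0.8 = ETS(Turnover ~ error("A") + trend("Ad", phi = 0.8) + season("M"))) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24)   # a small p-value would reject white noise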

  5. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

Fitting the model to the training data through 2010 returns an RMSE of 2.529164, while fitting the same specification to the 2011-onward data returns an RMSE of 3.733941. Either way, the RMSE beats the original seasonal naive model's 5.637434 (see the note below the accuracy output).

myseries_train <- myseries %>%
  filter(year(Month) < 2011)    # split the train set up to 2010

fit_train <- myseries_train %>%  # fit the new train set
  model(add_damped_0.8 = ETS(Turnover ~ error("A") + trend("Ad",phi=0.8) + season("M")))

myseries_test <- myseries %>%   # create a test set from 2011 forward
  filter(year(Month) > 2010)

fit_test <- myseries_test %>%  # fit the test set
  model(add_damped_0.8 = ETS(Turnover ~ error("A") + trend("Ad",phi=0.8) + season("M")))

fit_5.11 <- myseries_train %>%
  model(SNAIVE(Turnover))       # compare to Ex 7 from Section 5.11

fit_train %>%
  accuracy() %>%
  select(.model,RMSE)
## # A tibble: 1 x 2
##   .model          RMSE
##   <chr>          <dbl>
## 1 add_damped_0.8  2.53
fit_test %>%
  accuracy() %>%
  select(.model,RMSE)
## # A tibble: 1 x 2
##   .model          RMSE
##   <chr>          <dbl>
## 1 add_damped_0.8  3.73
fit_5.11 %>%
  accuracy() %>%
  select(.model,RMSE)
## # A tibble: 1 x 2
##   .model            RMSE
##   <chr>            <dbl>
## 1 SNAIVE(Turnover)  5.64
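
One caveat: fit_test above re-estimates the model on the 2011-onward data, so its RMSE is another in-sample figure rather than a true test-set error. The conventional test-set RMSE forecasts from the model trained to 2010 and scores those forecasts against the held-out actuals; a sketch of that comparison for both models:

# forecast from the trained models over the held-out period,
# then score the forecasts against the actuals
fit_train %>%
  forecast(new_data = myseries_test) %>%
  accuracy(myseries) %>%
  select(.model, RMSE)

fit_5.11 %>%
  forecast(new_data = myseries_test) %>%
  accuracy(myseries) %>%
  select(.model, RMSE)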

Q 8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

Taking the same training data and applying a Box-Cox transformation to the series and then taking an STL decomposition, we can compare it to the original STL decomposition of the full series.

Here is the original decomposition of the series:

retail_dcmp <- myseries %>%
  model(STL(Turnover ~ trend(window = 10) +
    season(window = "periodic"),
    robust = TRUE)) %>%
  components()

autoplot(retail_dcmp) +
  labs(title = "Decomposition of Australian retail turnover - original")

Now we perform the Box-Cox transformation on the training set through December 2010 and decompose it. There is a considerable decrease in scale for all components of the series, and for the series itself, compared to the original STL decomposition. The trend component is considerably smoother, particularly in the final years of the training period, and the remainder is much more consistent.

lambda_train <- myseries_train %>%
  features(Turnover, features = guerrero) %>% # get lambda for the train set
  pull(lambda_guerrero)

myseries_train_adjusted <-
    myseries_train %>% 
    mutate(turnover_transform = box_cox(Turnover, lambda_train)) # apply the transformation with lambda_train

myseries_train_adjusted %>%   # perform the STL decomp
  model(STL(turnover_transform ~ trend(window = 10) +
    season(window = "periodic"),
    robust = TRUE)) %>%
  components() %>%
  autoplot() + 
  labs(title = "Decomposition of Australian retail turnover - Train set")

Likewise, we seasonally adjust the test set for comparison purposes. There is certainly still something odd about the transformed trend, but the seasonality looks more consistent.

lambda_test <- myseries_test %>%
  features(Turnover, features = guerrero) %>% # get lambda for the test set
  pull(lambda_guerrero)

myseries_test_adjusted <-
    myseries_test %>% 
    mutate(turnover_transform = box_cox(Turnover, lambda_test)) # apply the transformation with lambda_test

myseries_test_decomp <- myseries_test_adjusted %>%   # perform the STL decomp
  model(STL(turnover_transform ~ trend(window = 10) +
    season(window = "periodic"),
    robust = TRUE))

myseries_test_decomp %>%
  components() %>%
  autoplot() + 
  labs(title = "Decomposition of Australian retail turnover - Test set")
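
A caveat on scales: estimating a separate lambda for the test set puts the two decompositions on different transformed scales. For a like-for-like comparison it would arguably be safer to reuse lambda_train on the test period, as in this sketch (myseries_test_same_scale is a new name):

# reuse the training-set lambda so both periods share one transformed scale
myseries_test_same_scale <- myseries_test %>%
  mutate(turnover_transform = box_cox(Turnover, lambda_train))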

Now fitting the same damped ETS specification to the transformed sets, the training set returns an RMSE of 0.1174647 and the test set an RMSE of 0.08381113. Note, however, that these RMSEs are measured on the Box-Cox transformed scale, so they cannot be compared directly with the original-scale test RMSE of 3.733941 from Q8.8; a fair comparison requires back-transforming the forecasts (see the sketch after the accuracy output below).

fit_train_adjusted <- myseries_train_adjusted %>%  # fit the new adjusted train set
  model(add_damped_0.8 = ETS(turnover_transform ~ error("A") + trend("Ad",phi=0.8) + season("M")))

fit_train_adjusted %>%
  accuracy() %>%
  select(.model,RMSE)
## # A tibble: 1 x 2
##   .model          RMSE
##   <chr>          <dbl>
## 1 add_damped_0.8 0.117
fit_test_adjusted <- myseries_test_adjusted %>%  # fit the new adjusted test set
  model(add_damped_0.8 = ETS(turnover_transform ~ error("A") + trend("Ad",phi=0.8) + season("M")))

fit_test_adjusted %>%
  accuracy() %>%
  select(.model,RMSE)
## # A tibble: 1 x 2
##   .model           RMSE
##   <chr>           <dbl>
## 1 add_damped_0.8 0.0838
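
To get a test-set RMSE that is comparable with the 3.73 from Q8.8, the transformation can be placed inside the model formula, which makes fable back-transform the forecasts to the original Turnover scale automatically; a sketch (fit_train_bc is a new name):

# fit on the training set with the transformation inside the formula,
# so forecasts come back on the original Turnover scale
fit_train_bc <- myseries_train %>%
  model(add_damped_0.8 = ETS(box_cox(Turnover, lambda_train) ~
                               error("A") + trend("Ad", phi = 0.8) + season("M")))

fit_train_bc %>%
  forecast(new_data = myseries_test) %>%
  accuracy(myseries) %>%
  select(.model, RMSE)   # directly comparable with the 3.73 from Q8.8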

Looking at this visually, here is the original data:

myseries %>% autoplot(Turnover)

And the seasonally adjusted test set vs the seasonally adjusted training forecast:

myseries_train_adjusted <- myseries_train %>% 
  mutate(turnover_transform = box_cox(Turnover, lambda_test)) # lambda_test is applied here so the forecast and the test set share one transformed scale
  
fit_myseries_train_adjusted <- myseries_train_adjusted %>%
  model(add_damped_0.8 = ETS(turnover_transform ~ error("A") + trend("Ad",phi=0.8) + season("M")))

fc_train_adjusted <- fit_myseries_train_adjusted %>%
  forecast(h = 96)   # 8 years of monthly forecasts from January 2011

fc_train_adjusted %>% autoplot(myseries_train_adjusted) 

The training forecast from 2011 onward follows essentially the same shape, within the 95% prediction interval, as the actuals in the original dataset for those years, and it also resembles the shape of the best-fit forecast from the test set in part 5 of Q8.8.