DATA 624: Assignment 05

Author

Curtis Elsasser

Published

March 9, 2025

Setup

library(fpp3)
set.seed(314159)

Assignment 05

Do exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman. Please submit both the link to your Rpubs and the .pdf file with your run code.

Exercise 8.1

Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
data <- aus_livestock |> 
  filter(State == "Victoria" & Animal == "Pigs")
  a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

8.1.a Solution

The ETS() function is a formidable force. It not only chooses an optimal model, it optimizes the parameters as well. All we need to do is spy on it.

fit <- data |> 
  model(ets = ETS(Count))
report(fit)
Series: Count 
Model: ETS(A,N,A) 
  Smoothing parameters:
    alpha = 0.3579401 
    gamma = 0.0001000139 

  Initial states:
    l[0]     s[0]    s[-1]     s[-2]     s[-3]     s[-4]     s[-5]   s[-6]
 95487.5 2066.235 6717.311 -2809.562 -1341.299 -7750.173 -8483.418 5610.89
     s[-7]    s[-8]     s[-9]   s[-10]   s[-11]
 -579.8107 1215.464 -2827.091 1739.465 6441.989

  sigma^2:  60742898

     AIC     AICc      BIC 
13545.38 13546.26 13610.24 
fit |> 
  forecast(h = 4) |> 
  autoplot(data)

# It's tiny. Let's zoom in a bit
fit |>
  forecast(h = 4) |> 
  autoplot(data) +
  coord_cartesian(xlim = c(yearmonth("2015-01"), yearmonth("2020-01")))

The model chosen as the best fit is ETS(A,N,A): an additive error component, no trend, and an additive seasonal component. The optimal values are \(\hat{\alpha}=0.3579\), \(\hat{\gamma}=0.0001\), and \(\hat{\ell}_0=95487.5\).
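Since the exercise names simple exponential smoothing, here is a minimal sketch of pinning ETS() to the SES-equivalent ETS(A,N,N) rather than letting it auto-select (the name fit_ses is mine):

# Explicitly request the SES-equivalent model, ETS(A,N,N).
# tidy() exposes the optimized alpha and l[0] directly.
fit_ses <- data |>
  model(ses = ETS(Count ~ error("A") + trend("N") + season("N")))
tidy(fit_ses)
fit_ses |> forecast(h = 4)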

  b. Compute a 95% prediction interval for the first forecast using \(\hat{y}_1 \pm 1.96s\), where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

8.1.b Solution

# manual 95% PI: first point forecast +/- 1.96 * sd of the residuals
f <- forecast(fit, h = 4)
r <- residuals(fit)
s <- sd(r$.resid)
pi95 <- f$.mean[1] + c(-1.96, 1.96) * s
f |> 
  autoplot() +
  geom_hline(yintercept = pi95, color = "red") +
  labs(title = "95% Prediction Interval")

It matches the first (January 2019) 95% prediction interval produced by forecast() almost perfectly.
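For a numeric rather than purely visual comparison, a minimal sketch of extracting R’s own interval with hilo() (output not shown):

# hilo() pulls the 95% interval out of the forecast distribution;
# $lower and $upper unpack the hilo column.
f |>
  hilo(level = 95) |>
  as_tibble() |>
  mutate(lower = `95%`$lower, upper = `95%`$upper) |>
  select(Month, .mean, lower, upper) |>
  head(1)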

Exercise 8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Let’s look at Ethiopia’s exports (hoping it’s food). There isn’t much data on Ethiopia, though. Let’s try Japan.

data <- global_economy |> 
  filter(Code == "JPN")
  a. Plot the Exports series and discuss the main features of the data.

8.5.a Solution

data |> 
  autoplot(Exports) +
  labs(title = "Japan's Exports")
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).

It looks like there is a strong, positive trend with a humongous valley spanning approximately 1990-2007. This period is known as the Lost Decades. Just as Japan was climbing out of the Lost Decades, the 2008 financial crisis hit, and Japan faced another roughly ten-year-long slump. Nonetheless, Japan’s exports are on the rise.

  b. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

8.5.b Solution

fit_ann <- data |> 
  filter(!is.na(Exports)) |>
  model(ets = ETS(Exports ~ error("A") + trend("N") + season("N")))

fit_ann |>
  # let's increase the forecast window so that it is more visible
  forecast(h = 10) |> 
  autoplot(data)
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).

Wow, that is an instrument with a big bell. Using the error component alone, with no trend and no seasonality, does not instill much confidence in the overall prediction.

  c. Compute the RMSE values for the training data.

8.5.c Solution

report(fit_ann)
Series: Exports 
Model: ETS(A,N,N) 
  Smoothing parameters:
    alpha = 0.9414732 

  Initial states:
   l[0]
 10.638

  sigma^2:  1.6541

     AIC     AICc      BIC 
263.1026 263.5555 269.2318 
accuracy(fit_ann)
# A tibble: 1 × 11
  Country .model .type       ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
  <fct>   <chr>  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Japan   ets    Training 0.104  1.26 0.883 0.258  7.26 0.981 0.990 0.00173

The RMSE is ~1.263.

  d. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

8.5.d Solution

fit_aan <- data |>
  filter(!is.na(Exports)) |>
  model(ets = ETS(Exports ~ error("A") + trend("A") + season("N")))
fit_aan |>
  report() |>
  accuracy()
Series: Exports 
Model: ETS(A,A,N) 
  Smoothing parameters:
    alpha = 0.9094125 
    beta  = 0.0001000003 

  Initial states:
     l[0]      b[0]
 10.50228 0.1047452

  sigma^2:  1.7047

     AIC     AICc      BIC 
266.7104 267.8868 276.9256 
# A tibble: 1 × 11
  Country .model .type          ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
  <fct>   <chr>  <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
1 Japan   ets    Training -0.00389  1.26 0.880 -0.702  7.30 0.978 0.987 0.0246

Let’s put together a comparison table.

Model      AIC   AICc  BIC   RMSE
ETS(A,N,N) 263.1 263.6 269.2 1.263
ETS(A,A,N) 266.7 267.9 276.9 1.259

The ETS(A,N,N) model has a lower AIC, AICc, and BIC, but its RMSE is slightly higher than the ETS(A,A,N) model’s. To be honest, I’m not sure what to do in this case. Worth remembering: training RMSE almost always falls as parameters are added, while AICc and BIC penalize the extra trend parameter, which is exactly why the two measures disagree here. My inclination is to go with the lower RMSE, but that’s probably because I understand RMSE better than I do AIC, AICc, and BIC.
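As a sketch, those numbers can be assembled programmatically by fitting both specifications in one mable (the model names ann/aan are mine):

# glance() collects the information criteria; accuracy() the training RMSE.
fits <- data |>
  filter(!is.na(Exports)) |>
  model(
    ann = ETS(Exports ~ error("A") + trend("N") + season("N")),
    aan = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )
glance(fits) |> select(.model, AIC, AICc, BIC)
accuracy(fits) |> select(.model, RMSE)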

  e. Compare the forecasts from both methods. Which do you think is best?

8.5.e Solution

fit_ann |>
  forecast(h = 10) |> 
  autoplot(data) +
  labs(title = "ETS(A,N,N)") +
  coord_cartesian(ylim = c(5, 25))

fit_aan |>
  forecast(h = 10) |> 
  autoplot(data) +
  labs(title = "ETS(A,A,N)") +
  coord_cartesian(ylim = c(5, 25))

Isn’t it interesting that ETS(A,A,N) captures the apparent trend? I guess this is really no surprise, since ETS(A,N,N) has no trend component. I’m starting to lean in favor of the ETS(A,A,N) model. I like the fact that it captures the trend; it seems like Japan is on the rise again. But I am not an economist! Please don’t invest in the Japanese yen just because I like ETS(A,A,N).

  f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

8.5.f Solution

# ETS(A,N,N)
f_ann <- forecast(fit_ann, h = 10)
r_ann <- residuals(fit_ann)
s_ann <- sd(r_ann$.resid)
pi_ann <- f_ann$.mean[1] + c(-1.96, 1.96) * s_ann
f_ann |> 
  autoplot() +
  geom_hline(yintercept = pi_ann, color = "red") +
  labs(title = "ETS(A,N,N): 95% Prediction Interval")

# ETS(A,A,N)
f_aan <- forecast(fit_aan, h = 10)
r_aan <- residuals(fit_aan)
s_aan <- sd(r_aan$.resid)
pi_aan <- f_aan$.mean[1] + c(-1.96, 1.96) * s_aan
f_aan |> 
  autoplot() +
  geom_hline(yintercept = pi_aan, color = "red") +
  labs(title = "ETS(A,A,N): 95% Prediction Interval")

Both manual intervals line up with the first 95% prediction interval produced by forecast(). Note that the point-forecast line for ETS(A,N,N) runs parallel to our horizontal interval lines, while the one for ETS(A,A,N) does not: it follows the apparent trend in the data rather than running parallel to the x-axis.

Exercise 8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of \(h\) when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

Let’s isolate the Chinese GDP data.

data <- global_economy |> 
  filter(Code == "CHN")

8.6 Solution

Let’s plot the data and see what we’ve got.

data |>
  autoplot(GDP) +
  labs(title = "Chinese GDP")

Oh, gosh! That is a curve begging to be Box-Coxed, and with growth this steep a log transformation (Box-Cox with \(\lambda = 0\)) is the natural choice. Let’s try to straighten it out before we do anything else.

data_t <- data |> 
  mutate(GDP = log(GDP))
data_t |>
  autoplot(GDP) +
  labs(title = "Chinese GDP: Log Transformed")

Wow, Box-Cox won’t do a better job than a straight-up log transformation. Let’s try the ETS model with the log-transformed data. It’s very clear that the transformed trend is linear, so we will use an additive trend component. The seasonality, if there is any, is not clear, so we will leave the rest up to ETS().
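As a sanity check on that claim, a quick sketch: guerrero’s method picks an optimal Box-Cox \(\lambda\), and a value near zero would confirm that the log is essentially optimal (output not shown):

# Guerrero's method selects a Box-Cox lambda; a value near 0
# corresponds to a plain log transformation.
data |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)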

fit_t <- data_t |> 
  model(ets = ETS(GDP ~ trend("A"))) |>
  report()
Series: GDP 
Model: ETS(A,A,N) 
  Smoothing parameters:
    alpha = 0.9999 
    beta  = 0.1079782 

  Initial states:
     l[0]       b[0]
 24.77007 0.04320226

  sigma^2:  0.0088

      AIC      AICc       BIC 
-33.07985 -31.92600 -22.77763 

Okay, ETS() has chosen an ETS(A,A,N) model. Let’s see what happens when we use a damped trend.

fit_td <- data_t |> 
  model(ets = ETS(GDP ~ trend("Ad"))) |>
  report()
Series: GDP 
Model: ETS(A,Ad,N) 
  Smoothing parameters:
    alpha = 0.9998999 
    beta  = 0.2480956 
    phi   = 0.979999 

  Initial states:
     l[0]        b[0]
 24.82881 0.003249391

  sigma^2:  0.0091

      AIC      AICc       BIC 
-30.46070 -28.81364 -18.09804 

Damping has had a negative effect on the fit. The AIC, AICc, and BIC have all increased, by roughly 3 to 5 points.

Let’s try operating on the original data. We’ll let ETS() find the best fit.

fit <- data |> 
  model(ets = ETS(GDP)) |>
  report()
Series: GDP 
Model: ETS(M,A,N) 
  Smoothing parameters:
    alpha = 0.9998998 
    beta  = 0.3119984 

  Initial states:
        l[0]       b[0]
 45713434615 3288256682

  sigma^2:  0.0108

     AIC     AICc      BIC 
3102.064 3103.218 3112.366 

Oh my goodness, ETS() has chosen an ETS(M,A,N) model, and the information criteria scores have exploded: the AIC, AICc, and BIC are all higher by ~3000 points. (In fairness, information criteria are not comparable across a transformation of the response; the log model’s likelihood is computed on the log scale, so its scores only look this much better because of the change of scale.) We are going to stick with the log-transformed data and the ETS(A,A,N) model.

First we will plot the forecasts for the log transformed, non-dampened data.

fit_t |>
  forecast(h = 10) |>
  autoplot(data_t) +
  labs(title = "Chinese GDP: Log Transformed, Non-Damped")

Now we will plot the forecasts for the log transformed, dampened data.

fit_td |>
  forecast(h = 10) |>
  autoplot(data_t) +
  labs(title = "Chinese GDP: Log Transformed and Damped")

It looks as if the damped trend has a less confident forecast; it has a wider bell. The non-damped trend has a more confident, narrower forecast. We shall stick with the non-damped trend on the log-transformed data.
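One more sketch: fable accepts the transformation inside the model formula, in which case the forecasts are automatically back-transformed (with bias adjustment) onto the original GDP scale for plotting:

# log() inside the formula means forecast() returns values on the
# original GDP scale rather than the log scale.
data |>
  model(ets = ETS(log(GDP) ~ trend("A"))) |>
  forecast(h = 10) |>
  autoplot(data) +
  labs(title = "Chinese GDP: forecasts back-transformed from the log scale")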

Exercise 8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production |> 
  autoplot(Gas)

Ah, I remember this guy. I haven’t searched for a model yet, but I can already see from the plot that there is a growth factor in the seasonality. For that reason, the seasonal component will be multiplicative. And it very clearly has a steep, positive linear trend from 1970 to 2010. The only real variable is damping, and since we’ve been asked to experiment with it, we shall.

8.7 Solution

fit <- aus_production |> 
  model(ets = ETS(Gas)) |>
  report()
Series: Gas 
Model: ETS(M,A,M) 
  Smoothing parameters:
    alpha = 0.6528545 
    beta  = 0.1441675 
    gamma = 0.09784922 

  Initial states:
     l[0]       b[0]      s[0]    s[-1]    s[-2]     s[-3]
 5.945592 0.07062881 0.9309236 1.177883 1.074851 0.8163427

  sigma^2:  0.0032

     AIC     AICc      BIC 
1680.929 1681.794 1711.389 

A champion has been found. Sure enough, it has a multiplicative seasonal component as well as a multiplicative error component. Its trend is additive. The optimal values are \(\hat{\alpha}=0.6529\), \(\hat{\beta}=0.1442\), and \(\hat{\gamma}=0.0978\).

Let’s try it with a damped trend component and see whether it still holds up.

fit <- aus_production |> 
  model(ets = ETS(Gas ~ error("M") + trend("Ad") + season("M"))) |>
  report()
Series: Gas 
Model: ETS(M,Ad,M) 
  Smoothing parameters:
    alpha = 0.6489044 
    beta  = 0.1551275 
    gamma = 0.09369372 
    phi   = 0.98 

  Initial states:
     l[0]       b[0]      s[0]    s[-1]   s[-2]     s[-3]
 5.858941 0.09944006 0.9281912 1.177903 1.07678 0.8171255

  sigma^2:  0.0033

     AIC     AICc      BIC 
1684.028 1685.091 1717.873 

What do we see? The smoothing parameters, not surprisingly, have changed subtly. And all of the information criteria scores have slightly increased, but only by a small fraction (\(<0.4\%\)). So damping does not improve the fit here.
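The exercise also asks whether damping improves the forecasts themselves, not just the fit. A sketch of overlaying the two (the model names mam/madm are mine):

# Fit both specifications in one mable so autoplot() overlays them.
fits <- aus_production |>
  model(
    mam  = ETS(Gas ~ error("M") + trend("A") + season("M")),
    madm = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )
fits |>
  forecast(h = "5 years") |>
  autoplot(aus_production, level = NULL) +
  labs(title = "Gas production: damped vs. non-damped trend")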

Exercise 8.8

Recall your retail time series data (from Exercise 7 in Section 2.10).

data <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

Let’s plot the data and see what we have.

data |>
  autoplot(Turnover) +
  labs(title = "Retail Turnover")

  a. Why is multiplicative seasonality necessary for this series?

8.8.a Solution

The amplitude of the seasonal component increases as the level of the series grows. This is a clear indicator that the seasonal component is multiplicative. But I do wonder whether multiplicative is ideal for the entire series: the amplitude grows from the beginning of the series to roughly 2010, and from about 2010 to 2019 it is fairly constant.
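One way to check, as a sketch: fit an additive-seasonality counterpart and let AICc arbitrate, which is how ETS() auto-selects anyway (the model names add/mult are mine):

# Compare additive vs. multiplicative seasonality by AICc.
data |>
  model(
    add  = ETS(Turnover ~ error("A") + trend("A") + season("A")),
    mult = ETS(Turnover ~ error("M") + trend("A") + season("M"))
  ) |>
  glance() |>
  select(.model, AICc)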

  b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

8.8.b Solution

First, we shall try the multiplicative method without damping.

data |> 
  model(ets = ETS(Turnover ~ error("M") + trend("A") + season("M"))) |>
  report()
Series: Turnover 
Model: ETS(M,A,M) 
  Smoothing parameters:
    alpha = 0.5276247 
    beta  = 0.0001000352 
    gamma = 0.08266459 

  Initial states:
     l[0]       b[0]      s[0]     s[-1]     s[-2]    s[-3]   s[-4]     s[-5]
 4.760311 0.06749661 0.9355303 0.8488176 0.9066948 1.470268 1.01972 0.9194645
     s[-6]    s[-7]    s[-8]    s[-9]   s[-10]    s[-11]
 0.9372326 1.006957 1.000908 1.004499 1.028769 0.9211401

  sigma^2:  0.0056

     AIC     AICc      BIC 
2838.026 2839.473 2907.540 

And now we shall try Holt-Winters with a damped trend.

data |> 
  model(ets = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))) |>
  report()
Series: Turnover 
Model: ETS(M,Ad,M) 
  Smoothing parameters:
    alpha = 0.5510153 
    beta  = 0.000102758 
    gamma = 0.07987571 
    phi   = 0.9786211 

  Initial states:
     l[0]       b[0]      s[0]     s[-1]     s[-2]    s[-3]     s[-4]     s[-5]
 5.065674 0.07156032 0.9354592 0.8503412 0.8922841 1.474074 0.9996284 0.9053009
     s[-6]    s[-7]    s[-8]    s[-9]   s[-10]   s[-11]
 0.9452715 1.005326 1.011572 1.010507 1.042879 0.927357

  sigma^2:  0.0057

     AIC     AICc      BIC 
2844.284 2845.905 2917.887 

The model with the damped trend has a higher AIC, AICc, and BIC. So, our champion is ETS(M,A,M).

Let’s see what ETS() finds if we don’t specify the component types. And then we will compare.

data |> 
  model(ets = ETS(Turnover)) |>
  report()
Series: Turnover 
Model: ETS(M,A,M) 
  Smoothing parameters:
    alpha = 0.5276247 
    beta  = 0.0001000352 
    gamma = 0.08266459 

  Initial states:
     l[0]       b[0]      s[0]     s[-1]     s[-2]    s[-3]   s[-4]     s[-5]
 4.760311 0.06749661 0.9355303 0.8488176 0.9066948 1.470268 1.01972 0.9194645
     s[-6]    s[-7]    s[-8]    s[-9]   s[-10]    s[-11]
 0.9372326 1.006957 1.000908 1.004499 1.028769 0.9211401

  sigma^2:  0.0056

     AIC     AICc      BIC 
2838.026 2839.473 2907.540 

Uh oh, it looks like ETS() has chosen ETS(M,A,M). This doesn’t look good for the model with damping.

Let’s compare the information criteria scores:

Model AIC AICc BIC
ETS(M,A,M) 2838 2839 2908
ETS(M,Ad,M) 2844 2846 2918

The model with the damped trend has a higher AIC, AICc, and BIC. So, our champion indeed is ETS(M,A,M).

  c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

8.8.c Solution

Okay, let’s compare the RMSE of the one-step forecasts from the two methods. One-step? I know accuracy() calculates the RMSE, but for a single step? Let’s see. Ah, from the accuracy() documentation: “Accuracy measures can be computed directly from models as the one-step-ahead fitted residuals are available.”

# Non-damped
data |> 
  model(ets = ETS(Turnover ~ error("M") + trend("A") + season("M"))) |>
  accuracy()
# A tibble: 1 × 12
  State    Industry    .model .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE
  <chr>    <chr>       <chr>  <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Tasmania Electrical… ets    Trai… -0.0114  1.33 0.944 -0.495  5.62 0.458 0.480
# ℹ 1 more variable: ACF1 <dbl>
# Damped
data |>
  model(ets = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))) |>
  accuracy()
# A tibble: 1 × 12
  State    Industry      .model .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE
  <chr>    <chr>         <chr>  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Tasmania Electrical a… ets    Trai… 0.0925  1.34 0.947 0.194  5.61 0.459 0.480
# ℹ 1 more variable: ACF1 <dbl>

The RMSE of the non-damped model is 1.3333, while the RMSE of the damped model is 1.3351. So close! But the non-damped model has a slightly lower RMSE, so, if that is all I am basing my decision on, then I will opt for the non-damped model.

  d. Check that the residuals from the best method look like white noise.

8.8.d Solution

data |>
  model(ets = ETS(Turnover ~ error("M") + trend("A") + season("M"))) |>
  gg_tsresiduals()

Wow, the residuals look very normally distributed. The ACF shows a few lags at which the correlation coefficient pokes above or below the dotted line, but only minimally. And the residual line plot looks like white noise. Let’s confirm by checking the mean.

c <- data |>
  model(ets = ETS(Turnover ~ error("M") + trend("A") + season("M"))) |>
  components()
mean(c$remainder, na.rm = TRUE)
[1] 0.0004808833

Alright! The mean is tiny. I’m comfortable calling it white noise.
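For a more formal check than eyeballing the mean, a sketch of the Ljung-Box test on the innovation residuals (lag 24 covers two seasonal periods of monthly data; a large p-value is consistent with white noise):

# Ljung-Box test on the one-step innovation residuals.
data |>
  model(ets = ETS(Turnover ~ error("M") + trend("A") + season("M"))) |>
  augment() |>
  features(.innov, ljung_box, lag = 24)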

  e. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

8.8.e Solution

data |>
  filter(year(Month) < 2011) |>
  model(ets = ETS(Turnover ~ error("M") + trend("A") + season("M"))) |>
  accuracy()
# A tibble: 1 × 12
  State Industry .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
  <chr> <chr>    <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
1 Tasm… Electri… ets    Trai… 0.0293  1.14 0.823 -0.522  5.96 0.482 0.511 0.0471

The RMSE above is ~1.144, and the seasonal naïve approach from Exercise 7 in Section 5.11 had an RMSE of 2.24. So, yes indeed! We can beat the seasonal naïve approach. One caveat: accuracy() on a fitted model reports training errors (note the .type column above), so this is the training RMSE on the pre-2011 data; the true test-set RMSE requires scoring forecasts against the held-out 2011-2018 data, as sketched below.
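A sketch of that proper test-set computation, fitting on data through 2010 and scoring the forecasts against the held-out observations (the names train/fit_train are mine):

# accuracy(forecast, data) computes test-set measures; accuracy(model)
# would only return training measures.
train <- data |> filter(year(Month) < 2011)
fit_train <- train |>
  model(ets = ETS(Turnover ~ error("M") + trend("A") + season("M")))
fit_train |>
  forecast(h = "8 years") |>
  accuracy(data)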

Exercise 8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

8.9 Solution

Find an optimized lambda for the Box-Cox transformation.

lambda <- data |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)
data_t <- data |>
  mutate(Turnover = box_cox(Turnover, lambda))
data_t |>
  autoplot(Turnover)

Well, welly, welly, that’s done a pretty good job. Let’s try the STL decomposition. Actually, I’m not sure I understand what is being asked. What is the STL decomposition for? We already have the transformed data. Let’s walk through it; perhaps it will make sense.

decomped <- data_t |> 
  model(stl = STL(Turnover)) |>
  components()

Okay, I see the season_adjust data. But why aren’t we trying to use the S in ETS? All will be revealed. We shall use the season_adjust data with ETS.

decomped |> 
  select(season_adjust) |>
  model(ets = ETS(season_adjust ~ error("M") + trend("A"))) |>
  accuracy()
# A tibble: 1 × 10
  .model .type          ME   RMSE    MAE    MPE  MAPE  MASE RMSSE   ACF1
  <chr>  <chr>       <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
1 ets    Training -0.00307 0.0930 0.0710 -0.156  2.31 0.396 0.415 0.0566

The RMSE of 8.8.e is ~1.144. Our RMSE, calculated from the seasonally adjusted, Box-Cox transformed data, is 0.093. Ladies and gentlemen, we have a new champion. Well, all was revealed; there is wisdom behind the ways of our wily professors. One caveat, though: 0.093 is measured on the Box-Cox transformed scale, so it is not directly comparable to 1.144 on the original Turnover scale. A fair fight needs both errors on the original scale.
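A sketch of such a comparison, assuming decomposition_model() behaves as I expect: it chains the STL with an ETS on the seasonally adjusted series (the seasonal component defaults to SNAIVE), and with box_cox() inside the formula the forecasts come back on the original Turnover scale, so accuracy() against the held-out data is apples-to-apples with 8.8.e:

# STL + ETS as one model; lambda is the guerrero value from above.
train <- data |> filter(year(Month) < 2011)
train |>
  model(
    stl_ets = decomposition_model(
      STL(box_cox(Turnover, lambda)),
      ETS(season_adjust ~ error("M") + trend("A"))
    )
  ) |>
  forecast(h = "8 years") |>
  accuracy(data)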