library(fpp3)
library(latex2exp)
library(seasonal)

Exercise 8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Part a

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(ℓ_{0}\), and generate forecasts for the next four months.

The optimal values of \(\alpha\) is 0.3221247 and \(ℓ_{0}\) is 100646.6

fit <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Pigs") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

tidy(fit)

## # A tibble: 2 × 5
##   Animal State    .model                                          term  estimate
##   <fct>  <fct>    <chr>                                           <chr>    <dbl>
## 1 Pigs   Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + sea… alpha  3.22e-1
## 2 Pigs   Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + sea… l[0]   1.01e+5

fc <- fit %>%
  forecast(h = 4)


fit %>%
  forecast(h = 4) |>
  autoplot(aus_livestock) +
  labs(title = "Number of Pigs Slaughtered Monthly")

Part b

Compute a 95% prediction interval for the first forecast using \(\widehat{y} \pm 1.96s\) where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

They calculated interval and interval produced by r are very close.

residuals <- fit %>%
  augment() %>%
  pull(.resid)

s <- sd(residuals)

y_hat <- fc$.mean[1]

lower <- y_hat - 1.96*s
upper <- y_hat + 1.96*s

paste(lower, upper)

## [1] "76871.0124775157 113502.102384467"

fc %>% 
  slice(1)  %>% 
  hilo(95)

## # A tsibble: 1 x 7 [1M]
## # Key:       Animal, State, .model [1]
##   Animal State    .model                                                   Month
##   <fct>  <fct>    <chr>                                                    <mth>
## 1 Pigs   Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Jan
## # ℹ 3 more variables: Count <dist>, .mean <dbl>, `95%` <hilo>

Exercise 8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Part a

Plot the Exports series and discuss the main features of the data.

Exports in United Kingdom increases from 1960 to 1980, then decreases to mid 1990s and continue to increase. There appears to have no seasonality.

uk_economy <- global_economy %>%
  filter(Country == "United Kingdom")

uk_economy %>%
  autoplot(Exports) + 
  labs(y = "% of GDP", title = "United Kingdom Exports") +
  geom_smooth(method = 'loess', se = FALSE, color = "blue")

## `geom_smooth()` using formula = 'y ~ x'

decomp <- uk_economy %>%
  model(stl <- STL(Exports))

autoplot(components(decomp))

## Part b

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

uk_fit1 <- uk_economy %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

uk_fc1 <- uk_fit1 %>%
  forecast(h = 5)


uk_fit1 %>%
  forecast(h = 5) |>
  autoplot(uk_economy) +
  labs(title = "Annual United Kingdom Exports")

Part c

Compute the RMSE values for the training data.

accuracy(uk_fit1)

## # A tibble: 1 × 11
##   Country        .model   .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <fct>          <chr>    <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 United Kingdom "ETS(Ex… Trai… 0.178  1.35  1.02 0.564  4.09 0.983 0.991 0.0135

Part d

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

For the ETS(A,N,N) model, trend was not activated so forecast stayed flat. For the ETS(A,A,N) model, trend was added so forecast shows increasing.

uk_fit2 <- uk_economy %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

uk_fc2 <- uk_fit2 %>%
  forecast(h = 5)

uk_fit2 %>%
  forecast(h = 5) |>
  autoplot(uk_economy) +
  labs(title = "Annual United Kingdom Exports")

accuracy(uk_fit2)$RMSE

## [1] 1.350896

Part e

Compare the forecasts from both methods. Which do you think is best?

For the ETS(A,N,N) model, trend was not activated so forecast stayed flat. For the ETS(A,A,N) model, trend was added so forecast shows increasing. This model also has a lower RMSE so it is a better model.

Part f

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

uk_fit1 %>%
  forecast(h = 5) %>%
  hilo(level = 95) %>%
  mutate(conf_lo = .mean - 2 *accuracy(uk_fit1)$RMSE,
         conf_hi = .mean + 2 *accuracy(uk_fit1)$RMSE)

## # A tsibble: 5 x 8 [1Y]
## # Key:       Country, .model [1]
##   Country        .model                                                     Year
##   <fct>          <chr>                                                     <dbl>
## 1 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"N\…  2018
## 2 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"N\…  2019
## 3 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"N\…  2020
## 4 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"N\…  2021
## 5 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"N\…  2022
## # ℹ 5 more variables: Exports <dist>, .mean <dbl>, `95%` <hilo>, conf_lo <dbl>,
## #   conf_hi <dbl>

uk_fit2 %>%
  forecast(h = 5) %>%
  hilo(level = 95) %>%
  mutate(conf_lo = .mean - 2 *accuracy(uk_fit2)$RMSE,
         conf_hi = .mean + 2 *accuracy(uk_fit2)$RMSE)

## # A tsibble: 5 x 8 [1Y]
## # Key:       Country, .model [1]
##   Country        .model                                                     Year
##   <fct>          <chr>                                                     <dbl>
## 1 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"N\…  2018
## 2 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"N\…  2019
## 3 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"N\…  2020
## 4 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"N\…  2021
## 5 United Kingdom "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"N\…  2022
## # ℹ 5 more variables: Exports <dist>, .mean <dbl>, `95%` <hilo>, conf_lo <dbl>,
## #   conf_hi <dbl>

Exercise 8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

china_gdp <- global_economy %>%
  filter(Country == "China") %>%
  select(Year, GDP)

lambda <- china_gdp %>%
  features(GDP, features =  guerrero)

china_fit <- china_gdp%>%
  model(
    SES = ETS(GDP ~ error("A") + trend("N") + season("N")),
    Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    'Damped Holt'  = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    'Box Cox'  = ETS(box_cox(GDP,lambda)~ error("A") + trend("A") + season("N"))
  ) 

china_fc <- china_fit %>%
  forecast(h = 15)

china_fit %>%
  forecast(h = 15)%>%
  autoplot(china_gdp, level = NULL) +
  labs(title = "China GDP")

accuracy(china_fit)

## # A tibble: 4 × 10
##   .model      .type          ME    RMSE     MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <chr>       <chr>       <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 SES         Training  2.10e11 4.16e11 2.13e11 8.14  11.0  0.983 0.991  0.789  
## 2 Holt        Training  2.36e10 1.90e11 9.59e10 1.41   7.62 0.442 0.453  0.00905
## 3 Damped Holt Training  2.95e10 1.90e11 9.49e10 1.62   7.62 0.438 0.454 -0.00187
## 4 Box Cox     Training -2.75e10 2.88e11 1.25e11 0.607  7.17 0.578 0.688  0.665

Exercise 8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Gas production varies seasonally as we saw in the previous homework demand was high in colder months. The seasonal fluctuations vary proportionally with the overall production level so multiplicative seasonality was necessary here. Addidtive seasonality would be necessary if seasonal variations are roughly constant through the series, which was not the case here. The multiplicative model and damped multiplicative model performed similarly. Both models did better than the additive model as expected. Damping the multipicative model did improve the forecast slightly as it had a lower RMSE.

gas_data <- aus_production %>%
  select(Quarter, Gas) %>%
  filter(!is.na(Gas))

gas_fit <- gas_data %>%
  model(
    additive = ETS(Gas ~ error("A") + trend("A") + season("A")),
    multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
    'Damped multiplicative'  = ETS(Gas ~ error("M") + trend("Ad") + season("M"))
  )
gas_fc <- gas_fit %>%
  forecast(h = 20)

gas_fc %>%
  autoplot(gas_data, level = NULL) +
  labs(title="Australian Gas Production") +
  guides(colour = guide_legend(title = "Forecast"))

report(gas_fit)

## Warning in report.mdl_df(gas_fit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 3 × 9
##   .model                  sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>                    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 additive              23.6       -927. 1872. 1873. 1903.  22.7  29.7 3.35  
## 2 multiplicative         0.00324   -831. 1681. 1682. 1711.  21.1  32.2 0.0413
## 3 Damped multiplicative  0.00329   -832. 1684. 1685. 1718.  21.1  32.0 0.0417

accuracy(gas_fit)

## # A tibble: 3 × 10
##   .model             .type       ME  RMSE   MAE    MPE  MAPE  MASE RMSSE    ACF1
##   <chr>              <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 additive           Trai…  0.00525  4.76  3.35 -4.69  10.9  0.600 0.628  0.0772
## 2 multiplicative     Trai… -0.115    4.60  3.02  0.199  4.08 0.542 0.606 -0.0131
## 3 Damped multiplica… Trai… -0.00439  4.59  3.03  0.326  4.10 0.544 0.606 -0.0217

Exercise 8.8

Recall your retail time series data (from Exercise 7 in Section 2.10).

Part a

Why is multiplicative seasonality necessary for this series?

The seasonal fluctuations vary proportionally with the overall production level so multiplicative seasonality was necessary here. The overall trend is increasing over time. We also see the amplitude of the waves (seasonality) increasing over time.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot()+
  labs(title = "Monthly Australian Retail Trade Turnover",
       x = "Monthly",
       y = "Turnover ($Million AUD)")

## Plot variable not specified, automatically selected `.vars = Turnover`

Part b

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

myseries_fit <- myseries %>%
  model(
    multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    'Damped multiplicative'  = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
  )

myseries_fit %>%
  forecast(h = 48) %>%
  autoplot(myseries, level = NULL) +
  labs(title = "Monthly Retail Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

Part c

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

The multiplicative model has a lower RMSE than the other model. Thus, its the better model.

accuracy(myseries_fit) %>%
  select(.model, RMSE)

## # A tibble: 2 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 multiplicative        0.613
## 2 Damped multiplicative 0.616

Part d

Check that the residuals from the best method look like white noise.

myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Null hypothesis: Residuals are white noise (uncorrelated)

If the p-value > 0.05, we fail to reject the null hypothesis. The resdidual are white noise. If the p-value < 0.05, we reject the null hypothesis. The model is still inadequate.

For models, the p-value was greater than 0.05 so the residuals behave like white noise. But the damped multiplicative model has higher p-values than the multiplicative model which shows it has better behaved residuals

myseries_fit %>%
  augment %>%
  features(.innov, box_pierce, lag = 24)

## # A tibble: 2 × 5
##   State              Industry                           .model bp_stat bp_pvalue
##   <chr>              <chr>                              <chr>    <dbl>     <dbl>
## 1 Northern Territory Clothing, footwear and personal a… Dampe…    22.9     0.526
## 2 Northern Territory Clothing, footwear and personal a… multi…    33.1     0.102

myseries_fit %>%
  augment %>%
  features(.innov, ljung_box, lag = 24)

## # A tibble: 2 × 5
##   State              Industry                           .model lb_stat lb_pvalue
##   <chr>              <chr>                              <chr>    <dbl>     <dbl>
## 1 Northern Territory Clothing, footwear and personal a… Dampe…    23.8    0.472 
## 2 Northern Territory Clothing, footwear and personal a… multi…    34.5    0.0765

Part e

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

Both multiplciative model and damped multiplciative model has a lower RMSE than the seasonal naïve approach, which makes them better model than the seasonal naïve approach.

myseries_train <- myseries %>%
  filter(year(Month) < 2011)

myseries_fit2 <- myseries_train %>%
  model(
    multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
    'Damped multiplicative'  = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
    snaive = SNAIVE(Turnover)
  )

fc <- myseries_fit2 %>%
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc %>%
  autoplot(myseries, level = NULL)

myseries_fit2 %>% 
  accuracy()

## # A tibble: 3 × 12
##   State       Industry .model .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE
##   <chr>       <chr>    <chr>  <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 Northern T… Clothin… multi… Trai… -0.0119 0.518 0.384 -0.446  5.21 0.420 0.427
## 2 Northern T… Clothin… Dampe… Trai…  0.0357 0.519 0.392  0.158  5.34 0.428 0.427
## 3 Northern T… Clothin… snaive Trai…  0.439  1.21  0.915  5.23  12.4  1     1    
## # ℹ 1 more variable: ACF1 <dbl>

fc %>% 
  accuracy(myseries)

## # A tibble: 3 × 12
##   .model State Industry .type     ME  RMSE   MAE     MPE  MAPE  MASE RMSSE  ACF1
##   <chr>  <chr> <chr>    <chr>  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Dampe… Nort… Clothin… Test   0.163  1.15 0.878   0.194  6.45 0.960 0.949 0.501
## 2 multi… Nort… Clothin… Test  -1.54   1.78 1.60  -12.3   12.6  1.75  1.47  0.495
## 3 snaive Nort… Clothin… Test   0.836  1.55 1.24    5.94   9.06 1.36  1.28  0.601

Exercise 8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda <- myseries %>%
  features(Turnover, features =  guerrero) %>%
  pull(lambda_guerrero)

# refer code from https://otexts.com/fpp3/stl.html
# I went back to chap 5 for this part but not too sure if I am doing this problem correct

myseries %>%
  model(STL(box_cox(Turnover, lambda)~season(window = "periodic"),
    robust = TRUE)) %>%
  components() %>%
  autoplot()

my_series_new <- myseries_train %>%
  mutate(Turnover_box_cox = box_cox(Turnover, lambda) )

my_series_test <- myseries %>%
  filter(year(Month) >= 2011)

fit_dcmp <- my_series_new %>%
  model(
    stlf = decomposition_model(
    STL(Turnover_box_cox ~season(window = "periodic"),robust = TRUE),
    ETS(season_adjust)
  ))
  
# refer to code from ch 5 https://otexts.com/fpp3/forecasting-decomposition.html

fc <- fit_dcmp %>%
  forecast(h=10) 

autoplot(fc, my_series_new)

The RMSE in this model is lower than the other previous models.

accuracy(fit_dcmp)

## # A tibble: 1 × 12
##   State    Industry .model .type      ME   RMSE    MAE     MPE  MAPE  MASE RMSSE
##   <chr>    <chr>    <chr>  <chr>   <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… stlf   Trai… 2.92e-4 0.0781 0.0604 -0.0365  2.93 0.396 0.384
## # ℹ 1 more variable: ACF1 <dbl>

fit_dcmp %>%
  gg_tsresiduals()

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_rug()`).

fit_dcmp %>%
  augment %>%
  features(.innov, box_pierce, lag = 24)

## # A tibble: 1 × 5
##   State              Industry                           .model bp_stat bp_pvalue
##   <chr>              <chr>                              <chr>    <dbl>     <dbl>
## 1 Northern Territory Clothing, footwear and personal a… stlf      23.6     0.483

fit_dcmp %>%
  augment %>%
  features(.innov, ljung_box, lag = 24)

## # A tibble: 1 × 5
##   State              Industry                           .model lb_stat lb_pvalue
##   <chr>              <chr>                              <chr>    <dbl>     <dbl>
## 1 Northern Territory Clothing, footwear and personal a… stlf      24.9     0.410

For the box - pierce test, the p-value was greater than 0.05. We failed to reject the null hypothesis. The residuals looks okay here.

But for the Ljung- Box test, the p-value was less than 0.05. We reject the null hypothesis. This means there are sill autocorrection still present in the residual.

DATA 624 Homework 5

Susanna Wong

2025-09-11

Exercise 8.1

Part a

Part b

Exercise 8.5

Part a

Part c

Part d

Part e

Part f

Exercise 8.6

Exercise 8.7

Exercise 8.8

Part a

Part b

Part c

Part d

Part e

Exercise 8.9