DATA 624 Data Acquisition & Management

Homework 5

Silma Khan SPRING 2025

Do exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman

Exercise 8.1:

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

  1. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of alpha (a) and l0, and generate forecasts for the next four months.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.4     ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(forecast)
## Warning: package 'forecast' was built under R version 4.4.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
pig_fit <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Pigs") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

pig_fc <- pig_fit %>%
  forecast(h = 4)

pig_fc %>%
  autoplot(aus_livestock) +
  labs(title = "Distribution of Pigs Slaughters in Victoria")

report(pig_fit)
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

The optimal values of: - alpha = 0.322 - l0 = 100646.6

  1. Compute a 95% prediction interval for the first forecast using y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
pig_fc %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [76854.79, 113518.3]95

Exercise 8.5:

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

  1. Plot the Exports series and discuss the main features of the data.
global_economy
## # A tsibble: 15,150 x 9 [1Y]
## # Key:       Country [263]
##    Country     Code   Year         GDP Growth   CPI Imports Exports Population
##    <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
##  1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
##  2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
##  3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
##  4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
##  5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
##  6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
##  7 Afghanistan AFG    1966 1399999967.     NA    NA   18.6     8.57   10152331
##  8 Afghanistan AFG    1967 1673333418.     NA    NA   14.2     6.77   10372630
##  9 Afghanistan AFG    1968 1373333367.     NA    NA   15.2     8.90   10604346
## 10 Afghanistan AFG    1969 1408888922.     NA    NA   15.0    10.1    10854428
## # ℹ 15,140 more rows
global_economy %>%
  filter(Country == "Bangladesh") %>%
  autoplot(Exports) +
  labs(y="% of GDP", title="Exports in Bangladesh")

  1. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
bangladesh_fit <- global_economy %>%
  filter(Country == "Bangladesh") %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N"))) 

bangladesh_fc <- bangladesh_fit %>%
  forecast(h = 4)

bangladesh_fc %>%
  autoplot(global_economy) +
  labs(y="% of GDP", title="Exports in Bangladesh", subtitle = "ETS(A,N,N)")

  1. Compute the RMSE values for the training data.
rmse_bd <- accuracy(bangladesh_fit)$RMSE
rmse_bd
## [1] 1.253158
  1. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
bangladesh_fit_2 <- global_economy %>%
  filter(Country == "Bangladesh") %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N"))) 

bangladesh_fc_2 <- bangladesh_fit_2 %>%
  forecast(h = 4)

bangladesh_fc_2 %>%
  autoplot(global_economy) +
  labs(y="% of GDP", title="Exports in Bangladesh", subtitle = "ETS(A,A,N)")

rmse_bd_2 <- accuracy(bangladesh_fit_2)$RMSE
rmse_bd_2
## [1] 1.250591
  1. Compare the forecasts from both methods. Which do you think is best?

I think the ETS(A,A,N) method is the better one since it is able to showcase either an increasing or decreasing trend, rather than just being a straight constant line. This shows variation in future predicitons

  1. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE
bangladesh_fc %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [12.53665, 17.53589]95

Exercise 8.6:

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

china <- global_economy %>%
  filter(Country == "China") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

china_fit <- global_economy %>%
  filter(Country == "China") %>%
  model(`Standard` = ETS(GDP ~ error("A") + trend("N") + season("N")),
        `Holt's method` = ETS(GDP ~ error("A") + trend("A") + season("N")),
        `Damped Holt's method` = ETS(GDP ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        `Box-Cox` = ETS(box_cox(GDP,china) ~ error("A") + trend("Ad") + season("N")),
        `Damped Box-Cox` = ETS(box_cox(GDP,china) ~ error("A") + trend("Ad", phi = 0.8) + season("N"))) 

china_fc <- china_fit %>%
  forecast(h = 25)

china_fc %>%
  autoplot(global_economy, level = NULL) +
  labs(title="China's GDP") +
  guides(colour = guide_legend(title = "Forecast"))

Exercise 8.7:

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Multiplicative Seaosnality is necessary here because the difference in the seasonality and the patterns seem to be direcrly related to the time series as well

gas_fit <- aus_production %>%
  model(multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M")))

aus_production %>%
    model(multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M"))
        ) %>%
   forecast(h=20) %>%
  autoplot(aus_production, level = NULL) +
  labs(title="Australian Gas Production") +
  guides(colour = guide_legend(title = "Forecast"),
         subtitle="Multiplicative v.s Damped Multiplicative Seasonality")

report(gas_fit)
## Warning in report.mdl_df(gas_fit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 2 × 9
##   .model                 sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE    MAE
##   <chr>                   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 multiplicative        0.00324   -831. 1681. 1682. 1711.  21.1  32.2 0.0413
## 2 damped multiplicative 0.00340   -835. 1688. 1689. 1719.  21.0  32.4 0.0424
accuracy(gas_fit)
## # A tibble: 2 × 10
##   .model               .type     ME  RMSE   MAE   MPE  MAPE  MASE RMSSE     ACF1
##   <chr>                <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
## 1 multiplicative       Trai… -0.115  4.60  3.02 0.199  4.08 0.542 0.606 -0.0131 
## 2 damped multiplicati… Trai…  0.255  4.58  3.04 0.655  4.15 0.545 0.604 -0.00451

Exercise 8.8:

Recall your retail time series data (from Exercise 7 in Section 2.10).

  1. Why is multiplicative seasonality necessary for this series?

Multiplicative seasonality is necessary for this series because retail sales show seasonal patterns and by using a multiplicative seasonal model, it allows the seasonal effect to be proportional to the series level

  1. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
set.seed(2468)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 
myseries_fit <- myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        `damped multiplicative` = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))) 

myseries_fit %>%
  forecast(h=36) %>%
  autoplot(myseries, level = NULL) +
  labs(title="Australia Retail Turnover") +
  guides(colour = guide_legend(title = "Forecast"))

The multiplicative Method shows more of an increase in the trend while still maintaining the seasonality of the data

  1. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
accuracy(myseries_fit) %>%
  select(.model, RMSE)
## # A tibble: 2 × 2
##   .model                 RMSE
##   <chr>                 <dbl>
## 1 multiplicative         5.45
## 2 damped multiplicative  5.48

The RMSE for the damped multiplicative method is approximately 0.0268 more than the RMSE for the multiplicative method.So the multiplicative one would be more preferred

  1. Check that the residuals from the best method look like white noise.
myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  gg_tsresiduals() +
  ggtitle("Multiplicative Method Residuals")

  1. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
myseries_training <- myseries %>%
  filter(year(Month) < 2011)
training_fit <- myseries_training %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        snaive = SNAIVE(Turnover))
training_fc <- training_fit %>%
  forecast(new_data = anti_join(myseries, myseries_training))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
training_fc %>%
  autoplot(myseries, level=NULL)

accuracy(training_fit) %>%
  select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type    .model          RMSE
##   <chr>    <chr>          <dbl>
## 1 Training multiplicative  4.09
## 2 Training snaive          9.50
training_fc %>% accuracy(myseries)  %>%
  select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type .model          RMSE
##   <chr> <chr>          <dbl>
## 1 Test  multiplicative  16.6
## 2 Test  snaive          33.6

The multiplicative method has a much lower RMSE compared to Seasonal Naive and seems to be able to forecast the data more efficiently, showing that the Multiplicative method is the more appropriate method to use in this case

Exercise 8.9:

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
myseries %>%
  model(STL(box_cox(Turnover,lambda) ~ season(window = "periodic"), robust = TRUE)) %>%
  components() %>%
  autoplot() +
  ggtitle("STL with Box-Cox for MySeries Retail Data")