DATA624_HW5_CameronGray

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.4.2

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## Warning: package 'tidyr' was built under R version 4.4.2

## Warning: package 'tsibble' was built under R version 4.4.2

## Warning: package 'tsibbledata' was built under R version 4.4.2

## Warning: package 'feasts' was built under R version 4.4.2

## Warning: package 'fabletools' was built under R version 4.4.2

## Warning: package 'fable' was built under R version 4.4.2

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.2     ✔ stringr 1.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

pigs <- aus_livestock |>
  filter(Animal == 'Pigs')

pigs |>
  model(
    STL(Count ~ trend(window = 12) + season(window = "periodic"),
    robust = TRUE)) |>
  components() |>
  autoplot() +
  theme(legend.position = "none")

fit <- pigs |> model(ETS(Count ~ error("A") + trend("N") + season("N")))
fc <- fit |> forecast(h = 4)

fc |>
  autoplot(pigs) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  theme(legend.position = "none")

pigs |>
  autoplot() +
  facet_wrap('State', ncol = 1, scale = "free_y")+
  theme(legend.position = "none")

## Plot variable not specified, automatically selected `.vars = Count`

fc |>
  autoplot() +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  theme(legend.position = "none")

Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where \(s\) is the standard deviation of the residuals. Compare your interval with the interval produced by R.

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data.

There doesn’t appear to be any seasonality to the data which makes sense since it is yearly data. That leaves the trend to be what we are mostly looking for. We have an upward trend with a but with some variation from year to year.

korean_exports <- global_economy |>
  select(Exports) |>
  filter(Country == "Korea, Rep.")

korean_exports |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Exports`

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit <- korean_exports |>
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc <- forecast(fit, h = 4)

fc |>
  autoplot(korean_exports) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  theme(legend.position = "none")

c) Compute the RMSE values for the training data.

accuracy(fit)$RMSE

## [1] 3.408323

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

The ETS(A,A,N) is much better than ETS(A,N,N) for this data set because we have an upward trend so using an additive trend causes the model to perform better, as shown by it’s RMSE of 3.339374 vs the none trended model having an RMSE of 3.408323.

fit <- korean_exports |>
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))
fc <- forecast(fit, h = 4)

fc |>
  autoplot(korean_exports) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  theme(legend.position = "none")

accuracy(fit)$RMSE

## [1] 3.339374

Compare the forecasts from both methods. Which do you think is best?

The ETS(A,A,N) did much better because it followed the upward trend expected in the data whereas the ETS(A,N,N) did not. This looks much more accurate to the expected data than the other.

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

china_gdp <- global_economy |>
  select(GDP) |>
  filter(Country == "China")

china_gdp |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = GDP`

fit <- china_gdp |>
  model(ETS(GDP ~ error("A") + trend("A") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(china_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Additive") +
  theme(legend.position = "none")

fit <- china_gdp |>
  model(ETS(GDP ~ error("A") + trend("M") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(china_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Multiplicative") +
  theme(legend.position = "none")

fit <- china_gdp |>
  model(ETS(GDP ~ error("A") + trend("Ad") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(china_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Additive Damped") +
  theme(legend.position = "none")

fit <- china_gdp |>
  model(ETS(GDP ~ error("A") + trend("Md") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(china_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Multiplicative Damped") +
  theme(legend.position = "none")

lambda <- china_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

bcchina_gdp <- china_gdp |>
  mutate(GDP = box_cox(china_gdp$GDP, lambda))

bcchina_gdp |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = GDP`

fit <- bcchina_gdp |>
  model(ETS(GDP ~ error("A") + trend("A") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(bcchina_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Box-Cox Additive") +
  theme(legend.position = "none")

fit <- bcchina_gdp |>
  model(ETS(GDP ~ error("A") + trend("M") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(bcchina_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Box-Cox Multiplicative") +
  theme(legend.position = "none")

fit <- bcchina_gdp |>
  model(ETS(GDP ~ error("A") + trend("Ad") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(bcchina_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Box-Cox Additive Damped") +
  theme(legend.position = "none")

fit <- bcchina_gdp |>
  model(ETS(GDP ~ error("A") + trend("Md") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(bcchina_gdp) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Box-Cox Multiplicative Damped") +
  theme(legend.position = "none")

8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

The multiplicative seasonality is necessary since the trend of the seasonality is increasing over time with an exponential growth. Interestingly enough with the damping the multiplicative does better than the additive does before the damping.

aus_gas <- aus_production |>
  select(Gas)

fit <- aus_gas |>
  model(ETS(Gas ~ error("A") + trend("A") + season("M")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(aus_gas) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Additive; Season: Multiplicative") +
  theme(legend.position = "none")

print(accuracy(fit)$RMSE)

## [1] 4.189856

fit <- aus_gas |>
  model(ETS(Gas ~ error("A") + trend("M") + season("M")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(aus_gas) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Multiplicative; Season: Multiplicative") +
  theme(legend.position = "none")

print(accuracy(fit)$RMSE)

## [1] 4.231464

fit <- aus_gas |>
  model(ETS(Gas ~ error("A") + trend("Ad") + season("M")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(aus_gas) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Additive Damped; Season: Multiplicative") +
  theme(legend.position = "none")

print(accuracy(fit)$RMSE)

## [1] 4.21658

fit <- aus_gas |>
  model(ETS(Gas ~ error("A") + trend("Md") + season("M")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(aus_gas) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Multiplicative Damped; Season: Multiplicative") +
  theme(legend.position = "none")

print(accuracy(fit)$RMSE)

## [1] 4.159836

8.8

Recall your retail time series data (from Exercise 7 in Section 2.10).

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Why is multiplicative seasonality necessary for this series?

Since this data is highly seasonal there needs to be a seasonality aspect for the ETS to be accurate. Based on how the trends in the seasonality grow it is probably best to be multiplicative instead of additive.

myseries |>
  autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit <- myseries |>
  model(ETS(Turnover ~ error("M") + trend("M") + season("M")))
fc <- forecast(fit, h = 60)
fc |>
  autoplot(myseries) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Multiplicative; Season: Multiplicative") +
  theme(legend.position = "none")

fit <- myseries |>
  model(ETS(Turnover ~ error("M") + trend("Md") + season("M")))
fc <- forecast(fit, h = 60)
fc |>
  autoplot(myseries) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Multiplicative Damped; Season: Multiplicative") +
  theme(legend.position = "none")

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

The not damped works better for this data set since it has a lower error.

fit <- myseries |> model(ETS(Turnover ~ error("M") + trend("M") + season("M")))
fc <- forecast(fit, h = 1)
mmmmyseries <- accuracy(fit)$RMSE

fit <- myseries |> model(ETS(Turnover ~ error("M") + trend("Md") + season("M")))
fc <- forecast(fit, h = 1)
mmdmmyseries <- accuracy(fit)$RMSE

print("Not damped:")

## [1] "Not damped:"

print(mmmmyseries)

## [1] 0.6151178

print("Damped:")

## [1] "Damped:"

print(mmdmmyseries)

## [1] 0.6277715

Check that the residuals from the best method look like white noise.

Yes they are the mostly random with very little pattern.

myseries |>
  model(ETS(Turnover ~ error("M") + trend("M") + season("M"))) |>
  gg_tsresiduals()

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

Yes the ETS method is much better than the naive just looking at the RMSE.

myseries_train <- myseries |>
  filter(year(Month) < 2010)

fit <- myseries_train |>
  model(ETS(Turnover ~ error("M") + trend("Md") + season("M")))
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

ets <- accuracy(fit)$RMSE

fit <- myseries_train |>
  model(SNAIVE(Turnover))
fc <- fit |>
  forecast(new_data = anti_join(myseries, myseries_train))

## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`

fc |> autoplot(myseries)

naive <- accuracy(fit)$RMSE

print("ETS RMSE:")

## [1] "ETS RMSE:"

print(ets)

## [1] 0.4899889

print("Naive RMSE:")

## [1] "Naive RMSE:"

print(naive)

## [1] 1.218218

8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

After all the transformation the RMSE is the closest to zero it has been so it is the very best forecast I’ve seen so far.

lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

bc_myseries <- myseries |>
  mutate(Turnover = box_cox(Turnover, lambda))

stl_myseries <- bc_myseries |>
  model(
    STL(Turnover ~ trend(window = 12) + season(window = "periodic"),
    robust = TRUE)) |>
  components() 

stl_myseries |>
  autoplot()

newseries <- myseries |>
  mutate(Turnover = stl_myseries$season_adjust)

fit <- newseries |>
  model(ETS(Turnover ~ error("A") + trend("Md") + season("N")))
fc <- forecast(fit, h = 20)
fc |>
  autoplot(newseries) +
  geom_line(aes(y = .fitted), col="#D55E00", data = augment(fit)) +
  ggtitle("Trend: Multiplicative Damped") +
  theme(legend.position = "none")

print(accuracy(fit)$RMSE)

## [1] 0.0790849

DATA624_HW5_CameronGray

Cameron Gray

2025-02-24

8.1

8.5

8.6

8.7

8.8

8.9