library(fpp3)
library(tidyverse)
Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months. Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
pig_fit <- aus_livestock %>%
  filter(State == "Victoria", Animal == "Pigs") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))
pig_fc <- pig_fit %>%
  forecast(h = 4)
pig_fc %>%
  autoplot(aus_livestock) +
  ggtitle("Number of Pigs Slaughtered in Victoria")
report(pig_fit)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
The optimal values are α = 0.3221247 and ℓ0 = 100646.6.
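The estimated parameters can also be pulled out programmatically (a quick sketch using fabletools' tidy() on the fitted mable from above):
# term/estimate pairs: alpha and the initial level l[0]
tidy(pig_fit)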
pig_fc %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [76854.79, 113518.3]95
fc_mean <- 95186.56 # first point forecast from pig_fc
s <- sqrt(87480760) # sigma^2 reported by report(pig_fit)
z <- 1.96
margin <- z * s
lower <- fc_mean - margin
upper <- fc_mean + margin
paste(lower, upper)
## [1] "76854.4546212935 113518.665378707"
The manually calculated and R-generated 95% prediction intervals differ by only about 0.3 on each bound, a relative difference of well under 0.01%. The small gap comes from rounding the point forecast and from using z = 1.96 rather than the exact 97.5% normal quantile (≈ 1.959964); the results are essentially the same.
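The same comparison can be done without transcribing numbers by hand. A minimal sketch, reusing pig_fit and pig_fc from above and recomputing s directly from the residuals:
# Standard deviation of the residuals from the augmented mable
res_sd <- pig_fit %>%
  augment() %>%
  pull(.resid) %>%
  sd(na.rm = TRUE)
# First point forecast from the fable
fc1 <- pig_fc %>% pull(.mean) %>% head(1)
# Manual 95% interval
c(lower = fc1 - 1.96 * res_sd, upper = fc1 + 1.96 * res_sd)
Note that sd() uses an n − 1 denominator, so this will differ slightly from an interval built from the reported sigma^2.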
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
- Plot the Exports series and discuss the main features of the data.
- Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
- Compute the RMSE values for the training data.
- Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
- Compare the forecasts from both methods. Which do you think is best?
- Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
global_economy %>%
  filter(Country == "Norway") %>%
  autoplot(Exports) +
  labs(y = "% of GDP", title = "Norway's Yearly Exports")
Examining Norway’s exports, it is evident that exports consistently accounted for about 35-40% of the country’s GDP. There was a brief surge in the early 1980s, followed by a sharp decline in the late 1980s. After that, the share of exports in GDP steadily increased, reaching a peak around 2008. However, a significant drop occurred afterward, lasting until the mid-2010s. These sharp declines seem to coincide with global economic downturns, such as the late 1980s recession, the dot-com bubble burst in 2000, and the 2008 housing crisis and recession.
Nw_ft <- global_economy %>%
  filter(Country == "Norway") %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
Nw_fc <- Nw_ft %>%
  forecast(h = 6)
Nw_fc %>%
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Norway Yearly Exports", subtitle = "ETS(A,N,N) Model")
accuracy(Nw_ft)$RMSE
## [1] 2.485073
Nw_ft2 <- global_economy %>%
  filter(Country == "Norway") %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))
Nw_fc2 <- Nw_ft2 %>%
  forecast(h = 6)
Nw_fc2 %>%
  autoplot(global_economy) +
  labs(y = "% of GDP", title = "Norway Yearly Exports", subtitle = "ETS(A,A,N) Model")
accuracy(Nw_ft2)$RMSE
## [1] 2.489792
The key distinction between ETS(A,A,N) and ETS(A,N,N) is the middle letter, which specifies the trend component: “A” (additive) fits a linear trend, while “N” (none) assumes the data have no persistent trend.
For this dataset, I believe ETS(A,N,N) is a better fit. There doesn’t appear to be a noticeable trend in the data over this timeframe. While the Root Mean Square Error (RMSE) is similar for both models, ETS(A,N,N) performs slightly better with a lower RMSE of 2.485073.
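Because the trended model spends extra parameters, a penalised comparison is also informative. A small sketch using glance() on the two mables fitted above:
# AICc/BIC penalise the additional trend parameters in ETS(A,A,N)
bind_rows(glance(Nw_ft), glance(Nw_ft2)) %>%
  select(.model, AIC, AICc, BIC)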
Nw_fc %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [30.51542, 40.42916]95
# Manual 95% interval using the training RMSE, as the exercise specifies
mean_fc1 <- 35.5 # first point forecast from Nw_fc (rounded)
s_fc1 <- 2.485073 # RMSE from accuracy(Nw_ft)
z_fc1 <- 1.96
margin_fc1 <- z_fc1 * s_fc1
lower_fc1 <- mean_fc1 - margin_fc1
upper_fc1 <- mean_fc1 + margin_fc1
paste(lower_fc1, upper_fc1)
## [1] "30.62925692 40.37074308"
Nw_fc2 %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [30.34156, 40.45638]95
mean_fc2 <- 35.4 # first point forecast from Nw_fc2 (rounded)
s_fc2 <- 2.489792 # RMSE from accuracy(Nw_ft2)
z_fc2 <- 1.96
margin_fc2 <- z_fc2 * s_fc2
lower_fc2 <- mean_fc2 - margin_fc2
upper_fc2 <- mean_fc2 + margin_fc2
paste(lower_fc2, upper_fc2)
## [1] "30.52000768 40.27999232"
With the RMSE-based calculation, both hand-computed intervals come out slightly narrower than R's. That is expected: fable's intervals are built from the model's estimated σ̂, which adjusts for the number of estimated parameters and is therefore a little larger than the training RMSE.
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
Lambda_china <- global_economy %>%
  filter(Country == "China") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
Ch_fit <- global_economy %>%
  filter(Country == "China") %>%
  model(`Standard` = ETS(GDP ~ error("A") + trend("N") + season("N")),
        `Holt's method` = ETS(GDP ~ error("A") + trend("A") + season("N")),
        `Damped Holt's method` = ETS(GDP ~ error("A") + trend("Ad", phi = 0.8) + season("N")),
        `Box-Cox` = ETS(box_cox(GDP, Lambda_china) ~ error("A") + trend("Ad") + season("N")),
        `Damped Box-Cox` = ETS(box_cox(GDP, Lambda_china) ~ error("A") + trend("Ad", phi = 0.8) + season("N")))
Ch_fc <- Ch_fit %>%
  forecast(h = 20)
Ch_fc %>%
  autoplot(global_economy, level = NULL) +
  labs(title = "China's Gross Domestic Product (GDP)") +
  guides(colour = guide_legend(title = "Forecast"))
Over a 20-year horizon the options separate clearly: Holt's linear trend extrapolates the recent growth indefinitely, the damped trends (φ = 0.8) bend the forecasts toward a flat long-run level, and the Box-Cox transformation changes the scale on which the trend is linear, so the back-transformed forecasts curve rather than follow a straight line.
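To put numbers on the comparison, a sketch using glance() on the mable fitted above; note that the two Box-Cox variants model a transformed response, so their likelihood-based criteria are not directly comparable with the untransformed models:
# Information criteria for the five variants
glance(Ch_fit) %>%
  select(.model, AICc, BIC)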
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
fit_gas <- aus_production %>%
  model(`Additive` = ETS(Gas ~ error("A") + trend("A") + season("A")),
        `Multiplicative` = ETS(Gas ~ error("M") + trend("A") + season("M")),
        `Damped Multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M")))
fit_gas %>%
  forecast(h = 20) %>%
  autoplot(aus_production, level = NULL) +
  labs(title = "Australia's Gas Production",
       subtitle = "Additive vs. Multiplicative Seasonal Patterns") +
  guides(colour = guide_legend(title = "Forecast"))
# report() only supports a single model, so summarise the mable with glance()
glance(fit_gas)
accuracy(fit_gas)
The multiplicative method is more suitable when seasonal variations change in proportion to the level of the series, whereas the additive method is preferred when the seasonal variation is roughly constant. Here the seasonal pattern grows with the level over time, so the multiplicative method is the better choice. Additionally, damping the trend improves the fit slightly, reducing the training RMSE from 4.60 to 4.58.
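The proportional seasonality is easy to confirm visually; a one-line sketch plotting the raw series:
# Seasonal swings grow with the level of the series
aus_production %>% autoplot(Gas)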
Recall your retail time series data (from Exercise 7 in Section 2.10).
- Why is multiplicative seasonality necessary for this series?
- Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
- Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
- Check that the residuals from the best method look like white noise.
- Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
set.seed(334)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
myseries %>% autoplot(Turnover)
Analyzing the retail data, it is clear that multiplicative seasonality is required because both the trend and the amplitude of the seasonality are increasing over time. This indicates that the variations in the seasonal pattern are proportional to the level of the time series.
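A seasonal plot makes this concrete (a sketch using feasts' gg_season(), which overlays the years):
# Later years sit at a higher level and show wider seasonal swings
myseries %>% gg_season(Turnover)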
fit_my <- myseries %>%
  model(`Multiplicative` = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        `Damped Multiplicative` = ETS(Turnover ~ error("M") + trend("Ad") + season("M")))
fit_my %>%
  forecast(h = 36) %>%
  autoplot(myseries, level = NULL) +
  labs(title = "Retail Sales in Australia") +
  guides(colour = guide_legend(title = "Forecast"))
accuracy(fit_my) %>%
  select(.model, RMSE)
The damped multiplicative method appears preferable here: its training RMSE (4.01) is slightly lower than that of the standard multiplicative method (4.05).
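To see how strongly the trend is damped, the estimated φ can be inspected directly; a sketch that keeps only the damped model column of the mable:
# tidy() lists alpha, beta, gamma, phi and the initial states
fit_my %>%
  select(`Damped Multiplicative`) %>%
  tidy()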
fit_multi <- myseries %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M")))
fit_multi %>%
  gg_tsresiduals() +
  ggtitle("Multiplicative Method")
# Box-Pierce test
fit_multi %>%
  augment() %>%
  features(.innov, box_pierce, lag = 24, dof = 0)
# Ljung-Box test
fit_multi %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24, dof = 0)
myseries_train <- myseries %>%
  filter(year(Month) < 2011)
fit_train <- myseries_train %>%
  model(multi = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        snaive = SNAIVE(Turnover))
# Forecasts
fc <- fit_train %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(myseries, level = NULL)
accuracy(fit_train) %>%
  select(.type, .model, RMSE)
fc %>%
  accuracy(myseries) %>%
  select(.type, .model, RMSE)
I was unable to beat the seasonal naïve approach: on the test set, SNAIVE has the lower RMSE.
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
lambda_retail <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
# STL decomposition of the Box-Cox transformed data
myseries %>%
  model(STL(box_cox(Turnover, lambda_retail) ~ season(window = "periodic"), robust = TRUE)) %>%
  components() %>%
  autoplot() +
  ggtitle("STL with Box-Cox Transformation")
# Keep the STL components so the seasonally adjusted series can be extracted
Decomp <- myseries %>%
  model(STL_box = STL(box_cox(Turnover, lambda_retail) ~ season(window = "periodic"), robust = TRUE)) %>%
  components()
# Seasonally adjusted series (on the Box-Cox scale)
myseries$Turnover_sa <- Decomp$season_adjust
myseries_train <- myseries %>%
  filter(year(Month) < 2011)
fit <- myseries_train %>%
  model(ETS(Turnover_sa ~ error("M") + trend("A") + season("M")))
fit %>%
  gg_tsresiduals() +
  ggtitle("Residuals of Australian Retail Turnover")
# Forecast Test Data
fc <- fit %>%
  forecast(new_data = anti_join(myseries, myseries_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover,
## Turnover_sa)`
fit %>%
  accuracy() %>%
  select(.model, .type, RMSE)
fc %>%
  accuracy(myseries) %>%
  select(.model, .type, RMSE)
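As a final visual check, the test-period forecasts from the seasonally adjusted model can be plotted against the full series (a sketch reusing fc and myseries from above):
# Forecasts of the seasonally adjusted series against the held-out data
fc %>%
  autoplot(myseries, level = NULL) +
  ggtitle("ETS on Seasonally Adjusted Turnover")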