Homework 5

library(fpp3)

Registered S3 method overwritten by 'tsibble':
  method               from 
  as_tibble.grouped_df dplyr

── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──

✔ tibble      3.3.1     ✔ tsibble     1.1.6
✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
✔ tidyr       1.3.2     ✔ feasts      0.4.2
✔ lubridate   1.9.4     ✔ fable       0.5.0
✔ ggplot2     4.0.1

── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()    masks base::date()
✖ dplyr::filter()      masks stats::filter()
✖ tsibble::intersect() masks base::intersect()
✖ tsibble::interval()  masks lubridate::interval()
✖ dplyr::lag()         masks stats::lag()
✖ tsibble::setdiff()   masks base::setdiff()
✖ tsibble::union()     masks base::union()

1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

a

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.

vic_pigs <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria")

fit <- vic_pigs %>%
  model(SES = ETS(Count ~ error("A") + trend("N") + season("N")))

tidy(fit)

# A tibble: 2 × 5
  Animal State    .model term    estimate
  <fct>  <fct>    <chr>  <chr>      <dbl>
1 Pigs   Victoria SES    alpha      0.322
2 Pigs   Victoria SES    l[0]  100647.

fc <- fit %>%
  forecast(h = "4 months")

# View the forecast table
fc

# A fable: 4 x 6 [1M]
# Key:     Animal, State, .model [1]
  Animal State    .model    Month
  <fct>  <fct>    <chr>     <mth>
1 Pigs   Victoria SES    2019 Jan
2 Pigs   Victoria SES    2019 Feb
3 Pigs   Victoria SES    2019 Mar
4 Pigs   Victoria SES    2019 Apr
# ℹ 2 more variables: Count <dist>, .mean <dbl>

# Visualize the result
fc %>%
  autoplot(vic_pigs %>% filter_index("2014" ~ .)) +
  labs(title = "Pigs Slaughtered in Victoria", y = "Count")

Warning: There was 1 warning in `filter()`.
ℹ In argument: `time_in(Month, ...)`.
Caused by warning:
! `yearmonth()` may yield unexpected results.
ℹ Please use arg `format` to supply formats.

b

Compute a 95% prediction interval for the first forecast using ^y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

#Standard deviation
res_std <- augment(fit) %>%
  pull(.resid) %>%
  sd(na.rm = TRUE)

# Gets the y hat
first_fc <- fc %>%
  filter(row_number() == 1) %>%
  pull(.mean)

lower <- first_fc - 1.96 * res_std
upper <- first_fc + 1.96 * res_std

cat("Manual Interval: [", lower, ", ", upper, "]\n")

Manual Interval: [ 76871.01 ,  113502.1 ]

Comparison

fc %>%
  hilo(level = 95) %>%
  unpack_hilo("95%") %>%
  select(.model, Month, .mean, `95%_lower`, `95%_upper`) %>%
  slice(1)

# A tsibble: 1 x 5 [1M]
# Key:       .model [1]
  .model    Month  .mean `95%_lower` `95%_upper`
  <chr>     <mth>  <dbl>       <dbl>       <dbl>
1 SES    2019 Jan 95187.      76855.     113518.

#5 Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

a

a Plot the Exports series and discuss the main features of the data.

us_exports <- global_economy %>%
  filter(Country == "United States")

# Plot the series
us_exports %>%
  autoplot(Exports) +
  labs(title = "Annual Exports: United States",
       y = "Exports (% of GDP)",
       x = "Year")

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).

There is a clear upward trend from the 1960s through 2011. There are cycles when the exports drop mostly likely related around the time the US went through a recession ##b Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

us_exports <- global_economy %>%
  filter(Country == "United States")

fit <- us_exports %>%
  model(SES = ETS(Exports ~ error("A") + trend("N") + season("N")))

fc <- fit %>%
  forecast(h = 5)

fc %>%
  autoplot(us_exports) +
  labs(title = "US Exports: ETS(A,N,N) Forecast",
       y = "Exports (% of GDP)",
       x = "Year")

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).

c

Compute the RMSE values for the training data.

# Using Fit from previous problems
training_accuracy <- fit %>%
  accuracy()

training_accuracy %>%
  select(.model, .type, RMSE)

# A tibble: 1 × 3
  .model .type     RMSE
  <chr>  <chr>    <dbl>
1 SES    Training 0.627

d

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

us_exports <- global_economy %>%
  filter(Country == "United States")

us_fit_aan <- us_exports %>%
  model(AAN = ETS(Exports ~ error("A") + trend("A") + season("N")))

# 5-year forecast
us_fc_aan <- forecast(us_fit_aan, h = "5 years")

us_fc_aan %>%
  autoplot(us_exports, level = NULL) +
  labs(title = "ETS AAN Forecast for United States Exports",
       subtitle = "Holt's Linear Trend Model",
       x = "Year",
       y = "Exports (% of GDP)") +
  scale_y_continuous(labels = scales::comma_format()) +
  theme_minimal()

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_line()`).

e

Compare the forecasts from both methods. Which do you think is best?

I think Holt is better for this dataset. SES is better for flat data that stay at a constant value, struggles to measure historical growth, and assume the current level is the best guess for the future. The danger of using the holt model is it’s blind to momentum. The US ecomny is slowing done and holt model

f

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R

# us_exports defined in previous problem
fit <- us_exports %>%
  model(
    SES = ETS(Exports ~ error("A") + trend("N") + season("N")),
    Holt = ETS(Exports ~ error("A") + trend("A") + season("N"))
  )

fc_1 <- fit %>% forecast(h = 1)
rmse_values <- accuracy(fit) %>% select(.model, RMSE)

# Calculation of 95% Prediction Interval
manual_intervals <- fc_1 %>%
  as_tibble() %>%
  left_join(rmse_values, by = ".model") %>%
  mutate(
    Lower_Manual = .mean - 1.96 * RMSE,
    Upper_Manual = .mean + 1.96 * RMSE
  ) %>%
  select(.model, .mean, RMSE, Lower_Manual, Upper_Manual)

r_intervals <- fc_1 %>%
  hilo(level = 95) %>%
  unpack_hilo("95%") %>%
  select(.model, .mean, `95%_lower`, `95%_upper`)

print(manual_intervals)

# A tibble: 2 × 5
  .model .mean  RMSE Lower_Manual Upper_Manual
  <chr>  <dbl> <dbl>        <dbl>        <dbl>
1 SES     11.9 0.627         10.7         13.1
2 Holt    12.1 0.615         10.9         13.3

print(r_intervals)

# A tsibble: 2 x 5 [?]
# Key:       .model [2]
  .model .mean `95%_lower` `95%_upper`  Year
  <chr>  <dbl>       <dbl>       <dbl> <dbl>
1 SES     11.9        10.7        13.1  2018
2 Holt    12.1        10.9        13.4  2018

6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

china_gdp <- global_economy %>%
  filter(Country == "China") 

#Calculate lambda for box cox
lambda <- china_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit <- china_gdp %>%
  model(
    `SES (Flat)` = ETS(GDP ~ error("A") + trend("N") + season("N")),
    `Holt (Linear)` = ETS(GDP ~ error("A") + trend("A") + season("N")),
    `Damped` = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    `Box-Cox Holt` = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N"))
  )

fit %>%
  forecast(h = 20) %>%
  autoplot(china_gdp, level = NULL) +
  labs(title = "GDP of China: Model Comparison",
       y = "GDP (USD)", x = "Year") +
  theme_minimal()

7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

gas_data <- aus_production %>%
  filter(!is.na(Gas))

fit <- gas_data %>%
  model(
    Multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M")),
    Damped = ETS(Gas ~ error("M") + trend("Ad") + season("M")),
    Additive = ETS(Gas ~ error("A") + trend("A") + season("A"))
  )

fc <- fit %>% forecast(h = "5 years")

fc %>%
  autoplot(gas_data, level = NULL) +
  labs(title = "Australian Gas Production Forecast",
       y = "Petajoules") +
  theme_minimal()

8

Recall your retail time series data (from Exercise 7 in Section 2.10). ## a Why is multiplicative seasonality necessary for this series?

As a business grows, its busy seasons, like the fall to winter holiday rush, don’t just stay the same size they scale up in proportion to the overall volume of the business.

b

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

myseries <- aus_retail %>%
  index_by(Month) %>%
  summarise(turnover = sum(Turnover, na.rm = TRUE))

fit <- myseries %>%
  model(
    `Holt-Winters Multiplicative` = ETS(turnover ~ error("M") + trend("A") + season("M")),
    `Damped Multiplicative` = ETS(turnover ~ error("M") + trend("Ad") + season("M"))
  )

fc <- fit %>% forecast(h = "3 years")

fc %>%
  autoplot(myseries, level = NULL) +
  labs(title = "Retail Turnover: Holt-Winters vs. Damped Trend",
       y = "Turnover") +
  theme_minimal()

c

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy_comparison <- fit %>%
  accuracy() %>%
  select(.model, RMSE, MAE, MAPE)

# Print the results
accuracy_comparison

# A tibble: 2 × 4
  .model                       RMSE   MAE  MAPE
  <chr>                       <dbl> <dbl> <dbl>
1 Holt-Winters Multiplicative  421.  327.  1.55
2 Damped Multiplicative        428.  327.  1.55

Looking at the forecasts I think the Damped Multiplicative would be the best but the difference between both model looks negligible.

d

Check that the residuals from the best method look like white noise.

fit %>%
  select(`Holt-Winters Multiplicative`) %>%
  gg_tsresiduals()

Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
ℹ Please use `ggtime::gg_tsresiduals()` instead.

fit %>%
  select(`Damped Multiplicative`) %>%
  gg_tsresiduals()

I decided to check both models residual just to see the comparison between them. The residuals for both of them seems to average around zero. It also seems to be getting more accurate as time goes on suggesting that both model are preforming well.

e

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

train <- myseries %>%
  filter_index(. ~ "2010 Dec")

fit_test <- train %>%
  model(
    `Holt-Winters` = ETS(turnover ~ error("M") + trend("A") + season("M")),
    `Damped` = ETS(turnover ~ error("M") + trend("Ad") + season("M")),
    `SNAIVE` = SNAIVE(turnover)
  )

fc_test <- fit_test %>%
  forecast(new_data = anti_join(myseries, train, by = "Month"))

fc_test %>%
  accuracy(myseries) %>%
  select(.model, RMSE, MAE, MAPE)

# A tibble: 3 × 4
  .model        RMSE   MAE  MAPE
  <chr>        <dbl> <dbl> <dbl>
1 Damped       5928. 4850. 10.1 
2 Holt-Winters 4304. 3578.  7.47
3 SNAIVE       8089. 6962. 14.6

Snaive beat the two other models which is surprising. I effected the more advance model to out perform it. My guess is that holt and damped model learned a trend before 2009 and when the 2009 recession slowed down the economy those models didn’t adapt while the SNaive just repeat data from the previous year. This is why it’s import to have a data sciencist manually adjust the dampening value to prevent the data from over estimating.

9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

# Setting Lambda for box cot
lambda_train <- train %>%
  features(turnover, features = guerrero) %>%
  pull(lambda_guerrero)

fit_hybrid <- train %>%
  model(
    STL_ETS = decomposition_model(
      STL(box_cox(turnover, lambda_train) ~ season(window = "periodic")),
      ETS(season_adjust ~ error("A") + trend("Ad") + season("N"))
    )
  )

fc_hybrid <- fit_hybrid %>% forecast(new_data = anti_join(myseries, train, by = "Month"))

# Comparing with SNAIVE and Damped
fc_hybrid %>% 
  accuracy(myseries) %>%
  select(.model, RMSE, MAE, MAPE)

# A tibble: 1 × 4
  .model   RMSE   MAE  MAPE
  <chr>   <dbl> <dbl> <dbl>
1 STL_ETS 4537. 3576.  7.34

The STL + ETS did out preform Holt Winters but couldn’t out perform the original seasonal Naive model.