Do exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman. Please submit both the link to your Rpubs and the .pdf file with your run code
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.2 ──
## ✔ tibble 3.3.0 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.2
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 4.0.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.1.0 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
data("aus_livestock")
# Filter for pigs in victoria
pigs_vic <- aus_livestock %>%
filter(State == "Victoria", Animal == "Pigs")
# Expontenital smoothing
pigs_exp <- pigs_vic %>%
model(ETS(Count ~ error("A") + trend("N") + season("N")))
# Finding the optimal values for alpha
report(pigs_exp)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
The optimal alpha is 0.322 indicating moderate smoothing. The optimal ell-zero is 100,646.6.
# Forecast next 4 months
fc <- forecast(pigs_exp, h = 4)
# Plot
autoplot(fc, pigs_vic)
It looks like the next 4 months is around the current level (100,000) of
pigs slaughtered.
# interval produced by R
fc %>%
hilo(95) %>% # computes 95% prediction interval
pull('95%') %>% # extracts the interval column
head(1) # shows the first forecast interval
## <hilo[1]>
## [1] [76854.79, 113518.3]95
# My interval
pig_fc <- forecast(pigs_exp, h = 1) # finding the mean
mean <- pig_fc$.mean
s <- sqrt(87480760) # sigma
z <- 1.96 # 95% confidence interval
margin <- 1.96 * s
lower <- mean - margin
upper <- mean + margin
paste(lower,upper)
## [1] "76854.452052285 113518.662809698"
The manually computed 95% prediction interval closely matches the R’s interval. The small difference can be due to rounding but this confirms that the manual formula y +- 1.96s produces the same results as R’s built in forecast calculation.
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
aus_exports <- global_economy %>% filter(Country == "Australia")
autoplot(aus_exports, Exports) +
labs(title = "Australian Exports (% of GDP)",
y = "Exports (% of GDP)", x = "Year")
Clear uptrend with no strong seasonality.
fit <- aus_exports %>% model(ETS(Exports ~ error("A") + trend("N") + season("N")))
# Forecast 5 years ahead (you can change h)
fc <- forecast(fit, h = 5)
# Plot forecasts
autoplot(fc, aus_exports) +
labs(title = "ETS(A,N,N) Forecasts of Australian Exports",
y = "Exports (% of GDP)", x = "Year")
The forecasted line appears flat since ETS(A,N,N) assumes no trend or seasonality.
accuracy(fit)$RMSE
## [1] 1.146794
# Fit both models
fit_models <- aus_exports %>%
model(
`ETS(A,N,N)` = ETS(Exports ~ error("A") + trend("N") + season("N")),
`ETS(A,A,N)` = ETS(Exports ~ error("A") + trend("A") + season("N"))
)
# Model fit statistics (AICc, BIC)
model_stats <- glance(fit_models) %>%
select(.model, AICc, BIC)
# Accuracy metrics (RMSE, MAE, MAPE)
accuracy_stats <- accuracy(fit_models) %>%
select(.model, RMSE, MAE, MAPE)
# Combine both tables
comparison <- left_join(model_stats, accuracy_stats, by = ".model")
comparison
## # A tibble: 2 × 6
## .model AICc BIC RMSE MAE MAPE
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ETS(A,N,N) 258. 264. 1.15 0.914 5.41
## 2 ETS(A,A,N) 259. 269. 1.12 0.893 5.39
The ETS(A,A,N) model has a lower RMSE, MAE, and MAPE, indicating it fits the data better. However, the difference is minimal and the added trend parameter in the ETS(A,A,N) may not justified.
Both models had similar short term forecasts but the original plot had a upward trend so ETS(A,A,N) would be a better forecasting method. ETS(A,N,N) produces sloped forecasts that will continue the upward trend observed in the historical data whereas ETS(A,N,N) will just give a flat forecast because it assumes no trend.
# Mean forecasts from each model
fc <- forecast(fit_models, h = 1)
mean_ANN <- fc %>% filter(.model == "ETS(A,N,N)") %>% pull(.mean)
mean_AAN <- fc %>% filter(.model == "ETS(A,A,N)") %>% pull(.mean)
# RMSE values
rmse_ANN <- 1.1468
rmse_AAN <- 1.1167
# 95% intervals
lower_ANN <- mean_ANN - 1.96 * rmse_ANN
upper_ANN <- mean_ANN + 1.96 * rmse_ANN
lower_AAN <- mean_AAN - 1.96 * rmse_AAN
upper_AAN <- mean_AAN + 1.96 * rmse_AAN
# Table of interval
cbind(Model = c("ETS(A,N,N)", "ETS(A,A,N)"),
Lower = c(lower_ANN, lower_AAN),
Upper = c(upper_ANN, upper_AAN))
## Model Lower Upper
## [1,] "ETS(A,N,N)" "18.3594310526505" "22.8548870526505"
## [2,] "ETS(A,A,N)" "18.6499091205802" "23.0273731205802"
# Forecast both models
fc <- forecast(fit_models, h = 1)
# Extract 95% prediction intervals for both models
fc_intervals <- fc %>%
hilo(95) %>% # computes 95% interval
select(.model, `95%`) # keep model name and interval
fc_intervals # shows the first forecast interval
## # A tsibble: 2 x 3 [1Y]
## # Key: .model [2]
## .model `95%` Year
## <chr> <hilo> <dbl>
## 1 ETS(A,N,N) [18.31970, 22.89462]95 2018
## 2 ETS(A,A,N) [18.57028, 23.10700]95 2018
The manually calculated intervals are nearly identical to the R’s output.
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
china_gdp <- global_economy %>%
filter(Country == "China") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
china <- global_economy %>% filter(Country == "China")
# Models
fit <- china %>%
model(
`ETS default` = ETS(GDP),
`Holt` = ETS(GDP ~ error("A") + trend("A") + season("N")),
`Damped Holt` = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
`Box-Cox` = ETS(box_cox(GDP, china_gdp)),
`Damped Box-Cox`= ETS(box_cox(GDP, china_gdp) ~ trend("Ad"))
)
# Forecast 20 years
fc <- forecast(fit, h = 20)
autoplot(fc, china, level = NULL) +
labs(title = "China GDP: ETS variants", y = "US$ billions")
# Compare fit and accuracy
glance(fit) %>% select(.model, AICc, BIC)
## # A tibble: 5 × 3
## .model AICc BIC
## <chr> <dbl> <dbl>
## 1 ETS default 3103. 3112.
## 2 Holt 3259. 3268.
## 3 Damped Holt 3262. 3273.
## 4 Box-Cox -137. -128.
## 5 Damped Box-Cox -133. -123.
accuracy(fit) %>% select(.model, RMSE, MAE, MAPE)
## # A tibble: 5 × 4
## .model RMSE MAE MAPE
## <chr> <dbl> <dbl> <dbl>
## 1 ETS default 200000912947. 96595526193. 7.72
## 2 Holt 189990266650. 95916434827. 7.62
## 3 Damped Holt 190208894133. 94855392933. 7.62
## 4 Box-Cox 288333700735. 125239687471. 7.17
## 5 Damped Box-Cox 196282258243. 102421287515. 7.26
This plot shows that all the ETS models forecase continued growth in China’s gdp. The Box-Cox and Damped Box-box models show a higher rate of growth compared to the undamped models. The Box-Cox model has the lowest AICc and MAPE, indicating the best fit but the RMSE is very high compared to the other models. The damped models produced a flatter, more realistic long term forecast in line with the current trend. Thus, the damped Box-Cox or Damped Holt is the most optimal model for forecasting China’s gdp.
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
gas <- aus_production %>%
select(Quarter, Gas)
# Fit ETS model with multiplicative seasonality
fit <- gas %>%
model(ETS(Gas ~ error("A") + trend("A") + season("M")))
# Forecast next 3 years
fc <- forecast(fit, h = "3 years")
autoplot(fc, gas) +
labs(title = "Australian Gas Production Forecasts",
y = "Terajoules", x = "Year")
Multiplicative seasonality is needed as the highs and lows rise as overall gas production grows. The seasonal swings grow at a proportional rate, so the seasonal effect scales with the level of the series rather than stay constant, like in additive seasonality.
# Comparing the damped model
fits <- gas %>%
model(
`ETS(A,A,M)` = ETS(Gas ~ error("A") + trend("A") + season("M")),
`ETS(A,Ad,M)` = ETS(Gas ~ error("A") + trend("Ad") + season("M"))
)
# Forecast next 8 years
fc <- forecast(fits, h = "10 years")
autoplot(fc, gas, level = NULL) +
labs(title = "Australian Gas Production: ETS vs Damped ETS",
x = "Year", y = "Terajoules") +
guides(colour = guide_legend(title = "Model"))
The damped and undamped lines overlapped in the 5-year forecast, so I extended the time period to 10 years. Over the longer period, the damped model begins to flatten while the undamped model continues its upward trend. This shows how damping reduces the long-term growth rate, producing more realistic forecasts when sustained exponential growth is unlikely.
Recall your retail time series data (from Exercise 7 in Section 2.10).
set.seed(12345678)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(myseries)
## Plot variable not specified, automatically selected `.vars = Turnover`
Why is multiplicative seasonality necessary for this series? Multiplicative seasonality is necessary for this series as the plot shows seasonal variation grows as sales increases. The peaks and troughs get larger over time, meaning this seasonal effect is proportional. Multiplicative seasonality captures how the seasonal swings expand along with the upward retail trend better than additive seasonality.
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
# Fit two models: standard Holt-Winters and damped Holt-Winters
fit <- myseries %>%
model(
`Holt-Winters (Multiplicative)` = ETS(Turnover ~ error("M") + trend("A") + season("M")),
`Damped Holt-Winters` = ETS(Turnover ~ error("M") + trend("Ad") + season("M"))
)
# Forecast next 5 years
fc <- forecast(fit, h = "5 years")
autoplot(fc, myseries) +
labs(title = "Holt-Winters Multiplicative Forecasts for Retail Turnover",
y = "Turnover", x = "Year") +
guides(colour = guide_legend(title = "Model"))
# Compare performance
glance(fit) %>% select(.model, AICc, BIC)
## # A tibble: 2 × 3
## .model AICc BIC
## <chr> <dbl> <dbl>
## 1 Holt-Winters (Multiplicative) 1776. 1841.
## 2 Damped Holt-Winters 1783. 1851.
accuracy(fit) %>% select(.model, RMSE, MAE, MAPE)
## # A tibble: 2 × 4
## .model RMSE MAE MAPE
## <chr> <dbl> <dbl> <dbl>
## 1 Holt-Winters (Multiplicative) 0.613 0.450 5.15
## 2 Damped Holt-Winters 0.616 0.444 5.06
Both the models capture the general uptrend in retail turnover but the damped model flattens more over time.
d.Check that the residuals from the best method look like white noise.
myseries %>%
model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
gg_tsresiduals() +
ggtitle("Multiplicative Method")
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
The residuals fluctuate randomly around zero with no visible pattern suggesting that the model captures the main structure of the data. In the ACF plot, with the exception of the two spikes on lag 7 and 12, there spikes fall within the blue lines, indicating there is little autocorrelation in the model.
# Split the data
train <- myseries %>% filter(year(Month) <= 2010)
test <- myseries %>% filter(year(Month) > 2010)
# Fit Holt-Winters multiplicative model to training data
fit_hw <- train %>%
model(ETS(Turnover ~ error("M") + trend("A") + season("M")))
# Forecast over test period
fc_hw <- forecast(fit_hw, new_data = test)
# Compute RMSE for Holt-Winters
acc_hw <- accuracy(fc_hw, test)
acc_hw %>% select(.model, RMSE)
## # A tibble: 1 × 2
## .model RMSE
## <chr> <dbl>
## 1 "ETS(Turnover ~ error(\"M\") + trend(\"A\") + season(\"M\"))" 1.78
# Seasonal naïve model for comparison
fit_snaive <- train %>% model(SNAIVE(Turnover ~ lag("year")))
fc_snaive <- forecast(fit_snaive, new_data = test)
acc_snaive <- accuracy(fc_snaive, test)
acc_snaive %>% select(.model, RMSE)
## # A tibble: 1 × 2
## .model RMSE
## <chr> <dbl>
## 1 "SNAIVE(Turnover ~ lag(\"year\"))" 1.55
# Plot
fc_all <- bind_rows(
mutate(fc_hw, Model = "Holt-Winters (ETS)"),
mutate(fc_snaive, Model = "Seasonal Naïve")
)
autoplot(myseries, Turnover) +
autolayer(fc_all, level = NULL, aes(colour = Model)) +
labs(
title = "Retail Turnover Forecasts: Holt-Winters vs Seasonal Naïve",
x = "Year", y = "Turnover"
) +
guides(colour = guide_legend(title = "Model"))
The seasonal naive model performed slightly better than the Holt-Winters model with a lower RMSE on the test set. The series is highly seasonal and the naive model captures the repeating pattern better than the Holt-Winters model.
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
# Split the data
train <- myseries %>% filter(year(Month) <= 2010)
test <- myseries %>% filter(year(Month) > 2010)
# Estimate lambda for Box–Cox
lambda <- train %>% features(Turnover, guerrero) %>% pull(lambda_guerrero)
# STL decomposition with Box–Cox transform
stl_fit <- train %>%
model(
STL = decomposition_model(
STL(box_cox(Turnover, lambda)),
ETS(season_adjust ~ error("A") + trend("A") + season("N"))
)
)
# Forecast
fc_stl <- forecast(stl_fit, new_data = test)
# Evaluate test RMSE
acc_stl <- accuracy(fc_stl, test)
bind_rows(
mutate(acc_hw, Method = "Holt-Winters (ETS)"),
mutate(acc_snaive, Method = "Seasonal Naïve"),
mutate(acc_stl, Method = "STL + BoxCox + ETS")
) %>% select(Method, RMSE)
## # A tibble: 3 × 2
## Method RMSE
## <chr> <dbl>
## 1 Holt-Winters (ETS) 1.78
## 2 Seasonal Naïve 1.55
## 3 STL + BoxCox + ETS 4.01
The STL + Box–Cox + ETS model had a RMSE of 4.0099, which is much worse than both the Holt-Winters and seasonal naïve models. This likely occurred because the STL decomposition or Box–Cox transformation over-smoothed the data, causing the model to miss short-term seasonal fluctuations. The seasonal naive model was the most accurate of the models with a RMSE of 1.551981. This suggests that the retail series is highly seasonal. More complex models like the STL, Box-Cos, and ETS model did not provide additional benefit and may have introduced unnecessary noise to the model.