Data 624 HW 5

Do exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman.

8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

suppressWarnings(library(tsibble))
suppressWarnings(library(fpp3))
suppressWarnings(library(fable))
suppressWarnings(library(ggplot2))
suppressWarnings(library(dplyr))

aus_livestock_vic_pigs <- aus_livestock %>%
  filter(State == "Victoria",
         Animal == "Pigs")
# Plot the time series.
plot<- aus_livestock_vic_pigs %>%
  autoplot(Count) +
  labs(title = 'Pigs Slaughtered in Victoria Timeseries')
plot

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.

fit <- aus_livestock_vic_pigs %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

fit_report <- report(fit)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

print(fit_report)

## # A mable: 1 x 3
## # Key:     Animal, State [1]
##   Animal State    `ETS(Count ~ error("A") + trend("N") + season("N"))`
##   <fct>  <fct>                                                 <model>
## 1 Pigs   Victoria                                         <ETS(A,N,N)>

The optimal values: 𝛼= 0.3221247. ℓ0= 100646.6

Generate forecasts for the next four months.

fc <- fit %>%
  forecast(h = 4)

print(fc)

## # A fable: 4 x 6 [1M]
## # Key:     Animal, State, .model [1]
##   Animal State    .model                                                   Month
##   <fct>  <fct>    <chr>                                                    <mth>
## 1 Pigs   Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Jan
## 2 Pigs   Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Feb
## 3 Pigs   Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Mar
## 4 Pigs   Victoria "ETS(Count ~ error(\"A\") + trend(\"N\") + season(\"… 2019 Apr
## # ℹ 2 more variables: Count <dist>, .mean <dbl>

fc%>% autoplot(aus_livestock_vic_pigs) +
  ggtitle("Number of Pigs Slaughtered in Victoria") +
  xlab("Month") +
  ylab("Number of Pigs Slaughtered")

b.Compute a 95% prediction interval for the first forecast using ^y±1.96s where s is the standard deviation of the residuals.

Mean <- 95186.56
SD <- sqrt(87480760)

lower_level <- Mean - 1.96 * SD
upper_level <- Mean + 1.96 * SD
paste(lower_level,upper_level)%>% head(1)

## [1] "76854.4546212935 113518.665378707"

The 95% prediction interval for the first forecast is from 76854 to 113518.

Compare your interval with the interval produced by R:

fc %>% hilo(95) %>% 
  pull('95%') %>% head(1)

## <hilo[1]>
## [1] [76854.79, 113518.3]95

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

(a) Plot the Exports series and discuss the main features of the data.

# Load necessary libraries
library(tsibble)
library(fable)
library(ggplot2)
library(fabletools)

# Assume 'global_economy' data set is already loaded
# Filter data for Australia
Australia_exports <- global_economy %>%
  filter(Country == "Australia") %>%
  select(Country, Year, Exports)

# Convert to tsibble
Australia_exports_tsibble <- as_tsibble(Australia_exports, index = Year)

# Plot the Exports series
ggplot(Australia_exports_tsibble, aes(x = Year, y = Exports)) +
  geom_line(colour = "blue") +
  labs(title = "Annual Exports of Australia", x = "Year", y = "Exports (in USD)") +
  theme_minimal()

When plotting the Exports series, we might observe the following features:

The graph shows an increasing trend in the annual exports of Australia between 1960 and 2000. The overall direction of the graph is upwards, indicating that exports have generally increased over this period.

(b) Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

# Fit ETS(A,N,N) model
ets_ann <- Australia_exports_tsibble %>% 
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

# Generate forecasts for the next 5 years
ets_ann_forecast <- ets_ann %>% forecast(h = 5)

ets_ann_forecast %>% autoplot(Australia_exports) +
  labs(title = 'ANN MODEL: Australia Annual Exports Forecast')

report(ets_ann)

## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.5659948 
## 
##   Initial states:
##      l[0]
##  12.98943
## 
##   sigma^2:  1.3621
## 
##      AIC     AICc      BIC 
## 257.3943 257.8387 263.5756

(c) Compute the RMSE values for the training data.

accuracy(ets_ann)

## # A tibble: 1 × 11
##   Country   .model        .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <fct>     <chr>         <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Australia "ETS(Exports… Trai… 0.232  1.15 0.914  1.09  5.41 0.928 0.928 0.0125

The RMSE value for the training data is 1.146794

(d) Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

# Fit ETS(A,A,N) model
ets_aan <- Australia_exports_tsibble %>% 
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

# Generate forecasts for the next 5 years
ets_aan_forecast <- ets_aan %>% forecast(h = 5)

ets_aan_forecast %>% autoplot(Australia_exports) +
  labs(title = 'AAN MODEL: Australia Annual Exports Forecast')

modelComparison <- Australia_exports %>%
  model(
    ANN = ETS(Exports ~ error('A') + trend('N') + season('N')),
    AAN = ETS(Exports ~ error('A') + trend('A') + season('N'))
  )

accuracy(modelComparison)

## # A tibble: 2 × 11
##   Country   .model .type          ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <fct>     <chr>  <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Australia ANN    Training  2.32e-1  1.15 0.914  1.09   5.41 0.928 0.928 0.0125
## 2 Australia AAN    Training -7.46e-7  1.12 0.893 -0.387  5.39 0.907 0.904 0.109

The table shows that the ANN model has a slightly lower RMSE (1.146794) and MAE (0.9135835) compared to the AAN model (1.116727 and 0.8926420 respectively).
Usually, if RMSE is lower, we can say that the model fits the best. In this case, both models provide a reasonably good fit to the data, since the difference between the metrics is relatively small.

(e) Compare the forecasts from both methods. Which do you think is best?

modelComparison %>%
  forecast(h = 4) %>%
  autoplot(Australia_exports, level = NULL) +
  labs(title = 'Australia Annual Exports ANN Vs AAN Forecast Model Comparison')

Although it’s hard to say which model is better, I assume the AAN model appears slightly better due to it’s greater stability.

(f) Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

ANN

standardDeviation <- modelComparison %>%
  select(Country, ANN) %>%
  accuracy() %>%
  transmute(Country, standardDeviation = RMSE)
standardDeviation

## # A tibble: 1 × 2
##   Country   standardDeviation
##   <fct>                 <dbl>
## 1 Australia              1.15

modelComparison %>%
  select(Country, ANN) %>%
  forecast(h = 1) %>%
  left_join(standardDeviation, by = 'Country') %>%
  mutate(lowerCi = Exports - 1.96 * standardDeviation,
         upperCi = Exports + 1.96 * standardDeviation) %>%
  select(Country, Exports, lowerCi, upperCi)

## # A fable: 1 x 5 [?]
## # Key:     Country [1]
##   Country      Exports
##   <fct>         <dist>
## 1 Australia N(21, 1.4)
## # ℹ 3 more variables: lowerCi <dist>, upperCi <dist>, Year <dbl>

ets_ann_forecast %>% hilo(95) %>% pull('95%') %>% head(1)

## <hilo[1]>
## [1] [18.3197, 22.89462]95

AAN

standardDeviation <- modelComparison %>%
  select(Country, AAN) %>%
  accuracy() %>%
  transmute(Country, standardDeviation = RMSE)
standardDeviation

## # A tibble: 1 × 2
##   Country   standardDeviation
##   <fct>                 <dbl>
## 1 Australia              1.12

modelComparison %>%
  select(Country, AAN) %>%
  forecast(h = 1) %>%
  left_join(standardDeviation, by = 'Country') %>%
  mutate(lowerCi = Exports - 1.96 * standardDeviation,
         upperCi = Exports + 1.96 * standardDeviation) %>%
  select(Country, Exports, lowerCi, upperCi)

## # A fable: 1 x 5 [?]
## # Key:     Country [1]
##   Country      Exports
##   <fct>         <dist>
## 1 Australia N(21, 1.3)
## # ℹ 3 more variables: lowerCi <dist>, upperCi <dist>, Year <dbl>

##interval produced by R:**
ets_aan_forecast %>% hilo(95) %>% pull('95%') %>% head(1)

## <hilo[1]>
## [1] [18.57028, 23.107]95

The interval computed by R using hilo() is a slightly larger interval compared to the others.

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

Chana_GDP <- global_economy %>% 
filter(Country == "China")

# Estimate the optimal Box-Cox transformation parameter using Guerrero's method.

box_cox_lambda <- Chana_GDP %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)

# Compare different ETS() models

ets_model_comparison <- Chana_GDP %>%
model(
ETS_Basic = ETS(GDP),
ETS_Transformed = ETS(box_cox(GDP, box_cox_lambda)),
ETS_DampedTrend = ETS(GDP ~ trend('A', phi = 0.7)),
 ETS_LogTransformed = ETS(log(GDP))

  )

# Generate forecasts and visualize the results using ggplot2.

ets_model_comparison %>%
forecast(h = 30) %>%
autoplot(Chana_GDP, level = NULL) +
labs(title = 'China GDP Forecasts: A Comparison of ETS Models') +
scale_color_manual(values = c("ETS_Basic" = "blue", "ETS_Transformed" = "red", "ETS_DampedTrend" = "green", "ETS_LogTransformed" = "purple")) + 
theme_minimal()

8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production %>%
autoplot(Gas) +
labs(
title = "Australian Gas Production",
ylab = "Production (Billions of Cubic Feet)",
xlab = "Quarter"
 ) +
theme(plot.title = element_text(hjust = 0.5))

fit_model <- aus_production %>%
model(
    `Additive` = ETS(Gas ~ error("A") + trend("A") + season("A")),
    `Multiplicative` = ETS(Gas ~ error("M") + trend("A") + season("M")) )
fc <- fit_model %>%
forecast(h=30)
autoplot(fc, aus_production, level = NULL) +
labs(title="Australian Gas Production",
subtitle="Additive vs. Multiplicative Seasonality") +
guides(colour = guide_legend(title = "Forecast"))

Multiplicative forecast shows slightly larger fluctuations than the additive forecast.

fit_model <- aus_production %>%
model(
    `Multiplicative` = ETS(Gas ~ error("M") + trend("A") + season("M")),
    `Damped Multiplicative` = ETS(Gas ~ error("M") + trend("Ad", phi = 0.9) + season("M"))
  )

fc <- fit_model %>%
forecast(h=30)
autoplot(fc, aus_production, level = NULL) +
labs(title="Australian Gas Production",
subtitle="Multiplicative vs. Damped Multiplicative Seasonality") +
 guides(colour = guide_legend(title = "Forecast"))

Multiplicative seasonality is essential because the seasonal fluctuations increase over time. It looks like dumped multiplicative model provides a better forecast than the multiplicative model for this dataset. Damped Multiplicative model shows smoother and less volatile prediction of future production.

8.8

Recall your retail time series data (from Exercise 7 in Section 2.10).

Why is multiplicative seasonality necessary for this series?

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>% autoplot(Turnover)+
labs(title="Monthly Australian retail data")

Multiplicative seasonality is necessary because the amplitude of seasonal fluctuations increases with the level of the time series.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fit_model <- myseries %>%
  model(
    'Holt Winters Multiplicative Method' = ETS(Turnover ~ error('M') + trend('A') + season('M')),
    'Holt Winters Damped Method' = ETS(Turnover ~ error('M') + trend('Ad') + season('M'))
  )

fc <- fit_model %>% forecast(h = 20)

fc %>% autoplot(myseries, level = NULL)+
labs(title="Monthly Australian retail data",
subtitle="Holt Winters Multiplicative Method vs. Holt Winters Damped Method") +
guides(colour = guide_legend(title = "Forecast"))

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

compare_accuracy<-accuracy(fit_model) %>% select('.model', 'RMSE')
compare_accuracy

## # A tibble: 2 × 2
##   .model                              RMSE
##   <chr>                              <dbl>
## 1 Holt Winters Multiplicative Method 0.613
## 2 Holt Winters Damped Method         0.616

The preferred method will be the one with the lower RMSE value, which is Holt Winters Multiplicative Method in our case

Check that the residuals from the best method look like white noise.

fit_model %>%
  select('Holt Winters Multiplicative Method') %>%
  gg_tsresiduals()

Since the autocorrelation values on the ACF plot remain inside the confidence intervals, it suggests minimal significant autocorrelation. The histogram displays a near-normal distribution which positions its central point at zero. The residual plot displays a random distribution of points around zero without any visible patterns.

Overall, based on the ACF plot, histogram, and residual plot, the residuals appear to be consistent with white noise. This suggests that the Holt-Winters Multiplicative Method is a good fit for the data.

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

# Define a function to create the models and forecast
forecast_turnover <- function(data, train_end_year) {

  # Create training data
  train_data <- data %>%
    filter(year(Month) < train_end_year)

  # Fit the models
  fit_models <- train_data %>%
    model(
      'Holt Winters Multiplicative' = ETS(Turnover ~ error("M") + trend("A") + season("M")),
      'Holt Winters Damped' = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
      'Seasonal Naive' = SNAIVE(Turnover)
    )

  # Create comparison data  
  comparison_data <- anti_join(data, train_data, by = c("State", "Industry", "Series ID", "Month", "Turnover"))

  # Forecast using the fitted models
  forecasts <- fit_models %>% forecast(comparison_data)

  # Return the comparison data, forecasts, and fit_models
  return(list(comparison_data = comparison_data, forecasts = forecasts, fit_models = fit_models))
}

# Apply the function
forecast_output <- forecast_turnover(myseries, 2011)
comparison_data <- forecast_output$comparison_data
forecasts <- forecast_output$forecasts
fit_models <- forecast_output$fit_models

# Visualize the forecasts
autoplot(comparison_data, Turnover) +
  autolayer(forecasts, level = NULL) +
  labs(title = "Turnover Forecast Comparison")

# Calculate accuracy metrics using the fit_models object
accuracy(fit_models) %>%
  select(.type, .model, RMSE)

## # A tibble: 3 × 3
##   .type    .model                       RMSE
##   <chr>    <chr>                       <dbl>
## 1 Training Holt Winters Multiplicative 0.518
## 2 Training Holt Winters Damped         0.519
## 3 Training Seasonal Naive              1.21

I was able to beat Seasonal Naive approach, since it has the highest RMSE. The best method for forecasting turnover appears to be the Holt-Winters Multiplicative method.

8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

myseries_retail<- myseries %>%
  filter(year(Month) < 2011)

lambda <- myseries_retail %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
ts_bc <- myseries_retail %>%
  mutate(
    bc = box_cox(Turnover, lambda)
  )

bc_fit <- ts_bc %>%
  model(
    'STL (BoxCox)' = STL(bc ~ season(window = "periodic"),
                         robust = T),
    'ETS (BoxCox)' = ETS(bc)
  )

bc_fit_opt <-ts_bc %>%
  model(
    "Holt-Winters' Multiplicative" = ETS(Turnover ~ error("M") + 
                                           trend("A") +
                                           season("M"))
  )

rbind(accuracy(bc_fit),accuracy(bc_fit_opt))

## # A tibble: 3 × 12
##   State    Industry .model .type       ME   RMSE    MAE    MPE  MAPE  MASE RMSSE
##   <chr>    <chr>    <chr>  <chr>    <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 Norther… Clothin… STL (… Trai…  0.00745 0.0819 0.0623  0.193  2.85 0.329 0.326
## 2 Norther… Clothin… ETS (… Trai…  0.0139  0.0994 0.0784  0.516  3.56 0.414 0.396
## 3 Norther… Clothin… Holt-… Trai… -0.0119  0.518  0.384  -0.446  5.21 0.420 0.427
## # ℹ 1 more variable: ACF1 <dbl>

Now It looks like Holt-Winters’ Multiplicative shows the highest RMSE compare to other models.

Data 624 HW 5

Uliana Plotnikova

2025-03-02

8.1

8.5

8.6

8.7

8.8

8.9