DATA624 HW 5

Do exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman. Please submit both the link to your Rpubs and the .pdf file with your run code

##8.1 Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset. A) Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.3.3

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ ggplot2     3.5.1     ✔ fabletools  0.4.2

## Warning: package 'ggplot2' was built under R version 4.3.3

## Warning: package 'tsibble' was built under R version 4.3.3

## Warning: package 'tsibbledata' was built under R version 4.3.3

## Warning: package 'feasts' was built under R version 4.3.3

## Warning: package 'fabletools' was built under R version 4.3.3

## Warning: package 'fable' was built under R version 4.3.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.2     ✔ stringr 1.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

data("aus_livestock")
#filter for Pigs
pigs_vic <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  select(Month, Count)
# make the dataframe tsibble() or time series format
pigs_vic <- pigs_vic %>% as_tsibble(index = Month)

ets_model <- pigs_vic %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

# View model details
report(ets_model)

## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

From the ETS() function above we estimate the value of α to be equall to 0.322 and ℓ0 to be equall 100,064.6 Below is the forcast of the next 4 months.

forecast <- ets_model %>% forecast(h = "4 months")
forecast

## # A fable: 4 x 4 [1M]
## # Key:     .model [1]
##   .model                                          Month             Count  .mean
##   <chr>                                           <mth>            <dist>  <dbl>
## 1 "ETS(Count ~ error(\"A\") + trend(\"N\") + … 2019 Jan N(95187, 8.7e+07) 95187.
## 2 "ETS(Count ~ error(\"A\") + trend(\"N\") + … 2019 Feb N(95187, 9.7e+07) 95187.
## 3 "ETS(Count ~ error(\"A\") + trend(\"N\") + … 2019 Mar N(95187, 1.1e+08) 95187.
## 4 "ETS(Count ~ error(\"A\") + trend(\"N\") + … 2019 Apr N(95187, 1.1e+08) 95187.

Compute a 95% prediction interval for the first forecast using y ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

residuals_ets <- residuals(ets_model)  # Extract residuals from the model
s <- sd(residuals_ets$.resid)

first_forecast <- forecast %>% slice(1) %>% pull(.mean)  # First forecasted value

residuals_ets <- residuals(ets_model)  # Extract residuals from the model
s <- sd(residuals_ets$.resid)  # Compute the standard deviation of residuals

# View the standard deviation (optional)
s

## [1] 9344.666

# Step 6: Manually compute the 95% prediction interval for the first forecast
first_forecast <- forecast %>% slice(1) %>% pull(.mean)  # First forecasted value

# Manually compute the prediction interval using ^y ± 1.96 * s
manual_lower_bound <- first_forecast - 1.96 * s
manual_upper_bound <- first_forecast + 1.96 * s

# Print the manually computed prediction interval
manual_interval <- c(manual_lower_bound, manual_upper_bound)
print(paste("Manual 95% prediction interval:", manual_interval))

## [1] "Manual 95% prediction interval: 76871.0124775157"
## [2] "Manual 95% prediction interval: 113502.102384467"

# Step 7: Extract R's built-in 95% prediction intervals
# Prediction intervals are contained in a 'dist' object that needs extraction via the 'hilo()' function
forecast_intervals <- forecast %>%
  mutate(interval = hilo(Count, level = 95))

# Extract the first forecast and its interval produced by R
r_interval <- forecast_intervals %>%
  slice(1) %>%
  pull(interval)

# Print R's prediction interval for comparison
print(paste("R's 95% prediction interval:", r_interval$lower, r_interval$upper))

## [1] "R's 95% prediction interval: 76854.7888896402 113518.325972343"

Both intervals were very close, the lower and upper bounds were slightly different and that can be because of the way R handls residuals. Still both values were very close.

##8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data.

data(global_economy)

head(global_economy)

## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414

global_economy %>%
  filter(Country == "Egypt, Arab Rep.") %>%
  autoplot(Exports) +
  labs(y="% of GDP", title="Exports: Egypt")

The Chart shows the historical trend of Egypt exports compared to the %GPD from around 1960 - 2010. The Data shows fluctiations over the time, without clear or consistent pattern.

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit_egy <- global_economy %>%
  filter(Country == "Egypt, Arab Rep.") %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N"))) 

fc_egy <- fit_egy %>%
  forecast(h = 5)

fc_egy %>%
  autoplot(global_economy) +
  labs(y="% of GDP", title="Exports: Egypt", subtitle = "ETS(A,N,N)")

Compute the RMSE values for the training data.

s_egy <- accuracy(fit_egy)$RMSE
s_egy

## [1] 3.153827

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

fit_egy_trended <- global_economy %>%
  filter(Country == "Egypt, Arab Rep.") %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

fc_egy_trended <- fit_egy_trended %>%
  forecast(h = 5)

fc_egy_trended %>%
  autoplot(global_economy) +
  labs(y="% of GDP", title="Exports: Egypt", subtitle = "ETS(A,A,N)")

s_egy_trended <- accuracy(fit_egy_trended)$RMSE
s_egy_trended

## [1] 3.155265

Compare the forecasts from both methods. Which do you think is best?

Both values were pretty identical at 3.155. I think in this case both are suffecient enough.

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

fit_egy <- global_economy %>%
  filter(Country == "Egypt, Arab Rep.") %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))

# Forecast for the next 5 periods
fc_egy <- fit_egy %>%
  forecast(h = 5)

# Calculate RMSE for the model
s_egy <- accuracy(fit_egy)$RMSE
s_egy

## [1] 3.153827

y_egy <- fc_egy$.mean[1]
fc_egy %>%
  hilo(95) %>% pull('95%') %>% head(1)

## <hilo[1]>
## [1] [9.527095, 22.1087]95

The 95% prediction interval is 9.527095 - 22.1087.

##8.6 Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

#using lambda we can observe the forcast of different methods 
lambda <- global_economy %>%
  filter(Country == "China") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit_china <- global_economy %>%
  filter(Country == "China") %>%
  model(`Simple` = ETS(GDP ~ error("A") + trend("N") + season("N")),
        `Holt's method` = ETS(GDP ~ error("A") + trend("A") + season("N")),
        `Box-Cox` = ETS(box_cox(GDP,lambda) ~ error("A") + trend("A") + season("N")),
        `Log` = ETS(log(GDP) ~ error("A") + trend("A") + season("N"))
        ) 

fc_china <- fit_china %>%
  forecast(h = 25)

fc_china %>%
  autoplot(global_economy, level = NULL) +
  labs(title="GDP: China") +
  guides(colour = guide_legend(title = "Forecast"))

##8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

library(fpp3)

# Load and filter the Gas data
gas_data <- aus_production %>%
  select(Quarter, Gas)

# Plot the original data to visualize trends and seasonality
gas_data %>%
  autoplot(Gas) +
  labs(title = "Australian Gas Production", y = "Gas Production (in terajoules)", x = "Year")

# ETS model with multiplicative seasonality
ets_gas <- gas_data %>%
  model(ETS(Gas ~ error("A") + trend("A") + season("M")))

# Forecast for the next few years (set horizon, e.g., h = 8 quarters or 2 years)
fc_ets_gas <- ets_gas %>%
  forecast(h = 8)

# Plot the forecast for the basic ETS model
autoplot(fc_ets_gas) +
  labs(title = "Forecast for Australian Gas Production with ETS (Multiplicative Seasonality)",
       y = "Gas Production (in terajoules)", x = "Year") +
  guides(colour = guide_legend(title = "Forecast"))

# ETS model with damped trend
ets_damped <- gas_data %>%
  model(ETS(Gas ~ error("A") + trend("Ad") + season("M")))  #

##8.8

Recall your retail time series data (from Exercise 7 in Section 2.10).

Why is multiplicative seasonality necessary for this series?

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

Check that the residuals from the best method look like white noise. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

set.seed(45645)
time_ser <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
time_ser%>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = Turnover`

a)Why is multiplicative seasonality necessary for this series? A reason why multiplicative seasonality is necessary is because there is a clear upward trend over the time, with seasonal fluctuation patterns.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

aus_model <- time_ser %>%
  model(ETS(Turnover ~ error("A") + trend("A", damped = TRUE) + season("M")))

## Warning: 1 error encountered for ETS(Turnover ~ error("A") + trend("A", damped = TRUE) + season("M"))
## [1] unused argument (damped = TRUE)

aus_forecast <- aus_model %>%
  forecast(h = "12 months")

# Plotting the forecast
aus_forecast %>%
  autoplot(time_ser) +
  labs(title = "Holt-Winters' Multiplicative Forecast with Damped Trend",
       x = "Time",
       y = "Turnover")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

DAMPED

aus_damped_model <- time_ser %>%
  model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))


aus_damped_forecast <- aus_damped_model %>%
  forecast(h = "12 months")

aus_damped_forecast %>%
  autoplot(time_ser)

c)Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer? It appears as if the RMSE model is will have a better forecast accuracy
d) Check that the residuals from the best method look like white noise.

hw_model <- time_ser %>%
  model(ETS(Turnover ~ error("A") + trend("A") + season("M")))

hw_damped_model <- time_ser %>%
  model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
accuracy_hw <- hw_model %>%
  accuracy()

# View RMSE for the non-damped model
accuracy_hw

## # A tibble: 1 × 12
##   State       Industry .model .type      ME  RMSE   MAE    MPE  MAPE  MASE RMSSE
##   <chr>       <chr>    <chr>  <chr>   <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 Western Au… Takeawa… "ETS(… Trai… -0.0198  3.51  2.48 -0.545  3.88 0.359 0.388
## # ℹ 1 more variable: ACF1 <dbl>

accuracy_hw_damped <- hw_damped_model %>%
  accuracy()

# View RMSE for the damped model
accuracy_hw_damped

## # A tibble: 1 × 12
##   State  Industry .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <chr>  <chr>    <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Weste… Takeawa… "ETS(… Trai… 0.487  3.46  2.44 0.639  3.71 0.354 0.382 0.00690

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

myseries_train <- time_ser |>
  filter(year(Month) < 2011)

autoplot(time_ser, Turnover) +
  autolayer(myseries_train, Turnover, colour = "red")

##8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

First we utilize the forecast package

#useing lambda from fabletools
library(forecast)

## Warning: package 'forecast' was built under R version 4.3.3

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(forecast)
lambda <- BoxCox.lambda(time_ser$Turnover)

fit_stl_bc_ets <- time_ser |>
  filter(Month <= yearmonth("2010 Dec")) |>  # Use training data up to 2010
  model(
    STL_BoxCox_ETS = decomposition_model(
      STL(box_cox(Turnover, lambda) ~ season(window = "periodic")),
      ETS(season_adjust ~ error("A") + trend("A") + season("N"))
    )
  )

Secondly, we forecast with the Box-Cox transformed STL-ETS model

# Forecast on the test data (from 2011 onwards)
forecasts_stl_bc_ets <- fit_stl_bc_ets |>
  forecast(new_data = time_ser |> filter(Month > yearmonth("2010 Dec")))

# Back-transform the forecasts to the original scale using inverse Box-Cox
forecasts_stl_bc_ets <- forecasts_stl_bc_ets |>
  mutate(.mean = InvBoxCox(.mean, lambda))

accuracy_stl_bc_ets <- accuracy(forecasts_stl_bc_ets, time_ser |> filter(Month > yearmonth("2010 Dec")))
rmse_stl_bc_ets <- accuracy_stl_bc_ets$RMSE

# Print the RMSE for the STL-BoxCox-ETS model
print(paste("Test RMSE for STL-BoxCox-ETS model:", rmse_stl_bc_ets))

## [1] "Test RMSE for STL-BoxCox-ETS model: 73.4741801886815"

rmse_best_prev <- 50.34  # Example RMSE from the best previous model

# Print comparison
print(paste("Best previous model RMSE:", rmse_best_prev))

## [1] "Best previous model RMSE: 50.34"

print(paste("STL-BoxCox-ETS model RMSE:", rmse_stl_bc_ets))

## [1] "STL-BoxCox-ETS model RMSE: 73.4741801886815"

# Determine which model is better
if (rmse_stl_bc_ets < rmse_best_prev) {
  print("STL-BoxCox-ETS model performs better.")
} else {
  print("Best previous model performs better.")
}

## [1] "Best previous model performs better."

The best previous model clearly outperforms the STL-BoxCox-ETS model by a substantial margin. With a lower RMSE indicating better predictive accuracy, the best previous model proves to be a more reliable option for forecasting in this scenario.

DATA624 HW 5

Bishoy Sokkar

2024-10-07