Assignment: Exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman. Seehttps://otexts.com/fpp3/expsmooth-exercises.html

Setting up the environment: Load required libraries

Exercise 1:

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.

First we need to get the specific data on pigs from the dataset aus_livestock

# Get the data
data <- aus_livestock
head(data)
## # A tsibble: 6 x 4 [1M]
## # Key:       Animal, State [1]
##      Month Animal                     State                        Count
##      <mth> <fct>                      <fct>                        <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory  2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory  2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory  2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory  1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory  2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory  1800
# Filter for pigs slaughtered in Victoria using filter(Animal == "Pigs", State == "Victoria")
victoria_pigs <- data %>% 
  filter(Animal == "Pigs", State == "Victoria") %>% 
   mutate(Count_Thousands = Count / 1000)
# Visualize the new data
autoplot(victoria_pigs, Count) + labs(title = "Number of Pigs Slaughtered in Victoria", y = "Count in Thousands") + theme_minimal()

Now, we can fit the ETS model using SES. We will use the ETS() function with the model specified as “ANN” (Additive Error, No Trend, No Seasonality), which is equivalent to simple exponential smoothing.

# Fit the Simple Exponential Smoothing (ETS with model='ANN') 
fit <- victoria_pigs %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

# Extract the fitted values
report(fit)
## Series: Count 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.3221247 
## 
##   Initial states:
##      l[0]
##  100646.6
## 
##   sigma^2:  87480760
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

Generating the forecasts for the next 4 months

# Generate forecasts for the next 4 months
forecasts <- fit %>% forecast(h = "4 months")
print(forecasts)
## # A fable: 4 x 6 [1M]
## # Key:     Animal, State, .model [1]
##   Animal State    .model                          Month             Count  .mean
##   <fct>  <fct>    <chr>                           <mth>            <dist>  <dbl>
## 1 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Jan N(95187, 8.7e+07) 95187.
## 2 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Feb N(95187, 9.7e+07) 95187.
## 3 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Mar N(95187, 1.1e+08) 95187.
## 4 Pigs   Victoria "ETS(Count ~ error(\"A\") +… 2019 Apr N(95187, 1.1e+08) 95187.

b.Compute a 95% prediction interval for the first forecast using ^y+ 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

We first calculate the residuals from the fitted model and find the standard deviation of these residuals.

residuals <- augment(fit) %>%
  filter(.model == "ETS(ANN)") %>%
  pull(.resid)
 # Calculate the standard deviation of residuals
s <- sd(residuals, na.rm = TRUE)
initial_forecast <- forecasts %>%
  slice(1) %>%
  pull(.mean)
initial_forecast 
## [1] 95186.56

Conducting a manual calculation of the 95 percent intervals

accuracy <- accuracy(fit)
RMSE <- round(as.numeric(accuracy$RMSE), 2)
kable(accuracy, format = "simple")
Animal State .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
Pigs Victoria ETS(Count ~ error(“A”) + trend(“N”) + season(“N”)) Training -30.37652 9336.338 7190.361 -1.056875 8.346607 0.7762144 0.750849 0.0103228
# Manual calculation
forecasts <- fit |>
    forecast(h = 1)

y_hat <- round(as.numeric(forecasts$.mean), 2)
lower <- y_hat - (1.96 * RMSE)
upper <- y_hat + (1.96 * RMSE)
pi <- as.data.frame(t(c(lower, upper)))
colnames(pi) <- c("Lower", "Upper")
rownames(pi) <- "**Calculated Prediction Interval**"
kable(pi, format = "simple")
Lower Upper
Calculated Prediction Interval 76887.33 113485.8
# Alternate Manual Calculation 
mean <- 95186.56
s <- sqrt(87480760)
z <- 1.96
margin <- z * s
lower <- mean - margin
upper <- mean + margin

paste(lower, upper)
## [1] "76854.4546212935 113518.665378707"

Compare manually calculated intervals with the intervals generated by R to ensure consistency.

# built-in prediction interval in R
forecasts %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [76854.79, 113518.3]95

Conclusion: We can see that the 95% prediction interval calculated by hand and by R are only off by a small margin on both the lower and the upper. This shows that they are basically identical. So, there is consistency and we move ahead with the model.

Exercise 5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts. Compute the RMSE values for the training data. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set. Compare the forecasts from both methods. Which do you think is best? Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

data <- global_economy
glimpse(global_economy)
## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country    <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code       <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year       <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP        <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports    <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports    <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key:       Country [1]
##   Country     Code   Year         GDP Growth   CPI Imports Exports Population
##   <fct>       <fct> <dbl>       <dbl>  <dbl> <dbl>   <dbl>   <dbl>      <dbl>
## 1 Afghanistan AFG    1960  537777811.     NA    NA    7.02    4.13    8996351
## 2 Afghanistan AFG    1961  548888896.     NA    NA    8.10    4.45    9166764
## 3 Afghanistan AFG    1962  546666678.     NA    NA    9.35    4.88    9345868
## 4 Afghanistan AFG    1963  751111191.     NA    NA   16.9     9.17    9533954
## 5 Afghanistan AFG    1964  800000044.     NA    NA   18.1     8.89    9731361
## 6 Afghanistan AFG    1965 1006666638.     NA    NA   21.4    11.3     9938414
# Extract the data for France
france_exports <- global_economy %>%
  filter(Country == "France") %>%
  dplyr::select(Year, Exports)

Plot the Exports series

# Graph the Exports series for France
autoplot(france_exports, Exports) +
  ggtitle("France Exports (1970-2019)") +
  xlab("Year") +
  ylab("Exports (in billions,US$)") + 
  theme_minimal() 

Fit an ETS(A,N,N) model and make forecasts

# Fit the ETS(A,N,N) model, aka Simple Exponential Smoothing (no trend, no seasonality, additive errors)
ets_ANN <- france_exports %>%
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))
# Report the results
report(ets_ANN)
## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.9998995 
## 
##   Initial states:
##     l[0]
##  14.3989
## 
##   sigma^2:  1.3745
## 
##      AIC     AICc      BIC 
## 257.9200 258.3644 264.1013
# Forecast the next few periods
fc_ets_ANN <- ets_ANN %>%
  forecast(h = 5)

# Plot the forecasts
fc_ets_ANN %>%
  autoplot(france_exports) +
  ggtitle("ETS(A,N,N) Forecast for France Exports") +
  theme_minimal()

Analysis of the main features of exports from France:

In terms of trend, the export series for France shows a clear upward trend over time. The trend is particularly noticeable after the 1960s, with some fluctuations, but the overall direction points towards growth. The trend appears to have accelerated, with a steeper increase in exports post the year 2000.

In terms of volatility, one notices some volatility in the export data. While the general trend is upward, the series exhibits significant fluctuations, especially between 1975 and 2000. Periods of rapid growth are followed by sharp declines, suggesting that external factors (like economic recessions, policy changes, or global trade dynamics) may have affected France’s export levels during these times. However, post-2000, the fluctuations appear less severe, though still present.

In terms of RMSE comparison,the simpler model (ETS(A,N,N)) is expected to capture the general level of the exports without accounting for trends explicitly. This might work well for periods with stable trends but would likely result in a higher RMSE due to the significant fluctuations and growing trend. The more complex model (ETS(A,A,N)) will capture both the level and trend in the data. Given the visible upward trend in the data, this model is likely to perform better, with a lower RMSE, as it accounts for the long-term growth in exports.

Overall, the final decision on which model is better would be based on comparing RMSE values. A significant improvement in RMSE for the ETS(A,A,N) model would justify the extra parameter used. In terms of prediction intervals,the prediction intervals for ETS(A,N,N) will likely be wider since it does not account for the growing trend. This could lead to more uncertainty in the forecasts, as the model is limited in its ability to project future growth.In contrast, the ETS(A,A,N) model, which incorporates a trend, would provide more reasonable forecasts and, consequently, narrower prediction intervals. These intervals would reflect the expected upward movement in exports, giving more confidence in the projections.

Compute RMSE for the ETS(A,N,N) model

# Calculate RMSE for ETS(A,N,N)
rmse_ANN <- accuracy(ets_ANN)["RMSE"]
rmse_ANN
## # A tibble: 1 × 1
##    RMSE
##   <dbl>
## 1  1.15

Fit an ETS(A,N,N) model to forecast the series, and plot the forecasts

# Forecast using the ETS(A,N,N) model
forecasts_ANN <- ets_ANN %>% forecast(h = "5 years")

# Plot the forecasts
autoplot(france_exports, Exports) +
  autolayer(forecasts_ANN, level = 95) +
  labs(title = "ETS(A,N,N) Forecasts for Exports of France", y = "Exports (Billions, US$)")+ theme_minimal()

#### The graph above shows the forecast for France’s exports using the ETS(A,N,N) model, which corresponds to simple exponential smoothing (with additive error, no trend, and no seasonality).The historical data (black line) is displayed up to around 2020, and the forecast (blue line) extends into the future with a 95% confidence interval represented by the shaded purple region. This interval provides a range within which the actual future export values are expected to lie with 95% confidence. In general,the interval widens as the forecast horizon increases, which is typical in time series forecasting due to the increasing uncertainty over time.The purple area is an important feature, providing insight into the level of uncertainty around the forecast. It helps decision-makers understand the possible range of future exports, not just the most likely forecast.

Fit the ETS(A,A,N) model (with additive trend)

# Fit the ETS(A,A,N) model (with additive trend)
ets_AAN <- france_exports %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))
# Forecast the next few periods
fc_ets_AAN <- ets_AAN %>%
  forecast(h = 5)
# Plot the forecasts
fc_ets_AAN %>%
  autoplot(france_exports) +
 labs(title = "ETS(A,N,N) Forecasts for Exports of France", y = "Exports (Billions, US$)")

# Calculate RMSE for ETS(A,A,N)
rmse_AAN <- accuracy(ets_AAN)["RMSE"]
rmse_AAN
## # A tibble: 1 × 1
##    RMSE
##   <dbl>
## 1  1.12

Compute the RMSE values for the training data.

# Compute RMSE for the ETS(A,N,N) model
rmse_ANN <- accuracy(ets_ANN)["RMSE"]

# Fit an ETS(A,A,N) model (Holt’s Linear Trend model)
fit_AAN <- france_exports %>%
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))

# Report the results
report(fit_AAN)
## Series: Exports 
## Model: ETS(A,A,N) 
##   Smoothing parameters:
##     alpha = 0.9475167 
##     beta  = 0.0001000041 
## 
##   Initial states:
##      l[0]      b[0]
##  13.37335 0.3120665
## 
##   sigma^2:  1.3472
## 
##      AIC     AICc      BIC 
## 258.6477 259.8015 268.9499

Compare the results to those from an ETS(A,A,N) model.

# Forecast using the ETS(A,A,N) model
forecasts_AAN <- fit_AAN %>% forecast(h = "5 years")
# Plot the forecasts for ETS(A,A,N)
autoplot(france_exports, Exports) +
  autolayer(forecasts_AAN, level = 95) +
  labs(title = "ETS(A,A,N) Forecasts for Exports of France", y = "Exports (US$)") + 
  theme_minimal()

# Compute RMSE for the ETS(A,A,N) model
rmse_AAN <- accuracy(fit_AAN)["RMSE"]

# Compare RMSE values
print(paste("RMSE for ETS(A,N,N):", rmse_ANN))
## [1] "RMSE for ETS(A,N,N): 1.15200325634401"
print(paste("RMSE for ETS(A,A,N):", rmse_AAN))
## [1] "RMSE for ETS(A,A,N): 1.11995982644395"

Compare forecasts from both models

# Plot forecasts from both models on the same graph
autoplot(france_exports, Exports) +
  autolayer(fc_ets_ANN, series = "ETS(A,N,N)", PI = FALSE) +
  autolayer(fc_ets_AAN, series = "ETS(A,A,N)", PI = FALSE) +
  ggtitle("Comparison of Forecasts: ETS(A,N,N) vs ETS(A,A,N)") +
  xlab("Year") +
  ylab("Exports (in billions)") +
  guides(colour = guide_legend(title = "Forecast"))
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.

ETS(A,N,N) versus ETS(A,A,N)

A lower RMSE usually indicates a better model fit. In this case, the ETS(A,A,N) has a smaller RMSE than ETS (A,N,N) which points to it as being better than the ETS (A,N,N) , aka, the SES. Also, one notices that in the ETS(A,N,N) graph,the forecasts are flat with no trend because the model does not account for a trend in the data. This means that the future predictions remain constant based on the current level of exports, despite the clear upward trend in the historical data. In the ETS(A,A,N) forecasts graph, however, the forecasts show an upward trend, reflecting the trend component that is incorporated into the ETS(A,A,N) model (or Holt’s Linear Trend model). This makes the forecast more aligned with the actual historical trend of increasing exports.

Additionally, in ETS(A,N,N),the prediction interval is wider around a flat forecast, indicating greater uncertainty, but the flat nature of the forecast does not reflect the historical upward trend.The interval does not anticipate much change in future exports, thus showing a static range for predictions. In the ETS(A,A,N) plot, the prediction interval is narrower in comparison but moves upwards along with the trend in the forecast. This suggests that while the model accounts for uncertainty, it also projects an increase in exports, which is consistent with the historical trend in the historical data, while the ETS(A,N,N) mode does not. This makes the ETS(A,A,N) model more accurate and reliable for forecasting future exports. In closing, The ETS(A,A,N) model is a better fit for the France exports data because it reflects the upward trend.

Calculate 95% prediction intervals manually

# Ensure RMSE is extracted correctly as a numeric value
rmse_ANN <- accuracy(ets_ANN) %>% pull(RMSE)
rmse_AAN <- accuracy(ets_AAN) %>% pull(RMSE)

# Calculate the 95% prediction interval for the first forecast for each model
z_value <- 1.96  # For 95% confidence interval

# ETS(A,N,N) first forecast and prediction interval
first_forecast_ANN <- fc_ets_ANN %>% pull(.mean) %>% .[1]
lower_ANN <- first_forecast_ANN - z_value * rmse_ANN
upper_ANN <- first_forecast_ANN + z_value * rmse_ANN

# ETS(A,A,N) first forecast and prediction interval
first_forecast_AAN <- fc_ets_AAN %>% pull(.mean) %>% .[1]
lower_AAN <- first_forecast_AAN - z_value * rmse_AAN
upper_AAN <- first_forecast_AAN + z_value * rmse_AAN

# Print intervals
cat("ETS(A,N,N) 95% Prediction Interval:", lower_ANN, "-", upper_ANN, "\n")
## ETS(A,N,N) 95% Prediction Interval: 28.62385 - 33.13971
cat("ETS(A,A,N) 95% Prediction Interval:", lower_AAN, "-", upper_AAN, "\n")
## ETS(A,A,N) 95% Prediction Interval: 28.97896 - 33.3692
# For ETS(A,N,N)
first_forecast_ANN <- forecasts_ANN %>% slice(1) %>% pull(.mean)
manual_interval_ANN <- c(first_forecast_ANN - 1.96 * rmse_ANN, first_forecast_ANN + 1.96 * rmse_ANN)
# Print the manual intervals ETS(A,N,N)
print(paste("Manual 95% Prediction Interval for ETS(A,N,N):", manual_interval_ANN))
## [1] "Manual 95% Prediction Interval for ETS(A,N,N): 28.6238549283032"
## [2] "Manual 95% Prediction Interval for ETS(A,N,N): 33.1397076931717"
# For ETS(A,A,N)
first_forecast_AAN <- forecasts_AAN %>% slice(1) %>% pull(.mean)
manual_interval_AAN <- c(first_forecast_AAN - 1.96 * rmse_AAN, first_forecast_AAN + 1.96 * rmse_AAN)
# Print the manual intervals for ETS(A,A,N)
print(paste("Manual 95% Prediction Interval for ETS(A,A,N):", manual_interval_AAN))
## [1] "Manual 95% Prediction Interval for ETS(A,A,N): 28.9789600391914"
## [2] "Manual 95% Prediction Interval for ETS(A,A,N): 33.3692025588516"

Compare manual intervals with the intervals produced by R to ensure consistency.

# R generated intervals
# Extract the first forecast and prediction intervals for ETS(A,N,N)
first_fc_ANN <- fc_ets_ANN %>%
  as_tibble() %>%
  slice(1)

# Extract the first forecast and prediction intervals for ETS(A,A,N)
first_fc_AAN <- fc_ets_AAN %>%
  as_tibble() %>%
  slice(1)

# Display the results from R
cat("R-generated ETS(A,N,N) forecast and intervals:\n")
## R-generated ETS(A,N,N) forecast and intervals:
print(first_fc_ANN)
## # A tibble: 1 × 4
##   .model                                                   Year    Exports .mean
##   <chr>                                                   <dbl>     <dist> <dbl>
## 1 "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"…  2018 N(31, 1.4)  30.9
cat("R-generated ETS(A,A,N) forecast and intervals:\n")
## R-generated ETS(A,A,N) forecast and intervals:
print(first_fc_AAN)
## # A tibble: 1 × 4
##   .model                                                   Year    Exports .mean
##   <chr>                                                   <dbl>     <dist> <dbl>
## 1 "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"…  2018 N(31, 1.3)  31.2

Our conclusion is that the prediction intervals obtained manually are consistent with R’s built-in intervals. It’s clear that the ETS(A,A,N) model fits better for the analysis of France’s exports because there’s a clear upward trend in the data. The ETS(A,N,N) is not a good fit in this case, because the model assumes no trend. Additionally, a comparison in RMSE which helps evaluate the accuracy of both models,indicates that the ETS(A,A,N) model with its lower RMSE is a better fit on the data.

Exercise 6.

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]

How does different trend and transformation options impact the GDP forecast for China?

Getting the data for China from the global_economy dataset

library(fpp3)

# Filter the global_economy dataset for China
china_gdp <- global_economy %>% 
  filter(Country == "China") %>%
  dplyr::select(Year, GDP)
# Ensure GDP is numeric and that there is no NA
china_gdp$GDP <- as.numeric(china_gdp$GDP)


china_gdp <- na.omit(china_gdp)

Create an ETS model with different configurations and forecast. Selected horizon, h = 15

# Forecast using the default ETS model (Holt Linear trend or A,A,N)
fit_ets <- china_gdp %>% 
  model(ETS(GDP ~ error("A") + trend("A") + season("N")))

# Forecast over 15 years
fc_ets <- fit_ets %>% forecast(h = 15)

# Display the first few rows of the forecast
head(fc_ets)
## # A fable: 6 x 4 [1Y]
## # Key:     .model [1]
##   .model                                        Year                 GDP   .mean
##   <chr>                                        <dbl>              <dist>   <dbl>
## 1 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se…  2018 N(1.3e+13, 3.9e+22) 1.30e13
## 2 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se…  2019 N(1.4e+13, 1.3e+23) 1.38e13
## 3 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se…  2020   N(1.5e+13, 3e+23) 1.45e13
## 4 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se…  2021 N(1.5e+13, 5.8e+23) 1.53e13
## 5 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se…  2022 N(1.6e+13, 9.8e+23) 1.60e13
## 6 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se…  2023 N(1.7e+13, 1.5e+24) 1.68e13
# Forecast using the ETS model with damped trend
fit_ets_damped <- china_gdp%>%
  model(ETS(GDP ~ error("A") + trend("Ad") + season("N")))

# Forecast over 15 years
fc_ets_damped <- fit_ets_damped %>% forecast(h = 15)
head(fc_ets_damped)
## # A fable: 6 x 4 [1Y]
## # Key:     .model [1]
##   .model                                        Year                 GDP   .mean
##   <chr>                                        <dbl>              <dist>   <dbl>
## 1 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s…  2018   N(1.3e+13, 4e+22) 1.30e13
## 2 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s…  2019 N(1.4e+13, 1.3e+23) 1.37e13
## 3 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s…  2020 N(1.4e+13, 3.1e+23) 1.44e13
## 4 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s…  2021 N(1.5e+13, 5.8e+23) 1.51e13
## 5 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s…  2022 N(1.6e+13, 9.7e+23) 1.58e13
## 6 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s…  2023 N(1.6e+13, 1.5e+24) 1.65e13
# Fit the ETS model with log transformation
fit_ets_transformed <- china_gdp %>% 
  model(ETS(log(GDP) ~ error("A") + trend("A") + season("N")))

# Forecast over 15 periods using the fitted model
fc_ets_transformed <- fit_ets_transformed %>% forecast(h = 15)

# Display the first few rows of the forecast
head(fc_ets_transformed)
## # A fable: 6 x 4 [1Y]
## # Key:     .model [1]
##   .model                                           Year              GDP   .mean
##   <chr>                                           <dbl>           <dist>   <dbl>
## 1 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + …  2018 t(N(30, 0.0088)) 1.38e13
## 2 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + …  2019   t(N(30, 0.02)) 1.56e13
## 3 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + …  2020  t(N(30, 0.033)) 1.76e13
## 4 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + …  2021  t(N(31, 0.048)) 1.99e13
## 5 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + …  2022  t(N(31, 0.066)) 2.25e13
## 6 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + …  2023  t(N(31, 0.087)) 2.56e13
# Fit the models
fit <- china_gdp |> 
  model(
    Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
    DampedTrend = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    Holt_Log_transformed = ETS(log(GDP) ~ error("A") + trend("A") + season("N"))
  )

# Report for each model individually
fit |> 
  dplyr::select(Holt) |> 
  report()
## Series: GDP 
## Model: ETS(A,A,N) 
##   Smoothing parameters:
##     alpha = 0.9998964 
##     beta  = 0.5518569 
## 
##   Initial states:
##         l[0]       b[0]
##  50284778074 3288256684
## 
##   sigma^2:  3.87701e+22
## 
##      AIC     AICc      BIC 
## 3258.053 3259.207 3268.356
fit |> 
 dplyr:: select(DampedTrend) |> 
  report()
## Series: GDP 
## Model: ETS(A,Ad,N) 
##   Smoothing parameters:
##     alpha = 0.9998747 
##     beta  = 0.5634078 
##     phi   = 0.9799999 
## 
##   Initial states:
##         l[0]       b[0]
##  50284778075 3288256683
## 
##   sigma^2:  3.959258e+22
## 
##      AIC     AICc      BIC 
## 3260.187 3261.834 3272.549
fit |> 
  dplyr::select(Holt_Log_transformed) |> 
  report()
## Series: GDP 
## Model: ETS(A,A,N) 
## Transformation: log(GDP) 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.1079782 
## 
##   Initial states:
##      l[0]       b[0]
##  24.77007 0.04320226
## 
##   sigma^2:  0.0088
## 
##       AIC      AICc       BIC 
## -33.07985 -31.92600 -22.77763
# Generate forecasts for each model
fc_holt <- fit |> 
 dplyr:: select(Holt) |> 
  forecast(h = 15)

fc_damped <- fit |> 
  dplyr::select(DampedTrend) |> 
  forecast(h = 15)

fc_holt_log <- fit |> 
  dplyr::select(Holt_Log_transformed) |> 
  forecast(h = 15)

Plot the forecasts to compare

# Combine the forecasts into a single tibble for plotting
combined_forecasts <- bind_rows(
  fc_holt %>% mutate(Model = "Holt"),
  fc_damped %>% mutate(Model = "DampedTrend"),
  fc_holt_log %>% mutate(Model = "Holt_Log_transformed")
)
(kable(combined_forecasts))
.model Year GDP .mean Model
DampedTrend 2018 N(1.3e+13, 4e+22) 1.297637e+13 DampedTrend
DampedTrend 2019 N(1.4e+13, 1.3e+23) 1.370035e+13 DampedTrend
DampedTrend 2020 N(1.4e+13, 3.1e+23) 1.440985e+13 DampedTrend
DampedTrend 2021 N(1.5e+13, 5.8e+23) 1.510516e+13 DampedTrend
DampedTrend 2022 N(1.6e+13, 9.7e+23) 1.578656e+13 DampedTrend
DampedTrend 2023 N(1.6e+13, 1.5e+24) 1.645433e+13 DampedTrend
DampedTrend 2024 N(1.7e+13, 2.2e+24) 1.710875e+13 DampedTrend
DampedTrend 2025 N(1.8e+13, 3e+24) 1.775008e+13 DampedTrend
DampedTrend 2026 N(1.8e+13, 4.1e+24) 1.837859e+13 DampedTrend
DampedTrend 2027 N(1.9e+13, 5.3e+24) 1.899452e+13 DampedTrend
DampedTrend 2028 N(2e+13, 6.8e+24) 1.959813e+13 DampedTrend
DampedTrend 2029 N(2e+13, 8.4e+24) 2.018968e+13 DampedTrend
DampedTrend 2030 N(2.1e+13, 1e+25) 2.076939e+13 DampedTrend
DampedTrend 2031 N(2.1e+13, 1.2e+25) 2.133750e+13 DampedTrend
DampedTrend 2032 N(2.2e+13, 1.5e+25) 2.189426e+13 DampedTrend
Holt 2018 N(1.3e+13, 3.9e+22) 1.299726e+13 Holt
Holt 2019 N(1.4e+13, 1.3e+23) 1.375689e+13 Holt
Holt 2020 N(1.5e+13, 3e+23) 1.451652e+13 Holt
Holt 2021 N(1.5e+13, 5.8e+23) 1.527615e+13 Holt
Holt 2022 N(1.6e+13, 9.8e+23) 1.603578e+13 Holt
Holt 2023 N(1.7e+13, 1.5e+24) 1.679540e+13 Holt
Holt 2024 N(1.8e+13, 2.2e+24) 1.755503e+13 Holt
Holt 2025 N(1.8e+13, 3.2e+24) 1.831466e+13 Holt
Holt 2026 N(1.9e+13, 4.3e+24) 1.907429e+13 Holt
Holt 2027 N(2e+13, 5.7e+24) 1.983392e+13 Holt
Holt 2028 N(2.1e+13, 7.3e+24) 2.059355e+13 Holt
Holt 2029 N(2.1e+13, 9.3e+24) 2.135317e+13 Holt
Holt 2030 N(2.2e+13, 1.2e+25) 2.211280e+13 Holt
Holt 2031 N(2.3e+13, 1.4e+25) 2.287243e+13 Holt
Holt 2032 N(2.4e+13, 1.7e+25) 2.363206e+13 Holt
Holt_Log_transformed 2018 t(N(30, 0.0088)) 1.379815e+13 Holt_Log_transformed
Holt_Log_transformed 2019 t(N(30, 0.02)) 1.557269e+13 Holt_Log_transformed
Holt_Log_transformed 2020 t(N(30, 0.033)) 1.759407e+13 Holt_Log_transformed
Holt_Log_transformed 2021 t(N(31, 0.048)) 1.990050e+13 Holt_Log_transformed
Holt_Log_transformed 2022 t(N(31, 0.066)) 2.253660e+13 Holt_Log_transformed
Holt_Log_transformed 2023 t(N(31, 0.087)) 2.555455e+13 Holt_Log_transformed
Holt_Log_transformed 2024 t(N(31, 0.11)) 2.901542e+13 Holt_Log_transformed
Holt_Log_transformed 2025 t(N(31, 0.14)) 3.299074e+13 Holt_Log_transformed
Holt_Log_transformed 2026 t(N(31, 0.17)) 3.756429e+13 Holt_Log_transformed
Holt_Log_transformed 2027 t(N(31, 0.2)) 4.283432e+13 Holt_Log_transformed
Holt_Log_transformed 2028 t(N(31, 0.24)) 4.891602e+13 Holt_Log_transformed
Holt_Log_transformed 2029 t(N(32, 0.28)) 5.594454e+13 Holt_Log_transformed
Holt_Log_transformed 2030 t(N(32, 0.33)) 6.407845e+13 Holt_Log_transformed
Holt_Log_transformed 2031 t(N(32, 0.38)) 7.350386e+13 Holt_Log_transformed
Holt_Log_transformed 2032 t(N(32, 0.44)) 8.443920e+13 Holt_Log_transformed
# Plot the historical data and the forecasts from all three models with enhanced distinction
china_gdp %>% 
  autoplot(GDP) + 
  autolayer(fc_holt, series = "Holt", PI = FALSE, linetype = "solid", color = "blue", size = 1.2) +
  autolayer(fc_damped, series = "DampedTrend", PI = FALSE, linetype = "dashed", color = "red", size = 1.2) +
  autolayer(fc_holt_log, series = "Holt_Log_transformed", PI = FALSE, linetype = "dotdash", color = "green", size = 1.2) +
  labs(
    title = "China GDP Forecast: Comparing Holt, DampedTrend, and Holt (Log Transformed)",
    y = "GDP", x = "Year"
  ) +
  scale_color_manual(
    values = c("blue", "red", "green"),
    labels = c("Holt", "DampedTrend", "Holt (Log Transformed)")
  ) +
  guides(colour = guide_legend(title = "Model")) +
  theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), : Ignoring unknown parameters: `series` and `PI`
## Ignoring unknown parameters: `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.

Discuss the model suggested by R.

Through fable, R chose the model with the smallest AIC, because it’s the one that gives us the best forecasting compared to those one may personally or manually select.

# The best model as suggested by R

china_gdp %>% 
  model(ETS(GDP)) %>% 
  report()
## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9998998 
##     beta  = 0.3119984 
## 
##   Initial states:
##         l[0]       b[0]
##  45713434615 3288256682
## 
##   sigma^2:  0.0108
## 
##      AIC     AICc      BIC 
## 3102.064 3103.218 3112.366

Results of my personal models’ selection (not encouraging):

# Personal selection 1
china_gdp %>% 
  model(ETS(GDP ~ error ("A") + trend ("A") + season("N"))) %>% 
  report()
## Series: GDP 
## Model: ETS(A,A,N) 
##   Smoothing parameters:
##     alpha = 0.9998964 
##     beta  = 0.5518569 
## 
##   Initial states:
##         l[0]       b[0]
##  50284778074 3288256684
## 
##   sigma^2:  3.87701e+22
## 
##      AIC     AICc      BIC 
## 3258.053 3259.207 3268.356

Exercise 7:

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

Extracting gas data from aus_production

# Filter the aus_production dataset for Gas data
gas_data <- aus_production %>%
  dplyr::select(Quarter, Gas)

Fit an ETS model with multiplicative seasonality

fit_ets <- gas_data %>%
  model(ETS(Gas ~ error("A") + trend("A") + season("M")))
fit_ets
## # A mable: 1 x 1
##   `ETS(Gas ~ error("A") + trend("A") + season("M"))`
##                                              <model>
## 1                                       <ETS(A,A,M)>
# Forecast the next few years (e.g., next 12 quarters or 3 years)
fc_ets <- fit_ets %>% forecast(h = 12)

head(fc_ets)
## # A fable: 6 x 4 [1Q]
## # Key:     .model [1]
##   .model                                              Quarter          Gas .mean
##   <chr>                                                 <qtr>       <dist> <dbl>
## 1 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2010 Q3 sample[5000]  256.
## 2 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2010 Q4 sample[5000]  212.
## 3 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q1 sample[5000]  203.
## 4 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q2 sample[5000]  242.
## 5 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q3 sample[5000]  261.
## 6 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q4 sample[5000]  217.
# Plot the historical data and the forecast
gas_data %>% 
  autoplot(Gas) +
  autolayer(fc_ets, series = "ETS", PI = FALSE) +
  labs(title = "Gas Production Forecast with ETS Model",
       y = "Gas Production", x = "Year") +
  theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`

Why is an ETS with multiplicative necessary here?

Multiplicative seasonality is necessary when the seasonal variations increase or decrease proportionally to the level of the series. In gas production data, as production increases, seasonal effects (like higher demand in certain quarters) likely become more pronounced, making a multiplicative seasonality model more appropriate.

Experimenting with a damped trend in the ETS model

fit_ets_damped <- gas_data %>%
  model(ETS(Gas ~ error("A") + trend("Ad") + season("M")))
fit_ets_damped 
## # A mable: 1 x 1
##   `ETS(Gas ~ error("A") + trend("Ad") + season("M"))`
##                                               <model>
## 1                                       <ETS(A,Ad,M)>
# Forecast the next few years with the damped trend
fc_ets_damped <- fit_ets_damped %>% forecast(h = 12)
head(fc_ets_damped)
## # A fable: 6 x 4 [1Q]
## # Key:     .model [1]
##   .model                                              Quarter          Gas .mean
##   <chr>                                                 <qtr>       <dist> <dbl>
## 1 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2010 Q3 sample[5000]  256.
## 2 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2010 Q4 sample[5000]  212.
## 3 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q1 sample[5000]  202.
## 4 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q2 sample[5000]  241.
## 5 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q3 sample[5000]  259.
## 6 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q4 sample[5000]  214.

Plot both the original and damped trend forecasts

# Plot the original and damped trend models with distinct colors
gas_data %>% 
  autoplot(Gas) +
  autolayer(fc_ets, series = "ETS", PI = FALSE, color = "blue") +       # Original ETS model in blue
  autolayer(fc_ets_damped, series = "ETS with Damped Trend", PI = FALSE, color = "red") +  # Damped trend ETS model in red
  labs(title = "Gas Production Forecast: Comparing ETS and Damped Trend",
       y = "Gas Production", x = "Year") +
  scale_color_manual(values = c("ETS" = "blue", "ETS with Damped Trend" = "red")) + # Color mapping for the legend
  theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's colour values.

Discussion/Analysis: Did damping improve the forecast?

I would argue yes. Damping the trend can provide a more moderate and potentially more realistic forecast, especially if external factors or domain knowledge(like resource limits or market saturation) are expected to affect future production growth.

In this particular case, the forecasted values in the original model(blue) show an unrealistic continuation of the trend, especially towards the end of the forecast horizon, and damping (red) likely improves the forecast by making it more plausible and stable.In general, damping tends to provide a more reliable forecast for long-term projections by preventing exaggerated trend growth.

Best practice: As a general rule, a careful comparison of the forecasted values and their alignment with domain knowledge (e.g., future production constraints or market behavior) can help determine if damping does indeed improves the forecast.

Exercise 8

Recall your retail time series data (from Exercise 7 in Section 2.10). Why is multiplicative seasonality necessary for this series? Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer? Check that the residuals from the best method look like white noise. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

## Recall Exercise 7 in Section 2.10
set.seed(13101917)
retail_series <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

Apply Holt-Winters’ Multiplicative Method.

Let’s fit both the standard Holt-Winters model and the damped trend model to the time series data.

Fit the Holt-Winters model with multiplicative seasonality

fit_hw <- retail_series %>%
  model(ETS(Turnover ~ error("A") + trend("A") + season("M")))
fit_hw
## # A mable: 1 x 3
## # Key:     State, Industry [1]
##   State           Industry          ETS(Turnover ~ error("A") + trend("A") + s…¹
##   <chr>           <chr>                                                  <model>
## 1 South Australia Department stores                                 <ETS(A,A,M)>
## # ℹ abbreviated name: ¹​`ETS(Turnover ~ error("A") + trend("A") + season("M"))`

Fit the Holt-Winters model with damped trend

fit_hw_damped <- retail_series %>%
  model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fit_hw_damped 
## # A mable: 1 x 3
## # Key:     State, Industry [1]
##   State           Industry          ETS(Turnover ~ error("A") + trend("Ad") + …¹
##   <chr>           <chr>                                                  <model>
## 1 South Australia Department stores                                <ETS(A,Ad,M)>
## # ℹ abbreviated name: ¹​`ETS(Turnover ~ error("A") + trend("Ad") + season("M"))`

Generate one-step forecasts

fc_hw_one_step <- fit_hw %>% forecast(h = "1 month")
fc_hw_damped_one_step <- fit_hw_damped %>% forecast(h = "1 month")
fc_hw_one_step 
## # A fable: 1 x 6 [1M]
## # Key:     State, Industry, .model [1]
##   State           Industry          .model              Month     Turnover .mean
##   <chr>           <chr>             <chr>               <mth>       <dist> <dbl>
## 1 South Australia Department stores "ETS(Turnover ~… 2019 Jan sample[5000]  103.

Calculate RMSE for one-step forecasts

rmse_hw <- sqrt(mean((retail_series$Turnover[1:(nrow(retail_series) - 1)] - 
                      fc_hw_one_step$.mean)^2, na.rm = TRUE))

rmse_hw_damped <- sqrt(mean((retail_series$Turnover[1:(nrow(retail_series) - 1)] - 
                              fc_hw_damped_one_step$.mean)^2, na.rm = TRUE))

Display RMSE values

# Display RMSE values
cat("RMSE for Holt-Winters model: ", rmse_hw, "\n")
## RMSE for Holt-Winters model:  32.25201
cat("RMSE for Damped Holt-Winters model: ", rmse_hw_damped, "\n")
## RMSE for Damped Holt-Winters model:  32.26309

Forecast for the next 12 months

fc_hw <- fit_hw %>% forecast(h = "1 year")
fc_hw_damped <- fit_hw_damped %>% forecast(h = "1 year")
head(fc_hw)
## # A fable: 6 x 6 [1M]
## # Key:     State, Industry, .model [1]
##   State           Industry          .model              Month     Turnover .mean
##   <chr>           <chr>             <chr>               <mth>       <dist> <dbl>
## 1 South Australia Department stores "ETS(Turnover ~… 2019 Jan sample[5000] 103. 
## 2 South Australia Department stores "ETS(Turnover ~… 2019 Feb sample[5000]  79.3
## 3 South Australia Department stores "ETS(Turnover ~… 2019 Mar sample[5000] 100. 
## 4 South Australia Department stores "ETS(Turnover ~… 2019 Apr sample[5000] 104. 
## 5 South Australia Department stores "ETS(Turnover ~… 2019 May sample[5000] 107. 
## 6 South Australia Department stores "ETS(Turnover ~… 2019 Jun sample[5000] 110.

Plot the original data, forecasts, and fitted models

retail_series %>%
  autoplot(Turnover) +
  autolayer(fc_hw, series = "Holt-Winters", PI = FALSE) +
  autolayer(fc_hw_damped, series = "Holt-Winters Damped", PI = FALSE) +
  labs(
    title = "Turnover Forecast with Holt-Winters Models",
    y = "Turnover",
    x = "Month"
  ) +
  theme_minimal() +
  guides(colour = guide_legend(title = "Model"))
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.

Why is Multiplicative Seasonality Necessary?

Multiplicative seasonality is necessary when the seasonal variations increase or decrease proportionally to the level of the series. For retail sales, as the sales figures grow (e.g., during holiday seasons), the seasonal effects (like increased sales during Christmas) also tend to grow proportionately. Therefore, a multiplicative model can better capture these dynamics than an additive model.

Apply Holt-Winters’ Multiplicative Method.

Let’s fit both the standard Holt-Winters model and the damped trend model to the time series data.

Calculate RMSE for both models

accuracy_hw <- fit_hw %>% accuracy()
accuracy_hw_damped <- fit_hw_damped %>% accuracy()
accuracy_hw 
## # A tibble: 1 × 12
##   State Industry .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE   ACF1
##   <chr> <chr>    <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Sout… Departm… "ETS(… Trai… -0.326  4.74  3.59 -0.496  3.76 0.740 0.779 -0.119

Display RMSE

rmse_hw <- accuracy_hw["RMSE"]

print(rmse_hw )
## # A tibble: 1 × 1
##    RMSE
##   <dbl>
## 1  4.74
rmse_hw_damped <- accuracy_hw_damped["RMSE"]
print(rmse_hw_damped )
## # A tibble: 1 × 1
##    RMSE
##   <dbl>
## 1  4.68

Check that the residuals from the best method look like white noise:

The best fitted model here is (ETS(Turnover ~ error(“A”) + trend(“Ad”) + season(“M)

# Set seed for reproducibility
set.seed(13101917)

# Select a random retail series
retail_series <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Ensure there are no gaps in the time series data
retail_series <- retail_series %>%
  fill_gaps()  

# Fit the ETS model with a damped trend
fit_hw_damped <- retail_series %>%
  model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))

# Report on the model to ensure it was fitted correctly
report(fit_hw_damped)
## Series: Turnover 
## Model: ETS(A,Ad,M) 
##   Smoothing parameters:
##     alpha = 0.09088322 
##     beta  = 0.01836772 
##     gamma = 0.2398043 
##     phi   = 0.9501628 
## 
##   Initial states:
##      l[0]      b[0]      s[0]     s[-1]     s[-2]    s[-3]    s[-4]     s[-5]
##  50.36808 0.4017832 0.8753357 0.7348669 0.7993526 1.737597 1.128757 0.9765879
##      s[-6]     s[-7]     s[-8]     s[-9]   s[-10]    s[-11]
##  0.9557908 0.9153437 0.9624873 0.9213728 1.048251 0.9442572
## 
##   sigma^2:  22.8161
## 
##      AIC     AICc      BIC 
## 4083.146 4084.767 4156.749
accuracy(fit) %>%
dplyr::  select(.model, RMSE)
## # A tibble: 3 × 2
##   .model                        RMSE
##   <chr>                        <dbl>
## 1 Holt                 189990266650.
## 2 DampedTrend          190208894133.
## 3 Holt_Log_transformed 286414589325.
retail_series  %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  gg_tsresiduals() +
  ggtitle("Multiplicative Method")

#Box-Pierce test, ℓ=2m for seasonal data, m=12
retail_series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>% 
  features(.innov, box_pierce, lag = 24, dof = 0)
## # A tibble: 1 × 5
##   State           Industry          .model         bp_stat bp_pvalue
##   <chr>           <chr>             <chr>            <dbl>     <dbl>
## 1 South Australia Department stores multiplicative    106.  2.61e-12

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

series_train <- retail_series %>%
  filter(year(Month) < 2011)

fit_train <- series_train %>%
  model(multi = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        snaive = SNAIVE(Turnover))

#producing forecasts
fc <- fit_train %>%
  forecast(new_data = anti_join(retail_series, series_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(retail_series, level = NULL)

Fit the ETS model with damped trend

# Fit the ETS model with damped trend
fit_hw_damped <- retail_series %>%
  model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fit_hw_damped
## # A mable: 1 x 3
## # Key:     State, Industry [1]
##   State           Industry          ETS(Turnover ~ error("A") + trend("Ad") + …¹
##   <chr>           <chr>                                                  <model>
## 1 South Australia Department stores                                <ETS(A,Ad,M)>
## # ℹ abbreviated name: ¹​`ETS(Turnover ~ error("A") + trend("Ad") + season("M"))`
# Forecast for the test set
fc_hw_damped <- fit_hw_damped %>% forecast(h = nrow(retail_series))
head(fc_hw_damped)
## # A fable: 6 x 6 [1M]
## # Key:     State, Industry, .model [1]
##   State           Industry          .model              Month     Turnover .mean
##   <chr>           <chr>             <chr>               <mth>       <dist> <dbl>
## 1 South Australia Department stores "ETS(Turnover ~… 2019 Jan sample[5000] 103. 
## 2 South Australia Department stores "ETS(Turnover ~… 2019 Feb sample[5000]  79.5
## 3 South Australia Department stores "ETS(Turnover ~… 2019 Mar sample[5000] 101. 
## 4 South Australia Department stores "ETS(Turnover ~… 2019 Apr sample[5000] 104. 
## 5 South Australia Department stores "ETS(Turnover ~… 2019 May sample[5000] 108. 
## 6 South Australia Department stores "ETS(Turnover ~… 2019 Jun sample[5000] 111.
# Calculate RMSE for the ETS model
rmse_hw_damped <- accuracy(fc_hw_damped, retail_series)$RMSE
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 441 observations are missing between 2019 Jan and 2055 Sep
rmse_hw_damped 
## [1] NaN
accuracy(fit_train) %>%
 dplyr:: select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type    .model  RMSE
##   <chr>    <chr>  <dbl>
## 1 Training multi   4.96
## 2 Training snaive  6.30

Seasonal Naïve Model

# Seasonal Naïve Model
fit_snaive <- retail_series %>%
  model(SNAIVE(Turnover))
fit_snaive
## # A mable: 1 x 3
## # Key:     State, Industry [1]
##   State           Industry          `SNAIVE(Turnover)`
##   <chr>           <chr>                        <model>
## 1 South Australia Department stores           <SNAIVE>
# Forecast for the test set using Seasonal Naïve
fc_snaive <- fit_snaive %>% forecast(h = nrow(retail_series))
head(fc_snaive)
## # A fable: 6 x 6 [1M]
## # Key:     State, Industry, .model [1]
##   State           Industry          .model              Month   Turnover .mean
##   <chr>           <chr>             <chr>               <mth>     <dist> <dbl>
## 1 South Australia Department stores SNAIVE(Turnover) 2019 Jan  N(99, 37)  99  
## 2 South Australia Department stores SNAIVE(Turnover) 2019 Feb  N(79, 37)  78.8
## 3 South Australia Department stores SNAIVE(Turnover) 2019 Mar N(101, 37) 101  
## 4 South Australia Department stores SNAIVE(Turnover) 2019 Apr  N(98, 37)  98.2
## 5 South Australia Department stores SNAIVE(Turnover) 2019 May N(112, 37) 112. 
## 6 South Australia Department stores SNAIVE(Turnover) 2019 Jun N(110, 37) 110.
# Calculate RMSE for the Seasonal Naïve model
rmse_snaive <- accuracy(fc_snaive, retail_series)$RMSE
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
## 441 observations are missing between 2019 Jan and 2055 Sep
rmse_snaive
## [1] NaN

Display RMSE results

# Display RMSE results
cat("RMSE for ETS model with Damped Trend:", rmse_hw_damped, "\n")
## RMSE for ETS model with Damped Trend: NaN
cat("RMSE for Seasonal Naïve model:", rmse_snaive, "\n")
## RMSE for Seasonal Naïve model: NaN
fc %>% accuracy(retail_series)  %>%
 dplyr:: select(.type, .model, RMSE)
## # A tibble: 2 × 3
##   .type .model  RMSE
##   <chr> <chr>  <dbl>
## 1 Test  multi   5.29
## 2 Test  snaive 12.2

Compare models

It’s obvioust that ETS model with Damped Trend outperforms the Seasonal Naïve model.

Exercise 9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

set.seed(13101917)
retail_series <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

retail_series %>% head()
## # A tsibble: 6 x 5 [1M]
## # Key:       State, Industry [1]
##   State           Industry          `Series ID`    Month Turnover
##   <chr>           <chr>             <chr>          <mth>    <dbl>
## 1 South Australia Department stores A3349658K   1982 Apr     48.9
## 2 South Australia Department stores A3349658K   1982 May     52.2
## 3 South Australia Department stores A3349658K   1982 Jun     48.9
## 4 South Australia Department stores A3349658K   1982 Jul     48.3
## 5 South Australia Department stores A3349658K   1982 Aug     49.4
## 6 South Australia Department stores A3349658K   1982 Sep     48.5

Box cox transformation:

# Calculate the optimal lambda using the guerrero method
lambda <- aus_retail %>%  
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) %>%

  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)
# Apply the Box-Cox transformation using the calculated lambda
retail_series_transformed <- aus_retail %>%
  mutate(Turnover_transformed = box_cox(Turnover, lambda))
head(retail_series_transformed) 
## # A tsibble: 6 x 6 [1M]
## # Key:       State, Industry [1]
##   State              Industry `Series ID`    Month Turnover Turnover_transformed
##   <chr>              <chr>    <chr>          <mth>    <dbl>                <dbl>
## 1 Australian Capita… Cafes, … A3349849A   1982 Apr      4.4                 1.45
## 2 Australian Capita… Cafes, … A3349849A   1982 May      3.4                 1.20
## 3 Australian Capita… Cafes, … A3349849A   1982 Jun      3.6                 1.26
## 4 Australian Capita… Cafes, … A3349849A   1982 Jul      4                   1.36
## 5 Australian Capita… Cafes, … A3349849A   1982 Aug      3.6                 1.26
## 6 Australian Capita… Cafes, … A3349849A   1982 Sep      4.2                 1.41
# Plot the transformed Turnover data
retail_series_transformed %>%
  autoplot(Turnover_transformed) +
  labs(y = "Turnover (Transformed)",
       title = latex2exp::TeX(paste0(
         "Transformed retail turnover with $\\lambda$ = ", 
         round(lambda, 2)))) +
  theme_minimal()

### Australia’ print_retail forecasting before transformation

print_retail <-aus_retail %>% 
  filter(Industry == "Newspaper and book retailing") %>% 
  summarize(Turnover = sum(Turnover))
head(print_retail)
## # A tsibble: 6 x 2 [1M]
##      Month Turnover
##      <mth>    <dbl>
## 1 1982 Apr     134 
## 2 1982 May     134.
## 3 1982 Jun     127.
## 4 1982 Jul     128.
## 5 1982 Aug     131.
## 6 1982 Sep     134.
autoplot(print_retail)
## Plot variable not specified, automatically selected `.vars = Turnover`

After a transformation has been applied to australia print retail forecasts

# Adjusting Australia retail print to economic inflation, CPI
  
print_retail <- aus_retail |>
  filter(Industry == "Newspaper and book retailing") |>
  group_by(Industry) |>
  index_by(Year = year(Month)) |>
  summarise(Turnover = sum(Turnover))
aus_economy <- global_economy |>
  filter(Code == "AUS")
print_retail |>
  left_join(aus_economy, by = "Year") |>
  mutate(Adjusted_turnover = Turnover / CPI * 100) |>
  pivot_longer(c(Turnover, Adjusted_turnover),
               values_to = "Turnover") |>
  mutate(name = factor(name,
         levels=c("Turnover","Adjusted_turnover"))) |>
  ggplot(aes(x = Year, y = Turnover)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  labs(title = "Turnover: Australian print media industry",
       y = "$AU")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

The adjusted turnover forecasting (which takes into account inflation or CPI) reveals that Australia’s print retail has been on a downward trend since 2005. This reality is not visible from the initial Turnover forecasting (Turnover).

Transformation using the square root

retail_series_transformed <- print_retail %>% 
  autoplot(sqrt(Turnover)) + 
  labs(y = " Square root turnover")
retail_series_transformed

Applying the log transfomation

retail_series_transformed <- print_retail %>% 
  autoplot(log(Turnover)) + 
  labs(y = " Square root turnover")
retail_series_transformed

Applying a Box cox transformation on Australia retail print

lambda <- retail_series %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

lambda
## [1] 0.03543504
retail_series_transformed <- retail_series %>%tsibble:::as_tsibble.tbl_ts(., index = Month)
  # Ensure 'Year' is the correct index

# Apply Guerrero transformation to stabilize variance
lambda <- retail_series_transformed %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Apply the Box-Cox transformation with the estimated lambda
retail_series_transformed <- retail_series_transformed %>%
  mutate(Turnover_transformed = box_cox(Turnover, lambda))

# Check the structure of the transformed data
print(head(retail_series_transformed))
## # A tsibble: 6 x 6 [1M]
## # Key:       State, Industry [1]
##   State           Industry    `Series ID`    Month Turnover Turnover_transformed
##   <chr>           <chr>       <chr>          <mth>    <dbl>                <dbl>
## 1 South Australia Department… A3349658K   1982 Apr     48.9                 4.17
## 2 South Australia Department… A3349658K   1982 May     52.2                 4.25
## 3 South Australia Department… A3349658K   1982 Jun     48.9                 4.17
## 4 South Australia Department… A3349658K   1982 Jul     48.3                 4.16
## 5 South Australia Department… A3349658K   1982 Aug     49.4                 4.18
## 6 South Australia Department… A3349658K   1982 Sep     48.5                 4.16
# Now plot the original and transformed turnover over the years
gg <- retail_series_transformed %>% 
  ggplot(aes(x = Year)) +
  geom_line(aes(y = Turnover, color = "Original Turnover")) +
  geom_line(aes(y = Turnover_transformed, color = "Transformed Turnover")) +
  labs(
    title = latex2exp::TeX(paste0(
         "Turnover: Original and Transformed (Guerrero) $\\lambda = ", 
         round(lambda, 2))),
    y = "Turnover ($AU)", 
    x = "Year"
  ) +
  scale_color_manual(values = c("Original Turnover" = "blue", 
                                "Transformed Turnover" = "red")) +
  theme_minimal()
library(dplyr)
library(ggplot2)
library(fable)
library(tsibble)
library(fabletools)  # For the features function
library(latex2exp)   # For LaTeX expression in titles

# Convert the data to a tsibble using 'Month' as the index
retail_series_transformed <- retail_series %>%
  as_tsibble(index = Month)  # Ensure 'Month' is the correct index

# Apply Guerrero transformation to stabilize variance
lambda <- retail_series_transformed %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Apply the Box-Cox transformation with the estimated lambda
retail_series_transformed <- retail_series_transformed %>%
  mutate(Turnover_transformed = box_cox(Turnover, lambda))

# Check the structure of the transformed data
print(head(retail_series_transformed))
## # A tsibble: 6 x 6 [1M]
## # Key:       State, Industry [1]
##   State           Industry    `Series ID`    Month Turnover Turnover_transformed
##   <chr>           <chr>       <chr>          <mth>    <dbl>                <dbl>
## 1 South Australia Department… A3349658K   1982 Apr     48.9                 4.17
## 2 South Australia Department… A3349658K   1982 May     52.2                 4.25
## 3 South Australia Department… A3349658K   1982 Jun     48.9                 4.17
## 4 South Australia Department… A3349658K   1982 Jul     48.3                 4.16
## 5 South Australia Department… A3349658K   1982 Aug     49.4                 4.18
## 6 South Australia Department… A3349658K   1982 Sep     48.5                 4.16
# Now plot the original and transformed turnover over the months
gg <- retail_series_transformed %>% 
  ggplot(aes(x = Month)) +  # Change x-axis to Month
  geom_line(aes(y = Turnover, color = "Original Turnover")) +
  geom_line(aes(y = Turnover_transformed, color = "Transformed Turnover")) +
  labs(
    title = latex2exp::TeX(paste0(
         "Turnover: Original and Transformed (Guerrero) $\\lambda = ", 
         round(lambda, 2))),
    y = "Turnover ($AU)", 
    x = "Month"
  ) +
  scale_color_manual(values = c("Original Turnover" = "blue", 
                                "Transformed Turnover" = "red")) +
  theme_minimal()

# Print the ggplot object
print(gg)

Applying STL Decomposition

# Filter for Newspaper and book retailing, and aggregate by year
print_retail <- aus_retail %>%
  filter(Industry == "Newspaper and book retailing", year(Month) >= 1990) %>%
  group_by(Year = year(Month)) %>%
  summarize(Turnover = sum(Turnover, na.rm = TRUE)) %>%
  ungroup()

# Plot the turnover data
autoplot(print_retail, Turnover) +
  labs(y = "Turnover (in thousands)",
       title = "Total turnover in Newspaper and Book Retailing in Australia")

 print_retail %>% 
 model(stl = STL(Turnover)) 
## # A mable: 29 x 2
## # Key:     Year [29]
##     Year     stl
##    <dbl> <model>
##  1  1990   <STL>
##  2  1991   <STL>
##  3  1992   <STL>
##  4  1993   <STL>
##  5  1994   <STL>
##  6  1995   <STL>
##  7  1996   <STL>
##  8  1997   <STL>
##  9  1998   <STL>
## 10  1999   <STL>
## # ℹ 19 more rows
lambda <- retail_series %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

lambda
## [1] 0.03543504
#Ljung-Box test
retail_series %>%
  model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
  augment() %>% 
  features(.innov, ljung_box, lag = 24, dof = 0)
## # A tibble: 1 × 5
##   State           Industry          .model         lb_stat lb_pvalue
##   <chr>           <chr>             <chr>            <dbl>     <dbl>
## 1 South Australia Department stores multiplicative    109.  7.32e-13
c
## function (...)  .Primitive("c")