Assignment: Exercises 8.1, 8.5, 8.6, 8.7, 8.8, 8.9 in Hyndman. Seehttps://otexts.com/fpp3/expsmooth-exercises.html
Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
First we need to get the specific data on pigs from the dataset aus_livestock
# Get the data
data <- aus_livestock
head(data)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
# Filter for pigs slaughtered in Victoria using filter(Animal == "Pigs", State == "Victoria")
victoria_pigs <- data %>%
filter(Animal == "Pigs", State == "Victoria") %>%
mutate(Count_Thousands = Count / 1000)
# Visualize the new data
autoplot(victoria_pigs, Count) + labs(title = "Number of Pigs Slaughtered in Victoria", y = "Count in Thousands") + theme_minimal()
Now, we can fit the ETS model using SES. We will use the ETS() function with the model specified as “ANN” (Additive Error, No Trend, No Seasonality), which is equivalent to simple exponential smoothing.
# Fit the Simple Exponential Smoothing (ETS with model='ANN')
fit <- victoria_pigs %>%
model(ETS(Count ~ error("A") + trend("N") + season("N")))
# Extract the fitted values
report(fit)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
# Generate forecasts for the next 4 months
forecasts <- fit %>% forecast(h = "4 months")
print(forecasts)
## # A fable: 4 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month Count .mean
## <fct> <fct> <chr> <mth> <dist> <dbl>
## 1 Pigs Victoria "ETS(Count ~ error(\"A\") +… 2019 Jan N(95187, 8.7e+07) 95187.
## 2 Pigs Victoria "ETS(Count ~ error(\"A\") +… 2019 Feb N(95187, 9.7e+07) 95187.
## 3 Pigs Victoria "ETS(Count ~ error(\"A\") +… 2019 Mar N(95187, 1.1e+08) 95187.
## 4 Pigs Victoria "ETS(Count ~ error(\"A\") +… 2019 Apr N(95187, 1.1e+08) 95187.
We first calculate the residuals from the fitted model and find the standard deviation of these residuals.
residuals <- augment(fit) %>%
filter(.model == "ETS(ANN)") %>%
pull(.resid)
# Calculate the standard deviation of residuals
s <- sd(residuals, na.rm = TRUE)
initial_forecast <- forecasts %>%
slice(1) %>%
pull(.mean)
initial_forecast
## [1] 95186.56
accuracy <- accuracy(fit)
RMSE <- round(as.numeric(accuracy$RMSE), 2)
kable(accuracy, format = "simple")
| Animal | State | .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pigs | Victoria | ETS(Count ~ error(“A”) + trend(“N”) + season(“N”)) | Training | -30.37652 | 9336.338 | 7190.361 | -1.056875 | 8.346607 | 0.7762144 | 0.750849 | 0.0103228 |
# Manual calculation
forecasts <- fit |>
forecast(h = 1)
y_hat <- round(as.numeric(forecasts$.mean), 2)
lower <- y_hat - (1.96 * RMSE)
upper <- y_hat + (1.96 * RMSE)
pi <- as.data.frame(t(c(lower, upper)))
colnames(pi) <- c("Lower", "Upper")
rownames(pi) <- "**Calculated Prediction Interval**"
kable(pi, format = "simple")
| Lower | Upper | |
|---|---|---|
| Calculated Prediction Interval | 76887.33 | 113485.8 |
# Alternate Manual Calculation
mean <- 95186.56
s <- sqrt(87480760)
z <- 1.96
margin <- z * s
lower <- mean - margin
upper <- mean + margin
paste(lower, upper)
## [1] "76854.4546212935 113518.665378707"
# built-in prediction interval in R
forecasts %>% hilo(95) %>% pull('95%') %>% head(1)
## <hilo[1]>
## [1] [76854.79, 113518.3]95
Conclusion: We can see that the 95% prediction interval calculated by hand and by R are only off by a small margin on both the lower and the upper. This shows that they are basically identical. So, there is consistency and we move ahead with the model.
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
Plot the Exports series and discuss the main features of the data. Use an ETS(A,N,N) model to forecast the series, and plot the forecasts. Compute the RMSE values for the training data. Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set. Compare the forecasts from both methods. Which do you think is best? Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
data <- global_economy
glimpse(global_economy)
## Rows: 15,150
## Columns: 9
## Key: Country [263]
## $ Country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan",…
## $ Code <fct> AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG, AFG,…
## $ Year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969,…
## $ GDP <dbl> 537777811, 548888896, 546666678, 751111191, 800000044, 1006…
## $ Growth <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CPI <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Imports <dbl> 7.024793, 8.097166, 9.349593, 16.863910, 18.055555, 21.4128…
## $ Exports <dbl> 4.132233, 4.453443, 4.878051, 9.171601, 8.888893, 11.258279…
## $ Population <dbl> 8996351, 9166764, 9345868, 9533954, 9731361, 9938414, 10152…
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
# Extract the data for France
france_exports <- global_economy %>%
filter(Country == "France") %>%
dplyr::select(Year, Exports)
# Graph the Exports series for France
autoplot(france_exports, Exports) +
ggtitle("France Exports (1970-2019)") +
xlab("Year") +
ylab("Exports (in billions,US$)") +
theme_minimal()
# Fit the ETS(A,N,N) model, aka Simple Exponential Smoothing (no trend, no seasonality, additive errors)
ets_ANN <- france_exports %>%
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
# Report the results
report(ets_ANN)
## Series: Exports
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.9998995
##
## Initial states:
## l[0]
## 14.3989
##
## sigma^2: 1.3745
##
## AIC AICc BIC
## 257.9200 258.3644 264.1013
# Forecast the next few periods
fc_ets_ANN <- ets_ANN %>%
forecast(h = 5)
# Plot the forecasts
fc_ets_ANN %>%
autoplot(france_exports) +
ggtitle("ETS(A,N,N) Forecast for France Exports") +
theme_minimal()
In terms of trend, the export series for France shows a clear upward trend over time. The trend is particularly noticeable after the 1960s, with some fluctuations, but the overall direction points towards growth. The trend appears to have accelerated, with a steeper increase in exports post the year 2000.
In terms of volatility, one notices some volatility in the export data. While the general trend is upward, the series exhibits significant fluctuations, especially between 1975 and 2000. Periods of rapid growth are followed by sharp declines, suggesting that external factors (like economic recessions, policy changes, or global trade dynamics) may have affected France’s export levels during these times. However, post-2000, the fluctuations appear less severe, though still present.
In terms of RMSE comparison,the simpler model (ETS(A,N,N)) is expected to capture the general level of the exports without accounting for trends explicitly. This might work well for periods with stable trends but would likely result in a higher RMSE due to the significant fluctuations and growing trend. The more complex model (ETS(A,A,N)) will capture both the level and trend in the data. Given the visible upward trend in the data, this model is likely to perform better, with a lower RMSE, as it accounts for the long-term growth in exports.
Overall, the final decision on which model is better would be based on comparing RMSE values. A significant improvement in RMSE for the ETS(A,A,N) model would justify the extra parameter used. In terms of prediction intervals,the prediction intervals for ETS(A,N,N) will likely be wider since it does not account for the growing trend. This could lead to more uncertainty in the forecasts, as the model is limited in its ability to project future growth.In contrast, the ETS(A,A,N) model, which incorporates a trend, would provide more reasonable forecasts and, consequently, narrower prediction intervals. These intervals would reflect the expected upward movement in exports, giving more confidence in the projections.
# Calculate RMSE for ETS(A,N,N)
rmse_ANN <- accuracy(ets_ANN)["RMSE"]
rmse_ANN
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 1.15
# Forecast using the ETS(A,N,N) model
forecasts_ANN <- ets_ANN %>% forecast(h = "5 years")
# Plot the forecasts
autoplot(france_exports, Exports) +
autolayer(forecasts_ANN, level = 95) +
labs(title = "ETS(A,N,N) Forecasts for Exports of France", y = "Exports (Billions, US$)")+ theme_minimal()
#### The graph above shows the forecast for France’s exports using the
ETS(A,N,N) model, which corresponds to simple exponential smoothing
(with additive error, no trend, and no seasonality).The historical data
(black line) is displayed up to around 2020, and the forecast (blue
line) extends into the future with a 95% confidence interval represented
by the shaded purple region. This interval provides a range within which
the actual future export values are expected to lie with 95% confidence.
In general,the interval widens as the forecast horizon increases, which
is typical in time series forecasting due to the increasing uncertainty
over time.The purple area is an important feature, providing insight
into the level of uncertainty around the forecast. It helps
decision-makers understand the possible range of future exports, not
just the most likely forecast.
# Fit the ETS(A,A,N) model (with additive trend)
ets_AAN <- france_exports %>%
model(ETS(Exports ~ error("A") + trend("A") + season("N")))
# Forecast the next few periods
fc_ets_AAN <- ets_AAN %>%
forecast(h = 5)
# Plot the forecasts
fc_ets_AAN %>%
autoplot(france_exports) +
labs(title = "ETS(A,N,N) Forecasts for Exports of France", y = "Exports (Billions, US$)")
# Calculate RMSE for ETS(A,A,N)
rmse_AAN <- accuracy(ets_AAN)["RMSE"]
rmse_AAN
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 1.12
# Compute RMSE for the ETS(A,N,N) model
rmse_ANN <- accuracy(ets_ANN)["RMSE"]
# Fit an ETS(A,A,N) model (Holt’s Linear Trend model)
fit_AAN <- france_exports %>%
model(ETS(Exports ~ error("A") + trend("A") + season("N")))
# Report the results
report(fit_AAN)
## Series: Exports
## Model: ETS(A,A,N)
## Smoothing parameters:
## alpha = 0.9475167
## beta = 0.0001000041
##
## Initial states:
## l[0] b[0]
## 13.37335 0.3120665
##
## sigma^2: 1.3472
##
## AIC AICc BIC
## 258.6477 259.8015 268.9499
# Forecast using the ETS(A,A,N) model
forecasts_AAN <- fit_AAN %>% forecast(h = "5 years")
# Plot the forecasts for ETS(A,A,N)
autoplot(france_exports, Exports) +
autolayer(forecasts_AAN, level = 95) +
labs(title = "ETS(A,A,N) Forecasts for Exports of France", y = "Exports (US$)") +
theme_minimal()
# Compute RMSE for the ETS(A,A,N) model
rmse_AAN <- accuracy(fit_AAN)["RMSE"]
# Compare RMSE values
print(paste("RMSE for ETS(A,N,N):", rmse_ANN))
## [1] "RMSE for ETS(A,N,N): 1.15200325634401"
print(paste("RMSE for ETS(A,A,N):", rmse_AAN))
## [1] "RMSE for ETS(A,A,N): 1.11995982644395"
# Plot forecasts from both models on the same graph
autoplot(france_exports, Exports) +
autolayer(fc_ets_ANN, series = "ETS(A,N,N)", PI = FALSE) +
autolayer(fc_ets_AAN, series = "ETS(A,A,N)", PI = FALSE) +
ggtitle("Comparison of Forecasts: ETS(A,N,N) vs ETS(A,A,N)") +
xlab("Year") +
ylab("Exports (in billions)") +
guides(colour = guide_legend(title = "Forecast"))
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
A lower RMSE usually indicates a better model fit. In this case, the ETS(A,A,N) has a smaller RMSE than ETS (A,N,N) which points to it as being better than the ETS (A,N,N) , aka, the SES. Also, one notices that in the ETS(A,N,N) graph,the forecasts are flat with no trend because the model does not account for a trend in the data. This means that the future predictions remain constant based on the current level of exports, despite the clear upward trend in the historical data. In the ETS(A,A,N) forecasts graph, however, the forecasts show an upward trend, reflecting the trend component that is incorporated into the ETS(A,A,N) model (or Holt’s Linear Trend model). This makes the forecast more aligned with the actual historical trend of increasing exports.
Additionally, in ETS(A,N,N),the prediction interval is wider around a flat forecast, indicating greater uncertainty, but the flat nature of the forecast does not reflect the historical upward trend.The interval does not anticipate much change in future exports, thus showing a static range for predictions. In the ETS(A,A,N) plot, the prediction interval is narrower in comparison but moves upwards along with the trend in the forecast. This suggests that while the model accounts for uncertainty, it also projects an increase in exports, which is consistent with the historical trend in the historical data, while the ETS(A,N,N) mode does not. This makes the ETS(A,A,N) model more accurate and reliable for forecasting future exports. In closing, The ETS(A,A,N) model is a better fit for the France exports data because it reflects the upward trend.
# Ensure RMSE is extracted correctly as a numeric value
rmse_ANN <- accuracy(ets_ANN) %>% pull(RMSE)
rmse_AAN <- accuracy(ets_AAN) %>% pull(RMSE)
# Calculate the 95% prediction interval for the first forecast for each model
z_value <- 1.96 # For 95% confidence interval
# ETS(A,N,N) first forecast and prediction interval
first_forecast_ANN <- fc_ets_ANN %>% pull(.mean) %>% .[1]
lower_ANN <- first_forecast_ANN - z_value * rmse_ANN
upper_ANN <- first_forecast_ANN + z_value * rmse_ANN
# ETS(A,A,N) first forecast and prediction interval
first_forecast_AAN <- fc_ets_AAN %>% pull(.mean) %>% .[1]
lower_AAN <- first_forecast_AAN - z_value * rmse_AAN
upper_AAN <- first_forecast_AAN + z_value * rmse_AAN
# Print intervals
cat("ETS(A,N,N) 95% Prediction Interval:", lower_ANN, "-", upper_ANN, "\n")
## ETS(A,N,N) 95% Prediction Interval: 28.62385 - 33.13971
cat("ETS(A,A,N) 95% Prediction Interval:", lower_AAN, "-", upper_AAN, "\n")
## ETS(A,A,N) 95% Prediction Interval: 28.97896 - 33.3692
# For ETS(A,N,N)
first_forecast_ANN <- forecasts_ANN %>% slice(1) %>% pull(.mean)
manual_interval_ANN <- c(first_forecast_ANN - 1.96 * rmse_ANN, first_forecast_ANN + 1.96 * rmse_ANN)
# Print the manual intervals ETS(A,N,N)
print(paste("Manual 95% Prediction Interval for ETS(A,N,N):", manual_interval_ANN))
## [1] "Manual 95% Prediction Interval for ETS(A,N,N): 28.6238549283032"
## [2] "Manual 95% Prediction Interval for ETS(A,N,N): 33.1397076931717"
# For ETS(A,A,N)
first_forecast_AAN <- forecasts_AAN %>% slice(1) %>% pull(.mean)
manual_interval_AAN <- c(first_forecast_AAN - 1.96 * rmse_AAN, first_forecast_AAN + 1.96 * rmse_AAN)
# Print the manual intervals for ETS(A,A,N)
print(paste("Manual 95% Prediction Interval for ETS(A,A,N):", manual_interval_AAN))
## [1] "Manual 95% Prediction Interval for ETS(A,A,N): 28.9789600391914"
## [2] "Manual 95% Prediction Interval for ETS(A,A,N): 33.3692025588516"
# R generated intervals
# Extract the first forecast and prediction intervals for ETS(A,N,N)
first_fc_ANN <- fc_ets_ANN %>%
as_tibble() %>%
slice(1)
# Extract the first forecast and prediction intervals for ETS(A,A,N)
first_fc_AAN <- fc_ets_AAN %>%
as_tibble() %>%
slice(1)
# Display the results from R
cat("R-generated ETS(A,N,N) forecast and intervals:\n")
## R-generated ETS(A,N,N) forecast and intervals:
print(first_fc_ANN)
## # A tibble: 1 × 4
## .model Year Exports .mean
## <chr> <dbl> <dist> <dbl>
## 1 "ETS(Exports ~ error(\"A\") + trend(\"N\") + season(\"… 2018 N(31, 1.4) 30.9
cat("R-generated ETS(A,A,N) forecast and intervals:\n")
## R-generated ETS(A,A,N) forecast and intervals:
print(first_fc_AAN)
## # A tibble: 1 × 4
## .model Year Exports .mean
## <chr> <dbl> <dist> <dbl>
## 1 "ETS(Exports ~ error(\"A\") + trend(\"A\") + season(\"… 2018 N(31, 1.3) 31.2
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
library(fpp3)
# Filter the global_economy dataset for China
china_gdp <- global_economy %>%
filter(Country == "China") %>%
dplyr::select(Year, GDP)
# Ensure GDP is numeric and that there is no NA
china_gdp$GDP <- as.numeric(china_gdp$GDP)
china_gdp <- na.omit(china_gdp)
# Forecast using the default ETS model (Holt Linear trend or A,A,N)
fit_ets <- china_gdp %>%
model(ETS(GDP ~ error("A") + trend("A") + season("N")))
# Forecast over 15 years
fc_ets <- fit_ets %>% forecast(h = 15)
# Display the first few rows of the forecast
head(fc_ets)
## # A fable: 6 x 4 [1Y]
## # Key: .model [1]
## .model Year GDP .mean
## <chr> <dbl> <dist> <dbl>
## 1 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se… 2018 N(1.3e+13, 3.9e+22) 1.30e13
## 2 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se… 2019 N(1.4e+13, 1.3e+23) 1.38e13
## 3 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se… 2020 N(1.5e+13, 3e+23) 1.45e13
## 4 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se… 2021 N(1.5e+13, 5.8e+23) 1.53e13
## 5 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se… 2022 N(1.6e+13, 9.8e+23) 1.60e13
## 6 "ETS(GDP ~ error(\"A\") + trend(\"A\") + se… 2023 N(1.7e+13, 1.5e+24) 1.68e13
# Forecast using the ETS model with damped trend
fit_ets_damped <- china_gdp%>%
model(ETS(GDP ~ error("A") + trend("Ad") + season("N")))
# Forecast over 15 years
fc_ets_damped <- fit_ets_damped %>% forecast(h = 15)
head(fc_ets_damped)
## # A fable: 6 x 4 [1Y]
## # Key: .model [1]
## .model Year GDP .mean
## <chr> <dbl> <dist> <dbl>
## 1 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s… 2018 N(1.3e+13, 4e+22) 1.30e13
## 2 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s… 2019 N(1.4e+13, 1.3e+23) 1.37e13
## 3 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s… 2020 N(1.4e+13, 3.1e+23) 1.44e13
## 4 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s… 2021 N(1.5e+13, 5.8e+23) 1.51e13
## 5 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s… 2022 N(1.6e+13, 9.7e+23) 1.58e13
## 6 "ETS(GDP ~ error(\"A\") + trend(\"Ad\") + s… 2023 N(1.6e+13, 1.5e+24) 1.65e13
# Fit the ETS model with log transformation
fit_ets_transformed <- china_gdp %>%
model(ETS(log(GDP) ~ error("A") + trend("A") + season("N")))
# Forecast over 15 periods using the fitted model
fc_ets_transformed <- fit_ets_transformed %>% forecast(h = 15)
# Display the first few rows of the forecast
head(fc_ets_transformed)
## # A fable: 6 x 4 [1Y]
## # Key: .model [1]
## .model Year GDP .mean
## <chr> <dbl> <dist> <dbl>
## 1 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + … 2018 t(N(30, 0.0088)) 1.38e13
## 2 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + … 2019 t(N(30, 0.02)) 1.56e13
## 3 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + … 2020 t(N(30, 0.033)) 1.76e13
## 4 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + … 2021 t(N(31, 0.048)) 1.99e13
## 5 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + … 2022 t(N(31, 0.066)) 2.25e13
## 6 "ETS(log(GDP) ~ error(\"A\") + trend(\"A\") + … 2023 t(N(31, 0.087)) 2.56e13
# Fit the models
fit <- china_gdp |>
model(
Holt = ETS(GDP ~ error("A") + trend("A") + season("N")),
DampedTrend = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
Holt_Log_transformed = ETS(log(GDP) ~ error("A") + trend("A") + season("N"))
)
# Report for each model individually
fit |>
dplyr::select(Holt) |>
report()
## Series: GDP
## Model: ETS(A,A,N)
## Smoothing parameters:
## alpha = 0.9998964
## beta = 0.5518569
##
## Initial states:
## l[0] b[0]
## 50284778074 3288256684
##
## sigma^2: 3.87701e+22
##
## AIC AICc BIC
## 3258.053 3259.207 3268.356
fit |>
dplyr:: select(DampedTrend) |>
report()
## Series: GDP
## Model: ETS(A,Ad,N)
## Smoothing parameters:
## alpha = 0.9998747
## beta = 0.5634078
## phi = 0.9799999
##
## Initial states:
## l[0] b[0]
## 50284778075 3288256683
##
## sigma^2: 3.959258e+22
##
## AIC AICc BIC
## 3260.187 3261.834 3272.549
fit |>
dplyr::select(Holt_Log_transformed) |>
report()
## Series: GDP
## Model: ETS(A,A,N)
## Transformation: log(GDP)
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.1079782
##
## Initial states:
## l[0] b[0]
## 24.77007 0.04320226
##
## sigma^2: 0.0088
##
## AIC AICc BIC
## -33.07985 -31.92600 -22.77763
# Generate forecasts for each model
fc_holt <- fit |>
dplyr:: select(Holt) |>
forecast(h = 15)
fc_damped <- fit |>
dplyr::select(DampedTrend) |>
forecast(h = 15)
fc_holt_log <- fit |>
dplyr::select(Holt_Log_transformed) |>
forecast(h = 15)
# Combine the forecasts into a single tibble for plotting
combined_forecasts <- bind_rows(
fc_holt %>% mutate(Model = "Holt"),
fc_damped %>% mutate(Model = "DampedTrend"),
fc_holt_log %>% mutate(Model = "Holt_Log_transformed")
)
(kable(combined_forecasts))
| .model | Year | GDP | .mean | Model |
|---|---|---|---|---|
| DampedTrend | 2018 | N(1.3e+13, 4e+22) | 1.297637e+13 | DampedTrend |
| DampedTrend | 2019 | N(1.4e+13, 1.3e+23) | 1.370035e+13 | DampedTrend |
| DampedTrend | 2020 | N(1.4e+13, 3.1e+23) | 1.440985e+13 | DampedTrend |
| DampedTrend | 2021 | N(1.5e+13, 5.8e+23) | 1.510516e+13 | DampedTrend |
| DampedTrend | 2022 | N(1.6e+13, 9.7e+23) | 1.578656e+13 | DampedTrend |
| DampedTrend | 2023 | N(1.6e+13, 1.5e+24) | 1.645433e+13 | DampedTrend |
| DampedTrend | 2024 | N(1.7e+13, 2.2e+24) | 1.710875e+13 | DampedTrend |
| DampedTrend | 2025 | N(1.8e+13, 3e+24) | 1.775008e+13 | DampedTrend |
| DampedTrend | 2026 | N(1.8e+13, 4.1e+24) | 1.837859e+13 | DampedTrend |
| DampedTrend | 2027 | N(1.9e+13, 5.3e+24) | 1.899452e+13 | DampedTrend |
| DampedTrend | 2028 | N(2e+13, 6.8e+24) | 1.959813e+13 | DampedTrend |
| DampedTrend | 2029 | N(2e+13, 8.4e+24) | 2.018968e+13 | DampedTrend |
| DampedTrend | 2030 | N(2.1e+13, 1e+25) | 2.076939e+13 | DampedTrend |
| DampedTrend | 2031 | N(2.1e+13, 1.2e+25) | 2.133750e+13 | DampedTrend |
| DampedTrend | 2032 | N(2.2e+13, 1.5e+25) | 2.189426e+13 | DampedTrend |
| Holt | 2018 | N(1.3e+13, 3.9e+22) | 1.299726e+13 | Holt |
| Holt | 2019 | N(1.4e+13, 1.3e+23) | 1.375689e+13 | Holt |
| Holt | 2020 | N(1.5e+13, 3e+23) | 1.451652e+13 | Holt |
| Holt | 2021 | N(1.5e+13, 5.8e+23) | 1.527615e+13 | Holt |
| Holt | 2022 | N(1.6e+13, 9.8e+23) | 1.603578e+13 | Holt |
| Holt | 2023 | N(1.7e+13, 1.5e+24) | 1.679540e+13 | Holt |
| Holt | 2024 | N(1.8e+13, 2.2e+24) | 1.755503e+13 | Holt |
| Holt | 2025 | N(1.8e+13, 3.2e+24) | 1.831466e+13 | Holt |
| Holt | 2026 | N(1.9e+13, 4.3e+24) | 1.907429e+13 | Holt |
| Holt | 2027 | N(2e+13, 5.7e+24) | 1.983392e+13 | Holt |
| Holt | 2028 | N(2.1e+13, 7.3e+24) | 2.059355e+13 | Holt |
| Holt | 2029 | N(2.1e+13, 9.3e+24) | 2.135317e+13 | Holt |
| Holt | 2030 | N(2.2e+13, 1.2e+25) | 2.211280e+13 | Holt |
| Holt | 2031 | N(2.3e+13, 1.4e+25) | 2.287243e+13 | Holt |
| Holt | 2032 | N(2.4e+13, 1.7e+25) | 2.363206e+13 | Holt |
| Holt_Log_transformed | 2018 | t(N(30, 0.0088)) | 1.379815e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2019 | t(N(30, 0.02)) | 1.557269e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2020 | t(N(30, 0.033)) | 1.759407e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2021 | t(N(31, 0.048)) | 1.990050e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2022 | t(N(31, 0.066)) | 2.253660e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2023 | t(N(31, 0.087)) | 2.555455e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2024 | t(N(31, 0.11)) | 2.901542e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2025 | t(N(31, 0.14)) | 3.299074e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2026 | t(N(31, 0.17)) | 3.756429e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2027 | t(N(31, 0.2)) | 4.283432e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2028 | t(N(31, 0.24)) | 4.891602e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2029 | t(N(32, 0.28)) | 5.594454e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2030 | t(N(32, 0.33)) | 6.407845e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2031 | t(N(32, 0.38)) | 7.350386e+13 | Holt_Log_transformed |
| Holt_Log_transformed | 2032 | t(N(32, 0.44)) | 8.443920e+13 | Holt_Log_transformed |
# Plot the historical data and the forecasts from all three models with enhanced distinction
china_gdp %>%
autoplot(GDP) +
autolayer(fc_holt, series = "Holt", PI = FALSE, linetype = "solid", color = "blue", size = 1.2) +
autolayer(fc_damped, series = "DampedTrend", PI = FALSE, linetype = "dashed", color = "red", size = 1.2) +
autolayer(fc_holt_log, series = "Holt_Log_transformed", PI = FALSE, linetype = "dotdash", color = "green", size = 1.2) +
labs(
title = "China GDP Forecast: Comparing Holt, DampedTrend, and Holt (Log Transformed)",
y = "GDP", x = "Year"
) +
scale_color_manual(
values = c("blue", "red", "green"),
labels = c("Holt", "DampedTrend", "Holt (Log Transformed)")
) +
guides(colour = guide_legend(title = "Model")) +
theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), : Ignoring unknown parameters: `series` and `PI`
## Ignoring unknown parameters: `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
Through fable, R chose the model with the smallest AIC, because it’s the one that gives us the best forecasting compared to those one may personally or manually select.
# The best model as suggested by R
china_gdp %>%
model(ETS(GDP)) %>%
report()
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9998998
## beta = 0.3119984
##
## Initial states:
## l[0] b[0]
## 45713434615 3288256682
##
## sigma^2: 0.0108
##
## AIC AICc BIC
## 3102.064 3103.218 3112.366
# Personal selection 1
china_gdp %>%
model(ETS(GDP ~ error ("A") + trend ("A") + season("N"))) %>%
report()
## Series: GDP
## Model: ETS(A,A,N)
## Smoothing parameters:
## alpha = 0.9998964
## beta = 0.5518569
##
## Initial states:
## l[0] b[0]
## 50284778074 3288256684
##
## sigma^2: 3.87701e+22
##
## AIC AICc BIC
## 3258.053 3259.207 3268.356
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
# Filter the aus_production dataset for Gas data
gas_data <- aus_production %>%
dplyr::select(Quarter, Gas)
fit_ets <- gas_data %>%
model(ETS(Gas ~ error("A") + trend("A") + season("M")))
fit_ets
## # A mable: 1 x 1
## `ETS(Gas ~ error("A") + trend("A") + season("M"))`
## <model>
## 1 <ETS(A,A,M)>
# Forecast the next few years (e.g., next 12 quarters or 3 years)
fc_ets <- fit_ets %>% forecast(h = 12)
head(fc_ets)
## # A fable: 6 x 4 [1Q]
## # Key: .model [1]
## .model Quarter Gas .mean
## <chr> <qtr> <dist> <dbl>
## 1 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2010 Q3 sample[5000] 256.
## 2 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2010 Q4 sample[5000] 212.
## 3 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q1 sample[5000] 203.
## 4 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q2 sample[5000] 242.
## 5 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q3 sample[5000] 261.
## 6 "ETS(Gas ~ error(\"A\") + trend(\"A\") + season(\"… 2011 Q4 sample[5000] 217.
# Plot the historical data and the forecast
gas_data %>%
autoplot(Gas) +
autolayer(fc_ets, series = "ETS", PI = FALSE) +
labs(title = "Gas Production Forecast with ETS Model",
y = "Gas Production", x = "Year") +
theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
Multiplicative seasonality is necessary when the seasonal variations increase or decrease proportionally to the level of the series. In gas production data, as production increases, seasonal effects (like higher demand in certain quarters) likely become more pronounced, making a multiplicative seasonality model more appropriate.
fit_ets_damped <- gas_data %>%
model(ETS(Gas ~ error("A") + trend("Ad") + season("M")))
fit_ets_damped
## # A mable: 1 x 1
## `ETS(Gas ~ error("A") + trend("Ad") + season("M"))`
## <model>
## 1 <ETS(A,Ad,M)>
# Forecast the next few years with the damped trend
fc_ets_damped <- fit_ets_damped %>% forecast(h = 12)
head(fc_ets_damped)
## # A fable: 6 x 4 [1Q]
## # Key: .model [1]
## .model Quarter Gas .mean
## <chr> <qtr> <dist> <dbl>
## 1 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2010 Q3 sample[5000] 256.
## 2 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2010 Q4 sample[5000] 212.
## 3 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q1 sample[5000] 202.
## 4 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q2 sample[5000] 241.
## 5 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q3 sample[5000] 259.
## 6 "ETS(Gas ~ error(\"A\") + trend(\"Ad\") + season(\… 2011 Q4 sample[5000] 214.
# Plot the original and damped trend models with distinct colors
gas_data %>%
autoplot(Gas) +
autolayer(fc_ets, series = "ETS", PI = FALSE, color = "blue") + # Original ETS model in blue
autolayer(fc_ets_damped, series = "ETS with Damped Trend", PI = FALSE, color = "red") + # Damped trend ETS model in red
labs(title = "Gas Production Forecast: Comparing ETS and Damped Trend",
y = "Gas Production", x = "Year") +
scale_color_manual(values = c("ETS" = "blue", "ETS with Damped Trend" = "red")) + # Color mapping for the legend
theme_minimal()
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
## Warning: No shared levels found between `names(values)` of the manual scale and the
## data's colour values.
I would argue yes. Damping the trend can provide a more moderate and potentially more realistic forecast, especially if external factors or domain knowledge(like resource limits or market saturation) are expected to affect future production growth.
In this particular case, the forecasted values in the original model(blue) show an unrealistic continuation of the trend, especially towards the end of the forecast horizon, and damping (red) likely improves the forecast by making it more plausible and stable.In general, damping tends to provide a more reliable forecast for long-term projections by preventing exaggerated trend growth.
Best practice: As a general rule, a careful comparison of the forecasted values and their alignment with domain knowledge (e.g., future production constraints or market behavior) can help determine if damping does indeed improves the forecast.
Recall your retail time series data (from Exercise 7 in Section 2.10). Why is multiplicative seasonality necessary for this series? Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer? Check that the residuals from the best method look like white noise. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
## Recall Exercise 7 in Section 2.10
set.seed(13101917)
retail_series <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
Let’s fit both the standard Holt-Winters model and the damped trend model to the time series data.
fit_hw <- retail_series %>%
model(ETS(Turnover ~ error("A") + trend("A") + season("M")))
fit_hw
## # A mable: 1 x 3
## # Key: State, Industry [1]
## State Industry ETS(Turnover ~ error("A") + trend("A") + s…¹
## <chr> <chr> <model>
## 1 South Australia Department stores <ETS(A,A,M)>
## # ℹ abbreviated name: ¹`ETS(Turnover ~ error("A") + trend("A") + season("M"))`
fit_hw_damped <- retail_series %>%
model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fit_hw_damped
## # A mable: 1 x 3
## # Key: State, Industry [1]
## State Industry ETS(Turnover ~ error("A") + trend("Ad") + …¹
## <chr> <chr> <model>
## 1 South Australia Department stores <ETS(A,Ad,M)>
## # ℹ abbreviated name: ¹`ETS(Turnover ~ error("A") + trend("Ad") + season("M"))`
fc_hw_one_step <- fit_hw %>% forecast(h = "1 month")
fc_hw_damped_one_step <- fit_hw_damped %>% forecast(h = "1 month")
fc_hw_one_step
## # A fable: 1 x 6 [1M]
## # Key: State, Industry, .model [1]
## State Industry .model Month Turnover .mean
## <chr> <chr> <chr> <mth> <dist> <dbl>
## 1 South Australia Department stores "ETS(Turnover ~… 2019 Jan sample[5000] 103.
rmse_hw <- sqrt(mean((retail_series$Turnover[1:(nrow(retail_series) - 1)] -
fc_hw_one_step$.mean)^2, na.rm = TRUE))
rmse_hw_damped <- sqrt(mean((retail_series$Turnover[1:(nrow(retail_series) - 1)] -
fc_hw_damped_one_step$.mean)^2, na.rm = TRUE))
# Display RMSE values
cat("RMSE for Holt-Winters model: ", rmse_hw, "\n")
## RMSE for Holt-Winters model: 32.25201
cat("RMSE for Damped Holt-Winters model: ", rmse_hw_damped, "\n")
## RMSE for Damped Holt-Winters model: 32.26309
fc_hw <- fit_hw %>% forecast(h = "1 year")
fc_hw_damped <- fit_hw_damped %>% forecast(h = "1 year")
head(fc_hw)
## # A fable: 6 x 6 [1M]
## # Key: State, Industry, .model [1]
## State Industry .model Month Turnover .mean
## <chr> <chr> <chr> <mth> <dist> <dbl>
## 1 South Australia Department stores "ETS(Turnover ~… 2019 Jan sample[5000] 103.
## 2 South Australia Department stores "ETS(Turnover ~… 2019 Feb sample[5000] 79.3
## 3 South Australia Department stores "ETS(Turnover ~… 2019 Mar sample[5000] 100.
## 4 South Australia Department stores "ETS(Turnover ~… 2019 Apr sample[5000] 104.
## 5 South Australia Department stores "ETS(Turnover ~… 2019 May sample[5000] 107.
## 6 South Australia Department stores "ETS(Turnover ~… 2019 Jun sample[5000] 110.
retail_series %>%
autoplot(Turnover) +
autolayer(fc_hw, series = "Holt-Winters", PI = FALSE) +
autolayer(fc_hw_damped, series = "Holt-Winters Damped", PI = FALSE) +
labs(
title = "Turnover Forecast with Holt-Winters Models",
y = "Turnover",
x = "Month"
) +
theme_minimal() +
guides(colour = guide_legend(title = "Model"))
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series` and `PI`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series` and `PI`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
Multiplicative seasonality is necessary when the seasonal variations increase or decrease proportionally to the level of the series. For retail sales, as the sales figures grow (e.g., during holiday seasons), the seasonal effects (like increased sales during Christmas) also tend to grow proportionately. Therefore, a multiplicative model can better capture these dynamics than an additive model.
Let’s fit both the standard Holt-Winters model and the damped trend model to the time series data.
accuracy_hw <- fit_hw %>% accuracy()
accuracy_hw_damped <- fit_hw_damped %>% accuracy()
accuracy_hw
## # A tibble: 1 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Sout… Departm… "ETS(… Trai… -0.326 4.74 3.59 -0.496 3.76 0.740 0.779 -0.119
rmse_hw <- accuracy_hw["RMSE"]
print(rmse_hw )
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 4.74
rmse_hw_damped <- accuracy_hw_damped["RMSE"]
print(rmse_hw_damped )
## # A tibble: 1 × 1
## RMSE
## <dbl>
## 1 4.68
The best fitted model here is (ETS(Turnover ~ error(“A”) + trend(“Ad”) + season(“M)
# Set seed for reproducibility
set.seed(13101917)
# Select a random retail series
retail_series <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
# Ensure there are no gaps in the time series data
retail_series <- retail_series %>%
fill_gaps()
# Fit the ETS model with a damped trend
fit_hw_damped <- retail_series %>%
model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
# Report on the model to ensure it was fitted correctly
report(fit_hw_damped)
## Series: Turnover
## Model: ETS(A,Ad,M)
## Smoothing parameters:
## alpha = 0.09088322
## beta = 0.01836772
## gamma = 0.2398043
## phi = 0.9501628
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
## 50.36808 0.4017832 0.8753357 0.7348669 0.7993526 1.737597 1.128757 0.9765879
## s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
## 0.9557908 0.9153437 0.9624873 0.9213728 1.048251 0.9442572
##
## sigma^2: 22.8161
##
## AIC AICc BIC
## 4083.146 4084.767 4156.749
accuracy(fit) %>%
dplyr:: select(.model, RMSE)
## # A tibble: 3 × 2
## .model RMSE
## <chr> <dbl>
## 1 Holt 189990266650.
## 2 DampedTrend 190208894133.
## 3 Holt_Log_transformed 286414589325.
retail_series %>%
model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
gg_tsresiduals() +
ggtitle("Multiplicative Method")
#Box-Pierce test, ℓ=2m for seasonal data, m=12
retail_series %>%
model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
augment() %>%
features(.innov, box_pierce, lag = 24, dof = 0)
## # A tibble: 1 × 5
## State Industry .model bp_stat bp_pvalue
## <chr> <chr> <chr> <dbl> <dbl>
## 1 South Australia Department stores multiplicative 106. 2.61e-12
series_train <- retail_series %>%
filter(year(Month) < 2011)
fit_train <- series_train %>%
model(multi = ETS(Turnover ~ error("M") + trend("A") + season("M")),
snaive = SNAIVE(Turnover))
#producing forecasts
fc <- fit_train %>%
forecast(new_data = anti_join(retail_series, series_train))
## Joining with `by = join_by(State, Industry, `Series ID`, Month, Turnover)`
fc %>% autoplot(retail_series, level = NULL)
# Fit the ETS model with damped trend
fit_hw_damped <- retail_series %>%
model(ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
fit_hw_damped
## # A mable: 1 x 3
## # Key: State, Industry [1]
## State Industry ETS(Turnover ~ error("A") + trend("Ad") + …¹
## <chr> <chr> <model>
## 1 South Australia Department stores <ETS(A,Ad,M)>
## # ℹ abbreviated name: ¹`ETS(Turnover ~ error("A") + trend("Ad") + season("M"))`
# Forecast for the test set
fc_hw_damped <- fit_hw_damped %>% forecast(h = nrow(retail_series))
head(fc_hw_damped)
## # A fable: 6 x 6 [1M]
## # Key: State, Industry, .model [1]
## State Industry .model Month Turnover .mean
## <chr> <chr> <chr> <mth> <dist> <dbl>
## 1 South Australia Department stores "ETS(Turnover ~… 2019 Jan sample[5000] 103.
## 2 South Australia Department stores "ETS(Turnover ~… 2019 Feb sample[5000] 79.5
## 3 South Australia Department stores "ETS(Turnover ~… 2019 Mar sample[5000] 101.
## 4 South Australia Department stores "ETS(Turnover ~… 2019 Apr sample[5000] 104.
## 5 South Australia Department stores "ETS(Turnover ~… 2019 May sample[5000] 108.
## 6 South Australia Department stores "ETS(Turnover ~… 2019 Jun sample[5000] 111.
# Calculate RMSE for the ETS model
rmse_hw_damped <- accuracy(fc_hw_damped, retail_series)$RMSE
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
## 441 observations are missing between 2019 Jan and 2055 Sep
rmse_hw_damped
## [1] NaN
accuracy(fit_train) %>%
dplyr:: select(.type, .model, RMSE)
## # A tibble: 2 × 3
## .type .model RMSE
## <chr> <chr> <dbl>
## 1 Training multi 4.96
## 2 Training snaive 6.30
# Seasonal Naïve Model
fit_snaive <- retail_series %>%
model(SNAIVE(Turnover))
fit_snaive
## # A mable: 1 x 3
## # Key: State, Industry [1]
## State Industry `SNAIVE(Turnover)`
## <chr> <chr> <model>
## 1 South Australia Department stores <SNAIVE>
# Forecast for the test set using Seasonal Naïve
fc_snaive <- fit_snaive %>% forecast(h = nrow(retail_series))
head(fc_snaive)
## # A fable: 6 x 6 [1M]
## # Key: State, Industry, .model [1]
## State Industry .model Month Turnover .mean
## <chr> <chr> <chr> <mth> <dist> <dbl>
## 1 South Australia Department stores SNAIVE(Turnover) 2019 Jan N(99, 37) 99
## 2 South Australia Department stores SNAIVE(Turnover) 2019 Feb N(79, 37) 78.8
## 3 South Australia Department stores SNAIVE(Turnover) 2019 Mar N(101, 37) 101
## 4 South Australia Department stores SNAIVE(Turnover) 2019 Apr N(98, 37) 98.2
## 5 South Australia Department stores SNAIVE(Turnover) 2019 May N(112, 37) 112.
## 6 South Australia Department stores SNAIVE(Turnover) 2019 Jun N(110, 37) 110.
# Calculate RMSE for the Seasonal Naïve model
rmse_snaive <- accuracy(fc_snaive, retail_series)$RMSE
## Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing.
## 441 observations are missing between 2019 Jan and 2055 Sep
rmse_snaive
## [1] NaN
# Display RMSE results
cat("RMSE for ETS model with Damped Trend:", rmse_hw_damped, "\n")
## RMSE for ETS model with Damped Trend: NaN
cat("RMSE for Seasonal Naïve model:", rmse_snaive, "\n")
## RMSE for Seasonal Naïve model: NaN
fc %>% accuracy(retail_series) %>%
dplyr:: select(.type, .model, RMSE)
## # A tibble: 2 × 3
## .type .model RMSE
## <chr> <chr> <dbl>
## 1 Test multi 5.29
## 2 Test snaive 12.2
It’s obvioust that ETS model with Damped Trend outperforms the Seasonal Naïve model.
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
set.seed(13101917)
retail_series <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`, 1))
retail_series %>% head()
## # A tsibble: 6 x 5 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover
## <chr> <chr> <chr> <mth> <dbl>
## 1 South Australia Department stores A3349658K 1982 Apr 48.9
## 2 South Australia Department stores A3349658K 1982 May 52.2
## 3 South Australia Department stores A3349658K 1982 Jun 48.9
## 4 South Australia Department stores A3349658K 1982 Jul 48.3
## 5 South Australia Department stores A3349658K 1982 Aug 49.4
## 6 South Australia Department stores A3349658K 1982 Sep 48.5
# Calculate the optimal lambda using the guerrero method
lambda <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# Apply the Box-Cox transformation using the calculated lambda
retail_series_transformed <- aus_retail %>%
mutate(Turnover_transformed = box_cox(Turnover, lambda))
head(retail_series_transformed)
## # A tsibble: 6 x 6 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover Turnover_transformed
## <chr> <chr> <chr> <mth> <dbl> <dbl>
## 1 Australian Capita… Cafes, … A3349849A 1982 Apr 4.4 1.45
## 2 Australian Capita… Cafes, … A3349849A 1982 May 3.4 1.20
## 3 Australian Capita… Cafes, … A3349849A 1982 Jun 3.6 1.26
## 4 Australian Capita… Cafes, … A3349849A 1982 Jul 4 1.36
## 5 Australian Capita… Cafes, … A3349849A 1982 Aug 3.6 1.26
## 6 Australian Capita… Cafes, … A3349849A 1982 Sep 4.2 1.41
# Plot the transformed Turnover data
retail_series_transformed %>%
autoplot(Turnover_transformed) +
labs(y = "Turnover (Transformed)",
title = latex2exp::TeX(paste0(
"Transformed retail turnover with $\\lambda$ = ",
round(lambda, 2)))) +
theme_minimal()
### Australia’ print_retail forecasting before transformation
print_retail <-aus_retail %>%
filter(Industry == "Newspaper and book retailing") %>%
summarize(Turnover = sum(Turnover))
head(print_retail)
## # A tsibble: 6 x 2 [1M]
## Month Turnover
## <mth> <dbl>
## 1 1982 Apr 134
## 2 1982 May 134.
## 3 1982 Jun 127.
## 4 1982 Jul 128.
## 5 1982 Aug 131.
## 6 1982 Sep 134.
autoplot(print_retail)
## Plot variable not specified, automatically selected `.vars = Turnover`
# Adjusting Australia retail print to economic inflation, CPI
print_retail <- aus_retail |>
filter(Industry == "Newspaper and book retailing") |>
group_by(Industry) |>
index_by(Year = year(Month)) |>
summarise(Turnover = sum(Turnover))
aus_economy <- global_economy |>
filter(Code == "AUS")
print_retail |>
left_join(aus_economy, by = "Year") |>
mutate(Adjusted_turnover = Turnover / CPI * 100) |>
pivot_longer(c(Turnover, Adjusted_turnover),
values_to = "Turnover") |>
mutate(name = factor(name,
levels=c("Turnover","Adjusted_turnover"))) |>
ggplot(aes(x = Year, y = Turnover)) +
geom_line() +
facet_grid(name ~ ., scales = "free_y") +
labs(title = "Turnover: Australian print media industry",
y = "$AU")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
retail_series_transformed <- print_retail %>%
autoplot(sqrt(Turnover)) +
labs(y = " Square root turnover")
retail_series_transformed
retail_series_transformed <- print_retail %>%
autoplot(log(Turnover)) +
labs(y = " Square root turnover")
retail_series_transformed
lambda <- retail_series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.03543504
retail_series_transformed <- retail_series %>%tsibble:::as_tsibble.tbl_ts(., index = Month)
# Ensure 'Year' is the correct index
# Apply Guerrero transformation to stabilize variance
lambda <- retail_series_transformed %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# Apply the Box-Cox transformation with the estimated lambda
retail_series_transformed <- retail_series_transformed %>%
mutate(Turnover_transformed = box_cox(Turnover, lambda))
# Check the structure of the transformed data
print(head(retail_series_transformed))
## # A tsibble: 6 x 6 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover Turnover_transformed
## <chr> <chr> <chr> <mth> <dbl> <dbl>
## 1 South Australia Department… A3349658K 1982 Apr 48.9 4.17
## 2 South Australia Department… A3349658K 1982 May 52.2 4.25
## 3 South Australia Department… A3349658K 1982 Jun 48.9 4.17
## 4 South Australia Department… A3349658K 1982 Jul 48.3 4.16
## 5 South Australia Department… A3349658K 1982 Aug 49.4 4.18
## 6 South Australia Department… A3349658K 1982 Sep 48.5 4.16
# Now plot the original and transformed turnover over the years
gg <- retail_series_transformed %>%
ggplot(aes(x = Year)) +
geom_line(aes(y = Turnover, color = "Original Turnover")) +
geom_line(aes(y = Turnover_transformed, color = "Transformed Turnover")) +
labs(
title = latex2exp::TeX(paste0(
"Turnover: Original and Transformed (Guerrero) $\\lambda = ",
round(lambda, 2))),
y = "Turnover ($AU)",
x = "Year"
) +
scale_color_manual(values = c("Original Turnover" = "blue",
"Transformed Turnover" = "red")) +
theme_minimal()
library(dplyr)
library(ggplot2)
library(fable)
library(tsibble)
library(fabletools) # For the features function
library(latex2exp) # For LaTeX expression in titles
# Convert the data to a tsibble using 'Month' as the index
retail_series_transformed <- retail_series %>%
as_tsibble(index = Month) # Ensure 'Month' is the correct index
# Apply Guerrero transformation to stabilize variance
lambda <- retail_series_transformed %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# Apply the Box-Cox transformation with the estimated lambda
retail_series_transformed <- retail_series_transformed %>%
mutate(Turnover_transformed = box_cox(Turnover, lambda))
# Check the structure of the transformed data
print(head(retail_series_transformed))
## # A tsibble: 6 x 6 [1M]
## # Key: State, Industry [1]
## State Industry `Series ID` Month Turnover Turnover_transformed
## <chr> <chr> <chr> <mth> <dbl> <dbl>
## 1 South Australia Department… A3349658K 1982 Apr 48.9 4.17
## 2 South Australia Department… A3349658K 1982 May 52.2 4.25
## 3 South Australia Department… A3349658K 1982 Jun 48.9 4.17
## 4 South Australia Department… A3349658K 1982 Jul 48.3 4.16
## 5 South Australia Department… A3349658K 1982 Aug 49.4 4.18
## 6 South Australia Department… A3349658K 1982 Sep 48.5 4.16
# Now plot the original and transformed turnover over the months
gg <- retail_series_transformed %>%
ggplot(aes(x = Month)) + # Change x-axis to Month
geom_line(aes(y = Turnover, color = "Original Turnover")) +
geom_line(aes(y = Turnover_transformed, color = "Transformed Turnover")) +
labs(
title = latex2exp::TeX(paste0(
"Turnover: Original and Transformed (Guerrero) $\\lambda = ",
round(lambda, 2))),
y = "Turnover ($AU)",
x = "Month"
) +
scale_color_manual(values = c("Original Turnover" = "blue",
"Transformed Turnover" = "red")) +
theme_minimal()
# Print the ggplot object
print(gg)
# Filter for Newspaper and book retailing, and aggregate by year
print_retail <- aus_retail %>%
filter(Industry == "Newspaper and book retailing", year(Month) >= 1990) %>%
group_by(Year = year(Month)) %>%
summarize(Turnover = sum(Turnover, na.rm = TRUE)) %>%
ungroup()
# Plot the turnover data
autoplot(print_retail, Turnover) +
labs(y = "Turnover (in thousands)",
title = "Total turnover in Newspaper and Book Retailing in Australia")
print_retail %>%
model(stl = STL(Turnover))
## # A mable: 29 x 2
## # Key: Year [29]
## Year stl
## <dbl> <model>
## 1 1990 <STL>
## 2 1991 <STL>
## 3 1992 <STL>
## 4 1993 <STL>
## 5 1994 <STL>
## 6 1995 <STL>
## 7 1996 <STL>
## 8 1997 <STL>
## 9 1998 <STL>
## 10 1999 <STL>
## # ℹ 19 more rows
lambda <- retail_series %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
lambda
## [1] 0.03543504
#Ljung-Box test
retail_series %>%
model(multiplicative = ETS(Turnover ~ error("M") + trend("A") + season("M"))) %>%
augment() %>%
features(.innov, ljung_box, lag = 24, dof = 0)
## # A tibble: 1 × 5
## State Industry .model lb_stat lb_pvalue
## <chr> <chr> <chr> <dbl> <dbl>
## 1 South Australia Department stores multiplicative 109. 7.32e-13
c
## function (...) .Primitive("c")