library(tidyverse)
library(fpp3)
library(kableExtra)
Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of \(\alpha\) and \(l_0\), and generate forecasts for the next four months.
The optimal values are \(\alpha = 0.3221\) and \(l_0 = 100646.6\).
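The chunk that creates vic_pigs is not shown above; a minimal sketch, assuming it simply filters aus_livestock to the Victorian pig series:
# Assumed setup: monthly pigs slaughtered in Victoria
vic_pigs <-
  aus_livestock |>
  filter(Animal == "Pigs", State == "Victoria")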
vic_pigs_mod <-
vic_pigs |>
model(ETS(Count ~ error("A") + trend("N") + season("N")))
report(vic_pigs_mod)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
vic_pigs_forecast <-
vic_pigs_mod |>
forecast(h = 4)
kbl(vic_pigs_forecast) |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
Animal | State | .model | Month | Count | .mean |
---|---|---|---|---|---|
Pigs | Victoria | ETS(Count ~ error(“A”) + trend(“N”) + season(“N”)) | 2019 Jan | N(95187, 8.7e+07) | 95186.56 |
Pigs | Victoria | ETS(Count ~ error(“A”) + trend(“N”) + season(“N”)) | 2019 Feb | N(95187, 9.7e+07) | 95186.56 |
Pigs | Victoria | ETS(Count ~ error(“A”) + trend(“N”) + season(“N”)) | 2019 Mar | N(95187, 1.1e+08) | 95186.56 |
Pigs | Victoria | ETS(Count ~ error(“A”) + trend(“N”) + season(“N”)) | 2019 Apr | N(95187, 1.1e+08) | 95186.56 |
vic_pigs_forecast |>
autoplot(vic_pigs) +
labs(title = "Four Month Forecast Victorian Pigs Slaughter", x='')
R calculates a slightly wider 95% prediction interval than the one we compute manually below, because the model's reported variance (\(\sigma^2\)) is slightly larger than the sample variance of the residuals.
# Manual 95% prediction interval: forecast mean +/- 1.96 * sd of training residuals
vic_pigs_mod_resid <-
residuals(vic_pigs_mod)
s <- sd(vic_pigs_mod_resid$.resid)
forecast_mean <-
vic_pigs_forecast$.mean[1]
lower <- forecast_mean - 1.96 * s
upper <- forecast_mean + 1.96 * s
kbl(tibble(lower = lower, upper = upper), caption = "95% Prediction Interval") |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
lower | upper |
---|---|
76871.01 | 113502.1 |
vic_pigs_forecast |>
hilo(95) |>
head(1) |>
select("95%")
## # A tsibble: 1 x 2 [1M]
## `95%` Month
## <hilo> <mth>
## 1 [76854.79, 113518.3]95 2019 Jan
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
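The chunk that selects the country is not shown; a minimal sketch, assuming the United States is the chosen country (as the plot titles below suggest):
# Assumed setup: annual US exports (% of GDP) from global_economy
us_exports <-
  global_economy |>
  filter(Country == "United States")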
The data are annual, so there is no seasonality, while the overall trend has been increasing over the years.
us_exports |>
autoplot(Exports) +
labs(title = 'USA Exports', x = '', y="% of GDP", caption = "Source: World Bank 1960-2017")
Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
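The chunk that fits us_exports_mod_ann is not shown; a minimal sketch, assuming a plain ETS(A,N,N) on Exports:
# Assumed fit: simple exponential smoothing (ETS(A,N,N))
us_exports_mod_ann <-
  us_exports |>
  model(ETS(Exports ~ error("A") + trend("N") + season("N")))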
us_exports_forecast_ann <-
us_exports_mod_ann |>
forecast(h = 10)
us_exports_forecast_ann |>
autoplot(us_exports) +
labs(title = 'USA Exports 10 Year Forecast (ANN)', x="")
accuracy(us_exports_mod_ann)$RMSE
## [1] 0.6270672
Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
The simpler model has a slightly higher RMSE, which suggests the trended model may be a marginally better choice for forecasting this series.
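The chunk that fits us_exports_mod_aan is likewise not shown; a minimal sketch, assuming an ETS(A,A,N) on the same series:
# Assumed fit: Holt's linear trend method (ETS(A,A,N))
us_exports_mod_aan <-
  us_exports |>
  model(ETS(Exports ~ error("A") + trend("A") + season("N")))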
us_exports_forecast_aan <-
us_exports_mod_aan |>
forecast(h = 10)
us_exports_forecast_aan |>
autoplot(us_exports) +
labs(title = 'USA Exports 10 Year Forecast (AAN)', x="")
accuracy(us_exports_mod_aan)$RMSE
## [1] 0.6149566
The trended model forecasts a continuing increase in the export percentage, whereas the simpler model stays flat at roughly its current level. The trended model has a slightly better RMSE, but the difference is small, and neither model explains the troughs throughout the series, so we may not fully understand what the next 10 years hold.
The manually calculated prediction intervals for both models are slightly narrower than the R-calculated ones.
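The chunk behind the two tables below is not shown; a minimal sketch, assuming the models and forecasts fitted above, with the manual interval again built from the residual standard deviation:
# Manual 95% interval (mean +/- 1.96 * residual sd) for each model's first forecast
manual_pi <- function(mod, fc) {
  s <- sd(residuals(mod)$.resid)
  m <- fc$.mean[1]
  tibble(lower = m - 1.96 * s, upper = m + 1.96 * s)
}
bind_rows("ANN" = manual_pi(us_exports_mod_ann, us_exports_forecast_ann),
          "AAN" = manual_pi(us_exports_mod_aan, us_exports_forecast_aan),
          .id = "model")
The R-calculated intervals in the second table can be read off with hilo(95) applied to each forecast, as in the pigs example above.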
model | lower | upper |
---|---|---|
ANN | 10.67559 | 13.10577 |
AAN | 10.79077 | 13.22246 |
model | 95% | Year |
---|---|---|
ANN | [10.63951, 13.14186]95 | 2017 |
AAN | [10.75667, 13.25656]95 | 2017 |
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts. [Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
We can see China’s GDP growth is quite exponential.
As we build our models, we see a few interesting outputs. The simple model (ANN) shows none of the historical growth and flatlines at the most recent level. The trended models (AAN) show an increasing trend, with the damped version increasing slightly less. With the Box-Cox transformation, the damped version grows dramatically compared to the AAN and ANN models, and the undamped Box-Cox model explodes even more exponentially. The Box-Cox models assume growth will continue at that extreme exponential rate, which looks somewhat unrealistic.
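The chunks defining china_gdp and china_lambda are not shown; a minimal sketch, assuming a country filter and a Guerrero-selected Box-Cox lambda:
# Assumed setup: Chinese GDP series and its Guerrero lambda
china_gdp <-
  global_economy |>
  filter(Country == "China")
china_lambda <-
  china_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)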
china_gdp_models <-
china_gdp |>
model("ANN" = ETS(GDP ~ error("A") + trend("N") + season("N")),
"AAN" = ETS(GDP ~ error("A") + trend("A") + season("N")),
"AAN (Damped)" = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
"Box-Cox" = ETS(box_cox(GDP,china_lambda) ~ error("A") + trend("A") + season("N")),
"Box-Cox (Damped)" = ETS(box_cox(GDP,china_lambda) ~ error("A") + trend("Ad") + season("N")))
china_gdp_forecasts <-
china_gdp_models |>
forecast(h = 23)
china_gdp_forecasts |>
autoplot(china_gdp, level=NULL) +
labs(title = "Forecasting China's GDP (23 Years)", x="")
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
Multiplicative seasonality is required because the seasonal variation increases with the level of the series as the years progress.
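The chunk defining aus_gas is not shown; a minimal sketch, assuming it is just the Gas series from aus_production:
# Assumed setup: quarterly Australian gas production
aus_gas <-
  aus_production |>
  select(Gas)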
gas_models <-
aus_gas |>
model("additive" = ETS(Gas ~ error("A") + trend("A") + season("A")),
"multiplicative" = ETS(Gas ~ error("M") + trend("A") + season("M")),
"multiplicative (damped)" = ETS(Gas ~ error("M") + trend("Ad") + season("M")))
gas_forecasts <-
gas_models |>
forecast(h = "5 years")
gas_forecasts |>
autoplot(aus_gas, level=NULL) +
labs(title = "Forecasting AUS Gas Production (5 Years)", x="")
Looking at the RMSE, the damped model scores slightly better than the plain multiplicative model, although the difference is marginal.
gas_rmse <-
accuracy(gas_models)
kbl(gas_rmse) |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
.model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
---|---|---|---|---|---|---|---|---|---|
additive | Training | 0.0052496 | 4.763979 | 3.345044 | -4.6854479 | 10.864395 | 0.6000330 | 0.6282550 | 0.0772293 |
multiplicative | Training | -0.1148865 | 4.595113 | 3.021727 | 0.1987614 | 4.082508 | 0.5420365 | 0.6059856 | -0.0130629 |
multiplicative (damped) | Training | -0.0043856 | 4.591840 | 3.031478 | 0.3258961 | 4.104484 | 0.5437856 | 0.6055540 | -0.0217035 |
Recall your retail time series data (from Exercise 7 in Section 2.10).
The seasonal variation becomes larger as we progress through the time series, which again points to multiplicative seasonality.
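The chunk that builds print_media is not shown; a minimal sketch, assuming the series is filtered from aus_retail to Western Australian newspaper and book retailing (the State and Industry shown in the accuracy tables below):
# Assumed setup: retail turnover series from aus_retail
print_media <-
  aus_retail |>
  filter(State == "Western Australia",
         Industry == "Newspaper and book retailing")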
print_media |>
autoplot(Turnover) +
labs(title = "Western Australia Newspaper and Book Retailing Turnover", x = "", y="Turnover ($ Million AUD)")
print_media_models <-
print_media |>
model("multiplicative" = ETS(Turnover ~ error("M") + trend("A") + season("M")),
"multiplicative (damped)" = ETS(Turnover ~ error("M") + trend("Ad") + season("M")))
print_media_forecasts <-
print_media_models |>
forecast(h = "5 years")
print_media_forecasts |>
autoplot(print_media, level=NULL) +
labs(title = "Forecasting Western AUS Newspaper and Book Turnover (5 Years)", x="", y="Turnover ($ Million AUD)")
The damped version of the model has a slightly better RMSE. Its turnover forecasts are also less aggressive, which suits the recent behaviour of the series.
print_rmse <-
accuracy(print_media_models)
kbl(print_rmse) |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
State | Industry | .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
---|---|---|---|---|---|---|---|---|---|---|---|
Western Australia | Newspaper and book retailing | multiplicative | Training | -0.0917712 | 2.544763 | 1.754189 | -0.9258251 | 6.827673 | 0.4208640 | 0.4494494 | 0.0367818 |
Western Australia | Newspaper and book retailing | multiplicative (damped) | Training | 0.0632780 | 2.544093 | 1.752262 | -0.3224721 | 6.796918 | 0.4204019 | 0.4493312 | 0.0032401 |
The ACF plot shows only two lags outside the significance threshold, and the residual histogram is right-skewed. Because most of the autocorrelations are not significantly different from zero, the residuals look reasonably consistent with white noise.
print_media_models_best <-
print_media_models |>
select("multiplicative (damped)")
print_media_models_best |>
gg_tsresiduals()
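A Ljung-Box test gives a complementary numeric check on the white-noise claim (a sketch; the choice of lag = 24 for monthly data is an assumption):
# Ljung-Box test on the innovation residuals of the chosen model
print_media_models_best |>
  augment() |>
  features(.innov, ljung_box, lag = 24)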
On the test set, the seasonal naive benchmark has a worse RMSE than the multiplicative (damped) model, so the ETS model beats the benchmark.
print_media_train <-
print_media |>
filter(Month < yearmonth("Jan 2011"))
print_media_models <-
print_media_train |>
model("multiplicative (damped)" = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
"seasonal naive" = SNAIVE(Turnover))
print_media_forecasts <-
print_media_models |>
forecast(new_data = anti_join(print_media, print_media_train))
print_media_forecasts |>
autoplot(print_media)
forecast_rmse <-
print_media_forecasts |>
accuracy(print_media)
kbl(forecast_rmse) |>
kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
.model | State | Industry | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
---|---|---|---|---|---|---|---|---|---|---|---|
multiplicative (damped) | Western Australia | Newspaper and book retailing | Test | 4.912083 | 8.116245 | 6.528112 | 10.611067 | 17.28675 | 1.594441 | 1.431081 | 0.8292389 |
seasonal naive | Western Australia | Newspaper and book retailing | Test | 3.044792 | 9.108427 | 7.340625 | 6.055847 | 19.91692 | 1.792891 | 1.606026 | 0.5132505 |
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
The test-set RMSE is much smaller, although it is now measured on the Box-Cox transformed scale, so it is not directly comparable with the earlier RMSE values on the original turnover scale.
lambda <-
print_media |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
print_media <-
print_media |>
mutate(turnover_box_cox = box_cox(Turnover, lambda))
print_media_stl <-
print_media |>
model(STL(turnover_box_cox)) |>
components()
print_media_stl |>
autoplot()
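The chunk that fits the ETS model to the seasonally adjusted data and produced the accuracy table below is not shown; a minimal sketch, assuming a decomposition_model() with a damped ETS on the seasonally adjusted component, evaluated on the same training/test split as before (the model label simply reuses the earlier name):
# Assumed: STL on the Box-Cox series, damped ETS on the seasonally adjusted data.
# The training set is rebuilt so it includes the turnover_box_cox column added above.
print_media_train <-
  print_media |>
  filter(Month < yearmonth("Jan 2011"))
print_media_stl_ets <-
  print_media_train |>
  model("multiplicative (damped)" = decomposition_model(
    STL(turnover_box_cox),
    ETS(season_adjust ~ error("M") + trend("Ad") + season("N"))))
print_media_stl_ets_fc <-
  print_media_stl_ets |>
  forecast(new_data = anti_join(print_media, print_media_train))
kbl(accuracy(print_media_stl_ets_fc, print_media)) |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))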
.model | State | Industry | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
---|---|---|---|---|---|---|---|---|---|---|---|
multiplicative (damped) | Western Australia | Newspaper and book retailing | Test | 0.1482623 | 0.2089708 | 0.1782171 | 4.394424 | 5.4142 | 1.270864 | 1.194786 | 0.9158683 |