Forecasting ETS and ARIMA
Introduction
These are exercises from chapters 8 and 9 of “Forecasting: Principles and Practice”.
8.1
Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.
Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
Answer
α = 0.3221247 and ℓ0 = 100646.6.
The ETS decomposition suggests the series has no strong trend or seasonality, so modelling the level with additive errors (ETS(A,N,N)) is reasonable.
Computing the limits manually and then with the hilo() function revealed very little discrepancy between the two 95% prediction intervals.
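As a quick check on the arithmetic in part (b) below, using the residual standard deviation and point forecast computed there:
ŷ ± 1.96s = 95186.56 ± 1.96 × 9344.67 ≈ [76871, 113502]
hilo() reports [76854.79, 113518.33]; the small gap arises because hilo() uses the model’s estimate of σ (√87480760 ≈ 9353) rather than the raw residual standard deviation.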
#a.
vic_pigs <- aus_livestock %>%
filter(State == "Victoria", Animal == "Pigs")
vic_pigs %>%
autoplot(Count) +
theme_ipsum_es() +
labs(y = "Count"
, x = "Date"
, title = "Pigs Slaughtered, Victoria") +
geom_smooth(formula = y ~ x, method = "loess")model_pigs <- vic_pigs %>%
model(ETS(Count~error("A") + trend("N") + season("N")))
report(model_pigs)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
#Actual Model
pig_decomp <- vic_pigs %>%
model(ETS(Count~error("A") + trend("N") + season("N"))) %>%
components()
#Plot ETS decomposition
autoplot(pig_decomp) # +
# labs(title =
# "Decomposition of total pigs slghtrd using ETS")
#Forecast 4 months using "Model Pigs", using ETS model
fcst_pigs <-
model_pigs %>%
forecast(h = 4)
#Plot Forecast
vic_pigs_plot <-
vic_pigs %>%
filter(yearmonth(Month) > yearmonth("2015 Dec")) %>%
index_by(Date = as_date(Month))
fcst_pigs %>%
autoplot(vic_pigs_plot) +
labs(x = "Month",
y = "Slaughter, Head",
title = "Monthly Slaughter + Forecast",
subtitle = "Pigs in Victoria") +
theme_ipsum_es()
## `mutate_if()` ignored the following grouping variables:
## Column `Date`
#b.
resid_pigs<-augment(model_pigs)
#standard deviation
sd <-sd(resid_pigs$.resid)
knitr::kable(paste("Standard Deviation = ",sd),"html")| x |
|---|
| Standard Deviation = 9344.66579258967 |
# 95% prediction interval (manual)
upper <- fcst_pigs$.mean[1] + 1.96*sd
mean <- fcst_pigs$.mean[1]
lower <- fcst_pigs$.mean[1] - 1.96*sd
knitr::kable(rbind(upper, mean, lower), "html")
| upper | 113502.10 |
| mean | 95186.56 |
| lower | 76871.01 |
# knitr::kable(paste("Upper Limit equals = ",upper),"html")
# knitr::kable(paste("Point Forecast = ", mean),"html")
# knitr::kable(paste("Lower Limit equals = ",lower),"html")
# hilo() interval for comparison
int <-hilo(fcst_pigs)
head(int)
## # A tsibble: 4 x 8 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month Count .mean `80%`
## <fct> <fct> <chr> <mth> <dist> <dbl> <hilo>
## 1 Pigs Victor~ "ETS(~ 2019 Jan N(95187, 8.7e+07) 95187. [83200.06, 107173.1]80
## 2 Pigs Victor~ "ETS(~ 2019 Feb N(95187, 9.7e+07) 95187. [82593.52, 107779.6]80
## 3 Pigs Victor~ "ETS(~ 2019 Mar N(95187, 1.1e+08) 95187. [82014.88, 108358.2]80
## 4 Pigs Victor~ "ETS(~ 2019 Apr N(95187, 1.1e+08) 95187. [81460.61, 108912.5]80
## # ... with 1 more variable: `95%` <hilo>
int_hilo <- int$`95%`[1]
knitr::kable(paste("Limits equals = ", int_hilo),"html")| x |
|---|
| Limits equals = [76854.7888896402, 113518.325972343]95 |
8.2
Write your own function to implement simple exponential smoothing. The function should take arguments y (the time series), alpha (the smoothing parameter α) and level (the initial level ℓ0). It should return the forecast of the next observation in the series. Does it give the same forecast as ETS()?
Answer
I used the pigs series (a ts object from fpp2) rather than aus_livestock, because I was getting too many errors trying to use ets(model = "ANN") on the tsibble. The forecast produced by the manual function, using the fitted alpha and initial level, matches the forecast from the ses() function.
#borrowed from Jacob Stemmerich
tail(pigs)
## Mar Apr May Jun Jul Aug
## 1995 106723 84307 114896 106749 87892 100506
tail(vic_pigs)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 2018 Jul Pigs Victoria 101300
## 2 2018 Aug Pigs Victoria 102500
## 3 2018 Sep Pigs Victoria 82600
## 4 2018 Oct Pigs Victoria 100700
## 5 2018 Nov Pigs Victoria 98500
## 6 2018 Dec Pigs Victoria 92300
#used "pigs" from fpp2 as ts not aus_livestock as tibble
fc = ses(pigs, h=4)
fc_ses <- forecast(fc, h=4)  # fc is already a forecast object; kept for the kable below
summary(fc)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
summary(fc_ses) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
|---|---|---|---|---|---|
| Sep 1995 | 98816.41 | 85605.43 | 112027.4 | 78611.97 | 119020.8 |
| Oct 1995 | 98816.41 | 85034.52 | 112598.3 | 77738.83 | 119894.0 |
| Nov 1995 | 98816.41 | 84486.34 | 113146.5 | 76900.46 | 120732.4 |
| Dec 1995 | 98816.41 | 83958.37 | 113674.4 | 76092.99 | 121539.8 |
ses_manual = function(y, alpha, level){
  x = level
  # recursive SES update: new level = alpha*observation + (1 - alpha)*previous level
  for (i in 1:length(y)) {
    x = alpha*y[i] + (1 - alpha)*x
  }
  cat("Forecast of next observation by SES function: ",
      as.character(x),
      sep = "\n")
  # also return the forecast, as the exercise asks
  invisible(x)
}
# Find alpha and initial level of fc
a = fc$model$par[1]
l = fc$model$par[2]
# display the ses() forecast of the next observation (ses_manual() gives the same value)
knitr::kable(paste(
"Forecast of next observation is: ", as.character(fc$mean[1])
), "html")
| x |
|---|
| Forecast of next observation is: 98816.4061115907 |
#ses_manual(y = pigs, alpha = a, level = l)
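For completeness, a minimal sketch (not part of the original run) of the same check against the fable ETS() fit from 8.1; it assumes fabletools’ tidy() is available, and the parameter term names ("alpha" and the initial level, reported as l[0]) may vary slightly across fable versions.
# pull the fitted parameters from the fable model in 8.1 and feed them to ses_manual()
pars_pigs <- tidy(model_pigs)
alpha_hat <- pars_pigs$estimate[pars_pigs$term == "alpha"]
level_hat <- pars_pigs$estimate[grepl("^l", pars_pigs$term)]
ses_manual(y = vic_pigs$Count, alpha = alpha_hat, level = level_hat)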
fc %>%
plot(pigs,
xlab = "Date"
, ylab = "Slaughter"
, main = "Pig Slaughter Fcts"
, col = brewer.pal(4, "Set2"))
################## DID NOT WORK
# Note: the parameters of an ets object live in $par (not $model$par), which is
# the most likely reason the extraction below failed; kept for reference.
# ets <-ets(pigs, model = "ANN")
# fc_ets <-forecast(ets, h = 4)
# summary(fc_ets)
#
# fc_ets %>%
# plot(vic_pigs$Count)
#
# ets_manual = function(y1, alpha1, level1){
# x1 = level1
# for (i in 1:length(y1)) {
# x1 = alpha1*y1[i] + (1 - alpha1)*x1
# }
# print(x1)
# }
#
# # Find alpha and initial level of the ets fit
# a1 = ets$par["alpha"]
# l1 = ets$par["l"]
#
# # run ets_manual()
# ets_manual(y1 = pigs, alpha1 = a1, level1 = l1)
8.3
Modify your function from the previous exercise to return the sum of squared errors rather than the forecast of the next observation. Then use the optim() function to find the optimal values of α and ℓ0. Do you get the same values as the ETS() function?
Answer
Not exactly the same, but the differences are small: α from optim() is very close to the ses() estimate, and the optimised initial level ℓ0 differs by less than 1,000 head.
##Borrowed from Miguel Angel Mendez with a few changes.
SES <- function(pars = c(alpha, l0), y){
  ## accumulate the sum of squared one-step errors rather than returning the forecast
  error <- 0
  SSE <- 0
  alpha <- pars[1]
  l0 <- pars[2]
  y_hat <- l0
  for(index in 1:length(y)){
    error <- y[index] - y_hat                      # one-step forecast error
    SSE <- SSE + error^2
    y_hat <- alpha*y[index] + (1 - alpha)*y_hat    # SES update of the level
  }
  return(SSE)
}
opt_SES_pigs <- optim(par = c(0.5, pigs[1]), y = pigs, fn = SES)
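To recover the one-step forecast implied by the optimised parameters, a small sketch (not in the original run) reusing ses_manual() from 8.2:
# forecast of the next observation using the optim() parameters
ses_manual(y = pigs, alpha = opt_SES_pigs$par[1], level = opt_SES_pigs$par[2])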
knitr::kable(paste(
"Optimal alpha and initial level (l0) from optim(): ",
"\n",
as.character(opt_SES_pigs$par[1]),
", ",
as.character(opt_SES_pigs$par[2]),
sep = ""
), "html")
| x |
|---|
| Optimal alpha and initial level (l0) from optim(): 0.299008094014243, 76379.2653476235 |
knitr::kable(paste(
"Alpha and initial level (l0) fitted by ses(): ",
"\n",
as.character(a),
", ",
as.character(l),
sep = ""
), "html")
| x |
|---|
| Alpha and initial level (l0) fitted by ses(): 0.297148833372095, 77260.0561458528 |
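A direct comparison with the fable ETS() fit is also possible; a sketch (not in the original run, and note the tsibble covers a longer period than fpp2::pigs, so the estimates will differ somewhat):
# optimise SES over the tsibble series and compare with the ETS() estimates from 8.1
opt_vic <- optim(par = c(0.5, vic_pigs$Count[1]), y = vic_pigs$Count, fn = SES)
opt_vic$par        # alpha and l0 from optim()
tidy(model_pigs)   # alpha and l[0] estimated by ETS()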
8.5
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
Plot the Exports series and discuss the main features of the data. Looking at exports from Mexico, there is an upward trend but no seasonality from 1960 to 2017. Despite periods of rapid growth and rapid decline, the overall trend is positive.
Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
Compute the RMSE values for the training data.
Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
Compare the forecasts from both methods. Which do you think is best?
f. Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
Answer
The RMSE is almost the same for the two models, a touch lower for AAN. The forecasts, however, differ markedly: the AAN model continues the recent linear trend, while the ANN model produces an essentially flat line.
The 95% prediction intervals from the manual calculation and from R are close to each other.
#a.
mex_exports <- global_economy %>%
filter(Country == "Mexico")
mex_exports %>%
autoplot(Exports) +
labs(y = "Count", title = "Mexico exports"
, x = "Date") +
theme_ipsum_es() +
geom_smooth(formula = y ~ x, method = "loess")#b.
ets_mex <- mex_exports %>%
model(ETS(Exports ~ error("A") + trend("N") + season("N")))
fc_mex <- ets_mex %>%
forecast(h = 10)
fc_mex %>%
autoplot(mex_exports) +
labs(y = "Count", title = "Mexico exports with Forecast"
, x = "Date") +
theme_ipsum_es()
knitr::kable(
ets_mex %>%
accuracy(), "html")| Country | .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mexico | ETS(Exports ~ error("A") + trend("N") + season("N")) | Training | 0.5063087 | 2.154425 | 1.375397 | 1.830655 | 7.782879 | 0.9828152 | 0.9913663 | 0.202677 |
#c.
comp_mex <- mex_exports %>%
model(
ANN= ETS(Exports ~error("A")+trend("N")+season("N"))
,AAN= ETS(Exports ~error("A")+trend("A")+season("N"))
#,ANA= ETS(Exports ~error("A")+trend("N")+season("A"))
)
accuracy(comp_mex) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| Country | .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mexico | ANN | Training | 0.5063087 | 2.154425 | 1.375397 | 1.830655 | 7.782879 | 0.9828152 | 0.9913663 | 0.2026770 |
| Mexico | AAN | Training | -0.0081622 | 2.093999 | 1.407705 | -1.969378 | 8.682648 | 1.0059014 | 0.9635608 | 0.2025917 |
#d. - #e.
comp_mex %>%
forecast(h = 10) %>%
autoplot(mex_exports, level=NULL) +
labs(y = "Count", title = "Mexico exports with Forecast"
, x = "Date") +
theme_ipsum_es()
#f
knitr::kable(comp_mex %>%
forecast(h=1) %>%
mutate(interval = hilo(Exports, 95)) %>%
pull(interval), "html")| x |
|---|
| [33.56900, 42.16368]95 |
| [34.12878, 42.63569]95 |
comp2 <- comp_mex %>%
  dplyr::select(Country,AAN)%>%
  accuracy()%>%
  transmute(Country,s=RMSE)   # RMSE of the AAN model
knitr::kable(comp_mex %>%
  dplyr::select(Country,ANN)%>%
  forecast(h=1)%>%
  left_join(comp2,by="Country")%>%
  # note: this applies the AAN RMSE to the ANN point forecast
  mutate(low= .mean -1.96*comp2$s,
         high= .mean +1.96*comp2$s)%>%
  dplyr::select(Country,Exports,low,high), "html")
| Country | Exports | low | high | Year |
|---|---|---|---|---|
| Mexico | N(38, 4.8) | 33.7621 | 41.97058 | 2018 |
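For a like-for-like comparison with the hilo() interval of the ANN model ([33.57, 42.16]), a sketch (not in the original run) using the ANN model’s own RMSE:
rmse_ann <- comp_mex %>%
  dplyr::select(Country, ANN) %>%
  accuracy() %>%
  pull(RMSE)
comp_mex %>%
  dplyr::select(Country, ANN) %>%
  forecast(h = 1) %>%
  mutate(low = .mean - 1.96 * rmse_ann, high = .mean + 1.96 * rmse_ann) %>%
  dplyr::select(Country, Exports, low, high)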
8.6
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
Answer
Judging by training RMSE, ets1 (the AAN model) and ets_damp fit best, which makes sense because they follow the trend. The Box-Cox transformation by itself did not help much; combined with a damped trend (ets_bc_damp) the forecasts are not great, but they are not as explosive as the plain Box-Cox and log-transformed models, whose back-transformed forecasts grow exponentially. It may be better to restrict the data and start the models around 2000 instead of 1960 (see the alternative below, which filters to post-2005 data).
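As a reminder of what damping does (a reference equation, not part of the original write-up): with a damped trend the h-step forecast is
ŷ_{t+h|t} = ℓ_t + (φ + φ² + ⋯ + φ^h) b_t, with 0 < φ < 1,
so the trend contribution levels off as h grows instead of increasing linearly, which is why the damped forecasts below flatten out relative to the plain AAN model.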
#intuition
cna_gdp <- global_economy %>%
filter(Country=="China") %>%
mutate(gdp_bill = GDP/1000000000)
cna_gdp %>%
autoplot(gdp_bill) +
labs(y = "GDP in billions", title = "China GDP in USD$"
, x = "Date") +
theme_ipsum_es() +
geom_smooth(method = "loess", formula = y ~ x, color="red", size = 0.5) +
scale_y_continuous(labels = scales::dollar) #+ geom_smooth(formula = y ~ x, method = "loess")
################## modeling
#bring lambda from Box Cox transformation first
lamb_cna <- cna_gdp %>%
features(gdp_bill, features = guerrero) %>%
pull(lambda_guerrero)
cna_bc_plot <- cna_gdp %>%
autoplot(box_cox(gdp_bill, lamb_cna)) +
labs(x = "Date"
, y = "GDP Transformed from Billions of $"
, title = "China GDP"
, subtitle = "lambda = -0.03446284"
) +
geom_smooth(method = "loess", formula = y ~ x, color="red", size = 0.5) +
theme_ipsum_es() +
scale_y_continuous(labels = scales::number)
cna_bc_plot
# the transformation removes a great deal of the variance and smooths out the exponential rise
# bunch of models
cna_models <- cna_gdp %>%
model(
ets1 = ETS(gdp_bill ~ error("A") + trend("A") + season("N")) # no seasonality
, ets2 = ETS(gdp_bill ~ error("A") + trend("N") + season("N")) #remove the trend--mmm
, ets_damp = ETS(gdp_bill ~ error("A")+trend("Ad")+season("N"))
, ets_bc = ETS(box_cox(gdp_bill, lamb_cna))
, ets_bc_damp = ETS(box_cox(gdp_bill, lamb_cna) ~ trend("Ad"))
, ets_log = ETS(log(gdp_bill))
)
cna_models %>%
forecast(h = 15) %>%
autoplot(cna_gdp, level = NULL) +
labs(x = "Date"
, y = "GDP in Billions of $"
, title = "China GDP"
#, subtitle = "lambda = -0.03446284"
) +
#geom_smooth(method = "loess", formula = y ~ x, color="red", size = 0.8) +
theme_ipsum_es() +
scale_y_continuous(labels = scales::dollar)
cna_models %>% accuracy() %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| Country | .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|---|
| China | ets1 | Training | 23.60170 | 189.9933 | 95.97306 | 1.3601515 | 7.722673 | 0.4426818 | 0.4530969 | 0.0088202 |
| China | ets2 | Training | 209.91501 | 415.7256 | 213.14963 | 8.2496612 | 10.885532 | 0.9831663 | 0.9914245 | 0.7894614 |
| China | ets_damp | Training | 29.40038 | 190.2006 | 94.81743 | 1.8563709 | 7.557042 | 0.4373514 | 0.4535914 | -0.0068405 |
| China | ets_bc | Training | -44.52804 | 301.4019 | 130.28696 | 0.4506423 | 7.238425 | 0.6009569 | 0.7187848 | 0.6772563 |
| China | ets_bc_damp | Training | -18.47975 | 221.3901 | 109.02780 | 1.1655196 | 7.231635 | 0.5028977 | 0.5279723 | 0.3458946 |
| China | ets_log | Training | -35.22709 | 286.6255 | 125.21288 | 0.7330760 | 7.206882 | 0.5775524 | 0.6835460 | 0.6543689 |
8.6 ALT
This repeats the exercise with the data filtered to years after 2005. Although the observations are few, which is probably not ideal, the resulting forecasts are much more realistic.
#intuition
cna_gdp <- global_economy %>%
filter(Country=="China") %>%
filter(Year > 2005) %>%
mutate(gdp_bill = GDP/1000000000)
cna_gdp %>%
autoplot(gdp_bill) +
labs(y = "GDP in billions", title = "China GDP in USD$"
, x = "Date") +
theme_ipsum_es() +
geom_smooth(method = "loess", formula = y ~ x, color="red", size = 0.5) +
scale_y_continuous(labels = scales::dollar) #+ geom_smooth(formula = y ~ x, method = "loess")
################## modeling
#bring lambda from Box Cox transformation first
lamb_cna <- cna_gdp %>%
features(gdp_bill, features = guerrero) %>%
pull(lambda_guerrero)
cna_bc_plot <- cna_gdp %>%
autoplot(box_cox(gdp_bill, lamb_cna)) +
labs(x = "Date"
, y = "GDP Transformed from Billions of $"
, title = "China GDP"
, subtitle = "lambda = -0.03446284"
) +
geom_smooth(method = "loess", formula = y ~ x, color="red", size = 0.5) +
theme_ipsum_es() +
scale_y_continuous(labels = scales::number)
cna_bc_plot
# the transformation removes a great deal of the variance and smooths out the exponential rise
# bunch of models
models <- cna_gdp %>%
model(
ets1 = ETS(gdp_bill ~ error("A") + trend("A") + season("N")) # no seasonality
, ets2 = ETS(gdp_bill ~ error("A") + trend("N") + season("N")) #remove the trend--mmm
, ets_damp = ETS(gdp_bill ~ error("A")+trend("Ad")+season("N"))
, ets_bc = ETS(box_cox(gdp_bill, lamb_cna))
, ets_bc_damp = ETS(box_cox(gdp_bill, lamb_cna) ~ trend("Ad"))
, ets_log = ETS(log(gdp_bill))
)
models %>%
forecast(h = 15) %>%
autoplot(cna_gdp, level = NULL) +
labs(x = "Date"
, y = "GDP in Billions of $"
, title = "China GDP"
#, subtitle = "lambda = -0.03446284"
) +
#geom_smooth(method = "loess", formula = y ~ x, color="red", size = 0.8) +
theme_ipsum_es() +
scale_y_continuous(labels = scales::dollar)
models %>% accuracy() %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| Country | .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|---|
| China | ets1 | Training | 3.516724 | 323.9554 | 243.2503 | 0.2992512 | 3.274707 | 0.2820868 | 0.3498425 | 0.1037441 |
| China | ets2 | Training | 790.789928 | 886.6516 | 790.7899 | 11.4011968 | 11.401197 | 0.9170446 | 0.9575033 | 0.0837944 |
| China | ets_damp | Training | 4.086000 | 314.6352 | 247.7478 | -0.1059686 | 3.412736 | 0.2873023 | 0.3397775 | 0.0022402 |
| China | ets_bc | Training | -77.673697 | 354.9053 | 232.8493 | -0.5934381 | 2.921971 | 0.2700252 | 0.3832655 | 0.2097398 |
| China | ets_bc_damp | Training | -24.224104 | 329.0923 | 233.3626 | -0.1392523 | 3.003210 | 0.2706205 | 0.3553898 | 0.1543044 |
| China | ets_log | Training | -1.501552 | 306.6668 | 222.6429 | -0.0617449 | 2.850769 | 0.2581893 | 0.3311723 | -0.0389640 |
8.7
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
gas_aus <- aus_production %>%
filter(Quarter > yearquarter("1980 Q1")) %>%
select(Quarter, Gas)
gas_aus %>%
autoplot(Gas) +
labs(y = "Petajouls"
, x = "Date"
, title = "Gas Production in Australia") +
theme_ipsum_es() +
geom_smooth(formula = y ~ x, method = "loess", size = 0.5)# Model
models_gas <-
gas_aus %>%
model(
gas_m = ETS(Gas ~ error("M") + trend("A") + season("M"))
, gas_damp = ETS(Gas ~ error("A") + trend("Ad") + season("M"))
, gas_damp_sm = ETS(Gas ~ error("A") + trend("Ad",phi=0.90) + season("M"))
)
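The exercise also asks whether damping improves the forecasts; one quick check (a sketch, not in the original run) is to compare the training accuracy of the three fitted models:
models_gas %>%
  accuracy() %>%
  select(.model, RMSE, MAE, MAPE)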
gas_fc <-
models_gas %>%
forecast(h = 10)
gas_fc %>%
autoplot(gas_aus, level = NULL) +
labs(y = "Petajouls", title = "Gas Production in Australia"
, x = "Date") +
facet_grid(.model~.)
8.8
Recall your retail time series data (from Exercise 8 in Section 2.10).
8.8.a.
Why is multiplicative seasonality necessary for this series?
Answer
Multiplicative seasonality is needed because the seasonal variations grow with the level of the series; the multiplicative form scales the seasonal component with the level, so this changing variation is captured over time.
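For reference (the textbook’s Holt-Winters forecast equations, not part of the original write-up), with seasonal period m and k the integer part of (h−1)/m:
additive: ŷ_{t+h|t} = ℓ_t + h b_t + s_{t+h−m(k+1)}
multiplicative: ŷ_{t+h|t} = (ℓ_t + h b_t) s_{t+h−m(k+1)}
so in the multiplicative case the seasonal index scales with the level rather than adding a fixed amount.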
#a. quick look
set.seed(54321)
aus_turnover <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
autoplot(aus_turnover, Turnover) +
labs(title = "Turnover by Month") +
theme_ipsum_es() +
geom_smooth(formula = y ~ x, method = "loess", color="red", size = 0.5) +
scale_y_continuous(labels = scales::number)
8.8.b.
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
Answer
Three variants were fitted below: an additive-error model with multiplicative seasonality (A,A,M), a damped-trend version (A,Ad,M), and a damped version with φ fixed at 0.95. Five-year forecasts from each are plotted for comparison.
#b.
models_aus_turnover <- aus_turnover %>%
model(
multiplicative = ETS(Turnover ~ error("A") + trend("A") + season("M")),
add_damped = ETS(Turnover ~ error("A") + trend("Ad") + season("M")),
add_damped_0.95 = ETS(Turnover ~ error("A") + trend("Ad",phi=0.95) + season("M"))
)
fc_aus_turnover <- models_aus_turnover %>%
forecast(h = 60)
(fc_aus_turnover %>%
autoplot(aus_turnover, level = NULL) +
labs(y = "Turnover", title = "Turnover by Month") +
facet_grid(.model~.))
8.8.c.
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
Answer
The RMSE was slightly lower for the damped-trend model than for the plain multiplicative model. It is hard to say which I prefer based on these results alone; I would probably choose the one with the lower RMSE, but the multiplicative model is not far off.
# c
test_models_aus_turnover<-
models_aus_turnover %>%
accuracy() %>%
select(.model,RMSE)
test_models_aus_turnover %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| .model | RMSE |
|---|---|
| multiplicative | 3.023802 |
| add_damped | 2.976607 |
| add_damped_0.95 | 3.092881 |
test_models_aus_turnover
## # A tibble: 3 x 2
## .model RMSE
## <chr> <dbl>
## 1 multiplicative 3.02
## 2 add_damped 2.98
## 3 add_damped_0.95 3.09
8.8.d.
Check that the residuals from the best method look like white noise.
Answer
There appear to be some anomalies, with large residuals around January 2010. The ACF still shows seasonal autocorrelation at lag 12, so the residuals are not quite white noise.
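Before the residual plots, a more formal check could be a Ljung-Box test (a sketch, not in the original run; newer fabletools versions expose the innovation residuals as .innov, older ones as .resid):
aus_turnover %>%
  model(damped = ETS(Turnover ~ error("A") + trend("Ad") + season("M"))) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 24)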
(aus_t_resid <- aus_turnover %>%
model(damped = ETS(Turnover ~ error("A") + trend("Ad") + season("M"))) %>%
gg_tsresiduals() +
labs(title = "Residuals, Damped Model"))8.8.e.
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
Answer
The three models give quite different RMSE values; based on this metric I would choose either the damped model or the multiplicative one, and both clearly beat the seasonal naïve benchmark. I experimented with a damped trend at φ = 0.95 and the results were acceptable; finding the optimal φ might work better for a series with changing variance. Note that the "test" RMSE here comes from refitting the model on the test period; a sketch of the more conventional evaluation follows the comparison table below.
#e. quick look
## train data, split prior to outliers in 2010
aus_turnover_train <- aus_turnover %>%
filter(year(Month) < 2011)
model_train_aus_turnover <- aus_turnover_train %>% # fit the model to the training set
model(add_damped_train = ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
## "test" the model
aus_turnover_test <- aus_turnover %>%
filter(year(Month) > 2010) # hold out the period from 2011 onward
model_test_aus_turnover <- aus_turnover_test %>% # note: this refits the same spec on the test period
model(add_damped_test = ETS(Turnover ~ error("A") + trend("Ad") + season("M")))
# vs exercise 7 from section 5.11
model_5.11 <- aus_turnover_train %>%
model(SNAIVE(Turnover))
## Compare
aus_train_turnover <- model_train_aus_turnover %>%
accuracy() %>%
select(.model,RMSE)
aus_test_turnover <- model_test_aus_turnover %>%
accuracy() %>%
select(.model,RMSE)
mod_5.11 <- model_5.11 %>%
accuracy() %>%
select(.model,RMSE)
rbind(aus_train_turnover,aus_test_turnover, mod_5.11) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| .model | RMSE |
|---|---|
| add_damped_train | 2.456220 |
| add_damped_test | 3.606061 |
| SNAIVE(Turnover) | 5.637434 |
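A sketch (not in the original run) of the more conventional test-set evaluation: forecast the held-out period from the trained model and score those forecasts against the actuals.
fc_test <- model_train_aus_turnover %>%
  forecast(new_data = aus_turnover_test)   # forecast the held-out months
fc_test %>%
  accuracy(aus_turnover) %>%               # compare forecasts with the actual series
  select(.model, RMSE)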
8.9
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
Answer
Using the same training data and judging by RMSE, the models fitted to the Box-Cox-transformed series appear to perform considerably better. Note, however, that their RMSE is measured on the transformed scale, so it is not directly comparable to the Holt-Winters RMSE, which is on the original Turnover scale. A sketch of an STL + ETS model on the seasonally adjusted data, evaluated on the test set, follows the table below.
lambda_turnover <-
aus_turnover %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
aus_turnover_trans <- aus_turnover_train %>%
mutate(bc = box_cox(Turnover,lambda_turnover))
stl_ets_holt <- aus_turnover_trans %>%
model("STL(BoxCox)" = STL(bc ~ season(window="periodic"),robust=T)
,"ETS (BoxCox)" = ETS(bc)
)
#different response variable- separate
holts <- aus_turnover_trans %>%
model("Holt Winters Damp" =ETS(Turnover ~ error("M") + trend("Ad") + season("M")))
rbind(accuracy(stl_ets_holt),accuracy(holts)) %>%
kbl()%>%
kable_material(c("striped","hover", full_width = T))| State | Industry | .model | .type | ME | RMSE | MAE | MPE | MAPE | MASE | RMSSE | ACF1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| South Australia | Clothing retailing | STL(BoxCox) | Training | -0.0017812 | 0.0426149 | 0.0313928 | -0.0821430 | 1.100718 | 0.3788116 | 0.4093824 | 0.1646267 |
| South Australia | Clothing retailing | ETS (BoxCox) | Training | -0.0003438 | 0.0507727 | 0.0380585 | -0.0467375 | 1.337469 | 0.4592464 | 0.4877505 | 0.0940660 |
| South Australia | Clothing retailing | Holt Winters Damp | Training | 0.2263411 | 2.5082303 | 1.6952146 | 0.1134505 | 5.298609 | 0.4275916 | 0.4449241 | 0.0612196 |
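For an STL decomposition followed by ETS on the seasonally adjusted data, as the exercise literally asks, a sketch (not in the original run) using fabletools’ decomposition_model(); it assumes the objects above exist and that the Box-Cox transformation is back-transformed automatically when forecasting:
stl_ets_adj <- aus_turnover_train %>%
  model(stl_ets = decomposition_model(
    STL(box_cox(Turnover, lambda_turnover) ~ season(window = "periodic"), robust = TRUE),
    ETS(season_adjust ~ error("A") + trend("Ad") + season("N"))   # ETS on the seasonally adjusted component
  ))
stl_ets_adj %>%
  forecast(new_data = aus_turnover_test) %>%
  accuracy(aus_turnover) %>%      # test-set accuracy on the original Turnover scale
  select(.model, RMSE)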