8.1, 8.5, 8.6, 8.7, 8.8, 8.9
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.4.3
## Registered S3 methods overwritten by 'ggtime':
## method from
## autolayer.fbl_ts fabletools
## autolayer.tbl_ts fabletools
## autoplot.dcmp_ts fabletools
## autoplot.fbl_ts fabletools
## autoplot.tbl_ts fabletools
## fortify.fbl_ts fabletools
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.3 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.2.0
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ ggtime 0.2.0
## ✔ lubridate 1.9.4 ✔ feasts 0.5.0
## ✔ ggplot2 4.0.2 ✔ fable 0.5.0
## Warning: package 'ggplot2' was built under R version 4.4.3
## Warning: package 'tsibble' was built under R version 4.4.3
## Warning: package 'ggtime' was built under R version 4.4.3
## Warning: package 'feasts' was built under R version 4.4.3
## Warning: package 'fabletools' was built under R version 4.4.3
## Warning: package 'fable' was built under R version 4.4.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
#8.1: Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
head(aus_livestock)
## # A tsibble: 6 x 4 [1M]
## # Key: Animal, State [1]
## Month Animal State Count
## <mth> <fct> <fct> <dbl>
## 1 1976 Jul Bulls, bullocks and steers Australian Capital Territory 2300
## 2 1976 Aug Bulls, bullocks and steers Australian Capital Territory 2100
## 3 1976 Sep Bulls, bullocks and steers Australian Capital Territory 2100
## 4 1976 Oct Bulls, bullocks and steers Australian Capital Territory 1900
## 5 1976 Nov Bulls, bullocks and steers Australian Capital Territory 2100
## 6 1976 Dec Bulls, bullocks and steers Australian Capital Territory 1800
unique(aus_livestock$Animal)
## [1] Bulls, bullocks and steers Calves
## [3] Cattle (excl. calves) Cows and heifers
## [5] Lambs Pigs
## [7] Sheep
## 7 Levels: Bulls, bullocks and steers Calves ... Sheep
unique(aus_livestock$State)
## [1] Australian Capital Territory New South Wales
## [3] Northern Territory Queensland
## [5] South Australia Tasmania
## [7] Victoria Western Australia
## 8 Levels: Australian Capital Territory New South Wales ... Western Australia
Fitlering by animal and state first:
vic_pigs <- aus_livestock |>
filter(Animal == "Pigs", State == "Victoria")
Exponential smoothing:
fit <- vic_pigs |>
model(SES = ETS(Count ~ error("A") + trend("N") + season("N")))
tidy(fit)
## # A tibble: 2 × 5
## Animal State .model term estimate
## <fct> <fct> <chr> <chr> <dbl>
## 1 Pigs Victoria SES alpha 0.322
## 2 Pigs Victoria SES l[0] 100647.
Forecast for the next four months
fit |> forecast(h = 4) |>
autoplot(vic_pigs)
Presenting the forecast
fc <- fit |> forecast(h=4)
fc
## # A fable: 4 x 6 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month
## <fct> <fct> <chr> <mth>
## 1 Pigs Victoria SES 2019 Jan
## 2 Pigs Victoria SES 2019 Feb
## 3 Pigs Victoria SES 2019 Mar
## 4 Pigs Victoria SES 2019 Apr
## # ℹ 2 more variables: Count <dist>, .mean <dbl>
Using the ETS() function I fit the simple exponential smoothing model to the number of pigs slaughtered in Victoria. The parameters were alpha 3.221247e-01 (smoothing parameter) and l[0] 1.006466e+05 (initial level). The forecast for the next 4 months is 95186.56 pigs each months. The forecasts are the same for all of the 4 months because this is a simple exponential smoothing model, so it doesn’t take into account any trend or seasonality.
Checking sum stats
report(fit)
## Series: Count
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.3221247
##
## Initial states:
## l[0]
## 100646.6
##
## sigma^2: 87480760
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
glance(fit)
## # A tibble: 1 × 11
## Animal State .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <fct> <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Pigs Victoria SES 8.75e7 -6866. 13737. 13737. 13750. 8.72e7 9.70e7 7190.
Converting sigma to standard deviation (square root of variance)
s <- sqrt(glance(fit)$sigma2)
Printing
s
## [1] 9353.115
Storing the forecasted value from the model, fc, as y-hat
y_hat <- 95186.56
Calculating lower and upper bounds of the interval
lower <- y_hat - 1.96 * s
upper <- y_hat + 1.96 * s
upper
## [1] 113518.7
lower
## [1] 76854.45
Checking for 95% prediction level for the earlier model, fc
hilo(fc, 95)
## # A tsibble: 4 x 7 [1M]
## # Key: Animal, State, .model [1]
## Animal State .model Month
## <fct> <fct> <chr> <mth>
## 1 Pigs Victoria SES 2019 Jan
## 2 Pigs Victoria SES 2019 Feb
## 3 Pigs Victoria SES 2019 Mar
## 4 Pigs Victoria SES 2019 Apr
## # ℹ 3 more variables: Count <dist>, .mean <dbl>, `95%` <hilo>
They are actually pretty much the same. The manual calculation rounded slightly differently, but overall both methods work.
Taking a look at the data.
head(global_economy)
## # A tsibble: 6 x 9 [1Y]
## # Key: Country [1]
## Country Code Year GDP Growth CPI Imports Exports Population
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan AFG 1960 537777811. NA NA 7.02 4.13 8996351
## 2 Afghanistan AFG 1961 548888896. NA NA 8.10 4.45 9166764
## 3 Afghanistan AFG 1962 546666678. NA NA 9.35 4.88 9345868
## 4 Afghanistan AFG 1963 751111191. NA NA 16.9 9.17 9533954
## 5 Afghanistan AFG 1964 800000044. NA NA 18.1 8.89 9731361
## 6 Afghanistan AFG 1965 1006666638. NA NA 21.4 11.3 9938414
For this question I will choose the United States to analyze.
us_exports <- global_economy |>
filter(Country == "United States")
Plotting the export series
autoplot(us_exports, Exports) +
labs(
title = "Exports for United States",
x = "Year",
y = "Exports") +
theme_light()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
Generally, appears to be an upward trend. The data does not repeaat in
any particular pattern, so I don’t think it is seasonal. It just looks
volatile as there are many noticeable fluctuations throughout the years.
It also appears to be going down at the most recent years.
Fitting the ETS model
fit_exports <- us_exports |>
model(ETS_ANN = ETS(Exports ~error("A") + trend("N") + season("N")))
report(fit_exports)
## Series: Exports
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.9998995
##
## Initial states:
## l[0]
## 4.76618
##
## sigma^2: 0.4002
##
## AIC AICc BIC
## 186.3596 186.8040 192.5409
Forecasting future values for next 5 years
fc_exports <- fit_exports |>
forecast(h=5)
Plotting the forecasts
fc_exports |>
autoplot(us_exports) +
labs(
title = "US Exports Forecasts",
x = "Year",
y = "Exports"
) +
theme_light()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
The model is predicting that the exports will be around 12 and will not
increase or decrease. The interval becomes more wide (less confidence)
as the time progresses. Overall, the model is not really capturing the
upwards trend in this case and is predicting a flat trend (or no trend)
since that was one of the parameters originally added to the model.
accuracy(fit_exports)
## # A tibble: 1 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United States ETS_ANN Training 0.125 0.627 0.465 1.37 5.10 0.990 0.992 0.239
The RMSE is 0.6270672 which means that the model’s fitted values are about 0.63 units away from the real export values. The small value suggests that the model fits relatively well.
Fitting the ETS(A,A,N) model (so now it will include trend)
fit_trend <- us_exports |>
model(ETS_ANN = ETS(Exports ~error("A") + trend("A") + season("N")))
Summary of the model
report(fit_trend)
## Series: Exports
## Model: ETS(A,A,N)
## Smoothing parameters:
## alpha = 0.9999
## beta = 0.0001088322
##
## Initial states:
## l[0] b[0]
## 4.669623 0.1158518
##
## sigma^2: 0.3992
##
## AIC AICc BIC
## 188.0974 189.2512 198.3996
Comparing training RMSE by fitting them together
fits <- us_exports |>
model(
ETS_ANN = ETS(Exports ~error("A") + trend("N") + season("N")),
ETS_AAN = ETS(Exports ~error("A") + trend("A") + season("N"))
)
accuracy(fits)
## # A tibble: 2 × 11
## Country .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United States ETS_ANN Trai… 0.125 0.627 0.465 1.37 5.10 0.990 0.992 0.239
## 2 United States ETS_AAN Trai… 0.0108 0.615 0.466 -0.0570 5.19 0.992 0.973 0.238
glance(fit_exports)
## # A tibble: 1 × 10
## Country .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United States ETS_ANN 0.400 -90.2 186. 187. 193. 0.386 0.972 NA
glance(fit_trend)
## # A tibble: 1 × 10
## Country .model sigma2 log_lik AIC AICc BIC MSE AMSE MAE
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 United States ETS_ANN 0.399 -89.0 188. 189. 198. 0.372 0.887 NA
Plotting
fc_trend <- fit_trend |>
forecast(h=5)
fc_trend |>
autoplot(us_exports) +
labs(
title = "Forecast of US Exports (with trend)",
x = "Year",
y = "Exports"
) +
theme_light()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
The fitted model with the trend (A) has a smaller RMSE value, which
means that it fits the historical data better than the model that does
not take trend into account. This is further shown in the display of the
plot where the prediction actually has an upward trend instead of just a
straight line. This model is more complex and takes trend into account,
which makes it ultimately, a better model for this data.
The forecasts for the two models are completely different. One shows an extremely flat prediction that will be the same for the upcoming years, and the other model shows an upward trend for the next few years. The main difference between the two is that one of the models takes trend into account and that is what ultimately makes it the better model. This data has a trend, a volatile upward trend, but an upward trend nonetheless. Therefore, taking trend into account is important for forecasting future data so for this case, the better model is the ETS(A,A,N), because it takes trend as a parameter.
Getting the forecasts for each model
fc_models <- fits |>
forecast(h=5)
fc_models
## # A fable: 10 x 5 [1Y]
## # Key: Country, .model [2]
## Country .model Year
## <fct> <chr> <dbl>
## 1 United States ETS_ANN 2018
## 2 United States ETS_ANN 2019
## 3 United States ETS_ANN 2020
## 4 United States ETS_ANN 2021
## 5 United States ETS_ANN 2022
## 6 United States ETS_AAN 2018
## 7 United States ETS_AAN 2019
## 8 United States ETS_AAN 2020
## 9 United States ETS_AAN 2021
## 10 United States ETS_AAN 2022
## # ℹ 2 more variables: Exports <dist>, .mean <dbl>
Storing RMSE values from the previous question
rmse_ann <- 0.6270672
rmse_aan <- 0.6149566
Calculating the internvals
yhat_ann <- 11.89068
yhat_aan <- 12.12253
lower_ann <- yhat_ann - 1.96 * rmse_ann
upper_ann <- yhat_ann + 1.96 * rmse_ann
lower_aan <- yhat_aan - 1.96 * rmse_aan
upper_aan <- yhat_aan + 1.96 * rmse_aan
lower_ann
## [1] 10.66163
upper_ann
## [1] 13.11973
lower_aan
## [1] 10.91722
upper_aan
## [1] 13.32784
Comparing with R internvals
hilo(fc_models, 95)
## # A tsibble: 10 x 6 [1Y]
## # Key: Country, .model [2]
## Country .model Year
## <fct> <chr> <dbl>
## 1 United States ETS_ANN 2018
## 2 United States ETS_ANN 2019
## 3 United States ETS_ANN 2020
## 4 United States ETS_ANN 2021
## 5 United States ETS_ANN 2022
## 6 United States ETS_AAN 2018
## 7 United States ETS_AAN 2019
## 8 United States ETS_AAN 2020
## 9 United States ETS_AAN 2021
## 10 United States ETS_AAN 2022
## # ℹ 3 more variables: Exports <dist>, .mean <dbl>, `95%` <hilo>
The manual calculations are relatively close to the actual values, they just differ due to some rounding happening in the background.
[Hint: use a relatively large value of h when forecasting, so you can clearly see the differences between the various options when plotting the forecasts.]
Filtering for China GDP for the model
china_gdp <- global_economy |>
filter(Country == "China")
Plotting
autoplot(china_gdp, GDP ) +
labs(
title = "China GDP Over Time",
x = "Years",
y = "GDP"
) +
theme_light()
Fitting a basic ETs model by letting R choose
fit_default <- china_gdp |>
model(Basic = ETS(GDP))
Fitting a damped trend model
fit_damped <- china_gdp |>
model(Damped = ETS(GDP ~trend("Ad")))
Fitting a box-cox model
fit_boxcox <- china_gdp |>
model(BoxCox = ETS(box_cox(GDP, 0)))
Forecasting for the next 25 years
fc_default <- forecast(fit_default, h = 25)
fc_damped <- forecast(fit_damped, h = 25)
fc_boxcox <- forecast(fit_boxcox, h = 25)
Plotting the model 1
fc_default |> autoplot(china_gdp) +
autolayer(fc_default, color = "blue") +
labs(
title = "China GDP Forecast",
x = "Years",
y = "GDP"
)
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
Plotting the model 2
fc_damped |> autoplot(china_gdp) +
autolayer(fc_damped, color = "green") +
labs(
title = "China GDP Forecast",
x = "Years",
y = "GDP"
)
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
Plotting the model 3
fc_boxcox |> autoplot(china_gdp) +
autolayer(fc_boxcox, color = "orange") +
labs(
title = "China GDP Forecast",
x = "Years",
y = "GDP"
)
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.
All of the forecasts show a strong upward trend as predictions. Both the
default and the damped models show a steady growth for the next 25
years, with the damped model being slightly slower. The boxcox model
shows a very steep growth as compared to the previous two models showing
the impact this transformation can have on the forecasts. I think the
damped model would be the most realistic of the two as it shows a steady
growth overtime.
Plotting gas production
autoplot(aus_production, Gas) +
labs(
title = "Gas Production (Australia)",
x = "Years",
y = "Gas") +
theme_light()
This plot shows a strong upward trend with a seasonal pattern. The
pattern starts of with small swings and then they beggin to grow with
the upward trend, therefore, multiplicative seasonality is necessary
here.
Fitting ETS
fit_gas <- aus_production |>
model(ETS(Gas))
report(fit_gas)
## Series: Gas
## Model: ETS(M,A,M)
## Smoothing parameters:
## alpha = 0.6528545
## beta = 0.1441675
## gamma = 0.09784922
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3]
## 5.945592 0.07062881 0.9309236 1.177883 1.074851 0.8163427
##
## sigma^2: 0.0032
##
## AIC AICc BIC
## 1680.929 1681.794 1711.389
Forecast model
fc_gas <- fit_gas |>
forecast(h=12)
plotting the forecast model
fc_gas |>
autoplot(aus_production) +
theme_light() +
labs(
title = "Gas Forecast",
x = "Years",
y = "Gas"
)
Damped model
fit_gas_damped <- aus_production |>
model(ETS(Gas~trend("Ad")))
report(fit_gas_damped)
## Series: Gas
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.6489044
## beta = 0.1551275
## gamma = 0.09369372
## phi = 0.98
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3]
## 5.858941 0.09944006 0.9281912 1.177903 1.07678 0.8171255
##
## sigma^2: 0.0033
##
## AIC AICc BIC
## 1684.028 1685.091 1717.873
Forecast damped model version
fc_gas_damped <- fit_gas_damped |>
forecast(h=12)
plotting damped model
fc_gas_damped |>
autoplot(aus_production) +
theme_light() +
labs(
title = "Gas Forecast Damped Model",
x = "Year",
y = "Gas"
)
Comparing the RMSE
fits_gas <- aus_production |>
model(
Default = ETS(Gas),
Damped = ETS(Gas~trend("Ad"))
)
accuracy(fits_gas)
## # A tibble: 2 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Default Training -0.115 4.60 3.02 0.199 4.08 0.542 0.606 -0.0131
## 2 Damped Training -0.00439 4.59 3.03 0.326 4.10 0.544 0.606 -0.0217
The damped model is better because the RMSE for it is lower, but not by a lot. There was not a colossal improvement, but an improvmenet nonetheless. Overall, the ETS model captured both, the trend and the seasonality very well in its predictions. The models matched the proportional growth of the series alongside the seasonal pattern.
unique(aus_retail$Industry)
## [1] "Cafes, restaurants and catering services"
## [2] "Cafes, restaurants and takeaway food services"
## [3] "Clothing retailing"
## [4] "Clothing, footwear and personal accessory retailing"
## [5] "Department stores"
## [6] "Electrical and electronic goods retailing"
## [7] "Food retailing"
## [8] "Footwear and other personal accessory retailing"
## [9] "Furniture, floor coverings, houseware and textile goods retailing"
## [10] "Hardware, building and garden supplies retailing"
## [11] "Household goods retailing"
## [12] "Liquor retailing"
## [13] "Newspaper and book retailing"
## [14] "Other recreational goods retailing"
## [15] "Other retailing"
## [16] "Other retailing n.e.c."
## [17] "Other specialised food retailing"
## [18] "Pharmaceutical, cosmetic and toiletry goods retailing"
## [19] "Supermarket and grocery stores"
## [20] "Takeaway food services"
unique(aus_retail$State)
## [1] "Australian Capital Territory" "New South Wales"
## [3] "Northern Territory" "Queensland"
## [5] "South Australia" "Tasmania"
## [7] "Victoria" "Western Australia"
retail <- aus_retail |>
filter(
Industry == "Cafes, restaurants and takeaway food services",
State == "New South Wales"
)
autoplot(retail, Turnover) +
theme_light() +
labs(
title = "Retail Turnover",
x = "Year",
y = "Turnover"
)
In the beginning there are smaller seasonal spikes and then they grow as the model grows (so basically multiplicative seasonality is happening here like in the previous question). The fluctuations are growing with the model.
fit_hw <- retail |>
model(
HW = ETS(Turnover~trend("A") + season("M"))
)
report(fit_hw)
## Series: Turnover
## Model: ETS(M,A,M)
## Smoothing parameters:
## alpha = 0.7729142
## beta = 0.004491741
## gamma = 0.0001004778
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
## 142.3738 0.4409842 0.994291 0.9301837 1.022985 1.140497 1.023531 1.01942
## s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
## 0.987387 0.9847622 0.9841327 0.944313 0.9872559 0.9812413
##
## sigma^2: 0.0015
##
## AIC AICc BIC
## 5253.739 5255.186 5323.253
Forecasting
fc_hw <- fit_hw |>
forecast(h=24)
plotting forecast
fc_hw |>
autoplot(retail) +
theme_light() +
labs(
title = "Retail Forecast with Holt-Winters Model",
x = "Years",
y = "Turnover"
)
Damped Holt-Winters Model
fit_hw_damped <- retail |>
model(
HW_Damped = ETS(Turnover~trend("Ad") + season("M"))
)
report(fit_hw_damped)
## Series: Turnover
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.7681617
## beta = 0.009440331
## gamma = 0.0001011594
## phi = 0.9799997
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3] s[-4] s[-5]
## 142.2247 1.499298 0.9910721 0.9273323 1.02064 1.138143 1.022578 1.022682
## s[-6] s[-7] s[-8] s[-9] s[-10] s[-11]
## 0.9925157 0.9886548 0.9869216 0.9442324 0.9864236 0.9788046
##
## sigma^2: 0.0015
##
## AIC AICc BIC
## 5259.854 5261.475 5333.457
Forecast Damped Model
fc_hw_damped <- fit_hw_damped |>
forecast(h=24)
fc_hw_damped |>
autoplot(retail) +
labs(
title = "Retail Forecast with Holt-Winters Model (Damped)",
x = "Years",
y = "Turnover"
)
Comparing Model Accuracy
fits_hw <- retail |>
model(
HW = ETS(Turnover~trend("A") + season("M")),
HW_Damped = ETS(Turnover~trend("Ad") + season("M"))
)
accuracy(fits_hw)
## # A tibble: 2 × 12
## State Industry .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 New So… Cafes, … HW Trai… 1.49 19.5 14.0 0.213 2.74 0.278 0.300 0.0527
## 2 New So… Cafes, … HW_Da… Trai… 2.22 19.4 13.9 0.298 2.76 0.278 0.299 0.0439
The damped model again has a lower RMSE, so it is a better fit in this case as well, especially considering that there is multiplicative seasonality.
Checking residuals
fit_hw_damped |>
gg_tsresiduals()
Ljung-Box
augment(fit_hw_damped) |>
features(.innov, ljung_box, lag = 24, dof = 4)
## # A tibble: 1 × 5
## State Industry .model lb_stat lb_pvalue
## <chr> <chr> <chr> <dbl> <dbl>
## 1 New South Wales Cafes, restaurants and takeaway food… HW_Da… 34.9 0.0209
The residuals are centered around 0, but I am not seeing a trend and the histogram is mostly following the normal bell curve. The ACF does show some spike though, so there might be some autocorrelation. With the Ljung-Box test, the p-value is less than 0.05, which means that the residuals are not white noise. There is some structure here that cannot be explained by the model.
Training model
train <- retail |>
filter(Month <= yearmonth("2010 Dec"))
Test set for the model
test <- retail |>
filter(Month > yearmonth("2010 Dec"))
Fitting the model
fit_train <- train |>
model(
HW_Damped = ETS(Turnover~trend("Ad") + season("M"))
)
Forecast model
h <- nrow(test)
fc_test <- fit_train |>
forecast(h = h)
Calculating RMSE
accuracy(fc_test, test)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 HW_Damped New … Cafes, … Test 174. 240. 191. 13.8 16.0 NaN NaN 0.961
Comparing to seasonal Naive Model
fit_snaive <- train |>
model(
SNAIVE = SNAIVE(Turnover)
)
fc_naive <- fit_snaive |>
forecast(h=h)
accuracy(fc_naive, test)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SNAIVE New Sou… Cafes, … Test 223. 286. 232. 18.4 19.5 NaN NaN 0.965
Holt-Winters Damp model beats the Naive model as it is able to better capture the trend and the multiplicative seasonality. The model has smaller errors overall.
Fitting the STL Decomposition on Box-Cox Serieis
fit_stl <- train |>
model(
STL_ETS = decomposition_model(
STL(box_cox(Turnover, 0)),
ETS(season_adjust ~error("A") + trend("A") + season("N"))
)
)
Forecast
h <- nrow(test)
fc_stl <- fit_stl |>
forecast(h=h)
Calculating RMSE
accuracy(fc_stl, test)
## # A tibble: 1 × 12
## .model State Industry .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 STL_ETS New So… Cafes, … Test -80.6 90.3 80.6 -7.80 7.80 NaN NaN 0.784
HW_Damped RMSE is 239.6895
This shows that the STL model is a much bigger improvement. The RMSE is so much lower, almost 3 times. This means that the model is able to stabilize variance, adjust to the trend and noise, and take into account seasonality much better. This model is evidently the best method for forecasting this data.