Data 624 HW 5
9/24/2021
Chapter 8 Exercises
8.1
Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.
- Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.
I kept getting this error when using ETS(): 1 error encountered for ETS(Count ~ error("A") + trend("N") + seasonal("N")) [1] Exogenous regressors are not supported for this model type. The likely cause is that fable's ETS() expects season(), not seasonal(), so seasonal("N") gets treated as an exogenous regressor. I fell back on the ses() function from the forecast package instead.
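Assuming that spelling is the problem, a minimal sketch of the corrected fable call (the filtering pipeline and object name are assumptions about how the Victorian pigs series was built):

library(fpp3)

vic_pigs_fit <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria") %>%
  model(ETS(Count ~ error("A") + trend("N") + season("N")))

report(vic_pigs_fit)                # optimal alpha and l[0]
vic_pigs_fit %>% forecast(h = 4)    # forecasts for the next four months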
## Simple exponential smoothing
##
## Call:
## ses(y = vpigs$Count)
##
## Smoothing parameters:
## alpha = 0.322
##
## Initial states:
## l = 100646.6098
##
## sigma: 9353.115
##
## AIC AICc BIC
## 13737.10 13737.14 13750.07
Using the ses() function from the forecast package (loaded with fpp2), we obtain alpha = 0.322 and ℓ0 = 100646.61.
Plotting the forecast for 4 months:
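A sketch of how the fit and the four-month forecast plot could be produced (the data-prep step and object names are assumptions; the ses() call matches the output above):

library(fpp2)   # ses(), autoplot(), accuracy()
library(fpp3)   # the aus_livestock tsibble

vpigs <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria")

fc <- ses(vpigs$Count, h = 4)   # simple exponential smoothing, four-month horizon
summary(fc)                     # reports alpha, l, and the point forecasts
autoplot(fc)                    # series plus the forecasts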
- Compute a 95% prediction interval for the first forecast using y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
95% prediction interval produced by R (from the ses() forecast):
Upper Interval = 113518.5
## 95%
## 113518.5
Lower Interval = 76855.01
## 95%
## 76855.01
Manual interval, y ± 1.96s with s = standard deviation of the residuals:
## [1] 113502.3
## [1] 76871.23
The manual bounds (76871.23, 113502.3) are nearly identical to R's interval (76855.01, 113518.5); the small difference comes from R's sigma estimate differing slightly from sd(residuals).
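A sketch of the manual calculation, reusing the fc object from the sketch above:

s <- sd(residuals(fc))    # standard deviation of the residuals
fc$mean[1] + 1.96 * s     # manual upper bound for the first forecast
fc$mean[1] - 1.96 * s     # manual lower bound
fc$upper[1, "95%"]        # upper bound produced by R
fc$lower[1, "95%"]        # lower bound produced by R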
8.5
Data set global_economy contains the annual Exports from many countries. Select one country to analyse.
- Plot the Exports series and discuss the main features of the data.
There is no real trend or seasonality. The Dominican Republic is a third world country with a history of corruption and inept political leaders, and the ups and downs in exports correspond to the competence of the government and political party in charge at the time.
- Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.
- Compute the RMSE values for the training data.
## [1] 4.168135
- Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.
RMSE value for the Holt model
## [1] 4.16867
The RMSE values are very similar, but the simple model is slightly better. I believe that is due to the erratic nature of the data: Holt's model is intended for data with some trend, and the exports of the Dominican Republic do not follow a clear trend.
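A sketch of how the two fits and their training RMSEs could be produced (the object names and data-prep pipeline are assumptions; global_economy comes from fpp3, while ses(), holt(), and accuracy() come from the forecast package via fpp2):

dr_exports <- global_economy %>%
  filter(Country == "Dominican Republic") %>%
  pull(Exports) %>%
  ts(start = 1960)              # annual exports as a ts object

fit_ses  <- ses(dr_exports)     # equivalent to ETS(A,N,N)
fit_holt <- holt(dr_exports)    # equivalent to ETS(A,A,N)

accuracy(fit_ses)[, "RMSE"]     # training RMSE, simple model
accuracy(fit_holt)[, "RMSE"]    # training RMSE, Holt's model

autoplot(dr_exports) +
  autolayer(fit_ses, series = "ETS(A,N,N)", PI = FALSE) +
  autolayer(fit_holt, series = "ETS(A,A,N)", PI = FALSE)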
- Compare the forecasts from both methods. Which do you think is best?
Visually, the two sets of forecasts look nearly identical.
Looking at the values below, they are very close but not exact; in this case I do not think either model offers a clear advantage over the other.
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 59 24.87050 19.33380 30.40719 16.40285 33.33814
## 60 24.88703 18.34283 31.43124 14.87854 34.89553
## 61 24.90357 17.48723 32.31990 13.56126 36.24588
## 62 24.92010 16.72368 33.11653 12.38475 37.45545
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 559 95186.77 83200.27 107173.3 76855.01 113518.5
## 560 95186.77 82594.29 107779.3 75928.23 114445.3
## 561 95186.77 82016.16 108357.4 75044.05 115329.5
## 562 95186.77 81462.36 108911.2 74197.09 116176.5
- Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.
Simple Model
## [1] 113502.3
## [1] 76871.23
Holt Model
## [1] 33.11237
## [1] 16.62862
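A sketch of the manual 95% intervals for the first forecast of each model, using the training RMSEs and reusing the objects above:

rmse_ses  <- accuracy(fit_ses)[, "RMSE"]
rmse_holt <- accuracy(fit_holt)[, "RMSE"]

fit_ses$mean[1]  + c(-1, 1) * 1.96 * rmse_ses    # manual interval, simple model
fit_holt$mean[1] + c(-1, 1) * 1.96 * rmse_holt   # manual interval, Holt's model

fit_ses$lower[1, "95%"];  fit_ses$upper[1, "95%"]    # R's interval, simple model
fit_holt$lower[1, "95%"]; fit_holt$upper[1, "95%"]   # R's interval, Holt's model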
8.6
Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.
Filter China’s GDP data
Plot China’s GDP
## Plot variable not specified, automatically selected `.vars = GDP`
We see a major upward trend in China’s GDP.
Let’s look at the various forecasts
Holt Model
Holt Model with Damped Trend
Box-Cox transformation
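A sketch of how these three fits could be specified with fable's ETS() (the object names, the guerrero choice of lambda, and the 50-year horizon are assumptions):

china_gdp <- global_economy %>%
  filter(Country == "China")

lambda <- china_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit <- china_gdp %>%
  model(
    holt    = ETS(GDP ~ error("A") + trend("A") + season("N")),
    damped  = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
    box_cox = ETS(box_cox(GDP, lambda) ~ error("A") + trend("A") + season("N"))
  )

fit %>%
  forecast(h = "50 years") %>%
  autoplot(china_gdp, level = NULL)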
Evaluating the models' RMSE values
Holt RMSE
## [1] 189990265538
Holt Damped RMSE
## [1] 190206597624
Box-Cox RMSE
## [1] 288333700735
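A sketch of how the training RMSEs could be extracted from the fit object above:

fit %>%
  accuracy() %>%
  select(.model, RMSE)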
The large RMSE values mostly reflect the scale of the data: China has seen explosive economic growth and its GDP is now measured in trillions of dollars, so training errors on the order of 10^11 are not surprising. The Holt forecast has the lowest RMSE; I wonder whether that would change over a shorter time frame. Holt's model performed best, I suspect, because of the clear upward trend in the data. Is such sharp, prolonged growth sustainable for any country? Only time will tell.
8.7
Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?
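A guess at how the data could be prepared and plotted for this exercise (the select() step and the object name aussie_gas are assumptions; aus_production comes from fpp3):

library(fpp3)

aussie_gas <- aus_production %>%
  select(Gas)    # quarterly gas production; the Quarter index is kept automatically

aussie_gas %>% autoplot()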
## Plot variable not specified, automatically selected `.vars = Gas`
fit <- aussie_gas %>%
  model(
    Damp = ETS(Gas ~ error("M") + trend("Ad") + season("M")),
    multiplicative = ETS(Gas ~ error("M") + trend("A") + season("M"))
  )
fc <- fit %>% forecast(h = "3 years")
fc %>%
  autoplot(aussie_gas, level = NULL) +
  labs(title = "Australian Gas") +
  guides(colour = guide_legend(title = "Forecast"))
The multiplicative method is preferable when the seasonal variations change in proportion to the level of the series.
damped_gas <- aussie_gas %>%
  model(ETS(Gas ~ error("M") + trend("Ad") + season("M")))
report(damped_gas)
## Series: Gas
## Model: ETS(M,Ad,M)
## Smoothing parameters:
## alpha = 0.6489044
## beta = 0.1551275
## gamma = 0.09369372
## phi = 0.98
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3]
## 5.858941 0.09944006 0.9281912 1.177903 1.07678 0.8171255
##
## sigma^2: 0.0033
##
## AIC AICc BIC
## 1684.028 1685.091 1717.873
multi_gas <- aussie_gas %>%
  model(ETS(Gas ~ error("M") + trend("A") + season("M")))
report(multi_gas)
## Series: Gas
## Model: ETS(M,A,M)
## Smoothing parameters:
## alpha = 0.6528545
## beta = 0.1441675
## gamma = 0.09784922
##
## Initial states:
## l[0] b[0] s[0] s[-1] s[-2] s[-3]
## 5.945592 0.07062881 0.9309236 1.177883 1.074851 0.8163427
##
## sigma^2: 0.0032
##
## AIC AICc BIC
## 1680.929 1681.794 1711.389
The undamped Holt-Winters multiplicative model, ETS(M,A,M), returns a lower AIC of 1680.93 versus 1684.03 for the damped model, so damping the trend does not improve the fit here, although the margin is small.
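A sketch of comparing the two models side by side, using the fit object defined earlier in this section:

fit %>%
  glance() %>%
  select(.model, AIC, AICc, BIC)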
8.8
Recall your retail time series data (from Exercise 8 in Section 2.10).
- Why is multiplicative seasonality necessary for this series?
## Plot variable not specified, automatically selected `.vars = Turnover`
The multiplicative method is preferable because the seasonal variation grows as the level of the series increases.
- Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
Holt-Winters’ Method & Damped Trend Method Graphed
holt_retail <- hw(m2, seasonal = "multiplicative")
holt_retail_damped <- hw(m2, seasonal = "multiplicative", damped = TRUE)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = m2, seasonal = "multiplicative")
##
## Smoothing parameters:
## alpha = 0.4041
## beta = 0.0016
## gamma = 0.1343
##
## Initial states:
## l = 42.4412
## b = 0.2966
## s = 0.9283 0.9113 0.9608 1.4731 1.0479 1.1274
## 0.9066 0.9237 0.9198 0.9167 1.0395 0.8448
##
## sigma: 0.0485
##
## AIC AICc BIC
## 4430.931 4432.378 4500.445
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.556308 8.563755 6.111975 0.2420252 3.711658 0.4842085 0.1429854
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2019 381.1633 357.4486 404.8781 344.8947 417.4319
## Feb 2019 353.5683 329.8322 377.3044 317.2671 389.8695
## Mar 2019 379.2880 352.0813 406.4947 337.6789 420.8970
## Apr 2019 361.8308 334.3068 389.3547 319.7365 403.9250
## May 2019 380.7616 350.2291 411.2941 334.0662 427.4570
## Jun 2019 367.5053 336.5904 398.4203 320.2249 414.7857
## Jul 2019 377.1744 344.0233 410.3255 326.4741 427.8746
## Aug 2019 387.6184 352.1419 423.0949 333.3618 441.8750
## Sep 2019 381.6034 345.3395 417.8673 326.1426 437.0642
## Oct 2019 403.2007 363.5158 442.8856 342.5079 463.8935
## Nov 2019 421.5746 378.6926 464.4565 355.9922 487.1569
## Dec 2019 544.9697 487.7906 602.1488 457.5219 632.4176
## Jan 2020 390.7328 347.6189 433.8467 324.7958 456.6698
## Feb 2020 362.4267 321.3548 403.4985 299.6127 425.2406
## Mar 2020 388.7711 343.5793 433.9629 319.6562 457.8860
## Apr 2020 370.8588 326.6894 415.0282 303.3075 438.4101
## May 2020 390.2425 342.6713 437.8138 317.4886 462.9964
## Jun 2020 376.6374 329.6893 423.5855 304.8365 448.4383
## Jul 2020 386.5276 337.3031 435.7521 311.2452 461.8099
## Aug 2020 397.2110 345.5718 448.8501 318.2356 476.1863
## Sep 2020 391.0279 339.1717 442.8841 311.7206 470.3352
## Oct 2020 413.1384 357.2882 468.9885 327.7229 498.5538
## Nov 2020 431.9440 372.4581 491.4299 340.9681 522.9199
## Dec 2020 558.3472 480.0602 636.6342 438.6176 678.0768
##
## Forecast method: Damped Holt-Winters' multiplicative method
##
## Model Information:
## Damped Holt-Winters' multiplicative method
##
## Call:
## hw(y = m2, seasonal = "multiplicative", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.5874
## beta = 0.0094
## gamma = 1e-04
## phi = 0.98
##
## Initial states:
## l = 42.3409
## b = 0.3287
## s = 0.9589 0.898 0.9741 1.387 1.0606 1.0185
## 0.9689 0.9769 0.9527 0.9262 0.9617 0.9164
##
## sigma: 0.0448
##
## AIC AICc BIC
## 4360.565 4362.186 4434.168
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7814484 7.918222 5.671396 0.3377205 3.403172 0.4493045
## ACF1
## Training set -0.04057024
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2019 375.5931 354.0237 397.1624 342.6055 408.5806
## Feb 2019 346.5790 323.3995 369.7586 311.1290 382.0291
## Mar 2019 370.3907 342.4669 398.3145 327.6850 413.0964
## Apr 2019 354.2925 324.7998 383.7851 309.1873 399.3976
## May 2019 372.1102 338.3926 405.8279 320.5436 423.6769
## Jun 2019 358.6404 323.6359 393.6450 305.1056 412.1753
## Jul 2019 369.2106 330.7045 407.7167 310.3206 428.1005
## Aug 2019 378.8800 336.9249 420.8350 314.7152 443.0447
## Sep 2019 376.0461 332.0608 420.0313 308.7764 443.3157
## Oct 2019 395.5815 346.9155 444.2474 321.1534 470.0096
## Nov 2019 412.2287 359.0816 465.3759 330.9472 493.5103
## Dec 2019 539.5623 466.8860 612.2387 428.4135 650.7112
## Jan 2020 379.1443 325.9326 432.3560 297.7641 460.5245
## Feb 2020 349.7876 298.7566 400.8185 271.7425 427.8327
## Mar 2020 373.7482 317.1848 430.3116 287.2420 460.2545
## Apr 2020 357.4372 301.4250 413.4493 271.7740 443.1004
## May 2020 375.3444 314.5421 436.1466 282.3553 468.3334
## Jun 2020 361.6927 301.2159 422.1695 269.2014 454.1840
## Jul 2020 372.2875 308.1225 436.4525 274.1556 470.4194
## Aug 2020 381.9719 314.1944 449.7494 278.3152 485.6286
## Sep 2020 379.0512 309.8850 448.2175 273.2706 484.8319
## Oct 2020 398.6772 323.9450 473.4094 284.3842 512.9703
## Nov 2020 415.3880 335.4759 495.3000 293.1730 537.6029
## Dec 2020 543.6118 436.3780 650.8457 379.6118 707.6119
- Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
I tried multiple functions to generate residuals and RMSE values. Using the ETS() function I could get an RMSE, but the fit did not expose residuals the way I needed, so I used the hw() function, which satisfied both criteria. Using both ETS() and hw(), the preferred model was still the damped model.
holt_retail <- hw(m2, h = 1, seasonal = "multiplicative")
holt_retail_damped <- hw(m2, h = 1, seasonal = "multiplicative", damped = TRUE)
RMSE Value for Holt-Winters
## [1] 8.563755
RMSE Value for Damped Holt-Winters
## [1] 7.918222
The damped method returns a lower RMSE, making it the model I would select of the two.
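A sketch of how the two training RMSEs could be pulled from the fits above:

accuracy(holt_retail)[, "RMSE"]          # Holt-Winters' multiplicative
accuracy(holt_retail_damped)[, "RMSE"]   # damped variant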
- Check that the residuals from the best method look like white noise.
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 85.058, df = 7, p-value = 1.221e-15
##
## Model df: 17. Total lags used: 24
The residual histogram looks roughly normal, but the ACF plot has several significant spikes and the Ljung-Box test strongly rejects the white-noise hypothesis (p ≈ 1.2e-15). Given what we already know about the seasonality in this series, I would not call the residuals white noise.
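The Ljung-Box output above matches what checkresiduals() from the forecast package prints; a sketch of the check:

checkresiduals(holt_retail_damped)   # time plot, ACF, histogram, and Ljung-Box test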
- Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?
train_retail <- window(m2, end = c(2010, 12))   # keep training data through December 2010
test_retail <- window(m2, start = 2011)
train_holtw <- hw(train_retail, h = 108, seasonal = "multiplicative")
train_damped <- hw(train_retail, h = 108, seasonal = "multiplicative", damped = TRUE)
seasonal_train <- snaive(train_retail)
Plot all forecasts
masterplot <- autoplot(m2) +
  autolayer(train_holtw, series = "Holt-Winters M") +
  autolayer(train_damped, series = "Damped HW") +
  autolayer(seasonal_train, series = "Seasonal Naive") +
  autolayer(test_retail, series = "Test Data")
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
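The accuracy tables below were presumably produced by comparing each set of forecasts with the test data; a sketch:

accuracy(train_holtw, test_retail)      # Holt-Winters' multiplicative vs test
accuracy(train_damped, test_retail)     # damped Holt-Winters vs test
accuracy(seasonal_train, test_retail)   # seasonal naive vs test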
Check Values
Holt-Winters vs Test
## ME RMSE MAE MPE MAPE MASE
## Training set 0.368935 6.24345 4.660297 0.1346397 3.777011 0.4468246
## Test set 41.504127 44.75497 41.859534 11.6880429 11.824242 4.0134499
## ACF1 Theil's U
## Training set 0.04216116 NA
## Test set 0.52698521 1.005545
HW Damped vs Test
## ME RMSE MAE MPE MAPE MASE
## Training set 0.6928938 5.943741 4.397592 0.4200239 3.54584 0.4216367
## Test set 56.1214996 60.861036 56.439430 15.6403940 15.76460 5.4113555
## ACF1 Theil's U
## Training set -0.04939215 NA
## Test set 0.76478249 1.33378
Seasonal Naive vs Test
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 7.741615 13.80603 10.42981 5.978256 8.083253 1.000000 0.7329914
## Test set 46.015385 50.59015 47.92308 15.150892 15.866724 4.594816 0.2207417
## Theil's U
## Training set NA
## Test set 1.398029
Based on the test-set RMSEs above, the Holt-Winters multiplicative model (44.75) does beat the seasonal naive approach (50.59), while the damped version (60.86) does not. We are also predicting roughly eight years of future data, and over such a long horizon the simple seasonal naive approach remains surprisingly competitive.
8.9
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
stl_ets <- train_retail %>%
  stlf(
    h = 108,
    method = 'ets',
    etsmodel = 'MAM',
    lambda = BoxCox.lambda(train_retail)
  )
## Warning in forecast.stl(object, method = method, etsmodel = etsmodel,
## forecastfunction = forecastfunction, : The ETS model must be non-seasonal. I'm
## ignoring the seasonal component specified.
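The accuracy table below was presumably produced the same way as in the previous exercise:

accuracy(stl_ets, test_retail)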
## ME RMSE MAE MPE MAPE MASE
## Training set 0.01359351 5.207214 3.881942 -0.1148306 3.065566 0.3721967
## Test set -16.56812436 34.391625 26.871735 -4.4092926 7.540894 2.5764348
## ACF1 Theil's U
## Training set -0.01557101 NA
## Test set 0.82503585 0.7379409
Based on the RMSE values, this model beats the previous best: its test-set RMSE of 34.39 is lower than both the Holt-Winters multiplicative model (44.75) and the seasonal naive approach (50.59). The warning above appears because stlf() fits the ETS model to the seasonally adjusted series, so the seasonal part of etsmodel = 'MAM' is ignored. Honestly, I am not sure I implemented the solution correctly.