Data 624 HW 5

9/24/2021

Gabe Abreu

Chapter 8 Exercises

8.1

Consider the the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0, and generate forecasts for the next four months.

Kept getting this error using ETS(): 1 error encountered for ETS(Count ~ error(“A”) + trend(“N”) + seasonal(“N”)) [1] Exogenous regressors are not supported for this model type.

fit <- ses(vpigs$Count)

fit$model

## Simple exponential smoothing 
## 
## Call:
##  ses(y = vpigs$Count) 
## 
##   Smoothing parameters:
##     alpha = 0.322 
## 
##   Initial states:
##     l = 100646.6098 
## 
##   sigma:  9353.115
## 
##      AIC     AICc      BIC 
## 13737.10 13737.14 13750.07

Using the ses package from the fpp2 library, we calculate alpha = 0.322 and l = 100646.6098

Plotting the forecast for 4 months:

ses(vpigs$Count, h = 4) %>% autoplot()

Compute a 95% prediction interval for the first forecast using y±1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

model <- ses(vpigs$Count, h = 4)

95% prediction interval

Upper Interval = 113518.5

model$upper[1, '95%']

##      95% 
## 113518.5

Lower Interval = 76855.01

model$lower[1, '95%']

##      95% 
## 76855.01

s <- sd(model$residuals)
model$mean[1] + 1.96 * s

## [1] 113502.3

model$mean[1] - 1.96 * s

## [1] 76871.23

The difference in the upper and lower bounds is minimal.

8.5

Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

Plot the Exports series and discuss the main features of the data.

domR <- global_economy %>% filter(Country == "Dominican Republic")

domR %>% autoplot(Exports)

There is no real trend or seasonality. The Dominican Republic is a third world country with a history of corruption and inept political leaders. The up and downs in exports correspond to the competency of the government and political party in charge of the time.

Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

model1 <- domR$Exports %>% ets(model="ANN") %>% forecast(h = 4)

autoplot(model1)

Compute the RMSE values for the training data.

accuracy(model1)[2]

## [1] 4.168135

Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

holt_model <- domR$Exports %>% ets(model="AAN") %>% forecast(h = 4)

RMSE value for holt model

accuracy(holt_model)[2]

## [1] 4.16867

The RMSE values are very simlar, but the simple model is slightly better. I believe that’s due to the erratic nature of the data, Holt’s model is for data with some trend. The exports of the Dominican Republic doesn’t follow a clear trend.

Compare the forecasts from both methods. Which do you think is best?

autoplot(model)

autoplot(holt_model)

Visually the models look exactly the same.

Looking at the values below, they’re pretty much similar but not exact. In this case, I don’t think either model offers an advantage over the other.

holt_model

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 59       24.87050 19.33380 30.40719 16.40285 33.33814
## 60       24.88703 18.34283 31.43124 14.87854 34.89553
## 61       24.90357 17.48723 32.31990 13.56126 36.24588
## 62       24.92010 16.72368 33.11653 12.38475 37.45545

model

##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 559       95186.77 83200.27 107173.3 76855.01 113518.5
## 560       95186.77 82594.29 107779.3 75928.23 114445.3
## 561       95186.77 82016.16 108357.4 75044.05 115329.5
## 562       95186.77 81462.36 108911.2 74197.09 116176.5

Calculate a 95% prediction interval for the first forecast for each model, using the RMSE values and assuming normal errors. Compare your intervals with those produced using R.

Simple Model

s_model <- sd(model$residuals)
model$mean[1] + 1.96 * s_model

## [1] 113502.3

model$mean[1] - 1.96 * s_model

## [1] 76871.23

Holt Model

s_holt <- sd(holt_model$residuals)
holt_model$mean[1] + 1.96 * s_holt

## [1] 33.11237

holt_model$mean[1] - 1.96 * s_holt

## [1] 16.62862

8.6

Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

Filter China’s GDP data

global_economy %>%
  filter(Country == "China") %>%
  select(c("Country", "Year", "GDP")) -> china_d

Plot China’s GDP

autoplot(china_d)

## Plot variable not specified, automatically selected `.vars = GDP`

We see a major upward trend in China’s GDP.

Let’s look at the various forecasts

Holt Model

china_holt <- holt(china_d$GDP, h=50)
autoplot(china_holt)

Holt Model with Damped Trend

china_damped <- holt(china_d$GDP, h=50, damped = T)
autoplot(china_damped)

Box-Cox transformation

china_boxcox <- holt(china_d$GDP, h=50, lambda = "auto")
autoplot(china_boxcox)

Evaluating the model’s RMSE values

Holt RMSE

accuracy(china_holt)[2]

## [1] 189990265538

Holt Damped RMSE

accuracy(china_damped)[2]

## [1] 190206597624

BoxCox

accuracy(china_boxcox)[2]

## [1] 288333700735

The large RMSE values probably stem from the fact I’m trying to forecast 50 years of GDP for a country like China that has seen explosive economic growth. The holt forecast seems to be the most accurate, I wonder if that would change if I use smaller time frames. Holt’s model performed the best I suspect due to the clear upward trend of the data. Is such sharp, prolonged growth sustainable for any country? Only time will tell.

8.7

Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

aus_production%>%
  select(c("Quarter", "Gas")) -> aussie_gas

autoplot(aussie_gas)

## Plot variable not specified, automatically selected `.vars = Gas`

fit <- aussie_gas %>%
  model(
    Damp = ETS(Gas ~ error("M") + trend("Ad") +
                                                season("M")),
    multiplicative = ETS(Gas ~ error("M") + trend("A") +
                                                season("M"))
  )
fc <- fit %>% forecast(h = "3 years")
fc %>%
  autoplot(aussie_gas, level = NULL) +
  labs(title="Australian Gas") +
  guides(colour = guide_legend(title = "Forecast"))

Multiplicative method is preferrable when the seasonal variations are changing proportional to the level of the series.

damped_gas <- aussie_gas %>% model(ETS((Gas ~ error("M") + trend("Ad") + season("M"))))

report(damped_gas)

## Series: Gas 
## Model: ETS(M,Ad,M) 
##   Smoothing parameters:
##     alpha = 0.6489044 
##     beta  = 0.1551275 
##     gamma = 0.09369372 
##     phi   = 0.98 
## 
##   Initial states:
##      l[0]       b[0]      s[0]    s[-1]   s[-2]     s[-3]
##  5.858941 0.09944006 0.9281912 1.177903 1.07678 0.8171255
## 
##   sigma^2:  0.0033
## 
##      AIC     AICc      BIC 
## 1684.028 1685.091 1717.873

multi_gas <-aussie_gas %>% model(ETS(Gas ~ error("M") + trend("A") + season("M")))

report(multi_gas)

## Series: Gas 
## Model: ETS(M,A,M) 
##   Smoothing parameters:
##     alpha = 0.6528545 
##     beta  = 0.1441675 
##     gamma = 0.09784922 
## 
##   Initial states:
##      l[0]       b[0]      s[0]    s[-1]    s[-2]     s[-3]
##  5.945592 0.07062881 0.9309236 1.177883 1.074851 0.8163427
## 
##   sigma^2:  0.0032
## 
##      AIC     AICc      BIC 
## 1680.929 1681.794 1711.389

The Holt-Winters model returns a better AIC score of 1680.929, making it the better model but not by a significant margin. The damped model returned an AIC score of 1684.028.

8.8

Recall your retail time series data (from Exercise 8 in Section 2.10).

set.seed(128)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

m2 <- ts(myseries$Turnover, frequency = 12, start = c(1982, 4))

Why is multiplicative seasonality necessary for this series?

autoplot(myseries)

## Plot variable not specified, automatically selected `.vars = Turnover`

The multiplicative method is preferable because the seasonal variation is increasing as the levels are increasing.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

Holt-Winters’ Method & Damped Trend Method Graphed

holt_retail <- hw(m2, seasonal = "multiplicative" )
holt_retail_damped <- hw(m2, seasonal = "multiplicative", damped = TRUE)

summary(holt_retail)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = m2, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.4041 
##     beta  = 0.0016 
##     gamma = 0.1343 
## 
##   Initial states:
##     l = 42.4412 
##     b = 0.2966 
##     s = 0.9283 0.9113 0.9608 1.4731 1.0479 1.1274
##            0.9066 0.9237 0.9198 0.9167 1.0395 0.8448
## 
##   sigma:  0.0485
## 
##      AIC     AICc      BIC 
## 4430.931 4432.378 4500.445 
## 
## Error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE      ACF1
## Training set 0.556308 8.563755 6.111975 0.2420252 3.711658 0.4842085 0.1429854
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2019       381.1633 357.4486 404.8781 344.8947 417.4319
## Feb 2019       353.5683 329.8322 377.3044 317.2671 389.8695
## Mar 2019       379.2880 352.0813 406.4947 337.6789 420.8970
## Apr 2019       361.8308 334.3068 389.3547 319.7365 403.9250
## May 2019       380.7616 350.2291 411.2941 334.0662 427.4570
## Jun 2019       367.5053 336.5904 398.4203 320.2249 414.7857
## Jul 2019       377.1744 344.0233 410.3255 326.4741 427.8746
## Aug 2019       387.6184 352.1419 423.0949 333.3618 441.8750
## Sep 2019       381.6034 345.3395 417.8673 326.1426 437.0642
## Oct 2019       403.2007 363.5158 442.8856 342.5079 463.8935
## Nov 2019       421.5746 378.6926 464.4565 355.9922 487.1569
## Dec 2019       544.9697 487.7906 602.1488 457.5219 632.4176
## Jan 2020       390.7328 347.6189 433.8467 324.7958 456.6698
## Feb 2020       362.4267 321.3548 403.4985 299.6127 425.2406
## Mar 2020       388.7711 343.5793 433.9629 319.6562 457.8860
## Apr 2020       370.8588 326.6894 415.0282 303.3075 438.4101
## May 2020       390.2425 342.6713 437.8138 317.4886 462.9964
## Jun 2020       376.6374 329.6893 423.5855 304.8365 448.4383
## Jul 2020       386.5276 337.3031 435.7521 311.2452 461.8099
## Aug 2020       397.2110 345.5718 448.8501 318.2356 476.1863
## Sep 2020       391.0279 339.1717 442.8841 311.7206 470.3352
## Oct 2020       413.1384 357.2882 468.9885 327.7229 498.5538
## Nov 2020       431.9440 372.4581 491.4299 340.9681 522.9199
## Dec 2020       558.3472 480.0602 636.6342 438.6176 678.0768

summary(holt_retail_damped)

## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = m2, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.5874 
##     beta  = 0.0094 
##     gamma = 1e-04 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 42.3409 
##     b = 0.3287 
##     s = 0.9589 0.898 0.9741 1.387 1.0606 1.0185
##            0.9689 0.9769 0.9527 0.9262 0.9617 0.9164
## 
##   sigma:  0.0448
## 
##      AIC     AICc      BIC 
## 4360.565 4362.186 4434.168 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.7814484 7.918222 5.671396 0.3377205 3.403172 0.4493045
##                     ACF1
## Training set -0.04057024
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2019       375.5931 354.0237 397.1624 342.6055 408.5806
## Feb 2019       346.5790 323.3995 369.7586 311.1290 382.0291
## Mar 2019       370.3907 342.4669 398.3145 327.6850 413.0964
## Apr 2019       354.2925 324.7998 383.7851 309.1873 399.3976
## May 2019       372.1102 338.3926 405.8279 320.5436 423.6769
## Jun 2019       358.6404 323.6359 393.6450 305.1056 412.1753
## Jul 2019       369.2106 330.7045 407.7167 310.3206 428.1005
## Aug 2019       378.8800 336.9249 420.8350 314.7152 443.0447
## Sep 2019       376.0461 332.0608 420.0313 308.7764 443.3157
## Oct 2019       395.5815 346.9155 444.2474 321.1534 470.0096
## Nov 2019       412.2287 359.0816 465.3759 330.9472 493.5103
## Dec 2019       539.5623 466.8860 612.2387 428.4135 650.7112
## Jan 2020       379.1443 325.9326 432.3560 297.7641 460.5245
## Feb 2020       349.7876 298.7566 400.8185 271.7425 427.8327
## Mar 2020       373.7482 317.1848 430.3116 287.2420 460.2545
## Apr 2020       357.4372 301.4250 413.4493 271.7740 443.1004
## May 2020       375.3444 314.5421 436.1466 282.3553 468.3334
## Jun 2020       361.6927 301.2159 422.1695 269.2014 454.1840
## Jul 2020       372.2875 308.1225 436.4525 274.1556 470.4194
## Aug 2020       381.9719 314.1944 449.7494 278.3152 485.6286
## Sep 2020       379.0512 309.8850 448.2175 273.2706 484.8319
## Oct 2020       398.6772 323.9450 473.4094 284.3842 512.9703
## Nov 2020       415.3880 335.4759 495.3000 293.1730 537.6029
## Dec 2020       543.6118 436.3780 650.8457 379.6118 707.6119

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

I tried multiple functions to generate data with residuals and RMSE. Using the ETS() function I could generate RMSE but the fit did not have residuals. I used the hw() function which satisfied both criteria. Using both ETS and hw, the preferred model was still the damped movel.

holt_retail <- hw(m2, h=1, seasonal = "multiplicative" )
holt_retail_damped <- hw(m2, h=1, seasonal = "multiplicative", damped = TRUE)

RMSE Value for Holt Winter

accuracy(holt_retail)[2]

## [1] 8.563755

RMSE Value for Holt Winter Damped

accuracy(holt_retail_damped)[2]

## [1] 7.918222

The damped method returns a lowoer RMSE, making it the model I would select from the two.

Check that the residuals from the best method look like white noise.

checkresiduals(holt_retail_damped)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 85.058, df = 7, p-value = 1.221e-15
## 
## Model df: 17.   Total lags used: 24

The residuals histogram show a relatively normally distributed plot. ACF shows no clear autocorrelation but there are spikes, and given our previous knowledge of the data set, we know there is seasonality to the data. I would say the residuals isn’t white noise.

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 7 in Section 5.11?

train_retail <- window(m2, end = 2010)
test_retail <- window(m2, start = 2011)

train_holtw <- hw(train_retail, h= 108, seasonal = "multiplicative")

train_damped <- hw(train_retail, h=108, seasonal = "multiplicative" ,damped = TRUE)

seasonal_train <- snaive(train_retail)

Plot all forecasts

masterplot <- autoplot(m2) +
  autolayer(train_holtw, series = "Holt-Winters M") +
  autolayer(train_damped, series = "Damped HW") +
  autolayer(seasonal_train, series = "Seasonal Naive") +
  autolayer(test_retail, series = "Test Data")

masterplot + xlim(c(2011, 2019))

## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

Check Values

Holt-Winter vs Test

accuracy(train_holtw, test_retail)

##                     ME     RMSE       MAE        MPE      MAPE      MASE
## Training set  0.368935  6.24345  4.660297  0.1346397  3.777011 0.4468246
## Test set     41.504127 44.75497 41.859534 11.6880429 11.824242 4.0134499
##                    ACF1 Theil's U
## Training set 0.04216116        NA
## Test set     0.52698521  1.005545

HW Damped vs Test

accuracy(train_damped, test_retail)

##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set  0.6928938  5.943741  4.397592  0.4200239  3.54584 0.4216367
## Test set     56.1214996 60.861036 56.439430 15.6403940 15.76460 5.4113555
##                     ACF1 Theil's U
## Training set -0.04939215        NA
## Test set      0.76478249   1.33378

Seasonal Naive vs Test

accuracy(seasonal_train, test_retail)

##                     ME     RMSE      MAE       MPE      MAPE     MASE      ACF1
## Training set  7.741615 13.80603 10.42981  5.978256  8.083253 1.000000 0.7329914
## Test set     46.015385 50.59015 47.92308 15.150892 15.866724 4.594816 0.2207417
##              Theil's U
## Training set        NA
## Test set      1.398029

Based on the RMSE difference, the Seasonal Naive Model outperforms the Holt_winters. We are also predicting 8 years of future data, which could explain why the Seasonal Naive is a better model for longer term forecasting than Holt-Winters.

8.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

stl_ets <- train_retail %>%
  stlf(
    h= 108,
    method = 'ets',
    etsmodel = 'MAM',
    lambda = BoxCox.lambda(train_retail)
  )

## Warning in forecast.stl(object, method = method, etsmodel = etsmodel,
## forecastfunction = forecastfunction, : The ETS model must be non-seasonal. I'm
## ignoring the seasonal component specified.

autoplot(stl_ets)

accuracy(stl_ets, test_retail)

##                        ME      RMSE       MAE        MPE     MAPE      MASE
## Training set   0.01359351  5.207214  3.881942 -0.1148306 3.065566 0.3721967
## Test set     -16.56812436 34.391625 26.871735 -4.4092926 7.540894 2.5764348
##                     ACF1 Theil's U
## Training set -0.01557101        NA
## Test set      0.82503585 0.7379409

Based on the RMSE difference, this model is better than the seasonal naive. Honestly, I don’t know if I implemented the solution correctly.

Sources:

https://www.r-bloggers.com/2015/05/new-in-forecast-6-0/