DATA624: Predictive Analytics: HW#5

Exercise 7.1

Consider the pigs series — the number of pigs slaughtered in Victoria each month.

a)

Use the ses() function in R to find the optimal values of α and ℓ0, and generate forecasts for the next four months.

library(fpp2)  # assumed setup: attaches forecast and ggplot2 plus the pigs, books and eggs data sets

head(pigs)

##         Jan    Feb    Mar    Apr    May    Jun
## 1980  76378  71947  33873  96428 105084  95741

ses_p<-ses(pigs,4)
summary(ses_p)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
##                    ACF1
## Training set 0.01282239
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

Answer:

Optimal value of α: 0.2971

Optimal value of ℓ0: 77260.06
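To make the roles of α and ℓ0 concrete, here is a minimal base-R sketch of the SES recursion. It takes α and ℓ0 as given rather than estimating them the way ses() does. The function name `ses_sketch` and the toy data are hypothetical and are not the pigs series.

```r
# Simple exponential smoothing for a fixed alpha and initial level l0.
# Returns the one-step-ahead fitted values and the final level; the final
# level is the point forecast for every future horizon.
ses_sketch <- function(y, alpha, l0) {
  level <- l0
  fitted <- numeric(length(y))
  for (i in seq_along(y)) {
    fitted[i] <- level                           # forecast of y[i] made one step earlier
    level <- alpha * y[i] + (1 - alpha) * level  # update the level with the new observation
  }
  list(fitted = fitted, level = level)
}

# Hypothetical toy data (not the pigs series)
y_toy <- c(10, 12, 11, 13)
fit <- ses_sketch(y_toy, alpha = 0.5, l0 = 10)
fit$level          # final smoothed level
rep(fit$level, 4)  # four identical point forecasts
```

This is why the four point forecasts in the ses() output above are all equal: SES has no trend or seasonal component, so the forecast stays at the last estimated level.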

b)

Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

z<-qnorm(.025,lower.tail=FALSE)
z

## [1] 1.959964

s <- sd(ses_p$residuals)
ses_p$mean[1] + z*s

## [1] 118952.5

ses_p$mean[1] - z*s

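## [1] 78680.34

Answer:

The manual 95% interval is [78680.34, 118952.5], while R reports [78611.97, 119020.8] for Sep 1995. The two are very close; the manual interval is slightly narrower.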
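One plausible reason R's interval for Sep 1995 (78611.97 to 119020.8) is a little wider than ŷ ± 1.96s is that the reported sigma (10308.58) divides the sum of squared residuals by n - 2 rather than n - 1, and does not subtract the mean residual. The sketch below checks this using only numbers printed in the summary above. The series length n = 188 is my assumption (monthly data from Jan 1980 to Aug 1995).

```r
n     <- 188        # assumed series length: Jan 1980 to Aug 1995, monthly
rmse  <- 10253.6    # training RMSE from summary(ses_p)
me    <- 385.8721   # training ME from summary(ses_p)
point <- 98816.41   # first point forecast
z     <- qnorm(0.975)

# sd() centres the residuals on their mean and divides by n - 1
s_sd <- sqrt(n / (n - 1) * (rmse^2 - me^2))

# Assumed form of R's sigma: sqrt(SSE / (n - 2)), two estimated quantities
s_df <- sqrt(n / (n - 2)) * rmse

manual <- point + c(-1, 1) * z * s_sd  # interval from sd(residuals)
r_like <- point + c(-1, 1) * z * s_df  # interval from the adjusted sigma
rbind(manual, r_like)
```

Under these assumptions `s_df` comes out at about 10308.58, matching the sigma in the summary, and `r_like` matches the Lo 95 and Hi 95 columns.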

Exercise 7.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

a)

Plot the series and discuss the main features of the data.

summary(books)

##    Paperback       Hardcover    
##  Min.   :111.0   Min.   :128.0  
##  1st Qu.:167.2   1st Qu.:170.5  
##  Median :189.0   Median :200.5  
##  Mean   :186.4   Mean   :198.8  
##  3rd Qu.:207.2   3rd Qu.:222.0  
##  Max.   :247.0   Max.   :283.0

autoplot(books) +
  ggtitle("Daily Book Sales")

Based on the plot above, both series show an upward trend, and the trend looks steeper for hardcover books. There is no obvious seasonality or cyclicity. Book sales might plausibly vary by day of the week, but with only 30 daily observations it is hard to tell from the plot.

b)

Use the ses() function to forecast each series, and plot the forecasts.

ses_b_p<-ses(books[,1],4)
summary(ses_b_p)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = books[, 1], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.1685 
## 
##   Initial states:
##     l = 170.8271 
## 
##   sigma:  34.8183
## 
##      AIC     AICc      BIC 
## 318.9747 319.8978 323.1783 
## 
## Error measures:
##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005

autoplot(ses_b_p) +
  ggtitle("Daily Book Sales Forecast Paperback")

ses_b_h<-ses(books[,2],4)
summary(ses_b_h)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = books[, 2], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.3283 
## 
##   Initial states:
##     l = 149.2861 
## 
##   sigma:  33.0517
## 
##      AIC     AICc      BIC 
## 315.8506 316.7737 320.0542 
## 
## Error measures:
##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       239.5601 197.2026 281.9176 174.7799 304.3403
## 32       239.5601 194.9788 284.1414 171.3788 307.7414
## 33       239.5601 192.8607 286.2595 168.1396 310.9806
## 34       239.5601 190.8347 288.2855 165.0410 314.0792

autoplot(ses_b_h) +
  ggtitle("Daily Book Sales Forecast Hardcover")

c)

Compute the RMSE values for the training data in each case.

accuracy(ses_b_p)

##                    ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
##                    ACF1
## Training set -0.2117522

accuracy(ses_b_h)

##                    ME     RMSE      MAE      MPE     MAPE      MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
##                    ACF1
## Training set -0.1417763

The hardcover forecast has a lower training RMSE than the paperback forecast. Hardcover RMSE: 31.9; Paperback RMSE: 33.6.

Exercise 7.6

We will continue with the daily sales of paperback and hardcover books in data set books.

a)

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

#Paperback
holt_b_p <- holt(books[, 1], 4)
summary(holt_b_p)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, 1], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 170.699 
##     b = 1.2621 
## 
##   sigma:  33.4464
## 
##      AIC     AICc      BIC 
## 318.3396 320.8396 325.3456 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735

#Hardcover
holt_b_h <- holt(books[, 2], 4)
summary(holt_b_h)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, 2], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 147.7935 
##     b = 3.303 
## 
##   sigma:  29.2106
## 
##      AIC     AICc      BIC 
## 310.2148 312.7148 317.2208 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

b)

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

accuracy(holt_b_p)[2]

## [1] 31.13692

accuracy(holt_b_h)[2]

## [1] 27.19358

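The training RMSE is lower with Holt's method: Hardcover RMSE: 27.2; Paperback RMSE: 31.1. For comparison, SES gives Hardcover RMSE: 31.9; Paperback RMSE: 33.6.

Holt's method has a trend component while SES has none, which is why it fits this upward-trending data better. Some of the improvement is expected simply because Holt's method estimates more parameters. The AICc values printed above, which penalize the extra parameters, favor Holt's method for hardcover (312.71 vs. 316.77) but are slightly lower for SES on paperback (319.90 vs. 320.84), so the case for Holt's method is stronger for the hardcover series.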
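The difference between the two methods is easiest to see in the recursion itself. Below is a minimal base-R sketch of Holt's linear method in the textbook component form, with fixed parameters instead of the estimated ones. The function name `holt_sketch`, the argument `beta_star` and the toy data are hypothetical; as far as I understand, the `beta` printed by holt() is parameterized differently (as α times this β*), so the numbers are not directly comparable.

```r
# Holt's linear trend method with fixed smoothing parameters.
# Returns h point forecasts: final level plus 1, 2, ..., h times the final slope.
holt_sketch <- function(y, alpha, beta_star, l0, b0, h) {
  level <- l0
  trend <- b0
  for (i in seq_along(y)) {
    prev_level <- level
    level <- alpha * y[i] + (1 - alpha) * (prev_level + trend)            # level update
    trend <- beta_star * (level - prev_level) + (1 - beta_star) * trend   # slope update
  }
  level + seq_len(h) * trend
}

# Hypothetical, perfectly linear toy series
y_toy <- c(10, 12, 14, 16)
holt_sketch(y_toy, alpha = 0.5, beta_star = 0.5, l0 = 8, b0 = 2, h = 4)
```

On this toy series the forecasts keep rising by the slope of 2, whereas SES would stay flat at its last level. That is the same qualitative difference seen between the holt() and ses() forecasts for the books data.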

c)

Compare the forecasts for the two series using both methods. Which do you think is best?

autoplot(holt_b_p) +
  autolayer(ses_b_p, series='ses', PI = FALSE) +
  ggtitle("Daily Book Sales Holt & SES Forecast Paperback")

autoplot(holt_b_h) +
  autolayer(ses_b_h, series='ses', PI = FALSE) +
  ggtitle("Daily Book Sales Holt & SES Forecast Hardcover")

Holt’s method looks better here because its forecasts continue the upward trend seen in both the paperback and hardcover series, whereas the SES forecasts are flat.

d)

Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

#Holt Paperback Calculation
holt_b_p$mean[1] + z*accuracy(holt_b_p)[2]

## [1] 270.494

holt_b_p$mean[1] - z*accuracy(holt_b_p)[2]

## [1] 148.4395

holt_b_p$upper[1,2]

##      95% 
## 275.0205

holt_b_p$lower[1,2]

##     95% 
## 143.913

#Holt Hardcover Calculation
holt_b_h$mean[1] + z*accuracy(holt_b_h)[2]

## [1] 303.4723

holt_b_h$mean[1] - z*accuracy(holt_b_h)[2]

## [1] 196.8754

holt_b_h$upper[1,2]

##      95% 
## 307.4256

holt_b_h$lower[1,2]

##      95% 
## 192.9222

#SES Paperback Calculation
ses_b_p$mean[1] + z*accuracy(ses_b_p)[2]

## [1] 273.0383

ses_b_p$mean[1] - z*accuracy(ses_b_p)[2]

## [1] 141.181

ses_b_p$upper[1,2]

##      95% 
## 275.3523

ses_b_p$lower[1,2]

##     95% 
## 138.867

#SES Hardcover Calculation
ses_b_h$mean[1] + z*accuracy(ses_b_h)[2]

## [1] 302.1437

ses_b_h$mean[1] - z*accuracy(ses_b_h)[2]

## [1] 176.9765

ses_b_h$upper[1,2]

##      95% 
## 304.3403

ses_b_h$lower[1,2]

##      95% 
## 174.7799

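The calculated intervals are very close to those produced by the ses and holt functions. In all four cases the manual interval is slightly narrower, by roughly 2 to 5 units at each end, which is consistent with the sigma reported by R being somewhat larger than the training RMSE.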
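The gap between the RMSE-based intervals and R's intervals can be checked against the sigma values in the four summaries. The sketch below assumes that sigma is the RMSE rescaled from a divisor of n to n - k, where k counts the estimated quantities (α and ℓ0 for SES; α, β, ℓ0 and b0 for Holt). The series length n = 30 and the values of k are my assumptions; the RMSE and sigma values are copied from the output above.

```r
n <- 30  # assumed series length: the first forecast is labelled 31

rmse <- c(ses_paper = 33.63769, ses_hard = 31.93101,
          holt_paper = 31.13692, holt_hard = 27.19358)
k <- c(2, 2, 4, 4)  # assumed number of estimated quantities per model

sigma_sketch  <- rmse * sqrt(n / (n - k))          # RMSE rescaled to an n - k divisor
sigma_printed <- c(34.8183, 33.0517, 33.4464, 29.2106)  # from the summaries above

round(cbind(rmse, sigma_sketch, sigma_printed), 4)
```

If the assumption holds, the two sigma columns agree to the printed precision, and multiplying each by 1.96 gives the half-widths of R's 95% intervals for the first forecast.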

Exercise 7.7

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

holt_eggs <- holt(eggs, h=100)
holt_eggs_d <- holt(eggs, h=100, damped = TRUE)
holt_eggs_e <- holt(eggs, h=100, exponential = TRUE)
holt_eggs_bc <- holt(eggs, h=100, lambda = "auto")

autoplot(eggs) +
  autolayer(holt_eggs, series='Holt', PI = FALSE) +
  autolayer(holt_eggs_d, series='Holt Damped', PI = FALSE) +
  autolayer(holt_eggs_e, series='Holt Exponential', PI = FALSE) +
  autolayer(holt_eggs_bc, series='Holt BoxCox Transformation', PI = FALSE)

RMSE Comparison:

accuracy(holt_eggs)[2]

## [1] 26.58219

accuracy(holt_eggs_d)[2]

## [1] 26.54019

accuracy(holt_eggs_e)[2]

## [1] 26.49795

accuracy(holt_eggs_bc)[2]

## [1] 26.39376

The Box-Cox transformation gives the best RMSE (26.39), but the four models are within about 0.2 of each other, so the difference is small.
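To build intuition for what damped = TRUE does over a long horizon such as h = 100, here is a small base-R sketch of the damped trend forecast function, where the h-step forecast is the level plus (φ + φ² + ... + φ^h) times the slope. The level, slope and φ values are hypothetical and are not taken from the eggs models.

```r
# h-step-ahead point forecasts for a (possibly damped) linear trend.
# phi = 1 reproduces the ordinary Holt forecast; phi < 1 flattens it.
damped_forecast <- function(level, trend, phi, h) {
  level + cumsum(phi^seq_len(h)) * trend
}

linear <- damped_forecast(level = 100, trend = 2, phi = 1,   h = 100)
damped <- damped_forecast(level = 100, trend = 2, phi = 0.9, h = 100)

linear[100]                        # keeps growing by 2 per step
damped[100]                        # approaches a ceiling
limit <- 100 + 2 * 0.9 / (1 - 0.9) # long-run level implied by phi = 0.9
limit
```

With φ below 1 the forecast levels off at a finite value instead of continuing in a straight line, which is the main visual difference between the damped and undamped curves in the plot.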

Exercise 7.8

Recall your retail time series data (from Exercise 3 in Section 2.10).

#setwd("/Users/elinaazrilyan/Documents/Data624/")
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349882C"],
  frequency=12, start=c(1982,4))

a)

Why is multiplicative seasonality necessary for this series?

autoplot(myts)

Multiplicative seasonality is necessary for this series because the variability is not constant: the size of the seasonal swings increases as the level of the series rises over time.
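The distinction can be illustrated with a synthetic example in base R. The sketch below builds one additive and one multiplicative seasonal series on the same rising level and compares the seasonal swing around that level in the first and last year. All values are hypothetical and unrelated to the retail data.

```r
month  <- 1:48
level  <- 100 + 5 * month            # steadily rising level
season <- sin(2 * pi * month / 12)   # 12-month seasonal pattern

additive       <- level + 20 * season          # constant seasonal amplitude
multiplicative <- level * (1 + 0.2 * season)   # amplitude proportional to the level

# Seasonal swing around the level (max minus min) within a set of months
swing <- function(y, idx) diff(range((y - level)[idx]))

c(additive_first = swing(additive, 1:12),
  additive_last  = swing(additive, 37:48),
  mult_first     = swing(multiplicative, 1:12),
  mult_last      = swing(multiplicative, 37:48))
```

The additive swing is the same in both years, while the multiplicative swing grows with the level, which is the pattern visible in the retail plot.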

b)

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

retail_hw <- hw(myts, seasonal='multiplicative', damped=FALSE)
summary(retail_hw)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative", damped = FALSE) 
## 
##   Smoothing parameters:
##     alpha = 0.453 
##     beta  = 0.0178 
##     gamma = 0.133 
## 
##   Initial states:
##     l = 138.6858 
##     b = 4.7288 
##     s = 1.0295 0.9407 1.0381 1.096 0.9799 1.0171
##            0.9455 0.9896 0.9999 0.9988 1.03 0.9347
## 
##   sigma:  0.0287
## 
##      AIC     AICc      BIC 
## 4412.010 4413.696 4479.037 
## 
## Error measures:
##                       ME     RMSE      MAE        MPE     MAPE      MASE
## Training set -0.01537593 16.93608 12.97634 -0.2077745 2.194719 0.2784834
##                    ACF1
## Training set -0.1225174
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       1649.404 1588.728 1710.080 1556.608 1742.200
## Feb 2014       1508.565 1447.257 1569.872 1414.803 1602.327
## Mar 2014       1653.322 1579.921 1726.722 1541.066 1765.578
## Apr 2014       1597.295 1520.492 1674.099 1479.834 1714.757
## May 2014       1623.758 1539.771 1707.745 1495.311 1752.205
## Jun 2014       1572.224 1485.232 1659.216 1439.181 1705.266
## Jul 2014       1673.675 1575.071 1772.280 1522.873 1824.478
## Aug 2014       1680.191 1575.204 1785.177 1519.627 1840.754
## Sep 2014       1621.988 1514.860 1729.116 1458.150 1785.826
## Oct 2014       1683.600 1566.411 1800.790 1504.374 1862.827
## Nov 2014       1648.033 1527.451 1768.615 1463.619 1832.447
## Dec 2014       1799.739 1661.642 1937.837 1588.537 2010.942
## Jan 2015       1712.610 1572.379 1852.840 1498.146 1927.074
## Feb 2015       1566.190 1432.444 1699.937 1361.643 1770.738
## Mar 2015       1716.277 1563.654 1868.900 1482.861 1949.694
## Apr 2015       1657.926 1504.611 1811.241 1423.451 1892.401
## May 2015       1685.200 1523.356 1847.043 1437.682 1932.718
## Jun 2015       1631.529 1468.996 1794.061 1382.956 1880.101
## Jul 2015       1736.610 1557.358 1915.862 1462.468 2010.753
## Aug 2015       1743.174 1556.938 1929.410 1458.350 2027.998
## Sep 2015       1682.601 1496.718 1868.484 1398.317 1966.885
## Oct 2015       1746.321 1547.016 1945.626 1441.510 2051.132
## Nov 2015       1709.239 1507.888 1910.591 1401.298 2017.180
## Dec 2015       1866.375 1639.621 2093.128 1519.585 2213.164

retail_hw_d <- hw(myts, seasonal='multiplicative', damped=TRUE)
summary(retail_hw_d)

## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = myts, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.4501 
##     beta  = 0.0492 
##     gamma = 0.0648 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 139.8911 
##     b = 1.7179 
##     s = 1.0277 0.9278 1.0599 1.0604 0.9727 0.9899
##            0.9669 1.0201 1.0361 0.948 0.9936 0.997
## 
##   sigma:  0.0301
## 
##      AIC     AICc      BIC 
## 4445.435 4447.324 4516.405 
## 
## Error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 1.465078 16.72069 12.88664 0.1888671 2.249597 0.2765582
##                    ACF1
## Training set -0.1721738
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       1638.899 1575.733 1702.065 1542.295 1735.503
## Feb 2014       1500.894 1436.275 1565.513 1402.068 1599.721
## Mar 2014       1644.834 1566.060 1723.607 1524.360 1765.307
## Apr 2014       1592.758 1508.331 1677.184 1463.639 1721.876
## May 2014       1619.248 1524.750 1713.746 1474.725 1763.770
## Jun 2014       1567.433 1467.245 1667.621 1414.209 1720.657
## Jul 2014       1667.271 1551.132 1783.410 1489.652 1844.890
## Aug 2014       1672.373 1546.014 1798.731 1479.124 1865.621
## Sep 2014       1618.750 1486.668 1750.832 1416.748 1820.752
## Oct 2014       1678.707 1531.385 1826.029 1453.397 1904.016
## Nov 2014       1639.761 1485.566 1793.956 1403.940 1875.583
## Dec 2014       1797.559 1617.063 1978.055 1521.514 2073.604
## Jan 2015       1697.679 1514.740 1880.619 1417.898 1977.461
## Feb 2015       1553.473 1375.976 1730.970 1282.015 1824.931
## Mar 2015       1701.119 1495.562 1906.677 1386.746 2015.492
## Apr 2015       1646.001 1436.158 1855.845 1325.073 1966.929
## May 2015       1672.130 1447.727 1896.533 1328.936 2015.324
## Jun 2015       1617.448 1389.427 1845.468 1268.721 1966.175
## Jul 2015       1719.254 1465.144 1973.363 1330.626 2107.881
## Aug 2015       1723.324 1456.764 1989.884 1315.656 2130.992
## Sep 2015       1666.945 1397.572 1936.318 1254.974 2078.916
## Oct 2015       1727.552 1436.363 2018.742 1282.216 2172.888
## Nov 2015       1686.393 1390.345 1982.442 1233.626 2139.161
## Dec 2015       1847.525 1510.205 2184.844 1331.638 2363.411

c)

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

accuracy(retail_hw)[2]

## [1] 16.93608

accuracy(retail_hw_d)[2]

## [1] 16.72069

The RMSE is slightly lower with the damped trend (16.72 vs. 16.94), so I prefer that method.

d)

Check that the residuals from the best method look like white noise.

autoplot(retail_hw_d$residuals)

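The residuals look broadly like white noise, with no obvious pattern, although they seem to be a bit smaller in later years. The lag-1 autocorrelation reported in the summary above (ACF1 = -0.17) suggests a formal test would be worth running before relying on the plot alone.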
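A visual check can be backed up with a Ljung-Box test. The sketch below uses Box.test() from base R's stats package on simulated white noise, which stands in for the model residuals so the example is self-contained. The lag of 24 is my choice (two seasonal periods for monthly data), not something taken from the exercise.

```r
set.seed(1)
e <- rnorm(200)  # simulated white noise standing in for model residuals

# Null hypothesis: no autocorrelation in the first 24 lags
lb <- Box.test(e, lag = 24, type = "Ljung-Box")
lb$statistic
lb$p.value  # a small p-value would be evidence against white noise
```

For the actual model, something like Box.test(retail_hw_d$residuals, lag = 24, type = "Ljung-Box") should work the same way, and the forecast package's checkresiduals() combines a residual plot, ACF and this test in one call.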

e)

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?

myts.train <- window(myts, end=c(2010, 12))
myts.test <- window(myts, start=c(2011))

# Note: snaive() defaults to h = 2 * frequency = 24 months, while the test set covers 36 months
fc <- snaive(myts.train)
fc_hw <- hw(myts.train, h=36, seasonal='multiplicative', damped=TRUE)

RMSE(myts.test, fc$mean)

## [1] 76.67352

RMSE(myts.test, fc_hw$mean)

## [1] 29.08321

The test-set RMSE of the Holt-Winters’ multiplicative damped method is much lower than that of the seasonal naive method (29.1 vs. 76.7), so it beats the seasonal naive approach.
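The RMSE() helper used above is not part of the forecast package, so its exact definition is not shown in this report. A base-R equivalent, assumed here, is sketched below with hypothetical toy series. It also shows a detail worth knowing when comparing forecasts of different lengths: arithmetic on two ts objects keeps only the period where they overlap.

```r
# Root mean squared error between actual and predicted values
rmse <- function(actual, predicted) {
  err <- actual - predicted  # for ts inputs, only the overlapping period is kept
  sqrt(mean(err^2))
}

# Hypothetical toy series: a 36-month test set and a 24-month forecast
test_toy <- ts(rep(10, 36), start = c(2011, 1), frequency = 12)
fc_toy24 <- ts(rep(12, 24), start = c(2011, 1), frequency = 12)

length(test_toy - fc_toy24)  # 24: only the first two years are compared
rmse(test_toy, fc_toy24)     # 2
```

If the RMSE() used in this report behaves the same way, a forecast made with the default horizon of 24 months is likely scored on 2011 and 2012 only, whereas the hw() forecast with h=36 is scored on all three test years. Passing the same h to every method would make the comparison like for like.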

Exercise 7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

fc_stl <- stlf(myts.train, method ="ets", lambda = "auto")
RMSE(myts.test, fc_stl$mean)

## [1] 57.91001

The test-set RMSE of this method (57.9) is better than seasonal naive (76.7), but not as good as the Holt-Winters’ multiplicative damped method (29.1).
