Consider the pigs series — the number of pigs slaughtered in Victoria each month.
Use the ses() function in R to find the optimal values of α and ℓ0, and generate forecasts for the next four months.
library(fpp2)  # attaches forecast and the example data sets, including pigs
head(pigs)
## Jan Feb Mar Apr May Jun
## 1980 76378 71947 33873 96428 105084 95741
ses_p<-ses(pigs,4)
summary(ses_p)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
## ACF1
## Training set 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
Optimal value of α: 0.2971. Optimal value of ℓ0: 77260.06.
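To make the roles of α and ℓ0 concrete, here is a minimal base-R sketch of the simple exponential smoothing recursion ℓ_t = α·y_t + (1 − α)·ℓ_{t−1}. The function name ses_level and the toy numbers are my own and purely illustrative. Unlike ses(), this sketch takes α and ℓ0 as given and does not estimate them.

```r
# Simple exponential smoothing recursion (illustrative sketch, not forecast::ses).
# y: numeric vector of observations; alpha: smoothing parameter in (0, 1];
# l0: initial level. Returns the smoothed level after each observation.
ses_level <- function(y, alpha, l0) {
  level <- numeric(length(y))
  prev <- l0
  for (t in seq_along(y)) {
    level[t] <- alpha * y[t] + (1 - alpha) * prev
    prev <- level[t]
  }
  level
}

y_toy <- c(10, 12, 11, 13)   # hypothetical data
lv <- ses_level(y_toy, alpha = 0.5, l0 = 10)
lv

# The SES point forecast for every horizon is the last smoothed level.
tail(lv, 1)
```

This is why all four point forecasts in the summary above are the same number: SES carries the last level forward unchanged.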
Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
z <- qnorm(0.025, lower.tail = FALSE)  # upper 2.5% quantile of N(0, 1)
z
## [1] 1.959964
s <- sd(ses_p$residuals)
ses_p$mean[1] + z*s
## [1] 118952.5
ses_p$mean[1] - z*s
## [1] 78680.34
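The hand-computed interval (78680.34, 118952.5) is slightly narrower than the one R reports (78611.97, 119020.8). A plausible explanation, which I am stating as an assumption rather than something confirmed from the package source, is that ses() builds its interval from sigma, where sigma² divides the sum of squared residuals by n − 2 (two estimated parameters, α and ℓ0) rather than by n. The sketch below only reuses numbers printed above, with n = 188 counted from Jan 1980 to Aug 1995.

```r
# Values copied from the summary(ses_p) output above.
rmse  <- 10253.6     # training RMSE
sigma <- 10308.58    # sigma reported by ses()
point <- 98816.41    # first point forecast
n     <- 188         # Jan 1980 to Aug 1995, monthly
k     <- 2           # assumed number of estimated parameters (alpha, l0)

# If sigma^2 = SSE / (n - k) and RMSE^2 = SSE / n, then:
sigma_from_rmse <- rmse * sqrt(n / (n - k))
sigma_from_rmse

# Interval rebuilt from the reported sigma.
z <- qnorm(0.975)
interval <- point + c(-1, 1) * z * sigma
interval
```

Under that assumption the rebuilt interval agrees with the Lo 95 and Hi 95 columns for Sep 1995 to within rounding.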
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
Plot the series and discuss the main features of the data.
summary(books)
## Paperback Hardcover
## Min. :111.0 Min. :128.0
## 1st Qu.:167.2 1st Qu.:170.5
## Median :189.0 Median :200.5
## Mean :186.4 Mean :198.8
## 3rd Qu.:207.2 3rd Qu.:222.0
## Max. :247.0 Max. :283.0
autoplot(books) +
ggtitle("Daily Book Sales")
Based on the plot above, both series show an upward trend. There is no obvious cyclic or seasonal pattern, although daily sales could plausibly vary by day of the week. The upward trend appears steeper for hardcover sales.
Use the ses() function to forecast each series, and plot the forecasts.
ses_b_p<-ses(books[,1],4)
summary(ses_b_p)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books[, 1], h = 4)
##
## Smoothing parameters:
## alpha = 0.1685
##
## Initial states:
## l = 170.8271
##
## sigma: 34.8183
##
## AIC AICc BIC
## 318.9747 319.8978 323.1783
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
## ACF1
## Training set -0.2117522
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
autoplot(ses_b_p) +
ggtitle("Daily Book Sales Forecast Paperback")
ses_b_h<-ses(books[,2],4)
summary(ses_b_h)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = books[, 2], h = 4)
##
## Smoothing parameters:
## alpha = 0.3283
##
## Initial states:
## l = 149.2861
##
## sigma: 33.0517
##
## AIC AICc BIC
## 315.8506 316.7737 320.0542
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
## ACF1
## Training set -0.1417763
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
autoplot(ses_b_h) +
ggtitle("Daily Book Sales Forecast Hardcover")
Compute the RMSE values for the training data in each case.
accuracy(ses_b_p)
## ME RMSE MAE MPE MAPE MASE
## Training set 7.175981 33.63769 27.8431 0.4736071 15.57784 0.7021303
## ACF1
## Training set -0.2117522
accuracy(ses_b_h)
## ME RMSE MAE MPE MAPE MASE
## Training set 9.166735 31.93101 26.77319 2.636189 13.39487 0.7987887
## ACF1
## Training set -0.1417763
The hardcover forecasts have a lower training RMSE than the paperback forecasts. Hardcover RMSE: 31.9; Paperback RMSE: 33.6.
We will continue with the daily sales of paperback and hardcover books in data set books.
Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
#Paperback
holt_b_p <- holt(books[, 1], 4)
summary(holt_b_p)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books[, 1], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 170.699
## b = 1.2621
##
## sigma: 33.4464
##
## AIC AICc BIC
## 318.3396 320.8396 325.3456
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
## ACF1
## Training set -0.1750792
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
#Hardcover
holt_b_h <- holt(books[, 2], 4)
summary(holt_b_h)
##
## Forecast method: Holt's method
##
## Model Information:
## Holt's method
##
## Call:
## holt(y = books[, 2], h = 4)
##
## Smoothing parameters:
## alpha = 1e-04
## beta = 1e-04
##
## Initial states:
## l = 147.7935
## b = 3.303
##
## sigma: 29.2106
##
## AIC AICc BIC
## 310.2148 312.7148 317.2208
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
## ACF1
## Training set -0.03245186
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
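For intuition about what holt() adds to SES, here is a minimal base-R sketch of Holt's linear trend recursion with fixed parameters. The function name holt_states and the toy inputs are hypothetical, and the sketch does not estimate anything.

```r
# Holt's linear trend recursion (illustrative sketch, fixed parameters).
# Returns the final level and slope plus the one-step fitted values.
holt_states <- function(y, alpha, beta, l0, b0) {
  l <- l0
  b <- b0
  fitted <- numeric(length(y))
  for (t in seq_along(y)) {
    fitted[t] <- l + b                              # one-step forecast
    l_new <- alpha * y[t] + (1 - alpha) * (l + b)   # level update
    b <- beta * (l_new - l) + (1 - beta) * b        # slope update
    l <- l_new
  }
  list(level = l, slope = b, fitted = fitted)
}

# Hypothetical series that grows by 2 each step.
st <- holt_states(c(10, 12, 14, 16), alpha = 0.5, beta = 0.5, l0 = 8, b0 = 2)

# h-step forecasts are level + h * slope.
fc <- st$level + (1:4) * st$slope
fc
```

With alpha and beta both estimated at about 1e-04, as in the two summaries above, the level and slope barely update. The forecasts therefore stay close to a straight line through the initial states: roughly 170.7 + 1.26 per day for paperback and 147.8 + 3.30 per day for hardcover.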
Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
accuracy(holt_b_p)[2]
## [1] 31.13692
accuracy(holt_b_h)[2]
## [1] 27.19358
RMSE is lower for Holt's method: Hardcover RMSE: 27.2; Paperback RMSE: 31.1. For comparison, SES gives Hardcover RMSE: 31.9; Paperback RMSE: 33.6.
Holt's method has a trend component while SES has none, which is why it fits these upward-trending series better.
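Because training RMSE tends to favour the model with more parameters, one hedged way to account for Holt's extra parameters is to compare the AICc values already printed in the summaries above. The data frame below just collects those numbers.

```r
# AICc values copied from the summary() outputs above.
aicc <- data.frame(
  series = c("Paperback", "Hardcover"),
  ses    = c(319.8978, 316.7737),
  holt   = c(320.8396, 312.7148)
)

# Negative values favour Holt, positive values favour SES.
aicc$holt_minus_ses <- aicc$holt - aicc$ses
aicc
```

By this measure Holt's method is clearly preferred for the hardcover series. For the paperback series the two are nearly tied, with SES marginally lower.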
Compare the forecasts for the two series using both methods. Which do you think is best?
autoplot(holt_b_p) +
autolayer(ses_b_p, series='ses', PI = FALSE) +
ggtitle("Daily Book Sales Holt & SES Forecast Paperback")
autoplot(holt_b_h) +
autolayer(ses_b_h, series='ses', PI = FALSE) +
ggtitle("Daily Book Sales Holt & SES Forecast Hardcover")
Holt’s method looks better because its forecasts follow the upward trend visible in both the paperback and hardcover series, whereas the SES forecasts are flat.
Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
#Holt Paperback Calculation
holt_b_p$mean[1] + z*accuracy(holt_b_p)[2]
## [1] 270.494
holt_b_p$mean[1] - z*accuracy(holt_b_p)[2]
## [1] 148.4395
holt_b_p$upper[1,2]
## 95%
## 275.0205
holt_b_p$lower[1,2]
## 95%
## 143.913
#Holt Hardcover Calculation
holt_b_h$mean[1] + z*accuracy(holt_b_h)[2]
## [1] 303.4723
holt_b_h$mean[1] - z*accuracy(holt_b_h)[2]
## [1] 196.8754
holt_b_h$upper[1,2]
## 95%
## 307.4256
holt_b_h$lower[1,2]
## 95%
## 192.9222
#SES Paperback Calculation
ses_b_p$mean[1] + z*accuracy(ses_b_p)[2]
## [1] 273.0383
ses_b_p$mean[1] - z*accuracy(ses_b_p)[2]
## [1] 141.181
ses_b_p$upper[1,2]
## 95%
## 275.3523
ses_b_p$lower[1,2]
## 95%
## 138.867
#SES Hardcover Calculation
ses_b_h$mean[1] + z*accuracy(ses_b_h)[2]
## [1] 302.1437
ses_b_h$mean[1] - z*accuracy(ses_b_h)[2]
## [1] 176.9765
ses_b_h$upper[1,2]
## 95%
## 304.3403
ses_b_h$lower[1,2]
## 95%
## 174.7799
The calculated intervals are very close to, but slightly narrower than, the ones the ses and holt functions produce.
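To quantify "very close", the sketch below compares the RMSE-based half-widths with the half-widths R reports, using only numbers printed above. The column k is my assumption about how many parameters each model estimates (α and ℓ0 for SES; α, β, ℓ0 and b0 for Holt), and n = 30 is inferred from the forecasts starting at day 31.

```r
z <- qnorm(0.975)
n <- 30   # forecasts start at day 31, so 30 training observations

pi_check <- data.frame(
  model = c("SES paperback", "SES hardcover", "Holt paperback", "Holt hardcover"),
  rmse  = c(33.63769, 31.93101, 31.13692, 27.19358),
  lo95  = c(138.8670, 174.7799, 143.9130, 192.9222),
  hi95  = c(275.3523, 304.3403, 275.0205, 307.4256),
  k     = c(2, 2, 4, 4)   # assumed number of estimated parameters
)

# Ratio of the RMSE-based half-width to the half-width reported by R.
pi_check$ratio <- z * pi_check$rmse / ((pi_check$hi95 - pi_check$lo95) / 2)

# Ratio expected if R's sigma^2 divides the squared errors by n - k instead of n.
pi_check$expected <- sqrt((n - pi_check$k) / n)
pi_check
```

If the assumption holds, the RMSE-based intervals are about 3% narrower for SES and about 7% narrower for Holt. The gap is larger for Holt because, with only 30 observations, four estimated parameters use up a noticeable share of the degrees of freedom.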
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
holt_eggs <- holt(eggs, h=100)
holt_eggs_d <- holt(eggs, h=100, damped = TRUE)
holt_eggs_e <- holt(eggs, h=100, exponential = TRUE)
holt_eggs_bc <- holt(eggs, h=100, lambda = "auto")
autoplot(eggs) +
autolayer(holt_eggs, series='Holt', PI = FALSE) +
autolayer(holt_eggs_d, series='Holt Damped', PI = FALSE) +
autolayer(holt_eggs_e, series='Holt Exponential', PI = FALSE) +
autolayer(holt_eggs_bc, series='Holt BoxCox Transformation', PI = FALSE)
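To build intuition for the damped option, here is a small sketch of how point forecasts are formed from a final level and slope: the h-step forecast is level + (φ + φ² + … + φ^h) × slope. The function name damped_forecast and the state values are hypothetical, not taken from the fitted eggs models.

```r
# Point forecasts from a final level and slope, with optional damping.
# phi = 1 reproduces Holt's linear trend; 0 < phi < 1 flattens the trend.
damped_forecast <- function(level, slope, h, phi = 1) {
  level + cumsum(phi^seq_len(h)) * slope
}

# Hypothetical final state: level 100, slope -1 per period.
linear <- damped_forecast(level = 100, slope = -1, h = 100)
damped <- damped_forecast(level = 100, slope = -1, h = 100, phi = 0.9)

linear[c(1, 10, 100)]
damped[c(1, 10, 100)]

# The damped forecast approaches level + slope * phi / (1 - phi) as h grows.
100 + (-1) * 0.9 / (1 - 0.9)
```

With a falling trend and h = 100, the undamped line keeps decreasing and can eventually cross zero, while the damped version levels off.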
RMSE Comparison:
accuracy(holt_eggs)[2]
## [1] 26.58219
accuracy(holt_eggs_d)[2]
## [1] 26.54019
accuracy(holt_eggs_e)[2]
## [1] 26.49795
accuracy(holt_eggs_bc)[2]
## [1] 26.39376
The Box-Cox transformation gives the lowest RMSE (26.39), but the differences between the four models are small (26.39 to 26.58).
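As a reminder of what the lambda argument does, here is a minimal sketch of the Box-Cox transformation and its inverse for strictly positive data. The function names box_cox and inv_box_cox and the price values are my own illustrations, not the forecast package's implementation.

```r
# Box-Cox transformation and its inverse (lambda = 0 is the log transform).
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

prices <- c(350, 280, 200, 120, 70)   # hypothetical, strictly positive
w    <- box_cox(prices, lambda = 0.3)
back <- inv_box_cox(w, lambda = 0.3)

w
back
```

The model is fitted on the transformed scale, so a straight-line trend there becomes a curved trend once the forecasts are back-transformed.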
Recall your retail time series data (from Exercise 3 in Section 2.10).
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349882C"],
frequency=12, start=c(1982,4))
Why is multiplicative seasonality necessary for this series?
autoplot(myts)
Multiplicative seasonality is necessary for this series because the variability is not constant: the size of the seasonal fluctuations increases over time as the level of the series rises.
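A toy illustration of the difference, using made-up numbers rather than the retail data: with additive seasonality the seasonal swing stays the same size as the level rises, while with multiplicative seasonality it grows in proportion to the level.

```r
months <- 1:48
level  <- seq(100, 400, length.out = 48)   # hypothetical rising level
season <- sin(2 * pi * months / 12)        # unit seasonal shape

additive       <- level + 20 * season          # constant swing of +/- 20
multiplicative <- level * (1 + 0.2 * season)   # swing of +/- 20% of the level

# Size of the seasonal swing (deviation from the level) within a given year.
swing <- function(x, idx) diff(range(x[idx] - level[idx]))

c(add_first = swing(additive, 1:12),       add_last = swing(additive, 37:48))
c(mul_first = swing(multiplicative, 1:12), mul_last = swing(multiplicative, 37:48))
```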
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
retail_hw <- hw(myts, seasonal='multiplicative', damped=FALSE)
summary(retail_hw)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = myts, seasonal = "multiplicative", damped = FALSE)
##
## Smoothing parameters:
## alpha = 0.453
## beta = 0.0178
## gamma = 0.133
##
## Initial states:
## l = 138.6858
## b = 4.7288
## s = 1.0295 0.9407 1.0381 1.096 0.9799 1.0171
## 0.9455 0.9896 0.9999 0.9988 1.03 0.9347
##
## sigma: 0.0287
##
## AIC AICc BIC
## 4412.010 4413.696 4479.037
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01537593 16.93608 12.97634 -0.2077745 2.194719 0.2784834
## ACF1
## Training set -0.1225174
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 1649.404 1588.728 1710.080 1556.608 1742.200
## Feb 2014 1508.565 1447.257 1569.872 1414.803 1602.327
## Mar 2014 1653.322 1579.921 1726.722 1541.066 1765.578
## Apr 2014 1597.295 1520.492 1674.099 1479.834 1714.757
## May 2014 1623.758 1539.771 1707.745 1495.311 1752.205
## Jun 2014 1572.224 1485.232 1659.216 1439.181 1705.266
## Jul 2014 1673.675 1575.071 1772.280 1522.873 1824.478
## Aug 2014 1680.191 1575.204 1785.177 1519.627 1840.754
## Sep 2014 1621.988 1514.860 1729.116 1458.150 1785.826
## Oct 2014 1683.600 1566.411 1800.790 1504.374 1862.827
## Nov 2014 1648.033 1527.451 1768.615 1463.619 1832.447
## Dec 2014 1799.739 1661.642 1937.837 1588.537 2010.942
## Jan 2015 1712.610 1572.379 1852.840 1498.146 1927.074
## Feb 2015 1566.190 1432.444 1699.937 1361.643 1770.738
## Mar 2015 1716.277 1563.654 1868.900 1482.861 1949.694
## Apr 2015 1657.926 1504.611 1811.241 1423.451 1892.401
## May 2015 1685.200 1523.356 1847.043 1437.682 1932.718
## Jun 2015 1631.529 1468.996 1794.061 1382.956 1880.101
## Jul 2015 1736.610 1557.358 1915.862 1462.468 2010.753
## Aug 2015 1743.174 1556.938 1929.410 1458.350 2027.998
## Sep 2015 1682.601 1496.718 1868.484 1398.317 1966.885
## Oct 2015 1746.321 1547.016 1945.626 1441.510 2051.132
## Nov 2015 1709.239 1507.888 1910.591 1401.298 2017.180
## Dec 2015 1866.375 1639.621 2093.128 1519.585 2213.164
retail_hw_d <- hw(myts, seasonal='multiplicative', damped=TRUE)
summary(retail_hw_d)
##
## Forecast method: Damped Holt-Winters' multiplicative method
##
## Model Information:
## Damped Holt-Winters' multiplicative method
##
## Call:
## hw(y = myts, seasonal = "multiplicative", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.4501
## beta = 0.0492
## gamma = 0.0648
## phi = 0.98
##
## Initial states:
## l = 139.8911
## b = 1.7179
## s = 1.0277 0.9278 1.0599 1.0604 0.9727 0.9899
## 0.9669 1.0201 1.0361 0.948 0.9936 0.997
##
## sigma: 0.0301
##
## AIC AICc BIC
## 4445.435 4447.324 4516.405
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 1.465078 16.72069 12.88664 0.1888671 2.249597 0.2765582
## ACF1
## Training set -0.1721738
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 1638.899 1575.733 1702.065 1542.295 1735.503
## Feb 2014 1500.894 1436.275 1565.513 1402.068 1599.721
## Mar 2014 1644.834 1566.060 1723.607 1524.360 1765.307
## Apr 2014 1592.758 1508.331 1677.184 1463.639 1721.876
## May 2014 1619.248 1524.750 1713.746 1474.725 1763.770
## Jun 2014 1567.433 1467.245 1667.621 1414.209 1720.657
## Jul 2014 1667.271 1551.132 1783.410 1489.652 1844.890
## Aug 2014 1672.373 1546.014 1798.731 1479.124 1865.621
## Sep 2014 1618.750 1486.668 1750.832 1416.748 1820.752
## Oct 2014 1678.707 1531.385 1826.029 1453.397 1904.016
## Nov 2014 1639.761 1485.566 1793.956 1403.940 1875.583
## Dec 2014 1797.559 1617.063 1978.055 1521.514 2073.604
## Jan 2015 1697.679 1514.740 1880.619 1417.898 1977.461
## Feb 2015 1553.473 1375.976 1730.970 1282.015 1824.931
## Mar 2015 1701.119 1495.562 1906.677 1386.746 2015.492
## Apr 2015 1646.001 1436.158 1855.845 1325.073 1966.929
## May 2015 1672.130 1447.727 1896.533 1328.936 2015.324
## Jun 2015 1617.448 1389.427 1845.468 1268.721 1966.175
## Jul 2015 1719.254 1465.144 1973.363 1330.626 2107.881
## Aug 2015 1723.324 1456.764 1989.884 1315.656 2130.992
## Sep 2015 1666.945 1397.572 1936.318 1254.974 2078.916
## Oct 2015 1727.552 1436.363 2018.742 1282.216 2172.888
## Nov 2015 1686.393 1390.345 1982.442 1233.626 2139.161
## Dec 2015 1847.525 1510.205 2184.844 1331.638 2363.411
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
accuracy(retail_hw)[2]
## [1] 16.93608
accuracy(retail_hw_d)[2]
## [1] 16.72069
The RMSE is slightly lower with the damped trend (16.72 versus 16.94), so I prefer that method.
Check that the residuals from the best method look like white noise.
autoplot(retail_hw_d$residuals)
The residuals look broadly like white noise, although their spread appears somewhat smaller in later years.
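A visual check can be backed up with a Ljung-Box test. The damped model's summary above reports ACF1 = −0.17, which lies outside the usual ±1.96/√n band of roughly ±0.10 for about 381 monthly observations, so a formal test seems worthwhile. The sketch below uses simulated series so that it runs on its own; the lag of 24 (two seasonal periods) is a common rule of thumb and my choice here, not something prescribed by the exercise.

```r
set.seed(1)
noise <- rnorm(200)      # simulated white noise
walk  <- cumsum(noise)   # strongly autocorrelated by construction

# Ljung-Box test: a small p-value suggests the series is not white noise.
lb_noise <- Box.test(noise, lag = 24, type = "Ljung-Box")
lb_walk  <- Box.test(walk,  lag = 24, type = "Ljung-Box")

lb_noise$p.value
lb_walk$p.value
```

With the objects in this exercise, the analogous call would be Box.test(residuals(retail_hw_d), lag = 24, type = "Ljung-Box"). The forecast package's checkresiduals(retail_hw_d) bundles a similar test with residual plots.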
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
myts.train <- window(myts, end=c(2010, 12))
myts.test <- window(myts, start=c(2011))
fc <- snaive(myts.train)
fc_hw <- hw(myts.train, h=36, seasonal='multiplicative', damped=TRUE)
RMSE(myts.test, fc$mean)
## [1] 76.67352
RMSE(myts.test, fc_hw$mean)
## [1] 29.08321
The test-set RMSE of the damped Holt-Winters’ multiplicative method (29.1) is much lower than that of the seasonal naive method (76.7).
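One caveat worth checking: RMSE() is not defined in the code shown, so it presumably comes from another package. As far as I know, snaive() defaults to h = 2 × frequency = 24 months, while the test set and fc_hw cover 36 months, so the two RMSE values may not be computed over the same horizon. The helper below (hypothetical name rmse_h, toy inputs) makes the horizon explicit and warns when the lengths differ.

```r
# Test-set RMSE over an explicit common horizon (illustrative helper).
rmse_h <- function(actual, predicted) {
  h <- min(length(actual), length(predicted))
  if (length(actual) != length(predicted)) {
    warning("lengths differ; using the first ", h, " values only")
  }
  a <- as.numeric(actual)[seq_len(h)]
  p <- as.numeric(predicted)[seq_len(h)]
  sqrt(mean((a - p)^2))
}

actual <- c(10, 12, 14, 16)   # hypothetical test values
pred   <- c(11, 11, 15, 15)   # hypothetical forecasts
rmse_h(actual, pred)
```

Passing h = 36 (or h = length(myts.test)) to every benchmark would keep all the comparisons on the same 36 months.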
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
fc_stl <- stlf(myts.train, method ="ets", lambda = "auto")
RMSE(myts.test, fc_stl$mean)
## [1] 57.91001
The RMSE of this method (57.9) is better than seasonal naive (76.7), but not as good as the damped Holt-Winters’ multiplicative method (29.1).
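For reference, the decomposition step that stlf() relies on can be sketched in base R. This uses the built-in AirPassengers series as a stand-in for the retail data and a log transform (Box-Cox with λ = 0) instead of an automatically chosen λ. It covers only the decomposition and seasonal adjustment, not the ETS forecast of the adjusted series.

```r
# STL on a log-transformed series, using a built-in data set as a stand-in.
y   <- log(AirPassengers)
fit <- stl(y, s.window = "periodic")
parts <- fit$time.series   # columns: seasonal, trend, remainder

# Seasonally adjusted series on the transformed scale.
seas_adj <- y - parts[, "seasonal"]

# Back-transform to the original scale.
seas_adj_orig <- exp(seas_adj)
head(seas_adj_orig)
```

In the stlf() workflow, the seasonally adjusted series is what gets forecast with ETS, and the seasonal component is then added back before the Box-Cox transformation is reversed.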