library(fpp2)
Monthly total number of pigs slaughtered in Victoria, Australia (Jan 1980 – Aug 1995).
#help("pigs")
data("pigs")
head(pigs)
## Jan Feb Mar Apr May Jun
## 1980 76378 71947 33873 96428 105084 95741
summary(pigs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 33873 79080 91662 90640 101493 120184
paste0("Frequency: ", frequency(pigs))
## [1] "Frequency: 12"
autoplot(pigs)
ggseasonplot(pigs)
ggsubseriesplot(pigs)
gglagplot(pigs)
Acf(pigs, lag.max=150)
The plots suggest that the series has no clear seasonal or trend component; however, there does appear to be some cyclic behaviour.
Use the ses() function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.
Let’s use the ses() function (simple exponential smoothing) to fit an exponential smoothing model to this series.
ses.pigs <- ses(pigs)
summary(ses.pigs)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249
## ACF1
## Training set 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
## Jan 1996 98816.41 83448.52 114184.3 75313.25 122319.6
## Feb 1996 98816.41 82955.06 114677.8 74558.56 123074.2
## Mar 1996 98816.41 82476.49 115156.3 73826.66 123806.2
## Apr 1996 98816.41 82011.54 115621.3 73115.58 124517.2
## May 1996 98816.41 81559.12 116073.7 72423.66 125209.2
## Jun 1996 98816.41 81118.26 116514.6 71749.42 125883.4
The ses() function estimates the smoothing parameter \(\alpha = 0.2971\) and the initial level \(\ell_0 = 77260.06\).
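As a quick check, the point forecast can be reproduced by applying the SES recursion \(\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}\), since the final level is the forecast for all future months. A minimal sketch, using the rounded estimates printed above:
alpha <- 0.2971        # smoothing parameter reported by ses()
level <- 77260.0561    # initial level l_0 reported by ses()
for (y in as.numeric(pigs)) {
  level <- alpha * y + (1 - alpha) * level
}
level                  # should be close to the point forecast of about 98816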
ses_pigs.4 <- ses(pigs, h = 4)
ses_pigs.4$model
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
autoplot(ses.pigs) +
autolayer(fitted(ses.pigs), series="Fitted")
s <- sd(ses.pigs$residuals)
(ses.pigs$mean[1] + 1.96*s)
## [1] 118952.8
(ses.pigs$mean[1] - 1.96*s)
## [1] 78679.97
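For comparison, the 95% interval produced by R can be read directly from the forecast object (a small sketch; the lower and upper matrices are assumed to carry "80%" and "95%" columns, as in the forecast package):
# R's 95% interval for the first forecast, for comparison with the manual one
c(lower = ses.pigs$lower[1, "95%"], upper = ses.pigs$upper[1, "95%"])
The manual interval (78679.97 to 118952.8) is slightly narrower than R's interval (78611.97 to 119020.8), because R uses the model's estimated sigma rather than the sample standard deviation of the residuals.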
Daily sales of paperback and hardcover books at the same store.
#help("books")
data("books")
summary(books)
## Paperback Hardcover
## Min. :111.0 Min. :128.0
## 1st Qu.:167.2 1st Qu.:170.5
## Median :189.0 Median :200.5
## Mean :186.4 Mean :198.8
## 3rd Qu.:207.2 3rd Qu.:222.0
## Max. :247.0 Max. :283.0
head(books)
## Time Series:
## Start = 1
## End = 6
## Frequency = 1
## Paperback Hardcover
## 1 199 139
## 2 172 128
## 3 111 172
## 4 209 139
## 5 161 191
## 6 119 168
Plot the series and discuss the main features of the data.
autoplot(books)
For Paperback
books[,"Paperback"]
## Time Series:
## Start = 1
## End = 30
## Frequency = 1
## [1] 199 172 111 209 161 119 195 195 131 183 143 141 168 201 155 243 225
## [18] 167 237 202 186 176 232 195 190 182 222 217 188 247
autoplot(books[,"Paperback"]) + ggtitle("Plot of Paperback")
gglagplot(books[,"Paperback"])
Acf(books[,"Paperback"], lag.max=150)
The paperback series does not show an obvious cyclical or seasonal component, although there are some signs of an upward trend. The ACF spikes all lie within the dashed blue significance bounds, which suggests there is no significant autocorrelation in the series.
For Hardcover
books[,"Hardcover"]
## Time Series:
## Start = 1
## End = 30
## Frequency = 1
## [1] 139 128 172 139 191 168 170 145 184 135 218 198 230 222 206 240 189
## [18] 222 158 178 217 261 238 240 214 200 201 283 220 259
autoplot(books[,"Hardcover"]) + ggtitle("Plot of Hardcover")
gglagplot(books[,"Hardcover"])
Acf(books[,"Hardcover"], lag.max=150)
The hardcover series looks similar to the paperback series, with no obvious seasonal or cyclical component; however, it also appears to have an upward trend.
Use the ses() function to forecast each series, and plot the forecasts.
ses.paperback <- ses(books[, "Paperback"])
ses.hardcover <- ses(books[, "Hardcover"])
autoplot(books[, "Paperback"], series = "Paperback") +
autolayer(ses.paperback, series = "Paperback") +
autolayer(books[, "Hardcover"], series = "Hardcover") +
autolayer(ses.hardcover, series = "Hardcover", PI = FALSE) +
ylab("Sales Amount") +
ggtitle("SES Books Sales")
Compute the RMSE values for the training data in each case.
ses.paperback.rmse <- sqrt(mean(ses.paperback$residuals^2))
ses.paperback.rmse
## [1] 33.63769
ses.hardcover.rmse <- sqrt(mean(ses.hardcover$residuals^2))
ses.hardcover.rmse
## [1] 31.93101
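The same training RMSE values can also be read from the accuracy() function in the forecast package (a sketch; the RMSE column of the training-set row):
# Training-set RMSE via accuracy(), as a cross-check on the manual calculation
accuracy(ses.paperback)["Training set", "RMSE"]
accuracy(ses.hardcover)["Training set", "RMSE"]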
Now apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
holt.paperback <- holt(books[, "Paperback"], h = 4)
holt.hardcover <- holt(books[, "Hardcover"], h = 4)
autoplot(books[, "Paperback"], series = "Paperback") +
autolayer(holt.paperback, series = "Paperback") +
autolayer(books[, "Hardcover"], series = "Hardcover") +
autolayer(holt.hardcover, series = "Hardcover", PI = FALSE) +
ylab("Sales Amount") +
ggtitle("Holt: Books Sales")
Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
holt.paperback.rmse <- sqrt(mean(holt.paperback$residuals^2))
holt.paperback.rmse
## [1] 31.13692
holt.hardcover.rmse <- sqrt(mean(holt.hardcover$residuals^2))
holt.hardcover.rmse
## [1] 27.19358
Compare the forecasts for the two series using both methods. Which do you think is best?
Holt’s method appears to be the better choice for these roughly linear series, as it gives lower RMSE values. For a fairer comparison, the data should be split into training and test sets, or evaluated with time-series cross-validation, as sketched below.
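A sketch of such a comparison using one-step time-series cross-validation with tsCV() from the forecast package, shown here for the paperback series; the same idea applies to the hardcover series:
# One-step cross-validated errors for SES and Holt on the paperback series
e.ses  <- tsCV(books[, "Paperback"], ses,  h = 1)
e.holt <- tsCV(books[, "Paperback"], holt, h = 1)
sqrt(mean(e.ses^2,  na.rm = TRUE))   # CV RMSE for SES
sqrt(mean(e.holt^2, na.rm = TRUE))   # CV RMSE for Holt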
Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.
#summary(forecast(ses.paperback, h=1))
forecast(ses.paperback, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.867 275.3523
forecast(holt.paperback, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.913 275.0205
The 95% prediction interval for SES on Paperback: 138.867 to 275.3523
The 95% prediction interval for Holt on Paperback: 143.913 to 275.0205
The 95% prediction interval is narrower for Holt’s method than for SES.
forecast(ses.hardcover, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
forecast(holt.hardcover, h=1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.739 287.6087 192.9222 307.4256
The 95% prediction interval for SES on Hardcover: 174.7799 to 304.3403
The 95% prediction interval for Holt on Hardcover: 192.9222 to 307.4256
The 95% prediction interval is narrower for Holt’s method than for SES.
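The exercise also asks for manual 95% intervals built from the training RMSE under a normality assumption, i.e. \(\hat{y} \pm 1.96\,RMSE\). A sketch using the RMSE values computed above:
# Manual 95% intervals from the training RMSE (first forecast of each series)
ses.paperback$mean[1]  + c(-1, 1) * 1.96 * ses.paperback.rmse
holt.paperback$mean[1] + c(-1, 1) * 1.96 * holt.paperback.rmse
ses.hardcover$mean[1]  + c(-1, 1) * 1.96 * ses.hardcover.rmse
holt.hardcover$mean[1] + c(-1, 1) * 1.96 * holt.hardcover.rmse
These manual intervals typically come out slightly narrower than the ones reported by ses() and holt(), which use the models' estimated sigma.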
Price of dozen eggs in US, 1900–1993, in constant dollars.
#help("eggs")
data("eggs")
head(eggs)
## Time Series:
## Start = 1900
## End = 1905
## Frequency = 1
## [1] 276.79 315.42 314.87 321.25 314.54 317.92
summary(eggs)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 62.27 148.87 209.15 206.15 276.71 358.78
autoplot(eggs)
gglagplot(eggs)
Acf(eggs)
holt.eggs <- holt(eggs, h = 100)
autoplot(holt.eggs) + autolayer(holt.eggs$fitted)
holt.damped.eggs <- holt(eggs, damped = TRUE, h = 100)
autoplot(holt.damped.eggs) + autolayer(holt.damped.eggs$fitted)
holt.BoxCox.eggs <- holt(eggs, lambda = BoxCox.lambda(eggs), h = 100)
autoplot(holt.BoxCox.eggs) + autolayer(holt.BoxCox.eggs$fitted)
holt.BoxCox.damped.eggs <- holt(eggs, damped = TRUE, lambda = BoxCox.lambda(eggs), h = 100)
autoplot(holt.BoxCox.damped.eggs) + autolayer(holt.BoxCox.damped.eggs$fitted)
paste0("RMSE Holt function = ", sqrt(mean(holt.eggs$residuals^2)))
## [1] "RMSE Holt function = 26.5821881569903"
paste0("RMSE Holt function with damped option = ", sqrt(mean(holt.damped.eggs$residuals^2)))
## [1] "RMSE Holt function with damped option = 26.5401852798325"
paste0("RMSE Holt function with Box-Cox transformation = ", sqrt(mean(holt.BoxCox.eggs$residuals^2)))
## [1] "RMSE Holt function with Box-Cox transformation = 1.03221710764543"
paste0("RMSE Holt function with damped option and Box-Cox transformation = ", sqrt(mean(holt.BoxCox.damped.eggs$residuals^2)))
## [1] "RMSE Holt function with damped option and Box-Cox transformation = 1.03918723939767"
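Note that the last two RMSE values are on the Box-Cox-transformed scale, so they are not directly comparable with the untransformed fits. A sketch of a fairer comparison on the original scale, using accuracy(), which computes errors from the back-transformed fitted values:
# Training RMSE on the original scale for all four Holt variants
sapply(list(holt          = holt.eggs,
            holt_damped   = holt.damped.eggs,
            boxcox        = holt.BoxCox.eggs,
            boxcox_damped = holt.BoxCox.damped.eggs),
       function(fit) accuracy(fit)["Training set", "RMSE"])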
Why is multiplicative seasonality necessary for this series?
retaildata <- readxl::read_excel("C:\\NITEEN\\GitHub\\CUNY_DATA_624\\Dataset\\retail.xlsx", skip=1)
#A3349661X
myts <- ts(retaildata$A3349661X, start = c(1982,4),frequency = 12)
autoplot(myts) + ggtitle("Retail Data Time Series")
autoplot(decompose(myts,type="multiplicative"))
Multiplicative seasonality is needed because the seasonal fluctuations grow roughly in proportion to the level of the series, which is visible in both the time plot and the multiplicative decomposition above.
Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
holt.fit <- hw(myts, seasonal="multiplicative",damped=FALSE)
summary(holt.fit)
##
## Forecast method: Holt-Winters' multiplicative method
##
## Model Information:
## Holt-Winters' multiplicative method
##
## Call:
## hw(y = myts, seasonal = "multiplicative", damped = FALSE)
##
## Smoothing parameters:
## alpha = 0.581
## beta = 0.0146
## gamma = 1e-04
##
## Initial states:
## l = 19.4276
## b = -0.1358
## s = 0.9298 0.8839 0.9697 1.1372 1.0888 1.0706
## 1.0107 0.9993 1.0149 1.0267 0.978 0.8902
##
## sigma: 0.0755
##
## AIC AICc BIC
## 3346.914 3348.600 3413.941
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.1358893 4.859899 3.397537 0.001891542 5.684886 0.4111136
## ACF1
## Training set -0.07382362
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 151.7305 137.0487 166.4123 129.27665 174.1843
## Feb 2014 138.8964 123.2586 154.5341 114.98044 162.8123
## Mar 2014 146.7200 128.0858 165.3541 118.22151 175.2184
## Apr 2014 141.0709 121.2561 160.8857 110.76676 171.3750
## May 2014 155.6390 131.7931 179.4848 119.16991 192.1081
## Jun 2014 164.0470 136.9084 191.1856 122.54205 205.5519
## Jul 2014 162.8631 133.9982 191.7280 118.71801 207.0082
## Aug 2014 161.0243 130.6385 191.4102 114.55314 207.4955
## Sep 2014 163.5302 130.8401 196.2203 113.53504 213.5254
## Oct 2014 173.9624 137.2774 210.6475 117.85757 230.0673
## Nov 2014 177.6277 138.2528 217.0025 117.40907 237.8463
## Dec 2014 186.2875 143.0115 229.5636 120.10256 252.4725
## Jan 2015 159.5075 120.7763 198.2387 100.27320 218.7418
## Feb 2015 145.9853 109.0197 182.9509 89.45123 202.5193
## Mar 2015 154.1765 113.5481 194.8049 92.04067 216.3122
## Apr 2015 148.2100 107.6384 188.7817 86.16108 210.2590
## May 2015 163.4823 117.0693 209.8954 92.49967 234.4650
## Jun 2015 172.2795 121.6285 222.9304 94.81548 249.7435
## Jul 2015 171.0022 119.0074 222.9969 91.48300 250.5213
## Aug 2015 169.0381 115.9477 222.1286 87.84327 250.2330
## Sep 2015 171.6351 116.0158 227.2544 86.57275 256.6975
## Oct 2015 182.5489 121.5758 243.5220 89.29855 275.7993
## Nov 2015 186.3591 122.2623 250.4559 88.33154 284.3867
## Dec 2015 195.4073 126.2610 264.5537 89.65709 301.1575
autoplot(holt.fit) + ggtitle("Holt Winter Multiplicative Fit ")
holt.fit.damped <- hw(myts, seasonal="multiplicative",damped=TRUE)
summary(holt.fit.damped)
##
## Forecast method: Damped Holt-Winters' multiplicative method
##
## Model Information:
## Damped Holt-Winters' multiplicative method
##
## Call:
## hw(y = myts, seasonal = "multiplicative", damped = TRUE)
##
## Smoothing parameters:
## alpha = 0.4886
## beta = 0.017
## gamma = 1e-04
## phi = 0.9618
##
## Initial states:
## l = 19.4102
## b = -0.0203
## s = 0.9302 0.8819 0.9689 1.1379 1.0896 1.0694
## 1.0119 1 1.0151 1.026 0.9767 0.8924
##
## sigma: 0.0756
##
## AIC AICc BIC
## 3346.966 3348.855 3417.936
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.4006301 4.847269 3.401302 0.2947955 5.676417 0.4115693
## ACF1
## Training set 0.009577922
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2014 151.1860 136.5317 165.8403 128.77417 173.5978
## Feb 2014 137.8184 122.8455 152.7912 114.91937 160.7174
## Mar 2014 145.5703 128.1229 163.0176 118.88689 172.2536
## Apr 2014 139.8699 121.5899 158.1498 111.91314 167.8266
## May 2014 153.2848 131.6355 174.9341 120.17503 186.3946
## Jun 2014 161.2102 136.7801 185.6403 123.84763 198.5728
## Jul 2014 159.7206 133.9018 185.5395 120.23408 199.2072
## Aug 2014 157.5212 130.4921 184.5504 116.18372 198.8588
## Sep 2014 159.5724 130.6283 188.5165 115.30628 203.8385
## Oct 2014 168.8528 136.5933 201.1123 119.51617 218.1894
## Nov 2014 172.2165 137.6697 206.7634 119.38171 225.0514
## Dec 2014 180.0537 142.2340 217.8733 122.21353 237.8938
## Jan 2015 153.4668 119.7965 187.1371 101.97249 204.9611
## Feb 2015 139.8151 107.8454 171.7848 90.92166 188.7086
## Mar 2015 147.5958 112.4928 182.6988 93.91039 201.2812
## Apr 2015 141.7392 106.7399 176.7385 88.21232 195.2661
## May 2015 155.2525 115.5160 194.9891 94.48067 216.0244
## Jun 2015 163.1981 119.9675 206.4287 97.08261 229.3136
## Jul 2015 161.6126 117.3670 205.8583 93.94478 229.2805
## Aug 2015 159.3138 114.2936 204.3341 90.46127 228.1664
## Sep 2015 161.3170 114.3194 208.3146 89.44036 233.1937
## Oct 2015 170.6264 119.4346 221.8183 92.33524 248.9176
## Nov 2015 173.9546 120.2638 227.6455 91.84162 256.0677
## Dec 2015 181.7997 124.1302 239.4693 93.60176 269.9977
autoplot(holt.fit.damped) + ggtitle("Holt Winter Multiplicative Fit with Damped Option")
Comparing the RMSE, AIC and BIC values, the two methods are very similar; the damped method provides at most a marginal improvement over the non-damped method.
Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
error.retail <- tsCV(myts, hw, h = 1, seasonal = "multiplicative")
error.retail.damped <- tsCV(myts, hw, h = 1, seasonal = "multiplicative", damped = TRUE)
sqrt(mean(error.retail^2, na.rm = TRUE))
## [1] 5.570076
sqrt(mean(error.retail.damped^2, na.rm = TRUE))
## [1] 5.349409
The cross-validated RMSE values are almost the same. The damped model is preferred because it prevents the forecast trend from increasing without bound.
Check that the residuals from the best method look like white noise.
checkresiduals(holt.fit.damped)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 58.137, df = 7, p-value = 3.551e-10
##
## Model df: 17. Total lags used: 24
The Ljung-Box test result and the ACF plot show that the residuals are not white noise.
Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
myts.train <- window(myts, end = c(2010, 12))
myts.test <- window(myts, start = 2011)
myts.train.damped <- hw(myts.train, h = 36, seasonal = "multiplicative", damped = TRUE)
autoplot(myts.train.damped)
accuracy(myts.train.damped, myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.4139488 4.263774 3.027293 0.3720019 5.691012 0.4178952
## Test set 18.9011487 21.422397 19.238531 12.1660151 12.450007 2.6557355
## ACF1 Theil's U
## Training set -0.00534152 NA
## Test set 0.34110553 1.691748
myts.train.nd <- hw(myts.train, h = 36, seasonal = "multiplicative")
autoplot(myts.train.nd)
accuracy(myts.train.nd, myts.test)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.4461792 4.269761 3.072234 0.6227387 5.755778 0.424099
## Test set 16.6842661 19.426233 17.060375 10.7146352 11.031225 2.355057
## ACF1 Theil's U
## Training set 0.03067392 NA
## Test set 0.34575920 1.536211
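To answer whether the Holt-Winters fits beat the seasonal naïve approach from Exercise 8, the same train/test split can be used with snaive() (a sketch; the test-set RMSE is the figure to compare):
# Seasonal naive benchmark on the same training period
fc.snaive <- snaive(myts.train, h = 36)
accuracy(fc.snaive, myts.test)["Test set", "RMSE"]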
# Find lambda
lambda <- BoxCox.lambda(myts)
stl_retail <- myts %>%
BoxCox(lambda = lambda) %>%
stl(t.window=13, s.window="periodic", robust=TRUE) %>% seasadj()
summary(stl_retail)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.619 3.142 3.653 3.608 4.033 4.529
autoplot(stl_retail) + ggtitle("STL Decomposition (Seasonally Adjusted)")
There is an upward trend, even after adjusting for seasonality.
ets_model <- stl_retail %>% ets(model="ZZZ")
summary(ets_model)
## ETS(A,A,N)
##
## Call:
## ets(y = ., model = "ZZZ")
##
## Smoothing parameters:
## alpha = 0.5066
## beta = 1e-04
##
## Initial states:
## l = 2.8107
## b = 0.0044
##
## sigma: 0.0585
##
## AIC AICc BIC
## 106.758 106.918 126.472
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -4.894186e-05 0.05816795 0.04539999 -0.02577639 1.30073
## MASE ACF1
## Training set 0.4443646 -0.01057731
autoplot(ets_model) + ggtitle("ETS Model")
The ETS model selects an (A,A,N) specification, which is Holt’s linear method with additive errors. The training-set RMSE of 0.0582 is on the Box-Cox-transformed, seasonally adjusted scale, so it cannot be compared directly with the RMSE of the Holt-Winters models fitted to the original data; the forecasts would need to be re-seasonalised and back-transformed first.
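A minimal sketch of the back-transformation step with InvBoxCox() (the seasonal component would still need to be added back for a like-for-like comparison with the Holt-Winters fits):
# Back-transform the ETS forecasts of the Box-Cox, seasonally adjusted series
fc.ets <- forecast(ets_model, h = 36)
fc.ets.backtransformed <- InvBoxCox(fc.ets$mean, lambda = lambda)
autoplot(fc.ets.backtransformed) + ggtitle("ETS forecasts, back-transformed scale")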