library(fpp2)
library(readxl)
#?pigs - monthly total number of pigs slaughtered in Victoria, AU (Jan 1980 - Aug 1995)
autoplot(pigs)
pigs_ses <- ses(pigs, h = 4)
summary(pigs_ses)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
autoplot(pigs_ses) +
autolayer(fitted(pigs_ses), series="Fitted") +
ylab("Number of pigs slaughtered") + xlab("Month")
The ses() function suggests an optimal \(\alpha\) of approximately 0.30 and an initial state l of approximately 77260.06. The point forecast for each of the four forecasted months (September 1995 - December 1995) is approximately 98816.41.
paste0("95% prediction interval: ", "[", round(pigs_ses$mean[1]-1.96*sd(pigs_ses$residuals),2), ", ", round(pigs_ses$mean[1]+1.96*sd(pigs_ses$residuals),2), "]")
## [1] "95% prediction interval: [78679.97, 118952.84]"
A 95% prediction interval for the first forecast is approximately 78679.97 to 118952.84. This computed interval is slightly narrower than the R-produced interval of approximately 78611.97 to 119020.80.
#?books - daily sales of paperback and hardcover books at the same store
autoplot(books, main="Sales of paperback and hardcover books") + xlab("Day") + ylab("Books sold")
Daily sales of paperback and hardcover books appear to trend upward over the course of the 30 days. Early in the two series, they tend to share an inverse relationship, i.e. paperback sales are up when hardcover sales are down, and vice versa. The relationship tends to flip every day through day 10, when hardcover sales are higher for roughly 6 days, and then reemerges for a few days. Following day 20, the two series fall into similar sales patterns, with hardcover sales exceeding paperback sales each day.
data(books)
paperback_ses <- ses(books[, 1], h=4)
autoplot(paperback_ses, main="Sales of paperback books - SES") +
autolayer(paperback_ses$fitted, series="Fitted") +
ylab("Books sold") + xlab("Day")
hardcover_ses <- ses(books[, 2], h=4)
autoplot(hardcover_ses, main="Sales of hardcover books - SES") +
autolayer(hardcover_ses$fitted, series="Fitted") +
ylab("Books sold") + xlab("Day")
paste0("Paperback SES RMSE: ", round(sqrt(mean(paperback_ses$residuals^2)), 2))
## [1] "Paperback SES RMSE: 33.64"
paste0("Hardcover SES RMSE: ", round(sqrt(mean(hardcover_ses$residuals^2)), 2))
## [1] "Hardcover SES RMSE: 31.93"
paperback_holt <- holt(books[, 1], h=4)
autoplot(books[, 1], main="Sales of paperback books - Holt's linear method") +
autolayer(paperback_holt$fitted, series="Fitted") +
autolayer(paperback_holt, series="Holt's Linear") +
ylab("Books sold") + xlab("Day")
hardcover_holt <- holt(books[, 2], h = 4)
autoplot(books[, 2], main="Sales of hardcover books - Holt's linear method") +
autolayer(hardcover_holt$fitted, series="Fitted") +
autolayer(hardcover_holt, series="Holt's Linear") +
ylab("Books sold") + xlab("Day")
print("Root mean squared error (RMSE)")
## [1] "Root mean squared error (RMSE)"
paste0("Paperback SES: ", round(sqrt(mean(paperback_ses$residuals^2)), 2))
## [1] "Paperback SES: 33.64"
paste0("Paperback Holt's: ", round(sqrt(mean(paperback_holt$residuals^2)), 2))
## [1] "Paperback Holt's: 31.14"
paste0("Hardcover SES: ", round(sqrt(mean(hardcover_ses$residuals^2)), 2))
## [1] "Hardcover SES: 31.93"
paste0("Hardcover Holt's: ", round(sqrt(mean(hardcover_holt$residuals^2)), 2))
## [1] "Hardcover Holt's: 27.19"
Across paperbacks (approximately 31.14 versus 33.64) and hardcovers (approximately 27.19 versus 31.93), the Holt's linear method returns smaller RMSE values. Simple exponential smoothing (SES) is appropriate for series with no clear trend or seasonality, whereas Holt's method extends SES to account for trends. Holt's method seems more appropriate than SES for forecasting each books series given both series trend upward.
print("Point Forecasts (h = 1-4)")
## [1] "Point Forecasts (h = 1-4)"
paste0("Paperback SES: ", round(paperback_ses$mean, 2))
## [1] "Paperback SES: 207.11" "Paperback SES: 207.11" "Paperback SES: 207.11" "Paperback SES: 207.11"
paste0("Paperback Holt's: ", round(paperback_holt$mean, 2))
## [1] "Paperback Holt's: 209.47" "Paperback Holt's: 210.72" "Paperback Holt's: 211.97" "Paperback Holt's: 213.22"
paste0("Hardcover SES: ", round(hardcover_ses$mean, 2))
## [1] "Hardcover SES: 239.56" "Hardcover SES: 239.56" "Hardcover SES: 239.56" "Hardcover SES: 239.56"
paste0("Hardcover Holt's: ", round(hardcover_holt$mean, 2))
## [1] "Hardcover Holt's: 250.17" "Hardcover Holt's: 253.48" "Hardcover Holt's: 256.78" "Hardcover Holt's: 260.08"
As noted above, Holt's linear method appears better than SES for forecasting these series given the presence of upward trends in sales [and thus the resulting forecasts]. SES forecasts cannot account for trends. Further analysis is necessary regarding whether Holt's linear method is more appropriate than other non-SES methods.
paperback_ses
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
paste0("Paperback SES: ", "[", round(paperback_ses$mean[1]-1.96*sqrt(mean(paperback_ses$residuals^2)),2), ", ", round(paperback_ses$mean[1]+1.96*sqrt(mean(paperback_ses$residuals^2)),2), "]")
## [1] "Paperback SES: [141.18, 273.04]"
The calculated 95% prediction interval for the first forecasted element of paperback series using SES is slightly narrower than its counterpart produced by R.
paperback_holt
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
paste0("Paperback Holt's: ", "[", round(paperback_holt$mean[1]-1.96*sqrt(mean(paperback_holt$residuals^2)),2), ", ", round(paperback_holt$mean[1]+1.96*sqrt(mean(paperback_holt$residuals^2)),2), "]")
## [1] "Paperback Holt's: [148.44, 270.5]"
The calculated 95% prediction interval for the first forecasted element of the paperback series using Holt's linear method is narrower than its counterpart produced by R.
hardcover_ses
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
paste0("Hardcover SES: ", "[", round(hardcover_ses$mean[1]-1.96*sqrt(mean(hardcover_ses$residuals^2)),2), ", ", round(hardcover_ses$mean[1]+1.96*sqrt(mean(hardcover_ses$residuals^2)),2), "]")
## [1] "Hardcover SES: [176.98, 302.14]"
The calculated 95% prediction interval for the first forecasted element of the hardcover series using SES is slightly narrower than its counterpart produced by R.
hardcover_holt
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
paste0("Hardcover Holt's: ", "[", round(hardcover_holt$mean[1]-1.96*sqrt(mean(hardcover_holt$residuals^2)),2), ", ", round(hardcover_holt$mean[1]+1.96*sqrt(mean(hardcover_holt$residuals^2)),2), "]")
## [1] "Hardcover Holt's: [196.87, 303.47]"
The calculated 95% prediction interval for the first forecasted element of the hardcover series using Holt's linear method is narrower than its counterpart produced by R.
autoplot(eggs)
#?holt()
eggs_holt <- holt(eggs, h=100)
autoplot(eggs, main="Egg prices (1 Dozen) - Holt's linear method") +
autolayer(eggs_holt$fitted, series="Fitted") +
autolayer(eggs_holt, series="Holt's Linear") +
ylab("Price") + xlab("Year")
Holt's linear method, without transformation or dampening, results in forecasts that follow the series trend downward and linearly. They also become negative within roughly 20 years, which is obviously not a realistic price.
eggs_holt_boxcox <- holt(eggs, lambda = "auto", h=100)
autoplot(eggs, main="Egg prices (1 Dozen) - Box-Cox Transformation + Holt's linear method") +
autolayer(eggs_holt_boxcox$fitted, series="Fitted") +
autolayer(eggs_holt_boxcox, series="Box-Cox + Holt's Linear") +
ylab("Price") + xlab("Year")
A Box-Cox transformation alongside Holt's linear method results in a set of forecasts that slopes downward approaching zero. Negative prices are not possible, though a price of approximately free is not particularly realistic either.
eggs_holt_damp <- holt(eggs, damped=TRUE, h=100)
autoplot(eggs, main="Egg prices (1 Dozen) - Damped Holt's linear method ") +
autolayer(eggs_holt_damp$fitted, series="Fitted") +
autolayer(eggs_holt_damp, series="Damped Holt's Linear") +
ylab("Price") + xlab("Year")
Damped Holt's linear method results in an essentially flat forecast, which doesn't reflect the historical trend of the series.
print("Root mean squared error (RMSE)")
## [1] "Root mean squared error (RMSE)"
paste0("Holt's: ", round(sqrt(mean(eggs_holt$residuals^2)), 2))
## [1] "Holt's: 26.58"
paste0("Box-Cox + Holt's: ", round(accuracy(eggs_holt_boxcox)[2], 2))
## [1] "Box-Cox + Holt's: 26.39"
paste0("Damped Holt's: ", round(sqrt(mean(eggs_holt_damp$residuals^2)), 2))
## [1] "Damped Holt's: 26.54"
The combination of a Box-Cox transformation and Holt's linear method results in the lowest RMSE (approximately 26.39).
retaildata <- read_excel("retail.xlsx", skip=1) # 381 x 19
myts <- ts(retaildata[,4], frequency=12, start=c(1982,4)) # 4th column ("A3349338X")
autoplot(myts, main="Retail - A3349338x")
Seasonality appears to shift multiple times over the course of the series, suggesting multiplicative seasonality is necessary for forecasting.
retail_hw <- hw(myts, seasonal="multiplicative")
autoplot(myts, main="Retail - A3349338x") +
autolayer(retail_hw$fitted, series="Fitted") +
autolayer(retail_hw, series="Holt-Winters' Multiplicative") +
ylab("Sales") + xlab("Year")
retail_hw_damp <- hw(myts, seasonal="multiplicative", damped=TRUE)
autoplot(myts, main="Retail - A3349338x") +
autolayer(retail_hw_damp$fitted, series="Fitted") +
autolayer(retail_hw_damp, series="Damped HW Multiplicative") +
ylab("Sales") + xlab("Year")
Visually, dampening the trend does not appear to have a large effect on the forecast.
accuracy(retail_hw)[2]
## [1] 9.570098
accuracy(retail_hw_damp)[2]
## [1] 9.022512
Statistically, dampening the trend results in a slightly lower RMSE (approximately 9.02 versus 9.57), in which case, the dampened forecast is preferable.
checkresiduals(retail_hw_damp)
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 41.907, df = 7, p-value = 5.418e-07
##
## Model df: 17. Total lags used: 24
The residuals are not clearly white noise. A Ljung-Box test returns a p-value of approximately zero, meaning that there is sufficient evidence that the residuals are different from zero. The residual plots support the test results. Notably, there are relatively large residuals between around 1997 and 2002, and there is significant autocorrelation around lags of 12, 18, and 30 months.
retail_train <- window(myts, end=c(2010,12))
retail_test <- window(myts, start=2011)
retail_train_hw_damp <- hw(retail_train, method="multiplicative", damped=TRUE)
retail_train_naive <- snaive(retail_train)
accuracy(retail_train_hw_damp, retail_test)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 0.4837761 10.50095 6.863432 0.2838099 4.001396 0.3549224 0.08324464 NA
## Test set 20.1761150 24.71252 21.043296 9.6757767 10.124165 1.0881928 0.62805178 1.132464
accuracy(retail_train_naive, retail_test)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set 5.502402 27.74935 19.33784 3.369450 10.447161 1.000000 0.8703252 NA
## Test set -10.845833 24.12202 18.52083 -5.910245 9.201624 0.957751 0.3564215 0.9855325
No, I cannot beat the seasonal naive approach. The test set RMSE for the Damped Holt Winters' Multiplicative model is approximately 24.71 compared with approximately 24.12 for seasonal naive.
retail_ets <- stlf(retail_train, method="ets", lambda="auto")
autoplot(retail_train, main="Retail - A3349338x") +
autolayer(retail_ets$fitted, series="Fitted") +
autolayer(retail_ets, series="STL + Box-Cox + ETS") +
ylab("Sales") + xlab("Year")
accuracy(retail_ets, retail_test)
## ME RMSE MAE MPE MAPE MASE ACF1 Theil's U
## Training set -0.3219394 8.319367 5.18686 -0.1652031 2.896367 0.2682234 0.01537319 NA
## Test set -8.5654693 12.459784 10.04513 -4.3478741 5.054943 0.5194549 0.25063867 0.5963045
The STL + Box-Cox + ETS test RMSE of approximately 12.46 is lower than its counterparts for Damped Holt Winters' Multiplicative (24.71) and seasonal naive (24.12).