Question HA 7.1
Consider the pigs series – the number of pigs slaughtered in Victoria each month.
Answers
Part a
Use the ses() function in R to find the optimal values of alpha and l, and generate forecasts for the next four months.
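The model was fit with code along these lines (a reconstruction; the object name pigs_output is an assumption, chosen because it is the name used in part b):

library(fpp2)                     # provides the pigs series and ses()

pigs_output <- ses(pigs, h = 4)   # ses() chooses alpha and the initial level by optimisation
summary(pigs_output)              # model details, training error measures, and the forecasts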
## Simple exponential smoothing
##
## Call:
## ses(y = pigs, h = 4)
##
## Smoothing parameters:
## alpha = 0.2971
##
## Initial states:
## l = 77260.0561
##
## sigma: 10308.58
##
## AIC AICc BIC
## 4462.955 4463.086 4472.665
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
After running the ses() function on the pigs data, we can see that the optimal value for alpha is 0.2971 and the optimal value for the initial level l is 77260.0561.
The forecast for the next four months is shown below:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Sep 1995 98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995 98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995 98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995 98816.41 83958.37 113674.4 76092.99 121539.8
Part b
Compute a 95% prediction interval for the first forecast using predicted value +/- 1.96 * s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.
# rudimentary 95% interval: point forecast +/- 1.96 * sd of the residuals
pigs_sd_resid = sd(pigs_output$residuals)
pigs_lb = pigs_output$mean - (1.96 * pigs_sd_resid)
pigs_ub = pigs_output$mean + (1.96 * pigs_sd_resid)
pigs_lb
pigs_ub
## Sep Oct Nov Dec
## 1995 78679.97 78679.97 78679.97 78679.97
## Sep Oct Nov Dec
## 1995 118952.8 118952.8 118952.8 118952.8
The manually computed interval does not get wider as the horizon increases, whereas the interval produced by R gradually widens. Using a fixed residual standard deviation around a flat point forecast ignores the extra uncertainty that accumulates at longer horizons, which is why the manual intervals form a static band; R's output accounts for that growing uncertainty.
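As a quick check, the widening can be reproduced from the standard variance formula for SES (ETS(A,N,N)) point forecasts, sigma_h^2 = sigma^2 * (1 + alpha^2 * (h - 1)), using the sigma and alpha reported above. This is a sketch, not part of the original output:

sigma <- 10308.58                                # sigma reported by ses()
alpha <- 0.2971                                  # estimated smoothing parameter
h     <- 1:4
sigma_h <- sigma * sqrt(1 + alpha^2 * (h - 1))   # forecast sd grows with the horizon
pigs_output$mean + 1.96 * sigma_h                # closely matches the Hi 95 column above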
Question HA 7.5
Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
- Plot the series and discuss the main features of the data.
- Use the ses() function to forecast each series, and plot the forecasts.
- Compute the RMSE values for the training data in each case.
Answers
part a
This is a daily time series of sales for two types of books. Both series appear to have a positive trend, though it could also be argued that there is no trend. I would not call this seasonal data, even though there is day-to-day variation in the sales.
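The plot being described can be reproduced with something like the following (a sketch; it assumes the fpp2 package is loaded, which supplies books and autoplot()):

autoplot(books) +
  ylab("Sales") + xlab("Day") +
  ggtitle("Daily paperback and hardcover book sales")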
part b
sales_paper = ses(books[,'Paperback'], h = 7)
autoplot(sales_paper) +
autolayer(fitted(sales_paper), series="Fitted Paperback") +
autolayer(books[,"Paperback"], series="Original Data") +
ylab("Sales") + xlab("days")
sales_hard = ses(books[,'Hardcover'], h=7)
autoplot(sales_hard) +
autolayer(fitted(sales_hard), series="Fitted Hardcover") +
autolayer(books[,"Hardcover"], series="Original Data") +
ylab("Sales") + xlab("days")
Given these plots, one could argue that the hardcover series might have a trend, and that it would therefore be worth exploring other forecasting methods, such as one with a linear trend. For paperback books, there does not appear to be a trend.
part c
paste("Paperbacks RMSE",
round(accuracy(sales_paper),3)[2],
"Hardcovers RMSE",
round(accuracy(sales_hard),3)[2])
## [1] "Paperbacks RMSE 33.638 Hardcovers RMSE 31.931"
Each of the SES models has an RMSE of roughly 32-34, or about 15% of the average daily sales value. That is not a great result, but it is not horrible either.
Question HA 7.6
We will continue with the daily sales of paperback and hardcover books in data set books.
- Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.
- Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.
- Compare the forecasts for the two series using both methods. Which do you think is best?
- Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses() and holt().
Answers
part b
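The comparison below relies on h_paper and h_hard, which are not defined earlier in the document; a minimal reconstruction of the (unshown) Holt fits, assuming four-day forecasts as the question asks:

h_paper <- holt(books[,'Paperback'], h = 4)
h_hard  <- holt(books[,'Hardcover'], h = 4)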
paste("Paperbacks RMSE",
round(accuracy(h_paper),3)[2],
"Hardcovers RMSE",
round(accuracy(h_hard),3)[2])
## [1] "Paperbacks RMSE 31.137 Hardcovers RMSE 27.194"
For paperback sales we don’t really see much difference between SES and Holt’s method. There is a small decrease in RMSE, MASE, and MAE, but the changes are on the order of 2% of the average daily sales. Hardcovers see a noticeable improvement with Holt’s linear-trend method, which confirms the impression from 7.5(b): there is some kind of trend in hardcover book sales over this interval, or at least a more convincing one than for paperbacks. For now, SES works fine for paperbacks, whereas Holt’s would be a better option for hardcovers.
part c
First, let’s look at Hardcovers and remind ourselves of the SES forecast in addition to the Holt’s method.
autoplot(sales_hard) +
autolayer(fitted(sales_hard), series="Fitted Hardcover") +
autolayer(books[,"Hardcover"], series="Original Data") +
ylab("Sales") + xlab("days")
autoplot(h_hard) +
autolayer(fitted(h_hard), series="Fitted Hardcover") +
autolayer(books[,"Hardcover"], series="Original Data") +
ylab("Sales") + xlab("days")
Certainly, the prediction intervals are narrower and the forecast bands “look” more appropriate for Holt’s method than for SES. Let’s do the same for paperbacks.
autoplot(sales_paper) +
autolayer(fitted(sales_paper), series="Fitted Paperback") +
autolayer(books[,"Paperback"], series="Original Data") +
ylab("Sales") + xlab("days")
autoplot(h_paper) +
autolayer(fitted(h_paper), series="Fitted Paperback") +
autolayer(books[,"Paperback"], series="Original Data") +
ylab("Sales") + xlab("days")
Here the SES model’s 80% interval captures almost all of the original data, which is pretty good. The Holt forecast is marginally better judging from the training-set metrics, but it is unclear whether either is definitively better. For that reason, I’d probably stick with the SES forecast for the paperback data; I would rather be cautious than assume a linear trend.
part d - Paperback
The rudimentary method of calculating prediction intervals for the paperback forecast is shown below, alongside the intervals from holt() and ses() for comparison.
h_paper_res = sd(h_paper$residuals)                 # sd of the Holt training residuals
h_paper_lb = h_paper$mean - (1.96 * h_paper_res)
h_paper_ub = h_paper$mean + (1.96 * h_paper_res)
h_paper_lb
h_paper_ub
h_paper        # holt() intervals for comparison
sales_paper    # ses() intervals for comparison
## Time Series:
## Start = 31
## End = 34
## Frequency = 1
## [1] 147.8390 149.0900 150.3410 151.5919
## Time Series:
## Start = 31
## End = 34
## Frequency = 1
## [1] 271.0945 272.3455 273.5965 274.8474
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 209.4668 166.6035 252.3301 143.9130 275.0205
## 32 210.7177 167.8544 253.5811 145.1640 276.2715
## 33 211.9687 169.1054 254.8320 146.4149 277.5225
## 34 213.2197 170.3564 256.0830 147.6659 278.7735
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 207.1097 162.4882 251.7311 138.8670 275.3523
## 32 207.1097 161.8589 252.3604 137.9046 276.3147
## 33 207.1097 161.2382 252.9811 136.9554 277.2639
## 34 207.1097 160.6259 253.5935 136.0188 278.2005
## 35 207.1097 160.0215 254.1979 135.0945 279.1249
## 36 207.1097 159.4247 254.7946 134.1818 280.0375
## 37 207.1097 158.8353 255.3840 133.2804 280.9389
For paperback sales, the 95% PI from ses() is wider at both the upper and lower bounds than the rudimentary interval. The 95% PI from holt() is also wider than the rudimentary one, but by less than SES.
part d - Hardcover
h_hard_res = sd(h_hard$residuals)                   # sd of the Holt training residuals
h_hard_lb = h_hard$mean - (1.96 * h_hard_res)
h_hard_ub = h_hard$mean + (1.96 * h_hard_res)
h_hard_lb
h_hard_ub
h_hard         # holt() intervals for comparison
sales_hard     # ses() intervals for comparison
## Time Series:
## Start = 31
## End = 34
## Frequency = 1
## [1] 195.9640 199.2666 202.5692 205.8718
## Time Series:
## Start = 31
## End = 34
## Frequency = 1
## [1] 304.3838 307.6864 310.9890 314.2916
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 250.1739 212.7390 287.6087 192.9222 307.4256
## 32 253.4765 216.0416 290.9113 196.2248 310.7282
## 33 256.7791 219.3442 294.2140 199.5274 314.0308
## 34 260.0817 222.6468 297.5166 202.8300 317.3334
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31 239.5601 197.2026 281.9176 174.7799 304.3403
## 32 239.5601 194.9788 284.1414 171.3788 307.7414
## 33 239.5601 192.8607 286.2595 168.1396 310.9806
## 34 239.5601 190.8347 288.2855 165.0410 314.0792
## 35 239.5601 188.8895 290.2306 162.0662 317.0540
## 36 239.5601 187.0164 292.1038 159.2014 319.9188
## 37 239.5601 185.2077 293.9124 156.4353 322.6848
Again we see that the SES method casts a wider net in its PI than either the rudimentary calculation or Holt’s method. Furthermore, the Holt intervals and the classic interval computation are similar.
Question HA 7.7
For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.
[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]
Which model gives the best RMSE?
Answers
To answer this, we will operationalize things somewhat, producing metrics and forecasts for several variations of holt(). First, let’s look at the eggs dataset.
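The plotting code is not shown in the original; a minimal sketch using autoplot() from fpp2/ggplot2:

autoplot(eggs) +
  xlab("Year") +
  ylab("Price of a dozen eggs (constant dollars)")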
Wow. Not sure what the description means when it says “constant dollars”, but I’m guessing that has something to do with adjusting for inflation. The price has completely tanked over this time span. I guess when you mass-produce eggs in inhumane ways (like we do now), the cost has to plummet due to incredible supply. Being an egg farmer must be a brutal endeavor.
Let’s also look at a BoxCox transform on this data.
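A sketch of the transformed-series plot (lambda chosen by BoxCox.lambda(); the same lambda is recomputed in the cross-validation block below):

lambda <- BoxCox.lambda(eggs)
autoplot(BoxCox(eggs, lambda)) +
  xlab("Year") +
  ylab("Box-Cox transformed price")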
Not impressive. The Box-Cox transform has dampened the data and reduced the scale, but it has not made the series appear more linear, nor has it toned down the variation in a meaningful way. I’m skeptical of the value of using Box-Cox for this data.
Back to the variations of holt. We will run several CV models for comparison and pick the best based on RMSE.
lambda = BoxCox.lambda(eggs)   # Box-Cox parameter (not used in the CV runs below)
# Time-series cross-validation (h = 100) for SES and several holt() variants
t1 <- tsCV(eggs, ses, h=100)
t2 <- tsCV(eggs, holt, h=100)
t3 <- tsCV(eggs, holt, damped=TRUE, h=100)
t4 <- tsCV(eggs, holt, damped=TRUE, phi=.8, h=100)
t5 <- tsCV(eggs, holt, damped=TRUE, phi=.85, h=100)
t6 <- tsCV(eggs, holt, damped=TRUE, phi=.90, h=100)
t7 <- tsCV(eggs, holt, damped=TRUE, phi=.95, h=100)
model_list <- list(t1, t2, t3, t4, t5, t6, t7)
result <- list()
for (i in seq_along(model_list)) {
  # CV RMSE across all origins and horizons
  result[[i]] <- paste0("RMSE for model ", i, ": ",
                        sqrt(mean(model_list[[i]]^2, na.rm = TRUE)))
}
result
## [[1]]
## [1] "RMSE for model 1: 114.891500930993"
##
## [[2]]
## [1] "RMSE for model 2: 277.53957931947"
##
## [[3]]
## [1] "RMSE for model 3: 278.582393335714"
##
## [[4]]
## [1] "RMSE for model 4: 116.027712923421"
##
## [[5]]
## [1] "RMSE for model 5: 118.172324850128"
##
## [[6]]
## [1] "RMSE for model 6: 121.878319452599"
##
## [[7]]
## [1] "RMSE for model 7: 135.926307961239"
Now let’s get an idea of what is happening with the Holt model vs. SES.
fc_1<- ses(eggs, h=100)
fc_2<- holt(eggs, h=100)
fc_3<- holt(eggs,h=100, damped=TRUE, phi=.98)
autoplot(eggs) +
autolayer(fc_1, series="SES",PI=FALSE) +
autolayer(fc_2, series="holt undamped",PI=FALSE) +
autolayer(fc_3, series="holt - phi = .98",PI=FALSE) +
ggtitle("SES and Holt forecast on eggs data")
For heavily damped Holt forecasts we see only a marginal deviation from the SES forecast, which is why CV models 1, 4, and 5 had similarly low RMSE. Those models were, respectively, SES, holt(phi = 0.8), and holt(phi = 0.85), so the top models were all essentially SES-like in this case; the best RMSE belongs to model 1, plain SES. The undamped model yields negative prices fairly quickly, which makes no sense and is probably why the RMSE is so high for those model types.
Question HA 7.8
Recall your retail time series data (from Exercise 3 in Section 2.10).
- Why is multiplicative seasonality necessary for this series?
- Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.
- Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?
- Check that the residuals from the best method look like white noise.
- Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?
Answers
part a
Load up the retail data that we have been using each week.
library(httr)
url <- "https://otexts.com/fpp2/extrafiles/retail.xlsx"
GET(url, write_disk("retail.xlsx", overwrite=TRUE))
## Response [https://otexts.com/fpp2/extrafiles/retail.xlsx]
## Date: 2021-03-11 01:19
## Status: 200
## Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
## Size: 639 kB
## <ON DISK> /Users/jeffshamp/Documents/GitHub/cuny_msds/DATA_624/retail.xlsx
retail<- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retail[, "A3349721R"], frequency = 12, start = c(1982, 1))
First, plot the original data.
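The plotting code is not shown in the original; a minimal sketch:

autoplot(myts) +
  xlab("Year") + ylab("Sales") +
  ggtitle("Monthly retail sales, series A3349721R")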
Multiplicative seasonality is needed because the seasonal fluctuations grow roughly in proportion to the level of the series.
part b
fit_1<- hw(myts,seasonal="multiplicative", h=24)
fit_2 <- hw(myts,seasonal="multiplicative",
damped = TRUE, h=24)
autoplot(myts) +
autolayer(fit_1, series="HW Multiplicative",
PI=FALSE, alpha=.5) +
autolayer(fit_2, series="HW Damped Multiplicative",
PI=FALSE, alpha=.5) +
xlab("Year") +
ylab("Some Sales") +
ggtitle("Retail Sales in Australia") +
guides(color=guide_legend(title="Forecast"))
We see the undamped Holt-Winters forecast start to diverge from the damped one over the forecast window. It seems the damped HW multiplicative method does a better job for this dataset.
part c
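The code producing this comparison is not shown; a plausible reconstruction, assuming fit_1 is the undamped and fit_2 the damped model from part b:

paste("Undamped RMSE",
      round(accuracy(fit_1), 3)[2],
      "Damped RMSE",
      round(accuracy(fit_2), 3)[2])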
## [1] "Undamped RMSE 18.588 Damped RMSE 17.637"
Indeed the RMSE of the damped model is slightly better. The difference in MASE is marginal, but the plots seem convincing that the damped model is best.
part d
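The diagnostics discussed below appear to come from checkresiduals() applied to the damped model; a sketch, assuming that model is fit_2:

checkresiduals(fit_2)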
##
## Ljung-Box test
##
## data: Residuals from Damped Holt-Winters' multiplicative method
## Q* = 532.71, df = 7, p-value < 2.2e-16
##
## Model df: 17. Total lags used: 24
This is a mixed bag of results for the residuals. The residuals have a mean of zero and appear to be normally distributed. However, the ACF plot shows alternating correlations, and the scatter plot does not have constant variance. Additionally, the Ljung-Box test is significant, suggesting that the residuals are distinguishable from white noise. So we have a model that is not very biased, but one that is also not capturing all of the forecastable information; this model could be improved.
part e
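The train/test objects used below are not defined in this question; a reconstruction on the untransformed series, training to the end of 2010 as instructed:

retail_train <- window(myts, end = c(2010, 12))
retail_test <- window(myts, start = 2011)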
fit_3 <- hw(retail_train,seasonal="multiplicative",
damped = TRUE)
fit_snaive <- snaive(retail_train)
paste("Holt's damped RMSE",
round(accuracy(fit_3, retail_test)["Test set", "RMSE"], 3),
"Seasonal Naive",
round(accuracy(fit_snaive, retail_test)["Test set", "RMSE"], 3))
## [1] "Holt's damped RMSE 90.299 Seasonal Naive 133.875"
The damped Holt-Winters model clearly beats the seasonal naive approach on the test set. This might be because the variance becomes more constant, and overall smaller, in the latter third of the data, so the model’s emphasis on recent observations more closely mirrors the test set. It seems more like luck than skill.
Question HA 7.9
For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?
Answer
lambda<- BoxCox.lambda(myts)
bc_retail<- BoxCox(myts, lambda)
retail_train <- window(bc_retail,end=c(2010,12))
retail_test <- window(bc_retail,start=2011)
fit_stl= stlf(retail_train,robust = TRUE, lambda = lambda)
fit_ets = ets(seasadj(decompose(retail_train, "multiplicative")),
model = "ZZZ",
lambda = lambda)
autoplot(retail_train, series = "train set") +
autolayer(forecast(fit_stl, h = 24, PI=FALSE), series = "STL Forecast") +
autolayer(forecast(fit_ets, h = 24, PI=FALSE), series = "ETS Forecast") +
autolayer(retail_test, series = "test set")
paste('STL Method RMSE',
round(accuracy(fit_stl, retail_test)["Test set", "RMSE"], 3),
'ETS Method',
round(fit_ets$mse^(0.5), 3))
## [1] "STL Method RMSE 0.053 ETS Method 0.006"
ETS is way better than STL for this dataset, though it is hard to judge from a single, possibly lucky, train/test split. In progressively testing the various parameters of ETS, we see that the Box-Cox transform is helpful in this context, as is letting R choose the optimal model (“ZZZ”).