Data 624 HW3

if (!require("fpp2")) install.packages("fpp2")
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("gridExtra")) install.packages("gridExtra")
library(fma)
library(forecast)

7.1 Consider the pigs series — the number of pigs slaughtered in Victoria each month.

A. Use the `ses()` function in R to find the optimal values of \(\alpha\) and \(\ell_0\), and generate forecasts for the next four months.

fc<- ses(pigs,h=4)

## Simple exponential smoothing 
fc$model

## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665

## forecast for next four months
forecast(fc)

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8
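
The identical point forecasts for all four months follow from the way SES works: the level is updated as a weighted average of each new observation and the previous level, and every future forecast equals the last level. A minimal base-R sketch of that recursion is shown below. The helper name `ses_sketch` and the toy inputs are assumptions made for illustration; unlike `ses()`, it takes \(\alpha\) and \(\ell_0\) as given instead of estimating them.

```r
# Minimal SES sketch (hypothetical helper, not the forecast package's code).
# alpha and l0 are supplied by the caller rather than estimated.
ses_sketch <- function(y, alpha, l0, h = 4) {
  level <- l0
  for (obs in y) {
    # New level = weighted average of the observation and the previous level
    level <- alpha * obs + (1 - alpha) * level
  }
  # Every h-step-ahead point forecast equals the last level
  rep(level, h)
}

# Toy data: the levels are 10, 15, 22.5, so all four forecasts are 22.5
ses_sketch(c(10, 20, 30), alpha = 0.5, l0 = 10)
```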

B. Compute a 95% prediction interval for the first forecast using \(\hat{y} \pm 1.96s\) where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

s <- sd(fc$residuals)
mean_fc <- fc$mean[1]

print(paste0("Lower Confidence Interval: ", round(mean_fc - (1.96*s), 2)))

## [1] "Lower Confidence Interval: 78679.97"

print(paste0("Upper Confidence Interval: ", round(mean_fc + (1.96*s), 2)))

## [1] "Upper Confidence Interval: 118952.84"

The interval computed by hand, (78679.97, 118952.84), is slightly narrower than the one produced by R, (78611.97, 119020.8). The difference arises because R uses its own estimate of the residual standard deviation (sigma = 10308.58), which is slightly larger than sd(fc$residuals), and the exact normal quantile rather than 1.96.

# plot the data with the fitted values and the forecast
autoplot(fc) + autolayer(fc$fitted)
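
R's intervals in part A widen from September to December even though the point forecast stays the same. A sketch of why, assuming normal errors and the standard SES forecast variance \(\sigma^2[1 + \alpha^2(h-1)]\): the helper `ses_interval` is hypothetical, and the inputs are the rounded values printed in the model output, so the results should match R's table only approximately.

```r
# Sketch of the SES prediction interval, assuming normal errors and the
# forecast variance sigma^2 * (1 + alpha^2 * (h - 1)).
# ses_interval is a hypothetical helper; inputs are copied from the printed output.
ses_interval <- function(point, sigma, alpha, h, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)              # about 1.96 for a 95% interval
  se <- sigma * sqrt(1 + alpha^2 * (h - 1))    # grows with the horizon h
  cbind(h = h, lower = point - z * se, upper = point + z * se)
}

res <- ses_interval(point = 98816.41, sigma = 10308.58, alpha = 0.2971, h = 1:4)
res
```

For h = 1 the variance reduces to \(\sigma^2\), which is why the hand calculation in part B only applies to the first forecast.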

7.5 Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

books

## Time Series:
## Start = 1 
## End = 30 
## Frequency = 1 
##    Paperback Hardcover
##  1       199       139
##  2       172       128
##  3       111       172
##  4       209       139
##  5       161       191
##  6       119       168
##  7       195       170
##  8       195       145
##  9       131       184
## 10       183       135
## 11       143       218
## 12       141       198
## 13       168       230
## 14       201       222
## 15       155       206
## 16       243       240
## 17       225       189
## 18       167       222
## 19       237       158
## 20       202       178
## 21       186       217
## 22       176       261
## 23       232       238
## 24       195       240
## 25       190       214
## 26       182       200
## 27       222       201
## 28       217       283
## 29       188       220
## 30       247       259

A. Plot the series and discuss the main features of the data.

autoplot(books)

Both series trend upward over the 30 days, with large day-to-day fluctuations. Average paperback sales rise from roughly 168 per day in the first ten days to roughly 204 in the last ten, and hardcover sales rise from roughly 157 to roughly 233. With only 30 daily observations, no seasonal pattern is obvious from the listing.

B. Use the ses() function to forecast each series, and plot the forecasts.

fc_paperback <- ses(books[,1], h=4)
fc_hardcover <- ses(books[,2], h=4)

#forecast paperback
forecast(fc_paperback)

##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       207.1097 162.4882 251.7311 138.8670 275.3523
## 32       207.1097 161.8589 252.3604 137.9046 276.3147
## 33       207.1097 161.2382 252.9811 136.9554 277.2639
## 34       207.1097 160.6259 253.5935 136.0188 278.2005

#plot paperback
a<- autoplot(fc_paperback) 

#plot hardcover
b<- autoplot(fc_hardcover)
grid.arrange(a, b, nrow = 2)

C. Compute the RMSE values for the training data in each case.

print(paste0("RMSE values Paperback: ", round(accuracy(fc_paperback)[2], 2)))

## [1] "RMSE values Paperback: 33.64"

print(paste0("RMSE values Hardcover: ", round(accuracy(fc_hardcover)[2], 2)))

## [1] "RMSE values Hardcover: 31.93"
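
The training RMSE reported by `accuracy()` is the square root of the mean squared one-step residual. A small base-R sketch of that calculation follows; the function name `rmse` and the residual values are illustrative assumptions, not output from the models above. It also shows that the RMSE is not the same quantity as `sd()` of the residuals, which subtracts the mean and divides by n - 1.

```r
# Root mean squared error of a vector of residuals (illustrative helper)
rmse <- function(residuals) sqrt(mean(residuals^2, na.rm = TRUE))

e <- c(3, -4, 5, -2)   # made-up residuals, for illustration only
rmse(e)                # square root of the mean of the squared residuals
sd(e)                  # centres on the mean and divides by n - 1, so it differs
```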

7.6 We will continue with the daily sales of paperback and hardcover books in data set books.

A. Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

# holt method
holt_Paperback <- holt(books[, "Paperback"], h = 4)
holt_Hardcover <- holt(books[, "Hardcover"], h = 4)

#plot
a<- autoplot(holt_Paperback) 

b<- autoplot(holt_Hardcover)
grid.arrange(a, b, nrow = 2)
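
Holt's linear method extends SES with a second smoothing equation for the trend, so the forecasts are no longer flat. The sketch below writes out the textbook level and trend recursions in base R. The helper `holt_sketch`, the parameter values and the toy series are assumptions for illustration; `holt()` estimates its parameters and initial states by optimisation, which this sketch does not do.

```r
# Minimal sketch of Holt's linear trend recursions (hypothetical helper).
# alpha smooths the level, beta smooths the trend; l0 and b0 are initial states.
holt_sketch <- function(y, alpha, beta, l0, b0, h = 4) {
  level <- l0
  trend <- b0
  for (obs in y) {
    prev_level <- level
    # Level: weighted average of the observation and the previous one-step forecast
    level <- alpha * obs + (1 - alpha) * (prev_level + trend)
    # Trend: weighted average of the latest change in level and the previous trend
    trend <- beta * (level - prev_level) + (1 - beta) * trend
  }
  # h-step-ahead forecasts extend the last level along the last trend
  level + seq_len(h) * trend
}

# A perfectly linear toy series is continued exactly: 6 7 8 9
holt_sketch(1:5, alpha = 0.8, beta = 0.2, l0 = 0, b0 = 1)
```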

B. Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

print(paste0("Holt’s method, RMSE values Paperback: ", round(accuracy(holt_Paperback)[2], 2)))

## [1] "Holt’s method, RMSE values Paperback: 31.14"

print(paste0("Holt’s method, RMSE values Hardcover: ", round(accuracy(holt_Hardcover)[2], 2)))

## [1] "Holt’s method, RMSE values Hardcover: 27.19"

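Holt’s method gives a lower training RMSE than SES for both series: 31.14 versus 33.64 for paperback and 27.19 versus 31.93 for hardcover. Because Holt’s method has one more parameter, some reduction in training RMSE is to be expected, so the improvement alone does not prove that it forecasts better. SES is simpler but produces flat forecasts, while Holt’s method can follow a trend in the data.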

C. Compare the forecasts for the two series using both methods. Which do you think is best?

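In terms of training RMSE, Holt’s method is the better model for both series. Its forecasts also continue the estimated trend over the four days, whereas the SES forecasts stay constant at the last estimated level.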

D. Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

PAPERBACK

#PAPERBACK 

s <- sd(fc_paperback$residuals)
mean_fc <- fc_paperback$mean[1]

print(paste0("SES, Lower Confidence Interval: ", round(mean_fc - (1.96*s), 2)))

## [1] "SES, Lower Confidence Interval: 141.6"

print(paste0("SES, Upper Confidence Interval: ", round(mean_fc + (1.96*s), 2)))

## [1] "SES, Upper Confidence Interval: 272.62"

#from model

print(paste0("SES, Lower Confidence Interval from formula: ", round(forecast(fc_paperback)$lower[1, "95%"],2) ))

## [1] "SES, Lower Confidence Interval from formula: 138.87"

print(paste0("SES, Upper Confidence Interval from formula: ", round(forecast(fc_paperback)$upper[1, "95%"], 2)))

## [1] "SES, Upper Confidence Interval from formula: 275.35"

s <- sd(holt_Paperback$residuals)
mean_fc <- holt_Paperback$mean[1]

print(paste0("Holt, Lower Confidence Interval: ", round(mean_fc - (1.96*s), 2)))

## [1] "Holt, Lower Confidence Interval: 147.84"

print(paste0("Holt, Upper Confidence Interval: ", round(mean_fc + (1.96*s), 2)))

## [1] "Holt, Upper Confidence Interval: 271.09"

#from model

print(paste0("Holt, Lower Confidence Interval from formula: ", round(forecast(holt_Paperback)$lower[1, "95%"],2) ))

## [1] "Holt, Lower Confidence Interval from formula: 143.91"

print(paste0("Holt, Upper Confidence Interval from formula: ", round(forecast(holt_Paperback)$upper[1, "95%"], 2)))

## [1] "Holt, Upper Confidence Interval from formula: 275.02"

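For paperback, the intervals computed from the residual standard deviation are close to those produced by R for both methods, but slightly narrower: SES gives (141.60, 272.62) against R’s (138.87, 275.35), and Holt gives (147.84, 271.09) against R’s (143.91, 275.02).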

HARDCOVER

s <- sd(fc_hardcover$residuals)
mean_fc <- fc_hardcover$mean[1]

print(paste0("SES, Lower Confidence Interval: ", round(mean_fc - (1.96*s), 2)))

## [1] "SES, Lower Confidence Interval: 178.58"

print(paste0("SES, Upper Confidence Interval: ", round(mean_fc + (1.96*s), 2)))

## [1] "SES, Upper Confidence Interval: 300.54"

#from model

print(paste0("SES, Lower Confidence Interval from formula: ", round(forecast(fc_hardcover)$lower[1, "95%"],2) ))

## [1] "SES, Lower Confidence Interval from formula: 174.78"

print(paste0("SES, Upper Confidence Interval from formula: ", round(forecast(fc_hardcover)$upper[1, "95%"], 2)))

## [1] "SES, Upper Confidence Interval from formula: 304.34"

s <- sd(holt_Hardcover$residuals)
mean_fc <- holt_Hardcover$mean[1]

print(paste0("Holt, Lower Confidence Interval: ", round(mean_fc - (1.96*s), 2)))

## [1] "Holt, Lower Confidence Interval: 195.96"

print(paste0("Holt, Upper Confidence Interval: ", round(mean_fc + (1.96*s), 2)))

## [1] "Holt, Upper Confidence Interval: 304.38"

#from model

print(paste0("Holt, Lower Confidence Interval from formula: ", round(forecast(holt_Hardcover)$lower[1, "95%"],2) ))

## [1] "Holt, Lower Confidence Interval from formula: 192.92"

print(paste0("Holt, Upper Confidence Interval from formula: ", round(forecast(holt_Hardcover)$upper[1, "95%"], 2)))

## [1] "Holt, Upper Confidence Interval from formula: 307.43"

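For hardcover, the computed intervals are again slightly narrower than R’s: SES gives (178.58, 300.54) against (174.78, 304.34), and Holt gives (195.96, 304.38) against (192.92, 307.43). The upper limits are similar for SES and Holt, but Holt’s lower limit is noticeably higher (about 196 versus 179), so the Holt interval is narrower.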
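
The question asks for intervals based on the RMSE, whereas the calculations above use `sd()` of the residuals, which is close but not identical. A sketch of the RMSE-based version is given below. The helper `rmse_interval` is hypothetical, and the inputs are the rounded paperback SES values reported in 7.5 (point forecast 207.1097, RMSE 33.64), so the result is approximate.

```r
# 95% interval from a point forecast and an RMSE, assuming normal errors.
# rmse_interval is an illustrative helper, not part of the forecast package.
rmse_interval <- function(point, rmse, z = 1.96) {
  c(lower = point - z * rmse, upper = point + z * rmse)
}

# Paperback, SES: rounded values copied from the output in 7.5
rmse_interval(point = 207.1097, rmse = 33.64)
```

With the fitted objects available, the same helper could be called with `fc_paperback$mean[1]` and the RMSE column of `accuracy(fc_paperback)`.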

7.7 For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900–1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

fc1 <- holt(eggs, h=100)
fc2 <- holt(eggs, damped=TRUE, h=100)
fc3 <- holt(eggs, lambda="auto", h=100)
fc4 <- holt(eggs, damped=TRUE, lambda="auto", h=100)

a<- autoplot(fc1)
b<- autoplot(fc2)
c<- autoplot(fc3)
d<- autoplot(fc4)

grid.arrange(a,b,c,d, nrow = 2)
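
One way to build intuition for `damped=TRUE`: in the damped trend method the h-step forecast adds \((\phi + \phi^2 + \dots + \phi^h)\) times the last trend to the last level, instead of h times the trend. The sketch below tabulates that multiplier; the value \(\phi = 0.9\) and the helper name `damped_multiplier` are assumptions chosen for illustration, not the value estimated for `eggs`.

```r
# Trend multiplier of the damped trend method: phi + phi^2 + ... + phi^h
# (illustrative helper; phi = 0.9 is an assumed value)
damped_multiplier <- function(phi, h) cumsum(phi^seq_len(h))

m <- damped_multiplier(phi = 0.9, h = 100)
m[c(1, 10, 100)]   # grows quickly at first, then levels off
0.9 / (1 - 0.9)    # limit as h grows: phi / (1 - phi)
```

Without damping the multiplier is simply h, so over 100 steps an undamped trend keeps moving in a straight line while the damped forecast flattens out.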

Which model gives the best RMSE?

print(paste0("RMSE - Holt                : ", round(accuracy(fc1)[2], 2)))

## [1] "RMSE - Holt                : 26.58"

print(paste0("RMSE - Holt damped         : ", round(accuracy(fc2)[2], 2)))

## [1] "RMSE - Holt damped         : 26.54"

print(paste0("RMSE - Holt box-cox        : ", round(accuracy(fc3)[2], 2)))

## [1] "RMSE - Holt box-cox        : 26.39"

print(paste0("RMSE - Holt damped box-cox : ", round(accuracy(fc4)[2], 2)))

## [1] "RMSE - Holt damped box-cox : 26.53"

The four RMSE values are nearly identical (26.39 to 26.58). Holt’s method with a Box-Cox transformation has the lowest training RMSE (26.39), so it is the best of the four by this measure, although the margin is small.
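
To see what the `lambda` argument does, it helps to write out the Box-Cox transformation: the model is fitted to the transformed series and the forecasts are transformed back. The sketch below uses the textbook formula for positive data; the helper names `box_cox` and `inv_box_cox`, the value \(\lambda = 0.5\) and the toy numbers are assumptions for illustration, and `lambda="auto"` would instead choose \(\lambda\) from the data.

```r
# Textbook Box-Cox transformation for positive data (illustrative helpers)
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Inverse transformation, used to bring forecasts back to the original scale
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(50, 100, 200, 400)        # made-up values that double at each step
w <- box_cox(y, lambda = 0.5)
diff(w)                          # steps grow far less than on the original scale
inv_box_cox(w, lambda = 0.5)     # recovers the original values
```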

7.8 Recall your retail time series data (from Exercise 3 in Section 2.10).

library(readxl)
library(seasonal)

retaildata <- readxl::read_excel("C:/Users/patel/Documents/Data_624/retail.xlsx", skip=1)

myts <- ts(retaildata[,"A3349335T"],
  frequency=12, start=c(1982,4))
autoplot(myts)

A. Why is multiplicative seasonality necessary for this series?

The plot shows that the size of the seasonal fluctuations grows as the level of the series increases over time. Because the seasonal variation is roughly proportional to the level, multiplicative seasonality is necessary.
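
The distinction can be illustrated with a few made-up numbers: under multiplicative seasonality the seasonal effect is a percentage of the level, so the swing grows with the level, while under additive seasonality it stays fixed. The levels and the 20% seasonal index below are assumptions for illustration only, not estimates from the retail series.

```r
level <- c(100, 200, 400)   # hypothetical levels of the series at three points in time
index <- 1.2                # hypothetical seasonal index: peak month is 20% above the level

# Multiplicative: peak = level * index, so the swing scales with the level
multiplicative_swing <- level * index - level

# Additive: the same fixed amount is added whatever the level
additive_swing <- rep(20, length(level))

data.frame(level, multiplicative_swing, additive_swing)
```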

B. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

fc_myts <- hw(myts, seasonal="multiplicative", h=100)
fc_myts_d <- hw(myts, damped=TRUE, seasonal="multiplicative", h=100)

a<- autoplot(fc_myts)

b<- autoplot(fc_myts_d)

grid.arrange(a,b, ncol = 2)

C. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

print(paste0("RMSE - Holt-Winters         : ", round(accuracy(fc_myts)[2], 2)))

## [1] "RMSE - Holt-Winters         : 25.2"

print(paste0("RMSE - Holt-Winters damped  : ", round(accuracy(fc_myts_d)[2], 2)))

## [1] "RMSE - Holt-Winters damped  : 25.1"

The damped method has a slightly lower RMSE (25.1 versus 25.2), so it is preferred, although the difference is small.

D. Check that the residuals from the best method look like white noise.

checkresiduals(fc_myts_d)

## 
##  Ljung-Box test
## 
## data:  Residuals from Damped Holt-Winters' multiplicative method
## Q* = 285.62, df = 7, p-value < 2.2e-16
## 
## Model df: 17.   Total lags used: 24

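The residuals do not look like white noise: the Ljung-Box test gives Q* = 285.62 with a p-value below 2.2e-16, so the hypothesis of no autocorrelation in the residuals is rejected.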
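
The degrees of freedom in the test output (df = 7) are the 24 lags used minus the 17 model degrees of freedom. The same kind of test can be sketched with base R's `Box.test()`. The series below is simulated white noise, not the retail residuals, and `fitdf = 17` is passed only to mirror the model df reported above.

```r
set.seed(1)
x <- rnorm(200)   # simulated white noise (an assumption, not the model residuals)

# Ljung-Box test with 24 lags; fitdf subtracts the model degrees of freedom
bt <- Box.test(x, lag = 24, type = "Ljung-Box", fitdf = 17)
bt$parameter      # degrees of freedom: 24 - 17 = 7
bt$p.value        # a small p-value would indicate remaining autocorrelation
```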

E. Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naïve approach from Exercise 8 in Section 3.7?

myts_train <- window(myts, end = c(2010, 12))
myts_test <- window(myts, start = 2011)

myts_train_d <- hw(myts_train, damped = TRUE, seasonal = "multiplicative")

print(paste0("1. RMSE - Holt damped  : " ))

## [1] "1. RMSE - Holt damped  : "

accuracy(myts_train_d, myts_test)[,2]

## Training set     Test set 
##     25.68057     41.08034

myts_train_sn <- snaive(myts_train, h=100)
print(paste0("2. RMSE - naïve approach  : " ))

## [1] "2. RMSE - naïve approach  : "

accuracy(myts_train_sn,myts_test)[,2]

## Training set     Test set 
##     72.20702    145.46662

The test set RMSE of the damped Holt-Winters’ multiplicative method (41.08) is much lower than that of the seasonal naïve method (145.47), so the damped Holt-Winters’ method beats the seasonal naïve forecast on this test set.
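
For reference, the seasonal naïve benchmark simply repeats the last observed season, and the test set RMSE is computed from the differences between those forecasts and the held-out observations. The sketch below does this in base R on a synthetic monthly series. The helper `snaive_sketch` and the data are assumptions for illustration, not the retail series or the `snaive()` implementation.

```r
# Seasonal naive forecast: repeat the last observed season (illustrative helper)
snaive_sketch <- function(train, h) {
  m <- frequency(train)
  last_season <- tail(as.numeric(train), m)
  # Recycle the last season until h forecasts are produced
  rep(last_season, length.out = h)
}

# Synthetic monthly series: the pattern 1..12 repeated for four years
y <- ts(rep(1:12, 4), frequency = 12, start = c(2008, 1))
train <- window(y, end = c(2010, 12))
test <- window(y, start = c(2011, 1))

sn_fc <- snaive_sketch(train, h = length(test))
test_rmse <- sqrt(mean((as.numeric(test) - sn_fc)^2))
test_rmse   # 0 here, because the synthetic series is purely seasonal
```

Setting `h = length(test)` keeps the forecast horizon equal to the length of the test set, so every held-out observation enters the RMSE.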

V Patel

2020-10-05
