DATA 624 - Homework 5

Do exercises 7.1, 7.5, 7.6, 7.7, 7.8 and 7.9 in Hyndman. Please submit both the link to your RPubs page and the .Rmd file.

Question 7.1

a.

Use the ses() function in R to find the optimal values of alpha and ℓ0 (the initial level), and generate forecasts for the next four months.

library(fpp2)  # provides ses() and the pigs data set
model_fit <- ses(pigs, h = 4)  # forecast the next four months
summary(model_fit)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pigs, h = 4) 
## 
##   Smoothing parameters:
##     alpha = 0.2971 
## 
##   Initial states:
##     l = 77260.0561 
## 
##   sigma:  10308.58
## 
##      AIC     AICc      BIC 
## 4462.955 4463.086 4472.665 
## 
## Error measures:
##                    ME    RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 385.8721 10253.6 7961.383 -0.922652 9.274016 0.7966249 0.01282239
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Sep 1995       98816.41 85605.43 112027.4 78611.97 119020.8
## Oct 1995       98816.41 85034.52 112598.3 77738.83 119894.0
## Nov 1995       98816.41 84486.34 113146.5 76900.46 120732.4
## Dec 1995       98816.41 83958.37 113674.4 76092.99 121539.8

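Using ses(), the optimal smoothing parameter is alpha = 0.2971488 and the optimal initial level is ℓ0 = 77260.06. The sigma of 10308.58 reported in the summary is the standard deviation of the residuals, not a smoothing parameter.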
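The roles of alpha and ℓ0 can be seen in a minimal base-R sketch of the SES recursion. It assumes alpha and ℓ0 are already known (ses() estimates both from the data, which this sketch does not do), and the function name ses_sketch is hypothetical.

```r
# Simple exponential smoothing recursion with a given alpha and initial level l0
ses_sketch <- function(y, alpha, l0, h = 4) {
  level <- l0
  for (t in seq_along(y)) {
    # New level: weighted average of the latest observation and the previous level
    level <- alpha * y[t] + (1 - alpha) * level
  }
  # SES forecasts are flat: every horizon receives the final level
  rep(level, h)
}

ses_sketch(c(10, 12, 11), alpha = 0.5, l0 = 10)
# [1] 11 11 11 11
```

The flat forecast is why all four point forecasts for pigs above are identical (98816.41).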

b.

Compute a 95% prediction interval for the first forecast using \(\hat{y}\pm1.96s\) where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

# First forecast and the standard deviation of the residuals
model_fit.stdev <- sd(model_fit$residuals)
model_fit.forecast_1 <- model_fit$mean[1]
# generate the prediction interval
model_fit.pred95 <- c(
  model_fit.Lower.95 = model_fit.forecast_1 - 1.96 * model_fit.stdev, 
  model_fit.Upper.95 = model_fit.forecast_1 + 1.96 * model_fit.stdev
)
# 95% prediction interval 
model_fit.pred95

## model_fit.Lower.95 model_fit.Upper.95 
##           78679.97          118952.84

c.

Compare your interval with the interval produced by R.

Prediction interval using residuals: 78679.97, 118952.84

Prediction interval from R: 78611.97, 119020.84

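They are slightly different: R's interval is a little wider. R uses the model's sigma (10308.58), which divides the sum of squared residuals by n minus the number of estimated parameters, whereas sd() centres the residuals and divides by n - 1, which gives a slightly smaller standard deviation here.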
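A small base-R sketch of the two standard-deviation estimates, using made-up residuals. The helper names sigma_nk and pi_95 are hypothetical, and k (the number of estimated parameters, 2 for SES: alpha and ℓ0) is an assumption that is consistent with the sigma printed above.

```r
# Made-up residuals, used only to show how the two estimators differ
e <- c(1, -1, 2, -2)

# sd(): centres the residuals and divides by n - 1
s_sd <- sd(e)                        # sqrt(10 / 3), about 1.826

# Model-style sigma: no centring, divides by n - k
sigma_nk <- function(e, k) sqrt(sum(e^2) / (length(e) - k))
s_model <- sigma_nk(e, k = 2)        # sqrt(10 / 2), about 2.236

# 95% interval around a point forecast (R uses qnorm(0.975) rather than 1.96)
pi_95 <- function(point, s, z = qnorm(0.975)) point + c(-1, 1) * z * s

# Point forecast and sigma printed in the pigs summary above
pi_95(98816.41, 10308.58)            # about 78611.96 and 119020.86
```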

Question 7.5

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.

a.

Plot the series and discuss the main features of the data.

autoplot(books) +
  ggtitle('Daily sales of paperback and hardcover books')

The data set gives the daily sales of paperback and hardcover books over 30 days. With only 30 days of data, we cannot draw firm conclusions about seasonality or day-of-week effects. Even so, paperback sales appear to follow a weekly pattern, peaking toward the end of the week and dipping in mid-week. Both series trend upward, and hardcover sales are higher on average than paperback sales.

b.

Use the ses() function to forecast each series, and plot the forecasts.

model_ses_paperback <- ses(books[, 1], h = 4)
autoplot(model_ses_paperback, PI=FALSE) +
  ylab("Paperback Book Sales")

model_ses_hardcover <- ses(books[, 2], h = 4)
autoplot(model_ses_hardcover, PI=FALSE) +
  ylab("Hardcover Book Sales")

c.

Compute the RMSE values for the training data in each case.

# Training RMSE for hardcover
hardcover_RMSE <- sqrt(model_ses_hardcover$model$mse)
# Training RMSE for paperback
paperback_RMSE <- sqrt(model_ses_paperback$model$mse)
combined_books_RMSE <- c(Hardcover = hardcover_RMSE,
                         PaperBack = paperback_RMSE)
# RMSE for paperback and hardcover
combined_books_RMSE

## Hardcover PaperBack 
##  31.93101  33.63769

Using SES, the training RMSE is 31.93 for hardcover sales and 33.64 for paperback sales.
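Training RMSE is the square root of the mean squared one-step-ahead error. The base-R sketch below computes it for SES on a made-up series and scans a grid of alpha values with ℓ0 fixed at the first observation. ses() instead optimises alpha and ℓ0 jointly, so this only illustrates the idea; the function names are hypothetical.

```r
# One-step-ahead SES errors for a given alpha and initial level l0
ses_errors <- function(y, alpha, l0) {
  level <- l0
  e <- numeric(length(y))
  for (t in seq_along(y)) {
    e[t] <- y[t] - level                         # one-step forecast of y[t] is the previous level
    level <- alpha * y[t] + (1 - alpha) * level  # update the level
  }
  e
}

# Training RMSE: square root of the mean squared one-step error
rmse_from_errors <- function(e) sqrt(mean(e^2))

y <- c(30, 32, 35, 31, 38, 40, 37, 42)           # made-up series
alphas <- seq(0.05, 1, by = 0.05)
scores <- sapply(alphas, function(a) rmse_from_errors(ses_errors(y, a, l0 = y[1])))
best_alpha <- alphas[which.min(scores)]          # alpha with the lowest training RMSE on the grid
best_alpha
```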

Question 7.6

We will continue with the daily sales of paperback and hardcover books in data set books.

a.

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

model_holt_paperback <- holt(books[, 1], h = 4) # four days predictions
autoplot(model_holt_paperback) +
  ylab("Paperback Book Sales")

summary(model_holt_paperback)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, 1], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 170.699 
##     b = 1.2621 
## 
##   sigma:  33.4464
## 
##      AIC     AICc      BIC 
## 318.3396 320.8396 325.3456 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE
## Training set -3.717178 31.13692 26.18083 -5.508526 15.58354 0.6602122
##                    ACF1
## Training set -0.1750792
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       209.4668 166.6035 252.3301 143.9130 275.0205
## 32       210.7177 167.8544 253.5811 145.1640 276.2715
## 33       211.9687 169.1054 254.8320 146.4149 277.5225
## 34       213.2197 170.3564 256.0830 147.6659 278.7735

model_holt_hardcover <- holt(books[, 2], h = 4) # four days predictions
autoplot(model_holt_hardcover) +
  ylab("Hardcover Book Sales")

summary(model_holt_hardcover)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = books[, 2], h = 4) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     beta  = 1e-04 
## 
##   Initial states:
##     l = 147.7935 
##     b = 3.303 
## 
##   sigma:  29.2106
## 
##      AIC     AICc      BIC 
## 310.2148 312.7148 317.2208 
## 
## Error measures:
##                      ME     RMSE      MAE       MPE    MAPE      MASE
## Training set -0.1357882 27.19358 23.15557 -2.114792 12.1626 0.6908555
##                     ACF1
## Training set -0.03245186
## 
## Forecasts:
##    Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 31       250.1739 212.7390 287.6087 192.9222 307.4256
## 32       253.4765 216.0416 290.9113 196.2248 310.7282
## 33       256.7791 219.3442 294.2140 199.5274 314.0308
## 34       260.0817 222.6468 297.5166 202.8300 317.3334

b.

Compare the RMSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. (Remember that Holt’s method is using one more parameter than SES.) Discuss the merits of the two forecasting methods for these data sets.

Holt's method has a lower training RMSE than SES for both the paperback and hardcover series. Part of this improvement is expected, because Holt's method estimates more parameters (a trend smoothing parameter and an initial slope), but it is also consistent with the upward trend in both series, which SES cannot capture because it has no trend component.
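The difference between the two methods is the trend equation. Below is a minimal base-R sketch of Holt's linear method with the smoothing parameters supplied rather than estimated. The function name holt_sketch is hypothetical, and beta_star follows the textbook's component form, so it may be on a different scale from the beta printed by holt().

```r
# Holt's linear trend method with given alpha, beta_star, initial level l0 and slope b0
holt_sketch <- function(y, alpha, beta_star, l0, b0, h = 4) {
  level <- l0
  trend <- b0
  for (t in seq_along(y)) {
    prev_level <- level
    # Level: weighted average of the observation and the previous one-step forecast
    level <- alpha * y[t] + (1 - alpha) * (prev_level + trend)
    # Trend: weighted average of the latest level change and the previous slope
    trend <- beta_star * (level - prev_level) + (1 - beta_star) * trend
  }
  # Forecasts grow linearly with the horizon, unlike the flat SES forecasts
  level + seq_len(h) * trend
}

holt_sketch(c(2, 4, 6, 8, 10), alpha = 0.5, beta_star = 0.5, l0 = 0, b0 = 2, h = 3)
# [1] 12 14 16
```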

Series	SES RMSE	Holt’s Method RMSE
Paperback	33.63769	31.13692
Hardback	31.93101	27.19358

c.

Compare the forecasts for the two series using both methods. Which do you think is best?

# Prediction for Hardcover
rbind(SES=model_ses_hardcover$mean[1:4],
      Holt=model_holt_hardcover$mean[1:4]) %>% t

##           SES     Holt
## [1,] 239.5601 250.1739
## [2,] 239.5601 253.4765
## [3,] 239.5601 256.7791
## [4,] 239.5601 260.0817

# Prediction for Paperback
rbind(SES=model_ses_paperback$mean[1:4],
      Holt=model_holt_paperback$mean[1:4]) %>% t

##           SES     Holt
## [1,] 207.1097 209.4668
## [2,] 207.1097 210.7177
## [3,] 207.1097 211.9687
## [4,] 207.1097 213.2197

Holt's forecasts appear better. SES produces flat forecasts, whereas Holt's method extrapolates the upward trend seen in both series. Holt's method also has the lower training RMSE for both paperback and hardcover sales.

d.

Calculate a 95% prediction interval for the first forecast for each series, using the RMSE values and assuming normal errors. Compare your intervals with those produced using ses and holt.

SES model

Prediction interval from R, SES (Paperback): 138.87, 275.35

Prediction interval from R, SES (Hardcover): 174.78, 304.34

Prediction interval using RMSE, SES (Paperback): 141.18, 273.04

Prediction interval using RMSE, SES (Hardcover): 176.98, 302.14

Holt's model

Prediction interval from R, Holt's (Paperback): 143.91, 275.02

Prediction interval from R, Holt's (Hardcover): 192.92, 307.43

Prediction interval using RMSE, Holt's (Paperback): 148.44, 270.50

Prediction interval using RMSE, Holt's (Hardcover): 196.87, 303.47

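The intervals are close for both models, but the RMSE-based intervals are narrower in every case. The intervals from ses() and holt() use the model's sigma, which divides the sum of squared residuals by n minus the number of estimated parameters (2 for SES and 4 for Holt's method here), whereas the RMSE divides by n.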
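The RMSE-based intervals can be reproduced from the printed first forecasts and RMSEs with a short base-R sketch. The helper names are hypothetical, and the parameter count k = 4 for Holt's method (alpha, beta, initial level and slope) is an assumption that reproduces the sigma of 33.4464 printed by summary() for the paperback series.

```r
# 95% interval for a first forecast, using the RMSE as the error standard deviation
rmse_interval_95 <- function(point, rmse, z = 1.96) round(point + c(-1, 1) * z * rmse, 2)

# Convert a training RMSE (divides by n) into a sigma that divides by n - k
rmse_to_sigma <- function(rmse, n, k) rmse * sqrt(n / (n - k))

# First forecasts and training RMSEs printed above
rmse_interval_95(207.1097, 33.63769)   # SES, paperback:  141.18 273.04
rmse_interval_95(209.4668, 31.13692)   # Holt, paperback: 148.44 270.50
rmse_to_sigma(31.13692, n = 30, k = 4) # about 33.4464
```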

Question 7.7

For this exercise use data set eggs, the price of a dozen eggs in the United States from 1900-1993. Experiment with the various options in the holt() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each argument is doing to the forecasts.

[Hint: use h=100 when calling holt() so you can clearly see the differences between the various options when plotting the forecasts.]

Which model gives the best RMSE?

model_holt_eggs <- holt(eggs, h=100)
model_bc_eggs <- holt(eggs, h=100, lambda=TRUE)
model_damped_eggs <- holt(eggs, h=100, damped=TRUE)
model_damped_bc_eggs <- holt(eggs, h=100, damped=TRUE, lambda=TRUE)
model_exponential_eggs <- holt(eggs, h=100, exponential=TRUE)
autoplot(model_holt_eggs) + ggtitle("Holt's Method")

autoplot(model_damped_eggs) + ggtitle("Damped")

autoplot(model_bc_eggs) + ggtitle("Box-Cox")

autoplot(model_damped_bc_eggs) + ggtitle("Damped & Box-Cox")

autoplot(model_exponential_eggs) + ggtitle("Exponential")

get_rmse <- function(model){
  accuracy(model)[2]
}
Model_Name <- c("Holt's Linear", 
           "Box-Cox Transformed",
           "Damped",
           "Damped and Box-Cox",
           "Exponential")
RMSE <- c(get_rmse(model_holt_eggs), 
          get_rmse(model_bc_eggs),
          get_rmse(model_damped_eggs),
          get_rmse(model_damped_bc_eggs),
          get_rmse(model_exponential_eggs))
eggs_rmse_df <- data.frame(Model_Name, RMSE) 
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
eggs_rmse_df %>%
  kable() %>%
  kable_styling()

Model_Name	RMSE
Holt’s Linear	26.58219
Box-Cox Transformed	26.55504
Damped	26.54019
Damped and Box-Cox	26.73445
Exponential	26.49795

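All five models have very similar training RMSEs, between 26.50 and 26.73. The exponential-trend model gives the lowest RMSE (26.50), followed by the damped model (26.54). Note that lambda = TRUE is not a valid Box-Cox parameter (it should be a number, for example BoxCox.lambda(eggs)). R coerces TRUE to 1, and a Box-Cox transformation with lambda = 1 is simply y - 1, so the two "Box-Cox" models are essentially fitted to the untransformed data shifted by one.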
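The two options being compared can be written down directly. Below is a base-R sketch of the damped-trend forecast, where phi is the damping parameter and phi = 1 recovers Holt's linear forecasts, together with the textbook's Box-Cox definition (log when lambda = 0, (y^lambda - 1)/lambda otherwise). The function names are hypothetical and the inputs are made up.

```r
# Damped-trend forecast: level + (phi + phi^2 + ... + phi^h) * trend
damped_forecast <- function(level, trend, phi, h) {
  level + cumsum(phi^seq_len(h)) * trend
}

# Box-Cox transformation
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

damped_forecast(level = 10, trend = 2, phi = 0.5, h = 3)  # 11.00 11.50 11.75: flattens out
damped_forecast(level = 10, trend = 2, phi = 1,   h = 3)  # 12 14 16: Holt's linear
box_cox(c(1, 4, 9), lambda = 0.5)                         # 0 2 4
box_cox(c(1, 4, 9), lambda = 1)                           # 0 3 8: only a shift, y - 1
```

With phi < 1 the forecasts approach level + trend * phi / (1 - phi), which is why damped forecasts level off over a long horizon such as h = 100.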

Question 7.8

#borrowed code from week1 hw to load the aussie retail data
temp_file <- tempfile(fileext = ".xlsx")
download.file(url = "https://github.com/omerozeren/DATA624/raw/master/HMW1/retail.xlsx", 
              destfile = temp_file, 
              mode = "wb", 
              quiet = TRUE)
retaildata <- readxl::read_excel(temp_file,skip=1)
aussie.retail <- ts(retaildata[,"A3349388W"],
  frequency=12, start=c(1982,4))
# Run the X-11 decomposition as in the text (seas() is from the seasonal package)
library(seasonal)
x11.decomp <- seas(aussie.retail, x11="")
autoplot(x11.decomp, main = "Aussie Retail - X11 Decomposition" )

a.

Why is multiplicative seasonality necessary for this series?

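Multiplicative seasonality is necessary because the size of the seasonal fluctuations grows as the level of the series rises. In a multiplicative model the seasonal component multiplies the trend rather than being added to it, so the seasonal swings scale with the level of the series; an additive model would assume swings of constant size.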
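A toy base-R illustration with made-up numbers: the same seasonal pattern is applied additively and multiplicatively to a level that doubles from one year to the next, and the within-year seasonal swing is compared. The function name swing is hypothetical.

```r
level  <- rep(c(100, 200), each = 4)      # level doubles in the second "year" (4 seasons per year)
s_add  <- rep(c(-10, 0, 10, 0), 2)        # additive seasonal effects, in units of y
s_mult <- rep(c(0.9, 1.0, 1.1, 1.0), 2)   # multiplicative seasonal factors

y_add  <- level + s_add
y_mult <- level * s_mult

# Seasonal swing (max - min) within each year
swing <- function(y) as.numeric(tapply(y, rep(1:2, each = 4), function(v) max(v) - min(v)))

swing(y_add)   # 20 20: constant swing
swing(y_mult)  # 20 40: swing grows with the level
```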

b.

Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

model_multiplicative <- hw(aussie.retail, seasonal = "multiplicative")
autoplot(model_multiplicative) + 
  ggtitle("Multiplicative") + 
  ylab("Retail Sales")

summary(model_multiplicative)

## 
## Forecast method: Holt-Winters' multiplicative method
## 
## Model Information:
## Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = aussie.retail, seasonal = "multiplicative") 
## 
##   Smoothing parameters:
##     alpha = 0.5069 
##     beta  = 0.0109 
##     gamma = 0.3893 
## 
##   Initial states:
##     l = 196.3497 
##     b = -0.3395 
##     s = 0.9955 0.9534 1.0488 1.1059 1.007 1.0136
##            1.0111 1.0129 0.9942 0.9413 0.9623 0.954
## 
##   sigma:  0.0271
## 
##      AIC     AICc      BIC 
## 4349.545 4351.230 4416.572 
## 
## Error measures:
##                     ME     RMSE      MAE       MPE     MAPE      MASE     ACF1
## Training set 0.9840012 16.79301 12.60148 0.2287578 2.080925 0.3078252 0.129644
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 2014       1259.786 1216.094 1303.477 1192.965 1326.606
## Feb 2014       1131.455 1087.291 1175.619 1063.912 1198.997
## Mar 2014       1279.834 1224.698 1334.969 1195.511 1364.157
## Apr 2014       1248.841 1190.264 1307.418 1159.255 1338.427
## May 2014       1276.490 1211.952 1341.028 1177.788 1375.193
## Jun 2014       1267.524 1198.979 1336.068 1162.694 1372.354
## Jul 2014       1318.062 1242.291 1393.834 1202.180 1433.945
## Aug 2014       1344.255 1262.514 1425.995 1219.244 1469.266
## Sep 2014       1302.659 1219.221 1386.098 1175.051 1430.268
## Oct 2014       1352.658 1261.713 1443.604 1213.569 1491.748
## Nov 2014       1333.216 1239.406 1427.026 1189.746 1476.686
## Dec 2014       1441.242 1335.388 1547.096 1279.352 1603.132
## Jan 2015       1307.917 1201.576 1414.258 1145.282 1470.551
## Feb 2015       1174.547 1075.716 1273.378 1023.398 1325.697
## Mar 2015       1328.424 1212.896 1443.952 1151.740 1505.109
## Apr 2015       1296.107 1179.751 1412.463 1118.156 1474.058
## May 2015       1324.652 1202.031 1447.273 1137.120 1512.185
## Jun 2015       1315.199 1189.792 1440.606 1123.405 1506.993
## Jul 2015       1367.485 1233.298 1501.672 1162.264 1572.706
## Aug 2015       1394.504 1253.808 1535.200 1179.327 1609.680
## Sep 2015       1351.204 1211.148 1491.260 1137.006 1565.402
## Oct 2015       1402.912 1253.632 1552.191 1174.609 1631.214
## Nov 2015       1382.596 1231.678 1533.513 1151.787 1613.404
## Dec 2015       1494.460 1327.230 1661.690 1238.704 1750.216

model_multiplicative_damped <- hw(aussie.retail, seasonal = "multiplicative", damped = TRUE)
autoplot(model_multiplicative_damped) + 
  ggtitle("Multiplicative & Damped") + 
  ylab("Retail Sales")

summary(model_multiplicative_damped)

## 
## Forecast method: Damped Holt-Winters' multiplicative method
## 
## Model Information:
## Damped Holt-Winters' multiplicative method 
## 
## Call:
##  hw(y = aussie.retail, seasonal = "multiplicative", damped = TRUE) 
## 
##   Smoothing parameters:
##     alpha = 0.7217 
##     beta  = 0.0184 
##     gamma = 0.1941 
##     phi   = 0.98 
## 
##   Initial states:
##     l = 194.8538 
##     b = 1.876 
##     s = 1.0706 0.9899 1.0889 1.1184 0.9812 0.9841
##            0.9334 0.9467 0.9527 0.9505 0.9858 0.9977
## 
##   sigma:  0.0278
## 
##      AIC     AICc      BIC 
## 4372.016 4373.906 4442.987 
## 
## Error measures:
##                    ME     RMSE     MAE       MPE     MAPE      MASE        ACF1
## Training set 1.743824 16.92689 12.7743 0.2497514 2.146748 0.3120469 -0.06672948
## 
## Forecasts:
##          Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
## Jan 2014       1281.192 1235.475 1326.909 1211.2738 1351.111
## Feb 2014       1161.406 1109.877 1212.936 1082.5993 1240.213
## Mar 2014       1300.951 1233.407 1368.495 1197.6517 1404.250
## Apr 2014       1283.547 1208.107 1358.988 1168.1718 1398.923
## May 2014       1300.948 1216.184 1385.712 1171.3132 1430.583
## Jun 2014       1276.431 1185.564 1367.297 1137.4618 1415.399
## Jul 2014       1326.973 1224.856 1429.090 1170.7983 1483.148
## Aug 2014       1328.941 1219.289 1438.594 1161.2423 1496.640
## Sep 2014       1287.370 1174.216 1400.525 1114.3156 1460.425
## Oct 2014       1331.462 1207.452 1455.472 1141.8046 1521.119
## Nov 2014       1306.327 1177.966 1434.688 1110.0154 1502.638
## Dec 2014       1419.291 1272.706 1565.877 1195.1081 1643.475
## Jan 2015       1308.473 1163.918 1453.029 1087.3943 1529.552
## Feb 2015       1185.597 1048.974 1322.220  976.6504 1394.544
## Mar 2015       1327.459 1168.251 1486.666 1083.9717 1570.945
## Apr 2015       1309.132 1146.043 1472.221 1059.7088 1558.556
## May 2015       1326.317 1154.993 1497.642 1064.2993 1588.335
## Jun 2015       1300.783 1126.839 1474.728 1034.7579 1566.808
## Jul 2015       1351.743 1164.889 1538.597 1065.9748 1637.512
## Aug 2015       1353.213 1160.106 1546.320 1057.8811 1648.545
## Sep 2015       1310.377 1117.570 1503.184 1015.5046 1605.250
## Oct 2015       1354.745 1149.446 1560.045 1040.7664 1668.724
## Nov 2015       1328.681 1121.522 1535.840 1011.8581 1645.503
## Dec 2015       1443.059 1211.799 1674.318 1089.3775 1796.740

c.

Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

cat("RMSE of Multiplicative = ", accuracy(model_multiplicative)[2])

## RMSE of Multiplicative =  16.79301

cat("RMSE of Multiplicative & Damped = ", accuracy(model_multiplicative_damped)[2])

## RMSE of Multiplicative & Damped =  16.92689

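The non-damped model performs slightly better, with a lower one-step RMSE (16.79 vs 16.93) and a lower AICc (4351.23 vs 4373.91), so I prefer it, although the difference is small.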

d.

Check that the residuals from the best method look like white noise.

checkresiduals(model_multiplicative)

## 
##  Ljung-Box test
## 
## data:  Residuals from Holt-Winters' multiplicative method
## Q* = 70.814, df = 8, p-value = 3.383e-12
## 
## Model df: 16.   Total lags used: 24

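The residuals do not look like white noise. The Ljung-Box test gives Q* = 70.81 on 8 degrees of freedom with a p-value of 3.4e-12, so we reject the null hypothesis of no autocorrelation. The residuals appear roughly centred on zero, but there is remaining autocorrelation that the model has not captured.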
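The same kind of test can be run with Box.test() from base R's stats package. The sketch below mirrors the setup reported above (24 lags, 16 model degrees of freedom, so 8 test degrees of freedom) on simulated residuals that deliberately contain autocorrelation; the simulated series is an assumption for illustration only.

```r
set.seed(624)
n <- 200

# Simulated "residuals" with leftover autocorrelation: an AR(1) series with phi = 0.8
ar_resid <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))
lb <- Box.test(ar_resid, lag = 24, type = "Ljung-Box", fitdf = 16)
lb$parameter   # degrees of freedom: 24 - 16 = 8
lb$p.value     # very small, so "white noise" is rejected

# For comparison, genuine white noise usually gives a much larger p-value
Box.test(rnorm(n), lag = 24, type = "Ljung-Box", fitdf = 16)$p.value
```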

e.

Now find the test set RMSE, while training the model to the end of 2010. Can you beat the seasonal naive approach from Exercise 8 in Section 3.7?

retail_train <- window(aussie.retail, end = c(2010, 12))
retail_test <- window(aussie.retail, start = 2011)
rmse <- function(model){
  # Test-set RMSE: row "Test set", column "RMSE" of the accuracy() matrix
  accuracy(model, retail_test)["Test set", "RMSE"]
}
Model <- c("Seasonal Naive (Baseline)", 
           "SES",
           "Holt's Method",
           "Damped Holt's Method",
           "Holt-Winters Additive",
           "Holt-Winters Multiplicative",
           "Damped Holt-Winters Additive",
           "Damped Holt-Winters Multiplicative")
RMSE <- c(rmse(snaive(retail_train)), 
          rmse(ses(retail_train)),
          rmse(holt(retail_train)),
          rmse(holt(retail_train, damped = TRUE)),
          rmse(hw(retail_train, seasonal = "additive")),
          rmse(hw(retail_train, seasonal = "multiplicative")),
          rmse(hw(retail_train, seasonal = "additive", damped = TRUE)),
          rmse(hw(retail_train, seasonal = "multiplicative", damped = TRUE)))
rmse_df <- data.frame(Model, RMSE) 
rmse_df %>%
  kable() %>%
  kable_styling()

Model	RMSE
Seasonal Naive (Baseline)	31.22915
SES	109.60655
Holt’s Method	140.78574
Damped Holt’s Method	136.45906
Holt-Winters Additive	116.89609
Holt-Winters Multiplicative	135.86940
Damped Holt-Winters Additive	99.97563
Damped Holt-Winters Multiplicative	72.36739

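Comparing all models, the seasonal naive baseline has by far the lowest test-set RMSE (31.23), so none of the exponential smoothing methods beats it; the best of them is the damped Holt-Winters multiplicative method (72.37). One caveat: ses() and holt() default to h = 10, while snaive() and hw() default to h = 2 × frequency = 24, so these test RMSEs are computed over different numbers of months. Passing h = length(retail_test) to every call would put all models on the same test window.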
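The seasonal naive baseline and the test-set RMSE are simple enough to sketch in base R. The example uses a made-up quarterly series split with window(); the helper names snaive_sketch and test_rmse are hypothetical.

```r
# Seasonal naive: each forecast repeats the value from the same season one year earlier
snaive_sketch <- function(y, m, h) rep_len(tail(y, m), h)

# RMSE over the periods where both actuals and forecasts exist
test_rmse <- function(actual, fc) {
  k <- min(length(actual), length(fc))
  sqrt(mean((actual[1:k] - fc[1:k])^2))
}

y <- ts(c(10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44), frequency = 4, start = c(2000, 1))
train <- window(y, end = c(2001, 4))      # first two years
test  <- window(y, start = c(2002, 1))    # final year held out

fc <- snaive_sketch(as.numeric(train), m = frequency(train), h = length(test))
fc                                        # 12 22 32 42
test_rmse(as.numeric(test), fc)           # 2
```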

Question 7.9

For the same retail data, try an STL decomposition applied to the Box-Cox transformed series, followed by ETS on the seasonally adjusted data. How does that compare with your best previous forecasts on the test set?

# Training data (April 1982 to December 2010)
retail_train <- ts(as.vector(aussie.retail), start=c(1982,4), end=c(2010,12), frequency = 12)
# Get the optimal lambda
lambda <- BoxCox.lambda(retail_train)
# Perform a Box-Cox transformation
bc_retail_train <- BoxCox(retail_train, lambda = lambda)
# Perform the STL decomposition
stl_retail_train <- mstl(bc_retail_train)
# Seasonally adjusted series: data (column 1) minus the seasonal component (column 3)
sa_stl_retail_train <- stl_retail_train[,1] - stl_retail_train[,3]
# Back-transform to the original scale
sa_retail_train <- InvBoxCox(sa_stl_retail_train, lambda = lambda)
# Plot the original and seasonally adjusted series
autoplot(retail_train) + autolayer(sa_retail_train)

ets_retail_train <- forecast(sa_retail_train)  # forecast() fits an ETS model to the adjusted series
library(dplyr)  # mutate()
rmse_df %>%
  mutate(Model = as.character(Model)) %>%
  rbind(c("STL Seasonally-Adjusted data", rmse(ets_retail_train))) %>%
  kable() %>%
  kable_styling()

Model	RMSE
Seasonal Naive (Baseline)	31.2291530464725
SES	109.606548655807
Holt’s Method	140.785741128324
Damped Holt’s Method	136.459063569829
Holt-Winters Additive	116.896086321529
Holt-Winters Multiplicative	135.869398230742
Damped Holt-Winters Additive	99.975627125544
Damped Holt-Winters Multiplicative	72.3673888915714
STL Seasonally-Adjusted data	106.84631242036

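The STL + ETS approach has a test RMSE of 106.85, much worse than the seasonal naive baseline (31.23) and also worse than both damped Holt-Winters methods. Note, however, that forecast(sa_retail_train) forecasts the seasonally adjusted series, which is then compared with the unadjusted test data. Reseasonalising the forecasts, for example with stlf(retail_train, lambda = lambda), would give a fairer comparison.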
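The full pipeline the exercise describes (Box-Cox transform, STL decomposition, model the seasonally adjusted series, then add the seasonal component back before inverting the transform) can be sketched in base R. This sketch substitutes stats::stl() for mstl() and stats::HoltWinters() with gamma = FALSE for ets(), uses the built-in AirPassengers data with an assumed lambda = 0 (log), and carries the last year's seasonal component forward. It illustrates the structure only and does not reproduce the results above; the function names are hypothetical.

```r
# Box-Cox transform and its inverse for a fixed lambda
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
inv_box_cox <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

stl_bc_forecast <- function(y, lambda, h) {
  w <- box_cox(y, lambda)                           # transform
  dec <- stl(w, s.window = "periodic")              # STL decomposition
  seasonal <- dec$time.series[, "seasonal"]
  sa <- w - seasonal                                # seasonally adjusted, transformed scale
  fit <- HoltWinters(sa, gamma = FALSE)             # non-seasonal Holt, stand-in for ets()
  sa_fc <- as.numeric(predict(fit, n.ahead = h))
  m <- frequency(y)
  seas_fc <- rep_len(tail(as.numeric(seasonal), m), h)  # repeat last year's seasonal pattern
  list(adjusted = inv_box_cox(sa, lambda),          # seasonally adjusted, original scale
       forecast = inv_box_cox(sa_fc + seas_fc, lambda))  # reseasonalised forecasts
}

res <- stl_bc_forecast(AirPassengers, lambda = 0, h = 12)
round(res$forecast)
```

Because the seasonal component is added back before the inverse transform, these forecasts are on the same footing as the unadjusted test data.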