knitr::opts_chunk$set(echo = TRUE)

Assignment 5, Problem 1

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books (data set books).

#  Load libraries and set environment options
#library(dplyr)
#library(tidyr)
library(knitr)
#library(readxl)
library(forecast)
library(ggplot2)
#library(zoo)
library(fpp)

#  Use this option to supress scientific notation in printing values
options(scipen = 10, digits = 2)

str(books)

##  Time-Series [1:30, 1:2] from 1 to 30: 199 172 111 209 161 119 195 195 131 183 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "Paperback" "Hardcover"

summary(books)

##    Paperback     Hardcover  
##  Min.   :111   Min.   :128  
##  1st Qu.:167   1st Qu.:170  
##  Median :189   Median :200  
##  Mean   :186   Mean   :199  
##  3rd Qu.:207   3rd Qu.:222  
##  Max.   :247   Max.   :283

head(books)

## Time Series:
## Start = 1 
## End = 6 
## Frequency = 1 
##   Paperback Hardcover
## 1       199       139
## 2       172       128
## 3       111       172
## 4       209       139
## 5       161       191
## 6       119       168

tail(books)

## Time Series:
## Start = 25 
## End = 30 
## Frequency = 1 
##    Paperback Hardcover
## 25       190       214
## 26       182       200
## 27       222       201
## 28       217       283
## 29       188       220
## 30       247       259

a. Plot the series and discuss the main features of the data.

# Take a quick look at each time series
autoplot(books) + xlab("day") + ylab("count")

Based on an initial view of the plot there seems to be an upward trend to both the paperback and hardcover sales volume for the month. Since we only have one month’s worth of data it is hard to see if there might be monthly, quarterly, or annual seasonality. There could be some weekly seasonality. A decomposition may help identify if any exists.

#  Create a seasonal (weekly) time series and see if a decomposition will give any more clues to the data
books_weeklyts <- ts(books, frequency=7)
print("Decomposition of Paperback data")

## [1] "Decomposition of Paperback data"

plot(decompose(books_weeklyts[,1]))

print("Decomposition of Hardcover data")

## [1] "Decomposition of Hardcover data"

plot(decompose(books_weeklyts[,2]))

Decomposition shows a definite upward trend for both paperback and hardcover sales for the month. There could be some weekly seasonality in each, but for this exercise we will assume none.

b. Use simple exponential smoothing with the ses function (setting initial=“simple”) and explore different values of alpha for the paperback series. Record the within-sample SSE for the one-step forecasts. Plot SSE against alpha and find which value of alpha works best. What is the effect of alpha on the forecasts?

We will start with the SES function for the paperback data

#  Create vectors for setting values for alpha and SSEs, for paperback
alphas<-numeric()
SSEs <- numeric()

for(i in seq(0, 1, 0.05)) {
  alphas <-c(alphas, i)
  s <-ses(pback, initial="simple", alpha=i, h=4)
  SSEs<-c(SSEs, s$model$SSE)
}

#  Create data frames for paperback with alphas and SSEs, then show values and plot
pbackDF<-data.frame(alphas, SSEs)
pbackDF

##    alphas  SSEs
## 1    0.00 41270
## 2    0.05 39245
## 3    0.10 37785
## 4    0.15 36738
## 5    0.20 36329
## 6    0.25 36438
## 7    0.30 36931
## 8    0.35 37716
## 9    0.40 38738
## 10   0.45 39967
## 11   0.50 41384
## 12   0.55 42977
## 13   0.60 44743
## 14   0.65 46675
## 15   0.70 48774
## 16   0.75 51035
## 17   0.80 53456
## 18   0.85 56035
## 19   0.90 58769
## 20   0.95 61655
## 21   1.00 64690

plot(pbackDF, main="Alpha Values for Paperback Data")

Looking at the output from the SSE function, and the plot of SSE, it appears that the best value for alpha is around 0.20.

The smaller the value of alpha, the more weight that is placed on more recent data.

c. Now let ses select the optimal value of alpha. Use this value to generate forecasts for the next four days. Compare your results with 2.

#  let SES select the optimal value for alpha for paperback using "simple" for the initial value, then generate a forecast for the next 4 days.
optpbackSimple <- ses(pback, initial="simple", h=4)
summary(optpbackSimple)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pback, h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.2125 
## 
##   Initial states:
##     l = 199 
## 
##   sigma:  35
## Error measures:
##               ME RMSE MAE  MPE MAPE MASE  ACF1
## Training set 1.7   35  29 -2.8   17 0.72 -0.13
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            210   166   255   142   278
## 32            210   165   256   140   280
## 33            210   164   257   139   281
## 34            210   163   258   137   283

The SES function chose an alpha of 0.2125, very close to the estimate of 0.20. And the forecast for the next four days is 210 paperback books per day. SES does not take the upward trend into consideration for the forecast.

d. Repeat but with initial=“optimal”. How much difference does an optimal initial level make?

#  Now, run the forecast for paperback using "optimal" for the initial value, then generate a forecast for the next 4 days.
optpbackOptimal <- ses(pback, initial="optimal", h=4)
summary(optpbackOptimal)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = pback, h = 4, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.1685 
## 
##   Initial states:
##     l = 170.8257 
## 
##   sigma:  34
## 
##  AIC AICc  BIC 
##  319  320  323 
## 
## Error measures:
##               ME RMSE MAE  MPE MAPE MASE  ACF1
## Training set 7.2   34  28 0.47   16  0.7 -0.21
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            207   164   250   141   273
## 32            207   163   251   140   274
## 33            207   163   251   139   275
## 34            207   162   252   138   276

Using the SES function with the initial parameter set to “optimal”, SES chose an alpha of 0.1685. This is still relatively close to the earlier estimate of 0.20. And the forecast for the next four days is 207 paperback books per day. This lower forecast is the result of additional weighting on the older data values in the series with the smaller alpha value.

e. Repeat steps (b)-(d) with the hardcover series.

First, use SSE to generate values of different alphas.

#  Create vectors for setting values for alpha and SSEs, for hardcover
alphas<-numeric()
SSEs <- numeric()

for(i in seq(0, 1, 0.05)) {
  alphas <-c(alphas, i)
  s <-ses(hcover, initial="simple", alpha=i, h=4)
  SSEs<-c(SSEs, s$model$SSE)
}

#  Create data frames for hardcover with alphas and SSEs, then show values and plot
hcoverDF<-data.frame(alphas, SSEs)
hcoverDF

##    alphas   SSEs
## 1    0.00 154503
## 2    0.05  70483
## 3    0.10  45715
## 4    0.15  36814
## 5    0.20  33148
## 6    0.25  31554
## 7    0.30  30910
## 8    0.35  30758
## 9    0.40  30895
## 10   0.45  31224
## 11   0.50  31703
## 12   0.55  32314
## 13   0.60  33060
## 14   0.65  33948
## 15   0.70  34994
## 16   0.75  36217
## 17   0.80  37642
## 18   0.85  39295
## 19   0.90  41210
## 20   0.95  43423
## 21   1.00  45982

plot(hcoverDF, main="values for alpha for hardcover data")

For Hardcover book sales, the best alpha looks to be around 0.35.

Now, run the SES function with initial set to “simple”.

#  let SES select the optimal value for alpha for hardcover using "simple" for the initial value, then generate a forecast for the next 4 days.
opthcoverSimple <- ses(hcover, initial="simple", h=4)
summary(opthcoverSimple)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = hcover, h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.3473 
## 
##   Initial states:
##     l = 139 
## 
##   sigma:  32
## Error measures:
##               ME RMSE MAE MPE MAPE MASE  ACF1
## Training set 9.7   32  26 3.1   13 0.79 -0.16
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            240   199   281   178   303
## 32            240   197   284   174   307
## 33            240   195   286   170   310
## 34            240   192   288   167   314

As expected, the alpha came in very close to 0.35, at 0.3473. The simple forecast is for 240 hardcover books sold each day for the next four days.

Now, run SES with initial set to “optimal”.

#  Now, run the forecast for hardcover using "optimal" for the initial value, then generate a forecast for the next 4 days.
opthcoverOptimal <- ses(hcover, initial="optimal", h=4)
summary(opthcoverOptimal)

## 
## Forecast method: Simple exponential smoothing
## 
## Model Information:
## Simple exponential smoothing 
## 
## Call:
##  ses(y = hcover, h = 4, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.3283 
## 
##   Initial states:
##     l = 149.2836 
## 
##   sigma:  32
## 
##  AIC AICc  BIC 
##  316  317  320 
## 
## Error measures:
##               ME RMSE MAE MPE MAPE MASE  ACF1
## Training set 9.2   32  27 2.6   13  0.8 -0.14
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            240   199   280   177   302
## 32            240   196   283   174   305
## 33            240   194   285   171   309
## 34            240   192   287   168   312

Using an initial setting of “optimal” generates an alpha of 0.3283, but the same 240 books per day for the next four days. The change in alpha was not significant enough to change the forecast, MAPE, or RMSE.

Assignment 5, Problem 2

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

a. Compare the SSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. Discuss the merits of the two forecasting methods for these data sets.

Paperback

#  paperback Simple
pbackHS <- holt(pback, initial="simple", h=4)
summary(pbackHS)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = pback, h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.2984 
##     beta  = 0.4984 
## 
##   Initial states:
##     l = 199 
##     b = -27 
## 
##   sigma:  40
## Error measures:
##               ME RMSE MAE MPE MAPE MASE  ACF1
## Training set 7.8   40  34 1.6   18 0.85 -0.11
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            222   171   273   145   300
## 32            230   165   294   131   329
## 33            237   145   330    96   378
## 34            245   116   375    47   443

Using Holt’s method with initial set to simple, the forecast for the next four days is significantly higher than the SES forecast. The Holt method also reflects the upward trend in the four day forecast.

#  paperback optimal
pbackH <- holt(pback, initial="optimal", h=4)
summary(pbackH)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = pback, h = 4, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.0001 
##     beta  = 0.0001 
## 
##   Initial states:
##     l = 174.6003 
##     b = 1.0606 
## 
##   sigma:  32
## 
##  AIC AICc  BIC 
##  319  322  326 
## 
## Error measures:
##                ME RMSE MAE MPE MAPE MASE  ACF1
## Training set -4.5   32  26  -6   16 0.67 -0.14
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            207   166   248   145   269
## 32            208   168   249   146   270
## 33            209   169   250   147   271
## 34            210   170   251   148   272

With Initial set to optimal, the forecast is closer to the forecast for both the SES simple and optimal forecasts.

Hardcover

#  hardcover Smiple
hcoverHS <- holt(hcover, initial="simple", h=4)
summary(hcoverHS)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = hcover, h = 4, initial = "simple") 
## 
##   Smoothing parameters:
##     alpha = 0.439 
##     beta  = 0.1574 
## 
##   Initial states:
##     l = 139 
##     b = -11 
## 
##   sigma:  35
## Error measures:
##               ME RMSE MAE MPE MAPE MASE   ACF1
## Training set 7.2   35  28 2.4   14 0.84 -0.077
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            251   206   296   182   319
## 32            255   202   307   175   335
## 33            259   196   321   163   354
## 34            263   188   337   149   377

Using Holt’s method with initial set to simple and optimal both generated forecasts for the next four days higher than the SES model.

Again, the Holt method reflected the upward trend in the four day forecast.

#  hardcover Optimal
hcoverH <- holt(hcover, initial="optimal", h=4)
summary(hcoverH)

## 
## Forecast method: Holt's method
## 
## Model Information:
## Holt's method 
## 
## Call:
##  holt(y = hcover, h = 4, initial = "optimal") 
## 
##   Smoothing parameters:
##     alpha = 0.0001 
##     beta  = 0.0001 
## 
##   Initial states:
##     l = 154.6543 
##     b = 2.9961 
## 
##   sigma:  27
## 
##  AIC AICc  BIC 
##  311  313  318 
## 
## Error measures:
##                ME RMSE MAE  MPE MAPE MASE   ACF1
## Training set -2.2   27  23 -3.4   12  0.7 -0.027
## 
## Forecasts:
##    Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 31            247   212   283   194   301
## 32            250   215   286   197   304
## 33            253   218   288   200   307
## 34            256   221   291   203   310

With Initial set to optimal, the forecast is closer to the forecast for both the SES simple and optimal forecasts.

b. Compare the forecasts for the two series using both methods. Which do you think is best?

Compare the SSE measures of Holt’s method for paperbacks and hardcovers with those of the SES models

Paperback

#   Paperback plot of all forecasts
autoplot(pback) + xlab("day") + ylab("Sales") + autolayer(optpbackSimple, PI=FALSE, series="SES Simple") + autolayer(optpbackOptimal, PI=FALSE, series="SES optimal") + autolayer(pbackH, series="Holt Optimal", PI=FALSE) + autolayer(pbackHS, series="Holt Simple", PI=FALSE)

Running the accuracy function confirms that Holt’s optimal forecast has the lowest MAPE and RMSE.

#   Run Accuracy check for forecasts
print("Paperback - SES Simple")

## [1] "Paperback - SES Simple"

accuracy(optpbackSimple$fitted, pback)

##           ME RMSE MAE  MPE MAPE  ACF1 Theil's U
## Test set 1.7   35  29 -2.8   17 -0.13      0.68

print("Paperback - SES Optimal")

## [1] "Paperback - SES Optimal"

accuracy(optpbackOptimal$fitted, pback)

##           ME RMSE MAE  MPE MAPE  ACF1 Theil's U
## Test set 7.2   34  28 0.47   16 -0.21      0.67

print("Paperback - Holt Simple")

## [1] "Paperback - Holt Simple"

accuracy(pbackHS$fitted, pback)

##           ME RMSE MAE MPE MAPE  ACF1 Theil's U
## Test set 7.8   40  34 1.6   18 -0.11      0.88

print("Paperback - Holt Optimal")

## [1] "Paperback - Holt Optimal"

accuracy(pbackH$fitted, pback)

##            ME RMSE MAE MPE MAPE  ACF1 Theil's U
## Test set -4.5   32  26  -6   16 -0.14      0.61

Hardcover

#   Hardcover plot of all forecasts
autoplot(hcover) + xlab("day") + autolayer(opthcoverSimple, PI=FALSE, series="SES Simple") + autolayer(opthcoverOptimal, PI=FALSE, series="SES optimal") + autolayer(hcoverH, series="Holt Optimal", PI=FALSE) + autolayer(hcoverHS, series="Holt Simple", PI=FALSE)

Running the accuracy function confirms that Holt’s optimal forecast has the lowest MAPE and RMSE.

#   Run Accuracy check for forecasts
print("Hardcover - SES Simple")

## [1] "Hardcover - SES Simple"

accuracy(opthcoverSimple$fitted, hcover)

##           ME RMSE MAE MPE MAPE  ACF1 Theil's U
## Test set 9.7   32  26 3.1   13 -0.16      0.81

print("Hardcover - SES Optimale")

## [1] "Hardcover - SES Optimale"

accuracy(opthcoverOptimal$fitted, hcover)

##           ME RMSE MAE MPE MAPE  ACF1 Theil's U
## Test set 9.2   32  27 2.6   13 -0.14      0.81

print("Hardcover - Holt Simple")

## [1] "Hardcover - Holt Simple"

accuracy(hcoverHS$fitted, hcover)

##           ME RMSE MAE MPE MAPE   ACF1 Theil's U
## Test set 7.2   35  28 2.4   14 -0.077      0.92

print("Hardcover - Holt Optimal")

## [1] "Hardcover - Holt Optimal"

accuracy(hcoverH$fitted, hcover)

##            ME RMSE MAE  MPE MAPE   ACF1 Theil's U
## Test set -2.2   27  23 -3.4   12 -0.027      0.66

c. Calculate a 95% prediction interval for the first forecast for each series using both methods, assuming normal errors. Compare your forecasts with those produced by R.

Paperback - Holt’s Optimal forecast with Predication Intervals

#   Calculate 95% Prediction interval, and plot
autoplot(forecast(pbackH, h=4), ylab="Paperback Books Sold", xlab="Day")

Calculating the 95th percent interval for the first day’s forecast manually would give:

ForecastDay31= ValueDay30 +/- 1.96 x sigma, where sigma = 32 & ValueDay30=247 —> ForecastDay31= 184 to 310

From the output of the forecast summary the 95th% interval was 145 to 269

Hardcover - Holt’s Optimal forecast with Predication Intervals

#   Calculate 95% Prediction interval, and plot
autoplot(forecast(hcoverH, h=4), ylab="Hardcover Books Sold", xlab="Day")

MBA678 - Predictive Analytics

Helen Smith

March 5, 2018

Assignment 5, Problem 1

Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books (data set books).

a. Plot the series and discuss the main features of the data.

Decomposition shows a definite upward trend for both paperback and hardcover sales for the month. There could be some weekly seasonality in each, but for this exercise we will assume none.

We will start with the SES function for the paperback data

Looking at the output from the SSE function, and the plot of SSE, it appears that the best value for alpha is around 0.20.

The smaller the value of alpha, the more weight that is placed on more recent data.

c. Now let ses select the optimal value of alpha. Use this value to generate forecasts for the next four days. Compare your results with 2.

The SES function chose an alpha of 0.2125, very close to the estimate of 0.20. And the forecast for the next four days is 210 paperback books per day. SES does not take the upward trend into consideration for the forecast.

d. Repeat but with initial=“optimal”. How much difference does an optimal initial level make?

e. Repeat steps (b)-(d) with the hardcover series.

First, use SSE to generate values of different alphas.

For Hardcover book sales, the best alpha looks to be around 0.35.

Now, run the SES function with initial set to “simple”.

As expected, the alpha came in very close to 0.35, at 0.3473. The simple forecast is for 240 hardcover books sold each day for the next four days.

Now, run SES with initial set to “optimal”.

Using an initial setting of “optimal” generates an alpha of 0.3283, but the same 240 books per day for the next four days. The change in alpha was not significant enough to change the forecast, MAPE, or RMSE.

Assignment 5, Problem 2

Apply Holt’s linear method to the paperback and hardback series and compute four-day forecasts in each case.

a. Compare the SSE measures of Holt’s method for the two series to those of simple exponential smoothing in the previous question. Discuss the merits of the two forecasting methods for these data sets.

Paperback

Using Holt’s method with initial set to simple, the forecast for the next four days is significantly higher than the SES forecast. The Holt method also reflects the upward trend in the four day forecast.

With Initial set to optimal, the forecast is closer to the forecast for both the SES simple and optimal forecasts.

Hardcover

Using Holt’s method with initial set to simple and optimal both generated forecasts for the next four days higher than the SES model.

Again, the Holt method reflected the upward trend in the four day forecast.

With Initial set to optimal, the forecast is closer to the forecast for both the SES simple and optimal forecasts.

b. Compare the forecasts for the two series using both methods. Which do you think is best?

Compare the SSE measures of Holt’s method for paperbacks and hardcovers with those of the SES models

Paperback

Comparing the simple and optimal forecasts using both SES and Holt’s methods, I would recommend the Holt optimal forecast. This forecast reflects the upward trend that the SES forecasts don’t, and it puts adequate weight on the earlier data.

Running the accuracy function confirms that Holt’s optimal forecast has the lowest MAPE and RMSE.

Hardcover

Comparing the simple and optimal forecasts using both SES and Holt’s methods for hardcover, I would also recommend the Holt optimal forecast, for the same reason. NOTE, the SES optimal and Simple both generated the same forecast.

Running the accuracy function confirms that Holt’s optimal forecast has the lowest MAPE and RMSE.

c. Calculate a 95% prediction interval for the first forecast for each series using both methods, assuming normal errors. Compare your forecasts with those produced by R.

Paperback - Holt’s Optimal forecast with Predication Intervals

Calculating the 95th percent interval for the first day’s forecast manually would give:

ForecastDay31= ValueDay30 +/- 1.96 x sigma, where sigma = 32 & ValueDay30=247 —> ForecastDay31= 184 to 310

From the output of the forecast summary the 95th% interval was 145 to 269

Hardcover - Holt’s Optimal forecast with Predication Intervals

Calculating the 95th percent interval for the first day’s forecast manually would give:

ForecastDay31= ValueDay30 +/- 1.96 x sigma, where sigma = 27 & ValueDay30=259 —> ForecastDay31= 206 to 312

From the output of the forecast summary the 95th% interval was 194 to 301