MBA 678: Assignment 2

Chapter 3: Questions 1 + 2


1.(a) The souvenir sales data were partitioned so that the analyst can guard against overfitting. She fits the forecasting model on the training period and measures its forecast errors on the validation period, which the model has not seen.

  (b) The analyst chose a 12-month validation period because the shop wants a forecast of sales for the next 12 months. The length of the validation period is determined, among other things, by the forecasting goal.

  (c) For part (c) of question 1, I needed to read in the .csv file with the souvenir sales data. I read in the file, converted it to a monthly time series, and plotted it.

souvenir <- read.csv("SouvenirSales.csv", stringsAsFactors=FALSE)
str(souvenir)
## 'data.frame':    84 obs. of  2 variables:
##  $ Date : chr  "Jan-95" "Feb-95" "Mar-95" "Apr-95" ...
##  $ Sales: num  1665 2398 2841 3547 3753 ...
souvenirSales <- ts(souvenir$Sales, start=c(1995,1), frequency=12)
plot(souvenirSales, bty="l")

library(forecast)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.3

Then I’ll bring in the forecast library and create the naive forecast. Because the plot shows very obvious seasonality, I will use snaive for a seasonal naive forecast. I need to partition the data in this step as well, before generating the forecast.

library(forecast)

validation <- 12

training <- length(souvenirSales) - validation

salestraining <- window(souvenirSales, start=c(1995,1), end=c(1995, training))
salesvalidation <- window(souvenirSales, start=c(1995,training+1), end=c(1995,training+validation))

snaiveValidation <- snaive(salestraining, h=validation)


snaiveValidation$mean
##           Jan      Feb      Mar      Apr      May      Jun      Jul
## 2001  7615.03  9849.69 14558.40 11587.33  9332.56 13082.09 16732.78
##           Aug      Sep      Oct      Nov      Dec
## 2001 19888.61 23933.38 25391.35 36024.80 80721.71
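
To make explicit what the seasonal naive forecast does, here is a minimal base R sketch on a made-up monthly series (the values and the names toy, toy_train, and toy_snaive are illustrative assumptions, not part of the souvenir data). Each forecast month simply repeats the same month from the last year of the training period.

```r
# Toy monthly series (made-up values): Jan 2019 to Dec 2021
set.seed(1)
toy <- ts(round(100 + 10 * rep(1:12, 3) + rnorm(36), 1),
          start = c(2019, 1), frequency = 12)

# Hold out the last 12 months; train on the first two years
toy_train <- window(toy, end = c(2020, 12))

# Seasonal naive: repeat the last 12 training months as the next 12 forecasts
toy_snaive <- ts(as.numeric(tail(toy_train, 12)),
                 start = c(2021, 1), frequency = 12)
toy_snaive
```

Under the same logic, the 2001 forecasts printed above should equal the 2000 values of the training series.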
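  (d) The forecast package in R makes it easy to calculate the RMSE and MAPE for the naive forecasts. The second argument to accuracy() must be the validation series (salesvalidation), not the number of validation months (validation). In the table, the Test set row holds the validation period's accuracy measures. The Test set row is not reproduced here: the figures first obtained (RMSE 7603.030, MAPE 63358.58) came from comparing the January forecast with the number 12, the value of validation, so they have to be regenerated with the call below.
accuracy(snaiveValidation, salesvalidation)
##                     ME     RMSE      MAE         MPE        MAPE      MASE
## Training set  3401.361 6467.818 3744.801     22.3927    25.64127 0.6800223
##                   ACF1
## Training set 0.4140974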
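
As a cross-check on what accuracy() reports, RMSE and MAPE can be computed directly from the forecast errors. The sketch below uses three made-up actual and predicted values (the vectors are assumptions for illustration only) so the arithmetic can be followed by hand.

```r
# Made-up actual and predicted values for three periods
actual    <- c(100, 200, 400)
predicted <- c(110, 180, 400)

# Forecast error = actual minus forecast
err <- actual - predicted              # -10, 20, 0

# RMSE: square root of the mean squared error
rmse <- sqrt(mean(err^2))              # sqrt(500 / 3), about 12.91

# MAPE: mean absolute error as a percentage of the actual value
mape <- mean(abs(err / actual)) * 100  # (0.1 + 0.1 + 0) / 3 * 100, about 6.67

c(RMSE = rmse, MAPE = mape)
```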
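  (e) Next, the question asks for a histogram of the forecast errors in the validation period. I also plotted the actual and forecasted values together.
# Validation-period forecast errors are actual minus forecast;
# snaiveValidation$residuals would give the training-period errors instead
souvenirhist <- hist(salesvalidation - snaiveValidation$mean, ylab="Frequency", xlab="Forecast Error", main="", bty="l")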

plot(salesvalidation, xlab="Year", ylab="Souvenir Sales",  bty="l")

lines(snaiveValidation$mean, col=2, lty=2)
legend("topleft", c("Actual","Forecast"), col=1:2, lty=1:2)

The seasonal naive forecast seems reasonably accurate when compared with the actual values for the validation period.

  (f) The analyst must combine the training and validation periods before generating the actual forecast for the year 2002. Once she has one complete data set, she must refit the satisfactory forecasting model to it. This way the 2002 forecast uses the most recent year of data, which the model fitted on the training period alone never saw.
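
The recombination step can be sketched in base R as follows. The two short series are made-up values standing in for the training and validation periods, and seasonal naive is used only as a stand-in for whichever model was found satisfactory.

```r
# Toy training and validation periods (made-up monthly values)
toy_train <- ts(c(5, 7, 6, 9, 12, 11, 10, 14, 13, 15, 20, 40),
                start = c(2000, 1), frequency = 12)
toy_valid <- ts(c(6, 8, 7, 10, 13, 12, 11, 15, 14, 16, 22, 44),
                start = c(2001, 1), frequency = 12)

# Recombine the two periods into one complete series
toy_full <- ts(c(toy_train, toy_valid),
               start = start(toy_train), frequency = frequency(toy_train))

# Apply the chosen method to the complete series to forecast the next 12 months
# (seasonal naive here: repeat the most recent 12 months)
toy_2002 <- ts(as.numeric(tail(toy_full, 12)),
               start = c(2002, 1), frequency = 12)
toy_2002
```

With the forecast package and the objects defined earlier, the corresponding call would presumably be snaive(souvenirSales, h=12), since souvenirSales already holds the complete series.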

2. To forecast sales for future months using the ShampooSales.csv data, the following steps should be taken:
  • partition the data into training and validation periods
  • compute naive forecasts
  • look at the MAPE and RMSE values for the validation period.
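
The three steps above can be sketched as one small base R function. The function name evaluate_naive and the toy sales vector are hypothetical, chosen for illustration; the real analysis would read ShampooSales.csv instead.

```r
# Partition a series, compute naive forecasts, and report validation RMSE and MAPE
evaluate_naive <- function(series, n_valid) {
  n_train <- length(series) - n_valid
  train <- series[seq_len(n_train)]            # training period
  valid <- series[n_train + seq_len(n_valid)]  # validation period

  # Naive forecast: repeat the last training value over the validation period
  pred <- rep(train[n_train], n_valid)

  err <- valid - pred
  c(RMSE = sqrt(mean(err^2)),
    MAPE = mean(abs(err / valid)) * 100)
}

# Made-up sales values; the last two observations form the validation period
toy_sales <- c(10, 12, 11, 15, 14, 20, 25, 20)
result <- evaluate_naive(toy_sales, n_valid = 2)
result
```

Here the naive forecast is 20 for both validation periods, so the errors are 5 and 0, giving an RMSE of about 3.54 and a MAPE of 10.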