1.(a) The souvenir sales data were partitioned so that the analyst could avoid overfitting. By partitioning the data, she can measure forecast errors on data the model has not seen: the forecasting model is fit on the training period and tested on the validation period.
The analyst chose a 12-month validation period because the shop is aiming to get a forecast of sales for the next 12 months. The validation period is determined, among other things, by the forecasting goal.
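The partitioning idea can be sketched on a small made-up series. The values and the names toy, nValid, nTrain, train, and valid below are illustrative assumptions, not part of the souvenir data; the sketch only shows how holding out the last 12 months follows from a 12-month forecasting goal.

```r
# Toy monthly series (made-up values): 36 months, Jan 2000 to Dec 2002.
toy <- ts(1:36, start = c(2000, 1), frequency = 12)

nValid <- 12                       # hold out the last 12 months, matching a 12-month goal
nTrain <- length(toy) - nValid     # everything before that is the training period

# time(toy) gives the time stamp of each observation, so the split can be
# expressed as "up to the nTrain-th observation" and "from the next one on".
train <- window(toy, end = time(toy)[nTrain])
valid <- window(toy, start = time(toy)[nTrain + 1])

length(train)   # number of training months
start(valid)    # year and month where the validation period begins
```

A model would then be fit on train only, and its forecasts compared with valid.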
For part (c) of question 1, I needed to bring in the .csv file with the souvenir sales data. I read in the file, converted the sales column to a time series object, and plotted it.
souvenir <- read.csv("SouvenirSales.csv", stringsAsFactors=FALSE)
str(souvenir)
## 'data.frame': 84 obs. of 2 variables:
## $ Date : chr "Jan-95" "Feb-95" "Mar-95" "Apr-95" ...
## $ Sales: num 1665 2398 2841 3547 3753 ...
souvenirSales <- ts(souvenir$Sales, start=c(1995,1), frequency=12)
plot(souvenirSales, bty="l")
library(forecast)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.3
Then I’ll bring in the forecast library and create the naive forecast. Because the plot shows very obvious seasonality, I will use snaive for a seasonal naive forecast. I need to partition the data in this step as well, before generating the forecast.
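As a rough illustration of what a seasonal naive forecast does, the sketch below builds one by hand in base R: each forecast month simply repeats the value from the same month of the last observed year. The series toy and the names h, lastYear, and manualSnaive are made up for this example and are not the souvenir data or the internals of snaive.

```r
# Toy monthly series (made-up values): two full years, Jan 2000 to Dec 2001.
toy <- ts(c(10, 12, 15, 11,  9, 13, 17, 20, 24, 25, 36, 80,
            11, 13, 16, 12, 10, 14, 18, 21, 25, 26, 37, 81),
          start = c(2000, 1), frequency = 12)

h <- 12                                   # forecast horizon in months
lastYear <- tail(as.numeric(toy), 12)     # the most recent full season

# Seasonal naive: the forecast for each month is the value observed in the
# same month one year earlier (recycled if h is longer than one season).
manualSnaive <- ts(rep(lastYear, length.out = h),
                   start = c(2002, 1), frequency = 12)
manualSnaive
```

With a 12-month horizon the forecast is just a copy of the last training year shifted forward by one year, which is why this method suits strongly seasonal data.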
library(forecast)
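validation <- 12                                  # number of months held out for validation
training <- length(souvenirSales) - validation    # number of months used for training
# c(1995, k) refers to the k-th monthly period counted from January 1995
salestraining <- window(souvenirSales, start=c(1995,1), end=c(1995, training))
salesvalidation <- window(souvenirSales, start=c(1995,training+1), end=c(1995,training+validation))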
snaiveValidation <- snaive(salestraining, h=validation)
snaiveValidation$mean
## Jan Feb Mar Apr May Jun Jul
## 2001 7615.03 9849.69 14558.40 11587.33 9332.56 13082.09 16732.78
## Aug Sep Oct Nov Dec
## 2001 19888.61 23933.38 25391.35 36024.80 80721.71
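# Compare the forecasts with the validation series (salesvalidation).
# Passing the integer `validation` (12) instead compares only the January forecast
# with the number 12 (12 - 7615.03 = -7603.03); the Test set row printed below
# came from that call and needs to be regenerated.
accuracy(snaiveValidation, salesvalidation)
## ME RMSE MAE MPE MAPE MASE
## Training set 3401.361 6467.818 3744.801 22.3927 25.64127 0.6800223
## Test set -7603.030 7603.030 7603.030 -63358.5833 63358.58333 1.3806422
## ACF1
## Training set 0.4140974
## Test set NA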
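To make the columns of the accuracy table concrete, here is a minimal sketch of how the main error measures can be computed by hand, taking the forecast error as actual minus forecast. The vectors actual and fc hold made-up numbers, not souvenir sales, and the variable names are assumptions for this example.

```r
# Made-up actual values and forecasts for four periods.
actual <- c(100, 120, 150, 130)
fc     <- c( 90, 125, 140, 135)

err <- actual - fc                      # forecast errors: actual minus forecast

ME   <- mean(err)                       # mean error (shows bias)
RMSE <- sqrt(mean(err^2))               # root mean squared error
MAE  <- mean(abs(err))                  # mean absolute error
MPE  <- mean(err / actual) * 100        # mean percentage error
MAPE <- mean(abs(err / actual)) * 100   # mean absolute percentage error

round(c(ME = ME, RMSE = RMSE, MAE = MAE, MPE = MPE, MAPE = MAPE), 3)
```

A positive ME means the forecasts are, on average, below the actual values; MAE and RMSE measure the typical size of the errors regardless of sign.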
souvenirhist <- hist(snaiveValidation$residuals, ylab="Frequency", xlab="Forecast Error", main="", bty="l")
plot(salesvalidation, xlab="Year", ylab="Souvenir Sales", bty="l")
lines(snaiveValidation$mean, col=2, lty=2)
legend("topleft", c("Actual","Forecast"), col=1:2, lty=1:2)
Judging from the plot, the seasonal naive forecast follows the seasonal pattern of the actual validation-period sales reasonably well.