## 'data.frame': 84 obs. of 2 variables:
## $ Date : chr "Jan-95" "Feb-95" "Mar-95" "Apr-95" ...
## $ Sales: num 1665 2398 2841 3547 3753 ...
Plot souvenir sales:
SouvenirSales <- ts(souvenir$Sales, start=c(1995,1), frequency=12)
autoplot(SouvenirSales)
Partition the dataset:
#set length of validation period to 12
validperiod= 12
#training period
trainperiod= length(SouvenirSales) - validperiod
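#data partitioning: window() counts months from Jan 1995, so the training
#period is Jan 1995 to Dec 2000 and the validation period is Jan to Dec 2001
souvtrain = window(SouvenirSales, start=c(1995,1), end=c(1995, trainperiod))
souvvalid= window(SouvenirSales, start=c(1995, trainperiod+1), end=c(1995, trainperiod + validperiod))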
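As a quick check on the partition, the same split can also be written with calendar endpoints instead of month offsets. The sketch below uses base R only and a made-up 84-month series; the name `toy` and its values are placeholders, not the souvenir data.

```r
# Toy monthly series with the same shape as the souvenir data:
# 84 observations starting January 1995 (values are placeholders).
toy <- ts(seq_len(84), start = c(1995, 1), frequency = 12)

validperiod <- 12
trainperiod <- length(toy) - validperiod

# Split by calendar dates: training ends December 2000,
# validation covers January to December 2001.
toytrain <- window(toy, end = c(2000, 12))
toyvalid <- window(toy, start = c(2001, 1))

# Offset-based split, as used in the partitioning code, for comparison.
toytrain2 <- window(toy, start = c(1995, 1), end = c(1995, trainperiod))

length(toytrain)   # 72
length(toyvalid)   # 12
end(toytrain2)     # 2000 12
```

Both forms select the same 72 training months. The calendar form is easier to read, while the offset form adapts automatically if `validperiod` changes.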
a) Why was the data partitioned?
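The data was partitioned so that forecast accuracy can be measured on data the model has not seen. Accuracy on the training period alone tends to be optimistic, because a model can overfit the data it was fitted to.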
b) Why a 12-month validation period?
Because the validation period should typically be at least as long as the forecasting period. With monthly data, 12 months also covers one full seasonal cycle.
c) Naive forecast for validation period:
#naive forecast
naive_for_valid = naive(souvtrain, h=validperiod)
naive_for_valid
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2001 80721.71 67315.75 94127.67 60219.059 101224.4
## Feb 2001 80721.71 61762.82 99680.60 51726.583 109716.8
## Mar 2001 80721.71 57501.90 103941.52 45210.077 116233.3
## Apr 2001 80721.71 53909.78 107533.64 39716.409 121727.0
## May 2001 80721.71 50745.07 110698.35 34876.389 126567.0
## Jun 2001 80721.71 47883.94 113559.48 30500.677 130942.7
## Jul 2001 80721.71 45252.87 116190.55 26476.795 134966.6
## Aug 2001 80721.71 42803.92 118639.50 22731.457 138712.0
## Sep 2001 80721.71 40503.82 120939.60 19213.758 142229.7
## Oct 2001 80721.71 38328.33 123115.09 15886.636 145556.8
## Nov 2001 80721.71 36259.16 125184.26 12722.110 148721.3
## Dec 2001 80721.71 34282.09 127161.33 9698.445 151745.0
d) Compute RMSE and MAPE:
accuracy(naive_for_valid, souvvalid)
## ME RMSE MAE MPE MAPE MASE
## Training set 1113.477 10460.73 5506.879 -25.27554 61.16191 1.47054
## Test set -50500.288 56099.07 54490.114 -287.13834 290.95050 14.55087
## ACF1 Theil's U
## Training set -0.1968879 NA
## Test set 0.3182456 6.649124
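To make the RMSE and MAPE columns easier to read, here is a minimal sketch of how they are computed from forecast errors, with error defined as actual minus forecast. The vectors `actual` and `forecast` are made-up placeholders, not the souvenir data.

```r
# Made-up actuals and forecasts (placeholders, not the souvenir data).
actual   <- c(100, 200, 400)
forecast <- c(110, 180, 400)

# Forecast error: actual minus forecast.
err <- actual - forecast                 # -10, 20, 0

# Root mean squared error: sqrt((100 + 400 + 0) / 3).
rmse <- sqrt(mean(err^2))

# Mean absolute percentage error: 100 * (0.10 + 0.10 + 0) / 3.
mape <- 100 * mean(abs(err / actual))

rmse
mape
```

Applied to the objects above, `souvvalid - naive_for_valid$mean` gives the validation errors, and the same two formulas should reproduce the "Test set" row. A negative ME, as in that row, means the forecasts sit above the actual values on average.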
e) Plot a histogram of forecast errors for validation period, as well as a time plot for naive forecasts and the actual sales numbers in the validation period. Explain the behavior of the naive forecasts.
#forecast errors for the validation period (actual minus forecast)
valid_errors = souvvalid - naive_for_valid$mean
#plot histogram
hist(valid_errors, breaks= 20, probability = TRUE)
#add density line
lines(density(valid_errors))
#plot actual values for the validation period (2001)
plot(souvvalid, bty="l", xaxt="n", xlab="The Year 2001", yaxt="n", ylab="Sales")
#add forecast line
lines(naive_for_valid$mean, col = "red", lty =2)
legend("topleft", c("Actual","Forecast"), col=1:2, lty=1:2)
The naive forecast is a flat line at the last training value, December 2000 (80,721.71). As the first plot shows, sales peak every December and drop sharply afterwards, so the forecast overestimates actual sales for almost the entire validation year. This is why the test-set ME is large and negative (-50,500.29). A seasonal naive forecast would have worked much better in this case.
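The difference between the two approaches can be sketched in base R. The series below is made up (flat months with a December spike, growing 10% a year) and only mimics the shape of the souvenir data. The names `toy`, `naive_fc`, `snaive_fc`, and `rmse` are illustrative.

```r
# Made-up monthly series with a December spike (placeholder values).
pattern <- c(rep(10, 11), 100)
toy <- ts(c(pattern, pattern * 1.1, pattern * 1.21),
          start = c(1999, 1), frequency = 12)

train <- window(toy, end = c(2000, 12))
valid <- window(toy, start = c(2001, 1))
h <- length(valid)

# Naive: repeat the last training value (a December) for every month.
naive_fc <- ts(rep(as.numeric(tail(train, 1)), h),
               start = start(valid), frequency = 12)

# Seasonal naive: repeat the last full year of the training period.
snaive_fc <- ts(as.numeric(tail(train, 12)),
                start = start(valid), frequency = 12)

# Validation RMSE for each forecast.
rmse <- function(actual, fc) sqrt(mean((actual - fc)^2))
rmse(valid, naive_fc)
rmse(valid, snaive_fc)
```

On this toy series the seasonal naive forecast has a far smaller validation RMSE, because each month is compared with the same month a year earlier. With the forecast package, `snaive(souvtrain, h = validperiod)` produces a seasonal naive forecast directly.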
f) What must the analyst do to use the forecasting model for generating forecasts for 2002?
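She should use a seasonal naive forecast instead of a naive forecast: the data show a clear seasonal pattern, so the forecast will be more accurate. To forecast 2002, she should also recombine the training and validation periods and rerun the method on the full series, so that the forecasts build on the 2001 data.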
- Partition the data into training and validation periods
- Compute naive forecasts
- Look at MAPE and RMSE values for the validation period
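Before forecasting 2002, the chosen method is normally rerun on the full series (training plus validation), so that the most recent year informs the forecast. Here is a minimal base R sketch, assuming a seasonal naive method and a made-up 84-month series standing in for `SouvenirSales`. The names `toy`, `full`, and `fc_2002` are placeholders.

```r
# Made-up 84-month series (placeholder values): a December spike
# whose level grows each year from 1995 to 2001.
toy <- ts(rep(c(rep(10, 11), 100), 7) * rep(1:7, each = 12),
          start = c(1995, 1), frequency = 12)

# Use the full series, not just the training window, so the forecast
# starts from the most recent data (December 2001).
full <- toy

# Seasonal naive forecast for 2002: repeat the last observed year.
# Assumes the series ends in December, so the forecast starts in January.
fc_2002 <- ts(as.numeric(tail(full, 12)),
              start = c(end(full)[1] + 1, 1), frequency = 12)

fc_2002
```

With the forecast package, the equivalent call would be `snaive(SouvenirSales, h = 12)`, run on the full series rather than on `souvtrain`.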