# Load libraries and set environment options
library(dplyr)
library(tidyr)
library(knitr)
library(forecast)
library(readxl)
library(ggplot2)
# Use this option to suppress scientific notation when printing values
options(scipen = 10, digits = 2)
a) Why was the data partitioned?
The data was partitioned to create training and validation periods, or sets, for the time series forecast. Holding out the most recent data as a validation set lets the forecast be compared against actual results that the model has not seen.
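As a rough illustration of the partitioning step, the sketch below splits a monthly series with base R's `window()`. The series `sales_ts` and its values are synthetic and hypothetical, not the assignment data; only the split logic is the point.

```r
# Synthetic monthly series, Jan 1995 to Dec 2001 (illustrative values only)
set.seed(1)
sales_ts <- ts(round(runif(84, 1000, 5000)), start = c(1995, 1), frequency = 12)

# Training period: everything up to Dec 2000
train_ts <- window(sales_ts, end = c(2000, 12))

# Validation period: the most recent 12 months (Jan 2001 onward)
valid_ts <- window(sales_ts, start = c(2001, 1))

length(train_ts)  # 72 months
length(valid_ts)  # 12 months
```

A model is then fit on `train_ts` only, and its forecasts are compared against `valid_ts`.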
b) Why did the analyst choose a 12-month validation period?
Start by decomposing the data, as shown here. The decomposition shows that the data has a seasonal pattern.
By creating a 12-month validation period, the forecast can be validated against every season of the year.
# Decompose and plot the time series for initial analysis
dec_SSales <- decompose(S_Sales.ts)
plot(dec_SSales)
c) What is the naive forecast for the validation period? (assume you must provide forecasts for 12 months ahead).
I ran Naive and Seasonal Naive forecasts. I believe the better of the two is the Seasonal Naive forecast, because the data has a seasonal pattern.
Charts for each are shown here.
# Plot the forecasts - Method 1 with prediction intervals
#print("Naive forecast", quote=FALSE)
autoplot(S_Sales_nfc, xlab="Year", ylab="Sales Volume") + autolayer(fitted(S_Sales_nfc))
#print("Seasonal Naive forecast", quote=FALSE)
autoplot(S_Sales_snfc, xlab="Year", ylab="Sales Volume") + autolayer(fitted(S_Sales_snfc))
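The two forecasts plotted above differ only in which past values they repeat. The sketch below reproduces that logic by hand in base R on a tiny made-up quarterly series; the names `train`, `naive_fc`, and `snaive_fc` are hypothetical and the numbers are illustrative, not taken from the sales data.

```r
# Two years of made-up quarterly data (illustrative values only)
train <- ts(c(10, 12, 20, 15, 11, 13, 22, 16), start = c(2000, 1), frequency = 4)
y <- as.numeric(train)
h <- 4                     # forecast horizon
m <- frequency(train)      # observations per season (4 for quarterly)

# Naive: every future value equals the last observed value
naive_fc <- rep(tail(y, 1), h)

# Seasonal naive: every future value equals the value from the same
# season one cycle earlier, so the last full season is repeated
snaive_fc <- rep(tail(y, m), length.out = h)

naive_fc   # 16 16 16 16
snaive_fc  # 11 13 22 16
```

For seasonal data, the naive forecast is a flat line at the last observation, while the seasonal naive forecast keeps the seasonal shape.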
d) Compute the RMSE and MAPE for the naive forecasts.
# Compare accuracy of forecasts
print("Accuracy of Naive forecast", quote=FALSE)
## [1] Accuracy of Naive forecast
accuracy(S_Sales_nfc, S_Sales_valid)
##                  ME  RMSE   MAE  MPE MAPE MASE  ACF1 Theil's U
## Training set   1113 10461  5507  -25   61  1.5 -0.20        NA
## Test set     -50500 56099 54490 -287  291 14.6  0.32       6.6
print("Accuracy of Seasonal Naive forecast", quote=FALSE)
## [1] Accuracy of Seasonal Naive forecast
accuracy(S_Sales_snfc, S_Sales_valid)
##                ME RMSE  MAE MPE MAPE MASE ACF1 Theil's U
## Training set 3401 6468 3745  22   26  1.0 0.41        NA
## Test set     7828 9542 7828  27   27  2.1 0.23      0.74
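For reference, the RMSE and MAPE columns can be reproduced by hand from the forecast errors. The sketch below uses four made-up actual and forecast values (hypothetical, not the sales data) and defines the error as actual minus forecast, which is the same sign convention `accuracy()` uses.

```r
# Made-up values for illustration only
actual   <- c(120, 150, 90, 200)
forecast <- c(100, 160, 100, 180)

# Forecast error: positive means the forecast was too low
err <- actual - forecast

# RMSE: square root of the mean squared error (same units as the data)
rmse <- sqrt(mean(err^2))

# MAPE: mean absolute error as a percentage of the actual value
mape <- mean(abs(err / actual)) * 100

round(rmse, 2)  # 15.81
round(mape, 2)  # 11.11
```

RMSE penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare across series.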
e) Plot a histogram of the forecast errors that result from the naive forecasts (for the validation period). Plot also a time plot for the naive forecasts and the actual sales numbers in the validation period. What can you say about the behavior of the naive forecasts?
I used both methods shown in the video, density and multiplier, to plot the forecast errors (residuals) from the Seasonal Naive forecast. Both histograms are right-skewed: large positive errors, where actual sales exceed the forecast, occur more often than large negative ones, which points to a tendency to under-forecast.
# Create a histogram of the forecast errors
# Method 1: keep the count scale and rescale the density curve to match it
my_hist <- hist(S_Sales_snfc$residuals, ylab="Frequency", xlab="Forecast Error", bty="l", main="Queensland Souvenir Sales Forecast Residuals \n Method 1 (Multiplier)")
multiplier <- my_hist$counts / my_hist$density
my_density <- density(S_Sales_snfc$residuals, na.rm=TRUE)
my_density$y <- my_density$y * multiplier[1]
lines(my_density)
# Method 2: draw the histogram on the density scale so the curve needs no rescaling
hist(S_Sales_snfc$residuals, breaks=20, probability=TRUE, ylab="Density", xlab="Forecast Error", bty="l", main="Queensland Souvenir Sales Forecast Residuals \n Method 2 (Density)")
lines(density(S_Sales_snfc$residuals, na.rm=TRUE))
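One point worth noting: the `residuals` element of a forecast object holds the in-sample (training-period) errors. Errors for the validation period are instead the validation actuals minus the point forecasts. The sketch below shows that calculation on made-up numbers; `valid_actual` and `snaive_mean` are hypothetical stand-ins for the validation series and the forecast's `mean` element.

```r
# Made-up validation actuals and point forecasts (illustrative values only)
valid_actual <- ts(c(120, 135, 160, 150), start = c(2001, 1), frequency = 12)
snaive_mean  <- ts(c(100, 130, 140, 155), start = c(2001, 1), frequency = 12)

# Validation-period forecast errors: actual minus forecast
valid_err <- valid_actual - snaive_mean

# Count the months in which the forecast was too low
n_under <- sum(valid_err > 0)

as.numeric(valid_err)  # 20 5 20 -5
n_under                # 3
```

With the objects in this report, the equivalent expression would be `S_Sales_valid - S_Sales_snfc$mean`, which could be passed to `hist()` in place of `S_Sales_snfc$residuals`.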
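As shown below, the Seasonal Naive forecast repeats the last 12 months of the training period, so it follows the seasonal shape of actual sales. The validation-period ME and MAE are equal (7828), which indicates that the forecast fell below actual sales in every month.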
# Plot the forecasts - Method 2
plot(S_Sales_valid, bty="l", xlab="Year", yaxt="n", ylab="Sales Volume")
axis(2, las=2)
lines(S_Sales_snfc$mean, col=2, lty=2)
legend(2001, 100000, c("Actual", "Forecast"), col=1:2, lty=1:2)
f) The analyst found a forecasting model that gives satisfactory performance on the validation set. What must she do to use the forecasting model for generating forecasts for year 2002?
She must recombine the training and validation periods and rerun the selected forecast model on the entire time series, then forecast over the desired horizon, in this case 12 months.
# Run seasonal naive forecast for next 12 months
S_Sales_sn_next12<-snaive(S_Sales.ts, h=12)
print(S_Sales_sn_next12)
##          Point Forecast Lo 80  Hi 80 Lo 95  Hi 95
## Jan 2002          10243  1178  19308 -3621  24107
## Feb 2002          11267  2202  20332 -2597  25131
## Mar 2002          21827 12762  30892  7963  35691
## Apr 2002          17357  8292  26423  3493  31221
## May 2002          15998  6933  25063  2134  29862
## Jun 2002          18602  9536  27667  4737  32466
## Jul 2002          26155 17090  35220 12291  40019
## Aug 2002          28587 19521  37652 14722  42451
## Sep 2002          30505 21440  39571 16641  44370
## Oct 2002          30821 21756  39887 16957  44685
## Nov 2002          46634 37569  55700 32770  60498
## Dec 2002         104661 95595 113726 90797 118525
# Plot the forecasts - Method 1 with prediction intervals
autoplot(S_Sales_sn_next12, xlab="Year", ylab="Sales Volume") + autolayer(fitted(S_Sales_sn_next12))
The file ShampooSales.xls contains data on the monthly sales of a certain shampoo over a three-year period. If the goal is forecasting sales in future months, which of the following steps should be taken? (choose one or more)
Of the available options listed in the problem, I would select:
- partition the data into training and validation periods
This should be done after some initial review of the time series to estimate any seasonality, trends, and cycles that might be apparent.
- look at MAPE and RMSE values for the validation period
Comparing the forecast with the actuals over the validation period, MAPE and RMSE measure how close the forecast comes to the actual values.
- compute naive forecasts
Starting with a naive forecast creates a baseline against which other forecasts can be compared.