Assignment 2, Problem 1

Queensland, AUS - Souvenir Sales

#  Load libraries and set environment options
library(dplyr)
library(tidyr)
library(knitr)
library(forecast)
library(readxl)
library(ggplot2)

#  Suppress scientific notation and print values to 2 significant digits
options(scipen = 10, digits = 2)

a) Why was the data partitioned?
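The data was partitioned into a training period and a validation period. The model is fitted to the training period only. The validation period holds the most recent observations and is kept aside, so the forecasts can be compared against actual results the model has not seen. This gives a more honest estimate of how the model will perform on future data than measuring its fit to the data it was trained on.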
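A minimal base-R sketch of the partition, assuming the data is a monthly `ts` object. The built-in `AirPassengers` series stands in for the souvenir data, which is not reproduced here, and the names `train` and `valid` are illustrative. `window()` subsets by time, so both pieces keep their dates and frequency and the forecasts can be lined up against the validation months.

```r
# Stand-in data: the built-in monthly AirPassengers series (Jan 1949 - Dec 1960),
# used only because the souvenir sales file is not included here
x <- AirPassengers

n_valid <- 12                                   # hold out the last 12 months
n_train <- length(x) - n_valid

# window() subsets a ts by time, keeping the start date and frequency
train <- window(x, end = time(x)[n_train])
valid <- window(x, start = time(x)[n_train + 1])

c(train = length(train), valid = length(valid))
```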

b) Why did the analyst choose a 12-month validation period?
Start by decomposing the data, as shown here. The decomposition shows a seasonal pattern that repeats every 12 months.
A 12-month validation period therefore contains one full seasonal cycle, so the forecast is validated against every month of the year. It also matches the 12-month horizon the final forecasts are needed for.

#  Decompose and plot the time series for initial analysis
dec_SSales <- decompose(S_Sales.ts)
plot(dec_SSales)
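To see what `decompose()` is measuring, its seasonal figure can be rebuilt by hand for the additive model (the default): a centred 2x12 moving average estimates the trend, the trend is subtracted, and the detrended values are averaged by calendar month and centred on zero. This is a sketch on the built-in `AirPassengers` series as stand-in data. It calls `stats::filter()` explicitly because `dplyr`, loaded above, masks `filter()`.

```r
x <- AirPassengers                        # stand-in monthly series starting in January
f <- frequency(x)                         # 12 observations per year

# Centred 2x12 moving average: 13 weights, half weight on the two ends
w <- c(0.5, rep(1, f - 1), 0.5) / f
trend <- stats::filter(x, w, sides = 2)   # NA for the first and last 6 months

# Additive model: remove the trend, then average each calendar month
detrended <- x - trend
figure <- tapply(as.vector(detrended), as.vector(cycle(x)), mean, na.rm = TRUE)
figure <- as.numeric(figure) - mean(figure)   # centre the 12 indices on zero

round(figure, 1)                          # one seasonal index per month, Jan..Dec
```

Because the stand-in series starts in January, grouping by `cycle()` lines up with the positional grouping `decompose()` uses internally.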

c) What is the naive forecast for the validation period? (assume you must provide forecasts for 12 months ahead).
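I ran both Naive and Seasonal Naive forecasts for the 12-month validation period. The naive forecast repeats the last training value (December 2000) for all 12 months; because December is the seasonal peak, it badly over-forecasts the rest of the year. The seasonal naive forecast instead repeats the value from the same month one year earlier, so it follows the seasonal pattern. The Seasonal Naive forecast is therefore the better of the two, as the validation RMSE in d) confirms (9542 vs. 56099).
Charts for each are shown here.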

#  Plot the forecasts - Method 1 with Prediction Intervals
#  (ggplot2 is already loaded above)

#  Naive forecast
autoplot(S_Sales_nfc, xlab="Year", ylab="Sales Volume") + autolayer(fitted(S_Sales_nfc))

#  Seasonal Naive forecast
autoplot(S_Sales_snfc, xlab="Year", ylab="Sales Volume") + autolayer(fitted(S_Sales_snfc))
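Both baselines are simple enough to compute directly, which shows what the point forecasts from `naive()` and `snaive()` in the forecast package contain: the naive forecast repeats the last training value for every horizon, and the seasonal naive forecast repeats the value from the same month one year earlier. The sketch below uses base R with `AirPassengers` as stand-in data; the variable names are illustrative.

```r
x <- AirPassengers                           # stand-in monthly series
train <- window(x, end = c(1959, 12))        # training period
h <- 12                                      # horizon = validation length
m <- frequency(train)                        # season length: 12 months
fc_start <- tsp(train)[2] + 1 / m            # first month after the training data

# Naive: the last observed value, repeated h times
naive_fc <- ts(rep(tail(as.numeric(train), 1), h), start = fc_start, frequency = m)

# Seasonal naive: the last full season, repeated as needed; for h <= m this
# is simply the final m observations shifted forward one year
last_season <- tail(as.numeric(train), m)
snaive_fc <- ts(rep_len(last_season, h), start = fc_start, frequency = m)
```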

d) Compute the RMSE and MAPE for the naive forecasts.

#  Compare accuracy of forecasts 
print("Accuracy of Naive forecast", quote=FALSE)
## [1] Accuracy of Naive forecast
accuracy(S_Sales_nfc, S_Sales_valid)
##                  ME  RMSE   MAE  MPE MAPE MASE  ACF1 Theil's U
## Training set   1113 10461  5507  -25   61  1.5 -0.20        NA
## Test set     -50500 56099 54490 -287  291 14.6  0.32       6.6
print("Accuracy of Seasonal Naive forecast", quote=FALSE)
## [1] Accuracy of Seasonal Naive forecast
accuracy(S_Sales_snfc, S_Sales_valid)
##                ME RMSE  MAE MPE MAPE MASE ACF1 Theil's U
## Training set 3401 6468 3745  22   26  1.0 0.41        NA
## Test set     7828 9542 7828  27   27  2.1 0.23      0.74
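For reference, the two validation metrics reported by `accuracy()` can be written out directly. With errors defined as actual minus forecast (the sign convention under which the naive forecast's negative ME means over-forecasting), RMSE = sqrt(mean(e^2)) and MAPE = 100 * mean(|e / actual|). The helper names `rmse()` and `mape()` are illustrative and not part of the forecast package.

```r
# Errors are actual minus forecast, the sign convention accuracy() uses
rmse <- function(actual, forecast) {
  e <- actual - forecast
  sqrt(mean(e^2))
}

# MAPE in percent; undefined if any actual value is zero
mape <- function(actual, forecast) {
  100 * mean(abs((actual - forecast) / actual))
}

# Small worked example: the errors are -10, 20 and 0
actual   <- c(100, 200, 400)
forecast <- c(110, 180, 400)
rmse(actual, forecast)   # sqrt((100 + 400 + 0) / 3), about 12.91
mape(actual, forecast)   # 100 * (0.10 + 0.10 + 0) / 3, about 6.67
```

RMSE is in the units of the data and is dominated by the largest errors (here the December peak), while MAPE is scale-free, which is why both are reported.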

e) Plot a histogram of the forecast errors that result from the naive forecasts (for the validation period). Plot also a time plot for the naive forecasts and the actual sales numbers in the validation period. What can you say about the behavior of the naive forecasts?
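I used both methods shown in the video, multiplier and density, to overlay a density curve on a histogram of the Seasonal Naive forecast errors. Note that `S_Sales_snfc$residuals` holds the training-period residuals (each month's sales minus the same month one year earlier); the validation-period errors would be `S_Sales_valid - S_Sales_snfc$mean`. Both histograms are right skewed: most errors are small, with a long tail of large positive errors, i.e., months where actual sales exceeded the forecast. Together with the positive mean error (ME of 3401 in the training period), this indicates a tendency to under-forecast.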

# Create a histogram of the forecast errors (training-period residuals of the Seasonal Naive model)
# Method 1: count histogram, with the density curve rescaled by counts / density
my_hist <- hist(S_Sales_snfc$residuals, ylab="Frequency", xlab = "Forecast Error", bty="l", main="Queensland Souvenir Sales Forecast Residuals \n Method 1 (Multiplier)")
multiplier <- my_hist$counts / my_hist$density

my_density <- density(S_Sales_snfc$residuals, na.rm=TRUE)
my_density$y <- my_density$y * multiplier[1]

lines(my_density)

# Method 2: density histogram, with the unscaled density curve
hist(S_Sales_snfc$residuals, breaks=20, probability = TRUE, ylab="Density", xlab = "Forecast Error", bty="l", main="Queensland Souvenir Sales Forecast Residuals \n Method 2 (Density)")
lines(density(S_Sales_snfc$residuals, na.rm=TRUE))
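The two methods differ only in the y-axis scale. With `probability = TRUE`, `hist()` reports densities, count / (n * bin width), so the bars and the `density()` curve already share a scale. On a count histogram the density curve must be multiplied by n * bin width, which is the constant that `my_hist$counts / my_hist$density` recovers in Method 1 when the bins have equal width. A sketch, assuming the seasonal naive residuals are the lag-12 differences of the training series; `AirPassengers` stands in for the souvenir data.

```r
train <- window(AirPassengers, end = c(1959, 12))   # stand-in training series
e <- as.numeric(diff(train, lag = 12))               # seasonal naive residuals: y[t] - y[t-12]

# Method 1: count histogram, density curve rescaled to counts
h <- hist(e, xlab = "Forecast Error", main = "Seasonal naive residuals (counts)")
bin_width <- diff(h$breaks)[1]         # default breaks have equal width
multiplier <- length(e) * bin_width    # same constant as counts / density
d <- density(e)
d$y <- d$y * multiplier
lines(d)

# Method 2: density histogram, density curve drawn unscaled
hist(e, probability = TRUE, xlab = "Forecast Error", ylab = "Density",
     main = "Seasonal naive residuals (density)")
lines(density(e))
```

Computing the multiplier from n and the bin width also avoids the 0/0 that `counts / density` produces for any empty bin.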

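As shown below, the Seasonal Naive forecast for 2001 is the 2000 sales shifted forward one year, so it reproduces the seasonal shape of the actual sales, including the December peak. However, it runs consistently low: in the validation period the mean error (7828) equals the mean absolute error (7828), which means essentially every error is positive, i.e., sales in each month of 2001 exceeded sales in the same month of 2000. A naive forecast cannot anticipate this year-over-year growth.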

#  Plot the forecasts - Method 2
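#  Set the y-range to cover both series so the forecast line is not clipped
plot(S_Sales_valid, bty="l", xlab="Year", yaxt="n", ylab="Sales Volume",
     ylim=range(S_Sales_valid, S_Sales_snfc$mean))
axis(2, las=2)
lines(S_Sales_snfc$mean, col=2, lty=2)
legend(2001, 100000, c("Actual", "Forecast"), col=1:2, lty=1:2)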

f) The analyst found a forecasting model that gives satisfactory performance on the validation set. What must she do to use the forecasting model for generating forecasts for year 2002?
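Once the model is chosen, she must recombine the training and validation periods and refit the model to the entire series (through December 2001). The refit model then generates forecasts for the next 12 months, January through December 2002. Refitting matters because the validation period contains the most recent data, which is the most informative for forecasting 2002.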

#  Run seasonal naive forecast for next 12 months
S_Sales_sn_next12<-snaive(S_Sales.ts, h=12)
print(S_Sales_sn_next12)
##          Point Forecast Lo 80  Hi 80 Lo 95  Hi 95
## Jan 2002          10243  1178  19308 -3621  24107
## Feb 2002          11267  2202  20332 -2597  25131
## Mar 2002          21827 12762  30892  7963  35691
## Apr 2002          17357  8292  26423  3493  31221
## May 2002          15998  6933  25063  2134  29862
## Jun 2002          18602  9536  27667  4737  32466
## Jul 2002          26155 17090  35220 12291  40019
## Aug 2002          28587 19521  37652 14722  42451
## Sep 2002          30505 21440  39571 16641  44370
## Oct 2002          30821 21756  39887 16957  44685
## Nov 2002          46634 37569  55700 32770  60498
## Dec 2002         104661 95595 113726 90797 118525
#  Plot the forecasts - Method 1 with Prediction Intervals

autoplot(S_Sales_sn_next12,  xlab="Year", ylab="Sales Volume") + autolayer(fitted(S_Sales_sn_next12))
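For the seasonal naive model, refitting on the full series simply means its point forecasts now come from the last season of the full series rather than the training period. A base-R sketch of that step, using `AirPassengers` as stand-in data; `next12` is an illustrative name.

```r
# Full series = training + validation periods recombined (stand-in data)
full <- AirPassengers
m <- frequency(full)
h <- 12

# Seasonal naive point forecasts from the full series: the last m values,
# repeated as needed and dated from the month after the series ends
next12 <- ts(rep_len(tail(as.numeric(full), m), h),
             start = tsp(full)[2] + 1 / m, frequency = m)

next12
```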

Assignment 2, Problem 2

Shampoo Sales Forecast Method(s)

The file ShampooSales.xls contains data on the monthly sales of a certain shampoo over a three-year period. If the goal is forecasting sales in future months, which of the following steps should be taken? (choose one or more)

Of the available options listed in the problem, I would select:

- partition the data into training and validation periods
This should be done after an initial review of the time series (for example, a time plot or decomposition) to identify any apparent seasonality, trend, or cycles.

- look at MAPE and RMSE values for the validation period
MAPE and RMSE summarize how far the forecasts for the validation period are from the actual values, so they show how well the model performs on data it was not fitted to.

- compute naive forecasts
Starting with a naive forecast creates a baseline against which other forecasts can be compared.
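Taken together, these steps form a short evaluation loop. The sketch below, assuming only a monthly `ts` object, partitions the series, computes naive and seasonal naive baselines for the validation period, and reports validation RMSE and MAPE for each. The function name `baseline_accuracy()` is hypothetical, and `AirPassengers` stands in because the shampoo file is not included. Any candidate model would then be judged against these baseline numbers.

```r
# Hypothetical helper: validation RMSE and MAPE for the naive and seasonal
# naive baselines; needs at least one full season in the training period
baseline_accuracy <- function(x, n_valid) {
  y <- as.numeric(x)
  n_train <- length(y) - n_valid
  train <- y[seq_len(n_train)]
  valid <- y[(n_train + 1):length(y)]
  m <- frequency(x)

  # Baselines: last value (naive) and last full season (seasonal naive)
  fc <- list(
    naive  = rep(train[n_train], n_valid),
    snaive = rep_len(tail(train, m), n_valid)
  )

  # Validation metrics with errors = actual - forecast; one row per method
  t(sapply(fc, function(f) {
    e <- valid - f
    c(RMSE = sqrt(mean(e^2)), MAPE = 100 * mean(abs(e / valid)))
  }))
}

res <- baseline_accuracy(AirPassengers, n_valid = 12)
res
```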