1 Introduction

For this report, I will analyze monthly national grocery store sales (in millions of dollars) in the United States from January 1992 to December 2022. I will build a monthly time series, forecast future values with four simple forecasting methods, and compare the four methods using several accuracy measures.

Grocery <- read.csv("STA321_GroceryStoreData.csv")[,-1]  # drop the first column; the remaining column is the monthly sales series

2 Building the Time Series

2.1 Data Split

Since I am interested in forecasting performance, I will split the data into a training set and a testing set. The first 362 observations (January 1992 to February 2022) form the training set, and the last ten observations (March to December 2022) form the testing set.

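training = Grocery[1:362]   # January 1992 to February 2022
testing = Grocery[363:372]  # March 2022 to December 2022 (held out)

Grocery.ts = ts(training, frequency = 12, start = c(1992, 1))  # monthly series starting January 1992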
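The index arithmetic can be cross-checked with base R's `window()`, which states the calendar boundaries directly. Observation 362 is February 2022. The sketch below is illustrative only. `y_full` is a hypothetical placeholder series of the same length (372 months), not the sales data. With the real data, one would first build the full series with `ts(Grocery, frequency = 12, start = c(1992, 1))`.

```r
# Hypothetical stand-in for the 372 monthly values (January 1992 to December 2022)
y_full <- ts(seq_len(372), frequency = 12, start = c(1992, 1))

# Training set: January 1992 to February 2022 (observations 1 to 362)
train_ts <- window(y_full, end = c(2022, 2))
# Testing set: March 2022 to December 2022 (observations 363 to 372)
test_ts <- window(y_full, start = c(2022, 3))

length(train_ts)
length(test_ts)
start(test_ts)
```

Because `window()` works in calendar time, the split stays correct even if the series later gains or loses observations at the start.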

2.2 Forecasting Table

I will use four forecasting methods to build the forecasting table and to test prediction accuracy: the mean (historical average), naive, seasonal naive, and random walk with drift methods. Each method is fitted to the training set and used to forecast the next 10 months of sales. Those forecasts are then compared with the actual values in the testing set.
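For reference, the point forecasts of the four methods can be written as follows. These are the standard definitions of the methods, and I assume here that the forecast package follows them. In the formulas, $y_1, \dots, y_T$ are the training observations ($T = 362$), $m = 12$ is the seasonal period, and $h = 1, \dots, 10$ is the forecast horizon.

$$
\begin{aligned}
\text{Mean:} \quad & \hat{y}_{T+h} = \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t \\
\text{Naive:} \quad & \hat{y}_{T+h} = y_T \\
\text{Seasonal naive:} \quad & \hat{y}_{T+h} = y_{T+h-m(k+1)}, \qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor \\
\text{Drift:} \quad & \hat{y}_{T+h} = y_T + h \, \frac{y_T - y_1}{T-1}
\end{aligned}
$$

Since $h \le 10 < m$, $k = 0$ for every horizon in this report. Each seasonal naive forecast is therefore the observed value from the same month one year earlier.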

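library(forecast)  # meanf(), naive(), snaive(), rwf()
library(knitr)     # kable()

# Point forecasts for the next 10 months from each method
pred.mv = meanf(Grocery.ts, h = 10)$mean               # mean of all training values
pred.naive = naive(Grocery.ts, h = 10)$mean            # last observed value
pred.snaive = snaive(Grocery.ts, h = 10)$mean          # value from the same month one year earlier
pred.rwf = rwf(Grocery.ts, h = 10, drift = TRUE)$mean  # random walk with drift

pred.table = cbind(pred.mv = pred.mv,
                   pred.naive = pred.naive,
                   pred.snaive = pred.snaive,
                   pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")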

Forecasting Table
pred.mv	pred.naive	pred.snaive	pred.rwf
42007.74	69619	63837	69734.75
42007.74	69619	64366	69850.49
42007.74	69619	65018	69966.24
42007.74	69619	65595	70081.98
42007.74	69619	65656	70197.73
42007.74	69619	67330	70313.47
42007.74	69619	67712	70429.22
42007.74	69619	68341	70544.96
42007.74	69619	68398	70660.71
42007.74	69619	68944	70776.45
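The four rules can be made concrete with a minimal base-R sketch that computes the same point forecasts by hand, without the forecast package. The function name `simple_forecasts` and the toy series `toy` are hypothetical. The sketch assumes the input is a monthly `ts` object. Applied to the training data (for example `simple_forecasts(Grocery.ts, h = 10)`), it should reproduce the Forecasting Table up to rounding. In particular, the drift forecasts above rise by about 115.7 per month, which is the slope $(y_T - y_1)/(T-1)$.

```r
# Hypothetical helper: point forecasts of the four simple methods, base R only
simple_forecasts <- function(y, h, m = frequency(y)) {
  x <- as.numeric(y)  # plain numeric copy; y itself stays a ts so frequency(y) works
  n <- length(x)
  steps <- seq_len(h)
  # Mean method: every forecast equals the average of all training values
  f_mean <- rep(mean(x), h)
  # Naive method: every forecast equals the last observation
  f_naive <- rep(x[n], h)
  # Seasonal naive: cycle through the last m observed values
  last_season <- x[(n - m + 1):n]
  f_snaive <- last_season[(steps - 1) %% m + 1]
  # Drift: extend the straight line joining the first and last observations
  slope <- (x[n] - x[1]) / (n - 1)
  f_drift <- x[n] + steps * slope
  cbind(mean = f_mean, naive = f_naive, snaive = f_snaive, drift = f_drift)
}

# Hypothetical toy series: 24 months with a linear trend and a December bump
toy <- ts(100 + 1:24 + rep(c(rep(0, 11), 20), 2), frequency = 12, start = c(2020, 1))
simple_forecasts(toy, h = 3)
```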

2.3 Time Series

Next, I plot the last part of the time series (observations 340 to 372) together with the 10-month forecasts from the four methods. The actual test values are marked with solid dots.

# Observed sales for observations 340 to 372; the last 10 are the held-out test months
plot(340:372, Grocery[340:372], type="l", xlim=c(340,372), ylim=c(27000, 90000),
     xlab = "observation sequence",
     ylab = "Grocery Store Sales (Millions of Dollars)",
     main = "Monthly Grocery Store Sales and Forecasts")
points(363:372, Grocery[363:372], pch=20)  # actual test values
## forecast points for each method
points(363:372, pred.mv, pch=15, col = "red")
points(363:372, pred.naive, pch=16, col = "blue")
points(363:372, pred.rwf, pch=18, col = "navy")
points(363:372, pred.snaive, pch=17, col = "purple")
## dashed lines joining each method's forecasts
lines(363:372, pred.mv, lty=2, col = "red")
lines(363:372, pred.snaive, lty=2, col = "purple")
lines(363:372, pred.naive, lty=2, col = "blue")
lines(363:372, pred.rwf, lty=2, col = "navy")
## legend symbols match the pch values above (drift = 18, seasonal naive = 17)
legend("topright", c("mean", "naive", "drift", "seasonal naive"),
       col=c("red", "blue", "navy", "purple"), pch=c(15, 16, 18, 17), lty=rep(2,4),
       bty="n", cex = 0.8)

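Accuracy Checks

To measure and compare the accuracy of the four forecasting methods on the ten test months, I will use three measures. These are the mean absolute percentage error (MAPE), the mean absolute deviation (MAD), and the mean squared error (MSE).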
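The three measures can be written out using their usual definitions. Here $y_t$ is the actual sales and $\hat{y}_t$ the forecast for test month $t$, and $n = 10$. MAPE is expressed in percent.

$$
\begin{aligned}
\text{MAPE} &= \frac{100}{n}\sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \\
\text{MAD} &= \frac{1}{n}\sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| \\
\text{MSE} &= \frac{1}{n}\sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2
\end{aligned}
$$

MAPE is scale-free, so it compares easily across series. MAD and MSE are in the units of the data (millions of dollars, and millions of dollars squared), and MSE penalizes large misses more heavily.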

true.value = Grocery[363:372]  # actual sales in the testing set
## percentage errors, in percent
PE.mv =  100*(true.value - pred.mv)/true.value
PE.naive =  100*(true.value - pred.naive)/true.value
PE.snaive =  100*(true.value - pred.snaive)/true.value
PE.rwf =  100*(true.value - pred.rwf)/true.value
## MAPE: mean absolute percentage error
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## forecast errors (actual minus forecast), used for MAD and MSE
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD: mean absolute deviation of the forecast errors
MAD.mv = mean(abs(e.mv))
MAD.naive = mean(abs(e.naive))
MAD.snaive = mean(abs(e.snaive))
MAD.rwf = mean(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE: mean squared error
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
## combine the three measures into one table
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Mean", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")

Overall performance of the four forecasting methods
	MAPE	MAD	MSE
Mean	42.025209	30474.06	930341432
Naive	3.918975	2862.80	9868912
Seasonal Naive	8.237912	5962.10	35913534
Drift	3.048405	2226.20	5901977
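The repeated per-method lines above can be collapsed into one small function applied to every method at once. The sketch below uses a hypothetical helper, `accuracy_measures`, which is not part of any package. MAD is taken as the mean of the absolute errors. The numbers in the example are made up for illustration and are not the grocery data.

```r
# Hypothetical helper: MAPE (%), MAD, and MSE for one set of forecasts
accuracy_measures <- function(actual, predicted) {
  e <- as.numeric(actual) - as.numeric(predicted)  # forecast errors
  c(MAPE = mean(abs(100 * e / as.numeric(actual))),
    MAD  = mean(abs(e)),
    MSE  = mean(e^2))
}

# Made-up example values, not the grocery data
actual <- c(100, 200, 400)
predictions <- list(method_a = c(110, 190, 400),
                    method_b = c(100, 220, 360))

# One row per method, one column per measure
t(sapply(predictions, accuracy_measures, actual = actual))
```

With the report's objects, a call such as `t(sapply(list(Mean = pred.mv, Naive = pred.naive, "Seasonal Naive" = pred.snaive, Drift = pred.rwf), accuracy_measures, actual = true.value))` should give the same layout as the accuracy table.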

2.4 Summary and Conclusion

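I analyzed monthly national grocery store sales from January 1992 to December 2022. The data were split into a training set (January 1992 to February 2022) and a testing set (March to December 2022). I fitted four forecasting methods to the training set, plotted their 10-month forecasts against the observed series, and measured their accuracy on the testing set. The drift method performed best, with the lowest MAPE (about 3.05%), MAD, and MSE of the four methods. The naive method came second, while the mean method was by far the least accurate (MAPE about 42%).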

STA321 Week 11 Assignment: Analyzing Grocery Store Sales Using Time Series

Ian VanWright

11/12/2023