1 Introduction

For this report, I will analyze monthly national grocery store sales (in millions of dollars) in the United States from January 1992 to December 2022. I will build a time series from these data, use four different forecasting methods to forecast future values, and compare the accuracy of the four methods.

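library(forecast)  # meanf(), naive(), snaive(), rwf()
library(knitr)     # kable()
# Read the data and drop the first column, leaving the monthly sales values
Grocery <- read.csv("STA321_GroceryStoreData.csv")[,-1]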

2 Building the Time Series

2.1 Data Split

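Since I am interested in testing forecasting performance, I will split the data into a training set and a testing set. The first 362 observations (January 1992 to February 2022) form the training set, and the last ten observations (March 2022 to December 2022) form the testing set.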

training = Grocery[1:362]
testing = Grocery[363:372]

Grocery.ts = ts(training, frequency = 12, start = c(1992, 1))
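Because the split above is done by position, it can help to confirm which calendar months fall on each side. The sketch below uses a hypothetical stand-in series (the integers 1 to 372 instead of the real sales values) with the same monthly `ts` layout, and expresses the split with base R's `window()` using dates rather than indices.

```r
# Hypothetical stand-in: the values 1..372 mark each observation's position
# in a monthly series running from January 1992 to December 2022.
sales <- ts(seq_len(372), frequency = 12, start = c(1992, 1))

# The same split, expressed with calendar dates instead of indices
train <- window(sales, end = c(2022, 2))    # January 1992 to February 2022
test  <- window(sales, start = c(2022, 3))  # March 2022 to December 2022

length(train)  # 362 training months
length(test)   # 10 test months
start(test)    # 2022 3, i.e. March 2022
```

With the real data, `window(ts(Grocery, frequency = 12, start = c(1992, 1)), end = c(2022, 2))` should give the same series as `Grocery.ts`.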

2.2 Forecasting Table

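The four forecasting methods I will use to build a forecasting table and to test prediction accuracy are the average method (`meanf()`, stored as `pred.mv`), which forecasts every future month as the mean of all training observations; the naive method, which repeats the last observed value; the seasonal naive method, which repeats the value from the same month of the previous year; and the random walk with drift (`rwf()` with `drift = TRUE`), which extends the straight line between the first and last training observations. Each method is fitted to the training set and used to forecast the next 10 months of sales, and these forecasts are then compared with the actual values in the testing set.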

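# 10-step-ahead point forecasts ($mean) from each method, fitted on the training series
pred.mv = meanf(Grocery.ts, h = 10)$mean               # mean of all training values
pred.naive = naive(Grocery.ts, h=10)$mean              # last observed value
pred.snaive = snaive(Grocery.ts, h=10)$mean            # same month one year earlier
pred.rwf = rwf(Grocery.ts, h=10, drift = TRUE)$mean    # random walk with drift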

pred.table = cbind(pred.mv = pred.mv,
                   pred.naive = pred.naive,
                   pred.snaive = pred.snaive,
                   pred.rwf = pred.rwf)
kable(pred.table, caption = "Forecasting Table")
Forecasting Table
 pred.mv  pred.naive  pred.snaive  pred.rwf
42007.74       69619        63837  69734.75
42007.74       69619        64366  69850.49
42007.74       69619        65018  69966.24
42007.74       69619        65595  70081.98
42007.74       69619        65656  70197.73
42007.74       69619        67330  70313.47
42007.74       69619        67712  70429.22
42007.74       69619        68341  70544.96
42007.74       69619        68398  70660.71
42007.74       69619        68944  70776.45
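All four methods reduce to simple formulas, which explains the patterns in the table: the average and naive columns are constant, the seasonal naive column repeats last year's monthly values, and the drift column rises by a fixed amount each month (about 115.75 here). The sketch below writes those formulas out in base R; it is an illustration, not the `forecast` package's implementation, and `bench_forecasts()` and the toy series are hypothetical.

```r
# Hypothetical helper: point forecasts from the four benchmark methods,
# written out by hand. `y` is a numeric vector of past values, `h` the
# forecast horizon, and `m` the seasonal period (12 for monthly data).
# Assumes length(y) >= m.
bench_forecasts <- function(y, h, m = 12) {
  n <- length(y)
  steps <- seq_len(h)
  cbind(
    avg    = rep(mean(y), h),                        # mean of all past values
    naive  = rep(y[n], h),                           # last observed value
    snaive = y[n - m + ((steps - 1) %% m) + 1],      # same season, previous cycle
    drift  = y[n] + steps * (y[n] - y[1]) / (n - 1)  # line through first and last values
  )
}

# Toy example: 8 observations with a period of 4
f <- bench_forecasts(c(1, 2, 3, 4, 5, 6, 7, 8), h = 2, m = 4)
f
```

In the toy example the average forecast is 4.5, the naive forecast is 8, the seasonal naive forecasts are 5 and 6 (the values one period back), and the drift forecasts are 9 and 10 (a slope of (8 - 1)/7 = 1 per step).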

2.3 Time Series

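Next, I plot the last 33 observations (observations 340 to 372) and overlay the forecasts from the four methods on the ten test months, so that each method's predictions can be compared visually with the actual sales.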

# Plot the last 33 observations; the final 10 (363 to 372) are the test set
plot(340:372, Grocery[340:372], type="l", xlim=c(340,372), ylim=c(27000, 90000),
     xlab = "observation sequence",
     ylab = "Grocery Store Sales (Millions of Dollars)",
     main = "Monthly Grocery Store Sales and Forecasts")
# Actual sales in the test months
points(363:372, Grocery[363:372], pch=20)
## Forecasts: one colour and plotting symbol per method
points(363:372, pred.mv, pch=15, col = "red")
points(363:372, pred.naive, pch=16, col = "blue")
points(363:372, pred.snaive, pch=17, col = "purple")
points(363:372, pred.rwf, pch=18, col = "navy")
## Dashed lines joining each method's forecasts
lines(363:372, pred.mv, lty=2, col = "red")
lines(363:372, pred.naive, lty=2, col = "blue")
lines(363:372, pred.snaive, lty=2, col = "purple")
lines(363:372, pred.rwf, lty=2, col = "navy")
## Legend entries listed in the same order, colours and symbols as above
legend("topright", c("actual", "average", "naive", "seasonal naive", "drift"),
       col=c("black", "red", "blue", "purple", "navy"), pch=c(20, 15:18),
       lty=c(NA, rep(2,4)), bty="n", cex = 0.8)

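2.3.1 Accuracy Checks

To measure and compare the accuracy of the four forecasting methods, I will compare each method's forecasts with the ten actual values in the testing set using three measures: the mean absolute percentage error (MAPE), the mean absolute deviation (MAD), and the mean squared error (MSE). For all three measures, lower values indicate more accurate forecasts.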
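For reference, the standard definitions of the three measures are written out below, where $y_t$ is the actual sales value in test month $t$, $\hat{y}_t$ is a method's forecast for that month, and $n = 10$ is the number of test months. In this form the MAD is an average of the absolute errors.

```latex
\begin{aligned}
\text{MAPE} &= \frac{100}{n}\sum_{t=1}^{n}\left|\frac{y_t-\hat{y}_t}{y_t}\right|,\\
\text{MAD}  &= \frac{1}{n}\sum_{t=1}^{n}\left|y_t-\hat{y}_t\right|,\\
\text{MSE}  &= \frac{1}{n}\sum_{t=1}^{n}\left(y_t-\hat{y}_t\right)^2.
\end{aligned}
```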

# Actual sales for the 10 test months
true.value = Grocery[363:372]
## Percentage errors: 100 * (actual - forecast) / actual
PE.mv =  100*(true.value - pred.mv)/true.value
PE.naive =  100*(true.value - pred.naive)/true.value
PE.snaive =  100*(true.value - pred.snaive)/true.value
PE.rwf =  100*(true.value - pred.rwf)/true.value
## MAPE: mean of the absolute percentage errors
MAPE.mv = mean(abs(PE.mv))
MAPE.naive = mean(abs(PE.naive))
MAPE.snaive = mean(abs(PE.snaive))
MAPE.rwf = mean(abs(PE.rwf))
MAPE = c(MAPE.mv, MAPE.naive, MAPE.snaive, MAPE.rwf)
## Residual-based errors: actual - forecast
e.mv = true.value - pred.mv
e.naive = true.value - pred.naive
e.snaive = true.value - pred.snaive
e.rwf = true.value - pred.rwf
## MAD: mean absolute deviation (average absolute error over the 10 test months)
MAD.mv = mean(abs(e.mv))
MAD.naive = mean(abs(e.naive))
MAD.snaive = mean(abs(e.snaive))
MAD.rwf = mean(abs(e.rwf))
MAD = c(MAD.mv, MAD.naive, MAD.snaive, MAD.rwf)
## MSE: mean squared error
MSE.mv = mean((e.mv)^2)
MSE.naive = mean((e.naive)^2)
MSE.snaive = mean((e.snaive)^2)
MSE.rwf = mean((e.rwf)^2)
MSE = c(MSE.mv, MSE.naive, MSE.snaive, MSE.rwf)
## Combine the three measures (rows follow the order of the vectors above)
accuracy.table = cbind(MAPE = MAPE, MAD = MAD, MSE = MSE)
row.names(accuracy.table) = c("Average", "Naive", "Seasonal Naive", "Drift")
kable(accuracy.table, caption ="Overall performance of the four forecasting methods")
Overall performance of the four forecasting methods
                     MAPE       MAD        MSE
Average         42.025209  30474.06  930341432
Naive            3.918975   2862.80    9868912
Seasonal Naive   8.237912   5962.10   35913534
Drift            3.048405   2226.20    5901977
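The per-method code above repeats the same three calculations four times. As a hypothetical sketch (`accuracy_metrics()` is not part of any package), the calculations can be collapsed into one helper that takes a vector of actual values and a vector of forecasts, with the MAD taken as the mean absolute error:

```r
# Hypothetical helper: accuracy of one set of forecasts against actual values.
accuracy_metrics <- function(actual, forecast) {
  e <- actual - as.numeric(forecast)     # residual-based errors
  c(MAPE = mean(abs(100 * e / actual)),  # mean absolute percentage error (%)
    MAD  = mean(abs(e)),                 # mean absolute deviation
    MSE  = mean(e^2))                    # mean squared error
}

# Toy check with two observations: errors of 10 and -20
m <- accuracy_metrics(actual = c(100, 200), forecast = c(90, 220))
m  # MAPE = 10, MAD = 15, MSE = 250
```

With the objects defined earlier, `rbind(Average = accuracy_metrics(true.value, pred.mv), Naive = accuracy_metrics(true.value, pred.naive), ...)` would rebuild the accuracy table in one step, and adding a fifth method would need only one more row.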

2.4 Summary and Conclusion

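I analyzed monthly U.S. grocery store sales from January 1992 to December 2022. The data were split into a training set (the first 362 months) and a testing set (the last 10 months). The training set was used to produce forecasts with four methods, and the testing set was used to plot those forecasts against the actual values and to compute three accuracy measures. The drift method performed best, with the lowest MAPE (about 3.05%), MAD, and MSE of the four methods, followed by the naive method. The average method performed worst (MAPE of about 42%) because it forecasts the mean of the entire training period since 1992, which is far below recent sales levels.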