This paper uses time series analysis technique to examine the best forecast the sales of the specific item of an exlcusive distributor of Frito Lay brand in the Philippines. Analysis showed that the sales is not seasonal but has increasing trend. ARIMA was employed and the best model found was ARIMA(4,1,1). It is forecasted that the increasing trend will continue for the rest of this year.
This research aims to describe the bahavior of sales in pc of item Lay’s Sour Cream & Onion 6.5oz which is the top 1 contributor of expired items and create a time series model to forecast the sales for the next 5 months of the data. This study would help to minimize the expired items and loses of the company. The data analyzed is from January 2012 to September 2019 and was generated from SAP sales report of a private company.
The data for monthly sales of the items were obtained from the exclusive distributor of Frito Lay products in the Philippines from January 2012 to September 2019. Monthly sales is the actual delivered order to the customer.
In this research, the researchers utilized the ARIMA model or the Autoregressive Integrated Moving Average. ARIMA is one of the most powerful and commonly used time series model whose performance is comparable to to modern machine learning models. No data splitting was done to assess which ARIMA models is the best and to validate the model, instead the research relies on Akaike Information Criterion (AIC) in selecting the best model.
This research used the R language to run ARIMA functions. Particurly, the forecast package was utilized.
Data shown below is the top 5 contirbutor of expired items in the company. Item Lay’s Sour Cream & Onion 6.5oz was chosen to analyze and forecast the next 5 months.
| Material Description | %ofReturns |
|---|---|
| Lay’s Sour Cream & Onion 6.5oz | 8.734717 |
| Ruffles Regular 6.5oz | 7.832820 |
| Lay’s Regular 6.5oz | 3.601015 |
| Doritos Smokin’ BBQ 198.4g | 3.144970 |
| Lay’ s Honey BBQ 6.5oz | 2.667366 |
Figure 1 shows the time series plot of the monthly sales in PC of the item Lay’s Sour Cream & Onion 6.5oz. The plot shows that the sales is increasing.
The researcher explored possible non-seasonal ARIMA models in finding the best model to fit the monthly sales of the item.
The data clearly is not stationary as it shows increasing trend. It was found using the Augment Dickey-Fuller Test that differencing at lag 1 is enough to make the data stationary (see R code).
Examining the ACF and PACF of the differenced data below, we see that there is spike in ACF indicating MA(1). While PACF dies down also after lag 4 which suggests AR(4).
The there are several seasonal ARIMA models that were tried and listed below:
The best model was determined using Akaike Information Criterion (AIC).
Table 2 summarized the AIC of different models. We can see that the 4th model has the lowest AIC. All the parameters for ARIMA turned to be all significant, hence this ARIMA model ARIMA(4,1,1) is the best to fit the monthly sales of the item.
## AIC
## model1 1518.108
## model2 1513.620
## model3 1511.246
## model4 1504.065
## model5 1512.420
## model6 1512.982
As seen in the R code, the residual is normally distributed based on Shapiro Wilk test, plotting against fitted values and order shows no sign of non-constant variance and serial dependencies. Hence, the residual of ARIMA (4,1,1) is gaussian white noise which means that the model is appropriate.
Therefore, the best forecast of the next 5 months of sales of item Lay’s Sour Cream & Onion 6.5oz is given below. To minimize the expired quantity, it is recommended to ship these quantities from supplier for the next five months.
| Point Forecast | Lo 80 | Hi 80 | Lo 95 | Hi 95 | |
|---|---|---|---|---|---|
| Oct 2019 | 3050.773 | 2098.748 | 4002.797 | 1594.7761 | 4506.769 |
| Nov 2019 | 3101.752 | 2143.619 | 4059.885 | 1636.4138 | 4567.090 |
| Dec 2019 | 2739.732 | 1781.595 | 3697.869 | 1274.3874 | 4205.076 |
| Jan 2020 | 2368.536 | 1405.246 | 3331.827 | 895.3106 | 3841.762 |
| Feb 2020 | 2431.771 | 1450.298 | 3413.244 | 930.7370 | 3932.804 |
Load necessary packages
dataR<-read_excel("data1.xlsx",sheet="Returns")
kable(dataR[c(1:5),c("Material Description","%ofReturns")],type = "html", caption = "Table 1: Top 5 contributor of Expired Items")
| Material Description | %ofReturns |
|---|---|
| Lay’s Sour Cream & Onion 6.5oz | 8.734717 |
| Ruffles Regular 6.5oz | 7.832820 |
| Lay’s Regular 6.5oz | 3.601015 |
| Doritos Smokin’ BBQ 198.4g | 3.144970 |
| Lay’ s Honey BBQ 6.5oz | 2.667366 |
Graph shows increasing trend and indicates no seasonality.
Acf(data1_sales_ts)
Pacf(data1_sales_ts)
Clearly, the data is increasing hence not stationary. ACF is showing the symptoms that data is indeed non-stationary. We need to difference it to make it stationary. Let’s difference it at lag 1 first.
Ddata1<-diff(data1_sales_ts,1)
autoplot(Ddata1)
Seems the data is stationary already. Let’s used ADF test to confirm.
adf.test(Ddata1)
## Warning in adf.test(Ddata1): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: Ddata1
## Dickey-Fuller = -8.4477, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
Since the p-vaue is less than 0.05, we could confidently say that the differenced data is already stationary. Let’s check ACF and PACF.
Acf(Ddata1,lag.max=48)
Pacf(Ddata1,lag.max=48)
Seems d=1 is necessary and enough to make it stationary. There is spike in ACF suggesting MA(1), but we can also try AR(4).
model1<-Arima(data1_sales_ts, order = c(0,0,1))
model2<-Arima(data1_sales_ts, order = c(4,0,0))
model3<-Arima(data1_sales_ts, order = c(4,1,1))
model4<-Arima(data1_sales_ts, order = c(3,1,1))
model5<-Arima(data1_sales_ts, order = c(2,1,1))
model6<-Arima(data1_sales_ts, order = c(1,1,1))
model1$aic
## [1] 1518.108
model2$aic
## [1] 1513.62
model3$aic
## [1] 1486.266
model4$aic
## [1] 1485.319
model5$aic
## [1] 1493.841
model6$aic
## [1] 1494.578
Based on AIC, model 4 is better. Let’s check significance of parameters:
require(lmtest)
coeftest(model4)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.13520 0.11315 -1.1948 0.2321496
## ar2 -0.25754 0.10168 -2.5329 0.0113128 *
## ar3 -0.36433 0.10408 -3.5005 0.0004643 ***
## ma1 -0.71908 0.08143 -8.8306 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All parameters are significant. Hence, we will select this as our final model. Let’s proceed to model diagnostic.
res <- as.numeric(model4$residuals)
fit <- as.numeric(model4$fitted)
Checking the normality of the residuals.
hist(res,xlab="Residuals", ylab="Frequency")
qqnorm(res)
qqline(res, col="red")
shapiro.test(res)
##
## Shapiro-Wilk normality test
##
## data: res
## W = 0.97923, p-value = 0.145
Checking res versus fit for constant variance.
plot(fit, res)
abline(h=0, col="red")
Checking res vs order for residual independence.
plot(res)
Acf(res)
Pacf(res)