Abstract

This paper uses time series analysis technique to examine the best forecast the sales of the specific item of an exlcusive distributor of Frito Lay brand in the Philippines. Analysis showed that the sales is not seasonal but has increasing trend. ARIMA was employed and the best model found was ARIMA(4,1,1). It is forecasted that the increasing trend will continue for the rest of this year.


I. Introduction

This research aims to describe the bahavior of sales in pc of item Lay’s Sour Cream & Onion 6.5oz which is the top 1 contributor of expired items and create a time series model to forecast the sales for the next 5 months of the data. This study would help to minimize the expired items and loses of the company. The data analyzed is from January 2012 to September 2019 and was generated from SAP sales report of a private company.


II. Methodology

Source of Data

The data for monthly sales of the items were obtained from the exclusive distributor of Frito Lay products in the Philippines from January 2012 to September 2019. Monthly sales is the actual delivered order to the customer.

Time Series Modeling

In this research, the researchers utilized the ARIMA model or the Autoregressive Integrated Moving Average. ARIMA is one of the most powerful and commonly used time series model whose performance is comparable to to modern machine learning models. No data splitting was done to assess which ARIMA models is the best and to validate the model, instead the research relies on Akaike Information Criterion (AIC) in selecting the best model.

Analytic Tool

This research used the R language to run ARIMA functions. Particurly, the forecast package was utilized.


III. Results and Discussion

Data shown below is the top 5 contirbutor of expired items in the company. Item Lay’s Sour Cream & Onion 6.5oz was chosen to analyze and forecast the next 5 months.

Table 1: Top 5 contributor of Expired Items
Material Description %ofReturns
Lay’s Sour Cream & Onion 6.5oz 8.734717
Ruffles Regular 6.5oz 7.832820
Lay’s Regular 6.5oz 3.601015
Doritos Smokin’ BBQ 198.4g 3.144970
Lay’ s Honey BBQ 6.5oz 2.667366

Figure 1 shows the time series plot of the monthly sales in PC of the item Lay’s Sour Cream & Onion 6.5oz. The plot shows that the sales is increasing.

Time Series Modeling

The researcher explored possible non-seasonal ARIMA models in finding the best model to fit the monthly sales of the item.

Model Identification

The data clearly is not stationary as it shows increasing trend. It was found using the Augment Dickey-Fuller Test that differencing at lag 1 is enough to make the data stationary (see R code).

Examining the ACF and PACF of the differenced data below, we see that there is spike in ACF indicating MA(1). While PACF dies down also after lag 4 which suggests AR(4).

The there are several seasonal ARIMA models that were tried and listed below:

  1. ARIMA(0,0,1)
  2. ARIMA(4,0,0)
  3. ARIMA(1,1,1)
  4. ARIMA(4,1,1)
  5. ARIMA(3,1,1)
  6. ARIMA(2,1,1)

The best model was determined using Akaike Information Criterion (AIC).

Model Building

Table 2 summarized the AIC of different models. We can see that the 4th model has the lowest AIC. All the parameters for ARIMA turned to be all significant, hence this ARIMA model ARIMA(4,1,1) is the best to fit the monthly sales of the item.

##             AIC
## model1 1518.108
## model2 1513.620
## model3 1511.246
## model4 1504.065
## model5 1512.420
## model6 1512.982

Model Diagnostic Checking

As seen in the R code, the residual is normally distributed based on Shapiro Wilk test, plotting against fitted values and order shows no sign of non-constant variance and serial dependencies. Hence, the residual of ARIMA (4,1,1) is gaussian white noise which means that the model is appropriate.


IV. Conclusion and Recommendation

Therefore, the best forecast of the next 5 months of sales of item Lay’s Sour Cream & Onion 6.5oz is given below. To minimize the expired quantity, it is recommended to ship these quantities from supplier for the next five months.

Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Oct 2019 3050.773 2098.748 4002.797 1594.7761 4506.769
Nov 2019 3101.752 2143.619 4059.885 1636.4138 4567.090
Dec 2019 2739.732 1781.595 3697.869 1274.3874 4205.076
Jan 2020 2368.536 1405.246 3331.827 895.3106 3841.762
Feb 2020 2431.771 1450.298 3413.244 930.7370 3932.804



R Code

Load necessary packages

Import Data

dataR<-read_excel("data1.xlsx",sheet="Returns")
kable(dataR[c(1:5),c("Material Description","%ofReturns")],type = "html", caption = "Table 1: Top 5 contributor of Expired Items")
Table 1: Top 5 contributor of Expired Items
Material Description %ofReturns
Lay’s Sour Cream & Onion 6.5oz 8.734717
Ruffles Regular 6.5oz 7.832820
Lay’s Regular 6.5oz 3.601015
Doritos Smokin’ BBQ 198.4g 3.144970
Lay’ s Honey BBQ 6.5oz 2.667366

Graph shows increasing trend and indicates no seasonality.

Acf(data1_sales_ts)

Pacf(data1_sales_ts)

Clearly, the data is increasing hence not stationary. ACF is showing the symptoms that data is indeed non-stationary. We need to difference it to make it stationary. Let’s difference it at lag 1 first.

Ddata1<-diff(data1_sales_ts,1)
autoplot(Ddata1)

Seems the data is stationary already. Let’s used ADF test to confirm.

adf.test(Ddata1)
## Warning in adf.test(Ddata1): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  Ddata1
## Dickey-Fuller = -8.4477, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

Since the p-vaue is less than 0.05, we could confidently say that the differenced data is already stationary. Let’s check ACF and PACF.

Acf(Ddata1,lag.max=48)

Pacf(Ddata1,lag.max=48)

Seems d=1 is necessary and enough to make it stationary. There is spike in ACF suggesting MA(1), but we can also try AR(4).

model1<-Arima(data1_sales_ts, order = c(0,0,1))
model2<-Arima(data1_sales_ts, order = c(4,0,0))
model3<-Arima(data1_sales_ts, order = c(4,1,1))
model4<-Arima(data1_sales_ts, order = c(3,1,1))
model5<-Arima(data1_sales_ts, order = c(2,1,1))
model6<-Arima(data1_sales_ts, order = c(1,1,1))
model1$aic
## [1] 1518.108
model2$aic
## [1] 1513.62
model3$aic
## [1] 1486.266
model4$aic
## [1] 1485.319
model5$aic
## [1] 1493.841
model6$aic
## [1] 1494.578

Based on AIC, model 4 is better. Let’s check significance of parameters:

require(lmtest)
coeftest(model4)
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.13520    0.11315 -1.1948 0.2321496    
## ar2 -0.25754    0.10168 -2.5329 0.0113128 *  
## ar3 -0.36433    0.10408 -3.5005 0.0004643 ***
## ma1 -0.71908    0.08143 -8.8306 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All parameters are significant. Hence, we will select this as our final model. Let’s proceed to model diagnostic.

res <- as.numeric(model4$residuals)
fit <- as.numeric(model4$fitted)

Checking the normality of the residuals.

hist(res,xlab="Residuals", ylab="Frequency")

qqnorm(res)
qqline(res, col="red")

shapiro.test(res)
## 
##  Shapiro-Wilk normality test
## 
## data:  res
## W = 0.97923, p-value = 0.145

Checking res versus fit for constant variance.

plot(fit, res)
abline(h=0, col="red")

Checking res vs order for residual independence.

plot(res)

Acf(res)

Pacf(res)