MATH-2014-Project.utf8

Abstract

This paper uses time series analysis technique to examine the best forecast the sales of the specific item of an exlcusive distributor of Frito Lay brand in the Philippines. Analysis showed that the sales is not seasonal but has increasing trend. ARIMA was employed and the best model found was ARIMA(4,1,1). It is forecasted that the increasing trend will continue for the rest of this year.

I. Introduction

This research aims to describe the bahavior of sales in pc of item Lay’s Sour Cream & Onion 6.5oz which is the top 1 contributor of expired items and create a time series model to forecast the sales for the next 5 months of the data. This study would help to minimize the expired items and loses of the company. The data analyzed is from January 2012 to September 2019 and was generated from SAP sales report of a private company.

II. Methodology

Source of Data

The data for monthly sales of the items were obtained from the exclusive distributor of Frito Lay products in the Philippines from January 2012 to September 2019. Monthly sales is the actual delivered order to the customer.

Time Series Modeling

In this research, the researchers utilized the ARIMA model or the Autoregressive Integrated Moving Average. ARIMA is one of the most powerful and commonly used time series model whose performance is comparable to to modern machine learning models. No data splitting was done to assess which ARIMA models is the best and to validate the model, instead the research relies on Akaike Information Criterion (AIC) in selecting the best model.

Analytic Tool

This research used the R language to run ARIMA functions. Particurly, the forecast package was utilized.

III. Results and Discussion

Data shown below is the top 5 contirbutor of expired items in the company. Item Lay’s Sour Cream & Onion 6.5oz was chosen to analyze and forecast the next 5 months.

Table 1: Top 5 contributor of Expired Items
Material Description	%ofReturns
Lay’s Sour Cream & Onion 6.5oz	8.734717
Ruffles Regular 6.5oz	7.832820
Lay’s Regular 6.5oz	3.601015
Doritos Smokin’ BBQ 198.4g	3.144970
Lay’ s Honey BBQ 6.5oz	2.667366

Figure 1 shows the time series plot of the monthly sales in PC of the item Lay’s Sour Cream & Onion 6.5oz. The plot shows that the sales is increasing.

Time Series Modeling

The researcher explored possible non-seasonal ARIMA models in finding the best model to fit the monthly sales of the item.

Model Identification

The data clearly is not stationary as it shows increasing trend. It was found using the Augment Dickey-Fuller Test that differencing at lag 1 is enough to make the data stationary (see R code).

Examining the ACF and PACF of the differenced data below, we see that there is spike in ACF indicating MA(1). While PACF dies down also after lag 4 which suggests AR(4).

The there are several seasonal ARIMA models that were tried and listed below:

ARIMA(0,0,1)
ARIMA(4,0,0)
ARIMA(1,1,1)
ARIMA(4,1,1)
ARIMA(3,1,1)
ARIMA(2,1,1)

The best model was determined using Akaike Information Criterion (AIC).

Model Building

Table 2 summarized the AIC of different models. We can see that the 4th model has the lowest AIC. All the parameters for ARIMA turned to be all significant, hence this ARIMA model ARIMA(4,1,1) is the best to fit the monthly sales of the item.

##             AIC
## model1 1518.108
## model2 1513.620
## model3 1511.246
## model4 1504.065
## model5 1512.420
## model6 1512.982

Model Diagnostic Checking

As seen in the R code, the residual is normally distributed based on Shapiro Wilk test, plotting against fitted values and order shows no sign of non-constant variance and serial dependencies. Hence, the residual of ARIMA (4,1,1) is gaussian white noise which means that the model is appropriate.

IV. Conclusion and Recommendation

Therefore, the best forecast of the next 5 months of sales of item Lay’s Sour Cream & Onion 6.5oz is given below. To minimize the expired quantity, it is recommended to ship these quantities from supplier for the next five months.

	Point Forecast	Lo 80	Hi 80	Lo 95	Hi 95
Oct 2019	3050.773	2098.748	4002.797	1594.7761	4506.769
Nov 2019	3101.752	2143.619	4059.885	1636.4138	4567.090
Dec 2019	2739.732	1781.595	3697.869	1274.3874	4205.076
Jan 2020	2368.536	1405.246	3331.827	895.3106	3841.762
Feb 2020	2431.771	1450.298	3413.244	930.7370	3932.804

R Code

Load necessary packages

Import Data

dataR<-read_excel("data1.xlsx",sheet="Returns")

kable(dataR[c(1:5),c("Material Description","%ofReturns")],type = "html", caption = "Table 1: Top 5 contributor of Expired Items")

Table 1: Top 5 contributor of Expired Items
Material Description	%ofReturns
Lay’s Sour Cream & Onion 6.5oz	8.734717
Ruffles Regular 6.5oz	7.832820
Lay’s Regular 6.5oz	3.601015
Doritos Smokin’ BBQ 198.4g	3.144970
Lay’ s Honey BBQ 6.5oz	2.667366

Graph shows increasing trend and indicates no seasonality.

Acf(data1_sales_ts)

Pacf(data1_sales_ts)

Clearly, the data is increasing hence not stationary. ACF is showing the symptoms that data is indeed non-stationary. We need to difference it to make it stationary. Let’s difference it at lag 1 first.

Ddata1<-diff(data1_sales_ts,1)

autoplot(Ddata1)

Seems the data is stationary already. Let’s used ADF test to confirm.

adf.test(Ddata1)

## Warning in adf.test(Ddata1): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Ddata1
## Dickey-Fuller = -8.4477, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

Since the p-vaue is less than 0.05, we could confidently say that the differenced data is already stationary. Let’s check ACF and PACF.

Acf(Ddata1,lag.max=48)

Pacf(Ddata1,lag.max=48)

Seems d=1 is necessary and enough to make it stationary. There is spike in ACF suggesting MA(1), but we can also try AR(4).

model1<-Arima(data1_sales_ts, order = c(0,0,1))
model2<-Arima(data1_sales_ts, order = c(4,0,0))
model3<-Arima(data1_sales_ts, order = c(4,1,1))
model4<-Arima(data1_sales_ts, order = c(3,1,1))
model5<-Arima(data1_sales_ts, order = c(2,1,1))
model6<-Arima(data1_sales_ts, order = c(1,1,1))

model1$aic

## [1] 1518.108

model2$aic

## [1] 1513.62

model3$aic

## [1] 1486.266

model4$aic

## [1] 1485.319

model5$aic

## [1] 1493.841

model6$aic

## [1] 1494.578

Based on AIC, model 4 is better. Let’s check significance of parameters:

require(lmtest)
coeftest(model4)

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.13520    0.11315 -1.1948 0.2321496    
## ar2 -0.25754    0.10168 -2.5329 0.0113128 *  
## ar3 -0.36433    0.10408 -3.5005 0.0004643 ***
## ma1 -0.71908    0.08143 -8.8306 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

All parameters are significant. Hence, we will select this as our final model. Let’s proceed to model diagnostic.

res <- as.numeric(model4$residuals)
fit <- as.numeric(model4$fitted)

Checking the normality of the residuals.

hist(res,xlab="Residuals", ylab="Frequency")

qqnorm(res)
qqline(res, col="red")

shapiro.test(res)

## 
##  Shapiro-Wilk normality test
## 
## data:  res
## W = 0.97923, p-value = 0.145

Checking res versus fit for constant variance.

plot(fit, res)
abline(h=0, col="red")

Checking res vs order for residual independence.

plot(res)

Acf(res)

Pacf(res)

Forcasting of Product Sales Using ARIMA

Forcasting of Product Sales Using ARIMA

Joseph D. Alejo

Joseph D. Alejo

Abstract

I. Introduction

II. Methodology

Source of Data

Time Series Modeling

Analytic Tool

III. Results and Discussion

Time Series Modeling

Model Identification

Model Building

Model Diagnostic Checking

IV. Conclusion and Recommendation

R Code

Import Data