1 Executive Summary

1.1 Abstract

This report discuss suitable methods to forecast the demand for the following products -

Red Bull Energy Drink
Hershey’s Chocolate Bar
Heinz Tomato Ketchup

The methods have been based upon historical sales data and analysis of external factors which have an influence on the sales of the products.

1.2 Problem Statement

Sales forecasts provide an input to business planning
Efficiency in supply chain can cut costs and increase profits
Company requires store-level forecasts for a horizon of 15 days to increase the efficiency of their operations

1.3 Project Goals

Understand the historical demand for the products in the last 2 years
Analyze the daily sales data and understand influence of external factors
Build predictive ML model to obtain forecasts for a 15 - day horizon
Create a web-based UI to produce forecasts

1.4 Methods

Analyzed past 2 year’s sales data for the products (Jan 2015 - Dec 2016)
Performed data analysis to identify outliers, impute missing values and attribute selection
Built predictive models using sales data and influential external factors
Mean Absolute Percentage Error (MAPE) used to evaluate the performance in comparison to baseline models
Analysis performed using R, an open source statistical language

1.5 Outcome

Forecast Accuracy for each product -

Product	Model Accuracy (in %)
o.k. Energy-drink 25CL	91.02
Snickers 50G	67.73
Murren Ch-Poulet Tartarsauce 205G	71.21

The selected models will provide reliable forecasts for the next 4 months. Re-training will be required thereafter with the latest sales data.

2 Data Handling

2.1 Importing Data

Following data files were imported for the analysis:

Sales_Final.csv - historical sales data including promotion details (Jan 01, 2015 - Dec 07, 2016)
Holiday.csv - holiday data (Jan 2014 - Dec 2017)
Lotto_Final.csv - Swiss Lotto, Joker Lotto and Euro Millions Jackpot data (Jan 2015 - Dec 2017)
Weather.csv - weather data (Jan 01, 2015 - Dec 07, 2016)

The data set was created by cleaning each file independently and merging the files on the date attribute.

The merged data set was subsetted for each product containing data for the period Jan 01, 2015 - Dec 07, 2016.

2.2 Cleaning Data

The following are some of the major pre-processing steps performed to the data set:

Product SKey 615783 was missing sales data for 21 days. Is_Promotion was imputed with N and missing sales values were imputed with the median of the month & year.
Holiday data set had all attributes amalgamated in one column. Text processing was performed to split the attributes into multiple columns.
A back-fill was performed on the missing Jackpot amounts for days when no result was declared, i.e., missing amounts were filled with the latest value.
Missing values in the weather data were imputed with the median of the month.
Outliers present in Product SKey 615783 were imputed with median of the sales quantity of the product.
New attributes were created by splitting Date_BKey variable into date, month, year and weekday.
Data types of various attributes were converted before using them in the model.

2.3 Exploring data through Visualizations

The distribution of the sales values in the data were as below.

Red Bull Energy Drink (102443)

Figure 1: Distribution for Sales Quantity of Red Bull Energy Drink

The distribution is slightly deviated from normal with a skewness towards the right and has characteristics of weekly seasonality are present.

Hershey’s Chocolate Bar (615783)

Figure 2: Distribution for Sales Quantity of Hershey’s Chocolate Bar

The distribution is deviated from normal with a skewness towards the right and has a long tail. No characteristics of seasonality are present.

Heinz Tomato Ketchup (758078)

Figure 3: Distribution for Sales Quantity of Heinz Tomato Ketchup

The distribution is normal without any skewness and has characteristics of weekly seasonality are present.

Outlier values were identified for Product SKey 615783.

The distribution of the outlier values in the data were as below.

Figure 4: Outlier Handling for Sales Quantity of Hershey’s Chocolate Bar (615783)

3 Data Analysis

3.1 Creation of Test Data

To predict the Sales Quantity for the next 15 days, the models needed data for the Independent/Input variables for at least 1.5 years.

The following steps were used to generate the Test Data :

Data was extracted from the processed Data set for the past 2 years: 2015 and 2016.
Highly correlated numerical input variables were handled to avoid multi-collinearity.
Feature selection was performed using Random Forest model and only the most influential input variables were retained for each product.

Figure 5: Variable Importance Plots

Log was applied to Sales Quantity for Product SKey 102443 to stabilize the series on variance.

Below are snapshots of the created test data for the three products:

Red Bull Energy Drink (102443)

Date_Bkey	Sales_Quantity	Is_Promotion	Date	DayofWeek	Month	Year	Is_Holiday	MeanTempC	Swiss_Lotto_Amount	Joker_Amount
2015-01-01	296	N	1	Thu	Jan	2015	Y	-6	9600000	70000
2015-01-02	380	N	2	Fri	Jan	2015	Y	-2	9600000	70000
2015-01-03	460	N	3	Sat	Jan	2015	N	4	9600000	70000
2015-01-04	357	N	4	Sun	Jan	2015	N	1	10300000	80000
2015-01-05	578	N	5	Mon	Jan	2015	N	-2	10300000	80000

Hershey’s Chocolate Bar (615783)

Date_Bkey	Sales_Quantity	Date	DayofWeek	Month	Year	MeanTempC	Swiss_Lotto_Amount	Joker_Amount	Euro_Millions_Amount
2015-01-01	8	1	Thu	Jan	2015	-6	9600000	70000	1.8e+07
2015-01-02	11	2	Fri	Jan	2015	-2	9600000	70000	1.8e+07
2015-01-03	8	3	Sat	Jan	2015	4	9600000	70000	1.8e+07
2015-01-04	7	4	Sun	Jan	2015	1	10300000	80000	2.6e+07
2015-01-05	13	5	Mon	Jan	2015	-2	10300000	80000	2.6e+07

Heinz Tomato Ketchup (758078)

Date_Bkey	Sales_Quantity	Date	DayofWeek	Month	Year	MeanTempC	MeanSeaLPressurehPa	Swiss_Lotto_Amount	Joker_Amount
2015-01-01	15	1	Thu	Jan	2015	-6	1037	9600000	70000
2015-01-02	10	2	Fri	Jan	2015	-2	1036	9600000	70000
2015-01-03	9	3	Sat	Jan	2015	4	1028	9600000	70000
2015-01-04	6	4	Sun	Jan	2015	1	1032	10300000	80000
2015-01-05	11	5	Mon	Jan	2015	-2	1034	10300000	80000

3.2 Training and Validation sets

For Time series models, Training set consisted of approximately 18 months (2015, 2016 - First 6 months) and Validation consisted of the next 15 days from the end of training set.
For non-Time series models (Random Forest, XGBoost, etc), Test data was randomly distributed to Training set and Validation set in the ratio 70:30 for Product SKey 102443 and 80:20 for Product SKey 615783 & 758078 with seed set as 1234.

3.3 Predictive Modelling

The following variables were considered for model building:

Red Bull Energy Drink (102443)
- Training set
  - X = Is Promotion, Date, DayofWeek, Month, Year, Is Holiday, MeanTempC, SwissLotto Amount, Joker Amount
  - Y = Sales Quantity
- Validation set
  - X = Is Promotion, Date, DayofWeek, Month, Year, Is Holiday, MeanTempC, SwissLotto Amount, Joker Amount
  - Y = PredSales Quantity
Hershey’s Chocolate Bar (615783)
- Training set
  - X = Date, DayofWeek, Month, Year, MeanTempC, SwissLotto Amount, Joker Amount, EuroMillions Amount
  - Y = Sales Quantity
- Validation set
  - X = Date, DayofWeek, Month, Year, MeanTempC, SwissLotto Amount, Joker Amount, EuroMillions Amount
  - Y = PredSales Quantity
Heinz Tomato Ketchup (758078)
- Training set
  - X = Date, DayofWeek, Month, Year, MeanTempC, MeanSeaLPressurehPa, SwissLotto Amount, Joker Amount
  - Y = Sales Quantity
- Validation set
  - X = Date, DayofWeek, Month, Year, MeanTempC, MeanSeaLPressurehPa, SwissLotto Amount, Joker Amount
  - Y = PredSales Quantity

Baseline Forecasting Models were applied to the variable Sales Quantity obtain standard results for each product,

Meanf Model
Naive Model
Random Walk with drift

Time Series Forecasting Models were applied to the variable Sales Quantity to obtain optimal results for each product,

Holt’s Linear Model (on Seasonally Adjusted component)
Non - seasonal Auto ARIMA Model (on Seasonally Adjusted component)
Non - seasonal ARIMA Model (on Seasonally Adjusted component) : Manual Search Tuning Method
Holt - Winter’s Model
STL + ARIMA Model
STL + Exponential Smoothing (ETS) Model
Seasonal Auto ARIMA Model
Seasonal ARIMA Model : Manual Search Tuning Method

Predictive Machine Learning Models were applied to selected input variables obtain optimal results for each product,

Random Forest Model (Basic Model)
Random Forest Model : Manual Search Tuning Method
Random Forest + GLMBoosting Model
XGBoost Model (Basic Model)
XGBoost Model : Manual Search Tuning Method
XGBoost Model : Grid Search Tuning Method

The following models were selected for predicting sales of each product:

Red Bull Energy Drink (102443)

Random Forest + GLMBoosting Model with the optimal values for Number of trees = 800, Min size of terminal nodes = 20 and Number of variables tried at each split = 6 is selected as the final model.

The output of the final Random Forest with GLMBoosting Model -

## 
## Call:
##  randomForest(formula = log(Sales_Quantity) ~ ., data = train.set,      ntree = 800, mtry = 6, replace = TRUE, nodesize = 20, keep.inbag = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 800
## No. of variables tried at each split: 6
## 
##           Mean of squared residuals: 0.01819294
##                     % Var explained: 66.52

## 
##   Generalized Linear Models Fitted via Gradient Boosting
## 
## Call:
## glmboost.formula(formula = train.set.Sales_Quantity ~ ., data = df)
## 
## 
##   Squared Error (Regression) 
## 
## Loss function: (y - f)^2 
##  
## 
## Number of boosting iterations: mstop = 100 
## Step size:  0.1 
## Offset:  557.6012 
## 
## Coefficients: 
## (Intercept)     pred.rf 
## -639.058094    1.157336 
## attr(,"offset")
## [1] 557.6012

Hershey’s Chocolate Bar (615783)

XGBoost Model: Grid Search Tuning Method with the optimal values for Learning rate = 0.1, Min loss reduction for partition = 5 and Min nodes in a leaf = 20 is selected as the final model.

The output of the final XGBoost Model -

## ##### xgb.Booster
## raw: 41.4 Kb 
## call:
##   xgb.train(params = params, data = dtrain, nrounds = 1000, watchlist = watchlist, 
##     verbose = 0, early_stopping_rounds = 10, maximize = F)
## params (as set within xgb.train):
##   objective = "reg:linear", booster = "gbtree", eta = "0.1", gamma = "5", lambda = "0", max_depth = "6", min_child_weight = "20", subsample = "1", colsample_bytree = "1", silent = "1"
## xgb.attributes:
##   best_iteration, best_msg, best_ntreelimit, best_score, niter
## callbacks:
##   cb.early.stop(stopping_rounds = early_stopping_rounds, maximize = maximize, 
##     verbose = verbose)
## niter: 32
## best_iteration: 22
## best_ntreelimit: 22
## best_score: 36.93789

Heinz Tomato Ketchup (758078)

STL + Exponential Smoothing (ETS) Model is selected as the final model.

The output of the STL + ETS Model -

## ETS(A,N,N) 
## 
## Call:
##  ets(y = x, model = etsmodel, allow.multiplicative.trend = allow.multiplicative.trend) 
## 
##   Smoothing parameters:
##     alpha = 0.0425 
## 
##   Initial states:
##     l = 9.637 
## 
##   sigma:  2.7076
## 
##      AIC     AICc      BIC 
## 4702.519 4702.562 4715.525

4 Results

4.1 Performance on Validation set

Red Bull Energy Drink (102443)

The Performance of various models built for Product SKey 102443 -

Figure 6: Model Performance for Red Bull Energy Drink (102443)

Forecast Accuracy for validation set on predictions obtained using Random Forest + GLMBoosting Model:

## [1] "91.02 %"

The following time series chart plots Actual vs Predicted Sales for the test data set:

Figure 7: Actual and Predicted Sales for Red Bull Energy Drink (102443)

Hershey’s Chocolate Bar (615783)

The Performance of various models built for Product SKey 615783 -

Figure 8: Model Performance for Snickers (615783)

Forecast Accuracy for validation set on predictions obtained using XGBoost Model:

## [1] "67.73 %"

The following time series chart plots Actual vs Predicted Sales for the test data set:

Figure 9: Actual and Predicted Sales for Snickers (615783)

Heinz Tomato Ketchup (758078)

The Performance of various models built for Product SKey 758078 -

Figure 10: Model Performance for Heinz Tomato Ketchup (758078)

Forecast Accuracy for validation set on predictions obtained using STL + Exponential Smoothing (ETS) Model:

## [1] "71.21 %"

The following time series chart plots Actual vs Predicted Sales for the test data set:

Figure 11: Actual and Predicted Sales for Heinz Tomato Ketchup (758078)

4.2 Sales Forecasts

The following are the sales forecasts for each product for the next 15 days (Dec 08,2016 - Dec 22,2016) obtained from the selected models:

Red Bull Energy Drink (102443)

Figure 12: Sales Forecasts for Red Bull Energy Drink (102443)

Hershey’s Chocolate Bar (615783)

Figure 13: Sales Forecasts for Snickers (615783)

Heinz Tomato Ketchup (758078)

Figure 14: Sales Forecasts for Heinz Tomato Ketchup (758078)

5 Conclusion

The Machine Learning Models resulted in the following Forecast Accuracy for each product in comparison to Baseline Models:

Product	Baseline Accuracy (in %)	Model Accuracy (in %)	Increase in Accuracy (in %)
o.k. Energy-drink 25CL	85.02	91.02	7.06
Snickers 50G	36.58	67.73	85.18
Murren Ch-Poulet Tartarsauce 205G	58.24	71.21	22.26

These models will provide reliable forecasts for the next 4 months. It will be required to re-train the models thereafter.
Accuracy can be further improved by collecting additional data, Feature Selection, and Model tuning.
Forecasts from the Models will increase company’s operational efficiency and result in greater returns.
Data Science could provide a key input in defining the demand for various products sold across multiple business locations worldwide.

Sales Forecasting

Amol Modi

Aug 01, 2017

1 Executive Summary

1.1 Abstract

1.2 Problem Statement

1.3 Project Goals

1.4 Methods

1.5 Outcome

2 Data Handling

2.1 Importing Data

2.2 Cleaning Data

2.3 Exploring data through Visualizations

3 Data Analysis

3.1 Creation of Test Data

3.2 Training and Validation sets

3.3 Predictive Modelling

4 Results

4.1 Performance on Validation set

4.2 Sales Forecasts

5 Conclusion