1. Introduction

Retailing involves the process of selling consumer goods or services to customers through multiple channels of distribution to earn a profit. The term “retailer” is typically applied where a service provider fills the small orders of a large number of individuals, who are end-users, rather than large orders of a small number of wholesale, corporate or government clientele.

One such service provider is ABC Private Limited. It sells products from different categories to a varied customer base. As an offline store, it has the advantage of letting customers physically inspect a wide variety of products in real time. However, with the e-commerce industry growing rapidly and other retail companies turning to analytics to devise better marketing strategies, the company faces strong competition.

2. Overview of the Study

The retail company caters to a wide customer base. The demographics of the customers have been studied and recorded. Gender, Age Group, Occupation, Kind of City the customer dwells in, Marital Status and Number of Years Spent in the Current City are the pieces of information gathered from the customer when he or she makes a purchase. Each product put up for sale is assigned to at least one category and at most three. This information can be analysed to predict how much a customer will spend while shopping at ABC.

3. An empirical field study of ABC Private Limited

3.1 Overview

The company now wants to understand customer purchase behaviour (specifically, the purchase amount) across products of different categories. It has shared a purchase summary of various customers for selected high-volume products from the last month. The data set also contains customer demographics (age, gender, marital status, city_type, stay_in_current_city), product details (product_id and product category) and the total purchase_amount from the last month. The company wants to build a model that predicts the purchase amount of a customer for various products, which will help it create personalized offers for customers on different products. Accordingly, we construct the following hypothesis:

Hypothesis: Certain customer demographics explain the average purchase amount made by a customer better than others.

3.2 Data

For this study, we collected data from the website https://datahack.analyticsvidhya.com/contest/black-friday/. The variables in the data set have the following definitions:

User_ID - User ID

Product_ID - Product ID

Gender - Sex of User

Age - Age in bins

Occupation - Occupation (Masked)

City_Category - Category of the City (A,B,C)

Stay_In_Current_City_Years - Number of years of stay in the current city

Marital_Status - Marital Status

Product_Category_1 - Product Category (Masked)

Product_Category_2 - Second category the product may also belong to (Masked)

Product_Category_3 - Third category the product may also belong to (Masked)

Purchase - Purchase Amount (Target Variable)

3.3 Model

In order to test the hypothesis, we proposed the following model:

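\[
\begin{aligned}
\text{Purchase} = {} & \alpha_0 + \alpha_1\,\text{GenderM} + \alpha_2\,\text{Age18-25} + \alpha_3\,\text{Age26-35} + \alpha_4\,\text{Age36-45} \\
& + \alpha_5\,\text{Age46-50} + \alpha_6\,\text{Age51-55} + \alpha_7\,\text{Age55+} + \alpha_8\,\text{Occupation} \\
& + \alpha_9\,\text{City\_CategoryB} + \alpha_{10}\,\text{City\_CategoryC} \\
& + \alpha_{11}\,\text{Stay\_In\_Current\_City\_Years1} + \alpha_{12}\,\text{Stay\_In\_Current\_City\_Years2} \\
& + \alpha_{13}\,\text{Stay\_In\_Current\_City\_Years3} + \alpha_{14}\,\text{Stay\_In\_Current\_City\_Years4+} \\
& + \alpha_{15}\,\text{Marital\_Status1} + \alpha_{16}\,\text{Product\_Category\_1} + \alpha_{17}\,\text{Product\_Category\_2} \\
& + \alpha_{18}\,\text{Product\_Category\_3} + \epsilon
\end{aligned}
\]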
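Terms such as GenderM or City_CategoryB in the model are indicator (dummy) variables that R creates from categorical columns, with the first level of each factor serving as the reference. The following is a minimal sketch of that expansion on a toy data frame; the four rows are hypothetical values, and only the column names are taken from the data set.

```r
# Toy data with hypothetical values; column names follow the data set.
toy <- data.frame(
  Gender        = factor(c("F", "M", "M", "F")),
  Age           = factor(c("0-17", "18-25", "26-35", "55+")),
  City_Category = factor(c("A", "B", "C", "A"))
)

# model.matrix() shows the design matrix that lm() builds internally.
# The first level of each factor (F, 0-17, A) is absorbed into the intercept.
X <- model.matrix(~ Gender + Age + City_Category, data = toy)
print(colnames(X))
print(X)
```

Each coefficient on a dummy variable is therefore read as a difference from the reference level, for example male versus female customers or city category B versus category A.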

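# Set the working directory to the project folder (machine-specific path)
setwd("C:/Users/Dell/Desktop/Project/Capstone Project")
# Load the training data
ABC_train <- read.csv("Training Data Set.csv")
# Marital_Status is coded 0/1, so treat it as a categorical variable
ABC_train$Marital_Status <- factor(ABC_train$Marital_Status)
# Fit the linear model by ordinary least squares
m1 <- lm(Purchase ~ Gender + Age + Occupation + City_Category + Stay_In_Current_City_Years + Marital_Status + Product_Category_1 + Product_Category_2 + Product_Category_3, data = ABC_train)
summary(m1)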
## 
## Call:
## lm(formula = Purchase ~ Gender + Age + Occupation + City_Category + 
##     Stay_In_Current_City_Years + Marital_Status + Product_Category_1 + 
##     Product_Category_2 + Product_Category_3, data = ABC_train)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11109.9  -2840.6   -437.6   2779.0  19941.8 
## 
## Coefficients:
##                               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)                  11469.093     85.178  134.649  < 2e-16 ***
## GenderM                        319.925     27.487   11.639  < 2e-16 ***
## Age18-25                       414.023     71.812    5.765 8.16e-09 ***
## Age26-35                       533.391     69.809    7.641 2.17e-14 ***
## Age36-45                       668.706     71.846    9.308  < 2e-16 ***
## Age46-50                       687.862     79.716    8.629  < 2e-16 ***
## Age51-55                      1040.773     81.748   12.732  < 2e-16 ***
## Age55+                         900.212     91.234    9.867  < 2e-16 ***
## Occupation                       7.643      1.774    4.307 1.65e-05 ***
## City_CategoryB                 209.119     28.901    7.236 4.65e-13 ***
## City_CategoryC                 837.074     30.425   27.512  < 2e-16 ***
## Stay_In_Current_City_Years1    115.460     36.506    3.163  0.00156 ** 
## Stay_In_Current_City_Years2    212.938     40.516    5.256 1.48e-07 ***
## Stay_In_Current_City_Years3     66.809     41.227    1.621  0.10512    
## Stay_In_Current_City_Years4+   148.511     42.530    3.492  0.00048 ***
## Marital_Status1                -39.823     24.673   -1.614  0.10652    
## Product_Category_1            -827.614      5.113 -161.867  < 2e-16 ***
## Product_Category_2              25.321      3.377    7.498 6.53e-14 ***
## Product_Category_3              73.052      3.285   22.236  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4631 on 166802 degrees of freedom
##   (383247 observations deleted due to missingness)
## Multiple R-squared:  0.1699, Adjusted R-squared:  0.1698 
## F-statistic:  1897 on 18 and 166802 DF,  p-value: < 2.2e-16
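The summary reports 383247 observations deleted due to missingness. With 166802 residual degrees of freedom and 19 estimated coefficients, the fit used 166821 rows, so the full file has 550068 rows and roughly 70% of them were dropped. The sketch below shows, on a hypothetical toy data frame, how such rows can be counted and how lm() silently omits them under R's default NA handling. The values are illustrative, and the assumption that the missing entries sit in the secondary product categories is not confirmed by the output above.

```r
# Toy data with hypothetical values; NA marks a missing secondary category.
toy <- data.frame(
  Purchase           = c(8370, 15200, 1422, 1057, 7969, 15227, 19215, 15854),
  Product_Category_1 = c(3, 1, 12, 12, 8, 1, 1, 1),
  Product_Category_2 = c(NA, 6, NA, 14, NA, 2, 8, 15)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# lm() drops every row with an NA in any model variable (default na.omit)
fit <- lm(Purchase ~ Product_Category_1 + Product_Category_2, data = toy)
n_used    <- nobs(fit)            # rows actually used in the fit
n_dropped <- nrow(toy) - n_used   # rows removed because of missing values
cat("used:", n_used, "dropped:", n_dropped, "\n")
```

Running colSums(is.na(ABC_train)) on the real data in the same way would show which columns are responsible for the deleted rows.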

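We estimated the effect of the customer demographics on the purchase amount with the simplest model, a linear regression fitted by ordinary least squares. Occupation and the three product category variables enter as numeric codes, so each has a single slope coefficient, while the other variables enter as sets of indicator variables. The coefficients of the regression can be studied to judge which demographics matter more than others.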

coef(m1)
##                  (Intercept)                      GenderM 
##                 11469.092934                   319.925227 
##                     Age18-25                     Age26-35 
##                   414.022514                   533.391444 
##                     Age36-45                     Age46-50 
##                   668.706288                   687.862193 
##                     Age51-55                       Age55+ 
##                  1040.773098                   900.211598 
##                   Occupation               City_CategoryB 
##                     7.643229                   209.118510 
##               City_CategoryC  Stay_In_Current_City_Years1 
##                   837.073958                   115.459813 
##  Stay_In_Current_City_Years2  Stay_In_Current_City_Years3 
##                   212.938441                    66.808870 
## Stay_In_Current_City_Years4+              Marital_Status1 
##                   148.510845                   -39.823226 
##           Product_Category_1           Product_Category_2 
##                  -827.613566                    25.320631 
##           Product_Category_3 
##                    73.051732
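Raw coefficients are not directly comparable across variables: a coefficient on an indicator variable is a difference from the reference level, whereas a coefficient on a numeric code such as Occupation is a change per unit. One possible way to compare whole variables is an F test per term, which judges a multi-level factor as a single unit. The sketch below applies drop1() to simulated data; the data, effect sizes and seed are hypothetical and are not the ABC data.

```r
set.seed(1)  # reproducible toy data (hypothetical, not the ABC data)
n <- 200
toy <- data.frame(
  Gender         = factor(sample(c("F", "M"), n, replace = TRUE)),
  City_Category  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  Marital_Status = factor(sample(c(0, 1), n, replace = TRUE))
)

# Simulated response: City_Category C has a strong effect,
# Gender a smaller one, and Marital_Status none at all.
toy$Purchase <- 9000 + 300 * (toy$Gender == "M") +
  800 * (toy$City_Category == "C") + rnorm(n, sd = 500)

fit <- lm(Purchase ~ Gender + City_Category + Marital_Status, data = toy)

# drop1() removes one term at a time and reports the F test for that term.
tab <- drop1(fit, test = "F")
print(tab)
```

Applied to m1, the same call would give one F statistic each for Gender, Age, City_Category and the other terms, which is easier to rank than eighteen separate coefficients.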

3.4 Results

Product Category 1 has by far the largest t-value in the model. Among the demographics, City Category (category C in particular), Age Group and Gender show larger effects than Number of Years of Stay in the Current City and Occupation, while Marital Status is not significant at the 5% level. The Multiple R-squared is 0.17, which means that the model explains about 17% of the variance in the purchase amount; equivalently, the variance of the errors is 17% smaller than the variance of the dependent variable. Although this value is low, the model still captures part of the uncertainty in customer purchasing behaviour.
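As a consistency check, the reported fit statistics can be reproduced from the Multiple R-squared alone. Here n = 166821 is the number of observations used (166802 residual degrees of freedom plus 19 coefficients) and p = 18 is the number of predictors; the symbols SS_res and SS_tot are standard notation introduced for this illustration.

\[
R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 0.1699
\]

\[
R^2_{adj} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} = 1 - 0.8301 \times \frac{166820}{166802} \approx 0.1698
\]

\[
F = \frac{R^2 / p}{(1 - R^2)/(n - p - 1)} = \frac{0.1699 / 18}{0.8301 / 166802} \approx 1897
\]

Both values agree with the summary output. With this many observations the adjustment for the number of predictors is negligible, and the very large F statistic reflects the sample size rather than a strong fit.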

4. Conclusion

It is well known that customers can be erratic in their purchasing. With this in mind, the model provides rough approximations of how much each kind of customer is likely to spend at the company. This will aid in making personalized offers to customers and in initiating loyalty programs that offer credit points on purchases.
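To illustrate how such approximations could be obtained from a fitted model, the following sketch calls predict() for two customer profiles. The data are simulated, and the profiles, effect sizes and seed are hypothetical; with the real data, m1 and a data frame of actual customer attributes would take their place.

```r
set.seed(42)  # reproducible toy data (hypothetical, not the ABC data)
n <- 300
toy <- data.frame(
  Gender        = factor(sample(c("F", "M"), n, replace = TRUE)),
  City_Category = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
toy$Purchase <- 9000 + 300 * (toy$Gender == "M") +
  800 * (toy$City_Category == "C") + rnorm(n, sd = 500)

fit <- lm(Purchase ~ Gender + City_Category, data = toy)

# Two hypothetical customer profiles; factor levels must match the training data
profiles <- data.frame(
  Gender        = factor(c("F", "M"), levels = levels(toy$Gender)),
  City_Category = factor(c("A", "C"), levels = levels(toy$City_Category))
)

# Point prediction plus a 95% prediction interval for each profile
pred <- predict(fit, newdata = profiles, interval = "prediction")
print(pred)
```

For the model in Section 3.3 the residual standard error is 4631, so a 95% prediction interval for a single purchase would span roughly 9000 on either side of the point prediction. The predictions are therefore better suited to ranking customer segments than to forecasting individual purchases.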

5. References

https://datahack.analyticsvidhya.com/contest/black-friday/

https://en.wikipedia.org/wiki/Retail