+— title: ‘Team C Project - Exploring the Data (Revised)’ author: “Team 3” date: “23 October 2018” output: html_document — ##Goal

The goal of this project is to analyze cash on delivery (COD) payments, relative to electronic payments (e.g. debit card, credit card).

getwd()
## [1] "E:/IIM LUCKNOW/Term 5/DAM/pROJECT"
projData = read.csv("PromotionDataV4.csv")
str(projData)
## 'data.frame':    45898 obs. of  29 variables:
##  $ OrderItemCode      : int  50089810 50089812 50358742 48983376 49999082 50217970 49704460 49704458 49963528 49963526 ...
##  $ OrderID            : int  29171450 29171450 29330182 28513266 29116292 29248820 28943024 28943024 29094620 29094620 ...
##  $ OrderDate          : Factor w/ 19878 levels "4/10/2015 0:00",..: 8712 8712 10169 2286 8187 9447 6289 6289 7931 7931 ...
##  $ Category           : Factor w/ 1 level "Apparels": 1 1 1 1 1 1 1 1 1 1 ...
##  $ SubCategory        : Factor w/ 58 levels "3/4THS","ASYMMETRIC DRESS",..: 38 38 38 49 38 49 38 49 38 49 ...
##  $ Brand              : Factor w/ 10 levels "ATHENA","FABALLEY",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ ProductColor       : Factor w/ 44 levels "AQUA","AQUA BLUE",..: 43 4 4 35 5 33 42 5 42 5 ...
##  $ ProductSize        : Factor w/ 18 levels "28","30","32",..: 13 16 13 13 15 15 15 15 15 15 ...
##  $ MRP                : int  1100 800 1250 1200 1100 1250 950 850 950 850 ...
##  $ FinalTotalPrice    : num  866 630 919 805 808 ...
##  $ VATPercent         : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ VAT                : num  41.2 30 43.8 36 38.5 ...
##  $ CODCharge          : int  0 0 0 49 0 0 0 0 0 49 ...
##  $ VendorDiscount     : int  0 0 0 480 0 0 0 0 0 0 ...
##  $ WebsiteDiscountCode: Factor w/ 1894 levels "ACT200","ACT300",..: 1118 1118 1505 675 576 597 1108 1108 1118 1118 ...
##  $ WebsiteDiscount    : num  275 200 375 0 330 ...
##  $ HasVendorDiscount  : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ HasWebsiteDiscount : int  1 1 1 0 1 1 1 1 1 1 ...
##  $ CustomerID         : int  1224439 1224439 139127 15601014 6275116 6275116 3596101 3596101 3596101 3596101 ...
##  $ COD                : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ ShippingName       : Factor w/ 30892 levels "  Alok  Karmakar",..: 18346 18346 8692 9116 17418 17418 17541 17541 17541 17541 ...
##  $ ShippingCity       : Factor w/ 791 levels " Cuttack","110058",..: 590 590 321 427 590 590 321 321 321 321 ...
##  $ ShippingState      : Factor w/ 32 levels "AN","AP","AR",..: 19 19 2 26 19 19 2 2 2 2 ...
##  $ ShippingPincode    : int  411027 411027 500017 324007 411030 411030 500081 500081 500081 500081 ...
##  $ ShippingAddressType: Factor w/ 3 levels "Home","Null",..: 1 1 1 2 2 2 3 3 3 3 ...
##  $ BillingCity        : Factor w/ 983 levels "190008","Abbowal",..: 719 719 400 527 719 719 400 400 400 400 ...
##  $ BillingState       : Factor w/ 33 levels "AN","AP","AS",..: 18 18 2 27 18 18 2 2 2 2 ...
##  $ BillingPincode     : int  411027 411027 500017 324007 411030 411030 500081 500081 500081 500081 ...
##  $ BillingAddressType : Factor w/ 3 levels "Home","Null",..: 2 2 1 2 2 2 3 3 3 3 ...
psych::describe(projData)
##                      vars     n        mean         sd      median
## OrderItemCode           1 45898 49722785.04  940665.91 49663205.00
## OrderID                 2 45898 28953309.24  556520.76 28918355.00
## OrderDate*              3 45898     8625.75    6092.07     8403.00
## Category*               4 45898        1.00       0.00        1.00
## SubCategory*            5 45898       34.13      16.36       38.00
## Brand*                  6 45898        4.68       2.58        4.00
## ProductColor*           7 45898       18.86      14.38       16.00
## ProductSize*            8 45898       14.21       2.08       14.00
## MRP                     9 45898     1015.03     464.76      950.00
## FinalTotalPrice        10 45898      754.79     343.63      677.95
## VATPercent             11 45898        5.00       0.00        5.00
## VAT                    12 45898       35.29      16.20       29.95
## CODCharge              13 45898       13.70      21.99        0.00
## VendorDiscount         14 45898      225.42     269.63      110.00
## WebsiteDiscountCode*   15 45898      703.26     334.21      675.00
## WebsiteDiscount        16 45898       83.81     167.46        0.00
## HasVendorDiscount      17 45898        0.51       0.50        1.00
## HasWebsiteDiscount     18 45898        0.35       0.48        0.00
## CustomerID             19 45898  9014567.14 5248491.89  8963196.00
## COD                    20 45898        0.61       0.49        1.00
## ShippingName*          21 45898    15433.57    8920.88    15471.00
## ShippingCity*          22 45898      393.91     208.84      435.00
## ShippingState*         23 45898       17.59       8.30       17.00
## ShippingPincode        24 45898   400188.81  210286.71   400706.00
## ShippingAddressType*   25 45898        1.66       0.66        2.00
## BillingCity*           26 45898      482.17     254.31      535.00
## BillingState*          27 45898       17.18       8.92       16.00
## BillingPincode         28 45898   400234.40  210418.62   400706.00
## BillingAddressType*    29 45898        1.73       0.61        2.00
##                          trimmed        mad      min         max
## OrderItemCode        49709041.49 1313310.80 47856780 51238880.00
## OrderID              28943786.35  776774.17 27856296 29857723.00
## OrderDate*               8406.79    8375.21        1    19878.00
## Category*                   1.00       0.00        1        1.00
## SubCategory*               34.61      16.31        1       58.00
## Brand*                      4.55       2.97        1       10.00
## ProductColor*              17.95      17.79        1       44.00
## ProductSize*               14.33       1.48        1       18.00
## MRP                       966.92     511.50      299    10999.00
## FinalTotalPrice           711.12     235.07        0     6977.95
## VATPercent                  5.00       0.00        5        5.00
## VAT                        33.21      11.14        0      329.95
## CODCharge                  11.01       0.00        0       49.00
## VendorDiscount            188.44     163.09        0     4400.00
## WebsiteDiscountCode*      707.60       0.00        1     1894.00
## WebsiteDiscount            45.35       0.00        0     3149.00
## HasVendorDiscount           0.51       0.00        0        1.00
## HasWebsiteDiscount          0.31       0.00        0        1.00
## CustomerID            9185248.64 7705045.51      197 16007333.00
## COD                         0.64       0.00        0        1.00
## ShippingName*           15436.44   11391.56        1    30892.00
## ShippingCity*             396.36     212.01        1      791.00
## ShippingState*             17.71       8.90        1       32.00
## ShippingPincode        390462.46  251017.52   110001   854301.00
## ShippingAddressType*        1.57       1.48        1        3.00
## BillingCity*              485.21     262.42        1      983.00
## BillingState*              17.12       8.90        1       33.00
## BillingPincode         390493.96  251017.52   110001   900000.00
## BillingAddressType*         1.68       0.00        1        3.00
##                            range  skew kurtosis       se
## OrderItemCode         3382100.00  0.13    -1.43  4390.75
## OrderID               2001427.00  0.14    -1.43  2597.67
## OrderDate*              19877.00  0.18    -1.30    28.44
## Category*                   0.00   NaN      NaN     0.00
## SubCategory*               57.00 -0.32    -1.47     0.08
## Brand*                      9.00  0.43    -0.90     0.01
## ProductColor*              43.00  0.28    -1.54     0.07
## ProductSize*               17.00 -2.03     8.69     0.01
## MRP                     10700.00  2.72    27.29     2.17
## FinalTotalPrice          6977.95  2.39    17.32     1.60
## VATPercent                  0.00   NaN      NaN     0.00
## VAT                       329.95  2.42    17.71     0.08
## CODCharge                  49.00  0.98    -1.04     0.10
## VendorDiscount           4400.00  1.53     7.57     1.26
## WebsiteDiscountCode*     1893.00  0.17     1.55     1.56
## WebsiteDiscount          3149.00  2.88    16.66     0.78
## HasVendorDiscount           1.00 -0.02    -2.00     0.00
## HasWebsiteDiscount          1.00  0.63    -1.60     0.00
## CustomerID           16007136.00 -0.09    -1.37 24498.40
## COD                         1.00 -0.46    -1.78     0.00
## ShippingName*           30891.00 -0.01    -1.19    41.64
## ShippingCity*             790.00 -0.26    -1.03     0.97
## ShippingState*             31.00  0.00    -0.71     0.04
## ShippingPincode        744300.00  0.16    -0.99   981.56
## ShippingAddressType*        2.00  0.51    -0.73     0.00
## BillingCity*              982.00 -0.27    -0.97     1.19
## BillingState*              32.00  0.20    -0.82     0.04
## BillingPincode         789999.00  0.15    -0.99   982.17
## BillingAddressType*         2.00  0.22    -0.60     0.00

Primary Analysis

  1. Dependent Variable: COD

  2. Independent Variable: FinalTotalPrice, WebsiteDiscount, HasWebsiteDiscount, Brand, SubCategory, ShippingCity,
    ShippingState, ShippingAddressType, BillingCity, BillingState, BillingAddressType, CODCharge

  3. Continuous Variable: OrderItemCode, OrderID, MRP, VendorDiscount, CustomerID, COD, FinalTotalPrice, VAT, WebsiteDiscount

  4. Categorical Variable: OrderDate, Category, SubCategory, Brand, ProductColor, ProductSize, VATPercent, CODCharge, WebsiteDiscountCode, HasVendorDiscount, HasWebsiteDiscount, ShippingName, ShippingCity, ShippingState, ShippingPincode, ShippingAddressType, BillingCity, BillingState, BillingPincode, BillingAddressType

attach(projData)
round(addmargins(prop.table(table(projData$COD,projData$HasWebsiteDiscount,dnn=c("COD","DISCOUNT")),1)*100,2),1)
##    DISCOUNT
## COD     0     1   Sum
##   0  59.9  40.1 100.0
##   1  68.3  31.7 100.0
round(addmargins(prop.table(table(projData$COD,projData$Brand,dnn=c("COD","BRAND")),1)*100,2),1)
##    BRAND
## COD ATHENA FABALLEY GRITSTONES HARPA MEIRA MISS CHASE MONTEIL & MUNERO
##   0    7.5     16.2       14.5  17.6  12.2        3.0             13.3
##   1   10.0     14.3       15.3  14.7  11.6        4.1             14.9
##    BRAND
## COD MR BUTTON THE VANCA TSHIRT COMPANY   Sum
##   0       1.8      10.8            3.1 100.0
##   1       1.9      10.2            3.0 100.0
round(addmargins(prop.table(table(projData$COD,projData$ShippingState,dnn=c("COD","SHIPPING STATE")),1)*100,2),1)
##    SHIPPING STATE
## COD    AN    AP    AR    AS    BR    CH    CT    DD    DL    DN    GA
##   0   0.1   7.1   0.0   0.9   0.4   1.1   0.4   0.0  12.9   0.0   0.5
##   1   0.0   5.6   0.0   2.5   1.3   1.3   1.0   0.0  11.0   0.1   1.5
##    SHIPPING STATE
## COD    GJ    HP    HR    JH    JK    KA    KL    MH    ML    MP    MZ
##   0   2.8   0.5   7.9   0.7   0.4  16.4   2.0  21.3   0.2   1.5   0.0
##   1   5.0   0.7   5.7   1.5   2.1  10.1   0.1  17.4   0.8   3.1   0.4
##    SHIPPING STATE
## COD    OR    PB    PY    RJ    SK    TN    TR    UP    UT    WB   Sum
##   0   1.8   1.4   0.0   1.9   0.1   4.6   0.0   7.9   0.9   4.1 100.0
##   1   2.9   2.9   0.1   3.2   0.0   3.3   0.2   9.1   1.3   5.8 100.0
round(addmargins(prop.table(table(projData$COD,projData$BillingState,dnn=c("COD","BILLING STATE")),1)*100,2),1)
##    BILLING STATE
## COD    AN    AP    AS    BR    CH    CT    DD    DL    DN    GA    GJ
##   0   0.1   7.1   0.9   0.4   1.1   0.4   0.0  13.2   0.0   0.5   2.8
##   1   0.0   5.7   2.5   1.3   1.2   1.0   0.0  11.2   0.1   1.5   4.9
##    BILLING STATE
## COD    HP    HR    JH    JK    KA    KL    MH    ML    MN    MP    MZ
##   0   0.5   7.7   0.7   0.4  16.3   2.0  21.2   0.2   0.0   1.6   0.0
##   1   0.8   5.5   1.5   2.1  10.1   0.1  17.4   0.8   0.0   3.1   0.4
##    BILLING STATE
## COD    NL    OR    PB    PY    RJ    SK    TN    TR    UP    UT    WB
##   0   0.0   1.9   1.3   0.1   1.9   0.1   4.6   0.0   7.9   0.9   4.0
##   1   0.0   2.9   2.9   0.1   3.2   0.0   3.2   0.2   9.1   1.3   5.8
##    BILLING STATE
## COD   Sum
##   0 100.0
##   1 100.0
table(projData$COD,projData$ShippingAddressType,dnn=c("COD","ADDRESS TYPE"),exclude = c("Null"))
##    ADDRESS TYPE
## COD  Home Office
##   0  9697   2891
##   1 10840   2015
aggregate(projData$FinalTotalPrice~projData$HasVendorDiscount, FUN = mean)
##   projData$HasVendorDiscount projData$FinalTotalPrice
## 1                          0                 808.4329
## 2                          1                 702.3885
boxplot(projData$FinalTotalPrice~projData$HasVendorDiscount, ylab = "Final Total Price" , xlab =" Vendor discount (1= Present, 0 = Not Present" , main= "Mean price based on vendor discount")

library(gplots)
## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess
plotmeans(projData$FinalTotalPrice~projData$HasVendorDiscount, ylab = "Final Total Price" , xlab =" Vendor discount (1= Present, 0 = Not Present" , main= "Mean price based on vendor discount")
## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(x, li, x, pmax(y - gap, li), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped
## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(x, ui, x, pmin(y + gap, ui), col = barcol, lwd = lwd, :
## zero-length arrow is of indeterminate angle and so skipped

aggregate(projData$FinalTotalPrice~projData$HasWebsiteDiscount, FUN = mean)
##   projData$HasWebsiteDiscount projData$FinalTotalPrice
## 1                           0                  747.507
## 2                           1                  768.375
boxplot(projData$FinalTotalPrice~projData$HasWebsiteDiscount, ylab = "Final Total Price" , xlab =" Website discount (1= Present, 0 = Not Present" , main= "Mean price based on Website discount")

boxplot(projData$FinalTotalPrice~projData$COD, ylab = "Final Total Price" , xlab =" COD (1= COD, 0 = Not COD" , main= "Mean price based on COD")

projdata1<-subset(projData,select=(COD), projData$COD==1)
dim(projdata1)
## [1] 28144     1
aggregate(projData$COD, by=list(add=projData$ShippingAddressType), sum)
##      add     x
## 1   Home 10840
## 2   Null 15289
## 3 Office  2015

From the data we can see Home type address has higher COD than office type address. ##Correlations

Pearson’s Correlation Test

attach(projData)
## The following objects are masked from projData (pos = 4):
## 
##     BillingAddressType, BillingCity, BillingPincode, BillingState,
##     Brand, Category, COD, CODCharge, CustomerID, FinalTotalPrice,
##     HasVendorDiscount, HasWebsiteDiscount, MRP, OrderDate,
##     OrderID, OrderItemCode, ProductColor, ProductSize,
##     ShippingAddressType, ShippingCity, ShippingName,
##     ShippingPincode, ShippingState, SubCategory, VAT, VATPercent,
##     VendorDiscount, WebsiteDiscount, WebsiteDiscountCode
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
CODselected <- c("MRP", "VendorDiscount", "FinalTotalPrice", "VAT", "WebsiteDiscount")
rcorr(as.matrix(projData[,CODselected]))
##                  MRP VendorDiscount FinalTotalPrice  VAT WebsiteDiscount
## MRP             1.00           0.53            0.82 0.82            0.33
## VendorDiscount  0.53           1.00            0.09 0.08           -0.29
## FinalTotalPrice 0.82           0.09            1.00 1.00            0.19
## VAT             0.82           0.08            1.00 1.00            0.20
## WebsiteDiscount 0.33          -0.29            0.19 0.20            1.00
## 
## n= 45898 
## 
## 
## P
##                 MRP VendorDiscount FinalTotalPrice VAT WebsiteDiscount
## MRP                  0              0               0   0             
## VendorDiscount   0                  0               0   0             
## FinalTotalPrice  0   0                              0   0             
## VAT              0   0              0                   0             
## WebsiteDiscount  0   0              0               0

Corrgrams

library(corrgram)
## 
## Attaching package: 'corrgram'
## The following object is masked from 'package:lattice':
## 
##     panel.fill
corrgram(projData[,CODselected], order=TRUE,
         main="Impact of COD",
         lower.panel=panel.pts, upper.panel=panel.pie,
         diag.panel=panel.minmax, text.panel=panel.txt)

library("PerformanceAnalytics")
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:gplots':
## 
##     textplot
## The following object is masked from 'package:graphics':
## 
##     legend
chart.Correlation(projData[,CODselected], histogram = TRUE , main= "Correlation between Prices and discount")

As expected, VAT and FinalTotalPrice are highly correlated as VAT forms a fixed percentage of FinalTotalPrice (except the COD charge). Also, MRP is also quite highly correlated to FinalTotalPrice and VAT as MRP forms a major chunk of FinalTotalPrice.

Determining the Significant factors for COD

We will now examine the impact of the following independent variables on the decision of COD or not: 1)FinalTotalPrice 2)WebsiteDiscount 3)HasWebsiteDiscount 4)CODCharge

We are taking the above parameters because of the following reasons: 1) Generally price has a part to play in the COD decision. For some people, they may prefer COD for lesser prices but may not prefer COD when the price is high, as they may not have that much cash. 2) Some websites have a discount based on the mode of payment. For example, some websites have discounts when you pay with cards of certain banks. So if there is such a discount, people may prefer paying by card instead of COD. 3) Some people who are price sensitive, may not prefer COD if there is an additional charge being levied.

m=lm(COD~MRP+FinalTotalPrice+WebsiteDiscount+HasWebsiteDiscount+CODCharge, data = projData)
summary(m)
## 
## Call:
## lm(formula = COD ~ MRP + FinalTotalPrice + WebsiteDiscount + 
##     HasWebsiteDiscount + CODCharge, data = projData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7619 -0.4576 -0.0126  0.5166  0.9197 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5.072e-01  5.265e-03  96.335  < 2e-16 ***
## MRP                -6.451e-05  7.742e-06  -8.332  < 2e-16 ***
## FinalTotalPrice     2.360e-05  1.019e-05   2.317   0.0205 *  
## WebsiteDiscount     2.066e-04  1.702e-05  12.137  < 2e-16 ***
## HasWebsiteDiscount -4.258e-02  5.768e-03  -7.382 1.59e-13 ***
## CODCharge           1.103e-02  9.273e-05 118.997  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4223 on 45892 degrees of freedom
## Multiple R-squared:  0.2483, Adjusted R-squared:  0.2483 
## F-statistic:  3032 on 5 and 45892 DF,  p-value: < 2.2e-16

We wanted to understand if MRP, Final price, Web Discount and COD charge has any impact on the decision of COD. As per the model output p value of F-Test is very small and we can reject the null hypothesis, one or more predictor has significant impact on COD decision. We also see that all of these variables are significant in determining whether payment would be done by COD or not. However, the adjusted R-squared value is quite less. This means that this model can be modified to explain the decision in better way.