Group 8: Cash On Delivery

Introduction

  • COD has helped e-commerce grow, but it does have pitfalls.
  • COD can be expensive for the seller if the buyer returns a product, as the company pays two-way courier charges.
  • COD accounts for almost 50% to 80% of online transactions in India
  • But the return rates in online shopping in COD transactions averages around 40 per cent

Hence, we are using data analysis to see impact of COD on consumer behaviour, customer preferences and profits.

Reading data into data frame

COD_data=read.csv("PromotionDataV4.csv",header = TRUE)
attach(COD_data)
x = read.csv("PromotionDataV4.csv")
table(x$COD)

    0     1 
17754 28144 

Data Exploration

View(COD_data)
summary(COD_data)
 OrderItemCode         OrderID                   OrderDate    
 Min.   :47856780   Min.   :27856296   4/10/2015 15:39:   28  
 1st Qu.:48822578   1st Qu.:28418744   4/10/2015 20:48:   25  
 Median :49663205   Median :28918355   4/10/2015 12:38:   22  
 Mean   :49722785   Mean   :28953309   4/10/2015 12:52:   21  
 3rd Qu.:50629792   3rd Qu.:29484854   4/10/2015 11:48:   20  
 Max.   :51238880   Max.   :29857723   4/10/2015 12:29:   20  
                                       (Other)        :45762  
     Category                  SubCategory                 Brand     
 Apparels:45898   TOPS               :14605   HARPA           :7271  
                  DRESSES            : 8869   GRITSTONES      :6901  
                  POLO-T-SHIRTS      : 3793   FABALLEY        :6899  
                  ROUND NECK T-SHIRTS: 2937   MONTEIL & MUNERO:6539  
                  SHIRTS             : 2897   MEIRA           :5422  
                  CASUAL SHIRTS      : 1882   THE VANCA       :4804  
                  (Other)            :10915   (Other)         :8062  
    ProductColor    ProductSize         MRP        FinalTotalPrice 
 BLACK    :11357   M      :12959   Min.   :  299   Min.   :   0.0  
 BLUE     : 6048   L      :11087   1st Qu.:  649   1st Qu.: 525.0  
 RED      : 3570   S      : 9556   Median :  950   Median : 678.0  
 NAVY BLUE: 3304   XL     : 7350   Mean   : 1015   Mean   : 754.8  
 WHITE    : 2605   XXL    : 2199   3rd Qu.: 1295   3rd Qu.: 882.0  
 PINK     : 2461   XS     :  670   Max.   :10999   Max.   :6977.9  
 (Other)  :16553   (Other): 2077                                   
   VATPercent      VAT           CODCharge    VendorDiscount  
 Min.   :5    Min.   :  0.00   Min.   : 0.0   Min.   :   0.0  
 1st Qu.:5    1st Qu.: 24.95   1st Qu.: 0.0   1st Qu.:   0.0  
 Median :5    Median : 29.95   Median : 0.0   Median : 110.0  
 Mean   :5    Mean   : 35.29   Mean   :13.7   Mean   : 225.4  
 3rd Qu.:5    3rd Qu.: 41.85   3rd Qu.:49.0   3rd Qu.: 400.0  
 Max.   :5    Max.   :329.95   Max.   :49.0   Max.   :4400.0  

 WebsiteDiscountCode WebsiteDiscount   HasVendorDiscount
 NIL    :29871       Min.   :   0.00   Min.   :0.0000   
 APP05  : 2686       1st Qu.:   0.00   1st Qu.:0.0000   
 STYLE30: 1699       Median :   0.00   Median :1.0000   
 WOW30  : 1605       Mean   :  83.81   Mean   :0.5058   
 SALE30 : 1299       3rd Qu.:  44.95   3rd Qu.:1.0000   
 RUSH30 : 1144       Max.   :3149.00   Max.   :1.0000   
 (Other): 7594                                          
 HasWebsiteDiscount   CustomerID            COD        
 Min.   :0.0000     Min.   :     197   Min.   :0.0000  
 1st Qu.:0.0000     1st Qu.: 4427436   1st Qu.:0.0000  
 Median :0.0000     Median : 8963196   Median :1.0000  
 Mean   :0.3492     Mean   : 9014567   Mean   :0.6132  
 3rd Qu.:1.0000     3rd Qu.:14821276   3rd Qu.:1.0000  
 Max.   :1.0000     Max.   :16007333   Max.   :1.0000  

            ShippingName      ShippingCity   ShippingState  
 Divya            :   25   Bangalore: 4904   MH     : 8669  
 Priyanka         :   25   New Delhi: 4585   KA     : 5762  
 jol  shishir vyas:   23   Mumbai   : 3845   DL     : 5392  
 Pooja            :   23   Pune     : 2557   UP     : 3959  
 Neha             :   21   Gurgaon  : 2004   HR     : 3003  
 Shital  Shegokar :   21   Hyderabad: 1999   AP     : 2840  
 (Other)          :45760   (Other)  :26004   (Other):16273  
 ShippingPincode  ShippingAddressType    BillingCity     BillingState  
 Min.   :110001   Home  :20537        Bangalore: 4829   MH     : 8668  
 1st Qu.:201012   Null  :20455        New Delhi: 4707   KA     : 5750  
 Median :400706   Office: 4906        Mumbai   : 3867   DL     : 5504  
 Mean   :400189                       Pune     : 2513   UP     : 3962  
 3rd Qu.:560060                       Hyderabad: 1983   HR     : 2916  
 Max.   :854301                       Gurgaon  : 1927   AP     : 2864  
                                      (Other)  :26072   (Other):16234  
 BillingPincode   BillingAddressType
 Min.   :110001   Home  :16443      
 1st Qu.:201012   Null  :25571      
 Median :400706   Office: 3884      
 Mean   :400234                     
 3rd Qu.:560059                     
 Max.   :900000                     

Reading data into data frame

COD_data$COD.f <- factor(COD, 
                      levels=c(0,1), 
                      labels=c("Electronic","Cod"))
boxplot(FinalTotalPrice~COD.f,data=COD_data, main="Order data", xlab="Order Price", ylab="Payment Method", horizontal = TRUE)

plot of chunk unnamed-chunk-5

Data Exploration

library(gplots)
plotmeans(FinalTotalPrice ~ COD.f, data = COD_data, frame = FALSE)

plot of chunk unnamed-chunk-6

Data Exploration

mytable <- xtabs(~ COD.f+ShippingAddressType, data=COD_data)
addmargins(prop.table(mytable, 2), 1)*100 # row sum
            ShippingAddressType
COD.f             Home      Null    Office
  Electronic  47.21722  25.25544  58.92784
  Cod         52.78278  74.74456  41.07216
  Sum        100.00000 100.00000 100.00000
chisq.test(mytable)

    Pearson's Chi-squared test

data:  mytable
X-squared = 3033.3, df = 2, p-value < 2.2e-16
  • P<0.05, Hence, we reject the null hypothesis that Payment method and address type are independent.
  • Here we can say that users with address type office can be inclined towards Electronic payment
  • This insight can be used for targetted promotion based on payment methods

Price wise comparision

cod_percent = c()
non_cod_percent = c()
for(i in seq(500, max(x$FinalTotalPrice), by=500)) {
  cod_percent = c(cod_percent, nrow(subset(x, x$COD == 1 & x$FinalTotalPrice > i))*100/nrow(subset(x, x$COD == 1)))
  non_cod_percent = c(non_cod_percent, nrow(subset(x, x$COD == 0 & x$FinalTotalPrice > i))*100/nrow(subset(x, x$COD == 0)))
  i = i + 500
}

We divided the purchases into bins of 500 and found out the percent of purchases.

Compare average order QTY for COD to that with Electronic

Null Hypothesis: Mean number of items for an order is similar for COD and Non COD transactions Alternate Hypothesis: Mean number of items for an order is not similar for COD and Non COD transactions

Test Statistics: T-test Result of T-test: t = 1.7493, df = 28164, p-value = 0.08025

Compare_QTY = aggregate(OrderItemCode, by=list(ID=OrderID, PAY=COD), length)
Compare_QTY$PAY <- as.factor(Compare_QTY$PAY)
t.test(Compare_QTY$x~ Compare_QTY$PAY,data = Compare_QTY)

    Welch Two Sample t-test

data:  Compare_QTY$x by Compare_QTY$PAY
t = 1.7493, df = 28164, p-value = 0.08025
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.001684438  0.029644012
sample estimates:
mean in group 0 mean in group 1 
       1.323247        1.309267 

Compare average price per order for COD to that with Electronic

Null Hypothesis: Mean price for an order is similar for COD and Non COD transactions Alternate Hypothesis: Mean price for an order is not similar for COD and Non COD transactions

Test Statistics: T-test Result of T-test: t = -8.5188, df = 30373, p-value < 2.2e-16

Hence, we reject the null hypothesis.

Compare_price = aggregate(FinalTotalPrice, by=list(ID=OrderID, PAY=COD), sum)
Compare_QTY$PAY <- as.factor(Compare_QTY$PAY)
t.test(Compare_price$x~ Compare_price$PAY,data = Compare_price)

    Welch Two Sample t-test

data:  Compare_price$x by Compare_price$PAY
t = -8.5188, df = 30373, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -70.41672 -44.07408
sample estimates:
mean in group 0 mean in group 1 
       957.0355       1014.2809 

COD dependence

Logistic Regression to find effect of shipping state, final total price and brand on the dependant variable COD/Electronic

model <- glm (COD ~ ShippingState + FinalTotalPrice+ Brand , data = COD_data)
summary(model)

Call:
glm(formula = COD ~ ShippingState + FinalTotalPrice + Brand, 
    data = COD_data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1916  -0.5319   0.2473   0.4158   1.0273  

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)            1.069e-01  1.114e-01   0.960 0.337157    
ShippingStateAP        3.798e-01  1.114e-01   3.410 0.000649 ***
ShippingStateAR        7.769e-01  4.838e-01   1.606 0.108338    
ShippingStateAS        6.309e-01  1.122e-01   5.624 1.88e-08 ***
ShippingStateBR        6.658e-01  1.134e-01   5.873 4.30e-09 ***
ShippingStateCH        4.818e-01  1.128e-01   4.270 1.96e-05 ***
ShippingStateCT        6.257e-01  1.138e-01   5.500 3.83e-08 ***
ShippingStateDD        8.219e-01  1.923e-01   4.275 1.92e-05 ***
ShippingStateDL        4.014e-01  1.112e-01   3.609 0.000308 ***
ShippingStateDN        6.104e-01  1.482e-01   4.118 3.83e-05 ***
ShippingStateGA        6.535e-01  1.129e-01   5.786 7.25e-09 ***
ShippingStateGJ        5.598e-01  1.115e-01   5.018 5.23e-07 ***
ShippingStateHP        5.337e-01  1.144e-01   4.665 3.10e-06 ***
ShippingStateHR        3.614e-01  1.114e-01   3.245 0.001174 ** 
ShippingStateJH        6.016e-01  1.128e-01   5.333 9.71e-08 ***
ShippingStateJK        7.177e-01  1.125e-01   6.378 1.81e-10 ***
ShippingStateKA        3.238e-01  1.112e-01   2.912 0.003596 ** 
ShippingStateKL       -1.175e-01  1.137e-01  -1.033 0.301544    
ShippingStateMH        3.913e-01  1.111e-01   3.521 0.000431 ***
ShippingStateML        6.775e-01  1.146e-01   5.911 3.42e-09 ***
ShippingStateMP        5.856e-01  1.119e-01   5.234 1.67e-07 ***
ShippingStateMZ        7.755e-01  1.190e-01   6.518 7.18e-11 ***
ShippingStateOR        5.335e-01  1.119e-01   4.768 1.87e-06 ***
ShippingStatePB        5.948e-01  1.120e-01   5.313 1.08e-07 ***
ShippingStatePY        5.800e-01  1.388e-01   4.180 2.92e-05 ***
ShippingStateRJ        5.529e-01  1.118e-01   4.944 7.67e-07 ***
ShippingStateSK       -1.678e-01  1.497e-01  -1.121 0.262251    
ShippingStateTN        3.524e-01  1.116e-01   3.158 0.001592 ** 
ShippingStateTR        7.062e-01  1.285e-01   5.496 3.91e-08 ***
ShippingStateUP        4.697e-01  1.113e-01   4.221 2.44e-05 ***
ShippingStateUT        5.219e-01  1.129e-01   4.621 3.82e-06 ***
ShippingStateWB        5.169e-01  1.114e-01   4.638 3.52e-06 ***
FinalTotalPrice        1.343e-04  7.697e-06  17.453  < 2e-16 ***
BrandFABALLEY         -7.890e-02  9.278e-03  -8.504  < 2e-16 ***
BrandGRITSTONES        1.496e-03  9.689e-03   0.154 0.877312    
BrandHARPA            -7.323e-02  9.262e-03  -7.907 2.70e-15 ***
BrandMEIRA            -1.538e-02  1.022e-02  -1.506 0.132117    
BrandMISS CHASE        1.791e-02  1.366e-02   1.312 0.189569    
BrandMONTEIL & MUNERO -5.718e-03  9.661e-03  -0.592 0.553956    
BrandMR BUTTON        -1.600e-01  1.880e-02  -8.507  < 2e-16 ***
BrandTHE VANCA        -4.430e-02  1.015e-02  -4.364 1.28e-05 ***
BrandTSHIRT COMPANY   -4.379e-02  1.465e-02  -2.990 0.002793 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.2216896)

    Null deviance: 10886  on 45897  degrees of freedom
Residual deviance: 10166  on 45856  degrees of freedom
AIC: 61153

Number of Fisher Scoring iterations: 2

Plots (in %)

plot(cod_percent, type = "l")

plot of chunk unnamed-chunk-12

Plots (in %)

plot(non_cod_percent, type = "l")

plot of chunk unnamed-chunk-13

As seen from both the graphs, no significant change in purchase behaviour

City Wise Trend

city <- aggregate ( x$OrderID ~ x$BillingCity+x$COD, x, length)
head(city[order((city$`x$OrderID`), decreasing = TRUE),],15)
     x$BillingCity x$COD x$OrderID
1143     New Delhi     1      2645
61       Bangalore     0      2638
1110        Mumbai     1      2229
712      Bangalore     1      2191
455      New Delhi     0      2062
422         Mumbai     0      1638
498           Pune     0      1348
1045       Kolkata     1      1247
1199          Pune     1      1165
253        Gurgaon     0      1027
284      Hyderabad     0       992
953      Hyderabad     1       991
924        Gurgaon     1       900
461          Noida     0       591
359        Kolkata     0       561

Here we see city wise distribution of top cities

State Wise Trend

state <- aggregate ( x$OrderID ~ x$BillingState+x$COD, x, length)
head(state[order((state$`x$OrderID`), decreasing = TRUE),],15)
   x$BillingState x$COD x$OrderID
50             MH     1      4903
17             MH     0      3765
40             DL     1      3158
15             KA     0      2902
48             KA     1      2848
61             UP     1      2555
7              DL     0      2346
63             WB     1      1636
34             AP     1      1602
45             HR     1      1543
30             UP     0      1407
43             GJ     1      1379
12             HR     0      1373
2              AP     0      1262
59             TN     1       907

Here we see state wise distribution of top states

Vebdor Discount wise distribution

vendor_discount <- aggregate ( x$OrderID ~ x$HasVendorDiscount+x$COD, x, length)
vendor_discount[order((vendor_discount$`x$OrderID`), decreasing = TRUE),]
  x$HasVendorDiscount x$COD x$OrderID
4                   1     1     14227
3                   0     1     13917
2                   1     0      8989
1                   0     0      8765

Website Discount wise distribution

website_discount <- aggregate ( x$OrderID ~ x$HasWebsiteDiscount+x$COD, x, length)
website_discount[order((website_discount$`x$OrderID`), decreasing = TRUE),]
  x$HasWebsiteDiscount x$COD x$OrderID
3                    0     1     19229
1                    0     0     10642
4                    1     1      8915
2                    1     0      7112

Any Discount wise distribution

x$any_discount <- ifelse(x$HasVendorDiscount | x$HasWebsiteDiscount, 1, 0)
any_discount <- aggregate ( x$OrderID ~ x$any_discount+x$COD, x, length)
any_discount[order((any_discount$`x$OrderID`), decreasing = TRUE),]
  x$any_discount x$COD x$OrderID
4              1     1     19900
2              1     0     13462
3              0     1      8244
1              0     0      4292

Based on COD Charges

cod_charge <- aggregate ( x$OrderID ~ x$CODCharge+x$COD, x, length)
cod_charge[order((cod_charge$`x$OrderID`), decreasing = TRUE),]
  x$CODCharge x$COD x$OrderID
1           0     0     17754
2           0     1     15307
3          49     1     12837