Group 7 _ DAM_Project

K Chandirasekaran
12/07/2018

Understanding COD Purchase Behaviour, Impact of discounts and pricing impact fators in E-commerce

  • Basic Research and insights
  • Consumer Behaviour Analysis
  • Regression model & Normality test

Reading Data

coddata<-read.csv("C:/Users/Dinesh Sambasivam/Downloads/PromotionDataV4.csv")

Part 1- Basic Research - Ticket Sizes - COD vs non-COD

Breaking myth that Premium customers go for electronic transactions while low end transactions are usually done on COD mode Tests show the percentage split for both quantity of orders as well as value of orders are very similar

mytable1 <- with(coddata, table(COD))
prop.table(mytable1)*100
COD
       0        1 
38.68142 61.31858 
mytable2 <- aggregate(coddata$FinalTotalPrice ~ coddata$COD, FUN = sum)
mytable2
  coddata$COD coddata$FinalTotalPrice
1           0                12840545
2           1                21802982
prop.table(mytable2)*100
   coddata$COD coddata$FinalTotalPrice
1 0.000000e+00                37.06477
2 2.886542e-06                62.93522

Average Price Comparison

Average Price comparison between COD & non-COD orders

Has_COD = subset(coddata, coddata$COD == 1 )
Not_COD = subset(coddata, coddata$COD == 0)

# Average price paid with COD
mean(Has_COD$FinalTotalPrice- Has_COD$CODCharge)
[1] 752.344
mean(Not_COD$FinalTotalPrice)
[1] 723.248

Welch Two sample T-test to find significance of difference

t.test(coddata$FinalTotalPrice~ coddata$COD)

    Welch Two Sample t-test

data:  coddata$FinalTotalPrice by coddata$COD
t = -16.261, df = 42172, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -57.64667 -45.24489
sample estimates:
mean in group 0 mean in group 1 
       723.2480        774.6938 
# Welch Two sample T-Test

Analysis of the Test As the P value of the test is less than 0.05, we can say the mean average price of purchases with COD is significantly higher than that of without COD

Analyzing Distribution

plot of chunk unnamed-chunk-5

Part 2 -Consumer Behaviour

Are people more likely / less likely to use COD when they buy under promotions compared to when they buy without promotions?

Metro vs Non-Metro preferences

Does the customer's location (captured by Pin code/Shipping State) affects his/her choice of payment method?

Are premium priced goods less discounted? (analysing impact of MRP on discounts)

Are Discounts successful in pushing people out of COD to e-payment modes ?

mytable3 <- xtabs(~coddata$COD+coddata$HasVendorDiscount, data= coddata )
mytable3
           coddata$HasVendorDiscount
coddata$COD     0     1
          0  8765  8989
          1 13917 14227
mytable4 <- xtabs(~coddata$COD+coddata$HasWebsiteDiscount, data= coddata )
mytable4
           coddata$HasWebsiteDiscount
coddata$COD     0     1
          0 10642  7112
          1 19229  8915

Website Discounts have been successful to some extent to drive people to move out of COD to electronic payments

Pearson's chi-square test to check for significance

# For vendor Discount COD Behaviour
chisq.test(mytable3)

    Pearson's Chi-squared test with Yates' continuity correction

data:  mytable3
X-squared = 0.024833, df = 1, p-value = 0.8748

Statistically not significant

# For Website discount vs COD Behaviour
chisq.test(mytable4)

    Pearson's Chi-squared test with Yates' continuity correction

data:  mytable4
X-squared = 336.21, df = 1, p-value < 2.2e-16

Statistically Significant impact of website discount to drive people to electronic payments

Are People in metros less reliant on COD modes ?

Metro <- coddata[which(coddata$ShippingCity==c("Kolkata", "New Delhi", "Bangalore", "Mumbai")),]
metrotable <- table(Metro$COD)
prop.table(metrotable)*100

       0        1 
45.25469 54.74531 
NonMetro <- coddata[which(coddata$ShippingCity!=c("Kolkata", "New Delhi", "Bangalore", "Mumbai")),]
Nonmetrotable <- table(NonMetro$COD)
prop.table(Nonmetrotable)*100

       0        1 
38.09998 61.90002 

62% of non-metro orders serviced through COD against 55% of metro orders

COD Preference on high value - low value products

Premium <- coddata[which(coddata$FinalTotalPrice >= 1000),]
premiumtable <- table(Premium$COD)
prop.table(premiumtable)*100

       0        1 
32.08508 67.91492 
NonPremium <- coddata[which(coddata$FinalTotalPrice < 500),]
NonPremiumtable <- table(NonPremium$COD)
prop.table(NonPremiumtable)*100

       0        1 
41.32364 58.67636 

Contrary to popular belief we see that there is a higher proportion of E-transaction orders (over 10% lead) on items of low value. This leads to the insight that people prefer COD more on account of security rather than as a measure of premium/convenience

Part 3 - Regression Model

FinalTotalPrice = MRP + VAT + CODCharge - VendorDiscount - WebsiteDiscount

modelPrimary <- FinalTotalPrice ~ MRP + VAT + CODCharge - WebsiteDiscount - VendorDiscount
fit1 <- lm(modelPrimary, data = coddata)
summary(fit1)

Call:
lm(formula = modelPrimary, data = coddata)

Residuals:
       Min         1Q     Median         3Q        Max 
-3.301e-09 -2.000e-13 -2.000e-13 -1.000e-13  8.710e-09 

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)    
(Intercept) 5.694e-11  5.121e-13 1.112e+02   <2e-16 ***
MRP         4.107e-14  7.606e-16 5.400e+01   <2e-16 ***
VAT         2.100e+01  2.183e-14 9.619e+14   <2e-16 ***
CODCharge   1.000e+00  9.325e-15 1.072e+14   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.356e-11 on 45894 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:      1 
F-statistic: 9.519e+29 on 3 and 45894 DF,  p-value: < 2.2e-16

Model for deriving Insights of other factors

  • FinalTotalPrice ~ ShippingCity + HasVendorDiscount X VendorDiscount + HasWebsiteDiscount X WebsiteDiscount X COD + SubCategory
  • Approx ~ 50% fit based on Adjusted R-Square value
modelSecondary <- FinalTotalPrice ~ ShippingCity + HasVendorDiscount * VendorDiscount + HasWebsiteDiscount * WebsiteDiscount * COD + SubCategory  
fit1 <- lm(modelSecondary, data = Metro)
summary(fit1)

Call:
lm(formula = modelSecondary, data = Metro)

Residuals:
    Min      1Q  Median      3Q     Max 
-1139.0  -121.1   -23.2    99.3  4178.5 

Coefficients: (3 not defined because of singularities)
                                         Estimate Std. Error t value
(Intercept)                             865.08998  258.39714   3.348
ShippingCityKolkata                       5.66181   14.30785   0.396
ShippingCityMumbai                       -0.04755   11.33215  -0.004
ShippingCityNew Delhi                    31.31575   10.82669   2.892
HasVendorDiscount                      -385.54866   18.13870 -21.256
VendorDiscount                            0.29502    0.03767   7.831
HasWebsiteDiscount                      -86.57308   18.46639  -4.688
WebsiteDiscount                           0.15930    0.06244   2.551
COD                                      36.65234   10.95589   3.345
SubCategoryBODYCON DRESS                422.23330  364.67559   1.158
SubCategoryCAMISOLES                   -224.31828  288.52701  -0.777
SubCategoryCAPRIS                       267.28824  282.58999   0.946
SubCategoryCAPS & HATS                   46.29319  266.36623   0.174
SubCategoryCASUAL JACKETS               631.95943  282.25506   2.239
SubCategoryCASUAL SHIRTS                316.78845  259.06512   1.223
SubCategoryCHINOS                       815.23762  263.91584   3.089
SubCategoryDENIM JACKETS                 76.83499  316.35539   0.243
SubCategoryDRESSES                      262.36196  258.20127   1.016
SubCategoryETHNIC JACKETS              1323.28008  276.29235   4.789
SubCategoryFORMAL SHIRTS                595.21630  365.54916   1.628
SubCategoryFORMAL TROUSERS              915.58483  364.88590   2.509
SubCategoryHENLEY T-SHIRTS             -311.63458  259.15396  -1.203
SubCategoryHIGH NECK T SHIRTS          -251.43315  259.95856  -0.967
SubCategoryJACKETS & BLAZERS           1979.47495  287.55775   6.884
SubCategoryJEGGINGS                     275.00532  264.54142   1.040
SubCategoryJUMPSUITS                    372.09137  267.94381   1.389
SubCategoryKURTAS                       -49.67643  267.09125  -0.186
SubCategoryKURTIS                      -258.51949  273.71337  -0.944
SubCategoryMANDARIN T-SHIRTS             37.42923  288.38111   0.130
SubCategoryNIGHTWEAR                   -105.24772  364.76848  -0.289
SubCategoryPOLO-T-SHIRTS               -201.71458  258.49050  -0.780
SubCategoryROUND NECK SWEATERS           17.51348  365.07882   0.048
SubCategoryROUND NECK T-SHIRTS         -195.28882  258.55923  -0.755
SubCategoryROUND NECK VESTS            -167.96162  297.85967  -0.564
SubCategorySALWAR                      -177.19412  278.60326  -0.636
SubCategorySCARVES                     -204.71095  364.76646  -0.561
SubCategorySHIFT DRESS                   87.47713  288.35038   0.303
SubCategorySHIRTS                        30.00966  258.53944   0.116
SubCategorySHORTS                      -130.24956  262.96009  -0.495
SubCategorySHRUGS                        70.69059  264.32520   0.267
SubCategorySKATER DRESS                 420.50531  365.32198   1.151
SubCategorySKIRTS                       222.98041  259.60213   0.859
SubCategorySUITS                        958.14318  293.20562   3.268
SubCategorySWEATERS                      77.27810  270.58529   0.286
SubCategorySWEATSHIRTS                  321.25067  297.80370   1.079
SubCategoryT SHIRTS                    -136.22406  259.43541  -0.525
SubCategoryTOPS                         -62.62031  258.02983  -0.243
SubCategoryTRACK PANTS                  302.31862  278.92060   1.084
SubCategoryTRACKPANTS                  1401.54587  364.88803   3.841
SubCategoryTROUSERS                    -128.76370  259.66138  -0.496
SubCategoryTUNICS                      -177.27655  269.43263  -0.658
SubCategoryV NECK T-SHIRTS             -325.74086  258.97442  -1.258
SubCategoryWAISTCOATS                   491.02387  263.55327   1.863
SubCategoryWINTER JACKETS               512.85049  261.34369   1.962
HasVendorDiscount:VendorDiscount               NA         NA      NA
HasWebsiteDiscount:WebsiteDiscount             NA         NA      NA
HasWebsiteDiscount:COD                   63.41220   24.53117   2.585
WebsiteDiscount:COD                      -0.40019    0.07079  -5.653
HasWebsiteDiscount:WebsiteDiscount:COD         NA         NA      NA
                                       Pr(>|t|)    
(Intercept)                            0.000822 ***
ShippingCityKolkata                    0.692339    
ShippingCityMumbai                     0.996652    
ShippingCityNew Delhi                  0.003845 ** 
HasVendorDiscount                       < 2e-16 ***
VendorDiscount                         6.28e-15 ***
HasWebsiteDiscount                     2.86e-06 ***
WebsiteDiscount                        0.010771 *  
COD                                    0.000830 ***
SubCategoryBODYCON DRESS               0.247008    
SubCategoryCAMISOLES                   0.436937    
SubCategoryCAPRIS                      0.344286    
SubCategoryCAPS & HATS                 0.862036    
SubCategoryCASUAL JACKETS              0.025218 *  
SubCategoryCASUAL SHIRTS               0.221478    
SubCategoryCHINOS                      0.002023 ** 
SubCategoryDENIM JACKETS               0.808115    
SubCategoryDRESSES                     0.309642    
SubCategoryETHNIC JACKETS              1.74e-06 ***
SubCategoryFORMAL SHIRTS               0.103551    
SubCategoryFORMAL TROUSERS             0.012142 *  
SubCategoryHENLEY T-SHIRTS             0.229244    
SubCategoryHIGH NECK T SHIRTS          0.333505    
SubCategoryJACKETS & BLAZERS           6.83e-12 ***
SubCategoryJEGGINGS                    0.298615    
SubCategoryJUMPSUITS                   0.165011    
SubCategoryKURTAS                      0.852462    
SubCategoryKURTIS                      0.344982    
SubCategoryMANDARIN T-SHIRTS           0.896739    
SubCategoryNIGHTWEAR                   0.772955    
SubCategoryPOLO-T-SHIRTS               0.435232    
SubCategoryROUND NECK SWEATERS         0.961741    
SubCategoryROUND NECK T-SHIRTS         0.450120    
SubCategoryROUND NECK VESTS            0.572860    
SubCategorySALWAR                      0.524810    
SubCategorySCARVES                     0.574688    
SubCategorySHIFT DRESS                 0.761624    
SubCategorySHIRTS                      0.907600    
SubCategorySHORTS                      0.620403    
SubCategorySHRUGS                      0.789147    
SubCategorySKATER DRESS                0.249785    
SubCategorySKIRTS                      0.390434    
SubCategorySUITS                       0.001094 ** 
SubCategorySWEATERS                    0.775204    
SubCategorySWEATSHIRTS                 0.280778    
SubCategoryT SHIRTS                    0.599560    
SubCategoryTOPS                        0.808262    
SubCategoryTRACK PANTS                 0.278486    
SubCategoryTRACKPANTS                  0.000125 ***
SubCategoryTROUSERS                    0.620001    
SubCategoryTUNICS                      0.510603    
SubCategoryV NECK T-SHIRTS             0.208540    
SubCategoryWAISTCOATS                  0.062529 .  
SubCategoryWINTER JACKETS              0.049796 *  
HasVendorDiscount:VendorDiscount             NA    
HasWebsiteDiscount:WebsiteDiscount           NA    
HasWebsiteDiscount:COD                 0.009777 ** 
WebsiteDiscount:COD                    1.70e-08 ***
HasWebsiteDiscount:WebsiteDiscount:COD       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 257.7 on 3674 degrees of freedom
Multiple R-squared:  0.5113,    Adjusted R-squared:  0.5039 
F-statistic: 69.88 on 55 and 3674 DF,  p-value: < 2.2e-16

Inferences

City wise

  • Orders from Kolkata tends to be amongst the cheapest
  • Orders from Delhi tends to more along the higher priced products & consequently bigger tickets

Discounts

  • Impact negatively as expected
  • We got the best fit when we interaced website Discount with COD & CODCharge
  • Probably a result of more holistic coverage of discounts as website discounts usually don't apply for COD mode

Product Category Wise

  • Ethnic Jackets & Blazers had the highest gross contributions in terms of product category
  • V-neck T-Shirts being the most discounted & under-priced category

Test for Normality

plot(fit1, 2)

plot of chunk unnamed-chunk-14

# Shapiro Test can't be done due to sample size being high
library(nortest)
ad.test(coddata$FinalTotalPrice)

    Anderson-Darling normality test

data:  coddata$FinalTotalPrice
A = 1459.7, p-value < 2.2e-16