K Chandirasekaran
12/07/2018
Reading Data
coddata<-read.csv("C:/Users/Dinesh Sambasivam/Downloads/PromotionDataV4.csv")
Breaking myth that Premium customers go for electronic transactions while low end transactions are usually done on COD mode Tests show the percentage split for both quantity of orders as well as value of orders are very similar
mytable1 <- with(coddata, table(COD))
prop.table(mytable1)*100
COD
0 1
38.68142 61.31858
mytable2 <- aggregate(coddata$FinalTotalPrice ~ coddata$COD, FUN = sum)
mytable2
coddata$COD coddata$FinalTotalPrice
1 0 12840545
2 1 21802982
prop.table(mytable2)*100
coddata$COD coddata$FinalTotalPrice
1 0.000000e+00 37.06477
2 2.886542e-06 62.93522
Average Price comparison between COD & non-COD orders
Has_COD = subset(coddata, coddata$COD == 1 )
Not_COD = subset(coddata, coddata$COD == 0)
# Average price paid with COD
mean(Has_COD$FinalTotalPrice- Has_COD$CODCharge)
[1] 752.344
mean(Not_COD$FinalTotalPrice)
[1] 723.248
t.test(coddata$FinalTotalPrice~ coddata$COD)
Welch Two Sample t-test
data: coddata$FinalTotalPrice by coddata$COD
t = -16.261, df = 42172, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-57.64667 -45.24489
sample estimates:
mean in group 0 mean in group 1
723.2480 774.6938
# Welch Two sample T-Test
Analysis of the Test As the P value of the test is less than 0.05, we can say the mean average price of purchases with COD is significantly higher than that of without COD
Are people more likely / less likely to use COD when they buy under promotions compared to when they buy without promotions?
Metro vs Non-Metro preferences
Does the customer's location (captured by Pin code/Shipping State) affects his/her choice of payment method?
Are premium priced goods less discounted? (analysing impact of MRP on discounts)
mytable3 <- xtabs(~coddata$COD+coddata$HasVendorDiscount, data= coddata )
mytable3
coddata$HasVendorDiscount
coddata$COD 0 1
0 8765 8989
1 13917 14227
mytable4 <- xtabs(~coddata$COD+coddata$HasWebsiteDiscount, data= coddata )
mytable4
coddata$HasWebsiteDiscount
coddata$COD 0 1
0 10642 7112
1 19229 8915
Website Discounts have been successful to some extent to drive people to move out of COD to electronic payments
# For vendor Discount COD Behaviour
chisq.test(mytable3)
Pearson's Chi-squared test with Yates' continuity correction
data: mytable3
X-squared = 0.024833, df = 1, p-value = 0.8748
Statistically not significant
# For Website discount vs COD Behaviour
chisq.test(mytable4)
Pearson's Chi-squared test with Yates' continuity correction
data: mytable4
X-squared = 336.21, df = 1, p-value < 2.2e-16
Statistically Significant impact of website discount to drive people to electronic payments
Metro <- coddata[which(coddata$ShippingCity==c("Kolkata", "New Delhi", "Bangalore", "Mumbai")),]
metrotable <- table(Metro$COD)
prop.table(metrotable)*100
0 1
45.25469 54.74531
NonMetro <- coddata[which(coddata$ShippingCity!=c("Kolkata", "New Delhi", "Bangalore", "Mumbai")),]
Nonmetrotable <- table(NonMetro$COD)
prop.table(Nonmetrotable)*100
0 1
38.09998 61.90002
62% of non-metro orders serviced through COD against 55% of metro orders
Premium <- coddata[which(coddata$FinalTotalPrice >= 1000),]
premiumtable <- table(Premium$COD)
prop.table(premiumtable)*100
0 1
32.08508 67.91492
NonPremium <- coddata[which(coddata$FinalTotalPrice < 500),]
NonPremiumtable <- table(NonPremium$COD)
prop.table(NonPremiumtable)*100
0 1
41.32364 58.67636
Contrary to popular belief we see that there is a higher proportion of E-transaction orders (over 10% lead) on items of low value. This leads to the insight that people prefer COD more on account of security rather than as a measure of premium/convenience
FinalTotalPrice = MRP + VAT + CODCharge - VendorDiscount - WebsiteDiscount
modelPrimary <- FinalTotalPrice ~ MRP + VAT + CODCharge - WebsiteDiscount - VendorDiscount
fit1 <- lm(modelPrimary, data = coddata)
summary(fit1)
Call:
lm(formula = modelPrimary, data = coddata)
Residuals:
Min 1Q Median 3Q Max
-3.301e-09 -2.000e-13 -2.000e-13 -1.000e-13 8.710e-09
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.694e-11 5.121e-13 1.112e+02 <2e-16 ***
MRP 4.107e-14 7.606e-16 5.400e+01 <2e-16 ***
VAT 2.100e+01 2.183e-14 9.619e+14 <2e-16 ***
CODCharge 1.000e+00 9.325e-15 1.072e+14 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.356e-11 on 45894 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 9.519e+29 on 3 and 45894 DF, p-value: < 2.2e-16
modelSecondary <- FinalTotalPrice ~ ShippingCity + HasVendorDiscount * VendorDiscount + HasWebsiteDiscount * WebsiteDiscount * COD + SubCategory
fit1 <- lm(modelSecondary, data = Metro)
summary(fit1)
Call:
lm(formula = modelSecondary, data = Metro)
Residuals:
Min 1Q Median 3Q Max
-1139.0 -121.1 -23.2 99.3 4178.5
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value
(Intercept) 865.08998 258.39714 3.348
ShippingCityKolkata 5.66181 14.30785 0.396
ShippingCityMumbai -0.04755 11.33215 -0.004
ShippingCityNew Delhi 31.31575 10.82669 2.892
HasVendorDiscount -385.54866 18.13870 -21.256
VendorDiscount 0.29502 0.03767 7.831
HasWebsiteDiscount -86.57308 18.46639 -4.688
WebsiteDiscount 0.15930 0.06244 2.551
COD 36.65234 10.95589 3.345
SubCategoryBODYCON DRESS 422.23330 364.67559 1.158
SubCategoryCAMISOLES -224.31828 288.52701 -0.777
SubCategoryCAPRIS 267.28824 282.58999 0.946
SubCategoryCAPS & HATS 46.29319 266.36623 0.174
SubCategoryCASUAL JACKETS 631.95943 282.25506 2.239
SubCategoryCASUAL SHIRTS 316.78845 259.06512 1.223
SubCategoryCHINOS 815.23762 263.91584 3.089
SubCategoryDENIM JACKETS 76.83499 316.35539 0.243
SubCategoryDRESSES 262.36196 258.20127 1.016
SubCategoryETHNIC JACKETS 1323.28008 276.29235 4.789
SubCategoryFORMAL SHIRTS 595.21630 365.54916 1.628
SubCategoryFORMAL TROUSERS 915.58483 364.88590 2.509
SubCategoryHENLEY T-SHIRTS -311.63458 259.15396 -1.203
SubCategoryHIGH NECK T SHIRTS -251.43315 259.95856 -0.967
SubCategoryJACKETS & BLAZERS 1979.47495 287.55775 6.884
SubCategoryJEGGINGS 275.00532 264.54142 1.040
SubCategoryJUMPSUITS 372.09137 267.94381 1.389
SubCategoryKURTAS -49.67643 267.09125 -0.186
SubCategoryKURTIS -258.51949 273.71337 -0.944
SubCategoryMANDARIN T-SHIRTS 37.42923 288.38111 0.130
SubCategoryNIGHTWEAR -105.24772 364.76848 -0.289
SubCategoryPOLO-T-SHIRTS -201.71458 258.49050 -0.780
SubCategoryROUND NECK SWEATERS 17.51348 365.07882 0.048
SubCategoryROUND NECK T-SHIRTS -195.28882 258.55923 -0.755
SubCategoryROUND NECK VESTS -167.96162 297.85967 -0.564
SubCategorySALWAR -177.19412 278.60326 -0.636
SubCategorySCARVES -204.71095 364.76646 -0.561
SubCategorySHIFT DRESS 87.47713 288.35038 0.303
SubCategorySHIRTS 30.00966 258.53944 0.116
SubCategorySHORTS -130.24956 262.96009 -0.495
SubCategorySHRUGS 70.69059 264.32520 0.267
SubCategorySKATER DRESS 420.50531 365.32198 1.151
SubCategorySKIRTS 222.98041 259.60213 0.859
SubCategorySUITS 958.14318 293.20562 3.268
SubCategorySWEATERS 77.27810 270.58529 0.286
SubCategorySWEATSHIRTS 321.25067 297.80370 1.079
SubCategoryT SHIRTS -136.22406 259.43541 -0.525
SubCategoryTOPS -62.62031 258.02983 -0.243
SubCategoryTRACK PANTS 302.31862 278.92060 1.084
SubCategoryTRACKPANTS 1401.54587 364.88803 3.841
SubCategoryTROUSERS -128.76370 259.66138 -0.496
SubCategoryTUNICS -177.27655 269.43263 -0.658
SubCategoryV NECK T-SHIRTS -325.74086 258.97442 -1.258
SubCategoryWAISTCOATS 491.02387 263.55327 1.863
SubCategoryWINTER JACKETS 512.85049 261.34369 1.962
HasVendorDiscount:VendorDiscount NA NA NA
HasWebsiteDiscount:WebsiteDiscount NA NA NA
HasWebsiteDiscount:COD 63.41220 24.53117 2.585
WebsiteDiscount:COD -0.40019 0.07079 -5.653
HasWebsiteDiscount:WebsiteDiscount:COD NA NA NA
Pr(>|t|)
(Intercept) 0.000822 ***
ShippingCityKolkata 0.692339
ShippingCityMumbai 0.996652
ShippingCityNew Delhi 0.003845 **
HasVendorDiscount < 2e-16 ***
VendorDiscount 6.28e-15 ***
HasWebsiteDiscount 2.86e-06 ***
WebsiteDiscount 0.010771 *
COD 0.000830 ***
SubCategoryBODYCON DRESS 0.247008
SubCategoryCAMISOLES 0.436937
SubCategoryCAPRIS 0.344286
SubCategoryCAPS & HATS 0.862036
SubCategoryCASUAL JACKETS 0.025218 *
SubCategoryCASUAL SHIRTS 0.221478
SubCategoryCHINOS 0.002023 **
SubCategoryDENIM JACKETS 0.808115
SubCategoryDRESSES 0.309642
SubCategoryETHNIC JACKETS 1.74e-06 ***
SubCategoryFORMAL SHIRTS 0.103551
SubCategoryFORMAL TROUSERS 0.012142 *
SubCategoryHENLEY T-SHIRTS 0.229244
SubCategoryHIGH NECK T SHIRTS 0.333505
SubCategoryJACKETS & BLAZERS 6.83e-12 ***
SubCategoryJEGGINGS 0.298615
SubCategoryJUMPSUITS 0.165011
SubCategoryKURTAS 0.852462
SubCategoryKURTIS 0.344982
SubCategoryMANDARIN T-SHIRTS 0.896739
SubCategoryNIGHTWEAR 0.772955
SubCategoryPOLO-T-SHIRTS 0.435232
SubCategoryROUND NECK SWEATERS 0.961741
SubCategoryROUND NECK T-SHIRTS 0.450120
SubCategoryROUND NECK VESTS 0.572860
SubCategorySALWAR 0.524810
SubCategorySCARVES 0.574688
SubCategorySHIFT DRESS 0.761624
SubCategorySHIRTS 0.907600
SubCategorySHORTS 0.620403
SubCategorySHRUGS 0.789147
SubCategorySKATER DRESS 0.249785
SubCategorySKIRTS 0.390434
SubCategorySUITS 0.001094 **
SubCategorySWEATERS 0.775204
SubCategorySWEATSHIRTS 0.280778
SubCategoryT SHIRTS 0.599560
SubCategoryTOPS 0.808262
SubCategoryTRACK PANTS 0.278486
SubCategoryTRACKPANTS 0.000125 ***
SubCategoryTROUSERS 0.620001
SubCategoryTUNICS 0.510603
SubCategoryV NECK T-SHIRTS 0.208540
SubCategoryWAISTCOATS 0.062529 .
SubCategoryWINTER JACKETS 0.049796 *
HasVendorDiscount:VendorDiscount NA
HasWebsiteDiscount:WebsiteDiscount NA
HasWebsiteDiscount:COD 0.009777 **
WebsiteDiscount:COD 1.70e-08 ***
HasWebsiteDiscount:WebsiteDiscount:COD NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 257.7 on 3674 degrees of freedom
Multiple R-squared: 0.5113, Adjusted R-squared: 0.5039
F-statistic: 69.88 on 55 and 3674 DF, p-value: < 2.2e-16
City wise
Discounts
Product Category Wise
plot(fit1, 2)
# Shapiro Test can't be done due to sample size being high
library(nortest)
ad.test(coddata$FinalTotalPrice)
Anderson-Darling normality test
data: coddata$FinalTotalPrice
A = 1459.7, p-value < 2.2e-16