Hence, we are using data analysis to see impact of COD on consumer behaviour, customer preferences and profits.
COD_data=read.csv("PromotionDataV4.csv",header = TRUE)
attach(COD_data)
x = read.csv("PromotionDataV4.csv")
table(x$COD)
0 1
17754 28144
View(COD_data)
summary(COD_data)
OrderItemCode OrderID OrderDate
Min. :47856780 Min. :27856296 4/10/2015 15:39: 28
1st Qu.:48822578 1st Qu.:28418744 4/10/2015 20:48: 25
Median :49663205 Median :28918355 4/10/2015 12:38: 22
Mean :49722785 Mean :28953309 4/10/2015 12:52: 21
3rd Qu.:50629792 3rd Qu.:29484854 4/10/2015 11:48: 20
Max. :51238880 Max. :29857723 4/10/2015 12:29: 20
(Other) :45762
Category SubCategory Brand
Apparels:45898 TOPS :14605 HARPA :7271
DRESSES : 8869 GRITSTONES :6901
POLO-T-SHIRTS : 3793 FABALLEY :6899
ROUND NECK T-SHIRTS: 2937 MONTEIL & MUNERO:6539
SHIRTS : 2897 MEIRA :5422
CASUAL SHIRTS : 1882 THE VANCA :4804
(Other) :10915 (Other) :8062
ProductColor ProductSize MRP FinalTotalPrice
BLACK :11357 M :12959 Min. : 299 Min. : 0.0
BLUE : 6048 L :11087 1st Qu.: 649 1st Qu.: 525.0
RED : 3570 S : 9556 Median : 950 Median : 678.0
NAVY BLUE: 3304 XL : 7350 Mean : 1015 Mean : 754.8
WHITE : 2605 XXL : 2199 3rd Qu.: 1295 3rd Qu.: 882.0
PINK : 2461 XS : 670 Max. :10999 Max. :6977.9
(Other) :16553 (Other): 2077
VATPercent VAT CODCharge VendorDiscount
Min. :5 Min. : 0.00 Min. : 0.0 Min. : 0.0
1st Qu.:5 1st Qu.: 24.95 1st Qu.: 0.0 1st Qu.: 0.0
Median :5 Median : 29.95 Median : 0.0 Median : 110.0
Mean :5 Mean : 35.29 Mean :13.7 Mean : 225.4
3rd Qu.:5 3rd Qu.: 41.85 3rd Qu.:49.0 3rd Qu.: 400.0
Max. :5 Max. :329.95 Max. :49.0 Max. :4400.0
WebsiteDiscountCode WebsiteDiscount HasVendorDiscount
NIL :29871 Min. : 0.00 Min. :0.0000
APP05 : 2686 1st Qu.: 0.00 1st Qu.:0.0000
STYLE30: 1699 Median : 0.00 Median :1.0000
WOW30 : 1605 Mean : 83.81 Mean :0.5058
SALE30 : 1299 3rd Qu.: 44.95 3rd Qu.:1.0000
RUSH30 : 1144 Max. :3149.00 Max. :1.0000
(Other): 7594
HasWebsiteDiscount CustomerID COD
Min. :0.0000 Min. : 197 Min. :0.0000
1st Qu.:0.0000 1st Qu.: 4427436 1st Qu.:0.0000
Median :0.0000 Median : 8963196 Median :1.0000
Mean :0.3492 Mean : 9014567 Mean :0.6132
3rd Qu.:1.0000 3rd Qu.:14821276 3rd Qu.:1.0000
Max. :1.0000 Max. :16007333 Max. :1.0000
ShippingName ShippingCity ShippingState
Divya : 25 Bangalore: 4904 MH : 8669
Priyanka : 25 New Delhi: 4585 KA : 5762
jol shishir vyas: 23 Mumbai : 3845 DL : 5392
Pooja : 23 Pune : 2557 UP : 3959
Neha : 21 Gurgaon : 2004 HR : 3003
Shital Shegokar : 21 Hyderabad: 1999 AP : 2840
(Other) :45760 (Other) :26004 (Other):16273
ShippingPincode ShippingAddressType BillingCity BillingState
Min. :110001 Home :20537 Bangalore: 4829 MH : 8668
1st Qu.:201012 Null :20455 New Delhi: 4707 KA : 5750
Median :400706 Office: 4906 Mumbai : 3867 DL : 5504
Mean :400189 Pune : 2513 UP : 3962
3rd Qu.:560060 Hyderabad: 1983 HR : 2916
Max. :854301 Gurgaon : 1927 AP : 2864
(Other) :26072 (Other):16234
BillingPincode BillingAddressType
Min. :110001 Home :16443
1st Qu.:201012 Null :25571
Median :400706 Office: 3884
Mean :400234
3rd Qu.:560059
Max. :900000
COD_data$COD.f <- factor(COD,
levels=c(0,1),
labels=c("Electronic","Cod"))
boxplot(FinalTotalPrice~COD.f,data=COD_data, main="Order data", xlab="Order Price", ylab="Payment Method", horizontal = TRUE)
library(gplots)
plotmeans(FinalTotalPrice ~ COD.f, data = COD_data, frame = FALSE)
mytable <- xtabs(~ COD.f+ShippingAddressType, data=COD_data)
addmargins(prop.table(mytable, 2), 1)*100 # row sum
ShippingAddressType
COD.f Home Null Office
Electronic 47.21722 25.25544 58.92784
Cod 52.78278 74.74456 41.07216
Sum 100.00000 100.00000 100.00000
chisq.test(mytable)
Pearson's Chi-squared test
data: mytable
X-squared = 3033.3, df = 2, p-value < 2.2e-16
cod_percent = c()
non_cod_percent = c()
for(i in seq(500, max(x$FinalTotalPrice), by=500)) {
cod_percent = c(cod_percent, nrow(subset(x, x$COD == 1 & x$FinalTotalPrice > i))*100/nrow(subset(x, x$COD == 1)))
non_cod_percent = c(non_cod_percent, nrow(subset(x, x$COD == 0 & x$FinalTotalPrice > i))*100/nrow(subset(x, x$COD == 0)))
i = i + 500
}
We divided the purchases into bins of 500 and found out the percent of purchases.
Null Hypothesis: Mean number of items for an order is similar for COD and Non COD transactions Alternate Hypothesis: Mean number of items for an order is not similar for COD and Non COD transactions
Test Statistics: T-test Result of T-test: t = 1.7493, df = 28164, p-value = 0.08025
Compare_QTY = aggregate(OrderItemCode, by=list(ID=OrderID, PAY=COD), length)
Compare_QTY$PAY <- as.factor(Compare_QTY$PAY)
t.test(Compare_QTY$x~ Compare_QTY$PAY,data = Compare_QTY)
Welch Two Sample t-test
data: Compare_QTY$x by Compare_QTY$PAY
t = 1.7493, df = 28164, p-value = 0.08025
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.001684438 0.029644012
sample estimates:
mean in group 0 mean in group 1
1.323247 1.309267
Null Hypothesis: Mean price for an order is similar for COD and Non COD transactions Alternate Hypothesis: Mean price for an order is not similar for COD and Non COD transactions
Test Statistics: T-test Result of T-test: t = -8.5188, df = 30373, p-value < 2.2e-16
Hence, we reject the null hypothesis.
Compare_price = aggregate(FinalTotalPrice, by=list(ID=OrderID, PAY=COD), sum)
Compare_QTY$PAY <- as.factor(Compare_QTY$PAY)
t.test(Compare_price$x~ Compare_price$PAY,data = Compare_price)
Welch Two Sample t-test
data: Compare_price$x by Compare_price$PAY
t = -8.5188, df = 30373, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-70.41672 -44.07408
sample estimates:
mean in group 0 mean in group 1
957.0355 1014.2809
Logistic Regression to find effect of shipping state, final total price and brand on the dependant variable COD/Electronic
model <- glm (COD ~ ShippingState + FinalTotalPrice+ Brand , data = COD_data)
summary(model)
Call:
glm(formula = COD ~ ShippingState + FinalTotalPrice + Brand,
data = COD_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.1916 -0.5319 0.2473 0.4158 1.0273
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.069e-01 1.114e-01 0.960 0.337157
ShippingStateAP 3.798e-01 1.114e-01 3.410 0.000649 ***
ShippingStateAR 7.769e-01 4.838e-01 1.606 0.108338
ShippingStateAS 6.309e-01 1.122e-01 5.624 1.88e-08 ***
ShippingStateBR 6.658e-01 1.134e-01 5.873 4.30e-09 ***
ShippingStateCH 4.818e-01 1.128e-01 4.270 1.96e-05 ***
ShippingStateCT 6.257e-01 1.138e-01 5.500 3.83e-08 ***
ShippingStateDD 8.219e-01 1.923e-01 4.275 1.92e-05 ***
ShippingStateDL 4.014e-01 1.112e-01 3.609 0.000308 ***
ShippingStateDN 6.104e-01 1.482e-01 4.118 3.83e-05 ***
ShippingStateGA 6.535e-01 1.129e-01 5.786 7.25e-09 ***
ShippingStateGJ 5.598e-01 1.115e-01 5.018 5.23e-07 ***
ShippingStateHP 5.337e-01 1.144e-01 4.665 3.10e-06 ***
ShippingStateHR 3.614e-01 1.114e-01 3.245 0.001174 **
ShippingStateJH 6.016e-01 1.128e-01 5.333 9.71e-08 ***
ShippingStateJK 7.177e-01 1.125e-01 6.378 1.81e-10 ***
ShippingStateKA 3.238e-01 1.112e-01 2.912 0.003596 **
ShippingStateKL -1.175e-01 1.137e-01 -1.033 0.301544
ShippingStateMH 3.913e-01 1.111e-01 3.521 0.000431 ***
ShippingStateML 6.775e-01 1.146e-01 5.911 3.42e-09 ***
ShippingStateMP 5.856e-01 1.119e-01 5.234 1.67e-07 ***
ShippingStateMZ 7.755e-01 1.190e-01 6.518 7.18e-11 ***
ShippingStateOR 5.335e-01 1.119e-01 4.768 1.87e-06 ***
ShippingStatePB 5.948e-01 1.120e-01 5.313 1.08e-07 ***
ShippingStatePY 5.800e-01 1.388e-01 4.180 2.92e-05 ***
ShippingStateRJ 5.529e-01 1.118e-01 4.944 7.67e-07 ***
ShippingStateSK -1.678e-01 1.497e-01 -1.121 0.262251
ShippingStateTN 3.524e-01 1.116e-01 3.158 0.001592 **
ShippingStateTR 7.062e-01 1.285e-01 5.496 3.91e-08 ***
ShippingStateUP 4.697e-01 1.113e-01 4.221 2.44e-05 ***
ShippingStateUT 5.219e-01 1.129e-01 4.621 3.82e-06 ***
ShippingStateWB 5.169e-01 1.114e-01 4.638 3.52e-06 ***
FinalTotalPrice 1.343e-04 7.697e-06 17.453 < 2e-16 ***
BrandFABALLEY -7.890e-02 9.278e-03 -8.504 < 2e-16 ***
BrandGRITSTONES 1.496e-03 9.689e-03 0.154 0.877312
BrandHARPA -7.323e-02 9.262e-03 -7.907 2.70e-15 ***
BrandMEIRA -1.538e-02 1.022e-02 -1.506 0.132117
BrandMISS CHASE 1.791e-02 1.366e-02 1.312 0.189569
BrandMONTEIL & MUNERO -5.718e-03 9.661e-03 -0.592 0.553956
BrandMR BUTTON -1.600e-01 1.880e-02 -8.507 < 2e-16 ***
BrandTHE VANCA -4.430e-02 1.015e-02 -4.364 1.28e-05 ***
BrandTSHIRT COMPANY -4.379e-02 1.465e-02 -2.990 0.002793 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.2216896)
Null deviance: 10886 on 45897 degrees of freedom
Residual deviance: 10166 on 45856 degrees of freedom
AIC: 61153
Number of Fisher Scoring iterations: 2
plot(cod_percent, type = "l")
plot(non_cod_percent, type = "l")
As seen from both the graphs, no significant change in purchase behaviour
city <- aggregate ( x$OrderID ~ x$BillingCity+x$COD, x, length)
head(city[order((city$`x$OrderID`), decreasing = TRUE),],15)
x$BillingCity x$COD x$OrderID
1143 New Delhi 1 2645
61 Bangalore 0 2638
1110 Mumbai 1 2229
712 Bangalore 1 2191
455 New Delhi 0 2062
422 Mumbai 0 1638
498 Pune 0 1348
1045 Kolkata 1 1247
1199 Pune 1 1165
253 Gurgaon 0 1027
284 Hyderabad 0 992
953 Hyderabad 1 991
924 Gurgaon 1 900
461 Noida 0 591
359 Kolkata 0 561
Here we see city wise distribution of top cities
state <- aggregate ( x$OrderID ~ x$BillingState+x$COD, x, length)
head(state[order((state$`x$OrderID`), decreasing = TRUE),],15)
x$BillingState x$COD x$OrderID
50 MH 1 4903
17 MH 0 3765
40 DL 1 3158
15 KA 0 2902
48 KA 1 2848
61 UP 1 2555
7 DL 0 2346
63 WB 1 1636
34 AP 1 1602
45 HR 1 1543
30 UP 0 1407
43 GJ 1 1379
12 HR 0 1373
2 AP 0 1262
59 TN 1 907
Here we see state wise distribution of top states
vendor_discount <- aggregate ( x$OrderID ~ x$HasVendorDiscount+x$COD, x, length)
vendor_discount[order((vendor_discount$`x$OrderID`), decreasing = TRUE),]
x$HasVendorDiscount x$COD x$OrderID
4 1 1 14227
3 0 1 13917
2 1 0 8989
1 0 0 8765
website_discount <- aggregate ( x$OrderID ~ x$HasWebsiteDiscount+x$COD, x, length)
website_discount[order((website_discount$`x$OrderID`), decreasing = TRUE),]
x$HasWebsiteDiscount x$COD x$OrderID
3 0 1 19229
1 0 0 10642
4 1 1 8915
2 1 0 7112
x$any_discount <- ifelse(x$HasVendorDiscount | x$HasWebsiteDiscount, 1, 0)
any_discount <- aggregate ( x$OrderID ~ x$any_discount+x$COD, x, length)
any_discount[order((any_discount$`x$OrderID`), decreasing = TRUE),]
x$any_discount x$COD x$OrderID
4 1 1 19900
2 1 0 13462
3 0 1 8244
1 0 0 4292
cod_charge <- aggregate ( x$OrderID ~ x$CODCharge+x$COD, x, length)
cod_charge[order((cod_charge$`x$OrderID`), decreasing = TRUE),]
x$CODCharge x$COD x$OrderID
1 0 0 17754
2 0 1 15307
3 49 1 12837