promotion.df <- read.csv("PromotionDataV4.csv")
attach(promotion.df)
##Number of Orders and total number of columns
dim(promotion.df)
## [1] 45898 29
##Number of Unique Brands
length(unique(Brand))
## [1] 10
##Number of subcategories
length(unique(SubCategory))
## [1] 58
##Number of Order IDs
length(unique(OrderID))
## [1] 34913
##Number of Shipping Cities
length(unique(ShippingCity))
## [1] 791
##Number of Shipping Pincodes
length(unique(ShippingPincode))
## [1] 2818
##Number of Shipping States
length(unique(ShippingState))
## [1] 32
##Number of COD orders
cod.df <- subset(promotion.df, COD=='1')
nrow(cod.df)
## [1] 28144
##Number of Electronic payments
nocod.df <- subset(promotion.df, COD=='0')
nrow(nocod.df)
## [1] 17754
table(COD)
## COD
## 0 1
## 17754 28144
round(prop.table(table(COD))*100,2)
## COD
## 0 1
## 38.68 61.32
table(HasVendorDiscount)
## HasVendorDiscount
## 0 1
## 22682 23216
round(prop.table(table(HasVendorDiscount))*100,2)
## HasVendorDiscount
## 0 1
## 49.42 50.58
table(HasWebsiteDiscount)
## HasWebsiteDiscount
## 0 1
## 29871 16027
round(prop.table(table(HasWebsiteDiscount))*100,2)
## HasWebsiteDiscount
## 0 1
## 65.08 34.92
table(COD,HasVendorDiscount,HasWebsiteDiscount)
## , , HasWebsiteDiscount = 0
##
## HasVendorDiscount
## COD 0 1
## 0 4292 6350
## 1 8244 10985
##
## , , HasWebsiteDiscount = 1
##
## HasVendorDiscount
## COD 0 1
## 0 4473 2639
## 1 5673 3242
round(prop.table(table(COD,HasVendorDiscount,HasWebsiteDiscount))*100,2)
## , , HasWebsiteDiscount = 0
##
## HasVendorDiscount
## COD 0 1
## 0 9.35 13.84
## 1 17.96 23.93
##
## , , HasWebsiteDiscount = 1
##
## HasVendorDiscount
## COD 0 1
## 0 9.75 5.75
## 1 12.36 7.06
boxplot(FinalTotalPrice ~ COD+HasVendorDiscount+HasWebsiteDiscount,
horizontal = TRUE, xlab = "Final Total Price",
ylab = "COD + VENDOR + WEBSITE")
This boxplot shows that when a product has COD option, website discount but no vendor discount, the total price is the highest.
lens <- as.data.frame(aggregate(FinalTotalPrice, by = list(Brand),length))
lens[order(-lens[,2]),]
## Group.1 x
## 4 HARPA 7271
## 3 GRITSTONES 6901
## 2 FABALLEY 6899
## 7 MONTEIL & MUNERO 6539
## 5 MEIRA 5422
## 9 THE VANCA 4804
## 1 ATHENA 4140
## 6 MISS CHASE 1678
## 10 TSHIRT COMPANY 1396
## 8 MR BUTTON 848
plot(lens,type ="s")
The above table shows that HARPA (7271) has the greatest number of orders followed by GRITSTONES (6901), FABALLEY (6899) and MONTEIL & MUNERO (6539)
rev <- as.data.frame(aggregate(FinalTotalPrice, by = list(Brand),sum))
rev[order(-rev[,2]),]
## Group.1 x
## 2 FABALLEY 6169716
## 4 HARPA 5768717
## 7 MONTEIL & MUNERO 4246537
## 3 GRITSTONES 4089553
## 1 ATHENA 3945855
## 9 THE VANCA 3463846
## 5 MEIRA 2960980
## 8 MR BUTTON 1488278
## 6 MISS CHASE 1431303
## 10 TSHIRT COMPANY 1078742
The Table aboove shows that Faballey has the greatest revenue from all the listed brands
avgrev <- as.data.frame(aggregate(FinalTotalPrice, by = list(Brand),mean))
avgrev[order(-avgrev[,2]),]
## Group.1 x
## 8 MR BUTTON 1755.0447
## 1 ATHENA 953.1052
## 2 FABALLEY 894.2914
## 6 MISS CHASE 852.9814
## 4 HARPA 793.3870
## 10 TSHIRT COMPANY 772.7380
## 9 THE VANCA 721.0337
## 7 MONTEIL & MUNERO 649.4169
## 3 GRITSTONES 592.6030
## 5 MEIRA 546.1047
The above table shows that the average revenue per order for MR BUTTON is highest amongst all other brands
weekly <- weekdays(as.POSIXct(OrderDate, format="%m/%d/%Y %H:%M"))
weekly <- as.data.frame(table(weekly))
weekly[order(-weekly[,2]),]
## weekly Freq
## 1 Friday 9400
## 4 Sunday 8771
## 3 Saturday 8230
## 5 Thursday 6100
## 2 Monday 5171
## 6 Tuesday 4398
## 7 Wednesday 3828
Thus the above table shows that the last three days of the weeks constitute most of the orders and Tuesdays and Wednesdays have very low order made on the site.
The top 10 cities in terms of number of NON-COD orders are as
city <- aggregate(ShippingCity, by = list(ShippingCity), length)
#city[order(-city[,2]),][1:10,1:2]
table(ShippingCity,COD)[order(-table(ShippingCity,COD)[,1]),][1:10,]
## COD
## ShippingCity 0 1
## Bangalore 2674 2230
## New Delhi 1999 2586
## Mumbai 1627 2218
## Pune 1395 1162
## Gurgaon 1060 944
## Hyderabad 999 1000
## Noida 606 536
## Kolkata 578 1261
## Chennai 544 476
## Thane 416 505
The top 10 cities in terms of number of COD orders are as
table(ShippingCity,COD)[order(-table(ShippingCity,COD)[,2]),][1:10,]
## COD
## ShippingCity 0 1
## New Delhi 1999 2586
## Bangalore 2674 2230
## Mumbai 1627 2218
## Kolkata 578 1261
## Pune 1395 1162
## Hyderabad 999 1000
## Gurgaon 1060 944
## Noida 606 536
## Thane 416 505
## Chennai 544 476
Lets try and focus on our most famous brand HARPA. As already stated it has the most number of orders and the second best revenue amongst all the brands
harpa.df <- subset(promotion.df, Brand == "HARPA")
nrow(harpa.df)
## [1] 7271
subs <- as.data.frame(aggregate(harpa.df$SubCategory, by = list(harpa.df$SubCategory), length))
subs[order(-subs[,2]),]
## Group.1 x
## 19 TOPS 4958
## 5 DRESSES 1111
## 22 WINTER JACKETS 425
## 13 SHRUGS 148
## 6 JEGGINGS 132
## 17 SWEATERS 126
## 11 SHIRTS 93
## 20 TROUSERS 71
## 15 SKIRTS 67
## 4 CAPS & HATS 58
## 21 TUNICS 26
## 10 SHIFT DRESS 22
## 14 SKATER DRESS 9
## 2 ASYMMETRIC DRESS 7
## 1 3/4THS 3
## 7 JUMPSUITS 3
## 18 T SHIRTS 3
## 3 BODYCON DRESS 2
## 8 LEGGINGS 2
## 12 SHORTS 2
## 16 SUMMER JACKETS 2
## 9 OFF SHOULDER/TUBE/HALTER 1
Thus tops have been the highest sold items for Harpa with 4958 orders Now lets again categorized this data for finding out which which is the highest MRP Tops so as to be considered for further analysis
harpatops <- subset(harpa.df, harpa.df$SubCategory == "TOPS")
subsa <- as.data.frame(aggregate(harpatops$MRP, by = list(harpatops$MRP), length))
subsa[order(-subsa[,2]),]
## Group.1 x
## 13 999 2931
## 11 899 540
## 14 1099 524
## 10 799 256
## 19 1599 222
## 9 749 171
## 12 949 81
## 18 1499 54
## 6 599 45
## 1 299 29
## 8 699 27
## 3 499 26
## 5 559 18
## 2 349 9
## 16 1299 9
## 7 649 8
## 4 550 5
## 15 1199 2
## 17 1399 1
Thus here we can clearly see that Harpa tops with MRP has the greatest number 2931 of orders by the customers. Now let us take a subset of these tops for further analysis.
harpatops999 <-subset(harpatops, harpatops$MRP == "999")
nrow(harpatops999)
## [1] 2931
##Number of orders of each combination
table(harpatops999$HasVendorDiscount,harpatops999$HasWebsiteDiscount)
##
## 0 1
## 0 388 1040
## 1 1114 389
##Percentage of orders of each combination
round(prop.table(table(harpatops999$HasVendorDiscount,harpatops999$HasWebsiteDiscount))*100,2)
##
## 0 1
## 0 13.24 35.48
## 1 38.01 13.27
Thus whenever there is one or the other exclusive discount, the orders are high as compared to none or both discounts.
chisq.test(table(harpatops999$HasVendorDiscount,harpatops999$HasWebsiteDiscount))
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: table(harpatops999$HasVendorDiscount, harpatops999$HasWebsiteDiscount)
## X-squared = 644.11, df = 1, p-value < 2.2e-16
Thus we reject the null hypothesis that Vendor and website are independent
shapiro.test(harpatops999$FinalTotalPrice)
##
## Shapiro-Wilk normality test
##
## data: harpatops999$FinalTotalPrice
## W = 0.8724, p-value < 2.2e-16
Thus the value of p is less than 0.05 and hence we can reject the null hypothesis that the data is normal
library(coefplot)
## Loading required package: ggplot2
input <- harpatops999[,c("FinalTotalPrice","HasVendorDiscount","HasWebsiteDiscount","COD",
"MRP","CODCharge", "VendorDiscount", "WebsiteDiscount" )]
summary(glm(FinalTotalPrice~ HasVendorDiscount + HasWebsiteDiscount + COD, data = input))
##
## Call:
## glm(formula = FinalTotalPrice ~ HasVendorDiscount + HasWebsiteDiscount +
## COD, data = input)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -466.66 -48.86 -12.66 75.29 316.29
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 973.657 4.867 200.046 < 2e-16 ***
## HasVendorDiscount -332.710 4.474 -74.366 < 2e-16 ***
## HasWebsiteDiscount -190.530 4.490 -42.435 < 2e-16 ***
## COD 16.243 3.983 4.078 4.66e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 11419.5)
##
## Null deviance: 97590916 on 2930 degrees of freedom
## Residual deviance: 33424881 on 2927 degrees of freedom
## AIC: 35708
##
## Number of Fisher Scoring iterations: 2
coefplot(glm(FinalTotalPrice~ HasVendorDiscount + HasWebsiteDiscount + COD, data = input))
## Warning: Ignoring unknown aesthetics: xmin, xmax
library(visreg)
## Warning: package 'visreg' was built under R version 3.4.3
visreg(glm(FinalTotalPrice~ HasVendorDiscount + HasWebsiteDiscount + COD, data = input))
interact <- glm(FinalTotalPrice~ HasVendorDiscount + HasWebsiteDiscount + COD
+ input$HasVendorDiscount:input$HasWebsiteDiscount + input$HasVendorDiscount:COD
+ input$HasWebsiteDiscount:COD + input$HasVendorDiscount:input$HasWebsiteDiscount:COD,data = input)
summary(interact)
##
## Call:
## glm(formula = FinalTotalPrice ~ HasVendorDiscount + HasWebsiteDiscount +
## COD + input$HasVendorDiscount:input$HasWebsiteDiscount +
## input$HasVendorDiscount:COD + input$HasWebsiteDiscount:COD +
## input$HasVendorDiscount:input$HasWebsiteDiscount:COD, data = input)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -502.93 -26.57 0.00 22.43 297.22
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1048.950 7.883
## HasVendorDiscount -440.983 8.950
## HasWebsiteDiscount -297.219 8.772
## COD 26.566 9.841
## input$HasVendorDiscount:input$HasWebsiteDiscount 281.778 12.109
## COD:input$HasVendorDiscount -2.496 11.334
## COD:input$HasWebsiteDiscount -8.422 11.427
## COD:input$HasVendorDiscount:input$HasWebsiteDiscount -105.245 15.901
## t value Pr(>|t|)
## (Intercept) 133.057 < 2e-16 ***
## HasVendorDiscount -49.270 < 2e-16 ***
## HasWebsiteDiscount -33.884 < 2e-16 ***
## COD 2.700 0.00698 **
## input$HasVendorDiscount:input$HasWebsiteDiscount 23.270 < 2e-16 ***
## COD:input$HasVendorDiscount -0.220 0.82569
## COD:input$HasWebsiteDiscount -0.737 0.46120
## COD:input$HasVendorDiscount:input$HasWebsiteDiscount -6.619 4.29e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 8638.77)
##
## Null deviance: 97590916 on 2930 degrees of freedom
## Residual deviance: 25251124 on 2923 degrees of freedom
## AIC: 34894
##
## Number of Fisher Scoring iterations: 2
coefplot(interact,xmin = -400, xmax = 1200)
## Warning: Ignoring unknown aesthetics: xmin, xmax
Thus we can clearly see from this logistic regression model that having vendor or website discount only has a greater impact on bringing down prices for HARPA tops having MRP 999 rather than having both the discounts. Also COD does not affect the choice of the customers when choosig HARPA tops 999 as it has a low coefficient.
More number of people are opting for COD. However in top 10 cities, we can easily see that the ratio is more or less for both COD and electronic payments. Thus clearly indicationg that people from small cities and towns opt for COD much more than their urban counterparts. This can lead to a focus on ease of payment options while issuing a COD parcel. Perhaps some mobile device that can swipe cards or accept wallet or online payments while COD would be more helpful as there is often a problem with the exact amount of change. Also the same would be easy to track.
More number of people purchase from the site on the last three days Friday, Saturday and Sunday of the week. Tuesdays and Wednesdays have the least number of orders placed. Perhaps site could run certain additional promotional campaigns like mid-week sales to attract more customers during this period
Also we can clearly figure out that rathere than COD or Electronic payments, it is the Vendor and Website discounts that make the actual change in the purchasing behavior of the people. This can be stated from the visreg diagram.
The Brand GRITSTONES does not have any Vendor discount given by the vendor. However the sales of the brand are less when there is Website discount probably because customers consider it as Permium
Also customers tend to wear black and blue more often than the other colors probably because of the genuinity of these colors in pictures and reality as compared to the variability in other shades.
Website discounts and Vendor discounts have a better impact when combined with COD facility.
The average revenue per brand of MR BUTTON is considerably high as compared to other brands