Group 2
Darshan | Nishant | Ramya | Rohan | Surbhi | Vivek
Combine data based on order ID to analyze average price, demand and revenue per customer
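The order-level combination step is not shown in the transcript. A minimal sketch of how line items could be rolled up to one row per order follows; the toy data and the column names OrderID, Price and Quantity are assumptions for illustration, not the actual ecomm schema.

```r
# Hypothetical line-item data; names and values are illustrative only.
items <- data.frame(
  OrderID  = c(101, 101, 102, 103, 103, 103),
  Price    = c(500, 700, 1200, 300, 300, 900),
  Quantity = c(1, 1, 1, 2, 1, 1)
)
items$Revenue <- items$Price * items$Quantity

# One row per order: mean item price, total quantity (demand), total revenue.
# tapply() returns groups in sorted OrderID order, matching sort(unique()).
orders <- data.frame(
  OrderID = sort(unique(items$OrderID)),
  Price   = as.numeric(tapply(items$Price, items$OrderID, mean)),
  Demand  = as.numeric(tapply(items$Quantity, items$OrderID, sum)),
  Revenue = as.numeric(tapply(items$Revenue, items$OrderID, sum))
)
print(orders)
```

Whether Price should be the mean or the sum per order depends on how the original ecomm table was built, which the transcript does not state.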
round(mean(ecomm$Price),2)
[1] 992.28
round(mean(ecomm$Demand),1)
[1] 1.3
round(mean(ecomm$Revenue),2)
[1] 1626.13
var(ecomm$Price)
[1] 389637.2
var(ecomm$Demand)
[1] 0.5241948
var(ecomm$Revenue)
[1] 8760974
Analysis: Testing whether a statistically significant difference exists between the price paid per OrderID with and without the COD payment option
aggregate(ecomm$Price, by=list(ecomm$COD), FUN=mean)
Group.1 x
1 0 957.0355
2 1 1014.2809
aggregate(ecomm$Price, by=list(ecomm$COD), FUN=var)
Group.1 x
1 0 346743.0
2 1 415168.1
shapiro.test(ecomm$Price[1:5000])
Shapiro-Wilk normality test
data: ecomm$Price[1:5000]
W = 0.77411, p-value < 2.2e-16
The price data are not normally distributed (p < 2.2e-16), so a nonparametric test is used.
wilcox.test(ecomm$Price~ecomm$COD)
Wilcoxon rank sum test with continuity correction
data: ecomm$Price by ecomm$COD
W = 134560000, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
Hence we conclude that product price differs significantly between orders with and without COD (p < 2.2e-16). COD orders are of higher value on average (mean 1014.28 vs 957.04).
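The workflow used here (a Shapiro-Wilk check, then a Wilcoxon rank sum test when normality is rejected) can be sketched as a small helper. This is an illustration on simulated data, not the ecomm data; compare_groups is a hypothetical name, and the fallback to a Welch t-test is an assumption about what one might do if normality were not rejected.

```r
set.seed(1)
# Simulated right-skewed prices for two payment groups (not the ecomm data).
sim <- data.frame(
  COD   = rep(c(0, 1), each = 300),
  Price = c(rlnorm(300, meanlog = 6.5, sdlog = 0.6),
            rlnorm(300, meanlog = 6.8, sdlog = 0.6))
)

compare_groups <- function(y, g, alpha = 0.05) {
  # shapiro.test() accepts between 3 and 5000 observations,
  # hence the [1:5000] subsetting in the transcript.
  normality_p <- shapiro.test(head(y, 5000))$p.value
  if (normality_p < alpha) {
    res <- wilcox.test(y ~ g)   # rank-based, no normality assumption
  } else {
    res <- t.test(y ~ g)        # Welch two-sample t-test
  }
  list(normality_p = normality_p,
       method      = res$method,
       p.value     = res$p.value,
       medians     = tapply(y, g, median))
}

out <- compare_groups(sim$Price, sim$COD)
print(out)
```

Reporting group medians next to the test result helps, because the Wilcoxon test concerns a location shift rather than a difference in means.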
Analysis: Testing whether a statistically significant difference exists between the number of products sold per order with and without the COD payment option
aggregate(ecomm$Demand, by=list(ecomm$COD), FUN=mean)
Group.1 x
1 0 1.323247
2 1 1.309267
aggregate(ecomm$Demand, by=list(ecomm$COD), FUN=var)
Group.1 x
1 0 0.5333243
2 1 0.5184459
shapiro.test(ecomm$Demand[1:5000])
Shapiro-Wilk normality test
data: ecomm$Demand[1:5000]
W = 0.50564, p-value < 2.2e-16
The demand data are not normally distributed (p < 2.2e-16), so a nonparametric test is used.
wilcox.test(ecomm$Demand~ecomm$COD)
Wilcoxon rank sum test with continuity correction
data: ecomm$Demand by ecomm$COD
W = 145840000, p-value = 0.01286
alternative hypothesis: true location shift is not equal to 0
At the 1% significance level (p = 0.01286), we cannot reject the hypothesis that order size is the same with and without COD. The difference would be significant at the 5% level.
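The conclusion for demand depends on the chosen significance level. A small sketch that compares the reported p-value against a few common thresholds (the alpha values are chosen for illustration):

```r
p_value <- 0.01286               # Wilcoxon p-value reported for Demand by COD
alphas  <- c(0.01, 0.05, 0.10)   # illustrative significance levels

decision <- ifelse(p_value < alphas, "reject H0", "fail to reject H0")
print(data.frame(alpha = alphas, decision = decision))
```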
Analysis: Testing whether a statistically significant difference exists between the average revenue earned per OrderID with and without the COD payment option
aggregate(ecomm$Revenue, by=list(ecomm$COD), FUN=mean)
Group.1 x
1 0 1586.547
2 1 1650.838
aggregate(ecomm$Revenue, by=list(ecomm$COD), FUN=var)
Group.1 x
1 0 9784098
2 1 8121215
shapiro.test(ecomm$Revenue[1:5000])
Shapiro-Wilk normality test
data: ecomm$Revenue[1:5000]
W = 0.24022, p-value < 2.2e-16
The revenue data are not normally distributed (p < 2.2e-16), so a nonparametric test is used.
wilcox.test(ecomm$Revenue~ecomm$COD)
Wilcoxon rank sum test with continuity correction
data: ecomm$Revenue by ecomm$COD
W = 136670000, p-value < 2.2e-16
alternative hypothesis: true location shift is not equal to 0
We can conclude that revenue per order differs significantly between orders with and without the COD payment option (p < 2.2e-16). COD orders earn more on average (mean 1650.84 vs 1586.55).
Analysis Areas:
Brand_aov<-aov(Brands$FinalTotalPrice~Brands$Brand)
summary(Brand_aov)
Df Sum Sq Mean Sq F value Pr(>F)
Brands$Brand 9 1.669e+09 185411790 2268 <2e-16 ***
Residuals 45888 3.751e+09 81737
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
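As a check on how to read the table above, the F value is the ratio of the two mean squares, and the share of price variance explained by brand (eta squared) follows from the sums of squares. The sketch below uses only the rounded figures printed in the table, so results are approximate.

```r
# Values copied from the one-way ANOVA table for FinalTotalPrice by Brand.
ss_brand <- 1.669e9;  df_brand <- 9;     ms_brand <- 185411790
ss_resid <- 3.751e9;  df_resid <- 45888; ms_resid <- 81737

f_value <- ms_brand / ms_resid                 # F = MS(between) / MS(within)
eta_sq  <- ss_brand / (ss_brand + ss_resid)    # share of variance explained
p_value <- pf(f_value, df_brand, df_resid, lower.tail = FALSE)

print(round(f_value))        # matches the F value shown in the table
print(round(eta_sq, 3))
print(p_value < 2e-16)
```

On these figures brand accounts for roughly 31% of the variation in final price, which gives a sense of effect size beyond the significance stars.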
Brands$Discount<-Brands$VendorDiscount+Brands$WebsiteDiscount
Brand_aov1<-aov(Brands$Discount~Brands$Brand)
summary(Brand_aov1)
Df Sum Sq Mean Sq F value Pr(>F)
Brands$Brand 9 1.148e+09 127513567 2565 <2e-16 ***
Residuals 45888 2.281e+09 49709
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Brands$COD<-factor(Brands$COD)
Brand_aov2<-aov(Brands$FinalTotalPrice~Brands$Brand+Brands$COD+Brands$Brand:Brands$COD)
summary(Brand_aov2)
Df Sum Sq Mean Sq F value Pr(>F)
Brands$Brand 9 1.669e+09 185411790 2291.29 <2e-16 ***
Brands$COD 1 2.686e+07 26862010 331.96 <2e-16 ***
Brands$Brand:Brands$COD 9 1.144e+07 1271426 15.71 <2e-16 ***
Residuals 45878 3.712e+09 80920
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
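The significant Brand:COD row indicates that the price gap between COD and non-COD orders is not the same for every brand. A minimal sketch on simulated data (three made-up brands, not the Brands data frame) shows the same model structure and that the shorthand Brand * COD fits the identical model.

```r
set.seed(42)
# Simulated balanced design: 3 brands x 2 payment types, 20 orders per cell.
sim <- data.frame(
  Brand = factor(rep(c("A", "B", "C"), each = 40)),
  COD   = factor(rep(c(0, 1), times = 60))
)
# Brand C gets an extra COD uplift, which creates an interaction effect.
sim$Price <- 1000 +
  200 * (sim$Brand == "B") +
  100 * (sim$COD == "1") +
  150 * (sim$Brand == "C" & sim$COD == "1") +
  rnorm(120, sd = 50)

long_form  <- aov(Price ~ Brand + COD + Brand:COD, data = sim)
short_form <- aov(Price ~ Brand * COD, data = sim)   # same model, shorter formula

print(summary(short_form))
# Cell means make the interaction visible: the COD gap differs by brand.
print(tapply(sim$Price, list(sim$Brand, sim$COD), mean))
```

Passing data = sim and using bare column names also keeps the row labels in the ANOVA table short, compared with the Brands$... form used above.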
Most expensive brands sold
subset.AvgBrandPrice.Group.1..AvgBrandPrice.FinalTotalPrice...
1 ATHENA
2 FABALLEY
3 MISS CHASE
4 MR BUTTON
Products<-ecommrough[c(5,10,14,16,20)]
Products$Discount<-Products$VendorDiscount+Products$WebsiteDiscount
Product_aov<-aov(Products$FinalTotalPrice~Products$SubCategory)
summary(Product_aov)
Df Sum Sq Mean Sq F value Pr(>F)
Products$SubCategory 57 2.093e+09 36714202 505.9 <2e-16 ***
Residuals 45840 3.327e+09 72573
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AvgProdPrice<-aggregate(Products,by=list(Products$SubCategory),FUN=mean)
AvgProdPrice_Top<-data.frame(subset(AvgProdPrice,AvgProdPrice$FinalTotalPrice>2000))
View(AvgProdPrice_Top)
Products$COD<-factor(Products$COD)
Product_aov1<-aov(Products$Discount~Products$SubCategory)
summary(Product_aov1)
Df Sum Sq Mean Sq F value Pr(>F)
Products$SubCategory 57 1.397e+09 24500557 552.7 <2e-16 ***
Residuals 45840 2.032e+09 44331
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Product_aov2<-aov(Products$FinalTotalPrice~Products$SubCategory+Products$COD+Products$SubCategory:Products$COD)
summary(Product_aov2)
Df Sum Sq Mean Sq F value Pr(>F)
Products$SubCategory 57 2.093e+09 36714202 509.801 < 2e-16 ***
Products$COD 1 1.764e+07 17636017 244.888 < 2e-16 ***
Products$SubCategory:Products$COD 49 1.149e+07 234429 3.255 1.36e-13 ***
Residuals 45790 3.298e+09 72017
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
subset.AvgProdPrice.Group.1..AvgProdPrice.Discount...1000.
1 CASUAL JACKETS
2 JACKETS & BLAZERS
3 SUITS
subset.AvgProdPrice.Group.1..AvgProdPrice.WebsiteDiscount...500.
1 ETHNIC JACKETS
2 FORMAL SHIRTS
discount<-ecommrough[,c(17,18,20)]
model <- glm(COD ~.,family=binomial(link='logit'),data=discount)
anova(model, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: COD
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 45897 61256
HasVendorDiscount 1 0.03 45896 61256 0.8672
HasWebsiteDiscount 1 350.40 45895 60905 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
bs<-ecommrough[,c(20,31)]
model <- glm(COD ~.,family=binomial(link='logit'),data=bs)
anova(model, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: COD
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 45897 61256
address 1 16.191 45896 61239 5.728e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
timing<-ecommrough[,c(20,32)]
model <- glm(COD ~.,family=binomial(link='logit'),data=timing)
anova(model, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: COD
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 45897 61256
time 1 10.336 45896 61245 0.001305 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
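The Pr(>Chi) column in these deviance tables comes from comparing the drop in deviance with a chi-squared distribution on the stated degrees of freedom. A short sketch reproducing the p-value for the time predictor from the printed (rounded) deviance:

```r
deviance_drop <- 10.336   # Deviance for 'time' in the table above
df            <- 1        # Df for 'time'

# Upper-tail chi-squared probability, as used by anova(model, test = "Chisq").
p_value <- pchisq(deviance_drop, df = df, lower.tail = FALSE)
print(signif(p_value, 4))
```

The same call with the deviance and Df of any other row gives that row's p-value, up to rounding of the printed deviance.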
metrocity<-ecommrough[,c(20,30)]
model <- glm(COD ~.,family=binomial(link='logit'),data=metrocity)
anova(model, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: COD
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 45897 61256
metro 1 788.61 45896 60467 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Logistic Regression based on the following factors
regressfcn<-ecommrough[,c(10,13,14,16,20,30,31)]
train <- regressfcn[c(1:15000),]
test <- regressfcn[c(15001:45898),]
model <- glm(COD ~.,family=binomial(link='logit'),data=train)
anova(model, test="Chisq")
Analysis of Deviance Table
Model: binomial, link: logit
Response: COD
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 14999 20265
FinalTotalPrice 1 93.5 14998 20171 < 2.2e-16 ***
CODCharge 1 4280.0 14997 15891 < 2.2e-16 ***
VendorDiscount 1 49.2 14996 15842 2.294e-12 ***
WebsiteDiscount 1 16.8 14995 15825 4.133e-05 ***
metro 1 184.9 14994 15640 < 2.2e-16 ***
address 1 25.3 14993 15615 4.907e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fitted.results <- predict(model,test,type='response')
fitted.results <- ifelse(fitted.results > 0.5,1,0)
misClasificError <- mean(fitted.results != test$COD)
print(paste('Accuracy',1-misClasificError))
[1] "Accuracy 0.691468703475953"
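An accuracy figure is easier to judge next to a confusion matrix and a majority-class baseline. The sketch below shows one way to compute both, using simulated data and a random train/test split; the data, coefficients and 70/30 split are assumptions for illustration and do not reproduce the result above. A random split is used because taking the first 15000 rows may bias the model if the file is ordered, for example by date.

```r
set.seed(123)
n <- 2000
# Simulated orders (not the ecommrough data).
sim <- data.frame(
  metro           = rbinom(n, 1, 0.5),
  FinalTotalPrice = rlnorm(n, meanlog = 6.8, sdlog = 0.5)
)
lin     <- -0.5 + 1.2 * sim$metro + 0.0005 * sim$FinalTotalPrice
sim$COD <- rbinom(n, 1, plogis(lin))

# Random 70/30 split instead of taking the first rows.
n_train <- 1400
idx     <- sample(n, size = n_train)
train   <- sim[idx, ]
test    <- sim[-idx, ]

model <- glm(COD ~ ., family = binomial(link = "logit"), data = train)
pred  <- ifelse(predict(model, test, type = "response") > 0.5, 1, 0)

# Fixed factor levels keep the table 2 x 2 even if one class is never predicted.
confusion <- table(Predicted = factor(pred, levels = c(0, 1)),
                   Actual    = factor(test$COD, levels = c(0, 1)))
accuracy  <- sum(diag(confusion)) / sum(confusion)
baseline  <- max(table(test$COD)) / nrow(test)   # always predict the majority class

print(confusion)
print(c(accuracy = accuracy, baseline = baseline))
```

If the model's accuracy is only slightly above the baseline, most of the apparent performance comes from class imbalance rather than from the predictors.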