E-Commerce is a bubble of the 21st Century. Tradional shopping will sharply decline in the upcoming years owing to introduction of the online shopping facility
Because of the increase in the usage of online shopping facility, huge amout of data is getting generated every now and then. So, it is upto the e-commerce company to utilize it effectively. There are several factors that depend on each other and the core underlying issue has to be solved.
This project addresses the following issue" Order Conversion" with respect to the cost of the product displayed on the website. Is it the minimum cost that drives more conversion of an order or is it the maximum cost or is it the shipping cost?
Our field of study concerns about the huge amount of data that is being unused for prediction purposes.
Ecommerce Websites like the following are more likely to analyse such type of data:
Product Code - Code of the Product(Unique Identifier)
Product Name - Name of the Product
LeafCat - Sub Category of the product
Category - Category of the product
Minimum Price - Minimum Price of the product
Maximum Price - Maximum Price of the product
Shipping Fee Charged To Customer - Transportation Charges
Orders- No of orders of the product
Product_Visits -No of online vists of the product without ordering them
Average Price - Average Price of the product
Product Margin - Revenue - Cost
Order Conversion - No. of orders per Product_Visits
Hypothesis H1: The order conversion is high when the average price of orders along with the shipping fee charged to the customer is low.
In order to test the hypothesis the following model was designed.
Model <- Order.Conversion ~ Average.Price + Shipping.Fee.Charged.To.Customer +Product_Visits+ Product.Margin lm(Model, data=ecommercedata.df)
It turned out that the model y <- x0+x1+x2+x3+x4 seems to be true. However Product Margin doesnot directly yield the Order Conversion.
The order conversion is high when the average price of orders along with the shipping fee charged to the customer is reasonable.
Hypothesis H2: * No. of Product Visits are higher when the category of the product is one of the Electronic Accessories*
Reading Dataset into R:
setwd("C:/Office/Capestone Project")
ecommercedata.df <- read.csv(paste("Effect of Price on Order Conversion.csv"),sep = ",")
Viewing the dataset in R:
View(ecommercedata.df)
Descriptive Statistics:
library(psych)
describe(ecommercedata.df[,c(5:12)])
## vars n mean sd median
## Minimum.Price 1 1314 612.83 1351.87 348.00
## Maximum.Price 2 1314 748.56 1646.64 429.00
## Shipping.Fee.Charged.To.Customer 3 1314 65.93 82.77 49.00
## Orders 4 1314 1741.10 4724.07 454.00
## Product_Visits 5 1314 69303.72 152102.85 21625.50
## Average.Price 6 1314 680.69 1499.25 389.00
## Product.Margin 7 1314 0.22 0.13 0.21
## Order.Conversion 8 1314 0.03 0.04 0.02
## trimmed mad min max
## Minimum.Price 383.31 208.31 17 29276.00
## Maximum.Price 470.01 257.23 21 35692.00
## Shipping.Fee.Charged.To.Customer 52.14 29.65 0 1209.00
## Orders 830.90 547.82 13 84770.00
## Product_Visits 37142.63 26522.97 13 1932477.00
## Average.Price 426.66 232.77 19 32484.00
## Product.Margin 0.21 0.10 0 0.69
## Order.Conversion 0.02 0.01 0 1.00
## range skew kurtosis se
## Minimum.Price 29259.00 12.76 229.49 37.29
## Maximum.Price 35671.00 12.79 230.28 45.43
## Shipping.Fee.Charged.To.Customer 1209.00 5.52 46.37 2.28
## Orders 84757.00 8.44 102.75 130.32
## Product_Visits 1932464.00 6.23 53.96 4196.04
## Average.Price 32465.00 12.78 229.92 41.36
## Product.Margin 0.69 0.76 0.88 0.00
## Order.Conversion 1.00 13.92 301.45 0.00
One Way Contingency Table:
table(ecommercedata.df$Category)
##
## a. Electronic Accessories b. Home Furnishings
## 380 200
## c. Footwear (Men & Women) d. Ethnic Wear (Men & Women)
## 280 234
## e. Home & Kitchen Appliances
## 220
Two Way Contingency Table:
table(ecommercedata.df$Category, ecommercedata.df$Shipping.Fee.Charged.To.Customer)
##
## 0 9 19 29 39 49 59 69 79 89 99
## a. Electronic Accessories 46 21 16 28 35 51 125 40 5 0 0
## b. Home Furnishings 47 18 33 26 26 21 11 3 0 2 0
## c. Footwear (Men & Women) 34 8 17 19 77 91 6 15 2 2 2
## d. Ethnic Wear (Men & Women) 0 0 1 1 8 27 96 37 17 3 3
## e. Home & Kitchen Appliances 4 0 1 1 2 12 38 48 12 7 3
##
## 109 119 129 139 149 159 169 179 189 199 209
## a. Electronic Accessories 0 0 4 1 0 0 0 0 0 0 0
## b. Home Furnishings 0 1 5 1 1 0 0 0 3 0 0
## c. Footwear (Men & Women) 0 0 0 3 0 0 0 0 2 1 1
## d. Ethnic Wear (Men & Women) 3 0 22 6 3 0 1 2 3 0 0
## e. Home & Kitchen Appliances 3 13 10 5 7 4 7 1 5 2 3
##
## 229 239 249 259 269 289 299 319 339 379 399
## a. Electronic Accessories 0 1 0 0 0 0 0 0 0 4 1
## b. Home Furnishings 0 0 0 0 0 1 0 0 0 0 0
## c. Footwear (Men & Women) 0 0 0 0 0 0 0 0 0 0 0
## d. Ethnic Wear (Men & Women) 0 0 0 0 0 0 0 1 0 0 0
## e. Home & Kitchen Appliances 1 1 3 2 4 0 1 1 2 0 0
##
## 409 419 429 489 519 559 599 649 719 739
## a. Electronic Accessories 2 0 0 0 0 0 0 0 0 0
## b. Home Furnishings 1 0 0 0 0 0 0 0 0 0
## c. Footwear (Men & Women) 0 0 0 0 0 0 0 0 0 0
## d. Ethnic Wear (Men & Women) 0 0 0 0 0 0 0 0 0 0
## e. Home & Kitchen Appliances 1 2 1 1 4 1 2 1 1 2
##
## 1209
## a. Electronic Accessories 0
## b. Home Furnishings 0
## c. Footwear (Men & Women) 0
## d. Ethnic Wear (Men & Women) 0
## e. Home & Kitchen Appliances 1
BoxPlot:
boxplot(ecommercedata.df$Minimum.Price~ecommercedata.df$Category,main= "Minimum Price per Category",ylab="Category",xlab="Minimum Price ", horizontal = TRUE)
boxplot(ecommercedata.df$Maximum.Price~ecommercedata.df$Category,main= "Maximum Price per Category",ylab="Category",xlab="Maximum Price ", horizontal = TRUE)
Histograms:
hist(ecommercedata.df$Product_Visits, breaks = 10000, xlim=c(0,20000), col="red")
hist(ecommercedata.df$Orders, breaks = 1000, xlim=c(0,5000), col="green")
Correlation Matrix:
ecommercenumerical <-ecommercedata.df[,c(5:12)]
cor(ecommercenumerical)
## Minimum.Price Maximum.Price
## Minimum.Price 1.00000000 0.99998775
## Maximum.Price 0.99998775 1.00000000
## Shipping.Fee.Charged.To.Customer 0.24575070 0.24398256
## Orders -0.08599098 -0.08609678
## Product_Visits -0.09005734 -0.08996123
## Average.Price 0.99999631 0.99999751
## Product.Margin -0.30495104 -0.30439349
## Order.Conversion -0.01717065 -0.01688505
## Shipping.Fee.Charged.To.Customer
## Minimum.Price 0.24575070
## Maximum.Price 0.24398256
## Shipping.Fee.Charged.To.Customer 1.00000000
## Orders -0.11091960
## Product_Visits -0.10045665
## Average.Price 0.24478046
## Product.Margin -0.36902157
## Order.Conversion -0.04050421
## Orders Product_Visits Average.Price
## Minimum.Price -0.08599098 -0.09005734 0.99999631
## Maximum.Price -0.08609678 -0.08996123 0.99999751
## Shipping.Fee.Charged.To.Customer -0.11091960 -0.10045665 0.24478046
## Orders 1.00000000 0.86298159 -0.08604934
## Product_Visits 0.86298159 1.00000000 -0.09000483
## Average.Price -0.08604934 -0.09000483 1.00000000
## Product.Margin 0.12870554 0.10223398 -0.30464578
## Order.Conversion 0.04013951 -0.05048461 -0.01701386
## Product.Margin Order.Conversion
## Minimum.Price -0.304951039 -0.017170647
## Maximum.Price -0.304393487 -0.016885054
## Shipping.Fee.Charged.To.Customer -0.369021574 -0.040504212
## Orders 0.128705543 0.040139513
## Product_Visits 0.102233984 -0.050484613
## Average.Price -0.304645781 -0.017013864
## Product.Margin 1.000000000 -0.005047775
## Order.Conversion -0.005047775 1.000000000
Visualizing using Corrgram:
library(corrgram)
corrgram(ecommercenumerical)
Scatterplot:
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(ecommercedata.df$Minimum.Price~ecommercedata.df$Orders, ylim= c(0,2000), xlim=c(0,50000))
scatterplot(ecommercedata.df$Maximum.Price~ecommercedata.df$Orders, ylim= c(0,2000), xlim=c(0,50000))
High Minimum and Maximum Price results in less conversion of the item orders from visits.
T-tests:
t.test(ecommercedata.df$Minimum.Price, ecommercedata.df$Average.Price)
##
## Welch Two Sample t-test
##
## data: ecommercedata.df$Minimum.Price and ecommercedata.df$Average.Price
## t = -1.2186, df = 2598.4, p-value = 0.2231
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -177.06527 41.33925
## sample estimates:
## mean of x mean of y
## 612.8295 680.6925
p- value > 0.05. Hence, there is no significant difference between Minimum Price and the average Price.Hence we will consider the Average Price for calculating the order conversion
t.test(ecommercedata.df$Maximum.Price, ecommercedata.df$Average.Price)
##
## Welch Two Sample t-test
##
## data: ecommercedata.df$Maximum.Price and ecommercedata.df$Average.Price
## t = 1.1047, df = 2603.2, p-value = 0.2694
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -52.60063 188.32666
## sample estimates:
## mean of x mean of y
## 748.5556 680.6925
p- value > 0.05. Hence, there is no significant difference between Maximum Price and the average Price.Hence we will consider the Average Price for calculating the order conversion
cor(ecommercenumerical)
## Minimum.Price Maximum.Price
## Minimum.Price 1.00000000 0.99998775
## Maximum.Price 0.99998775 1.00000000
## Shipping.Fee.Charged.To.Customer 0.24575070 0.24398256
## Orders -0.08599098 -0.08609678
## Product_Visits -0.09005734 -0.08996123
## Average.Price 0.99999631 0.99999751
## Product.Margin -0.30495104 -0.30439349
## Order.Conversion -0.01717065 -0.01688505
## Shipping.Fee.Charged.To.Customer
## Minimum.Price 0.24575070
## Maximum.Price 0.24398256
## Shipping.Fee.Charged.To.Customer 1.00000000
## Orders -0.11091960
## Product_Visits -0.10045665
## Average.Price 0.24478046
## Product.Margin -0.36902157
## Order.Conversion -0.04050421
## Orders Product_Visits Average.Price
## Minimum.Price -0.08599098 -0.09005734 0.99999631
## Maximum.Price -0.08609678 -0.08996123 0.99999751
## Shipping.Fee.Charged.To.Customer -0.11091960 -0.10045665 0.24478046
## Orders 1.00000000 0.86298159 -0.08604934
## Product_Visits 0.86298159 1.00000000 -0.09000483
## Average.Price -0.08604934 -0.09000483 1.00000000
## Product.Margin 0.12870554 0.10223398 -0.30464578
## Order.Conversion 0.04013951 -0.05048461 -0.01701386
## Product.Margin Order.Conversion
## Minimum.Price -0.304951039 -0.017170647
## Maximum.Price -0.304393487 -0.016885054
## Shipping.Fee.Charged.To.Customer -0.369021574 -0.040504212
## Orders 0.128705543 0.040139513
## Product_Visits 0.102233984 -0.050484613
## Average.Price -0.304645781 -0.017013864
## Product.Margin 1.000000000 -0.005047775
## Order.Conversion -0.005047775 1.000000000
Even after the co-relation tests, we can conclude that the the average price can be used for calculating the order conversion of a product visit.
Regression Model:
Model <- Order.Conversion ~ Average.Price + Shipping.Fee.Charged.To.Customer +Product_Visits+ Product.Margin
lm(Model, data=ecommercedata.df)
##
## Call:
## lm(formula = Model, data = ecommercedata.df)
##
## Coefficients:
## (Intercept) Average.Price
## 3.374e-02 -4.348e-07
## Shipping.Fee.Charged.To.Customer Product_Visits
## -2.389e-05 -1.409e-08
## Product.Margin
## -7.242e-03