Japan e-commerce average order value determined factors

Project Context

Aimed to identify the average order value (“purchase_amount” in the data set) determined factors and propose the sales boosting recommendatinos

Data Pre-processing

require("tidyverse")

## Loading required package: tidyverse

## ─ Attaching packages ──────────────────── tidyverse 1.2.1 ─

## ✔ ggplot2 3.1.0     ✔ purrr   0.2.4
## ✔ tibble  1.4.2     ✔ dplyr   0.7.5
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## ─ Conflicts ────────────────────── tidyverse_conflicts() ─
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

test.table<-read_csv("test_table.csv")

## Parsed with column specification:
## cols(
##   user_id = col_integer(),
##   date = col_character(),
##   device = col_character(),
##   test = col_integer(),
##   purchase_amount = col_double()
## )

user.table<-read_csv("user_table.csv")

## Parsed with column specification:
## cols(
##   user_id = col_integer(),
##   gender = col_character(),
##   service = col_character(),
##   age = col_integer(),
##   country = col_character()
## )

head(test.table)

## # A tibble: 6 x 5
##   user_id date     device       test purchase_amount
##     <int> <chr>    <chr>       <int>           <dbl>
## 1    1983 2017/1/1 pc_web          0            33.7
## 2    4704 2017/1/1 android_app     1            30.2
## 3   14995 2017/1/1 ios_app         0            37.2
## 4   15089 2017/1/1 pc_web          0            49.8
## 5   17522 2017/1/1 pc_web          0            19.7
## 6     186 2017/1/1 pc_web          1            33.6

head(user.table)

## # A tibble: 6 x 5
##   user_id gender service   age country
##     <int> <chr>  <chr>   <int> <chr>  
## 1       1 male   golden     20 US     
## 2       2 male   premium    53 US     
## 3       3 female premium    44 TW     
## 4       4 male   golden     44 UK     
## 5       5 male   normal     20 US     
## 6       6 female golden     26 UK

print(paste(" The non-repetitive users which both appear in the test.table and the user.table",
            nrow(unique(inner_join(select(test.table,user_id),
                                   select(user.table, user_id), by="user_id")))))

## [1] " The non-repetitive users which both appear in the test.table and the user.table 19871"

test.data<-left_join(test.table,
                     user.table,
                     "user_id")
test.data$date<-as.Date(test.table$date, format = "%Y/%m/%d")
for (i in c(3,4,6,7,9)) {
  test.data[, i] <-as.factor(test.data[[i]])
}
test.data.jp <- test.data[test.data$country=="JP",]
head(test.data.jp)

## # A tibble: 6 x 9
##   user_id date       device     test  purchase_amount gender service   age
##     <int> <date>     <fct>      <fct>           <dbl> <fct>  <fct>   <int>
## 1    4858 2017-01-01 android_a… 1                49.9 male   golden     14
## 2   12858 2017-01-01 pc_web     1                40.2 male   premium    40
## 3   16588 2017-01-01 pc_web     0                42.2 male   premium    18
## 4    5997 2017-01-01 android_a… 0                41.3 male   premium    32
## 5    4196 2017-01-01 pc_web     1                44.8 female premium    50
## 6     680 2017-01-01 ios_app    1                28.5 male   premium    39
## # ... with 1 more variable: country <fct>

Data Analysis

Key Message 1

Recommend to adopt the old button design because the purchase amount is higher

Conduct the 2 sample t test to test the hypothesis : the purchase amount of test group > control group, and we accept the hypothesis

t.test(test.data.jp[test.data.jp$test==0,]$purchase_amount,
       test.data.jp[test.data.jp$test==1,]$purchase_amount,
       alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  test.data.jp[test.data.jp$test == 0, ]$purchase_amount and test.data.jp[test.data.jp$test == 1, ]$purchase_amount
## t = 63.113, df = 9953.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  8.825337      Inf
## sample estimates:
## mean of x mean of y 
##  43.56543  34.50391

Key Message 2

Discover 3 factors which are statistically significant to influence the purchase amount : button design, customer device and membership class

Conduct ANOVA to understand the if there are statistically significant differences between the following 4 factors : button design (test), customer device, customer gender and membership class (service)

aov.model<-aov(
  purchase_amount~test+device+gender+service, 
  test.data.jp
)
summary(aov.model)

##               Df Sum Sq Mean Sq  F value Pr(>F)    
## test           1 204387  204387 5434.724 <2e-16 ***
## device         2  43561   21780  579.146 <2e-16 ***
## gender         1      3       3    0.082  0.774    
## service        2  93222   46611 1239.405 <2e-16 ***
## Residuals   9950 374196      38                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Key Message 2-1 - Membership program

Recommend to reconsider the premium membership program design because the average order value is quite close to the normal group

Decide whether to optimize or cancel the premium membership program to reach the expected profits by designing the membership program

ggplot(test.data.jp,
       aes(x=service,y=purchase_amount, color=service))+
         geom_boxplot()+
         xlab("Membership Level") + ylab("Purchase Amount")+
         ggtitle("The dofference in avg. order value isn't prominent between the premium and the normal membership group")+
         theme_bw()

TukeyHSD(aov.model, "service")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = purchase_amount ~ test + device + gender + service, data = test.data.jp)
## 
## $service
##                     diff       lwr       upr p adj
## normal-golden  -7.272220 -7.671011 -6.873428     0
## premium-golden -5.568421 -5.889350 -5.247493     0
## premium-normal  1.703798  1.306328  2.101268     0

plot(TukeyHSD(aov.model,"service"))

Key Message 2-2 - Button design

Recommend to adopt the old button design because the purchase amount is higher with the old button design

The purchase amount is ~9.06 higher with the old button design(control) than the new button(test) design

ggplot(test.data.jp,
       aes(x=test,y=purchase_amount, color=test))+
  geom_boxplot()+
  xlab("Old v.s New button design")+ylab("Purchase Amount")+
  ggtitle("Average order value is higher when adopting old button design (0:control group)")+
  theme_bw()

Also confirm the difference isn’t stem from the random error because the statiscally significance exists

Conduct the TukeyHSD post-hoc test to confirm there is statistical siginficance between old (control) and new button (test) design

TukeyHSD(aov.model, "test")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = purchase_amount ~ test + device + gender + service, data = test.data.jp)
## 
## $test
##          diff       lwr       upr p adj
## 1-0 -9.061521 -9.302464 -8.820579     0

plot(TukeyHSD(aov.model, "test"))

Key Message 2-3 - Device

Recommend to more focus on the ios users aquisition and retention becuase is has the highest average order value than other devices

ggplot(test.data.jp,
       aes(x=device, y=purchase_amount, color=device))+
  geom_boxplot()+
  xlab("Device")+ylab("Purchase Amount")+
  ggtitle("Average order value is higher with ios users than pc-web and android")+
  theme_bw()

There is 95% confidence level to state the purchase amount of ios device users are higher than the android and pc web device users

Purchase amount of ios user is..
~5.68 higher than android users
~2.00 higher than pc_web users
Purchase amount of pc & web user is ~3.68 higher than android

TukeyHSD(aov.model, "device")

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = purchase_amount ~ test + device + gender + service, data = test.data.jp)
## 
## $device
##                          diff       lwr       upr p adj
## ios_app-android_app  5.681998  5.290501  6.073495     0
## pc_web-android_app   3.681216  3.286400  4.076033     0
## pc_web-ios_app      -2.000782 -2.323508 -1.678056     0

plot(TukeyHSD(aov.model, "device"))

Japan e-commerce average order value determined factors

Jiawen

Project Context

Data Pre-processing

Data Analysis

Key Message 1

Recommend to adopt the old button design because the purchase amount is higher

Key Message 2

Discover 3 factors which are statistically significant to influence the purchase amount : button design, customer device and membership class

Key Message 2-1 - Membership program

Recommend to reconsider the premium membership program design because the average order value is quite close to the normal group

Key Message 2-2 - Button design

Recommend to adopt the old button design because the purchase amount is higher with the old button design

Also confirm the difference isn’t stem from the random error because the statiscally significance exists

Key Message 2-3 - Device

Recommend to more focus on the ios users aquisition and retention becuase is has the highest average order value than other devices

There is 95% confidence level to state the purchase amount of ios device users are higher than the android and pc web device users