Project Context
The goal is to identify the factors that determine the average order value (“purchase_amount” in the data set) and to propose recommendations for boosting sales.
Data Pre-processing
require("tidyverse")
## Loading required package: tidyverse
## ─ Attaching packages ──────────────────── tidyverse 1.2.1 ─
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.5
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ─ Conflicts ────────────────────── tidyverse_conflicts() ─
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
test.table<-read_csv("test_table.csv")
## Parsed with column specification:
## cols(
## user_id = col_integer(),
## date = col_character(),
## device = col_character(),
## test = col_integer(),
## purchase_amount = col_double()
## )
user.table<-read_csv("user_table.csv")
## Parsed with column specification:
## cols(
## user_id = col_integer(),
## gender = col_character(),
## service = col_character(),
## age = col_integer(),
## country = col_character()
## )
head(test.table)
## # A tibble: 6 x 5
## user_id date device test purchase_amount
## <int> <chr> <chr> <int> <dbl>
## 1 1983 2017/1/1 pc_web 0 33.7
## 2 4704 2017/1/1 android_app 1 30.2
## 3 14995 2017/1/1 ios_app 0 37.2
## 4 15089 2017/1/1 pc_web 0 49.8
## 5 17522 2017/1/1 pc_web 0 19.7
## 6 186 2017/1/1 pc_web 1 33.6
head(user.table)
## # A tibble: 6 x 5
## user_id gender service age country
## <int> <chr> <chr> <int> <chr>
## 1 1 male golden 20 US
## 2 2 male premium 53 US
## 3 3 female premium 44 TW
## 4 4 male golden 44 UK
## 5 5 male normal 20 US
## 6 6 female golden 26 UK
print(paste(" The non-repetitive users which both appear in the test.table and the user.table",
            nrow(unique(inner_join(select(test.table,user_id),
                                   select(user.table, user_id), by="user_id")))))
## [1] " The non-repetitive users which both appear in the test.table and the user.table 19871"
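The same overlap count can be cross-checked without a join. The following is a minimal base R sketch on hypothetical toy data; `toy.test`, `toy.user`, and their values are illustrative assumptions, not the real tables.

```r
# Toy stand-ins for test.table and user.table (hypothetical values, not the real data)
toy.test <- data.frame(user_id = c(1, 2, 2, 3, 7),
                       purchase_amount = c(30.1, 25.4, 27.9, 41.0, 18.3))
toy.user <- data.frame(user_id = c(1, 2, 3, 4),
                       country = c("US", "JP", "JP", "UK"))

# Distinct user_ids that occur in both tables (intersect() drops duplicates)
shared.ids <- intersect(toy.test$user_id, toy.user$user_id)
n.shared <- length(shared.ids)

# user_ids that appear in the test table but have no profile in the user table
unmatched.ids <- setdiff(toy.test$user_id, toy.user$user_id)

print(n.shared)
print(unmatched.ids)
```

Listing the unmatched ids alongside the count shows how many test rows would end up with missing user attributes after a left join.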
test.data<-left_join(test.table,
user.table,
"user_id")
# Parse the date column of the joined table itself, so rows stay aligned
test.data$date<-as.Date(test.data$date, format = "%Y/%m/%d")
for (i in c(3,4,6,7,9)) {
test.data[, i] <-as.factor(test.data[[i]])
}
test.data.jp <- test.data[test.data$country=="JP",]
head(test.data.jp)
## # A tibble: 6 x 9
## user_id date device test purchase_amount gender service age
## <int> <date> <fct> <fct> <dbl> <fct> <fct> <int>
## 1 4858 2017-01-01 android_a… 1 49.9 male golden 14
## 2 12858 2017-01-01 pc_web 1 40.2 male premium 40
## 3 16588 2017-01-01 pc_web 0 42.2 male premium 18
## 4 5997 2017-01-01 android_a… 0 41.3 male premium 32
## 5 4196 2017-01-01 pc_web 1 44.8 female premium 50
## 6 680 2017-01-01 ios_app 1 28.5 male premium 39
## # ... with 1 more variable: country <fct>
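One caveat worth checking for the country filter above: `left_join` keeps test rows that have no match in `user.table` and fills `country` with `NA`, and base logical subsetting returns an all-`NA` row for every `NA` in the index. Whether such rows exist in this data set is not visible in the output shown. A minimal sketch on hypothetical toy data (`toy`, `naive`, and `safe` are illustrative names, not part of the original analysis):

```r
# Toy joined table: user 11 had no match in the user table, so country is NA
toy <- data.frame(user_id = c(10, 11, 12),
                  country = factor(c("JP", NA, "US")))

# Logical index is c(TRUE, NA, FALSE); the NA produces an extra all-NA row
naive <- toy[toy$country == "JP", ]

# which() keeps only positions where the comparison is TRUE and drops the NA
safe <- toy[which(toy$country == "JP"), ]

print(naive)
print(safe)
```

`aov()` drops incomplete rows by default, so the fitted model would not change, but row counts and summaries computed on the subset could.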
Data Analysis
Key Message 1
Key Message 2-1 - Membership program
The difference is also statistically significant, so it is unlikely to stem from random error
- Conduct a TukeyHSD post-hoc test to confirm that the difference between the old (control) and new (test) button designs is statistically significant
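The definition of `aov.model` does not appear in this section, but the `Fit:` line of the Tukey output gives its formula. The sketch below fits a model with that formula on synthetic data so that it runs on its own; `toy.jp`, the sample size, and the simulated effect sizes are arbitrary assumptions and do not reproduce the reported results.

```r
set.seed(42)
n <- 600

# Synthetic stand-in for test.data.jp (hypothetical values, not the real data)
toy.jp <- data.frame(
  test    = factor(sample(c(0, 1), n, replace = TRUE)),
  device  = factor(sample(c("android_app", "ios_app", "pc_web"), n, replace = TRUE)),
  gender  = factor(sample(c("female", "male"), n, replace = TRUE)),
  service = factor(sample(c("golden", "normal", "premium"), n, replace = TRUE))
)

# Arbitrary simulated effects, chosen only so the example has something to detect
toy.jp$purchase_amount <- 40 -
  9 * (toy.jp$test == "1") +
  5 * (toy.jp$device == "ios_app") +
  rnorm(n, sd = 5)

# Same formula as in the "Fit:" line of the Tukey output
aov.model <- aov(purchase_amount ~ test + device + gender + service, data = toy.jp)

# Pairwise comparison for the test factor only
tukey.test <- TukeyHSD(aov.model, "test")
print(tukey.test)
```

In the resulting table the row label `1-0` means test minus control, so a negative `diff` indicates a lower purchase amount in the test group.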
TukeyHSD(aov.model, "test")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = purchase_amount ~ test + device + gender + service, data = test.data.jp)
##
## $test
## diff lwr upr p adj
## 1-0 -9.061521 -9.302464 -8.820579 0
plot(TukeyHSD(aov.model, "test"))
Key Message 2-3 - Device
Recommend focusing more on iOS user acquisition and retention, because iOS users have the highest average order value of all devices
ggplot(test.data.jp,
aes(x=device, y=purchase_amount, color=device))+
geom_boxplot()+
xlab("Device")+ylab("Purchase Amount")+
ggtitle("Average order value is higher with ios users than pc-web and android")+
theme_bw()
At the 95% family-wise confidence level, the purchase amount of iOS users is higher than that of android and pc_web users
- Purchase amount of iOS users is:
  - ~5.68 higher than android users
  - ~2.00 higher than pc_web users
- Purchase amount of pc_web users is ~3.68 higher than android users
TukeyHSD(aov.model, "device")
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = purchase_amount ~ test + device + gender + service, data = test.data.jp)
##
## $device
## diff lwr upr p adj
## ios_app-android_app 5.681998 5.290501 6.073495 0
## pc_web-android_app 3.681216 3.286400 4.076033 0
## pc_web-ios_app -2.000782 -2.323508 -1.678056 0
plot(TukeyHSD(aov.model, "device"))
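
As a quick arithmetic check, the three device differences reported above should be mutually consistent: the pc_web vs. ios_app gap should equal the pc_web vs. android_app gap minus the ios_app vs. android_app gap. A minimal sketch using the `diff` values copied from the output (`device.diff` and `implied` are illustrative names):

```r
# diff column copied from the TukeyHSD(aov.model, "device") output
device.diff <- c("ios_app-android_app" = 5.681998,
                 "pc_web-android_app"  = 3.681216,
                 "pc_web-ios_app"      = -2.000782)

# (pc_web - android_app) - (ios_app - android_app) = pc_web - ios_app
implied <- device.diff[["pc_web-android_app"]] - device.diff[["ios_app-android_app"]]

print(round(implied, 6))
print(isTRUE(all.equal(implied, device.diff[["pc_web-ios_app"]], tolerance = 1e-6)))
```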