Rationale
The aim of this work was to run an A/B test on clickstream data from a website where, for one month, a single word on the webpage was changed from "tools" to "tips". The dataset records the like and share actions for each observation, along with the time spent on the homepage.
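library(tidyverse)  # read_csv(), gather(), dplyr verbs, ggplot2
library(lubridate)  # week()
library(scales)     # percent
library(plotly)     # ggplotly()
# download.file("https://assets.datacamp.com/production/repositories/2292/datasets/b502094e5de478105cccea959d4f915a7c0afe35/data_viz_website_2018_04.csv",
#               'A_B_test.csv',
#               quiet = TRUE)
data_file_path <- file.path(getwd(), 'A_B_test.csv')
data_file <- read_csv(data_file_path)
## Parsed with column specification: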
## cols(
## visit_date = col_date(format = ""),
## condition = col_character(),
## time_spent_homepage_sec = col_double(),
## clicked_article = col_double(),
## clicked_like = col_double(),
## clicked_share = col_double()
## )
mean_clicks_viz <- data_file %>%
  gather(click_types,
         click_values,
         clicked_like:clicked_share) %>%
  group_by(week = week(visit_date),
           condition,
           click_types) %>%
  summarise(conversion_rate = mean(click_values)) %>%
  ggplot(aes(x = week,
             y = conversion_rate,
             col = condition,
             group = condition)) +
  geom_point(size = 3) +
  geom_line(lwd = 0.9) +
  scale_y_continuous(limits = c(0, 1),
                     labels = percent) +
  facet_grid(~click_types,
             scales = 'free') +
  theme_bw(base_size = 18) +
  scale_color_manual(values = c("steelblue", "forestgreen")) +
  ylab("conversion rates") +
  xlab("week")
mean_clicks_viz
The plot above shows that the average conversion rate for the like action differs by condition: the word "tips" appears to have a higher conversion rate than the word "tools".
We can test whether the difference between the two variants is statistically significant. A binary logistic regression is fitted with the click outcome as the dependent variable and the condition ("tips" or "tools") as the independent variable. The model here uses clicked_like as the outcome; the same approach applies to clicked_share.
library(broom)  # provides tidy()
logistic_reg_model1 <- glm(clicked_like ~ condition,
                           family = "binomial",
                           data = data_file) %>%
  tidy()
logistic_reg_model1

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -1.6123207 | 0.0219300 | -73.52131 | 0 |
| conditiontools | -0.9887948 | 0.0389587 | -25.38057 | 0 |
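The model indicates a statistically significant difference between the like conversion rates of the two variants. The coefficient for conditiontools is negative, so the odds of a like are lower under "tools" than under "tips".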
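To make the size of the effect easier to read, the log-odds estimates in the table can be converted to an odds ratio and to predicted like rates. The sketch below uses only base R and the two estimates copied from the table; the variable names are illustrative and not part of the original analysis.

```r
# Estimates copied from the tidy() output above (log-odds scale)
intercept       <- -1.6123207   # baseline condition: tips
condition_tools <- -0.9887948   # shift in log-odds for tools

# exp() of a coefficient gives the odds ratio of tools versus tips
odds_ratio <- exp(condition_tools)

# plogis() is the inverse logit; it maps log-odds to a probability
p_like_tips  <- plogis(intercept)
p_like_tools <- plogis(intercept + condition_tools)

round(c(odds_ratio = odds_ratio,
        tips       = p_like_tips,
        tools      = p_like_tools), 3)
```

The odds ratio comes out at about 0.37, with predicted like rates of roughly 0.17 for "tips" and 0.07 for "tools", which agrees with the direction seen in the plot.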
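Hypothetically, suppose a follow-up experiment should be able to detect an improvement of 5 percentage points in the clicked_share conversion rate, from a base rate of 3.2% to 8.2%. The sample size needed for that experiment can be determined with a power analysis.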
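library(powerMediation)  # provides SSizeLogisticBin()
total_sample_size <- SSizeLogisticBin(p1 = 0.032,     # base conversion rate
                                      p2 = 0.082,     # target conversion rate
                                      B = 0.5,        # share of observations in the second group
                                      alpha = 0.05,   # significance level
                                      power = 0.8)    # desired power
total_sample_size
## [1] 673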
Therefore, a total of 673 observations, split evenly between the two conditions, is needed to detect a 5 percentage point improvement over the base conversion rate with 80% power at a 5% significance level.
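The figure of 673 can be cross-checked by hand. The sketch below implements the sample size formula of Hsieh, Bloch and Larsen (1998) for a logistic regression with one binary predictor, on the assumption that this is the formula behind SSizeLogisticBin; the function name hsieh_sample_size is hypothetical and not part of powerMediation.

```r
# Sketch of the Hsieh, Bloch and Larsen (1998) sample size formula for a
# logistic regression with a single binary predictor.
# ASSUMPTION: this mirrors what SSizeLogisticBin() computes.
hsieh_sample_size <- function(p1, p2, B, alpha = 0.05, power = 0.8) {
  p_bar   <- (1 - B) * p1 + B * p2   # overall event rate across both groups
  z_alpha <- qnorm(1 - alpha / 2)    # two-sided critical value
  z_power <- qnorm(power)            # quantile for the target power

  numerator <- (z_alpha * sqrt(p_bar * (1 - p_bar) / B) +
                z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2) * (1 - B) / B))^2
  denominator <- (p1 - p2)^2 * (1 - B)

  ceiling(numerator / denominator)   # round up to whole observations
}

hsieh_sample_size(p1 = 0.032, p2 = 0.082, B = 0.5)
```

With these inputs the function returns 673. Because the difference between p1 and p2 enters squared in the denominator, halving the detectable difference would raise the required sample size roughly fourfold.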
# Time spent is recorded once per observation, so no reshaping is needed
data_file %>%
  group_by(condition) %>%
  summarise(time_spent = mean(time_spent_homepage_sec, na.rm = TRUE)) %>%
  arrange(desc(time_spent))

| condition | time_spent |
|---|---|
| tips | 49.99909 |
| tools | 49.99489 |
# Weekly mean time spent per condition, shown as one box per condition
time_data_week_viz <- data_file %>%
  group_by(week = week(visit_date),
           condition) %>%
  summarise(time_spent = mean(time_spent_homepage_sec)) %>%
  ggplot(aes(x = condition,
             y = time_spent,
             fill = condition)) +
  geom_boxplot(col = 'black',
               lwd = 0.7) +
  theme_bw(base_size = 18) +
  ylab("avg time spent (seconds)") +
  xlab("condition") +
  scale_fill_manual(values = c("steelblue", 'grey50')) +
  theme(legend.position = "none")
ggplotly(time_data_week_viz)
We use a t-test to check whether the difference in time spent on the homepage between the two variants is statistically significant.
ab_experiment_results <- t.test(time_spent_homepage_sec ~ condition,
                                data = data_file)
# Results
ab_experiment_results
##
## Welch Two Sample t-test
##
## data: time_spent_homepage_sec by condition
## t = 0.36288, df = 29997, p-value = 0.7167
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.01850573 0.02691480
## sample estimates:
## mean in group tips mean in group tools
## 49.99909 49.99489
The analysis shows no statistically significant difference in the time spent on the homepage between the two variants (p = 0.72); the 95% confidence interval for the difference in means includes zero.
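As a consistency check, the confidence interval can be approximately rebuilt from the other numbers in the printed t-test output. The sketch below is base R only and uses values copied from that output; because the group means are printed to five decimals, the result agrees with the reported interval only up to rounding. The variable names are illustrative.

```r
# Values copied from the printed Welch t-test output
mean_tips  <- 49.99909
mean_tools <- 49.99489
t_stat     <- 0.36288
deg_free   <- 29997

diff_means <- mean_tips - mean_tools   # estimated difference in means
std_error  <- diff_means / t_stat      # since t = difference / standard error
t_crit     <- qt(0.975, deg_free)      # two-sided 95% critical value

ci <- diff_means + c(-1, 1) * t_crit * std_error
ci
```

The rebuilt interval runs from about -0.0185 to 0.0269 and spans zero, which is another way of seeing that the difference is not significant at the 5% level.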