A/B testing, also known as split testing, is often used to test new products or features. Typically, two variants of the same product or feature are shown to real users, and the results are compared to see which variant performs better.

This project will use the ‘experiment’ dataset taken from DataCamp’s A/B Testing in R course. The (fictional) data is from a cat adoption website, and the owners of the site want to see if changing the site’s homepage image will affect the number of people who decide to adopt (i.e. the conversion rate). The data assigns each user to one of two groups (control and test) and records whether they clicked on the ‘adopt today’ button.

#loading needed packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.4     v purrr   0.3.4
## v tibble  3.1.2     v stringr 1.4.0
## v tidyr   1.1.3     v forcats 0.5.1
## v readr   1.4.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
#loading dataset
library(readr)
cat <- read_csv("experiment.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   visit_date = col_date(format = ""),
##   condition = col_character(),
##   clicked_adopt_today = col_double()
## )
cat
## # A tibble: 588 x 3
##    visit_date condition clicked_adopt_today
##    <date>     <chr>                   <dbl>
##  1 2018-01-01 control                     0
##  2 2018-01-01 control                     1
##  3 2018-01-01 control                     0
##  4 2018-01-01 control                     0
##  5 2018-01-01 test                        0
##  6 2018-01-01 test                        0
##  7 2018-01-01 test                        1
##  8 2018-01-01 test                        0
##  9 2018-01-01 test                        0
## 10 2018-01-01 test                        1
## # ... with 578 more rows

I only want to use the data from the second and third columns in this A/B test.

cat<-cat[ , 2:3]
cat
## # A tibble: 588 x 2
##    condition clicked_adopt_today
##    <chr>                   <dbl>
##  1 control                     0
##  2 control                     1
##  3 control                     0
##  4 control                     0
##  5 test                        0
##  6 test                        0
##  7 test                        1
##  8 test                        0
##  9 test                        0
## 10 test                        1
## # ... with 578 more rows

The ‘0’ indicates no clicks, whereas the ‘1’ indicates that the user had clicked on the button to adopt a cat.

Now that I have the data, I want to count how many users in each group clicked and how many did not.

I will use the table() function to cross-tabulate the group against the click outcome, and the addmargins() function to add row and column totals to those absolute counts.

prop<-table(cat)
prop
##          clicked_adopt_today
## condition   0   1
##   control 245  49
##   test    181 113
prop_abs<-addmargins(prop)
prop_abs
##          clicked_adopt_today
## condition   0   1 Sum
##   control 245  49 294
##   test    181 113 294
##   Sum     426 162 588

To get relative proportions, I will use the prop.table() function. With its margin argument set to 1, it divides the value of each cell by the sum of its row.

prop_rel<-prop.table(prop, 1) # divides cell value by sum of row
prop_rel
##          clicked_adopt_today
## condition         0         1
##   control 0.8333333 0.1666667
##   test    0.6156463 0.3843537
#rounding off values, adding a Sum column of row totals
prop_rel<-round(addmargins(prop_rel, 2), 2)
prop_rel
##          clicked_adopt_today
## condition    0    1  Sum
##   control 0.83 0.17 1.00
##   test    0.62 0.38 1.00
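
The same conversion rates can also be computed with dplyr. The sketch below is only an illustration: it rebuilds a row-level data frame from the counts in the table above (the names visits and conversion are hypothetical, chosen so the example runs without experiment.csv) and then summarises the clicks per group.

```r
library(dplyr)

# Hypothetical stand-in for the experiment data, rebuilt from the counts above:
# control: 245 without a click, 49 with; test: 181 without a click, 113 with
visits <- data.frame(
  condition = rep(c("control", "test"), each = 294),
  clicked_adopt_today = c(rep(c(0, 1), times = c(245, 49)),
                          rep(c(0, 1), times = c(181, 113)))
)

# One row per group: visitors, clicks, and clicks / visitors
conversion <- visits %>%
  group_by(condition) %>%
  summarise(
    visitors = n(),
    clicks = sum(clicked_adopt_today),
    conversion_rate = mean(clicked_adopt_today)
  )

conversion
```

Because the click column is coded as 0 and 1, its mean within a group is that group's conversion rate.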

At first look, it seems that the users who saw the new image clicked far more often than the users who saw the old image (38% versus 17%). To test whether this difference is statistically significant rather than due to chance, I will use the prop.test() function, which uses Pearson’s chi-squared test (with a continuity correction by default) to compare proportions between groups.

#turning off scientific notation in printed output
options(scipen = 999)

#passing the original count table to the prop.test function
prop.test(prop)
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  prop
## X-squared = 33.817, df = 1, p-value = 0.000000006055
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.1442390 0.2911352
## sample estimates:
##    prop 1    prop 2 
## 0.8333333 0.6156463
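
The X-squared value in this output can be reproduced by hand, which may help show what the test is doing. The following is a minimal sketch, assuming the counts from the table above; the names counts, expected, x_squared and p_value are introduced here for illustration only.

```r
# Counts from the table above (rows: control, test; columns: no click, click)
counts <- matrix(c(245, 181, 49, 113), nrow = 2,
                 dimnames = list(condition = c("control", "test"),
                                 clicked_adopt_today = c("0", "1")))

n <- sum(counts)

# Expected counts if clicking were independent of the group
expected <- outer(rowSums(counts), colSums(counts)) / n

# Pearson's chi-squared with Yates' continuity correction:
# shrink each absolute deviation by 0.5 before squaring
x_squared <- sum((abs(counts - expected) - 0.5)^2 / expected)

# Upper-tail probability of a chi-squared distribution with 1 degree of freedom
p_value <- pchisq(x_squared, df = 1, lower.tail = FALSE)

round(x_squared, 3)   # 33.817, the X-squared value reported by prop.test()
p_value
```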

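The p-value is far below 0.05 and the 95% CI for the difference in proportions (0.14 to 0.29) excludes zero, so the difference in clicks on the adoption button is very unlikely to be due to chance alone. Note that prop.test() treats the first column of the table as the ‘success’ count, so the sample estimates shown are the proportions of users who did not click (83% for control, 62% for test); the corresponding conversion rates are 17% and 38%. This indicates that the new homepage image generated a higher conversion rate.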
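
To have the estimates reported as conversion rates directly, the click counts can be passed to prop.test() through its x and n arguments instead of a table. This is a sketch using the counts from the tables above; the names clicks, visitors and result are introduced here for illustration.

```r
# Clicks ("successes") and total visitors per group, in the order control, test
clicks <- c(control = 49, test = 113)
visitors <- c(control = 294, test = 294)

result <- prop.test(x = clicks, n = visitors)

result$estimate    # conversion rates per group (49/294 and 113/294)
result$conf.int    # CI for control minus test, so both bounds are negative
result$statistic   # same X-squared as the table-based call
```

The test statistic and p-value are unchanged, since swapping which column counts as a success only flips the sign of the estimated difference.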

To show the results of the A/B test visually, I will create a bar plot showing how many users in each group did and did not click the adoption button, for both the old and new image.

#loading libraries
library(ggplot2)
library(labelled)
#cleaning & wrangling data

cat$clicked_adopt_today<-factor(cat$clicked_adopt_today, labels=c("No","Yes"))
cat$condition<-as.factor(cat$condition)
#creating bar plot
ggplot(cat, aes(x=condition, fill=clicked_adopt_today))+
  geom_bar(position="dodge", width=0.5)+
  geom_text(aes(label = after_stat(count)), stat = "count", vjust=1.6, color="white", position = position_dodge(0.5))+
  scale_x_discrete(labels=c("Old image", "New image"))+
  theme_minimal() +
  scale_fill_brewer(palette="Dark2")+
  theme(axis.title.y=element_blank())+
  labs(x="User group", fill="Clicked Adoption button")+
  ggtitle("Adoption button clicks", subtitle="By group")+
  theme(plot.title = element_text(family = "sans", size = 14, margin=margin(0,0,10,0)),
        plot.subtitle=element_text(family = "sans", size = 12, margin=margin(0,0,10,0)),
        axis.title.x = element_text(family = "sans", size = 11, margin=margin(10,0,0,0)))

Now the results of the A/B test can be understood visually.