You want to be constantly updating your website or app to maximize things like conversion rate or usage time.

A/B testing case

library(tidyverse)
library(lubridate)
library(scales)
library(powerMediation)
library(broom)
# Read in data
click_data <- read_csv("click_data.csv")
click_data
## # A tibble: 3,650 x 2
##    visit_date clicked_adopt_today
##    <date>                   <dbl>
##  1 2017-01-01                   1
##  2 2017-01-02                   1
##  3 2017-01-03                   0
##  4 2017-01-04                   1
##  5 2017-01-05                   1
##  6 2017-01-06                   0
##  7 2017-01-07                   0
##  8 2017-01-08                   0
##  9 2017-01-09                   0
## 10 2017-01-10                   0
## # ... with 3,640 more rows
# Find oldest and most recent date
min(click_data$visit_date)
## [1] "2017-01-01"
max(click_data$visit_date)
## [1] "2017-12-31"

Baseline conversion rates

What should count as “more clicks”? You need a baseline to compare against:

  • The conversion rate from last year?
  • The current conversion rate?
  • A specific target percentage?
# Calculate the mean conversion rate by day of the week
click_data %>%
  group_by(wday(visit_date)) %>%
  summarize(conversion_rate = mean(clicked_adopt_today))
## # A tibble: 7 x 2
##   `wday(visit_date)` conversion_rate
##                <dbl>           <dbl>
## 1                  1           0.3  
## 2                  2           0.277
## 3                  3           0.271
## 4                  4           0.298
## 5                  5           0.271
## 6                  6           0.267
## 7                  7           0.256
# Calculate the mean conversion rate by week of the year
click_data %>%
  group_by(week(visit_date)) %>%
  summarize(conversion_rate = mean(clicked_adopt_today))
## # A tibble: 53 x 2
##    `week(visit_date)` conversion_rate
##                 <dbl>           <dbl>
##  1                  1           0.229
##  2                  2           0.243
##  3                  3           0.171
##  4                  4           0.129
##  5                  5           0.157
##  6                  6           0.186
##  7                  7           0.257
##  8                  8           0.171
##  9                  9           0.186
## 10                 10           0.2  
## # ... with 43 more rows
# Compute conversion rate by week of the year
click_data_sum <- click_data %>%
  group_by(week = week(visit_date)) %>%
  summarize(conversion_rate = mean(clicked_adopt_today))
# Build plot of conversion rate by week
ggplot(click_data_sum, aes(x = week,
                           y = conversion_rate)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(limits = c(0, 1),
                     labels = percent)

Experimental design

Seasonality matters for the experiment: conversion rates might vary over the course of the year.
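
As a quick check on seasonality, here is a minimal sketch that aggregates the historical click data by month instead of by week (month() comes from lubridate, which is already loaded):
# Sketch: conversion rate by month, to eyeball seasonal swings
click_data %>%
  group_by(month = month(visit_date, label = TRUE)) %>%
  summarize(conversion_rate = mean(clicked_adopt_today))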

Power analysis

Experiment length is one of the big questions in A/B testing. Stop too soon and you may not collect enough data to detect an effect; run too long and you may waste valuable resources on a failed experiment. One way to safeguard against both is a power analysis.

A power analysis tells you how many data points (the sample size) you need to be confident that an effect is real. It requires:

  • Statistical test: the statistical test you plan to run
  • Baseline value: the value for the current control condition
  • Desired value: the expected value for the test condition
  • Proportion of the data from the test (treatment) condition (ideally 0.5)
  • Significance threshold (alpha): the probability of falsely rejecting the null hypothesis (generally 0.05)
  • Power: the probability of correctly rejecting the null hypothesis when there is a real effect (generally 0.8)
# Compute the required sample size for an experiment in August
total_sample_size <- SSizeLogisticBin(p1 = 0.54,    # baseline conversion rate
                                      p2 = 0.64,    # expected test conversion rate
                                      B = 0.5,      # proportion in test condition
                                      alpha = 0.05, # significance threshold
                                      power = 0.8)  # statistical power
total_sample_size
## [1] 758
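
To translate this into an experiment length, a quick sketch: click_data contains 3,650 visits over the 365 days of 2017, or about 10 visits per day, so 758 data points correspond to roughly 76 days of data collection (assuming traffic holds at historical levels):
# Estimate how many days the experiment needs to run,
# assuming traffic stays near the 2017 average (~10 visits/day)
visits_per_day <- nrow(click_data) / 365
ceiling(total_sample_size / visits_per_day)
## [1] 76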

Analyzing the results

experiment_data_clean <- read_csv("experiment_data.csv")
## 
## -- Column specification --------------------------------------------
## cols(
##   visit_date = col_date(format = ""),
##   condition = col_character(),
##   clicked_adopt_today = col_double()
## )
# Group and summarize data
experiment_data_clean_sum <- experiment_data_clean %>%
  group_by(visit_date, condition) %>%
  summarize(conversion_rate = mean(clicked_adopt_today))
# Make plot of conversion rates over time
ggplot(experiment_data_clean_sum,
       aes(x = visit_date,
           y = conversion_rate,
           color = condition,
           group = condition)) +
  geom_point() +
  geom_line()

# View summary of results
experiment_data_clean %>%
  group_by(condition) %>%
  summarize(conversion_rate = mean(clicked_adopt_today))
## # A tibble: 2 x 2
##   condition conversion_rate
##   <chr>               <dbl>
## 1 control             0.167
## 2 test                0.384
# Run logistic regression
experiment_results <- glm(clicked_adopt_today ~ condition,
                          family = "binomial",
                          data = experiment_data_clean) %>%
  tidy()
experiment_results
## # A tibble: 2 x 5
##   term          estimate std.error statistic  p.value
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)      -1.61     0.156    -10.3  8.28e-25
## 2 conditiontest     1.14     0.197      5.77 7.73e- 9
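
The estimates are on the log-odds scale. As a quick sanity check (a sketch, not part of the original analysis), base R’s plogis() converts them back to probabilities, which should match the group means computed above:
# Convert the log-odds estimates back to conversion probabilities
plogis(-1.61)        # control: ~0.167
plogis(-1.61 + 1.14) # test: ~0.384
The conditiontest coefficient is positive with a p-value far below 0.05, so the test condition reliably increased the conversion rate.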

Tips for designing follow-up experiments

  • Build several small follow-up experiments
  • Avoid confounding variables
  • Test small changes (big changes are difficult to evaluate with A/B testing)

A/B testing research questions

A/B testing is the use of experimental design and statistics to compare two or more variants of a design.

Uses of A/B testing

  • Conversion rates (clicks or purchases)
  • Engagement with the website (sharing, “likes”)
  • Dropoff rate: whether a visitor continues to the next page of the website
  • Time spent on a website
viz_website_2017 <- read_csv("data_viz_website_2018_04.csv")
## 
## -- Column specification --------------------------------------------
## cols(
##   visit_date = col_date(format = ""),
##   condition = col_character(),
##   time_spent_homepage_sec = col_double(),
##   clicked_article = col_double(),
##   clicked_like = col_double(),
##   clicked_share = col_double()
## )
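
Since the bullets above list the metrics this dataset covers, here is a minimal sketch, using only the columns from the specification above, of how you might summarize each engagement metric by condition:
# Sketch: summarize each engagement metric by condition
viz_website_2017 %>%
  group_by(condition) %>%
  summarize(mean_time_spent = mean(time_spent_homepage_sec),
            article_click_rate = mean(clicked_article),
            like_rate = mean(clicked_like),
            share_rate = mean(clicked_share))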

Assumptions and types of A/B testing

  • Within- vs. between-groups designs
    • In a within-groups experiment, each participant sees both conditions, so you can check whether that particular person behaved differently between the two conditions
    • A between-groups experiment puts each participant in only one of the two conditions, and then you compare how the groups of participants behaved
      • Assumption: there should be nothing qualitatively different between the two groups of participants

Types of A/B testing

  • A/B - compare a control condition and a test condition
  • A/A - compare two groups that both receive the control condition. This is useful to confirm that your control condition really is stable, and that the way you split participants in a between-groups experiment actually produces two similar groups. If you get a significant effect in an A/A experiment, something is wrong, because in theory you are running two identical conditions
    • If this happens, it could mean there is an error in how you are randomly assigning participants (see the sketch below)
  • A/B/N - compare a control condition to any number of different test conditions. This is a fast way to test several variants at once, but the statistics are more complicated and you need more data points to be confident in an effect, so it is generally safer to stick with plain A/B testing when starting out
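
Since a failed A/A test usually points at the assignment mechanism, here is a minimal sketch of random assignment (assign_condition is a hypothetical helper, not from the course material):
# Hypothetical helper: randomly split n visitors between two conditions
assign_condition <- function(n) {
  sample(c("control", "test"), size = n, replace = TRUE)
}
assign_condition(10)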