Test & Roll. Phase I: Test - Parameter Estimation

This dataset comprises 100 previous A/B tests that represent the range of results you might see in the A/B test you are now planning. The variable conv_rate is the conversion rate: the proportion of users who click anywhere on the website. The first six tests are shown below.

     A_conv_rate B_conv_rate
[1,]   0.3071216   0.3239707
[2,]   0.4095521   0.4645631
[3,]   0.5314887   0.5666940
[4,]   0.4139385   0.4467867
[5,]   0.8332743   0.8229312
[6,]   0.8431361   0.8289914

To come up with plausible parameters for the new test, you can fit a Bayesian hierarchical model to this data. The model estimates mu, the mean response across all the tests (the average profit per customer), and sigma, the standard deviation that captures how much the A and B groups differ within a test. The table below summarizes the posterior of each parameter.

            mean          sd        25%      97.5%
sigma 0.02866266 0.002073476 0.02727377 0.03297271
mu    0.68083102 0.016913947 0.66954898 0.71326542
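
As a rough cross-check on the hierarchical fit, the two parameters can be approximated by the method of moments. This is a minimal sketch, not the Bayesian model itself: it uses only the six rows printed above, it ignores sampling noise within each test, and the names rows, mu_hat, and sigma_hat are illustrative. It assumes that both arms of a test are drawn around a common test-level mean with standard deviation sigma, so the difference A - B has variance 2 * sigma^2.

```python
import math

# The six (A_conv_rate, B_conv_rate) pairs printed above. The full
# dataset has 100 tests, so the estimates here are only illustrative.
rows = [
    (0.3071216, 0.3239707),
    (0.4095521, 0.4645631),
    (0.5314887, 0.5666940),
    (0.4139385, 0.4467867),
    (0.8332743, 0.8229312),
    (0.8431361, 0.8289914),
]

# mu: grand mean of the response over every arm of every test.
values = [v for pair in rows for v in pair]
mu_hat = sum(values) / len(values)

# sigma: under the assumption stated above, E[(A - B)^2] = 2 * sigma^2,
# so sigma^2 is estimated by mean(d^2) / 2.
diffs = [a - b for a, b in rows]
sigma_hat = math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))

print(f"mu_hat = {mu_hat:.4f}, sigma_hat = {sigma_hat:.4f}")
```

With only six tests the estimates will differ from the posterior summaries in the table, which are based on all 100 tests and a full model.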

Test & Roll. Phase I: Test - Modeling Results

Once you fit the model, you can compute the optimal sample size along with the profit per customer, the overall profit, the profit of the test phase, the profit of the deployment (roll) phase, and some other details. The model output tells you to use about 1145 subjects in each group. The expected profit per customer is $0.70, and the total expected profit is approximately 69,561, of which 1557 comes from the test phase and 68,004 from the roll (deployment) phase.

        n1       n2 profit_per_cust   profit profit_test profit_deploy
1 1144.924 1144.924       0.6956077 69560.77    1557.097      68003.67
  profit_rand profit_perfect profit_gain      regret error_rate deploy_1_rate
1       68000       69636.15   0.9539282 0.001082488 0.06946261           0.5
  tie_rate
1        0
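
The sample size and profit figures can be reproduced with the closed-form expressions for a symmetric two-arm normal model. The following is a minimal sketch under assumptions: the population size N = 100000, the response standard deviation s = 0.2176, and the rounded priors mu = 0.68 and sigma = 0.029 are not stated in the text; they are values that appear consistent with the table above. The function names are illustrative.

```python
import math


def optimal_sample_size(N, s, sigma):
    """Profit-maximizing test size per group for two symmetric arms."""
    r = (s / sigma) ** 2
    return math.sqrt(N / 4 * r + (0.75 * r) ** 2) - 0.75 * r


def expected_profit(n, N, s, mu, sigma):
    """Expected (test, deploy, total) profit with n subjects per group."""
    # During the test, 2n customers earn the prior mean response mu.
    test = 2 * n * mu
    # Expected lift from deploying the arm that wins the test.
    lift = math.sqrt(2) * sigma**2 / (
        math.sqrt(math.pi) * math.sqrt(2 * sigma**2 + 2 * s**2 / n)
    )
    deploy = (N - 2 * n) * (mu + lift)
    return test, deploy, test + deploy


# Assumed inputs (not given in the text, see the paragraph above).
N, s, mu, sigma = 100_000, 0.2176, 0.68, 0.029

n_opt = optimal_sample_size(N, s, sigma)
profit_test, profit_deploy, profit_total = expected_profit(n_opt, N, s, mu, sigma)

print(f"n per group   = {n_opt:.1f}")
print(f"profit_test   = {profit_test:.1f}")
print(f"profit_deploy = {profit_deploy:.1f}")
print(f"profit_total  = {profit_total:.1f}")
```

Under these assumed inputs the results can be compared directly with the n1, profit_test, profit_deploy, and profit columns of the table.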

Test & Roll. Phase II: Roll - Plots

How do you decide what to roll? You run the test in the field with 1272 subjects per group (somewhat more than the roughly 1145 per group computed in Phase I), which gives you new data for Phase II, the rollout phase. To decide which treatment to deploy, you let the model compute the posterior distribution of the average profit for each group from the test data. From the density plots you can see that version A is more likely than version B to yield the higher average profit.

Test & Roll. Phase II: Roll - Profit & Probabilities

Below you can find the average profit you obtain in each group:

# A tibble: 2 × 4
  group  mean    sd     n
  <chr> <dbl> <dbl> <int>
1 A     0.520 0.244  1272
2 B     0.505 0.245  1272

As in any Bayesian analysis, you report the average profit together with a credible interval. There is a 95% probability that the average profit for A lies in the range:
[1] 0.5067589 0.5336009
There is a 95% probability that the average profit for B lies in the range:
[1] 0.4915808 0.5184227
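
The intervals above can be approximated from the group summaries alone. This is a minimal sketch that assumes a normal posterior centered on the sample mean with standard deviation sd / sqrt(n), which is what a weak prior would give; it is not necessarily how the intervals in the text were computed. The means in the table are rounded to three decimals, so the endpoints agree only to about three decimal places. The names summary and interval_95 are illustrative.

```python
import math
from statistics import NormalDist


def interval_95(mean, sd, n):
    """Approximate 95% interval: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# group: (mean, sd, n), copied from the summary table above.
summary = {"A": (0.520, 0.244, 1272), "B": (0.505, 0.245, 1272)}

intervals = {group: interval_95(*stats) for group, stats in summary.items()}
for group, (lo, hi) in intervals.items():
    print(f"{group}: [{lo:.4f}, {hi:.4f}]")
```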

Once you have the posterior distribution for each group, you can compute the probability that one average profit exceeds the other, to get an idea of how sure you are before deciding which way to go. Here you compute the probability that the average profit of version A is greater than that of version B: there is a 94% chance that A is better than B, so deploying version A is the better bet.

[1] 0.9414846
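
This probability can be recovered from the two 95% credible intervals. The sketch below assumes that each posterior is normal and that the two posteriors are independent, so the difference A - B is also normal; the variable and function names are illustrative.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)


def normal_from_interval(lo, hi):
    """Recover (mean, sd) of a normal posterior from its central 95% interval."""
    return (lo + hi) / 2, (hi - lo) / (2 * Z_975)


# 95% credible intervals reported above.
mean_a, sd_a = normal_from_interval(0.5067589, 0.5336009)
mean_b, sd_b = normal_from_interval(0.4915808, 0.5184227)

# The difference of two independent normals is normal, with the means
# subtracted and the variances added.
diff = NormalDist(mean_a - mean_b, math.sqrt(sd_a**2 + sd_b**2))

# P(A > B) is the posterior mass of the difference above zero.
prob_a_better = 1 - diff.cdf(0)
print(f"P(A > B) = {prob_a_better:.3f}")
```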