bayesAB exercise

Bayesian analysis in R Exercise.

This exercise will use the bayesAB package.

The package can be downloaded using the following command: install.packages(“bayesAB”).

URL: https://www.r-bloggers.com/bayesian-ab-testing-made-easy/

Exercise created by: BC Mullins.

Date: 21 August 2017.

library(bayesAB)
set.seed(1)

The setup

Beta probability density function will have:

alpha = 10

beta = 10

The control will have prior distribution of 50%

The treatment will have a prior distribution of 30%

Essentially, we are testing the hypothesis that the treatment group is has a lower probability of having an event occur compared to the control (30% versus 50%, respectively). Therefore, the control group can have a prior that follows a Bernoulli distribution.

First collection – Start with 20 observations each

In the first collection, we assume that the control follows a prior Bernoulli distribution of 50%. Imagine that this is the probablity of having an event. Therefore, a 50% probability of having an event is like a coin flip. However, with treatment, the probability of having an event is lower, 30%. We start with a sample size of 20 for each group.

control_1 <- rbinom(20, 1, 0.5)
treatment_1 <- rbinom(20, 1, 0.3)

First Analysis

In the first analysis, the distance and size of the treatment and control are slightly different. However, the probabilty that the treatment is better than control is 91.3%.

test1 <- bayesTest(treatment_1, control_1, distribution = "bernoulli", priors = c("alpha" = 10, "beta" = 10))
print(test1)

## --------------------------------------------
## Distribution used: bernoulli 
## --------------------------------------------
## Using data with the following properties: 
##            A    B
## Min.    0.00 0.00
## 1st Qu. 0.00 0.00
## Median  0.00 1.00
## Mean    0.25 0.55
## 3rd Qu. 0.25 1.00
## Max.    1.00 1.00
## --------------------------------------------
## Priors used for the calculation: 
## $alpha
## [1] 10
## 
## $beta
## [1] 10
## 
## --------------------------------------------
## Calculated posteriors for the following parameters: 
## Probability 
## --------------------------------------------
## Monte Carlo samples generated per posterior: 
## [1] 1e+05

summary(test1)

## Quantiles of posteriors for A and B:
## 
## $Probability
## $Probability$A
##         0%        25%        50%        75%       100% 
## 0.09783095 0.32214060 0.37321051 0.42561620 0.69819657 
## 
## $Probability$B
##        0%       25%       50%       75%      100% 
## 0.2028917 0.4717612 0.5250980 0.5787883 0.8255518 
## 
## 
## --------------------------------------------
## 
## P(A > B) by (0)%: 
## 
## $Probability
## [1] 0.08677
## 
## --------------------------------------------
## 
## Credible Interval on (A - B) / B for interval length(s) (0.9) : 
## 
## $Probability
##          5%         95% 
## -0.54229787  0.07151629 
## 
## --------------------------------------------
## 
## Posterior Expected Loss for choosing B over A:
## 
## $Probability
## [1] 0.4880871

plot(test1)

Second Collection – Now, we add 20 more observations

In the second collection, 20 additional observations are added to increase the sample size to 40. The expectation is that the more data that are available, the better the precision of the Bayesian posterior distribution. Again, maintaining that they control follows a prior Bernoulli distribution of 50% and the treatment follows a prior distribution of 30%, we obtain the following:

control_2 <- rbind(control_1, rbinom(20, 1, 0.5))
treatment_2 <- rbind(treatment_1, rbinom(20, 1, 0.3))

Second Analysis

In the second analysis, the distance and size between the two distrubitions have increased. You can now see that the probability that treatment is better than the control is 96.9%.

test2 <- bayesTest(treatment_2, control_2, distribution = "bernoulli", priors = c("alpha" = 10, "beta" = 10))
print(test2)

## --------------------------------------------
## Distribution used: bernoulli 
## --------------------------------------------
## Using data with the following properties: 
##        A                B               
##   [1,] "Min.   :0.00  " "Min.   :0.00  "
##   [2,] "1st Qu.:0.25  " "1st Qu.:0.25  "
##   [3,] "Median :0.50  " "Median :0.50  "
##   [4,] "Mean   :0.50  " "Mean   :0.50  "
##   [5,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
##   [6,] "Max.   :1.00  " "Max.   :1.00  "
##   [7,] "Min.   :0  "    "Min.   :0.00  "
##   [8,] "1st Qu.:0  "    "1st Qu.:0.25  "
##   [9,] "Median :0  "    "Median :0.50  "
##  [10,] "Mean   :0  "    "Mean   :0.50  "
##  [11,] "3rd Qu.:0  "    "3rd Qu.:0.75  "
##  [12,] "Max.   :0  "    "Max.   :1.00  "
##  [13,] "Min.   :0.00  " "Min.   :0.00  "
##  [14,] "1st Qu.:0.25  " "1st Qu.:0.25  "
##  [15,] "Median :0.50  " "Median :0.50  "
##  [16,] "Mean   :0.50  " "Mean   :0.50  "
##  [17,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
##  [18,] "Max.   :1.00  " "Max.   :1.00  "
##  [19,] "Min.   :0.00  " "Min.   :0.00  "
##  [20,] "1st Qu.:0.25  " "1st Qu.:0.25  "
##  [21,] "Median :0.50  " "Median :0.50  "
##  [22,] "Mean   :0.50  " "Mean   :0.50  "
##  [23,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
##  [24,] "Max.   :1.00  " "Max.   :1.00  "
##  [25,] "Min.   :0  "    "Min.   :0.00  "
##  [26,] "1st Qu.:0  "    "1st Qu.:0.25  "
##  [27,] "Median :0  "    "Median :0.50  "
##  [28,] "Mean   :0  "    "Mean   :0.50  "
##  [29,] "3rd Qu.:0  "    "3rd Qu.:0.75  "
##  [30,] "Max.   :0  "    "Max.   :1.00  "
##  [31,] "Min.   :0  "    "Min.   :1  "   
##  [32,] "1st Qu.:0  "    "1st Qu.:1  "   
##  [33,] "Median :0  "    "Median :1  "   
##  [34,] "Mean   :0  "    "Mean   :1  "   
##  [35,] "3rd Qu.:0  "    "3rd Qu.:1  "   
##  [36,] "Max.   :0  "    "Max.   :1  "   
##  [37,] "Min.   :0.00  " "Min.   :0.00  "
##  [38,] "1st Qu.:0.25  " "1st Qu.:0.25  "
##  [39,] "Median :0.50  " "Median :0.50  "
##  [40,] "Mean   :0.50  " "Mean   :0.50  "
##  [41,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
##  [42,] "Max.   :1.00  " "Max.   :1.00  "
##  [43,] "Min.   :0  "    "Min.   :1  "   
##  [44,] "1st Qu.:0  "    "1st Qu.:1  "   
##  [45,] "Median :0  "    "Median :1  "   
##  [46,] "Mean   :0  "    "Mean   :1  "   
##  [47,] "3rd Qu.:0  "    "3rd Qu.:1  "   
##  [48,] "Max.   :0  "    "Max.   :1  "   
##  [49,] "Min.   :0.00  " "Min.   :0.00  "
##  [50,] "1st Qu.:0.25  " "1st Qu.:0.25  "
##  [51,] "Median :0.50  " "Median :0.50  "
##  [52,] "Mean   :0.50  " "Mean   :0.50  "
##  [53,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
##  [54,] "Max.   :1.00  " "Max.   :1.00  "
##  [55,] "Min.   :0  "    "Min.   :0.00  "
##  [56,] "1st Qu.:0  "    "1st Qu.:0.25  "
##  [57,] "Median :0  "    "Median :0.50  "
##  [58,] "Mean   :0  "    "Mean   :0.50  "
##  [59,] "3rd Qu.:0  "    "3rd Qu.:0.75  "
##  [60,] "Max.   :0  "    "Max.   :1.00  "
##  [61,] "Min.   :0.00  " "Min.   :0  "   
##  [62,] "1st Qu.:0.25  " "1st Qu.:0  "   
##  [63,] "Median :0.50  " "Median :0  "   
##  [64,] "Mean   :0.50  " "Mean   :0  "   
##  [65,] "3rd Qu.:0.75  " "3rd Qu.:0  "   
##  [66,] "Max.   :1.00  " "Max.   :0  "   
##  [67,] "Min.   :0  "    "Min.   :0.00  "
##  [68,] "1st Qu.:0  "    "1st Qu.:0.25  "
##  [69,] "Median :0  "    "Median :0.50  "
##  [70,] "Mean   :0  "    "Mean   :0.50  "
##  [71,] "3rd Qu.:0  "    "3rd Qu.:0.75  "
##  [72,] "Max.   :0  "    "Max.   :1.00  "
##  [73,] "Min.   :0.00  " "Min.   :1  "   
##  [74,] "1st Qu.:0.25  " "1st Qu.:1  "   
##  [75,] "Median :0.50  " "Median :1  "   
##  [76,] "Mean   :0.50  " "Mean   :1  "   
##  [77,] "3rd Qu.:0.75  " "3rd Qu.:1  "   
##  [78,] "Max.   :1.00  " "Max.   :1  "   
##  [79,] "Min.   :0  "    "Min.   :0  "   
##  [80,] "1st Qu.:0  "    "1st Qu.:0  "   
##  [81,] "Median :0  "    "Median :0  "   
##  [82,] "Mean   :0  "    "Mean   :0  "   
##  [83,] "3rd Qu.:0  "    "3rd Qu.:0  "   
##  [84,] "Max.   :0  "    "Max.   :0  "   
##  [85,] "Min.   :0.00  " "Min.   :0.00  "
##  [86,] "1st Qu.:0.25  " "1st Qu.:0.25  "
##  [87,] "Median :0.50  " "Median :0.50  "
##  [88,] "Mean   :0.50  " "Mean   :0.50  "
##  [89,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
##  [90,] "Max.   :1.00  " "Max.   :1.00  "
##  [91,] "Min.   :0  "    "Min.   :0  "   
##  [92,] "1st Qu.:0  "    "1st Qu.:0  "   
##  [93,] "Median :0  "    "Median :0  "   
##  [94,] "Mean   :0  "    "Mean   :0  "   
##  [95,] "3rd Qu.:0  "    "3rd Qu.:0  "   
##  [96,] "Max.   :0  "    "Max.   :0  "   
##  [97,] "Min.   :0.00  " "Min.   :0.00  "
##  [98,] "1st Qu.:0.25  " "1st Qu.:0.25  "
##  [99,] "Median :0.50  " "Median :0.50  "
## [100,] "Mean   :0.50  " "Mean   :0.50  "
## [101,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
## [102,] "Max.   :1.00  " "Max.   :1.00  "
## [103,] "Min.   :0  "    "Min.   :1  "   
## [104,] "1st Qu.:0  "    "1st Qu.:1  "   
## [105,] "Median :0  "    "Median :1  "   
## [106,] "Mean   :0  "    "Mean   :1  "   
## [107,] "3rd Qu.:0  "    "3rd Qu.:1  "   
## [108,] "Max.   :0  "    "Max.   :1  "   
## [109,] "Min.   :0.00  " "Min.   :0.00  "
## [110,] "1st Qu.:0.25  " "1st Qu.:0.25  "
## [111,] "Median :0.50  " "Median :0.50  "
## [112,] "Mean   :0.50  " "Mean   :0.50  "
## [113,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
## [114,] "Max.   :1.00  " "Max.   :1.00  "
## [115,] "Min.   :0.00  " "Min.   :0.00  "
## [116,] "1st Qu.:0.25  " "1st Qu.:0.25  "
## [117,] "Median :0.50  " "Median :0.50  "
## [118,] "Mean   :0.50  " "Mean   :0.50  "
## [119,] "3rd Qu.:0.75  " "3rd Qu.:0.75  "
## [120,] "Max.   :1.00  " "Max.   :1.00  "
## --------------------------------------------
## Priors used for the calculation: 
## $alpha
## [1] 10
## 
## $beta
## [1] 10
## 
## --------------------------------------------
## Calculated posteriors for the following parameters: 
## Probability 
## --------------------------------------------
## Monte Carlo samples generated per posterior: 
## [1] 1e+05

summary(test2)

## Quantiles of posteriors for A and B:
## 
## $Probability
## $Probability$A
##        0%       25%       50%       75%      100% 
## 0.1239998 0.3071300 0.3483043 0.3905735 0.6352761 
## 
## $Probability$B
##        0%       25%       50%       75%      100% 
## 0.2458075 0.4731871 0.5166164 0.5602642 0.7863936 
## 
## 
## --------------------------------------------
## 
## P(A > B) by (0)%: 
## 
## $Probability
## [1] 0.03116
## 
## --------------------------------------------
## 
## Credible Interval on (A - B) / B for interval length(s) (0.9) : 
## 
## $Probability
##          5%         95% 
## -0.53509752 -0.04492493 
## 
## --------------------------------------------
## 
## Posterior Expected Loss for choosing B over A:
## 
## $Probability
## [1] 0.5036452

plot(test2)

Conclusion

As you increase the number of sample (trials), the better the Bayesian posterior probability becomes with each subsequent sample. This is akin to increasing the precision. But from a Bayesian perspective, this is simply updating the Bayesian prior each time the sample or trial increases.

Thus, with Bayesian analysis, we can do more than just determine that a treatment is better. We can also inform stakeholder how many times (or the probability) that the treatment will be better than the control. This is one reason why the FDA requires at least two randomized control trial in order to make sure that the second trial updates our Bayesian prior and generates a new posterior distribution.

Credit

BC Mullins created this exercise in R-Exercises which can be found here: http://www.r-exercises.com/2017/08/21/bayesian_ab_testing_made_easy/