This is a riff on how Bayesian analysis solves the multiple testing problem, as shown here.
There is nothing wrong with that post. But I’m no Bayesian, and I have no idea what exactly MCMC does. So, in my day job, I have to get by with simpler kit.
The question is whether five versions of a website – labeled A, B, C, D, E – each of which gets so many clicks and so many signups, are actually different from each other, or whether the observed differences are due to chance alone. Which design you choose depends on the answer, and your front-end guys are eagerly awaiting your verdict.
Trying to answer it with a Chi-squared test (for equality of proportions) will trip you up: not the test itself, but the task of choosing the correct significance level, which has to account for the fact that you’re making multiple pairwise comparisons. You knew that.
But there is a simple alternative to applying a Bonferroni correction, or to rigging up some kind of hierarchical apparatus as proposed in the post above: simulate a bunch of Bernoulli draws, plot the density functions, and take a good look. If the difference is visible to the naked eye, you’re done; if it’s not quite so obvious, an extra line of R code is all you need to settle the matter.
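For the record, here is a minimal sketch of the classical route, using base R’s `pairwise.prop.test()` with a Bonferroni adjustment on the counts tabulated below; the adjustment method is exactly the fiddly choice I’d rather not have to defend:

```r
# classical route: pairwise tests for equality of proportions,
# Bonferroni-adjusted for the multiple comparisons
trials <- c(1055, 1057, 1065, 1039, 1046)
successes <- c(28, 45, 69, 58, 60)
pairwise.prop.test(successes, trials, p.adjust.method = "bonferroni")
```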
Bernoulli draws are an appropriate way to model the probability of a signup, because signing up is a yes/no choice made by somebody who clicks on your site. Use the empirical probabilities obtained by taking the ratio of observed successes (signups) to trials (clicks), tabulated below:
```r
website <- LETTERS[1:5]
trials <- c(1055, 1057, 1065, 1039, 1046)
successes <- c(28, 45, 69, 58, 60)
probs <- successes / trials
knitr::kable(data.frame(website, trials, successes), digits = 2)
```
| website | trials | successes |
|---|---|---|
| A | 1055 | 28 |
| B | 1057 | 45 |
| C | 1065 | 69 |
| D | 1039 | 58 |
| E | 1046 | 60 |
And here’s how you do that in R:
```r
library(dplyr)
library(ggplot2)

# simulate: 10,000 runs of 1,000 visitors each, per website,
# converting the simulated success counts to signup rates
names(probs) <- website
foo <- data.frame(sapply(probs, function(x) rbinom(10000, 1000, x)) / 1000)

# rearrange: wide to long, one row per simulated rate
foolong <- reshape(foo, direction = 'long',
                   varying = list(names(foo)), v.names = 'rate') %>%
  rename(website = time) %>%
  mutate(website = LETTERS[website])

# eyeball: overlay the five simulated densities
ggplot(foolong, aes(x = rate)) +
  geom_density(aes(group = website, colour = website))
```
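One caveat: the simulation draws 1,000 visitors per run, while the observed samples run a shade over that. If you’d rather match the actual sample sizes, a minimal variant (my tweak, not part of the original recipe) is:

```r
# same idea, but give each website its observed number of trials
foo2 <- data.frame(mapply(function(p, n) rbinom(10000, n, p) / n,
                          probs, trials))
```

With samples this close to 1,000, the picture should barely change.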
I’d say that websites C through E are hard to tell apart, but sure enough, they all beat website A. So I’d pick C over A in good conscience, tell the front-end guys, and not worry about the multiple testing problem.
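If you want a number behind the “hard to tell apart” claim, the same trick I’m about to use on B versus A works here too; for instance, the share of runs in which C fails to beat its closest rival, E:

```r
# how often does C fail to beat its closest rival, E?
round(sum(foo$C <= foo$E) / nrow(foo), digits = 2)
```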
Now, if I had to compare B to A, maybe I would take a closer look, like so (OK, three extra lines of R code):
```r
# difference in simulated signup rates, run by run
diff_BA <- foo$B - foo$A
# share of runs in which B fails to beat A
round(sum(diff_BA <= 0) / nrow(foo), digits = 2)
## [1] 0.03
plot(density(diff_BA))
```
I expect B to beat A 97% of the time, which is pretty good. I’d pick B if, for some reason, options C through E were not available.
This is all we care about, and we move on. Sometimes, in the private sector, one is in a hurry.