Swedish Fish Incorporated is the largest Swedish company delivering fish by mail order. They are now trying to get into the lucrative Danish market by selling one year Salmon subscriptions. The marketing department have done a pilot study and tried two different marketing methods:

A: Sending a mail with a colorful brochure that invites people to sign up for a one year salmon subscription.

B: Sending a colorful brochure that invites people to sign up for a one year salmon subscription and that includes a free salmon.

The marketing department sent out 16 mails of type A and 16 mails of type B. Four Danes that received a mail of type A signed up for one year of salmon, and eight Danes that received a mail of type B signed up!

The marketing department now wants to know, which method should we use, A or B?

At the bottom of this document you’ll find a solution. But try yourself first!

Question I: Build a Bayesian model that answers the question: What is the probability that method B is better than method A?

Hint 1: As part of you generative model you’ll want to use the binomial distribution, which you can sample from in R using the rbinom(n, size, prob). The binomial distribution simulates the following process n times: The number of “successes” when performing size trials, where the probability of “success” is prob.

Hint 2: A commonly used prior for the unknown probability of success in a binomial distribution is a uniform distribution from 0 to 1. You can draw from this distribution by running runif(1, min = 0, max = 1)

Hint 3: Here is a code scaffold that you can build upon.

# Number of random draws from the prior
n_draws <- 10000

prior <- ... # Here you sample n_draws draws from the prior  
hist(prior) # It's always good to eyeball the prior to make sure it looks ok.

# Here you define the generative model
generative_model <- function(parameters) {
  ...
}

# Here you simulate data using the parameters from the prior and the 
# generative model
sim_data <- rep(NA, n_draws)
for(i in 1:n_draws) {
  sim_data[i] <- generative_model(prior[i])
}

# Here you filter off all draws that do not match the data.
posterior <- prior[sim_data == observed_data] 

hist(posterior) # Eyeball the posterior
length(posterior) # See that we got enought draws left after the filtering.
                  # There are no rules here, but you probably want to aim
                  # for >1000 draws.

# Now you can summarize the posterior, where a common summary is to take the mean
# or the median posterior, and perhaps a 95% quantile interval.
median(posterior)
quantile(posterior, c(0.025, 0.975))

Hint 4: As you will probably have two parameters, it will be convenient to put both the prior and posterior samples into data frames. Similarly, your generative model will probably have to return a vector of two data points.

Hint 5: (last hint, I promise) Depending on how you define this model, it might be very slow to fit, and you might have to settle for quite few posterior samples.

Here are two bonus question, in case you have time left after answering Question I.

(Bonus) Question II: Change the model so that it uses a more informative prior. What is now the probability that method B is better than method A?

The marketing department are starting to believe that it was a fluke that such a large proportion of the Danes signed up. In all other European markets the proportion that signs up for a year of salmon is around 5% to 15%, even when given a free salmon. Use this information and make the priors in your model more informative.

Hint 1: This can be done in a million ways and there isn’t any “right” answer to this question. Just do something quick ’n dirty.

Hint 2: It would however be cool if you used a prior that wasn’t uniform. A good distribution, when crafting priors with support over [0, 1], is the beta distribution. You can sample from a beta distribution using the rbeta(1, shape1, shape2) function in R, where shape1 and shape2 surprisingly defines the shape of the distribution.

Hint 3: An easy way to plot a beta distribution (and to explore what the shape parameters really do) is to run the following:

hist(rbeta(9999, shape1 = 4, shape2 = 5), xlim=c(0, 1))

(Bonus) Question III: So what should we do? Make a simple decision analysis.

The economy department gives you the following information:

A mail of type A costs 30 kr to send out.
A mail of type B costs 300 kr to send out (due to the cost of the free salmon).
A salmon subscription brings in 1000 kr in revenue.

Which method, A or B, is most likely to make Swedish Fish Incorporated the most money?

Hint 1: This should require no changes to your model. It should suffice to post process the samples.

Hint 2: If prob_a is the probability that someone will sign up when receiving a type A mail then the expected profit is 1000 * prob_a - 30

Solutions (but this can be done in many ways)

Note that the code below has been optimized for clarity, not speed, so note that it can be very slow.

Question I

# Number of random draws from the prior
# As this model is really slow, we will have to settle for fewer draws than we would like to have
n_draws <- 200000

prior <- data.frame(pA = runif(n_draws, 0, 1),
                    pB = runif(n_draws, 0, 1))

# Here you define the generative model
generative_model <- function(pA, pB) {
  subcribersA <- rbinom(1, 16, pA)
  subcribersB <- rbinom(1, 16, pB)
  c(subcribersA = subcribersA, subcribersB = subcribersB)
}

# Here you simulate data using the parameters from the prior and the 
# generative model. While we could use a for loop here, I'm using sapply
# (the t function is just to transpose the reulting matrix so that
# it is tall rather than wide)
sim_data <- as.data.frame(t(sapply(1:n_draws, function(i) {
  generative_model(prior$pA[i], prior$pB[i])
})))

# Here you filter off all draws that do not match the data.
posterior <- prior[sim_data$subcribersA == 4 & sim_data$subcribersB == 8, ] 

hist(posterior$pA) # Eyeball the posterior

hist(posterior$pB) # Eyeball the posterior

nrow(posterior) # See that we got enought draws left after the filtering.

## [1] 709

# Now you can summarize the posterior, where a common summary is to take the mean
# or the median posterior
median(posterior$pA)

## [1] 0.2717

median(posterior$pB)

## [1] 0.4999

# The probability that underlying proportion of success is larger for A than for B
hist(posterior$pB -posterior$pA)

mean(posterior$pB -posterior$pA > 0)

## [1] 0.9309

Question II

There are an unlimited ways of doing this, and here I’m just going to go with something that I think is decent but surely not perfect.

# We will represent the background knowledge using the following beta distribution which is mostly focused on the region 0.05-0.15.

hist(rbeta(9999, shape1 = 2, shape2 =15), xlim=c(0, 1), 30)
lines(c(0.05, 0.15), c(0,0), col="red", lwd = 3)

#Except for the prior the model below is the same as in question I.

# Number of random draws from the prior
# This might take more than a minute to run.
n_draws <- 2000000

prior <- data.frame(pA = rbeta(n_draws, shape1 = 3.5, shape2 =30),
                    pB = rbeta(n_draws, shape1 = 3.5, shape2 =30))

# Here you define the generative model
generative_model <- function(pA, pB) {
  subcribersA <- rbinom(1, 16, pA)
  subcribersB <- rbinom(1, 16, pB)
  c(subcribersA = subcribersA, subcribersB = subcribersB)
}

# Here you simulate data using the parameters from the prior and the 
# generative model. While we could use a for loop here, I'm using sapply
# (the t function is just to transpose the reulting matrix so that
# it is tall rather than wide)
sim_data <- as.data.frame(t(sapply(1:n_draws, function(i) {
  generative_model(prior$pA[i], prior$pB[i])
})))

# Here you filter off all draws that do not match the data.
posterior <- prior[sim_data$subcribersA == 4 & sim_data$subcribersB == 8, ] 

hist(posterior$pA) # Eyeball the posterior

hist(posterior$pB) # Eyeball the posterior

nrow(posterior) # See that we got enought draws left after the filtering.

## [1] 166

# Now you can summarize the posterior, where a common summary is to take the mean
# or the median posterior
median(posterior$pA)

## [1] 0.1444

median(posterior$pB)

## [1] 0.229

# The probability that underlying proportion of success is larger for A than for B
hist(posterior$pB - posterior$pA)

mean(posterior$pB -posterior$pA > 0)

## [1] 0.8735

Question III

Here we don’t have to make any changes to the model, it is enough to “post-process” the posterior distribution in posterior.

# calculating the estimated posterior profit using method A (or B)
# a cost of 30 kr + the average profit per sent out add
profitA <- -30 + posterior$pA * 1000 
profitB <- -300 + posterior$pB * 1000 
hist(profitA)

hist(profitB)

hist(profitA - profitB)
expected_profit_diff <- mean(profitA - profitB)
abline(v = expected_profit_diff, col = "red", lwd =2)

The expected profit when using method A is around 190 kr higher than for method B (which actually has a negative expected profit). So I guess sending free salmon to people isn’t the best idea…

Exercise 1: Bayesian A/B testing for Swedish Fish Incorporated