Swedish Fish Incorporated is the largest Swedish company delivering fish by mail order. They are now trying to get into the lucrative Danish market by selling one year Salmon subscriptions. The marketing department have done a pilot study and tried two different marketing methods:
A: Sending a mail with a colorful brochure that invites people to sign up for a one year salmon subscription.
B: Sending a colorful brochure that invites people to sign up for a one year salmon subscription and that includes a free salmon.
The marketing department sent out 16 mails of type A and 16 mails of type B. Four Danes that received a mail of type A signed up for one year of salmon, and eight Danes that received a mail of type B signed up!
The marketing department now wants to know, which method should we use, A or B?
Hint 1: As part of you generative model you’ll want to use the binomial distribution, which you can sample from in R using the rbinom(n, size, prob)
. The binomial distribution simulates the following process n
times: The number of “successes” when performing size
trials, where the probability of “success” is prob
.
Hint 2: A commonly used prior for the unknown probability of success in a binomial distribution is a uniform distribution from 0 to 1. You can draw from this distribution by running runif(1, min = 0, max = 1)
Hint 3: Here is a code scaffold that you can build upon.
# Number of random draws from the prior
n_draws <- 10000
prior <- ... # Here you sample n_draws draws from the prior
hist(prior) # It's always good to eyeball the prior to make sure it looks ok.
# Here you define the generative model
generative_model <- function(parameters) {
...
}
# Here you simulate data using the parameters from the prior and the
# generative model
sim_data <- rep(NA, n_draws)
for(i in 1:n_draws) {
sim_data[i] <- generative_model(prior[i])
}
# Here you filter off all draws that do not match the data.
posterior <- prior[sim_data == observed_data]
hist(posterior) # Eyeball the posterior
length(posterior) # See that we got enought draws left after the filtering.
# There are no rules here, but you probably want to aim
# for >1000 draws.
# Now you can summarize the posterior, where a common summary is to take the mean
# or the median posterior, and perhaps a 95% quantile interval.
median(posterior)
quantile(posterior, c(0.025, 0.975))
Hint 4: As you will probably have two parameters, it will be convenient to put both the prior and posterior samples into data frames. Similarly, your generative model will probably have to return a vector of two data points.
Hint 5: (last hint, I promise) Depending on how you define this model, it might be very slow to fit, and you might have to settle for quite few posterior samples.
The marketing department are starting to believe that it was a fluke that such a large proportion of the Danes signed up. In all other European markets the proportion that signs up for a year of salmon is around 5% to 15%, even when given a free salmon. Use this information and make the priors in your model more informative.
Hint 1: This can be done in a million ways and there isn’t any “right” answer to this question. Just do something quick ’n dirty.
Hint 2: It would however be cool if you used a prior that wasn’t uniform. A good distribution, when crafting priors with support over [0, 1], is the beta distribution. You can sample from a beta distribution using the rbeta(1, shape1, shape2)
function in R, where shape1
and shape2
surprisingly defines the shape of the distribution.
Hint 3: An easy way to plot a beta distribution (and to explore what the shape
parameters really do) is to run the following:
hist(rbeta(9999, shape1 = 4, shape2 = 5), xlim=c(0, 1))
The economy department gives you the following information:
Which method, A or B, is most likely to make Swedish Fish Incorporated the most money?
Hint 1: This should require no changes to your model. It should suffice to post process the samples.
Hint 2: If prob_a
is the probability that someone will sign up when receiving a type A mail then the expected profit is 1000 * prob_a - 30
# Number of random draws from the prior
n_draws <- 50000
prior <- data.frame(pA = runif(n_draws, 0, 1),
pB = runif(n_draws, 0, 1))
# Here you define the generative model
generative_model <- function(pA, pB) {
subcribersA <- rbinom(1, 16, pA)
subcribersB <- rbinom(1, 16, pB)
c(subcribersA, subcribersB)
}
# Here you simulate data using the parameters from the prior and the
# generative model
sim_data <- data.frame(subcribersA = rep(NA, n_draws), subcribersB = rep(NA, n_draws))
for(i in 1:n_draws) {
sim_data[i,] <- generative_model(prior$pA[i], prior$pB[i])
}
# Here you filter off all draws that do not match the data.
posterior <- prior[sim_data$subcribersA == 4 & sim_data$subcribersB == 8, ]
hist(posterior$pA) # Eyeball the posterior
hist(posterior$pB) # Eyeball the posterior
nrow(posterior) # See that we got enought draws left after the filtering.
# Now you can summarize the posterior, where a common summary is to take the mean
# or the median posterior
median(posterior$pA)
median(posterior$pB)
# The probability that underlying proportion of success is larger for A than for B
hist(posterior$pB -posterior$pA)
mean(posterior$pB -posterior$pA > 0)