This is some practice to learning Bayesian Ideas in R.
Swedish Fish Incorporated is the largest Swedish company delivering fish by mail order. They are now trying to get into the lucrative Danish market by selling one year Salmon subscriptions. The marketing department have done a pilot study and tried the following marketing method:
A: Sending a mail with a colorful brochure that invites people to sign up for a one year salmon subscription.
The marketing department sent out 16 mails of type A. Six Danes that received a mail signed up for one year of salmon and marketing now wants to know, how good is method A? This questions comes from http://www.sumsar.net/files/posts/2017-bayesian-tutorial-exercises/modeling_exercise1.html
What do we know.
Data: 6/16 people signed up
Process: Binomial process because we are counting successes from mail sent.
We want to know the rate \(p_{success}\) at which people respond. We need to draw from a prior distribution and then condition on the data.
We are going to say that the prior at which people respond is between 0 and 1 with equal likely chance across the board. We do this with a uniform distribution.
\[ p_{success} \sim Uniform(min = 0, max = 1) \]
set.seed(123)
n_draws <- 10000 #Create 10,000 draws
prior <- runif(n_draws, min = 0, max = 1) #r means random and unif is uniform
hist(prior)
Now that we have a prior distribution for our parameter we can generate our data based on the prior. By using the rbinom to simulate a binomial process for 16 events
sim_data <- rbinom(n_draws, 16, prior)
hist(sim_data)
This is the prior data.
Now we condition our data which was 6/16 people responded so 6 successes to get our posterior
posterior <- prior[sim_data == 6]
hist(posterior)
print(quantile(posterior,c(.025,.5,.975)))
## 2.5% 50% 97.5%
## 0.1836081 0.3914988 0.6257492
We can determine that the rate at which people respond is most likely .39 between .18 and .611. From our data the rate was 0.375 which is close to the median in the posterior.
The method used to solve this problem is called Approximate Bayesian Computation and is computationally slow.
I just bought batteries on amazon. I can see the reviews and most are 5 stars but a couple are 1 star or 2 stars. I want to determine the probability of a 5 star review.
Data shows 78% are 5 stars reviews out of 30,207 reviews.
Process: Binomial
I am looking for \(p\) in a Binomial process. I am going to use a prior of the beta distribution that is left skewed because I Victor Feagins like batteries have not had too much trouble with them in the past.
n_draws.a <- 1000000
prior.a <- rbeta(n_draws.a, 8, 2)
#prior.a <- runif(n_draws.a, 0, 1)
hist(prior.a)
Now we have our prior let’s condition on it the data we know.
sim_data.a <- rbinom(n_draws.a, 30207, prior.a)
posterior.a <- prior.a[sim_data.a == (round(30207*.78))]
hist(posterior.a)
print(quantile(posterior.a,c(.025,.5,.975)))
## 2.5% 50% 97.5%
## 0.7742535 0.7800127 0.7847800
I believe this means that there is a .78 chance to give a 5 star review between .77 and .78.