Typical Bayes Theorem Example
Blood test that can correctly detect vampirism 95% of the time
Pr(positive test result|vampire) = 0.95
It also has a false positive rate of 1%
Pr(positive test result|mortal) = 0.01
It’s accurate, but not many people are vampires, only 0.1% of the population.
Pr(vampire) = 0.001
Using Bayes Theorem, we can calculate Pr(vampire|positive), or the true positive rate.
\[ \text{Pr(vampire|positive)} = \frac{\text{Pr(positive|vampire)Pr(vampire)}}{\text{Pr(positive)}} \]
\[ \text{Pr(positive)} = \text{Pr(positive|mortal)Pr(vampire)} + \text{Pr(positive|mortal) (1 − Pr(vampire))} \]
Pr_Positive_Vampire <- 0.95
Pr_Positive_Mortal <- 0.01
Pr_Vampire <- 0.001
Pr_Positive <- Pr_Positive_Vampire * Pr_Vampire +
Pr_Positive_Mortal * (1 - Pr_Vampire)
(Pr_Vampire_Positive <- Pr_Positive_Vampire * Pr_Vampire / Pr_Positive)
## [1] 0.08683729
That means that there is only a 8.7% chance that the suspect is actually a vampire.
Tests that have a low prior probability, even if they are very accurate, will have a lot of false positives.
Sampling from the Posterior
We will work with drawing samples from the posterior instead of the equations that make them up since samples are easier to work with and MCMC only produces them.
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prob_p <- rep(1, 1000) # uniform prior
prob_data <- dbinom(6, size = 9, prob = p_grid)
posterior <- prob_data * prob_p
posterior <- posterior / sum(posterior)
Now we can draw samples from the posterior using
sample().
samples <- sample(p_grid, prob = posterior, size = 1e4, replace = TRUE)
We can plot the density of the samples. The more samples we draw, the more accurate the samples get to the actual posterior.
dens(samples)
Now that we have the samples, we need to learn how to work with them.
Working With Posterior Samples
It’s super easy to find how much of the posterior is below 0.5.
sum(samples < 0.5) / length(samples)
## [1] 0.1775
Confidence Intervals
We will rename confidence interval to compatibility interval to avoid the unwarranted implications of “confidence.” The interval indicates the range of parameter values compatible with the model and data.
# finding the 80th percentile of the data
quantile(samples, 0.8)
## 80%
## 0.7607608
# finding the 10th and 90th percentiles
quantile(samples, c(0.1, 0.9))
## 10% 90%
## 0.4474474 0.8128128
Highest Posterior Density Interval (HPDI) = narrowest interval containing the specified probability mass. “This interval captures the parameters with highest posterior probability”
quantile(samples)
## 0% 25% 50% 75% 100%
## 0.1551552 0.5415415 0.6436436 0.7397397 0.9739740
HPDI(samples, prob = 0.5)
## |0.5 0.5|
## 0.5645646 0.7597598
They are similar since this is a bell curve, but when the curve is not bell shaped, they will be more different.
Loss Function
In order to report ONE value, instead of the entire posterior, we can use a loss function that will tell us the best value to report that minimizes the distance between that value and every other possible value.
The MEDIAN is the absolute loss function for a posterior distribution.
The MEAN is the quadratic loss function for a posterior distribution.
It’s always best to communicate the entire posterior, not just a single value.
Simulation
Dummy Data Generation
Going back to our globe tossing model, we can simulate a globe toss using our model. 0.7 is about the true proportion of water on Earth.
dbinom(0:2, size = 2, prob = 0.7)
## [1] 0.09 0.42 0.49
This means that there’s a 9% chance of observing \(w = 0\), a 42% chance of \(w = 1\), and a 49% chance of \(w = 2\).
We can use rbinom() to generate random tosses given the
number of tosses and the prob of each toss being heads (or water in this
case).
rbinom(1, size = 2, prob = 0.7) # 1 simulation
## [1] 2
Let’s make a table of the frequency of each number of water.
dummy_w <- rbinom(1e5, size = 2, prob = 0.7)
table(dummy_w) / length(dummy_w)
## dummy_w
## 0 1 2
## 0.09117 0.42086 0.48797
A graph of a simulation using 9 tosses with the same probability for each toss.
dummy_w <- rbinom(1e5, size = 9, prob = 0.7)
simplehist(dummy_w, xlab = "dummy water count")
Posterior Predictive Distribution
We want to make a weighted average of the distributions created by each (and every) value in the sampled posterior.
w <- rbinom(1e4, size = 9, prob = samples) # using prob = samples makes it weighted
simplehist(w, main = "Posterior Predictive Distribution of our Sampled Posterior")
Practice
Easy
This code will allow my results to be reproducible and match the book’s results.
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- rep(1, 1000)
likelihood <- dbinom(6, size = 9, prob = p_grid)
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
set.seed(100)
samples <- sample(p_grid, prob = posterior, size = 1e4, replace = TRUE)
3E1. How much posterior probability lies below p = 0.2?
sum(samples < 0.2) / length(samples)
## [1] 4e-04
A very small amount.
3E2. How much posterior probability lies above p = 0.8?
sum(samples > 0.8) / length(samples)
## [1] 0.1116
3E3. How much posterior probability lies between p = 0.2 and p = 0.8?
sum(samples > 0.2 & samples < 0.8) / length(samples)
## [1] 0.888
3E4. 20% of the posterior probability lies below which value of p?
quantile(samples, 0.2)
## 20%
## 0.5185185
3E5. 20% of the posterior probability lies above which value of p?
quantile(samples, 1 - 0.2)
## 80%
## 0.7557558
3E6. Which values of p contain the narrowest interval equal to 66% of the posterior probability?
HPDI(samples, prob = 0.66)
## |0.66 0.66|
## 0.5085085 0.7737738
3E7. Which values of p contain 66% of the posterior probability, assuming equal posterior probability both below and above the interval?
# these are the same thing
quantile(samples, c(0.17, 0.83))
## 17% 83%
## 0.5025025 0.7697698
PI(samples, prob = 0.66)
## 17% 83%
## 0.5025025 0.7697698
Medium
3M1. Suppose the globe tossing data had turned out to be 8 water in 15 tosses. Construct the posterior distribution, using grid approximation. Use the same flat prior as before.
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- rep(1, 1000)
likelihood <- dbinom(8, size = 15, prob = p_grid)
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
plot(posterior ~ p_grid, type = "l")
3M2. Draw 10,000 samples from the grid approximation from above. Then use the samples to calculate the 90% HPDI for p.
samples <- sample(p_grid, prob = posterior, size = 1e4, replace = T)
HPDI(samples, prob = 0.9)
## |0.9 0.9|
## 0.3293293 0.7167167
3M3. Construct a posterior predictive check for this model and data. This means simulate the distribution of samples, averaging over the posterior uncertainty in p. What is the probability of observing 8 water in 15 tosses?
w <- rbinom(1e4, 15, prob = samples)
Seeing how many equal the data (8 tosses were water). This is the Pr of observing 8 water in 15 tosses.
sum(w == 8) / length(w)
## [1] 0.1444
Making a histogram.
simplehist(w)
3M4. Using the posterior distribution constructed from the new (8/15) data, now calculate the probability of observing 6 water in 9 tosses.
w <- rbinom(1e4, 9, prob = samples)
sum(w == 6) / length(w)
## [1] 0.1751
3M5. Start over at 3M1, but now use a prior that is zero below p = 0.5 and a constant above p = 0.5. This corresponds to prior information that a majority of the Earth’s surface is water. Repeat each problem above and compare the inferences. What difference does the better prior make? If it helps, compare inferences (using both priors) to the true value p = 0.7.
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- ifelse(p_grid < 0.5, 0, 1)
likelihood <- dbinom(8, size = 15, prob = p_grid)
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
plot(posterior, type = "l")
samples <- sample(p_grid, prob = posterior, replace = T)
HPDI(samples, prob = 0.9)
## |0.9 0.9|
## 0.5005005 0.7127127
The HPDI is much narrower now…which makes sense looking at the posterior.
w <- rbinom(1e4, 15, prob = samples)
sum(w == 8) / length(w)
## [1] 0.1579
It’s a little lower than with the uniform prior. And not centered at 8.
simplehist(w)
3M6. Suppose you want to estimate the Earth’s proportion of water very precisely. Specifically, you want the 99% percentile interval of the posterior distribution of p to be only 0.05 wide. This means the distance between the upper and lower bound of the interval should be 0.05. How many times will you have to toss the globe to do this?
We can cheat a little and make a while loop that stops when the 99% interval is less than 0.05
find_pi_interval <- function(n, p) {
p_grid <- seq(0, 1, length.out = 1000)
likelihood <- dbinom(round(n * p), size = n, prob = p_grid)
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)
samples <- sample(p_grid, prob = posterior, size = 1e4, replace = T)
interval <- PI(samples, prob = 0.99)
names(interval) <- NULL
diff(interval)
}
interval <- 1
n <- 1
while (interval >= 0.05) {
interval <- find_pi_interval(n, 0.7)
n <- n + 1
}
n
## [1] 2138
Hard
The data we will be using—male = 1, female = 0
birth1 <- c(
1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0,
0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0,
0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0,
1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1
)
birth2 <- c(
0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0,
1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0
)
3H1. Using grid approximation, compute the posterior distribution for the probability of a birth being a boy. Assume a uniform prior probability. Which parameter value maximizes the posterior probability?
p_grid <- seq(0, 1, length.out = 1000)
prior <- rep(1, 1000)
boys <- sum(birth1, birth2)
likelihood <- dbinom(boys, length(c(birth1, birth2)), prob = p_grid)
posterior <- likelihood * prior
posterior <- posterior / length(posterior)
# finding the max from the posterior
p_grid[which.max(posterior)]
## [1] 0.5545546
Plotting the posterior.
plot(posterior ~ p_grid, type = "l")
3H2. Using the sample function, draw 10,000 random parameter values from the posterior distribution you calculated above. Use these samples to estimate the 50%, 89%, and 97% highest posterior density intervals.
# drawing samples
samples <- sample(p_grid, size = 1e4, prob = posterior, replace = T)
# HPDI estimations
HPDI(samples, prob = c(0.5, 0.89, 0.97))
## |0.97 |0.89 |0.5 0.5| 0.89| 0.97|
## 0.4774775 0.4964965 0.5305305 0.5775776 0.6076076 0.6276276
3H3. Use rbinom to simulate 10,000 replicates of
200 births. You should end up with 10,000 numbers, each one a count of
boys out of 200 births. Compare the distribution of predicted numbers of
boys to the actual count in the data (111 boys out of 200 births). There
are many good ways to visualize the simulations, but the
dens command (part of the rethinking package) is probably
the easiest way in this case. Does it look like the model fits the data
well? That is, does the distribution of predictions include the actual
observation as a central, likely outcome?
Here we use simulation.
birth_sim <- rbinom(1e4, 200, prob = samples)
# plotting
dens(birth_sim)
abline(v = sum(birth1 + birth2), col = "red")
Yes! It does fit the data well. The mean of the simulation is the number of boys birthed.
3H4. Now compare 10,000 counts of boys from 100 simulated first borns only to the number of boys in the first births, birth1. How does the model look in this light?
birth_sim <- rbinom(1e4, 100, prob = samples)
dens(birth_sim)
abline(v = sum(birth1), col = "red")
Now, the central tendency is not the same as the simulation.
3H5. The model assumes that sex of first and second births are independent. To check this assumption, focus now on second births that followed female first borns. Compare 10,000 simulated counts of boys to only those second births that followed girls. To do this correctly, you need to count the number of first borns who were girls and simulate that many births, 10,000 times. Compare the counts of boys in your simulations to the actual observed count of boys following girls. How does the model look in this light? Any guesses what is going on in these data?
aftergirl <- birth2[birth1 == 0]
aftergirl_sim <- rbinom(1e4, size = length(aftergirl), prob = samples)
# plotting
dens(aftergirl_sim)
abline(v = sum(aftergirl), col = "red")
That’s kinda garbage. Not really predictive of anything. We assumed that first and second births are independent, but maybe they aren’t (hint: they are NOT).