Notes 22
Introduction to Bayesian Statistics | Part 4: Prediction
Setup
Loading packages
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'cowplot' was built under R version 4.3.2
Same problem context
We continue with the example context from Notes 19-21 from Chapter 2 of Bayesian Computation with R.
Parameter of interest: proportion \(p\) of this population who sleep at least 8 hours (on a typical night during the week).
Data: Random sample of 27 students is taken. Of this group, 11 record that they had at least 8 hours of sleep the previous night.
In Notes 19-21, we learned to calculate the posterior distribution of \(p\) based on different types of prior distributions:
Notes 19: Discrete prior
Notes 20: Beta prior
Notes 21: Histogram prior (This notes also gives a method that works for any type of prior)
Prediction
(Section 2.6) Our focus in this notes is on prediction - specifically, based on the information on the prior distribution and the data about \(p\), we would like to predict the number of students that sleep at least 8 hours out of a future sample of 20 students.
Notation: We denote the number of students that sleep at least 8 hours a night in this future sample by \(\tilde{y}\) (read: “y-tilde”) to distinguish it from the number of students \(y\) sleeping \(\geq\) 8 hours in the actual sample (of 27 students).
As before, we model \(\tilde{y}\) as a binomial distribution with “probability of success” \(p\) and number of trials = 20.
Of course, we don’t have a single value for \(p\), we have a distribution of values for \(p\), specifically, the posterior distribution.
Prediction method
The goal of our prediction analysis is to get a distribution for \(\tilde{y}\) called the posterior predictive distribution. This is denoted by \(f(\tilde{y})\). In this case, it will consist of probabilities for each of the possible values of \(\tilde{y}\): 0, 1, 2, 3 …, 19, 20.
The strategy is to get \(f(\tilde{y})\) take a “weighted average” of
the binomial distributions for \(\tilde{y}\) at each \(p_i\)
weighted by the probabilities of \(p_i\) given by the posterior distribution of \(p\).
Examples
In our first example, we’ll use the posterior from the discrete prior (from Notes 19), which literally uses a weighted average.
Then (in the next notes), we’ll illustrate how to do this with simulation for the beta prior (notes 20) and the histogram prior (notes 21), which works for any distribution.
Discrete prior example
The code below to calculate the posterior distribution of \(p\) was taken directly from Notes 19:
sleep_disc <- tibble(
p = c(.05, .15, .25, .35, .45, .55, .65, .75),
wt = c(1, 5.2, 8, 7.2, 4.6, 2.1, .7, .1)
) |>
mutate(
prior = wt / sum(wt),
L = p^11 * (1-p)^16,
post = prior * L,
posterior = post / sum(post)
) |> select(p, prior, L, post, posterior)
sleep_disc
## # A tibble: 8 × 5
## p prior L post posterior
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.05 0.0346 2.15e-15 7.44e-17 0.0000000145
## 2 0.15 0.180 6.42e-11 1.16e-11 0.00226
## 3 0.25 0.277 2.39e- 9 6.61e-10 0.129
## 4 0.35 0.249 9.80e- 9 2.44e- 9 0.477
## 5 0.45 0.159 1.07e- 8 1.71e- 9 0.334
## 6 0.55 0.0727 3.94e- 9 2.86e-10 0.0559
## 7 0.65 0.0242 4.44e-10 1.07e-11 0.00210
## 8 0.75 0.00346 9.83e-12 3.40e-14 0.00000664
To apply our weighted average method of obtaining the posterior predictive distribution of the number of students who sleep >= 8 hours per night in a future sample of 20 students, we calculate the probability, using the binomial distribution, (with 20 trials)
of having each number of students, from 0, 1, 2, …, 20
for probability of success values from p = .05, .15, .25, …, .75
binoms <- tibble() |>
bind_rows(tibble(p = .05, x = 0:20, prob= dbinom(x, size=20, prob=p))) |>
bind_rows(tibble(p = .15, x = 0:20, prob= dbinom(x, size=20, prob=p))) |>
bind_rows(tibble(p = .25, x = 0:20, prob= dbinom(x, size=20, prob=p))) |>
bind_rows(tibble(p = .35, x = 0:20, prob= dbinom(x, size=20, prob=p))) |>
bind_rows(tibble(p = .45, x = 0:20, prob= dbinom(x, size=20, prob=p))) |>
bind_rows(tibble(p = .55, x = 0:20, prob= dbinom(x, size=20, prob=p))) |>
bind_rows(tibble(p = .65, x = 0:20, prob= dbinom(x, size=20, prob=p))) |>
bind_rows(tibble(p = .75, x = 0:20, prob= dbinom(x, size=20, prob=p)))
To visualize the weighted average, we show the posterior distribution for \(p\) on top of the binomial distributions for each value of \(p\). The posterior predictive distribution of the number of students out of 20 in the future sample who sleep at least 8 hrs per night is the weighted average of the binomial distributions, weighted by the posterior probabilities.
We then merge the datasets containing the posterior distribution and the binomial distributions together.
merged <- full_join(sleep_disc, binoms, by = join_by(p)) |>
rename(binom_prob = prob, )
merged |>
select(p, posterior, binom_prob)
## # A tibble: 168 × 3
## p posterior binom_prob
## <dbl> <dbl> <dbl>
## 1 0.05 0.0000000145 0.358
## 2 0.05 0.0000000145 0.377
## 3 0.05 0.0000000145 0.189
## 4 0.05 0.0000000145 0.0596
## 5 0.05 0.0000000145 0.0133
## 6 0.05 0.0000000145 0.00224
## 7 0.05 0.0000000145 0.000295
## 8 0.05 0.0000000145 0.0000311
## 9 0.05 0.0000000145 0.00000266
## 10 0.05 0.0000000145 0.000000187
## # ℹ 158 more rows
To make the weighted average calculation more concretely, let’s
filter the dataset to observations where x = 7
of the
students in the future dataset sleep >= 8 hours.
## # A tibble: 8 × 7
## p prior L post posterior x binom_prob
## <dbl> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
## 1 0.05 0.0346 2.15e-15 7.44e-17 0.0000000145 7 0.0000311
## 2 0.15 0.180 6.42e-11 1.16e-11 0.00226 7 0.0160
## 3 0.25 0.277 2.39e- 9 6.61e-10 0.129 7 0.112
## 4 0.35 0.249 9.80e- 9 2.44e- 9 0.477 7 0.184
## 5 0.45 0.159 1.07e- 8 1.71e- 9 0.334 7 0.122
## 6 0.55 0.0727 3.94e- 9 2.86e-10 0.0559 7 0.0366
## 7 0.65 0.0242 4.44e-10 1.07e-11 0.00210 7 0.00449
## 8 0.75 0.00346 9.83e-12 3.40e-14 0.00000664 7 0.000154
So, the posterior predictive probability
\[P(\tilde{y}=7) = \sum_{i} f_B(\tilde{y}=7|p_i) g(p_i),\]
where \(f_B(\cdot)\) is the binomial probability and \(g(\cdot)\) is posterior probability.
So, numerically, P(y-tilde = 7) = 0.0000000145 * 0.0000311 + 0.00226 * 0.0160 + 0.129* 0.112 + … + 0.00000664 * 0.000154
## # A tibble: 1 × 1
## `sum(posterior * binom_prob)`
## <dbl>
## 1 0.145
The group_by()
function here allows us to do separated
weighted averages for each x
from 0 to 20.
post_pred_dist <- merged |>
group_by(x) |>
summarize(
pred_prob = sum(posterior * binom_prob)
)
post_pred_dist
## # A tibble: 21 × 2
## x pred_prob
## <int> <dbl>
## 1 0 0.000586
## 2 1 0.00400
## 3 2 0.0142
## 4 3 0.0346
## 5 4 0.0648
## 6 5 0.0995
## 7 6 0.129
## 8 7 0.145
## 9 8 0.143
## 10 9 0.125
## # ℹ 11 more rows
Plot of the posterior predictive distribution:
ggplot(
data = post_pred_dist,
) +
geom_segment(
mapping = aes(x = x, xend = x, y = pred_prob, yend = 0),
linewidth = 2
) +
labs(
x = "Number of students sleeping >= 8 hrs",
y = "Probability"
)
Summarizing the posterior predictive distribution
We can use the posterior predictive distribution to calculate summaries.
For example, what is the expected number (mean) of students sleeping >= 8 out of the class of 20?
## # A tibble: 1 × 1
## mean
## <dbl>
## 1 7.64
What is the most likely number of students (the mode)? 7 (one with the highest probability)
You can also answer probability questions about ranges of numbers of students. You can calculate the cumulative distribution function, which gives the probability that numbers of students are >= a certain value, as follows:
## # A tibble: 21 × 3
## x pred_prob cdf
## <int> <dbl> <dbl>
## 1 0 0.000586 0.000586
## 2 1 0.00400 0.00459
## 3 2 0.0142 0.0188
## 4 3 0.0346 0.0534
## 5 4 0.0648 0.118
## 6 5 0.0995 0.218
## 7 6 0.129 0.347
## 8 7 0.145 0.492
## 9 8 0.143 0.635
## 10 9 0.125 0.760
## # ℹ 11 more rows
Examples:
What is the probability that 10 or less students sleep >= 8 hours? 0.760
What is the probability that greater than 5 students sleep >= 8 hours? 1 - p(y-tilde <= 4)
## [1] 0.9466
- What is the probability that between 6 and 10 (inclusive) students sleep >= 8 hours?
## [1] 0.642