Notes 22

Introduction to Bayesian Statistics | Part 4: Prediction

Setup

Loading packages

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.3

library(LearnBayes)
library(cowplot)

## Warning: package 'cowplot' was built under R version 4.3.2

Same problem context

We continue with the example context from Notes 19-21 from Chapter 2 of Bayesian Computation with R.

Parameter of interest: proportion \(p\) of this population who sleep at least 8 hours (on a typical night during the week).
Data: Random sample of 27 students is taken. Of this group, 11 record that they had at least 8 hours of sleep the previous night.

In Notes 19-21, we learned to calculate the posterior distribution of \(p\) based on different types of prior distributions:

Notes 19: Discrete prior
Notes 20: Beta prior
Notes 21: Histogram prior (This notes also gives a method that works for any type of prior)

Prediction

(Section 2.6) Our focus in this notes is on prediction - specifically, based on the information on the prior distribution and the data about \(p\), we would like to predict the number of students that sleep at least 8 hours out of a future sample of 20 students.

Notation: We denote the number of students that sleep at least 8 hours a night in this future sample by \(\tilde{y}\) (read: “y-tilde”) to distinguish it from the number of students \(y\) sleeping \(\geq\) 8 hours in the actual sample (of 27 students).

As before, we model \(\tilde{y}\) as a binomial distribution with “probability of success” \(p\) and number of trials = 20.

Of course, we don’t have a single value for \(p\), we have a distribution of values for \(p\), specifically, the posterior distribution.

Prediction method

The goal of our prediction analysis is to get a distribution for \(\tilde{y}\) called the posterior predictive distribution. This is denoted by \(f(\tilde{y})\). In this case, it will consist of probabilities for each of the possible values of \(\tilde{y}\): 0, 1, 2, 3 …, 19, 20.

The strategy is to get \(f(\tilde{y})\) take a “weighted average” of

the binomial distributions for \(\tilde{y}\) at each \(p_i\)
weighted by the probabilities of \(p_i\) given by the posterior distribution of \(p\).

Examples

In our first example, we’ll use the posterior from the discrete prior (from Notes 19), which literally uses a weighted average.

Then (in the next notes), we’ll illustrate how to do this with simulation for the beta prior (notes 20) and the histogram prior (notes 21), which works for any distribution.

Discrete prior example

The code below to calculate the posterior distribution of \(p\) was taken directly from Notes 19:

sleep_disc <- tibble(
  p = c(.05, .15, .25, .35, .45, .55, .65, .75),
  wt = c(1, 5.2, 8, 7.2, 4.6, 2.1, .7, .1)
) |> 
  mutate(
    prior = wt / sum(wt), 
    L = p^11 * (1-p)^16, 
    post = prior * L,
    posterior = post / sum(post)
  ) |> select(p, prior, L, post, posterior)
  
sleep_disc

## # A tibble: 8 × 5
##       p   prior        L     post    posterior
##   <dbl>   <dbl>    <dbl>    <dbl>        <dbl>
## 1  0.05 0.0346  2.15e-15 7.44e-17 0.0000000145
## 2  0.15 0.180   6.42e-11 1.16e-11 0.00226     
## 3  0.25 0.277   2.39e- 9 6.61e-10 0.129       
## 4  0.35 0.249   9.80e- 9 2.44e- 9 0.477       
## 5  0.45 0.159   1.07e- 8 1.71e- 9 0.334       
## 6  0.55 0.0727  3.94e- 9 2.86e-10 0.0559      
## 7  0.65 0.0242  4.44e-10 1.07e-11 0.00210     
## 8  0.75 0.00346 9.83e-12 3.40e-14 0.00000664

To apply our weighted average method of obtaining the posterior predictive distribution of the number of students who sleep >= 8 hours per night in a future sample of 20 students, we calculate the probability, using the binomial distribution, (with 20 trials)

of having each number of students, from 0, 1, 2, …, 20
for probability of success values from p = .05, .15, .25, …, .75

binoms <- tibble() |> 
  bind_rows(tibble(p = .05, x = 0:20, prob= dbinom(x, size=20, prob=p))) |> 
  bind_rows(tibble(p = .15, x = 0:20, prob= dbinom(x, size=20, prob=p))) |> 
  bind_rows(tibble(p = .25, x = 0:20, prob= dbinom(x, size=20, prob=p))) |> 
  bind_rows(tibble(p = .35, x = 0:20, prob= dbinom(x, size=20, prob=p))) |> 
  bind_rows(tibble(p = .45, x = 0:20, prob= dbinom(x, size=20, prob=p))) |> 
  bind_rows(tibble(p = .55, x = 0:20, prob= dbinom(x, size=20, prob=p))) |> 
  bind_rows(tibble(p = .65, x = 0:20, prob= dbinom(x, size=20, prob=p))) |> 
  bind_rows(tibble(p = .75, x = 0:20, prob= dbinom(x, size=20, prob=p)))

To visualize the weighted average, we show the posterior distribution for \(p\) on top of the binomial distributions for each value of \(p\). The posterior predictive distribution of the number of students out of 20 in the future sample who sleep at least 8 hrs per night is the weighted average of the binomial distributions, weighted by the posterior probabilities.

plot_grid(
  posterior, p_binoms, nrow = 2, ncol = 1
)

We then merge the datasets containing the posterior distribution and the binomial distributions together.

merged <- full_join(sleep_disc, binoms, by = join_by(p)) |> 
  rename(binom_prob = prob, )
merged |>
  select(p, posterior, binom_prob)

## # A tibble: 168 × 3
##        p    posterior  binom_prob
##    <dbl>        <dbl>       <dbl>
##  1  0.05 0.0000000145 0.358      
##  2  0.05 0.0000000145 0.377      
##  3  0.05 0.0000000145 0.189      
##  4  0.05 0.0000000145 0.0596     
##  5  0.05 0.0000000145 0.0133     
##  6  0.05 0.0000000145 0.00224    
##  7  0.05 0.0000000145 0.000295   
##  8  0.05 0.0000000145 0.0000311  
##  9  0.05 0.0000000145 0.00000266 
## 10  0.05 0.0000000145 0.000000187
## # ℹ 158 more rows

To make the weighted average calculation more concretely, let’s filter the dataset to observations where x = 7 of the students in the future dataset sleep >= 8 hours.

merged |> filter(x == 7)

## # A tibble: 8 × 7
##       p   prior        L     post    posterior     x binom_prob
##   <dbl>   <dbl>    <dbl>    <dbl>        <dbl> <int>      <dbl>
## 1  0.05 0.0346  2.15e-15 7.44e-17 0.0000000145     7  0.0000311
## 2  0.15 0.180   6.42e-11 1.16e-11 0.00226          7  0.0160   
## 3  0.25 0.277   2.39e- 9 6.61e-10 0.129            7  0.112    
## 4  0.35 0.249   9.80e- 9 2.44e- 9 0.477            7  0.184    
## 5  0.45 0.159   1.07e- 8 1.71e- 9 0.334            7  0.122    
## 6  0.55 0.0727  3.94e- 9 2.86e-10 0.0559           7  0.0366   
## 7  0.65 0.0242  4.44e-10 1.07e-11 0.00210          7  0.00449  
## 8  0.75 0.00346 9.83e-12 3.40e-14 0.00000664       7  0.000154

So, the posterior predictive probability

\[P(\tilde{y}=7) = \sum_{i} f_B(\tilde{y}=7|p_i) g(p_i),\]

where \(f_B(\cdot)\) is the binomial probability and \(g(\cdot)\) is posterior probability.

So, numerically, P(y-tilde = 7) = 0.0000000145 * 0.0000311 + 0.00226 * 0.0160 + 0.129* 0.112 + … + 0.00000664 * 0.000154

merged |> filter(x == 7)|>
  summarize(
    sum(posterior * binom_prob)
  )

## # A tibble: 1 × 1
##   `sum(posterior * binom_prob)`
##                           <dbl>
## 1                         0.145

The group_by() function here allows us to do separated weighted averages for each x from 0 to 20.

post_pred_dist <- merged |> 
  group_by(x) |> 
  summarize(
    pred_prob = sum(posterior * binom_prob)
  )
post_pred_dist

## # A tibble: 21 × 2
##        x pred_prob
##    <int>     <dbl>
##  1     0  0.000586
##  2     1  0.00400 
##  3     2  0.0142  
##  4     3  0.0346  
##  5     4  0.0648  
##  6     5  0.0995  
##  7     6  0.129   
##  8     7  0.145   
##  9     8  0.143   
## 10     9  0.125   
## # ℹ 11 more rows

Plot of the posterior predictive distribution:

ggplot(
  data = post_pred_dist, 
) + 
  geom_segment(
    mapping = aes(x = x, xend = x, y = pred_prob, yend = 0), 
    linewidth = 2
  ) + 
  labs(
    x = "Number of students sleeping >= 8 hrs", 
    y = "Probability"
  )

Summarizing the posterior predictive distribution

We can use the posterior predictive distribution to calculate summaries.

For example, what is the expected number (mean) of students sleeping >= 8 out of the class of 20?

post_pred_dist |> 
  summarize(
    mean = sum(x * pred_prob)
  )

## # A tibble: 1 × 1
##    mean
##   <dbl>
## 1  7.64

What is the most likely number of students (the mode)? 7 (one with the highest probability)

You can also answer probability questions about ranges of numbers of students. You can calculate the cumulative distribution function, which gives the probability that numbers of students are >= a certain value, as follows:

post_pred_dist |> 
  mutate(cdf = cumsum(pred_prob))

## # A tibble: 21 × 3
##        x pred_prob      cdf
##    <int>     <dbl>    <dbl>
##  1     0  0.000586 0.000586
##  2     1  0.00400  0.00459 
##  3     2  0.0142   0.0188  
##  4     3  0.0346   0.0534  
##  5     4  0.0648   0.118   
##  6     5  0.0995   0.218   
##  7     6  0.129    0.347   
##  8     7  0.145    0.492   
##  9     8  0.143    0.635   
## 10     9  0.125    0.760   
## # ℹ 11 more rows

Examples:

What is the probability that 10 or less students sleep >= 8 hours? 0.760
What is the probability that greater than 5 students sleep >= 8 hours? 1 - p(y-tilde <= 4)

1 -.0534

## [1] 0.9466

What is the probability that between 6 and 10 (inclusive) students sleep >= 8 hours?

.760 - .118

## [1] 0.642