Notes 20

Introduction to Bayesian Statistics | Part 2: Using a Beta Prior

Setup

Loading packages

library(tidyverse)

## Warning: package 'ggplot2' was built under R version 4.3.3

library(LearnBayes)

Proportion of heavy sleepers

We continue with the example context from Notes 19 and Chapter 2 of Bayesian Computation with R.

Parameter of interest: proportion \(p\) of this population who sleep at least 8 hours (on a typical night during the week).
Data: Random sample of 27 students is taken. Of this group, 11 record that they had at least 8 hours of sleep the previous night.

Recall:

The posterior distribution is proportional to the prior distribution times the likelihood function

Using a beta prior (Section 2.4)

Since the proportion can take any value on the interval 0 to 1, an alternative approach (that makes a lot more sense) is to use a continuous distribution as a prior.

We have seen that the beta distribution is a such a distribution. The “beta group” presented that the beta distribution density function as a function of \(p\)

\[\pi(p)\propto p^{\alpha-1} (1-p)^{\beta-1}\]

The question is which parameters \(\alpha\) and \(\beta\) to use for the prior. (The parameters for the prior distribution are called hyperparameters, btw.)

We could use the Shiny app from the “beta group” to find a density to our liking, our we can use the beta.select() function from the LearnBayes package. The inputs to beta.select() are two lists, quantile1 and quantile2, that define two prior percentiles, and the function returns the values of the matching beta parameters.

In the book, the researcher believes that the median and the 90th percentiles of the proportion are given, respectively, by .3 and .5. Then using the beta.select() function:

quantile1 <- list(p=.5, x=.3) #for median
quantile2 <- list(p=.9, x=.5) #for 90th percentile
beta.select(quantile1, quantile2)

## [1] 3.26 7.19

So this selects the beta parameters \(\alpha=3.26\) and \(\beta=7.19\)

Posterior proportional to prior times likelihood

Using the binomial distribution to obtain the likelihood function of \(p\) for the 11 students that reported sleeping at least 8 hours and the 16 who didn’t, we have \(L(p)\propto p^{11} (1-p)^{16}\)

In this case, to get the posterior, it is easy to multiply the

the prior \(\pi(p)\propto p^{3.26-1} (1-p)^{7.19-1}\)
the likelihood \(L(p)\propto p^{11} (1-p)^{16}\)

because they have the same form: we can just add the corresponding exponents.

Terminology: Because it works out nicely like this, we call the beta prior the conjugate prior of the binomial likelihood.

So the posterior distribution is also a beta distribution with parameters (3.26 + 11) and (7.19 + 16).

We can use the dbeta() function to plot the beta distributions for the prior, likelihood, and the posterior.

beta_dists <- tibble(
  p = seq(from = 0, to=1, by=.005), 
  prior = dbeta(p, shape1=3.26, shape2=7.19), 
  likelihood = dbeta(p, shape1 = 11+1, shape2 = 16+1),
  posterior = dbeta(p, shape1 = 3.26+11, shape2 = 7.19 + 16)
)
# beta_dists

Plotting these on the same axes:

beta_dists |> 
  pivot_longer(
    cols = c(prior, likelihood, posterior), 
    names_to = "Distribution", values_to = "Density"
  ) |> 
  ggplot(
    mapping = aes(x = p, y = Density, color = Distribution)
  ) + geom_line()

Questions:

Where does the posterior distribution fall in relation to the prior and the likelihood? between
Is the posterior closer to the prior or the likelihood? likelihood Why? because the likelihood has higher exponent than the prior, it is contributing more to the posterior beta parameter
Which distribution has the smallest spread? posterior Why? Because it based on most information, both the prior knowledge and the data

Posterior inference

We can use the beta posterior distribution to answer probability questions about the parameter:

For instance, is it likely that the proportion of heavy sleepers is greater than 0.5? This is a question about the probability of a range, so we use the pbeta function.

#probability that proportion of heavy sleepers is greater than 0.5
1 - pbeta(.5, shape1 = 3.26 +11, shape2 = 7.19 +16)

## [1] 0.0690226

We can also obtain an interval estimate for \(p\). For instance, a 90% interval estimate for \(p\) is found by computing the 5th and 95th percentiles of the beta density. We are finding quantiles here so we use the qbeta function.

#90% interval estimate for p
qbeta(.05, shape1 = 3.26 +11, shape2 = 7.19 +16)

## [1] 0.2555267

qbeta(.95, shape1 = 3.26 +11, shape2 = 7.19 +16)

## [1] 0.5133608

Posterior credible interval

This is called a 90% posterior credible interval (note: credible interval, not confidence interval).

A strong appeal of Bayesian analysis is that we can interpret this interval in the intuitive way: there is a 90% probability that the proportion \(p\) is between .256 and .513.

This interpretation is straightforward in contrast to the somewhat convoluted interpretation of the confidence level in frequentist confidence intervals (regarding taking hypothetical repeated samples).

Posterior inference through simulation

We can obtain the same answers through simulation (generating random numbers) from the posterior distribution. Here, we use the rbeta function to generate 1000 draws from the Beta(3.26+11, 7.19+16) distribution.

set.seed(24601)
post_sim <- tibble(
  p = rbeta(n = 100000, shape1 = 3.26+11, shape2 = 7.19+16)
)

This plot compares a a density estimate of the distribution of these random draws (black) to the exact beta posterior density function (shown in red).

ggplot(
  data = post_sim, 
  mapping = aes(x = p)
) +
  geom_density() + # simulated posterior
  geom_line( # exact beta posterior
    data = tibble(
      p = seq(from=.15, to=.65, by=.005), 
      Density = dbeta(p, shape1 = 3.26+11, shape2 = 7.19+16)
    ), 
    mapping = aes(x = p, y = Density), 
    color = "red"
  )

This should give you the idea that simulation only gives answers about posterior inference that are only approximately correct. But, in many models (like we don’t use a conjugate prior), it is our best option.

What is the probability that the \(p\) is greater than 0.5?

post_sim |> 
  summarize(
    probability = mean(p > 0.5)
  )

## # A tibble: 1 × 1
##   probability
##         <dbl>
## 1      0.0689

90% credible interval for \(p\):

post_sim |> 
  summarize(
    lower = quantile(p, .05), upper = quantile(p, .95)
  )

## # A tibble: 1 × 2
##   lower upper
##   <dbl> <dbl>
## 1 0.256 0.514