Notes 20
Introduction to Bayesian Statistics | Part 2: Using a Beta Prior
Setup
Loading packages
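The loading chunk itself was not preserved in this export; judging from the functions used below (tibble(), pivot_longer(), ggplot(), and beta.select()), it presumably looked something like this:
library(tidyverse)   # for tibble(), pivot_longer(), and ggplot()
library(LearnBayes)  # for beta.select()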
Proportion of heavy sleepers
We continue with the example context from Notes 19 and Chapter 2 of Bayesian Computation with R.
Parameter of interest: proportion \(p\) of this population who sleep at least 8 hours (on a typical night during the week).
Data: Random sample of 27 students is taken. Of this group, 11 record that they had at least 8 hours of sleep the previous night.
Recall:
The posterior distribution is proportional to the prior distribution times the likelihood function
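In symbols, writing \(\pi(p)\) for the prior and \(L(p)\) for the likelihood:
\[\pi(p \mid \text{data}) \propto \pi(p)\, L(p)\]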
Using a beta prior (Section 2.4)
Since the proportion can take any value in the interval from 0 to 1, an alternative approach (one that makes a lot more sense than a discrete prior) is to use a continuous distribution as the prior.
We have seen that the beta distribution is such a distribution. The “beta group” presented the beta density function as a function of \(p\):
\[\pi(p)\propto p^{\alpha-1} (1-p)^{\beta-1}\]
The question is which parameters \(\alpha\) and \(\beta\) to use for the prior. (The parameters of a prior distribution are called hyperparameters, by the way.)
We could use the Shiny app from the “beta group” to find a density to our liking, or we can use the beta.select() function from the LearnBayes package. The inputs to beta.select() are two lists, quantile1 and quantile2, that define two prior percentiles, and the function returns the values of the matching beta parameters.
In the book, the researcher believes that the median and the 90th percentile of the proportion are given, respectively, by .3 and .5. Then using the beta.select() function:
quantile1 <- list(p=.5, x=.3) #for median
quantile2 <- list(p=.9, x=.5) #for 90th percentile
beta.select(quantile1, quantile2)
## [1] 3.26 7.19
So this selects the beta parameters \(\alpha=3.26\) and \(\beta=7.19\); that is, the prior is Beta(3.26, 7.19).
Posterior proportional to prior times likelihood
Using the binomial distribution to obtain the likelihood function of \(p\) for the 11 students who reported sleeping at least 8 hours and the 16 who didn’t, we have \(L(p)\propto p^{11} (1-p)^{16}\).
In this case, it is easy to get the posterior by multiplying the prior \(\pi(p)\propto p^{3.26-1} (1-p)^{7.19-1}\) by the likelihood \(L(p)\propto p^{11} (1-p)^{16}\), because they have the same form: we can just add the corresponding exponents.
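Written out, the product of prior and likelihood is
\[\pi(p)\,L(p) \propto p^{3.26-1}(1-p)^{7.19-1} \cdot p^{11}(1-p)^{16} = p^{(3.26+11)-1}\,(1-p)^{(7.19+16)-1},\]
which is the kernel of a Beta(3.26 + 11, 7.19 + 16) = Beta(14.26, 23.19) density.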
Terminology: Because it works out nicely like this, we call the beta prior the conjugate prior of the binomial likelihood.
So the posterior distribution is also a beta distribution with parameters (3.26 + 11) and (7.19 + 16).
We can use the dbeta() function to plot the beta distributions for the prior, likelihood, and posterior.
beta_dists <- tibble(
  p = seq(from = 0, to = 1, by = .005),
  prior = dbeta(p, shape1 = 3.26, shape2 = 7.19),
  likelihood = dbeta(p, shape1 = 11 + 1, shape2 = 16 + 1),  # likelihood rescaled to a Beta(12, 17) density
  posterior = dbeta(p, shape1 = 3.26 + 11, shape2 = 7.19 + 16)
)
# beta_dists
Plotting these on the same axes:
beta_dists |>
  pivot_longer(
    cols = c(prior, likelihood, posterior),
    names_to = "Distribution", values_to = "Density"
  ) |>
  ggplot(
    mapping = aes(x = p, y = Density, color = Distribution)
  ) +
  geom_line()
Questions:
- Where does the posterior distribution fall in relation to the prior and the likelihood? Between them.
- Is the posterior closer to the prior or the likelihood? The likelihood. Why? Because the likelihood contributes larger exponents to the posterior beta parameters than the prior does (11 and 16 versus 3.26 and 7.19), so it carries more weight.
- Which distribution has the smallest spread? The posterior. Why? Because it is based on the most information: both the prior knowledge and the data.
Posterior inference
We can use the beta posterior distribution to answer probability questions about the parameter:
- For instance, is it likely that the proportion of heavy sleepers is greater than 0.5? This is a question about the probability of a range, so we use the pbeta() function.
# probability that proportion of heavy sleepers is greater than 0.5
1 - pbeta(.5, shape1 = 3.26 + 11, shape2 = 7.19 + 16)
## [1] 0.0690226
We can also obtain an interval estimate for \(p\). For instance, a 90% interval estimate for \(p\) is found by computing the 5th and 95th percentiles of the beta density. We are finding quantiles here, so we use the qbeta() function.
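The chunk that produced the output below was not preserved in this export; it was presumably a pair of qbeta() calls on the posterior, something like:
qbeta(.05, shape1 = 3.26 + 11, shape2 = 7.19 + 16)  # 5th percentile of the posterior
qbeta(.95, shape1 = 3.26 + 11, shape2 = 7.19 + 16)  # 95th percentile of the posterior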
## [1] 0.2555267
## [1] 0.5133608
Posterior credible interval
This is called a 90% posterior credible interval (note: credible interval, not confidence interval).
A strong appeal of Bayesian analysis is that we can interpret this interval in the intuitive way: there is a 90% probability that the proportion \(p\) is between .256 and .513.
This interpretation is straightforward in contrast to the somewhat convoluted interpretation of the confidence level in frequentist confidence intervals (regarding taking hypothetical repeated samples).
Posterior inference through simulation
We can obtain the same answers through simulation (generating random numbers) from the posterior distribution. Here, we use the rbeta() function to generate 1000 draws from the Beta(3.26 + 11, 7.19 + 16) distribution.
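The simulation chunk was not preserved in this export; given the description above and the name post_sim used in the plotting code below, it was presumably something like the following (a set.seed() call may also have been used):
post_sim <- tibble(
  p = rbeta(1000, shape1 = 3.26 + 11, shape2 = 7.19 + 16)  # 1000 draws from the posterior
)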
This plot compares a density estimate of the distribution of these random draws (black) to the exact beta posterior density function (red).
ggplot(
  data = post_sim,
  mapping = aes(x = p)
) +
  geom_density() +  # simulated posterior
  geom_line(        # exact beta posterior
    data = tibble(
      p = seq(from = .15, to = .65, by = .005),
      Density = dbeta(p, shape1 = 3.26 + 11, shape2 = 7.19 + 16)
    ),
    mapping = aes(x = p, y = Density),
    color = "red"
  )
This should give you the idea that simulation gives answers about posterior inference that are only approximately correct. But in many models (e.g., when we don’t use a conjugate prior), it is our best option.
What is the probability that \(p\) is greater than 0.5?
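The chunk computing this from the simulated draws was not preserved; presumably it summarized the share of draws above 0.5, something like:
post_sim |>
  summarize(probability = mean(p > .5))  # proportion of draws with p > 0.5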
## # A tibble: 1 × 1
## probability
## <dbl>
## 1 0.0689
90% credible interval for \(p\):
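Again, the chunk was not preserved; presumably it computed the 5th and 95th percentiles of the draws, something like:
post_sim |>
  summarize(
    lower = quantile(p, .05),  # 5th percentile of the draws
    upper = quantile(p, .95)   # 95th percentile of the draws
  )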
## # A tibble: 1 × 2
## lower upper
## <dbl> <dbl>
## 1 0.256 0.514