## Bayesian analysis
- Bayesian statistics posits a prior on the parameter
of interest
- All inferences are then performed on the distribution of
the parameter given the data, called the posterior
- In general,
\[
\mbox{Posterior} \propto \mbox{Likelihood} \times \mbox{Prior}
\]
- Therefore (as we saw in diagnostic testing) the likelihood is
the factor by which our prior beliefs are updated to produce
conclusions in the light of the data; the sketch below makes this concrete
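As a concrete sketch of this rule in R (the data, \( x = 7 \) events in \( n = 10 \)
binomial trials, and the \( \mbox{Beta}(2, 2) \) prior are made-up values for illustration):

pvals <- seq(0.001, 0.999, length = 1000)
x <- 7; n <- 10                      # hypothetical data
prior <- dbeta(pvals, 2, 2)          # an arbitrary Beta(2, 2) prior
likelihood <- dbinom(x, n, pvals)    # binomial likelihood as a function of p
posterior <- likelihood * prior      # proportional to the posterior
posterior <- posterior / (sum(posterior) * (pvals[2] - pvals[1]))  # normalize
plot(pvals, posterior, type = "l", lwd = 3, frame = FALSE)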
## Prior specification
- The beta distribution is the default prior
for parameters between \( 0 \) and \( 1 \).
- The beta density depends on two parameters \( \alpha \) and \( \beta \)
\[
\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}
p ^ {\alpha - 1} (1 - p) ^ {\beta - 1} ~~~~\mbox{for} ~~ 0 \leq p \leq 1
\]
- The mean of the beta density is \( \alpha / (\alpha + \beta) \)
- The variance of the beta density is
\[ \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \]
- The uniform density is the special case where \( \alpha = \beta = 1 \)
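As a quick check of the mean and variance formulas by numerical integration
(\( \alpha = 3 \), \( \beta = 5 \) are arbitrary illustrative values):

alpha <- 3; beta <- 5
m <- integrate(function(p) p * dbeta(p, alpha, beta), 0, 1)$value
v <- integrate(function(p) (p - m)^2 * dbeta(p, alpha, beta), 0, 1)$value
c(m, alpha / (alpha + beta))                                   # both 0.375
c(v, alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)))   # both about 0.026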
## Exploring the beta density
library(manipulate)
pvals <- seq(0.01, 0.99, length = 1000)
## interactively vary alpha and beta to see how the density's shape changes
manipulate(
    plot(pvals, dbeta(pvals, alpha, beta), type = "l", lwd = 3, frame = FALSE),
    alpha = slider(0.01, 10, initial = 1, step = .5),
    beta = slider(0.01, 10, initial = 1, step = .5)
)
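(`manipulate` only runs inside RStudio. Without it, a static plot of a few
choices of \( \alpha \) and \( \beta \) shows the same range of shapes.)

pvals <- seq(0.01, 0.99, length = 1000)
plot(pvals, dbeta(pvals, 1, 1), type = "l", lwd = 3, frame = FALSE,
     ylim = c(0, 3.5), xlab = "p", ylab = "density")           # uniform
lines(pvals, dbeta(pvals, 2, 2), lwd = 3, col = "darkred")     # humped around .5
lines(pvals, dbeta(pvals, .5, .5), lwd = 3, col = "darkblue")  # U-shaped (Jeffreys)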
## Posterior
- Suppose that we chose values of \( \alpha \) and \( \beta \) so that
the beta prior is indicative of our degree of belief regarding \( p \)
in the absence of data
- Then using the rule that
\[
\mbox{Posterior} \propto \mbox{Likelihood} \times \mbox{Prior}
\]
and throwing out anything that doesn't depend on \( p \), we have, for a binomial
likelihood with \( x \) events in \( n \) trials, that
\[
\begin{align}
\mbox{Posterior} &\propto p^x(1 - p)^{n-x} \times p^{\alpha -1} (1 - p)^{\beta - 1} \\
& = p^{x + \alpha - 1} (1 - p)^{n - x + \beta - 1}
\end{align}
\]
- This density is just another beta density with parameters
\( \tilde \alpha = x + \alpha \) and \( \tilde \beta = n - x + \beta \)
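A quick numerical check of this conjugate update, using illustrative values
\( x = 13 \), \( n = 20 \) (the example below) and a \( \mbox{Beta}(2, 2) \) prior:

x <- 13; n <- 20; alpha <- 2; beta <- 2
pvals <- seq(0.01, 0.99, length = 1000)
gridPost <- dbinom(x, n, pvals) * dbeta(pvals, alpha, beta)      # Likelihood x Prior
gridPost <- gridPost / (sum(gridPost) * (pvals[2] - pvals[1]))   # normalize on the grid
closedForm <- dbeta(pvals, x + alpha, n - x + beta)              # the conjugate answer
max(abs(gridPost - closedForm))   # tiny; only grid approximation error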
## Posterior mean
\[
\begin{align}
E[p ~|~ X] & = \frac{\tilde \alpha}{\tilde \alpha + \tilde \beta}\\
& = \frac{x + \alpha}{x + \alpha + n - x + \beta}\\
& = \frac{x + \alpha}{n + \alpha + \beta} \\
& = \frac{x}{n} \times \frac{n}{n + \alpha + \beta} + \frac{\alpha}{\alpha + \beta} \times \frac{\alpha + \beta}{n + \alpha + \beta} \\
& = \mbox{MLE} \times \pi + \mbox{Prior Mean} \times (1 - \pi)
\end{align}
\]
where \( \pi = n / (n + \alpha + \beta) \) is the weight given to the MLE.
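A numerical check of this decomposition, with illustrative values:

x <- 13; n <- 20; alpha <- 2; beta <- 2
postMean <- (x + alpha) / (n + alpha + beta)
piWt <- n / (n + alpha + beta)
mixture <- (x / n) * piWt + (alpha / (alpha + beta)) * (1 - piWt)
c(postMean, mixture)   # identical: 0.625 0.625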
## Thoughts
- The posterior mean is a mixture of the MLE (\( \hat p \)) and the
prior mean
- \( \pi \) goes to \( 1 \) as \( n \) gets large; for large \( n \) the data
swamps the prior (see the sketch after this list)
- For small \( n \), the prior mean dominates
- This formalizes how science should ideally work: as data becomes
increasingly available, prior beliefs should matter less and less
- With a prior that is degenerate at a value, no amount of data
can overcome the prior
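A sketch of how the weight \( \pi = n / (n + \alpha + \beta) \) on the MLE grows
with \( n \), holding a \( \mbox{Beta}(2, 2) \) prior fixed:

alpha <- 2; beta <- 2
n <- c(10, 100, 1000, 10000)
round(n / (n + alpha + beta), 4)   # 0.7143 0.9615 0.9960 0.9996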
## Example
- Suppose that in a random sample of an at-risk population
\( 13 \) of \( 20 \) subjects had hypertension. Estimate the prevalence
of hypertension in this population.
- \( x = 13 \) and \( n=20 \)
- Consider a uniform prior, \( \alpha = \beta = 1 \)
- The posterior is proportional to (see formula above)
\[
p^{x + \alpha - 1} (1 - p)^{n - x + \beta - 1} = p^x (1 - p)^{n-x}
\]
That is, with the uniform prior the posterior is just the (normalized) likelihood
- Consider the instance where \( \alpha = \beta = 2 \) (recall that this prior
is humped around the point \( .5 \)); the posterior is
\[
p^{x + \alpha - 1} (1 - p)^{n - x + \beta - 1} = p^{x + 1} (1 - p)^{n-x + 1}
\]
- The Jeffreys prior, which has some theoretical benefits,
puts \( \alpha = \beta = .5 \); the posterior means under these three priors
are compared below
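As a quick comparison (using the example's \( x = 13 \), \( n = 20 \)), the posterior
mean \( (x + \alpha) / (n + \alpha + \beta) \) under each of the three priors:

x <- 13; n <- 20
priors <- rbind(uniform = c(1, 1), humped = c(2, 2), jeffreys = c(.5, .5))
round((x + priors[, 1]) / (n + priors[, 1] + priors[, 2]), 4)
## uniform 0.6364, humped 0.6250, jeffreys 0.6429; the MLE is x / n = 0.65

The interactive plot below overlays the prior, likelihood, and posterior; each
curve is rescaled by its maximum so all three fit on one set of axes.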
pvals <- seq(0.01, 0.99, length = 1000)
x <- 13; n <- 20
myPlot <- function(alpha, beta){
    plot(0 : 1, 0 : 1, type = "n", xlab = "p", ylab = "", frame = FALSE)
    ## each curve is divided by its maximum so all three share a common scale
    lines(pvals, dbeta(pvals, alpha, beta) / max(dbeta(pvals, alpha, beta)),
          lwd = 3, col = "darkred")                                # prior
    lines(pvals, dbinom(x, n, pvals) / dbinom(x, n, x / n),
          lwd = 3, col = "darkblue")                               # likelihood
    lines(pvals, dbeta(pvals, alpha + x, beta + (n - x)) /
              max(dbeta(pvals, alpha + x, beta + (n - x))),
          lwd = 3, col = "darkgreen")                              # posterior
    title("red = prior, green = posterior, blue = likelihood")
}
manipulate(
    myPlot(alpha, beta),
    alpha = slider(0.01, 100, initial = 1, step = .5),
    beta = slider(0.01, 100, initial = 1, step = .5)
)
## Credible intervals
- A Bayesian credible interval is the Bayesian analog of a confidence
interval
- A \( 95\% \) credible interval, \( [a, b] \) would satisfy
\[
P(p \in [a, b] ~|~ x) = .95
\]
- The best credible intervals chop off the posterior with a horizontal
line in the same way we did for likelihoods
- These are called highest posterior density (HPD) intervals
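For contrast, an equal-tailed credible interval simply takes the \( .025 \) and
\( .975 \) posterior quantiles; because the HPD interval is the shortest interval
at a given level, the equal-tailed interval is at least as wide. A sketch using
the running example (\( x = 13 \), \( n = 20 \)) with the Jeffreys prior:

x <- 13; n <- 20; alpha <- .5; beta <- .5       # Jeffreys prior
qbeta(c(.025, .975), x + alpha, n - x + beta)   # equal-tailed 95% interval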
## Getting HPD intervals for this example
- Install the `binom` package; then the command
library(binom)
binom.bayes(13, 20, type = "highest")
method x n shape1 shape2 mean lower upper sig
1 bayes 13 20 13.5 7.5 0.6429 0.4423 0.8361 0.05
gives the HPD interval.
- The default credible level is \( 95\% \) and
the default prior is the Jeffreys prior.
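These defaults can be overridden; for instance, a uniform prior at a \( 90\% \)
credible level:

binom.bayes(13, 20, conf.level = 0.90, type = "highest",
            prior.shape1 = 1, prior.shape2 = 1)

The interactive plot below draws the posterior and marks off the HPD interval
at a chosen credible level.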
pvals <- seq(0.01, 0.99, length = 1000)
x <- 13; n <- 20
myPlot2 <- function(alpha, beta, cl){
    ## plot the posterior density ...
    plot(pvals, dbeta(pvals, alpha + x, beta + (n - x)), type = "l", lwd = 3,
         xlab = "p", ylab = "", frame = FALSE)
    ## ... then compute and overlay the HPD interval at credible level cl
    out <- binom.bayes(x, n, type = "highest",
                       prior.shape1 = alpha,
                       prior.shape2 = beta,
                       conf.level = cl)
    p1 <- out$lower; p2 <- out$upper
    lines(c(p1, p1, p2, p2), c(0, dbeta(c(p1, p2), alpha + x, beta + (n - x)), 0),
          type = "l", lwd = 3, col = "darkred")
}
manipulate(
    myPlot2(alpha, beta, cl),
    alpha = slider(0.01, 10, initial = 1, step = .5),
    beta = slider(0.01, 10, initial = 1, step = .5),
    cl = slider(0.01, 0.99, initial = 0.95, step = .01)
)