Bayes’ formula extends naturally to statistical models. A Bayesian model is a parametric model in the classical (or frequentist) sense, but with the addition of a prior probability distribution for the model parameter, which is treated as a random variable rather than an unknown constant. The basic elements of a Bayesian model may be listed as:
The parameter of interest, say \(\theta\). Note that this is completely general, since \(\theta\) may be vector valued. So \(\theta\) might be a binomial parameter, or the mean and variance of a Normal distribution, or an odds ratio, or a set of regression coefficients, etc. The parameter of interest is sometimes usefully thought of as the “true state of nature”.
The prior distribution of \(\theta\), \(f(\theta)\). This prior distribution summarizes what is known about \(\theta\) before the experiment is carried out. It is “subjective”, so may vary from investigator to investigator.
The likelihood function, \(f(y|\theta)\). The likelihood function provides the distribution of the data, \(y\), given the parameter value \(\theta\). So it may be a binomial likelihood, a normal likelihood, a likelihood from a regression equation with associated normal residual variance, a logistic regression likelihood, etc.
The posterior distribution, \(f(\theta|y)\). The posterior distribution summarizes the information in the data, \(y\), together with the information in the prior distribution, \(f(\theta)\). Thus, it summarizes what is known about the parameter of interest \(\theta\) after the data are collected.
Bayes’ Theorem. This theorem relates the above quantities:
\[ \text{posterior distribution} = \frac{\text{likelihood of the data} \times \text{prior distribution}}{\text{normalizing constant}} \]
or
\[ f(\theta|y) = \frac{f(y|\theta) \times f(\theta)}{f(y)} \]
where:
\[ f(y) = \begin{cases} \sum_{\theta} f(\theta) \times f(y|\theta), & \text{if}\; \theta\; \text{is discrete} \\ \int f(\theta) \times f(y|\theta)\, d\theta, & \text{if}\; \theta\; \text{is continuous} \end{cases} \] Ignoring the normalizing constant, we get
\[ f(\theta|y) \propto f(y|\theta) \times f(\theta) \]
Thus we “update” the prior distribution to a posterior distribution after seeing the data via Bayes’ Theorem.
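To make the update concrete, here is a minimal sketch in Python (an added illustration, not part of the notes above): a binomial parameter \(\theta\) with a Beta(2, 2) prior and hypothetical data of 7 successes in 10 trials. The posterior is computed on a discrete grid so that the normalizing constant \(f(y)\) appears explicitly, and the result is checked against the exact conjugate Beta posterior.

```python
import numpy as np
from scipy.stats import beta, binom

# Hypothetical data: y = 7 successes in n = 10 trials
y, n = 7, 10

# Discretize theta so the normalizing constant is a simple sum
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]

prior = beta.pdf(theta, 2, 2)        # f(theta): a Beta(2, 2) prior
likelihood = binom.pmf(y, n, theta)  # f(y|theta): binomial likelihood

# f(y) = integral of f(theta) * f(y|theta) d(theta), approximated on the grid
f_y = np.sum(prior * likelihood) * dtheta

# Bayes' Theorem: posterior = likelihood * prior / normalizing constant
posterior = likelihood * prior / f_y

# Check against the exact conjugate answer, Beta(2 + y, 2 + n - y)
exact = beta.pdf(theta, 2 + y, 2 + n - y)
print(np.max(np.abs(posterior - exact)))  # small discretization error
```

Because the Beta prior is conjugate to the binomial likelihood, the grid approximation can be verified against the closed-form Beta(9, 5) posterior; in non-conjugate models the same grid (or simulation) approach is what makes the normalizing constant tractable.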
The action, \(a\). The action is the decision or action that is taken after the analysis is completed. For example, one may decide to treat a patient with Drug 1 or Drug 2, depending on the data collected in a clinical trial. Thus our action will either be to use Drug 1 (so that \(a = 1\)) or Drug 2 (so that \(a = 2\)).
The loss function, \(L(\theta, a)\). Each time we choose an action, we incur some loss, which depends on the true state of nature and on the action we take. For example, if the true state of nature is that Drug 1 is in fact superior to Drug 2, then choosing action \(a = 1\) incurs a smaller loss than choosing \(a = 2\). The usual problem is that we do not know the true state of nature; we only have data that let us make probabilistic statements about it (i.e., we have a posterior distribution for \(\theta\), but do not usually know its exact value). Also, we rarely make decisions before seeing the data, so that in general \(a = a(y)\) is a function of the data. Note that while we will refer to these as “losses”, we could equally well use “gains”.
Expected Bayes Loss (Bayes Risk). We do not know the true value of \(\theta\), but once the data are known we do have a posterior distribution, \(f(\theta|y)\). Hence, to make a “coherent” Bayesian decision, we minimize the Expected Bayes Loss (EBL), defined by:
\[ EBL = \int L(\theta, a(y))\, f(\theta|y)\, d\theta \] In other words, we choose the action \(a(y)\) such that the EBL is minimized.
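For a concrete (and entirely hypothetical) illustration of the action, the loss function, and the EBL together, suppose \(\theta\) is the difference in cure rates between Drug 1 and Drug 2, that the posterior \(f(\theta|y)\) after a trial is approximated by a Normal(0.05, 0.04\(^2\)) density, and that the loss for choosing the inferior drug is proportional to how much worse it is. The sketch below computes the EBL of each action on a grid and chooses the action that minimizes it; all of the numbers are invented.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical posterior for theta = cure rate of Drug 1 minus Drug 2,
# e.g. a Normal(0.05, 0.04^2) approximation after a clinical trial
theta = np.linspace(-0.2, 0.3, 2001)
dtheta = theta[1] - theta[0]
posterior = norm.pdf(theta, loc=0.05, scale=0.04)  # f(theta|y)

# Illustrative loss L(theta, a): no loss for choosing the better drug,
# otherwise a loss proportional to how much worse the chosen drug is
def loss(theta, a):
    if a == 1:   # use Drug 1; we lose only where Drug 2 is truly better
        return np.where(theta < 0, -theta, 0.0)
    else:        # use Drug 2; we lose only where Drug 1 is truly better
        return np.where(theta > 0, theta, 0.0)

# EBL(a) = integral of L(theta, a) * f(theta|y) d(theta), on the grid
ebl = {a: np.sum(loss(theta, a) * posterior) * dtheta for a in (1, 2)}
print(ebl)                               # roughly {1: 0.002, 2: 0.052}
print("Choose Drug", min(ebl, key=ebl.get))  # Drug 1 minimizes the EBL
```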
The first five elements in the above list comprise a non-decision theoretic Bayesian approach to statistical inference. This type of analysis (i.e., non-decision theoretic) is what most of us are used to seeing in the medical literature. However, many Bayesians argue that the main reason we carry out any statistical analysis is to help in making decisions, so that elements 6, 7, and 8 (the action, the loss function, and the expected Bayes loss) are crucial. There is little doubt that we will see more such analyses in the near future, but it remains to be seen how popular the decision theoretic framework will become in medicine. The main problem is specifying the loss functions: medical decisions have so many possible consequences (main outcomes, side effects, costs, etc.) that it is difficult to combine them into a single loss function. My guess is that much work will have to be done on developing loss functions before the decision theoretic approach becomes mainstream. This course, therefore, will focus on elements 1 through 5.