I was inspired to write this blog post after re-reading Gelman and Shalizi (2012). In it, the authors lay out a compelling argument that the common view of Bayesian statistics is flawed: the view that it is a purely inductive reasoning approach that allows for continuous learning about general theories from specific observations. They argue that, in practice, Bayesian data analysis more closely resembles Popperian falsification, taking the form of a sophisticated kind of hypothetico-deductivism.

In the paper, the authors explain the Bayesian data analysis cycle through a succinct description of how statistical models are constructed and used. I found their brief summary extremely helpful for understanding the logic behind Bayesian reasoning, and I want to paraphrase their description here. I hope this will be helpful for others like me: intrigued by Bayesian methods for data analysis, but coming from an academic background (ecology, in my case) with no real formal training in mathematics or statistics, and left clueless about how to start actually thinking like the Reverend Thomas Bayes. My goal with this post is not to show you how to write a Bayesian model in R (there are plenty of other great resources for that; see Gelman et al. (2004) and McElreath (2018)). Instead, I want to do my best to paraphrase Gelman and Shalizi (2012), in the hope that it helps you better understand the logical argument that serves as the foundation of the Bayesian framework.

What is a Bayesian model?

A Bayesian model can be most simply described as the joint probability distribution of the observed data \(y\), the unobserved or latent observations \(\widetilde{y}\), and a vector of parameters \(\theta\):

\[p(y, \widetilde{y}, \theta)\]

The joint distribution factors as \(p(y, \widetilde{y}, \theta) = p(\theta)\,p(y, \widetilde{y}|\theta)\): the prior distribution for the parameters \(p(\theta)\) multiplied by the complete data likelihood \(p(y, \widetilde{y}|\theta)\). The complete data likelihood can be thought of as the total probability of the observed data \(y\) and the unobserved data \(\widetilde{y}\), given some vector of parameters \(\theta\). The likelihood of the observed data alone is then obtained by integrating out \(\widetilde{y}\):

\[p(y|\theta) = \int{p(y,\widetilde{y}|\theta)d\widetilde{y}}\]

Put plainly, the likelihood \(p(y|\theta)\) is the probability of the observed data given some vector of parameters, and it is equal to the integral of the complete data likelihood \(p(y, \widetilde{y}|\theta)\) over all possible values of the unobserved data \(\widetilde{y}\); that is, \(\widetilde{y}\) is marginalized out.
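
To make the marginalization step concrete, here is a small numeric sketch in Python (my own toy example, not something from Gelman and Shalizi (2012)). It treats \(\widetilde{y}\) as a discrete, unobserved mixture label, so the integral over \(\widetilde{y}\) becomes a simple sum; the mixture setup, the parameter names, and the helper functions are all assumptions made up for illustration.

```python
# A minimal sketch, assuming a hypothetical two-component mixture model:
# y_tilde is an unobserved component label, and theta collects the mixing
# weight and the two component means.
from scipy.stats import norm

def complete_data_likelihood(y, y_tilde, theta):
    """p(y, y_tilde | theta) = p(y_tilde | theta) * p(y | y_tilde, theta)."""
    weight, mu = theta["weight"], theta["mu"]
    p_label = weight if y_tilde == 1 else 1.0 - weight  # p(y_tilde | theta)
    p_y = norm.pdf(y, loc=mu[y_tilde], scale=1.0)       # p(y | y_tilde, theta)
    return p_label * p_y

def observed_data_likelihood(y, theta):
    """p(y | theta): the integral over y_tilde becomes a sum for a discrete label."""
    return sum(complete_data_likelihood(y, y_tilde, theta) for y_tilde in (0, 1))

theta = {"weight": 0.3, "mu": [0.0, 4.0]}
print(observed_data_likelihood(1.2, theta))
```

With a continuous \(\widetilde{y}\) the sum would become the integral in the equation above, but the logic is identical.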

The central goal of a Bayesian analysis is then to compute the posterior probability of the parameters given the observed data, \(p(\theta|y)\). The posterior distribution is obtained by combining the prior with the likelihood and normalizing over all potential values \(k\) in the parameter space for \(\theta_k\):

\[p(\theta_k|y) = \frac{p(\theta_{k})p(y|\theta_{k})}{\sum{p(\theta_{k'})p(y|\theta_{k'})}}\] You might notice that we have just arrived at Bayes’ classic theorem:

\[P(A|B) = \frac{P(A)~P(B|A)}{P(B)}\] Gelman and Shalizi (2012) extend this logic to show that we can define a set of competing models (or hypotheses) \(\Theta_k\), where each model \(\Theta_k\) has its own associated prior probability \(p(\Theta_k)\) and parameters \(\theta_k\). The posterior probability of each model, given the observed data, is then obtained by:

\[p(\Theta_k|y) = \frac{p(\Theta_{k})p(y|\Theta_{k})}{\sum{p(\Theta_{k'})p(y|\Theta_{k'})}}\] Or equivalently:

\[p(\Theta_k|y) = \frac{p(\Theta_{k})\int{p(y,\theta_k|\Theta_k)d\theta_k}}{\sum{p(\Theta_{k'})\int{p(y,\theta_{k'}|\Theta_{k'})d\theta_{k'}}}}\]

In other words, the posterior probability of each competing model (or hypothesis) is proportional to the product of the prior probability of the model (or hypothesis) \(p(\Theta_k)\) and the marginal likelihood of the data given that model, \(\int{p(y,\theta_k|\Theta_k)d\theta_k}\), in which the model’s own parameters \(\theta_k\) have been integrated out. The denominator, \(\sum{p(\Theta_{k'})\int{p(y,\theta_{k'}|\Theta_{k'})d\theta_{k'}}}\), simply normalizes these quantities over the full set of competing models.
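
If the notation feels abstract, the following grid-approximation sketch in Python (again my own toy example, not code from the paper) applies Bayes’ theorem twice: once to obtain the posterior over a parameter within a single model, and once to obtain the posterior probabilities of two competing models. The coin-flip data, the two models, and all of the priors are assumptions chosen purely for illustration.

```python
# A minimal grid sketch; the data (7 heads in 10 flips), the two competing
# models, and the flat priors are all invented for illustration.
import numpy as np
from scipy.stats import binom

y, n = 7, 10

# Posterior over theta within a single "unknown bias" model, on a grid
theta_grid = np.linspace(0.001, 0.999, 999)   # discretized parameter space
prior_theta = np.full_like(theta_grid, 1.0 / theta_grid.size)  # flat prior p(theta_k)
likelihood = binom.pmf(y, n, theta_grid)      # p(y | theta_k)
posterior_theta = prior_theta * likelihood    # numerator of Bayes' theorem
posterior_theta /= posterior_theta.sum()      # denominator: sum over all theta_k'

# Posterior over two competing models:
#   Model 1 ("fair coin"): theta is fixed at 0.5, so p(y | Theta_1) = p(y | 0.5).
#   Model 2 ("unknown bias"): the marginal likelihood integrates (here, sums)
#   p(theta | Theta_2) * p(y | theta) over the grid.
marginal_lik = np.array([
    binom.pmf(y, n, 0.5),
    np.sum(prior_theta * likelihood),
])
prior_models = np.array([0.5, 0.5])             # p(Theta_k)
posterior_models = prior_models * marginal_lik  # p(Theta_k) * p(y | Theta_k)
posterior_models /= posterior_models.sum()      # normalize over all k'

print("posterior mean of theta under model 2:", np.sum(theta_grid * posterior_theta))
print("posterior model probabilities:", posterior_models)
```

Because the grid is discrete, the integrals in the equations above become sums, which is exactly how the marginal likelihood of the second model is computed here.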

Model checking

A crucial part of a Bayesian data analysis is to check the fitted model against the observed data. As Gelman and Shalizi (2012) state:

“The data-analysis process - Bayesian or otherwise - does not end with calculating parameter estimates or posterior distributions. Rather, the model can then be checked, by comparing the implications of the fitted model to the empirical evidence.”

Thus, we use the Bayesian data analysis process to assign degrees of belief to various models (or hypotheses) based on both the strength of our prior assumptions and the weight of new evidence. As the weight of new evidence shifts our understanding, we can continuously expand or improve our models (or hypotheses), and reject old ones as newer ones gain more support from the empirical evidence. In this way, the practical application of Bayesian data analysis can be thought of as a type of hypothetico-deductive reasoning. In most of the “soft sciences” it is known from the outset that all models are wrong or, at best, incomplete. We can use the Bayesian framework to update our understanding of the world by improving our models and rejecting outdated hypotheses as new information (or a better model) shifts the weight of evidence closer towards the truth.
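
As one concrete (and, again, entirely made-up) illustration of this checking step, the Python sketch below performs a simple posterior predictive check on the coin-flip example from the previous sketch: it draws parameter values from the grid posterior, simulates replicated datasets, and asks whether the observed data look extreme relative to those replications.

```python
# A minimal posterior predictive check, reusing the invented coin-flip example
# (7 heads in 10 flips) and its grid posterior from the previous sketch.
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
y, n = 7, 10
theta_grid = np.linspace(0.001, 0.999, 999)
posterior_theta = binom.pmf(y, n, theta_grid)  # flat prior, so the posterior is
posterior_theta /= posterior_theta.sum()       # proportional to the likelihood

# Draw theta from the posterior and simulate replicated datasets y_rep
theta_draws = rng.choice(theta_grid, size=5000, p=posterior_theta)
y_rep = rng.binomial(n, theta_draws)

# If the observed statistic (number of heads) sits far in the tails of the
# replicated distribution, the fitted model is in tension with the data.
print("posterior predictive p-value:", np.mean(y_rep >= y))
```

In a real analysis you would look at several such test statistics, and a model that fails these checks gets expanded or replaced: exactly the hypothetico-deductive loop described above.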

I hope this paraphrasing of Gelman and Shalizi’s (2012) explanation of the data analysis cycle was helpful to you. I highly recommend reading the entire paper if you are committed to using and understanding Bayesian methods in your own work. If you are completely new to Bayesian statistics, I also recommend picking up these excellent references: McElreath (2018), Gelman et al. (2004), and Hobbs and Hooten (2015).

References

Gelman, Andrew, John B. Carlin, Hal S. Stern, and Donald B. Rubin. 2004. Bayesian Data Analysis. 2nd ed. New York, NY: Chapman & Hall.

Gelman, Andrew, and Cosma Rohilla Shalizi. 2012. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology 66 (1): 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x.

Hobbs, N. Thompson, and Mevin B. Hooten. 2015. Bayesian Models: A Statistical Primer for Ecologists. STU - Student edition. Princeton University Press. http://www.jstor.org/stable/j.ctt1dr36kz.

McElreath, R. 2018. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. Boca Raton, FL: CRC Press.