Visualizing MCMC

Matt Kumar

2021-11-14

Overview

This article serves as an overview of a collection of interrelated statistical concepts I’ve been toying with how to present in my spare time. All code and output can be found on my github repository here.

Briefly, I cover the following concepts in a few (well commented) R scripts:

How to generate data from a known statistical model
- Specify a binary logistic regression model and generate data from it
- Verify the results of the simulation
Construct a Markov Chain Monte Carlo (MCMC) sampler
- Implement the Random Walk Metropolis Hastings algorithm from scratch to perform a Bayesian Logistic Regression
Visualize results
- Build an intuition for what the MCMC algorithm does through the use of animations via the {gganimate} package

Some prior knowledge of Bayesian statistics is useful. No pun intended :)

Visualization 1: Step-by-Step

This is an example visualization that shows the first 100 iterations of a single chain. I’ve overlaid the current parameter values, previous parameter values and the decision taken by the algorithm at each iteration. I have slowed down the animation so that it is easy to follow.

Visualization 2: Full View

This is an example visualization that shows the chain running for the full 10,000 iterations. Contextual information is withheld and the animation speed is increased. You can see the chain eventually settles on what the known parameter values are.

Visualization 3: Multiple Chains

This is an example visualization that shows 4 chains attempting to estimate parameters of the Logistic Regression model. Each chain is initialized with a unique set of starting values. This code provides the basis of further study into consequences of different starting values strategies (e.g. does one set take longer to eventually reach the target space than another?).

Future Visualization

An idea for the a future visualization, similar to visualization 3, is to alter the variance(s) of the candidate generating distribution(s) from very small to very large. Can you see the issue with picking a variance that’s too small or too large? What practical consequence does this have in the analysis? Consider looking at the accept or reject % of each chain :)