The Core Definition

A full conditional distribution is the probability distribution of a single parameter (or a group of parameters) given fixed values of all other parameters in the model and the observed data.

In mathematical notation, for parameters \(\theta_1, \theta_2, \ldots, \theta_k\):

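\[p(\theta_j \mid \theta_1, \theta_2, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_k, \text{data})\]

Writing \(\theta_{-j}\) for all parameters except \(\theta_j\), this is abbreviated \(p(\theta_j \mid \theta_{-j}, \text{data})\).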

The key phrase is “given everything else”. You condition on:

  • All other parameters
  • The observed data

Why “Full” Conditional?

It’s called “full” because we condition on all other parameters, not just a subset. This distinguishes it from:

Type | Expression | Description
Full conditional | \(p(\theta_j \mid \text{all other } \theta, \text{data})\) | Conditions on all other parameters
Marginal distribution | \(p(\theta_j \mid \text{data})\) | Integrates out the other parameters
Partial conditional | \(p(\theta_j \mid \text{some parameters}, \text{data})\) | Conditions on only a subset
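To make the three definitions concrete, here is a minimal Python sketch on a hypothetical discrete joint posterior over three binary parameters t1, t2, t3 (the weights are arbitrary illustrative numbers, not from any real model). Conditioning fixes values and renormalizes; marginalizing sums the other parameters out.

```python
# Hypothetical joint posterior over three binary parameters (t1, t2, t3),
# already conditioned on the data. Weights are arbitrary positive numbers.
weights = {
    (0, 0, 0): 1.0, (0, 0, 1): 2.0, (0, 1, 0): 3.0, (0, 1, 1): 4.0,
    (1, 0, 0): 5.0, (1, 0, 1): 6.0, (1, 1, 0): 7.0, (1, 1, 1): 8.0,
}
total = sum(weights.values())
joint = {k: w / total for k, w in weights.items()}

def full_conditional_t1(t2, t3):
    """p(t1 | t2, t3, data): fix ALL other parameters, renormalize over t1."""
    unnorm = [joint[(t1, t2, t3)] for t1 in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def marginal_t1():
    """p(t1 | data): sum (integrate) t2 and t3 out."""
    return [sum(joint[(t1, a, b)] for a in (0, 1) for b in (0, 1)) for t1 in (0, 1)]

def partial_conditional_t1(t2):
    """p(t1 | t2, data): condition on t2 only, sum t3 out."""
    unnorm = [sum(joint[(t1, t2, b)] for b in (0, 1)) for t1 in (0, 1)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

print(full_conditional_t1(1, 1))  # equals [1/3, 2/3]
print(marginal_t1())              # equals [10/36, 26/36]
print(partial_conditional_t1(1))  # equals [7/22, 15/22]
```

The three answers differ because each one treats t2 and t3 differently: fixed, summed out, or a mix of both.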

Intuitive Example: Three Parameters

Imagine we have three parameters: \(\alpha\), \(\beta\), \(\gamma\).

Parameter | Full Conditional
\(\alpha\) | \(p(\alpha \mid \beta, \gamma, \text{data})\)
\(\beta\) | \(p(\beta \mid \alpha, \gamma, \text{data})\)
\(\gamma\) | \(p(\gamma \mid \alpha, \beta, \text{data})\)

Each one treats the other two as known constants.

Why Full Conditional Distributions Are Often Simple

This is one of the most useful properties in Bayesian computation: full conditionals are often simple even when the joint posterior is very complex.

The Core Reason: Conditioning Breaks Dependencies

The Joint Posterior as a Product

With independent priors on the parameters, the joint posterior typically looks like:

\[p(\theta_1, \theta_2, \ldots, \theta_k \mid \text{data}) \propto \text{Likelihood} \times \text{Prior}_1 \times \text{Prior}_2 \times \ldots \times \text{Prior}_k\]

This product can be very complicated because parameters interact through the likelihood.

The Full Conditional Factors

When you condition on all other parameters, every factor that does not involve \(\theta_j\) becomes a constant:

\[p(\theta_j \mid \theta_{-j}, \text{data}) \propto [\text{terms containing } \theta_j \text{ from likelihood}] \times [\text{prior for } \theta_j] \times [\text{constants}]\]

Key insight: factors that don’t contain \(\theta_j\) are absorbed into the normalizing constant, so they can be dropped when identifying the distribution of \(\theta_j\).
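A quick numerical check of this insight, as a sketch: for the Normal model of Example 1, with hypothetical data and hyperparameters chosen only for illustration, normalizing the full log joint over a grid of \(\mu\) values (with \(\sigma^2\) held fixed) gives the same distribution as normalizing only the factors that contain \(\mu\).

```python
import math

# Hypothetical data and hyperparameters (illustrative assumptions only).
y = [4.8, 5.6, 5.1, 4.4, 5.9]
mu0, tau0_sq, a0, b0 = 0.0, 100.0, 2.0, 1.0
sigma_sq = 0.5  # the other parameter, held fixed

def log_joint(mu, s2):
    """Unnormalized log of likelihood x Normal prior(mu) x Inverse-Gamma prior(s2)."""
    n = len(y)
    loglik = -0.5 * n * math.log(s2) - sum((yi - mu) ** 2 for yi in y) / (2 * s2)
    logprior_mu = -((mu - mu0) ** 2) / (2 * tau0_sq)
    logprior_s2 = (-a0 - 1) * math.log(s2) - b0 / s2
    return loglik + logprior_mu + logprior_s2

def log_mu_terms(mu, s2):
    """Only the factors that contain mu."""
    return -sum((yi - mu) ** 2 for yi in y) / (2 * s2) - ((mu - mu0) ** 2) / (2 * tau0_sq)

def normalize_on_grid(logf, grid):
    top = max(logf(g) for g in grid)  # subtract the max for numerical stability
    w = [math.exp(logf(g) - top) for g in grid]
    z = sum(w)
    return [wi / z for wi in w]

grid = [3.0 + 0.01 * i for i in range(401)]  # mu from 3.0 to 7.0
p_full = normalize_on_grid(lambda mu: log_joint(mu, sigma_sq), grid)
p_mu_only = normalize_on_grid(lambda mu: log_mu_terms(mu, sigma_sq), grid)
print(max(abs(a - b) for a, b in zip(p_full, p_mu_only)))  # ~0, up to rounding
```

The two log functions differ only by terms in \(\sigma^2\), which are constant on the grid and cancel when normalizing.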

Concrete Example 1: Normal Distribution

The Model

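\[y_1, \ldots, y_n \sim \text{iid Normal}(\mu, \sigma^2)\]

with independent priors

\[\mu \sim \text{Normal}(\mu_0, \tau_0^2), \quad \sigma^2 \sim \text{Inverse-Gamma}(a_0, b_0)\]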

The Joint Posterior (Complex!)

\[p(\mu, \sigma^2 \mid y) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{\sum(y_i-\mu)^2}{2\sigma^2}\right) \times \exp\left(-\frac{(\mu-\mu_0)^2}{2\tau_0^2}\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]

This looks messy: \(\mu\) and \(\sigma^2\) are coupled through the likelihood, and the joint is not a standard named distribution.

Full Conditional for \(\mu\) (Simple!)

When we condition on \(\sigma^2\), anything without \(\mu\) becomes constant:

\[p(\mu \mid \sigma^2, y) \propto \exp\left(-\frac{\sum(y_i-\mu)^2}{2\sigma^2}\right) \times \exp\left(-\frac{(\mu-\mu_0)^2}{2\tau_0^2}\right)\]

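This is just a Normal distribution! Both exponents are quadratic in \(\mu\), so completing the square turns their product into a single Normal kernel. The complicated \((\sigma^2)^{-n/2}\) and \((\sigma^2)^{-a_0-1} \exp(-b_0/\sigma^2)\) terms don’t involve \(\mu\), so they are absorbed into the normalizing constant.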

Result: \[\mu \mid \sigma^2, y \sim \text{Normal}\left( \text{mean} = \frac{n\bar{y}/\sigma^2 + \mu_0/\tau_0^2}{n/\sigma^2 + 1/\tau_0^2}, \text{ variance} = \frac{1}{n/\sigma^2 + 1/\tau_0^2} \right)\]
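As a sketch, this result translates directly into one sampling step for \(\mu\). The function names here are illustrative, not from any library.

```python
import math
import random

def mu_full_conditional(y, sigma_sq, mu0, tau0_sq):
    """Mean and variance of mu | sigma^2, y (Normal likelihood, Normal prior on mu)."""
    n = len(y)
    ybar = sum(y) / n
    precision = n / sigma_sq + 1 / tau0_sq  # data precision + prior precision
    mean = (n * ybar / sigma_sq + mu0 / tau0_sq) / precision
    return mean, 1 / precision

def draw_mu(y, sigma_sq, mu0, tau0_sq, rng=random):
    """One exact draw from the full conditional of mu."""
    mean, var = mu_full_conditional(y, sigma_sq, mu0, tau0_sq)
    return rng.gauss(mean, math.sqrt(var))  # random.gauss takes a standard deviation
```

The posterior mean is a precision-weighted average of \(\bar{y}\) and \(\mu_0\), so with a vague prior (large \(\tau_0^2\)) it stays close to \(\bar{y}\).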

Full Conditional for \(\sigma^2\) (Also Simple!)

When we condition on \(\mu\):

\[p(\sigma^2 \mid \mu, y) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{\sum(y_i-\mu)^2}{2\sigma^2}\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]

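This is an Inverse-Gamma distribution: collecting the powers of \(\sigma^2\) gives \((\sigma^2)^{-(a_0+n/2)-1} \exp\left(-\left[b_0 + \tfrac{1}{2}\sum(y_i-\mu)^2\right]/\sigma^2\right)\), an Inverse-Gamma kernel. The Normal prior for \(\mu\) disappears because it doesn’t contain \(\sigma^2\).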

Result: \[\sigma^2 \mid \mu, y \sim \text{Inverse-Gamma}\left( a_0 + \frac{n}{2}, b_0 + \frac{\sum(y_i-\mu)^2}{2} \right)\]
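A corresponding sketch for \(\sigma^2\). Python's standard library has no Inverse-Gamma sampler, so this relies on the standard identity that if \(G \sim \text{Gamma}(\text{shape } a, \text{rate } b)\) then \(1/G \sim \text{Inverse-Gamma}(a, b)\). Note that random.gammavariate takes a scale, so the rate is passed as its reciprocal.

```python
import random

def sigma_sq_full_conditional(y, mu, a0, b0):
    """Shape and scale (a_n, b_n) of sigma^2 | mu, y ~ Inverse-Gamma(a_n, b_n)."""
    a_n = a0 + len(y) / 2
    b_n = b0 + sum((yi - mu) ** 2 for yi in y) / 2
    return a_n, b_n

def draw_sigma_sq(y, mu, a0, b0, rng=random):
    """One exact draw from the full conditional of sigma^2."""
    a_n, b_n = sigma_sq_full_conditional(y, mu, a0, b0)
    # G ~ Gamma(shape=a_n, rate=b_n)  =>  1/G ~ Inverse-Gamma(a_n, b_n).
    # random.gammavariate(alpha, beta) uses beta as a SCALE, hence 1/b_n.
    return 1.0 / rng.gammavariate(a_n, 1.0 / b_n)
```

For \(a_n > 1\) the mean of these draws should approach \(b_n / (a_n - 1)\), which is a convenient sanity check on the parameterization.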

Concrete Example 2: Linear Regression

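Model

\[y = X\beta + \varepsilon, \quad \varepsilon \sim \text{Normal}(0, \sigma^2 I), \quad \beta \sim \text{Normal}(0, \Sigma_0), \quad \sigma^2 \sim \text{Inverse-Gamma}(a_0, b_0)\]

Joint Posterior (Complex!)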

\[p(\beta, \sigma^2 \mid y, X) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2}\right) \times \exp\left(-\frac{1}{2} \beta'\Sigma_0^{-1}\beta\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]

Parameters are tangled: \(\beta\) and \(\sigma^2\) appear together in the likelihood.

Full Conditional for \(\beta\) (Given \(\sigma^2\)) \(\rightarrow\) Multivariate Normal

\[p(\beta \mid \sigma^2, y, X) \propto \exp\left(-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2}\right) \times \exp\left(-\frac{1}{2} \beta'\Sigma_0^{-1}\beta\right)\]

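The \((\sigma^2)^{-n/2}\) factor and the Inverse-Gamma prior for \(\sigma^2\) disappear. The exponent is quadratic in \(\beta\), so this is Multivariate Normal:

\[\beta \mid \sigma^2, y, X \sim \text{Normal}\left(V \frac{X'y}{\sigma^2}, V\right), \quad V = \left(\frac{X'X}{\sigma^2} + \Sigma_0^{-1}\right)^{-1}\]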

Full Conditional for \(\sigma^2\) (Given \(\beta\)) \(\rightarrow\) Inverse-Gamma

\[p(\sigma^2 \mid \beta, y, X) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2}\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]

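The Normal prior for \(\beta\) disappears. This is Inverse-Gamma:

\[\sigma^2 \mid \beta, y, X \sim \text{Inverse-Gamma}\left(a_0 + \frac{n}{2}, b_0 + \frac{(y-X\beta)'(y-X\beta)}{2}\right)\]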
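Both regression conditionals can be sampled with only the Python standard library. This is a minimal sketch that assumes the standard conjugate results for the two kernels (stated in the docstrings). The prior precision \(\Sigma_0^{-1}\) is passed in directly, and the small Cholesky helpers are written out by hand; in practice a linear algebra library would do this.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][m] * x[m] for m in range(i))) / L[i][i])
    return x

def back_sub_transpose(L, b):
    """Solve L^T x = b (L^T is upper-triangular)."""
    k = len(b)
    x = [0.0] * k
    for i in reversed(range(k)):
        x[i] = (b[i] - sum(L[m][i] * x[m] for m in range(i + 1, k))) / L[i][i]
    return x

def draw_beta(X, y, sigma_sq, prior_precision, rng=random):
    """beta | sigma^2, y, X ~ N(Q^{-1} X'y / sigma^2, Q^{-1}), Q = X'X / sigma^2 + Sigma0^{-1}."""
    k = len(X[0])
    Q = [[sum(row[i] * row[j] for row in X) / sigma_sq + prior_precision[i][j]
          for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) / sigma_sq for i in range(k)]
    L = cholesky(Q)
    mean = back_sub_transpose(L, forward_sub(L, Xty))  # solves Q mean = X'y / sigma^2
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    noise = back_sub_transpose(L, z)  # Cov = (L L^T)^{-1} = Q^{-1}
    return [m + e for m, e in zip(mean, noise)]

def draw_sigma_sq(X, y, beta, a0, b0, rng=random):
    """sigma^2 | beta, y, X ~ Inverse-Gamma(a0 + n/2, b0 + RSS/2), drawn as 1/Gamma."""
    rss = sum((yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
              for row, yi in zip(X, y))
    return 1.0 / rng.gammavariate(a0 + len(y) / 2, 1.0 / (b0 + rss / 2))
```

Working with the precision matrix \(Q\) and its Cholesky factor avoids ever forming \(V = Q^{-1}\) explicitly.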

The Mathematical Reason: Exponential Family + Conjugate Priors

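This simplicity isn’t accidental. It typically happens when:

  • The likelihood is from the exponential family (Normal, Poisson, Binomial, Gamma, etc.)
  • Each prior is (conditionally) conjugate: combined with the likelihood, viewed as a function of that parameter alone, it yields a full conditional in the same family as the prior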

Exponential Family Form

\[p(y \mid \theta) = h(y) \exp\left(\eta(\theta)' T(y) - A(\theta)\right)\]

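When each parameter’s prior is conditionally conjugate, its full conditional stays in the prior’s family with updated parameters. This holds even when the joint posterior itself is not a standard distribution, as in Example 1, where the Normal prior on \(\mu\) is conjugate given \(\sigma^2\) and the Inverse-Gamma prior on \(\sigma^2\) is conjugate given \(\mu\).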
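A short derivation shows why the family is preserved. It assumes i.i.d. observations and a prior of the standard conjugate form for this likelihood, with hyperparameters \(\nu\) and \(\kappa\) (symbols introduced here for illustration):

\[p(\theta) \propto \exp\left(\eta(\theta)'\nu - \kappa A(\theta)\right)\]

\[p(\theta \mid y_1, \ldots, y_n) \propto \left[\prod_{i=1}^n \exp\left(\eta(\theta)'T(y_i) - A(\theta)\right)\right] \exp\left(\eta(\theta)'\nu - \kappa A(\theta)\right) = \exp\left(\eta(\theta)'\left[\nu + \sum_{i=1}^n T(y_i)\right] - (\kappa + n) A(\theta)\right)\]

The \(h(y_i)\) factors do not involve \(\theta\), so they drop into the normalizing constant, and the posterior has the same form as the prior with \(\nu \to \nu + \sum_i T(y_i)\) and \(\kappa \to \kappa + n\). For a full conditional, apply the same argument to the likelihood viewed as a function of \(\theta_j\) alone, with the other parameters held fixed.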

Example 3: Poisson Regression (More Complex Joint, Simple Full Conditionals)

Model

\[y_i \sim \text{Poisson}(\lambda_i), \quad \lambda_i = \exp(x_i'\beta)\] \[\beta_j \sim \text{Normal}(0, 1000) \text{ (independent priors)}\]

Joint Posterior (Very Complex!)

\[p(\beta \mid y, X) \propto \prod_i \left[ \frac{\exp(x_i'\beta)^{y_i} \exp(-\exp(x_i'\beta))}{y_i!} \right] \times \prod_j \left[ \frac{1}{\sqrt{2000\pi}} \exp\left(-\frac{\beta_j^2}{2000}\right) \right]\]

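No closed form: the joint posterior is not a standard distribution, and the \(\beta_j\) can be strongly correlated a posteriori (for example, when covariates are correlated).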

Full Conditional for \(\beta_j\) (Still Relatively Simple!)

\[p(\beta_j \mid \beta_{-j}, y, X) \propto \exp\left( \sum_i \left[ y_i x_{ij} \beta_j - \exp(x_i'\beta) \right] \right) \times \exp\left( -\frac{\beta_j^2}{2000} \right)\]

Why is this simpler?

  • All terms without \(\beta_j\) are constant
  • The sum over \(i\) only involves \(\beta_j\) through \(x_{ij}\beta_j\) and \(\exp(x_i'\beta)\)
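  • It is a one-dimensional, log-concave density (the second derivative of its log, \(-\sum_i x_{ij}^2 \exp(x_i'\beta) - 1/1000\), is negative), so it is unimodal and well suited to adaptive rejection sampling or a random-walk Metropolis-Hastings step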
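A minimal Metropolis-within-Gibbs sketch for this model, using a symmetric random-walk proposal for each \(\beta_j\) in turn (adaptive rejection sampling is the other natural choice for a log-concave conditional, but is longer to write). The step size of 0.1 is an untuned, hypothetical choice; the Normal(0, 1000) prior matches the model above.

```python
import math
import random

PRIOR_VAR = 1000.0  # beta_j ~ Normal(0, 1000), as in the model above

def log_full_conditional(j, beta, X, y):
    """log p(beta_j | beta_{-j}, y, X), up to an additive constant."""
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(xij * bj for xij, bj in zip(row, beta))  # x_i' beta
        total += yi * row[j] * beta[j] - math.exp(eta)
    return total - beta[j] ** 2 / (2 * PRIOR_VAR)

def mwg_sweep(beta, X, y, step=0.1, rng=random):
    """One Metropolis-within-Gibbs sweep: a random-walk MH update for each beta_j in turn."""
    beta = list(beta)
    for j in range(len(beta)):
        proposal = list(beta)
        proposal[j] += rng.gauss(0.0, step)  # symmetric proposal: no Hastings correction
        log_ratio = (log_full_conditional(j, proposal, X, y)
                     - log_full_conditional(j, beta, X, y))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            beta = proposal  # accept; otherwise keep the current beta_j
    return beta
```

Only differences of the log full conditional enter the acceptance test, so every term without \(\beta_j\) can be left out, exactly as in the formula above.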

Why “Simple” Doesn’t Always Mean “Standard”

Sometimes “simple” means:

Type | Meaning | Sampling Method
Standard distribution | Normal, Gamma, Beta | Direct sampling
Log-concave | Log of density is concave | Adaptive rejection sampling
Low-dimensional | 1 or 2 parameters | Metropolis-Hastings within Gibbs
Conditionally independent | Breaks into smaller pieces | Block Gibbs sampling

Why Full Conditionals Matter in Gibbs Sampling

The Gibbs Sampling Step

At each iteration, you sample:

\[\theta_1^{(t)} \sim p(\theta_1 \mid \theta_2^{(t-1)}, \theta_3^{(t-1)}, \ldots, \theta_k^{(t-1)}, \text{data})\]

\[\theta_2^{(t)} \sim p(\theta_2 \mid \theta_1^{(t)}, \theta_3^{(t-1)}, \ldots, \theta_k^{(t-1)}, \text{data})\]

… and so on

Each of these is a full conditional distribution, evaluated at the most recent values of all the other parameters.
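Chaining the two Example 1 conditionals gives a complete two-block Gibbs sampler. This is a minimal sketch with the standard library; the starting values, iteration count, and seed are arbitrary illustrative choices, and no burn-in or convergence diagnostics are included.

```python
import math
import random

def gibbs_normal(y, mu0, tau0_sq, a0, b0, n_iter=5000, seed=0):
    """Two-block Gibbs sampler: y_i ~ N(mu, sigma^2), mu ~ N(mu0, tau0^2), sigma^2 ~ IG(a0, b0)."""
    rng = random.Random(seed)
    n = len(y)
    ybar = sum(y) / n
    mu, sigma_sq = ybar, 1.0  # arbitrary starting values
    samples = []
    for _ in range(n_iter):
        # Draw mu from p(mu | current sigma^2, y): Normal
        prec = n / sigma_sq + 1 / tau0_sq
        mean = (n * ybar / sigma_sq + mu0 / tau0_sq) / prec
        mu = rng.gauss(mean, math.sqrt(1 / prec))
        # Draw sigma^2 from p(sigma^2 | the mu just drawn, y): Inverse-Gamma via 1/Gamma
        ss = sum((yi - mu) ** 2 for yi in y)
        sigma_sq = 1.0 / rng.gammavariate(a0 + n / 2, 1.0 / (b0 + ss / 2))
        samples.append((mu, sigma_sq))
    return samples
```

Each pass uses the newest value of the other parameter, matching the \(\theta_1^{(t)}, \theta_2^{(t)}\) pattern above. In practice you would discard an initial burn-in and check convergence before summarizing the draws.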

Key Property

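Each draw from a full conditional is one Gibbs update, and each update leaves the joint posterior invariant: if the current values are distributed according to the joint posterior, they still are after the update. Cycling through all the full conditionals therefore gives a Markov chain that has the joint posterior as a stationary distribution.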
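This invariance can be checked exactly on a small example. The sketch below uses a hypothetical discrete joint \(p(a, b)\), builds the transition probabilities for “redraw one coordinate from its full conditional”, and confirms that applying them to \(p\) returns \(p\).

```python
# Hypothetical joint p(a, b) with a in {0, 1} and b in {0, 1, 2}.
p = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.05,
     (1, 0): 0.25, (1, 1): 0.15, (1, 2): 0.25}
states = list(p)

def gibbs_kernel(coord):
    """Transition probabilities for 'redraw coordinate `coord` from its full conditional'."""
    other = 1 - coord
    P = {}
    for s in states:
        # Normalizer of the full conditional: total mass with the other coordinate fixed
        norm = sum(p[t] for t in states if t[other] == s[other])
        for s2 in states:
            P[s, s2] = p[s2] / norm if s2[other] == s[other] else 0.0
    return P

def step(pi, P):
    """One transition: pi_new(s2) = sum over s of pi(s) * P(s, s2)."""
    return {s2: sum(pi[s] * P[s, s2] for s in states) for s2 in states}

after_a = step(p, gibbs_kernel(0))            # update a given b
after_sweep = step(after_a, gibbs_kernel(1))  # then update b given a
print(max(abs(after_sweep[s] - p[s]) for s in states))  # ~0, up to rounding
```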

The Ultimate Reason: Hammersley-Clifford Theorem

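This theorem, as it is usually stated in the Gibbs sampling literature, says:

Under a positivity condition (every combination of parameter values that is possible marginally is also possible jointly), the full conditionals uniquely determine the joint distribution. The converse is immediate: the joint distribution determines the full conditionals. So a sampler that only ever touches the full conditionals loses no information about the joint posterior, and each full conditional is usually simpler because it holds the other parameters fixed instead of handling their interactions.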
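For two variables with a strictly positive joint, the reconstruction can be written explicitly: since \(p(x_0, y) = p(x_0 \mid y)\,p(y) = p(y \mid x_0)\,p(x_0)\) for a fixed reference value \(x_0\), it follows that \(p(x, y) \propto p(x \mid y)\,p(y \mid x_0) / p(x_0 \mid y)\). The sketch below rebuilds a hypothetical discrete joint from its two full conditionals alone.

```python
# Hypothetical positive joint p(x, y), x in {0, 1}, y in {0, 1, 2}.
joint = {(0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.05,
         (1, 0): 0.25, (1, 1): 0.15, (1, 2): 0.25}
xs, ys = (0, 1), (0, 1, 2)

# The only inputs kept are the two full conditionals.
p_x_given_y = {(x, y): joint[x, y] / sum(joint[u, y] for u in xs) for x in xs for y in ys}
p_y_given_x = {(y, x): joint[x, y] / sum(joint[x, v] for v in ys) for x in xs for y in ys}

def reconstruct(x0=0):
    """p(x, y) is proportional to p(x | y) p(y | x0) / p(x0 | y) when all probabilities are positive."""
    unnorm = {(x, y): p_x_given_y[x, y] * p_y_given_x[y, x0] / p_x_given_y[x0, y]
              for x in xs for y in ys}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

rebuilt = reconstruct()
print(max(abs(rebuilt[k] - joint[k]) for k in joint))  # ~0: the joint is recovered
```

The positivity assumption matters: the division by \(p(x_0 \mid y)\) fails if any of those probabilities is zero.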

Summary Table: Joint vs. Full Conditional

Aspect | Joint Posterior | Full Conditional
Complexity | High (all parameters interact) | Low (other parameters are fixed constants)
Parameter interactions | Fully present | Conditioned away
Form | Often no closed form | Often a standard distribution
Dimensionality | Full parameter space | Single parameter (or small block)
Sampling | Difficult (needs MCMC) | Easy (direct draw or a simple MCMC step)