A full conditional distribution is the probability distribution of a single parameter (or a group of parameters) given fixed values of all other parameters in the model and the observed data.
In mathematical notation, for parameters \(\theta_1, \theta_2, \ldots, \theta_k\):
\[p(\theta_j \mid \theta_1, \theta_2, \ldots, \theta_{j-1}, \theta_{j+1}, \ldots, \theta_k, \text{data})\]
The key phrase is “given everything else”: you condition on every other parameter in the model and on the observed data.
It’s called “full” because we condition on all other parameters, not just a subset. This distinguishes it from:
| Type | Expression | Description |
|---|---|---|
| Full conditional | \(p(\theta_j \mid \text{all other } \theta, \text{data})\) | Conditions on all other parameters |
| Marginal distribution | \(p(\theta_j \mid \text{data})\) | Integrates out other parameters |
| Partial conditional | \(p(\theta_j \mid \text{some parameters}, \text{data})\) | Conditions on only a subset |
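The three rows of the table can be compared on a small discrete example. The sketch below is only an illustration: the joint table is random, made-up data standing in for a posterior over three parameters, and the names `joint`, `marginal`, `partial`, and `full` are hypothetical.

```python
import numpy as np

# Made-up joint posterior over three discrete parameters
# (axis 0: theta_1, axis 1: theta_2, axis 2: theta_3).
rng = np.random.default_rng(0)
joint = rng.random((2, 3, 2))
joint /= joint.sum()

def normalize(p):
    # Rescale non-negative weights so they sum to one.
    return p / p.sum()

j, k = 1, 0  # values at which theta_2 and theta_3 are held fixed

# Marginal: sum theta_2 and theta_3 out.
marginal = joint.sum(axis=(1, 2))

# Partial conditional: fix theta_2 only, sum theta_3 out, renormalize.
partial = normalize(joint[:, j, :].sum(axis=1))

# Full conditional: fix both theta_2 and theta_3, renormalize the slice.
full = normalize(joint[:, j, k])

print(marginal, partial, full)
```

All three are distributions over \(\theta_1\), but only the full conditional is obtained by slicing the joint without any summation.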
Imagine we have three parameters, \(\alpha\), \(\beta\), and \(\gamma\). Each one has its own full conditional:
| Parameter | Full Conditional |
|---|---|
| \(\alpha\) | \(p(\alpha \mid \beta, \gamma, \text{data})\) |
| \(\beta\) | \(p(\beta \mid \alpha, \gamma, \text{data})\) |
| \(\gamma\) | \(p(\gamma \mid \alpha, \beta, \text{data})\) |
Each one treats the other two as known constants.
One of the most useful features of Bayesian computation is that full conditionals tend to be simple even when the joint posterior is very complex.
The joint posterior typically looks like:
\[p(\theta_1, \theta_2, \ldots, \theta_k \mid \text{data}) \propto \text{Likelihood} \times \text{Prior}_1 \times \text{Prior}_2 \times \ldots \times \text{Prior}_k\]
This product can be very complicated because parameters interact through the likelihood.
When you condition on all other parameters, most of the product becomes constant:
\[p(\theta_j \mid \theta_{-j}, \text{data}) \propto [\text{terms containing } \theta_j \text{ from likelihood}] \times [\text{prior for } \theta_j] \times [\text{constants}]\]
Key insight: terms that don’t contain \(\theta_j\) are constant with respect to \(\theta_j\), so they are absorbed into the proportionality constant.
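This can be checked numerically. The sketch below uses a hypothetical unnormalized joint density in two parameters `a` and `b` (not a model from this text) and normalizes it over a grid of `a` values twice: once with every term, and once with only the terms that contain `a`.

```python
import numpy as np

def log_joint(a, b):
    # Hypothetical unnormalized log joint: the first term couples a and b,
    # the remaining terms involve b only.
    return -b * a**2 + 3.0 * np.log(b) - 2.0 * b

def log_terms_with_a(a, b):
    # Keep only the terms that contain a.
    return -b * a**2

def normalize_on_grid(log_density):
    # Turn log-density values into probabilities that sum to one on the grid.
    weights = np.exp(log_density - log_density.max())
    return weights / weights.sum()

a_grid = np.linspace(-4.0, 4.0, 401)
b_fixed = 1.7

full = normalize_on_grid(log_joint(a_grid, b_fixed))
reduced = normalize_on_grid(log_terms_with_a(a_grid, b_fixed))

# The largest difference should be at the level of floating-point error.
print(np.max(np.abs(full - reduced)))
```

The two normalized curves agree because the dropped terms shift the log density by the same amount at every grid point.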
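As a concrete example, take a Normal model with unknown mean and variance, with independent priors \(\mu \sim \text{Normal}(\mu_0, \tau_0^2)\) and \(\sigma^2 \sim \text{Inverse-Gamma}(a_0, b_0)\):
\[y_1, \ldots, y_n \sim \text{iid Normal}(\mu, \sigma^2)\]
The joint posterior is: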
\[p(\mu, \sigma^2 \mid y) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{\sum(y_i-\mu)^2}{2\sigma^2}\right) \times \exp\left(-\frac{(\mu-\mu_0)^2}{2\tau_0^2}\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]
This looks messy! \(\mu\) and \(\sigma^2\) are tangled together.
When we condition on \(\sigma^2\), anything without \(\mu\) becomes constant:
\[p(\mu \mid \sigma^2, y) \propto \exp\left(-\frac{\sum(y_i-\mu)^2}{2\sigma^2}\right) \times \exp\left(-\frac{(\mu-\mu_0)^2}{2\tau_0^2}\right)\]
Both exponents are quadratic in \(\mu\), so this is the kernel of a Normal distribution. The \((\sigma^2)^{-n/2}\) and \((\sigma^2)^{-a_0-1} \exp(-b_0/\sigma^2)\) factors don’t involve \(\mu\), so they are absorbed into the normalizing constant.
Result: \[\mu \mid \sigma^2, y \sim \text{Normal}\left( \text{mean} = \frac{n\bar{y}/\sigma^2 + \mu_0/\tau_0^2}{n/\sigma^2 + 1/\tau_0^2}, \text{ variance} = \frac{1}{n/\sigma^2 + 1/\tau_0^2} \right)\]
When we condition on \(\mu\):
\[p(\sigma^2 \mid \mu, y) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{\sum(y_i-\mu)^2}{2\sigma^2}\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]
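Collecting powers of \(\sigma^2\) gives \((\sigma^2)^{-(a_0 + n/2) - 1} \exp\left(-\left[b_0 + \tfrac{1}{2}\sum(y_i-\mu)^2\right]/\sigma^2\right)\), which is the kernel of an Inverse-Gamma distribution. The Normal prior for \(\mu\) drops out because it doesn’t contain \(\sigma^2\).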
Result: \[\sigma^2 \mid \mu, y \sim \text{Inverse-Gamma}\left( a_0 + \frac{n}{2}, b_0 + \frac{\sum(y_i-\mu)^2}{2} \right)\]
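The two results above are enough to build a sampler that alternates between them. The following is a minimal sketch, assuming NumPy; the function name `gibbs_normal`, the starting values, the simulated data, and the prior settings are all illustrative choices rather than anything prescribed by the text.

```python
import numpy as np

def gibbs_normal(y, mu0, tau0_sq, a0, b0, n_iter=5000, seed=0):
    """Alternate draws from the two full conditionals of the Normal model."""
    rng = np.random.default_rng(seed)
    n, ybar = len(y), np.mean(y)
    mu, sigma_sq = ybar, np.var(y)  # starting values (an arbitrary choice)
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # mu | sigma^2, y ~ Normal(mean, 1 / precision)
        precision = n / sigma_sq + 1.0 / tau0_sq
        mean = (n * ybar / sigma_sq + mu0 / tau0_sq) / precision
        mu = rng.normal(mean, np.sqrt(1.0 / precision))
        # sigma^2 | mu, y ~ Inverse-Gamma(shape, scale):
        # draw Gamma(shape, rate=scale) and take the reciprocal.
        shape = a0 + n / 2.0
        scale = b0 + 0.5 * np.sum((y - mu) ** 2)
        sigma_sq = 1.0 / rng.gamma(shape, 1.0 / scale)
        draws[t] = mu, sigma_sq
    return draws

# Simulated data with true mean 3 and true variance 4 (made up for the demo).
y = np.random.default_rng(42).normal(3.0, 2.0, size=200)
draws = gibbs_normal(y, mu0=0.0, tau0_sq=100.0, a0=2.0, b0=2.0)
print(draws[1000:].mean(axis=0))  # posterior means of mu and sigma^2
```

NumPy’s `gamma` takes a scale rather than a rate, which is why the Inverse-Gamma scale enters as `1.0 / scale`.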
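The same pattern appears in linear regression, \(y \mid \beta, \sigma^2 \sim \text{Normal}(X\beta, \sigma^2 I)\), with priors \(\beta \sim \text{Normal}(0, \Sigma_0)\) and \(\sigma^2 \sim \text{Inverse-Gamma}(a_0, b_0)\). The joint posterior is:
\[p(\beta, \sigma^2 \mid y, X) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2}\right) \times \exp\left(-\frac{1}{2} \beta'\Sigma_0^{-1}\beta\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]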
Parameters are tangled: \(\beta\) and \(\sigma^2\) appear together in the likelihood.
\[p(\beta \mid \sigma^2, y, X) \propto \exp\left(-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2}\right) \times \exp\left(-\frac{1}{2} \beta'\Sigma_0^{-1}\beta\right)\]
The Inverse-Gamma prior for \(\sigma^2\) drops out, and the remaining exponent is quadratic in \(\beta\), so this is the kernel of a Multivariate Normal distribution.
\[p(\sigma^2 \mid \beta, y, X) \propto (\sigma^2)^{-n/2} \exp\left(-\frac{(y-X\beta)'(y-X\beta)}{2\sigma^2}\right) \times (\sigma^2)^{-a_0-1} \exp\left(-\frac{b_0}{\sigma^2}\right)\]
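The Normal prior for \(\beta\) drops out, and what remains is a power of \(\sigma^2\) times \(\exp(-c/\sigma^2)\), which is the kernel of an Inverse-Gamma distribution.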
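Completing the square in \(\beta\) makes both regression results explicit. The shorthand \(V_\beta\) and \(m_\beta\) is introduced here for convenience and is not part of the notation above. The forms assume the zero-mean Normal prior with covariance \(\Sigma_0\) and the Inverse-Gamma\((a_0, b_0)\) prior implied by the joint posterior.

\[V_\beta = \left(\frac{X'X}{\sigma^2} + \Sigma_0^{-1}\right)^{-1}, \quad m_\beta = V_\beta \frac{X'y}{\sigma^2}\]
\[\beta \mid \sigma^2, y, X \sim \text{Normal}(m_\beta, V_\beta)\]
\[\sigma^2 \mid \beta, y, X \sim \text{Inverse-Gamma}\left( a_0 + \frac{n}{2}, b_0 + \frac{(y-X\beta)'(y-X\beta)}{2} \right)\]

The \(\sigma^2\) result has the same shape as in the Normal example, with the residual sum of squares \((y-X\beta)'(y-X\beta)\) replacing \(\sum(y_i-\mu)^2\).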
This simplicity isn’t accidental. It typically arises when the likelihood belongs to an exponential family, which has the form:
\[p(y \mid \theta) = h(y) \exp\left(\eta(\theta)' T(y) - A(\theta)\right)\]
When each parameter is given an independent prior that is conjugate once the other parameters are held fixed, each full conditional stays in the same family as its prior.
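A one-parameter case shows the mechanism. The sketch below uses a Poisson likelihood with a Gamma prior, chosen here as an illustration and not taken from the examples above, and compares the grid-normalized product of likelihood and prior with the Gamma density that conjugacy predicts. The counts and prior values are made up.

```python
import math
import numpy as np

def gamma_poisson_update(y, a, b):
    """Gamma(a, rate=b) prior plus Poisson counts gives Gamma posterior parameters."""
    return a + float(np.sum(y)), b + len(y)

def gamma_pdf(lam, shape, rate):
    # Gamma density with a rate parameterization, computed on the log scale.
    log_pdf = (shape * math.log(rate) - math.lgamma(shape)
               + (shape - 1.0) * np.log(lam) - rate * lam)
    return np.exp(log_pdf)

y = np.array([2, 4, 3, 5, 1])  # made-up counts
a, b = 2.0, 1.0                # made-up prior shape and rate
post_shape, post_rate = gamma_poisson_update(y, a, b)

# Likelihood times prior on a grid, dropping factors that do not contain lambda.
lam = np.linspace(0.01, 12.0, 4000)
log_unnorm = (np.sum(y) * np.log(lam) - len(y) * lam
              + (a - 1.0) * np.log(lam) - b * lam)
unnorm = np.exp(log_unnorm - log_unnorm.max())
dx = lam[1] - lam[0]
numeric_pdf = unnorm / (unnorm.sum() * dx)  # Riemann-sum normalization

print(post_shape, post_rate)
print(np.max(np.abs(numeric_pdf - gamma_pdf(lam, post_shape, post_rate))))
```

The numerically normalized curve and the closed-form Gamma density should differ only by a small discretization error.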
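Not every model is this convenient. Consider Poisson regression with a log link and independent Normal priors (variance 1000) on the coefficients:
\[y_i \sim \text{Poisson}(\lambda_i), \quad \lambda_i = \exp(x_i'\beta)\]
\[\beta_j \sim \text{Normal}(0, 1000) \text{ (independent priors)}\]
The joint posterior is: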
\[p(\beta \mid y, X) \propto \prod_i \left[ \frac{\exp(x_i'\beta)^{y_i} \exp(-\exp(x_i'\beta))}{y_i!} \right] \times \prod_j \left[ \frac{1}{\sqrt{2000\pi}} \exp\left(-\frac{\beta_j^2}{2000}\right) \right]\]
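This has no closed form: every coefficient enters every \(\exp(x_i'\beta)\) term, so the components of \(\beta\) cannot be separated. The full conditional for a single coefficient is: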
\[p(\beta_j \mid \beta_{-j}, y, X) \propto \exp\left( \sum_i \left[ y_i x_{ij} \beta_j - \exp(x_i'\beta) \right] \right) \times \exp\left( -\frac{\beta_j^2}{2000} \right)\]
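Why is this simpler? It is a density in the single scalar \(\beta_j\), with every other coefficient held fixed. It is still not a standard distribution, but its logarithm is concave in \(\beta_j\), so one-dimensional methods such as adaptive rejection sampling or a Metropolis-Hastings step apply.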
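The full conditional for \(\beta_j\) can be targeted one coordinate at a time. The sketch below uses a random-walk Metropolis step for each coefficient, which is one possible choice among several. The function names, the step size, the simulated data, and the true coefficients are illustrative assumptions.

```python
import numpy as np

def log_full_conditional(j, beta, y, X, prior_var=1000.0):
    # Log of p(beta_j | beta_-j, y, X), up to an additive constant.
    eta = X @ beta
    return (np.sum(y * X[:, j]) * beta[j]
            - np.exp(eta).sum()
            - beta[j] ** 2 / (2.0 * prior_var))

def metropolis_within_gibbs(y, X, n_iter=2000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    beta = np.zeros(k)
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        for j in range(k):
            # Propose a move in coordinate j only; the others stay fixed.
            proposal = beta.copy()
            proposal[j] += step * rng.normal()
            log_ratio = (log_full_conditional(j, proposal, y, X)
                         - log_full_conditional(j, beta, y, X))
            if np.log(rng.uniform()) < log_ratio:
                beta = proposal
        draws[t] = beta
    return draws

# Simulated data: intercept plus one covariate, true beta = (0.5, 0.3).
rng = np.random.default_rng(7)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.3])))
draws = metropolis_within_gibbs(y, X)
print(draws[500:].mean(axis=0))  # posterior means after discarding burn-in
```

Because the proposal is symmetric, the acceptance ratio only needs the difference of the two log full conditionals, so the unknown normalizing constant never has to be computed.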
Sometimes “simple” means:
| Type | Meaning | Sampling Method |
|---|---|---|
| Standard distribution | Normal, Gamma, Beta | Direct sampling |
| Log-concave | Log of density is concave | Adaptive rejection sampling |
| Low-dimensional | 1 or 2 parameters | Metropolis-Hastings within Gibbs |
| Conditionally independent | Breaks into smaller pieces | Block Gibbs sampling |
At each iteration of a Gibbs sampler, you sample:
\[\theta_1^{(t)} \sim p(\theta_1 \mid \theta_2^{(t-1)}, \theta_3^{(t-1)}, \ldots, \theta_k^{(t-1)}, \text{data})\]
\[\theta_2^{(t)} \sim p(\theta_2 \mid \theta_1^{(t)}, \theta_3^{(t-1)}, \ldots, \theta_k^{(t-1)}, \text{data})\]
… and so on up to \(\theta_k^{(t)}\), always conditioning on the most recent value of every other parameter.
Each of these is a full conditional distribution.
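The update order matters: each draw uses the newest values of the parameters already updated in the current iteration. A minimal sketch of one such sweep follows, using a standard bivariate Normal with correlation `rho` as a stand-in target because its full conditionals are known exactly. The function `gibbs_sweep` and the sampler list are hypothetical names.

```python
import numpy as np

def gibbs_sweep(theta, conditional_samplers, rng):
    # Update components in order; each sampler sees the newest values of the others.
    for j, sampler in enumerate(conditional_samplers):
        theta[j] = sampler(theta, rng)
    return theta

rho = 0.8  # target correlation (an arbitrary choice for the demo)
cond_sd = np.sqrt(1.0 - rho**2)
samplers = [
    lambda th, rng: rng.normal(rho * th[1], cond_sd),  # theta_1 | theta_2
    lambda th, rng: rng.normal(rho * th[0], cond_sd),  # theta_2 | theta_1
]

rng = np.random.default_rng(1)
theta = np.zeros(2)
draws = np.empty((20000, 2))
for t in range(len(draws)):
    draws[t] = gibbs_sweep(theta, samplers, rng)

# The sample correlation should be close to rho.
print(np.corrcoef(draws.T)[0, 1])
```

Neither sampler ever evaluates the joint density, yet the draws recover the joint correlation structure.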
When you sample from a full conditional, you’re effectively performing one step of a Gibbs update - you’re moving through the parameter space in a way that preserves the target joint posterior distribution.
The Hammersley-Clifford theorem (in the context of Gibbs sampling) essentially says:
If you have all the full conditionals, and the joint distribution satisfies a positivity condition, you can reconstruct the joint distribution. The reverse is also true: the joint distribution determines the full conditionals, and they tend to be simpler because each one holds the other parameters fixed instead of describing how they vary.
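For two discrete variables, the reconstruction can be carried out directly. The sketch below builds a made-up, strictly positive joint table, derives both full conditionals from it, and then rebuilds the joint using only those conditionals and a reference value `x0`. The identity used is \(p(x, y) \propto \frac{p(x \mid y)}{p(x_0 \mid y)} \, p(y \mid x_0)\).

```python
import numpy as np

# Made-up joint table; adding 0.1 keeps every cell strictly positive.
rng = np.random.default_rng(0)
joint = rng.random((3, 4)) + 0.1
joint /= joint.sum()

# Full conditionals derived from the joint.
p_x_given_y = joint / joint.sum(axis=0, keepdims=True)  # each column sums to 1
p_y_given_x = joint / joint.sum(axis=1, keepdims=True)  # each row sums to 1

# Rebuild the joint from the conditionals alone, relative to reference row x0.
x0 = 0
unnorm = p_x_given_y / p_x_given_y[x0, :] * p_y_given_x[x0, :]
reconstructed = unnorm / unnorm.sum()

# Largest cell-wise difference; should be at floating-point level.
print(np.max(np.abs(reconstructed - joint)))
```

The division by `p_x_given_y[x0, :]` is where positivity is needed: if any of those entries were zero, the ratio would be undefined.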
| Aspect | Joint Posterior | Full Conditional |
|---|---|---|
| Complexity | High (all parameters interact) | Low (other parameters are fixed constants) |
| Parameter interactions | Fully present | Conditioned away |
| Form | Often no closed form | Often standard distribution |
| Dimensionality | Full parameter space | Single parameter (or small block) |
| Sampling | Difficult (needs MCMC) | Easy (direct or simple MCMC step) |