Properties of the Dirichlet distribution.
Bayesian parametric inference, including hierarchical modeling.
Simulation-based inference using Markov chain Monte Carlo.
Basic concepts of measure theory and stochastic processes.
Let \(\mathcal{X}\) be the sample space of a random variable \(x\) such that \(x\mid\boldsymbol{\theta} \sim G(\boldsymbol{\theta})\), where \(G\in\mathcal{P}\) and \(\mathcal{P} = \{G(\cdot;\boldsymbol{\theta}):\boldsymbol{\theta}\in\Theta\subseteq\mathbb{R}^k\}\) is a collection of specific parametric distributions on \(\mathcal{X}\).
Parametric Bayes requires specifying a prior on \(\Theta\) (finite-dimensional space).
For example: \(\mathcal{X} = \mathbb{R}\), \(G(\boldsymbol{\theta}) = \textsf{N}(\mu,\sigma^2)\), \(\boldsymbol{\theta} = (\mu,\sigma^2)\), \(\Theta = \mathbb{R}\times\mathbb{R}^+\).
However, \(\mathcal{P}\) is small compared to \(\{G:G\in\mathcal{G}\}\), where \(\mathcal{G}\) is a subset of the collection of all distributions on \(\mathcal{X}\).
Nonparametric Bayes requires specifying a prior on \(\{G:G\in\mathcal{G}\}\) (infinite-dimensional space).
How to choose \(\mathcal{G}\)? How to specify the prior on \(G\)?
Intuitively, the main idea is to define such a prior as a stochastic process whose sample paths are distribution functions on \(\mathcal{X}\) (e.g., \(\mathcal{X} = \mathbb{R}\)), where \(\mathcal{X}\) is equipped with a \(\sigma\)-algebra \(\mathcal{B}\) (e.g., the Borel \(\sigma\)-algebra).
“Bayesian nonparametric models” is an oxymoron: We should really say “Bayesian models with an infinite number of parameters”.
Even though the focus lies on priors for distributions, these methods are general (e.g., link functions).
The Dirichlet process (DP) was the first prior defined on spaces of distributions.
The DP generates random probability measures on \((\mathcal{X},\mathcal{B})\), and therefore, random distributions on \(\mathcal{X}\).
[Definition] The DP is a probability measure on the space of probability measures on \((\mathcal{X},\mathcal{B})\) that generates random probability measures \(Q\) on \((\mathcal{X},\mathcal{B})\) (random distributions \(G\) on \(\mathcal{X}\)) such that for any finite measurable partition \(B_1,\ldots,B_k\) of \(\mathcal{X}\), it holds that \[ (Q(B_1),\ldots,Q(B_k))\sim\textsf{Dirichlet}(\alpha Q_0(B_1),\ldots,\alpha Q_0(B_k))\,, \] where \(\alpha\) is a positive scalar and \(Q_0\) is a probability measure on \((\mathcal{X},\mathcal{B})\) that defines a distribution \(G_0\) on \(\mathcal{X}\).
For any measurable subset \(B\) of \(\mathcal{X}\), it follows from the definition that \(Q(B)\sim \textsf{Beta}(\alpha Q_0(B), \alpha Q_0(B^c))\), and therefore: \[ \textsf{E}(Q(B)) = Q_0(B) \qquad\text{and}\qquad \textsf{Var}(Q(B)) = \frac{Q_0(B)(1-Q_0(B))}{\alpha+1}\,. \]
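As a brief check of these moments, recall the standard fact that \(Y\sim\textsf{Beta}(a,b)\) has \(\textsf{E}(Y) = a/(a+b)\) and \(\textsf{Var}(Y) = ab/\{(a+b)^2(a+b+1)\}\). Applying the definition to the partition \(\{B, B^c\}\) and setting \(a = \alpha Q_0(B)\) and \(b = \alpha Q_0(B^c) = \alpha(1-Q_0(B))\), so that \(a+b = \alpha\), gives: \[ \textsf{E}(Q(B)) = \frac{\alpha Q_0(B)}{\alpha} = Q_0(B) \qquad\text{and}\qquad \textsf{Var}(Q(B)) = \frac{\alpha^2 Q_0(B)(1-Q_0(B))}{\alpha^2(\alpha+1)} = \frac{Q_0(B)(1-Q_0(B))}{\alpha+1}\,. \]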
Consider \(\mathcal{X} = \mathbb{R}\), \(B = (-\infty,x]\), with \(x\in\mathbb{R}\). It follows that \(Q(B) = G(x)\sim \textsf{Beta}(\alpha G_0(x), \alpha(1-G_0(x)))\), and therefore: \[ \textsf{E}(G(x)) = G_0(x) \qquad\text{and}\qquad \textsf{Var}(G(x)) = \frac{G_0(x)(1-G_0(x))}{\alpha+1}\,. \]
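To get a sense of how \(\alpha\) controls the concentration of \(G\) around \(G_0\), the variance formula above can be evaluated numerically. The following is a minimal sketch in base R, assuming \(G_0 = \textsf{N}(0,1)\); the evaluation point \(x = 0\) and the helper name `sd_G` are illustrative choices introduced here, not part of the original notes.

```r
# Pointwise standard deviation of G(x) under G ~ DP(alpha, G0), using
# Var(G(x)) = G0(x) * (1 - G0(x)) / (alpha + 1).
# `sd_G` is a hypothetical helper introduced only for this illustration.
sd_G <- function(x, alpha, G0 = pnorm) {
  p <- G0(x)
  sqrt(p * (1 - p) / (alpha + 1))
}

alpha <- c(0.1, 1, 10, 100)
sd_at_0 <- sd_G(x = 0, alpha = alpha)   # G0(0) = 0.5 when G0 = N(0,1)
print(round(data.frame(alpha = alpha, sd = sd_at_0), 4))
```

The standard deviation decreases as \(\alpha\) grows, so larger values of \(\alpha\) keep realizations of \(G\) closer to \(G_0\).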
Consider \(G \sim \textsf{DP}(\alpha, G_0)\) and any grid \(x_1 < x_2 < \ldots < x_k\) in \(\mathcal{X}\subseteq\mathbb{R}\). Then, the random vector (increment process): \[ (G(x_1), G(x_2) - G(x_1),\ldots,G(x_{k}) - G(x_{k-1}), 1 - G(x_k))\in \text{Simplex}(\mathbb{R}^{k+1}) \] follows a Dirichlet distribution with parameter vector \[ (\alpha G_0(x_1), \alpha(G_0(x_2) - G_0(x_1)),\ldots,\alpha(G_0(x_{k}) - G_0(x_{k-1})), \alpha(1 - G_0(x_k)))\,. \] If \((u_1,u_2,\ldots,u_{k+1})\) is a draw from this Dirichlet distribution, then \((u_1,\ldots,\sum_{j=1}^iu_j,\ldots,\sum_{j=1}^k u_j)\) is a draw from the distribution of \((G(x_1),\ldots,G(x_i),\ldots,G(x_k))\).
The following are realizations from a \(\textsf{DP}(\alpha, G_0)\), for \(\alpha\in\{0.1,1,10,100\}\) and \(G_0 = \textsf{N}(0,1)\). The solid black line corresponds to \(G_0\), while the colored lines represent the corresponding realizations.
# Simulate several realizations G ~ DP(alpha, G0 = N(0,1)) for several values of alpha
# (requires the gtools package for rdirichlet)
k <- 1000                      # number of grid points
alpha <- c(0.1, 1, 10, 100)
G0 <- function(x) pnorm(x)     # baseline CDF
par(mfrow = c(2,2), mar = c(3,3,1.4,1.4), mgp = c(1.75,0.75,0))
set.seed(1)
for (i in seq_along(alpha)) {
  plot(NA, NA, xlim = c(-3,3), ylim = c(0,1), xlab = "x", ylab = "G(x)", main = bquote(alpha == .(alpha[i])))
  for (l in 1:10) {
    # random grid on (-3, 3)
    x <- sort(runif(n = k, min = -3, max = 3))
    # Dirichlet parameters: alpha times the increments of G0 over the grid (length k + 1)
    a <- alpha[i] * diff(c(0, G0(x), 1))
    u <- c(gtools::rdirichlet(n = 1, alpha = a))
    # cumulative sums of the first k increments give G(x_1), ..., G(x_k);
    # col = i + 1 keeps black reserved for G0
    lines(x = x, y = cumsum(u)[-(k+1)], type = "l", col = i + 1)
  }
  curve(expr = G0(x), from = -3, to = 3, n = 1000, lwd = 2, add = TRUE)
}
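As a rough numerical check of the simulation above, the empirical mean and variance of \(G(x)\) over many realizations can be compared with \(G_0(x)\) and \(G_0(x)(1-G_0(x))/(\alpha+1)\). The sketch below is an illustration under stated assumptions, not the original code: it uses base R only, replacing `gtools::rdirichlet` with the standard construction of a Dirichlet draw as normalized independent Gamma variates, and the values \(\alpha = 10\), the three-point grid, the seed, and `n_sim` are arbitrary choices made here.

```r
# Monte Carlo check of E(G(x)) and Var(G(x)) on a small fixed grid (base R only).
# Assumption: Dirichlet draws are obtained by normalizing independent
# Gamma(a_j, 1) variates, instead of calling gtools::rdirichlet.
set.seed(42)
alpha <- 10
G0 <- function(x) pnorm(x)
x <- c(-1, 0, 1)                     # illustrative grid
a <- alpha * diff(c(0, G0(x), 1))    # Dirichlet parameters (length 4)
n_sim <- 20000

# Each row of `u` is one Dirichlet draw of the increments;
# `shape = a` is recycled so that column j uses a[j]
g <- matrix(rgamma(n_sim * length(a), shape = a), ncol = length(a), byrow = TRUE)
u <- g / rowSums(g)

# Cumulative sums of the first k increments give (G(x_1), ..., G(x_k))
G_draws <- t(apply(u, 1, cumsum))[, seq_along(x)]

emp_mean  <- colMeans(G_draws)
emp_var   <- apply(G_draws, 2, var)
theo_mean <- G0(x)
theo_var  <- G0(x) * (1 - G0(x)) / (alpha + 1)

print(round(rbind(emp_mean, theo_mean, emp_var, theo_var), 4))
```

The empirical rows should lie close to the theoretical ones, up to Monte Carlo error that shrinks as `n_sim` increases.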