1 Introduction

Prerequisites

  • Properties of the Dirichlet distribution (the key facts are recalled below).
  • Bayesian parametric inference, including hierarchical modeling.
  • Simulation-based inference using Markov chain Monte Carlo.
  • Basic concepts of measure theory and stochastic processes.
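
For quick reference, the key properties of the Dirichlet distribution used throughout are the following. If \((p_1,\ldots,p_k)\sim\textsf{Dirichlet}(a_1,\ldots,a_k)\) with \(a_j>0\) and \(a_\bullet = \sum_{j=1}^k a_j\), then \[ \textsf{E}(p_j) = \frac{a_j}{a_\bullet} \qquad\text{and}\qquad \textsf{Var}(p_j) = \frac{(a_j/a_\bullet)\,(1-a_j/a_\bullet)}{a_\bullet+1}\,. \] Moreover, aggregating components preserves the Dirichlet form, e.g., \((p_1+p_2,p_3,\ldots,p_k)\sim\textsf{Dirichlet}(a_1+a_2,a_3,\ldots,a_k)\); in particular, each component satisfies \(p_j\sim\textsf{Beta}(a_j,a_\bullet-a_j)\).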

Parametric Bayes

Let \(\mathcal{X}\) be the sample space of a random variable \(x\) such that \(x\mid\boldsymbol{\theta} \sim G(\boldsymbol{\theta})\), where \(G\) belongs to \(\mathcal{P} = \{G(\cdot;\boldsymbol{\theta}):\boldsymbol{\theta}\in\Theta\subseteq\mathbb{R}^k\}\), a collection of specific parametric distributions on \(\mathcal{X}\).

Parametric Bayes requires specifying a prior on \(\Theta\) (finite-dimensional space).

For example: \(\mathcal{X} = \mathbb{R}\), \(G(\boldsymbol{\theta}) = \textsf{N}(\mu,\sigma^2)\), \(\boldsymbol{\theta} = (\mu,\sigma^2)\), \(\Theta = \mathbb{R}\times\mathbb{R}^+\).
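
As a minimal illustration of this setup (a sketch only; the prior below, \(\mu\sim\textsf{N}(0,1)\) and \(\sigma^2\) inverse-gamma, is an arbitrary choice and not part of the text), drawing \(\boldsymbol{\theta} = (\mu,\sigma^2)\) from a prior on \(\Theta\) induces random distributions on \(\mathcal{X}\), but every realization remains a Normal distribution function:

# sketch: prior draws on Theta = R x R+ induce random Normal CDFs
# illustrative prior: mu ~ N(0,1), sigma^2 ~ inverse-gamma (via 1/rgamma)
set.seed(1)
# reference curve: the standard normal CDF
curve(expr = pnorm(x), from = -3, to = 3, n = 1000, lwd = 2, xlab = "x", ylab = "G(x)")
for (l in 1:10) {
  mu   <- rnorm(n = 1, mean = 0, sd = 1)
  sig2 <- 1/rgamma(n = 1, shape = 2, rate = 1)
  curve(expr = pnorm(x, mean = mu, sd = sqrt(sig2)), from = -3, to = 3, n = 1000, col = 2, add = TRUE)
}

Compare with the realizations from a DP prior shown in the Simulation section below, which are not constrained to any parametric family.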

Nonparametric Bayes

However, \(\mathcal{P}\) is small compared to \(\{G:G\in\mathcal{G}\}\), where \(\mathcal{G}\) is a subset of the collection of all distributions on \(\mathcal{X}\).

Nonparametric Bayes requires specifying a prior on \(\{G:G\in\mathcal{G}\}\) (infinite-dimensional space).

How to choose \(\mathcal{G}\)? How to specify the prior on \(G\)?

Intuitively, the main idea consists in defining such a prior as a stochastic process whose sample paths correspond to distribution functions on \(\mathcal{X}\) (e.g., \(\mathcal{X} = \mathbb{R}\)), which is equipped with a \(\sigma\)-algebra \(\mathcal{B}\) (e.g., the Borel \(\sigma\)-algebra).

“Bayesian nonparametric models” is an oxymoron: We should really say “Bayesian models with an infinite number of parameters”.

Even though the focus here lies on priors for distributions, these methods are more general and can be used to place priors on other functions (e.g., link functions in regression).

2 The Dirichlet Process

The Dirichlet process (DP) is the first prior defined for spaces of distributions.

The DP generates random probability measures on \((\mathcal{X},\mathcal{B})\), and therefore, random distributions on \(\mathcal{X}\).

Definition of the DP

[Definition] The DP is a probability distribution on the space of probability measures on \((\mathcal{X},\mathcal{B})\); it generates random probability measures \(Q\) on \((\mathcal{X},\mathcal{B})\) (equivalently, random distributions \(G\) on \(\mathcal{X}\)) such that for any finite measurable partition \(B_1,\ldots,B_k\) of \(\mathcal{X}\), it follows that \[ (Q(B_1),\ldots,Q(B_k))\sim\textsf{Dirichlet}(\alpha Q_0(B_1),\ldots,\alpha Q_0(B_k))\,, \] where \(\alpha\) is a positive scalar and \(Q_0\) is a probability measure on \((\mathcal{X},\mathcal{B})\) that defines a distribution \(G_0\) on \(\mathcal{X}\).

  • The DP is characterized by two parameters: \(\alpha\) and \(Q_0\).
  • Note that \(Q(B_i)\) (a random variable) and \(Q_0(B_i)\) (a constant) denote the measure (probability) of \(B_i\) under \(Q\) and \(Q_0\), respectively.
  • Analogous definition for the random distribution \(G\) on \(\mathcal{X}\) generated from a DP with parameters \(\alpha\) and \(G_0\).
  • \(G \sim \textsf{DP}(\alpha, G_0)\) indicates that \(G\) is generated according to a DP prior.

Interpreting the parameters of the DP

For any measurable subset \(B\) of \(\mathcal{X}\), applying the definition to the partition \(\{B, B^c\}\) gives \((Q(B),Q(B^c))\sim\textsf{Dirichlet}(\alpha Q_0(B),\alpha Q_0(B^c))\), that is, \(Q(B)\sim \textsf{Beta}(\alpha Q_0(B), \alpha Q_0(B^c))\), and therefore: \[ \textsf{E}(Q(B)) = Q_0(B) \qquad\text{and}\qquad \textsf{Var}(Q(B)) = \frac{Q_0(B)(1-Q_0(B))}{\alpha+1}\,. \] A quick numerical check of these formulas follows the list below.

  • \(Q_0\) is the center of the DP, known as baseline probability measure or baseline distribution.
  • \(\alpha\) is the precision parameter of the DP: the larger \(\alpha\) is, the closer a realization of the process \(Q\) is to \(Q_0\), and the smaller \(\alpha\) is, the more a realization can deviate from \(Q_0\).
  • The support of the DP contains all probability measures on \((\mathcal{X},\mathcal{B})\) that are absolutely continuous with respect to \(Q_0\).
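
As a quick numerical check of these formulas (a sketch with arbitrary illustrative choices: \(G_0 = \textsf{N}(0,1)\), \(B = (0, 1.5]\), and \(\alpha = 5\)), one can simulate \(Q(B)\) through the Dirichlet distribution induced by the partition \(\{B,B^c\}\) and compare the empirical moments with the theoretical ones:

# sketch: Monte Carlo check of E(Q(B)) and Var(Q(B)) under a DP(alpha, Q_0) prior
# illustrative choices: G0 = N(0,1), B = (0, 1.5], alpha = 5
set.seed(1)
alpha <- 5
p0 <- pnorm(1.5) - pnorm(0)  # Q0(B) under G0 = N(0,1)
# (Q(B), Q(B^c)) ~ Dirichlet(alpha*Q0(B), alpha*Q0(B^c)), i.e., Q(B) ~ Beta(alpha*Q0(B), alpha*Q0(B^c))
QB <- gtools::rdirichlet(n = 100000, alpha = c(alpha*p0, alpha*(1 - p0)))[,1]
c(empirical = mean(QB), theoretical = p0)
c(empirical = var(QB), theoretical = p0*(1 - p0)/(alpha + 1))

Both comparisons should agree up to Monte Carlo error.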

Example

Consider \(\mathcal{X} = \mathbb{R}\), \(B = (-\infty,x]\), with \(x\in\mathbb{R}\). It follows that \(Q(B) = G(x)\sim \textsf{Beta}(\alpha G_0(x), \alpha(1-G_0(x)))\), and therefore: \[ \textsf{E}(G(x)) = G_0(x) \qquad\text{and}\qquad \textsf{Var}(G(x)) = \frac{G_0(x)(1-G_0(x))}{\alpha+1}\,. \]
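
These pointwise moments can also be visualized. The following sketch (with \(G_0 = \textsf{N}(0,1)\) and two illustrative values of \(\alpha\)) plots the prior mean \(\textsf{E}(G(x)) = G_0(x)\) together with approximate \(\pm 2\) standard deviation bands, truncated to \([0,1]\):

# sketch: pointwise prior mean and +/- 2 sd bands of G(x) under a DP(alpha, G0) prior
# illustrative choices: G0 = N(0,1), alpha in {1, 10}
G0 <- function(x) pnorm(x)
x  <- seq(from = -3, to = 3, length.out = 200)
par(mfrow = c(1,2), mar = c(3,3,1.4,1.4), mgp = c(1.75,0.75,0))
for (alpha in c(1, 10)) {
  m <- G0(x)                                # E(G(x)) = G0(x)
  s <- sqrt(G0(x)*(1 - G0(x))/(alpha + 1))  # sd of G(x)
  plot(x, m, type = "l", lwd = 2, ylim = c(0,1), xlab = "x", ylab = "G(x)", main = bquote(alpha == .(alpha)))
  lines(x, pmin(m + 2*s, 1), lty = 2)       # upper band, truncated at 1
  lines(x, pmax(m - 2*s, 0), lty = 2)       # lower band, truncated at 0
}

The bands shrink as \(\alpha\) increases, consistent with the interpretation of \(\alpha\) as a precision parameter.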

Example

Consider \(G \sim \textsf{DP}(\alpha, G_0)\) and any grid \(x_1 < x_2 < \ldots < x_k\) in \(\mathcal{X}\subseteq\mathbb{R}\). Then, the random vector of increments (the increment process) \[ (G(x_1), G(x_2) - G(x_1),\ldots,G(x_{k}) - G(x_{k-1}), 1 - G(x_k))\,, \] which lies in the probability simplex in \(\mathbb{R}^{k+1}\), follows a Dirichlet distribution with parameter vector \[ (\alpha G_0(x_1), \alpha(G_0(x_2) - G_0(x_1)),\ldots,\alpha(G_0(x_{k}) - G_0(x_{k-1})), \alpha(1 - G_0(x_k)))\,. \] If \((u_1,u_2,\ldots,u_{k+1})\) is a draw from this Dirichlet distribution, then the vector of partial sums \((u_1,\ldots,\sum_{j=1}^iu_j,\ldots,\sum_{j=1}^k u_j)\) is a draw from the distribution of \((G(x_1),\ldots,G(x_i),\ldots,G(x_k))\).

3 Simulation

The following correspond to realizations from a \(\textsf{DP}(\alpha, G_0)\), for \(\alpha\in\{0.1,1,10,100\}\) and \(G_0 = \textsf{N}(0,1)\). The solid black line corresponds to \(G_0\), while the colored lines represent the corresponding realizations.

# simulation of several realizations of G ~ DP(alpha, G0), with G0 = N(0,1),
# for several values of alpha, using the finite-dimensional Dirichlet
# distribution of the increments over a fine random grid
k     <- 1000
alpha <- c(0.1, 1, 10, 100)
G0    <- function(x) pnorm(x)
par(mfrow = c(2,2), mar = c(3,3,1.4,1.4), mgp = c(1.75,0.75,0))
set.seed(1)
for (i in 1:length(alpha)) {
  plot(NA, NA, xlim = c(-3,3), ylim = c(0,1), xlab = "x", ylab = "G(x)", main = bquote(alpha == .(alpha[i])))
  for (l in 1:10) {
    # random grid on (-3, 3)
    x <- sort(runif(n = k, min = -3, max = 3))
    # Dirichlet parameters: alpha times the baseline probabilities of the increments
    a <- alpha[i]*diff(c(0, G0(x), 1))
    # draw the increments and accumulate them to obtain G on the grid
    u <- c(gtools::rdirichlet(n = 1, alpha = a))
    # colored line for the realization (col = i + 1 avoids black, which is reserved for G0)
    lines(x = x, y = cumsum(u)[-(k+1)], type = "l", col = i + 1)
  }
  # baseline distribution G0 (solid black line)
  curve(expr = G0(x), from = -3, to = 3, n = 1000, lwd = 2, add = TRUE)
}

4 References