1 Constructive definition

Consider a Dirichlet Process (DP) with concentration parameter \(\alpha\) and baseline distribution \(G_0\).

Let \(z_1,z_2,\ldots\) and \(\vartheta_1,\vartheta_2.\ldots\) are independent sequences of independent and identically distributed (i.i.d.) random variables such that \[ z_r\sim\textsf{Beta}(1,\alpha)\,,\quad r=1,2,\ldots \qquad\text{and}\qquad \vartheta_\ell\sim G_0\,,\quad \ell=1,2,\ldots \]

According to the constructive definition of the DP, a realization \(G\) from \(\textsf{DP}(\alpha,G_0)\) is given by \[ G(\cdot) = \sum_{\ell=1}^\infty \omega_\ell\,\delta_{\vartheta_\ell}(\cdot) \] where:

2 Stick-breaking process

It stick-breaking process provides a way to understand the infinite-dimensional nature of the DP, by generating a sequence of weights \(\omega_1,\omega_2,\ldots\) of a discrete probability distribution.

Under this setup, we say that \(\omega_1,\omega_2,\ldots\textsf{SB}(\alpha)\).

It can be shown that \(\sum_{\ell=1}^\infty \omega_\ell = 1\).

3 Countable mixtures of point masses

The DP generates distributions that can be represented as countable mixtures of point masses.

As a matter of fact, any distribution on \(\mathbb{R}^d\) can be approximated arbitrarily well using a countable mixture of point masses.

The constructive definition points out to the fact that the DP generates discrete distributions.

4 Simulation

The DP constructive definition yields another method to simulate from the DP up to a truncation approximation: \[ G(\cdot) \approx \sum_{\ell=1}^J p_\ell\,\delta_{\vartheta_\ell}(\cdot)\,, \] where \(p_\ell = \omega_\ell\), for \(\ell=1,\ldots,J\), and \(p_J = 1 - \sum_{\ell=1}^{J-1} p_\ell\).

It can be shown that \[ \textsf{E}\left( \sum_{\ell=1}^{J} \omega_\ell\right) = 1 - \left( \frac{\alpha}{\alpha+1} \right)^J\,. \] Thus, \(J\) can be chosen such that \((\alpha/(\alpha+1))^J=\epsilon\), for small \(\epsilon\), i.e., \(J=\log(\epsilon)/\log(\alpha/(\alpha+1))\).

The following correspond to a realization from a \(\textsf{DP}(\alpha, G_0\)), for \(\alpha= 25\) and \(G_0 = \textsf{N}(0,1)\). The solid black line corresponds to \(G_0\), while the blue line represents the corresponding realization.

In the left panel, the spiked lines are located at 1000 draws from \(G_0\) with heights given by the (truncated) stick-breaking weights. These spikes are then summed up to generate a sample path.

# simulation of several G such that G ~ DP(alpha = 25, G_0 = N(0,1))
# using the constructive definition of the DP
alpha <- 25
eps <- 1e-6
J <- round(log(eps)/log(alpha/(alpha+1)))
G0 <- function(x) pnorm(x)
set.seed(1)
vartheta <- rnorm(n = J)
z <- rbeta(n = J, shape1 = 1, shape2 = alpha)
omega <- NULL
omega[1] <- z[1]
for (j in 2:(J-1))
  omega[j] <- z[j]*prod(1 - z[1:(j-1)])
omega[J] <- 1 - sum(omega[1:(J-1)]) 
# plot
par(mfrow = c(1,2), mar = c(3,3,1.4,1.4), mgp = c(1.75,0.75,0))
# panel 1: atoms
plot(x = vartheta, y = omega, type = "h", xlim = c(-3,3), xlab = "x", ylab = expression(omega), col = 4)
# panel 2: G ~ DP(alpha = 25, G_0 = N(0,1))
x <- seq(from = -3, to = 3, len = 1000)
G <- NULL
for (j in 1:length(x))
  G[j] <- sum(omega[vartheta <= x[j]])
plot(x = x, y = G, type = "l", xlim = c(-3,3), col = 4)
curve(expr = G0(x), from = -3, to = 3, n = 1000, lwd = 2, add = TRUE)

5 References