The Pólya urn scheme can be seen as a discrete-time analog to the Dirichlet process, providing an intuitive way to understand how the Dirichlet process (DP) generates samples.
Consider a precision parameter \(\alpha\) and a baseline distribution \(G_0\):
Initialization: Start with an empty urn.
Process:
Formaly, consider the measurable space \((\mathbb{R},\mathcal{B})\), where \(\mathcal{B}\) is the Borel \(\sigma\)-algebra.
[Definition] The sequence of random variables \(X_1,X_2,\ldots\) is a Pólya sequence with parameters \(\alpha\) (a positive scalar) and \(G_0\) (a probability distribution) if for any \(B\in\mathcal{B}\) it follows that \(\textsf{P}(X_1\in B) = G_0(B)\) and \[ \textsf{P}(X_{n+1}\in B\mid X_1,\ldots,X_n) = \frac{\alpha}{\alpha+n}G_0(B) + \frac{1}{\alpha + n}\sum_{i=1}^n\delta_{X_i}(B)\,, \] where \(\delta_{X_i}(B) = 1\) if \(X_i\in B\) and \(\delta_{X_i}(B) = 0\) if \(X_i\notin B\), for \(i=1,\ldots,n\).
If \(X_1,X_2,\ldots\) is a Pólya sequence with parameters \(\alpha\) and \(G_0\), then:
If \(x_i\mid G \stackrel{\text{iid}}{\sim}G\), for \(i=1,\ldots,n\), and \(G\sim\textsf{DP}(\alpha,G_0)\), then the conditional distribution of \(x_i\mid x_1,\ldots,x_{i-1}\) after integrating \(G\) out is given by: \[ p(x_i\mid x_1,\ldots,x_{i-1}) = \frac{\alpha}{\alpha+i-1}G_0(x_i) + \frac{1}{\alpha+i-1}\sum_{j=1}^{i-1} \delta_{x_j}(x_i)\,,\qquad i=2,\ldots,n\,, \] and therefore, the marginal distribution of \(x_1,\ldots,x_n\) is given by: \[ p(x_1,\ldots,x_n) = G_0(x_1)\prod_{i=2}^n\left(\frac{\alpha}{\alpha+i-1}G_0(x_i) + \frac{1}{\alpha+i-1}\sum_{j=1}^{i-1} \delta_{x_j}(x_i) \right)\,, \] where \(\delta_{x_j}(x_i) = 1\) if \(x_i=x_j\) and \(\delta_{x_j}(x_i) = 0\) if \(x_i\neq x_j\).
This means that \(x_1,\ldots,x_n\) follows a **Pólya urn scheme* such that:
Under this setup, we say that \(x_1,\ldots,x_n\sim\textsf{PU}(\alpha,G_0)\).
The \(x_1,\ldots,x_n\) are exchangeable (and hence have the same marginals and full conditional distributions), but they are not independent.
If \(x_i\in\mathbb{R}\), for \(i=1,\ldots,n\), then \[ \textsf{Cor}(x_i,x_j) = \textsf{P}(x_i = x_j) = \frac{1}{\alpha+1}\,,\qquad i\neq j\,. \]
The Chinese restaurant process (CRP) is an analogy that helps to understand the DP. It is a metaphorical process used to describe the behavior of customers entering a Chinese restaurant and choosing tables.
Imagine a Chinese restaurant with an infinite number of tables, where customers enter one by one and choose tables according to the following rules:
# simulation of a CRP with alpha = 25 and n = 1000 customers
# tab[i] corresponds to the table number of customer i
alpha <- 25
n <- 1000
tab <- NULL
tab[1] <- 1
set.seed(1)
for (i in 2:n) {
p <- c(table(tab), alpha)
tab[i] <- sample(x = 1:length(p), size = 1, prob = p)
}
# number of customers per table (first 10 tables)
mk <- cbind(1:length(table(tab)), table(tab))
colnames(mk) <- c("Table","# of customers")
print(head(mk, n = 10))
## Table # of customers
## 1 1 83
## 2 2 142
## 3 3 112
## 4 4 89
## 5 5 61
## 6 6 30
## 7 7 18
## 8 8 9
## 9 9 23
## 10 10 44
# number of tables per number of customers
mt <- table(table(tab))
mt <- cbind(as.numeric(names(mt)), mt)
colnames(mt) <- c("# of customers", "# of tables")
print(mt)
## # of customers # of tables
## 1 1 17
## 2 2 10
## 3 3 7
## 4 4 5
## 5 5 5
## 6 6 6
## 7 7 6
## 9 9 3
## 10 10 2
## 11 11 1
## 12 12 2
## 15 15 1
## 17 17 1
## 18 18 3
## 20 20 1
## 23 23 2
## 24 24 1
## 30 30 1
## 44 44 1
## 61 61 1
## 83 83 1
## 89 89 1
## 112 112 1
## 142 142 1