Consider the location-scale Normal **Dirichlet Process Mixture Model** (DPMM): \[ f(y \mid G) = \int \textsf{N}(y \mid \theta, \sigma^2) \, \textsf{d}G(\theta,\sigma^2), \] where the mixing distribution \(G\) follows a Dirichlet process (DP) prior: \[ G \mid \alpha, \mu, \kappa, \beta \sim \textsf{DP}(\alpha, G_0 = \textsf{N}(\mu,\sigma^2/\kappa)\cdot\textsf{IG}(c,\beta)), \] with \(c\) fixed.
We assume the following prior distributions: \[ \alpha \sim \textsf{G}(a_{\alpha}, b_{\alpha}),\quad \mu \sim \textsf{N}(a_{\mu}, b_{\mu}),\quad \kappa \sim \textsf{G}(a_{\kappa}, b_{\kappa}),\quad \beta \sim \textsf{G}(a_{\beta}, b_{\beta}). \]
Thus, the hierarchical representation of this semiparametric DPMM is: \[ \begin{align*} y_i \mid \theta_i, \sigma^2_i &\overset{\text{ind}}{\sim} \textsf{N}(y_i \mid \theta_i, \sigma^2_i), \qquad i = 1, \dots, n, \\ (\theta_i,\sigma^2_i) \mid G &\overset{\text{i.i.d.}}{\sim} G, \qquad i = 1, \dots, n, \\ G \mid \alpha, \mu, \kappa,\beta &\sim \textsf{DP}(\alpha, G_0 = \textsf{N}(\mu,\sigma^2/\kappa)\cdot\textsf{IG}(c,\beta)), \\ \alpha, \mu, \kappa, \beta &\sim p(\alpha) \, p(\mu) \, p(\kappa) \, p(\beta), \end{align*} \] where the independent prior distributions \(p(\alpha)\), \(p(\mu)\), \(p(\kappa)\), and \(p(\beta)\) are specified as above.
To study inference under this model, we consider the galaxy dataset: velocities (km/second) for 82 galaxies, drawn from six well-separated conic sections of the Corona Borealis region.
# data
# https://rdrr.io/cran/rebmix/man/galaxy.html
data(galaxy, package = "rebmix")
y <- galaxy$Velocity
# histogram
hist(x = y, freq = FALSE, nclass = 30, col = "lightgray", border = "lightgray",
     xlab = "y", ylab = "Density", main = "Galaxy dataset")
By integrating \(G\) out, the Chinese Restaurant Process (CRP) formulation leads to a finite mixture model in which the cluster assignments \(\xi_i\) determine which cluster each observation \(y_i\) belongs to. The model can be rewritten in a hierarchical form as follows:
Observation model: Each observation \(y_i\) follows a Normal distribution with a mean \(\vartheta_{\xi_i}\) and variance \(\varsigma^2_{\xi_i}\) corresponding to its assigned cluster: \[ y_i \mid \xi_i, \vartheta_{\xi_i}, \varsigma^2_{\xi_i} \overset{\text{ind}}{\sim} \textsf{N}(\vartheta_{\xi_i}, \varsigma^2_{\xi_i}), \quad i = 1, \dots, n. \]
Cluster assignments prior: We use the CRP prior to define the distribution of cluster assignments \(\xi_i\): \[ p(\xi_i \mid \boldsymbol{\xi}_{-i}, \alpha) = \begin{cases} \frac{n_{-i,k}}{n - 1 + \alpha}, & \text{if } \xi_i = k \qquad\quad \,\, \text{ (existing cluster)}, \\ \frac{\alpha}{n - 1 + \alpha}, & \text{if } \xi_i = K_{-i} + 1 \,\,\, \text{ (new cluster)}. \end{cases} \]
Cluster parameters prior: Each cluster-specific mean and variance \((\vartheta_k,\varsigma^2_k)\) is drawn from the base measure \(G_0\), which is a Normal-Inverse-Gamma distribution: \[ \vartheta_k \mid \varsigma^2_k, \mu, \kappa \overset{\text{ind}}{\sim} \textsf{N}(\mu, \varsigma^2_k/\kappa), \quad \varsigma^2_k \mid \beta \overset{\text{i.i.d.}}{\sim} \textsf{IG}(c,\beta), \quad k = 1, \dots, K. \]
Priors for model parameters: To complete the hierarchical model, we place priors on the remaining model parameters: \[ \alpha \sim \textsf{G}(a_{\alpha}, b_{\alpha}),\quad \mu \sim \textsf{N}(a_{\mu}, b_{\mu}),\quad \kappa \sim \textsf{G}(a_{\kappa}, b_{\kappa}),\quad \beta \sim \textsf{G}(a_{\beta}, b_{\beta}). \]
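To build intuition for the CRP prior on the cluster assignments, the following short simulation is a minimal sketch (the function name `rcrp` and the settings `n = 82`, `alpha = 1` are illustrative and not part of the sampler developed below): it generates one partition sequentially using the predictive rule implied by the probabilities above.

# Illustrative only: draw one partition from the CRP prior
rcrp <- function(n, alpha) {
  xi <- numeric(n)
  xi[1] <- 1
  for (i in 2:n) {
    counts <- tabulate(xi[1:(i - 1)])            # current cluster sizes n_{-i,k}
    K <- length(counts)
    probs <- c(counts, alpha) / (i - 1 + alpha)  # existing clusters and a new one
    xi[i] <- sample(1:(K + 1), size = 1, prob = probs)
  }
  xi
}
set.seed(123)
table(rcrp(n = 82, alpha = 1))                   # cluster sizes of one prior draw

Small values of \(\alpha\) favor a few large clusters, whereas larger values generate more clusters; the Gamma prior on \(\alpha\) lets the data inform this behavior.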
By integrating out \(G\), the joint posterior distribution of the model parameters is:
\[ \begin{aligned} p(\boldsymbol{\xi}, \boldsymbol{\vartheta}, \boldsymbol{\varsigma}^2, \alpha, \mu, \kappa, \beta \mid \boldsymbol{y}) &\propto \prod_{i=1}^n \textsf{N}(y_i \mid \vartheta_{\xi_i}, \varsigma^2_{\xi_i}) \cdot \prod_{k=1}^K \textsf{N}(\vartheta_k \mid \mu, \varsigma^2_k / \kappa) \cdot \textsf{IG}(\varsigma^2_k \mid c, \beta) \\ &\quad \cdot p(\boldsymbol{\xi} \mid \alpha) \cdot \textsf{G}(\alpha \mid a_\alpha, b_\alpha) \cdot \textsf{N}(\mu \mid a_\mu, b_\mu) \cdot \textsf{G}(\kappa \mid a_\kappa, b_\kappa) \cdot \textsf{G}(\beta \mid a_\beta, b_\beta). \end{aligned} \]
We now derive the full conditional distributions for posterior inference via Gibbs sampling.
We want to derive: \[ p(\xi_i = k \mid \boldsymbol{\xi}_{-i}, \boldsymbol{\vartheta}, \boldsymbol{\varsigma}^2, \alpha, \mu, \kappa, \beta, \mathbf{y}) \propto p(\xi_i = k \mid \boldsymbol{\xi}_{-i}, \alpha) \cdot p(y_i \mid \vartheta_k, \varsigma^2_k) \]
Assigning \(\xi_i\) to an existing cluster \(k \in \{1, \dots, K_{-i} \}\): \[ p(\xi_i = k \mid \cdots) \propto \frac{n_{-i,k}}{n - 1 + \alpha} \cdot \textsf{N}(y_i \mid \vartheta_k, \varsigma^2_k) \]
Assigning \(\xi_i\) to a new cluster \(k^\ast = K_{-i} + 1\): In this case, the new cluster has no associated \((\vartheta_{k^\ast}, \varsigma^2_{k^\ast})\), so we must integrate them out under the base measure \(G_0\). That is: \[ p(y_i \mid \mu, \kappa, \beta) = \int \textsf{N}(y_i \mid \vartheta, \varsigma^2) \cdot \textsf{N}(\vartheta \mid \mu, \varsigma^2/\kappa) \cdot \textsf{IG}(\varsigma^2 \mid c, \beta) \, \textsf{d}\vartheta \, \textsf{d}\varsigma^2 = \textsf{t}_{2c} \left(y_i \mid \mu, \beta (1 + 1/\kappa)/c \right). \] So: \[ p(\xi_i = K_{-i} + 1 \mid \cdots) \propto \frac{\alpha}{n - 1 + \alpha} \cdot \textsf{t}_{2c} \left(y_i \mid \mu, \beta (1 + 1/\kappa)/c \right). \] If \(\xi_i\) is assigned to a new cluster, we must sample a new value for \((\vartheta_{K+1},\varsigma^2_{K+1})\) from its posterior distribution given the single observation \(y_i\): \[ p(\vartheta,\varsigma^2 \mid y_i, \mu, \kappa, \beta) = p(\vartheta \mid y_i, \varsigma^2, \mu, \kappa) \, p(\varsigma^2 \mid y_i, \mu, \kappa, \beta) = \textsf{N}(\vartheta \mid \mu^\ast, \sigma^{\ast 2}) \, \textsf{IG}(\varsigma^2 \mid c^\ast,\beta^\ast), \] where \[ \mu^\ast = \frac{\kappa \mu + y_i}{\kappa + 1}, \quad \sigma^{\ast 2} = \frac{\varsigma^2}{\kappa + 1}, \quad c^\ast = c + \frac{1}{2}, \quad \beta^\ast = \beta + \frac{1}{2}\,\frac{(y_i - \mu)^2}{1 + 1/\kappa}. \]
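As a quick numerical check of the Student-t marginal above, the following sketch (with arbitrary illustrative values `mu0`, `kappa0`, `beta0`, `c0`, unrelated to the hyperparameters used later) simulates \((\vartheta, \varsigma^2)\) from \(G_0\), draws \(y\) from the Normal kernel, and compares the resulting Monte Carlo density with the closed-form \(\textsf{t}_{2c}\left(\mu, \beta(1+1/\kappa)/c\right)\) density.

# Monte Carlo check of the prior predictive p(y | mu, kappa, beta); values are illustrative
set.seed(42)
mu0 <- 0; kappa0 <- 2; beta0 <- 3; c0 <- 2
S <- 1e5
sig2 <- 1 / rgamma(S, shape = c0, rate = beta0)          # varsigma^2 ~ IG(c, beta)
vthe <- rnorm(S, mean = mu0, sd = sqrt(sig2 / kappa0))   # vartheta | varsigma^2 ~ N(mu, varsigma^2/kappa)
ysim <- rnorm(S, mean = vthe, sd = sqrt(sig2))           # y | vartheta, varsigma^2 ~ N(vartheta, varsigma^2)
scale2 <- beta0 * (1 + 1/kappa0) / c0                    # squared scale of the t density
curve(dt((x - mu0)/sqrt(scale2), df = 2 * c0)/sqrt(scale2), from = -10, to = 10,
      lwd = 2, col = "blue", xlab = "y", ylab = "Density")
lines(density(ysim), col = "red", lty = 2)               # empirical density of the simulated y
legend("topright", legend = c("t density", "Monte Carlo"),
       col = c("blue", "red"), lty = 1:2, bty = "n")

The two curves should essentially coincide, which confirms the form of the predictive used for opening a new cluster.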
We want to derive: \[ p(\vartheta_k \mid \{y_i : \xi_i = k\}, \varsigma^2_k, \mu, \kappa) \propto \prod_{i : \xi_i = k} \textsf{N}(y_i \mid \vartheta_k, \varsigma^2_k) \cdot \textsf{N}(\vartheta_k \mid \mu, \varsigma^2_k / \kappa). \] This corresponds to: \[ \vartheta_k \mid \cdots \sim \textsf{N} \left( \frac{\kappa \mu + n_k \bar{y}_k}{\kappa + n_k}, \, \frac{\varsigma^2_k}{\kappa + n_k} \right), \] where \(n_k = \sum_{i} \mathbb{1}(\xi_i = k)\) is the number of observations assigned to cluster \(k\) and \(\bar{y}_k = \frac{1}{n_k} \sum_{i: \xi_i = k} y_i\) is their sample mean.
We want to derive: \[ p(\varsigma^2_k \mid \{y_i : \xi_i = k\}, \vartheta_k, \mu, \kappa, \beta) \propto \prod_{i : \xi_i = k} \textsf{N}(y_i \mid \vartheta_k, \varsigma^2_k) \cdot \textsf{N}(\vartheta_k \mid \mu, \varsigma^2_k / \kappa) \cdot \textsf{IG}(\varsigma_k^2 \mid c, \beta). \] This corresponds to: \[ \varsigma_k^2 \mid \cdots \sim \textsf{IG} \left( c + \frac{n_k + 1}{2}, \, \beta + \frac{\kappa (\vartheta_k - \mu)^2 + \sum_{i : \xi_i = k} (y_i - \vartheta_k)^2}{2} \right). \]
We want to derive: \[ p(\mu \mid \boldsymbol{\vartheta}, \boldsymbol{\varsigma}^2, \kappa) \propto \prod_{k=1}^K \textsf{N}(\vartheta_k \mid \mu, \varsigma_k^2 / \kappa) \cdot \textsf{N}(\mu \mid a_\mu, b_\mu). \] This corresponds to \(\mu \mid \cdots \sim \textsf{N}(M,V)\), where: \[ M = \frac{\frac{1}{b_\mu} \, a_\mu + \sum_{k=1}^K \frac{\kappa}{\varsigma_k^2} \, \vartheta_k}{\frac{1}{b_\mu} + \sum_{k=1}^K \frac{\kappa}{\varsigma_k^2}}, \qquad V = \frac{1}{\frac{1}{b_\mu} + \sum_{k=1}^K \frac{\kappa}{\varsigma_k^2}}. \]
Similarly, the full conditional distribution of \(\kappa\) is: \[ \kappa \mid \cdots \sim \textsf{G}\left( a_\kappa + \frac{K}{2}, \, b_\kappa + \frac{1}{2} \sum_{k=1}^K \frac{(\vartheta_k - \mu)^2}{\varsigma_k^2} \right). \]
Finally, the full conditional distribution of \(\beta\) is: \[ \beta \mid \cdots \sim \textsf{G}\left( a_\beta + cK, \, b_\beta + \sum_{k=1}^K \frac{1}{\varsigma_k^2} \right). \]
sample_xi <- function(y, xi, theta, sigma2, alpha, mu, kappa, beta, cc) {
  n <- length(y)
  for (i in 1:n) {
    # Temporarily remove current assignment
    xi_new <- xi
    xi_new[i] <- 0
    # Relabel clusters
    unique_xi <- unique(xi_new[-i])
    unique_xi <- unique_xi[order(unique_xi)]
    xi_new[-i] <- as.numeric(factor(xi_new[-i], levels = unique_xi, labels = seq_along(unique_xi)))
    K_new <- max(c(0, xi_new)) # in case all elements are zero
    # Keep only active cluster parameters
    theta <- theta[unique_xi]
    sigma2 <- sigma2[unique_xi]
    log_probs <- numeric(K_new + 1)
    # Probabilities for existing clusters
    for (k in 1:K_new) {
      n_k <- sum(xi_new == k)
      log_probs[k] <- log(n_k) + dnorm(y[i], mean = theta[k], sd = sqrt(sigma2[k]), log = TRUE)
    }
    # Predictive for new cluster: Student-t(2cc, location = mu, scale^2 = beta(1 + 1/kappa)/cc)
    df <- 2 * cc
    scale2 <- beta * (1 + 1/kappa) / cc
    log_probs[K_new + 1] <- log(alpha) + dt((y[i] - mu)/sqrt(scale2), df = df, log = TRUE) - 0.5 * log(scale2)
    # Sample new assignment
    new_cluster <- sample(1:(K_new + 1), size = 1, prob = exp(log_probs - max(log_probs)))
    xi_new[i] <- new_cluster
    # If new cluster
    if (new_cluster == (K_new + 1)) {
      # Sample sigma2
      shape <- cc + 0.5
      rate <- beta + 0.5 * (y[i] - mu)^2 / (1 + 1/kappa)
      new_sigma2 <- 1 / rgamma(1, shape = shape, rate = rate)
      # Sample theta
      mu_star <- (kappa * mu + y[i]) / (kappa + 1)
      tau2_star <- new_sigma2 / (kappa + 1)
      new_theta <- rnorm(1, mean = mu_star, sd = sqrt(tau2_star))
      theta <- c(theta, new_theta)
      sigma2 <- c(sigma2, new_sigma2)
    }
    xi <- xi_new
  }
  return(list(xi = xi, theta = theta, sigma2 = sigma2))
}
sample_theta <- function(y, xi, sigma2, mu, kappa) {
  K <- max(xi)
  theta <- numeric(K)
  for (k in 1:K) {
    y_k <- y[xi == k]
    n_k <- length(y_k)
    sum_k <- sum(y_k)
    tau2_star <- sigma2[k] / (kappa + n_k)
    mu_star <- (kappa * mu + sum_k) / (kappa + n_k)
    theta[k] <- rnorm(1, mean = mu_star, sd = sqrt(tau2_star))
  }
  return(theta)
}
sample_sigma2 <- function(y, xi, theta, mu, kappa, beta, cc) {
  K <- max(xi)
  sigma2 <- numeric(K)
  for (k in 1:K) {
    y_k <- y[xi == k]
    n_k <- length(y_k)
    shape <- cc + 0.5 * (n_k + 1)
    rate <- beta + 0.5 * (sum((y_k - theta[k])^2) + kappa * (theta[k] - mu)^2)
    sigma2[k] <- 1 / rgamma(1, shape = shape, rate = rate)
  }
  return(sigma2)
}
sample_mu <- function(theta, sigma2, kappa, a_mu, b_mu) {
  precision_terms <- kappa / sigma2
  V_mu <- 1 / (1 / b_mu + sum(precision_terms))
  M_mu <- V_mu * (a_mu / b_mu + sum(precision_terms * theta))
  return(rnorm(1, mean = M_mu, sd = sqrt(V_mu)))
}
sample_kappa <- function(theta, sigma2, mu, a_kappa, b_kappa) {
  K <- length(theta)
  shape <- a_kappa + 0.5 * K
  rate <- b_kappa + 0.5 * sum((theta - mu)^2 / sigma2)
  return(rgamma(1, shape = shape, rate = rate))
}
sample_beta <- function(sigma2, cc, a_beta, b_beta) {
  K <- length(sigma2)
  shape <- a_beta + cc * K
  rate <- b_beta + sum(1 / sigma2)
  return(rgamma(1, shape = shape, rate = rate))
}
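The concentration parameter \(\alpha\) does not have a standard full conditional. The next function uses the auxiliary-variable scheme of Escobar and West (1995): given the current \(\alpha\) and the number of occupied clusters \(K\), draw \(\eta \mid \alpha \sim \textsf{Beta}(\alpha + 1, n)\) and then sample \(\alpha\) from the two-component Gamma mixture \[ \alpha \mid \eta, K \sim \epsilon \, \textsf{G}(a_\alpha + K, \, b_\alpha - \log \eta) + (1 - \epsilon) \, \textsf{G}(a_\alpha + K - 1, \, b_\alpha - \log \eta), \qquad \frac{\epsilon}{1 - \epsilon} = \frac{a_\alpha + K - 1}{n \, (b_\alpha - \log \eta)}, \] which is exactly the mixture implemented in `sample_alpha` below.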
sample_alpha <- function(alpha, xi, n, a_alpha, b_alpha) {
  K <- length(unique(xi))
  eta <- rbeta(1, shape1 = alpha + 1, shape2 = n)
  delta <- (a_alpha + K - 1) / (a_alpha + K - 1 + n * (b_alpha - log(eta)))
  if (runif(1) < delta) {
    return(rgamma(1, shape = a_alpha + K, rate = b_alpha - log(eta)))
  } else {
    return(rgamma(1, shape = a_alpha + K - 1, rate = b_alpha - log(eta)))
  }
}
gibbs_sampler_bnp <- function(y, iter, burn,
                              a_alpha, b_alpha,
                              a_mu, b_mu,
                              a_kappa, b_kappa,
                              a_beta, b_beta,
                              cc) {
  n <- length(y)
  max_K <- floor(n / 2)
  # Storage
  B <- iter - burn
  xi_store <- matrix(NA, nrow = B, ncol = n)
  mu_store <- numeric(B)
  kappa_store <- numeric(B)
  beta_store <- numeric(B)
  alpha_store <- numeric(B)
  theta_store <- vector("list", B)
  sigma2_store <- vector("list", B)
  ll_store <- numeric(iter)
  # Initialization
  alpha <- rgamma(1, a_alpha, b_alpha)
  mu <- rnorm(1, a_mu, sqrt(b_mu))
  kappa <- rgamma(1, a_kappa, b_kappa)
  beta <- rgamma(1, a_beta, b_beta)
  xi <- sample(1:max_K, n, replace = TRUE)
  xi <- as.numeric(factor(xi, levels = unique(xi), labels = seq_along(unique(xi))))
  K <- length(unique(xi))
  sigma2 <- 1 / rgamma(K, shape = cc, rate = beta)
  theta <- rnorm(K, mean = mu, sd = sqrt(sigma2 / kappa))
  for (t in 1:iter) {
    # Sample xi (updates theta and sigma2 if a new cluster is formed)
    xi_update <- sample_xi(y, xi, theta, sigma2, alpha, mu, kappa, beta, cc)
    xi <- xi_update$xi
    theta <- xi_update$theta
    sigma2 <- xi_update$sigma2
    # Sample cluster parameters
    theta <- sample_theta(y, xi, sigma2, mu, kappa)
    sigma2 <- sample_sigma2(y, xi, theta, mu, kappa, beta, cc)
    # Sample hyperparameters
    mu <- sample_mu(theta, sigma2, kappa, a_mu, b_mu)
    kappa <- sample_kappa(theta, sigma2, mu, a_kappa, b_kappa)
    beta <- sample_beta(sigma2, cc, a_beta, b_beta)
    alpha <- sample_alpha(alpha, xi, n, a_alpha, b_alpha)
    ll_store[t] <- sum(dnorm(y, mean = theta[xi], sd = sqrt(sigma2[xi]), log = TRUE))
    # Store post-burn-in samples
    if (t > burn) {
      i <- t - burn
      xi_store[i, ] <- xi
      mu_store[i] <- mu
      kappa_store[i] <- kappa
      beta_store[i] <- beta
      alpha_store[i] <- alpha
      theta_store[[i]] <- theta
      sigma2_store[[i]] <- sigma2
    }
    # Progress update
    if (t %% (iter / 10) == 0) {
      cat("Iteration", t, "/", iter, "| Clusters:", length(unique(xi)), "\n")
    }
  }
  return(list(
    xi = xi_store,
    theta = theta_store,
    sigma2 = sigma2_store,
    mu = mu_store,
    kappa = kappa_store,
    beta = beta_store,
    alpha = alpha_store,
    ll = ll_store
  ))
}
# Hyperparameters for location-scale BNP model
a_alpha <- 1
b_alpha <- 1
a_mu <- mean(y)
b_mu <- 2 * var(y)
a_kappa <- 2
b_kappa <- 1
a_beta <- 2
b_beta <- 1
cc <- 2
# Gibbs sampling settings
iter <- 25000
burn <- 5000
# Run Gibbs sampler
samples <- gibbs_sampler_bnp(y, iter, burn,
a_alpha, b_alpha,
a_mu, b_mu,
a_kappa, b_kappa,
a_beta, b_beta,
cc)
# Save output
save(samples, file = "samples_bnp_location_scale.RData")
Taking the logarithm of the likelihood at iteration \(b\), we obtain: \[ \log p(\boldsymbol{y} \mid \boldsymbol{\xi}^{(b)}, \boldsymbol{\vartheta}^{(b)}, \boldsymbol{\varsigma}^{2\,(b)}) = \sum_{i=1}^{n} \log \textsf{N}\big(y_i \mid \vartheta_{\xi_i^{(b)}}^{(b)}, \varsigma_{\xi_i^{(b)}}^{2\,(b)}\big), \quad b = 1,\ldots,B, \] where \(B\) is the total number of iterations; the log-likelihood is recorded at every iteration of the chain, including burn-in.
# Plot log-likelihood trace
par(mfrow = c(1,1), mar = c(3,3,1.4,1.4), mgp = c(1.75,0.75,0))
plot(samples$ll, type = "p", pch = ".", col = "blue",
main = "Log-Likelihood Trace Plot",
xlab = "Iteration", ylab = "Log-Likelihood")
We estimate the posterior distribution of the number of clusters by computing \(K^{(b)} = \max_i \xi_i^{(b)}\) (equivalently, the number of distinct labels in \(\boldsymbol{\xi}^{(b)}\)) for each saved iteration. The normalized frequencies approximate \(p(K \mid \boldsymbol{y})\), which we visualize below.
# Compute the number of clusters K at each iteration
K_values <- apply(samples$xi, 1, function(x) length(unique(x)))
# Compute the posterior distribution of K
K_table <- table(K_values) / length(K_values)
par(mfrow = c(1,1), mar = c(3,3,1.4,1.4), mgp = c(1.75,0.75,0))
plot(as.numeric(names(K_table)), K_table, type = "h", lwd = 3, col = "blue",
main = "Posterior distribution of number of clusters",
xlab = "Number of clusters", ylab = "Density",
yaxt = "n") # Suppress default y-axis
axis(2, at = pretty(K_table), labels = pretty(K_table))
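As a numerical complement to the plot, the brief sketch below reuses the `K_values` and `K_table` objects computed above to report the posterior mode, mean, and a 95% credible interval for the number of clusters.

# Posterior summaries of the number of clusters (uses K_values and K_table from above)
K_mode <- as.numeric(names(which.max(K_table)))
K_mean <- mean(K_values)
K_ci   <- quantile(K_values, probs = c(0.025, 0.975))
round(c(mode = K_mode, mean = K_mean, lower = unname(K_ci[1]), upper = unname(K_ci[2])), 2)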
In a DPMM, the number of clusters \(K\) varies across posterior samples, so the density estimate can be computed as: \[ \hat{g}(x) = \frac{1}{B} \sum_{b=1}^B \sum_{k=1}^{K^{(b)}} \frac{n_k^{(b)}}{n} \, \textsf{N}(x \mid \vartheta_k^{(b)}, \varsigma_k^{2\,(b)}). \] This estimator combines posterior samples to estimate the population density while incorporating uncertainty in both the number of clusters and the parameter values.
posterior_density_estimate <- function(samples, x_seq) {
  B <- length(samples$theta)
  n <- ncol(samples$xi)
  M <- length(x_seq)
  # Store density estimates
  FE <- matrix(NA, nrow = B, ncol = M)
  for (b in 1:B) {
    xi_b <- samples$xi[b, ]
    theta_b <- samples$theta[[b]]
    sigma2_b <- samples$sigma2[[b]]
    # Unique cluster IDs
    cluster_labels <- sort(unique(xi_b))
    cluster_counts <- as.numeric(table(xi_b)[as.character(cluster_labels)])
    # Cluster probabilities
    weight_b <- cluster_counts / sum(cluster_counts)
    for (i in 1:M) {
      FE[b, i] <- sum(weight_b * dnorm(x_seq[i], mean = theta_b, sd = sqrt(sigma2_b)))
    }
  }
  f_hat <- colMeans(FE)                           # Posterior mean density
  f_inf <- apply(FE, 2, quantile, probs = 0.025)  # Lower bound of the 95% credible band
  f_sup <- apply(FE, 2, quantile, probs = 0.975)  # Upper bound of the 95% credible band
  return(list(f_hat = f_hat, f_inf = f_inf, f_sup = f_sup))
}
# Define a sequence of x values for density estimation
x_seq <- seq(min(y) - 1, max(y) + 1, length.out = 250)
# Compute posterior density estimate and credible intervals
density_estimate <- posterior_density_estimate(samples, x_seq)
# Extract results
f_hat <- density_estimate$f_hat
f_inf <- density_estimate$f_inf
f_sup <- density_estimate$f_sup
# Plot the histogram
par(mfrow = c(1,1), mar = c(3,3,1.4,1.4), mgp = c(1.75,0.75,0))
hist(y, breaks = 30, freq = FALSE, col = "lightgray", border = "lightgray",
main = "Posterior predictive density estimate",
xlab = "x", ylab = "Density", ylim = c(0, max(f_sup)))
# Add the 95% credible interval as a shaded region
polygon(c(x_seq, rev(x_seq)), c(f_inf, rev(f_sup)), col = rgb(0, 0, 1, 0.2), border = NA)
# Overlay the posterior density estimate as a blue line
lines(x_seq, f_hat, col = "blue", lwd = 2)
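As a simple sanity check (a sketch that reuses `x_seq` and `f_hat` from above), the posterior mean density should integrate to a value close to one over the plotting grid; any small deficit reflects tail mass outside \([\min(y) - 1, \max(y) + 1]\).

# Trapezoidal approximation of the integral of the posterior mean density over x_seq
sum(diff(x_seq) * (f_hat[-1] + f_hat[-length(f_hat)]) / 2)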
The co-clustering matrix \(\mathbf{A} = [a_{i,j}]\) is an \(n \times n\) symmetric matrix that represents the posterior probability that observations \(i\) and \(j\) belong to the same cluster. It is defined as: \[ a_{i,j} = \textsf{Pr}(\xi_i = \xi_j \mid \boldsymbol{y}) \approx \frac{1}{B} \sum_{b=1}^{B} I\left\{ \xi_i^{(b)} = \xi_j^{(b)} \right\}, \] where \(I(A)\) is the indicator function that equals 1 if \(A\) is true and 0 otherwise.
Since cluster assignments are identical in both directions, \(\mathbf{A}\) is symmetric: \[ \textsf{Pr}(\xi_i = \xi_j \mid \boldsymbol{y}) = \textsf{Pr}(\xi_j = \xi_i \mid \boldsymbol{y}). \] Furthermore, the diagonal elements satisfy \(a_{i,i} = 1\) because each observation is always in the same cluster as itself.
# Compute co-clustering matrix A
compute_co_clustering_matrix <- function(samples) {
  n <- ncol(samples$xi)
  B <- nrow(samples$xi)
  A <- matrix(0, nrow = n, ncol = n)
  for (b in 1:B) {
    xi_b <- samples$xi[b, ]
    for (k in unique(xi_b)) {
      cluster_members <- which(xi_b == k)
      A[cluster_members, cluster_members] <- A[cluster_members, cluster_members] + 1
    }
  }
  A <- A / B
  diag(A) <- 1
  return(A)
}
# Compute A and reorder it according to an estimated partition
A <- compute_co_clustering_matrix(samples)
# Convert A to dissimilarity
D <- 1 - A
suppressMessages(suppressWarnings(library(mclust)))
# Apply classical multidimensional scaling
mds_coords <- cmdscale(as.dist(D), k = 4)
# Run Mclust on the embedded coordinates
clustering_result <- Mclust(mds_coords, G = 8, verbose = FALSE)
# Estimated partition
components <- clustering_result$classification
# Sort observations by the estimated partition
A <- A[order(components), order(components)]
# Visualize co-clustering matrix
par(mar = c(2.75, 2.75, 0.5, 0.5), mgp = c(1.7, 0.7, 0))
colorscale <- c("white", rev(heat.colors(100)))
corrplot::corrplot(A, is.corr = FALSE, method = "color", col = colorscale,
tl.pos = "n", addgrid.col = NA)
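Finally, as a brief usage note that reuses the objects created above, the sizes of the estimated groups and the average within-group co-clustering probabilities can be summarized as follows (recall that \(\mathbf{A}\) was reordered, so its rows follow `sort(components)`).

# Sizes of the groups in the partition estimated via Mclust on the MDS coordinates
table(components)
# Average posterior co-clustering probability within each estimated group
comp_sorted <- sort(components)
sapply(sort(unique(comp_sorted)), function(k) {
  idx <- which(comp_sorted == k)
  mean(A[idx, idx])
})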