Dynamic modeling (project 1)

The objective of this project is twofold. First, to adapt the hierarchical model from the original article by incorporating Lasso regression, Ridge regression, and Elastic Net regression into a static framework. Second, to develop dynamic adaptations of each of these models. The motivation for this work arises from the ICFES dataset, which includes grouping factors (e.g., departments), features a large number of covariates (hence the need for regularization), and is also indexed over time.

It is essential to compare the models theoretically, demonstrating why they are effective for handling a large number of covariates. This involves developing and implementing the Gibbs sampler (which may include Metropolis-Hastings steps) and comparing the models empirically in the application, using test statistics, information criteria, and cross-validation. Additionally, in the application we can analyze the evolution of the covariate effects over time and test substantive hypotheses about how these covariates influence the test outcome.
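
The model comparison can be scripted directly from posterior output. Below is a minimal sketch of a WAIC computation, assuming only a matrix `log_lik` of pointwise log-likelihood draws (S posterior samples by n observations) produced by whichever sampler is used; the function name and interface are illustrative.

```python
import numpy as np
from scipy.special import logsumexp

def waic(log_lik):
    """WAIC from an (S, n) matrix of pointwise log-likelihood draws."""
    S = log_lik.shape[0]
    # log pointwise predictive density: log of the posterior-mean likelihood
    lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))
    # effective number of parameters: posterior variance of the log-likelihood
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```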

Hierarchical Normal Linear Regression Model (HLRM)

This model is fully developed in Sosa, J., & Aristizabal, J. P. (2022). Some Developments in Bayesian Hierarchical Linear Regression Modeling. Revista Colombiana de Estadística, 45(2), 231–255.

Model Specification

\[ y_{i,j} \mid \boldsymbol{x}_{i,j}, \boldsymbol{\beta}_j, \sigma_j^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j}^\top \boldsymbol{\beta}_j, \sigma_j^2), \quad i = 1, \ldots, n_j, \quad j = 1, \ldots, m, \] where:

  • \(\boldsymbol{x}_{i,j} = (x_{i,j,1}, \ldots, x_{i,j,p})^\top\) is the vector of covariates for observation \(i\) in group \(j\),
  • \(\boldsymbol{\beta}_j = (\beta_{1,j}, \ldots, \beta_{p,j})^\top\) are group-specific regression coefficients,
  • \(\sigma_j^2\) is the group-specific variance.

Prior Specification

\[ \begin{align*} \boldsymbol{\beta}_j \mid \boldsymbol{\beta}, \boldsymbol{\Sigma} &\overset{\text{i.i.d.}}{\sim} \textsf{N}_p(\boldsymbol{\beta}, \boldsymbol{\Sigma}), \\ \sigma_j^2 \mid \nu, \sigma^2 &\overset{\text{i.i.d.}}{\sim} \textsf{IG}(\nu / 2, \nu \sigma^2 / 2), \\ \boldsymbol{\beta} &\sim \textsf{N}_p(\boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0), \\ \boldsymbol{\Sigma} &\sim \textsf{IW}(n_0, \mathbf{S}_0^{-1}), \\ p(\nu) &\propto e^{-\lambda_0 \nu}, \\ \sigma^2 &\sim \textsf{G}(a_0, b_0). \end{align*} \]

Model Parameters

The unknown parameters in the model are: \[ \Theta = (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_m, \sigma_1^2, \ldots, \sigma_m^2, \boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2, \nu). \]

Note: \(\nu\) is restricted to the set \(\{1, 2, \dots\}\).
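
To fix ideas, the following is a minimal sketch (not a reference implementation) of the group-level Gibbs updates implied by this specification, holding the hyperparameters \(\boldsymbol{\beta}\), \(\boldsymbol{\Sigma}\), \(\sigma^2\), \(\nu\) at their current values; the container names `y`, `X` are illustrative.

```python
import numpy as np

def update_groups(y, X, beta, Sigma, sigma2_j, nu, sigma2, rng):
    """One Gibbs pass over the group-specific parameters of the HLRM.

    y, X       : length-m lists with y[j] of shape (n_j,) and X[j] of shape (n_j, p)
    beta       : current global mean, shape (p,)
    Sigma      : current between-group covariance, shape (p, p)
    sigma2_j   : current group variances, shape (m,)
    nu, sigma2 : current degrees of freedom and global scale
    """
    m, p = len(y), len(beta)
    Sigma_inv = np.linalg.inv(Sigma)
    betas = np.empty((m, p))
    for j in range(m):
        # beta_j | rest ~ N(V b, V): prior N(beta, Sigma) combined with the
        # Gaussian likelihood, posterior precision Sigma^{-1} + X_j' X_j / sigma_j^2
        V = np.linalg.inv(Sigma_inv + X[j].T @ X[j] / sigma2_j[j])
        b = Sigma_inv @ beta + X[j].T @ y[j] / sigma2_j[j]
        betas[j] = rng.multivariate_normal(V @ b, V)
        # sigma_j^2 | rest ~ IG((nu + n_j)/2, (nu*sigma^2 + SSR_j)/2)
        resid = y[j] - X[j] @ betas[j]
        a_post = 0.5 * (nu + len(y[j]))
        b_post = 0.5 * (nu * sigma2 + resid @ resid)
        sigma2_j[j] = b_post / rng.gamma(a_post)  # inverse-gamma via 1/gamma
    return betas, sigma2_j
```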

Sparse Hierarchical Linear Regression Model (SHLRM)

The Sparse Hierarchical Linear Regression Model (SHLRM) extends the HLRM by incorporating Lasso regularization, which promotes sparsity in the group-specific regression coefficients. Sparsity allows the model to identify and retain only the most relevant predictors, effectively reducing noise and enhancing interpretability.

Each coefficient \(\beta_{k,j}\) follows a Laplace prior: \[ \beta_{k,j} \mid \lambda \overset{\text{i.i.d.}}{\sim} \textsf{Laplace}(0, \lambda^{-1}), \quad k = 1, \ldots, p, \, j = 1, \ldots, m. \] To adaptively control the degree of sparsity, the Lasso shrinkage parameter \(\lambda\) is assigned a Gamma prior: \[ \lambda \sim \textsf{Gamma}(a_\lambda, b_\lambda). \]
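
A standard route to conjugate updates under this prior is the normal scale-mixture representation of the Laplace distribution (as in the Bayesian Lasso of Park and Casella, 2008): \(\beta \mid u \sim \textsf{N}(0, u)\) with \(u \sim \textsf{Exp}(\lambda^2/2)\) recovers \(\beta \sim \textsf{Laplace}(0, \lambda^{-1})\) marginally. Below is a minimal sketch of the resulting hyperparameter updates; the stacked coefficient vector `beta_flat` is purely illustrative.

```python
import numpy as np

def update_lasso_hyper(beta_flat, a_lam, b_lam, rng):
    """Joint draw of (lambda, u) given the current coefficients.

    beta_flat : all beta_{k,j} stacked into one vector of length p*m.
    The latent scales u serve as conditionally normal prior variances
    for the beta update elsewhere in the sampler.
    """
    abs_beta = np.maximum(np.abs(beta_flat), 1e-10)  # numerical guard
    # lambda | beta ~ Gamma(a + p*m, b + sum |beta|): conjugate update obtained
    # directly from the Laplace density (latent u integrated out)
    lam = rng.gamma(a_lam + beta_flat.size, 1.0 / (b_lam + abs_beta.sum()))
    # 1/u | beta, lambda ~ Inverse-Gaussian(mean = lambda/|beta|, shape = lambda^2)
    u = 1.0 / rng.wald(lam / abs_beta, lam**2)
    return lam, u
```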

Ridge Hierarchical Linear Regression Model (RHLRM)

The Ridge Hierarchical Linear Regression Model (RHLRM) extends the HLRM by incorporating Ridge regularization, which shrinks group-specific regression coefficients towards zero to improve stability and prevent overfitting.

Each coefficient \(\beta_{k,j}\) follows a Normal prior with mean zero and variance controlled by a regularization parameter \(\tau^2\): \[ \beta_{k,j} \mid \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2), \quad k = 1, \ldots, p, \, j = 1, \ldots, m. \] The regularization parameter \(\tau^2\) is assigned an Inverse-Gamma prior: \[ \tau^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]
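
The regularization update is conjugate under this specification. With \(pm\) coefficients in total, a standard derivation gives the full conditional

\[ \tau^2 \mid \boldsymbol{\beta}_{1:m} \sim \textsf{Inverse-Gamma}\left(a_\tau + \frac{pm}{2},\; b_\tau + \frac{1}{2}\sum_{k=1}^{p}\sum_{j=1}^{m}\beta_{k,j}^2\right), \]

which makes the Gibbs step for \(\tau^2\) immediate.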

Unlike the Lasso extension in SHLRM, which promotes sparsity by shrinking small coefficients aggressively toward zero, RHLRM applies continuous shrinkage under which all coefficients remain nonzero. This makes it well suited when all predictors are expected to contribute to the response.

Elastic Net Hierarchical Linear Regression Model (ENHLRM)

The Elastic Net Hierarchical Linear Regression Model (ENHLRM) extends the HLRM by combining Lasso and Ridge regularization. It promotes sparsity by shrinking small coefficients strongly toward zero (as in the Lasso) while applying continuous shrinkage (as in Ridge) to improve stability.

Each group-specific regression coefficient \(\beta_{k,j}\) follows a Normal prior: \[ \beta_{k,j} \mid \lambda, \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2 / \lambda), \quad k = 1, \ldots, p, \, j = 1, \ldots, m, \] where \(\lambda\) and \(\tau^2\) balance the Lasso and Ridge penalties.

The regularization parameters \(\lambda\) and \(\tau^2\) have the following priors: \[ \lambda \sim \textsf{Gamma}(a_{\lambda}, b_{\lambda}), \quad \tau^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]
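
Both regularization parameters admit conjugate full conditionals here (a standard derivation under the stated priors, with \(pm\) coefficients in total):

\[ \lambda \mid \cdot \sim \textsf{Gamma}\left(a_\lambda + \frac{pm}{2},\; b_\lambda + \frac{1}{2\tau^2}\sum_{k,j}\beta_{k,j}^2\right), \quad \tau^2 \mid \cdot \sim \textsf{Inverse-Gamma}\left(a_\tau + \frac{pm}{2},\; b_\tau + \frac{\lambda}{2}\sum_{k,j}\beta_{k,j}^2\right). \]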

Dynamic Hierarchical Normal Linear Regression Model (Dynamic HLRM)

Model Specification

\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \quad i = 1, \ldots, n_{j,t}, \quad j = 1, \ldots, m, \quad t = 1, \ldots, T, \] where:

  • \(y_{i,j,t}\) is the response variable for observation \(i\) in group \(j\) at time \(t\),
  • \(\boldsymbol{x}_{i,j,t} = (x_{i,j,t,1}, \ldots, x_{i,j,t,p})^\top\) is the vector of covariates,
  • \(\boldsymbol{\beta}_{j,t} = (\beta_{j,t,1}, \ldots, \beta_{j,t,p})^\top\) are the time-varying group-specific regression coefficients,
  • \(\sigma_{j,t}^2\) is the time-varying group-specific variance.

Dynamic Priors

  • Time evolution of group-specific coefficients: \[ \boldsymbol{\beta}_{j,t} \mid \boldsymbol{\beta}_{j,t-1}, \boldsymbol{\Sigma} \sim \textsf{N}_p(\boldsymbol{\beta}_{j,t-1}, \boldsymbol{\Sigma}), \quad t = 2, \ldots, T, \] where \(\boldsymbol{\Sigma}\) is the shared innovation covariance matrix; the initial state can be tied to the global coefficients, e.g., \(\boldsymbol{\beta}_{j,1} \mid \boldsymbol{\beta}_1, \boldsymbol{\Sigma} \sim \textsf{N}_p(\boldsymbol{\beta}_1, \boldsymbol{\Sigma})\).

  • Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2), \quad t = 2, \ldots, T, \] with the initial state drawn as in the static model, \(\sigma_{j,1}^2 \mid \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma^2 / 2)\) (this is where the global scale \(\sigma^2\) enters).

Global Priors

  • Global regression coefficients: \[ \boldsymbol{\beta}_{t} \sim \textsf{N}_p(\boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0), \quad t = 1, \ldots, T. \]

  • Covariance matrix for group-specific coefficients: \[ \boldsymbol{\Sigma} \sim \textsf{IW}(n_0, \mathbf{S}_0^{-1}). \]

  • Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]

  • Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]

Model Parameters

The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \boldsymbol{\beta}_t, \boldsymbol{\Sigma}, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
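
One convenient way to sample the coefficient paths is a single-site Gibbs update (an alternative to forward-filtering backward-sampling): for an interior time point \(1 < t < T\), the full conditional of \(\boldsymbol{\beta}_{j,t}\) combines the two random-walk factors with the likelihood at time \(t\). A minimal sketch, with argument names illustrative:

```python
import numpy as np

def update_beta_jt(y_jt, X_jt, beta_prev, beta_next, Sigma_inv, sigma2_jt, rng):
    """Single-site Gibbs draw of beta_{j,t} for an interior time point 1 < t < T.

    Combines the random-walk factors N(beta_t | beta_{t-1}, Sigma) and
    N(beta_{t+1} | beta_t, Sigma) with the Gaussian likelihood at time t.
    beta_prev, beta_next : current values of beta_{j,t-1} and beta_{j,t+1}
    """
    prec = 2.0 * Sigma_inv + X_jt.T @ X_jt / sigma2_jt
    V = np.linalg.inv(prec)
    b = Sigma_inv @ (beta_prev + beta_next) + X_jt.T @ y_jt / sigma2_jt
    return rng.multivariate_normal(V @ b, V)
```

At the boundaries \(t = 1\) and \(t = T\) only one random-walk factor is present, so the leading factor of 2 drops to 1 (with the \(t = 1\) factor given by the initialization prior).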

Dynamic Sparse Hierarchical Linear Regression Model (Dynamic SHLRM)

Model Specification

\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \quad i = 1, \ldots, n_{j,t}, \quad j = 1, \ldots, m, \quad t = 1, \ldots, T, \] where:

  • \(y_{i,j,t}\) is the response variable for observation \(i\) in group \(j\) at time \(t\),
  • \(\boldsymbol{x}_{i,j,t} = (x_{i,j,t,1}, \ldots, x_{i,j,t,p})^\top\) is the vector of covariates,
  • \(\boldsymbol{\beta}_{j,t} = (\beta_{j,t,1}, \ldots, \beta_{j,t,p})^\top\) are the time-varying group-specific regression coefficients,
  • \(\sigma_{j,t}^2\) is the time-varying group-specific variance.

Dynamic Priors

  • Time evolution of group-specific coefficients: \[ \beta_{k,j,t} \mid \beta_{k,j,t-1}, \lambda_{t} \overset{\text{ind}}{\sim} \textsf{Laplace}(\beta_{k,j,t-1}, \lambda_{t}^{-1}), \quad t = 2, \ldots, T, \] where \(\lambda_t\) is the time-varying Lasso shrinkage parameter, so the increments \(\beta_{k,j,t} - \beta_{k,j,t-1}\) are shrunk toward zero; at \(t = 1\), \(\beta_{k,j,1} \mid \lambda_1 \overset{\text{i.i.d.}}{\sim} \textsf{Laplace}(0, \lambda_1^{-1})\), as in the static model.

  • Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2), \quad t = 2, \ldots, T, \] with the initial state drawn as in the static model, \(\sigma_{j,1}^2 \mid \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma^2 / 2)\).

Global Priors

  • Prior on the time-varying Lasso shrinkage parameter: \[ \lambda_t \sim \textsf{Gamma}(a_\lambda, b_\lambda). \]

  • Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]

  • Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]

Model Parameters

The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \lambda_t, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
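
With the increments \(\beta_{k,j,t} - \beta_{k,j,t-1}\) Laplace-distributed as above, each shrinkage parameter remains conjugate; a standard derivation gives

\[ \lambda_t \mid \cdot \sim \textsf{Gamma}\left(a_\lambda + pm,\; b_\lambda + \sum_{k=1}^{p}\sum_{j=1}^{m}\left|\beta_{k,j,t} - \beta_{k,j,t-1}\right|\right) \]

(for \(t = 1\), replace the increment by \(\beta_{k,j,1}\)), so the Gibbs step for each \(\lambda_t\) is immediate.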

Dynamic Ridge Hierarchical Linear Regression Model (Dynamic RHLRM)

Model Specification

\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \quad i = 1, \ldots, n_{j,t}, \quad j = 1, \ldots, m, \quad t = 1, \ldots, T, \] where:

  • \(y_{i,j,t}\) is the response variable for observation \(i\) in group \(j\) at time \(t\),
  • \(\boldsymbol{x}_{i,j,t} = (x_{i,j,t,1}, \ldots, x_{i,j,t,p})^\top\) is the vector of covariates,
  • \(\boldsymbol{\beta}_{j,t} = (\beta_{j,t,1}, \ldots, \beta_{j,t,p})^\top\) are the time-varying group-specific regression coefficients,
  • \(\sigma_{j,t}^2\) is the time-varying group-specific variance.

Dynamic Priors

  • Time evolution of group-specific coefficients: \[ \boldsymbol{\beta}_{j,t} \mid \boldsymbol{\beta}_{j,t-1}, \tau_t^2 \sim \textsf{N}_p(\boldsymbol{\beta}_{j,t-1}, \tau_t^2 \mathbf{I}), \quad t = 2, \ldots, T, \] where \(\tau_t^2\) is the time-varying Ridge regularization parameter; at \(t = 1\), \(\boldsymbol{\beta}_{j,1} \mid \tau_1^2 \sim \textsf{N}_p(\boldsymbol{0}, \tau_1^2 \mathbf{I})\), as in the static model.

  • Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2), \quad t = 2, \ldots, T, \] with the initial state drawn as in the static model, \(\sigma_{j,1}^2 \mid \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma^2 / 2)\).

Global Priors

  • Prior on the time-varying Ridge regularization parameter: \[ \tau_t^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]

  • Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]

  • Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]

Model Parameters

The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \tau_t^2, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
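
As in the static Ridge model, each regularization parameter is conjugate; a standard derivation gives

\[ \tau_t^2 \mid \cdot \sim \textsf{Inverse-Gamma}\left(a_\tau + \frac{pm}{2},\; b_\tau + \frac{1}{2}\sum_{j=1}^{m}\left\|\boldsymbol{\beta}_{j,t} - \boldsymbol{\beta}_{j,t-1}\right\|^2\right) \]

(for \(t = 1\), replace the increment by \(\boldsymbol{\beta}_{j,1}\)).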

Dynamic Elastic Net Hierarchical Linear Regression Model (Dynamic ENHLRM)

Model Specification

\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \quad i = 1, \ldots, n_{j,t}, \quad j = 1, \ldots, m, \quad t = 1, \ldots, T, \] where:

  • \(y_{i,j,t}\) is the response variable for observation \(i\) in group \(j\) at time \(t\),
  • \(\boldsymbol{x}_{i,j,t} = (x_{i,j,t,1}, \ldots, x_{i,j,t,p})^\top\) is the vector of covariates,
  • \(\boldsymbol{\beta}_{j,t} = (\beta_{j,t,1}, \ldots, \beta_{j,t,p})^\top\) are the time-varying group-specific regression coefficients,
  • \(\sigma_{j,t}^2\) is the time-varying group-specific variance.

Dynamic Priors

  • Time evolution of group-specific coefficients: \[ \boldsymbol{\beta}_{j,t} \mid \boldsymbol{\beta}_{j,t-1}, \lambda_t, \tau_t^2 \sim \textsf{N}_p\!\left(\boldsymbol{\beta}_{j,t-1}, (\tau_t^2 / \lambda_t)\, \mathbf{I}\right), \quad t = 2, \ldots, T, \] where \(\lambda_t\) controls the Lasso-type penalty (promoting sparsity in the increments) and \(\tau_t^2\) controls the Ridge-type penalty (promoting shrinkage); at \(t = 1\), \(\boldsymbol{\beta}_{j,1} \mid \lambda_1, \tau_1^2 \sim \textsf{N}_p(\boldsymbol{0}, (\tau_1^2 / \lambda_1)\, \mathbf{I})\), as in the static model.

  • Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2), \quad t = 2, \ldots, T, \] with the initial state drawn as in the static model, \(\sigma_{j,1}^2 \mid \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma^2 / 2)\).

Global Priors

  • Priors on the time-varying Lasso and Ridge regularization parameters: \[ \lambda_t \sim \textsf{Gamma}(a_\lambda, b_\lambda), \quad \tau_t^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]

  • Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]

  • Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]

Model Parameters

The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \lambda_t, \tau_t^2, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
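
Combining the two previous derivations, both time-specific regularization parameters remain conjugate; writing \(S_t = \sum_{j=1}^{m}\|\boldsymbol{\beta}_{j,t} - \boldsymbol{\beta}_{j,t-1}\|^2\),

\[ \lambda_t \mid \cdot \sim \textsf{Gamma}\left(a_\lambda + \frac{pm}{2},\; b_\lambda + \frac{S_t}{2\tau_t^2}\right), \quad \tau_t^2 \mid \cdot \sim \textsf{Inverse-Gamma}\left(a_\tau + \frac{pm}{2},\; b_\tau + \frac{\lambda_t S_t}{2}\right). \]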

Clustering modeling (project 2)

Once again, the goal of this project is twofold. First, to extend the hierarchical model with clustering factors from the original article by incorporating Lasso regression, Ridge regression, and Elastic Net regression into a static framework. Second, to explore different priors for performing clustering processes.

The application to the ICFES dataset remains highly interesting, as it allows us to test substantive hypotheses about a large number of covariates while segmenting grouping factors (e.g., departments). Unlike the previous project, here we would focus on a single cohort (e.g., the most recent one). As in the other case, it is essential to compare the models theoretically, demonstrating why they are effective for handling a large number of covariates. This includes developing and implementing the Gibbs sampler (which may include Metropolis-Hastings steps) and comparing the models empirically in the application, using test statistics, information criteria, and cross-validation. Additionally, in the application we can analyze the effects of the covariates and test substantive hypotheses about how they influence the test outcomes.

Clustering Hierarchical Normal Linear Regression Model (CHLRM)

Model Specification:

The Clustering Hierarchical Normal Linear Regression Model (CHLRM) extends the HLRM by introducing group-level clustering. Groups are assigned to clusters, and within each cluster, regression coefficients and variances share common values. Let \(\gamma_j\) be the cluster assignment for group \(j\), where \(\gamma_j \in \{1, \ldots, K\}\) and \(K\) is the number of clusters. The model is specified as: \[ y_{i,j} \mid \boldsymbol{x}_{i,j}, \boldsymbol{\beta}_{\gamma_j}, \sigma_{\gamma_j}^2 \overset{\text{ind}}{\sim} \textsf{Normal}(\boldsymbol{x}_{i,j}^\top \boldsymbol{\beta}_{\gamma_j}, \sigma_{\gamma_j}^2), \quad i = 1, \ldots, n_j, \, j = 1, \ldots, m, \] where:

  • \(\boldsymbol{x}_{i,j} = (x_{i,j,1}, \ldots, x_{i,j,p})^\top\) is the vector of covariates for observation \(i\) in group \(j\),
  • \(\boldsymbol{\beta}_{\gamma_j} = (\beta_{1,\gamma_j}, \ldots, \beta_{p,\gamma_j})^\top\) are the regression coefficients for cluster \(\gamma_j\),
  • \(\sigma_{\gamma_j}^2\) is the variance for cluster \(\gamma_j\),
  • \(\gamma_j\) determines the cluster membership of group \(j\).

Prior Specification:

\[ \begin{align*} \boldsymbol{\beta}_k \mid \boldsymbol{\beta}, \boldsymbol{\Sigma} &\overset{\text{i.i.d.}}{\sim} \textsf{Normal}_p(\boldsymbol{\beta}, \boldsymbol{\Sigma}), \\ \sigma_k^2 \mid \nu, \sigma^2 &\overset{\text{i.i.d.}}{\sim} \textsf{Inverse-Gamma}(\nu / 2, \nu \sigma^2 / 2), \\ \gamma_j \mid \boldsymbol{\omega} &\overset{\text{i.i.d.}}{\sim} \textsf{Categorical}(\boldsymbol{\omega}), \\ \boldsymbol{\beta} &\sim \textsf{Normal}_p(\boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0), \\ \boldsymbol{\Sigma} &\sim \textsf{Inverse-Wishart}(n_0, \mathbf{S}_0^{-1}), \\ \boldsymbol{\omega} &\sim \textsf{Dirichlet}(\alpha/K, \ldots, \alpha/K), \\ p(\nu) &\propto e^{-\lambda_0 \nu}, \quad \nu \in \{1, 2, \dots\}, \\ \sigma^2 &\sim \textsf{Gamma}(a_0, b_0), \\ \alpha &\sim \textsf{Gamma}(c_0, d_0). \end{align*} \]

Model Parameters:

The unknown parameters in the model are: \[ \Theta = (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_K, \sigma_1^2, \ldots, \sigma_K^2, \gamma_1, \ldots, \gamma_m, \boldsymbol{\beta}, \boldsymbol{\Sigma}, \boldsymbol{\omega}, \nu, \sigma^2, \alpha). \]
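
The label updates follow directly from the mixture structure: each \(\gamma_j\) is drawn from a categorical full conditional proportional to \(\omega_k\) times the Gaussian likelihood of group \(j\) under cluster \(k\). A minimal sketch (container names illustrative):

```python
import numpy as np
from scipy import stats

def update_assignments(y, X, betas, sigma2s, omega, rng):
    """Gibbs draw of the cluster labels gamma_1, ..., gamma_m in the CHLRM.

    y, X    : length-m lists with the data for each group
    betas   : (K, p) cluster-specific coefficients
    sigma2s : (K,) cluster-specific variances
    omega   : (K,) mixture weights
    """
    m, K = len(y), len(omega)
    gamma = np.empty(m, dtype=int)
    for j in range(m):
        # log p(gamma_j = k | rest) = log omega_k
        #   + sum_i log N(y_ij | x_ij' beta_k, sigma_k^2)
        logp = np.log(omega) + np.array([
            stats.norm.logpdf(y[j], loc=X[j] @ betas[k],
                              scale=np.sqrt(sigma2s[k])).sum()
            for k in range(K)
        ])
        logp -= logp.max()                       # stabilize before exponentiating
        probs = np.exp(logp) / np.exp(logp).sum()
        gamma[j] = rng.choice(K, p=probs)
    return gamma
```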

Sparse Clustering Hierarchical Normal Linear Regression Model (SCHLRM)

The Sparse Clustering Hierarchical Normal Linear Regression Model (SCHLRM) extends the CHLRM by incorporating Lasso regularization to promote sparsity in the cluster-specific regression coefficients \(\boldsymbol{\beta}_k\). This enables the identification of the most relevant predictors across clusters while maintaining group-level clustering.

To induce sparsity, a Laplace (double-exponential) prior is placed on each component of the cluster-specific regression coefficients \(\beta_{k,j}\), where \(k\) indexes the clusters, and \(j\) represents the predictors. The updated prior is:

\[ \beta_{k,j} \mid \lambda \overset{\text{i.i.d.}}{\sim} \textsf{Laplace}(0, \lambda^{-1}), \quad k = 1, \ldots, K, \, j = 1, \ldots, p. \]

To adaptively control the degree of sparsity, the regularization parameter \(\lambda\) follows a Gamma prior:

\[ \lambda \sim \textsf{Gamma}(a_{\lambda}, b_{\lambda}). \]

Ridge Clustering Hierarchical Normal Linear Regression Model (RCHLRM)

The Ridge Clustering Hierarchical Normal Linear Regression Model (RCHLRM) extends the CHLRM by incorporating Ridge regularization to control the magnitude of the cluster-specific regression coefficients \(\boldsymbol{\beta}_k\). This ensures stability and reduces overfitting while maintaining group-level clustering.

Each component of the cluster-specific regression coefficients \(\beta_{k,j}\), where \(k\) indexes the clusters and \(j\) represents the predictors, follows a Normal prior centered at zero with variance \(\tau^2\):

\[ \beta_{k,j} \mid \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2), \quad k = 1, \ldots, K, \, j = 1, \ldots, p. \]

To adaptively control the degree of shrinkage, the regularization parameter \(\tau^2\) follows an Inverse-Gamma prior:

\[ \tau^2 \sim \textsf{Inverse-Gamma}(a_{\tau}, b_{\tau}). \]

Elastic Net Clustering Hierarchical Normal Linear Regression Model (ENCHLRM)

The Elastic Net Clustering Hierarchical Normal Linear Regression Model (ENCHLRM) extends the CHLRM by incorporating Elastic Net regularization, which combines the strengths of both Lasso and Ridge regression. This approach promotes sparsity by shrinking some cluster-specific regression coefficients \(\boldsymbol{\beta}_k\) strongly toward zero (as in the Lasso), while applying continuous shrinkage (as in Ridge) to stabilize the remaining coefficients.

Each component of the cluster-specific regression coefficients \(\beta_{k,j}\), where \(k\) indexes the clusters and \(j\) represents the predictors, follows a Normal prior with variance inversely proportional to \(\lambda\), balancing sparsity and shrinkage:

\[ \beta_{k,j} \mid \lambda, \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2 / \lambda), \quad k = 1, \ldots, K, \, j = 1, \ldots, p. \]

To adaptively control the Elastic Net penalties, \(\lambda\) and \(\tau^2\) are assigned the following priors:

\[ \lambda \sim \textsf{Gamma}(a_{\lambda}, b_{\lambda}), \quad \tau^2 \sim \textsf{Inverse-Gamma}(a_{\tau}, b_{\tau}). \]

Clustering Priors for Cluster Assignments

Adapt the models using the following priors on the cluster assignments:

Multinomial-Dirichlet Prior

  • Description: Cluster assignments \(\gamma_j\) are drawn from a categorical distribution with probabilities \(\boldsymbol{\omega}\), where \(\boldsymbol{\omega}\) has a Dirichlet prior.
  • Model: \[ \gamma_j \mid \boldsymbol{\omega} \sim \textsf{Categorical}(\boldsymbol{\omega}), \quad \boldsymbol{\omega} \sim \textsf{Dirichlet}(\alpha/K, \ldots, \alpha/K). \]
  • Key features: Simple, symmetric clustering where the number of clusters \(K\) is fixed.
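
The weights are conjugate under this prior: writing \(n_k = \#\{j : \gamma_j = k\}\), the full conditional is \[ \boldsymbol{\omega} \mid \boldsymbol{\gamma} \sim \textsf{Dirichlet}(\alpha/K + n_1, \ldots, \alpha/K + n_K). \]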

Dirichlet Process (DP)

  • Description: Nonparametric prior allowing an infinite number of clusters, where cluster probabilities \(\boldsymbol{\omega}\) follow a stick-breaking process.
  • Model: \[ \gamma_j \sim \sum_{k=1}^\infty \omega_k \delta_k, \quad \omega_k = v_k \prod_{h<k}(1 - v_h), \quad v_k \sim \textsf{Beta}(1, \alpha). \]
  • Key features: Flexibility in determining the number of clusters based on the data.
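
In practice the infinite stick-breaking construction is often truncated at a level \(K_{\text{trunc}}\); below is a minimal sketch of drawing the weights under such a truncation (the function name is illustrative).

```python
import numpy as np

def stick_breaking_weights(alpha, K_trunc, rng):
    """Draw mixture weights from a DP via truncated stick-breaking.

    K_trunc is the truncation level; the final stick fraction is set
    to one so that the weights sum exactly to one.
    """
    v = rng.beta(1.0, alpha, size=K_trunc)
    v[-1] = 1.0                         # close the stick at the truncation level
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining                # omega_k = v_k * prod_{h<k} (1 - v_h)
```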

Pitman-Yor Process (PY)

  • Description: Generalization of the DP allowing power-law behavior in cluster sizes.
  • Model: \[ \gamma_j \sim \sum_{k=1}^\infty \omega_k \delta_k, \quad \omega_k = v_k \prod_{h<k}(1 - v_h), \quad v_k \sim \textsf{Beta}(1 - a, b + ka), \] where \(0 \leq a < 1\), \(b > -a\).
  • Key features: Promotes large clusters with a few small ones (heavy-tailed cluster size distribution).

Gnedin Prior

  • Description: Prior over partitions in which the number of clusters is finite but random, allowing flexibility in the cluster-size distribution.
  • Model: \[ \Pr(\gamma_j = k) \propto \begin{cases} \alpha_k, & \text{if } k \text{ is a new cluster}, \\ |C_k| - \theta, & \text{if } k \text{ is an existing cluster}, \end{cases} \] where \(\theta\) controls the discount factor, and \(\alpha_k\) governs new cluster probabilities.
  • Key features: Balances between new and existing clusters.

Markov Random Field (MRF) Prior

  • Description: Incorporates spatial or network dependence into cluster assignments using a neighborhood structure.
  • Model: \[ p(\gamma_j \mid \boldsymbol{\gamma}_{-j}) \propto \exp\left(\sum_{j' \in \text{neighbors}(j)} \eta\, \mathbb{I}(\gamma_j = \gamma_{j'})\right), \] where \(\eta > 0\) promotes similarity among neighboring groups.
  • Key features: Spatially or graph-structured clustering.

Nested Chinese Restaurant Process (nCRP)

  • Description: Hierarchical clustering where groups are nested in a tree structure with infinite branching.
  • Model: \[ \gamma_j \sim \text{nCRP}, \quad \Pr(\gamma_j) \propto \prod_{\text{nodes } n} \frac{\alpha + m_n}{\alpha + m_{\text{parent}(n)}}, \] where \(m_n\) is the number of groups assigned to node \(n\).
  • Key features: Clustering within a tree hierarchy.

Hierarchical Dirichlet Process (HDP)

  • Description: Extends the DP to model clusters within clusters (e.g., communities and super-communities).
  • Model: \[ \gamma_j \sim \textsf{Categorical}(\boldsymbol{\pi}_{\text{community}}), \quad \boldsymbol{\pi}_{\text{community}} \sim \textsf{DP}(\alpha, G_0), \quad G_0 \sim \textsf{DP}(\alpha_0, H), \] where \(\alpha_0\) is the top-level concentration parameter.
  • Key features: Flexible multi-level clustering.