The objective of this project is twofold. First, to adapt the hierarchical model from the original article by incorporating Lasso regression, Ridge regression, and Elastic Net regression into a static framework. Second, to develop dynamic adaptations of each of these models. The motivation for this work arises from the ICFES dataset, which includes grouping factors (e.g., departments), features a large number of covariates (hence the need for regularization), and is also indexed over time.
It is essential to compare the models theoretically, demonstrating why they are effective for handling a large number of covariates. This involves developing and implementing the Gibbs sampler (which may include Metropolis-Hastings steps) and comparing the models empirically in the application, using test statistics, information criteria, and cross-validation. Additionally, in the application, we can analyze the evolution of the covariate effects over time and test substantive hypotheses about how these covariates influence the test outcome.
This model is fully developed in Sosa, J., & Aristizabal, J. P. (2022). Some Developments in Bayesian Hierarchical Linear Regression Modeling. Revista Colombiana de Estadística, 45(2), 231–255.
Model Specification
\[ y_{i,j} \mid \boldsymbol{x}_{i,j}, \boldsymbol{\beta}_j, \sigma_j^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j}^\top \boldsymbol{\beta}_j, \sigma_j^2), \quad i = 1, \ldots, n_j, \quad j = 1, \ldots, m, \] where:
- \(y_{i,j}\) is the response for observation \(i\) in group \(j\),
- \(\boldsymbol{x}_{i,j} = (x_{i,j,1}, \ldots, x_{i,j,p})^\top\) is the corresponding vector of covariates,
- \(\boldsymbol{\beta}_j = (\beta_{1,j}, \ldots, \beta_{p,j})^\top\) are the group-specific regression coefficients,
- \(\sigma_j^2\) is the group-specific variance,
- \(n_j\) is the number of observations in group \(j\) and \(m\) is the number of groups.
Prior Specification
\[ \begin{align*} \boldsymbol{\beta}_j \mid \boldsymbol{\beta}, \boldsymbol{\Sigma} &\overset{\text{i.i.d.}}{\sim} \textsf{N}_p(\boldsymbol{\beta}, \boldsymbol{\Sigma}), \\ \sigma_j^2 \mid \nu, \sigma^2 &\overset{\text{i.i.d.}}{\sim} \textsf{IG}(\nu / 2, \nu \sigma^2 / 2), \\ \boldsymbol{\beta} &\sim \textsf{N}_p(\boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0), \\ \boldsymbol{\Sigma} &\sim \textsf{IW}(n_0, \mathbf{S}_0^{-1}), \\ p(\nu) &\propto e^{-\lambda_0 \nu}, \\ \sigma^2 &\sim \textsf{G}(a_0, b_0). \end{align*} \]
Model Parameters
The unknown parameters in the model are: \[ \Theta = (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_m, \sigma_1^2, \ldots, \sigma_m^2, \boldsymbol{\beta}, \boldsymbol{\Sigma}, \sigma^2, \nu). \]
Note: \(\nu\) is restricted to the set \(\{1, 2, \dots\}\).
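To make the sampling scheme concrete, below is a minimal Python/NumPy sketch of two of the conjugate full-conditional updates in the Gibbs sampler (the group-specific coefficients and variances). It is illustrative only, not taken from the article; \(X_j\) and \(y_j\) denote the stacked design matrix and response vector of group \(j\), and the function names are our own.

```python
import numpy as np

def sample_beta_j(X_j, y_j, sigma2_j, beta, Sigma_inv, rng):
    """beta_j | rest ~ N_p(m_j, V_j), with
    V_j = (Sigma^{-1} + X_j' X_j / sigma2_j)^{-1} and
    m_j = V_j (Sigma^{-1} beta + X_j' y_j / sigma2_j)."""
    prec = Sigma_inv + X_j.T @ X_j / sigma2_j
    V_j = np.linalg.inv(prec)
    m_j = V_j @ (Sigma_inv @ beta + X_j.T @ y_j / sigma2_j)
    return rng.multivariate_normal(m_j, V_j)

def sample_sigma2_j(X_j, y_j, beta_j, nu, sigma2, rng):
    """sigma2_j | rest ~ IG((nu + n_j)/2, (nu * sigma2 + SSR_j)/2),
    where SSR_j = sum_i (y_ij - x_ij' beta_j)^2."""
    n_j = y_j.shape[0]
    ssr = np.sum((y_j - X_j @ beta_j) ** 2)
    shape = (nu + n_j) / 2.0
    rate = (nu * sigma2 + ssr) / 2.0
    return 1.0 / rng.gamma(shape, 1.0 / rate)  # Gamma draw on the precision scale, then invert
```

The updates for \(\boldsymbol{\beta}\), \(\boldsymbol{\Sigma}\), and \(\sigma^2\) are conjugate as well, while \(\nu\) can be updated by evaluating its discrete full conditional on a grid of integer values or with a Metropolis-Hastings step.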
The Sparse Hierarchical Linear Regression Model (SHLRM) extends the HLRM by incorporating Lasso regularization, which promotes sparsity in the group-specific regression coefficients. Sparsity allows the model to identify and retain only the most relevant predictors, effectively reducing noise and enhancing interpretability.
Each coefficient \(\beta_{k,j}\) follows a Laplace prior: \[ \beta_{k,j} \mid \lambda \overset{\text{i.i.d.}}{\sim} \textsf{Laplace}(0, \lambda^{-1}), \quad k = 1, \ldots, p, \, j = 1, \ldots, m. \] To adaptively control the degree of sparsity, the Lasso shrinkage parameter \(\lambda\) is assigned a Gamma prior: \[ \lambda \sim \textsf{Gamma}(a_\lambda, b_\lambda). \]
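For posterior computation, note that the Laplace prior admits the usual normal scale-mixture representation (the standard Bayesian Lasso augmentation of Park and Casella, 2008), which keeps the \(\beta_{k,j}\) updates conditionally Gaussian; the latent scales \(\psi_{k,j}\) below are auxiliary quantities, not part of the original specification:
\[
\beta_{k,j} \mid \psi_{k,j} \sim \textsf{N}(0, \psi_{k,j}), \qquad \psi_{k,j} \mid \lambda \sim \textsf{Exp}(\lambda^2 / 2) \quad \Longrightarrow \quad \beta_{k,j} \mid \lambda \sim \textsf{Laplace}(0, \lambda^{-1}),
\]
with the \(\psi_{k,j}\) then updated through their inverse-Gaussian full conditionals.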
The Ridge Hierarchical Linear Regression Model (RHLRM) extends the HLRM by incorporating Ridge regularization, which shrinks group-specific regression coefficients towards zero to improve stability and prevent overfitting.
Each coefficient \(\beta_{k,j}\) follows a Normal prior with mean zero and variance controlled by a regularization parameter \(\tau^2\): \[ \beta_{k,j} \mid \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2), \quad k = 1, \ldots, p, \, j = 1, \ldots, m. \] The regularization parameter \(\tau^2\) is assigned an Inverse-Gamma prior: \[ \tau^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]
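As a side note, this prior is conditionally conjugate: combining the Inverse-Gamma prior with the \(mp\) Normal terms yields the standard full conditional (a routine derivation, not quoted from the article):
\[
\tau^2 \mid \{\beta_{k,j}\} \sim \textsf{Inverse-Gamma}\!\left(a_\tau + \frac{mp}{2},\; b_\tau + \frac{1}{2}\sum_{j=1}^{m}\sum_{k=1}^{p}\beta_{k,j}^2\right).
\]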
Unlike the Lasso extension in SHLRM, which promotes sparsity by forcing some coefficients toward zero, RHLRM applies continuous shrinkage, so all coefficients remain nonzero. This makes it well suited to settings where all predictors are expected to contribute to the response.
The Elastic Net Hierarchical Linear Regression Model (ENHLRM) extends the HLRM by combining Lasso and Ridge regularization. It promotes sparsity by forcing some coefficients to zero (like Lasso) while applying continuous shrinkage (like Ridge) to improve stability.
Each group-specific regression coefficient \(\beta_{k,j}\) follows a Normal prior: \[ \beta_{k,j} \mid \lambda, \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2 / \lambda), \quad k = 1, \ldots, p, \, j = 1, \ldots, m, \] where \(\lambda\) and \(\tau^2\) balance the Lasso and Ridge penalties.
The regularization parameters \(\lambda\) and \(\tau^2\) have the following priors: \[ \lambda \sim \textsf{Gamma}(a_{\lambda}, b_{\lambda}), \quad \tau^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]
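Under this parameterization both regularization parameters retain conjugate full conditionals; a minimal derivation (ours, for reference) gives
\[
\lambda \mid \{\beta_{k,j}\}, \tau^2 \sim \textsf{Gamma}\!\left(a_\lambda + \frac{mp}{2},\; b_\lambda + \frac{1}{2\tau^2}\sum_{j,k}\beta_{k,j}^2\right), \qquad
\tau^2 \mid \{\beta_{k,j}\}, \lambda \sim \textsf{Inverse-Gamma}\!\left(a_\tau + \frac{mp}{2},\; b_\tau + \frac{\lambda}{2}\sum_{j,k}\beta_{k,j}^2\right).
\]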
Model Specification
\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \] where:
- \(y_{i,j,t}\) is the response for observation \(i\) in group \(j\) at time \(t\),
- \(\boldsymbol{x}_{i,j,t}\) is the corresponding vector of covariates,
- \(\boldsymbol{\beta}_{j,t}\) and \(\sigma_{j,t}^2\) are the group- and time-specific regression coefficients and variance,
- the index \(i\) runs over the observations in group \(j\) at time \(t\), with \(j = 1, \ldots, m\) and \(t = 1, \ldots, T\).
Dynamic Priors
Time evolution of group-specific coefficients: \[ \boldsymbol{\beta}_{j,t} \mid \boldsymbol{\beta}_{j,t-1}, \boldsymbol{\Sigma} \sim \textsf{N}_p(\boldsymbol{\beta}_{j,t-1}, \boldsymbol{\Sigma}), \] where \(\boldsymbol{\Sigma}\) is the shared covariance matrix.
Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2). \]
Global Priors
Global regression coefficients: \[ \boldsymbol{\beta}_{t} \sim \textsf{N}_p(\boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0). \]
Covariance matrix for group-specific coefficients: \[ \boldsymbol{\Sigma} \sim \textsf{IW}(n_0, \mathbf{S}_0^{-1}). \]
Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]
Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]
Model Parameters
The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \boldsymbol{\beta}_t, \boldsymbol{\Sigma}, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
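As a sanity check on the dynamic priors, the following Python/NumPy sketch forward-simulates the random-walk evolution of \(\boldsymbol{\beta}_{j,t}\) and the Inverse-Gamma evolution of \(\sigma_{j,t}^2\). It is purely illustrative: the function name, array layout, and the treatment of the initial values at \(t = 1\) (passed in as arguments, since the specification above does not fix them) are our own assumptions.

```python
import numpy as np

def simulate_dynamic_priors(Sigma, nu, beta_1, sigma2_1, T, rng):
    """Forward-simulate beta_{j,t} and sigma2_{j,t} for t = 1, ..., T.

    beta_1   : (m, p) initial coefficients (assumed given),
    sigma2_1 : (m,)   initial variances (assumed given)."""
    m, p = beta_1.shape
    beta = np.zeros((m, T, p))
    sigma2 = np.zeros((m, T))
    beta[:, 0, :] = beta_1
    sigma2[:, 0] = sigma2_1
    for j in range(m):
        for t in range(1, T):
            # beta_{j,t} | beta_{j,t-1}, Sigma ~ N_p(beta_{j,t-1}, Sigma)
            beta[j, t] = rng.multivariate_normal(beta[j, t - 1], Sigma)
            # sigma2_{j,t} | sigma2_{j,t-1} ~ IG(nu/2, nu * sigma2_{j,t-1} / 2)
            sigma2[j, t] = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * sigma2[j, t - 1]))
    return beta, sigma2

# Example usage with arbitrary values (illustrative only)
rng = np.random.default_rng(42)
beta, sigma2 = simulate_dynamic_priors(
    Sigma=0.1 * np.eye(3), nu=5, beta_1=np.zeros((4, 3)),
    sigma2_1=np.ones(4), T=10, rng=rng)
```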
Model Specification
\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \] where the notation is as defined for the dynamic model above.
Dynamic Priors
Time evolution of group-specific coefficients: \[ \beta_{k,j,t} \mid \beta_{k,j,t-1}, \lambda_{t} \overset{\text{ind}}{\sim} \textsf{Laplace}(\beta_{k,j,t-1}, \lambda_{t}^{-1}), \] where \(\lambda_t\) is the time-varying Lasso shrinkage parameter.
Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2). \]
Global Priors
Prior for the time-varying Lasso shrinkage parameter: \[ \lambda_t \sim \textsf{Gamma}(a_\lambda, b_\lambda). \]
Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]
Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]
Model Parameters
The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \lambda_t, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
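If the Laplace evolution above is centered at the previous coefficient value (as in the Ridge and Elastic Net variants below), the time-specific shrinkage parameter keeps a conjugate update; a routine derivation (ours, for reference) gives
\[
\lambda_t \mid \{\beta_{k,j,t}, \beta_{k,j,t-1}\} \sim \textsf{Gamma}\!\left(a_\lambda + mp,\; b_\lambda + \sum_{j=1}^{m}\sum_{k=1}^{p} \bigl|\beta_{k,j,t} - \beta_{k,j,t-1}\bigr|\right),
\]
while the coefficients themselves can again be handled through the normal scale-mixture augmentation described for the static SHLRM.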
Model Specification
\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \] where the notation is as defined for the dynamic model above.
Dynamic Priors
Time evolution of group-specific coefficients: \[ \boldsymbol{\beta}_{j,t} \mid \boldsymbol{\beta}_{j,t-1}, \tau_t^2 \sim \textsf{N}_p(\boldsymbol{\beta}_{j,t-1}, \tau_t^2 \mathbf{I}), \] where \(\tau_t^2\) is the time-varying Ridge regularization parameter.
Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2). \]
Global Priors
Prior for the time-varying Ridge regularization parameter: \[ \tau_t^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]
Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]
Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]
Model Parameters
The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \tau_t^2, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
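Analogously to the static Ridge case, the time-specific regularization parameter has a conjugate full conditional (again a routine derivation, not quoted from the proposal):
\[
\tau_t^2 \mid \{\boldsymbol{\beta}_{j,t}, \boldsymbol{\beta}_{j,t-1}\} \sim \textsf{Inverse-Gamma}\!\left(a_\tau + \frac{mp}{2},\; b_\tau + \frac{1}{2}\sum_{j=1}^{m} \bigl\lVert \boldsymbol{\beta}_{j,t} - \boldsymbol{\beta}_{j,t-1} \bigr\rVert^2\right).
\]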
Model Specification
\[ y_{i,j,t} \mid \boldsymbol{x}_{i,j,t}, \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2 \overset{\text{ind}}{\sim} \textsf{N}(\boldsymbol{x}_{i,j,t}^\top \boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2), \] where the notation is as defined for the dynamic model above.
Dynamic Priors
Time evolution of group-specific coefficients: \[ \boldsymbol{\beta}_{j,t} \mid \boldsymbol{\beta}_{j,t-1}, \lambda_t, \tau_t^2 \sim \textsf{N}_p\big(\boldsymbol{\beta}_{j,t-1}, (\tau_t^2 / \lambda_t) \, \mathbf{I}\big), \] where \(\lambda_t\) controls the Lasso penalty (promoting sparsity) and \(\tau_t^2\) controls the Ridge penalty (promoting shrinkage).
Time evolution of group-specific variances: \[ \sigma_{j,t}^2 \mid \sigma_{j,t-1}^2, \nu, \sigma^2 \sim \textsf{IG}(\nu / 2, \nu \sigma_{j,t-1}^2 / 2). \]
Global Priors
Priors for the time-varying Lasso and Ridge regularization parameters: \[ \lambda_t \sim \textsf{Gamma}(a_\lambda, b_\lambda), \quad \tau_t^2 \sim \textsf{Inverse-Gamma}(a_\tau, b_\tau). \]
Global variance parameter: \[ \sigma^2 \sim \textsf{G}(a_0, b_0). \]
Degrees of freedom for variance dynamics: \[ p(\nu) \propto e^{-\lambda_0 \nu}. \]
The unknown parameters in the model are: \[ \Theta = \{\boldsymbol{\beta}_{j,t}, \sigma_{j,t}^2, \lambda_t, \tau_t^2, \sigma^2, \nu \mid j = 1, \ldots, m, t = 1, \ldots, T\}. \]
Once again, the goal of this project is twofold. First, to extend the hierarchical model with clustering factors from the original article by incorporating Lasso regression, Ridge regression, and Elastic Net regression into a static framework. Second, to explore different priors for the clustering process.
The application to the ICFES dataset remains highly interesting, as it allows us to test substantive hypotheses about a large number of covariates while clustering the levels of the grouping factor (e.g., departments). Unlike the previous project, here we would focus on a single cohort (e.g., the most recent one). As in the other case, it is essential to compare the models theoretically, demonstrating why they are effective for handling a large number of covariates. This includes developing and implementing the Gibbs sampler (which may include Metropolis-Hastings steps) and comparing the models empirically in the application, using test statistics, information criteria, and cross-validation. Additionally, in the application, we can assess the covariate effects and test substantive hypotheses about how these covariates influence the test outcomes.
Model Specification:
The Clustering Hierarchical Normal Linear Regression Model (CHLRM) extends the HLRM by introducing group-level clustering. Groups are assigned to clusters, and within each cluster, regression coefficients and variances share common values. Let \(\gamma_j\) be the cluster assignment for group \(j\), where \(\gamma_j \in \{1, \ldots, K\}\) and \(K\) is the number of clusters. The model is specified as: \[ y_{i,j} \mid \boldsymbol{x}_{i,j}, \boldsymbol{\beta}_{\gamma_j}, \sigma_{\gamma_j}^2 \overset{\text{ind}}{\sim} \textsf{Normal}(\boldsymbol{x}_{i,j}^\top \boldsymbol{\beta}_{\gamma_j}, \sigma_{\gamma_j}^2), \quad i = 1, \ldots, n_j, \, j = 1, \ldots, m, \] where:
- \(\boldsymbol{x}_{i,j} = (x_{i,j,1}, \ldots, x_{i,j,p})^\top\) is the vector of covariates for observation \(i\) in group \(j\),
- \(\boldsymbol{\beta}_{\gamma_j} = (\beta_{1,\gamma_j}, \ldots, \beta_{p,\gamma_j})^\top\) are the regression coefficients for cluster \(\gamma_j\),
- \(\sigma_{\gamma_j}^2\) is the variance for cluster \(\gamma_j\),
- \(\gamma_j\) determines the cluster membership of group \(j\).
Prior Specification:
\[ \begin{align*} \boldsymbol{\beta}_k \mid \boldsymbol{\beta}, \boldsymbol{\Sigma} &\overset{\text{i.i.d.}}{\sim} \textsf{Normal}_p(\boldsymbol{\beta}, \boldsymbol{\Sigma}), \\ \sigma_k^2 \mid \nu, \sigma^2 &\overset{\text{i.i.d.}}{\sim} \textsf{Inverse-Gamma}(\nu / 2, \nu \sigma^2 / 2), \\ \gamma_j \mid \boldsymbol{\omega} &\overset{\text{i.i.d.}}{\sim} \textsf{Categorical}(\boldsymbol{\omega}), \\ \boldsymbol{\beta} &\sim \textsf{Normal}_p(\boldsymbol{\mu}_0, \boldsymbol{\Lambda}_0), \\ \boldsymbol{\Sigma} &\sim \textsf{Inverse-Wishart}(n_0, \mathbf{S}_0^{-1}), \\ \boldsymbol{\omega} &\sim \textsf{Dirichlet}(\alpha/K, \ldots, \alpha/K), \\ p(\nu) &\propto e^{-\lambda_0 \nu}, \quad \nu \in \{1, 2, \dots\}, \\ \sigma^2 &\sim \textsf{Gamma}(a_0, b_0), \\ \alpha &\sim \textsf{Gamma}(c_0, d_0). \end{align*} \]
Model Parameters:
The unknown parameters in the model are: \[ \Theta = (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_K, \sigma_1^2, \ldots, \sigma_K^2, \gamma_1, \ldots, \gamma_m, \boldsymbol{\beta}, \boldsymbol{\Sigma}, \boldsymbol{\omega}, \nu, \sigma^2, \alpha). \]
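To illustrate the clustering part of the Gibbs sampler, here is a minimal Python/NumPy sketch of one update of the cluster labels \(\gamma_j\) and weights \(\boldsymbol{\omega}\), holding the cluster-specific coefficients and variances fixed at their current values. The function name and data layout are illustrative assumptions, not taken from the proposal.

```python
import numpy as np
from scipy.stats import norm

def sample_labels_and_weights(y, X, beta, sigma2, omega, alpha, rng):
    """One Gibbs update of gamma_1, ..., gamma_m and omega.

    y, X   : length-m lists with each group's response vector and design matrix,
    beta   : (K, p) current cluster-specific coefficients,
    sigma2 : (K,)   current cluster-specific variances,
    omega  : (K,)   current cluster weights,
    alpha  : Dirichlet concentration (split as alpha/K per cluster)."""
    m, K = len(y), len(omega)
    gamma = np.zeros(m, dtype=int)
    for j in range(m):
        # log p(gamma_j = k | rest) is proportional to
        # log omega_k + sum_i log N(y_ij | x_ij' beta_k, sigma2_k)
        logp = np.log(omega) + np.array([
            norm.logpdf(y[j], loc=X[j] @ beta[k], scale=np.sqrt(sigma2[k])).sum()
            for k in range(K)
        ])
        probs = np.exp(logp - logp.max())
        gamma[j] = rng.choice(K, p=probs / probs.sum())
    # omega | gamma ~ Dirichlet(alpha/K + n_1, ..., alpha/K + n_K)
    counts = np.bincount(gamma, minlength=K)
    omega_new = rng.dirichlet(alpha / K + counts)
    return gamma, omega_new
```

The remaining updates mirror the HLRM case (with cluster-specific coefficients and variances in place of group-specific ones), and the concentration parameter \(\alpha\) can be updated with a Metropolis-Hastings step.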
The Sparse Clustering Hierarchical Normal Linear Regression Model (SCHLRM) extends the CHLRM by incorporating Lasso regularization to promote sparsity in the cluster-specific regression coefficients \(\boldsymbol{\beta}_k\). This enables the identification of the most relevant predictors across clusters while maintaining group-level clustering.
To induce sparsity, a Laplace (double-exponential) prior is placed on each component of the cluster-specific regression coefficients \(\beta_{k,j}\), where \(k\) indexes the clusters, and \(j\) represents the predictors. The updated prior is:
\[ \beta_{k,j} \mid \lambda \overset{\text{i.i.d.}}{\sim} \textsf{Laplace}(0, \lambda^{-1}), \quad k = 1, \ldots, K, \, j = 1, \ldots, p. \]
To adaptively control the degree of sparsity, the regularization parameter \(\lambda\) follows a Gamma prior:
\[ \lambda \sim \textsf{Gamma}(a_{\lambda}, b_{\lambda}). \]
The Ridge Clustering Hierarchical Normal Linear Regression Model (RCHLRM) extends the CHLRM by incorporating Ridge regularization to control the magnitude of the cluster-specific regression coefficients \(\boldsymbol{\beta}_k\). This ensures stability and reduces overfitting while maintaining group-level clustering.
Each component of the cluster-specific regression coefficients \(\beta_{k,j}\), where \(k\) indexes the clusters and \(j\) represents the predictors, follows a Normal prior centered at zero with variance \(\tau^2\):
\[ \beta_{k,j} \mid \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2), \quad k = 1, \ldots, K, \, j = 1, \ldots, p. \]
To adaptively control the degree of shrinkage, the regularization parameter \(\tau^2\) follows an Inverse-Gamma prior:
\[ \tau^2 \sim \textsf{Inverse-Gamma}(a_{\tau}, b_{\tau}). \]
The Elastic Net Clustering Hierarchical Normal Linear Regression Model (ENCHLRM) extends the CHLRM by incorporating Elastic Net regularization, which combines the strengths of both Lasso and Ridge regression. This approach promotes sparsity by forcing some cluster-specific regression coefficients \(\boldsymbol{\beta}_k\) to zero (like Lasso), while applying continuous shrinkage (like Ridge) to stabilize the remaining coefficients.
Each component of the cluster-specific regression coefficients \(\beta_{k,j}\), where \(k\) indexes the clusters and \(j\) represents the predictors, follows a Normal prior with variance inversely proportional to \(\lambda\), balancing sparsity and shrinkage:
\[ \beta_{k,j} \mid \lambda, \tau^2 \overset{\text{i.i.d.}}{\sim} \textsf{Normal}(0, \tau^2 / \lambda), \quad k = 1, \ldots, K, \, j = 1, \ldots, p. \]
To adaptively control the Elastic Net penalties, \(\lambda\) and \(\tau^2\) are assigned the following priors:
\[ \lambda \sim \textsf{Gamma}(a_{\lambda}, b_{\lambda}), \quad \tau^2 \sim \textsf{Inverse-Gamma}(a_{\tau}, b_{\tau}). \]
Adapt the models considering the following clustering priors: