This demo explores the integration of overcomplete Independent Component Analysis (ICA) and Structural Equation Modeling (SEM) using the MCMSEM package in R. Traditional methods like PCA, ICA, and SEM typically have to assume there are fewer (or at most equally many) latent variables than observed variables, but overcomplete factor models allow for more latent factors than observed variables, offering richer insights into complex data. By bridging ICA’s focus on statistical independence with SEM’s flexible latent modeling, we introduce an approach to fit overcomplete SEM models, and demonstrate its capabilities through simulated data.
Principal Component Analysis (PCA)
PCA is a method used to reduce the dimensionality of data by transforming the original variables into a new set of uncorrelated variables, called principal components. These components are linear combinations of the original variables and are ordered by the amount of variance they explain. PCA is often used for data exploration and to simplify large data sets, but it assumes that the components are orthogonal and doesn’t provide insight into the latent structure of the data. The components also tend to be normally distributed and are not sparse; the components absolutely maximize the variance explained and in doing so mix dependent signals.
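To make this concrete, here is a tiny PCA example in base R. The data are made up purely for illustration and are not part of this demo.

# Illustrative PCA with base R (toy data, not used elsewhere in this demo)
set.seed(1)
X <- matrix(rnorm(1000 * 4), 1000, 4)        # four observed variables
X[, 2] <- X[, 1] + rnorm(1000, sd = 0.5)     # make two of them correlated
pca <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pca)       # variance explained per component, in decreasing order
pca$rotation       # loadings: dense linear combinations of all four variables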
Independent Component Analysis (ICA)
ICA, unlike PCA, seeks to find statistically independent rather than uncorrelated components. This makes ICA particularly suited to situations where the observed data are mixtures of non-Gaussian signals, such as in neuroscientific data (e.g., fMRI or EEG). A classic example used to illustrate the unique capabilities of ICA is the cocktail party problem, where multiple people are speaking simultaneously in a room, and a set of microphones records the mixed sounds. Each microphone captures a combination of all voices, but ICA can separate these mixed signals into the original, independent sources—each person’s voice. This ability to separate independent signals from a mixture makes ICA ideal for applications involving complex, overlapping signals, like neural data or behavioral processes. ICA decomposes data into independent sources, allowing for a more nuanced understanding of underlying factors in highly complex systems.
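To make the cocktail party idea concrete, here is a small toy example using the fastICA package. Neither the package nor the signals are part of this demo; the two “voices” are simulated non-Gaussian sources.

# Toy cocktail party with fastICA (illustrative only)
# install.packages("fastICA")   # if not already installed
library(fastICA)
set.seed(2)
n  <- 5000
s1 <- sin(seq(0, 50, length.out = n))      # "voice" 1: a periodic signal
s2 <- runif(n, -1, 1)                      # "voice" 2: a uniform (non-Gaussian) signal
S  <- cbind(s1, s2)
M  <- matrix(c(0.6, 0.4, 0.3, 0.7), 2, 2)  # the "room": how voices mix at the microphones
X  <- S %*% M                              # what the two microphones record
ica <- fastICA(X, n.comp = 2)
cor(ica$S, S)   # recovered sources match the originals up to sign and ordering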
Factor Analysis (FA) and Structural Equation Modeling (SEM)
Factor Analysis (FA) is a statistical method used to model the underlying structure in a set of observed variables by assuming that they are influenced by a smaller number of unobservable factors. FA focuses on identifying latent variables that explain the covariances between observed variables. Structural Equation Modeling (SEM) extends FA by incorporating both measurement models (like FA) and structural models, which specify the causal relationships between latent and observed variables. SEM allows for more complex models and hypothesis testing but often relies on linear relationships and assumes normality in the data.
The key difference between PCA/ICA on the one hand and SEM on the other is that in SEM the researcher can shape and constrain which parameters are, and are not, estimated. In this MCMSEM tutorial we’ll bring some of the advantages of ICA and SEM into one tool, and we’ll introduce a very cool extension of ICA, which allows for more latent than observed variables, into SEM using MCMSEM.
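As a quick illustration of what shaping and constraining parameters looks like in practice, here is a small lavaan example. lavaan is not used in the rest of this demo, the model is purely illustrative, and it uses the HolzingerSwineford1939 data set that ships with lavaan.

# Illustrative SEM in lavaan: the researcher decides which parameters are (not) free
library(lavaan)
model <- '
  visual  =~ x1 + a*x2 + a*x3   # loadings of x2 and x3 constrained to be equal
  textual =~ x4 + x5 + x6       # textual loads only on x4-x6, not on x1-x3
  visual ~~ textual             # factor covariance is freely estimated
'
fit <- sem(model, data = HolzingerSwineford1939)
summary(fit, standardized = TRUE)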
Over-complete Factor Models
One common assumption in traditional methods like PCA, FA, SEM, and ICA is that the number of latent variables (or factors) is smaller than the number of observed variables. This constraint allows these models to simplify and reduce dimensionality effectively. However, in many real-world data sets—especially in fields like neuroscience and psychology—the underlying latent structure may be more complex, involving more latent variables than observed measurements. For example, brain activity captured by fMRI or EEG may involve numerous independent neural processes, yet we only have access to a limited number of sensors or voxel measurements. Similarly, in genome-wide association studies (GWAS), the genetic effects on a small number of correlated traits may reflect hundreds of unobserved biological pathways and systems.
To address this, researchers have been exploring overcomplete factor models, particularly in the context of ICA, where the number of latent variables exceeds the number of observed variables. This overcomplete approach allows for richer and more flexible models that can capture more complex underlying dynamics, even when the available observed data is limited. In overcomplete solutions, the challenge shifts from merely reducing dimensionality to discovering a larger set of latent variables that can explain intricate and sparse patterns in the data.
It turns out MCMSEM is capable of fitting overcomplete factor models, like the one below, much like modern extensions of ICA do, by modeling higher-order moments.
Figure 1: Over-complete factor solution in which 8 observed variables load on 10 latent variables.
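Why can higher-order moments identify more latent variables than there are observed variables? A rough intuition, sketched below for our setting of 8 observed variables, is that the data contain far more unique co-moments than just the covariances, so there are many more equations available to pin down extra parameters.

# Back-of-the-envelope: the number of unique co-moments of order k for p variables
# is choose(p + k - 1, k) (combinations with repetition)
p <- 8
c(second = choose(p + 1, 2),   # (co)variances: 36
  third  = choose(p + 2, 3),   # co-skewnesses: 120
  fourth = choose(p + 3, 4))   # co-kurtoses: 330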
We won’t go into the details of other overcomplete models, but I don’t want to pretend we have developed the only one; below is more reading on over-complete ICA, which is widely applied in machine learning and causal learning circles.
Note
Over-complete ICA is dauntingly technical for a psychologist; I really struggled through these papers. Nevertheless, they reflect a field actively working on the overcomplete problem.
Pati, R., Pujari, A. K., Gahan, P., & Kumar, V. (2021). Independent component analysis: a review with emphasis on commonly used algorithms and contrast function. Computación y Sistemas, 25(1), 97-115.
General technical review of ICA techniques.
Ding, C., Gong, M., Zhang, K., & Tao, D. (2019). Likelihood-free overcomplete ICA and applications in causal discovery. Advances in Neural Information Processing Systems, 32.
This paper discusses the development of a Likelihood-Free Overcomplete ICA (LFOICA) algorithm that improves causal discovery by avoiding strong parametric assumptions and computationally expensive methods, like Expectation Maximization (EM). The new approach enhances the practicality of causal discovery tasks by estimating the mixing matrix directly via back-propagation, demonstrating its effectiveness on both synthetic and real data.
Podosinnikova, A., Perry, A., Wein, A., Bach, F., d’Aspremont, A., & Sontag, D. (2019). Overcomplete independent component analysis via SDP. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS).
This paper introduces a new overcomplete ICA algorithm that handles cases where the number of latent sources exceeds the number of observed variables, avoiding assumptions about sparsity and offering computational efficiency. The algorithm uses Hessians of the cumulant generating function and a novel semi-definite programming (SDP) relaxation, demonstrating strong theoretical properties and practical performance on synthetic data and the CIFAR-10 dataset.
Empirical illustration
In the following we’ll simulate 10 latent variables, which affect 8 observed variables, and then we’ll estimate that model without pre-specifying it. Now, this is not an academic article, so we take some liberties for expedience. First, we don’t let the optimizer run for hours but stop slightly early. MCMSEM is built on torch, which is developed for machine learning. Unlike in SEM, in machine learning models the weights (parameters) aren’t the quantity of interest, and so MCMSEM, like torch, doesn’t optimize until the model converges but for as long as you let it train; you, the user, have to make sure you trust the minimum/result.
Installation of Required Packages
Before starting the analysis, we need to install and load the necessary packages. The MCMSEM package is not available on CRAN, so we will install it directly from GitHub using the devtools package.
# Install devtools if not already installed
if (!require(devtools)) {
  install.packages("devtools")
  library(devtools)
}
Loading required package: devtools
Loading required package: usethis
# Install MCMSEM from GitHub
if (!require(MCMSEM)) {
  install_github("https://github.com/MichelNivard/MCMSEM")
  library(MCMSEM)
}
To ensure reproducibility, we set a seed. We then generate a source matrix with skewed variables and introduce sparseness into these variables, which increases their kurtosis.
# Set seed for reproducibility
set.seed(123)

# Define the number of observations and variables
n <- 30000
source <- matrix(NA, n, 10)

# Generate skewed variables
for(i in 1:10){
  source[, i] <- ((3 %% 2) - 0.5) * -2 * rgamma(n, shape = i, rate = 2 + i/5)
}

# Introduce sparseness into the variables
for(i in 1:10){
  source[sample(1:n, round(runif(1, 1000, 18000))), i] <- 0
}
Specification of the Loading Matrix
We create a tractable loading matrix that maps 10 latent variables to 8 observed variables. This matrix defines the relationships between latent and observed variables.
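The loading matrix itself is not shown in this rendered page. Below is a hypothetical sketch of what an 8 x 10 loading matrix A with the right shape could look like; its values are made up, so running it will not reproduce the exact estimates shown further down.

# HYPOTHETICAL 8 x 10 loading matrix (8 observed variables in rows, 10 latents in columns)
A <- matrix(0, nrow = 8, ncol = 10)
for (j in 1:10) {
  # each latent variable loads on a small, partly overlapping set of observed variables
  rows <- ((j - 1) %% 8) + c(1, 2)
  rows[rows > 8] <- rows[rows > 8] - 8
  A[rows, j] <- runif(2, 0.4, 0.8)
}
A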
We scale the source (latent) variables and generate the observed variables by multiplying the loading matrix with the source matrix.
# Scale the source/latent variables
source <- scale(source)

# Generate the observed variables
observed <- t(A %*% t(source))
Correlation Plot of Observed Variables
To visualize the relationships between observed variables, we create a correlation plot ordered by hierarchical clustering.
# Load corrplot for correlation plots
library(corrplot)

# Plot the correlation matrix
corrplot(cor(observed), order = "hclust")
Preparing Moments for MCMSEM
We prepare the summary statistics required for the MCMSEM analysis using the observed data.
# Prepare the moments for MCMSEM
simmdatasumm <- MCMdatasummary(as.data.frame(observed), scale_data = FALSE)
Specifying the MCMSEM 10-Factor Model
We specify an overcomplete model with 10 latent factors. This model allows for skewness and kurtosis in the latent variables but not in the observed variables. For details on setting up MCMSEM models see the GitHub Wiki.
We assume (among other things!) the following in this model:
The latent variables are skewed and/or have kurtosis
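The model-creation call that produced the warning below is not shown in this rendered page. Judging from that warning and from the competing-model call further down, it was presumably something along these lines:

# Presumed specification of the overcomplete model (reconstructed, see note above)
overcomplete.model <- MCMmodel(
  simmdatasumm,
  n_latent = 10,            # more latent factors than the 8 observed variables
  causal_observed = FALSE,  # no causal paths between the observed variables
  constrained_a = FALSE,    # loadings are estimated freely
  skew_latent = TRUE,       # latent variables may be skewed
  kurt_latent = TRUE        # latent variables may have (excess) kurtosis
)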
Warning in MCMmodel(simmdatasumm, n_latent = 10, causal_observed = FALSE, :
It's unlikely you want to use use more latent factors than phenotypes present
in the data, unless you know what you are doing consider revising....
Setting Model Constraints
We impose constraints on the model parameters, such as setting residual variances to zero for observed variables and fixing latent variances to one. We have to fix the latent variances for scaling purposes.
# No residual variances for the observed variables
coords_S <- list(rep(11:18), rep(11:18))
overcomplete.model <- MCMedit(overcomplete.model, pointer = "S", name = coords_S, value = 0)

# No kurtosis for the observed variables
for(i in 1:10){
  overcomplete.model <- MCMedit(overcomplete.model, pointer = "K", name = paste0("k", i), value = 0)
}

# Fix the variances of the latent variables to 1
coords_var <- list(c(1:10), c(1:10))
overcomplete.model <- MCMedit(overcomplete.model, pointer = "S", name = coords_var, value = 1)
Specifying Loadings and Parameter Boundaries
We specify a full set of loadings from the 10 latent variables to the 8 observed variables and adjust parameter boundaries to ensure model stability.
# Specify loadings
coords_A <- list(rep(11:18, 10), rep(1:10, each = 8))
new_names <- paste0("a", 1:80)
overcomplete.model <- MCMedit(overcomplete.model, pointer = "A", name = coords_A, value = new_names)

# Change boundaries on some parameters
overcomplete.model <- MCMedit(overcomplete.model, pointer = "bound", name = "a", value = c(-0.05, 1.2))
overcomplete.model <- MCMedit(overcomplete.model, pointer = "bound", name = "k", value = c(0, 25))
Setting Starting Values for Parameters
Proper starting values can aid in the convergence of the optimization algorithm. We set starting values for the loadings and skewness parameters.
# Change the starting values on some parameters
overcomplete.model <- MCMedit(overcomplete.model, pointer = "start", name = "a", value = 0)
for(i in 1:10){
  overcomplete.model <- MCMedit(overcomplete.model, pointer = "start", name = paste0("kl", i), value = runif(1, 3, 5))
  overcomplete.model <- MCMedit(overcomplete.model, pointer = "start", name = paste0("skl", i), value = runif(1, -1, 1))
}
Fitting the MCMSEM Model
We fit the specified overcomplete model to the prepared data summary. We use three optimizer runs, though really only two optimizers ("rprop" and "lbfgs"): "rprop" is run twice, each time with a different learning rate.
# Fit the overcomplete model
fitted.model <- MCMfit(
  overcomplete.model, simmdatasumm,
  optimizers = c("rprop", "rprop", "lbfgs"),
  optim_iters = c(300, 300, 50),
  learning_rate = c(0.005, 0.01, 1),
  monitor_grads = TRUE,
  compute_se = TRUE,
  loss_type = 'smooth_l1',
  debug = FALSE
)

# Summarize the fitted model
sum.model <- summary(fitted.model)
sum.model
|--------------------------------------|
| MCM Result Summary (MCMSEM v0.26.1) |
|--------------------------------------|
device : cpu
N phenotypes : 8
N latents : 10
N observations : 30000
N parameters : 100
SE type : asymptotic
Fit statistics
loss : 0.0176038835197687
chisq : 528.116505593061
BIC : 1559.01177165749
Parameters summary
label lhs edge rhs est se p
1 a1 f1 =~ V1 0.1678715944 0.048147809 4.892236e-04
2 a2 f1 =~ V2 -0.0609780699 0.025913790 1.861699e-02
3 a3 f1 =~ V3 0.2696545124 0.041094221 5.314082e-11
4 a4 f1 =~ V4 0.7329896092 0.024108918 4.974757e-203
5 a5 f1 =~ V5 -0.0604888424 0.033842031 7.387479e-02
6 a6 f1 =~ V6 -0.0007124003 0.036803890 9.845566e-01
7 a7 f1 =~ V7 0.7083864808 0.032205720 3.167973e-107
8 a8 f1 =~ V8 -0.0306280944 0.024219824 2.060191e-01
9 a9 f2 =~ V1 -0.0189371482 0.023982558 4.297487e-01
10 a10 f2 =~ V2 0.6021347642 0.009226346 0.000000e+00
11 a11 f2 =~ V3 0.6057526469 0.021693107 1.374251e-171
12 a12 f2 =~ V4 0.0069870590 0.013993848 6.175715e-01
13 a13 f2 =~ V5 0.5530548692 0.019606708 4.740561e-175
14 a14 f2 =~ V6 0.6095092297 0.020448277 3.136002e-195
15 a15 f2 =~ V7 -0.0117967073 0.017373675 4.971385e-01
last_gradient
1 -0.003036123001948
2 -0.00117389671504498
3 -0.00572217209264636
4 0.000284313224256039
5 0.00296841375529766
6 -0.00346106383949518
7 0.000492800027132034
8 -0.000649686902761459
9 -0.00551279308274388
10 -0.0020515862852335
11 -0.0136406663805246
12 -0.00300788786262274
13 0.000595672056078911
14 -0.00886264815926552
15 0.00198831246234477
... Print capped at 15 rows, use: as.data.frame([summary_object], estimates='parameters')
Variances summary
label lhs edge rhs est se p last_gradient
1 NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA
Skewness summary
label edge v1 v2 v3 est se p
1 skl1 ~~~ f1 f1 f1 -0.2462558 0.04887607 4.695090e-07
2 skl2 ~~~ f2 f2 f2 -0.9796571 0.04661056 4.497081e-98
3 skl3 ~~~ f3 f3 f3 -4.8307133 1.34476101 3.278393e-04
4 skl4 ~~~ f4 f4 f4 -0.2076529 0.03594272 7.589632e-09
5 skl5 ~~~ f5 f5 f5 -5.0106988 1.97902632 1.134460e-02
6 skl6 ~~~ f6 f6 f6 -0.3221173 0.03509216 4.343741e-20
7 skl7 ~~~ f7 f7 f7 -0.7627797 0.05984439 3.279362e-37
8 skl8 ~~~ f8 f8 f8 -1.4404614 0.61679763 1.952319e-02
9 skl9 ~~~ f9 f9 f9 -0.4237339 0.06323311 2.067995e-11
10 skl10 ~~~ f10 f10 f10 -0.5848176 0.03961539 2.560554e-49
last_gradient
1 5.94980701862369e-05
2 0.000510916637722403
3 -5.67671986573259e-06
4 0.000450620835181326
5 -0.000949684996157885
6 0.000926083652302623
7 -8.42159261082998e-06
8 0.000790745951235294
9 0.00033377727959305
10 -8.83228203747422e-05
Kurtosis summary
label edge v1 v2 v3 v4 est se p
1 kl1 ~~~~ f1 f1 f1 f1 2.739680 0.16583231 2.600019e-61
2 kl2 ~~~~ f2 f2 f2 f2 4.632741 0.16553795 2.407235e-172
3 kl3 ~~~~ f3 f3 f3 f3 17.702730 5.79674292 2.258800e-03
4 kl4 ~~~~ f4 f4 f4 f4 1.945173 0.17690448 4.011779e-28
5 kl5 ~~~~ f5 f5 f5 f5 32.226040 17.48164177 6.526745e-02
6 kl6 ~~~~ f6 f6 f6 f6 2.754783 0.05823371 0.000000e+00
7 kl7 ~~~~ f7 f7 f7 f7 2.879166 0.14085579 7.294934e-93
8 kl8 ~~~~ f8 f8 f8 f8 9.958513 3.97755241 1.229096e-02
9 kl9 ~~~~ f9 f9 f9 f9 3.463953 0.29559177 1.022226e-31
10 kl10 ~~~~ f10 f10 f10 f10 3.867421 0.12334407 8.384988e-216
last_gradient
1 -0.000820127606857568
2 -0.000409843632951379
3 -7.44726057746448e-05
4 -0.000462166499346495
5 0.000632951210718602
6 -9.94364672806114e-05
7 8.649976371089e-05
8 -1.10996561488719e-05
9 -0.00149082741700113
10 -0.000166902114870027
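Because, as noted earlier, MCMSEM stops when the iteration budget runs out rather than at formal convergence, it is worth checking that the final gradients are all close to zero. The print above suggests extracting the full parameter table with as.data.frame(); assuming that table contains the last_gradient column shown in the print, a quick check could look like this:

# Sanity check on convergence: largest absolute final gradient across all parameters
pars <- as.data.frame(sum.model, estimates = 'parameters')
max(abs(as.numeric(pars$last_gradient)))   # should be close to zero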
We can also plot the model.
# Plot the fitted model with a threshold
plot(fitted.model, threshold = 0.1)
And finally, we can compare the significant loadings of the fitted model to the simulated model. It’s not perfect because of random fluctuation in the simulated data, but it’s close!
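The comparison itself is not reproduced here, but a sketch of one way to do it is shown below. It assumes the parameter table has the lhs, edge, rhs and est columns shown in the printed summary above, and keep in mind that factors are only recovered up to sign and permutation, so the estimated columns may need reordering before they line up with the simulated ones.

# Sketch: rebuild the estimated loading matrix and compare it to the simulated A
pars <- as.data.frame(sum.model, estimates = 'parameters')
loadings <- subset(pars, edge == "=~")
A_hat <- matrix(0, nrow = 8, ncol = 10)
A_hat[cbind(as.integer(sub("V", "", loadings$rhs)),
            as.integer(sub("f", "", loadings$lhs)))] <- as.numeric(loadings$est)
A_hat[abs(A_hat) < 0.1] <- 0             # same threshold as the plot above
par(mfrow = c(1, 2))
corrplot(A,     is.corr = FALSE, mar = c(0, 0, 1, 0), title = "Simulated")
corrplot(A_hat, is.corr = FALSE, mar = c(0, 0, 1, 0), title = "Estimated")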
MCMSEM can go further though: we can specify other overcomplete models and compare them! To evaluate the robustness of our overcomplete model, we specify and fit a competing model with fewer latent factors. This model has 4 latent variables but ALSO lets the observed variables influence each other causally. It’s obviously not the true model, and so we expect it to do worse than the true model.
# Specify a competing model with 4 latent factors
competing_model <- MCMmodel(
  simmdatasumm,
  n_latent = 4,
  constrained_a = FALSE,
  skew_latent = TRUE,
  kurt_latent = TRUE
)

# Fit the competing model
fitted.competing_model <- MCMfit(
  competing_model, simmdatasumm,
  optimizers = c("rprop", "rprop", "lbfgs"),
  optim_iters = c(300, 300, 50),
  learning_rate = c(0.005, 0.01, 1),
  monitor_grads = TRUE,
  compute_se = TRUE,
  loss_type = 'smooth_l1',
  debug = FALSE
)
Warning in MCMfit(competing_model, simmdatasumm, optimizers = c("rprop", : Use
of a dataframe with more than 2 columns is still experimental.
# Summarize the competing model
sum.competing_model <- summary(fitted.competing_model)
sum.competing_model
To check the robustness of our analysis, we repeat the data generation process with a different seed. That gives us new data, on which we can evaluate the two competing models that were trained on the original data.
# Set a different seed for reproducibility
set.seed(456)
n <- 30000

# Generate new source data
source_new <- matrix(NA, n, 10)

# Generate skewed variables
for(i in 1:10){
  source_new[, i] <- ((3 %% 2) - 0.5) * -2 * rgamma(n, shape = i, rate = 2 + i/5)
}

# Introduce sparseness into the new variables
for(i in 1:10){
  source_new[sample(1:n, round(runif(1, 1000, 18000))), i] <- 0
}

# Scale the new source/latent variables
source_new <- scale(source_new)

# Generate new observed variables
observed_new <- t(A %*% t(source_new))

# Plot the correlation matrix for new observed variables
corrplot(cor(observed_new), order = "hclust")
# Prepare the moments for MCMSEM with new data
simmdatasumm_new <- MCMdatasummary(as.data.frame(observed_new), scale_data = FALSE)

# Compare the loss of the previously fitted models with the new data
MCMcompareloss(results = list(fitted.model, fitted.competing_model),
               test_data = simmdatasumm_new)
If you look at the comparison you’ll see the true overcomplete model did better, and so using MCMSEM researchers can test and compare competing overcomplete models. In future simulations I’ll try to map exactly how far we can push overcomplete models, but the previous work I listed above suggests you can have up to p^2/4 latent variables, where p is the number of observed variables. So with 10, 12, or 16 observed variables you could potentially (and theoretically) estimate 25, 36, or 64 latent variables. Currently MCMSEM cannot do so without running into memory issues and modifying estimation considerably, unless you have access to the kind of GPUs OpenAI uses…
Conclusion
In this analysis, we successfully generated skewed and sparse data, specified and fitted an overcomplete MCMSEM model, and compared it with a competing model. The reproducibility checks with a different seed confirmed the robustness of our findings. The MCMSEM package provides a powerful framework for complex structural equation modeling, accommodating skewness and kurtosis in latent variables.
Pati, R., Pujari, A. K., Gahan, P., & Kumar, V. (2021). Independent component analysis: a review with emphasis on commonly used algorithms and contrast function. Computación y Sistemas, 25(1), 97-115.
Ding, C., Gong, M., Zhang, K., & Tao, D. (2019). Likelihood-free overcomplete ICA and applications in causal discovery. Advances in Neural Information Processing Systems, 32.
Podosinnikova, A., Perry, A., Wein, A., Bach, F., d’Aspremont, A., & Sontag, D. (2019). Overcomplete independent component analysis via SDP. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS).