This demo explores the integration of overcomplete Independent Component Analysis (ICA) and Structural Equation Modeling (SEM) using the MCMSEM package in R. Traditional methods like PCA, ICA, and SEM typically have to assume there are fewer (or at most equally many) latent variables than observed variables, but overcomplete factor models allow for more latent factors than observed variables, offering richer insights into complex data. By bridging ICA’s focus on statistical independence with SEM’s flexible latent modeling, we introduce an approach to fit overcomplete SEM models, and demonstrate its capabilities through simulated data.
Principal Component Analysis (PCA)
PCA is a method used to reduce the dimensionality of data by transforming the original variables into a new set of uncorrelated variables, called principal components. These components are linear combinations of the original variables and are ordered by the amount of variance they explain. PCA is often used for data exploration and to simplify large data sets, but it assumes that the components are orthogonal and doesn’t provide insight into the latent structure of the data. The components also tend to be normally distributed and are not sparse; the components absolutely maximize the variance explained and in doing so mix dependent signals.
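To make this concrete, here is a tiny PCA example in base R. The data are made up purely for illustration and are not part of this demo.

# Illustrative PCA with base R (toy data, not used elsewhere in this demo)
set.seed(1)
X <- matrix(rnorm(1000 * 4), 1000, 4)        # four observed variables
X[, 2] <- X[, 1] + rnorm(1000, sd = 0.5)     # make two of them correlated
pca <- prcomp(X, center = TRUE, scale. = TRUE)
summary(pca)       # variance explained per component, in decreasing order
pca$rotation       # loadings: dense linear combinations of all four variables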
Independent Component Analysis (ICA)
ICA, unlike PCA, seeks to find statistically independent rather than uncorrelated components. This makes ICA particularly suited to situations where the observed data are mixtures of non-Gaussian signals, such as in neuroscientific data (e.g., fMRI or EEG). A classic example used to illustrate the unique capabilities of ICA is the cocktail party problem, where multiple people are speaking simultaneously in a room, and a set of microphones records the mixed sounds. Each microphone captures a combination of all voices, but ICA can separate these mixed signals into the original, independent sources—each person’s voice. This ability to separate independent signals from a mixture makes ICA ideal for applications involving complex, overlapping signals, like neural data or behavioral processes. ICA decomposes data into independent sources, allowing for a more nuanced understanding of underlying factors in highly complex systems.
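To make the cocktail party idea concrete, here is a small toy example using the fastICA package. Neither the package nor the signals are part of this demo; the two “voices” are simulated non-Gaussian sources.

# Toy cocktail party with fastICA (illustrative only)
# install.packages("fastICA")   # if not already installed
library(fastICA)
set.seed(2)
n  <- 5000
s1 <- sin(seq(0, 50, length.out = n))      # "voice" 1: a periodic signal
s2 <- runif(n, -1, 1)                      # "voice" 2: a uniform (non-Gaussian) signal
S  <- cbind(s1, s2)
M  <- matrix(c(0.6, 0.4, 0.3, 0.7), 2, 2)  # the "room": how voices mix at the microphones
X  <- S %*% M                              # what the two microphones record
ica <- fastICA(X, n.comp = 2)
cor(ica$S, S)   # recovered sources match the originals up to sign and ordering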
Factor Analysis (FA) and Structural Equation Modeling (SEM)
Factor Analysis (FA) is a statistical method used to model the underlying structure in a set of observed variables by assuming that they are influenced by a smaller number of unobservable factors. FA focuses on identifying latent variables that explain the covariances between observed variables. Structural Equation Modeling (SEM) extends FA by incorporating both measurement models (like FA) and structural models, which specify the causal relationships between latent and observed variables. SEM allows for more complex models and hypothesis testing but often relies on linear relationships and assumes normality in the data.
The key difference between PCA/ICA on the one hand and SEM on the other is that in SEM the researcher can shape and constrain which parameters are, and are not, estimated. In this MCMSEM tutorial we’ll bring some of the advantages of ICA and SEM into one tool, and we’ll introduce a very cool extension of ICA, which allows for more latent than observed variables, into SEM using MCMSEM.
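As a quick illustration of what shaping and constraining parameters looks like in practice, here is a small lavaan example. lavaan is not used in the rest of this demo, the model is purely illustrative, and it uses the HolzingerSwineford1939 data set that ships with lavaan.

# Illustrative SEM in lavaan: the researcher decides which parameters are (not) free
library(lavaan)
model <- '
  visual  =~ x1 + a*x2 + a*x3   # loadings of x2 and x3 constrained to be equal
  textual =~ x4 + x5 + x6       # textual loads only on x4-x6, not on x1-x3
  visual ~~ textual             # factor covariance is freely estimated
'
fit <- sem(model, data = HolzingerSwineford1939)
summary(fit, standardized = TRUE)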
Over-complete Factor Models
One common assumption in traditional methods like PCA, FA, SEM, and ICA is that the number of latent variables (or factors) is smaller than the number of observed variables. This constraint allows these models to simplify and reduce dimensionality effectively. However, in many real-world data sets—especially in fields like neuroscience and psychology—the underlying latent structure may be more complex, involving more latent variables than observed measurements. For example, brain activity captured by fMRI or EEG may involve numerous independent neural processes, yet we only have access to a limited number of sensors or voxel measurements. Similarly, in genome-wide association studies (GWAS), the genetic effects on a small number of correlated traits may reflect hundreds of unobserved biological pathways and systems.
To address this, researchers have been exploring overcomplete factor models, particularly in the context of ICA, where the number of latent variables exceeds the number of observed variables. This overcomplete approach allows for richer and more flexible models that can capture more complex underlying dynamics, even when the available observed data is limited. In overcomplete solutions, the challenge shifts from merely reducing dimensionality to discovering a larger set of latent variables that can explain intricate and sparse patterns in the data.
It turns out MCMSEM is capable of fitting overcomplete factor models, like the one below, much like modern extensions of ICA do, by modeling higher-order moments.
Figure 1: Over-complete factor solution in which 8 observed variables load on 10 latent variables.
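Why can higher-order moments identify more latent variables than there are observed variables? A rough intuition, sketched below for our setting of 8 observed variables, is that the data contain far more unique co-moments than just the covariances, so there are many more equations available to pin down extra parameters.

# Back-of-the-envelope: the number of unique co-moments of order k for p variables
# is choose(p + k - 1, k) (combinations with repetition)
p <- 8
c(second = choose(p + 1, 2),   # (co)variances: 36
  third  = choose(p + 2, 3),   # co-skewnesses: 120
  fourth = choose(p + 3, 4))   # co-kurtoses: 330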
We won’t go into the details of other overcomplete models, but I don’t want to pretend we have developed the only one; below is more reading on over-complete ICA, which is widely applied in machine learning and causal learning circles.
Note
Over-complete ICA is dauntingly technical for a psychologist; I really struggled through these papers. Nevertheless, they reflect a field actively working on the overcomplete problem.
Pati, R., Pujari, A. K., Gahan, P., & Kumar, V. (2021). Independent component analysis: a review with emphasis on commonly used algorithms and contrast function. Computación y Sistemas, 25(1), 97-115.
General technical review of ICA techniques.
Ding, C., Gong, M., Zhang, K., & Tao, D. (2019). Likelihood-free overcomplete ICA and applications in causal discovery. Advances in Neural Information Processing Systems, 32.
This paper discusses the development of a Likelihood-Free Overcomplete ICA (LFOICA) algorithm that improves causal discovery by avoiding strong parametric assumptions and computationally expensive methods, like Expectation Maximization (EM). The new approach enhances the practicality of causal discovery tasks by estimating the mixing matrix directly via back-propagation, demonstrating its effectiveness on both synthetic and real data.
Podosinnikova, A., Perry, A., Wein, A., Bach, F., d’Aspremont, A., & Sontag, D. (2019). Overcomplete independent component analysis via SDP. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS).
This paper introduces a new overcomplete ICA algorithm that handles cases where the number of latent sources exceeds the number of observed variables, avoiding assumptions about sparsity and offering computational efficiency. The algorithm uses Hessians of the cumulant generating function and a novel semi-definite programming (SDP) relaxation, demonstrating strong theoretical properties and practical performance on synthetic data and the CIFAR-10 dataset.
Empirical illustration
In the following we’ll simulate 10 latent variables, which affect 8 observed variables, and then we’ll estimate that model without pre-specifying it. Now, this is not an academic article, so we take some liberties for expedience. First, we don’t let the optimizer run for hours but stop slightly early. MCMSEM is built on torch, which is developed for machine learning. Unlike in SEM, in machine learning models the weights (parameters) aren’t the quantity of interest, and so MCMSEM, like torch, doesn’t optimize until the model converges but for as long as you let it train; you, the user, have to make sure you trust the minimum/result.
Installation of Required Packages
Before starting the analysis, we need to install and load the necessary packages. The MCMSEM package is not available on CRAN, so we will install it directly from GitHub using the devtools package.
# Install devtools if not already installed
if (!require(devtools)) {
  install.packages("devtools")
  library(devtools)
}
Loading required package: devtools
Loading required package: usethis
# Install MCMSEM from GitHub
if (!require(MCMSEM)) {
  install_github("https://github.com/MichelNivard/MCMSEM")
  library(MCMSEM)
}
To ensure reproducibility, we set a seed. We then generate a source matrix with skewed variables and introduce sparseness into these variables, which increases their kurtosis.
# Set seed for reproducibility
set.seed(123)

# Define the number of observations and variables
n <- 30000
source <- matrix(NA, n, 10)

# Generate skewed variables
for(i in 1:10){
  source[, i] <- ((3 %% 2) - 0.5) * -2 * rgamma(n, shape = i, rate = 2 + i/5)
}

# Introduce sparseness into the variables
for(i in 1:10){
  source[sample(1:n, round(runif(1, 1000, 18000))), i] <- 0
}
Specification of the Loading Matrix
We create a tractable loading matrix that maps 10 latent variables to 8 observed variables. This matrix defines the relationships between latent and observed variables.
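The loading matrix itself is not shown in this rendered page. Below is a hypothetical sketch of what an 8 x 10 loading matrix A with the right shape could look like; its values are made up, so running it will not reproduce the exact estimates shown further down.

# HYPOTHETICAL 8 x 10 loading matrix (8 observed variables in rows, 10 latents in columns)
A <- matrix(0, nrow = 8, ncol = 10)
for (j in 1:10) {
  # each latent variable loads on a small, partly overlapping set of observed variables
  rows <- ((j - 1) %% 8) + c(1, 2)
  rows[rows > 8] <- rows[rows > 8] - 8
  A[rows, j] <- runif(2, 0.4, 0.8)
}
A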
We scale the source (latent) variables and generate the observed variables by multiplying the loading matrix with the source matrix.
# Scale the source/latent variables
source <- scale(source)

# Generate the observed variables
observed <- t(A %*% t(source))
Correlation Plot of Observed Variables
To visualize the relationships between observed variables, we create a correlation plot ordered by hierarchical clustering.
# Load corrplot for correlation plots
library(corrplot)

# Plot the correlation matrix
corrplot(cor(observed), order = "hclust")
Preparing Moments for MCMSEM
We prepare the summary statistics required for the MCMSEM analysis using the observed data.
# Prepare the moments for MCMSEM
simmdatasumm <- MCMdatasummary(as.data.frame(observed), scale_data = FALSE)
Specifying the MCMSEM 10-Factor Model
We specify an overcomplete model with 10 latent factors. This model allows for skewness and kurtosis in the latent variables but not in the observed variables. For details on setting up MCMSEM models see the GitHub Wiki.
We assume (among other things!) the following in this model:
The latent variables are skewed and/or have kurtosis
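The model-creation call that produced the warning below is not shown in this rendered page. Judging from that warning and from the competing-model call further down, it was presumably something along these lines:

# Presumed specification of the overcomplete model (reconstructed, see note above)
overcomplete.model <- MCMmodel(
  simmdatasumm,
  n_latent = 10,            # more latent factors than the 8 observed variables
  causal_observed = FALSE,  # no causal paths between the observed variables
  constrained_a = FALSE,    # loadings are estimated freely
  skew_latent = TRUE,       # latent variables may be skewed
  kurt_latent = TRUE        # latent variables may have (excess) kurtosis
)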
Warning in MCMmodel(simmdatasumm, n_latent = 10, causal_observed = FALSE, :
It's unlikely you want to use use more latent factors than phenotypes present
in the data, unless you know what you are doing consider revising....
Setting Model Constraints
We impose constraints on the model parameters, such as setting residual variances to zero for observed variables and fixing latent variances to one. We have to fix the latent variances for scaling purposes.
# No residual variances for the observed variables
coords_S <- list(rep(11:18), rep(11:18))
overcomplete.model <- MCMedit(overcomplete.model, pointer = "S", name = coords_S, value = 0)

# No kurtosis for the observed variables
for(i in 1:10){
  overcomplete.model <- MCMedit(overcomplete.model, pointer = "K", name = paste0("k", i), value = 0)
}

# Fix the variances of the latent variables to 1
coords_var <- list(c(1:10), c(1:10))
overcomplete.model <- MCMedit(overcomplete.model, pointer = "S", name = coords_var, value = 1)
Specifying Loadings and Parameter Boundaries
We specify a full set of loadings from the 10 latent variables to the 8 observed variables and adjust parameter boundaries to ensure model stability.
# Specify loadings
coords_A <- list(rep(11:18, 10), rep(1:10, each = 8))
new_names <- paste0("a", 1:80)
overcomplete.model <- MCMedit(overcomplete.model, pointer = "A", name = coords_A, value = new_names)

# Change boundaries on some parameters
overcomplete.model <- MCMedit(overcomplete.model, pointer = "bound", name = "a", value = c(-0.05, 1.2))
overcomplete.model <- MCMedit(overcomplete.model, pointer = "bound", name = "k", value = c(0, 25))
Setting Starting Values for Parameters
Proper starting values can aid in the convergence of the optimization algorithm. We set starting values for the loadings and skewness parameters.
# Change the starting values on some parameters
overcomplete.model <- MCMedit(overcomplete.model, pointer = "start", name = "a", value = 0)
for(i in 1:10){
  overcomplete.model <- MCMedit(overcomplete.model, pointer = "start", name = paste0("kl", i), value = runif(1, 3, 5))
  overcomplete.model <- MCMedit(overcomplete.model, pointer = "start", name = paste0("skl", i), value = runif(1, -1, 1))
}
Fitting the MCMSEM Model
We fit the specified overcomplete model to the prepared data summary. We use three optimizer runs, though really only two optimizers ("rprop" and "lbfgs"): "rprop" is run twice, each time with a different learning rate.
# Fit the overcomplete model
fitted.model <- MCMfit(
  overcomplete.model, simmdatasumm,
  optimizers = c("rprop", "rprop", "lbfgs"),
  optim_iters = c(300, 300, 50),
  learning_rate = c(0.005, 0.01, 1),
  monitor_grads = TRUE,
  compute_se = TRUE,
  loss_type = 'smooth_l1',
  debug = FALSE
)

# Summarize the fitted model
sum.model <- summary(fitted.model)
sum.model
|--------------------------------------|
| MCM Result Summary (MCMSEM v0.26.1) |
|--------------------------------------|
device : cpu
N phenotypes : 8
N latents : 10
N observations : 30000
N parameters : 100
SE type : asymptotic
Fit statistics
loss : 0.0176038835197687
chisq : 528.116505593061
BIC : 1559.01177165749
Parameters summary
label lhs edge rhs est se p
1 a1 f1 =~ V1 0.1678715944 0.048147809 4.892236e-04
2 a2 f1 =~ V2 -0.0609780699 0.025913790 1.861699e-02
3 a3 f1 =~ V3 0.2696545124 0.041094221 5.314082e-11
4 a4 f1 =~ V4 0.7329896092 0.024108918 4.974757e-203
5 a5 f1 =~ V5 -0.0604888424 0.033842031 7.387479e-02
6 a6 f1 =~ V6 -0.0007124003 0.036803890 9.845566e-01
7 a7 f1 =~ V7 0.7083864808 0.032205720 3.167973e-107
8 a8 f1 =~ V8 -0.0306280944 0.024219824 2.060191e-01
9 a9 f2 =~ V1 -0.0189371482 0.023982558 4.297487e-01
10 a10 f2 =~ V2 0.6021347642 0.009226346 0.000000e+00
11 a11 f2 =~ V3 0.6057526469 0.021693107 1.374251e-171
12 a12 f2 =~ V4 0.0069870590 0.013993848 6.175715e-01
13 a13 f2 =~ V5 0.5530548692 0.019606708 4.740561e-175
14 a14 f2 =~ V6 0.6095092297 0.020448277 3.136002e-195
15 a15 f2 =~ V7 -0.0117967073 0.017373675 4.971385e-01
last_gradient
1 -0.003036123001948
2 -0.00117389671504498
3 -0.00572217209264636
4 0.000284313224256039
5 0.00296841375529766
6 -0.00346106383949518
7 0.000492800027132034
8 -0.000649686902761459
9 -0.00551279308274388
10 -0.0020515862852335
11 -0.0136406663805246
12 -0.00300788786262274
13 0.000595672056078911
14 -0.00886264815926552
15 0.00198831246234477
... Print capped at 15 rows, use: as.data.frame([summary_object], estimates='parameters')
Variances summary
label lhs edge rhs est se p last_gradient
1 NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA
Skewness summary
label edge v1 v2 v3 est se p
1 skl1 ~~~ f1 f1 f1 -0.2462558 0.04887607 4.695090e-07
2 skl2 ~~~ f2 f2 f2 -0.9796571 0.04661056 4.497081e-98
3 skl3 ~~~ f3 f3 f3 -4.8307133 1.34476101 3.278393e-04
4 skl4 ~~~ f4 f4 f4 -0.2076529 0.03594272 7.589632e-09
5 skl5 ~~~ f5 f5 f5 -5.0106988 1.97902632 1.134460e-02
6 skl6 ~~~ f6 f6 f6 -0.3221173 0.03509216 4.343741e-20
7 skl7 ~~~ f7 f7 f7 -0.7627797 0.05984439 3.279362e-37
8 skl8 ~~~ f8 f8 f8 -1.4404614 0.61679763 1.952319e-02
9 skl9 ~~~ f9 f9 f9 -0.4237339 0.06323311 2.067995e-11
10 skl10 ~~~ f10 f10 f10 -0.5848176 0.03961539 2.560554e-49
last_gradient
1 5.94980701862369e-05
2 0.000510916637722403
3 -5.67671986573259e-06
4 0.000450620835181326
5 -0.000949684996157885
6 0.000926083652302623
7 -8.42159261082998e-06
8 0.000790745951235294
9 0.00033377727959305
10 -8.83228203747422e-05
Kurtosis summary
label edge v1 v2 v3 v4 est se p
1 kl1 ~~~~ f1 f1 f1 f1 2.739680 0.16583231 2.600019e-61
2 kl2 ~~~~ f2 f2 f2 f2 4.632741 0.16553795 2.407235e-172
3 kl3 ~~~~ f3 f3 f3 f3 17.702730 5.79674292 2.258800e-03
4 kl4 ~~~~ f4 f4 f4 f4 1.945173 0.17690448 4.011779e-28
5 kl5 ~~~~ f5 f5 f5 f5 32.226040 17.48164177 6.526745e-02
6 kl6 ~~~~ f6 f6 f6 f6 2.754783 0.05823371 0.000000e+00
7 kl7 ~~~~ f7 f7 f7 f7 2.879166 0.14085579 7.294934e-93
8 kl8 ~~~~ f8 f8 f8 f8 9.958513 3.97755241 1.229096e-02
9 kl9 ~~~~ f9 f9 f9 f9 3.463953 0.29559177 1.022226e-31
10 kl10 ~~~~ f10 f10 f10 f10 3.867421 0.12334407 8.384988e-216
last_gradient
1 -0.000820127606857568
2 -0.000409843632951379
3 -7.44726057746448e-05
4 -0.000462166499346495
5 0.000632951210718602
6 -9.94364672806114e-05
7 8.649976371089e-05
8 -1.10996561488719e-05
9 -0.00149082741700113
10 -0.000166902114870027
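Because, as noted earlier, MCMSEM stops when the iteration budget runs out rather than at formal convergence, it is worth checking that the final gradients are all close to zero. The print above suggests extracting the full parameter table with as.data.frame(); assuming that table contains the last_gradient column shown in the print, a quick check could look like this:

# Sanity check on convergence: largest absolute final gradient across all parameters
pars <- as.data.frame(sum.model, estimates = 'parameters')
max(abs(as.numeric(pars$last_gradient)))   # should be close to zero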
We can also plot the model.
# Plot the fitted model with a threshold
plot(fitted.model, threshold = 0.1)
And finally, we can compare the significant loadings of the fitted model to the simulated model. It’s not perfect because of random fluctuation in the simulated data, but it’s close!
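The comparison itself is not reproduced here, but a sketch of one way to do it is shown below. It assumes the parameter table has the lhs, edge, rhs and est columns shown in the printed summary above, and keep in mind that factors are only recovered up to sign and permutation, so the estimated columns may need reordering before they line up with the simulated ones.

# Sketch: rebuild the estimated loading matrix and compare it to the simulated A
pars <- as.data.frame(sum.model, estimates = 'parameters')
loadings <- subset(pars, edge == "=~")
A_hat <- matrix(0, nrow = 8, ncol = 10)
A_hat[cbind(as.integer(sub("V", "", loadings$rhs)),
            as.integer(sub("f", "", loadings$lhs)))] <- as.numeric(loadings$est)
A_hat[abs(A_hat) < 0.1] <- 0             # same threshold as the plot above
par(mfrow = c(1, 2))
corrplot(A,     is.corr = FALSE, mar = c(0, 0, 1, 0), title = "Simulated")
corrplot(A_hat, is.corr = FALSE, mar = c(0, 0, 1, 0), title = "Estimated")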
MCMSEM can go further though: we can specify other overcomplete models and compare them! To evaluate the robustness of our overcomplete model, we specify and fit a competing model with fewer latent factors. This model has 4 latent variables but ALSO lets the observed variables influence each other causally. It’s obviously not the true model, and so we expect it to do worse than the true model.
# Specify a competing model with 4 latent factors
competing_model <- MCMmodel(
  simmdatasumm,
  n_latent = 4,
  constrained_a = FALSE,
  skew_latent = TRUE,
  kurt_latent = TRUE
)

# Fit the competing model
fitted.competing_model <- MCMfit(
  competing_model, simmdatasumm,
  optimizers = c("rprop", "rprop", "lbfgs"),
  optim_iters = c(300, 300, 50),
  learning_rate = c(0.005, 0.01, 1),
  monitor_grads = TRUE,
  compute_se = TRUE,
  loss_type = 'smooth_l1',
  debug = FALSE
)
Warning in MCMfit(competing_model, simmdatasumm, optimizers = c("rprop", : Use
of a dataframe with more than 2 columns is still experimental.
# Summarize the competing model
sum.competing_model <- summary(fitted.competing_model)
sum.competing_model
To check the robustness of our analysis, we repeat the data generation process with a different seed. That gives us new data, on which we can evaluate the two competing models that were trained on the original data.
# Set a different seed for reproducibility
set.seed(456)
n <- 30000

# Generate new source data
source_new <- matrix(NA, n, 10)

# Generate skewed variables
for(i in 1:10){
  source_new[, i] <- ((3 %% 2) - 0.5) * -2 * rgamma(n, shape = i, rate = 2 + i/5)
}

# Introduce sparseness into the new variables
for(i in 1:10){
  source_new[sample(1:n, round(runif(1, 1000, 18000))), i] <- 0
}

# Scale the new source/latent variables
source_new <- scale(source_new)

# Generate new observed variables
observed_new <- t(A %*% t(source_new))

# Plot the correlation matrix for new observed variables
corrplot(cor(observed_new), order = "hclust")
# Prepare the moments for MCMSEM with new data
simmdatasumm_new <- MCMdatasummary(as.data.frame(observed_new), scale_data = FALSE)

# Compare the loss of the previously fitted models with the new data
MCMcompareloss(results = list(fitted.model, fitted.competing_model),
               test_data = simmdatasumm_new)
If you look at the comparison you’ll see the true overcomplete model did better, and so using MCMSEM researchers can test and compare competing overcomplete models. In future simulations I’ll try to map exactly how far we can push overcomplete models, but the previous work I listed above suggests you can have up to p^2/4 latent variables, where p is the number of observed variables. So with 10, 12, or 16 observed variables you could potentially (and theoretically) estimate 25, 36, or 64 latent variables. Currently MCMSEM cannot do so without running into memory issues and modifying estimation considerably, unless you have access to the kind of GPUs OpenAI uses…
Conclusion
In this analysis, we successfully generated skewed and sparse data, specified and fitted an overcomplete MCMSEM model, and compared it with a competing model. The reproducibility checks with a different seed confirmed the robustness of our findings. The MCMSEM package provides a powerful framework for complex structural equation modeling, accommodating skewness and kurtosis in latent variables.
Pati, R., Pujari, A. K., Gahan, P., & Kumar, V. (2021). Independent component analysis: a review with emphasis on commonly used algorithms and contrast function. Computación y Sistemas, 25(1), 97-115.
Ding, C., Gong, M., Zhang, K., & Tao, D. (2019). Likelihood-free overcomplete ICA and applications in causal discovery. Advances in Neural Information Processing Systems, 32.
Podosinnikova, A., Perry, A., Wein, A., Bach, F., d’Aspremont, A., & Sontag, D. (2019). Overcomplete independent component analysis via SDP. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS).