Citation to Preprint with Technical Details

de la Fuente J. et al. (2026). Ultra-Fast Implementation of Multivariate GWAS in Genomic SEM Using Flexible Analytic Estimation [Preprint].

The preprint provides the full mathematical derivation of the analytic estimator, benchmarking results, and empirical validation against the standard iterative estimator. This tutorial focuses on practical application of the function.

⚠️ CAVEAT EMPTOR — Please read before use

The analytic estimation option in userGWAS is currently in alpha and under active development. It has not yet undergone the full validation program planned for the beta version. Our intention in releasing this alpha version is to enable interested members of the community to work with the function early and provide feedback on ease of use, requested features, and any unexpected behavior before we advance to a more complete version suitable for broader beta testing.

What this means in practice:

The function’s interface, arguments, and output may change without notice in future versions. The alpha release does not guarantee stability of any aspect of the function.

Results should be cross-validated against the standard iterative userGWAS function for a random subset of SNPs (e.g., ~1,000 randomly selected SNPs and a sample of top hits) before drawing scientific conclusions.

Users should exercise particular caution in high-dimensional applications (25+ phenotypes), for which iterative cross-validation is not currently feasible. Validity studies for very high-dimensional settings are planned for future work.

We actively welcome feedback. If you encounter unexpected results, errors, or have suggestions for features, please open an issue on GitHub or contact the authors directly (j.delafuente@utexas.edu). Your input will directly shape the development of the beta version.

Overview

The userGWAS function in GenomicSEM now includes an analytic estimation option that performs multivariate GWAS using a closed-form Generalized Least Squares (GLS) estimator for the most computationally intensive portion of model estimation, replacing the iterative numerical optimization that userGWAS previously relied on exclusively.

The analytic estimator is activated by setting analytic = TRUE in the userGWAS call. The default remains analytic = FALSE (iterative estimation) to preserve backward compatibility during the alpha period.

The core idea

Standard Genomic SEM multivariate GWAS fits a separate structural equation model for each of millions of SNPs using an iterative estimator. This is accurate but computationally demanding, typically requiring access to a high-performance computing (HPC) cluster.

The analytic option takes advantage of the fact that the model can be decomposed into two independent parts:

The measurement model (factor loadings, factor covariances) — depends only on the LDSC output, is the same for every SNP, and requires optimizing a nonlinear system of equations. This is estimated once with numerical optimization using the standard usermodel() function.
The structural model (SNP → factor regressions) — must be estimated separately for each SNP, but constitutes a linear system of equations for which a direct analytic GLS solution exists.

By substituting an analytic GLS estimator for the iterative estimator in step 2, run-times are reduced by over 800 times without meaningfully changing the results. Consequently, analytic estimation does not require an HPC environment — researchers can run full genome-wide analyses on standard personal computers.

Notes on model specification

The measurement model (i.e., the portion of the model that does not involve the SNP) should include only GWAS phenotypes, first-order factors, factor loadings (=~), and factor covariances (~~). Higher-order factors and regressions between factors should not be explicitly included. However, nearly all models containing such terms can be reparameterized as first-order measurement models.

The structural model (i.e., the portion of the model that involves the SNP) should include only SNP → factor regressions (~ SNP). To specify a SNP → GWAS phenotype regression directly, the measurement model should include a factor representing that phenotype, with the residual variance of the phenotype set to zero.

A key check when reparametrizing a model is to fit the no-SNP versions of both the original and reparameterized models in usermodel() and confirm that their model fit, degrees of freedom, and parameter estimates are equivalent. In future releases, we plan to provide examples of how several common models (e.g., hierarchical factor models, mediation models, GWAS-by-subtraction models) can be parametrized to meet these requirements.

Computational gains

Runtime comparison for ~20K SNPs: iterative vs. analytic estimation. Data from the preprint.

Installation

Install GenomicSEM directly from GitHub:

if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("GenomicSEM/GenomicSEM")

Using analytic estimation in `userGWAS`

Analytic estimation is activated by adding analytic = TRUE to your existing userGWAS call. All other arguments remain the same as in standard iterative usage.

userGWAS(
  covstruc   = LDSCoutput,   # ldsc() output
  SNPs       = sumstats,     # sumstats() output
  model      = model,        # lavaan-style syntax (including SNP lines)
  usermod    = NULL,         # pre-fitted no-SNP model output. Default is NULL.
  analytic   = TRUE,         # activates the analytic estimator
  batch_size = 100000        # number of SNPs processed per batch (tune to available RAM)
)

New argument	Description
`analytic`	Logical. If `TRUE`, uses the closed-form GLS analytic estimator. Default is `FALSE` (iterative).
`batch_size`	Integer. Number of SNPs per batch. Larger values are faster but require more RAM. Default is `100,000`.
`usermod`	Optional. Pre-fitted no-SNP model output (see below). Default is `NULL`.

All other userGWAS arguments are unchanged. See ?userGWAS for the full argument list.

The `usermod` argument

By default, usermod = NULL and the function internally fits the no-SNP measurement model from the lavaan syntax provided in model. If you have already fitted the no-SNP model separately — for example, to inspect model fit or validate the factor structure before running the GWAS — you can pass the $results element returned by usermodel() directly via usermod. The function will then skip internal estimation of the no-SNP model and use the supplied parameter estimates to extract factor loadings.

When usermod is provided, there is no need to include the model syntax, as the function estimates SNP effects on all factors detected in the provided model output.

# Step 1: fit the no-SNP model separately
nosnp_fit <- usermodel(
  LDSCoutput,
  estimation = "DWLS",
  model      = nosnp_model
)

# Step 2: pass the results directly
userGWAS(
  sumstats   = sumstats,
  LDSCoutput = LDSCoutput,
  usermod    = nosnp_fit$results,
  batch_size = 100000
)

Note: The behavior of usermod — specifically, which factors receive a SNP effect when it is provided — may be refined in future versions of the function.

Output columns

When analytic = TRUE, userGWAS returns a single data frame with one row per SNP (rather than a list of data frames as in iterative mode). The following columns are returned for each factor F specified in the ~ SNP lines of your model:

Column	Description
`beta_F`	Estimated effect of the SNP on factor F
`SE_F`	Standard error of `beta_F`
`Z_beta_F`	Z-statistic for `beta_F`
`p_val_F`	Two-sided p-value for `beta_F`
`Q_omnibus`	Omnibus heterogeneity statistic across all traits
`Q_omnibus_df`	Degrees of freedom for `Q_omnibus` (= k − f, where k is the number of phenotypes and f the number of factors)
`Q_omnibus_pval`	p-value for `Q_omnibus`

Worked example: 5-factor psychiatric GWAS

This example reproduces the analysis from Grotzinger et al. (2023) using a 5-factor model of common psychiatric disorders across 13 phenotypes.

Data

The two files required to run this example can be downloaded below. Place both files in your R working directory before running the code.

LDSC_PSYCH.RData — Pre-computed LDSC output containing the genetic covariance matrix and sampling covariance matrix for the 5-factor psychiatric model.
Psych_sumstats.txt — Merged GWAS summary statistics for 13 psychiatric phenotypes, formatted for userGWAS input.

library(data.table)
library(GenomicSEM)

load("LDSC_PSYCH.RData")       # loads LDSC_P
sumstats <- fread("Psych_sumstats.txt", data.table = FALSE)

Model specification

The model follows standard lavaan syntax. Lines ending in ~ SNP define the factors that will receive a SNP effect estimate. All other lines define the factor structure and are estimated once from the no-SNP model.

Reminder: The measurement model should include only GWAS phenotypes, first-order factors, factor loadings, and factor covariances. The structural model should include only SNP → factor regressions.

model <- '
  Comp  =~  OCD + TS + AN
  Psych =~  a*SCZ + a*BIP
  Neuro =~  ADHD + MDD + TS + ASD + PTSD
  Int   =~  MDD + ANX + PTSD
  SUD   =~  ALCH + CUD + OUD
  Comp  ~~  Psych + Neuro + Int + SUD
  Psych ~~  Neuro + Int + SUD
  Neuro ~~  Int + SUD
  Int   ~~  SUD

  Comp  ~   SNP
  Psych ~   SNP
  Neuro ~   SNP
  Int   ~   SNP
  SUD   ~   SNP
'

Note that a*SCZ + a*BIP imposes an equality constraint on the loadings of SCZ and BIP onto the Psychotic Disorders factor. The analytic estimator correctly handles constrained model syntax of this form.

Running the analysis

results <- userGWAS(
  covstruc   = LDSC_P,
  SNPs       = sumstats,
  model      = model,
  analytic   = TRUE,
  batch_size = 100000
)

Inspecting the output

The output is a single data frame with one row per SNP. The table below shows the first six rows from a real run of the 5-factor psychiatric GWAS model.

**Table 1.** First six rows of `userGWAS(analytic = TRUE)` output from the 5-factor psychiatric GWAS model (13 phenotypes, real data).
						Compulsive (Comp)				Neurodevelopmental (Neuro)				Internalizing (Int)				Substance Use (SUD)				Psychotic (Psych)				Heterogeneity
SNP	CHR	BP	MAF	A1	A2	beta_Comp	SE_Comp	Z_beta_Comp	p_val_Comp	beta_Neuro	SE_Neuro	Z_beta_Neuro	p_val_Neuro	beta_Int	SE_Int	Z_beta_Int	p_val_Int	beta_SUD	SE_SUD	Z_beta_SUD	p_val_SUD	beta_Psych	SE_Psych	Z_beta_Psych	p_val_Psych	Q_omnibus	Q_omnibus_df	Q_omnibus_pval
rs12089192	1	1171022	0.1183	G	C	-0.0149	0.0179	-0.8299	0.4066	-0.0002	0.0119	-0.0128	0.9898	-0.0072	0.0035	-2.0667	0.0388	0.0162	0.0115	1.4040	0.1603	-0.0121	0.0062	-1.9387	0.0525	8.680	8	0.3700
rs78446752	1	1183397	0.0666	C	T	-0.0093	0.0235	-0.3952	0.6927	0.0131	0.0152	0.8643	0.3874	0.0001	0.0043	0.0325	0.9741	0.0144	0.0143	1.0059	0.3145	-0.0228	0.0080	-2.8315	0.0046	6.186	8	0.6264
rs2494426	1	2337537	0.1203	C	G	-0.0156	0.0176	-0.8837	0.3769	0.0082	0.0124	0.6602	0.5091	0.0100	0.0037	2.6936	0.0071	-0.0028	0.0114	-0.2414	0.8092	-0.0039	0.0067	-0.5869	0.5573	6.487	8	0.5929
rs12727713	1	3095631	0.0457	G	A	0.0061	0.0277	0.2200	0.8259	0.0327	0.0202	1.6164	0.1060	-0.0068	0.0047	-1.4479	0.1476	-0.0007	0.0177	-0.0391	0.9688	-0.0120	0.0091	-1.3238	0.1856	6.082	8	0.6380
rs2455103	1	3178416	0.1421	A	G	-0.0061	0.0150	-0.4054	0.6852	-0.0032	0.0106	-0.3026	0.7622	0.0029	0.0033	0.8846	0.3764	-0.0059	0.0104	-0.5691	0.5693	-0.0130	0.0057	-2.2629	0.0236	15.747	8	0.0461
rs12097610	1	3812661	0.1829	C	T	-0.0082	0.0138	-0.5975	0.5501	0.0022	0.0098	0.2263	0.8210	-0.0028	0.0031	-0.9051	0.3654	-0.0087	0.0092	-0.9491	0.3426	0.0061	0.0053	1.1678	0.2429	4.580	8	0.8014

Providing feedback

We actively welcome feedback from users. If you:

encounter unexpected results or errors,
have suggestions for new features, or
would like to report a use case where the function behaved unexpectedly,

please open an issue on the GenomicSEM GitHub repository or contact the authors directly (j.delafuente@utexas.edu). Your input will directly shape the development of the beta version.

Analytic Estimation for Multivariate GWAS in Genomic SEM

Javier de la Fuente & Elliot M. Tucker-Drob

June 2026

Citation to Preprint with Technical Details

Overview

The core idea

Notes on model specification

Computational gains

Installation

Using analytic estimation in `userGWAS`

The `usermod` argument

Output columns

Worked example: 5-factor psychiatric GWAS

Data

Model specification

Running the analysis

Inspecting the output

Providing feedback

Analytic Estimation for Multivariate GWAS in Genomic SEM

Javier de la Fuente & Elliot M. Tucker-Drob

June 2026

Citation to Preprint with Technical Details

Overview

The core idea

Notes on model specification

Computational gains

Installation

Using analytic estimation in userGWAS

The usermod argument

Output columns

Worked example: 5-factor psychiatric GWAS

Data

Model specification

Running the analysis

Inspecting the output

Providing feedback

Using analytic estimation in `userGWAS`

The `usermod` argument