de la Fuente J. et al. (2026). Ultra-Fast Implementation of Multivariate GWAS in Genomic SEM Using Flexible Analytic Estimation [Preprint].
The preprint provides the full mathematical derivation of the analytic estimator, benchmarking results, and empirical validation against the standard iterative estimator. This tutorial focuses on practical application of the function.
⚠️ CAVEAT EMPTOR — Please read before use
The analytic estimation option in
userGWASis currently in alpha and under active development. It has not yet undergone the full validation program planned for the beta version. Our intention in releasing this alpha version is to enable interested members of the community to work with the function early and provide feedback on ease of use, requested features, and any unexpected behavior before we advance to a more complete version suitable for broader beta testing.What this means in practice:
- The function’s interface, arguments, and output may change without notice in future versions. The alpha release does not guarantee stability of any aspect of the function.
- Results should be cross-validated against the standard iterative
userGWASfunction for a random subset of SNPs (e.g., ~1,000 randomly selected SNPs and a sample of top hits) before drawing scientific conclusions.- Users should exercise particular caution in high-dimensional applications (25+ phenotypes), for which iterative cross-validation is not currently feasible. Validity studies for very high-dimensional settings are planned for future work.
- We actively welcome feedback. If you encounter unexpected results, errors, or have suggestions for features, please open an issue on GitHub or contact the authors directly (j.delafuente@utexas.edu). Your input will directly shape the development of the beta version.
The userGWAS function in GenomicSEM now
includes an analytic estimation option that performs
multivariate GWAS using a closed-form Generalized Least Squares (GLS)
estimator for the most computationally intensive portion of model
estimation, replacing the iterative numerical optimization that
userGWAS previously relied on exclusively.
The analytic estimator is activated by setting
analytic = TRUE in the userGWAS call. The
default remains analytic = FALSE (iterative estimation) to
preserve backward compatibility during the alpha period.
Standard Genomic SEM multivariate GWAS fits a separate structural equation model for each of millions of SNPs using an iterative estimator. This is accurate but computationally demanding, typically requiring access to a high-performance computing (HPC) cluster.
The analytic option takes advantage of the fact that the model can be decomposed into two independent parts:
usermodel() function.By substituting an analytic GLS estimator for the iterative estimator in step 2, run-times are reduced by over 800 times without meaningfully changing the results. Consequently, analytic estimation does not require an HPC environment — researchers can run full genome-wide analyses on standard personal computers.
The measurement model (i.e., the portion of the
model that does not involve the SNP) should include only GWAS
phenotypes, first-order factors, factor loadings (=~), and
factor covariances (~~). Higher-order factors and
regressions between factors should not be explicitly included. However,
nearly all models containing such terms can be reparameterized as
first-order measurement models.
The structural model (i.e., the portion of the model
that involves the SNP) should include only SNP → factor regressions
(~ SNP). To specify a SNP → GWAS phenotype regression
directly, the measurement model should include a factor representing
that phenotype, with the residual variance of the phenotype set to
zero.
A key check when reparametrizing a model is to fit the no-SNP
versions of both the original and reparameterized models in
usermodel() and confirm that their model fit, degrees of
freedom, and parameter estimates are equivalent. In future releases, we
plan to provide examples of how several common models (e.g.,
hierarchical factor models, mediation models, GWAS-by-subtraction
models) can be parametrized to meet these requirements.
Runtime comparison for ~20K SNPs: iterative vs. analytic estimation. Data from the preprint.
Install GenomicSEM directly from GitHub:
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("GenomicSEM/GenomicSEM")userGWASAnalytic estimation is activated by adding
analytic = TRUE to your existing userGWAS
call. All other arguments remain the same as in standard iterative
usage.
userGWAS(
covstruc = LDSCoutput, # ldsc() output
SNPs = sumstats, # sumstats() output
model = model, # lavaan-style syntax (including SNP lines)
usermod = NULL, # pre-fitted no-SNP model output. Default is NULL.
analytic = TRUE, # activates the analytic estimator
batch_size = 100000 # number of SNPs processed per batch (tune to available RAM)
)| New argument | Description |
|---|---|
analytic |
Logical. If TRUE, uses the closed-form GLS analytic
estimator. Default is FALSE (iterative). |
batch_size |
Integer. Number of SNPs per batch. Larger values are faster but
require more RAM. Default is 100,000. |
usermod |
Optional. Pre-fitted no-SNP model output (see below). Default is
NULL. |
All other userGWAS arguments are unchanged. See
?userGWAS for the full argument list.
usermod argumentBy default, usermod = NULL and the function internally
fits the no-SNP measurement model from the lavaan syntax provided in
model. If you have already fitted the no-SNP model
separately — for example, to inspect model fit or validate the factor
structure before running the GWAS — you can pass the
$results element returned by usermodel()
directly via usermod. The function will then skip internal
estimation of the no-SNP model and use the supplied parameter estimates
to extract factor loadings.
When usermod is provided, there is no need to include
the model syntax, as the function estimates SNP effects on all factors
detected in the provided model output.
# Step 1: fit the no-SNP model separately
nosnp_fit <- usermodel(
LDSCoutput,
estimation = "DWLS",
model = nosnp_model
)
# Step 2: pass the results directly
userGWAS(
sumstats = sumstats,
LDSCoutput = LDSCoutput,
usermod = nosnp_fit$results,
batch_size = 100000
)Note: The behavior of
usermod— specifically, which factors receive a SNP effect when it is provided — may be refined in future versions of the function.
When analytic = TRUE, userGWAS returns a
single data frame with one row per SNP (rather than a
list of data frames as in iterative mode). The following columns are
returned for each factor F specified in the
~ SNP lines of your model:
| Column | Description |
|---|---|
beta_F |
Estimated effect of the SNP on factor F |
SE_F |
Standard error of beta_F |
Z_beta_F |
Z-statistic for beta_F |
p_val_F |
Two-sided p-value for beta_F |
Q_omnibus |
Omnibus heterogeneity statistic across all traits |
Q_omnibus_df |
Degrees of freedom for Q_omnibus (= k −
f, where k is the number of phenotypes and f
the number of factors) |
Q_omnibus_pval |
p-value for Q_omnibus |
This example reproduces the analysis from Grotzinger et al. (2023) using a 5-factor model of common psychiatric disorders across 13 phenotypes.
The two files required to run this example can be downloaded below. Place both files in your R working directory before running the code.
LDSC_PSYCH.RData
— Pre-computed LDSC output containing the genetic covariance matrix and
sampling covariance matrix for the 5-factor psychiatric model.Psych_sumstats.txt
— Merged GWAS summary statistics for 13 psychiatric phenotypes,
formatted for userGWAS input.The model follows standard lavaan syntax. Lines ending in
~ SNP define the factors that will receive a SNP effect
estimate. All other lines define the factor structure and are estimated
once from the no-SNP model.
Reminder: The measurement model should include only GWAS phenotypes, first-order factors, factor loadings, and factor covariances. The structural model should include only SNP → factor regressions.
model <- '
Comp =~ OCD + TS + AN
Psych =~ a*SCZ + a*BIP
Neuro =~ ADHD + MDD + TS + ASD + PTSD
Int =~ MDD + ANX + PTSD
SUD =~ ALCH + CUD + OUD
Comp ~~ Psych + Neuro + Int + SUD
Psych ~~ Neuro + Int + SUD
Neuro ~~ Int + SUD
Int ~~ SUD
Comp ~ SNP
Psych ~ SNP
Neuro ~ SNP
Int ~ SNP
SUD ~ SNP
'Note that a*SCZ + a*BIP imposes an equality constraint
on the loadings of SCZ and BIP onto the Psychotic Disorders factor. The
analytic estimator correctly handles constrained model syntax of this
form.
The output is a single data frame with one row per SNP. The table below shows the first six rows from a real run of the 5-factor psychiatric GWAS model.
| SNP | CHR | BP | MAF | A1 | A2 | beta_Comp | SE_Comp | Z_beta_Comp | p_val_Comp | beta_Neuro | SE_Neuro | Z_beta_Neuro | p_val_Neuro | beta_Int | SE_Int | Z_beta_Int | p_val_Int | beta_SUD | SE_SUD | Z_beta_SUD | p_val_SUD | beta_Psych | SE_Psych | Z_beta_Psych | p_val_Psych | Q_omnibus | Q_omnibus_df | Q_omnibus_pval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs12089192 | 1 | 1171022 | 0.1183 | G | C | -0.0149 | 0.0179 | -0.8299 | 0.4066 | -0.0002 | 0.0119 | -0.0128 | 0.9898 | -0.0072 | 0.0035 | -2.0667 | 0.0388 | 0.0162 | 0.0115 | 1.4040 | 0.1603 | -0.0121 | 0.0062 | -1.9387 | 0.0525 | 8.680 | 8 | 0.3700 |
| rs78446752 | 1 | 1183397 | 0.0666 | C | T | -0.0093 | 0.0235 | -0.3952 | 0.6927 | 0.0131 | 0.0152 | 0.8643 | 0.3874 | 0.0001 | 0.0043 | 0.0325 | 0.9741 | 0.0144 | 0.0143 | 1.0059 | 0.3145 | -0.0228 | 0.0080 | -2.8315 | 0.0046 | 6.186 | 8 | 0.6264 |
| rs2494426 | 1 | 2337537 | 0.1203 | C | G | -0.0156 | 0.0176 | -0.8837 | 0.3769 | 0.0082 | 0.0124 | 0.6602 | 0.5091 | 0.0100 | 0.0037 | 2.6936 | 0.0071 | -0.0028 | 0.0114 | -0.2414 | 0.8092 | -0.0039 | 0.0067 | -0.5869 | 0.5573 | 6.487 | 8 | 0.5929 |
| rs12727713 | 1 | 3095631 | 0.0457 | G | A | 0.0061 | 0.0277 | 0.2200 | 0.8259 | 0.0327 | 0.0202 | 1.6164 | 0.1060 | -0.0068 | 0.0047 | -1.4479 | 0.1476 | -0.0007 | 0.0177 | -0.0391 | 0.9688 | -0.0120 | 0.0091 | -1.3238 | 0.1856 | 6.082 | 8 | 0.6380 |
| rs2455103 | 1 | 3178416 | 0.1421 | A | G | -0.0061 | 0.0150 | -0.4054 | 0.6852 | -0.0032 | 0.0106 | -0.3026 | 0.7622 | 0.0029 | 0.0033 | 0.8846 | 0.3764 | -0.0059 | 0.0104 | -0.5691 | 0.5693 | -0.0130 | 0.0057 | -2.2629 | 0.0236 | 15.747 | 8 | 0.0461 |
| rs12097610 | 1 | 3812661 | 0.1829 | C | T | -0.0082 | 0.0138 | -0.5975 | 0.5501 | 0.0022 | 0.0098 | 0.2263 | 0.8210 | -0.0028 | 0.0031 | -0.9051 | 0.3654 | -0.0087 | 0.0092 | -0.9491 | 0.3426 | 0.0061 | 0.0053 | 1.1678 | 0.2429 | 4.580 | 8 | 0.8014 |
We actively welcome feedback from users. If you:
please open an issue on the GenomicSEM GitHub repository or contact the authors directly (j.delafuente@utexas.edu). Your input will directly shape the development of the beta version.