LocalSRMD (Local Standardized Root Mean-square Difference) is a flexible index of the extent to which parameter estimates differ across groups. We introduced it in this paper:

Schwaba, T., Mallard, T.T., Maihofer, A.X., Rhemtulla, M., Lee, P.H., Smoller, J.W., Davis, L.K., Nivard, M.G., Grotzinger, A.D., & Tucker-Drob, E.M. (2025) Comparison of the Multivariate Genetic Architecture of Eight Major Psychiatric Disorders Across Sex. Nature Genetics..

This short tutorial demonstrates how to calculate localSRMD in R using the genomicSEM package. (Note: you can also apply localSRMD to phenotypic SEM using the lavaan package).

If you’re struggling with this tutorial, you may want to brush up on the GenomicSEM Wiki pages 1-3 , or post a question to the Genomic SEM Users google group.

A basic example

In this tutorial, we will use localSRMD to compare the multivariate genetic architecture of internalizing disorders across cisgender males and females. Our model, depicted in Figure 1, contains a latent internalizing factor (INT) indicated by the genetic components of three disorders: Major Depressive Disorder (MDD), Anxiety (ANX), and Post-Traumatic Stress Disorder (PTSD). These variables are subscripted with an M in males and with an F in females. Our primary question pertains to the parameters contained within the lavender ovals: are the factor loadings of PTSD, MDD, and ANX on INT approximately equivalent in males and females?

Figure 1: Our path diagram

Note: Here, we follow standard practice for estimating these models: Using past theory and research, we select one canonical indicator for the factor (MDD) whose loading we scale to 1, and we allow the residuals of each indicator to correlate across sex.

Preparing the data and estimating the models

First, we use the munge() and ldsc() functions to prepare the data. Specifically, to compare data across groups, we obtain group-stratified GWAS summary statistics (here, summary statistics in males and in females) and we enter all data from all groups into ldsc() so that the resulting genetic covariance matrix includes correlations within and between groups. We provide the relevant code below. To facilitate this process for this tutorial, you can download the ldsc object that results from this code here and use it for the remainder of the tutorial.

library(GenomicSEM)
library(dplyr)

munge(filenames = c("ptsdm.txt","mddm.txt","anxm.txt","ptsdf.txt","mddf.txt","anxf.txt"), 
      "w_hm3.noMHC.snplist",
      trait.names = c("PTSDm","MDDm","ANXm","PTSDf","MDDf","ANXf"),
      info.filter = 0.9, 
      maf.filter = 0.01)

covmat <- ldsc(traits = c("PTSDm.sumstats.gz","MDDm.sumstats.gz","ANXm.sumstats.gz","PTSDf.sumstats.gz","MDDf.sumstats.gz","ANXf.sumstats.gz")
               trait.names = c("PTSDm","MDDm","ANXm","PTSDf","MDDf","ANXf"),
               sample.prev = c(NA, NA, .5, NA, NA, .5), 
               population.prev = c(NA, NA, .090, NA, NA, .135), 
               ld = "/eur_w_ld_chr/", 
               wld = "/eur_w_ld_chr/")
#because MDD and PTSD are meta-analyses of continuous and binary traits, and the associated Ns were computed as the sum of the continuous sample size and the liability-scale corrected sample size, we treat them as having been effectively measured on continuous scales and we enter NA for their population prevalence and NA for their sample prevalence. Because ANX is measured on a binary scale, we first recalculate the sum of effective N for a balanced sample design and then enter .5 for the sample prevalence and use estimated population values from an outside source for the population prevalence.

We are now ready to estimate our models and compare parameter values across groups. Specifically, to make a group comparison we will estimate two nested models:

An unconstrained model, where all parameters are estimated freely in males and females, and
A constrained model, where the parameters we wish to compare across males and females are constrained so that their values are forced to be equal.

To estimate the unconstrained model as per the path diagram in figure 1, we use the following code:

library(GenomicSEM)
library(dplyr)
#replace the following with the filepath on your computer to load in the ldsc object
load("/Users/schwabat/Library/CloudStorage/OneDrive-MichiganStateUniversity/Genomic_MI/localSRMD_tutorial_ldsc.Rdata")

free <- "INTm =~ 1*MDDm + ANXm + PTSDm #estimate an internalizing factor in males
         INTf =~ 1*MDDf + ANXf + PTSDf #estimate an internalizing factor in females
         INTm ~~ INTf #covariances between internalizing factors
         MDDm ~~ MDDf #covarying residuals
         ANXm ~~ ANXf
         PTSDm ~~ PTSDf"

freeresults <- usermodel(covstruc = covmat, model = free, std.lv = FALSE)

The output of this unconstrained model is below. You can see that the estimated value of each phenotype’s loading on INT differs somewhat across males and females. LocalSRMD will be used to quantify this difference.

Table 1: Results of the Unconstrained Model

freeresults$results

##      lhs op   rhs Unstand_Est          Unstand_SE      p_value
## 1   INTm =~  ANXm  0.90138331   0.182051325031262 7.373479e-07
## 2   INTm =~ PTSDm  0.43516300  0.0881772409730142 8.011204e-07
## 3   INTf =~  ANXf  0.84142757   0.170613154946775 8.148115e-07
## 4   INTf =~ PTSDf  0.68504981   0.138278907778516 7.265970e-07
## 5   INTm ~~  INTf  0.06804051  0.0139243168329455 1.026690e-06
## 6   MDDm ~~  MDDf -0.01278917   0.013818460024397 3.546989e-01
## 7   ANXm ~~  ANXf  0.04414874   0.034271497892684 1.976744e-01
## 8  PTSDm ~~ PTSDf  0.02916869 0.00574705549388425 3.866507e-07
## 9  PTSDm ~~ PTSDm  0.02182638 0.00730083571915459 2.793674e-03
## 10  MDDm ~~  MDDm -0.02715811   0.024723547351103 2.719986e-01
## 11  ANXm ~~  ANXm  0.11687701   0.071056271440454 1.000005e-01
## 12 PTSDf ~~ PTSDf  0.04332922 0.00864321351524023 5.356257e-07
## 13  MDDf ~~  MDDf -0.01734770  0.0146385912654966 2.359913e-01
## 14  ANXf ~~  ANXf  0.05341143  0.0411293665033463 1.940742e-01
## 15  INTm ~~  INTm  0.11114579  0.0253725154095693 1.183756e-05
## 16  INTf ~~  INTf  0.06992016  0.0146646101307858 1.861077e-06
## 17  INTm =~  MDDm  1.00000000                               NA
## 18  INTf =~  MDDf  1.00000000                               NA

Next, we estimate the constrained model. The code to do this is very similar to the unconstrained model code above:

cons <- "INTm =~ 1*MDDm + a*ANXm + b*PTSDm #note the parameter labels here
         INTf =~ 1*MDDf + a*ANXf + b*PTSDf #matching parameter labels
         INTm ~~ INTf
         MDDm ~~ MDDf
         ANXm ~~ ANXf
         PTSDm ~~ PTSDf"

consresults <- usermodel(covstruc = covmat, model = cons, std.lv = FALSE)

In this constrained model we have appended a and b parameter labels for each factor loading that force that loading’s estimate to be equal across males and females (more information on parameter labels here). You can see now that the estimates for each phenotype’s factor loading are the same across sex.

Table 2: Results of the Constrained Model

consresults$results

##      lhs op   rhs  Unstand_Est          Unstand_SE      p_value
## 1   INTm =~  ANXm  0.908934285    0.18887897922968 1.492349e-06
## 2   INTm =~ PTSDm  0.589044656   0.121717862209524 1.302143e-06
## 3   INTf =~  ANXf  0.908934285   0.177615767419206 3.097452e-07
## 4   INTf =~ PTSDf  0.589044656   0.119584872718397 8.403930e-07
## 5   INTm ~~  INTf  0.060028411  0.0125666712713201 1.781113e-06
## 6   MDDm ~~  MDDf -0.004777060  0.0125727080695166 7.039790e-01
## 7   ANXm ~~  ANXf  0.046161022  0.0342176983558987 1.773243e-01
## 8  PTSDm ~~ PTSDf  0.028623888 0.00577830644263122 7.282251e-07
## 9  PTSDm ~~ PTSDm  0.012549730 0.00866575570343377 1.475617e-01
## 10  MDDm ~~  MDDm -0.003407961  0.0204751320632984 8.678076e-01
## 11  ANXm ~~  ANXm  0.134979032  0.0705826102200735 5.583063e-02
## 12 PTSDf ~~ PTSDf  0.049704497 0.00787679602763007 2.785976e-10
## 13  MDDf ~~  MDDf -0.023622903    0.01583758181192 1.358112e-01
## 14  ANXf ~~  ANXf  0.039965099  0.0420423944103615 3.418123e-01
## 15  INTm ~~  INTm  0.087395639  0.0207769350253886 2.594963e-05
## 16  INTf ~~  INTf  0.076195375  0.0157732361778176 1.360711e-06
## 17  INTm =~  MDDm  1.000000000                               NA
## 18  INTf =~  MDDf  1.000000000                               NA

Calculating LocalSRMD

Now, we can calculate localSRMD using these model results to obtain a standardized effect size estimate of the average difference in these parameters across males and females.

The localSRMD() function in genomic SEM takes four arguments and returns a value of localSRMD. Specifically, the function requires:

unconstrained: a vector of parameter values from an unconstrained model, where focal parameters are estimated freely in each group.
constrained: a vector of parameter values from a constrained model, where focal parameters are constrained to be equal across groups.
lhsvar: a list containing the variances, in each group, of the variables in the usermodel results column lhs
rhsvar: a list containing the variances, in each group, of the variables in the usermodel results column rhs

Finding unconstrained and constrained

Locating the values for arguments 1 and 2 is straightforward, as these are found in the results of your usermodel() functions.

For argument 1, we extract the estimated values of the 6 unconstrained factor loadings (3 in males and 3 in females), which are in the Unstd_Est column in Table 1 above:

unconstrained <- freeresults$results[c(1:4,17,18),4]

#or, manually,

unconstrained <- c(0.9013, 0.4351, 0.8414, 0.6850, 1.0000, 1.0000)

For argument 2, we do the same, but with the estimates of the constrained loadings in Table 2:

constrained <- consresults$results[c(1:4,17,18),4]

#or, manually,

constrained <- c(0.9089, 0.5890, 0.9089, 0.5890, 1.0000, 1.0000)

Finding lhsvar and rhsvar

Now, we’ll locate the values we need for arguments 3 and 4, which is a bit more complicated. These two arguments are lists that supply localSRMD() with variable variances in each group, which makes the metric standardized and thus more easily interpretable.

Take a look at the first row of Table 1, which contains the first parameter we’re comparing across groups:

freeresults$results[1,]

##    lhs op  rhs Unstand_Est        Unstand_SE      p_value
## 1 INTm =~ ANXm   0.9013833 0.182051325031262 7.373479e-07

Each parameter represents an association between two variables. Here, the two variables involved are INTm and ANXm. The variances of these two variables will go into the first argument of lhsvar and rhsvar. Because INTm is in the lhs column, we will use its variance as part of the lhsvar list, and because ANXm is in the rhs column, we will use its variance as part of the rhsvar list.

The variance for each lhs and rhs variable can be found in one of two places:

If the variable is a genetic component of a GWAS phenotype that was entered into ldsc (like MDDm, ANXm, PTSDm, MDDf, ANXf, or PTSDf), its variance can be found on the diagonal of the S matrix (see below), which is part of the ldsc object you calculated when preparing the data:

row.names(covmat$S) <- colnames(covmat$S)
covmat$S

## 6 x 6 Matrix of class "dpoMatrix"
##            PTSDm       MDDm       ANXm      PTSDf       MDDf       ANXf
## PTSDm 0.04287372 0.04641143 0.03964935 0.04945216 0.03053281 0.03040123
## MDDm  0.04641143 0.08398767 0.10961016 0.04686663 0.05525134 0.05236161
## ANXm  0.03964935 0.10961016 0.20718207 0.02480336 0.06347116 0.09575412
## PTSDf 0.04945216 0.04686663 0.02480336 0.07614229 0.04779752 0.05332571
## MDDf  0.03053281 0.05525134 0.06347116 0.04779752 0.05257247 0.05417007
## ANXf  0.03040123 0.05236161 0.09575412 0.05332571 0.05417007 0.10291480

So, for the first parameter being compared, the variance of ANXm (rhsvar) is .206.

If the variable is a latent variable (here, INTm or INTf), and not a phenotype you measured, its variance isn’t in the S matrix (because we didn’t actually measure INT, we inferred it). Latent variable variances are instead found near the bottom of the genomic SEM usermodel() results for the unconstrained model, where the variable is associated with “~~” itself. So, for the first parameter being compared, the lhs variable is INTm, and its variance is .1111, found at the bottom of Table 1.

freeresults$results[15,]

##    lhs op  rhs Unstand_Est         Unstand_SE STD_Genotype
## 1 INTm ~~ INTm   0.1111458 0.0253725154095693     1.323358

(Important note if you’re working with latent variables: to calculate localSRMD as we describe in this tutorial, the latent variables you’re comparing must be exogenous, meaning that their variance is not accounted for by other variables in a model (they can predict other things, but nothing can predict them). A simple way to figure out whether your latent variable is exogenous is to draw a path diagram (like Figure 1) and to make sure that no directed arrows are pointing at the latent variable (double-headed arrows are fine). If your latent variable is not exogenous, the usermodel() results will report its remaining variance rather than its total variance, which will inflate localSRMD. If your latent variables cannot be made exogenous, you can manually calculate their total variance using path tracing rules.)

Now that we know where each variable’s variance can be found, we make a list that contains each variable’s variances in each group. The list is of length 6, as we are comparing 6 parameters across groups (it matches the length of the arguments unconstrained and constrained). And each item in the list is a vector of length 2, as there are 2 groups.

Returning once more to the first line of Table 1,

freeresults$results[1,]

##    lhs op  rhs Unstand_Est        Unstand_SE      p_value
## 1 INTm =~ ANXm   0.9013833 0.182051325031262 7.373479e-07

Because our lhs variable is INTm, our first item in the list lhsvar will contain the variances of INT in males (var(INTm) = .1111) and females (var(INTf) = .0699). Because our rhs variable is ANXm, our first item in the list rhsvar will contain the variances of ANX in males (var(ANXm) = .2072) and females (var(ANXf) = .1029).

Then, we would move to the second parameter being compared between groups (line 2 of Table 1) and put the variances of these variables in each group (on the lhs, var(INTm) = .1111 and var(INTf) = .0699 again; on the rhs, var(PTSDm) = .0429, var(PTSDf) = .0761) into the second item of the lists. The full code to do this is therefore:

lhsvar  <- list(c(.1111,.0699), # var(INTm) and var(INTf)
                  c(.1111,.0699), # var(INTm) and var(INTf)
                c(.1111,.0699), # var(INTm) and var(INTf)
                c(.1111,.0699), # var(INTm) and var(INTf)
                c(.1111,.0699), # var(INTm) and var(INTf)
                c(.1111,.0699)) # var(INTm) and var(INTf)

rhsvar <- list(c(.2072,.1029), #var(ANXm) and var(ANXF)
               c(.0429,.0761), #var(PTSDm) and var(PTSDf)
                 c(.2072,.1020), #var(ANXm) and var(ANXF)
               c(.0429,.0761), #var(PTSDm) and var(PTSDf)
                 c(.0840,.0526), #var(MDDm) and var(MDDf) 
               c(.0840,.0526)) #var(MDDm) and var(MDDf)

Now, we have our four arguments! We can now calculate localSRMD using the following code (the full equation for localSRMD is here (LINK TO PREPRINT), if you’re interested:

localSRMD(unconstrained = unconstrained, constrained = constrained, lhsvar = lhsvar, rhsvar = rhsvar)

## [1] "calculating localSRMD across h = 3 parameters in g = 2 groups (6 parameters total)"

## [1] 1.036005

Our localSRMD is 1.036. What does this mean?

Interpreting LocalSRMD

When working with localSRMD, you should prespecify a threshold value used to make a decision about the difference between groups. Just like how prespecifying a p-value threshold is necessary for a significance test, pre-specifying a localSRMD threshold allows you to decide whether the differences you find are substantial.

LocalSRMD is a standardized effect measure (interpretable like Cohen’s d), which means you can use your knowledge about the expected model values to pre-specify a threshold value. (Here’s a general primer on effect size interpretation).

For example, if the correlation between two variables in group 1 is around r = .15, and the correlation between those variables in a similarly-sized group 2 is around r = .45, the group-constrained correlation will be around r = .30, meaning that the localSRMD will be around 0.150 (the difference between .15 and .30 in group 1 and between .45 and .30 in group 2). Once you get a feel for localSRMD, you can try supplying the function with hypothetical parameter values that would constitute “large” or “small” group differences in the specific context of your research question and use this to guide your decision-making (see this pre-registration for an example). For this demonstration, we arbitrarily set our localSRMD threshold to 0.200. In setting this threshold, we are making explicit that we will consider localSRMD values less than .200 to be trivial and localSRMD values of .200 and greater to be nontrivial.

Because our localSRMD is greater than this threshold, the differences between the constrained and unconstrained parameters are nontrivial, so we infer that males and females exhibit substantial difference in factor loadings on INT

Incorporating χ2 tests

However, we typically need more information to draw a conclusion about group comparison. LocalSRMD provides an effect size for the difference in constrained and unconstrained parameters across groups, but it does not take into account that these parameters are measured with uncertainty. If your model has low statistical power, and thus quite a bit of estimation error, you may wind up with a large value of localSRMD that is entirely due to sampling variability. We therefore advise you use localSRMD in combination with a chi-square (χ2) nested model comparison test. The chi-square nested model comparison is a test of the exact equivalence of parameters across groups (if it is statistically significant, the constrained model fits significantly worse than the unconstrained model), but it provides no direct information about the magnitude of any detected differences. To conclude that the parameters differ to an appreciable extent, we generally take the perspective that localSRMD must be nontrivial AND χ2 must be significant, meaning that model parameters have a substantial average difference across groups AND we’re confident that this isn’t just noise.

To calculate a χ2 model comparison test requires two of the components from above:

First, you need the χ2 value (chisq) and degrees of freedom (df) of the unconstrained model.

freeresults$modelfit$chisq

## [1] 5.937282

freeresults$modelfit$df

## [1] 5

Second, you need the χ2 value and degrees of freedom from the constrained model.

consresults$modelfit$chisq

## [1] 25.48758

consresults$modelfit$df

## [1] 7

In R, you compare χ2 between two models using the pchisq() function, which takes as its first argument the difference in χ2 model fit, and as its second argument the difference in model degrees of freedom. It returns a p-value for this difference.

pchisq(consresults$modelfit$chisq - freeresults$modelfit$chisq, 
       consresults$modelfit$df - freeresults$modelfit$df, lower.tail = FALSE)

## [1] 5.684675e-05

#or, manually,
#pchisq(q = 25.488 - 5.937, df = 7 - 5, lower.tail = FALSE)

This χ2 test with 2 degrees of freedom has a value of 19.551, with a corresponding p-value of .000057. That’s less than p < .05 (the threshold we set for this tutorial). So, we conclude that the differences between internalizing factor loadings in males and females are substantial in terms of localSRMD and significant in terms of a χ2 model comparison test. That’s all the evidence we need to conclude the parameters differ between males and females.

Estimating Relaxed Models to Reduce LocalSRMD and χ2

If you’re interested in establishing invariance of factor loadings as part of Genomic Structural Invariance testing, and your results indicate that a group difference is substantial and significant, you may want to estimate a relaxed model where one or more of the constrained parameters is set free. Setting a loading free will reduce localSRMD and χ2 misfit, perhaps below the thresholds you set.

Which parameter should you free? A simple approach is to free the parameter that differs the most between groups in the unconstrained model. In this tutorial, that’s PTSD (see Table 1). Note here that this approach represents a data-driven decision and is thus susceptible to overfitting.

The code to estimate this relaxed model is:

cons2 <- "INTm =~ 1*MDDm + b*ANXm + PTSDm #removing the constraint on PTSDm
         INTf =~ 1*MDDf + b*ANXf + PTSDf #removing the constraint on PTSDf
         INTm ~~ INTf
         MDDm ~~ MDDf
          ANXm ~~ ANXf
         PTSDm ~~ PTSDf"

consresults2 <- usermodel(covstruc = covmat, model = cons2, std.lv = FALSE)

constrained2 <- consresults2$results[c(1:4,17,18),4] #extract factor loadings

localSRMD(unconstrained, constrained2, lhsvar, rhsvar) #rerun localSRMD()

## [1] "calculating localSRMD across h = 3 parameters in g = 2 groups (6 parameters total)"

## [1] 0.1624559

pchisq(consresults2$modelfit$chisq - freeresults$modelfit$chisq, 
       consresults2$modelfit$df - freeresults$modelfit$df, lower.tail = FALSE)

## [1] 0.4965017

Testing how this affects our model is quick and only requires replacing a single argument to localSRMD. LocalSRMD with this parameter freed is 0.162, which is now below our threshold. Additionally, the chi-square test for this difference is no longer significant (χ2(1) = 0.462, p = .496). So, this relaxed model indicates that group differences are no longer significant nor substantial, establishing that our factor loadings are partially invariant across groups. To wrap up this tutorial, we will summarize our findings as if we were writing a paper:

“Comparison of internalizing factor loadings between males and females indicated that they differed substantially (localSRMD = 1.036) and significantly (χ2(2) = 19.55, p = <.001). The unstandardized loading for PTSD differed most between males and females; freeing this parameter reduced localSRMD to 0.162, which was below our threshold of 0.200 for a meaningfully large difference. Using this relaxed model, we thus established partial invariance of this interalizing factor, affording meaningful between-sex comparisons of its associations.”

Tutorial: Calculating localSRMD and applying it to your own data

Ted Schwaba & Elliot M. Tucker-Drob

June 5, 2023