In this tutorial, we will guide you through the steps to run the
rgmodel function. This function estimates the matrix of
genetic correlations (R) and the corresponding matrix
of their sampling covariances (V_R) from the output of
the ldsc() function.
The S_Stand matrix and its associated sampling
covariance matrix (V_Stand) are output by
ldsc() when stand = TRUE is specified (default = FALSE).
S_Stand and V_Stand are versions of
the genetic covariance matrix (S) and its sampling covariance matrix
(V) that have both been standardized relative to the
heritabilities on the diagonal of S. The S_Stand matrix
is typically more interpretable than the S matrix, and
has the oftentimes desirable property of keeping the Z statistics and
p-values unchanged relative to those associated with the corresponding
elements of S.
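
The rescaling described above can be illustrated with a small sketch. The matrices below are made-up toy values, not ldsc() output, and the code mirrors the idea of the standardization rather than reproducing the ldsc() internals. It assumes that V is ordered like the lower triangle of S (diagonal included) in column-major order.

```r
# Toy genetic covariance matrix S (heritabilities on the diagonal).
# All values are illustrative only.
S <- matrix(c(0.30, 0.10, 0.05,
              0.10, 0.20, 0.08,
              0.05, 0.08, 0.40), nrow = 3, byrow = TRUE)

# Toy sampling covariance matrix V for the 6 unique elements of S.
# Assumed order: lower triangle of S, diagonal included, column-major.
# V is kept diagonal here purely for simplicity.
se <- c(0.030, 0.020, 0.025, 0.030, 0.020, 0.040)
V <- diag(se^2)

# Scaling factor for each element of S: 1 / sqrt(h2_i * h2_j)
ratio <- tcrossprod(1 / sqrt(diag(S)))
S_Stand <- S * ratio  # same result as cov2cor(S)

# Apply the same element-wise scaling to the sampling covariance matrix
scale_vec <- ratio[lower.tri(ratio, diag = TRUE)]
V_Stand <- V * tcrossprod(scale_vec)

# Z statistics before and after standardization
Z <- S[lower.tri(S, diag = TRUE)] / sqrt(diag(V))
Z_Stand <- S_Stand[lower.tri(S_Stand, diag = TRUE)] / sqrt(diag(V_Stand))

print(diag(S_Stand))          # standardized heritabilities
print(all.equal(Z, Z_Stand))  # Z statistics are unchanged by the rescaling
```

In this toy case the diagonal of S_Stand is 1.0 for every trait, while the corresponding diagonal entries of V_Stand remain nonzero because they are rescaled from the sampling variances of the heritabilities.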
We refer to S_Stand as the standardized genetic covariance matrix and treat it as conceptually distinct from the genetic correlation matrix. The key rationale for this distinction is that the sampling covariance matrix of S_Stand (i.e., V_Stand) is simply a rescaled version of V, and does not correspond to the sampling covariance matrix that the genetic correlation matrix would have had it been directly estimated across independent samples from the same population (or across jackknife blocks). This is apparent in that V_Stand contains sampling variances (squared standard errors) of the standardized heritabilities. The standardized heritabilities are always, by definition, 1.0, but V_Stand allows them to have nonzero standard errors to express uncertainty in the heritability estimates, rescaled to a standardized metric. In contrast, a true genetic correlation matrix should not have standard errors associated with its diagonal elements, which are guaranteed to be 1.0 across independent samples and jackknife blocks. The off-diagonals of the sampling covariance matrices of the standardized genetic covariance matrix and the genetic correlation matrix are also expected to differ, albeit for less intuitive reasons.
The rgmodel() function provides an automated means of estimating a
genetic correlation matrix (R) and its sampling
covariance matrix (V_R) by fitting a genetic
correlation model directly to ldsc() output. This model
automatically specifies a standardized genetic factor (with variance =
1.0) for each phenotype, sets the residual genetic variance of each GWAS
phenotype to 0, freely estimates all factor loadings, and allows all
factors to freely covary. Because the covariances of factors with fixed
variances of 1.0 are equivalent to correlations, this model provides
direct estimates of genetic correlations, and the sampling covariances
of the model parameters include the correct sampling covariances of the
genetic correlation matrix, which the function uses to construct
V_R.
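
To make the model specification concrete, the following sketch writes out lavaan-style syntax for a model of this form. The helper name build_rg_syntax and the F_ prefix for factor names are hypothetical, chosen only for illustration; rgmodel() builds its model internally and may specify it differently.

```r
# Illustrative helper (not part of GenomicSEM): build lavaan-style syntax
# for a genetic correlation model over a set of trait names.
build_rg_syntax <- function(traits) {
  factors <- paste0("F_", traits)

  # One factor per trait, with a freely estimated loading (NA* frees it)
  loadings <- paste0(factors, " =~ NA*", traits)

  # Factor variances fixed to 1.0 (standardized genetic factors)
  factor_var <- paste0(factors, " ~~ 1*", factors)

  # Residual genetic variance of each trait fixed to 0
  resid_var <- paste0(traits, " ~~ 0*", traits)

  # All factor covariances freely estimated; with unit variances these
  # covariances are correlations
  pairs <- combn(factors, 2)
  factor_cov <- paste0(pairs[1, ], " ~~ ", pairs[2, ])

  paste(c(loadings, factor_var, resid_var, factor_cov), collapse = "\n")
}

model <- build_rg_syntax(c("X1", "X2", "X3"))
cat(model, "\n")
```

For k traits this yields k loadings and k(k-1)/2 factor correlations as free parameters, which matches the k(k+1)/2 unique elements of S and is why the model is saturated (df = 0).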
rgmodel Function

The rgmodel function requires only the
LDSCoutput argument, which is an object containing the output
from the Genomic SEM multivariable LD Score regression
ldsc() function. For the examples showcased in this
tutorial, we will use the ldsc() output for two scenarios:
one with twenty-four traits (X1–X24) having low-to-moderate sample
overlap, and another with thirteen traits (X1–X13) having
moderate-to-high sample overlap.
You can download the ldsc() output for these examples
from the following links:
Let’s load the ldsc() output and run the
rgmodel() function:
# Example 1: Low-to-moderate sample overlap
load("rgmodel_LDSC_ex1.RData") # Load LDSC output
LDSCoutputRG1 <- rgmodel(LDSCoutput = LDSCoutput_ex1) # Run rgmodel function
## [1] "Running primary model"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed
## 20.5
## [1] "Model fit statistics are all printed as NA as you have specified a fully saturated model (i.e., df = 0)"
## [1] "The S matrix was smoothed prior to model estimation due to a non-positive definite matrix. The largest absolute difference in a cell between the smoothed and non-smoothed matrix was 0.00672444591817246 As a result of the smoothing, the largest Z-statistic change for the genetic covariances was 0.34108468844048 . We recommend setting the smooth_check argument to true if you are going to run a multivariate GWAS."
# Example 2: Moderate-to-high sample overlap
load("rgmodel_LDSC_ex2.RData") # Load LDSC output
LDSCoutputRG2 <- rgmodel(LDSCoutput = LDSCoutput_ex2) # Run rgmodel function
## [1] "Running primary model"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed
## 1.69
## [1] "Model fit statistics are all printed as NA as you have specified a fully saturated model (i.e., df = 0)"
## [1] "The S matrix was smoothed prior to model estimation due to a non-positive definite matrix. The largest absolute difference in a cell between the smoothed and non-smoothed matrix was 0.00541591900985086 As a result of the smoothing, the largest Z-statistic change for the genetic covariances was 0.213387672883552 . We recommend setting the smooth_check argument to true if you are going to run a multivariate GWAS."
rgmodel()

The rgmodel() function creates a copy of the original
ldsc object using the specified name (here LDSCoutputRG1
and LDSCoutputRG2) with R and
V_R added. Thus, the new LDSC object that we have
created includes:
- $V: Sampling covariance matrix in lavaan format.
- $S: Covariance matrix (on the liability scale for case/control designs).
- $I: Matrix of LDSC intercepts and cross-trait (bivariate) intercepts.
- $N: Sample sizes for heritabilities and \(\sqrt{N_1 N_2}\) for co-heritabilities.
- $m: Number of SNPs used to construct the LD score.
- $V_Stand: Sampling covariance matrix for standardized genetic covariances (V_Stand), if contained in the original ldsc() object provided. Note that this contains terms corresponding to the diagonal of S_Stand, which, though 1.0, are rescaled from freely estimated parameters.
- $S_Stand: Standardized genetic covariance matrix (S_Stand), if contained in the original ldsc() object provided.
- $R: A genetic correlation matrix.
- $V_R: Sampling covariance matrix of the genetic correlation matrix. Note that this does not contain terms corresponding to the diagonal of R, which contains fixed parameters.

In this section, we examine the relationship between genetic
correlations (R) estimated using the
rgmodel() function and standardized genetic covariances
(S_Stand) derived from the ldsc()
function. We use two distinct examples to illustrate this
correspondence, focusing on how sample overlap affects the
comparison.
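
One way to carry out such a comparison is to regress the off-diagonal elements of R on those of S_Stand and compute their correlation. The sketch below uses made-up toy matrices, and the helper name compare_offdiag is hypothetical; it is not necessarily how the figures in this tutorial were produced.

```r
# Illustrative helper: compare the off-diagonal elements of two
# symmetric matrices by regression and correlation.
compare_offdiag <- function(R, S_Stand) {
  r_vals <- R[lower.tri(R)]
  s_vals <- S_Stand[lower.tri(S_Stand)]
  fit <- lm(r_vals ~ s_vals)
  list(intercept   = unname(coef(fit)[1]),
       slope       = unname(coef(fit)[2]),
       correlation = cor(r_vals, s_vals))
}

# Toy standardized genetic covariance matrix (illustrative values only)
S_Stand_toy <- matrix(c(1.0, 0.5, 0.2, 0.1,
                        0.5, 1.0, 0.4, 0.3,
                        0.2, 0.4, 1.0, 0.6,
                        0.1, 0.3, 0.6, 1.0), nrow = 4, byrow = TRUE)

# Toy "genetic correlation" matrix: off-diagonals shrunk by 5%, diagonal 1.0
R_toy <- S_Stand_toy * 0.95
diag(R_toy) <- 1

res <- compare_offdiag(R_toy, S_Stand_toy)
print(res)
```

With real output, and assuming the object contains both elements, the same helper could be called as compare_offdiag(LDSCoutputRG1$R, LDSCoutputRG1$S_Stand).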
For this example, we analyze twenty-four traits (X1–X24) with low-to-moderate sample overlap. In the figure below, the solid blue line represents the regression of R on S_Stand with a slope \(b = 0.978\) (SE = 0.002), while the red dashed line represents the function \(y = x\) (i.e., a regression fixed to have intercept = 0 and slope = 1). We can observe close correspondence between R and S_Stand, with a correlation coefficient \(r = 0.999\) and scatter closely centered around the dashed red y = x line. Note that we would typically expect the R and S_Stand matrices to be identical. However, when the S matrix is non-positive definite (e.g., when it includes heritabilities greater than 1.0 or genetic correlations outside of [-1, 1]), it is smoothed to the nearest positive definite matrix prior to fitting the genetic correlation model, which can introduce small differences between the two.
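
Smoothing a non-positive definite matrix to the nearest positive definite matrix can be sketched with nearPD() from the Matrix package. This is an illustration on made-up toy values; whether rgmodel() uses the same call and settings internally is an assumption here.

```r
library(Matrix)

# Toy genetic covariance matrix in which the implied genetic correlation
# between traits 1 and 2 exceeds 1 (0.30 / sqrt(0.30 * 0.25) > 1).
# All values are illustrative only.
S <- matrix(c(0.30, 0.30, 0.05,
              0.30, 0.25, 0.04,
              0.05, 0.04, 0.20), nrow = 3, byrow = TRUE)

# A negative eigenvalue indicates that S is not positive definite
min_eig_before <- min(eigen(S, symmetric = TRUE)$values)

# Smooth to the nearest positive definite matrix on the covariance metric
S_smooth <- as.matrix(nearPD(S, corr = FALSE)$mat)
min_eig_after <- min(eigen(S_smooth, symmetric = TRUE)$values)

# Largest absolute change in any cell due to smoothing
max_diff <- max(abs(S_smooth - S))

print(c(before = min_eig_before, after = min_eig_after, max_diff = max_diff))
```

The largest absolute cell difference computed here is the same kind of quantity reported in the smoothing message printed by rgmodel().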
Further analysis of standard errors and Z statistics associated with R and S_Stand reveals:
Summary: The increased standard errors for
S_Stand in the presence of greater sample overlap
suggest that genetic correlations (R) estimated with
the rgmodel() function are generally more precise.
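
Standard errors for the two sets of estimates can be compared by taking square roots of the diagonals of V_R and V_Stand. The sketch below uses made-up toy matrices for three traits. It assumes that both matrices follow column-major lower-triangle ordering, with V_Stand including the diagonal (heritability) terms and V_R excluding them; the helper name offdiag_se_from_V_Stand is hypothetical.

```r
# Illustrative helper: standard errors of the off-diagonal elements of
# S_Stand, extracted from V_Stand for k traits.
# Assumes V_Stand is ordered like the lower triangle (diagonal included,
# column-major) of a k x k matrix.
offdiag_se_from_V_Stand <- function(V_Stand, k) {
  pos <- diag(k) == 1                          # TRUE on the diagonal
  is_diag <- pos[lower.tri(pos, diag = TRUE)]  # which entries are heritabilities
  sqrt(diag(V_Stand))[!is_diag]
}

k <- 3

# Toy sampling covariance matrices (diagonal only, illustrative values)
V_Stand_toy <- diag(c(0.10, 0.05, 0.06, 0.12, 0.07, 0.09)^2)  # k(k+1)/2 terms
V_R_toy     <- diag(c(0.045, 0.058, 0.066)^2)                 # k(k-1)/2 terms

se_stand <- offdiag_se_from_V_Stand(V_Stand_toy, k)
se_R     <- sqrt(diag(V_R_toy))

# Ratios above 1 indicate larger standard errors for S_Stand than for R
se_ratio <- se_stand / se_R
print(round(se_ratio, 3))
```

Before applying this to real output, it is worth confirming the ordering of V_R and V_Stand against the dimensions and trait order of R and S_Stand.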
In the context of the second example, which exhibits moderate-to-high sample overlap, we again observe a close correspondence between R and S_Stand with \(r = 0.999\) and scatter closely centered around the dashed red y=x line.
However, the increased sample overlap across trait pairs results in noticeable discrepancies between the standard errors and Z statistics for R and S_Stand:
Summary: As with the low-to-moderate sample overlap
scenario, the presence of moderate-to-high sample overlap leads to
larger standard errors for S_Stand compared to
R for traits with higher cross-trait intercepts (more
highly correlated estimation errors). Thus, genetic correlations
estimated using the rgmodel() function generally offer
greater precision and power.