In this tutorial, we will guide you through the steps to run the
QTrait function, a genetically informed method for
assessing the external validity of common factors in the genomic
space.
The QTrait function uses the QTrait statistic introduced by Grotzinger et al. (2022) as an omnibus statistical test of heterogeneity in the associations between an external correlate and the indicators of a factor. We use the word heterogeneous here to describe a situation in which the empirical pattern of associations with the indicators differs significantly from the pattern implied by a model in which the external correlate is related to the indicators only by way of its association with the factor (and, when applicable, any direct associations that are added in a more relaxed model). The QTrait statistic is based on a χ² difference test comparing two competing models: 1) the common pathway model, where the external correlate predicts only the common factor, and 2) the independent pathways model, where the external correlate predicts the individual indicator phenotypes that load on the common factor. The χ² difference between these two models is the QTrait statistic, and its degrees of freedom are equal to the difference in degrees of freedom between the models being compared (i.e., for a one-factor model, the total number of indicators minus 1).
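In symbols, the definition above can be summarized as follows. The subscripts CPM (common pathway model) and IPM (independent pathways model) and the symbol k for the number of indicators are notation introduced here for illustration only:

```latex
Q_{\text{Trait}} = \chi^2_{\text{CPM}} - \chi^2_{\text{IPM}}, \qquad
df_{Q} = df_{\text{CPM}} - df_{\text{IPM}} = k - 1 \quad \text{(one-factor model)}
```

With the seven indicators used in this tutorial, this gives 6 degrees of freedom for the common pathway model.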
In addition to the QTrait heterogeneity test statistic, the QTrait
function computes the local standardized root mean squared residual
(lSRMR), an effect size index of heterogeneity that quantifies the
magnitude of the discrepancies between the empirical Linkage
Disequilibrium Score Regression-derived genetic correlations between the
external correlate and the indicators of the factor and those implied
by the common pathway model. We consider lSRMR values greater than .10
and equal to or exceeding 25% of the root mean square genetic
correlation between the external trait and the individual indicators of
the factor to reflect non-negligible differences between the observed
and the model-implied genetic correlations, and thus meaningful
heterogeneity in the association between the factor and the external
correlate. These thresholds can be adjusted with the arguments
lsrmrthreshold and lsrmr, respectively (see below).
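As a rough illustration of the index described above, the sketch below computes a root mean squared residual from observed and model-implied genetic correlations and applies the two default thresholds. All correlation values are invented for illustration, and the formula used (square root of the mean squared residual) is an assumption about how lSRMR is defined, not a transcription of the package code.

```r
# Hypothetical observed (LDSC-derived) and model-implied genetic correlations
# between an external correlate and four indicators (illustrative values only)
rg_observed <- c(0.55, 0.30, 0.20, 0.45)
rg_implied  <- c(0.35, 0.32, 0.25, 0.33)

# Residual genetic correlations (observed minus implied)
resid <- rg_observed - rg_implied

# Assumed definition of lSRMR: root mean squared residual across indicators
lsrmr_value <- sqrt(mean(resid^2))

# Root mean square of the observed genetic correlations
rms_rg <- sqrt(mean(rg_observed^2))

# Default thresholds described in the text
abs_threshold <- 0.10  # absolute threshold
rel_threshold <- 0.25  # proportion of the root mean square genetic correlation

# lSRMR is treated as meaningful only if it passes both thresholds
meaningful <- lsrmr_value > abs_threshold &&
  lsrmr_value >= rel_threshold * rms_rg

print(c(lSRMR = lsrmr_value, RMS_rg = rms_rg))
print(meaningful)
```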
We apply a three-criteria test to assess heterogeneity in the associations between the external correlate and the individual indicator phenotypes. Specifically, we require that:
1. The QTrait heterogeneity index is statistically significant at the
Bonferroni-corrected p-value threshold.
2. The lSRMR effect size heterogeneity index exceeds predefined
thresholds, with the default criteria being:
a. lSRMR greater than 0.10, and
b. lSRMR equal to or exceeding 25% of the root mean square genetic
correlation between the external correlate and the indicators.
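The decision rule can be sketched with the values that this tutorial reports for Educational Attainment. This is a minimal sketch, not the package's implementation: the variable names are hypothetical, and the Bonferroni threshold is assumed to be 0.05 divided by the number of external correlates tested.

```r
# Values reported for Educational Attainment in this tutorial
q_stat   <- 1930.884   # QTrait statistic, common pathway model
q_df     <- 6          # degrees of freedom
lsrmr    <- 0.1937131  # lSRMR, common pathway model
rms_rg   <- 0.41       # root mean square genetic correlation reported in the text
n_traits <- 1          # number of external correlates tested

# Criterion 1: QTrait significant at a Bonferroni-corrected threshold
# (assumption: alpha = 0.05 divided by the number of external correlates)
p_q    <- pchisq(q_stat, df = q_df, lower.tail = FALSE)
crit_q <- p_q < 0.05 / n_traits

# Criterion 2: lSRMR above the absolute threshold
crit_abs <- lsrmr > 0.10

# Criterion 3: lSRMR at or above 25% of the root mean square genetic correlation
crit_rel <- lsrmr >= 0.25 * rms_rg

# Heterogeneity is flagged only when all three criteria hold
heterogeneous <- crit_q && crit_abs && crit_rel
print(heterogeneous)
```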
For external correlates exhibiting heterogeneous associations with the common factor, as identified by the three-criteria test, the QTrait function identifies specific outlier indicator traits whose genetic associations with the external correlate deviate from the common pathway model.
An indicator phenotype is classified as an outlier if:
1. Its residual genetic correlation exceeds 0.10 (this threshold can be
adjusted using the mresidthreshold argument).
2. Its residual correlation with the external correlate is more than
25% of the root mean square genetic correlation across the indicator
phenotypes and the external correlate (this threshold can be adjusted
using the mresid argument).
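The outlier rule can be sketched as follows. The indicator names and all numeric values are hypothetical. The sketch makes two assumptions that the text does not state explicitly: that an indicator must satisfy both conditions to be flagged, and that residuals are compared in absolute value.

```r
# Hypothetical residual genetic correlations (observed minus model-implied)
resid <- c(ind_A = 0.30, ind_B = -0.04, ind_C = 0.12, ind_D = 0.02)

# Hypothetical observed genetic correlations with the external correlate
rg_observed <- c(ind_A = 0.60, ind_B = 0.30, ind_C = 0.40, ind_D = 0.35)

mresidthreshold <- 0.10  # absolute threshold (default)
mresid          <- 0.25  # relative threshold (default)

# Root mean square genetic correlation across the indicators
rms_rg <- sqrt(mean(rg_observed^2))

# Assumption: both conditions must hold, using absolute residuals
is_outlier <- abs(resid) > mresidthreshold & abs(resid) > mresid * rms_rg

# Flagged indicators, ordered from most to least extreme residual
outliers <- names(sort(abs(resid[is_outlier]), decreasing = TRUE))
print(outliers)
```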
When outliers are detected, the function automatically fits a series of follow-up models, sequentially introducing unconstrained direct paths from the external correlate to the most extreme outliers.
This process continues iteratively until either:
1. The model is saturated (i.e., df = 0), or
2. No further heterogeneity is detected by the three-criteria test.
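The saturation condition can be illustrated with simple degrees-of-freedom bookkeeping. This is a sketch that assumes each unconstrained direct path uses up exactly one degree of freedom. That assumption is consistent with the example output in this tutorial, where two unconstrained paths reduce the degrees of freedom from 6 to 4.

```r
# Seven indicators of the g factor, as in this tutorial
indicators <- c("Matrix", "Memory", "RT", "Symbol_Digit", "TMTB", "Tower", "VNR")

# Degrees of freedom of QTrait in the common pathway model: indicators minus 1
df_cpm <- length(indicators) - 1

# Assumption: each unconstrained direct path removes one degree of freedom
df_after <- function(n_paths) df_cpm - n_paths

# Remaining df after freeing 0, 1, 2, ... paths; the model is saturated at df = 0
df_path <- sapply(0:df_cpm, df_after)
print(df_path)
```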
QTrait Function
The QTrait function requires the following arguments:
LDSCoutput: An object containing the output from the Genomic SEM
multivariable LD Score regression ldsc() function.
indicators: A character vector specifying the names of the indicator
traits that define the common factor. These names must match the
corresponding trait names in the ldsc() output. Note that only a single
common factor model is currently supported; the QTrait function does not
currently support multifactor models.
traits: A character vector specifying the external correlates for which
heterogeneity indices and genetic correlations with the common factor
(defined by indicators) will be computed. These names must match the
corresponding trait names in the ldsc() output.
mresid: Relative residual threshold used to identify outlier indicator
traits, expressed as a proportion of the root mean square genetic
correlation between the external correlate and the indicator traits
(default = 0.25).
mresidthreshold: Absolute threshold for identifying meaningful residual
genetic correlations (default = 0.10).
lsrmr: The threshold for determining whether lSRMR is meaningful,
expressed as a proportion of the root mean square genetic correlation
between the external correlate and the indicator traits (default = 0.25,
i.e., 25%).
lsrmrthreshold: Threshold for determining whether lSRMR is meaningful in
absolute value (default = 0.10).
save.plots: If TRUE, generates and saves scatterplots of the associations
between the external correlate and the factor indicators against the
factor loadings of the indicators on the common factor (default = TRUE).
stdout: If TRUE, uses standardized output (i.e., genetic correlations
against standardized factor loadings) for the scatterplots (default =
TRUE). If stdout = FALSE, the scatterplots display genetic covariances
against unstandardized factor loadings.
For the example showcased in this tutorial, we will use the
ldsc() output with the seven cognitive traits
loading on a genetic factor representing general cognitive
function (i.e., g factor), and the external correlate
Educational Attainment.
You can download the ldsc() output for this example here:
Let’s load the ldsc() output and run the
QTrait() function:
# Load LDSC output
load("LDSC_G_factor_QTRAIT_tutorial.RData")
# Define the names of the indicator traits loading on the genetic g factor
indicators <- c("Matrix","Memory","RT","Symbol_Digit","TMTB","Tower","VNR")
# Define the name of the external correlate EA (Educational Attainment)
traits <- "EA"
# Run QTrait function
qtrait_out <- QTrait(LDSCoutput = "LDSC_G_factor_QTRAIT_tutorial.RData",
                     indicators = indicators, traits = traits,
                     mresid = .25, mresidthreshold = .10,
                     lsrmr = .25, lsrmrthreshold = .10,
                     save.plots = TRUE, stdout = TRUE)
QTrait() output
print(qtrait_out)
## rGF1Trait_CPM SErGF1Trait_CPM pvalrGF1Trait_CPM rGF1Trait_significat_CPM
## EA 0.4382612 0.02344554 5.673645e-78 *
## QTrait_CPM df_CPM p_value_CPM Qsignificant_CPM lSRMR_CPM
## EA 1930.884 6 0 * 0.1937131
## lSRMR_above_threshold_CPM heterogeneity_CPM rGF1Trait_FUM SErGF1Trait_FUM
## EA Yes Yes 0.2976921 0.02411617
## pvalrGF1Trait_FUM rGF1Trait_significat_FUM QTrait_FUM df_FUM p_value_FUM
## EA 5.242638e-35 * 37.03478 4 1.77183e-07
## Qsignificant_FUM lSRMR_FUM lSRMR_above_threshold_FUM heterogeneity_FUM
## EA * 0.05802758 No No
## Unconstrained_paths
## EA VNR,Matrix
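As a quick sanity check on the printed output, the p-values for the follow-up model can be approximately recomputed from the printed statistics. This sketch assumes that the QTrait p-value is the upper-tail probability of a χ² distribution and that the p-value for the genetic correlation comes from a two-sided Wald z test (estimate divided by standard error). Both are assumptions about the package internals, although the recomputed values agree closely with the printed ones.

```r
# Values copied from the printed output for the follow-up model
q_fum  <- 37.03478
df_fum <- 4
rg_fum <- 0.2976921
se_fum <- 0.02411617

# Upper-tail probability of a chi-square distribution with df_fum degrees of freedom
p_fum <- pchisq(q_fum, df = df_fum, lower.tail = FALSE)

# Two-sided p-value from a Wald z statistic (assumption: estimate / standard error)
z_fum <- rg_fum / se_fum
p_rg  <- 2 * pnorm(abs(z_fum), lower.tail = FALSE)

# Compare against p_value_FUM and pvalrGF1Trait_FUM in the printed output
print(c(p_value_FUM = p_fum, pvalrGF1Trait_FUM = p_rg))
```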
The QTrait() function returns a data frame with the
following columns:
rGF1Trait_CPM: Genetic correlation between the external correlate (i.e.,
Educational Attainment) and the common factor (i.e., the genetic g
factor) in the common pathway model.
SErGF1Trait_CPM: Standard error of the genetic correlation.
pvalrGF1Trait_CPM: p-value for the genetic correlation estimate.
rGF1Trait_significant_CPM: Bonferroni-corrected significance of the
genetic correlation (* = statistically significant). This column is
printed as rGF1Trait_significat_CPM in the output above.
QTrait_CPM: QTrait heterogeneity statistic, derived from a χ² difference
test comparing:
1. The common pathway model, where the external correlate predicts only
the common factor.
2. The independent pathways model, where the external correlate predicts
the individual indicator phenotypes.
A significant QTrait_CPM suggests that the relationship between the
external correlate and the indicator traits cannot be fully explained by
the common factor, implying the presence of more specific genetic
pathways.
df_CPM: Degrees of freedom for QTrait_CPM.
p_value_CPM: p-value of QTrait_CPM.
Qsignificant_CPM: Bonferroni-corrected significance of QTrait_CPM (* =
statistically significant).
lSRMR_CPM: Local Standardized Root Mean Squared Residual (lSRMR), an
index quantifying discrepancies between observed genetic correlations
(LDSC-derived) and those implied by the common pathway model.
lSRMR_above_threshold_CPM: Indicates whether lSRMR_CPM is equal to or
exceeds 25% of the root mean square genetic correlation between the
external correlate and the individual phenotypes and is greater than
0.10. These thresholds are defined by the arguments lsrmr (default =
0.25) and lsrmrthreshold (default = 0.10).
heterogeneity_CPM: Indicates the presence of statistically significant
(QTrait_CPM) and meaningful (lSRMR_CPM above threshold) heterogeneity in
the common pathway model.
rGF1Trait_FUM: Genetic correlation between the external correlate and
the common factor in the follow-up model, which allows unconstrained
direct paths to identified outlier indicator traits (see
Unconstrained_paths).
SErGF1Trait_FUM: Standard error of rGF1Trait_FUM.
pvalrGF1Trait_FUM: p-value for rGF1Trait_FUM.
rGF1Trait_significant_FUM: Bonferroni-corrected significance of
rGF1Trait_FUM. This column is printed as rGF1Trait_significat_FUM in the
output above.
QTrait_FUM: QTrait heterogeneity statistic in the follow-up model.
df_FUM: Degrees of freedom for QTrait_FUM.
p_value_FUM: p-value of QTrait_FUM.
Qsignificant_FUM: Bonferroni-corrected significance of QTrait_FUM.
lSRMR_FUM: Local Standardized Root Mean Squared Residual (lSRMR) in the
follow-up model.
lSRMR_above_threshold_FUM: Indicates whether lSRMR_FUM is equal to or
exceeds 25% of the root mean square genetic correlation between the
external correlate and the individual phenotypes and is greater than
0.10. These thresholds are defined by the arguments lsrmr (default =
0.25) and lsrmrthreshold (default = 0.10).
heterogeneity_FUM: Indicates the presence of statistically significant
(QTrait_FUM) and meaningful (lSRMR_FUM above threshold) heterogeneity in
the follow-up model.
Unconstrained_paths: Indicator traits identified as outliers, which
receive unconstrained direct paths from the external correlate in the
follow-up model.
In this example, we examined the genetic relationship between the
genetic g factor and educational attainment,
which showed a significant genetic correlation in the common pathway
model (rG = 0.44, SE = 0.02, p < 0.001). The
heterogeneity indices revealed significant and meaningful
heterogeneity in the association between educational attainment
and the individual indicators of the genetic g factor, as indicated by
the QTrait statistic (QTrait(6) = 1930.88, p <
0.001) and a meaningful lSRMR (i.e., greater than the absolute threshold
of 0.10 and exceeding 25% of the root mean square genetic correlation
between the external correlate and the individual cognitive phenotypes;
root mean square genetic correlation = 0.41, 25% of root mean square
genetic correlation = 0.10, lSRMR = 0.19).
The outlier detection method identified Verbal Numerical
Reasoning (VNR) and Matrix Pattern Recognition
(Matrix) as cognitive traits that deviated from expectations
under the common pathway model, both displaying stronger-than-expected
positive genetic associations with educational
attainment. Although the QTrait statistic remained statistically
significant in the follow-up model, which allowed direct paths from
educational attainment to these outlier traits (QTrait(4) = 37.03, p
< 0.001), the lSRMR index did not exceed the predefined
thresholds, suggesting that the remaining heterogeneity was not
substantial in terms of effect size (lSRMR = 0.06). While
the genetic correlation between educational attainment and the g factor
was reduced from 0.44 to 0.30 (SE = 0.02) in the follow-up model, it
remained statistically significant (p = 5.24e-35).
The plot below presents:
1. The observed, model-implied (common pathway model), and residual
genetic correlations between the seven indicators of the genetic g
factor and educational attainment (top).
2. A scatterplot of genetic correlations between educational
attainment and the seven indicator traits of the genetic g
factor against their respective standardized factor loadings on the
genetic g factor (bottom). The size of the dots corresponds to
the inverse of the variance of the unstandardized beta coefficient.
3. The model-implied association between the external correlate
(educational attainment) and the factor
(general cognitive function) from the common pathway
model and the unconstrained models, represented as the slope of the
regression line.
Notably, the common pathway model fails to adequately capture the associations between educational attainment and the VNR and Matrix tests, both of which exhibit meaningful residuals (0.37 and 0.29, respectively). If the common pathway model were sufficient to account for the genetic correlations between the external correlate and the indicator traits, we would expect these correlations to scale proportionally with the factor loadings of the indicators on the common factor. However, the scatterplot reveals that VNR and Matrix deviate substantially from these expectations, showing stronger-than-expected positive genetic correlations with educational attainment. The common pathway model estimates the genetic association between EA and general cognitive function at .439, but after allowing for unconstrained direct paths from EA to VNR and Matrix, the association drops to .291. This underscores the inadequacy of the common pathway model for capturing the full complexity of the relationships between educational attainment and cognitive traits. A multifactor model may be worth considering.
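The proportional-scaling expectation described above can be sketched with invented numbers. Under a standardized common pathway model, the implied genetic correlation between each indicator and the external correlate is the indicator's loading multiplied by the factor-level correlation. The loadings, correlations, and indicator names below are hypothetical, and fitting a regression line through the origin is an assumption about how the plotted slope relates to the model, not a description of the package's plotting code.

```r
# Hypothetical standardized loadings of four indicators on the common factor
loadings <- c(ind_A = 0.80, ind_B = 0.60, ind_C = 0.70, ind_D = 0.50)

# Hypothetical genetic correlation between the external correlate and the factor
r_factor <- 0.40

# Implied genetic correlations under the common pathway model
rg_implied <- loadings * r_factor

# Hypothetical observed genetic correlations; ind_A is made to deviate upward
rg_observed <- c(ind_A = 0.60, ind_B = 0.25, ind_C = 0.27, ind_D = 0.21)

# Residuals: a large positive value marks a stronger-than-expected association
resid <- rg_observed - rg_implied

# Slope of a line through the origin fitted to the observed correlations
# (assumption: this mimics the regression line shown in the scatterplot)
slope_fit <- coef(lm(rg_observed ~ 0 + loadings))[["loadings"]]

print(round(resid, 2))
print(slope_fit)
```

In this toy example, the single deviating indicator pulls the fitted slope above the factor-level correlation used to generate the implied values, which mirrors how outlier indicators can inflate the common pathway estimate.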