In this tutorial, we will guide you through the steps to run the QTrait function, a genetically informed method for assessing the external validity of common factors in the genomic space.

Overview of the QTrait function

QTrait statistic: an omnibus inferential test of heterogeneity

The QTrait function uses the QTrait statistic introduced by Grotzinger et al. (2022) as an omnibus statistical test of heterogeneity in the associations between an external correlate and the indicators of a factor. We use the word heterogeneous to refer to a circumstance in which the empirical pattern of associations with the indicators differs significantly from that implied by a model in which the external correlate is related to the indicators by way of its association with the factor (and, when applicable, any direct associations that are added in a more relaxed model). QTrait is based on a χ² difference test comparing two competing models: 1) the common pathway model, where the external correlate predicts only the common factor, and 2) the independent pathways model, where the external correlate predicts the individual indicator phenotypes that load on the common factor. The χ² difference between these two models is the QTrait statistic, and its degrees of freedom equal the difference in degrees of freedom between the models being compared (i.e., for a one-factor model, the total number of indicators minus 1).
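As a rough illustration of how a χ² statistic and its degrees of freedom translate into a p-value, the sketch below uses only base R. The QTrait values are copied from the example output shown later in this tutorial; the variable names are illustrative and are not part of the QTrait function.

```r
# Number of indicators of the single common factor (seven in this tutorial's example)
n_indicators <- 7

# Degrees of freedom of the QTrait test for a one-factor model
df_qtrait <- n_indicators - 1

# QTrait values copied from the example output shown later in this tutorial
q_cpm <- 1930.884  # common pathway model, df = 6
q_fum <- 37.03478  # follow-up model with two direct paths, df = 4

# Upper-tail chi-square probabilities give the p-values of the tests
p_cpm <- pchisq(q_cpm, df = df_qtrait, lower.tail = FALSE)
p_fum <- pchisq(q_fum, df = 4, lower.tail = FALSE)

print(c(p_cpm = p_cpm, p_fum = p_fum))
```

The first p-value is too small to be represented as a double and prints as 0, which matches the p_value_CPM column in the example output.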

Local Standardized Root Mean Squared Residual as an effect size measure of heterogeneity

In addition to the QTrait heterogeneity test statistic, the QTrait function computes the local standardized root mean squared residual (lSRMR), an effect size index of heterogeneity. It quantifies the magnitude of the discrepancies between the empirical genetic correlations of the external correlate with the indicators of the factor, as derived from Linkage Disequilibrium Score Regression, and those implied by the common pathway model. We consider lSRMR values greater than .10 and equal to or exceeding 25% of the root mean square genetic correlation between the external trait and the individual indicators of the factor to reflect non-negligible differences between the observed and the model-implied genetic correlations, and thus meaningful heterogeneity in the association between the factor and the external correlate. These thresholds can be adjusted with the arguments lsrmr and lsrmrthreshold (see below).
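A minimal sketch of the idea behind lSRMR is given below. It assumes the index is the square root of the mean squared difference between observed and model-implied genetic correlations, which is one plausible reading of the description above and may differ in detail from what the QTrait function computes internally. All numbers are made up for illustration.

```r
# Hypothetical observed (LDSC-derived) and model-implied genetic correlations
# between one external correlate and four indicators; values are made up
observed <- c(0.60, 0.30, 0.20, 0.40)
implied  <- c(0.35, 0.32, 0.25, 0.38)

# Assumed definition: square root of the mean squared residual
residual_rg <- observed - implied
lsrmr_value <- sqrt(mean(residual_rg^2))

# Root mean square of the observed genetic correlations
rms_rg <- sqrt(mean(observed^2))

# Default thresholds described above: absolute (0.10) and relative (25% of rms_rg)
meaningful <- lsrmr_value > 0.10 && lsrmr_value > 0.25 * rms_rg

print(round(c(lSRMR = lsrmr_value, rms_rg = rms_rg), 3))
print(meaningful)
```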

Three-criteria test of heterogeneity

We apply a three-criteria test to assess heterogeneity in the associations between the external correlate and the individual indicator phenotypes. Specifically, we require that:

  1. The genetic correlation between the external correlate and the common factor is statistically significant at a Bonferroni-corrected p-value threshold. The Bonferroni correction is based on the number of external correlates for which the function independently computes the heterogeneity indices.
  2. The QTrait heterogeneity index is statistically significant at the Bonferroni-corrected p-value threshold.
  3. The lSRMR effect size heterogeneity index exceeds predefined thresholds, with the default criteria being:
    • lSRMR > 0.10
    • lSRMR greater than 25% of the root mean square genetic correlation between the external correlate and the indicator phenotypes that load on the factor.
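
The three criteria can be sketched as a small decision function. `three_criteria()` is a hypothetical helper written for this illustration and is not part of GenomicSEM. The family-wise alpha of 0.05 is an assumption, since the tutorial does not state the value used. The inputs are taken from the example output and text later in this tutorial, with the root mean square genetic correlation rounded to 0.41.

```r
# Hypothetical helper illustrating the three-criteria test (not a GenomicSEM function)
three_criteria <- function(p_rg, p_q, lsrmr_value, rms_rg, n_traits,
                           alpha = 0.05,          # assumed family-wise alpha
                           lsrmr = 0.25, lsrmrthreshold = 0.10) {
  bonferroni <- alpha / n_traits                  # corrected p-value threshold
  c1 <- p_rg < bonferroni                         # 1. factor-trait rg is significant
  c2 <- p_q < bonferroni                          # 2. QTrait is significant
  c3 <- lsrmr_value > lsrmrthreshold &&           # 3. lSRMR exceeds both thresholds
    lsrmr_value > lsrmr * rms_rg
  c(rg_significant = c1, q_significant = c2,
    lsrmr_meaningful = c3, heterogeneity = c1 && c2 && c3)
}

# Common pathway model and follow-up model for Educational Attainment
cpm <- three_criteria(5.673645e-78, 0, 0.1937131, rms_rg = 0.41, n_traits = 1)
fum <- three_criteria(5.242638e-35, 1.77183e-07, 0.05802758, rms_rg = 0.41, n_traits = 1)

print(rbind(CPM = cpm, FUM = fum))
```

In the follow-up model QTrait is still significant, but lSRMR falls below the thresholds, so the overall heterogeneity flag is FALSE.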

Outlier detection and follow-up models

For external correlates exhibiting heterogeneous associations with the common factor, as identified by the three-criteria test, the QTrait function identifies specific outlier indicator traits whose genetic associations with the external correlate deviate from the common pathway model.

An indicator phenotype is classified as an outlier if:

  • Its residual genetic correlation exceeds 0.10 (this threshold can be adjusted using the mresidthreshold argument).

  • Its residual correlation with the external correlate is more than 25% of the root mean square genetic correlation across the indicator phenotypes and the external correlate (this threshold can be adjusted using the mresid argument).
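
The outlier rule can be illustrated as follows. The sketch assumes that both criteria must hold and that they are applied to the absolute value of each residual; neither detail is stated explicitly above. The residuals for VNR and Matrix are the ones reported later in this tutorial, while the other five values are hypothetical placeholders.

```r
# Residual genetic correlations (observed minus implied by the common pathway model).
# VNR and Matrix come from this tutorial's example; the other values are hypothetical.
residual_rg <- c(Matrix = 0.29, Memory = -0.03, RT = 0.02, Symbol_Digit = -0.04,
                 TMTB = 0.01, Tower = -0.02, VNR = 0.37)

rms_rg <- 0.41           # root mean square genetic correlation reported in the example
mresid <- 0.25           # relative threshold (default)
mresidthreshold <- 0.10  # absolute threshold (default)

# Assumption: an outlier must exceed both thresholds in absolute value
is_outlier <- abs(residual_rg) > mresidthreshold &
  abs(residual_rg) > mresid * rms_rg

# Order the flagged indicators from most to least extreme
outliers <- names(residual_rg)[is_outlier]
outliers <- outliers[order(abs(residual_rg[outliers]), decreasing = TRUE)]

print(outliers)
```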

When outliers are detected, the function automatically fits a series of follow-up models, sequentially introducing unconstrained direct paths from the external correlate to the most extreme outliers.

This process continues iteratively until:

  1. The model is saturated (i.e., df = 0)

  2. No further significant heterogeneity is detected


Running the QTrait Function

Required and Optional Arguments of QTrait

The QTrait function requires the following arguments:

  • LDSCoutput: An object containing the output from the Genomic SEM multivariable LD Score regression ldsc() function.
  • indicators: A character vector specifying the names of the indicator traits that define the common factor. These names must match the corresponding trait names in the ldsc() output. Note that only a single common factor model is currently supported; multifactor models are not.
  • traits: A character vector specifying the external correlates for which heterogeneity indices and genetic correlations with the common factor (defined by indicators) will be computed. These names must match the corresponding trait names in the ldsc() output.

Optional Arguments

  • mresid: Relative residual threshold to identify outlier indicator traits. The proportion of the root mean square genetic correlation between the external correlate and the indicator traits used for outlier detection (default = 0.25).
  • mresidthreshold: Absolute threshold for identifying meaningful residual genetic correlations (default = 0.10).
  • lsrmr: The threshold for determining whether lSRMR is meaningful, expressed as a proportion of the root mean square genetic correlation between the external correlate and the indicator traits (default = 0.25).
  • lsrmrthreshold: Threshold to determine whether lSRMR is meaningful in absolute value (default = 0.10).
  • save.plots: If TRUE, generates and saves scatterplots of associations between the external correlate and the factor indicators against the factor loadings of the indicators on the common factor (default = TRUE).
  • stdout: If TRUE, uses standardized output (i.e., genetic correlations against standardized factor loadings) for the scatter plots (default = TRUE). If stdout = FALSE, the scatterplots display genetic covariances against unstandardized factor loadings.

Example: Cognitive Traits and Educational Attainment

For the example showcased in this tutorial, we will use the ldsc() output with the seven cognitive traits loading on a genetic factor representing general cognitive function (i.e., g factor), and the external correlate Educational Attainment.

You can download the ldsc() output for this example here:

Let’s load the ldsc() output and run the QTrait() function:

# Load LDSC output
load("LDSC_G_factor_QTRAIT_tutorial.RData")

# Define the names of the indicator traits loading on the genetic g factor
indicators <- c("Matrix","Memory","RT","Symbol_Digit","TMTB","Tower","VNR")

# Define the name of the external correlate EA (Educational Attainment)
traits <- "EA"

# Run QTrait function
qtrait_out <- QTrait(LDSCoutput = "LDSC_G_factor_QTRAIT_tutorial.RData",
                     indicators = indicators, traits = traits,
                     mresid = .25, mresidthreshold = .10,
                     lsrmr = .25, lsrmrthreshold = .10,
                     save.plots = TRUE, stdout = TRUE)

Output Objects of QTrait()

print(qtrait_out)
##    rGF1Trait_CPM SErGF1Trait_CPM pvalrGF1Trait_CPM rGF1Trait_significat_CPM
## EA     0.4382612      0.02344554      5.673645e-78                        *
##    QTrait_CPM df_CPM p_value_CPM Qsignificant_CPM lSRMR_CPM
## EA   1930.884      6           0                * 0.1937131
##    lSRMR_above_threshold_CPM heterogeneity_CPM rGF1Trait_FUM SErGF1Trait_FUM
## EA                       Yes               Yes     0.2976921      0.02411617
##    pvalrGF1Trait_FUM rGF1Trait_significat_FUM QTrait_FUM df_FUM p_value_FUM
## EA      5.242638e-35                        *   37.03478      4 1.77183e-07
##    Qsignificant_FUM  lSRMR_FUM lSRMR_above_threshold_FUM heterogeneity_FUM
## EA                * 0.05802758                        No                No
##    Unconstrained_paths
## EA          VNR,Matrix

The QTrait() function returns a data frame with the following columns:

Genetic Correlation Estimates (Common Pathway Model)

  • rGF1Trait_CPM: Genetic correlation between the external correlate (i.e., Educational Attainment) and the common factor (i.e., the genetic g factor).
  • SErGF1Trait_CPM: Standard error of the genetic correlation.
  • pvalrGF1Trait_CPM: p-value for the genetic correlation estimate.
  • rGF1Trait_significat_CPM: Bonferroni-corrected significance of the genetic correlation (* = statistically significant).

Heterogeneity Indices (Common Pathway Model)

  • QTrait_CPM: QTrait heterogeneity statistic, derived from a χ² difference test comparing:
    1. The common pathway model, where the external correlate predicts only the common factor.
    2. The independent pathways model, where the external correlate directly predicts individual indicator phenotypes.
      A significant QTrait_CPM suggests that the relationship between the external correlate and the indicator traits cannot be fully explained by the common factor, implying the presence of more specific genetic pathways.
  • df_CPM: Degrees of freedom for QTrait_CPM.
  • p_value_CPM: p-value of QTrait_CPM.
  • Qsignificant_CPM: Bonferroni-corrected significance of QTrait_CPM (* = statistically significant).
  • lSRMR_CPM: Local Standardized Root Mean Squared Residual (lSRMR), an index quantifying discrepancies between observed genetic correlations (LDSC-derived) and those implied by the common pathway model.
  • lSRMR_above_threshold_CPM: Indicates whether lSRMR_CPM exceeds 25% of the root mean square genetic correlation between the external correlate and individual phenotypes and is also greater than 0.10. These thresholds are defined by the arguments lsrmr (default = 0.25) and lsrmrthreshold (default = 0.10).
  • heterogeneity_CPM: Indicates the presence of statistically significant (QTrait_CPM) and meaningful (lSRMR_CPM above threshold) heterogeneity in the common pathway model.

Genetic Correlation Estimates (Follow-Up Model)

  • rGF1Trait_FUM: Genetic correlation between the external correlate and the common factor in the follow-up model, which allows unconstrained direct paths to identified outlier indicator traits (see Unconstrained_paths).
  • SErGF1Trait_FUM: Standard error of rGF1Trait_FUM.
  • pvalrGF1Trait_FUM: p-value for rGF1Trait_FUM.
  • rGF1Trait_significat_FUM: Bonferroni-corrected significance of rGF1Trait_FUM.

Heterogeneity Indices (Follow-Up Model)

  • QTrait_FUM: QTrait heterogeneity statistic in the follow-up model.
  • df_FUM: Degrees of freedom for QTrait_FUM.
  • p_value_FUM: p-value of QTrait_FUM.
  • Qsignificant_FUM: Bonferroni-corrected significance of QTrait_FUM.
  • lSRMR_FUM: Local Standardized Root Mean Squared Residual (lSRMR) in the follow-up model.
  • lSRMR_above_threshold_FUM: Indicates whether lSRMR_FUM exceeds 25% of the root mean square genetic correlation between the external correlate and individual phenotypes and is also greater than 0.10. These thresholds are defined by the arguments lsrmr (default = 0.25) and lsrmrthreshold (default = 0.10).
  • heterogeneity_FUM: Indicates the presence of statistically significant (QTrait_FUM) and meaningful (lSRMR_FUM above threshold) heterogeneity in the follow-up model.

Outlier Identification

  • Unconstrained_paths: Indicator traits identified as outliers with unconstrained direct paths from the external correlate in the follow-up model.
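
The columns described above can be inspected with ordinary data frame operations. The sketch below rebuilds a few columns of the printed output by hand so that it runs on its own; `qtrait_demo` is a stand-in for the real `qtrait_out` object, and the column types (character "Yes"/"No" flags and a comma-separated string of outliers) are assumed from how the output prints.

```r
# Stand-in for qtrait_out, rebuilt from the printed output above (types are assumed)
qtrait_demo <- data.frame(
  rGF1Trait_CPM = 0.4382612,
  rGF1Trait_FUM = 0.2976921,
  heterogeneity_CPM = "Yes",
  heterogeneity_FUM = "No",
  Unconstrained_paths = "VNR,Matrix",
  row.names = "EA",
  stringsAsFactors = FALSE
)

# External correlates flagged as heterogeneous under the common pathway model
flagged <- rownames(qtrait_demo)[qtrait_demo$heterogeneity_CPM == "Yes"]

# Split the comma-separated outliers into one character vector per external correlate
paths <- strsplit(qtrait_demo$Unconstrained_paths, ",", fixed = TRUE)
names(paths) <- rownames(qtrait_demo)

# Change in the factor-trait genetic correlation after freeing the direct paths
qtrait_demo$rg_change <- qtrait_demo$rGF1Trait_FUM - qtrait_demo$rGF1Trait_CPM

print(flagged)
print(paths)
print(qtrait_demo["EA", "rg_change"])
```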

In this example, we examined the genetic relationship between the genetic g factor and educational attainment, which showed a significant genetic correlation in the common pathway model (rG = 0.44, SE = 0.02, p < 0.001). The heterogeneity indices revealed significant and meaningful heterogeneity in the association between educational attainment and the individual indicators of the genetic g factor, as indicated by the QTrait statistic (QTrait(6) = 1930.88, p < 0.001) and a meaningful lSRMR (i.e., greater than the absolute threshold of 0.10 and exceeding 25% of the root mean square genetic correlation between the external correlate and the individual cognitive phenotypes; root mean square genetic correlation = 0.41, 25% of root mean square genetic correlation = 0.10, lSRMR = 0.19).

The outlier detection method identified Verbal Numerical Reasoning (VNR) and Matrix Pattern Recognition (Matrix) as cognitive traits that deviated from expectations under the common pathway model, both displaying stronger-than-expected positive genetic associations with educational attainment. In the follow-up model, which allowed direct paths from educational attainment to these outlier traits, the QTrait statistic remained statistically significant (QTrait(4) = 37.03, p < 0.001), but the lSRMR index did not exceed the predefined thresholds (lSRMR = 0.06), suggesting that the remaining heterogeneity was not substantial in terms of effect size. The genetic correlation between educational attainment and the g factor was reduced from 0.44 to 0.30 (SE = 0.02) in the follow-up model but remained statistically significant (p = 5.24e-35).

The plot below presents:

  1. The observed, model-implied (common pathway model), and residual genetic correlations between the seven indicators of the genetic g factor and educational attainment (top).
  2. A scatterplot of genetic correlations between educational attainment and the seven indicator traits of the genetic g factor against their respective standardized factor loadings on the genetic g factor (bottom). The size of the dots corresponds to the inverse of the variance of the unstandardized beta coefficient.
  3. The model-implied association between the external correlate (educational attainment) and the factor (general cognitive function) from the common pathway model and the unconstrained models, represented as the slope of the regression line.

Notably, the common pathway model fails to adequately capture the associations between educational attainment and the VNR and Matrix tests, both of which exhibit meaningful residuals (0.37 and 0.29, respectively). If the common pathway model were sufficient to account for the genetic correlations between the external correlate and the factor's indicator traits, we would expect these correlations to scale proportionally with the factor loadings of the indicators on the common factor. However, the scatterplot reveals that VNR and Matrix deviate significantly from these expectations, showing stronger-than-expected positive genetic correlations with educational attainment. The common pathway model estimates the genetic association between EA and general cognitive function at .439, but after allowing for unconstrained direct paths between EA and VNR and Matrix, the association drops to .291. This underscores the inadequacy of the common pathway model for capturing the full complexity of the relationships between educational attainment and cognitive traits. A multifactorial model may be worth considering.
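
The proportional-scaling expectation can be made concrete with a short sketch. In a standardized single-factor model where the external correlate relates to the indicators only through the factor, the implied genetic correlation for each indicator is its standardized loading multiplied by the factor-trait genetic correlation. The loadings, observed correlations, and indicator names below are hypothetical; only the rounded factor-trait correlation of 0.44 comes from this tutorial.

```r
# Hypothetical standardized factor loadings for four made-up indicators
loadings <- c(ind1 = 0.70, ind2 = 0.50, ind3 = 0.40, ind4 = 0.75)

# Factor-trait genetic correlation under the common pathway model (rounded, from the example)
rg_factor <- 0.44

# Implied indicator-trait genetic correlations scale with the loadings
implied <- loadings * rg_factor

# Hypothetical observed genetic correlations; ind4 is set well above its implied value
observed <- c(ind1 = 0.32, ind2 = 0.21, ind3 = 0.18, ind4 = 0.70)

# Residuals show which indicators depart from proportional scaling
residual_rg <- observed - implied
most_extreme <- names(which.max(abs(residual_rg)))

print(round(rbind(implied, observed, residual_rg), 3))
print(most_extreme)
```

An indicator whose point sits far above or below the line with slope `rg_factor` in a plot of `observed` against `loadings` is the kind of case the scatterplot is designed to reveal.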

