High-definition likelihood (HDL) inference is an exciting new method to estimate heritability, and genetic correlation with higher precision then LD score regression (LDSC). LDSC models the relation between the covariance of a SNP’s Z statistics for pairs of traits and that SNP’s LD score. The HDL extends the LDSC method by modelling the relation between covariances among Z statistics for pairs of traits across multiple SNPs and a full matrix of cross-SNP LD scores. You should read the paper for all the details:
Ning, Z., Pawitan, Y., & Shen, X. (2020). High-definition likelihood inference of genetic correlations across human complex traits (pp. 1-6). Nature Genetics.
HDL is of obvious value to people that want to analyze GWAS summary statistics. There is intuitive and accessible code to run HDL found here:https://github.com/zhenin/HDL
Practically using HDL means you can analyze GWAS sumstats from smaller GWASs and still have enough power to estimate genetic correlations. Like LDSC, HDL compute the standard errors of heritability and coheritability estimates using a block jacknife. In order to appropriately include HDL estimates in GenomicSEM, we extended the block jacknife to estimate the dependencies between estimation errors across heritability and coheritability estimates, just as we have previously done for LDSC.
We wrote a basic multivariate version of HDL as a function in GenomicSEM (hdl()). It can ingest GWAS summary statistics you have prepared for analysis in GenomicSEM using the munge() function.
If you end up using the hdl() function in a paper, we would highly appreciate if you cite Ning et al., as it is the direct product of their intellectual labor.
hdl() in GenomicSEMFirst you will need to (re)install GenomicSEM, the install instructions are HERE
Then we need to download the LD references matrices for HDL, I opted for the HapMap2 set of SNPs, which is like 15Gb, so a long download and quite a bit of diskspace! You can download these files here: https://www.dropbox.com/s/kv5zhgu274wg9z5/UKB_imputed_hapmap2_SVD_eigen99_extraction.tar.gz?dl=0
Again unzip these files and place in your working directory.
For this example I am using GWAS sumstats for BMI Yengo et al : https://portals.broadinstitute.org/collaboration/giant/images/6/63/Meta-analysis_Wood_et_al%2BUKBiobank_2018.txt.gz
GWAS summary data for educational attainment: Lee et al: https://www.dropbox.com/s/ho58e9jmytmpaf8/GWAS_EA_excl23andMe.txt?dl=0
GWAS summary data for schizophrenia: Ripke et al. (SCZ2): https://www.med.unc.edu/pgc/download-results/scz/
You can process these summary statistics with the munge() functino as per instructions on the GenomicSEM wiki
If you want to proceed quickly and without having to download and process all files yourself you can downlaod the files I minged directly HERE
IMPORTANT I found some NaN values in BMI summary statistics, which I had to remove, I did so by searching for them in a text editor and deleting the entire line.
Having collected all the files we need we go ahead and run hdl() in genomicSEM. Note hdl() will warn you that overlap between the BMI sumstats and the references file it low (~98.%). If we were running this as a real analysis for publication, we would download the full GWAS sumstats, and select the exact set of SNPs. hdl() will run for about 15 minutes maybe a bit more, if you want faster estimation (because you have many traits for example) consider running hdl() with the argument method= set to ”piecewise” which is a faster implementation of hdl() which has some additional benefits we will outline soon. Code for both versions is included, as is a run with LD score regression as implemented in genomicSEM.
# run HDL
hdl.covstruct <- hdl(c("BMI.sumstats","EA3.sumstats.gz","scz2.sumstats.gz"),sample.prev = c(NA,NA,.5),population.prev = c(NA,NA,0.01) ,trait.names=c("BMI","EA3","SCZ"),LD.path="UKB_imputed_hapmap2_SVD_eigen99_extraction/", method = "jackknife")
# a faster variant of HDL
hdl.covstruct.2 <- hdl(c("BMI.sumstats","EA3.sumstats.gz","scz2.sumstats.gz"),sample.prev = c(NA,NA,.5),population.prev = c(NA,NA,0.01) ,trait.names=c("BMI","EA3","SCZ"),LD.path="UKB_imputed_hapmap2_SVD_eigen99_extraction/", method = "piecewise")
# run LDSC for comparisson
ldsc.covstruct <- ldsc(traits =c("BMI.sumstats","EA3.sumstats.gz","scz2.sumstats.gz"),sample.prev = c(NA,NA,.5),population.prev = c(NA,NA,0.01) ,ld = "eur_w_ld_chr/",wld = "eur_w_ld_chr/",trait.names =c("BMI","EA3","SCZ"))
Having computed the genetic covariance matrix, and its variance covariance matrix, we can proceed and fit a genomicSEM model. Here we simply compute the genetic correlations, so you can compare them with output from the original HDL or ldsc implementations if you wish.
# Model to compute correlations:
model <- '
lSCZ =~ NA*SCZ
lEA3 =~ NA*EA3
lBMI =~ NA*BMI
SCZ ~~ 0*SCZ + 0*EA3 + 0*BMI
EA3 ~~ 0*EA3 + 0*BMI
BMI ~~ 0*BMI
lSCZ ~~ 1*lSCZ + lEA3 + lBMI
lEA3 ~~ 1*lEA3 + lBMI
lBMI ~~ 1*lBMI
'
## fit a model for the correlations based on HDL:
hdl.cors <- usermodel(covstruc = hdl.covstruct ,estimation = "DWLS",model = model)
## [1] "Running primary model"
## [1] "Calculating model chi-square"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed
## 0.738
hdl.cors
## $modelfit
## chisq df p_chisq AIC CFI SRMR
## df NA 0 NA NA 1 1.456728e-09
##
## $results
## lhs op rhs Unstandardized_Estimate Unstandardized_SE Standardized_Est
## 1 lSCZ =~ SCZ 0.42670246 0.007709009 1.00000000
## 2 lEA3 =~ EA3 0.31105812 0.003807877 1.00000000
## 3 lBMI =~ BMI -0.37420119 0.004785996 1.00000000
## 11 lSCZ ~~ lEA3 0.06347467 0.015422844 0.06347467
## 12 lSCZ ~~ lBMI 0.13108164 0.013900687 -0.13108164
## 14 lEA3 ~~ lBMI 0.29887999 0.015921505 -0.29887999
## Standardized_SE
## 1 0.01806647
## 2 0.01224169
## 3 0.01278990
## 11 0.01542284
## 12 0.01390069
## 14 0.01592150
and for comparison let’s look at the results using the "piecewise" implementation of HDL, and ldsc():
## fit a model for the correlations based on with HDL piecewise (faster version):
hdl.cors2 <- usermodel(covstruc = hdl.covstruct.2,estimation = "DWLS",model = model)
## [1] "Running primary model"
## [1] "Calculating model chi-square"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed
## 0.808
hdl.cors2
## $modelfit
## chisq df p_chisq AIC CFI SRMR
## df NA 0 NA NA 1 5.220787e-09
##
## $results
## lhs op rhs Unstandardized_Estimate Unstandardized_SE Standardized_Est
## 1 lSCZ =~ SCZ -0.42929800 0.009449058 1.00000000
## 2 lEA3 =~ EA3 0.32898487 0.004492512 1.00000000
## 3 lBMI =~ BMI 0.38906525 0.005118436 1.00000000
## 11 lSCZ ~~ lEA3 -0.04779385 0.013969207 0.04779385
## 12 lSCZ ~~ lBMI 0.12676083 0.015809034 -0.12676083
## 14 lEA3 ~~ lBMI -0.18557739 0.013052721 -0.18557739
## Standardized_SE
## 1 0.02201049
## 2 0.01365568
## 3 0.01315573
## 11 0.01396921
## 12 0.01580903
## 14 0.01305272
## fit a model for the correlations based on ldsc for comparison:
ldsc.cors <- usermodel(covstruc = ldsc.covstruct,estimation = "DWLS",model = model)
## [1] "Running primary model"
## [1] "Calculating model chi-square"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed
## 0.627
ldsc.cors
## $modelfit
## chisq df p_chisq AIC CFI SRMR
## df NA 0 NA NA 1 2.007542e-10
##
## $results
## lhs op rhs Unstandardized_Estimate Unstandardized_SE Standardized_Est
## 1 lSCZ =~ SCZ 0.50090672 0.010097916 1.00000000
## 2 lEA3 =~ EA3 0.33509115 0.004504385 1.00000000
## 3 lBMI =~ BMI -0.45722380 0.006857845 1.00000000
## 11 lSCZ ~~ lEA3 0.06172632 0.019153422 0.06172632
## 12 lSCZ ~~ lBMI 0.11110016 0.017700927 -0.11110016
## 14 lEA3 ~~ lBMI 0.26925026 0.013625589 -0.26925026
## Standardized_SE
## 1 0.02015927
## 2 0.01344227
## 3 0.01499888
## 11 0.01915342
## 12 0.01770093
## 14 0.01362559
Lets collect the relevant results into one dataframe
df <- round(cbind(ldsc.cors$results[4:6,6:7],hdl.cors$results[4:6,6:7],hdl.cors2$results[4:6,6:7]),3)
colnames(df) <- c("rg ldsc","se ldsc","rg hdl","se hdl","rg hdl piecewise","se hdl piecewise")
rownames(df) <- c("cor(SCZ,EA)","cor(SCZ,BMI)","cor(BMI,EA)")
knitr::kable(df)
| rg ldsc | se ldsc | rg hdl | se hdl | rg hdl piecewise | se hdl piecewise | |
|---|---|---|---|---|---|---|
| cor(SCZ,EA) | 0.062 | 0.019 | 0.063 | 0.015 | 0.048 | 0.014 |
| cor(SCZ,BMI) | -0.111 | 0.018 | -0.131 | 0.014 | -0.127 | 0.016 |
| cor(BMI,EA) | -0.269 | 0.014 | -0.299 | 0.016 | -0.186 | 0.013 |
As you can see very similar results, but in this instance the standard errors aren’t that much smaller for hdl() then for ldsc() which could be due to the somewhat low overlap between the BMI sumstats and the reference LD, or to slight mismatch between the UKB reference LD and these GWAS, which are based on UKB and many other cohorts. Nonetheless, you now have HDL at your disposal which will certainly be valuable in low power situations. Note that with out of sample LD hdl() may esitmate somewhat lower heritability, and a higher intercept, this means you should NOT use the intercept to shrink effect sizes if you run a GWAS in GenomicSEM.