HDL in GenomicSEM

Analysis example for `hdl()` in GenomicSEM

First you will need to (re)install GenomicSEM, the install instructions are HERE

Then we need to download the LD references matrices for HDL, I opted for the HapMap2 set of SNPs, which is like 15Gb, so a long download and quite a bit of diskspace! You can download these files here: https://www.dropbox.com/s/kv5zhgu274wg9z5/UKB_imputed_hapmap2_SVD_eigen99_extraction.tar.gz?dl=0

Again unzip these files and place in your working directory.

For this example I am using GWAS sumstats for BMI Yengo et al : https://portals.broadinstitute.org/collaboration/giant/images/6/63/Meta-analysis_Wood_et_al%2BUKBiobank_2018.txt.gz

GWAS summary data for educational attainment: Lee et al: https://www.dropbox.com/s/ho58e9jmytmpaf8/GWAS_EA_excl23andMe.txt?dl=0

GWAS summary data for schizophrenia: Ripke et al. (SCZ2): https://www.med.unc.edu/pgc/download-results/scz/

You can process these summary statistics with the munge() functino as per instructions on the GenomicSEM wiki

If you want to proceed quickly and without having to download and process all files yourself you can downlaod the files I minged directly HERE

IMPORTANT I found some NaN values in BMI summary statistics, which I had to remove, I did so by searching for them in a text editor and deleting the entire line.

Having collected all the files we need we go ahead and run hdl() in genomicSEM. Note hdl() will warn you that overlap between the BMI sumstats and the references file it low (~98.%). If we were running this as a real analysis for publication, we would download the full GWAS sumstats, and select the exact set of SNPs. hdl() will run for about 15 minutes maybe a bit more, if you want faster estimation (because you have many traits for example) consider running hdl() with the argument method= set to ”piecewise” which is a faster implementation of hdl() which has some additional benefits we will outline soon. Code for both versions is included, as is a run with LD score regression as implemented in genomicSEM.

# run HDL
hdl.covstruct <- hdl(c("BMI.sumstats","EA3.sumstats.gz","scz2.sumstats.gz"),sample.prev = c(NA,NA,.5),population.prev = c(NA,NA,0.01) ,trait.names=c("BMI","EA3","SCZ"),LD.path="UKB_imputed_hapmap2_SVD_eigen99_extraction/", method = "jackknife")
  
# a faster variant of HDL
hdl.covstruct.2 <- hdl(c("BMI.sumstats","EA3.sumstats.gz","scz2.sumstats.gz"),sample.prev = c(NA,NA,.5),population.prev = c(NA,NA,0.01) ,trait.names=c("BMI","EA3","SCZ"),LD.path="UKB_imputed_hapmap2_SVD_eigen99_extraction/", method = "piecewise")  
  
  
# run LDSC for comparisson
ldsc.covstruct <- ldsc(traits =c("BMI.sumstats","EA3.sumstats.gz","scz2.sumstats.gz"),sample.prev = c(NA,NA,.5),population.prev = c(NA,NA,0.01) ,ld = "eur_w_ld_chr/",wld = "eur_w_ld_chr/",trait.names =c("BMI","EA3","SCZ"))

Having computed the genetic covariance matrix, and its variance covariance matrix, we can proceed and fit a genomicSEM model. Here we simply compute the genetic correlations, so you can compare them with output from the original HDL or ldsc implementations if you wish.

  # Model to compute correlations:
  
  
  model <- '
  lSCZ =~ NA*SCZ
  lEA3 =~ NA*EA3
  lBMI =~ NA*BMI
  
  SCZ ~~ 0*SCZ + 0*EA3 + 0*BMI
  EA3 ~~ 0*EA3 + 0*BMI
  BMI ~~ 0*BMI
  
  lSCZ ~~ 1*lSCZ + lEA3 + lBMI
  lEA3 ~~ 1*lEA3 + lBMI
  lBMI ~~ 1*lBMI
  '

  
## fit a model for the correlations based on HDL:
hdl.cors <-  usermodel(covstruc = hdl.covstruct ,estimation = "DWLS",model = model)

## [1] "Running primary model"
## [1] "Calculating model chi-square"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed 
##   0.738

hdl.cors

## $modelfit
##    chisq df p_chisq AIC CFI         SRMR
## df    NA  0      NA  NA   1 1.456728e-09
## 
## $results
##     lhs op  rhs Unstandardized_Estimate Unstandardized_SE Standardized_Est
## 1  lSCZ =~  SCZ              0.42670246       0.007709009       1.00000000
## 2  lEA3 =~  EA3              0.31105812       0.003807877       1.00000000
## 3  lBMI =~  BMI             -0.37420119       0.004785996       1.00000000
## 11 lSCZ ~~ lEA3              0.06347467       0.015422844       0.06347467
## 12 lSCZ ~~ lBMI              0.13108164       0.013900687      -0.13108164
## 14 lEA3 ~~ lBMI              0.29887999       0.015921505      -0.29887999
##    Standardized_SE
## 1       0.01806647
## 2       0.01224169
## 3       0.01278990
## 11      0.01542284
## 12      0.01390069
## 14      0.01592150

and for comparison let’s look at the results using the "piecewise" implementation of HDL, and ldsc():

## fit a model for the correlations based on with HDL piecewise (faster version):
 hdl.cors2 <-  usermodel(covstruc = hdl.covstruct.2,estimation = "DWLS",model = model)

## [1] "Running primary model"
## [1] "Calculating model chi-square"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed 
##   0.808

 hdl.cors2

## $modelfit
##    chisq df p_chisq AIC CFI         SRMR
## df    NA  0      NA  NA   1 5.220787e-09
## 
## $results
##     lhs op  rhs Unstandardized_Estimate Unstandardized_SE Standardized_Est
## 1  lSCZ =~  SCZ             -0.42929800       0.009449058       1.00000000
## 2  lEA3 =~  EA3              0.32898487       0.004492512       1.00000000
## 3  lBMI =~  BMI              0.38906525       0.005118436       1.00000000
## 11 lSCZ ~~ lEA3             -0.04779385       0.013969207       0.04779385
## 12 lSCZ ~~ lBMI              0.12676083       0.015809034      -0.12676083
## 14 lEA3 ~~ lBMI             -0.18557739       0.013052721      -0.18557739
##    Standardized_SE
## 1       0.02201049
## 2       0.01365568
## 3       0.01315573
## 11      0.01396921
## 12      0.01580903
## 14      0.01305272

 ## fit a model for the correlations based on ldsc for comparison:
 ldsc.cors <-  usermodel(covstruc = ldsc.covstruct,estimation = "DWLS",model = model)

## [1] "Running primary model"
## [1] "Calculating model chi-square"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed 
##   0.627

 ldsc.cors

## $modelfit
##    chisq df p_chisq AIC CFI         SRMR
## df    NA  0      NA  NA   1 2.007542e-10
## 
## $results
##     lhs op  rhs Unstandardized_Estimate Unstandardized_SE Standardized_Est
## 1  lSCZ =~  SCZ              0.50090672       0.010097916       1.00000000
## 2  lEA3 =~  EA3              0.33509115       0.004504385       1.00000000
## 3  lBMI =~  BMI             -0.45722380       0.006857845       1.00000000
## 11 lSCZ ~~ lEA3              0.06172632       0.019153422       0.06172632
## 12 lSCZ ~~ lBMI              0.11110016       0.017700927      -0.11110016
## 14 lEA3 ~~ lBMI              0.26925026       0.013625589      -0.26925026
##    Standardized_SE
## 1       0.02015927
## 2       0.01344227
## 3       0.01499888
## 11      0.01915342
## 12      0.01770093
## 14      0.01362559

Lets collect the relevant results into one dataframe

df <-  round(cbind(ldsc.cors$results[4:6,6:7],hdl.cors$results[4:6,6:7],hdl.cors2$results[4:6,6:7]),3)

colnames(df) <- c("rg ldsc","se ldsc","rg hdl","se hdl","rg hdl piecewise","se hdl piecewise")
rownames(df) <- c("cor(SCZ,EA)","cor(SCZ,BMI)","cor(BMI,EA)")

knitr::kable(df)

	rg ldsc	se ldsc	rg hdl	se hdl	rg hdl piecewise	se hdl piecewise
cor(SCZ,EA)	0.062	0.019	0.063	0.015	0.048	0.014
cor(SCZ,BMI)	-0.111	0.018	-0.131	0.014	-0.127	0.016
cor(BMI,EA)	-0.269	0.014	-0.299	0.016	-0.186	0.013

As you can see very similar results, but in this instance the standard errors aren’t that much smaller for hdl() then for ldsc() which could be due to the somewhat low overlap between the BMI sumstats and the reference LD, or to slight mismatch between the UKB reference LD and these GWAS, which are based on UKB and many other cohorts. Nonetheless, you now have HDL at your disposal which will certainly be valuable in low power situations. Note that with out of sample LD hdl() may esitmate somewhat lower heritability, and a higher intercept, this means you should NOT use the intercept to shrink effect sizes if you run a GWAS in GenomicSEM.

HDL in GenomicSEM

Michel G Nivard

7/16/2020