Created on 2020-05-17

Welcome to your friendly neighbourhood series of statistical genetics R workbooks! If this is the first time you stumble upon them, in desperate need for someone’s code to avoid reinventing the wheel, make sure you check my previous notebook on preparing GWAS data for imputation here.

Otherwise, without further ado, let’s move on to the next…

1 Single-SNP heritability and post-GWAS power calculations

In this workbook I’ll tackle a couple of questions that arise very often after you have conducted a genome-wide association study (GWAS).

1.1 Single-SNP heritability

Suppose you have identified a SNP significantly associated with your trait of interest. The first question is usually formulated in two equivalent ways:

  • what is the proportion of phenotypic variance explained by a certain SNP?
  • what is the single-SNP heritability for your trait?

To answer this question, we can use equation 4 in the supplementary text of this paper - click on the ‘Code’ button on the right to show the function definition.

hsquared <- function(beta, se, N, maf){
  num = 2*(beta^2)*maf*(1-maf)
  den = num + 2*N*(se^2)*maf*(1-maf)
  hsq = num/den
  return(hsq)
}

The proportion of phenotypic variance explained by a SNP is a function of the SNP’s minor allele frequency(MAF), effect size and standard error, as well as the sample size of your GWAS study.

So say your SNP has a MAF = 0.15, beta coefficient = -0.07, standard error = 0.01, and you had genotypes of your SNP of interest available for 826 participants. The proportion of variance explained (or heritability) by your SNP will be:

maf  = 0.15
beta = -0.07
se   = 0.01
N    = 826

hsquared(beta, se, N, maf)
## [1] 0.056

1.2 Post-hoc power calculation

Now say you have a SNP of interest, maybe significantly associated with a trait in the literature, and you are trying to replicate the association in another, perhaps smaller sample. Your statistical test is not significant and you wonder whether the original association reported was a false positive or simply your replication is underpowered. Or perhaps you have some significant associations in your GWAS and you simply want to perform some post-hoc power calculations, perhaps to compare the same SNP across different GWASes (like I did here).

This is easily done with the following function - inspired by this wiki - click on the ‘Code’ button on the right to show the function definition.

posthoc_snp_power_calc <- function(beta, se, N, maf, alpha){
  hsq       = hsquared(beta, se, N, maf)
  thresh    = qchisq(alpha, df = 1, lower.tail = FALSE)
  power     = pchisq(thresh, df = 1, lower.tail = FALSE, ncp = N * hsq)
  return(power)
}

To demonstrate this, suppose you want to know what was the power to detect a significant association for the SNP of interest in the previous section, with the same GWAS estimates, at a significance level of 0.05:

alpha = 0.05

posthoc_snp_power_calc(beta, se, N, maf, alpha)
## [1] 0.9999994

You had 99% power! Yay!

However, in GWAS we typically apply a Bonferroni correction for multiplicity that effectively brings the significance level down to 5e-8. To see whether your SNP effect estimates had any power to survive this correction, simply change your alpha level:

alpha = 5e-8

posthoc_snp_power_calc(beta, se, N, maf, alpha)
## [1] 0.9152475

After Bonferroni correction, your SNP estimates still had 91% power to be declared significant. And indeed the SNP passed the significance threshold :)

What if the same SNP was tested for association with a different trait in a different sample? What would have been the power to detect an association? Let us assume the following GWAS estimates: beta = -0.04, se = 0.03, N = 826. The power is:

beta  = -0.04
se    = 0.03
N     = 826
maf   = 0.15

sprintf('Power without Bonferroni: %.2f', posthoc_snp_power_calc(beta, se, N, maf, alpha = 0.05))
## [1] "Power without Bonferroni: 0.27"
sprintf('Power with Bonferroni: %.2f', posthoc_snp_power_calc(beta, se, N, maf, alpha = 5e-8))
## [1] "Power with Bonferroni: 0.00"

Too low! :( Most likely you would not detect an association due to lack of power.


This tutorial ends here, short and sweet. I hope you found it helpful! If you have any comments or questions, or topics you would like to see addressed, please get in touch by sending an email to marzia.scelsi.15@ucl.ac.uk. For other awesome resources on imaging, genetics, and everything else in between, check out the COMBINE lab website and Twitter.