LD decay has been proposed as a central issue to genomic risk prediction across populations (Martin et al., 2017: http://dx.doi.org/10.1016/j.ajhg.2017.03.004). The purpose of this project is to estimate LD decay for GWAS hits. Another goal is to identify SNPs that are more robust to LD decay and use those for calculating population factors or polygenic scores.Another index is the average r^2 in the non-reference population (in this case, YRI). Higher values indicate lower LD decay. This code calculates LD decay for each SNP as correlation between LD values (r^2) and allele frequencies of SNPs in LD for SNPS in CEU and YRI. I chose educational attainment GWAS hits, specifically the 9 quasi-replicates (Piffer, 2017) LD and MAF are downloaded using LDlink (https://analysistools.nci.nih.gov/LDlink/?tab=home) using the LD proxy option and saving the output into a .txt file. LD is calculated for CEU and YRI separately.Each file contains all the SNPs in LD with the lead (GWAS hit) SNP in each population SNPs that are overlapping between CEU and YRI are kept. The correlation between LD values (r^2) for CEU and YRI is computed. Higher correlation means lower LD decay
Mean frequency is calculated for the SNPs in LD for CEU and YRI Difference between the CEU and YRI frequency is calculated separately for the LD in SNPs and for the lead SNP Meta-difference is calculated: (freq_leadSNP(CEU)-freq_leadSNP(YRI))- (freq_LDSNP(CEU)-freq_LDSNP(YRI)) bigger absolute values indicate that lead SNP have bigger difference. In particular, if value is positive, then LD decay overestimates the YRI score. if value is negative, then LD decay underestimates the YRI score The logic is this: this is a simulation of what LD decay “does” to the allele frequency of the causal variant. If the alleles in LD with the GWAS hit have a lower frequency among CEU, then LD decay underestimates CEU GWAS hit frequency. If the value is 0, then LD decay does not have a bias towards one population’s polygenic score With this method, one can identify the GWAS hits where LD decay does not break down across populations. These SNPs can then used to build an LD-decay resistant polygenic score #Order for loading the files: rs1008078_CEU,rs1008078_YRI,rs1158857_CEU, rs11588857_YRI, rs12987662_CEU, rs12987662_YRI,rs148734725_CEU, rs148734725_YRI, rs9320913_CEU, rs9320913_YRI Download from: osf.io/t2ej8
Load files
rs1008078_CEU=read.delim(file.choose(),header=TRUE)
rs1008078_YRI=read.delim(file.choose(),header=TRUE)
#Remove SNPs with no name
rs1008078_CEU=rs1008078_CEU[-(which(rs1008078_CEU$RS_Number==".")),]
rs1008078_YRI=rs1008078_YRI[-(which(rs1008078_YRI$RS_Number==".")),]
Merge df’s by rsID
merged_rs1008078=merge(rs1008078_CEU,rs1008078_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs1008078=merged_rs1008078[-(which(merged_rs1008078$RS_Number=="rs1008078")),]#remove same SNP
Select only SNPs in LD equal to or greater than X in CEU
merged_rs1008078_highLD=merged_rs1008078[which(merged_rs1008078$R2.x>=0.8),]
Correlation of LD values (r^2) between CEU and YRI
cor.test(merged_rs1008078$R2.x,merged_rs1008078$R2.y)
##
## Pearson's product-moment correlation
##
## data: merged_rs1008078$R2.x and merged_rs1008078$R2.y
## t = 2.7697, df = 184, p-value = 0.006186
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.05784239 0.33431328
## sample estimates:
## cor
## 0.2000568
cor_LD_rs1008078=cor(merged_rs1008078$R2.x,merged_rs1008078$R2.y)#save corr coefficient
Plot correlation of r^2 coefficients Correlation between allele frequencies of SNPs in LD
cor.test(merged_rs1008078$MAF.x,merged_rs1008078$MAF.y)
##
## Pearson's product-moment correlation
##
## data: merged_rs1008078$MAF.x and merged_rs1008078$MAF.y
## t = 4.6623, df = 184, p-value = 5.992e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1900547 0.4479745
## sample estimates:
## cor
## 0.3250457
cor_MAF_1008078=cor(merged_rs1008078$MAF.x,merged_rs1008078$MAF.y)#save correlation
#if MAF has negative beta in GWAS, calculate frequency of other allele (1-MAF)
mean_rs1008078_CEU=mean(merged_rs1008078_highLD$MAF.x)#mean frequency for CEU
mean_rs1008078_YRI=mean(merged_rs1008078_highLD$MAF.y)#mean frequency for YRI
Plot correlation of allele frequencies Calculate mean r^2
r2_rs1008078_CEU=mean(merged_rs1008078_highLD$R2.x)#mean r^2 CEU
r2_rs1008078_YRI=mean(merged_rs1008078_highLD$R2.y)#mean r^2 YRI
Difference between CEU and YRI (CEU - YRI MAF) at the SNPs in LD for the positive effect allele
delta_mean_rs1008078=mean_rs1008078_CEU- mean_rs1008078_YRI
delta_mean_rs1008078
## [1] 0.1904692
Re-add lead SNP
merged_rs1008078_recreate=merge(rs1008078_CEU,rs1008078_YRI,by.x="RS_Number", by.y="RS_Number")
rs1008078=merged_rs1008078_recreate[(which(merged_rs1008078_recreate$RS_Number=="rs1008078")),]#select only lead SNP
Calculate CEU-YRI freq difference at the lead SNP
delta_rs1008078=rs1008078$MAF.x - rs1008078$MAF.y
delta_rs1008078
## [1] 0.4259
Meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs1008078=delta_rs1008078 - delta_mean_rs1008078
meta_delta_rs1008078
## [1] 0.2354308
Repeat with other SNPs…
##rs11588857
rs11588857_CEU=read.delim(file.choose(),header=TRUE)
rs11588857_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs11588857_CEU=rs11588857_CEU[-(which(rs11588857_CEU$RS_Number==".")),]
rs11588857_YRI=rs11588857_YRI[-(which(rs11588857_YRI$RS_Number==".")),]
#merge df's by rsID
merged_rs11588857=merge(rs11588857_CEU,rs11588857_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs11588857=merged_rs11588857[-(which(merged_rs11588857$RS_Number=="rs11588857")),]#remove same SNP
#select only SNPs in LD equal to or greater than X in CEU
merged_rs11588857_highLD=merged_rs11588857[which(merged_rs11588857$R2.x>=0.8),]
cor.test(merged_rs11588857$R2.x,merged_rs11588857$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs11588857$R2.x and merged_rs11588857$R2.y
## t = 26.844, df = 466, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7409778 0.8125288
## sample estimates:
## cor
## 0.7792804
cor_LD_rs11588857=cor(merged_rs11588857$R2.x,merged_rs11588857$R2.y)
plot(merged_rs11588857$R2.x,merged_rs11588857$R2.y)#plot correlation of LD values
r2_rs11588857_CEU=mean(merged_rs11588857_highLD$R2.x)#mean r^2 CEU
r2_rs11588857_YRI=mean(merged_rs11588857_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs11588857_highLD$MAF.x,merged_rs11588857_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs11588857_highLD$MAF.x and merged_rs11588857_highLD$MAF.y
## t = 1.9031, df = 23, p-value = 0.06962
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.03078048 0.66679014
## sample estimates:
## cor
## 0.3688363
cor_MAF_rs11588857=cor(merged_rs11588857_highLD$MAF.x,merged_rs11588857_highLD$MAF.y)
plot(merged_rs11588857_highLD$MAF.x,merged_rs11588857_highLD$MAF.y)
mean_rs11588857_CEU=mean(merged_rs11588857_highLD$MAF.x)#mean frequency for CEU
mean_rs11588857_YRI=mean(merged_rs11588857_highLD$MAF.y)#mean frequency for YRI
delta_mean_rs11588857=mean_rs11588857_CEU = mean_rs11588857_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs11588857
## [1] 0.104284
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs11588857_recreate=merge(rs11588857_CEU,rs11588857_YRI,by.x="RS_Number", by.y="RS_Number")
rs11588857=merged_rs11588857_recreate[(which(merged_rs11588857_recreate$RS_Number=="rs11588857")),]#select only lead SNP
delta_rs11588857=rs11588857$MAF.x - rs11588857$MAF.y #calculate CEU-YRI difference at the lead SNP
delta_rs11588857
## [1] 0.1414
meta_delta_rs11588857=delta_rs11588857 - delta_mean_rs11588857 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs11588857
## [1] 0.037116
#rs12987662
rs12987662_CEU=read.delim(file.choose(),header=TRUE)
rs12987662_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs12987662_CEU=rs12987662_CEU[-(which(rs12987662_CEU$RS_Number==".")),]
rs12987662_YRI=rs12987662_YRI[-(which(rs12987662_YRI$RS_Number==".")),]
merged_rs12987662=merge(rs12987662_CEU,rs12987662_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs12987662=merged_rs12987662[-(which(merged_rs12987662$RS_Number=="rs12987662")),]
#select only SNPs in LD equal to or greater than X in CEU
merged_rs12987662_highLD=merged_rs12987662[which(merged_rs12987662$R2.x>=0.8),]
cor.test(merged_rs12987662$R2.x,merged_rs12987662$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs12987662$R2.x and merged_rs12987662$R2.y
## t = 23.66, df = 597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.6518672 0.7348109
## sample estimates:
## cor
## 0.6956503
cor_LD_rs12987662=cor(merged_rs12987662$R2.x,merged_rs12987662$R2.y)
plot(merged_rs12987662$R2.x,merged_rs12987662$R2.y)#plot correlation of LD values
r2_rs12987662_CEU=mean(merged_rs12987662_highLD$R2.x)#mean r^2 CEU
r2_rs12987662_YRI=mean(merged_rs12987662_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs12987662_highLD$MAF.x,merged_rs12987662_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs12987662_highLD$MAF.x and merged_rs12987662_highLD$MAF.y
## t = 0.71795, df = 59, p-value = 0.4756
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1625670 0.3369869
## sample estimates:
## cor
## 0.09306355
cor_MAF_rs12987662=cor(merged_rs12987662_highLD$MAF.x,merged_rs12987662_highLD$MAF.y)
plot(merged_rs12987662_highLD$MAF.x,merged_rs12987662_highLD$MAF.y)#plot correlation of allele freqs
mean_rs12987662_CEU=mean(merged_rs12987662_highLD$MAF.x)#mean frequency for CEU
mean_rs12987662_YRI=mean(merged_rs12987662_highLD$MAF.y)#mean frequency for YRI
delta_mean_rs12987662=mean_rs12987662_CEU = mean_rs12987662_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs12987662
## [1] 0.01875574
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs12987662_recreate=merge(rs12987662_CEU,rs12987662_YRI,by.x="RS_Number", by.y="RS_Number")
rs12987662=merged_rs12987662_recreate[(which(merged_rs12987662_recreate$RS_Number=="rs12987662")),]#select only lead SNP
delta_rs12987662=rs12987662$MAF.x - rs12987662$MAF.y #calculate CEU-YRI difference at the lead SNP
delta_rs12987662
## [1] 0.4048
meta_delta_rs12987662=delta_rs12987662 - delta_mean_rs12987662 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs12987662
## [1] 0.3860443
#rs148734725
rs148734725_CEU=read.delim(file.choose(),header=TRUE)
rs148734725_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs148734725_CEU=rs148734725_CEU[-(which(rs148734725_CEU$RS_Number==".")),]
rs148734725_YRI=rs148734725_YRI[-(which(rs148734725_YRI$RS_Number==".")),]
merged_rs148734725=merge(rs148734725_CEU,rs148734725_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs148734725=merged_rs148734725[-(which(merged_rs148734725$RS_Number=="rs148734725")),]
#select only SNPs in LD equal to or greater than X in CEU
merged_rs148734725_highLD=merged_rs148734725[which(merged_rs148734725$R2.x>=0.8),]
cor.test(merged_rs148734725$R2.x,merged_rs148734725$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs148734725$R2.x and merged_rs148734725$R2.y
## t = 71.738, df = 835, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9174833 0.9364780
## sample estimates:
## cor
## 0.9275776
cor_LD_rs148734725=cor(merged_rs148734725$R2.x,merged_rs148734725$R2.y)
plot(merged_rs148734725$R2.x,merged_rs148734725$R2.y)#plot correlation of LD values
r2_rs148734725_CEU=mean(merged_rs148734725_highLD$R2.x)#mean r^2 CEU
r2_rs148734725_YRI=mean(merged_rs148734725_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs148734725_highLD$MAF.x,merged_rs148734725_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs148734725_highLD$MAF.x and merged_rs148734725_highLD$MAF.y
## t = 2.9352, df = 130, p-value = 0.003942
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08192354 0.40301046
## sample estimates:
## cor
## 0.2493061
cor_MAF_rs148734725=cor(merged_rs148734725_highLD$MAF.x,merged_rs148734725_highLD$MAF.y)
plot(merged_rs148734725_highLD$MAF.x,merged_rs148734725_highLD$MAF.y)#plot correlation of allele freqs
mean_rs148734725_CEU=mean(merged_rs148734725_highLD$MAF.x)#mean frequency for CEU
mean_rs148734725_YRI=mean(merged_rs148734725_highLD$MAF.y)#mean frequency for YRI
delta_mean_rs148734725=mean_rs148734725_CEU = mean_rs148734725_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs148734725
## [1] 0.2464295
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs148734725_recreate=merge(rs148734725_CEU,rs148734725_YRI,by.x="RS_Number", by.y="RS_Number")
rs148734725=merged_rs148734725_recreate[(which(merged_rs148734725_recreate$RS_Number=="rs148734725")),]#select only lead SNP
delta_rs148734725=rs148734725$MAF.x - rs148734725$MAF.y #calculate CEU-YRI difference at the lead SNP
delta_rs148734725
## [1] 0.0235
meta_delta_rs148734725=delta_rs148734725 - delta_mean_rs148734725 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs148734725
## [1] -0.2229295
#rs11712056
rs11712056_CEU=read.delim(file.choose(),header=TRUE)
rs11712056_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs11712056_CEU=rs11712056_CEU[-(which(rs11712056_CEU$RS_Number==".")),]
rs11712056_YRI=rs11712056_YRI[-(which(rs11712056_YRI$RS_Number==".")),]
merged_rs11712056=merge(rs11712056_CEU,rs11712056_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs11712056=merged_rs11712056[-(which(merged_rs11712056$RS_Number=="rs11712056")),]
#select only SNPs in LD equal to or greater than X in CEU
merged_rs11712056_highLD=merged_rs11712056[which(merged_rs11712056$R2.x>=0.8),]
cor.test(merged_rs11712056$R2.x,merged_rs11712056$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs11712056$R2.x and merged_rs11712056$R2.y
## t = 31.248, df = 816, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7052053 0.7677770
## sample estimates:
## cor
## 0.738074
cor_LD_rs11712056=cor(merged_rs11712056$R2.x,merged_rs11712056$R2.y)
plot(merged_rs11712056$R2.x,merged_rs11712056$R2.y)#plot correlation of LD values
r2_rs11712056_CEU=mean(merged_rs11712056_highLD$R2.x)#mean r^2 CEU
r2_rs11712056_YRI=mean(merged_rs11712056_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs11712056_highLD$MAF.x,merged_rs11712056_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs11712056_highLD$MAF.x and merged_rs11712056_highLD$MAF.y
## t = 0.40794, df = 70, p-value = 0.6846
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1850559 0.2772411
## sample estimates:
## cor
## 0.04870051
cor_MAF_rs11712056=cor(merged_rs11712056_highLD$MAF.x,merged_rs11712056_highLD$MAF.y)
plot(merged_rs11712056_highLD$MAF.x,merged_rs11712056_highLD$MAF.y)#plot correlation of allele freqs
#if MAF has negative beta in GWAS, calculate frequency of other allele (1-MAF)
mean_rs11712056_CEU=mean(merged_rs11712056_highLD$MAF.x)#mean frequency for CEU
mean_rs11712056_YRI=mean(merged_rs11712056_highLD$MAF.y)#mean frequency for YRI
delta_mean_rs11712056=mean_rs11712056_CEU = mean_rs11712056_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs11712056
## [1] 0.3019528
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs11712056_recreate=merge(rs11712056_CEU,rs11712056_YRI,by.x="RS_Number", by.y="RS_Number")
rs11712056=merged_rs11712056_recreate[(which(merged_rs11712056_recreate$RS_Number=="rs11712056")),]#select only lead SNP
delta_rs11712056=rs11712056$MAF.x - rs11712056$MAF.y #calculate CEU-YRI difference at the lead SNP
delta_rs11712056
## [1] 0.1035
meta_delta_rs11712056=delta_rs11712056 - delta_mean_rs11712056 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs11712056
## [1] -0.1984528
#rs62263923
rs62263923_CEU=read.delim(file.choose(),header=TRUE)
rs62263923_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs62263923_CEU=rs62263923_CEU[-(which(rs62263923_CEU$RS_Number==".")),]
rs62263923_YRI=rs62263923_YRI[-(which(rs62263923_YRI$RS_Number==".")),]
merged_rs62263923=merge(rs62263923_CEU,rs62263923_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs62263923=merged_rs62263923[-(which(merged_rs62263923$RS_Number=="rs62263923")),]
#select only SNPs in LD equal to or greater than X in CEU
merged_rs62263923_highLD=merged_rs62263923[which(merged_rs62263923$R2.x>=0.8),]
cor.test(merged_rs62263923$R2.x,merged_rs62263923$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs62263923$R2.x and merged_rs62263923$R2.y
## t = 12.725, df = 1534, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2630353 0.3535397
## sample estimates:
## cor
## 0.3089868
cor_LD_rs62263923=cor(merged_rs62263923$R2.x,merged_rs62263923$R2.y)
plot(merged_rs62263923$R2.x,merged_rs62263923$R2.y)#plot correlation of LD values
r2_rs62263923_CEU=mean(merged_rs62263923_highLD$R2.x)#mean r^2 CEU
r2_rs62263923_YRI=mean(merged_rs62263923_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs62263923_highLD$MAF.x,merged_rs62263923_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs62263923_highLD$MAF.x and merged_rs62263923_highLD$MAF.y
## t = -2.1977, df = 68, p-value = 0.03138
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.46439257 -0.02399713
## sample estimates:
## cor
## -0.2575191
cor_MAF_rs62263923=cor(merged_rs62263923_highLD$MAF.x,merged_rs62263923_highLD$MAF.y)
plot(merged_rs62263923_highLD$MAF.x,merged_rs62263923_highLD$MAF.y)#plot correlation of allele freqs
mean_rs62263923_CEU=mean(merged_rs62263923_highLD$MAF.x)#mean frequency for CEU
mean_rs62263923_YRI=mean(merged_rs62263923_highLD$MAF.y)#mean frequency for YRI
delta_mean_rs62263923=mean_rs62263923_CEU = mean_rs62263923_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs62263923
## [1] 0.1183314
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs62263923_recreate=merge(rs62263923_CEU,rs62263923_YRI,by.x="RS_Number", by.y="RS_Number")
rs62263923=merged_rs62263923_recreate[(which(merged_rs62263923_recreate$RS_Number=="rs62263923")),]#select only lead SNP
delta_rs62263923=rs62263923$MAF.x - rs62263923$MAF.y #calculate CEU-YRI difference at the lead SNP
delta_rs62263923
## [1] 0.1296
meta_delta_rs62263923=delta_rs62263923 - delta_mean_rs62263923 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs62263923
## [1] 0.01126857
#rs13294439
rs13294439_CEU=read.delim(file.choose(),header=TRUE)
rs13294439_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs13294439_CEU=rs13294439_CEU[-(which(rs13294439_CEU$RS_Number==".")),]
rs13294439_YRI=rs13294439_YRI[-(which(rs13294439_YRI$RS_Number==".")),]
merged_rs13294439=merge(rs13294439_CEU,rs13294439_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs13294439=merged_rs13294439[-(which(merged_rs13294439$RS_Number=="rs13294439")),]
#select only SNPs in LD equal to or greater than X in CEU
merged_rs13294439_highLD=merged_rs13294439[which(merged_rs13294439$R2.x>=0.8),]
cor.test(merged_rs13294439$R2.x,merged_rs13294439$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs13294439$R2.x and merged_rs13294439$R2.y
## t = 16.794, df = 453, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5593572 0.6730374
## sample estimates:
## cor
## 0.619434
cor_LD_rs13294439=cor(merged_rs13294439$R2.x,merged_rs13294439$R2.y)
plot(merged_rs13294439$R2.x,merged_rs13294439$R2.y)#plot correlation of LD values
r2_rs13294439_CEU=mean(merged_rs13294439_highLD$R2.x)#mean r^2 CEU
r2_rs13294439_YRI=mean(merged_rs13294439_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs13294439_highLD$MAF.x,merged_rs13294439_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs13294439_highLD$MAF.x and merged_rs13294439_highLD$MAF.y
## t = 1.2604, df = 12, p-value = 0.2315
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2304781 0.7385166
## sample estimates:
## cor
## 0.3419129
cor_MAF_rs13294439=cor(merged_rs13294439_highLD$MAF.x,merged_rs13294439_highLD$MAF.y)
plot(merged_rs13294439_highLD$MAF.x,merged_rs13294439_highLD$MAF.y)#plot correlation of allele freqs
mean_rs13294439_CEU=mean(merged_rs13294439_highLD$MAF.x)#mean frequency for CEU
mean_rs13294439_YRI=mean(merged_rs13294439_highLD$MAF.y)#mean frequency for YRI
delta_mean_rs13294439=mean_rs13294439_CEU = mean_rs13294439_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs13294439
## [1] 0.1276357
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs13294439_recreate=merge(rs13294439_CEU,rs13294439_YRI,by.x="RS_Number", by.y="RS_Number")
rs13294439=merged_rs13294439_recreate[(which(merged_rs13294439_recreate$RS_Number=="rs13294439")),]#select only lead SNP
delta_rs13294439=rs13294439$MAF.x - rs13294439$MAF.y #calculate CEU-YRI difference at the lead SNP
delta_rs13294439
## [1] 0.3308
meta_delta_rs13294439=delta_rs13294439 - delta_mean_rs13294439 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs13294439
## [1] 0.2031643
#rs12969294
rs12969294_CEU=read.delim(file.choose(),header=TRUE)
rs12969294_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs12969294_CEU=rs12969294_CEU[-(which(rs12969294_CEU$RS_Number==".")),]
rs12969294_YRI=rs12969294_YRI[-(which(rs12969294_YRI$RS_Number==".")),]
merged_rs12969294=merge(rs12969294_CEU,rs12969294_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs12969294=merged_rs12969294[-(which(merged_rs12969294$RS_Number=="rs12969294")),]
#select only SNPs in LD equal to or greater than X in CEU
merged_rs12969294_highLD=merged_rs12969294[which(merged_rs12969294$R2.x>=0.8),]
cor.test(merged_rs12969294$R2.x,merged_rs12969294$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs12969294$R2.x and merged_rs12969294$R2.y
## t = 9.8865, df = 555, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3139804 0.4554036
## sample estimates:
## cor
## 0.3869653
cor_LD_rs12969294=cor(merged_rs12969294$R2.x,merged_rs12969294$R2.y)
plot(merged_rs12969294$R2.x,merged_rs12969294$R2.y)#plot correlation of LD values
r2_rs12969294_CEU=mean(merged_rs12969294_highLD$R2.x)#mean r^2 CEU
r2_rs12969294_YRI=mean(merged_rs12969294_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs12969294_highLD$MAF.x,merged_rs12969294_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs12969294_highLD$MAF.x and merged_rs12969294_highLD$MAF.y
## t = -2.0059, df = 38, p-value = 0.05203
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.566345528 0.002305971
## sample estimates:
## cor
## -0.3094256
cor_MAF_rs12969294=cor(merged_rs12969294_highLD$MAF.x,merged_rs12969294_highLD$MAF.y)
plot(merged_rs12969294_highLD$MAF.x,merged_rs12969294_highLD$MAF.y)#plot correlation of allele freqs
mean_rs12969294_CEU=1-(mean(merged_rs12969294_highLD$MAF.x))#mean frequency for CEU
mean_rs12969294_YRI=1-(mean(merged_rs12969294_highLD$MAF.y))#mean frequency for YRI
delta_mean_rs12969294=mean_rs12969294_CEU = mean_rs12969294_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs12969294
## [1] 0.743855
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs12969294_recreate=merge(rs12969294_CEU,rs12969294_YRI,by.x="RS_Number", by.y="RS_Number")
rs12969294=merged_rs12969294_recreate[(which(merged_rs12969294_recreate$RS_Number=="rs12969294")),]#select only lead SNP
delta_rs12969294=(1-rs12969294$MAF.x) - (1-rs12969294$MAF.y) #calculate CEU-YRI difference at the lead SNP
delta_rs12969294
## [1] -0.0273
meta_delta_rs12969294=delta_rs12969294 - delta_mean_rs12969294 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs12969294
## [1] -0.771155
#rs9320913
rs9320913_CEU=read.delim(file.choose(),header=TRUE)
rs9320913_YRI=read.delim(file.choose(),header=TRUE)
#remove SNPs with no name
rs9320913_CEU=rs9320913_CEU[-(which(rs9320913_CEU$RS_Number==".")),]
rs9320913_YRI=rs9320913_YRI[-(which(rs9320913_YRI$RS_Number==".")),]
merged_rs9320913=merge(rs9320913_CEU,rs9320913_YRI,by.x="RS_Number", by.y="RS_Number")
merged_rs9320913=merged_rs9320913[-(which(merged_rs9320913$RS_Number=="rs9320913")),]
#select only SNPs in LD equal to or greater than X in CEU
merged_rs9320913_highLD=merged_rs9320913[which(merged_rs9320913$R2.x>=0.8),]
cor.test(merged_rs9320913$R2.x,merged_rs9320913$R2.y)#correlation of LD values between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs9320913$R2.x and merged_rs9320913$R2.y
## t = 24.667, df = 881, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.5984004 0.6765868
## sample estimates:
## cor
## 0.639142
cor_LD_rs9320913=cor(merged_rs9320913$R2.x,merged_rs9320913$R2.y)
plot(merged_rs9320913$R2.x,merged_rs9320913$R2.y)#plot correlation of LD values
r2_rs9320913_CEU=mean(merged_rs9320913_highLD$R2.x)#mean r^2 CEU
r2_rs9320913_YRI=mean(merged_rs9320913_highLD$R2.y)#mean r^2 YRI
cor.test(merged_rs9320913_highLD$MAF.x,merged_rs9320913_highLD$MAF.y)#correlation of allele frequency between CEU and YRI
##
## Pearson's product-moment correlation
##
## data: merged_rs9320913_highLD$MAF.x and merged_rs9320913_highLD$MAF.y
## t = -2.7804, df = 27, p-value = 0.00977
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7147167 -0.1273046
## sample estimates:
## cor
## -0.4717972
cor_MAF_rs9320913=cor(merged_rs9320913_highLD$MAF.x,merged_rs9320913_highLD$MAF.y)
plot(merged_rs9320913_highLD$MAF.x,merged_rs9320913_highLD$MAF.y)#plot correlation of allele freqs
mean_rs9320913_CEU=mean(merged_rs9320913_highLD$MAF.x)#mean frequency for CEU
mean_rs9320913_YRI=mean(merged_rs9320913_highLD$MAF.y)#mean frequency for YRI
delta_mean_rs9320913=mean_rs9320913_CEU = mean_rs9320913_YRI#difference between CEU and YRI (CEU - YRI MAF)
delta_mean_rs9320913
## [1] 0.1938034
#meta-delta (discrepancy between freq difference among GWAS hit and high LD SNPs)
merged_rs9320913_recreate=merge(rs9320913_CEU,rs9320913_YRI,by.x="RS_Number", by.y="RS_Number")
rs9320913=merged_rs9320913_recreate[(which(merged_rs9320913_recreate$RS_Number=="rs9320913")),]#select only lead SNP
delta_rs9320913=rs9320913$MAF.x - rs9320913$MAF.y
meta_delta_rs9320913=delta_rs9320913 - delta_mean_rs9320913 # meta-difference: difference between CEU-YRI frequency difference at the lead SNP and at the SNP in LD
meta_delta_rs9320913
## [1] 0.1209966
We can explore the correlation of each SNP frequency across SNPs in LD and use this as a measure of robusteness to LD decay.
vec_cor_Maf=c(cor_MAF_1008078,cor_MAF_rs11588857,cor_MAF_rs11712056,cor_MAF_rs12987662,cor_MAF_rs148734725,cor_MAF_rs62263923,cor_MAF_rs13294439,cor_MAF_rs12969294,cor_MAF_rs9320913)
MAF_corr=mean(vec_cor_Maf)
MAF_corr
## [1] 0.04312479
To compute polygenic scores, it is advisable to use SNPs whose correlation of frequencies is higher among those in LD. Another index of robusteness is probably the consistency of LD patterns, measured with the correlation of r^2 coefficients across populations Again, we should pick the SNPs with higher cross-population correlations of r^2 coefficients.
vec_LD_corr=c(cor_LD_rs1008078,cor_LD_rs11588857,cor_LD_rs12987662,cor_LD_rs148734725,cor_LD_rs11712056,cor_LD_rs62263923,cor_LD_rs13294439,cor_LD_rs12969294,cor_LD_rs9320913)
vec_LD_corr
## [1] 0.2000568 0.7792804 0.6956503 0.9275776 0.7380740 0.3089868 0.6194340
## [8] 0.3869653 0.6391420
LD_mean_corr=mean(vec_LD_corr)
LD_mean_corr
## [1] 0.5883519
Another index is the average r^2 in the non-reference population (in this case, YRI). Higher values indicate lower LD decay.
vec_r2_CEU=c(r2_rs1008078_CEU,r2_rs11588857_CEU,r2_rs11712056_CEU,r2_rs12987662_CEU,r2_rs148734725_CEU,r2_rs62263923_CEU,r2_rs13294439_CEU,r2_rs12969294_CEU,r2_rs9320913_CEU)
r2_CEU=mean(vec_r2_CEU)
vec_r2_YRI=c(r2_rs1008078_YRI,r2_rs11588857_YRI,r2_rs11712056_YRI,r2_rs12987662_YRI,r2_rs148734725_YRI,r2_rs62263923_YRI,r2_rs13294439_YRI,r2_rs12969294_YRI,r2_rs9320913_YRI)
r2_YRI=mean(vec_r2_YRI)
r2_CEU
## [1] 0.9243701
r2_YRI
## [1] 0.5402195
vec_r2_YRI#r^2 coefficients for YRI for the SNPs in LD (r>=0.8) with each SNPs
## [1] 0.09123077 0.75840000 0.42182222 0.68338361 0.82704242 0.22201857
## [7] 0.76980000 0.21439500 0.87388276
Mean meta-delta
meta_delta_tot=c(meta_delta_rs1008078,meta_delta_rs11588857,meta_delta_rs12987662,meta_delta_rs148734725,meta_delta_rs11712056,meta_delta_rs62263923,meta_delta_rs13294439,meta_delta_rs12969294,meta_delta_rs9320913)
mean(meta_delta_tot)
## [1] -0.02205743
Correlation between r^2 YRI average and r^2 correlations between YRI and CEU
cor(vec_r2_YRI,vec_LD_corr)
## [1] 0.8142408
The two indices are highly correlated (r=0.86)
Put two indices plus the correlation of allele frequencies for SNPs in LD in df and compute corr matrix
LD_indices=data.frame(vec_r2_YRI,vec_cor_Maf,vec_LD_corr)
cor(LD_indices)
## vec_r2_YRI vec_cor_Maf vec_LD_corr
## vec_r2_YRI 1.0000000 0.1529190 0.8142408
## vec_cor_Maf 0.1529190 1.0000000 0.2299958
## vec_LD_corr 0.8142408 0.2299958 1.0000000
The three indices are inter-correlated Compute a factor
library(psych)
fa_LD_ind=fa(LD_indices)
Show factor scores for each SNP
fa_LD_ind$scores
## MR1
## [1,] -1.6184942
## [2,] 0.7974941
## [3,] 0.4379016
## [4,] 1.4051692
## [5,] 0.6298004
## [6,] -1.1661760
## [7,] 0.1384932
## [8,] -0.8450230
## [9,] 0.2208348
This factor should indicate robustness of each SNP to LD decay. Use it to weight average allele frequencies.
We can use these scores as weights or we can only include the SNPs with positive score in the calculation of a polygenic score and compare the YRI-CEU difference with the score computed from all the SNPs. The difference in the total polygenic score between CEU and YRI is 8.76%. The difference in the LD-robust polygenic score is 18.5%.The weighted difference is 10.9% (for calculations, see: https://docs.google.com/spreadsheets/d/11Gxp-a6sbr8MiB5uQeaMHnP0fHPCR7Pvng_PxTgtgH0/edit?usp=sharing ) Hence, controlling for LD in this case increases the European-African difference.
Note that this overestimation of the African PS was predicted for polygenic scores lower than 50% among Europeans, as LD decay pushes up the allele frequencies back towards the background frequency of 50%.
For these 9 GWAS hits, the mean bias is around 0 (-0.02) favouring YRI (mean meta-delta). As expected, there is no difference, as predicted if the effects of LD decay had no systematic bias. This result means that alleles in LD with the lead SNP tend to have the same frequency as the lead SNP. The correlation of LD values (r^2) across the two populations is 0.59 (LD_mean_corr), indicating a moderate robustness of LD patterns across populations. The average r^2 in the non-reference population (in this case, YRI) is 0.54 vs 0.92 in the reference population (CEU).