Eurasian specific alleles and polygenic scores

Do population-specific alleles, that is, mutations that arose after the African exodus ca. 100-60 kya (estimates vary widely, probably because there were several migration waves throughout this period), drive the correlation between polygenic scores (PGS) and phenotypic differences in IQ? More specifically, since GWAS are mostly carried out on Europeans, do they tend to pick up genetic variants that are beneficial and specific to Europeans? Critics of my work usually assume that GWAS pick up only good mutations, that is, the intelligence-enhancing alleles that are specific to a population. In fact, GWAS can also pick up population-specific variants that are trait-decreasing (“detrimental” if the trait is socially desirable or leads to enhanced fitness). For example, a recent GWAS (Asgari et al., 2019) carried out on Peruvians found an allele that is absent from non-American populations and decreases height by almost an inch.

Population-specific variants are probably more likely to be detrimental, because they are recent mutations and purifying selection has not had time to eliminate them from the gene pool. This would lead to a bias against the reference (in most cases European) population. When we compute PGS we flip the allele frequencies, so that if the trait-decreasing allele has frequency X, the trait-increasing allele has frequency 1-X. If a population-specific allele with a negative effect has a frequency of 5%, the positive-effect allele will have a frequency of 95% in the reference population and 100% in the other populations. So if the population-specific variants tend on average to be detrimental, the scores of the non-reference populations are inflated. Conversely, if the population-specific variants picked up by the GWAS tend to be beneficial, this will cause a bias towards the GWAS reference population.
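The flipping argument can be sketched in a few lines of R. The frequencies are the illustrative 5% case from the text, not values from the data, and the variable names are hypothetical.

```r
# Hypothetical frequencies of a trait-DECREASING allele that is specific to
# the GWAS reference population (illustrative values, not from the data).
freq_decreasing <- c(reference = 0.05, other = 0.00)

# Flip to the trait-increasing allele: frequency X becomes 1 - X.
freq_increasing <- 1 - freq_decreasing
freq_increasing

# A positive shift means the non-reference population scores higher on this
# SNP, although it carries no trait-increasing variant of its own.
score_shift <- freq_increasing[["other"]] - freq_increasing[["reference"]]
score_shift
```

With a population-specific trait-increasing allele the sign of the shift reverses, which is the bias towards the reference population described above.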

To answer this question, I compute PGS for SNPs which are putatively Eurasian-specific (i.e. non-African). I select SNPs with a minor allele frequency (MAF) below 1% among Africans. The threshold allows for later Eurasian backflow into Africa, which was very limited and amounts to less than 1% among West Africans (Yoruba). The PGS generated using the Eurasian-specific SNPs should reflect evolutionary processes that acted after the out-of-Africa exodus. Conversely, the remaining SNPs represent standing genetic variation common to African and non-African populations. Restricting the PGS to these shared SNPs loses information, which will reduce the accuracy of the PGS for non-African populations but increase it for Africans. In other words, it creates an even ground by removing the population-specific variants. The mirror approach would be to include all population-specific mutations, but that will not be possible until we have GWAS that use African samples as the reference population.
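As a minimal sketch of the selection rule, the MAF criterion can be written directly in terms of the frequency of the trait-increasing allele. The helper `maf()` and the frequencies below are hypothetical and only illustrate that MAF < 1% is the same condition as a frequency below 0.01 or above 0.99.

```r
# Minor allele frequency: the frequency of the rarer of the two alleles.
maf <- function(freq) pmin(freq, 1 - freq)

# Hypothetical Yoruba (YRI) frequencies of the trait-increasing allele.
yri_freq <- c(0.004, 0.250, 0.500, 0.997)

# MAF < 1% is the same condition as frequency < 0.01 or frequency > 0.99.
by_maf  <- maf(yri_freq) < 0.01
by_freq <- yri_freq < 0.01 | yri_freq > 0.99
by_maf
identical(by_maf, by_freq)
```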

Open file with EA SNPs freqs

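library(data.table)  # provides setDT(), used below
library(ggplot2)     # provides ggplot(), used for the plots
setwd("~/genetics_2018")
EA_SNPS=read.csv("MTAG_EA_snps_wide.csv")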

First, I compute and plot the EA_MTAG PGS using all the significant SNPs found in 1KG (N = 3,258). I use the weighted PGS because it correlates slightly better with IQ (0.89 vs 0.88). For the common and specific PGS I use the unweighted score, so that differences between the two SNP selections (common vs specific) can be computed as allele frequency differences.

##             Population EA_MTAG_PGS
## 16 Mende, Sierra Leone   -1.584780
## 10             Gambian   -1.424938
## 15        Luhya, Kenya   -1.400287
## 8        Esan, Nigeria   -1.363638
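The code for the weighted score is not shown here. As a rough sketch of the difference between the two scores, assuming the weighted PGS is an effect-size-weighted mean of allele frequencies (one common definition, which may differ from the one actually used), with made-up frequencies and betas:

```r
# Toy data: rows are SNPs, columns are populations; values are hypothetical
# frequencies of the trait-increasing allele.
freqs <- data.frame(
  POP_A = c(0.40, 0.55, 0.70),
  POP_B = c(0.45, 0.50, 0.80)
)
# Hypothetical GWAS effect sizes (absolute betas) for the same three SNPs.
beta <- c(0.010, 0.020, 0.015)

# Unweighted PGS: plain mean frequency of the trait-increasing alleles.
pgs_unweighted <- colMeans(freqs, na.rm = TRUE)

# Weighted PGS (assumed definition): mean frequency with betas as weights.
pgs_weighted <- sapply(freqs, weighted.mean, w = beta, na.rm = TRUE)

rbind(pgs_unweighted, pgs_weighted)
```

The unweighted score is a plain mean of frequencies, so a difference between two unweighted scores is directly a difference in mean allele frequency.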

Select only alleles with frequencies <0.01 or >0.99 among West Africans (Yoruba)

EA_SNPS_specific_1=EA_SNPS[which(EA_SNPS$YRI<0.01),]
EA_SNPS_specific_2=EA_SNPS[which(EA_SNPS$YRI>0.99),]

Number of non-African specific positive effect alleles

length(EA_SNPS_specific_1$snps)

## [1] 375

Number of non-African specific negative effect alleles

length(EA_SNPS_specific_2$snps)

## [1] 284

Merge and compute PGS with Eurasian-specific alleles

EA_SNPS_specific=rbind(EA_SNPS_specific_1,EA_SNPS_specific_2)

PGS_specific=apply(EA_SNPS_specific[,c(3:28)],2,mean, na.rm=TRUE)

PGS_specific

##       ACB       ASW       BEB       CDX       CEU       CHB       CHS 
## 0.4338882 0.4355830 0.4493860 0.4490430 0.4598105 0.4525244 0.4504155 
##       CLM       ESN       FIN       GBR       GIH       GWD       IBS 
## 0.4498999 0.4311399 0.4655661 0.4608298 0.4539314 0.4314126 0.4456334 
##       ITU       JPT       KHV       LWK       MSL       MXL       PEL 
## 0.4524830 0.4477669 0.4522636 0.4319753 0.4312416 0.4484422 0.4430688 
##       PJL       PUR       STU       TSI       YRI 
## 0.4543658 0.4490633 0.4500952 0.4601421 0.4310262

Remove pop specific SNPs from EA_SNPS and compute PGS only with common alleles

EA_SNPS_common=EA_SNPS[which(EA_SNPS$YRI<0.99 & EA_SNPS$YRI>0.01),]
PGS_common=apply(EA_SNPS_common[,c(3:28)],2,mean, na.rm=TRUE)

PGS_common

##       ACB       ASW       BEB       CDX       CEU       CHB       CHS 
## 0.4791506 0.4801676 0.4940630 0.5079197 0.5030657 0.5135130 0.5108930 
##       CLM       ESN       FIN       GBR       GIH       GWD       IBS 
## 0.5003317 0.4789523 0.5021442 0.5002876 0.4979446 0.4782798 0.5059835 
##       ITU       JPT       KHV       LWK       MSL       MXL       PEL 
## 0.4983000 0.5114401 0.5089760 0.4792303 0.4765589 0.4945481 0.4912919 
##       PJL       PUR       STU       TSI       YRI 
## 0.4945491 0.4982913 0.4972943 0.5025037 0.4798599
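One detail of the strict inequalities used above is that a SNP with a YRI frequency of exactly 0.01 or 0.99, or with a missing YRI value, ends up in neither the specific nor the common set. The sketch below shows this on a made-up table; the `toy` data frame and its SNP names are hypothetical.

```r
# Toy SNP table with hypothetical YRI frequencies, including a boundary
# value (exactly 0.01) and a missing value.
toy <- data.frame(
  snps = paste0("rs", 1:6),
  YRI  = c(0.002, 0.010, 0.300, 0.800, 0.995, NA),
  stringsAsFactors = FALSE
)

# Same filters as above: which() silently drops rows where YRI is NA.
specific <- toy[which(toy$YRI < 0.01 | toy$YRI > 0.99), ]
common   <- toy[which(toy$YRI < 0.99 & toy$YRI > 0.01), ]

# SNPs that fall in neither set: the boundary value and the NA.
unassigned <- setdiff(toy$snps, c(specific$snps, common$snps))
unassigned
```

Comparing `nrow(EA_SNPS)` with the sum of the two subset sizes would show whether any SNPs in the real data are affected.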

Correlation between common and specific PGS

cor(PGS_common, PGS_specific)

## [1] 0.793964

Population-specific PGS plot

##    Population PGS_specific
## 26        YRI    0.4310262
## 9         ESN    0.4311399
## 19        MSL    0.4312416
## 13        GWD    0.4314126

Common PGS plot

PGS=data.frame(PGS_specific, PGS_common)
setDT(PGS, keep.rownames = "Population")
PGS <- aggregate(PGS$PGS_common, by=list(PGS$Population), FUN=mean)  # aggregate
colnames(PGS) <- c("Population", "PGS_common")  # change column names
PGS<- PGS[order(PGS$PGS_common), ]  # sort
PGS$Population <- factor(PGS$Population, levels = PGS$Population)  # to retain the order in plot
head(PGS, 4)

##    Population PGS_common
## 19        MSL  0.4765589
## 13        GWD  0.4782798
## 9         ESN  0.4789523
## 1         ACB  0.4791506

theme_set(theme_bw())

ggplot(PGS, aes(x=Population, y=PGS_common)) + 
  geom_point(size=3) + 
  geom_segment(aes(x=Population, 
                   xend=Population, 
                   y=PGS_common, 
                   yend=PGS_common)) + 
  labs(title="EA_MTAG PGS") + 
  theme(axis.text.x = element_text(angle=70, vjust=0.5))

ggplot(PGS, aes(x=PGS_common, y=Population)) +
  geom_segment(aes(x = 0.47, y = Population, xend = PGS_common, yend = Population), color = "grey50") +
  geom_point()+
  labs(title = "EDU PGS Common",
       caption = "Lee et al., 2018 GWAS EA MTAG") +
  scale_x_continuous(name="EA_MTAG")

Plot the difference

PGS=data.frame(PGS_specific, PGS_common)
setDT(PGS, keep.rownames = "Population")
PGS$PGS_diff=PGS$PGS_common - PGS$PGS_specific 
PGS <- aggregate(PGS$PGS_diff, by=list(PGS$Population), FUN=mean)  # aggregate
colnames(PGS) <- c("Population", "PGS_difference")  # change column names
PGS<- PGS[order(PGS$PGS_difference), ]  # sort
PGS$Population <- factor(PGS$Population, levels = PGS$Population)  # to retain the order in plot
head(PGS, 4)

##    Population PGS_difference
## 10        FIN     0.03657810
## 11        GBR     0.03945786
## 22        PJL     0.04018326
## 25        TSI     0.04236162

theme_set(theme_bw())

ggplot(PGS, aes(x=Population, y=PGS_difference)) + 
  geom_point(size=3) + 
  geom_segment(aes(x=Population, 
                   xend=Population, 
                   y=PGS_difference, 
                   yend=PGS_difference)) + 
  labs(title="EA_MTAG PGS") + 
  theme(axis.text.x = element_text(angle=70, vjust=0.5))

ggplot(PGS, aes(x=PGS_difference, y=Population)) +
  geom_segment(aes(x = 0.02, y = Population, xend = PGS_difference, yend = Population), color = "grey50") +
  geom_point()+
  labs(title = "Common-specific PGS Difference",
       caption = "Lee et al., 2018 GWAS EA MTAG") +
  scale_x_continuous(name="EA_MTAG")

Black-White gap before and after removal of population-specific SNPs:

Before (CEU-YRI)

EUR_AFR_gap_before=0.4943137-0.4699792
EUR_AFR_gap_before

## [1] 0.0243345

After (CEU-YRI):

EUR_AFR_gap_after= 0.5030657-0.4798599
EUR_AFR_gap_after

## [1] 0.0232058

Difference between gaps (before vs after)

EUR_AFR_gap_after - EUR_AFR_gap_before

## [1] -0.0011287

East Asian-White gap before and after removal of population-specific SNPs:

Before (CHS-CEU)

EAS_EUR_gap_before=0.4986564-0.4943137
EAS_EUR_gap_before

## [1] 0.0043427

After (CHS-CEU)

EAS_EUR_gap_after=0.5108930-0.5030657
EAS_EUR_gap_after

## [1] 0.0078273

Difference between gaps (before and after)

EAS_EUR_gap_after - EAS_EUR_gap_before

## [1] 0.0034846

Ratio between the EAS-EUR and EUR-AFR gaps, before and after

Ratio_before=EAS_EUR_gap_before/EUR_AFR_gap_before
Ratio_after=EAS_EUR_gap_after/EUR_AFR_gap_after
Ratio_before

## [1] 0.1784586

Ratio_after

## [1] 0.3372993
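The same gaps and ratios can be computed from named vectors rather than hard-coded subtractions. This is only a sketch: the scores are copied from the calculations above, and `gap_ratio()` is a hypothetical helper.

```r
# Scores copied from the gap calculations above.
pgs_all    <- c(CEU = 0.4943137, YRI = 0.4699792, CHS = 0.4986564)
pgs_common <- c(CEU = 0.5030657, YRI = 0.4798599, CHS = 0.5108930)

# Ratio of the CHS-CEU gap to the CEU-YRI gap for a named vector of scores.
gap_ratio <- function(pgs) {
  unname((pgs["CHS"] - pgs["CEU"]) / (pgs["CEU"] - pgs["YRI"]))
}

ratios <- c(before = gap_ratio(pgs_all), after = gap_ratio(pgs_common))
round(ratios, 3)
```

Indexing by population name makes it easy to repeat the comparison with other population pairs.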

There were 659 non-African-specific SNPs (about 20% of the 3,258 SNPs). Of those, 375 were positive-effect and 284 were negative-effect alleles. In the absence of African-based GWAS, we cannot know whether this proportion is higher or lower for African-specific alleles. By removing non-African-specific alleles and computing polygenic scores after this removal, we assume that the proportion of beneficial mutations that are unique to Africans (which will be found by future African GWAS) is the same as for Europeans. In fact it might be higher or lower, but we just don’t know. Before removing the Eurasian-specific SNPs, the East Asian/White gap is 17.8% of the White-Black gap. After removal of the population-specific SNPs, the East Asian/White gap is 33.7% of the White-Black gap.

Conclusion: After removing population-specific variants, the Black-White PGS gap is slightly reduced (from 2.43% to 2.32% in mean allele frequency). The common-specific PGS difference indicates that the population-specific SNPs introduce a small bias in favour of European populations and against non-European populations, particularly East Asians and, to a lesser extent, Africans. Removing them therefore increases the difference between East Asians and Europeans and slightly reduces the Black-White gap. Projecting the PGS differences onto phenotypic (IQ) differences: if the East Asian/European gap is 5 IQ points, then with the full PGS the Black/White gap is 1/0.178 times as large, that is 5/0.178 = 28.09 points. This corresponds to an African IQ of 100 - 28.09 = 71.91. Using the PGS without population-specific variants, the gap is 5/0.337 = 14.84 points, which corresponds to an African IQ of 100 - 14.84 = 85.16. More accurate cross-population comparisons of PGS will require inclusion of the population-specific SNPs and computation of population-specific PGS. This will be achieved only when sufficiently large GWAS are carried out in different ethnic groups.
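The projection is simple arithmetic and can be reproduced as follows. The 5-point East Asian/European gap and the European baseline of 100 are the assumptions stated in the text; the variable names are hypothetical.

```r
# Assumptions taken from the text: a 5-point East Asian/European gap and a
# European baseline of 100.
eas_eur_iq_gap <- 5
eur_baseline   <- 100

# Rounded EAS-EUR / EUR-AFR gap ratios, with and without specific SNPs.
ratio <- c(full = 0.178, common = 0.337)

# If the PGS gaps scale linearly to IQ, the EUR-AFR gap is the EAS-EUR gap
# divided by the ratio.
eur_afr_iq_gap <- eas_eur_iq_gap / ratio
projected_iq   <- eur_baseline - eur_afr_iq_gap
round(rbind(eur_afr_iq_gap, projected_iq), 2)
```

Because the ratio sits in the denominator, the projection is sensitive to small changes in the East Asian/European PGS gap.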

Davide Piffer

18 April 2019