2017 Intelligence GWAS: Group-level polygenic scores

A research group (Sniekers et al. 2017) yesterday published the largest GWAS of intelligence. I will calculate polygenic scores using the independent hits and compare them with polygenic scores computed from previous educational attainment GWAS and reported in Piffer (2017). You will need to load 5 external files (in this order):“Nature2017IntelligenceGWAS.csv”,“df_merged_Nature_top_positive.csv”,“Int_EA_Replicated.csv”, “Matched_18_freqs.csv”,“Replication_linked_Okbay.csv”, “FstDistances.csv” Download them from: osf.io/5yhf

Load packages

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(RMySQL)

## Loading required package: DBI

Call to 1000 Genomes database

sql = src_mysql("population_freqs", host = "128.199.203.208", user = "anon", port = 3306)
sql_1kg = tbl(sql, "1000genomes_phase3")

Load list of SNPs from file Nature2017IntelligenceGWAS.csv

Nature2017IntelligenceGWAS=read.csv(file.choose(),header=TRUE) 
Nature2017_ind=Nature2017IntelligenceGWAS
snps = Nature2017_ind$rsID

Fetch from SQL server

(Nature_2017_SNPs= sql_1kg %>% filter(SNP %in% snps) %>% collect())

## # A tibble: 15 × 35
##      CHR        SNP    A1    A2    ACB    ASW    BEB    CDX    CEU    CHB
##    <int>      <chr> <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1      2 rs10191758     G     A 0.1979 0.2049 0.1047 0.0000 0.3687 0.0097
## 2      7 rs10236197     C     T 0.5833 0.5000 0.3372 0.4839 0.3586 0.5728
## 3      9 rs11138902     A     G 0.2865 0.4016 0.4302 0.3548 0.5859 0.4612
## 4      1 rs12744310     T     C 0.1042 0.1066 0.1047 0.4516 0.2172 0.3010
## 5     16 rs12928404     C     T 0.6042 0.6066 0.3023 0.5806 0.3737 0.4563
## 6      2 rs13010010     T     C 0.0417 0.0492 0.1163 0.4839 0.4091 0.5291
## 7     17 rs16954078     A     T 0.0260 0.0492 0.1105 0.0000 0.2020 0.0049
## 8     13  rs2251499     T     C 0.1562 0.1803 0.1512 0.1129 0.2727 0.1117
## 9      6  rs2490272     T     C 0.1354 0.2213 0.4651 0.5591 0.6465 0.8107
## 10    22 rs36093924     T     C 0.5365 0.5492 0.5174 0.2527 0.4444 0.2379
## 11     7  rs4728302     T     C 0.2708 0.4508 0.3837 0.6505 0.5505 0.4757
## 12     2  rs6746731     G     T 0.2448 0.3361 0.5814 0.2419 0.5909 0.3835
## 13     3  rs6779302     T     G 0.4479 0.4180 0.3721 0.0107 0.3636 0.0583
## 14     3  rs7646501     G     A 0.7240 0.7213 0.3721 0.1129 0.2374 0.1796
## 15     6  rs9320913     A     C 0.1615 0.2295 0.2907 0.4409 0.5000 0.4369
## # ... with 25 more variables: CHS <dbl>, CLM <dbl>, ESN <dbl>, FIN <dbl>,
## #   GBR <dbl>, GIH <dbl>, GWD <dbl>, IBS <dbl>, ITU <dbl>, JPT <dbl>,
## #   KHV <dbl>, LWK <dbl>, MSL <dbl>, MXL <dbl>, PEL <dbl>, PJL <dbl>,
## #   PUR <dbl>, STU <dbl>, TSI <dbl>, YRI <dbl>, EAS <dbl>, EUR <dbl>,
## #   AFR <dbl>, AMR <dbl>, SAS <dbl>

Merge file with downloaded frequencies to original input file with extra info (z scores, p value, etc)

df_merged_Nature_top=merge(Nature_2017_SNPs,Nature2017_ind,by.x="SNP",by.y="rsID")
write.csv(df_merged_Nature_top,"df_merged_Nature_top.csv")

Unfortunately 3 variants were missing (rs66495454,rs113315451,rs41352752). Two indels and 1 SNP.Had to add them manually by using 1000 Genomes Ensembl browswer. Manually reversed the allele when z score was negative to get only positive alleles. Load updated file “df_merged_Nature_top_positive”

df_merged_Nature_top_positive=read.csv(file.choose(),header=TRUE)
df_pol_top=df_merged_Nature_top_positive[,1:26]

Calculate polygenic scores

PS_top=apply(df_pol_top,2,mean)
PS_top

##       ACB       ASW       BEB       CDX       CEU       CHB       CHS 
## 0.4753722 0.4690556 0.4312000 0.4925333 0.4747000 0.5264056 0.5071167 
##       CLM       ESN       FIN       GBR       GIH       GWD       IBS 
## 0.4622167 0.4685500 0.4621556 0.4654611 0.4706389 0.4717833 0.4535333 
##       ITU       JPT       KHV       LWK       MSL       MXL       PEL 
## 0.4305722 0.5315111 0.5073278 0.4567889 0.4885778 0.4422722 0.4467167 
##       PJL       PUR       STU       TSI       YRI 
## 0.4433056 0.4551167 0.4439056 0.4579222 0.4704444

Apply weights (z scores)

df_weighted_top=(df_merged_Nature_top_positive[,1:26])*(abs(df_merged_Nature_top_positive$z))
PS_top_w=apply(df_weighted_top,2,mean)
PS_top_w

##      ACB      ASW      BEB      CDX      CEU      CHB      CHS      CLM 
## 2.791038 2.760445 2.563153 2.922041 2.836348 3.149946 3.019268 2.760920 
##      ESN      FIN      GBR      GIH      GWD      IBS      ITU      JPT 
## 2.750451 2.754825 2.775733 2.799647 2.767001 2.700084 2.549037 3.160909 
##      KHV      LWK      MSL      MXL      PEL      PJL      PUR      STU 
## 3.014528 2.686187 2.872146 2.630699 2.655750 2.635490 2.715050 2.627435 
##      TSI      YRI 
## 2.727961 2.760959

Sniekers et al. (2017) also identified 9 SNPs that reached significance both in their study and in the Okbay educational attainment GWAS. As this constitutes a quasi-replication, these SNPs are more likely to be real (or closer to) causal variants. Load file “Int_EA_Replicated.csv”

df_int_EA=read.csv(file.choose(),header=TRUE)
PS_int_EA=apply(df_int_EA[,2:27],2,mean)
PS_int_EA

##       ACB       ASW       BEB       CDX       CEU       CHB       CHS 
## 0.5197333 0.5236667 0.5445667 0.6320667 0.5813111 0.6272889 0.6381778 
##       CLM       ESN       FIN       GBR       GIH       GWD       IBS 
## 0.5555667 0.4988889 0.5768667 0.5689667 0.5378556 0.4970889 0.5596778 
##       ITU       JPT       KHV       LWK       MSL       MXL       PEL 
## 0.5522444 0.6255556 0.6274222 0.4909889 0.5267778 0.5441889 0.5607667 
##       PJL       PUR       STU       TSI       YRI 
## 0.5428000 0.5416667 0.5293667 0.5617000 0.5011111

Compute correlation between unweighted, weighted PS, Piffer-estimated PGS, Population IQ (Piffer 2015). PGS were calculated using Okbay et al. (2016), Davies et al. (2016) and Rietveld et al. (2013) meta-analyses of educational attainment. For more details see Piffer (2017)

IQ=c(83,85,81,NA,99,105,105,83.5,71,101,100,NA,62,97,NA,105,99.4,74,64,88,85,84,83.5,79,99,71)
Piffer_2017_9= c(0.323,0.355,0.401,0.485,0.428,0.553,0.535,0.394,0.334,0.477,0.469,0.429,
                 0.328,0.438,0.412,0.530,0.517,0.333,0.341,0.385,0.353,0.419,0.394,0.404,0.439,
                 0.340)
Piffer_2017_Okbay_162SNPs=c(0.485,0.483,0.503,0.516,0.503,0.529,0.522, 0.498,0.479,0.516,0.506,
                            0.506,0.484,0.513,0.504,0.522,0.522,0.477,0.473, 0.495,0.474,0.505,
                            0.498,0.501,0.513,0.482)


df_PS=data.frame(PS_top,PS_top_w,PS_int_EA,IQ,Piffer_2017_9,Piffer_2017_Okbay_162SNPs)

IQ has NAs, hence use pairwise deletion

cor(df_PS,use="pairwise.complete.obs")

##                              PS_top  PS_top_w PS_int_EA        IQ
## PS_top                    1.0000000 0.9936323 0.6136411 0.4088755
## PS_top_w                  0.9936323 1.0000000 0.6814205 0.4966647
## PS_int_EA                 0.6136411 0.6814205 1.0000000 0.8766974
## IQ                        0.4088755 0.4966647 0.8766974 1.0000000
## Piffer_2017_9             0.5731021 0.6460421 0.9241116 0.8797065
## Piffer_2017_Okbay_162SNPs 0.4218000 0.4979507 0.8351598 0.8543011
##                           Piffer_2017_9 Piffer_2017_Okbay_162SNPs
## PS_top                        0.5731021                 0.4218000
## PS_top_w                      0.6460421                 0.4979507
## PS_int_EA                     0.9241116                 0.8351598
## IQ                            0.8797065                 0.8543011
## Piffer_2017_9                 1.0000000                 0.9500431
## Piffer_2017_Okbay_162SNPs     0.9500431                 1.0000000

Correlation between the intelligence PS and IQ,PS_Piffer2017,PS_Piffer_2017_162SNPs are r=0.496, 0.646, 0.497. Correlations between the intelligence-EA PS and IQ, PS_Piffer2017_9, PS_Piffer_2016_162SNPs are r= 0.877, 0.924, 0.835 These are lower than the correlations previously observed (Piffer, 2017)

Run Monte Carlo simulation using random SNPs First to calculate Monte Carlo p value for correlation with IQ Load random SNPs (N=13,130). “Matched_18_freqs.csv”

sql_random_freqs <- read.csv(file.choose(),header=TRUE)

Keep only freqs.Transpose

t_random_freqs=t(sql_random_freqs[6:31])

Compute non-overlapping polygenic scores of same size (18)

byapply <- function(t_random_freqs, by, fun, ...)
{
  # Create index list
  if (length(by) == 1)
  {
    nc <- ncol(t_random_freqs)
    split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
  } else # 'by' is a vector of groups
  {
    nc <- length(by)
    split.index <- by
  }
  index.list <- split(seq(from = 1, to = nc), split.index)
  
  # Pass index list to fun using sapply() and return object
  sapply(index.list, function(i)
  {
    do.call(fun, list(t_random_freqs[, i], ...))
  })
}

# Run function
y <- byapply(t_random_freqs, 18, rowMeans)

Compute corr between IQ and PGS for random sets of 18 SNPs (N=730)

cor_random18_IQ=cor(y,IQ,use="complete.obs")

Calculate Monte Carlo p value (applying formula with correction) for weighted PS

r_w=length(which(cor_random18_IQ>=0.496))
n_w=length(cor_random18_IQ)
MC_p_w=(r_w+1)/(n_w+1)
MC_p_w

## [1] 0.3816689

The MC p value is 0.382. This means that the same results (r x IQ > 0.496) can be achieved with random SNPs roughly 38.2% of the times.

Create correlation matrix of random and real (PS 9) polygenic scores to obtain a Monte Carlo p value for the correlation between the 9 SNPs EA Polygenic score and the new Nature 2017 polygenic score.

cor_random18_PS_9=cor(y,Piffer_2017_9,use="complete.obs")

Calculate p value

r_18_9=length(which(cor_random18_PS_9>=0.646)) 
n_18_9=length(cor_random18_PS_9)
MC_p_18_9=(r_18_9+1)/(n_18_9+1) 
MC_p_18_9

## [1] 0.120383

Calculate PS by continent

df_weighted_top_continent=(df_merged_Nature_top_positive[,27:31])*(abs(df_merged_Nature_top_positive$z))
PS_top_cont=apply(df_weighted_top_continent,2,mean)
PS_top_cont

##      EAS      EUR      AFR      AMR      SAS 
## 2.913380 2.736141 2.781131 2.720869 2.624233

The MC p value is 0.120. This means that the same results (r x PS_9 > 0.646) can be achieved with random SNPs roughly 12% of the times. In general, the only robust finding of this study is an East Asian (EAS) advantage but the scores for the other populations are largely overlapping.

Run MC simulation for the 9 intelligence-EA pseudo-replicated SNPs.

byapply <- function(t_random_freqs, by, fun, ...)
{
  # Create index list
  if (length(by) == 1)
  {
    nc <- ncol(t_random_freqs)
    split.index <- rep(1:ceiling(nc / by), each = by, length.out = nc)
  } else # 'by' is a vector of groups
  {
    nc <- length(by)
    split.index <- by
  }
  index.list <- split(seq(from = 1, to = nc), split.index)
  
  # Pass index list to fun using sapply() and return object
  sapply(index.list, function(i)
  {
    do.call(fun, list(t_random_freqs[, i], ...))
  })
}

# Run function
y_9 <- byapply(t_random_freqs, 9, rowMeans)

Compute corr between random sets of 9 SNPs and IQ

cor_random9_IQ=cor(y_9,IQ,use="complete.obs")

Calculate p value

r_9_9=length(which(cor_random9_IQ>=0.876)) 
n_9_9=length(cor_random9_IQ)
MC_p_9_9=(r_9_9+1)/(n_9_9+1) 
MC_p_9_9

## [1] 0.00890411

The MC p value for the 9 intelligence-EA quasi-replicated corr x IQ PGS is 0.009.

Factor analysis was used to extrapolate a signal of directional polygenic selection for the 18 intelligence SNPs(Piffer, 2015)

library(psych)
fa_int=fa(t(df_pol_top))
fa_int$scores

##             MR1
## ACB -1.27631875
## ASW -0.96102733
## BEB -0.07533494
## CDX  1.35058189
## CEU  0.84374466
## CHB  1.10943690
## CHS  1.20766955
## CLM  0.35707546
## ESN -1.65963577
## FIN  0.77070237
## GBR  0.79683553
## GIH -0.04922992
## GWD -1.35832632
## IBS  0.63069832
## ITU -0.07433635
## JPT  0.87816364
## KHV  1.26670030
## LWK -1.59864602
## MSL -1.44376294
## MXL  0.21476285
## PEL -0.05987738
## PJL  0.06571488
## PUR  0.37462562
## STU -0.39063541
## TSI  0.76435902
## YRI -1.68393985

The results show clearer differences between populations, compared to the polygenic scores and look similar to the polygenic scores published by Piffer (2015, 2017)

cor(fa_int$scores,df_PS)

##        PS_top  PS_top_w PS_int_EA IQ Piffer_2017_9
## MR1 0.3082242 0.4038636 0.8990939 NA     0.8876893
##     Piffer_2017_Okbay_162SNPs
## MR1                 0.8799348

Compute correlation with IQ (remove NA pairwise)

cor(fa_int$scores,df_PS,use = "pairwise.complete.obs")

##        PS_top  PS_top_w PS_int_EA        IQ Piffer_2017_9
## MR1 0.3082242 0.4038636 0.8990939 0.9034576     0.8876893
##     Piffer_2017_Okbay_162SNPs
## MR1                 0.8799348

Repeat for the intelligence-EA SNPs

fa_int_ea=fa(t(df_int_EA[,2:27]))
fa_int_ea$scores

##               MR1
## ACB -1.0629236351
## ASW -0.9970987866
## BEB -0.6601555423
## CDX  1.2507053503
## CEU  0.7542623611
## CHB  1.3743195824
## CHS  1.6347042174
## CLM -0.1131873487
## ESN -1.2547291601
## FIN  0.5814735810
## GBR  0.7815198753
## GIH -0.0005317457
## GWD -1.1861165277
## IBS  0.4755995807
## ITU -0.2120756729
## JPT  1.3210159053
## KHV  1.9253948560
## LWK -1.2545719258
## MSL -1.1648724693
## MXL -0.2591111303
## PEL -0.7621497853
## PJL  0.0347293802
## PUR -0.2075127835
## STU -0.4320448257
## TSI  0.6767470955
## YRI -1.2433904462

cor(fa_int_ea$scores,df_PS)

##        PS_top PS_top_w PS_int_EA IQ Piffer_2017_9
## MR1 0.5516593 0.626515 0.9401761 NA     0.9618424
##     Piffer_2017_Okbay_162SNPs
## MR1                 0.9236177

cor(fa_int_ea$scores,df_PS,use = "pairwise.complete.obs")

##        PS_top PS_top_w PS_int_EA        IQ Piffer_2017_9
## MR1 0.5516593 0.626515 0.9401761 0.9061601     0.9618424
##     Piffer_2017_Okbay_162SNPs
## MR1                 0.9236177

Indeed, as we can see above, the factor scores have very high correlations with population IQ (r=0.9) and the 162 SNPs polygenic score from Okbay et.al (2017) and the SNPs polygenic score made from the 9 pseudo-repllicated SNPs (Piffer, 2017). Thus, in this instance factor analysis outperforms simply averging allele frequencies.

Show factor loadings

fa_int$loadings

## 
## Loadings:
##       MR1   
##  [1,]       
##  [2,] -0.382
##  [3,]  0.710
##  [4,] -0.778
##  [5,] -0.329
##  [6,]  0.929
##  [7,] -0.396
##  [8,]  0.238
##  [9,]  0.921
## [10,]  0.568
## [11,] -0.713
## [12,] -0.422
## [13,]  0.829
## [14,] -0.977
## [15,]  0.840
## [16,]  0.726
## [17,] -0.834
## [18,]  0.310
## 
##                  MR1
## SS loadings    7.965
## Proportion Var 0.443

Calculate mean fa loadings

mean(fa_int$loadings)

## [1] 0.06868852

Repeat for the int-ea SNPs

fa_int_ea$loadings

## 
## Loadings:
##       MR1   
##  [1,] -0.358
##  [2,] -0.883
##  [3,]  0.994
##  [4,]  0.156
##  [5,]  0.305
##  [6,]  0.875
##  [7,]  0.789
##  [8,]  0.475
##  [9,] -0.510
## 
##                  MR1
## SS loadings    3.888
## Proportion Var 0.432

mean(fa_int_ea$loadings)

## [1] 0.2047414

Finally, a Monte Carlo simulation with the random (N=730) factor scores. First carry out fa over groups of 18 SNPs…see first group output as test

ind=seq(1,(ncol(t_random_freqs)-18),by=18)
results_fa=lapply(ind, function(x) fa(t_random_freqs[, x + 0:17]))
results_fa[[1]]$scores

##             MR1
## ACB  1.11569074
## ASW  0.53001953
## BEB -0.52523418
## CDX -1.55733116
## CEU  0.47450740
## CHB -1.95284548
## CHS -1.80867302
## CLM  0.28743412
## ESN  0.67975332
## FIN  0.15299121
## GBR  0.60614566
## GIH -0.17495926
## GWD  1.03746149
## IBS  0.92244653
## ITU  0.13683089
## JPT -1.84459552
## KHV -1.81687202
## LWK  0.79571481
## MSL  1.33625037
## MXL  0.07936017
## PEL  0.31384984
## PJL -0.24407216
## PUR  0.39328959
## STU -0.23122284
## TSI  0.35191022
## YRI  0.94214976

Save factor scores

results_fa_scores=lapply(results_fa, "[[" , "scores" )

Convert list to df and transpose

df_fa_scores=data.frame(lapply(data.frame(t(sapply(results_fa_scores, `[`))), unlist))
df_fa_scores=t(df_fa_scores)

Compute correlation between population IQ and random factor scores

cor_fa_iq=cor(df_fa_scores,IQ,use = "pairwise.complete.obs")

Compute Monte Carlo p value for corr x population IQ

r_iq=length(which(cor_fa_iq>=0.9))
n_iq=length(cor_fa_iq)
MC_p_iq=(r_iq+1)/(n_iq+1)
MC_p_iq

## [1] 0.004109589

Repeat for intelligence-EA factor

ind=seq(1,(ncol(t_random_freqs)-9),by=9)
results_fa_int_ea=lapply(ind, function(x) fa(t_random_freqs[, x + 0:8]))
results_fa_scores_int_ea=lapply(results_fa_int_ea, "[[" , "scores" )
df_fa_scores_int_ea=data.frame(lapply(data.frame(t(sapply(results_fa_scores_int_ea, `[`))), unlist))
df_fa_scores_int_ea=t(df_fa_scores_int_ea)
cor_fa_iq_int_ea=cor(df_fa_scores_int_ea,IQ,use = "pairwise.complete.obs")

Compute Monte Carlo p value for corr x population IQ

r_iq_int_ea=length(which(cor_fa_iq_int_ea>=0.9))
n_iq_int_ea=length(cor_fa_iq_int_ea)
MC_p_iq_int_ea=(r_iq_int_ea+1)/(n_iq_int_ea+1)
MC_p_iq_int_ea

## [1] 0.002056203

Compute correlation between PGS Okbay et al. (N=162) and random factor scores

cor_fa_Okbay=cor(df_fa_scores,Piffer_2017_Okbay_162SNPs,use = "pairwise.complete.obs")

Compute Monte Carlo p value

r_Okbay=length(which(cor_fa_Okbay>=0.879))
n_Okbay=length(cor_fa_Okbay)
MC_p_Okbay=(r_Okbay+1)/(n_Okbay+1)
MC_p_Okbay

## [1] 0.02191781

Do the same as above for the 9 EA SNPs PS (Piffer, 2017)

cor_fa_9=cor(df_fa_scores,Piffer_2017_9,use = "pairwise.complete.obs")
r_9=length(which(cor_fa_9>=0.887))
n_9=length(cor_fa_9)
MC_p_9=(r_9+1)/(n_9+1)
MC_p_9

## [1] 0.01643836

The Monte Carlo simulation showed that the correlation between the factor score obtained from the intelligence GWAS and population IQ, and the previous educational attainment GWAS polygenic scores, is higher than the vast majority of the correlations produced stochastically (i.e. correlations between factor scores of random SNPs and population IQ and ed. att. GWAS PS are almost always lower). Specifically, the empirical (MC) p values are very low (0.004,0.022, 0.016 for population IQ, Okbay 162 PS and Piffer 2017 PS, respectively).

Calculate 9 SNPs EA factor score (load “Replication_linked_Okbay.csv”“)

Replication_linked_Okbay <- read.csv(file.choose(),header=TRUE)
fa_9_EA=fa(Replication_linked_Okbay[,2:10])

Show factor loadings

fa_9_EA$loadings

## 
## Loadings:
##             MR1   
## rs1008078   -0.801
## rs11588857   0.831
## rs12987662   0.956
## rs148734725 -0.492
## rs11712056   0.606
## rs62263923   0.863
## rs13294439   0.910
## rs12969294   0.459
## rs9320913    0.717
## 
##                  MR1
## SS loadings    5.153
## Proportion Var 0.573

Compute mean fa loadings

mean(fa_9_EA$loadings)

## [1] 0.4499548

Correlation between 3 factors: 9 SNPs EA factor, Intell factor, Int-EA factor

df_fac=data.frame(fa_9_EA$scores,fa_int$scores,fa_int_ea$scores)
cor(df_fac)

##             MR1     MR1.1     MR1.2
## MR1   1.0000000 0.9608041 0.9459244
## MR1.1 0.9608041 1.0000000 0.9347101
## MR1.2 0.9459244 0.9347101 1.0000000

Carry out Monte Carlo analysis for EA x int correlation

cor_fa_18=cor(df_fa_scores,fa_9_EA$scores,use = "pairwise.complete.obs")
r_18=length(which(cor_fa_18>=0.96))
n_18=length(cor_fa_18)
MC_p_18=(r_18+1)/(n_18+1)
MC_p_18

## [1] 0.02328767

The correlation between the EA and intelligence factors is 0.96 (MC p= 0.023)

Controlling for spatial autocorrelation. Load file “FstDistancesAlph_2016.csv” EA factor

FstDistances <- read.csv(file.choose(),header=TRUE)
library(caTools)
library(lm.beta)
library(car)

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

## The following object is masked from 'package:dplyr':
## 
##     recode

combinations_EA=combs(fa_9_EA$scores, 2)
combinations_int=combs(fa_int$scores,2)
#SAC control 9 EA factor
cnames_EA=provideDimnames(combinations_EA, sep = "", base = list(LETTERS))
cnamesdf_EA=as.data.frame(cnames_EA)
distances_EA9=cnamesdf_EA$A-cnamesdf_EA$B
fa_9_dist=abs(distances_EA9)
FstDistances$fa_9_dist=fa_9_dist
fit_dist_fa_9=lm(FstDistances$IQdist ~FstDistances$Fst_Chr1+FstDistances$fa_9_dist)
summary(fit_dist_fa_9)

## 
## Call:
## lm(formula = FstDistances$IQdist ~ FstDistances$Fst_Chr1 + FstDistances$fa_9_dist)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -18.1961  -5.6657   0.0894   5.5857  17.9431 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              5.2076     0.8819   5.905 1.15e-08 ***
## FstDistances$Fst_Chr1  -26.1310    19.8261  -1.318    0.189    
## FstDistances$fa_9_dist  10.3841     1.1716   8.864  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.683 on 250 degrees of freedom
##   (72 observations deleted due to missingness)
## Multiple R-squared:  0.5027, Adjusted R-squared:  0.4987 
## F-statistic: 126.3 on 2 and 250 DF,  p-value: < 2.2e-16

betafit_fit_dist_9=lm.beta(fit_dist_fa_9)
structure(betafit_fit_dist_9)

## 
## Call:
## lm(formula = FstDistances$IQdist ~ FstDistances$Fst_Chr1 + FstDistances$fa_9_dist)
## 
## Standardized Coefficients::
##            (Intercept)  FstDistances$Fst_Chr1 FstDistances$fa_9_dist 
##              0.0000000             -0.1207470              0.8120172

vif(fit_dist_fa_9)

##  FstDistances$Fst_Chr1 FstDistances$fa_9_dist 
##               4.218947               4.218947

SAC control Intelligence factor

cnames_int=provideDimnames(combinations_int, sep = "", base = list(LETTERS))
cnamesdf_int=as.data.frame(cnames_int)
distances_int=cnamesdf_int$A-cnamesdf_int$B
fa_int_dist=abs(distances_int)
FstDistances$fa_int_dist=fa_int_dist
fit_dist_fa_int=lm(FstDistances$IQdist ~FstDistances$Fst_Chr1+FstDistances$fa_int_dist)
summary(fit_dist_fa_int)

## 
## Call:
## lm(formula = FstDistances$IQdist ~ FstDistances$Fst_Chr1 + FstDistances$fa_int_dist)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -20.0620  -5.0967   0.1203   4.7051  18.0239 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                4.5172     0.8315   5.433 1.32e-07 ***
## FstDistances$Fst_Chr1    -23.4298    16.2952  -1.438    0.152    
## FstDistances$fa_int_dist  10.7256     0.9639  11.128  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.203 on 250 degrees of freedom
##   (72 observations deleted due to missingness)
## Multiple R-squared:  0.5629, Adjusted R-squared:  0.5594 
## F-statistic:   161 on 2 and 250 DF,  p-value: < 2.2e-16

betafit_fit_dist_int=lm.beta(fit_dist_fa_int)
structure(betafit_fit_dist_int)

## 
## Call:
## lm(formula = FstDistances$IQdist ~ FstDistances$Fst_Chr1 + FstDistances$fa_int_dist)
## 
## Standardized Coefficients::
##              (Intercept)    FstDistances$Fst_Chr1 FstDistances$fa_int_dist 
##                0.0000000               -0.1082649                0.8378747

vif(fit_dist_fa_int)

##    FstDistances$Fst_Chr1 FstDistances$fa_int_dist 
##                 3.242631                 3.242631

The EA and Intell factor scores predict population IQ also after controlling for SAC (Beta= 0.81 and 0.84, respectively)

References:

Piffer, D. Can We Detect Polygenic Selection on Cognitive Ability Using GWAS Hits? Employing Random SNPs as a Null Model.. Preprints 2017, 2017010127 (doi: 10.20944/preprints201701.0127.v2).

Sniekers et al. Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence, Nature Genetics 2017 (doi: doi:10.1038/ng.3869)

2017 Intelligence GWAS: Group-level polygenic scores

Davide Piffer

May 23, 2017