EDU and Height PGS

I will calculate group-level Polygenic Scores (PGS) for 52 populations and 7 super-populations for SNPs that were found to explain variation in educational attainment by the largest GWAS (N= 1.1 million) to date (Lee et al., in press) and in height (Wood et al., 2014).

(Skip this part you’re not running the code but just reading this) ** Download files from: https://osf.io/ewxqj/ HGDP frequencies file: “Lee_results_2.csv”. Frequencies of GWAS hits from 52 populations pre-computed using HGDP-CEPH browser (http://spsmart.cesga.es/ceph.php?dataSet=ceph_stanford). GWAS summary file: “Lee_etalGWAS2018.csv”. HGDP frequencies for EDU by continent file: “Lee_results_continent.csv”" HGDP frequencies for height: “Woodetal_results.csv”" Height GWAS summary file: “ng.3097-S2.csv”" HGDP frequencies for height by continent: “Woodetal_results_continent.csv”"

You will be requested to upload them (in the same order) as the code runs. Note that GWAS files were not LD clumped because this would result in too small number of SNPs being found in the HGDP dataset. However, LD clumping should be performed on the output HGDP files. For simplicity, this step is omitted but will be done in a follow-up version. **

HGDP_CEPH=read.csv(file.choose(), header=TRUE, sep = ";")#open HGDP-CEPH browser output with freqs from GWAS hits
gwas=read.csv(file.choose(), header=TRUE)#open GWAS summary file
gwas=gwas[which(gwas$Pval<5*10^-8),]#select only significant SNPs

Merge two datasets

HGDP_CEPH_merged=merge(HGDP_CEPH,gwas, by.x = "SNP", by.y = "MarkerName")

Split into 8 groups according to levels of vector “var”: “AC” ,“AG”, “CA”, “CT”, “GA”, “GT” ,“TC”, “TG”

Note that some SNPs are on reverse strand so AC is TG on GWAS and viceversa, so this will cause a nonmatch. Select reverse-stranded SNPs and translate into same strand as HGDP-CEPH reference panel. Then, flip frequency of non-matching alleles (when A2 (A1) is (not) HGDP reference allele).

Create objects based on match or nonmatch between A1 and reference allele and flip frequency of nonmatch alleles

Calculate weighted frequency by SNP by group.

Re-merge groups

compute PGS by population

PGS<-with(df_remerged, tapply(GVS, population, mean))
PGS

##         Algeria (Mzab) - Mozabite     Bougainville - NAN Melanesian 
##                     -2.059055e-05                     -3.567165e-04 
##                Brazil - Karitiana                    Brazil - Surui 
##                     -3.317165e-04                     -5.549449e-04 
## C. African Republic - Biaka Pygmy              Cambodia - Cambodian 
##                     -1.962441e-04                     -4.213228e-04 
##                       China - Dai                      China - Daur 
##                      1.578740e-04                     -1.492126e-05 
##                       China - Han                    China - Hezhen 
##                      9.718110e-05                     -2.491339e-05 
##                      China - Lahu                    China - Miaozu 
##                     -2.247165e-04                      6.992913e-05 
##                   China - Mongola                      China - Naxi 
##                      2.216535e-04                      1.038504e-04 
##                    China - Oroqen                       China - She 
##                      2.168268e-04                     -6.139370e-05 
##                        China - Tu                     China - Tujia 
##                     -1.900315e-04                      1.401575e-04 
##                     China - Uygur                      China - Xibo 
##                     -1.086614e-04                      1.097165e-04 
##                      China - Yizu  Colombia - Piapoco and Curripaco 
##                      3.661417e-05                     -1.979449e-04 
##      D. R. of Congo - Mbuti Pygmy                   France - Basque 
##                      4.722047e-05                      1.481732e-04 
##                   France - French           Israel (Carmel) - Druze 
##                      6.676378e-05                     -2.445276e-04 
##    Israel (Central) - Palestinian          Israel (Negev) - Bedouin 
##                     -1.733386e-04                     -2.821890e-04 
##              Italy - from Bergamo                 Italy - Sardinian 
##                     -8.703937e-05                     -2.899606e-04 
##                    Italy - Tuscan                  Japan - Japanese 
##                     -4.414173e-05                     -7.813386e-05 
##                     Kenya - Bantu                     Mexico - Maya 
##                     -2.382992e-04                     -3.859055e-04 
##                     Mexico - Pima                     Namibia - San 
##                     -3.126850e-04                     -4.937008e-04 
##               New Guinea - Papuan                  Nigeria - Yoruba 
##                     -7.946378e-04                     -3.792441e-04 
##         Orkney Islands - Orcadian                Pakistan - Balochi 
##                      9.040945e-05                     -5.339370e-05 
##                 Pakistan - Brahui                Pakistan - Burusho 
##                     -1.697874e-04                     -3.267717e-06 
##                 Pakistan - Hazara                 Pakistan - Kalash 
##                      1.440945e-05                     -3.177953e-05 
##                Pakistan - Makrani                 Pakistan - Pathan 
##                     -1.910236e-04                     -2.441417e-04 
##                 Pakistan - Sindhi                  Population Set 1 
##                     -1.127795e-04                     -1.281969e-04 
##                  Russia - Russian        Russia (Caucasus) - Adygei 
##                     -5.775591e-05                     -1.708819e-04 
##                Senegal - Mandenka                   Siberia - Yakut 
##                     -2.953858e-04                      2.992126e-05 
##              South Africa - Bantu 
##                     -2.875354e-04

Sort in PGS in descending order

sort(PGS,decreasing = TRUE)

##                   China - Mongola                    China - Oroqen 
##                      2.216535e-04                      2.168268e-04 
##                       China - Dai                   France - Basque 
##                      1.578740e-04                      1.481732e-04 
##                     China - Tujia                      China - Xibo 
##                      1.401575e-04                      1.097165e-04 
##                      China - Naxi                       China - Han 
##                      1.038504e-04                      9.718110e-05 
##         Orkney Islands - Orcadian                    China - Miaozu 
##                      9.040945e-05                      6.992913e-05 
##                   France - French      D. R. of Congo - Mbuti Pygmy 
##                      6.676378e-05                      4.722047e-05 
##                      China - Yizu                   Siberia - Yakut 
##                      3.661417e-05                      2.992126e-05 
##                 Pakistan - Hazara                Pakistan - Burusho 
##                      1.440945e-05                     -3.267717e-06 
##                      China - Daur         Algeria (Mzab) - Mozabite 
##                     -1.492126e-05                     -2.059055e-05 
##                    China - Hezhen                 Pakistan - Kalash 
##                     -2.491339e-05                     -3.177953e-05 
##                    Italy - Tuscan                Pakistan - Balochi 
##                     -4.414173e-05                     -5.339370e-05 
##                  Russia - Russian                       China - She 
##                     -5.775591e-05                     -6.139370e-05 
##                  Japan - Japanese              Italy - from Bergamo 
##                     -7.813386e-05                     -8.703937e-05 
##                     China - Uygur                 Pakistan - Sindhi 
##                     -1.086614e-04                     -1.127795e-04 
##                  Population Set 1                 Pakistan - Brahui 
##                     -1.281969e-04                     -1.697874e-04 
##        Russia (Caucasus) - Adygei    Israel (Central) - Palestinian 
##                     -1.708819e-04                     -1.733386e-04 
##                        China - Tu                Pakistan - Makrani 
##                     -1.900315e-04                     -1.910236e-04 
## C. African Republic - Biaka Pygmy  Colombia - Piapoco and Curripaco 
##                     -1.962441e-04                     -1.979449e-04 
##                      China - Lahu                     Kenya - Bantu 
##                     -2.247165e-04                     -2.382992e-04 
##                 Pakistan - Pathan           Israel (Carmel) - Druze 
##                     -2.441417e-04                     -2.445276e-04 
##          Israel (Negev) - Bedouin              South Africa - Bantu 
##                     -2.821890e-04                     -2.875354e-04 
##                 Italy - Sardinian                Senegal - Mandenka 
##                     -2.899606e-04                     -2.953858e-04 
##                     Mexico - Pima                Brazil - Karitiana 
##                     -3.126850e-04                     -3.317165e-04 
##     Bougainville - NAN Melanesian                  Nigeria - Yoruba 
##                     -3.567165e-04                     -3.792441e-04 
##                     Mexico - Maya              Cambodia - Cambodian 
##                     -3.859055e-04                     -4.213228e-04 
##                     Namibia - San                    Brazil - Surui 
##                     -4.937008e-04                     -5.549449e-04 
##               New Guinea - Papuan 
##                     -7.946378e-04

Calculate Z-score

scale(PGS, center = TRUE, scale = TRUE)

##                                           [,1]
## Algeria (Mzab) - Mozabite          0.511838662
## Bougainville - NAN Melanesian     -1.112980227
## Brazil - Karitiana                -0.992131278
## Brazil - Surui                    -2.071207728
## C. African Republic - Biaka Pygmy -0.337263188
## Cambodia - Cambodian              -1.425284364
## China - Dai                        1.374528882
## China - Daur                       0.539243778
## China - Han                        1.081141888
## China - Hezhen                     0.490942261
## China - Lahu                      -0.474897772
## China - Miaozu                     0.949407017
## China - Mongola                    1.682836440
## China - Naxi                       1.113380963
## China - Oroqen                     1.659504028
## China - She                        0.314597951
## China - Tu                        -0.307231748
## China - Tujia                      1.288887894
## China - Uygur                      0.086107794
## China - Xibo                       1.141737645
## China - Yizu                       0.788363897
## Colombia - Piapoco and Curripaco  -0.345484723
## D. R. of Congo - Mbuti Pygmy       0.839634302
## France - Basque                    1.327635684
## France - French                    0.934105828
## Israel (Carmel) - Druze           -0.570663428
## Israel (Central) - Palestinian    -0.226538906
## Israel (Negev) - Bedouin          -0.752717137
## Italy - from Bergamo               0.190627863
## Italy - Sardinian                 -0.790284984
## Italy - Tuscan                     0.397993242
## Japan - Japanese                   0.233676733
## Kenya - Bantu                     -0.540555863
## Mexico - Maya                     -1.254078513
## Mexico - Pima                     -0.900133825
## Namibia - San                     -1.775156348
## New Guinea - Papuan               -3.229873203
## Nigeria - Yoruba                  -1.221877502
## Orkney Islands - Orcadian          1.048408000
## Pakistan - Balochi                 0.353269615
## Pakistan - Brahui                 -0.209372646
## Pakistan - Burusho                 0.595576517
## Pakistan - Hazara                  0.681027192
## Pakistan - Kalash                  0.457751620
## Pakistan - Makrani                -0.312027644
## Pakistan - Pathan                 -0.568798358
## Pakistan - Sindhi                  0.066201022
## Population Set 1                  -0.008325669
## Russia - Russian                   0.332182900
## Russia (Caucasus) - Adygei        -0.214663356
## Senegal - Mandenka                -0.816510158
## Siberia - Yakut                    0.756010635
## South Africa - Bantu              -0.778561684
## attr(,"scaled:center")
## [1] -0.0001264745
## attr(,"scaled:scale")
## [1] 0.0002068698

Carry out the same job at the sub-continent level. This should make results more stable because populations have very low sample sizes but sub-continental clusters rely on larger samples. Upload “Lee_results_continent.csv”

compute PGS by population

PGS_continent<-with(df_remerged, tapply(GVS, population, mean))
PGS_continent

##             AFRICA            AMERICA CENTRAL-SOUTH ASIA 
##      -2.510866e-04      -3.590079e-04      -1.000551e-04 
##          EAST ASIA             EUROPE        MIDDLE EAST 
##       1.702362e-05      -4.521260e-05      -1.950079e-04 
##            OCEANIA   Population Set 1 
##      -6.217480e-04      -1.281969e-04

Sort in PGS in descending order

sort(PGS_continent,decreasing = TRUE)

##          EAST ASIA             EUROPE CENTRAL-SOUTH ASIA 
##       1.702362e-05      -4.521260e-05      -1.000551e-04 
##   Population Set 1        MIDDLE EAST             AFRICA 
##      -1.281969e-04      -1.950079e-04      -2.510866e-04 
##            AMERICA            OCEANIA 
##      -3.590079e-04      -6.217480e-04

Calculate Z-score

scale(PGS_continent, center = TRUE, scale = TRUE)

##                           [,1]
## AFRICA             -0.19945206
## AMERICA            -0.72864722
## CENTRAL-SOUTH ASIA  0.54113545
## EAST ASIA           1.11523459
## EUROPE              0.81005740
## MIDDLE EAST         0.07553174
## OCEANIA            -2.01700152
## Population Set 1    0.40314161
## attr(,"scaled:center")
## [1] -0.0002104114
## attr(,"scaled:scale")
## [1] 0.0002039347

Let’s compare the results using the largest height GWAS (Wood et al., 2014).

HGDP_CEPH=read.csv(file.choose(), header=TRUE, sep = ";")#open HGDP-CEPH browser output with freqs from GWAS hits
gwas=read.csv(file.choose(), header=TRUE, sep = ";")#open GWAS summary file
HGDP_CEPH_merged=merge(HGDP_CEPH,gwas, by.x = "SNP", by.y = "SNP")
df_AC=subset(HGDP_CEPH_merged, var=="AC")
df_AG=subset(HGDP_CEPH_merged, var=="AG")
df_CA=subset(HGDP_CEPH_merged, var=="CA")
df_CT=subset(HGDP_CEPH_merged, var=="CT")
df_GA=subset(HGDP_CEPH_merged, var=="GA")
df_GT=subset(HGDP_CEPH_merged, var=="GT")
df_TC=subset(HGDP_CEPH_merged, var=="TC")
df_TG=subset(HGDP_CEPH_merged, var=="TG")

#AC
df_AC_rev_strand_matching=subset(df_AC,A1=="T")
df_AC_rev_strand_nonmatching=subset(df_AC,A1=="G")
#replace "T" with "A" and "C" with "G"
df_AC_rev_strand_matching$A1[df_AC_rev_strand_matching$A1=="T"] <- "A" 
df_AC_rev_strand_matching$A2[df_AC_rev_strand_matching$A2=="G"] <- "C" 
df_AC_rev_strand_nonmatching$A1[df_AC_rev_strand_nonmatching$A1=="G"] <- "C" 
df_AC_rev_strand_nonmatching$A2[df_AC_rev_strand_nonmatching$A2=="T"] <- "A"
df_AC_rev_strand_nonmatching$freq_A=df_AC_rev_strand_nonmatching$freq_C#flip freq
#AG
df_AG_rev_strand_matching=subset(df_AG,A1=="T")
df_AG_rev_strand_nonmatching=subset(df_AG,A1=="C")
#replace "T" with "A" and "C" with "G"
df_AG_rev_strand_matching$A1[df_AG_rev_strand_matching$A1=="T"] <- "A" 
df_AG_rev_strand_matching$A2[df_AG_rev_strand_matching$A2=="C"] <- "G" 
df_AG_rev_strand_nonmatching$A1[df_AG_rev_strand_nonmatching$A1=="C"] <- "G" 
df_AG_rev_strand_nonmatching$A2[df_AG_rev_strand_nonmatching$A2=="T"] <- "A"
df_AG_rev_strand_nonmatching$freq_A=df_AG_rev_strand_nonmatching$freq_G#flip freq
#CA
df_CA_rev_strand_matching=subset(df_CA,A1=="G")
df_CA_rev_strand_nonmatching=subset(df_CA,A1=="T")
#replace "T" with "A" and "C" with "G"
df_CA_rev_strand_matching$A1[df_CA_rev_strand_matching$A1=="G"] <- "C" 
df_CA_rev_strand_matching$A2[df_CA_rev_strand_matching$A2=="T"] <- "A" 
df_CA_rev_strand_nonmatching$A1[df_CA_rev_strand_nonmatching$A1=="T"] <- "A" 
df_CA_rev_strand_nonmatching$A2[df_CA_rev_strand_nonmatching$A2=="G"] <- "C"
df_CA_rev_strand_nonmatching$freq_C=df_CA_rev_strand_nonmatching$freq_A#flip freq
#CT
df_CT_rev_strand_matching=subset(df_CT,A1=="G")
df_CT_rev_strand_nonmatching=subset(df_CT,A1=="A")
#replace "T" with "A" and "C" with "G"
df_CT_rev_strand_matching$A1[df_CT_rev_strand_matching$A1=="G"] <- "C" 
df_CT_rev_strand_matching$A2[df_CT_rev_strand_matching$A2=="A"] <- "T" 
df_CT_rev_strand_nonmatching$A1[df_CT_rev_strand_nonmatching$A1=="A"] <- "T" 
df_CT_rev_strand_nonmatching$A2[df_CT_rev_strand_nonmatching$A2=="G"] <- "C"
df_CT_rev_strand_nonmatching$freq_C=df_CT_rev_strand_nonmatching$freq_T#flip freq
#GA
df_GA_rev_strand_matching=subset(df_GA,A1=="C")
df_GA_rev_strand_nonmatching=subset(df_GA,A1=="T")
#replace "T" with "A" and "C" with "G"
df_GA_rev_strand_matching$A1[df_GA_rev_strand_matching$A1=="C"] <- "G" 
df_GA_rev_strand_matching$A2[df_GA_rev_strand_matching$A2=="T"] <- "A" 
df_GA_rev_strand_nonmatching$A1[df_GA_rev_strand_nonmatching$A1=="T"] <- "A" 
df_GA_rev_strand_nonmatching$A2[df_GA_rev_strand_nonmatching$A2=="C"] <- "G"
df_GA_rev_strand_nonmatching$freq_G=df_GA_rev_strand_nonmatching$freq_A#flip freq
#GT
df_GT_rev_strand_matching=subset(df_GT,A1=="C")
df_GT_rev_strand_nonmatching=subset(df_GT,A1=="A")
#replace "T" with "A" and "C" with "G"and viceversa
df_GT_rev_strand_matching$A1[df_GT_rev_strand_matching$A1=="C"] <- "G" 
df_GT_rev_strand_matching$A2[df_GT_rev_strand_matching$A2=="A"] <- "T" 
df_GT_rev_strand_nonmatching$A1[df_GT_rev_strand_nonmatching$A1=="A"] <- "T" 
df_GT_rev_strand_nonmatching$A2[df_GT_rev_strand_nonmatching$A2=="C"] <- "G"
df_GT_rev_strand_nonmatching$freq_G=df_GT_rev_strand_nonmatching$freq_T#flip freq
#TC
df_TC_rev_strand_matching=subset(df_TC,A1=="A")
df_TC_rev_strand_nonmatching=subset(df_TC,A1=="G")
#replace "T" with "A" and "C" with "G"and viceversa
df_TC_rev_strand_matching$A1[df_TC_rev_strand_matching$A1=="A"] <- "T" 
df_TC_rev_strand_matching$A2[df_TC_rev_strand_matching$A2=="G"] <- "C" 
df_TC_rev_strand_nonmatching$A1[df_TC_rev_strand_nonmatching$A1=="G"] <- "C" 
df_TC_rev_strand_nonmatching$A2[df_TC_rev_strand_nonmatching$A2=="A"] <- "T"
df_TC_rev_strand_nonmatching$freq_T=df_TC_rev_strand_nonmatching$freq_C#flip freq
#TG
df_TG_rev_strand_matching=subset(df_TG,A1=="A")
df_TG_rev_strand_nonmatching=subset(df_TG,A1=="C")
#replace "T" with "A" and "C" with "G"and viceversa
df_TG_rev_strand_matching$A1[df_TG_rev_strand_matching$A1=="A"] <- "T" 
df_TG_rev_strand_matching$A2[df_TG_rev_strand_matching$A2=="C"] <- "G" 
df_TG_rev_strand_nonmatching$A1[df_TG_rev_strand_nonmatching$A1=="C"] <- "G" 
df_TG_rev_strand_nonmatching$A2[df_TG_rev_strand_nonmatching$A2=="A"] <- "T"
df_TG_rev_strand_nonmatching$freq_T=df_TG_rev_strand_nonmatching$freq_G#flip freq

#AC
df_AC_match=df_AC[which(as.character(df_AC$reference)==as.character(df_AC$A1)),]
df_AC_nonmatch=df_AC[which(as.character(df_AC$reference)==as.character(df_AC$A2)),]
df_AC_nonmatch$freq_A=df_AC_nonmatch$freq_C
#re-merge objects
df_AC=rbind(df_AC_nonmatch,df_AC_match, df_AC_rev_strand_matching,df_AC_rev_strand_nonmatching)
#AG
df_AG_match=df_AG[which(as.character(df_AG$reference)==as.character(df_AG$A1)),]
df_AG_nonmatch=df_AG[which(as.character(df_AG$reference)==as.character(df_AG$A2)),]
df_AG_nonmatch$freq_A=df_AG_nonmatch$freq_G
#re-merge objects
df_AG=rbind(df_AG_nonmatch,df_AG_match,df_AG_rev_strand_matching,df_AG_rev_strand_nonmatching )
#CA
df_CA_match=df_CA[which(as.character(df_CA$reference)==as.character(df_CA$A1)),]
df_CA_nonmatch=df_CA[which(as.character(df_CA$reference)==as.character(df_CA$A2)),]
df_CA_nonmatch$freq_C=df_CA_nonmatch$freq_A
#re-merge objects
df_CA=rbind(df_CA_nonmatch,df_CA_match,df_CA_rev_strand_matching,df_CA_rev_strand_nonmatching)
#CT
df_CT_match=df_CT[which(as.character(df_CT$reference)==as.character(df_CT$A1)),]
df_CT_nonmatch=df_CT[which(as.character(df_CT$reference)==as.character(df_CT$A2)),]
df_CT_nonmatch$freq_C=df_CT_nonmatch$freq_T
#re-merge objects
df_CT=rbind(df_CT_nonmatch,df_CT_match,df_CT_rev_strand_matching,df_CT_rev_strand_nonmatching)
#GA
df_GA_match=df_GA[which(as.character(df_GA$reference)==as.character(df_GA$A1)),]
df_GA_nonmatch=df_GA[which(as.character(df_GA$reference)==as.character(df_GA$A2)),]
df_GA_nonmatch$freq_G=df_GA_nonmatch$freq_A
#re-merge objects
df_GA=rbind(df_GA_nonmatch,df_GA_match,df_GA_rev_strand_matching,df_GA_rev_strand_nonmatching)

df_GT_match=df_GT[which(as.character(df_GT$reference)==as.character(df_GT$A1)),]
df_GT_nonmatch=df_GT[which(as.character(df_GT$reference)==as.character(df_GT$A2)),]
df_GT_nonmatch$freq_G=df_GT_nonmatch$freq_T
#re-merge objects
df_GT=rbind(df_GT_nonmatch,df_GT_match,df_GT_rev_strand_matching,df_GT_rev_strand_nonmatching)
#TC
df_TC_match=df_TC[which(as.character(df_TC$reference)==as.character(df_TC$A1)),]
df_TC_nonmatch=df_TC[which(as.character(df_TC$reference)==as.character(df_TC$A2)),]
df_TC_nonmatch$freq_T=df_TC_nonmatch$freq_C
#re-merge objects
df_TC=rbind(df_TC_nonmatch,df_TC_match,df_TC_rev_strand_matching,df_TC_rev_strand_nonmatching)
#TG
df_TG_match=df_TG[which(as.character(df_TG$reference)==as.character(df_TG$A1)),]
df_TG_nonmatch=df_TG[which(as.character(df_TG$reference)==as.character(df_TG$A2)),]
df_TG_nonmatch$freq_T=df_TG_nonmatch$freq_G
#re-merge objects
df_TG=rbind(df_TG_nonmatch,df_TG_match,df_TG_rev_strand_matching,df_TG_rev_strand_nonmatching)

df_AC$GVS=df_AC$Beta*df_AC$freq_A
df_AG$GVS=df_AG$Beta*df_AG$freq_A
df_CA$GVS=df_CA$Beta*df_CA$freq_C
df_CT$GVS=df_CT$Beta*df_CT$freq_C
df_GA$GVS=df_GA$Beta*df_GA$freq_G
df_GT$GVS=df_GT$Beta*df_GT$freq_G
df_TC$GVS=df_TC$Beta*df_TC$freq_T
df_TG$GVS=df_TG$Beta*df_TG$freq_T

df_remerged=rbind(df_AC,df_AG,df_CA,df_CT,df_GA,df_GT,df_TC,df_TG)

compute PGS by population

PGS_height<-with(df_remerged, tapply(GVS, population, mean))
PGS_height

##         Algeria (Mzab) - Mozabite     Bougainville - NAN Melanesian 
##                      5.658260e-04                      8.865022e-05 
##                Brazil - Karitiana                    Brazil - Surui 
##                      8.304815e-04                      7.197035e-04 
## C. African Republic - Biaka Pygmy              Cambodia - Cambodian 
##                     -2.471546e-04                      1.691982e-04 
##                       China - Dai                      China - Daur 
##                      1.870784e-04                     -2.276167e-04 
##                       China - Han                    China - Hezhen 
##                     -2.139295e-04                     -1.527322e-04 
##                      China - Lahu                    China - Miaozu 
##                     -9.264581e-05                     -3.045727e-05 
##                   China - Mongola                      China - Naxi 
##                      2.100238e-04                      4.921185e-04 
##                    China - Oroqen                       China - She 
##                      2.882057e-04                      2.034705e-04 
##                        China - Tu                     China - Tujia 
##                      5.267974e-05                      2.288458e-05 
##                     China - Uygur                      China - Xibo 
##                      4.501542e-04                      1.105154e-05 
##                      China - Yizu  Colombia - Piapoco and Curripaco 
##                      1.066300e-04                      5.386335e-04 
##      D. R. of Congo - Mbuti Pygmy                   France - Basque 
##                     -2.331304e-04                      1.158374e-03 
##                   France - French           Israel (Carmel) - Druze 
##                      1.221944e-03                      6.357106e-04 
##    Israel (Central) - Palestinian          Israel (Negev) - Bedouin 
##                      7.785256e-04                      3.776189e-04 
##              Italy - from Bergamo                 Italy - Sardinian 
##                      9.144101e-04                      6.108048e-04 
##                    Italy - Tuscan                  Japan - Japanese 
##                      7.082286e-04                     -1.293392e-04 
##                     Kenya - Bantu                     Mexico - Maya 
##                     -1.779181e-04                      7.738093e-04 
##                     Mexico - Pima                     Namibia - San 
##                      6.677194e-04                     -5.796916e-04 
##               New Guinea - Papuan                  Nigeria - Yoruba 
##                      2.257982e-04                     -3.220859e-04 
##         Orkney Islands - Orcadian                Pakistan - Balochi 
##                      1.388032e-03                      6.966396e-04 
##                 Pakistan - Brahui                Pakistan - Burusho 
##                      6.089916e-04                      5.401361e-04 
##                 Pakistan - Hazara                 Pakistan - Kalash 
##                      5.301066e-04                      3.747515e-04 
##                Pakistan - Makrani                 Pakistan - Pathan 
##                      4.680559e-04                      5.521789e-04 
##                 Pakistan - Sindhi                  Population Set 1 
##                      4.809907e-04                      4.253793e-04 
##                  Russia - Russian        Russia (Caucasus) - Adygei 
##                      1.219391e-03                      1.099690e-03 
##                Senegal - Mandenka                   Siberia - Yakut 
##                     -1.233216e-05                      1.275590e-04 
##              South Africa - Bantu 
##                     -2.494581e-05

Sort in PGS in descending order

sort(PGS_height,decreasing = TRUE)

##         Orkney Islands - Orcadian                   France - French 
##                      1.388032e-03                      1.221944e-03 
##                  Russia - Russian                   France - Basque 
##                      1.219391e-03                      1.158374e-03 
##        Russia (Caucasus) - Adygei              Italy - from Bergamo 
##                      1.099690e-03                      9.144101e-04 
##                Brazil - Karitiana    Israel (Central) - Palestinian 
##                      8.304815e-04                      7.785256e-04 
##                     Mexico - Maya                    Brazil - Surui 
##                      7.738093e-04                      7.197035e-04 
##                    Italy - Tuscan                Pakistan - Balochi 
##                      7.082286e-04                      6.966396e-04 
##                     Mexico - Pima           Israel (Carmel) - Druze 
##                      6.677194e-04                      6.357106e-04 
##                 Italy - Sardinian                 Pakistan - Brahui 
##                      6.108048e-04                      6.089916e-04 
##         Algeria (Mzab) - Mozabite                 Pakistan - Pathan 
##                      5.658260e-04                      5.521789e-04 
##                Pakistan - Burusho  Colombia - Piapoco and Curripaco 
##                      5.401361e-04                      5.386335e-04 
##                 Pakistan - Hazara                      China - Naxi 
##                      5.301066e-04                      4.921185e-04 
##                 Pakistan - Sindhi                Pakistan - Makrani 
##                      4.809907e-04                      4.680559e-04 
##                     China - Uygur                  Population Set 1 
##                      4.501542e-04                      4.253793e-04 
##          Israel (Negev) - Bedouin                 Pakistan - Kalash 
##                      3.776189e-04                      3.747515e-04 
##                    China - Oroqen               New Guinea - Papuan 
##                      2.882057e-04                      2.257982e-04 
##                   China - Mongola                       China - She 
##                      2.100238e-04                      2.034705e-04 
##                       China - Dai              Cambodia - Cambodian 
##                      1.870784e-04                      1.691982e-04 
##                   Siberia - Yakut                      China - Yizu 
##                      1.275590e-04                      1.066300e-04 
##     Bougainville - NAN Melanesian                        China - Tu 
##                      8.865022e-05                      5.267974e-05 
##                     China - Tujia                      China - Xibo 
##                      2.288458e-05                      1.105154e-05 
##                Senegal - Mandenka              South Africa - Bantu 
##                     -1.233216e-05                     -2.494581e-05 
##                    China - Miaozu                      China - Lahu 
##                     -3.045727e-05                     -9.264581e-05 
##                  Japan - Japanese                    China - Hezhen 
##                     -1.293392e-04                     -1.527322e-04 
##                     Kenya - Bantu                       China - Han 
##                     -1.779181e-04                     -2.139295e-04 
##                      China - Daur      D. R. of Congo - Mbuti Pygmy 
##                     -2.276167e-04                     -2.331304e-04 
## C. African Republic - Biaka Pygmy                  Nigeria - Yoruba 
##                     -2.471546e-04                     -3.220859e-04 
##                     Namibia - San 
##                     -5.796916e-04

Calculate Z-score

scale(PGS_height, center = TRUE, scale = TRUE)

##                                          [,1]
## Algeria (Mzab) - Mozabite          0.46202378
## Bougainville - NAN Melanesian     -0.60887692
## Brazil - Karitiana                 1.05597633
## Brazil - Surui                     0.80736310
## C. African Republic - Biaka Pygmy -1.36250623
## Cambodia - Cambodian              -0.42810721
## China - Dai                       -0.38797966
## China - Daur                      -1.31865837
## China - Han                       -1.28794085
## China - Hezhen                    -1.15059881
## China - Lahu                      -1.01575016
## China - Miaozu                    -0.87618365
## China - Mongola                   -0.33648455
## China - Naxi                       0.29660590
## China - Oroqen                    -0.16102490
## China - She                       -0.35119179
## China - Tu                        -0.68960361
## China - Tujia                     -0.75647132
## China - Uygur                      0.20242758
## China - Xibo                      -0.78302759
## China - Yizu                      -0.56852594
## Colombia - Piapoco and Curripaco   0.40099704
## D. R. of Congo - Mbuti Pygmy      -1.33103238
## France - Basque                    1.79184741
## France - French                    1.93451533
## Israel (Carmel) - Druze            0.61886210
## Israel (Central) - Palestinian     0.93937431
## Israel (Negev) - Bedouin           0.03964051
## Italy - from Bergamo               1.24433299
## Italy - Sardinian                  0.56296748
## Italy - Tuscan                     0.78161060
## Japan - Japanese                  -1.09809923
## Kenya - Bantu                     -1.20712222
## Mexico - Maya                      0.92878976
## Mexico - Pima                      0.69069780
## Namibia - San                     -2.10880169
## New Guinea - Papuan               -0.30108278
## Nigeria - Yoruba                  -1.53067059
## Orkney Islands - Orcadian          2.30725722
## Pakistan - Balochi                 0.75560205
## Pakistan - Brahui                  0.55889817
## Pakistan - Burusho                 0.40436935
## Pakistan - Hazara                  0.38186063
## Pakistan - Kalash                  0.03320535
## Pakistan - Makrani                 0.24260357
## Pakistan - Pathan                  0.43139622
## Pakistan - Sindhi                  0.27163247
## Population Set 1                   0.14682658
## Russia - Russian                   1.92878607
## Russia (Caucasus) - Adygei         1.66014763
## Senegal - Mandenka                -0.83550641
## Siberia - Yakut                   -0.52155591
## South Africa - Bantu              -0.86381458
## attr(,"scaled:center")
## [1] 0.0003599558
## attr(,"scaled:scale")
## [1] 0.0004455836

Work on files for height PGS by continent.

Calculate PGS by continent

PGS_height_continent<-with(df_remerged, tapply(GVS, population, mean))
PGS_height_continent

##             AFRICA            AMERICA CENTRAL-SOUTH ASIA 
##      -2.019476e-04       7.300819e-04       5.284300e-04 
##          EAST ASIA             EUROPE        MIDDLE EAST 
##       7.809251e-06       1.055534e-03       5.900705e-04 
##            OCEANIA   Population Set 1 
##       1.699612e-04       4.253793e-04

sort(PGS_height_continent, decreasing = TRUE)

##             EUROPE            AMERICA        MIDDLE EAST 
##       1.055534e-03       7.300819e-04       5.900705e-04 
## CENTRAL-SOUTH ASIA   Population Set 1            OCEANIA 
##       5.284300e-04       4.253793e-04       1.699612e-04 
##          EAST ASIA             AFRICA 
##       7.809251e-06      -2.019476e-04

#transform into Z scores
scale(PGS_height_continent, center = TRUE, scale = TRUE)

##                           [,1]
## AFRICA             -1.51127588
## AMERICA             0.77863670
## CENTRAL-SOUTH ASIA  0.28319594
## EAST ASIA          -0.99592224
## EUROPE              1.57824399
## MIDDLE EAST         0.43464117
## OCEANIA            -0.59752943
## Population Set 1    0.03000975
## attr(,"scaled:center")
## [1] 0.0004131649
## attr(,"scaled:scale")
## [1] 0.0004070153

Calculate correlation between EDU and Height PGS:

cor(PGS_height, PGS)

## [1] 0.08979982

It is difficult to interpret single population scores because they rely on small (sometimes very small, eg. N<10 samples). The top scores for EDU are achieved by East Asians and the lowest scores by some African populations, Melanesians and Papuans (remarkably, populations that are both genetically and geographically very far apart). Conversely, Europeans have top scores for the Height PGS. East Asians and Africans have the lowest scores. The genotypic scores of Europeans (in particular Northern Europeans, such as Orcadians, French and Russians) and East Asians correspond to the well-known physical differences between these populations. The low score of Africans is difficult to explain, because Africans are not particularly short. However, the lowest scores are achieved by the Pygmies and San, which is in line with predictions. The null correlation between height and EDU suggests that selective pressures on these two traits are unrelated and that the particular rankings are not due to statistical artifacts or unknown biases that might produce higher scores for some populations (e.g. LD breakdown, sample size, etc. ).

The two major limitations of this approach are: 1) Very low coverage of the HGDP-CEPH database, which comprises only about 10% of all the SNPs present in 1000 Genomes; 2) Very small sample sizes,with some populations numbering less than 10 individuals (e.g. Italy - Tuscan= 8; San=5; Bantu=8; Colombia - Piapoco and Curripaco= 7; China - Lahu=8).

References:

Wood AR, Esko T, Yang J, et al.: Defining the role of common variation in the genomic and biological architecture of adult human height. Nat Genet. 2014; 46(11): 1173–86.

EDU and Height PGS

Davide Piffer

30/03/2018