1. Introduction

Jewish people have higher IQ on average than non-Jewish people. Has polygenic selection played a role in this difference? This is a well-known hypothesis, described here on Wikipedia, and here by Steven Pinker, but it has never been empirically tested. We investigate the question using data from the Wisconsin Longitudinal Study. This data sets contains an IQ measure as well as polygenic scores for educational attainment (edu PGS). (Educational attainment is highly correlated with IQ, and is sometimes used as a proxy for IQ).

Note that the comparison of PGS values between populations may not be an approach that works scientifically, as indicated by research such as this paper. Hopefully this question will be explored further in the coming years.

 

2. Analysis in WLS

Imports and settings

library(pacman)
p_load(tidyverse, pander, stringr, lavaan, magrittr, lavaanPlot, ggpubr, broom, pwr, simsem)
PanderOpts(digits = 3, table.split.table = Inf, missing = "-", table.alignment.default = "left")
theme_set(theme_bw(base_size=13))

 

Read data

A data set with PGS is prepared here. See link for details about data preparation.

read_df <- function(){
  read_delim("pgs_data.csv", delim=";", col_types=cols(si001re = col_double())) %>% 
  select(idpub, relfml, z_relr75, PGS=pgs_std, IQ=z_iq100, birthyear=z_brdxdy,
         ses57, income=z_yfam74, sexrsp=z_sexrsp, rtype)
}
df <- read_df()

 

Select major religious affiliations

Keep only Jewish people and the two major religious groups, Lutherans and Catholics

df %<>% filter(relfml %in% c("Roman Catholic", "Lutheran", "Jewish")) %>% 
  mutate(Religion = ifelse(relfml == "Jewish", "Jewish", "Christian"),
         isJewish = Religion == "Jewish") %>% 
  drop_na(Religion, PGS)

df_iq <- df %>% drop_na(IQ)

df %>% select(Religion) %>% table %>% pander(caption="Sample size")
Sample size
Christian Jewish
4630 53

 

Plot of PGS and IQ scores of Jewish people and Christians

plot_dist <- function(df, fill){
  plot_density <- function(attr) {
    ggplot(df, aes_string(x=attr, fill=fill)) + 
      geom_density(alpha=0.6)
  }
  ggarrange(plot_density("IQ"), plot_density("PGS"), ncol=2, nrow=1, common.legend=T)
}

df_iq %>% plot_dist("Religion")

 

Jewish edu PGS vs Christians

df %<>% mutate(
  ses57_std = as.numeric(scale(ses57)),
  logses57_std = as.numeric(scale(log(ses57))))
m <- lm(PGS ~ Religion, df)
m %>% summary() %>% pander()
m %>% confint_tidy(conf.level=0.95, parm="ReligionJewish") %>% 
  pander(caption="95% confidence interval")
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0148 0.0145 -1.02 0.308
ReligionJewish 1.38 0.136 10.1 7.19e-24
Fitting linear model: PGS ~ Religion
Observations Residual Std. Error \(R^2\) Adjusted \(R^2\)
4683 0.986 0.0214 0.0212
95% confidence interval
conf.low conf.high
1.11 1.65


Jewish edu PGS vs Christians, controlling for 1974 income, parents SES and gender

m <- lm(PGS ~ Religion + ses57_std + log(income) + sexrsp, df)
m %>% summary() %>% pander()
m %>% confint_tidy(conf.level=0.95, parm="ReligionJewish") %>% 
  pander(caption="95% confidence interval")
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0317 0.0562 -0.565 0.572
ReligionJewish 1.29 0.146 8.81 1.76e-18
ses57_std 0.107 0.0156 6.88 6.8e-12
log(income) 0.00512 0.0113 0.452 0.651
sexrspmale 0.00477 0.0312 0.153 0.878
Fitting linear model: PGS ~ Religion + ses57_std + log(income) + sexrsp
Observations Residual Std. Error \(R^2\) Adjusted \(R^2\)
4050 0.984 0.0353 0.0343
95% confidence interval
conf.low conf.high
1 1.58

 

Plot of PGS vs IQ

ggplot(df_iq, aes(x=PGS, y=IQ)) + 
  geom_jitter(aes(color="Christian"), alpha=0.5) + 
  geom_jitter(data=df_iq %>% filter(Religion == "Jewish"), aes(color="Jewish"), size=2) +
    labs(color="Religion") +
    theme(legend.position = c(0, 1),legend.justification = c(-0.01, 1.01)) +
    scale_color_manual(values = c("blue","orangered2"))

 

SEM model

model <- 'IQ ~ c * isJewish + b * PGS
          PGS ~ a * isJewish
          ab := a * b
          tot := ab + c
          prop_mediated := ab/tot'

fit <- sem(model, df_iq)
lavaanPlot(model=fit, node_options = list(shape = "circle", fontname = "Helvetica"), 
           edge_options = list(color = "grey"), coefs = T, stand=T)

summary(fit, standardized=T, header=F)
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   IQ ~                                                                  
##     isJewish   (c)    2.394    1.992    1.202    0.229    2.394    0.017
##     PGS        (b)    4.429    0.213   20.835    0.000    4.429    0.298
##   PGS ~                                                                 
##     isJewish   (a)    1.363    0.137    9.922    0.000    1.363    0.145
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .IQ              199.595    4.181   47.734    0.000  199.595    0.910
##    .PGS               0.969    0.020   47.734    0.000    0.969    0.979
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     ab                6.034    0.674    8.958    0.000    6.034    0.043
##     tot               8.428    2.062    4.087    0.000    8.428    0.060
##     prop_mediated     0.716    0.172    4.160    0.000    0.716    0.716


SEM with SES

model <- 'IQ ~ isJewish
          PGS ~ isJewish
          ses57 ~ isJewish
          ses57 ~ PGS
          IQ ~ ses57
          IQ ~ PGS'

fit <- sem(model, df_iq)
lavaanPlot(model=fit, node_options = list(shape = "circle", fontname = "Helvetica"), 
           edge_options = list(color = "grey"), coefs = T, stand=T, stars=T)
summary(fit, standardized=T, header=F)
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   IQ ~                                                                  
##     isJewish         -2.311    1.945   -1.188    0.235   -2.311   -0.017
##   PGS ~                                                                 
##     isJewish          1.363    0.137    9.922    0.000    1.363    0.145
##   ses57 ~                                                               
##     isJewish         13.350    1.431    9.327    0.000   13.350    0.137
##     PGS               1.062    0.153    6.951    0.000    1.062    0.102
##   IQ ~                                                                  
##     ses57             0.352    0.020   17.672    0.000    0.352    0.246
##     PGS               4.054    0.207   19.613    0.000    4.054    0.272
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .IQ              186.794    3.913   47.734    0.000  186.794    0.851
##    .PGS               0.969    0.020   47.734    0.000    0.969    0.979
##    .ses57           103.083    2.160   47.734    0.000  103.083    0.967

 

Plot of PGS and IQ scores of the two major Christian groups

Since there is not significant different, it is justified to merge them into one group.

df_iq %>% filter(!isJewish) %>% mutate(Denomination = relfml) %>% plot_dist("Denomination")

 

Comparing Jewish people in Wisconsin to Jewish people living elsewhere

It is possible that the Jewish people in Wisconsin are somehow not representative of the national Jewish population. There are not a lot of good available data on this, so we investigate this by looking at the income levels of federal employees.

We find public salary information from www.fedsdatacenter.com, a list of Jewish surnames form wikipedia, and a list of the largest US cities and their states from Wikipedia. The list of cities is used to assign each person a US state.

For every person we calculate the relative salary, which is the salary of that person divided by the mean salary in the location the person works at.

Finally we look at whether the relative salaries of Jewish people in Wisconsin is higher than the relative salaries in other US states.

get_relative_salaries <- function(){
  cities <- read_delim("cities.csv", delim="\t", col_names=F) %>%
    mutate(Location = str_replace(X2, "\\[.*\\]", ""),
           state = trimws(X3)) %>% 
    select(Location, state)
  
  jewish_names <- read_csv("jewish_names.txt", col_names = c("surname")) %>% 
    separate(surname, into="surname")
  
  salaries <- read_delim("salaries.csv", delim=";", col_types=cols(.default="c")) %>% 
    separate(Name, into=c("surname", "first_name")) %>% 
    mutate(Salary = as.numeric(str_replace_all(Salary, c("\\$"="", ","=""))),
           surname = str_to_title(surname),
           Location = str_to_title(Location)) %>% 
    select(surname, Salary, Location) %>% 
    filter(Salary > 0) %>% 
    mutate(log_salary = log(Salary))
  
  merge(salaries, cities, by="Location", all.y=T) %>% 
    mutate(State = ifelse(state=="Wisconsin", "Wisconsin", "Other US States"),
           isJewish = ifelse(surname %in% jewish_names$surname, T, F)) %>% na.omit
}

#get_relative_salaries() %>% write_tsv("relative_salaries.tsv")
dfs <- read_tsv("relative_salaries.tsv")

 

Relative salaries of Jews in Wisconsin compared to other places

We find that there is no difference

jewish <- dfs %>% group_by(Location) %>% 
  mutate(relative_salary = Salary / mean(Salary),
         log_relative_salary = log_salary / mean(log_salary)) %>% filter(isJewish)

ggplot(jewish, aes(y=relative_salary, x=State)) + 
  geom_boxplot() + ylab("Relative salary")

t.test(relative_salary ~ State, jewish)
Welch Two Sample t-test: relative_salary by State
Test statistic df P value Alternative hypothesis mean in group Other US States mean in group Wisconsin
-0.0418 421 0.967 two.sided 1.09 1.09

 

Same as above, but with log-transformed salaries

ggplot(jewish, aes(y=log_relative_salary, x=State)) + 
  geom_boxplot() + ylab("log(Relative salary)")

t.test(log_relative_salary ~ State, jewish)
Welch Two Sample t-test: log_relative_salary by State
Test statistic df P value Alternative hypothesis mean in group Other US States mean in group Wisconsin
0.137 423 0.891 two.sided 1.01 1.01



3. Replication in HRS

The Health and Retirement Study.
Data processing here.
Note that the cognition score is based on a test that is much less accurate than the one in WLS.

Read data

hrs <- read_tsv("df_std.tsv") %>% 
  mutate(Religion = case_when(RARELIG == 3 ~ "Jewish", RARELIG == 1 ~ "Christian"),
         isJewish = Religion == "Jewish") %>% 
  filter(Religion %in% c("Jewish", "Christian")) %>% 
  select(Religion, Cognition = cog1_100, GENDER, PGS_EDU = PGS_EDU_std, PGS_COG = PGS_COG_std, isJewish) %>% 
  drop_na(isJewish, PGS_EDU, Cognition)
  
hrs %>% select(Religion) %>% table %>% pander(caption="Sample size")
Sample size
Christian Jewish
6517 212


Plot of Cognition and polygenic scores

plot_dist <- function(df){
  plot_density <- function(attr) {
    ggplot(df, aes_string(x=attr, fill="Religion")) + 
      geom_density(alpha=0.6)
  }
  ggarrange(plot_density("Cognition"), 
            plot_density("PGS_EDU"),
            plot_density("PGS_COG"), 
            ncol=3, nrow=1, common.legend=T)
}

plot_dist(hrs %>% filter(Cognition < 140, Cognition > 60))


PGS in Jewish people vs Christians

m <- lm(PGS_EDU ~ Religion, hrs)
m %>% summary() %>% pander()
m %>% confint_tidy(conf.level=0.95, parm="ReligionJewish") %>% 
  pander(caption="95% confidence interval")
  Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0371 0.0122 -3.04 0.00236
ReligionJewish 1.15 0.0688 16.7 1.82e-61
Fitting linear model: PGS_EDU ~ Religion
Observations Residual Std. Error \(R^2\) Adjusted \(R^2\)
6729 0.985 0.0399 0.0397
95% confidence interval
conf.low conf.high
1.01 1.28


Plot of PGS vs Cognition

ggplot(hrs, aes(x=PGS_EDU, y=Cognition)) + 
  geom_jitter(aes(color="Christian"), alpha=0.5) + 
  geom_jitter(data=hrs %>% filter(Religion == "Jewish"), aes(color="Jewish"), size=2) +
    labs(color="Religion") +
    theme(legend.position = c(0, 1),legend.justification = c(-0.01, 1.01)) +
    scale_color_manual(values = c("blue","orangered3"))

 

SEM model

model <- 'Cognition ~ c * isJewish + b * PGS_EDU
          PGS_EDU ~ a * isJewish
          ab := a * b
          tot := ab + c
          prop_mediated := ab/tot'

fit <- sem(model, hrs)
lavaanPlot(model=fit, node_options = list(shape = "circle", fontname = "Helvetica"), 
           edge_options = list(color = "grey"), coefs = T, stand=T)

summary(fit, standardized=T, header=F)
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Information saturated (h1) model          Structured
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Cognition ~                                                           
##     isJewish   (c)    0.868    0.930    0.934    0.350    0.868    0.011
##     PGS_EDU    (b)    2.051    0.162   12.696    0.000    2.051    0.156
##   PGS_EDU ~                                                             
##     isJewish   (a)    1.149    0.069   16.715    0.000    1.149    0.200
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Cognition       170.453    2.939   58.004    0.000  170.453    0.975
##    .PGS_EDU           0.970    0.017   58.004    0.000    0.970    0.960
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     ab                2.357    0.233   10.110    0.000    2.357    0.031
##     tot               3.226    0.922    3.498    0.000    3.226    0.043
##     prop_mediated     0.731    0.215    3.405    0.001    0.731    0.731



4. Additional info

Other religions in the WLS data set

df <- read_df() %>% mutate(Religion = relfml)
df %>% drop_na(IQ) %>% group_by(Religion) %>% 
  summarise(n = n(), Mean_IQ = mean(IQ), sd_IQ = sd(IQ), Mean_PGS = mean(PGS), sd_PGS = sd(PGS)) %>% 
  filter(n > 1)
Religion n Mean_IQ sd_IQ Mean_PGS sd_PGS
Baptist, Southern Baptist, Free Will Baptist, Primitive Baptist 109 95.3 14.2 -0.113 0.954
Episcopalian, Anglican 76 105 17 0.27 1.1
Evangelical 13 100 16.9 0.281 1.05
Evangelical Free 6 106 21.3 -0.123 0.54
Jehovah’s Witnesses 9 97.3 18.9 -0.355 0.926
Jewish 52 108 14.6 1.35 1.08
Lutheran 1982 99.6 14.9 -0.0419 0.978
Other religions, not established religion or changing religion with no clue of what they are changing to 248 101 15.7 -0.0305 1.02
Pentecostal, Assembly Of God 19 96.5 12.2 -0.117 0.865
Presbyterian 211 103 15.5 -0.025 0.937
Protestant (No Denomination Given, Born Again, Born Again Christian, Christian, No Preference, Nothing Particular) 43 100 17.6 -0.0682 1.02
Protestant, Non-Denominational (R Must Have Used The Words ‘Non-Denominational’ For Religious Preference) 4 104 13.4 -0.167 1
Reformed, Dutch, Swiss, And Christian Reformed, First Reformed, Church Of America 36 103 17.1 0.272 0.773
Roman Catholic 2523 99.7 14.7 0.0142 0.988
United Church Of Christ, Congregational Evangelical And Reformed 269 102 15.2 0.0911 1.1
United Methodist, Methodist, Evangelical United Brethren, Free Methodist, Primitive Methodist, Nazarene Free Methodist 615 99.7 15.1 -0.0508 1.02
- 40 99.8 13.4 0.0469 0.85


Power analysis

53 Jewish people is not a large sample. However, this is the number that were present in the study. And small sample size is not necessarily a problem as long as the power is sufficiently good.

For instance, let’s assume that we are looking for a difference in PGS between Jewish people and Christians that is in the same level as the one usually found in IQ tests, eg around 2/3d. What would be the power to detect this difference at 0.05 significance level?

mean_Jewish <- 1.66
mean_Christian <- 1
sd <- 1
Cohen.d <- (mean_Jewish - mean_Christian)/sqrt(((sd^2) + (sd^2))/2)
tibble("WLS" = pwr.t2n.test(n1 = 53, n2 = 4630, d = Cohen.d, sig.level = 0.05)$power,
       "HRS" = pwr.t2n.test(n1 = 212, n2 = 6517, d = Cohen.d, sig.level = 0.05)$power)
WLS HRS
0.998 1