Startup

library(pacman)
p_load(ggplot2, meta, metafor, reshape2, latex2exp)

Functions

#Deviation methods

#Jensen's (1973) between-group heritability

BGHJ <- function(PM1, PM2, MO) {
  ME <- PM1 - ((PM1 - PM2)/2)
  MOD <- PM1 - MO
  MED <- PM1 - ME
  BGHAD <- MOD/MED
  if(BGHAD > 1) BGHAD <- abs(1 + (1 - (MOD/MED)))
  return(BGHAD)}

#Jensen's (1973) adjusted between-group heritability

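BGHJA <- function(PM1, PM2, MO, PA, PB, PM) {
  ME <- abs(PM1 - PM2) #Parent population difference
  AD <- (PB + (1 - PA)) / PM #Ancestry adjustment for the mixed group
  MEA <- ME * AD #Adjusted expected deviation
  BGHAD <- abs(PM1 - MO)/MEA #Equals abs(MO)/MEA when PM1 = 0, as in the calls below
  if(BGHAD > 1) BGHAD <- abs(1 + (1 - BGHAD))
  return(BGHAD)}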

#Scarr's (1977) between-group heritability

BGHS <- function(M1, M2, SD2, OE, d) {
  MB <- M1 - M2
  SDD <- MB/SD2
  EE <- d/SDD
  BGHA <- OE/EE
  return(abs(BGHA))}

#Method to obtain standard deviation from interquartile range assuming normality

SDIQR <- function(q1, q3, n) {
  SDIQR <- (q3 - q1) / (2 * (qnorm((0.75 * n - 0.125) / (n + 0.25))))
  return(SDIQR)}

#Method to obtain standard deviation from 95% confidence interval assuming normality

SDCI <- function(ub, lb, lev, n) {
  SDCI <- sqrt(n) * (ub - lb) / (qnorm(1 - ((1 - lev) / 2)) * 2)
  return(SDCI)}

#Biometric methods

#Clipping function

clip <- function(x, min = 0, max = 1) {
  x[x<min] <- min
  x[x>max] <- max
  return(x)}

#Weighted mean function

WM <- function(M1, M2, N1, N2) {
  SN <- N1 + N2
  P1 <- N1/SN; P2 <- N2/SN
  WM <- (M1 * P1) + (M2 * P2)
  return(WM)}

#DeFries (1973) between-group heritability

BGHD <- function(d, WGH, Fst) {
  r <- 2 * Fst
  t <- ((0.5 * d)^2)/(1+(0.5 * d)^2)
  HB <- WGH * ((r*(1 - t))/(t*(1 - r)))
  return(HB)}

Rationale

For some reason, people are very curious about the degree to which genetics accounts for group differences in some trait. Despite this, few actual estimates have been generated. There are basic behavior genetic formulae and ways to estimate the genetic contribution to group differences based on observed versus expected differences, but few datasets allow these to be estimated directly. Using data from the Trajectories of Complex Phenotypes/Philadelphia Neurodevelopmental Cohort (TCP/PNC), I’ll get estimates of the between-group heritability (i.e., the proportion of group differences attributable to genetics) using three different methods with different assumptions, which I will discuss. This is a work in progress pending data.

Deviation Methods

Jensen’s Method (1973)

Jensen’s (1973) method involves taking the means for two parent groups and the mean for the population of individuals who are genetic composites of both groups, and dividing the mixed population’s observed deviation from a parent group’s mean by the deviation expected if those individuals are 50/50 derivatives of said groups. For assessing this, I’ve made the BGHJ function, which takes the means of populations 1 and 2 and the mean of the mixed population and outputs the observed between-group heritability.

For example, in the full imputed TCP/PNC dataset, the African American mean in general intelligence (g; henceforth, just “intelligence”) is -1.01, compared to a European American mean which is set to 0 to estimate the others, and a biracial mean of -0.14. Thus, the between-group heritability is:

BGHJ(0, -1.01, -0.14)
## [1] 0.2772277

With a between-group heritability of just 27%, little of the between-group difference is due to genes. This estimate, however, is too low. The main reason is that the mixed group also deviates from the expected amount of European ancestry. Assumption #1 of this method is that the mixed group’s ancestry is at the midpoint of the two parent groups. Here, the parent groups have mean European ancestry of 0.187 (African Americans) and 0.986 (European Americans), whereas the mixed group has a mean of 0.796 but should have a mean of

\[\frac{0.986+0.187}{2} = 0.5865\]

The difference is 0.796 - 0.587 = 0.209, or 21 percentage points more European than expected. As such, we have to adjust the expected value according to

\[\frac{P_b + (1-P_a)}{M_a} \times d\]

where \(P_a\) is the less and \(P_b\) is the more admixed parent population’s mean ancestry, \(M_a\) is the mixed group’s mean ancestry, and d is the parent population difference expressed in standardized units (i.e., Cohen’s d, although Hedges’ g is preferable). Accordingly, this is

\[\frac{0.187 + (1-0.986)}{0.796} \times 1.01 = 0.255\]

which should be compared to the expected mean, without adjustment, of 0.505. Now we can calculate the between-group heritability with the same method, adjusted for the actual mean ancestry of the mixed population using the BGHJA function, which requires the first group being the less admixed parent population (I am uninterested in fixing this).

BGHJA(0, -1.01, -0.14, 0.986, 0.187, 0.796)
## [1] 0.5489385
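The adjustment arithmetic above can also be reproduced step by step. This is a minimal sketch in base R using only the values quoted in the text; the variable names are my own, chosen for illustration, and are not part of the BGHJ or BGHJA functions.

```r
# Values quoted in the text (TCP/PNC); variable names are illustrative only
p_less     <- 0.986  # mean European ancestry, less admixed parent group (P_a)
p_more     <- 0.187  # mean European ancestry, more admixed parent group (P_b)
m_mix      <- 0.796  # mean European ancestry, mixed group (M_a)
d          <- 1.01   # parent group difference in standardized units
mixed_mean <- -0.14  # mixed group mean, with the first parent group fixed at 0

# Ancestry the mixed group would have at the exact midpoint of the parent groups
midpoint <- (p_less + p_more) / 2
# How much more European the mixed group is than that midpoint
excess <- m_mix - midpoint

# Expected deviation of the mixed group without and with the ancestry adjustment
unadjusted_expected <- d / 2
adjusted_expected <- (p_more + (1 - p_less)) / m_mix * d

# Observed deviation over the adjusted expected deviation
bgh_adjusted <- abs(mixed_mean) / adjusted_expected

round(c(midpoint = midpoint, excess = excess,
        unadjusted = unadjusted_expected,
        adjusted = adjusted_expected,
        bgh = bgh_adjusted), 4)
```

The last element should agree with the BGHJA output above, since it is the same ratio written out by hand.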

Plotting these:

BGH1 <- BGHJ(0, -1.01, -0.14)
BGH2 <- BGHJA(0, -1.01, -0.14, 0.986, 0.187, 0.796)
dj <- data.frame("BGS" = c(BGH1, BGH2), "Name" = c("Unadjusted Jensen", "Adjusted Jensen"))

ggplot(dj, aes(x = Name, y = BGS, fill = Name)) + geom_bar(stat = "identity", show.legend = F) + coord_cartesian(ylim = c(0, 1)) + xlab("Method") + ylab("Between-group Heritability") + theme_bw() + scale_fill_manual(values = c("#E69F00", "#999999"))

The results are, overall, consistent with a modest amount of between-group heritability based on the means of the parent and mixed populations. Adjustment was clearly necessary because the mean ancestry of the mixed population was considerably higher than expected.

This method is quite crude and rests on a few assumptions. For instance:

  • The ancestry point estimates for each group are accurate.
  • Assortative mating with respect to mixing is absent.
  • There is no confounding between environmental effects and genetic ancestry.
  • Measurement invariance with respect to the trait being compared is tenable.
  • The phenotype is proper.

The first assumption, given earlier as well, was for the unadjusted method; with the adjusted method, it becomes superfluous. However, the majority of literature on this subject does not have access to genetic ancestry estimates, and early research using blood groups tautologically generated extremely unreliable point estimates. For example, the studies by Loehlin, Vandenberg & Osborne (1973) and Scarr et al. (1977) used methods which made it impossible to properly generate point estimates or conduct well-powered analyses for their purposes (see Reed, 1997); the discussions by Loehlin, Lindzey & Spuhler (1975) and Loehlin (2000) are also worthwhile reads. As a result, a meta-analysis can only be regarded as accurate if the variability between the different samples is averaged out in the meta-analytic mean. In the TCP/PNC, the confidence intervals around the mixed group’s means for European ancestry and intelligence were considerable at 0.759 to 0.833 and -0.276 to -0.004 respectively. This sort of deviation from the expected mean based on the parent populations and around the point estimates should be acknowledged when using this method.
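To give a rough sense of how much the uncertainty in the intelligence mean matters, the unadjusted ratio can be recomputed at the ends of the interval quoted above. This is only a sketch: deviation_ratio is a hypothetical helper that restates the ratio inside BGHJ, it assumes the interval of -0.276 to -0.004 belongs to the mixed group, and it ignores the uncertainty in the ancestry estimates and in the parent means.

```r
# Hypothetical helper restating the unadjusted ratio: the mixed group's
# deviation from the first parent mean over half of the parent difference
deviation_ratio <- function(pm1, pm2, mo) (pm1 - mo) / ((pm1 - pm2) / 2)

# Point estimate and interval bounds for the mixed group's intelligence mean,
# as quoted in the text (assumed here to refer to the mixed group)
mixed_mean <- c(lower = -0.276, point = -0.14, upper = -0.004)

# Ratio at each value, holding the parent means fixed at 0 and -1.01
ratio_range <- sapply(mixed_mean, function(mo) deviation_ratio(0, -1.01, mo))
round(ratio_range, 3)
```

The spread between the two bounds is wide, which is the point: with a mixed sample this small, the unadjusted estimate is compatible with very different conclusions.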

Related to the first assumption, if there’s assortative mating with respect to mixing such that individuals from the parent populations are more or less likely to mix given some level of the trait of interest or its causes among children, then this will lead to the formula delivering - depending on the direction of the effect - over- or underestimates. Without demographic data on the parents, it can be difficult to assess whether this has influenced the results. If data on the parents’ levels of these phenotypes are available (and measurement invariant with respect to offspring phenotypes), then the expected mean for the child is trivial to compute. In the TCP/PNC, parental education, a proxy for cognitive ability, is available, but not equally across populations: in the African American and mixed-race populations, very little information is available for fathers. A lack of assortative mating with respect to education is not tenable in this sample based on mothers’ education values (higher for white mothers who mix, lower for black mothers who mix, not significant). The effect that any assortative mating might have can be investigated by probing the polygenic scores at local admixture blocks, however, so this assumption can easily be checked, and with that knowledge, accounted for. This same method can be used to assess whether admixture in general was assortative historically, although historical admixture would not affect this method, only admixture in the parental generation.

Confounding between ancestry and environment is generally assumed to take the form of skin color-based discrimination. This can be accounted for by assessing local admixture, the within-sibling effect of skin color or appearance, or the effects of misclassified race on adopted children’s cognitive outcomes. The first method of checking this assumption is being conducted presently and the latter two have been done amply in the colorism literature. The general result is that the effect of color on IQ estimates within sibling pairs is reduced almost entirely or eliminated. In many cases, it is rendered insignificant and the sign is random (as often reversed as consistent; see: to-do). Some studies have assessed discrimination by looking at the effect it has on the market niche or other similar outcomes and found that, once ability is controlled, racial differences dissipate or are reversed for those outcomes, similarly signaling a lack of discrimination effect (to-do). In this sample, if discrimination were a concern, then we should expect the mixed group to have worse than expected results, but the opposite is true. There could be some more subtle type of confounding, but without evidence, that sort of concern becomes pseudoscientific. Another point here is that if the shared environment is systematic with respect to race/ancestry, then the between-group heritability will be overestimated. This can be assessed readily by checking, first, whether sibling and parent regression to the mean match up; second, whether the regression is linear; third, whether regression to the mean is comparable by race; and finally, whether it remains comparable with family-level variables added. This was assessed by Murray (1999) and found to be tenable, so it’s unlikely that the shared environment explains regression to the mean. It would be even better to assess this in a study of children adopted apart like the MSTRA.
In general, however, this should not be a concern since shared environmental effects play a minimal role in adulthood and it’s hard to imagine a racially systematic environmental effect in the unshared environment, which is, by definition, unsystematic with respect to siblings. If there is a homogeneous between-groups confounding effect, its size can be estimated based on the assumption that there aren’t otherwise systematic environmental components (i.e., shared/unshared) with the formula

\[\frac{d}{\sqrt{1-h^2}}\]

where d is the standardized group difference and \(h^2\) is the - assumed to be equal, or simply averaged - heritability for the different groups. Moderating this by another component, assuming it is systematic in its effect, is trivial. However, before doing that, one should assess whether the component is related to the source of group differences as expected. For example, it’s generally found that the most heritable component (and in some cases, the only) of cognitive tests is g, and that the g loadings of the items are correlated with the heritability, but not the shared or unshared environments. Given that this is the case, if group differences are likewise attributable to g (or indeed, group differences in g are the locus of investigation as they are here), we can conclude that the environmental components reduce rather than account for the group differences, and as such, moderation in that way would be improper, so all one would need to worry about is a potentially fully-causal-in-its-effect X-factor.
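As a quick illustration of the formula above, here is a minimal sketch in base R. The function name confound_size is hypothetical, and the two heritability values are illustrative inputs rather than estimates from the TCP/PNC; only d = 1.01 comes from the text.

```r
# Hypothetical helper: implied size of a homogeneous between-group confound,
# given the standardized difference d and an (assumed equal) heritability h2
confound_size <- function(d, h2) d / sqrt(1 - h2)

d <- 1.01                  # standardized group difference from the text
h2_values <- c(0.5, 0.8)   # illustrative heritabilities, not estimates

implied <- confound_size(d, h2_values)
round(setNames(implied, paste0("h2 = ", h2_values)), 3)
```

The implied effect grows as the heritability rises, because a shrinking non-genetic share of the variance has to carry the whole difference.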

But this is a confusing assumption since this assumes that the between-group heritability is equivalent to the within-group heritability plus or minus some dimensionality. If, for example, heritability is 50% and the environmental residual has a stationary effect, then the between-group heritability should be 100% since this is the only differing component of variance. This method is, thus, quite unclear in what it means to give a between-group heritability besides that it suggests some systematic (why?) deviation from the 100% genetic prediction. Without larger samples with good details, it can be hard to decipher why this result comes about. A good sample to assess this would include ancestry estimates, good phenotypes of interest, and parental details in order to estimate their effects on the estimate. If, for instance, having a white mother means positive parental effects (part of the shared environment) compared to having a black mother or vice-versa, then that could explain the deviation, but the magnitude would be quite large. If this is due to the age of the sample and the shared environment is merely systematic with respect to genotype but not race, that would explain the deviation, as the sample is quite young, so shared environment is likely to be substantial. This suggests a growing gap between groups with age and that the shared environment is a purely parental effect, and moreover, that the group with, say, worse mothers (whatever those may be), should have lower than expected results, but that is empirically not observed. The substantive interpretation of this method is uncertain, but a measure of sampling error and assortativity in the parental generation is one explanation. This area of research would benefit from structural modeling efforts.

Measurement invariance with respect to race is a prerequisite for any analysis of behavioral variables. When it’s satisfied, we can be assured that the same factors account for performance in both races (to-do: cite Dalliard, Lubke). When the same model parameters can be fit on the same biometric model of the causes of factor mean differences using twins, siblings, parents, etc., we can be assured that within each group, the causes are the same and there aren’t X-factors (i.e., group-specific factors explaining the mean difference but nothing - or something, given some moderator - inside the groups) at play, unless they’re so-specified as to be unfalsifiable - and thus pseudoscientific - in nature. In the TCP/PNC, measurement invariance was tenable (and it was tenable with respect to a Dutch cohort, for both blacks and whites; to-do: cite Swagerman et al. 2016 and tidy code, upload analysis), and it was tenable both between groups and across the range of admixture, which suggests that an X-factor would have to be specified such that it covaries with admixture, and isn’t just homogeneous with respect to group, whilst leaving parts like the residual variances (and residual covariances, since I checked those) identical across this range. It’s hard to think of anything besides color/appearance which could satisfy this role, especially since the observed effects of color are not also Jensen effects.

If the phenotype in question is a sumscore, it likely contains considerable error. Most sumscores are not even properly calculated. As such, it’s better to do as was done in the TCP/PNC and use a latent variable, which is free of error and of nuisance dimensionality, and of course, which has had measurement invariance assured. This helps to assure that the point estimates for the outcome variable are as accurate as possible. Moreover, because latent variables are free of measurement error, they help to reduce the sampling error (which is exacerbated by measurement error) that is likely to plague the usually much smaller mixed samples. Given that most literature using this method uses small samples, this is doubly important. A meta-analysis of all relevant results should note how a failure to use the proper phenotype (i.e., a latent variable) attenuates results by including error and multidimensionality.

The ancestry point estimates in the TCP/PNC were accurate (array-based). Assortative mating may have been present (it is hard to tell in aggregate without complete parental information and relevant phenotypes) and can be assessed with local admixture methods later (for historical admixture in the African American group and recent admixture in the mixed group). Prior evidence and measurement invariance seemed to indicate a lack of confounding of environmental effects and ancestry, and the phenotype was as free of error and nuisance dimensionality as possible.

Many citations are on the to-do list.

To-Do: Biracial Score Meta-Analysis

Scarr’s Method (1977)

Scarr et al. (1977) also suggested using a deviation-based method. In this case, instead of assessing the deviation of mixed-race individuals from the parent race mean, they suggested comparing the observed correlation between admixture and the trait within an admixed population to the correlation expected if the between-group heritability were 100%; the ratio of observed to expected is taken as the between-group heritability. (Using the unadmixed European American population would lead to range restriction determining the result.) For example, if the expected correlation, given by

\[\frac{d}{\frac{\mu_A - \mu_B}{\sigma_B}}\]

is 0.10 and the observed within-population correlation is 0.08, then the between-group heritability is 80%. Using the empirical values from the TCP/PNC, we arrive at

\[\frac{1.01}{\frac{0.986 - 0.187}{0.117}} = 0.148\]

as the expected correlation for 100% between-group heritability. The observed within-group correlation for the African American group was 0.086, for a between-group heritability of 58%, which can be calculated using the function for the Scarr method, BGHS, and plotted to compare to the other deviation-based method:

BGHS(0.986, 0.187, 0.117, 0.086, 1.01)
## [1] 0.5814843
Scarr <- BGHS(0.986, 0.187, 0.117, 0.086, 1.01)

df <- data.frame("BGS" = Scarr, "Name" = "Scarr")
ds <- rbind(dj, df)

ggplot(ds, aes(x = Name, y = BGS, fill = Name)) + geom_bar(stat = "identity", show.legend = F) + coord_cartesian(ylim = c(0, 1)) + xlab("Method") + ylab("Between-group Heritability") + theme_bw() + scale_fill_manual(values = c("#E69F00", "#999999", "#3C94CF"))

Because this method is based on the deviation of a generally larger group along a different criterion, it has a smaller confidence interval than the mixed deviation method, and as such, is likely to be a more accurate point estimate (along with being more theoretically accurate) in any particular instance. However, it still assumes a lack of sampling error (and unless using Hedges’ g, that the relative sample sizes are irrelevant for the standardized mean difference) and of range restriction, which is clearly not the case: the data do bunch up near 0% and there’s no way around that, but it can be ameliorated by increased sampling, which may better cover the mid-range between the mean ancestries of the two parent populations. But if this is an accurate depiction of the population distribution, then this won’t help. Perhaps a miniature meta-analysis of the population values for European ancestry and its distribution is in order, to better approximate the expected correlation. This will not correct the range-restricted observed correlation, and if this estimate is used, it is assumed that the samples in the meta-analysis have the same underlying relationship and that these are the only ancestry components; since many of these assumptions are not true by way of sampling error, stratification, and whatnot, they’re going to lead to attenuation and some bias individually, but it is hopefully the case that the meta-analytic result will remove it.

African American European Ancestry Meta-Analysis

For some of these studies, I had to approximate the SD. In order to do that, I used Wan et al.’s (2014) approximation from the IQR for normally distributed data. The assumption of normality is probably off for these, as seen in my own data, because of the range restriction, but the bias from this will not be very high. In these cases, I also use the median, and if they provide only the African ancestry, I use one minus that. The formula used specifically is

\[\sigma \approx \frac{q_{3}-q_{1}}{2\Phi^{-1}(\frac{0.75n-0.125}{n+0.25})}\]

and all of the studies I had to do that for are given below:

#Signorello et al. 2008

SDIQR(.887, .997, 379)
## [1] 0.08185754
#Cheng et al. 2012 - ARIC

SDIQR(77.8, 89.5, 2285)
## [1] 8.678759
#Cheng et al. 2012 - JHS

SDIQR(77.9, 88.7, 3185)
## [1] 8.009718
#Cheng et al. 2012 - MEC

SDIQR(69.8, 87.4, 1551)
## [1] 13.05917
#Bryc et al. 2010

SDIQR(.116, .277, 365)
## [1] 0.1198274
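As a sanity check on the IQR approximation, the formula can be fed the exact quartiles of a normal distribution with a known SD. This is a sketch under the normality assumption only; sd_from_iqr is a hypothetical stand-alone copy of the formula above, and the mean of 0.2 and SD of 0.12 are made-up illustrative values rather than data from any of the studies.

```r
# Hypothetical stand-alone copy of the IQR-based approximation given above
sd_from_iqr <- function(q1, q3, n) {
  (q3 - q1) / (2 * qnorm((0.75 * n - 0.125) / (n + 0.25)))
}

# Exact quartiles of a normal distribution with a known SD (illustrative values)
true_sd <- 0.12
q1 <- qnorm(0.25, mean = 0.2, sd = true_sd)
q3 <- qnorm(0.75, mean = 0.2, sd = true_sd)

# With a very large n the correction term vanishes and the SD is recovered
sd_large_n <- sd_from_iqr(q1, q3, n = 1e6)
# With a small n the same quartiles map to a somewhat larger SD
sd_small_n <- sd_from_iqr(q1, q3, n = 50)

c(true = true_sd, large_n = sd_large_n, small_n = sd_small_n)
```

The check says nothing about how the formula behaves for the skewed, range-restricted ancestry distributions discussed above; it only confirms the formula does what it should when normality holds.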

I also had to calculate the SD from confidence intervals. For that I used the formula

\[\sigma = \sqrt{N} \times \frac{\text{upper} - \text{lower}}{2Z}\]

and all of the studies that I had to do that for are given below:

#Baharian et al. 2016 - SCCS

SDCI(.1443, .1365, 0.95, 2128)
## [1] 0.09179147
#Baharian et al. 2016 - HRS

SDCI(.1727, .1616, 0.95, 1501)
## [1] 0.1097072
#Baharian et al. 2016 - ASW

SDCI(.2320, .1950, 0.95, 97)
## [1] 0.09296287

Notably, for the Baharian et al. (2016) samples, the SD estimates will differ slightly from their real values because they used bootstrapped 95% confidence intervals.
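The confidence interval conversion can be checked with a round trip: build a normal-theory interval from a known SD and then recover the SD from it. This is a minimal sketch; sd_from_ci is a hypothetical stand-alone copy of the formula above, and the mean, SD, and n are made-up illustrative values. It assumes a symmetric normal-theory interval, so it does not capture the extra discrepancy that bootstrapped intervals introduce.

```r
# Hypothetical stand-alone copy of the CI-based conversion given above
sd_from_ci <- function(ub, lb, lev, n) {
  sqrt(n) * (ub - lb) / (2 * qnorm(1 - (1 - lev) / 2))
}

# Illustrative inputs (not taken from any study)
known_sd <- 0.10
n <- 1500
m <- 0.17

# Build a 95% normal-theory interval around the mean
z <- qnorm(0.975)
ub <- m + z * known_sd / sqrt(n)
lb <- m - z * known_sd / sqrt(n)

# Recover the SD from the interval
recovered_sd <- sd_from_ci(ub, lb, 0.95, n)
recovered_sd
```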

Bryc et al. (2015) reported a mean of 24% but declined to report their SD, so pixel counting was used and yielded an SD of approximately 17% with an n of 5,269. Likewise, Jin et al. (2012) reported means of 21.65% using FRAPPE and 21.61% using STRUCTURE for 1,890 African Americans, but did not report SDs. Smith et al. reported means of 18.4%, 18.3%, 15.9%, and 18.8% for samples of 18, 23, 45, and 23 African Americans in Chicago, Pittsburgh, Baltimore, and North Carolina respectively, without SDs. Stefflova et al. (2009) reported 23.7% with a sample of 31, but only provided pseudo-standard errors, similar to Stefflova et al. (2011) who reported 16.3% in a Philadelphia cohort of 331 and 14.7% in a broader African American cohort of 50; they also included non-recombining Y chromosomal and mtDNA estimates. McQueen et al. (2014) reported 17.8% with no SD for 677 African Americans in the Add Health study. Gates Jr. (2013) reported that Ancestry.com, 23andMe, FamilyTreeDNA, National Geographic’s Genographic Project, and AfricanDNA found values of 29%, 22%, 22.83%, 19%, and 19%, but SDs and sample sizes were left unspecified.

#Load the data

#admixture <- read.csv("AfricanAncestry.csv", fileEncoding = "UTF-8-BOM")

#Compute SEs; for large sample sizes, the SE of the SD is approximately 0.71SD/sqrt(N)

admixture$SE <- (0.71 * admixture$SD) / sqrt(admixture$N)

admixture
##                     Study Year  Mean     SD     N
## 1           Halder et al. 2008 0.143 0.1330   136
## 2            Ducci et al. 2009 0.210 0.1400   864
## 3         Zakharia et al. 2009 0.219 0.1220   128
## 4       Signorello et al. 2010 0.071 0.0819   379
## 5           Nassir et al. 2012 0.225 0.1470 11712
## 6             Bryc et al. 2010 0.185 0.1198   365
## 7     Cheng et al. (ARIC) 2012 0.151 0.0868  2285
## 8      Cheng et al. (JHS) 2012 0.160 0.0801  3185
## 9      Cheng et al. (MEC) 2012 0.194 0.1306  1551
## 10            Bryc et al. 2014 0.240 0.1700  5269
## 11          Halder et al. 2015 0.330 0.1700    70
## 12 Baharian et al. (SCCS) 2016 0.140 0.0918  2128
## 13  Baharian et al. (HRS) 2016 0.167 0.1097  1501
## 14  Baharian et al. (ASW) 2016 0.213 0.0930    97
## 15      Kirkegaard et al. 2016 0.170 0.1100   140
## 16      Kirkegaard et al. 2019 0.170 0.1100   140
## 17         Lasker et al.  2019 0.187 0.1170  2228
## 18              Personal  2020 0.108 0.1520    61
##                                   Note           SE
## 1                                      0.0080973057
## 2  Randomly split components acombined 0.0033816567
## 3                                      0.0076561987
## 4                            Mean, IQR 0.0029869135
## 5                                      0.0009644066
## 6                          Median, IQR 0.0044521392
## 7                          Median, IQR 0.0012892436
## 8                          Median, IQR 0.0010077113
## 9                          Median, IQR 0.0023544834
## 10                    Pixel-counted SD 0.0016628121
## 11                African admixture SD 0.0144264093
## 12                       Bootstrap CIs 0.0014129123
## 13                       Bootstrap CIs 0.0020103637
## 14                       Bootstrap CIs 0.0067043307
## 15                          Supplement 0.0066006547
## 16                                     0.0066006547
## 17                                     0.0017598944
## 18     1000 Genomes (CEU and YRI, AAs) 0.0138177401
#Mean and SD random effects meta-analyses

admean <- rma(yi = Mean, sei = SD/sqrt(N), measure = "SMD", ni = N, data = admixture)
adsd <- rma(yi = SD, sei = SE, measure = "SMD", ni = N, data = admixture)  

predict(admean)
## 
##    pred     se  ci.lb  ci.ub  cr.lb  cr.ub 
##  0.1818 0.0126 0.1571 0.2066 0.0751 0.2886
predict(adsd)
## 
##    pred     se  ci.lb  ci.ub  cr.lb  cr.ub 
##  0.1193 0.0065 0.1066 0.1321 0.0649 0.1738
#For plotting

admean <- metagen(Mean,
                 SD/sqrt(N),
                 data = admixture,
                 studlab = Study,
                 sm = "SMD", 
                 method.tau = "SJ")

adsd <- metagen(SD,
                 SE,
                 data = admixture,
                 studlab = Study,
                 sm = "SMD", 
                 method.tau = "SJ")

forest(admean,
       sortvar = Year,
       xlim = c(0, 0.4),
       rightlabs = c("Mean", "95% CI", "Weight"),
       leftcols = c("Study"),
       leftlabs = c("Study"),
       pooled.totals = F,
       smlab = "",
       print.tau2 = F,
       col.diamond = "gold",
       col.diamond.lines = "black",
       print.I2.ci = F,
       digits.sd = 2,
       comb.fixed = F)

forest(adsd,
       sortvar = Year,
       xlim = c(0, 0.4),
       rightlabs = c("Mean", "95% CI", "Weight"),
       leftcols = c("Study"),
       leftlabs = c("Study"),
       pooled.totals = F,
       smlab = "",
       print.tau2 = F,
       col.diamond = "gold",
       col.diamond.lines = "black",
       print.I2.ci = F,
       digits.sd = 2,
       comb.fixed = F)
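To make the pooling step less of a black box, here is a hand-rolled random-effects meta-analysis of the means in base R. It is a sketch only: it uses the DerSimonian-Laird estimator of the between-study variance because that one has a closed form, whereas rma() defaults to REML and the metagen() calls above use Sidik-Jonkman, so the numbers need not match the output above exactly. The vectors are copied from the Mean, SD, and N columns of the table printed earlier.

```r
# Study means, SDs, and sample sizes copied from the table above (rows 1 to 18)
yi <- c(0.143, 0.210, 0.219, 0.071, 0.225, 0.185, 0.151, 0.160, 0.194,
        0.240, 0.330, 0.140, 0.167, 0.213, 0.170, 0.170, 0.187, 0.108)
sdi <- c(0.1330, 0.1400, 0.1220, 0.0819, 0.1470, 0.1198, 0.0868, 0.0801, 0.1306,
         0.1700, 0.1700, 0.0918, 0.1097, 0.0930, 0.1100, 0.1100, 0.1170, 0.1520)
ni <- c(136, 864, 128, 379, 11712, 365, 2285, 3185, 1551,
        5269, 70, 2128, 1501, 97, 140, 140, 2228, 61)

# Sampling variance of each mean
vi <- sdi^2 / ni

# Fixed-effect (inverse-variance) estimate and Cochran's Q
wi <- 1 / vi
fixed <- sum(wi * yi) / sum(wi)
Q <- sum(wi * (yi - fixed)^2)
k <- length(yi)

# DerSimonian-Laird estimate of the between-study variance, floored at zero
tau2 <- max(0, (Q - (k - 1)) / (sum(wi) - sum(wi^2) / sum(wi)))

# Random-effects weights, pooled mean, and its standard error
wstar <- 1 / (vi + tau2)
pooled <- sum(wstar * yi) / sum(wstar)
se_pooled <- sqrt(1 / sum(wstar))

round(c(pooled = pooled, se = se_pooled, tau2 = tau2,
        ci_lb = pooled - qnorm(0.975) * se_pooled,
        ci_ub = pooled + qnorm(0.975) * se_pooled), 4)
```

Because the heterogeneity is large relative to the sampling variances, the random-effects weights end up much closer to equal than the fixed-effect weights, so the big samples do not dominate the pooled mean.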

While it’s also possible to attempt a meta-analysis of European ancestry results, I don’t feel like it, so I set their mean to 0.99.

\[\frac{1.01}{\frac{0.99 - 0.1818}{0.1193}} = 0.1491\]

The mean difference and within-African American correlation for the Kirkegaard et al. (2019) study were unavailable, so the mean values from both studies could not be used. As such, the between-group heritability estimate is unchanged (58%).
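The recomputation with the meta-analytic ancestry values can be written out directly. This is a minimal sketch using only numbers quoted above (d = 1.01, the assumed European American mean of 0.99, the meta-analytic mean and SD of 0.1818 and 0.1193, and the observed TCP/PNC correlation of 0.086); the variable names are my own.

```r
# Inputs quoted in the text; names are illustrative
d          <- 1.01     # standardized group difference
ea_mean    <- 0.99     # European American mean European ancestry (set by hand)
aa_mean    <- 0.1818   # meta-analytic African American mean
aa_sd      <- 0.1193   # meta-analytic African American SD
observed_r <- 0.086    # observed within-group correlation in the TCP/PNC

# Expected correlation if the between-group heritability were 100%
expected_r <- d / ((ea_mean - aa_mean) / aa_sd)

# Ratio of observed to expected correlation
bgh_scarr <- observed_r / expected_r

round(c(expected_r = expected_r, bgh = bgh_scarr), 4)
```

The ratio moves only in the third decimal place relative to the TCP/PNC-only calculation, which is why the rounded estimate stays at 58%.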

To-Do: Bivariate Relationships Meta-Analysis

Misleading Prior Deviation Results

Witty & Jenkins (1936)

Witty & Jenkins (1936) proposed an ingenious way to investigate the admixture-ability relationship. These researchers sampled a handful of African American students (n = 103) from the Terman studies of the far right tail of the ability distribution who had IQ scores above 125 (Jenkins, 1936). They then assessed the extent of admixture in the sample by interviewing the parents of the students. Children were subsequently sorted into ordinal categories based on admixture labeled N, “Pure Negro,” NW, about half, NNW, “More Negro than white,” and NWW, “More white than Negro.” The resulting admixture estimates were compared to the average level in a supposedly nationally representative sample collected by Herskovits (1930). Their idea was that if European admixture was related to ability, these particular African Americans should have had an elevated level of European admixture. They concluded “[A]fter examination of available data… superior intelligence test ability is not exhibited by those Negroes having the largest amount of white ancestry.”

The first major problem with the study is that parental self-reports of ancestry are potentially unreliable and uninformative; based on self-report surveys, the degree of European admixture is uncertain. Using other Chicago African Americans as the comparison group would have at least allowed the authors to test whether the level of admixture reported by the parents was likely to be higher than the locally reported average. Luckily, others have endeavored to estimate the average level of European admixture in Chicago. As noted above, a sample of 18 found a mean of 18.4%; Reed (1969) estimated the average level of European admixture at about 13%, which may seem very low, but this used blood groups. Parra et al. (1998) reported a level of 18.8% in Maywood. The genomic, blood group, and limited ancestry informative marker results are fairly consistent, probably indicating a level around 18%, which is probably right at the national average, but if the Reed result is taken as accurate, this may have trended up over time. This cannot be certain. Nonetheless, Loehlin, Lindzey & Spuhler (1975) and Mackenzie (1984) pointed out that Herskovits’ sample is expected to have a higher than average proportion of European admixture. If the ordinal classifications of admixture are converted to percentage (where N = 100% African, NW = 50%, NNW = 66%, and NWW = 33%), the average European admixture in Herskovits’ sample turns out to be 31%. This is only 4 percentage points lower than another elite sample of college-attending African Americans reported by Meier (1949). Applying the same method to the Witty & Jenkins sample yields an average European admixture of 34% - making Witty & Jenkins correct that their sample did not have (meaningfully) higher European admixture than Herskovits’ sample (or Meier’s), but this sample was elite.
This may be unsurprising given that the fathers of the Witty & Jenkins sample worked primarily in upper-level occupations and thus had relatively high SES, and presumably, heightened European admixture (see Kirkegaard et al., 2019 and the sources within).

If we take the converted percentage figures from these three studies as accurate, then the results could be surprisingly congruent with a hypothesis involving genetic influence. They certainly seem to suggest a correlation between European admixture and SES in African Americans.

The second part of Witty & Jenkins’ study involved comparing the levels of admixture in a sample of 28 “gifted” (IQ > 140) and 35 “superior” (IQ > 125) African Americans. The level of European admixture in these groups was found not to differ significantly (again, based on self-reports). Using the conversion values described above, the gifted sample would be a mere 1.2% more European. For this result to count against an influence of admixture, the sample would need to be larger and better measured, and to provide justifiable admixture estimates, firstly, but taking it as it is and as it has been interpreted, we would need to know what the predicted change in admixture would have to be to explain the observed difference in IQ scores of around 1 \(\sigma\) between the two samples. If it is assumed that the mean level of European admixture in Chicago African Americans is 20% with a standard deviation of 15% and the proportion of the African American - European American gap which is expected to be genetic matches the within-group heritabilities of g in adults (i.e., \(\approx\) 80%) and Europeans are 100% European, then the expected change in European admixture between the groups is 0.8/5.33 = 0.15 \(\sigma\). This corresponds to an expected difference of 2.25%, whereas the observed difference was 1.2%. Applying this same logic to the first test, the expected level of European admixture for the sample would be 26.8%, whereas the observed level was 34%.
These discrepancies are explicable in any number of ways including improper assumptions about the genetic influence on group differences, incorrect figures being given for the mean and \(\sigma\) in the local population (e.g., if \(\mu\) = 18% and \(\sigma\) = 11% with European Americans at 99% European, the value halves - to 1.181% - and becomes basically what’s found in the study with 80% heritability, less with greater proposed between-group heritability), errant reporting, or more parsimoniously, insufficient statistical power with the small sample used. Estimates based on this sort of very bad admixture estimate and small sample might be worth including in a future meta-analysis (there are many similar studies which are not as popular due to having the opposite reported results), but modern methods should be preferred.
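The expected-difference arithmetic for the gifted versus superior comparison can be laid out as a small function. This is my reading of the calculation rather than a definitive formalization: expected_admix_diff is a hypothetical helper, and treating the IQ difference of around 1 \(\sigma\) as a simple multiplier is an assumption on my part. The inputs are the values assumed in the text (a mean of 20%, an SD of 15%, 80% between-group heritability, and Europeans at 100% European).

```r
# Hypothetical helper: expected difference in European admixture between two
# African American samples that differ by iq_gap_sd standard deviations
expected_admix_diff <- function(iq_gap_sd, bgh, aa_mean, aa_sd, ea_mean = 1) {
  # Parent group ancestry gap in units of the African American ancestry SD
  ancestry_gap_sd <- (ea_mean - aa_mean) / aa_sd
  # Expected admixture shift in SD units
  shift_sd <- iq_gap_sd * bgh / ancestry_gap_sd
  # Convert the shift back to a proportion of ancestry
  c(shift_sd = shift_sd, shift_prop = shift_sd * aa_sd)
}

# Values assumed in the text: mean 20%, SD 15%, 80% heritability, 1 SD IQ gap
result <- expected_admix_diff(iq_gap_sd = 1, bgh = 0.8,
                              aa_mean = 0.20, aa_sd = 0.15)
round(result, 4)
```

Changing aa_mean, aa_sd, or ea_mean shows how sensitive the expected difference is to the assumed local ancestry distribution, which is one of the explanations for the discrepancy offered above.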

Loehlin, Vandenberg & Osborne (1973)

Loehlin, Vandenberg & Osborne (1973) failed to find an association between African admixture - assessed via blood groups - and IQ in two small samples (combined n = 84) of twins from Kentucky and Georgia. The authors reached this result using 16 blood group genes found to vary reliably in frequency in the African American and European American populations (the racial distributions in the two samples correlated at r = 0.88) and a composite ability score derived from nineteen tests.

The reason for this result is not a lack of association but a lack of power to detect one. The frequency differences of the selected genes and the number of them were both insufficient to allow the authors to estimate admixture, let alone its effect on cognitive ability. To show how certain that conclusion is, consider the work of Reed (1973), published shortly after the Loehlin, Vandenberg & Osborne (1973) paper, who calculated that eighteen blood loci with perfect discrimination (i.e., frequencies of 100% in one group and 0% in another) were required to generate point estimates of admixture with 95% accuracy. This physically impossible condition was not met by the study.

The study gave additional reason to doubt the naive interpretation of their results. Namely, linkage decay. These authors proposed that, in order for their test to have discovered a relationship between blood group genes derived from their European ancestors and ability in African Americans, these genes would need to have been predictive of the presence of the genes affecting intelligence from the same ancestors. They found that European blood group genes were not predictable from one another in the African American sample (r = -0.04 and -0.02 in Georgia and Kentucky respectively), indicating that European intelligence genes were probably not related to European blood group genes in the African American population. As such, the data simply had no validity for the purpose of determining whether or not there was a relationship between European admixture and cognitive ability in the African American population (Loehlin, 2000, p. 188).

Scarr, Pakstis, Katz & Barker (1977)

Scarr et al. (1977) conducted a methodologically improved version of the Loehlin, Vandenberg & Osborne (1973) study. Like them, these researchers sought to determine whether blood group genes could be used to estimate European admixture in the African American population, and then assessed if this admixture estimate could be related to the first principal component of four cognitive tests. Scarr et al. found a statistically insignificant -0.05 correlation between their PC scores and their admixture index in a sample of 181 African American twins from Philadelphia. This was further reduced to -0.02 after controlling for SES and skin color. The authors then divided their samples into thirds by ancestry and compared the top to the bottom third, finding an insignificant 0.11 \(\sigma\) difference in the PC score between them. On the basis of these results, the authors declared that “not more than one third” (p. 85) of the difference between African Americans and European Americans could be due to genes, and that even this figure seemed unlikely.

Firstly, it is not clear how Scarr et al. (1977) derived their figure of not more than one third. The rationale is explained in a footnote (p. 85), but their calculations do not match their remark. To briefly summarize, they found an African American-European American difference of 0.9 \(\sigma\) - about average - for their PC score. Assuming the mean difference in European ancestry was 0.77 (0.99 for European Americans and 0.22 for African Americans), they held that the difference between the upper and lower thirds should be 0.23 \(\sigma\) when the difference is 20% (35% upper, 15% lower). They reasoned that if the admixture component was responsible for three-quarters of the difference, then the smallest mean difference between upper and lower thirds should be 0.18 \(\sigma\). This reasoning (expected \(h^2_B \times 0.23 \sigma\); \(h^2_B\) is the between-group heritability) coupled with the finding of a 0.11 \(\sigma\) difference implied that the between-group heritability should have been 0.48 instead of “not more than one third”.
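The footnote arithmetic summarized above can be reproduced in a few lines of R. This is a sketch under the figures quoted in the paragraph; the variable names are illustrative.

```r
# Figures quoted from the paragraph above
gap_sd     <- 0.9          # group difference on the PC score, in SD units
anc_gap    <- 0.99 - 0.22  # difference in mean European ancestry (0.77)
thirds_gap <- 0.35 - 0.15  # ancestry difference, upper vs. lower third (0.20)
observed   <- 0.11         # observed PC difference between the thirds

# Expected upper-lower difference if the whole gap tracked ancestry (about 0.23)
expected_full <- gap_sd / anc_gap * thirds_gap

# Between-group heritability implied by the observed difference
implied_bgh <- observed / expected_full

round(c(expected_full = expected_full, implied_bgh = implied_bgh), 2)
```

The implied value lands between 0.47 and 0.48 depending on whether the expected difference is rounded to 0.23 first, well above “not more than one third”.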

However, that calculation assumes too much. In reality, the difference between the top and bottom thirds may have been much more or less than 0.2 - it is uncertain (Mackenzie, 1984). It is unlikely that Scarr et al.’s small sample, with its crude ancestry estimates, was sufficient. But their 0.33 figure can be granted and their conclusion still doesn’t work. The sample ranged in age from 10 to 16. Noting the substantial increase in heritability with age (Bouchard, 2013), their result can be made to comport with a strong genetic effect. The African American within-group heritability (\(h^2_w\)) was estimated by Scarr et al. (1977) to be 48% (incidentally, exactly their \(h^2_B\)). Accepting Jensen’s (1998) “Default Hypothesis,” the given value of \(h^2_w\), and all the rest of the study’s numbers, this suggests that the expected difference is 0.11 (since a 2.2 \(\sigma\) difference \(\times\) |-0.05| = 0.11), which is the value they obtained. Jensen also calculated the validity of the index to be 0.49 (which would make the predicted difference 0.07 \(\sigma\), or lower than found, but Scarr objected to this). Alternatively, it could be found by finding the meta-analytic skin color-ancestry correlation and assessing how that compared to the one Scarr et al. (1977) found. Whatever that is, using Jensen’s validity estimate (from his discussion in Scarr’s later book), we get a corrected mean difference of 0.22, which leads to an expected \(h^2_B \times\) 0.23 = 0.22, for an egregious 96% \(h^2_B\). Corrected for the reliability of the PC (since this, being a PC, subsumes error variance), which should be around 0.95 or so (around typical for a composite of as many tests), the estimate would be >1. The BGH can be absurd with these data (reliability corrections affirm the consequent and may just be more absurd). This is part of why deviation-based estimates have to be done very carefully and why the Scarr et al. (1977) data are basically useless.

Reed (1997) criticized Scarr et al. (1977) for their use of an “odds coefficient” and suspect rounding procedures for zero-frequency blood group phenotypes. The rank-ordering method Scarr et al. (1977) used, Reed argued, introduced considerable randomness into the admixture estimates. The size of the bias to the odds coefficient is expected to be on the order of \(\pm\) 1 to 2. This bias is enough to render a significant admixture-ability relationship unlikely (and the data, useless). Nonetheless, Scarr et al. (1977) argued that their index of ancestry was very reliable, but the only evidence in that regard pointed in the opposite direction. For instance, the correlation between their index of ancestry and skin color was only 0.27. Compared to other figures in the literature, this was only about 42-62% of more typical values (Parra, Kittles & Shriver, 2004; Ruiz-Linares et al., 2014; Gravlee, Non & Mulligan, 2009; Leite et al., 2011). The U.S. mean looks to be closer to 0.44 (assuming this is correct and using the reliability corrections above with it, the \(h^2_B\) would be an astounding 87%).

Assuming the same African American population as above, with a mean European ancestry of 20% (\(\sigma\) = 15%), and Scarr et al.’s (1977) expected \(h^2_B\) of 0.75, the correlation between ability and ancestry is 0.14 for a perfectly reliable index. Using Jensen’s (0.49) or a skin color-based reliability (0.61, based on the above values), the expected ability-ancestry correlation becomes 0.069 or 0.085. However, this assumes a \(h^2_W\) of 1 (or no environmental effects); a lower value for this attenuates our already low correlation. Using the provided heritability for the African American group of 0.48, the new expected values become 0.048 and 0.059 (assuming attenuation), which, corrected for reliability become 0.043 and 0.053, which are not very different from the actual 0.05 correlation observed. The expected difference between the upper and lower thirds would become 0.11 and 0.12 - the level observed.
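The paragraph above does not spell out its intermediate steps, so the following R sketch is a reconstruction rather than the original calculation. It assumes a 1 \(\sigma\) ability gap, European Americans at 100% European, validity applied as a direct multiplier, and attenuation by the square root of \(h^2_W\); under those assumptions it reproduces the reported correlations to within rounding. The final reliability correction is left out because the factor used is not stated.

```r
# Assumed inputs from the text
bgh        <- 0.75                   # Scarr et al.'s expected between-group heritability
anc_gap_sd <- (1.00 - 0.20) / 0.15   # ancestry gap in admixture SDs (about 5.33)
validity   <- c(jensen = 0.49, skin_color = 0.61)
h2w        <- 0.48                   # within-group heritability for the sample

# Expected ability-ancestry correlation with a perfectly valid index (about 0.14)
r_perfect <- bgh / anc_gap_sd

# Attenuated by index validity (roughly 0.069 and 0.085)
r_valid <- r_perfect * validity

# Further attenuated for h2w < 1 (roughly 0.048 and 0.059)
r_h2w <- r_valid * sqrt(h2w)

round(rbind(r_valid, r_h2w), 3)
```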

Loehlin (2000, p. 188) noted that the association between admixture and ability could be taken a bit further: the correlations are not just insignificant and small, they actually comport with a weak form of Spearman’s hypothesis because European ancestry was positively related (0.1) with the Raven’s Progressive Matrices (RPM) scores and negatively related (-0.12) with a memory measure. This is interesting because it has been consistently found that African Americans perform relatively better than European Americans on short-term or episodic memory assessments and worse on more spatially-loaded measures like the RPM (Jensen, 1998). Loehlin remarked that given high reliability and a perfect relationship between European ancestry and cognitive ability, the expected correlation might only go as high as 0.4. As we know, the sample may not have been sufficiently variable to sustain the needed correlations and both the instrument and the methods were unreliable for our purposes. This makes small correlations like these interesting, if real, because they represent the first published genetic proof of Spearman’s weak hypothesis. But they are not reliable. Furthermore, Scarr et al.’s (1977) g loadings (racially averaged and derived via PCA, so somewhat off) were positively correlated at 0.64 with the score differences between African and European Americans and at 0.84 with the score differences between the most and least African African Americans. Therefore, the results of the study can be interpreted as supportive of the weak form of Spearman’s hypothesis (assuming correlated vectors results are valid, and really, they are not) like the TCP/PNC results.

The Scarr et al. (1977) findings can be reinterpreted in support of the effects they’re generally taken to disprove, but they ultimately don’t support either idea. A null finding without power is not more than anecdotal evidence in favor of the null hypothesis. The unknown details and the unreliable estimates of the sample make it entirely dubious.

Overview of Deviation Results

The advantage of the deviation methods over the biometric ones is that all the information needed to calculate them properly is readily available; the biometric method (discussed below), merely assumes the information provided is correct, including simplifying assumptions which are strictly untrue. Regardless, the deviation-based results here are broadly consistent with modest between-group heritability, but they’re insufficient and, as mentioned, hard to interpret. A more generally acceptable and established method was offered by DeFries (1973).

Biometric Methods

DeFries’ Method (1973)

In an attempt to show, analytically, that \(h^2_B\) is not necessarily high despite high values of \(h^2_W\), DeFries (1973) derived (via Lush) the formula for \(h^2_B\), which is as follows

\[h^2_B = \frac{r(1-t)}{t(1-r)}h^2_W\]

where t is the phenotypic intraclass correlation (ICC) and r is the genetic ICC, or inbreeding coefficient. As an argument that a high value of \(h^2_W\) does not necessarily imply a high value of \(h^2_B\), this is superior to invoking a pseudoscientific group-specific variable or X-factor, since it is actually completely solvable rather than merely plausible. It is possible to extend the formula to include dominance, epistasis, or, perhaps more importantly given what’s known about the architecture of traits like intelligence, gene-environment correlation, as follows

\[\begin{aligned} h^2_B + h_Be_Br_{A_B E_B} = \frac{\sigma_{A_B,P_B}}{V_{P_B}} = \frac{\sigma_{A_B,(A_B + D_B + I_B + E_B)}}{V_{P_B}} \\ = \frac{V_{A_B}}{V_{P_B}} + \frac{\sigma_{A_B,E_B}}{V_{P_B}} = h^2_B + \frac{(V_{A_B}V_{E_B})^{1/2}}{V_{P_B}}r_{A_B E_B} = b_{A_B P_B} \end{aligned}\]

where \(A_B\) is the additive genetic mean of a group, \(P_B\) is its mean phenotypic value, and \(r_{AE}\) is the genotype-environment correlation (in its absence, \(b_{A_B P_B}\) reduces to \(h^2_B\)). In this way, nonzero gene-environment correlation doesn’t forbid estimation. If the correlation between genotypes and environments is positive, \(h^2_B\) would underestimate the genetic proportion of group differences (\(h^2_B\) < \(b_{A_B P_B}\) when \(r_{AE}\) > 0); if the relationship is negative, \(h^2_B\) overestimates the genetic proportion of group differences. More importantly, with enough of a negative relationship, the group with the lower phenotypic mean may actually have a greater (additive) genetic mean (stressing the importance of sign consistency and causal variants). Current PGS and existing biometric designs can be used to estimate the relevant parameters in SEMs, but few cohorts have the requisite data.

With the standardized group difference, d (can be done with Hedges’ g), if the raw data to compute t are not available, t can be approximated as

\[t \approx \eta^2 = \frac{(0.5d)^2}{1 + (0.5d)^2}\]
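As a small sketch, the approximation can be written as a standalone R helper. It mirrors the `t` line inside the `BGHD` function defined at the top (half the standardized difference, squared); the helper name `t_from_d` is illustrative and assumes two equally sized groups with equal variances.

```r
# Approximate the phenotypic intraclass correlation t from a standardized
# mean difference d, assuming two equal-sized groups with equal variances.
t_from_d <- function(d) {
  half_d_sq <- (0.5 * d)^2      # between-group variance in within-group SD units
  half_d_sq / (1 + half_d_sq)   # share of total variance lying between groups
}

t_from_d(1)              # 0.25 / 1.25 = 0.2
t_from_d(c(0.25, 0.5, 1))
```

Note that a difference of d = 1 corresponds to t = 0.2.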

If the data are available and \(\sigma^2\) (importantly, phenotypic in this case, although I use the same symbols variously for both genetic and phenotypic phenomena - which is which should be clear from context) is equal for the groups (generalizing to the unequal case is trivial), it can be directly computed as

\[\sigma^2_B = (\frac{\sigma^2_W}{k})^2\]

\[t = \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W}\]

The values for r have been more elusive. If one assumes shared genetic architectures and no genetic differences, then r = 0; if it’s assumed that genetic and environmental group differences are unrelated, then r = \(t/h^2\); and if it’s assumed that genetic and environmental differences are exactly mirrored, then r = \(h^2t\), because, where \(h^2\) is the whole population heritability

\[h^2_W = h^2 \frac{1-r}{1-t}\]

r is equal to two times the fixation index, \(Fst\) (see Wright, 1951; Whitlock, 2008; Leinonen et al., 2013), thanks to the Wahlund effect (Zhivotovsky, 2015, among others); that is, the inbreeding coefficient (calculated in PLINK) \(Fis\), for individuals relative to the subpopulation, is convertible to both the F-statistic with respect to the total population, \(Fit\), and the coefficient for subpopulations with respect to the total population, \(Fst\), which is what we want to use for comparing the groups in question (it is also the only one which must be positive and bounded between 0 and 1). It is simple algebra to see that these can all be derived from one another. For example, given some value of \(Fis\)

\[Fis = \frac{Fit - Fst}{1 - Fst}\]

so to derive \(Fit\)

\[Fit = Fis - (Fis \times Fst) + Fst\]

or more commonly,

\[1 - Fit = (1 - Fis)(1-Fst)\]

and \(Fst\) is thus

\[Fst = \frac{Fit - Fis}{1- Fis }\]
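The algebra above is easy to check numerically. The sketch below uses hypothetical values of \(Fis\) and \(Fst\) chosen only for illustration (they are not TCP/PNC estimates) and confirms that the three forms agree.

```r
# Conversions between Wright's F-statistics, following the equations above
fit_from <- function(fis, fst) fis - fis * fst + fst
fis_from <- function(fit, fst) (fit - fst) / (1 - fst)
fst_from <- function(fit, fis) (fit - fis) / (1 - fis)

# Hypothetical inputs, for illustration only
fis <- 0.02
fst <- 0.07

fit <- fit_from(fis, fst)

# Each identity should hold up to floating-point error
all.equal(1 - fit, (1 - fis) * (1 - fst))
all.equal(fis_from(fit, fst), fis)
all.equal(fst_from(fit, fis), fst)
```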

Justifying the doubling of \(Fst\), we’re diploid, but its necessity can also be demonstrated using a well-known empirical regularity: that the genetic diversity due to population differences tends to be much less than the genetic diversity within populations. Stated another way, \(Fst\) is able to give us the ratio of \(\sigma^2_B\) to \(\sigma^2_W\), \(Qst\), for some trait

\[Qst = \frac{Fst}{Fst+\frac{1}{2}(1-Fst)} = \frac{2Fst}{1+Fst} = \frac{\sigma^2_B}{\sigma^2_B + 2\sigma^2_W} = \frac{\sigma^2_B}{\sigma^2_T}\]

\(Qst\) is “the amount of genetic variance among populations relative to the total genetic variance in [a] trait” (Leinonen et al., 2013) and “[w]hen dealing with haploid populations or collections of entirely selfed lines, \(\sigma^2_W\)… is weighted by one, rather than two” (Walsh & Lynch, 2018, p. 453). This formula also assumes the variance in the trait is additive as a simplifying assumption. It is possible to plausibly derive it from phenotypic information, which has the advantage that it’s complete where the purely genetic estimates are not or they don’t characterize the whole architecture (including unshared portions) of the trait; unfortunately, these values may be contentious, even if they’ve been accurate in the controlled environments of plant and animal breeding experiments. Whitlock & Guillaume (2009) further noted that

\[\sigma^2_B = \frac{2Fst \sigma^2_W}{1-Fst}\]

which, through simple algebra, implies

\[\sigma^2_W = \frac{\sigma^2_B}{2Fst} - \frac{\sigma^2_B}{2}\]
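A quick round trip in R confirms that the two expressions are inverses of one another. The function names and the input values are hypothetical and serve only as a sketch.

```r
# Between-group variance implied by Fst and the within-group variance
sigma2_between <- function(fst, sigma2_w) 2 * fst * sigma2_w / (1 - fst)

# Within-group variance recovered from Fst and the between-group variance
sigma2_within <- function(fst, sigma2_b) sigma2_b / (2 * fst) - sigma2_b / 2

# Hypothetical inputs, for illustration only
fst      <- 0.1
sigma2_w <- 1

sigma2_b <- sigma2_between(fst, sigma2_w)        # 0.2 / 0.9
all.equal(sigma2_within(fst, sigma2_b), sigma2_w)
```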

Using the formula for \(Qst\) and the \(Fst\) from the TCP/PNC (given below) for the African American-European American comparison, the genetic diversity associated with the (sub)populations is 13%, compared to 87% which is found within the populations. When this quantity exceeds \(Fst\) (significantly), there’s evidence for divergent selection; when \(Fst\) exceeds it (again, significantly), there’s evidence for convergent, stabilizing or uniform selection, where similar values of some trait are favored across populations. If the two values are equal, the divergence observed is expected to have occurred under drift (though drift can be falsely assumed to be selection in some cases). Because an insignificant result does not imply evidence in favor of the null hypothesis (Wasserstein & Lazar, p. 132; absent, e.g., a power analysis, confirmatory tests, qualification of the strength of the evidence - perhaps with Bayes factors and similar metrics capable of quantifying the weight of the evidence and so on), this result “[d]oes not rule out selection, but does not support it either” (Walsh & Lynch, 2018, p. 453). Without many populations in a test, estimates are usually inaccurate and power is generally low (and lower yet for tests that \(Qst\) < \(Fst\); see Whitlock & Guillaume, 2009; O’Hara & Merila, 2005). In order to properly convey a consensus statement of caution, I’ll just quote Walsh & Lynch about some of the assumptions and their effects on this test

It is important to stress that any comparison of this sort must be performed using the same set of populations to obtain both \(Qst\) and \(Fst\). An analysis using an estimate of \(Fst\) from one set of populations and \(Qst\) from another is not trustworthy.

The first (of many) caveats with respect to this strategy is that, even under neutrality, the expected value of \(Qst\) will not necessarily equal \(Fst\) if the trait of interest is influenced by nonadditive genetic effects…. [W]ith nonadditive gene action, the within- and among-population components of genetic variation for neutral characters under short-term divergence are no longer equal to \(\sigma^2_{GW} = (1-f)\sigma^2_G\) and \(\sigma^2_{GB} = 2f\sigma^2_G\) (where \(f\) is the parameter estimated by \(Fst\)), but instead are influenced by a number of higher-order terms. In general, because the within-population genetic variance declines less rapidly with inbreeding under nonadditivity (and sometimes even increases), \(Qst\)… will tend to be smaller than \(Fst\) under neutrality. In particular, Whitlock… showed that additive \(\times\) additive variance always results in \(Qst < Fst\) under neutrality. Dominance also causes \(Qst\) and \(Fst\) to deviate under neutrality, with the direction of the inequality depending on the details of the population structure. There is disagreement as to the practical importance of these departures, especially given the large variances associated with \(Qst\) estimates. However, because these violations of assumptions often (but not always) result in \(Qst < Fst\), this general behavior makes conclusions regarding adaptive divergence based on elevated \(Qst\) conservative, while rendering observations of \(Qst < Fst\) ambiguous. Violations of this assumption of additivity may not be a serious issue for most morphological traits, but given that life-history traits often show considerable nonadditive variance, these may be more vulnerable to false impressions under a comparison of \(Qst\) and \(Fst\).

A second caveat is that the choice of markers used to estimate \(Fst\) can introduce bias. The strong assumption is that the markers chosen are neutral, such that any structure associated with the markers reflects the neutral population structure. Historically, allozyme markers were commonly used to estimate \(Fst\), and because these represent variant protein products, some may not be neutral. Another problematic (but widely used) marker class is microsatellites. For \(Fst\) to serve as a neutral proxy for the behavior of alleles underlying a focal trait, the mutational structure of the markers and QTLs must be compatible. Microsatellite alleles can easily back-mutate, resulting in underestimation of \(Fst\). While microsatellite-specific distance metrics (such as \(Rst\)) have been proposed, these should not be used in place of \(Fst\) for comparison with \(Qst\). These modified metrics adjust for high rates of back-mutations, something not expected at QTL alleles, potentially resulting in different adjusted measures of allelic divergence at the markers versus QTLs. These issues are of special concern given that many early studies used microsatellites. The ever-increasing use of SNPs to estimate \(Fst\) avoids these concerns.

Relevant to this, regarding the calculation of \(h^2_W\), it should be stated that it’s probably estimated too high and both epistasis and dominance (and bifurcated genetic effects) are likely to be estimated too low: they’re generally taken to be zero in twin models. More advanced family models have consistently shown that at least dominance variance likely has a role to play in many interesting traits, including intelligence. Models like the ACE, with the choice between dominance and the shared environment, are simply unable to estimate them simultaneously, so dominance is dropped because there is evidence of shared environmental effects and they usually fit better (better fit may be expected for the shared environment relative to dominance as a result of sampling error, unfortunately). It’s difficult to do anything about this with existing methods of calculating the SNP heritability. Hopefully an abundance of data and both methodological and technological improvements will allow this situation to be ameliorated. A bit less unfortunately, it’s almost always a proxy phenotype which is used in lieu of the real parameter of interest for psychological traits like intelligence or neuroticism (like a sumscore and all that comes with that) in these analyses.

\(Qst\) as a function of \(Fst\) can be thought of as a noisier version of an already noisy statistic. The already limited power due to the numerous assumptions in the comparison is generally very small for pairwise comparisons because the variance among groups is a function of the number of groups included in the analysis. Using Walsh & Lynch’s equation 12.28c for a simplified case, the power for some comparison can be estimated by asking how often the ratio \(\frac{Qst}{Fst}\) is in excess of a value \(\delta\). Where \(n_d\) is the number of comparison groups (“demes”), this is as follows:

\[Pr(\frac{Qst}{Fst}>\delta) = Pr(\frac{(n_d-1)Qst}{Fst}>\delta(n_d-1)) = Pr(\chi^2_{n_d-1}>\delta(n_d-1))\]

To obtain a significant value in a comparison of \(n_d\) = 2, Walsh & Lynch calculated that the true value of \(Qst\) must be five-times larger than the \(Fst\) in order to be significant at the \(\alpha\) = 0.05 level (\(Qst\) tests are two-sided, so \(Pr(\chi^2_1>5.02) = 0.025\)), but with \(n_d\) = 10, \(Pr(\chi^2_9>19.03) = 0.025\) leads to \(\delta\) = \(\frac{19.03}{9}\) = 2.1. Stated another way, with ten groups, \(Qst\) must only be two-times \(Fst\) to be significant. Keeping \(n_d\) at 10, for a test of \(Qst\) < \(Fst\), \(Qst\) must be \(\frac{2.7}{9}\) = 0.3, or approximately one-third of \(Fst\) (keep previously-stated substantial comparison issues for these tests in mind). This equation is clearly useful, but it would be more useful taking into account the considerable effect of sampling variance on estimates. Rogell et al. (2012), taking after Whitlock & Guillaume (2009), used violin plots of the difference between estimated \(Qst\) values and the expected \(Qst\) distribution for a neutral trait with wholly additive gene action as a way of visualizing the uncertainty in estimates. These have the advantage that they’re easy to interpret: when the credible interval is entirely above or below zero, then there’s evidence for divergence or convergence. Ascertainment bias may also be at work as an explanation of the \(Qst\) > \(Fst\) finding (or the converse) for many populations.
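The critical ratios quoted above follow directly from chi-square quantiles, which base R provides through `qchisq`. The helper below is a minimal sketch of the simplified case only; its name and interface are illustrative and it ignores sampling variance in \(Fst\).

```r
# Critical Qst/Fst ratios for a two-sided test in the simplified
# chi-square approximation, with n_d demes and n_d - 1 degrees of freedom.
qst_fst_critical <- function(n_d, alpha = 0.05) {
  df <- n_d - 1
  c(upper = qchisq(1 - alpha / 2, df) / df,  # Qst > Fst (divergent selection)
    lower = qchisq(alpha / 2, df) / df)      # Qst < Fst (uniform selection)
}

round(qst_fst_critical(2), 2)   # upper cutoff is about 5.02
round(qst_fst_critical(10), 2)  # upper about 2.11, lower about 0.30
```

With only two demes the lower cutoff is so close to zero that a test of \(Qst\) < \(Fst\) is effectively impossible under this approximation.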

As a final note on this test, it can be biased by linkage disequilibrium (LD), an explanation offered for differences in PGS effect sizes among human populations (see Zanetti & Weale, 2018; this may also be why LD is more differentiated than SNPs tagged for many traits). Quoting Walsh & Lynch (see pp.460-462):

Tests comparing \(Fst\) values at candidate loci against the distribution of \(Fst\) values at putatively neutral markers [are a step removed from comparisons of \(Qst\) to \(Fst\)], in that, ideally, we would like to contrast the \(Fstq\) value (the average \(Fst\) value for loci underlying our focal trait) against the genome-wide \(Fst\) neutral standard. Given the near impossibility of locating all such causative loci, we have instead been using \(Qst\), as with an additive trait, this should track the \(Fstq\) values at the underlying causative loci. However… allele-frequency changes are not the only route through which genetic variances (and hence the components of \(Qst\)) can change. Selection-generated gametic-phase disequilibrium (LD) - even among unlinked loci - can have a dramatic effect, even in situations where little allele-frequency change occurs. This impact of LD on \(Qst\) was stressed first by Latta, and later by Le Corre and Kremer. Because \(Qst\) is based on variance components, it can be influenced by linkage disequilibrium, which generates covariances between alleles at different loci, either inflating or deflating the resulting variances. When this happens, the values of \(Qst\) and \(Fstq\) can become decoupled, and (as we will see) \(Qst\) can have more power to detect selection than \(Fstq\) (even presuming we could locate all the underlying loci).

Thus, while a significant departure of \(Qst\) from the background value of \(Fst\) is usually taken as indicating a shift in the \(Fstq\) values at the underlying trait loci, this is only strictly correct when linkage disequilibrium is absent [emphasis mine]. Even in cases where selection induces little allele-frequency change (and hence little shift in \(Fstq\) relative to the background \(Fst\)), selection-induced disequilibrium (i.e., shifts in gamete, as opposed to allele, frequencies) can still generate a significant \(Qst\) signal. In particular, under the infinitesimal model, there is essentially no shift in the allele frequencies at underlying loci (\(Fstq \simeq Fst\)), but there can be a substantial change in the genetic variances due to selection-induced LD, and hence a perturbation of \(Qst\) away from \(Fstq\). In such a setting, a direct comparison of \(Fstq\) to the genome-wide \(Fst\) standard would not reveal any evidence of selection, but a comparison of \(Qst\) (with its LD-shifted variance components) against \(Fst\) might. Hence, under polygenic sweep conditions, an appropriately performed \(Qst\) test might detect selection signatures missed by allele-frequency based tests [bold emphasis mine].

To expand on this point, we need to consider how the within- and among-population LD impact \(Qst\). Letting the subscript \(x\) denote either within- or among-population values (\(x = w\) and \(x = a\), respectively), we can express the variances comprising \(Qst\) as

\[\sigma^2_x = \sigma^2_{x,0} + d_x = (1+\phi_x)\sigma^2_{x,0}\]

where

\[\phi_x = \frac{d_x}{\sigma^2_{x,0}}\]

where \(\sigma^2_{x,0}\) is the linkage-equilibrium value, \(d_x\) is the disequilibrium contribution generated by covariance among alleles at different loci, and \(\phi_x\) is the ratio of the disequilibrium contribution to the linkage-equilibrium (i.e., genic) variance (note that \(\phi_x\) is negative when \(d_x\) is negative)…. [S]tabilizing or directional selection within a population generates negative \(d\), so we often expect negative within-population LD (negative values of \(d_w\) and \(\phi_w\)).

Turning to the among-population LD, Latta noted that if each population is under stabilizing selection for a different optimum value (\(\theta\)), then for an additive trait where the population means have reached their optimal values

\[d_a = \sigma^2_\theta - 2Fstq \sigma^2_A\]

where \(\sigma^2_\theta\) is the variance in the optimum value over populations, and \(\sigma^2_A\) is the expected additive genetic variation if the populations were to be randomly mated to form a single, panmictic, population (in linkage equilibrium). With nearly uniform selection (the variance in \(\theta\) values over demes is small) and reduced migration (so that \(Fstq\) is large), [that equation] gives a negative covariance (\(d_a,\phi_a < 0\)) between trait-increasing alleles at different loci across demes, reducing the among-group variance \(\sigma^2_{GB}\) below its linkage-equilibrium value. Conversely, if diversifying selection is strong (\(\sigma^2_\theta\) is large) and gene flow is high (\(Fstq\) is small), a positive covariance is expected (\(d_a,\phi_a > 0\)), and \(\sigma^2_{GB}\) is inflated relative to its value in the absence of LD. Thus, \(Qst\) often magnifies the effect of selection over what is expected from changes in \(Fstq\) alone, with significant changes in \(Qst\) (relative to \(Fst\)) possible even when little differentiation has occurred at the underlying QTLs (\(Fstq \simeq Fst\)).

For a completely additive trait, Le Corre and Kremer quantified the influence of LD on \(Qst\) by noting that the relationship between \(Qst\) (based on variance components) and \(Fstq\) (based on the underlying loci) is given by

\[Qst = \frac{(1+\phi_a)Fstq}{(\phi_a-\phi_w)Fstq+1+\phi_w}\]

where \(\phi_x\) is given [earlier]. Note that \(Qst\) equals \(Fstq\) only when the among- and within-population LD values are equal (\(\phi_a = \phi_w\)). Using [that equation], Kremer and Le Corre showed that \(Qst > Fstq\) when \(\phi_a > \phi_w\). Given that stabilizing selection within populations generates negative values of \(\phi_w\), while diversifying selection (variation in the optimum over populations) generates positive values of \(\phi_a\), this combination amplifies the signal in \(Qst\) over that generated from \(Fstq\). As \(Qst > Fst\) is the signal for divergent selection, while our last result implies that \(Qst > Fstq > Fst\), the impact of LD is to magnify the impact of divergent selection over that expected from allele-frequency changes alone (\(Fstq\)). Again, the salient point is that even if the difference between \(Fstq\) and \(Fst\) is small, the differences between \(Qst\) and \(Fst\) can still be large.
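The behavior described in the quote can be illustrated with a short R sketch. The function below is derived from the definitions quoted earlier, \(\sigma^2_x = (1+\phi_x)\sigma^2_{x,0}\) with genic components \(2f\sigma^2_G\) among and \((1-f)\sigma^2_G\) within populations, substituted into \(\sigma^2_B/(\sigma^2_B + 2\sigma^2_W)\). The function name and the input values are hypothetical.

```r
# Qst as a function of Fstq and the LD ratios phi_a (among) and phi_w (within),
# obtained by inflating the genic variance components by (1 + phi_x).
qst_ld <- function(fstq, phi_a, phi_w) {
  (1 + phi_a) * fstq / ((phi_a - phi_w) * fstq + 1 + phi_w)
}

fstq <- 0.1  # hypothetical value, for illustration only

qst_ld(fstq, phi_a = 0,    phi_w = 0)     # no LD: Qst equals Fstq
qst_ld(fstq, phi_a = -0.2, phi_w = -0.2)  # equal LD: Qst still equals Fstq
qst_ld(fstq, phi_a = 0.3,  phi_w = -0.2)  # phi_a > phi_w: Qst exceeds Fstq
```

In the last case \(Qst\) is about 0.15 against an \(Fstq\) of 0.1, so the LD terms alone raise the signal by roughly half without any change in allele frequencies.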

Hence, while \(Qst\)-based tests are fraught with complications, if properly performed (which is no small feat), they may actually be more powerful than a scan for \(Fst\) outliers at known candidate genes for the trait of interest. While \(Fst\)-based scans are trait independent, knowledge of the potential target trait or traits allows \(Qst\), and thus further information from LD, to be exploited.

As a demonstration, Walsh & Lynch provided an example in which selection on genes in the photoperiod pathway in populations of Swedish Populus tremula “did not seem to generate a significant departure between \(Fstq\) and \(Fst\)” though “it did generate among-population covariances” such that “the highest five of the allelic pairs correlated between loci also involved either one (or both) alleles (SNPs) that showed significant clines with latitude”. With LD in mind, check the consistency of \(r_{g}\) and prediction by population before even bothering with this sort of test if you’re not planning to incorporate LD information (and note how dubious your result is!). Using only finemapped variants with equal cross-population effect sizes (of which there are, presently, few for most traits) may be helpful but is probably severely underpowered. If the real effect sizes differ between populations, there may be differences in the architecture of the trait which preclude this test working properly, the phenotype may be improperly measured or different between groups, or there may be some sort of confounding going on. For more, see Scutari, Mackay & Balding (2016), Visscher et al. (2017), Zanetti & Weale (2018), Skotte et al. (2019), and Marnetto et al. (2020).

In any case, a significant departure in this test does not negate genetic group differences, as should be clear from the formula. To convey this point clearly (stated differently, all that’s needed is a difference, not even directional effect consistency with known loci), the expected value of \(h^2_B\) for different values of r and \(h^2_W\) can be plotted. In the plot below, all curves use a t of 0.2. As can be observed, \(h^2_B\) is a monotonically increasing function of r for any \(h^2_W\) > 0. Worth observing is that the commonly repeated estimate of the portion of genetic diversity in humans due to groups (15%, versus 85% found within groups) is compatible with a \(h^2_B\) of 0.5 with \(h^2_W\) = 0.65 and t = 0.2.
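The curves can be reproduced directly from DeFries’ formula. The following is a minimal sketch, not the code that built the plotting data: `hb_from_t` and `grid` are hypothetical names, t is supplied directly instead of being derived from d as in BGHD, and the 15% figure is read as r itself (an assumption).

```r
# DeFries' formula with t passed in directly (BGHD derives t from d instead).
# hb_from_t is a hypothetical helper, not part of the original script.
hb_from_t <- function(r, t, h2w) {
  h2w * (r * (1 - t)) / (t * (1 - r))
}

# Worked value for r = 0.15, h2W = 0.65, t = 0.2
hb_example <- hb_from_t(r = 0.15, t = 0.2, h2w = 0.65)
round(hb_example, 3)
## [1] 0.459

# One row per (r, h2W) pair at t = 0.2, covering the plotted range of r
grid <- expand.grid(r = seq(0, 0.8, by = 0.01), h2w = seq(0, 1, by = 0.1))
grid$h2b <- hb_from_t(grid$r, t = 0.2, h2w = grid$h2w)
```

Under that reading the value is about 0.46, in the neighborhood of the 0.5 quoted above. Because BGHD sets r to twice the supplied Fst, treating 15% as an Fst instead would give a much larger figure.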

ggplot(data = df2, aes(x = x, y = value, color = variable)) + labs(x = "Genetic Intraclass Correlation", y = "Between-group Heritability", color = TeX("$h^2_W$")) + scale_color_manual(labels = c("0%", "10%", "20%", "30%", "40%", "50%", "60%", "70%", "80%", "90%", "100%"), values = c("darkred", "orange", "yellow", "lightgreen", "green", "lightblue", "blue", "violet", "magenta", "purple", "gold")) + geom_line(size = 1, alpha = 0.5) + coord_cartesian(ylim = c(0, 1)) + xlim(0, 0.8) + theme_bw() + theme(legend.position = c(0.90, 0.45), legend.background = element_blank()) 

#The value for the TCP/PNC sample as a dot on the graph

# + annotate("point", shape = 18, x = 0.1394, y = 0.4368791, colour = "darkred", size = 3) 

As can be seen, Jensen’s default hypothesis is by no means implied. In fact, it’s not even implied by an \(h^2_W\) of 100% (note the large empty region to the left of the 100% curve). An interesting consequence of the relationship between differences and heritabilities is that it suggests populations will be genetically diverged to some degree as a consequence of a phenotypic difference and \(h^2_W\), provided the difference is real and there aren’t any mean-influencing factors which are homogeneous with respect to the population (and again, I mean subpopulation). For psychometric traits, measurement invariance suggests that the region to the left of the 100% curve is effectively off limits, barring non-genetic effects which are extremely difficult to describe (they must mimic genetic ones or continuously varying environmental ones, the magnitude of which can be easily ascertained). As a result, there might be some need to focus on violations of the assumption underlying this formula, namely that the genetic architectures of the trait are shared. Such violations could emerge due to isolation, mutation, dominance, epistasis, homogeneous developmental insults shared to a different degree between populations, and so on. There are numerous options, and it may be the case that classical estimates of \(h^2_W\) aren’t even correct when modeled properly, with form and invariance taken into account, and with more parameters like dominance and epistasis included in models. Since measurement invariance is so routinely violated in international comparisons, there won’t be much need to invoke those explanations for quite some time (provided invariance ever becomes common). But specific environmental explanations should still generally be avoided; the analyses underlying claims of some environmental effect having pronounced importance are almost always extremely lacking, especially for a trait as contentious as intelligence.

While values of aggregate \(Fst\) have been widely available and estimated in different ways, they need to be supplied for the SNPs specific to our trait of interest, lest, say, researchers observe that admixed populations, highly differentiated for the SNPs for some trait (due to assortative mating or whatever reason), have low \(Fsts\) overall and they assume those values are low for the relevant trait, when that may not be the case (and vice-versa).

The following table provides pairwise \(Fsts\), derived in the 1000 Genomes cohort and the TCP/PNC, for the MTAG educational attainment SNPs commonly used by Plomin’s lab. (This phenotype is an imperfect correlate of intelligence which almost certainly suffers from tag and measurement issues across populations; it may be possible to get around this by exploiting LD and fitting measurement models.)

Source   Population One   Population Two   \(Fst\)
1KG      AFR              AMR              0.0879
1KG      AFR              EAS              0.1219
1KG      AFR              EUR              0.1038
1KG      AFR              SAS              0.0914
1KG      AMR              EAS              0.0612
1KG      AMR              EUR              0.0207
1KG      AMR              SAS              0.0265
1KG      EAS              EUR              0.0879
1KG      EAS              SAS              0.0554
1KG      EUR              SAS              0.0323
TCP      EA               AA               0.0697
TCP      EA               HA               0.0121
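For orientation, a pairwise \(Fst\) over a set of SNPs can be computed from allele frequencies along the following lines. This is only a sketch of one simple estimator (the heterozygosity-based (Ht - Hs)/Ht form, pooled over SNPs, with equal population weights and no sample-size correction); the estimator behind the table is not stated here and may well differ. `fst_pair` is a hypothetical helper and the allele frequencies are made up.

```r
# Pooled (ratio-of-averages) Fst for two populations from per-SNP allele
# frequencies. Equal population weights, no sample-size correction.
# fst_pair is a hypothetical helper, not part of the original script.
fst_pair <- function(p1, p2) {
  p_bar <- (p1 + p2) / 2
  h_t <- 2 * p_bar * (1 - p_bar)                      # total expected heterozygosity
  h_s <- (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
  sum(h_t - h_s) / sum(h_t)
}

# Made-up allele frequencies for three trait-associated SNPs
p_a <- c(0.20, 0.55, 0.70)
p_b <- c(0.40, 0.50, 0.90)
fst_toy <- fst_pair(p_a, p_b)
fst_toy
```

Estimators that correct for sample size (e.g., Weir and Cockerham’s or Hudson’s) will give somewhat different values, especially in small samples, which is part of why the choice of method matters.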

It is not recommended to take \(Fsts\) from a population without phenotype data (or from small samples for that matter) without some assurance that they aren’t selected relative to the rest of their populations. This warning should be a pronounced concern for 1000 Genomes since the people volunteered; they may be selected for the SNPs we’re interested in. It’s also important to use the proper method for computing these values. Methods can differ and the tradeoff between statistical power to, e.g., conduct differentiation tests or get accurate values and filtering over substantive (or not) concerns is worth careful consideration. Anyway, since the values for the TCP/PNC are available and more reliable than other datasets like 1KG in terms of their size and known representativeness, I’ll compute the \(h^2_B\) for the African American and European American populations. In the future, I’ll add the Hispanic-European American comparison (this requires computing the heritability, which can be done multiple ways, some more or less accurate), and when new PGS are released, I’ll recompute these values and also assess the consistency between versions. It may be worthwhile to assess the consistency between EA1 through EA3 right now, but I’ll save that for later. The \(h^2_W\) for African Americans and European Americans is derived from Mollon et al. (2018). Though a genomic estimate, it is quite accurate thanks to being derived from IBD, unlike SNP heritability estimation methods, which severely underestimate heritability in a way that depends on array coverage when compared to twin/sibling estimates, which are taken as unbiased (and for which any potential bias can be tested, but bias is usually minor, of inconsistent direction, or flatly not supported). It is, however, attenuated since it was based on a sumscore; corrections for reliability may be appropriate but wouldn’t do much, and the data should just be computed appropriately in the first place if we want more reliable estimates. The function for DeFries heritability is BGHD.

BWM <- WM(0.72, 0.61, 4694, 1940)
BWM
## [1] 0.6878324
DeFries <- BGHD(1.01, BWM, 0.0697)
DeFries
## [1] 0.4368791
df <- data.frame("BGS" = DeFries, "Name" = "DeFries")
dd <- rbind(ds, df)

DeFries believed that if trait values were equal for populations, r would also be even. This is not the case without causal SNPs (or by exploiting LD assuming the tag is for a causal SNP) and shared genetic architectures (i.e., excluding population-specific variants, pleiotropy and such; under a Qst < Fst scenario, or for a trait with strong mutation-selection balance, this could be more likely). Given that intelligence has been observed to be affected by mutation-selection balance and that it shows overdominance (and, as such, the converse of inbreeding depression), there is likely an excess of heterozygosity if there has been selection based on purification (given higher mutation rates and historical population sizes outside of Africa, this would be likelier there), biasing \(Fst\) and \(Qst\) inferences (see Edelaar & Bjorklund, 2011); additionally, any variants which are at fixation in the discovery population (especially by bottlenecks, which occurred often outside of Africa, emphasizing the need for African discovery samples; see Campbell & Tishkoff, 2008) will be less likely to have been discovered, leading to a further underestimate of the between-population genetic variance which may be consequential.

It is possible that these values are too high or too low, so more discovery is needed to improve them. It is also possible to compute the expected mean difference from these values. Here is how the methods stack up:

ggplot(dd, aes(x = Name, y = BGS, fill = Name)) + geom_bar(stat = "identity", show.legend = FALSE) + coord_cartesian(ylim = c(0, 1)) + xlab("Method") + ylab("Between-group Heritability") + theme_bw() + scale_fill_manual(values = c("#E69F00", "#999999", "#3C94CF", "#E32528"))

The estimates from the different methods are all quite comparable at present, but they’re each underestimated to different degrees. The Jensen method assumes a lack of sampling error and assortative mating among the parents of the people who mixed (and a lack of \(\sigma_{g,e}\), but that didn’t seem to fit), but, especially at the small sample sizes observed, that assumption is untenable; the CIs are large. The Scarr method assumes a lack of range restriction and sampling variance, but at least the former assumption is clearly not viable. The DeFries method assumes the inbreeding coefficient is for the totality of the differentiating SNPs and that they’re the causal variants. It’s convenient to assume that the \(Fst\) we have is mirrored in the undiscovered SNPs, but that’s unlikely, since those SNPs will presumably be of smaller effect and less common across populations. Moreover, many of these SNPs are clearly not causal, just tags, and in some cases, stratification-related tags; this may explain why LD differentiation is so much greater than the SNP-wise differentiation. Finally, the \(h^2_W\) is assumed to be free of error, but sampling error, method variance, and error in the target phenotype attenuate the heritability. In this case, the heritability was estimated for a sumscore, not a latent variable, so it contains some portion of error, nuisance dimensionality, specificity, etc. The method for heritability calculation also delivers attenuated estimates with array rather than whole-genome data, causing further (in this case, probably minute) reductions for artefactual reasons.
Importantly, the adult \(h^2_W\) needs to be used since it increases to an asymptote with age while at younger ages the shared environment mirrors its effect in biological families, leading to underestimates using younger-aged estimates (presuming a method which shows age-related changes - one which confounded shared environmental and genetic variance at a young age would deliver an accurate estimate if the unshared variance is assumed to be constant during aging). The approximation of t may also be affecting the results, presumably increasing the \(h^2_B\) with DeFries’ method in some cases, although in this one, it negligibly decreased it.
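To get a feel for how much the DeFries estimate leans on the \(Fst\) assumption, the value can be recomputed over a few alternative \(Fst\) values while holding d and \(h^2_W\) fixed. This is a rough sensitivity sketch: WM and BGHD are restated from the Functions section so the block runs on its own, and every \(Fst\) other than 0.0697 is hypothetical.

```r
# Restated from the Functions section so the block is self-contained
WM <- function(M1, M2, N1, N2) {
  SN <- N1 + N2
  (M1 * N1 / SN) + (M2 * N2 / SN)
}
BGHD <- function(d, WGH, Fst) {
  r <- 2 * Fst
  t <- ((0.5 * d)^2) / (1 + (0.5 * d)^2)
  WGH * ((r * (1 - t)) / (t * (1 - r)))
}

BWM <- WM(0.72, 0.61, 4694, 1940)

# 0.0697 is the TCP/PNC value used above; the other values are hypothetical
fst_grid <- c(0.05, 0.0697, 0.09, 0.11)
sens <- data.frame(Fst = fst_grid, h2B = BGHD(1.01, BWM, fst_grid))
sens
```

Since r = 2 \(Fst\) enters as r/(1 - r), the estimate rises slightly faster than linearly in \(Fst\), so even modest differences between the discovered and undiscovered SNPs would move it noticeably.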

An educated guess still has the output of DeFries’ formula at <60%. Corrected for measurement error assuming the reliability is bog-standard for good composites (i.e., 0.95), 44% becomes 46%; if, in addition, 85% of the variance is actually due to g, it becomes 54% - but this assumes that the non-g variance only attenuates estimates and that cannot be stated confidently. Increasing \(h^2_W\) to 100%, the value reaches a mere 64%, which would make genes only 1.3 times as important as environments (and genetic effects which aren’t indexed yet; note current ones may be indexed improperly as well, and the divergence between LD and SNP-based results reinforces that) for determining the mean differences - hardly pronounced and certainly not startling (although that also doesn’t make them meliorable without knowing what environments and how; see Anastasi, 1958).
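The arithmetic behind these figures can be checked in a few lines. This sketch assumes the corrections are simple divisions of the DeFries estimate (which works because \(h^2_B\) is linear in \(h^2_W\)) and that d and t are left untouched. Reading the “1.3 times” figure as the square root of the genetic-to-remaining variance ratio is also an assumption, albeit one that reproduces the number.

```r
# Values taken from the output above
h2b_obs <- 0.4368791  # DeFries estimate
h2w_obs <- 0.6878324  # weighted within-group heritability (BWM)

h2b_rel <- h2b_obs / 0.95     # corrected for a reliability of 0.95
h2b_g   <- h2b_rel / 0.85     # further assuming 85% of the variance is g
h2b_max <- h2b_obs / h2w_obs  # h2W raised to 100%

round(c(h2b_rel, h2b_g, h2b_max), 2)
## [1] 0.46 0.54 0.64

# Relative importance on the SD scale when h2W = 100%
importance <- sqrt(h2b_max / (1 - h2b_max))
round(importance, 1)
## [1] 1.3
```

On the variance scale the same ratio is closer to 1.7, so which scale is meant matters when quoting the figure.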

To-Do (when there’s more data): structural/biometric models

To-Do

  • Citations
  • Assumptions
    • Types of heritability (and note, heritability is slope not \(r^2\) in sibling regression)
    • Reliability, meta-analysis of estimates
    • Effect of failing to use proper variables (latent, sumscore, wrongly computed, measurement error, attenuating dimensionality, etc.)
    • Effect of imputing values (trait, population) to a population
    • Consistency of \(F_{st}\) and \(r_g\) between EA versions
    • Genotype-environment interaction/correlation and assortative mating
    • Measurement invariance
    • Admixture confounding/not confounding \(F_{st}\) estimates for specific PGS
    • Age effects (Chipuer, Rovine & Plomin (1990) and Devlin, Daniels & Roeder (1997) failing to account for it despite everyone in the field pointing out the age issues before and after is worth noting)
    • Standardized versus unstandardized coefficients and expected mean differences (see Turkheimer, 1991)
    • How \(h^2_B\) based on DeFries’ method is not a “maximum” genetic variance (this should be obvious, with unknowns and only additivity, but it is apparently not)
    • Etc.
  • Simulations
  • Plots for methods
  • Structural model(s)
    • Integrating personal F-statistics (with Wahlund effect explanation/relevance)
    • Proofs
    • Extended equations for epistasis, dominance, etc.
    • Outside of the model, \(h^2_B\) can exceed 1 or be below 0 for some methods with certain environmental effects including various types of error
  • Review evidence
    • On twin and other studies
    • Old method admixture studies
    • Assumption checks (explain Bouchard’s pseudo-analysis and how assumption violations don’t have to do harm)
    • LD vs. SNP-wise differentiation
    • Historical assortative mating (wrt admixture)
    • Current assortative mating (wrt admixture)
    • Size of required environmental effects given heritability estimates
    • Transracial adoption?
    • \(Fst\) components are not biometric ones
  • Local admixture
    • Gene discovery
    • Assessing historical assortative mating

References

Jensen, A. R. (1973). Educability and Group Differences. Harper & Row.

Scarr, S., Pakstis, A. J., Katz, S. H., & Barker, W. B. (1977). Absence of a relationship between degree of white ancestry and intellectual skills within a black population. Human Genetics, 39, 69-86. https://doi.org/10.1007/BF00273154

Loehlin, J. C., Vandenberg, S. G., & Osborne, R. T. (1973). Blood group genes and Negro-white ability differences. Behavior Genetics, 3(3), 263-270.

Reed, T. E. (1997). “The genetic hypothesis”: It was not tested but it could have been. American Psychologist, 52(1), 77-78. https://doi.org/10.1037/0003-066X.52.1.77

Loehlin, J. C., Lindzey, G., & Spuhler, J. N. (1975). Race differences in intelligence. W H Freeman/Times Books/ Henry Holt & Co.

Loehlin, J. C. (2000). Group differences in intelligence. In Handbook of intelligence (pp. 176-193). Cambridge University Press. https://doi.org/10.1017/CBO9780511807947.010

Wan, X., Wang, W., Liu, J., & Tong, T. (2014). Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology, 14. https://doi.org/10.1186/1471-2288-14-135

Signorello, L. B., Williams, S. M., Zheng, W., Smith, J. R., Long, J., Cai, Q., Hargreaves, M. K., Hollis, B. W., & Blot, W. J. (2010). Blood vitamin D levels in relation to genetic estimation of African ancestry. Cancer Epidemiology, Biomarkers & Prevention: A Publication of the American Association for Cancer Research, Cosponsored by the American Society of Preventive Oncology, 19(9), 2325-2331. https://doi.org/10.1158/1055-9965.EPI-10-0482

Cheng, C.-Y., Reich, D., Haiman, C. A., Tandon, A., Patterson, N., Elizabeth, S., Akylbekova, E. L., Brancati, F. L., Coresh, J., Boerwinkle, E., Altshuler, D., Taylor, H. A., Henderson, B. E., Wilson, J. G., & Kao, W. H. L. (2012). African Ancestry and Its Correlation to Type 2 Diabetes in African Americans: A Genetic Admixture Analysis in Three U.S. Population Cohorts. PLOS ONE, 7(3), e32840. https://doi.org/10.1371/journal.pone.0032840

Bryc, K., Auton, A., Nelson, M. R., Oksenberg, J. R., Hauser, S. L., Williams, S., Froment, A., Bodo, J.-M., Wambebe, C., Tishkoff, S. A., & Bustamante, C. D. (2010). Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proceedings of the National Academy of Sciences, 107(2), 786-791. https://doi.org/10.1073/pnas.0909559107

Baharian, S., Barakatt, M., Gignoux, C. R., Shringarpure, S., Errington, J., Blot, W. J., Bustamante, C. D., Kenny, E. E., Williams, S. M., Aldrich, M. C., & Gravel, S. (2016). The Great Migration and African-American Genomic Diversity. PLOS Genetics, 12(5), e1006059. https://doi.org/10.1371/journal.pgen.1006059

Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2015). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics, 96(1), 37-53. https://doi.org/10.1016/j.ajhg.2014.11.010

Jin, W., Xu, S., Wang, H., Yu, Y., Shen, Y., Wu, B., & Jin, L. (2012). Genome-wide detection of natural selection in African Americans pre- and post-admixture. Genome Research, 22(3), 519-527. https://doi.org/10.1101/gr.124784.111

Smith, M. W., Patterson, N., Lautenberger, J. A., Truelove, A. L., McDonald, G. J., Waliszewska, A., Kessing, B. D., Malasky, M. J., Scafe, C., Le, E., De Jager, P. L., Mignault, A. A., Yi, Z., de The, G., Essex, M., Sankale, J.-L., Moore, J. H., Poku, K., Phair, J. P., … Reich, D. (2004). A High-Density Admixture Map for Disease Gene Discovery in African Americans. American Journal of Human Genetics, 74(5), 1001-1013.

Stefflova, K., Dulik, M. C., Pai, A. A., Walker, A. H., Zeigler-Johnson, C. M., Gueye, S. M., Schurr, T. G., & Rebbeck, T. R. (2009). Evaluation of Group Genetic Ancestry of Populations from Philadelphia and Dakar in the Context of Sex-Biased Admixture in the Americas. PLoS ONE, 4(11). https://doi.org/10.1371/journal.pone.0007842

Stefflova, K., Dulik, M. C., Barnholtz-Sloan, J. S., Pai, A. A., Walker, A. H., & Rebbeck, T. R. (2011). Dissecting the Within-Africa Ancestry of Populations of African Descent in the Americas. PLOS ONE, 6(1), e14495. https://doi.org/10.1371/journal.pone.0014495

McQueen, M. B., Boardman, J. D., Domingue, B. W., Smolen, A., Tabor, J., Killeya-Jones, L., Halpern, C. T., Whitsel, E. A., & Harris, K. M. (2015). The National Longitudinal Study of Adolescent to Adult Health (Add Health) Sibling Pairs Genome-Wide Data. Behavior Genetics, 45(1), 12-23. https://doi.org/10.1007/s10519-014-9692-4

Gates Jr., H. L. (2013, February 11). Exactly How ‘Black’ Is Black America? The Root. https://www.theroot.com/exactly-how-black-is-black-america-1790895185

Kirkegaard, E. O. W., Woodley of Menie, M. A., Williams, R. L., Fuerst, J., & Meisenberg, G. (2019). Biogeographic Ancestry, Cognitive Ability and Socioeconomic Outcomes. Psych, 1(1), 1-25. https://doi.org/10.3390/psych1010001

Witty, P. A., & Jenkins, M. A. (1936). Intra-race testing and negro intelligence. The Journal of Psychology: Interdisciplinary and Applied, 1, 179-192.

Jenkins, M. D. (1936). A Socio-Psychological Study of Negro Children of Superior Intelligence. The Journal of Negro Education, 5(2), 175. https://doi.org/10.2307/2292155

Herskovits, M. J. (1930). The Anthropometry of the American Negro. Ardent Media.

Reed, T. E. (1969). Caucasian genes in American Negroes. Science (New York, N.Y.), 165(3895), 762-768.

Parra, E. J., Marcini, A., Akey, J., Martinson, J., Batzer, M. A., Cooper, R., Forrester, T., Allison, D. B., Deka, R., Ferrell, R. E., & Shriver, M. D. (1998). Estimating African American Admixture Proportions by Use of Population-Specific Alleles. The American Journal of Human Genetics, 63(6), 1839-1851. https://doi.org/10.1086/302148

Mackenzie, B. (1984). Explaining Race Differences in IQ. American Psychologist, 20.

Meier, A. (1949). A Study of the Racial Ancestry of the Mississippi College Negro. American Journal of Physical Anthropology, 7, 228-232.

Reed, T. E. (1973). Number of Gene Loci required for Accurate Estimation of Ancestral Population Proportions in Individual Human Hybrids. Nature, 244(5418), 575-576. https://doi.org/10.1038/244575a0

Bouchard, T. J. (2013). The Wilson Effect: The increase in heritability of IQ with age. Twin Research and Human Genetics: The Official Journal of the International Society for Twin Studies, 16(5), 923-930. https://doi.org/10.1017/thg.2013.54

Jensen, A. R. (1998). The g Factor: The Science of Mental Ability. Praeger Publishers/Greenwood Publishing Group.

Parra, E. J., Kittles, R. A., & Shriver, M. D. (2004). Implications of correlations between skin color and genetic ancestry for biomedical research. Nature Genetics, 36(11 Suppl), S54-60. https://doi.org/10.1038/ng1440

Ruiz-Linares, A., Adhikari, K., Acuna-Alonzo, V., Quinto-Sanchez, M., Jaramillo, C., Arias, W., Fuentes, M., Pizarro, M., Everardo, P., Avila, F. de, Gomez-Valdes, J., Leon-Mimila, P., Hunemeier, T., Ramallo, V., Cerqueira, C. C. S. de, Burley, M.-W., Konca, E., Oliveira, M. Z. de, Veronez, M. R., … Gonzalez-Jose, R. (2014). Admixture in Latin America: Geographic Structure, Phenotypic Diversity and Self-Perception of Ancestry Based on 7,342 Individuals. PLOS Genetics, 10(9), e1004572. https://doi.org/10.1371/journal.pgen.1004572

Gravlee, C. C., Non, A. L., & Mulligan, C. J. (2009). Genetic Ancestry, Social Classification, and Racial Inequalities in Blood Pressure in Southeastern Puerto Rico. PLOS ONE, 4(9), e6821. https://doi.org/10.1371/journal.pone.0006821

Leite, T. K. M., Fonseca, R. M. C., Franca, N. M. de, Parra, E. J., & Pereira, R. W. (2011). Genomic Ancestry, Self-Reported “Color” and Quantitative Measures of Skin Pigmentation in Brazilian Admixed Siblings. PLOS ONE, 6(11), e27162. https://doi.org/10.1371/journal.pone.0027162

DeFries, J. C. (1973). Quantitative aspects of genetics and environment in the determination of behavior. In Genetics, environment and behavior: Implications for educational policy. Academic Press. https://doi.org/10.1016/B978-0-12-233450-4.50009-4

Wright, S. (1951). The Genetical Structure of Populations. Annals of Eugenics, 15(1), 323-354. https://doi.org/10.1111/j.1469-1809.1949.tb02451.x

Whitlock, M. C. (2008). Evolutionary inference from QST. Molecular Ecology, 17(8), 1885-1896. https://doi.org/10.1111/j.1365-294X.2008.03712.x

Leinonen, T., McCairns, R. J. S., O’Hara, R. B., & Merila, J. (2013). QST-FST comparisons: Evolutionary and ecological insights from genomic heterogeneity. Nature Reviews Genetics, 14(3), 179-190. https://doi.org/10.1038/nrg3395

Zhivotovsky, L. A. (2015). Relationships Between Wright’s FST and FIS Statistics in a Context of Wahlund Effect. Journal of Heredity, 106(3), 306-309. https://doi.org/10.1093/jhered/esv019

Walsh, B., & Lynch, M. (2018). Evolution and Selection of Quantitative Traits. Oxford University Press. https://www.oxfordscholarship.com/view/10.1093/oso/9780198830870.001.0001/oso-9780198830870

Whitlock, M. C., & Guillaume, F. (2009). Testing for Spatially Divergent Selection: Comparing QST to FST. Genetics, 183(3), 1055-1063. https://doi.org/10.1534/genetics.108.099812

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA Statement on p-Values: Context, Process, and Purpose. The American Statistician, 70(2), 129-133. https://doi.org/10.1080/00031305.2016.1154108

O’Hara, R. B., & Merila, J. (2005). Bias and Precision in QST Estimates: Problems and Some Solutions. Genetics, 171(3), 1331-1339. https://doi.org/10.1534/genetics.105.044545

Rogell, B., Dannewitz, J., Palm, S., Petersson, E., Dahl, J., Prestegaard, T., Jarvi, T., & Laurila, A. (2012). Strong divergence in trait means but not in plasticity across hatchery and wild populations of sea-run brown trout Salmo trutta. Molecular Ecology, 21(12), 2963-2976. https://doi.org/10.1111/j.1365-294X.2012.05590.x

Zanetti, D., & Weale, M. E. (2018). Transethnic differences in GWAS signals: A simulation study. Annals of Human Genetics, 82(5), 280-286. https://doi.org/10.1111/ahg.12251

Scutari, M., Mackay, I., & Balding, D. (2016). Using Genetic Distance to Infer the Accuracy of Genomic Prediction. PLOS Genetics, 12(9), e1006288. https://doi.org/10.1371/journal.pgen.1006288

Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A., & Yang, J. (2017). 10 Years of GWAS Discovery: Biology, Function, and Translation. The American Journal of Human Genetics, 101(1), 5-22. https://doi.org/10.1016/j.ajhg.2017.06.005

Skotte, L., Jarsboe, E., Korneliussen, T. S., Moltke, I., & Albrechtsen, A. (2019). Ancestry-specific association mapping in admixed populations. Genetic Epidemiology, 43(5), 506-521. https://doi.org/10.1002/gepi.22200

Marnetto, D., Parna, K., Lall, K., Molinaro, L., Montinaro, F., Haller, T., Metspalu, M., Magi, R., Fischer, K., & Pagani, L. (2020). Ancestry deconvolution and partial polygenic score can improve susceptibility predictions in recently admixed individuals. Nature Communications, 11(1), 1-9. https://doi.org/10.1038/s41467-020-15464-w

Mollon, J., Knowles, E. E. M., Mathias, S. R., Gur, R., Peralta, J. M., Weiner, D. J., Robinson, E. B., Gur, R. E., Blangero, J., Almasy, L., & Glahn, D. C. (2018). Genetic influence on cognitive development between childhood and adulthood. Molecular Psychiatry, 1-10. https://doi.org/10.1038/s41380-018-0277-0

Edelaar, P., & Bjorklund, M. (2011). If FST does not measure neutral genetic differentiation, then comparing it with QST is misleading. Or is it? Molecular Ecology, 20(9), 1805-1812. https://doi.org/10.1111/j.1365-294X.2011.05051.x

Campbell, M. C., & Tishkoff, S. A. (2008). AFRICAN GENETIC DIVERSITY: Implications for Human Demographic History, Modern Human Origins, and Complex Disease Mapping. Annual Review of Genomics and Human Genetics, 9, 403-433. https://doi.org/10.1146/annurev.genom.9.081307.164258

Anastasi, A. (1958). Heredity, environment, and the question how? Psychological Review, 65(4), 197-208. https://doi.org/10.1037/h0044895