Setup

library(pacman); p_load(ggplot2, scales)

my_theme <- function(){
  theme(
    legend.background = element_rect(colour = "black", linewidth = 0.5),
    legend.key.width = unit(0.5, "cm"),
    legend.key.height = unit(0.4, "cm"),
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.major.y = element_line(colour ="gray80",linewidth = 1,linetype="dashed"),
    panel.grid.minor.y = element_blank(),
    axis.text.x = element_text(colour ="black",size=rel(1.5),hjust=0.5),
    axis.text.y = element_text(colour ="black",size=rel(1.5),vjust=0.5),
    axis.title.x = element_text(colour ="black",size=rel(1.2)),
    axis.title.y = element_text(colour ="black",size=rel(1.2)),
    plot.margin=unit(c(1,0.5,0.5,0.5), "cm"),
    plot.title=element_text(hjust=0.5,vjust=5,size=rel(1.5),face="bold"),
    plot.caption=element_text(hjust=0,size=rel(1.2)),
    axis.line.x=element_line(color="gray80",linewidth = 1,linetype="dashed"),
    axis.line.y=element_line(color="gray80",linewidth = 1,linetype="dashed"),
    axis.ticks.x = element_blank(),
    axis.ticks.y = element_blank())}

Rationale

Gregory Clark has a new paper on social status. It produces a very curious true value for assortative mating of m = 0.57. This is curious because it implies an inconsistency in either Clark’s (2023) results or the results of twin studies of the heritability of social status. For example, Hyytinen et al. (2019) found support for an ACE or ADE model for male and an AE model for female earnings. For men, the heritability in the ACE model was 0.69 and in the ADE model it was 0.53, but the ACE model produced a nonsense estimate for C of -0.15. For women, the heritability was 40% (table 3). In their review of previous studies (table 1), the average heritability in the U.S. was 0.41, whereas the average for Australia was 0.45, and the average for Sweden was 0.40, while Norway had a single estimate for both sexes, and they achieved values of 0.50 for men and 0.45 for women. If Clark’s implied level of assortative mating for status is correct and these estimates are not based on seriously mismeasured status (they are based on earnings/wages/incomes/etc.), then Clark’s work suggests that should be ridiculously inflated.

To understand this, we need to elaborate on several different quantities. First, we need to understand relatedness due to additive genetic variance in a population with assortative mating. Monozygotic twins’ correlation is attributable to \(h^2\), whereas the parent-offspring correlation is attributable to \(\frac{1}{2}(1+\rho)h^2\). The grandparent-grandchild relationship is due to \(\frac{1}{4}(1+\rho)(1+\rho h^2)h^2\). Full-siblings including dizygotic twins are related by \(\frac{1}{2}(1+\rho h^2)h^2\), whereas half-siblings are related by \(\frac{1}{4}(1+2\rho h^2+\rho^2 h^2)h^2\), and the avuncular/materteral (uncle/aunt) correlation is \(\frac{1}{4}(1+\rho h^2)h^2\) and first cousins are \(\frac{1}{8}(1+\rho h^2)^3 h^2\). I understand that’s a lot. Clark listed even more in Table 1.

Second, subtract the dizygotic twin correlation from monozygotic and double it. Call \(\rho h^2\) m. This means the classical twin method does not estimate \(h^2\) in isolation, but it instead estimates \(h^2(1-m)\), or with consistent terms, \(h^2(1-\rho)\). Because Clark uses m and is specifically referring to the correlation between spouses’ underlying genetic components, that is what I will use from here on. \(h^2\) is in equilibrium because if m > 0, there’s a positive buildup of linkage disequilibrium over multiple generations. Due to recombination, mutation, selection, drift, changes in m, etc., the increase in the additive genetic variance that results from assortative mating will asymptote. This gives way to the concept of \(h^2_{\text{equilibrium}}\). If m is 0.57, 1-0.57 = 0.43 is what you would multiply Clark’s heritability estimates by in order to obtain the expected twin model-based heritability estimates.

If Clark’s 0.57 value for m is true, then it suggests that to obtain a purified estimate of \(h^2\) from twin-based heritability estimates, you would need to multiply them by \(\frac{1}{1-0.57}=2.33\). Applying this to Hyytinen et al.’s estimates, the averages for the U.S., Australia, and Sweden would all be >0.95, which seems to be absurdly high. We also know this is inconsistent with Clark’s estimates, because his model is derived with equilibrium heritabilities on the order of 0.191-0.722 in mind, with “Modern Status” having a heritability of 0.466 (all estimates in table S2).

Both Gregory Clark and twin models cannot be correct.

But since I don’t have the data to make things whole, I’ll instead simulate the development of \(h^2_{\text{equilibrium}}\).

The simulation below is a simplified model incorporating a range of unrealistic values for the mutation and recombination rates, as well as the option to change the ranges assortative mating values are plucked from. It will illustrate how additive genetic variance can develop over multiple generations in which assortative mating takes place and it will show how this effectively asymptotes with constant values of m, as Clark suggested.

Code

simulate_variance_change_random_each_gen <- function(h2, r1_range, r2_range, n1, n2, mu_range, rho_range) {
  # h2: heritability (additive genetic variance / total phenotypic variance)
  # r1_range: range of possible values for initial assortative mating coefficient
  # r2_range: range of possible values for final assortative mating coefficient
  # n1: number of generations before change in assortative mating
  # n2: number of generations after change in assortative mating
  # mu_range: range of possible values for mutation rate
  # rho_range: range of possible values for recombination rate
  
  # Initialize vector to store additive genetic variance for each generation
  var_additive <- numeric(n1 + n2)
  
  # Set initial additive genetic variance
  var_additive[1] <- h2
  
  # Simulate additive genetic variance before change in assortative mating
  for (i in 2:n1) {
    # Sample random values for parameters from specified ranges
    r1 <- runif(1, r1_range[1], r1_range[2])
    mu <- runif(1, mu_range[1], mu_range[2])
    rho <- runif(1, rho_range[1], rho_range[2])

    # Calculate new additive genetic variance based on previous generation
    delta_var <- r1^2 * (1 - var_additive[i-1])^2 / (1 + r1^2 * var_additive[i-1] * (1 - var_additive[i-1]))
    delta_var <- delta_var - mu * var_additive[i-1] - rho * var_additive[i-1]^2 / (1 - var_additive[i-1])
    var_additive[i] <- var_additive[i-1] + delta_var}
  
  # Simulate additive genetic variance after change in assortative mating if you want
  for (i in (n1+1):(n1+n2)) {
    # Sample random values for parameters from specified ranges
    r2 <- runif(1, r2_range[1], r2_range[2])
    mu <- runif(1, mu_range[1], mu_range[2])
    rho <- runif(1, rho_range[1], rho_range[2])
  
    # Calculate new additive genetic variance based on previous generation
    delta_var <- r2^2 * (1 - var_additive[i-1])^2 / (1 + r2^2 * var_additive[i-1] * (1 - var_additive[i-1]))
    delta_var <- delta_var - mu * var_additive[i-1] - rho * var_additive[i-1]^2 / (1 - var_additive[i-1])
    var_additive[i] <- var_additive[i-1] + delta_var}
return(var_additive)}

set.seed(123)

# Set parameters
h2 <- 0.5 # Initial additive genetic variance
r1_range <- c(0.25, 0.35)
r2_range <- c(0.25, 0.35)
n1 <- 100
n2 <- 200
mu_range <- c(0.0005, 0.0015)
rho_range <- c(0.005, 0.015)

# Simulate additive genetic variance over multiple generations with no change in assortative mating and random parameter values that change in each generation
var_additive <- simulate_variance_change_random_each_gen(h2, r1_range, r2_range, n1, n2, mu_range, rho_range)

# Plot results using ggplot
data <- data.frame(Generation = 1:(n1+n2), Variance = var_additive)
ggplot(data, aes(x = Generation, y = Variance)) +
  geom_line(linewidth = 1) +
  xlab("Generation") +
  ylab("Additive Genetic Variance") + 
  ggtitle("Additive Genetic Variance Inflation Due to Assortative Mating Asymptotes") + 
  my_theme() + 
  scale_y_continuous(limits = c(0.45, 0.8), labels = scales::percent)

It’s apparent that the level of the asymptote is limited in this model by \(\mu\) and \(\rho\), not by the starting value for \(h^2\). Other factors are likely to be at play, like the gene-environment correlation that seems to be implied in some of Clark’s work. This rGE is asymmetric in the general population: individuals will tend to spend more if they have a windfall and they will spend less if they live a poor life, relative to the lives of the people in their lineage. This social mechanism may help to maintain the constancy observed in Clark’s data. With my simple model that does not incorporate gene-environment correlation (as this would hardly be tractable), to obtain some of Clark’s asymptotes, serious adjustments have to be made.

set.seed(123)

h2 <- 0.3 
r1_range <- c(0.55, 0.59)
r2_range <- c(0.55, 0.59)
n1 <- 100
n2 <- 200
mu_range <- c(0.03, 0.10)
rho_range <- c(0.07, 0.19)

var_additive <- simulate_variance_change_random_each_gen(h2, r1_range, r2_range, n1, n2, mu_range, rho_range)

data <- data.frame(Generation = 1:(n1+n2), Variance = var_additive)
ggplot(data, aes(x = Generation, y = Variance)) +
  geom_line(linewidth = 1) +
  xlab("Generation") +
  ylab("Additive Genetic Variance") + 
  ggtitle("Matching Obtained Values With This Simple Model Requires Unrealistic Fine-Tuning For Some Outcomes") + 
  my_theme() + 
  scale_y_continuous(labels = scales::percent) + 
  geom_hline(yintercept = 0.466, linetype = "dotdash", color = "gray60", linewidth = 1)

This model is obviously too simple, so unrealistic recombination and mutation rates have to be suggested to reach Clark’s values for certain outcomes. This was merely illustrative. But what about for the latent outcome he suggested to have a heritability of 72% and to drive other results?

set.seed(123)

h2 <- 0.5
r1_range <- c(0.55, 0.59)
r2_range <- c(0.55, 0.59)
n1 <- 100
n2 <- 200
mu_range <- c(0.0005, 0.0015)
rho_range <- c(0.005, 0.015)

var_additive <- simulate_variance_change_random_each_gen(h2, r1_range, r2_range, n1, n2, mu_range, rho_range)

data <- data.frame(Generation = 1:(n1+n2), Variance = var_additive)
ggplot(data, aes(x = Generation, y = Variance)) +
  geom_line(linewidth = 1) +
  xlab("Generation") +
  ylab("Additive Genetic Variance") + 
  ggtitle("Matching Clark's 0.72 Value Does Not Require Terribly Unrealistic Assumptions") + 
  my_theme() + 
  scale_y_continuous(labels = scales::percent) + 
  geom_hline(yintercept = 0.72, linetype = "dotdash", color = "gray60", linewidth = 1)

That’s more reasonable.

References

Clark, G. (2023). The inheritance of social status: England, 1600 to 2022. Proceedings of the National Academy of Sciences, 120(27), e2300926120. https://doi.org/10.1073/pnas.2300926120

Hyytinen, A., Ilmakunnas, P., Johansson, E., & Toivanen, O. (2019). Heritability of lifetime earnings. The Journal of Economic Inequality, 17(3), 319–335. https://doi.org/10.1007/s10888-019-09413-x

Resource: Table 2 in this PDF from James Lee, https://jamesjlee.altervista.org/wp-content/uploads/2019/03/genetics_notes.pdf, https://web.archive.org/web/20230627024910/https://jamesjlee.altervista.org/wp-content/uploads/2019/03/genetics_notes.pdf

sessionInfo()

## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scales_1.2.1  ggplot2_3.4.1 pacman_0.5.1 
## 
## loaded via a namespace (and not attached):
##  [1] highr_0.10       bslib_0.4.2      compiler_4.2.2   pillar_1.8.1    
##  [5] jquerylib_0.1.4  tools_4.2.2      digest_0.6.31    jsonlite_1.8.4  
##  [9] evaluate_0.20    lifecycle_1.0.3  tibble_3.1.8     gtable_0.3.1    
## [13] pkgconfig_2.0.3  rlang_1.0.6      cli_3.6.0        rstudioapi_0.14 
## [17] yaml_2.3.7       xfun_0.37        fastmap_1.1.0    withr_2.5.0     
## [21] dplyr_1.1.0      knitr_1.42       generics_0.1.3   vctrs_0.5.2     
## [25] sass_0.4.5       grid_4.2.2       tidyselect_1.2.0 glue_1.6.2      
## [29] R6_2.5.1         fansi_1.0.4      rmarkdown_2.20   farver_2.1.1    
## [33] magrittr_2.0.3   htmltools_0.5.4  colorspace_2.1-0 labeling_0.4.2  
## [37] utf8_1.2.3       munsell_0.5.0    cachem_1.0.6

Heritability and Assortative Mating

Setup

Rationale

Code

References