Some people have worried about whether it is possible that the average sibling difference in IQ is 12 is really possible. It is.

Init

#load libraries
library(MASS)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## x dplyr::select() masks MASS::select()
#set default theme
theme_set(theme_bw())

Data

Simulate data for illustration. We sample from multivariate normal distribution with correlations that vary. Then we rescale data to IQ metric of 100/15 (mean/SD).

#simulate datasets
x = map_df(seq(0, 1, .01), function(r) {
  #simulate sibling scores at correlation r
  mvrnorm(n = 100e3, 
          Sigma = matrix(c(1, r, r, 1), nrow = 2), 
          mu = c(0, 0)
          ) %>% 
    as_tibble() %>% 
    mutate(
      r = r,
    )
}) %>% 
  #change the scale to IQ norms
  mutate(
    V1 = V1 * 15 + 100,
    V2 = V2 * 15 + 100,
    AD = abs(V1 - V2)
  )
## Warning: `as_tibble.matrix()` requires a matrix with column names or a `.name_repair` argument. Using compatibility `.name_repair`.
## This warning is displayed once per session.
#summary stats
x2 = x %>% 
  group_by(r) %>% 
  summarise(
    MAD_empirical = mean(abs(V1 - V2))
  )

#theory expected difference
#Jensen, Bias in Mental Testing, p. 459
x2$MAD_theory = 2*15*sqrt(1 - x2$r) / sqrt(pi)

Plot results

First we plot examples of distributions by correlation between siblings (or any other pairs). Then we plot mean absolute difference across all simulated pair correlations, along with the theoretically expected values.

#difference distributions by correlation
x %>% 
  mutate(r = factor(r)) %>% 
  filter(r %in% c(0, .25, .5, .75)) %>%
  ggplot(aes(AD)) +
  geom_histogram(center = .5, binwidth = 1) +
  #vertical lines for the mean
  geom_vline(data = x2 %>% filter(r %in% c(0, .25, .5, .75)), aes(xintercept = MAD_empirical), color = "red") +
  facet_wrap("r") +
  xlab("Sibling absolute difference") +
  ggtitle("Distribution of absolute difference by sibling correlation strength",
          subtitle = "Red line = mean")

#theory vs. empirical results
x2 %>% 
  gather(key = metric, value = MAD, MAD_empirical, MAD_theory) %>% 
  mutate(metric = str_replace(metric, "MAD_", "")) %>% 
  ggplot(aes(r, MAD, color = metric)) +
  geom_line() +
  scale_color_discrete("Data") +
  xlab("Sibling correlation") +
  ylab("Sibling mean absolute difference") +
  ggtitle("Theory predicted and simulated mean absolute difference by sibling correlation")

Data table

x2

Voila! It works as intended. IQ science can live another day.