Some people have worried about whether it is possible that the average sibling difference in IQ is 12 is really possible. It is.
#load libraries
library(MASS)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1 ✓ purrr 0.3.3
## ✓ tibble 2.1.3 ✓ dplyr 0.8.3
## ✓ tidyr 1.0.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x dplyr::select() masks MASS::select()
#set default theme
theme_set(theme_bw())
Simulate data for illustration. We sample from multivariate normal distribution with correlations that vary. Then we rescale data to IQ metric of 100/15 (mean/SD).
#simulate datasets
x = map_df(seq(0, 1, .01), function(r) {
#simulate sibling scores at correlation r
mvrnorm(n = 100e3,
Sigma = matrix(c(1, r, r, 1), nrow = 2),
mu = c(0, 0)
) %>%
as_tibble() %>%
mutate(
r = r,
)
}) %>%
#change the scale to IQ norms
mutate(
V1 = V1 * 15 + 100,
V2 = V2 * 15 + 100,
AD = abs(V1 - V2)
)
## Warning: `as_tibble.matrix()` requires a matrix with column names or a `.name_repair` argument. Using compatibility `.name_repair`.
## This warning is displayed once per session.
#summary stats
x2 = x %>%
group_by(r) %>%
summarise(
MAD_empirical = mean(abs(V1 - V2))
)
#theory expected difference
#Jensen, Bias in Mental Testing, p. 459
x2$MAD_theory = 2*15*sqrt(1 - x2$r) / sqrt(pi)
First we plot examples of distributions by correlation between siblings (or any other pairs). Then we plot mean absolute difference across all simulated pair correlations, along with the theoretically expected values.
#difference distributions by correlation
x %>%
mutate(r = factor(r)) %>%
filter(r %in% c(0, .25, .5, .75)) %>%
ggplot(aes(AD)) +
geom_histogram(center = .5, binwidth = 1) +
#vertical lines for the mean
geom_vline(data = x2 %>% filter(r %in% c(0, .25, .5, .75)), aes(xintercept = MAD_empirical), color = "red") +
facet_wrap("r") +
xlab("Sibling absolute difference") +
ggtitle("Distribution of absolute difference by sibling correlation strength",
subtitle = "Red line = mean")
#theory vs. empirical results
x2 %>%
gather(key = metric, value = MAD, MAD_empirical, MAD_theory) %>%
mutate(metric = str_replace(metric, "MAD_", "")) %>%
ggplot(aes(r, MAD, color = metric)) +
geom_line() +
scale_color_discrete("Data") +
xlab("Sibling correlation") +
ylab("Sibling mean absolute difference") +
ggtitle("Theory predicted and simulated mean absolute difference by sibling correlation")
x2
Voila! It works as intended. IQ science can live another day.