Load stuff.
options(digits = 2)
library(pacman)
p_load(kirkegaard)
Similuate 3 physical brain causes of intelligence, each of equal importance.
set.seed(1)
brain_data = data_frame(
size = rnorm(1000),
speed = rnorm(1000),
efficiency = rnorm(1000),
intelligence = (size + speed + efficiency) %>% standardize() %>% multiply_by(15) %>% add(100)
)
#stats
describe(brain_data) %>% print()
## vars n mean sd median trimmed mad min max range
## size 1 1000 -0.01 1 -0.04 -0.01 1.03 -3.0 3.8 6.8
## speed 2 1000 -0.02 1 -0.03 -0.01 1.08 -3.2 3.6 6.9
## efficiency 3 1000 0.02 1 -0.01 0.02 0.99 -3.5 2.9 6.4
## intelligence 4 1000 100.00 15 100.19 100.05 15.06 54.7 152.0 97.3
## skew kurtosis se
## size -0.02 -0.01 0.03
## speed 0.01 -0.02 0.03
## efficiency -0.08 -0.08 0.03
## intelligence -0.04 0.06 0.47
#cors
wtd.cors(brain_data)
## size speed efficiency intelligence
## size 1.0000 0.0064 0.049 0.59
## speed 0.0064 1.0000 0.023 0.58
## efficiency 0.0494 0.0225 1.000 0.60
## intelligence 0.5939 0.5816 0.601 1.00
#model
(model1 = lm(intelligence ~ size + speed + efficiency, data = brain_data))
##
## Call:
## lm(formula = intelligence ~ size + speed + efficiency, data = brain_data)
##
## Coefficients:
## (Intercept) size speed efficiency
## 100.10 8.16 8.16 8.16
#plot
GG_denhist(brain_data, "intelligence")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
We employ a simple truncated selection scenario. Evolution kills off the bottom 50% of the population, and breeds a new population from the remainder, the same size as the original.
#select top 50% cases
#simple truncated selection
brain_data_parents = brain_data %>% arrange(desc(intelligence)) %>% .[1:500, ]
#stats
describe(brain_data_parents) %>% print()
## vars n mean sd median trimmed mad min max range
## size 1 500 0.47 0.89 0.45 0.46 0.89 -2.2 3.8 6.0
## speed 2 500 0.49 0.89 0.45 0.47 0.90 -2.3 3.6 6.0
## efficiency 3 500 0.49 0.90 0.50 0.49 1.00 -1.9 2.9 4.8
## intelligence 4 500 111.96 8.87 110.23 110.93 8.31 100.2 152.0 51.8
## skew kurtosis se
## size 0.16 0.02 0.04
## speed 0.19 0.11 0.04
## efficiency 0.00 -0.45 0.04
## intelligence 1.12 1.28 0.40
As can be seen, the selection caused the mean IQ to increase to about 112, or 0.80 d. The three causes only increased about 0.50 d. The value comes directly from their correlation with intelligence, i.e. 0.80 * 0.60 = 0.48.
We breed the new generation, assuming an complete additive model. We do so to avoid dealing with complications from regression, nonadditive and nongenetic causes. We also ignore the reduced variance effect, which would normally be compensated for by variance arising from recombination (which we don’t simulate).
#sample new population
brain_data_gen2 = data_frame(
size = rnorm(1000, mean = mean(brain_data_parents$size)),
speed = rnorm(1000, mean = mean(brain_data_parents$speed)),
efficiency = rnorm(1000, mean = mean(brain_data_parents$efficiency)),
intelligence = 8.16 * size + 8.16 * speed + 8.16 * efficiency + 100
)
#stats
describe(brain_data_gen2) %>% print()
## vars n mean sd median trimmed mad min max range
## size 1 1000 0.49 1.04 0.49 0.49 1.04 -2.7 3.5 6.3
## speed 2 1000 0.47 0.99 0.49 0.47 0.96 -3.2 3.2 6.4
## efficiency 3 1000 0.48 0.98 0.45 0.48 0.94 -2.7 3.4 6.1
## intelligence 4 1000 111.74 13.56 111.86 111.72 14.25 56.5 158.3 101.8
## skew kurtosis se
## size -0.06 -0.03 0.03
## speed -0.05 -0.12 0.03
## efficiency -0.01 -0.08 0.03
## intelligence 0.00 0.05 0.43
#cors
wtd.cors(brain_data_gen2)
## size speed efficiency intelligence
## size 1.0000 0.0017 -0.089 0.57
## speed 0.0017 1.0000 -0.039 0.57
## efficiency -0.0894 -0.0388 1.000 0.51
## intelligence 0.5733 0.5735 0.510 1.00
#plot
GG_denhist(brain_data_gen2, "intelligence")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
For good measure, let’s do a third generation.
#parents
brain_data_gen2_parents = brain_data_gen2 %>% arrange(desc(intelligence)) %>% .[1:500, ]
#stats
describe(brain_data_gen2_parents) %>% print()
## vars n mean sd median trimmed mad min max range
## size 1 500 0.97 0.92 0.97 0.97 0.91 -1.9 3.5 5.4
## speed 2 500 0.92 0.88 0.94 0.93 0.83 -1.4 3.2 4.6
## efficiency 3 500 0.88 0.90 0.90 0.88 0.95 -2.0 3.4 5.3
## intelligence 4 500 122.61 8.07 121.08 121.71 7.67 111.9 158.3 46.4
## skew kurtosis se
## size -0.03 -0.27 0.04
## speed -0.04 -0.19 0.04
## efficiency -0.09 -0.26 0.04
## intelligence 1.08 1.24 0.36
#sample new population
brain_data_gen3 = data_frame(
size = rnorm(1000, mean = mean(brain_data_gen2_parents$size)),
speed = rnorm(1000, mean = mean(brain_data_gen2_parents$speed)),
efficiency = rnorm(1000, mean = mean(brain_data_gen2_parents$efficiency)),
intelligence = 8.16 * size + 8.16 * speed + 8.16 * efficiency + 100
)
#stats
describe(brain_data_gen3) %>% print()
## vars n mean sd median trimmed mad min max range
## size 1 1000 0.95 1 0.95 0.95 1.03 -2.1 4.6 6.7
## speed 2 1000 0.92 1 0.93 0.93 0.99 -2.2 3.9 6.1
## efficiency 3 1000 0.86 1 0.85 0.86 0.94 -2.6 4.2 6.8
## intelligence 4 1000 122.23 15 122.16 122.41 14.86 68.8 163.9 95.1
## skew kurtosis se
## size 0.08 -0.07 0.03
## speed -0.07 -0.18 0.03
## efficiency -0.02 0.21 0.03
## intelligence -0.12 -0.05 0.46
#cors
wtd.cors(brain_data_gen3)
## size speed efficiency intelligence
## size 1.0000 -0.0049 0.0053 0.58
## speed -0.0049 1.0000 0.0436 0.58
## efficiency 0.0053 0.0436 1.0000 0.59
## intelligence 0.5843 0.5844 0.5883 1.00
#plot
GG_denhist(brain_data_gen3, "intelligence")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Finally, let’s plot the three populations together to see the evolution over time.
#bind rows with ids
brain_data_all = bind_rows(list(gen1 = brain_data, gen2 = brain_data_gen2, gen3 = brain_data_gen3), .id = "generation")
#plot together
GG_denhist(brain_data_all, "intelligence", "generation")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
GG_denhist(brain_data_all, "size", "generation")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#in the last population, there are still negative outliers for brain size and intelligence
GG_scatter(brain_data_gen3, "size", "intelligence") +
geom_label(aes(x = -2.1, y = 140, label = "strong negative outlier"), nudge_x = .5)
Here we see the final point: that despite 2 generations of strong selection on the trait of interest, there are still negative outliers for brain size that have high intelligence. In this particular dataset, there is a case with -1.32 z brain size, which is about 2 d below the mean. Yet the IQ of the case is 124, 2 points above the average of the third population. How? Well, it has good z scores in the two other physical traits: speed = 3.31 z and efficiency 0.90 z.