Init

Load stuff.

options(digits = 2)
library(pacman)
p_load(kirkegaard)

Simiulate data

Similuate 3 physical brain causes of intelligence, each of equal importance.

set.seed(1)
brain_data = data_frame(
  size = rnorm(1000),
  speed = rnorm(1000),
  efficiency = rnorm(1000),
  intelligence = (size + speed + efficiency) %>% standardize() %>% multiply_by(15) %>% add(100)
)

#stats
describe(brain_data) %>% print()

##              vars    n   mean sd median trimmed   mad  min   max range
## size            1 1000  -0.01  1  -0.04   -0.01  1.03 -3.0   3.8   6.8
## speed           2 1000  -0.02  1  -0.03   -0.01  1.08 -3.2   3.6   6.9
## efficiency      3 1000   0.02  1  -0.01    0.02  0.99 -3.5   2.9   6.4
## intelligence    4 1000 100.00 15 100.19  100.05 15.06 54.7 152.0  97.3
##               skew kurtosis   se
## size         -0.02    -0.01 0.03
## speed         0.01    -0.02 0.03
## efficiency   -0.08    -0.08 0.03
## intelligence -0.04     0.06 0.47

#cors
wtd.cors(brain_data)

##                size  speed efficiency intelligence
## size         1.0000 0.0064      0.049         0.59
## speed        0.0064 1.0000      0.023         0.58
## efficiency   0.0494 0.0225      1.000         0.60
## intelligence 0.5939 0.5816      0.601         1.00

#model
(model1 = lm(intelligence ~ size + speed + efficiency, data = brain_data))

## 
## Call:
## lm(formula = intelligence ~ size + speed + efficiency, data = brain_data)
## 
## Coefficients:
## (Intercept)         size        speed   efficiency  
##      100.10         8.16         8.16         8.16

#plot
GG_denhist(brain_data, "intelligence")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Selection

We employ a simple truncated selection scenario. Evolution kills off the bottom 50% of the population, and breeds a new population from the remainder, the same size as the original.

#select top 50% cases
#simple truncated selection
brain_data_parents = brain_data %>% arrange(desc(intelligence)) %>% .[1:500, ]

#stats
describe(brain_data_parents) %>% print()

##              vars   n   mean   sd median trimmed  mad   min   max range
## size            1 500   0.47 0.89   0.45    0.46 0.89  -2.2   3.8   6.0
## speed           2 500   0.49 0.89   0.45    0.47 0.90  -2.3   3.6   6.0
## efficiency      3 500   0.49 0.90   0.50    0.49 1.00  -1.9   2.9   4.8
## intelligence    4 500 111.96 8.87 110.23  110.93 8.31 100.2 152.0  51.8
##              skew kurtosis   se
## size         0.16     0.02 0.04
## speed        0.19     0.11 0.04
## efficiency   0.00    -0.45 0.04
## intelligence 1.12     1.28 0.40

As can be seen, the selection caused the mean IQ to increase to about 112, or 0.80 d. The three causes only increased about 0.50 d. The value comes directly from their correlation with intelligence, i.e. 0.80 * 0.60 = 0.48.

We breed the new generation, assuming an complete additive model. We do so to avoid dealing with complications from regression, nonadditive and nongenetic causes. We also ignore the reduced variance effect, which would normally be compensated for by variance arising from recombination (which we don’t simulate).

#sample new population
brain_data_gen2 = data_frame(
  size = rnorm(1000, mean = mean(brain_data_parents$size)),
  speed = rnorm(1000, mean = mean(brain_data_parents$speed)),
  efficiency = rnorm(1000, mean = mean(brain_data_parents$efficiency)),
  intelligence = 8.16 * size + 8.16 * speed + 8.16 * efficiency + 100
)

#stats
describe(brain_data_gen2) %>% print()

##              vars    n   mean    sd median trimmed   mad  min   max range
## size            1 1000   0.49  1.04   0.49    0.49  1.04 -2.7   3.5   6.3
## speed           2 1000   0.47  0.99   0.49    0.47  0.96 -3.2   3.2   6.4
## efficiency      3 1000   0.48  0.98   0.45    0.48  0.94 -2.7   3.4   6.1
## intelligence    4 1000 111.74 13.56 111.86  111.72 14.25 56.5 158.3 101.8
##               skew kurtosis   se
## size         -0.06    -0.03 0.03
## speed        -0.05    -0.12 0.03
## efficiency   -0.01    -0.08 0.03
## intelligence  0.00     0.05 0.43

#cors
wtd.cors(brain_data_gen2)

##                 size   speed efficiency intelligence
## size          1.0000  0.0017     -0.089         0.57
## speed         0.0017  1.0000     -0.039         0.57
## efficiency   -0.0894 -0.0388      1.000         0.51
## intelligence  0.5733  0.5735      0.510         1.00

#plot
GG_denhist(brain_data_gen2, "intelligence")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

For good measure, let’s do a third generation.

#parents
brain_data_gen2_parents = brain_data_gen2 %>% arrange(desc(intelligence)) %>% .[1:500, ]

#stats
describe(brain_data_gen2_parents) %>% print()

##              vars   n   mean   sd median trimmed  mad   min   max range
## size            1 500   0.97 0.92   0.97    0.97 0.91  -1.9   3.5   5.4
## speed           2 500   0.92 0.88   0.94    0.93 0.83  -1.4   3.2   4.6
## efficiency      3 500   0.88 0.90   0.90    0.88 0.95  -2.0   3.4   5.3
## intelligence    4 500 122.61 8.07 121.08  121.71 7.67 111.9 158.3  46.4
##               skew kurtosis   se
## size         -0.03    -0.27 0.04
## speed        -0.04    -0.19 0.04
## efficiency   -0.09    -0.26 0.04
## intelligence  1.08     1.24 0.36

#sample new population
brain_data_gen3 = data_frame(
  size = rnorm(1000, mean = mean(brain_data_gen2_parents$size)),
  speed = rnorm(1000, mean = mean(brain_data_gen2_parents$speed)),
  efficiency = rnorm(1000, mean = mean(brain_data_gen2_parents$efficiency)),
  intelligence = 8.16 * size + 8.16 * speed + 8.16 * efficiency + 100
)

#stats
describe(brain_data_gen3) %>% print()

##              vars    n   mean sd median trimmed   mad  min   max range
## size            1 1000   0.95  1   0.95    0.95  1.03 -2.1   4.6   6.7
## speed           2 1000   0.92  1   0.93    0.93  0.99 -2.2   3.9   6.1
## efficiency      3 1000   0.86  1   0.85    0.86  0.94 -2.6   4.2   6.8
## intelligence    4 1000 122.23 15 122.16  122.41 14.86 68.8 163.9  95.1
##               skew kurtosis   se
## size          0.08    -0.07 0.03
## speed        -0.07    -0.18 0.03
## efficiency   -0.02     0.21 0.03
## intelligence -0.12    -0.05 0.46

#cors
wtd.cors(brain_data_gen3)

##                 size   speed efficiency intelligence
## size          1.0000 -0.0049     0.0053         0.58
## speed        -0.0049  1.0000     0.0436         0.58
## efficiency    0.0053  0.0436     1.0000         0.59
## intelligence  0.5843  0.5844     0.5883         1.00

#plot
GG_denhist(brain_data_gen3, "intelligence")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Finally, let’s plot the three populations together to see the evolution over time.

#bind rows with ids
brain_data_all = bind_rows(list(gen1 = brain_data, gen2 = brain_data_gen2, gen3 = brain_data_gen3), .id = "generation")

#plot together
GG_denhist(brain_data_all, "intelligence", "generation")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

GG_denhist(brain_data_all, "size", "generation")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#in the last population, there are still negative outliers for brain size and intelligence
GG_scatter(brain_data_gen3, "size", "intelligence") +
  geom_label(aes(x = -2.1, y = 140, label = "strong negative outlier"), nudge_x = .5)

Here we see the final point: that despite 2 generations of strong selection on the trait of interest, there are still negative outliers for brain size that have high intelligence. In this particular dataset, there is a case with -1.32 z brain size, which is about 2 d below the mean. Yet the IQ of the case is 124, 2 points above the average of the third population. How? Well, it has good z scores in the two other physical traits: speed = 3.31 z and efficiency 0.90 z.

Understanding selection for complex traits

Simulation example of intelligence evolution via brain size, speed and efficiency

Emil O. W. Kirkegaard

Init

Simiulate data

Selection