library(pacman)
p_load(kirkegaard, dplyr)
Assume fertility follows a Poisson distribution. Then generate populations with varying total fertility rate (TFR) according to the range that currently exists between nations. For each country, calculate the mean number of siblings (MNS).
First we make a function that generates offspring and their number of siblings from the number of children a woman has.
#make sibs function
make_sibs = function(x) {
#inner function
make_sibs_inner = function(x) {
#if 0
if (x == 0) return(NULL)
#siblings
return(rep(x - 1, times = x))
}
#make sibs for entire vector, join
map(x, make_sibs_inner) %>% unlist
}
#test function
make_sibs(rep(2, 10))
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
make_sibs(0:4)
## [1] 0 1 1 2 2 2 3 3 3 3
The function works as expected.
Next, we make a wrapper function to get the mean. We don’t want to save the intermediates to save memory.
#function to get mean number of siblings
mean_num_sib = function(x) {
#mean
mean(make_sibs(x))
}
#test
mean_num_sib(rep(2, 10))
## [1] 1
mean_num_sib(0:4)
## [1] 2
This function also works as expected.
So finally, we simulate data by country using the Poisson distribution, then convert to sibling data and find the mean.
#fertility data
#https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependencies_by_total_fertility_rate
#every country has the same population of 10k
countries = data_frame(
TFR = seq(1.2, 7.6, by = .01),
TFR_empirical = map(TFR, mean),
data = map(TFR, ~rpois(10000, .)),
MNS = map_dbl(data, mean_num_sib)
)
What does the relationship between MNS and TFR look like at the population level?
GG_scatter(countries, "TFR", "MNS", case_names = F)
So, they apparently balance out so that MNS ≈ TFR. That was unexpected. Did we do the simulation right? Let’s look in detail at a single one.
#deterministic simulation
set.seed(5)
#get some children
(ex_children_per_woman = rpois(20, 2))
## [1] 1 3 4 1 0 3 2 3 5 0 1 2 1 2 1 1 1 4 2 3
#convert these to sibling data
(ex_siblings = make_sibs(ex_children_per_woman))
## [1] 0 2 2 2 3 3 3 3 0 2 2 2 1 1 2 2 2 4 4 4 4 4 0 1 1 0 1 1 0 0 0 3 3 3 3
## [36] 1 1 2 2 2
#how many siblings?
length(ex_siblings)
## [1] 40
#means
mean(ex_children_per_woman)
## [1] 2
mean(ex_siblings)
## [1] 1.9
It checks out. There must be some deep math reason here I don’t get.