Lecture 4_Simple_Branching

Motivation

We’ll now turn to a popular brand of stochastic processes, used frequently to model all sorts of real world phenomena (population growth, disease spread, etc.). These seemingly simply processes can quickly expand into nuanced complexities and pursuing their underlying structures, while sometimes challenging, is ultimately rewarding.

Branching processes are a class of stochastic processes that model the growth of populations. They are widely used in biology and epidemiology to study the spread of infectious diseases and epidemics. Applications include nuclear chain reactions and the spread of computer software viruses. Their original motivation was to study the extinction of family surnames, an issue of concern to the Victorian aristocracy in 19th. century Britain.

Historical Introduction

In 1873, the British statistician Sir Francois Galton posed the following question in the Educational Times.

Problem 4001: A large nation, of whom we will only concern ourselves with adult males, $N$ in number, and who each bear separate surnames colonize a district. Their law of population is such that, in each generation, $a_0$ percent of the adult males have no male children who reach adult life; $a_1$ have one such male child; $a_2$ have two; and so on up to $a_5$ who have five .Find (1) what proportion of their surnames will have become extinct after $r$ generations; and (2) how many instances there will be of the surname being held by $m$ persons.

The Reverend Henry William Watson replied with a solution. The study of branching processes grew out of Watson and Galton’s collaboration. Their results were independently discovered by the French statistician lrenee-Jules Bienayme. The basic branching process model is sometimes called a Bienayme-Galton-Watson process.

Malthus *Essay on the Principle of Population”. 379 out of 487 bourgeois families in the city of Berne died out between 1583 and 1783. we have 379/487 equals approx. 75%!

Simple Branching Processes

These processes are aptly named: they model the ‘branching out’ of some sort of population (for our purposes, let’s think of cells). Each cell lives for one time point and, at the end of the time point, either reproduces by creating some number of identical offspring (identical to the original cell) or doesn’t reproduce at all (and simply dies). This process continues independently with each of the remaining cells; if all the cells die out without reproducing, nothing else happens (the cells have gone extinct!).

We can formalize this with some notation. Let $X_t$ count the number of living cells in generation $t$. We define $X_0 = 1$, which is essentially saying that there is one cell to start. The cell then has some ‘generation distribution’ which we will refer to as the offspring distribution: \[ \{ p_0, p_1, p_2, . . . \}, \] where $p_i$ is the probability that the cell has exactly $i$ offspring. Note that $\sum_{i=0}^\infty p_i=1$, so that we have a valid mass function.

Every cell that is created after the original cell has this identical generation distribution; also, the reproductions of every cell is independent of other cells. The population goes extinct as soon as, well, the population is of size zero! The extinction probability, which we will call $p_e$, of the process is the probability that the process ever goes extinct.

Note that branching process are Markov; if we know how many cells are alive at time $t-1$ (which is $X_{t-1}$), then no extra information from the history of the process matters in determining the distribution of $X_t$. This is intuitive; we only care how many cells are alive in the period before (and thus how many will create reproductions), we don’t care how we got to that point!

Let’s start with some simple examples of branching processes.

$p_0 = 1$ (and $p_i = 0$ for $i\ge 1$). This is the most boring case: the original cell dies out and doesn’t reproduce at all (because the probability of producing zero offspring is one). The population is already extinct by $t = 1$.
$p_0 = 1/ 100$ and $p_1 = 99/ 100$ (and $p_i = 0 $for $i \ge 2$). Here, the cell either produces a single copy (with probability .99) or doesn’t reproduce at all (with probability .01). While the population might get lucky and survive for a while (and, indeed, over any one period, a living population will probability survive to the next period), the probability of surviving, say, to generation $n$ is $\left(\frac{99}{100}\right)^n$ (since the population would need to reproduce successfully, with probability 99/100, for $n$ generations). This goes to 0 as $n$ gets large, so the probability of extinction for this population is 1.

We can simulate multiple paths of this process using a $Geom(1/ 100)$ random variable (this counts the number of failures until a success in independent trials, where a success has probability 1/ 100 in every trial; of course, in this case, the ‘failures’ are the cell reproducing and the ‘success’ is the cell not reproducing). Even though we get some large values (843 is our max here), all of the paths fail eventually (of course, simulating 10,000 paths isn’t a proof that this population will go extinct, it just helps with our intuition).

set.seed(0)
x <- rgeom(10000, .01)
max(x)

## [1] 843

$p_0 = 1/ 4$, $p_1 = 1/ 2$, and $p_2 = 1/ 4$ (and $p_i = 0$ for $i \ge 3$). This is our most complicated process so far: either the cell reproduces a single copy with probability 1/ 2, dies out without producing a copy with probability 1/ 4 or creates two copies with probability 1/ 4.

library("data.table")
library("ggplot2")

It is less clear how this population will fare in the long run. We can try simulating 500 different paths for 50 periods. Most of the paths collapse to zero (extinction), but plenty of populations are alive and well after 50 periods; one has more than 40 members!

set.seed(0)
nsims <- 500
index <- 50
# function to generate paths
FUN_branch <- function(index) {
 population <- 1
 for (i in 2:index) {
 population <- c(population, sum(sample(0:2, 
 population[length(population)],
 replace = TRUE,
prob = c(1 / 4, 1 / 2, 1 / 4))))
 }
 population
}
# generate data and visualiz
data <- data.table(t = 1:index, 
 replicate(FUN_branch(index), n = nsims))
data <- melt(data, id.vars = "t")
ggplot(data, aes(x = t, y = value, col = variable)) +
 geom_line(alpha = 1 / 4) +
 theme_bw() +
 theme(legend.position = "none") +
 ggtitle("Simple Branching Process") +
 xlab("t") + ylab("size of population")

At this point you might want to watch the YouTube video available separately in this folder.

We’re going to develop a better framework for calculating the extinction probability of these processes. However, to do that, we need to learn about a new statistical tool - probability generation function.

Exercise 4 Consider a branching process with initial value $X_0=1$, and offspring distribution \[ p_0=\frac{4}{7}, \qquad p_1=\frac{2}{7}, \qquad p_{2}=\frac{1}{7}. \] Calculate

$P(X_1=0)$.
$P(X_1=2)$.
$P(X_2=2)$.

Lecture 4_Simple_Branching_Processes

George Yanev

2022-09-15

Motivation

Historical Introduction

Simple Branching Processes