Let \(\mu = E(X_0)\) be the mean of the offspring distribution and denote the generation size mean by \(\mu_n=E(X_n)\). To find the mean of the size of the \(n^{th}\) generation \(E(X_n)\), condition on }$. FRom Lecture 5 we have the following recurrence for the PGF of the process: \[ \Pi_{X_n}(s)=\Pi_{X_0}\left(\Pi_{X_{n-1}}(s)\right). \] Differentiating w.r.t.\(s\), using the chain rule, we have: \[ \Pi'_{X_n}(s)=\Pi'_{X_0}\left(\Pi_{X_{n-1}}(s)\right)\Pi'(s), \] and setting \(s=1\) \[ \Pi'_{X_n}(1)=\Pi'_{X_0}\left(\Pi_{X_{n-1}}(1)\right)\Pi'(1)=\Pi'_{X_0}(1)\Pi'(1). \] That is \[ E(X_n)=\mu E(X_{n-1}). \] This, in turn, leads to: \[ E(X_n)=\mu E(X_{n-1})=\mu^2 E(X_{n-2})=\ldots =\mu^{n-1}E(X_0)=\mu^n\qquad n\ge 0. \]
We can use the above result \(E(X_n)=\mu^n\) to find the expected value of the total progeny until time \(t\) (i.e., all individuals who have ever lived up to time \(t\) ). Let’s define this as \(T_t\). By definition, we have that: \[ T_t=X_0+X_1+\ldots +X_t, \] or the size of all of the generations up through time \(t\). Taking the expectation of both sides (and using linearity of expectation, and remembering that by definition) \(X_0=1\): \[ E(T_t)=E(X_0+X_1+\ldots +X_t)=E(X_0)+E(X_1)+\ldots +E(X_t)\\ = 1+\mu +\mu^2+\ldots + \mu^t. \] Voila! This is relatively easy to calculate and allows us to think about the long-term total population size for different populations. If we have \(\mu>1\), again, the series \(1+\mu+\mu^2+\ldots +\mu^t\) clearly goes to \(\infty\) as \(t\) goes to infinity (the expected total population size gets very large as time passes). If we have \(\mu=1\), then \(E(T_t)=1\), which makes sense; if we expect just one cell to be reproduced each time, then, after time \(t\), we expect \(t\) total cells! Finally, if \(\mu<1\), we have that \(E(T_t)\) converges (it’s just a geometric series!) to: \[ =\frac{1}{1-\mu}. \] Larger values of \(\mu\) will thus make the expected total population size larger; for example, if \(\mu=0.99\), then \(E(T_t)=100\), and if \(\mu=0.01\), then \(E(T_t)\) is slightly larger than 1. This makes sense, because if the offspring distribution has a higher mean, we expect the total size of the population to be larger (note that in the \(\mu=0.01\) case, the expected value of the population is just slightly above 1, because there is a good chance that the first cell doesn’t reproduce at all!
For the long-term expected generation size, \[ \lim_{n\to \infty}E(X_n)=\lim_{n\to \infty}\mu^n= \left\{ \begin{array}{ll} 0, & \mbox{if} \quad \mu<1,\\ 1, & \mbox{if}\quad \mu=1, \\ \infty, & \mbox{if}\quad \mu>1. \end{array} \right. \]
A branching process is said to be subcritical if \(\mu<1\), critical if \(\mu=1\), and supercritical if \(\mu>1\). For a subcritical branching process, mean generation size declines exponentially to zero. For a supercritical process, mean generation size exhibits long-term exponential growth. The limits suggest three possible regimes depending on \(µ\): long-term extinction, stability, and boundless growth. However, behavior of the mean generation size does not tell the whole story.
Insight into the evolution of a branching process is gained by simulation. We simulated 10 generations \(X_0, X_1, \ldots X_{10}\) of a branching process with Poisson offspring distribution, choosing three values for the Poisson mean parameter corresponding to three types of branching process: \(\mu = 0.75\) (subcritical), \(\mu = 1\) (critical), and \(\mu = 1.5\) (supercritical). Each process was simulated five times. See the R script file branching.R. Results are shown in Table 7.1.
#Poisson offspring distribution
branch <- function(n,lam) { ## Poisson
z <- c(1,rep(0,n))
for (i in 2:(n+1)) {
z[i] <- sum(rpois(z[i-1],lam))
}
return(z)
}
branch(10, 0.75)
## [1] 1 1 0 0 0 0 0 0 0 0 0
branch(10, 1)
## [1] 1 0 0 0 0 0 0 0 0 0 0
branch(10, 1.5)
## [1] 1 3 6 7 8 12 17 24 37 57 106
For \(\mu = 0.75\), all simulated paths result in eventual extinction. Furthermore, the extinction occurs fairly rapidly. When \(\mu= 1\), all but one of the simulations in Table 4.1 become extinct by the \(10^{th}\) generation. Indeed, for a general branching process, in the subcritical and critical cases \((\mu \le 1)\), the population becomes extinct with probability l.
In the supercritical case \(\mu=1.5\), most simulations in Table~7.1 seem to grow without bound. However, one realization goes extinct. We will see that in the general supercritical case, the probability that the population eventually dies out is less than one, but typically greater than zero.
Table 7.1 Simulations of a branching process for three choices of the offspring mean \(\mu\).
| Mean | \(X_0\) | \(X_1\) | \(X_2\) | \(X_3\) | \(X_4\) | \(X_5\) | \(X_6\) | \(X_7\) | \(X_8\) | \(X_9\) | \(X_{10}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.75 | 1 | 2 | 3 | 3 | 3 | 1 | 3 | 0 | 0 | 0 | 0 |
| 0.75 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 0.75 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 0.75 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
| 0.75 | 1 | 3 | 3 | 1 | 3 | 1 | 0 | 0 | 0 | 0 | |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 3 | 6 | 6 | 5 | 6 | 7 | 8 | 8 | 8 | 6 | 5 |
| 1 | 1 | 2 | 1 | 1 | 2 | 1 | 2 | 1 | 0 | 0 | 0 |
| 1 | 1 | 3 | 4 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1.5 | 1 | 2 | 3 | 10 | 22 | 41 | 93 | 173 | 375 | 763 | 1597 |
| 1.5 | 1 | 1 | 1 | 1 | 2 | 4 | 7 | 9 | 11 | 19 | 29 |
| 1.5 | 1 | 4 | 5 | 18 | 34 | 68 | 127 | 246 | 521 | 1011 | 2065 |
| 1.5 | 1 | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1.5 | 1 | 2 | 5 | 3 | 2 | 6 | 9 | 17 | 18 | 13 | 19 |
Assume that \(X_0, X_1, \ldots\) is a subcritical branching process. Let \(E_n=\{X_n=0\}\) be the event that the population is extinct by generation \(n\), for \(n\ge 1\). Let \(E\) be the event that the population is ultimately extinct. Then, \[ E=\{X_n=0, \quad \mbox{for some} \quad n\ge 1\}= \bigcup_{n=1}^\infty E_n, \] and \(E_1\subseteq E_2 \subseteq \ldots\).It follows that the probability that the population eventually goes extinct is \[ P(E)=P\left(\bigcup_{n=1}^\infty E_n\right)=\lim_{n\to \infty} P(E_n)=\lim_{n\to \infty}P(X_n=0). \] The probability that the population is extinct by generation \(n\) satisfies the inequality \[ P(X_n=0)=1-P(X_n\ge 1)=1-\sum_{k=1}^\infty P(X_n=k) \ge 1-\sum_{k=1}^\infty kP(X_n=k) \\ =1-E(X_n) =1-\mu^n. \]
Taking limits gives \[ P(E)=\lim_{n\to \infty} P(X_n=0)\ge \lim_{n\to \infty}1-\mu^n=1, \] since \(\mu<1\). Thus, \(P(E)=1\). With probability 1, a subcritical branching process eventually goes extinct.
Example 7.1 Subcritical branching processes have been used to model the spread of infections and diseases in highly vaccinated populations. Farrington and Grant (1999) cite several examples, including the spread of measles and mumps, the outbreak of typhoidal salmonellae reported in Scotland in 1967-1990, and outbreaks of human monkeypox virus in past decades. Becker (1974) finds evidence of subcriticality in European smallpox data from 1950 to 1970.
Often the goal of these studies is to use data on observed outbreaks to estimate the unknown mean offspring parameter \(\mu\) as well as the number of generations of spread until extinction.
Theorem Given a branching process, let \(\Pi_{X_0}\) be the probability generating function of the offspring distribution. Then, the probability \(p_e\) of eventual extinction is the smallest positive root of the equation \(s=\Pi_{X_0}(s)\). If \(\mu\le 1\), that is, in the subcritical and critical cases, the extinction probability is equal to 1. If \(\mu>1\), then there is a positive probability of non-extinction.
Remark: We have already shown that in the subcritical \(\mu <1\) case, the population goes extinct with probability 1. The theorem gives that this is also true for \(\mu=1\), even though for each generation the expected generation size is \(E(X_n)=\mu^n=1\). For the supercritical case \(\mu>1\), the expected generation size \(E(X_n)\) grows without bound. However, the theorem gives that even in this case there is positive probability of eventual extinction.
Exercise 12 An annual plant produces \(N\) seeds in a season which are assumed to have a Poisson distribution with parameter \(\lambda\). Each seed has a probability \(p\) of germinating to create a new plant which propagates in the following year. Let \(M\) be the random variable of the number of new plants.Then \[ P(M=m)=\frac{(p\lambda)^m}{m!}e^{-p\lambda} \qquad m=0,1,\ldots \] Find the mean number of plants in year \(k\). Show that the extinction is certain if \(p\lambda\le 1\).
Before proving the theorem, we offer some examples of its use. Let, as before, \(p_e\) denote the probability of eventual extinction.
Example 7.2 Consider a branching process with offspring distribution \[ P(X_0=k)=\left(\frac{3}{4}\right)^k\cdot \frac{1}{4}, \qquad k\ge 0. \]
branch <- function(n,p) {
x <- c(1,rep(0,n))
for (i in 2:(n+1)) {
x[i] <- sum(rgeom(x[i-1],p))
}
return(x) }
branch(10,1/4)
## [1] 1 1 6 15 37 110 326 896 2670 8332 24744
trials <- 1000
simlist <- replicate(trials,branch(10,1/4)) [11]
sum(simlist==0)/trials
## [1] 0
The branching process is simulated 1000 times, keeping track of the number of times the process goes extinct by the 10th generation. The function branch \((n,p)\) simulates \(n\) steps of a branching process whose offspring distribution is geometric with parameter p. The replicate command repeats the simulation 1000 times, storing the outcome of $X_{10} for each trial in the vector simlist. The proportion of 0s in simlist estimates the extinction probability \(p_e\) . Our conclusions are slightly biased since we assume that if extinction takes place it will occur by time \(n=10\). Of course, extinction could occur later. However, it appears from the simulations that if extinction occurs it happens very rapidly.
The process is supercritical with \(\mu=3\). Results from simulations are collected in Table~7.2. Four of the 12 runs went extinct by time \(n=10\). The exact extinction probability is \(p_e=1/3\). (The fact that 4 out of 12 is exactly 1/3 is, we assure the reader, pure coincidence.)
Table 7.2 Simulation of a supercritical branching process with \(\mu=3\). Four of the 12 runs go extinct by the \(10^{th}\) generation.
| \(X_0\) | \(X_1\) | \(X_2\) | \(X_3\) | \(X_4\) | \(X_5\) | \(X_6\) | \(X_7\) | \(X_8\) | \(X_9\) | \(X_{10}\) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 | 25 | 97 | 394 | 1160 | 3475 | 10685 | 31885 | 95757 | 287130 |
| 1 | 1 | 1 | 3 | 7 | 11 | 47 | 165 | 515 | 1525 | 4689 |
| 1 | 10 | 37 | 115 | 350 | 1124 | 3455 | 10073 | 29896 | 88863 | 267386 |
| 1 | 1 | 1 | 2 | 3 | 0 | 0 | 0 | 0 | 0 | |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 8 | 31 | 71 | 248 | 779 | 2282 | 6864 | 19895 | 59196 | 178171 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 4 | 7 | 13 | 34 | 106 | 380 | 1123 | 3385 | 10200 | 30090 |
| 1 | 16 | 49 | 163 | 447 | 1284 | 3794 | 11592 | 34626 | 104390 | 312704 |
| 1 | 1 | 1 | 5 | 16 | 51 | 155 | 559 | 1730 | 5378 | 15647 |
| 1 | 1 | 3 | 31 | 79 | 267 | 883 | 2637 | 8043 | 23970 | 71841 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Example 7.3 (Lotka’s estimate of the extinction probability) One of the earliest applications of branching processes is contained in the work of Alfred Lotka, considered the father of demographic analysis, who estimated the probability that a male line of descent would ultimately become extinct. Based on the 1920 census data, Lotka fitted the distribution of male offspring to a zero-adjusted geometric distribution of the form \[ P(X_0=0)=0.48235\qquad \mbox{and}\qquad P(Z_0=(0.2126)\cdot (0.5893)^{k-1}), \qquad k\ge 1. \] Lotka found the extinction probability as the numerical solution to \[ p_e=0.48235 +\frac{(0.2126)p_e}{1-(0.5893)p_e}, \] giving the value \(p_e=0.819\). The mean of the male offspring distribution is \(\mu=1.26\). It is interesting that despite a mean number of children (sons and daughters) per individual of about 2.5, the probability of extinction of family surnames is over 80%.
Here we will prove that If \(\mu\le 1\), that is, in the subcritical and critical cases, the extinction probability is equal to 1. If \(\mu>1\), then there is a positive probability of non-extinction. The rest of the theorem, that \(p_e\) is the smallest solution to the equation \(w=\Pi_{X_0}(w)\) was proved earlier.
Let’s talk about convexity. Remember that a convex function has a positive second derivative; the second derivative of \(x^2\) w.r.t. \(x\) is 2.
library("xts")
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
#library("roll")
library("ggplot2")
library("data.table")
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:xts':
##
## first, last
data <- data.table(x = 0:100,
y = (0:100) ^ 2)
ggplot(data, aes(x = x, y = y)) +
geom_line() +
theme_bw() +
xlab("x") + ylab("y = x ^ 2") + ggtitle("Convex Function")
Is our PGF \(\Pi_{X_0}(s)\) convex on the interval \(0\le s\le 1\) (this interval is all that we will worry about, because we are looking for the extinction probability \(p_e\), which must be on the interval 0 to 1 because it is a probability)? Let’s write out our PGF again: \[ \Pi_{X_0}(s)=P(X=0)+sP(X=1)+s^2P(X=2)+\ldots \] as well as its derivative \[ \Pi'_{X_0}(s)=P(X=1)+2sP(X=2)+\ldots \] Let’s only consider branching processes with \(P(X>1)>0\); that is, the offspring distribution can take on a value of 2 or more. Otherwise, we get pretty uninteresting processes: the cell either makes 1 copy or zero copies and, unless it makes 1 copy with probability 1 (which is also boring; just one cell in each generation for the rest of time!), will thus definitely die out eventually.
If \(P(X>1)>0\), then both \(\Pi_{X_0}\) and \(\Pi'_{X_0}(s)\) are strictly increasing functions in terms of \(s\), because all of the terms are non-negative (remember that \(s\) is between 0 and 1) and at least one of the terms that varies with \(s\) (i.e., \(s^2P(X=2)\)) is not 0 (we can just look at the equations; it’s clear that they are increasing with \(s\)).
Since the first derivative \(\Pi'_{X_0}(s)\) is strictly increasing on the interval, we have that the PGF is convex. We can thus make a useful chart
Figure 1 PDF for s in [0,1]
This chart has a lot of important hidden complexities. Let’s walk through what is happening here.
The blue line is the output of the PGF \(\Pi_{X_0}(s)\). For different values \(s\) on the horizontal axis, we see the output of \(\Pi_{X_0}(s)\) on the vertical axis (labeled here as \(y\)). As we have shown, this line is convex (which we can see here visually).
The black line is simply the unit line with slope 1 (or \(y=s\)).
Note that the location where the blue line \(\Pi_{X_0}(s)\) intersects the y-axis is \(P(X_0=0)\), or the probability of zero offspring according to the offspring distribution. This makes sense because remember one of the first properties that we saw of the PGF: \(\Pi_{X_0}(0)=P(X_0=0)\).
The point where the blue line (the PGF) crosses the unit line is \(p_e\), the extinction probability. That’s because, remember, \(p_e\) is the smallest solution to \(w=\Pi_{X_0}(w)\), and the intersection of the blue line (the PGF) and the unit line (along which every point we have \(s=t\)) provides this value. Also not that this intersection is the smallest solution to \(w=\Pi_{X_0}(w)\), because the blue and black lines also intersect at \(s=1\). This was also a key property of PGFs that we learned about earlier: \(\Pi_{X_0}(1)=1\).
Finally, remember that we showed how \(\Pi'_{X_0}(1)=E(X_0)\). Therefore, the slope of the blue line (PGF) at \(s=1\) is just \(E(X_0)\), or the mean of the offspring distribution!
Now comes the really cool part. We’ve seen that \(E(X_0)\) defines the slope of the PGF at 1. Let’s consider the case where \(E(X_0)<1\). Our chart could look something like this:
Figure 2 PDF for s in [0,1]
Note that, because \(E(X_0)<1\), the blue line stays entirely above the black line until \(s=1\). This is obviously the case in the above chart, but it’s also the case whenever \(E(X_0)<1\). The reason is because, since the slope is less than 1, as we move backward from \(s=1\) the blue line will not fall as quickly as the black line and thus the black line will be under the blue line. As we move closer to \(s=0\), the slope will get even smaller (because we showed that the PGF is a convex function from \(s=0\) to \(s=1\) and thus the slope increases on this interval) and the blue line will never catch the black line before \(s=0\).
What’s the implication here? Well, since we get \(p_e\), the probability of extinction, from the intersections of the blue and black line, and the only intersection possible is at\(s=1\), we can therefore say that \(p_e=1\), or the population will definitely go extinct if the expected value of the offspring distribution is less than 1 (\(\mu=E(X_0)<1\)).
In fact, the same holds if \(\mu=E(X_0)=1\), since the slope of the blue line will be 1 at \(s=1\) but strictly smaller for \(s<1\) and thus won’t cross the black line either. Therefore, we can amend our statement above: the population will definitely go extinct if the expected value of the offspring distribution is less than or equal to 1 (\(\mu=E(X_0)\le 1\)).
How about the final case? Our first chart gave a good example of when \(\mu=E(X_0)>1\).
Figure 3 PDF for s in [0,1]
Here, the slope of the blue line (PGF) at \(s=1\) is greater than 1, which, for symmetrical reasons to the ones discussed before (slope is larger than the black unit line, so it will fall faster and thus be below the black line), ensures that we will have the blue line intersect the black line again somewhere between \(s=0\) and \(s=1\) and thus have a solution \(p_e\) that is smaller than 1. Even the edge case works here: if we have \(P(X_0=0)\), in which case the blue line just intersects the black line at \(s=0\) , this means that \(p_e=0\), or the population will certainly not go extinct. This makes sense, because \(P(X_0=0)=0\) means ‘there is a zero probability that a cell has zero offspring’ which just means that the cells would reproduce with certainty! Therefore, we can say **the population will not certainly go extinct (it might go extinct, it’s just not a certainty) if the expected value of the offspring distribution is greater than 1 (\(\mu=E(X_0)>1\)).
This is definitely a different sort of derivation, and perhaps better suited to video form. At this point you may want to watch Video 2 for Lecture~7.
These are very powerful results; we can immediately say if a process will definitely go extinct or there is a positive probability of non-extinction by just looking at the mean of the offspring distribution \(\mu\).