Disclaimer: the following represents material for educational purposes only and reflects the views and analyses of its author, Chris H. Wilson. In particuar, it does not represent the views or official policies of his employer, the University of Florida, nor does it constitute advice if the reader feels they have been exposed to SARS-COV2, or have any symptoms of Covid-19. In such cases, the reader is encouraged to contact their physician and local health department for guidance. The author gives permission to all reproductions for informational purposes only, requesting only a link back to this original source.
If you are anything like me, you have been reading a lot of articles about the virology and epidemiology of SARS-COV2, the agent behind the new disease Covid-19. Our knowledge about this virus is developing rapidly - thankfully - however the scale of the public health crisis we are facing is daunting. Rather than attempt any kind of comprehensive analysis here, I want to help unpack one of the key pieces of terminology that is being thrown around a lot, “R naught” (\(R_0\)) or the “basic reproduction number”. This will be a mid-level mathematical exposition, intended to make this epidemiological concept intuitive to those with a decent generic quantitative fluency, but no special training in disease ecology or related theory. It may even be useful to those who have studied some population and disease ecology, but are feeling rusty or never fully understood the logic behind the models.
The basic “toy model” in epidemiology is the so-called “SIR” model, standing for Susceptible/Infected/Recovered. This is the bread and butter of modeling disease outbreaks in a population. In many cases, its assumptions are way too restrictive to provide an accurate quantitative predictive framework, but it is a useful framework to understand in order to educate intuition. In the case of SARS-COV2, I am told that SIR models actually do fit the data surprisingly well, at least within cities, for reasons that will become clear. So the structure in this little tutorial is simple: first we start with basics of multiplicative growth, then we apply our equation to infections and derive \(R_0\) and finally we reconstruct the full SIR model.
To begin, recall that the basic process underying biological growth is multiplicative. That is to say, cells divide, organisms reproduce, and so do viruses! This multiplication process tells us that, absent structural constrains from outside, populations of cells, viruses and organisms do not accumulate linearly over time but exponentially (i.e. non-linearly). In addition to multiplying, they spread. Multiplicative growth is captured quite simply by the following difference equation: \[\tag{1} X[t+1] = X[t] + \tau X[t]\] We can rewrite this as: \[\tag{2} X[t+1] = (1 + \tau) X[t]\] and after substitution \(\lambda = 1 + \tau\), we get: \[\tag{3} X[t+1] = \lambda X[t]\] Here, \(\lambda\) represents the multiplicative/geometric growth rate of the population \(X\) over some discrete time interval indexed by \(t\). If we take \(\lambda = 2\), our time-step is a doubling interval, and the population growth looks like that in Figure 1:
Fig 1: multiplicative growth
Given \(T\) discrete time intervals, and an initial population \(X[0]\), it is easily shown that our equation is \[\tag{4} X[T] = \lambda^T X[0]\]
Now, discrete equations are nice, and in many ways most congenial for biological processes. However, because we want to use the tools of calculus and differential equations, we need to map this into continuous form, which we do by defining an instantaneous growth rate \(k\), and setting \(\lambda^T = e^{kT}\), hence \(k=log(\lambda)\), and our continuous equation is: \[\tag{5} X[t] = e^{kt} X[0]\] This turns out to be the integral solution of the differential equation: \[\tag{6} \frac{dX}{dt} = kX\] and with equation 6 we are finally ready to jump into the SIR models, from which the concept of “basic reproductive rate” is most easily derived.
Our first goal is to model how the number of infected people in a population grows over time. To do this, we start with equation 6 and then we make additional assumptions. The basic logic is this: you start with an infected individual. That person then comes into contact with a certain number of non-infected individuals at a certain rate. Call this, “rate of contact” (\(\kappa\)). In each contact, there is a certain probability that the infection will be transmitted, “transmission probability”(\(\rho\)). When the number of infected individuals is small relative to the total population (\(N\)), the growth in rate of infections will be, to a high degree of approximation: \[\tag{7} \frac{dI}{dt} = \kappa \rho I\]
Notice that this has the exact same form as the basic exponential growth model above (simply substitute \(\kappa \rho\) for \(k\)). In fact, this is what we see - during early phases of an outbreak, the epidemic grows exponentially.
As always with differential equations, it is helpful to be super clear about the dimensional analysis (a hugely under-taught tool, see Sanjoy Mahavan’s excellent book “Street Fighting Math”). On the left hand side, we have something with dimensions of persons/time. On the right hand side we have I (with dimensions ‘persons’), multiplied by dimensionless probability \(\rho\), multiplied by a rate \(\kappa\). So \(\kappa\) must ultimately have dimension of 1/time. Indeed, thinking it through it is #persons-contacted/(person-infected X time). The dimensions of persons cancel, and we have 1/time, just as needed. We will return to this dimensional analysis.
What equation 7 assumes so far is that every person contacted by an infected individual is susceptible to infection. However, as the infected proportion of the population grows, this assumption breaks down as the probability that an individual that is contacted is already infected, or has recovered and is now resistant increases. To accomodate, we now introduce a pool of susceptible individuals, “S”, and we represent this probability as \(S/N\). Since this ratio is dimensionless, we can simply stick it onto equation 7 no problem, and we get: \[\tag{8} \frac{dI}{dt} = \kappa \rho I \frac{S}{N}\]
Over time, \(\frac {S}{N}\) declines to 0, and the growth rate of infections goes to zero. By itself, this dynamic is essentially logistic. In fact, if there were only two states (“infected” and “susceptible”), we could rewrite as \[\tag{9} \frac{dI}{dt} = \kappa \rho I \frac{N-I}{N}=\kappa \rho I (1 - \frac{I}{N})\] which is mathematically identical to the most common logistic growth model used in biology and ecology.At this point, we need to point out that there is a stringent assumption here. That is, this probability is effectively homogeneous across the population and can be represented by the proportion \(\frac{S}{N}\). This is the so-called “law of mass action” imported from equilibrial dynamics in chemistry and physics. In reality, we can allow for lots of microscale variation in this probability. As long as the variations are sufficiently well-behaved, the emergent macroscale behavior can be captured by this fraction. In this context, we require the notion that populations are, among other things, well mixed. The real world can get in the way of this being a good assumption when there is strong spatial structure, strong heterogeneity in the population, and so forth.
Another thing happens though. Infected individuals either die or recover and become resistant. The simplest assumption here is that death and recovery are also both first-order processes, so we subtract them from the growth rate and complete our differential equation. For now, I am going to pretend that no one dies (yay!) and people just become resistant, as the math fundamentally does not change either way: \[\tag{10} \frac {dI}{dt} = \kappa \rho I \frac{S}{N} - \nu I\]
Now, briefly, we must maintain the RHS with dimensions of persons/time, so \(\nu\) is simply another rate with dimension of 1/time, and represents rate of recovery into resistance.
And now we can see exactly what \(R_0\) is and where it comes from. An individual does not stay infectious forever. They recover and become resistant “R” instead, at the rate \(\nu\). But before they do this, they are generating new infections at the rate \(\kappa \rho\). So if we take the following ratio: \[\tag{11} \frac {\kappa \rho}{\nu}\], we have a nice dimensionless number that represents the average number of new infections per infected individual. This is \(R_0\) the “basic reproductive number”!
Return to the dimensionality for a second. It is customary to write \(\kappa \rho\) as \(\beta\). Remember that this has a dimension of 1/time. Therefore, the “typical time” to a new infection is the reciprocal or \(\frac{1}{\beta}\). Likewise, the “typical time” to recovery is \(\frac{1}{\nu}\). “\(R_0\)” is clearly then the typical time to recovery over the typical time to infection. A larger \(R_0\) could arise from either recovery taking longer, time to infection being shorter, or both.
Large \(R_0\) is bad. Small \(R_0\) is good.
In fact, we can now easily recover the single most important epidemiological consequence of \(R_0\). Is the number of infected individuals growing or declining? Our goal is to make a set of interventions that turn equation 10 negative, thus guaranteeing the decline of infected individuals, and eventual extinction of the outbreak.
We re-write equation 10 using \(R_0\) from formula 11 and we get: \[\tag {12} \frac {dI}{dt} = R_0 \nu I \frac{S}{N} - \nu I = \nu I (R_0 \frac{S}{N} - 1) \]
At the beginning of an outbreak, where \(S \approx N\), this differential equation is clearly positive when \(R_0\) > 1, and negative when \(R_0\) < 1.
Here is the punchline: at the beginning of an outbreak, if you severely limit average number of contacts between people, transmission probabilities, or both, you can drive \(R_0\) below one and prevent the outbreak from ever materializing. This is the goal of social distancing measures. We can also be more surgical and effective at this if we target infected individuals and their contacts for quarantine. This requires widespread testing and is why the Trump administration’s early response to Covid-19 probably constitutes gross negligence.
As the pandemic proceeds, the number of susceptibles presumably declines, which leads to self-limitation of the outbreak through the ratio \(\frac{S}{N}\). This is the so-called “herd immunity” effect that the British seem to be banking on. In better scenarios, we deploy mass scale vaccination campaigns to drive enough people out of the “S” pool so that the equation goes negative that way. “Herd immunity” via infection is a terribly risky proposition since it requires a lot of people to get sick, and in the case of SARS-COV2 this is going to mean lots of people dead both directly from the virus and indirectly from a flooded healthcare system that can no longer take care of people with all kinds of ailments. This, too, probably amounts to gross negligence, given all the uncertainties about Covid-19.
The rest of this tutorial will be brief. We complete our model with understanding the disease by considering the three pools separately, in the order S-I-R. \[\tag {13} \frac {dS}{dt} = - R_0 \nu I \frac{S}{N} \] \[\tag {14} \frac {dI}{dt} = R_0 \nu I \frac{S}{N} - \nu I \] \[\tag {15} \frac {dR}{dt} = \nu I \] This represents a system of three coupled ODEs. We have removal of Susceptibles by infection, already derived in detail. The initial condition is \(S[0] = N\). The dynamics of the Infected pool have already been dissected in detail. Then we simply add an equation for the change in Recovered/Resistant pool which receives everyone dropping out of the Infected pool.
The astute reader will notice that our development actually only has 3 parameters, all of which go into the dimensionless ratio \(R_0\). This is why \(R_0\) is so widely discussed. It compactly summarizes all the information present in our simplest theoretical model of how infectious diseases work.
And there you have it! This is the most basic foundation of infectious disease epidemiology. Along the way, I hope the assumptions underlying the epidemiology are more clear, as is the meaning of the ubiquitous \(R_0\) term. In parting, I want to emphasize that \(R_0\) is NOT an inherent property of a virus. It emerges dialectically from viral biology interacting with human social behavior, which in turn is shaped historically by culture, politics and economics.
We have a lot of leverage over epidemics. Let us use our collective agency to write a better history for this pandemic!
Recalling Richard Levins’ admonishment to always ask “Where does the rest of the world enter the model?”, what follows is a run-down of key parameters and how they are impacted by the environment and social behavior.
\(R_0\): The basic reproductive rate. This is defined most simply as \(\frac {\kappa \rho}{\nu}\) (equation 10 in text). These components are explained next.
\(\kappa\) is here the mean # of individuals exposed by every infected individual. This is where social distancing and lock-downs come into play in a big way. Again, notice that we are assuming a constant number across everyone. We can relax that assumption and let it vary by individual and day, so long as that variability is sufficiently well-behaved, it will aggregate and we can just use the mean value. So-called “super spreaders” may cause larger problems for this assumption. But again, if the population being studied is well-mixed, that may attenuate the problems here. Nevertheless, this deserves much closer scrutiny.
\(\rho\). The probability of transmission given an exposure. Again, all the caveats about mean-field apply, and it is important to resolve the many modes of transmission (aerosolization, droplet, surfaces, fecal-oral, etc.). This is also where weather might make a difference if it slows the virus down, hinders survival outside the body, etc.
\(\nu\). The rate of recovery. This will be impacted by medical treatment, availability of resources, underlying health of population, etc. If we were to expand the SIR model to include mortality, that rate would have similar influences.
Coming soon.
In progress: Wayne Wilson, Philip Shirk, Andrew Lyjak,