\(R_0\), pronounced “R naught, is called the basic reproduction number in Epidemiology, it is defined as the expected number of secondary cases produced by a single case in a susceptible population.\(^1\) \(R_0\) is an important parameter for public health planning such as how quickly people can return to work or when to reopen schools, et al. If \(R_0\) is larger than 1 the infection will spread in the population.When \(R_0\) is smaller than 1, the infection will die out. The Wikipedia (https://en.wikipedia.org/wiki/Basic_reproduction_number) has a very good introduction to the concept of \(R_0\). Here is a table from the Wikipedia for \(R_0\) for different infectious diseases. Spline from the Bunnings

The purpose of this note is to show how to calculate the \(R_0\) using serological data and R software.

The SIR Epidemic Model

SIR model is the most basic compartment model established by Kermack and Mckendrick in 1927. The model is used to study dynamic of infectious diseases in certain population of certain area and at certain time period.

The model divides the population into three compartments (or three classes):

  1. Susceptible(\(S\)): we write the number of people in this class as \(S(t)\), which stands for at time \(t\), the number of people were not infected but susceptible to the infection,

  2. Infected (\(I\)): we write the number of people in this class as \(I(t)\), which stands for at time \(t\), how many people were infected and could transmit the disease to other people.

  3. Recovered,(\(R\)) removed or immuned: we write the number people in this class as \(R(t)\),which stands for at time \(t\), how many people were recovered, removed or immuned.

Therefore, the total population \(N(t)\) at time \(t\) will be

\[N(t)=S(t)+I(t)+R(t) \] For the SIR model we will make the following three assumptions:

(1). The total population in this area is constant, i.e. \(N(t)\equiv K\) i.e \(S(t)+I(t)+R(t)\equiv K\)

(2). within t unit time, the number of new infected individuals is proportional to number of susceptible individuals \(S(t)\) and number of already infected individuals \(I(t)\), i.e within the unit time t, the number of new infected individuals is equal to \(\beta*S(t)*I(t)\). Here, \(\beta\) is a scalar and \(\beta \in (0,\infty)\), it also called effective contact rate.

(3). At time \(t\) the number of recovered individuals is proportional to number of infected individuals,i.e within the unit time \(t\) the number of recovered, removed or immuned is \(\gamma *I(t)\), here \(\gamma\) is a scalar and \(\gamma \in (0,\infty)\), \(\gamma\) is also called removal rate or recover rate.

Based on the above assumptions we can obtain three ordinary differential equations(ODE).

First,

\[\frac{dS(t)}{dt}=-\beta*S(t)*I(t)\tag{1}\] This equation relates to the above assumption \((2)\), it means within unit time \(t\) the change of number of susceptible individuals. The number of susceptible individuals will be decreased by \(\beta*S(t)*I(t)\), the negative sign means decrease, note \(\beta \in (0,\infty)\).

Second,

\[\frac{dI(t)}{dt}=\beta*S(t)*I(t)-\gamma*I(t) \tag{2}\] This equation means the change of number of infected individuals within unit time \(t\) is equal to the number of newly infected individuals \(\beta*S(t)*I(t)\) minus the number of recovered individuals \(\gamma*I(t)\). This equation relates to the above assumptions \((2)\) and \((3)\).

Third,

\[\frac{dR(t)}{dt}=\gamma*I(t) \tag{3}\] This equation means the change of number of recovered individualss within unit time \(t\) is proportional to the number of infected individuals within the unit time \(t\). This equation can be derived from assumption (3).

We put the equation \((1),(2),(3)\) together:

\[\left\{\begin{matrix} \frac{dS(t)}{dt}=&-\beta*S(t)*I(t)\\ \frac{dI(t)}{dt}=&\beta*S(t)*I(t)-\gamma*I(t)\\ \frac{dR(t)}{dt}=&\gamma*I(t) \end{matrix}\right. \] Next, we will analyse these three equations and show how to obtain the \(R_0\) and I will give an epidemic example as how to calculate the \(R_0\).

We add up the left side and right side of equations \((1),(2) ,(3)\), we get:

\[\frac{d(S(t)+I(t)+R(t))}{dt}=0\] This verified that \(S(t)+I(t)+R(t)\) is a constant,we set it as \(K\).

Next, we divide the equation \((2)\) by the equation \((1)\) we get

\[\frac{dI(t)}{dS(t)}=-1+\frac{\rho}{S(t)}\tag{4}\] where \(\rho=\frac{\gamma}{\beta}\),also note the variable in this ordinary differential equation \((4)\) is \(S(t)\), not \(t\).

For convenient, we also can write the equation \((4)\) as:

\[\frac{dI}{dS}=-1+\frac{\rho}{S}\tag{4'}\]

From the equation \((4)\) we can easily see when \(\rho=S(t)\) then \(\frac{dI(t)}{dS(t)}=0\), then the infection numbers \(I(t)\) will achieve the maximum value by the Fermat’s theorem (or Extreme Value Theorem),

Next, I will solve the ordinary differential equation \((4)\) and plot trajectories of the ODE to show the relationship between the \(I(t)\) and \(S(t)\).

The equation \((4)\) is just a first order differential equation, it is not difficult to solve this ODE, substitute \(\frac{dI}{dS}=I'\), we get \[I'=-1+\frac{\rho}{S} \tag{5}\] Integrate both sides of the equation \((5)\) we get:

\[I=\int (-1+\frac{\rho}{S})dS \tag{6}\]

this is because if \(f'(x)=g(x)\) then \(f(x)=\int g(x)dx\)

Evaluate the integral in the equation \((6)\) we get \[I=-S+\rho\log(S)+c_1 \tag{7}\] This function shows the relationship between the number of infected individuals and number of susceptible individuals with the parameter\(\rho (=\frac{\gamma}{\beta})\), here \(c_1\) is an arbitrary constant whose value can be determined by a boundary condition.

We set \(\rho=200\) and \(c_1\) at different values and plot the functions \((7)\) using the following R code.

rho<-200
C1=50
C2<-0
C3<--100
C4<--200
C5<--300
C6<--400
S<-seq(100,1000,by=10)
I<--S+rho*log(S)+C1
I2<--S+rho*log(S)+C2
I3<--S+rho*log(S)+C3
I4<--S+rho*log(S)+C4
I5<--S+rho*log(S)+C5
I6<--S+rho*log(S)+C6

plot(S,I,type='l',col=1)
lines(S,I2,type='l',col=2)
lines(S,I3,type='l',col=3)
lines(S,I4,type='l',col=4)
lines(S,I5,type='l',col=5)
lines(S,I6,type='l',col=6)
abline(v = 200,col='7',lty=2)

From these plots we can see when \(S=200\) i.e when \(S=\rho\), the number of infected individuals will reach to the maximum number. Note this plot cannot tell us at what time the number of infected individuals can reach to the maximum number since there was no time variable. Neither can we tell the total number of the population from the plot since \(R\) is not in the plot.

Now let us analyse the relationship between the number of infected individuals and time from the equation \((2)\) and we can get three scenarios:

\[\left\{\begin{matrix} \beta*S(t)*I(t)-\gamma*I(t)>0 \Leftrightarrow \frac{\beta}{\gamma}S(t)>1\\ \beta*S(t)*I(t)-\gamma*I(t)=0 \Leftrightarrow \frac{\beta}{\gamma}S(t)=1 &\text{ or } I(t)=0\\ \beta*S(t)*I(t)-\gamma*I(t)<0 \Leftrightarrow \frac{\beta}{\gamma}S(t)<1 \end{matrix}\right. \tag{7}\] Combine the equation \((2)\) and \((7)\) we can see when \(\frac{\beta}{\gamma}S(t)>1\), \(I(t)\),i.e the number of infected individuals, will be increasing with time, when \(\frac{\beta}{\gamma}S(t)=1\), \(I(t)\) will be the same as before, and when \(\frac{\beta}{\gamma}S(t)<1\),\(I(t)\) will be decreasing.

Now we define at time zero \(S(0)=S_0\) then \(R_0\) is defined as \[R_0=\frac{\beta}{\gamma}S_0 \tag{8}\]

When \(R_0>1\) the number of infected individuals will be increasing with time, the epidemic stars, when \(R_0=1\) the number of infected individuals will not change, and when \(R_0<1\) the number of infected individuals will be decreasing with time.

Notice, many people use the proportions of \(S(t)\), \(I(t)\) and \(R(t)\) in the SIR model, what they used were \(s(t)=\frac{S(t)}{N(t)}\),\(i(t)=\frac{I(t)}{N(t)}\) and \(r(t)=\frac{R(t)}{N(t)}\). Then at time zero, everyone (except the index case) is susceptible, therefore \(s_0=\frac{N_0-1}{N_0}\approx 1\) and

\[R_0=\frac{\beta}{\gamma}*s_0\approx\frac{\beta}{\gamma}\tag{9}\] There is no \(S_0\) in the formula, In fact, the \((8)\) and \((9)\) are the same, it depends whether the actual numbers were used or proportions were use in the SIR model.

Next we talk about the equilibrium state of an infectious disease, we say an infectious disease is in equilibrium whenever \(\frac{S(t)}{dt}=\frac{I(t)}{dt}=\frac{R(t)}{dt}=0\). We write the number of susceptible individuals, the number of infected individuals and the number of recovered individuals at the equilibrium stat as \(S(\infty), I(\infty)\) and \(R(\infty)\).

To calculate the \(R_0\), if we know \(\beta\) and \(\gamma\) then it is not difficult to do the calculation, however sometimes \(\beta\) and \(\gamma\) might not be easy to obtain. Another way to calculate \(R_0\) is to use serological data to obtain \(S_0\) and \(S_{\infty}\), and with the total number of population \(K\) we can derive the \(R_0\), next I will show how to do this.

To be continue…

***************************************************************************************************************************************************************************

Another example to discuss

parameters = c(mu=1/75,beta=0.0005,nu=1,N=5000,alpha=0)
state = c(S=4999,I=1,R=0)
SIR=function(t,state,parameters)
{
  with(as.list(c(state, parameters)),
       {
         dS = N*mu-beta*I*S - mu*S   #mu the nature rate of death,beta effective contact rate, assume birth rate equals to natural death rate
         dI = beta*I*S - (nu+alpha+mu)*I #nu recover rate,alpha additional disease related mortality
         dR = nu*I - mu*R
         list(c(dS, dI, dR))
       })
}

times<-seq(0,250,by=0.1)
require(deSolve)
res.scen1<-as.data.frame(ode(y=state,times=times,func=SIR,parms=parameters))

par(mfrow=c(1,3),cex.axis=1.3,cex.lab=1.5,lwd=3,las=1,mgp=c(3, 0.5, 0))

plot(res.scen1[,1],res.scen1[,2]/(res.scen1[,2]+res.scen1[,3]+res.scen1[,4]),xlab="time",ylab="proportion susceptible",type="l",lwd=2)
plot(res.scen1[,1],res.scen1[,3]/(res.scen1[,2]+res.scen1[,3]+res.scen1[,4]),xlab="time",ylab="proportion infected",type="l",lwd=2)
plot(res.scen1[,2],res.scen1[,3],xlab="number susceptible",ylab="number infected",type="l",lwd=2)

Calculate \(R_0\) and equilibrium values

times<-seq(0,1000,by=0.1)
require(deSolve)
res.scen1<-as.data.frame(ode(y=state,times=times,func=SIR,parms=parameters))


R0<-0.0005*5000/(1/75+1)
R0
## [1] 2.467105
S_inf<-1/(0.0005*5000/(1/75+1))*5000

S_inf
## [1] 2026.667
I_inf<-(1/75)/(0.0005*5000)*(R0-1)*5000

I_inf
## [1] 39.12281

Notes for time homogeneous SIR model

Here \(S\) and \(I\) are only related to age, i.e they are not affected by the calender time. Change in the infected class is given by

\[\frac{I(t)}{dt}=\lambda S(t)-(\nu+\mu)I(t) \tag{10}\]

This is a linear first order differential equation which has the following form:

\[\frac{{dy}}{{dt}} + p(t)y = g(t) \] The solution for this form of ODE is \[y(t) = \frac{{\int{{\mu(t)g(t)dt}} + c}}{{u (t)}}\tag{11}\] where \[u(t)=e^{\int p(t)dt}\] Note, it is \(u\) not \(\mu\) in the above equation.

Next, I will show at age \(a\)

\[I(a)=\frac{\lambda}{\lambda-\nu}N(0)l(a)[e^{-\nu a-}e^{-\lambda a}]\] ***************************************************************************************************************************************************************************

\[\frac{I(t)}{dt}=\lambda S(t)-(\nu+\mu)I(t)\Rightarrow \frac{I(t)}{dt}+(\nu+\mu)I(t)=\lambda S(t) \tag{12}\]

From \((12)\) we can see \[u(t)=e^{\int (\nu+\mu)dt}\tag{13}\]

and \[ g(t)=\lambda S(t)\] We write \[S(t)=N(0)e^{-(\lambda+\mu)t} \]

then, \[g(t)=\lambda N(0)e^{-(\lambda+\mu)t}\tag{14}\]

Therefore, the soultion for equation \((10)\) is

\[\begin{align*}I(t)&=\frac{\int e^{\int (\mu+\nu)dt}\lambda N(0)e^{-(\lambda+\mu)t}dt+c}{e^{\int (\mu+\nu)dt}}\\ &=\frac{\lambda N(0)\int e^{(\mu+\nu)t}e^{-(\lambda+\mu)t}dt+c}{e^{\int (\mu+\nu)dt}}\\ &=\frac{\lambda N(0)\int e^{(\nu-\lambda)t}dt+c}{e^{\int (\mu+\nu)dt}} \end{align*}\]

Since we only care age from \(0\) to \(a\), we obtain:

\[\begin{align*}I(a)&=\frac{\lambda N(0)\int_0^a e^{(\nu-\lambda)t}dt}{e^{\int_0^a (\mu+\nu)dt}}\\ &=\frac{\frac{\lambda}{\nu-\lambda}N(0)[e^{(\nu-\lambda)a}-1]}{e^{(\mu+\nu)a}}\\ &=\frac{\frac{\lambda}{\nu-\lambda}N(0)[e^{(\nu-\lambda)a}-1]}{e^{(\mu+\nu)a}} \times \frac{e^{-\mu a}}{e^{-\mu a}}\\ &=\frac{\frac{\lambda}{\nu-\lambda}N(0)l(a)[e^{(\nu-\lambda)a}-1]}{e^{\nu a}} \text { where } l(a)=e^{-\mu a}\\ &=\frac{\lambda}{\nu-\lambda}N(0)l(a)[e^{-\lambda a}-e^{-\nu a}]\\ &=\frac{\lambda}{\lambda-\nu}N(0)l(a)[e^{-\nu a}-e^{-\lambda a}] \end{align*}\] \(\square\)

The derivative of the above relationship is not so obvious, the key is to take care when we change from an indefinite integral to a definite integral.

***************************************************************************************************************************************************************************

Another example with different vaccination rates

SIR<-function(t,state,parameters)
{
  with(as.list(c(state, parameters)),
       {
         dS <- N*mu*(1-p)-beta*S*I - mu*S
         dI <- beta*S*I - nu*I - mu*I
         dR <- nu*I -mu*R+N*mu*p
         list(c(dS, dI, dR))
       }) 
}

### define parameters
parameters <- c(mu=1/75,beta=0.0005,nu=1)
state <- c(S=4999,I=1,R=0)
times<-seq(0,800,by=0.01)
N<-5000

require(deSolve)
p<-0
outp0 <- as.data.frame(ode(y=state,times=times,func=SIR,parms=parameters))
p<-0.2
outp02 <- as.data.frame(ode(y=state,times=times,func=SIR,parms=parameters))
p<-0.4
outp04 <- as.data.frame(ode(y=state,times=times,func=SIR,parms=parameters))

par(mfrow=c(1,2),lwd=2,las=1,cex.axis=0.8,cex.lab=0.8,mgp=c(2, 0.45, 0),
    mar=c(3.2, 2.8, 3.5, 1.5))

plot(times,outp0$S,type="l",main=" ", xlab="time", ylab="Number of susceptibles")
lines(times,outp02$S,lty=2,col='blue')
lines(times,outp04$S,lty=3,col='red')

plot(outp0$S,outp0$I,main=" ",type="l",xlab="Number of susceptibles",ylab="Number of infected")
lines(outp02$S,outp02$I,lty=2,col='blue')
lines(outp04$S,outp04$I,lty=3,col='red')