The present document has been prepared to provide a brief introduction to infectious disease modelling using free and open source solutions such as “R”. The document is not a complete overview of the infectious disease modelling aspects but provides a jumpstart for interested readers to perform mathematical modelling using R. We have created this document as a prelude to the three hour Faculty Development Program (FDP) at the School of Public Health (SPH) of the Kalinga Institute for Industrial Technology (KIIT), Bhubaneswar, India on “Modelling Infectious Diseases using the principles of reproducible research.”
R packages/libraries are a collection of functions, complied codes and sample datasets. They are stored under a directory called “library” in the R environment. By default, R installs a set of packages during installation. For additional functions/codes, we need to explicitly load the additional packages. We shall require deSolve package for performing differential equations, tidyverse package for data manipulation and analysis and DiagrammeR for making infographics.
library(deSolve)
library(tidyverse)
library(DiagrammeR)
It is considered as a good practice to draw the model by hand/ by using any computer software for conceptualising the model. Following steps can be considered as minimum framework/ steps required for modelling:-
Let us assume a situation wherein we have a closed cohort of 1000 individuals who are infected as well as infective. The duration of infectiousness will remain for 07 days. Once recovered, they remain immune throughout their lifespan.
grViz("digraph flowchart {
graph [layout = dot,
rankdir = LR]
node [fontname = Helvetica, shape = rectangle,]
tab1 [label = '@@1']
tab2 [label = '@@2']
# edge definitions with the node IDs
tab1 -> tab2;
}
[1]: 'Infected'
[2]: 'Recovered'
")
As a pre-requisite to model building, we need to undertake evidence synthesis exercise to extract required parameters from the literature or by conducting a study.
initial_number_infected <- 1000
initial_number_recovered <- 0
Duration_of_infectiousness <- 7
recovery_rate <- 1/Duration_of_infectiousness
follow_up_duration <- 30
Once parameters are obtained, they are placed into three major parameter inputs viz initial values for each compartment, transition parameters from one compartment to another, and the time stamps for which modelling is required.
initial_state_values <- c(I = initial_number_infected,
R = initial_number_recovered)
parameters <- c(gamma = recovery_rate)
times <- seq(from = 0, to = follow_up_duration, by = 1)
The model is created from the above mentioned parameters by specifying differential equations. In R, a function is an object so that the researcher is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions. For performing differntial equations, we create a function with compartment as well as transition parameter at varied timestamps.
cohort_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
dI <- -gamma * I
dR <- gamma * I
return(list(c(dI, dR)))
})
}
A data frame is constructed by application of the differential model. Herein, Odinary Differential Equations (ode) function takes initial compartment values, timestamps, parameters and the model function as arguments.
output <- as.data.frame(ode(y = initial_state_values,
times = times,
func = cohort_model,
parms = parameters))
To understand the output let us first see the dimensions of our dataframe
dim(output)
## [1] 31 3
To look at the first 5 values of the data
head(output,5)
Since, more than one column is representing the state of the person (Infected or recovered), we need to transform the data into tidy format. A tidy data format is said to present when each column represents one variable, each row has one observation and each cell has one value.
dat <- output %>% pivot_longer(cols = c(I,R),
names_to = "state",
values_to = "value")
head(dat,5)
Visualisation is a recommended exploratory data analysis method. Let us see how infected and recovered compartments were changing in the model.
dat %>% ggplot(aes(x = time,
y = value,
color = state,
group = state))+
geom_line()+
xlab("Time(Days)")+
ylab("Number of persons")+
labs(title = "Changes in Infected and Recovered individuals")
To demonstrate increasing complexity, let us now develop a model for a disease wherein three compartments viz susceptible, infected, and recovered are present. There are no births (inflows) or deaths (outflows) in the model. Further, all those who are infected are also infectious and once recovered, immunity is long lasting. The basic assumptions of compartment models such as homogenity among others are also applicable.
grViz("digraph flowchart {
graph [layout = dot,
rankdir = LR]
node [fontname = Helvetica, shape = rectangle,]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab3 [label = '@@3']
# edge definitions with the node IDs
tab1 -> tab2;
tab2 -> tab3;
}
[1]: 'Susceptible'
[2]: 'Infected'
[3]: 'Recovered'
")
## 1. Evidence synthesis
initial_number_susceptible <- 10000000 # Adding susceptibles to IR model
initial_number_infected <- 1 # Considering one infected at start time
initial_number_recovered <- 0
Duration_of_infectiousness <- 7
recovery_rate <- 1/7
Reproduction_number <- 2.28 #R0== beta/gamma,
infection_rate <- Reproduction_number*(1/Duration_of_infectiousness) #beta
follow_up_duration <- 300
## 2. Create model input using parameters
initial_state_values <- c(S = initial_number_susceptible, # Adding susceptibles
I = initial_number_infected,
R = initial_number_recovered)
parameters <- c(beta = infection_rate, # Adding beta parameter
gamma = recovery_rate)
times <- seq(from = 0, to = follow_up_duration, by = 1)
## 3. Create a model using input values
cohort_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
N <- S+I+R # Total population
lambda <- beta * I/N #Calculating lambda as function of beta
dS <- -lambda * S
dI <- lambda * S - gamma * I
dR <- gamma * I
return(list(c(dS, dI, dR)))
})
}
## 4. Return a dataframe as output using model equations
output <- as.data.frame(ode(y = initial_state_values,
times = times,
func = cohort_model,
parms = parameters))
## 5. Exploratory Data Analysis (Visualisation) of the output
dat <- output %>% pivot_longer(cols = c(S,I,R),
names_to = "state",
values_to = "value")
dat %>% ggplot(aes(x = time,
y = value,
color = state,
group = state))+
geom_line()+
xlab("Time(Days)")+
ylab("Number of persons")+
labs(title = "Changes in Suceptible, Infected and Recovered")
Let us now assume that we have a vaccine with 100% effectiveness against aquiring infection. Will the epidemic be averted by vaccinating 10% population? What other changes do you expect in the epidemic?
## 1. Evidence synthesis
prop_vaccinated <- 0.1
initial_number_susceptible <- (1-prop_vaccinated)*10000000
initial_number_infected <- 1
initial_number_recovered <- prop_vaccinated*10000000
Duration_of_infectiousness <- 7
recovery_rate <- 1/7
Reproduction_number <- 2.28 #R0== beta/gamma,
infection_rate <- Reproduction_number*(1/Duration_of_infectiousness) #beta
follow_up_duration <- 300
## 2. Create model input using parameters
initial_state_values <- c(S = initial_number_susceptible,
I = initial_number_infected,
R = initial_number_recovered)
parameters <- c(beta = infection_rate,
gamma = recovery_rate)
times <- seq(from = 0, to = follow_up_duration, by = 1)
## 3. Create a model using input values
cohort_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
N <- S+I+R # Total population
lambda <- beta * I/N #Calculating lambda as function of beta
dS <- -lambda * S
dI <- lambda * S - gamma * I
dR <- gamma * I
return(list(c(dS, dI, dR)))
})
}
## 4. Return a dataframe as output using model equations
output <- as.data.frame(ode(y = initial_state_values,
times = times,
func = cohort_model,
parms = parameters))
## 5. Exploratory Data Analysis (Visualisation) of the output
dat <- output %>% pivot_longer(cols = c(S,I,R),
names_to = "state",
values_to = "value")
dat %>% ggplot(aes(x = time,
y = value,
color = state,
group = state))+
geom_line()+
xlab("Time(Days)")+
ylab("Number of persons")+
labs(title = "Changes in Suceptible, Infected and Recovered at 10% vaccine coverage")
Critical Vaccination Threshold is defined as minimum percentage of population which needs to be vaccinated to prevent occurence of epidemic. Going manually by changing the vaccination coverage is a start point in understanding the concept.
Modelling for 20% coverage:
## 1. Evidence synthesis
prop_vaccinated <- 0.2
initial_number_susceptible <- (1-prop_vaccinated)*10000000
initial_number_infected <- 1
initial_number_recovered <- prop_vaccinated*10000000
Duration_of_infectiousness <- 7
recovery_rate <- 1/7
Reproduction_number <- 2.28 #R0== beta/gamma,
infection_rate <- Reproduction_number*(1/Duration_of_infectiousness) #beta
follow_up_duration <- 300
## 2. Create model input using parameters
initial_state_values <- c(S = initial_number_susceptible, # Adding susceptibles
I = initial_number_infected,
R = initial_number_recovered)
parameters <- c(beta = infection_rate, # Adding beta parameter
gamma = recovery_rate)
times <- seq(from = 0, to = follow_up_duration, by = 1)
## 3. Create a model using input values
cohort_model <- function(time, state, parameters) {
with(as.list(c(state, parameters)), {
N <- S+I+R # Total population
lambda <- beta * I/N #Calculating lambda as function of beta
dS <- -lambda * S
dI <- lambda * S - gamma * I
dR <- gamma * I
return(list(c(dS, dI, dR)))
})
}
## 4. Return a dataframe as output using model equations
output <- as.data.frame(ode(y = initial_state_values,
times = times,
func = cohort_model,
parms = parameters))
## 5. Exploratory Data Analysis (Visualisation) of the output
dat <- output %>% pivot_longer(cols = c(S,I,R),
names_to = "state",
values_to = "value")
dat %>% ggplot(aes(x = time,
y = value,
color = state,
group = state))+
geom_line()+
xlab("Time(Days)")+
ylab("Number of persons")+
labs(title = "Changes in Suceptible, Infected and Recovered at 20% vaccine coverage")
To conclude, the complexity of the models will increase further as more parameters and compartments are added. Incorporating age structure will introduce compartments and processes for each age group. Contact rates between age groups and gender may vary calling for further compartmentalisation.Vector borne diseases shall include separate but interacting compartments between and within host and vectors. However, the basic priciples/ framework shall remain the same. Also, from the existing data, caliberation of parameters such as reproduction number can be carried out using Maximum Log-Likelihood and other methods. Do we never have outbreaks when reproduction number is less than one (think stochastically)..and the journey continues.. Enjoy modelling! Thank You!!