Introduction.

The present document has been prepared to provide a brief introduction to infectious disease modelling using free and open source solutions such as “R”. The document is not a complete overview of the infectious disease modelling aspects but provides a jumpstart for interested readers to perform mathematical modelling using R. We have created this document as a prelude to the three hour Faculty Development Program (FDP) at the School of Public Health (SPH) of the Kalinga Institute for Industrial Technology (KIIT), Bhubaneswar, India on “Modelling Infectious Diseases using the principles of reproducible research.”

Loading library

R packages/libraries are a collection of functions, complied codes and sample datasets. They are stored under a directory called “library” in the R environment. By default, R installs a set of packages during installation. For additional functions/codes, we need to explicitly load the additional packages. We shall require deSolve package for performing differential equations, tidyverse package for data manipulation and analysis and DiagrammeR for making infographics.

library(deSolve)
library(tidyverse)
library(DiagrammeR)

Framework for mathematical modeling

It is considered as a good practice to draw the model by hand/ by using any computer software for conceptualising the model. Following steps can be considered as minimum framework/ steps required for modelling:-

  1. Evidence synthesis
  2. Create model input using parameters
  3. Create a model using input values
  4. Return a dataframe as output using model equations
  5. Exploratory Data Analysis of the output

Illustrative example: Infected Cohort

Let us assume a situation wherein we have a closed cohort of 1000 individuals who are infected as well as infective. The duration of infectiousness will remain for 07 days. Once recovered, they remain immune throughout their lifespan.

Understanding the model

grViz("digraph flowchart {
      graph [layout = dot,
       rankdir = LR]
      node [fontname = Helvetica, shape = rectangle,]
      tab1 [label = '@@1']
      tab2 [label = '@@2']
      # edge definitions with the node IDs
      tab1 -> tab2;
      
      }

      [1]: 'Infected'
      [2]: 'Recovered'
      ")

1.Evidence synthesis

As a pre-requisite to model building, we need to undertake evidence synthesis exercise to extract required parameters from the literature or by conducting a study.

initial_number_infected <- 1000
initial_number_recovered <- 0 
Duration_of_infectiousness <- 7 
recovery_rate <- 1/Duration_of_infectiousness
follow_up_duration <- 30

2. Create model input using parameters

Once parameters are obtained, they are placed into three major parameter inputs viz initial values for each compartment, transition parameters from one compartment to another, and the time stamps for which modelling is required.

initial_state_values <- c(I = initial_number_infected, 
                          R = initial_number_recovered) 

parameters <- c(gamma = recovery_rate) 
times <- seq(from = 0, to = follow_up_duration, by = 1) 

3. Create a model function using input values

The model is created from the above mentioned parameters by specifying differential equations. In R, a function is an object so that the researcher is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions. For performing differntial equations, we create a function with compartment as well as transition parameter at varied timestamps.

cohort_model <- function(time, state, parameters) {     
   with(as.list(c(state, parameters)), {  
   dI <- -gamma * I
   dR <- gamma * I
   return(list(c(dI, dR)))                             
   })
}

4. Return a dataframe as output using model equations

A data frame is constructed by application of the differential model. Herein, Odinary Differential Equations (ode) function takes initial compartment values, timestamps, parameters and the model function as arguments.

output <- as.data.frame(ode(y = initial_state_values, 
                            times = times, 
                            func = cohort_model,
                            parms = parameters))

5. Exploratory Data Analysis of the output

To understand the output let us first see the dimensions of our dataframe

dim(output)
## [1] 31  3

To look at the first 5 values of the data

head(output,5)

Since, more than one column is representing the state of the person (Infected or recovered), we need to transform the data into tidy format. A tidy data format is said to present when each column represents one variable, each row has one observation and each cell has one value.

dat <- output %>% pivot_longer(cols = c(I,R),
                        names_to = "state",
                        values_to = "value")
head(dat,5)                      

Visualisation is a recommended exploratory data analysis method. Let us see how infected and recovered compartments were changing in the model.

dat %>% ggplot(aes(x = time,
                   y = value,
                   color = state,
                   group = state))+
  geom_line()+
  xlab("Time(Days)")+
  ylab("Number of persons")+
  labs(title = "Changes in Infected and Recovered individuals")

Increasing complexity

To demonstrate increasing complexity, let us now develop a model for a disease wherein three compartments viz susceptible, infected, and recovered are present. There are no births (inflows) or deaths (outflows) in the model. Further, all those who are infected are also infectious and once recovered, immunity is long lasting. The basic assumptions of compartment models such as homogenity among others are also applicable.

Visualisation of the model

grViz("digraph flowchart {
      graph [layout = dot,
       rankdir = LR]
      node [fontname = Helvetica, shape = rectangle,]
      tab1 [label = '@@1']
      tab2 [label = '@@2']
      tab3 [label = '@@3'] 
      # edge definitions with the node IDs
      tab1 -> tab2;
      tab2 -> tab3;
      }

      [1]: 'Susceptible'
      [2]: 'Infected'
      [3]: 'Recovered'
      ")

Adding changes in the IR model to develop SIR model

## 1. Evidence synthesis
initial_number_susceptible <- 10000000 # Adding susceptibles to IR model
initial_number_infected <- 1 # Considering one infected at start time
initial_number_recovered <- 0 
Duration_of_infectiousness <- 7 
recovery_rate <- 1/7
Reproduction_number <- 2.28 #R0== beta/gamma,
infection_rate <- Reproduction_number*(1/Duration_of_infectiousness) #beta
follow_up_duration <- 300
## 2. Create model input using parameters
initial_state_values <- c(S = initial_number_susceptible, # Adding susceptibles
                          I = initial_number_infected, 
                          R = initial_number_recovered) 

parameters <- c(beta = infection_rate, # Adding beta parameter 
                gamma = recovery_rate) 

times <- seq(from = 0, to = follow_up_duration, by = 1) 

## 3. Create a model using input values

cohort_model <- function(time, state, parameters) {     
   with(as.list(c(state, parameters)), { 
     N <- S+I+R # Total population
     lambda <- beta * I/N #Calculating lambda as function of beta
     dS <- -lambda * S               
     dI <- lambda * S - gamma * I    
     dR <- gamma * I   
   return(list(c(dS, dI, dR)))                             
   })
}
## 4. Return a dataframe as output using model equations
output <- as.data.frame(ode(y = initial_state_values, 
                            times = times, 
                            func = cohort_model,
                            parms = parameters))
## 5. Exploratory Data Analysis (Visualisation) of the output
dat <- output %>% pivot_longer(cols = c(S,I,R),
                        names_to = "state",
                        values_to = "value")
dat %>% ggplot(aes(x = time,
                   y = value,
                   color = state,
                   group = state))+
  geom_line()+
  xlab("Time(Days)")+
  ylab("Number of persons")+
  labs(title = "Changes in Suceptible, Infected and Recovered")

Increasing complexities: Addition of interventions.

Let us now assume that we have a vaccine with 100% effectiveness against aquiring infection. Will the epidemic be averted by vaccinating 10% population? What other changes do you expect in the epidemic?

## 1. Evidence synthesis
prop_vaccinated <- 0.1
initial_number_susceptible <- (1-prop_vaccinated)*10000000 
initial_number_infected <- 1 
initial_number_recovered <- prop_vaccinated*10000000
Duration_of_infectiousness <- 7 
recovery_rate <- 1/7
Reproduction_number <- 2.28 #R0== beta/gamma,
infection_rate <- Reproduction_number*(1/Duration_of_infectiousness) #beta
follow_up_duration <- 300
## 2. Create model input using parameters
initial_state_values <- c(S = initial_number_susceptible, 
                          I = initial_number_infected, 
                          R = initial_number_recovered) 

parameters <- c(beta = infection_rate,  
                gamma = recovery_rate) 

times <- seq(from = 0, to = follow_up_duration, by = 1) 

## 3. Create a model using input values

cohort_model <- function(time, state, parameters) {     
   with(as.list(c(state, parameters)), { 
     N <- S+I+R # Total population
     lambda <- beta * I/N #Calculating lambda as function of beta
     dS <- -lambda * S               
     dI <- lambda * S - gamma * I    
     dR <- gamma * I   
   return(list(c(dS, dI, dR)))                             
   })
}
## 4. Return a dataframe as output using model equations
output <- as.data.frame(ode(y = initial_state_values, 
                            times = times, 
                            func = cohort_model,
                            parms = parameters))
## 5. Exploratory Data Analysis (Visualisation) of the output
dat <- output %>% pivot_longer(cols = c(S,I,R),
                        names_to = "state",
                        values_to = "value")
dat %>% ggplot(aes(x = time,
                   y = value,
                   color = state,
                   group = state))+
  geom_line()+
  xlab("Time(Days)")+
  ylab("Number of persons")+
  labs(title = "Changes in Suceptible, Infected and Recovered at 10% vaccine coverage")

Will the epidemic be averted by increasing vaccination coverage? What is the critical vaccination threshold?

Critical Vaccination Threshold is defined as minimum percentage of population which needs to be vaccinated to prevent occurence of epidemic. Going manually by changing the vaccination coverage is a start point in understanding the concept.

Modelling for 20% coverage:

## 1. Evidence synthesis
prop_vaccinated <- 0.2
initial_number_susceptible <- (1-prop_vaccinated)*10000000 
initial_number_infected <- 1 
initial_number_recovered <- prop_vaccinated*10000000
Duration_of_infectiousness <- 7 
recovery_rate <- 1/7
Reproduction_number <- 2.28 #R0== beta/gamma,
infection_rate <- Reproduction_number*(1/Duration_of_infectiousness) #beta
follow_up_duration <- 300
## 2. Create model input using parameters
initial_state_values <- c(S = initial_number_susceptible, # Adding susceptibles
                          I = initial_number_infected, 
                          R = initial_number_recovered) 

parameters <- c(beta = infection_rate, # Adding beta parameter 
                gamma = recovery_rate) 

times <- seq(from = 0, to = follow_up_duration, by = 1) 

## 3. Create a model using input values

cohort_model <- function(time, state, parameters) {     
   with(as.list(c(state, parameters)), { 
     N <- S+I+R # Total population
     lambda <- beta * I/N #Calculating lambda as function of beta
     dS <- -lambda * S               
     dI <- lambda * S - gamma * I    
     dR <- gamma * I   
   return(list(c(dS, dI, dR)))                             
   })
}
## 4. Return a dataframe as output using model equations
output <- as.data.frame(ode(y = initial_state_values, 
                            times = times, 
                            func = cohort_model,
                            parms = parameters))
## 5. Exploratory Data Analysis (Visualisation) of the output
dat <- output %>% pivot_longer(cols = c(S,I,R),
                        names_to = "state",
                        values_to = "value")
dat %>% ggplot(aes(x = time,
                   y = value,
                   color = state,
                   group = state))+
  geom_line()+
  xlab("Time(Days)")+
  ylab("Number of persons")+
  labs(title = "Changes in Suceptible, Infected and Recovered at 20% vaccine coverage")

Conclusion.

To conclude, the complexity of the models will increase further as more parameters and compartments are added. Incorporating age structure will introduce compartments and processes for each age group. Contact rates between age groups and gender may vary calling for further compartmentalisation.Vector borne diseases shall include separate but interacting compartments between and within host and vectors. However, the basic priciples/ framework shall remain the same. Also, from the existing data, caliberation of parameters such as reproduction number can be carried out using Maximum Log-Likelihood and other methods. Do we never have outbreaks when reproduction number is less than one (think stochastically)..and the journey continues.. Enjoy modelling! Thank You!!