The aim of this tutorial is to introduce individual-based modelling for infectious diseases in R. The dynamics of the stochastic model we target here cannot be captured by ordinary differential equations (ODES). In this session, we will br to:
Pro tip! The best thing to do is to read each section and type (or copy and paste) the R commands (in grey boxes) into your own R session to check that you understand the code. Hands-on questions are indicated in red
The first step when creating a new R file is to specify a working
directory. You need to tell the R session where you want it to work. To
do this you use the command to “set the working directory” or
setwd(). You can label this working directory as
‘home’ , and have other locations stored for plotting or for
getting data. This location should be an existing folder on your
computer. For example
home <- "/Users/Elanda/R/ModellingSIR" ## on a OS X
setwd(home)
Random numbers in R (and other computer programs) are
derived from a random number generator, which can be
initiated with set.seed. Seeding the random stream is very
useful to obtain stochastic results that are reproducible. As
such, you can check whether model output has changed (or not) if you
continue programming.Random.seed is an integer vector,
containing the random number generator (RNG) state for random number
generation in R. It can be saved and restored, but should
not be altered by the user.RNGkind is a more friendly
interface to query or set the kind of RNG in
use.RNGversion can be used to set the random generators as
they were in an earlier R version (for
reproducibility).set.seed is the recommended way to specify
seeds. For example, set the random seed to ‘2020’ and sample 3 numbers
between 1 and 10, and sample 15 numbers with replacement.
set.seed(2020)
sample(x = 1:10,size = 3)
## [1] 7 6 8
sample(x = 1:10,size = 15,replace = T)
## [1] 1 1 4 10 6 1 8 8 10 2 6 2 3 2 8
If we repeat the same operation withou without the
set.seed we obtain different random numbers.
sample(x = 1:10,size = 3)
## [1] 8 4 2
sample(x = 1:10,size = 15,replace = T)
## [1] 7 4 2 4 5 4 6 2 10 6 8 2 1 6 3
In R, we can generate random numbers using for example
sample, runif,
rnorm,rbinom. Please have a look at:
?sample
?runif
?rnorm
?rbinom
?set.seed
The samplefunction takes a sample of the specified size
from the elements of x using either with or without replacement.
Example 1 Random Permutation:
sample(150:-50, replace = FALSE) Generates numbers between
the range specified.
Example 2 Bootstrap re-sampling: ( only length >1)
sample(x, replace = TRUE)
Example 3 1000 Bernoulli trials
sample(c(0,1), 1000, replace = TRUE)
The runif function for uniform distributions provide
information about the uniform distribution on the interval from
min to max. dunif gives the
density, punif gives the distribution function
qunif gives the quantile function and runif
generates random deviates.
Example 1
dunif(x, min = 0, max = 1, log = FALSE)
Example 2
punif(q, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
Example 3
qunif(p, min = 0, max = 1, lower.tail = TRUE, log.p = FALSE)
Example 4 runif(n, min = 0, max = 1)
The ?rnorm function generates the; Density, distribution
function, quantile function and random generation for the normal
distribution with mean equal to mean and standard deviation
equal to sd.The following: dnorm gives the
density, pnorm gives the distribution function,
qnorm gives the quantile function, and rnorm
generates random deviates. The length of the result is determined by
nfor rnorm, and is the maximum of the lengths
of the numerical arguments for the other functions. The numerical
arguments other than n are recycled to the length of the
result. Only the first elements of the logical arguments are used.For
sd = 0 this gives the limit as sd decreases to
0, a point mass at mu. sd < 0
is an error and returns NaN.
Example 1 :
dnorm(x, mean = 0, sd = 1, log = FALSE)
Example 2 :
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
Example 3 :
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
Example 4 : rnorm(n, mean = 0, sd = 1)
The rbinom function generates the Density, distribution
function, quantile function and random generation for the binomial
distribution with parameters size and prob. This is
conventionally interpreted as the number of ‘successes’ in size
trials.
Example 1 dbinom(x, size, prob, log = FALSE)
Example 2 pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
Example 3 qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
Example 4 rbinom(n, size, prob)
It is also possible to sample from a binomial distribution for different individuals at once.
# set number of samples, a probability vector and the number of trials to execute
num_samples <- 4
p_vector <- c(0,0.5,0.9,1)
num_trials <- 1
# sample from the binomial distribution, for each given probability
binom_sample <- rbinom(num_samples, num_trials, p_vector)
# if the sample is 1, we have a success
binom_sample
## [1] 0 1 0 1
binom_sample == 1
## [1] FALSE TRUE FALSE TRUE
For this modelling exercise; we will use a matrix to
store the population using one row per individual and one column per
person characteristic. We will use a special matrix format, called
data.frame, in which you can easily access columns using
the column names. The following code illustrates the usefulness of the
data.frame and how to access and modify elements.
Now, let us create a data.frame for a population of
300 individuals with a health state together with a
x- and y coordinate. During the setup, we
randomly locate each person in a 10x10 area.
pop_size <- 300 # Population size
area_size <- 10 # 10x10 square
xy_coordinates <- seq(0,area_size,0.01) # get sequence per 0.01
# setup population: one row per individual, one column per attribute, row index = id
pop_data <- data.frame(health = rep('S', length=pop_size), # all individuals start in state 'S' (= susceptible)
x_coord = sample(x = xy_coordinates, size = pop_size,replace = T), # sample random x coordinate
y_coord = sample(x = xy_coordinates, size = pop_size,replace = T), # sample random y coordinate
stringsAsFactors = FALSE) # option to treat characters as 'strings' instead of 'factors'
dim(pop_data) # check the dimensions of the dataframe.
## [1] 300 3
head(pop_data) # show the first few rows
## health x_coord y_coord
## 1 S 9.40 0.37
## 2 S 6.82 0.26
## 3 S 7.28 6.86
## 4 S 3.24 5.22
## 5 S 6.18 2.23
## 6 S 8.25 3.85
We now want to simulate a random walk process with a
given maximum velocity. The direction and distance should be randomized.
As such, we set a maximum distance and create a function to apply the
random walk during one time step. Random walk, in probability theory, a
process for determining the probable location of a point subject to
random motions, given the probabilities (the same at each step) of
moving some distance in some direction.
Random walks are an example of Markov processes, in
which future behaviour is independent of past history. It is sequence of
possibly dependent random variables
(x1, x2, x3,...)—identified by increasing values of a
parameter, commonly time—with the property that any prediction of the
next value of the sequence (xn), knowing the preceding
states (x1, x2, …, xn − 1), may be based on the last state
(xn − 1) alone. That is, the future value of such a
variable is independent of its past history.
# set max velocity = maximum distance per time step
max_velocity <- 1 # squares per day
random_walk <- function(pop_data, max_velocity){
# step 1: move each person at random along the x and y axis
pop_data$x_coord <- pop_data$x_coord + runif(n = pop_size,min = -max_velocity, max = max_velocity)
pop_data$y_coord <- pop_data$y_coord + runif(n = pop_size,min = -max_velocity, max = max_velocity)
# step 2: if an individual crosses a model boundary: set back at the border
pop_data$x_coord[pop_data$x_coord > area_size] <- area_size
pop_data$y_coord[pop_data$y_coord > area_size] <- area_size
pop_data$x_coord[pop_data$x_coord < 0] <- 0
pop_data$y_coord[pop_data$y_coord < 0] <- 0
# return the adjusted population data.frame
return(pop_data)
}
If you want to visualize the random walk, you can create a plot before and after applying the function. Please note that it is possible to highlight a one individual using points:
# plot the area and all indivdiuals, using color 1 (= black)
plot(x = pop_data$x_coord,
y = pop_data$y_coord,
col = 1)
# highlight one individual, using color 2 (= red)
points(x = pop_data$x_coord[1],
y = pop_data$y_coord[1],
pch = 20, # other symbol
col = 4) # The original code was point() but its giving a dev.off() error
# highlight the selected individual again, using color 3 (= green)
# plot(x = pop_data$x_coord[1],
#y = pop_data$y_coord[1],
#pch = 20, # other symbol
#col = 3)
# update population
pop_data <- random_walk(pop_data,max_velocity)
head(pop_data)
## health x_coord y_coord
## 1 S 8.987709 0.0000000
## 2 S 6.149457 0.2199252
## 3 S 7.584245 6.3804491
## 4 S 2.912578 4.3621126
## 5 S 6.326365 3.1230200
## 6 S 7.863375 3.7542047
Start with the introduction of infected cases into the population (seeding):
# set number of infected cases to start (or seed) the simulation
num_infected_seeds <- 10
# or select all individuals from row 1 till the specified number of cases.
pop_data$health[1:num_infected_seeds] <- 'I'
# it is also possible to sample random individuals to change their health state
# pop_data$health[sample(1:pop_size,size = num_infected_seeds)] <- 'I'
We can visualize the infected individuals like the following plot? (note: using pch=20 gives you full dots)
plot(x = pop_data$x_coord,
y = pop_data$y_coord,
col = 4,)
points(x = pop_data$x_coord[pop_data$health=="I"],
y = pop_data$y_coord[pop_data$health=="I"],
col = 3,
pch=20)
legend('topright',
c('Infected',
'Susceptible'),
col = c(3,4),
pch = c(20,20))
### Transmission dynamics
To specify a transmission event upon contact, we define the transmission probability when two individuals meet and perform a Bernoulli experiment for each social contact between a infected and susceptible individual.
# disease parameters
transmission_prob <- 0.4 # transmission probability per social contact
# identify all infected individuals
boolean_infected <- pop_data$health == 'I' # = vector with Booleans (TRUE/FALSE)
ind_infected <- which(boolean_infected) # = vector with row indices
# get indices of infected invididuals
ind_infected
## [1] 1 2 3 4 5 6 7 8 9 10
num_infected <- length(ind_infected) # = single value: count data
num_infected
## [1] 10
# we specified the eligible contacts above
# eligible_contacts_matrix
# evaluate the contacts and transmission events for person 1, starting from the contact probability
p_contact_vector <- get_contact_probability(eligible_contacts_matrix,1)
# exclude contacts with non-susceptible individuals
p_contact_vector[pop_data$health != 'S'] <- 0
# sample from binomial distribution, given the contact and transmission probability
binom_infection_vector <- rbinom(pop_size, size = 1, prob = p_contact_vector * transmission_prob)
# update population matrix with new infections, if binomial sample == 1
pop_data$health[binom_infection_vector == 1] <- 'I'
# View(pop_data)
We can now visualize the newly infected individuals. (note: you might have different results since we are working with stochastic processes)
Each time step, we have to account for infected individual that recover. We can handle this using a recovery probability, based on an average number of days infected, and binomial sample.
# set disease parameter: from days infected to recovery rate and a probability
num_days_infected <- 3
recovery_rate <- 1/num_days_infected
recovery_probability <- 1-exp(-recovery_rate)
# we identified the infected individuals above
##boolean_infected <- pop_data$health == 'I'
# get vector with a probability to recover for each individual
#(if not infected, the Boolean is FALSE (=0), the probability is 0)
p_recover_vect <- boolean_infected * recovery_probability
# sample individuals that recover
binom_recover_vector <- rbinom(pop_size, size = 1, prob = p_recover_vect)
# update population data
pop_data$health[binom_recover_vector == 1] <- 'R'
We can now visualize the final health states.
plot(x = pop_data$x_coord[pop_data$health=="S"],
y = pop_data$y_coord[pop_data$health=="S"],
col = 2,)
points(x = pop_data$x_coord[pop_data$health=="I"],
y = pop_data$y_coord[pop_data$health=="I"],
col = 3,
pch=20)
points(x = pop_data$x_coord[pop_data$health=="R"],
y = pop_data$y_coord[pop_data$health=="R"],
col = 4,
pch=20)
legend('topright',
c('Susceptible','Infected','Recovered'),
col = c(2,3,4),
pch = c(20,20,20))
#### Step 4 Add all parts into a loop…
We coded the population, random movement, social contacts, transmission and recovery. Now it is time to combine all these elements into a loop to iterate over different days.
# use iterator 'day_i' to process day 1,2,3 and 4
for(day_i in 1:4){
# print the day index
print(day_i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
# this is the same as
all_days <- 1:10
for(day_i in all_days){
# print the day index
print(day_i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
So, specify the number of days and run over each day using a
for-loop, and compile the full transmission model. To keep track of the
health states over time, you can use a matrix (population size x number
of days) and store pop_data$health after each time
step.
# set number of days
num_days <- 20
# create matrix to log the population health states over time
pop_data_log <- matrix(NA,nrow=pop_size,ncol=num_days)
for(day_i in 1:num_days){
# update location
pop_data <- random_walk(pop_data,max_velocity)
# identify all infected individuals
boolean_infected <- pop_data$health == 'I' # = vector with Booleans TRUE/FALSE
ind_infected <- which(boolean_infected) # = vector with indices
num_infected <- length(ind_infected) # = single value: count data
# evaluate the transmission events for each infected individual
for(infected_i in ind_infected){
p_contact_vector <- get_contact_probability(eligible_contacts_matrix,infected_i)
# exclude contacts with non-susceptible individuals
p_contact_vector[pop_data$health != 'S'] <- 0
# sample from binomial distribution, given the contact and transmission probability
binom_infection_vector <- rbinom(pop_size, size = 1, prob = p_contact_vector * transmission_prob)
# update population matrix with new infections, if binomial sample == 1
pop_data$health[binom_infection_vector == 1] <- 'I'
}
# sample individuals that recover (from the originally infected individuals only!)
p_recover_vect <- boolean_infected * recovery_probability
binom_recover_vector <- rbinom(pop_size, size = 1, prob = p_recover_vect)
pop_data$health[binom_recover_vector == 1] <- 'R'
# log health states
pop_data_log[,day_i] <- pop_data$health
}
Plot the health states over time:
# count the number of S, I or R health states per column (= per time step)
num_susceptible_time <- colSums(pop_data_log == 'S')
num_infected_time <- colSums(pop_data_log == 'I')
num_recovered_time <- colSums(pop_data_log == 'R')
# plot over time
plot(num_susceptible_time,
col=1,lwd=2,type='l',
ylim=c(0,pop_size),
xlab='Time',
ylab='Individuals')
lines(num_infected_time,col=2,lwd=2)
lines(num_recovered_time,col=3,lwd=2)
legend('top',
c('S','I','R'),
col=1:3,
ncol=3,
lwd=2)
### Extending the SIR model
If you have time, you can now continue to build more complex versions of the SIR model to match the infectious disease you are interested in. For example, you could include a latent (or “Exposed”) stage or extend the model to include waning immunity, vaccination, maternal immunity, etc.
Social Contacts (Based on proximity)
We assume that susceptible individuals can acquire an infection if they are close to an infected individual. We define proximity here with a maximum distance in terms of
max_contact_distance. Probability for a social contact, depends on the number people that are within the specified range and the population-based average number of social contacts individuals have per time step (=per day).Let’s visualize this:
TRUEas1andFALSEas0.Does the
contact_probabilityfor individual 1 corresponds with the number of potential contacts in the plot above? What is the contact probability with each of them?