Using R to Simulate a Pandemic

Since the outbreak of the COVID-19 pandemic policy makers and health experts have repeatedly stressed the importance of social distancing. This article in the Washington Post has done so using a series of simulations. In this post, you will learn how can run similar simulations using nothing but base R. In the animation below you can see how these simulations work. Each of the colored dots represents a person that walks around in a two-dimensional space. Different colors indicate whether a person is healthy (gray), sick (red), immune (blue) or dead (black). Sick people can infect healthy people, if they get close to them. After a certain time, sick people will recover and become immune to the disease. However, each period some of the people that have been infected by the disease will die. The question now is, how will the disease spread and how many people are going to die?

At first everybody is healthy

In our simulations people will move around in a two-dimensional space and they will either be healthy, infected, immune or dead. Thus, the population can be described by a matrix of x and y coordinates and factor variable indicating the health status of each person. Moreover, in case someone is currently sick, we need to record how long it takes until this person recovers. We will assume that until they recover sick people can always infect more people. Here is a function that simulates \(n\) many healthy people in a space between (-1,-1) and (1,1):

# function to simulate people
simPeople <- function(n) {
  locs <- matrix(runif(2*n,-1,1), ncol=2)
  status <- rep("healthy",n) # patient status
  t_infec <- rep(NA,n)       # time until immunity
  return(list(locs=locs, status=status, t_infec=t_infec))
}

# example
simPeople(5)

## $locs
##            [,1]        [,2]
## [1,]  0.5061095  0.06445420
## [2,]  0.8456313 -0.86169444
## [3,] -0.7471830  0.04385693
## [4,]  0.5643100 -0.89288196
## [5,]  0.3543725  0.29459073
## 
## $status
## [1] "healthy" "healthy" "healthy" "healthy" "healthy"
## 
## $t_infec
## [1] NA NA NA NA NA

Now, we need a function to visualize this population. This can be done using base R’s plotfunction. We will use our own little color palette to indicate the health status of the people in our population: gray for healthy, red for currently infected, blue for immune, and black for dead.

# function to plot people
plotPeople <- function(people, record, plotLines=T) {
  
  # determine colors
  palette <- c("grey","red","deepskyblue","black")
  colors <- rep(palette[1],length(people$status))
  colors[which(people$status == "infected")] <- palette[2]
  colors[which(people$status == "immune")] <- palette[3]
  colors[which(people$status == "dead")] <- palette[4]
  
  # plot movement of people
  par(mar=c(1,1,1,1))
  plot(people$locs, bg=colors, pch=21, cex=1.5,
       ylim=c(-1,1), xlim=c(-1,1),
       yaxt="n", xaxt="n", 
       ylab="", xlab="")
}

# example
plotPeople(simPeople(100))

People move around

Of course people don’t stand still. We need to have a function that changes people’s location. When moving around, we will assume that a person’s path follows a random walk. In each period, their x and y coordinates will change by an amount \(\varepsilon\) which is drawn from a normal distribution with mean zero and variance \(\delta\). In our later simulations \(\delta\) will reflect how strongly people practice social distancing.

# function to move people
move <- function(people, delta) {
  
  # update location
  eps <- matrix(rnorm(2*nrow(people$locs),0,delta), ncol=2)
  eps[people$status == "dead",] <- c(0,0)
  people$locs <- people$locs + eps
  
  # keep people inside the square-shaped map
  people$locs[,1] <- people$locs[,1]+ifelse(people$locs[,1]<=-1,.1,0)
  people$locs[,2] <- people$locs[,2]+ifelse(people$locs[,2]<=-1,.1,0)
  people$locs[,1] <- people$locs[,1]+ifelse(people$locs[,1]>=1,-.1,0)
  people$locs[,2] <- people$locs[,2]+ifelse(people$locs[,2]>=1,-.1,0)
  
  # return
  return(people)
}

people <- simPeople(n=200)
for(i in 1:100) {
  plotPeople(people)
  people <- move(people, delta=.02)
  Sys.sleep(.05) # short pause
}

Patient zero

Now that everybody is happily moving around, it is time for us to introduce patient zero. We will assume that it takes 20 time periods until a patient recovers, i.e. is he or she is no longer infectious and instead immune. As people had been generated randomly, we will simply select the first person in our list as patient zero.

# patient zero
people$status[1] <- "infected" 
people$t_infec[1] <- 20

But how does the disease spread? Let’s assume that once a healthy person gets within a certain range of an infected person, the healthy person contracts the disease. Given a critical distance called “infecDist”, we can use a nested for-loop and the dist function to check whether any healthy person is close enough to any sick person to contract the disease. Moreover, we need to keep track of how much longer it takes until an infected person recovers and becomes immune. We do that by reducing “t_infec” by one unit in each period. Once “t_infec” reaches zero, the infected person recovers and becomes immune. Here is a function that implements this procedure:

# function to update current infections and trigger new infections
infect <- function(people, infecDist, recovTime) {
  
  # update status
  people$t_infec <- people$t_infec - 1
  people$status[people$t_infec == 0] <- "immune"
  people$t_infec[people$status == "immune"] <- NA
  
  # infect new people
  for(i in which(people$status == "infected")) {
    for(j in which(people$status == "healthy")) {
      if(dist(people$locs[c(i,j),]) < infecDist) {
        people$status[j] <- "infected"
        people$t_infec[j] <- recovTime
      }
    }
  }
  
  # return
  return(people)
}

Next, we need a function that determines for whom the disease ends deadly. We’ll assume that in each period that a person is sick, he or she dies with a certain probability. To avoid another for-loop we can use two functions. The first determines if a single infected person survives or not, the second function then simultaneously applies the first function to all infected people in the population.

# function to determine whether an individual patient survives
survive <- function(patient, p) sample(c(T,F),1,prob=c(1-p, p)) 

# function that determines who of the people in the population die
kill <- function(people, deathProb) {
  infected <- which(people$status == "infected")
  deaths <- which(!sapply(infected,survive,p=deathProb))
  deaths <- infected[deaths]
  people$status[deaths] <- "dead"
  people$t_infec[deaths] <- NA
  return(people)
}

# example
people <- simPeople(n=20)
people$status <- rep("infected",20)  # all people are infected
kill(people, deathProb = .5)$status  # chance of death = 50%

##  [1] "dead"     "dead"     "infected" "dead"     "dead"     "infected"
##  [7] "dead"     "dead"     "dead"     "infected" "dead"     "dead"    
## [13] "dead"     "infected" "infected" "infected" "dead"     "dead"    
## [19] "dead"     "infected"

Simulating our first pandemic

Now, we have the necessary functions to create people, to have them move around, infect each other and potentially die. We are ready to simulate our first pandemic.

# create population with patient zero
people <- simPeople(n=200)
people$status[1] <- "infected" 
people$t_infec[1] <- 20

# simulate pandemic
while(T) {
  plotPeople(people)
  Sys.sleep(0.05)
  if(!any(people$status == "infected")) break()
  people <- move(people, delta=0.05)
  people <- infect(people, infecDist=0.1, recovTime=20)
  people <- kill(people, deathProb=0.05)
}

Keeping track

So far we produced a bunch of images of dots moving around and changing their color. However, to actually understand what turns an infectious disease into a deadly pandemic, we need to keep track of how many people are currently infected, how many have died and how many have reached immunity. To do this, we’ll add a second plot next to the moving dots diagram. In this chart we will visualize the state of the population over time. We store the necessary information in a four column matrix called “record”, which we update using a simple function. Then we add a second diagram to our existing plot function which visualizes the record matrix.

# matrix to record the population's status
record <- matrix(NA,ncol=4,nrow=0)
colnames(record) <- c("healthy","infected","immune","dead")

# function to update the record
updateRecord <- function(people, record) {
  record <- rbind(record, rep(NA,4))
  record[nrow(record),1] <- length(which(people$status=="healthy"))
  record[nrow(record),2] <- length(which(people$status=="infected"))
  record[nrow(record),3] <- length(which(people$status=="immune"))
  record[nrow(record),4] <- length(which(people$status=="dead"))
  return(record)
}

# updated version of our plot function
plotPeople <- function(people, record) {
  
  par(mfrow=c(1,2))
  
  # determine colors
  palette <- c("grey","red","deepskyblue","black")
  colors <- rep(palette[1],length(people$status))
  colors[which(people$status == "infected")] <- palette[2]
  colors[which(people$status == "immune")] <- palette[3]
  colors[which(people$status == "dead")] <- palette[4]
  
  # plot movement of people
  plot(people$locs, bg=colors, pch=21,
       ylim=c(-1,1), xlim=c(-1,1),
       yaxt="n", xaxt="n", 
       ylab="", xlab="")
  
  # plot record (cumulative numbers)
  plot(record[,1], type="l", col=palette[1],
       ylim=c(0,length(people$status)),
       ylab="", xlab="")
  lines(record[,2], col=palette[2])
  lines(record[,3], col=palette[3])
  lines(record[,4], col=palette[4])
  legend("topright", cex=.75,
         legend = colnames(record), 
         lty=rep(1,4), col=palette)
  
  par(mfrow=c(1,1)) # back to default
  
}

An all-in-one function

Lastly, we wrap a single function around all our code. We then only need to call this one function to simulate the entire pandemic.

# function to run an entire simulation
runSim <- function(n_people  = 300, 
                   movement  = .05,
                   infecDist = .05,
                   deathProb = .02,
                   recovTime =  20
                   ) {
  
  # setup
  people <- simPeople(n=n_people)
  record <- matrix(NA,ncol=4,nrow=0)
  colnames(record) <- c("healthy","infected","immune","dead")
  
  # patient zero
  people$status[1] <- "infected" 
  people$t_infec[1] <- recovTime
  
  # run simulation
  while(T) {
    
    # update record and plot
    record <- updateRecord(people, record)
    plotPeople(people, record)
    Sys.sleep(.05)
    if(!any(people$status == "infected")) break()
    
    # move people, infect new people, kill infected people
    people <- move(people, movement)
    people <- infect(people, infecDist, recovTime)
    people <- kill(people, deathProb)
    
  }
  
  # return
  return(record[nrow(record),])
  
}

# demonstration
runSim()

Comparing different diseases

With these tools at hand we can now simulate different diseases and study the resulting epidemics. Here are a couple of examples:

# rapidly spreading disease
runSim(infecDist = .5)

# strong social distancing
runSim(movement = .02)

# high probability of death
runSim(deathProb = .1)

# long recovery time
runSim(recovTime = 50)