1 introduction

*In this etivity, you will code your first simple model and use it to answer a research question. To get used to the format for models in R, we’ll start by giving you some parts of the code, and you need to fill in the missing numbers or code blocks by replacing the #YOUR CODE# placeholders. At the end of each etivity, we’ll provide a solution file that contains the full code, annotated with comments. Make sure you understand each line, and soon you’ll be able to write your own model from scratch!

Your task is to find out how long it takes for a cohort of infected people to recover. As you saw in the video, to answer this question you need to keep track of 2 populations: those that are infected (compartment 𝐼 ), and those that have recovered (compartment 𝑅 ). Infected people recover at a rate 𝛾 (gamma). The differential equations describing this are:*

Load the packages which you need for this etivity, by running the following cell:

2 Run this cell as it is

library(deSolve)   # package to solve the model
library(reshape2)  # package to change the shape of the model output
library(ggplot2)   # package for plotting

To start, it is useful to code what we know about the situation we want to model. We are looking at a cohort of 10^6 currently infected people, and no one has recovered so far. The average duration of infection is 10 days. The question we want to answer is how many people will recover from the infection over a 4-week period.

Given this data, fill in the correct values to the following variables, and run the cell:

initial_number_infected <- 1000000       # the initial infected population 
                                         # size
initial_number_recovered <- 0  # the initial number of people in "0"
                                         # the recovered state
recovery_rate <- 0.1            # the rate of recovery gamma, 
                                         # in units of days^-1
follow_up_duration <- 28      # the duration to run the model for, 
                                         # in units of days
  
# Hint: the units of the recovery rate and the follow-up duration should be
# consistent.

Now, we combine this data into objects that are recognised by the deSolve package as model input. To do this, again run the code below.

# The initial state values are stored as a vector 
# and each value is assigned a name.
initial_state_values <- c(I = initial_number_infected, 
                          R = initial_number_recovered)

# Parameters are also stored as a vector with assigned names and values. 
parameters <- c(gamma = recovery_rate)  
# In this case we only have one parameter, gamma.

Think about: what kind of information is stored in the “initial_state_values” and parameters vectors?

Additionally, we need to specify the time we want the model to run for. This depends on the question we want to answer. In the cell below, the duration you specified earlier is automatically filled in when you run it.

# The times vector creates a sequence of timepoints at which we want to 
# calculate the number of people in the I and R compartment.
times <- seq(from = 0, to = follow_up_duration, by = 1) 

Think about: what kind of information is stored in the times vector?

Check your answers by having a look at each of these vectors to familiarise yourself with the structure: in the following code cell we have typed the object names, so you just need to press “Run” to see what each of them contains.

initial_state_values
##     I     R 
## 1e+06 0e+00
parameters
## gamma 
##   0.1
times
##  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
## [26] 25 26 27 28

The next step is specifying the model. Using the example code in the introductory document, complete the following model function with the differential equations above.

cohort_model <- function(time, state, parameters) { 
  
    with(as.list(c(state, parameters)), {
      
      dI <- -gamma*I
      dR <- gamma*I
      
      return(list(c(dI, dR)))
    })
  
}

Now all there’s left to do is solving this set of equations using the deSolve package. Fill in the following command, which calculates and stores the number of infected and recovered people at each timestep in the output dataframe. Don’t forget to run it!

# Hint: if you can't remember what those arguments correspond to,
# just look up the ode help file:
?ode
## starting httpd help server ... done
output <- as.data.frame(ode(y = initial_state_values,  times = times,   func = cohort_model,  parms =  parameters))

Printing the model output returns a dataframe with columns time (containing our times vector), I (containing the number of infected people at each timestep) and R (containing the number of recovered people at each timestep):

output
##    time          I         R
## 1     0 1000000.00      0.00
## 2     1  904837.42  95162.58
## 3     2  818730.75 181269.25
## 4     3  740818.22 259181.78
## 5     4  670320.04 329679.96
## 6     5  606530.66 393469.34
## 7     6  548811.63 451188.37
## 8     7  496585.30 503414.70
## 9     8  449328.96 550671.04
## 10    9  406569.66 593430.34
## 11   10  367879.44 632120.56
## 12   11  332871.08 667128.92
## 13   12  301194.21 698805.79
## 14   13  272531.79 727468.21
## 15   14  246596.96 753403.04
## 16   15  223130.16 776869.84
## 17   16  201896.51 798103.49
## 18   17  182683.52 817316.48
## 19   18  165298.88 834701.12
## 20   19  149568.62 850431.38
## 21   20  135335.28 864664.72
## 22   21  122456.42 877543.58
## 23   22  110803.15 889196.85
## 24   23  100258.84 899741.16
## 25   24   90717.95 909282.05
## 26   25   82085.00 917915.00
## 27   26   74273.58 925726.42
## 28   27   67205.51 932794.49
## 29   28   60810.06 939189.94

3 Question: Based on the output, how many people have recovered after 4 weeks? What proportion of the total population does this correspond to?

Now, plot your model output in the following cell, with time on the x axis and the number of infected and recovered people on the y axis. You can use the introductory document for help with this.

3.1 base on the output, how many people have recovered afetr 4 weeks

output[output$time == 28, c("time","R")]   
##    time        R
## 29   28 939189.9
# printing the "time" and "R" output columns when the timestep equals 28
# Answer: 939189.9 people are in the recovered compartment at timestep 28

3.2 What proportion of the total population does this correspond to?

output[output$time == 28,"R"]/(output[output$time == 28,"I"]+
 output[output$time == 28,"R"]) # note this is only on a separate line 
## [1] 0.9391899
                                # for formatting reasons!

# Answer: 93.9%

# All we are doing here is dividing the number of recovered people at the
# 4 week timestep by the total population size (I + R) at this timestep.
# Note that since there are no births and deaths in our model, 
# the total population size is the same at each timestep, so we could 
# also calculate it by taking sum(initial_state_values).

3.3 Plotting the output

# First turn the output dataset into a long format, 
# so that the number in each compartment at each timestep
# are all in the same column
output_long <- melt(as.data.frame(output), id = "time")                  

# Plot the number of people in each compartment over time
ggplot(data = output_long,          # specify object containing data to plot
       aes(x = time, y = value, colour = variable, group = variable)) +          # assign columns to axes and groups
  geom_line() +                     # represent data as lines
  xlab("Time (days)")+              # add label for x axis
  ylab("Number of people") +        # add label for y axis
  labs(title = paste("Number infected and recovered over time when gamma =", parameters["gamma"],"days^-1")) # add title

#Using the #paste command, we can combine sentences with the 
#values stored in variables (here gamma)

3.4 Based on the plot, at what timepoint were infected and recovered individuals equal in number?

# Answer: around 7 days - this where the I and R lines intersect.
# You can confirm that by looking at the output table:
# the number in the I and R compartment are very similar at timestep 7.
output[output$time == 7,]
##   time        I        R
## 8    7 496585.3 503414.7

4 Question: Based on the plot, at what timepoint were infected and recovered individuals equal in number?

For the last part of the etivity, try varying 𝛾 to see how it affects the output. For example, in the cell below change gamma ( 𝛾 ) to correspond to an average infectious period of: a) 2 days b) 20 days.

4.1 What is the recovery rate in 2 days?

Varying 𝛾 Average duration of infection = 2 days so the recovery rate = 1/2 = 0.5 days −1

parameters <- c(gamma = 0.5)
output <- as.data.frame(ode(y = initial_state_values,  times = times, func = cohort_model, parms = parameters))

# Plotting the output
output_long <- melt(as.data.frame(output), id = "time")   
# turn output dataset into long format

ggplot(data = output_long,       # specify object containing data to plot
       aes(x = time, 
           y = value, 
           colour = variable, 
           group = variable)) +  # assign columns to axes and groups
  geom_line() +                                 # represent data as lines
  xlab("Time (days)")+                          # add label for x axis
  ylab("Number of people") +                    # add label for y axis
  labs(title = paste("Number infected and recovered over time when gamma =",
                      parameters["gamma"],"days^-1")) + # add title
  scale_color_brewer(palette = "Set1")

# Since the initial number in each compartment and 
# the timesteps of interest haven't changed, 
# these are the only parts of the code we need to rerun.

# Now, copy-paste your plot code from above here to visualise the output.

4.2 What is the recovery rate in 20 days?

parameters <- c(gamma = 0.05)

# Solving the model
output <- as.data.frame(ode(y = initial_state_values, 
                            times = times, 
                            func = cohort_model,
                            parms = parameters))

# Plotting the output
output_long <- melt(as.data.frame(output), id = "time") 

ggplot(data = output_long,                                 
       aes(x = time, 
           y = value, 
           colour = variable, 
           group = variable)) +  # assign columns to axes and groups
  geom_line() +                  # represent data as lines
  xlab("Time (days)")+           # add label for x axis
  ylab("Number of people") +     # add label for y axis
  labs(title = paste("Number infected and recovered over time when gamma =",
                    parameters["gamma"],"days^-1")) # add title

5 Question: What changes do you observe in the transition to the recovered compartment if 𝛾 is higher or lower? For example, how long does it take for everyone to recover in both cases?

#Answer: if the rate is higher ( 𝛾  = 0.5), we can see that infected people recover more quickly: it takes less than 2 days for half of the infected cohort to recover, and by around 8 days, nearly everyone has recovered. A lower rate ( 𝛾  = 0.05) on the other hand corresponds to a slower transition: it takes around 14 days for half of infected people to move into the  𝑅  compartment, and by the end of our 4 week follow-up around a quarter of people still have not recovered.

(r): source from coursera course. Only for studying purpose #https://www.coursera.org/learn/developing-the-sir-model/home/welcome

---
title: "Modelling an infected cohort 1"
author: "Binh Thang Tran (C) from coursera"
date: "5/10/2020"
output:
  html_document:
    code_download: yes
    code_folding: hide
    number_sections: yes
    theme: journal
    toc: yes
    toc_float: yes
  word_document:
    toc: yes
  
---
# introduction
*In this etivity, you will code your first simple model and use it to answer a research question. To get used to the format for models in R, we'll start by giving you some parts of the code, and you need to fill in the missing numbers or code blocks by replacing the #YOUR CODE# placeholders. At the end of each etivity, we'll provide a solution file that contains the full code, annotated with comments. Make sure you understand each line, and soon you'll be able to write your own model from scratch!

Your task is to find out how long it takes for a cohort of infected people to recover. As you saw in the video, to answer this question you need to keep track of 2 populations: those that are infected (compartment  𝐼 ), and those that have recovered (compartment  𝑅 ). Infected people recover at a rate  𝛾  (gamma). The differential equations describing this are:*


Load the packages which you need for this etivity, by running the following cell:

# Run this cell as it is

```{r}
library(deSolve)   # package to solve the model
library(reshape2)  # package to change the shape of the model output
library(ggplot2)   # package for plotting
```

To start, it is useful to code what we know about the situation we want to model. We are looking at a cohort of 10^6  currently infected people, and no one has recovered so far. The average duration of infection is 10 days. The question we want to answer is how many people will recover from the infection over a 4-week period.

Given this data, fill in the correct values to the following variables, and run the cell:

```{r}
initial_number_infected <- 1000000       # the initial infected population 
                                         # size
initial_number_recovered <- 0  # the initial number of people in "0"
                                         # the recovered state
recovery_rate <- 0.1            # the rate of recovery gamma, 
                                         # in units of days^-1
follow_up_duration <- 28      # the duration to run the model for, 
                                         # in units of days
  
# Hint: the units of the recovery rate and the follow-up duration should be
# consistent.
```

Now, we combine this data into objects that are recognised by the deSolve package as model input. To do this, again run the code below.

```{r}
# The initial state values are stored as a vector 
# and each value is assigned a name.
initial_state_values <- c(I = initial_number_infected, 
                          R = initial_number_recovered)

# Parameters are also stored as a vector with assigned names and values. 
parameters <- c(gamma = recovery_rate)  
# In this case we only have one parameter, gamma.
```


Think about: what kind of information is stored in the "initial_state_values" and parameters vectors?

Additionally, we need to specify the time we want the model to run for. This depends on the question we want to answer. In the cell below, the duration you specified earlier is automatically filled in when you run it.

```{r}
# The times vector creates a sequence of timepoints at which we want to 
# calculate the number of people in the I and R compartment.
times <- seq(from = 0, to = follow_up_duration, by = 1) 
```

Think about: what kind of information is stored in the times vector?

Check your answers by having a look at each of these vectors to familiarise yourself with the structure: in the following code cell we have typed the object names, so you just need to press "Run" to see what each of them contains.

```{r}
initial_state_values
parameters
times
```


The next step is specifying the model. Using the example code in the introductory document, complete the following model function with the differential equations above.

```{r}
cohort_model <- function(time, state, parameters) { 
  
    with(as.list(c(state, parameters)), {
      
      dI <- -gamma*I
      dR <- gamma*I
      
      return(list(c(dI, dR)))
    })
  
}
```


Now all there's left to do is solving this set of equations using the deSolve package. Fill in the following command, which calculates and stores the number of infected and recovered people at each timestep in the output dataframe. Don't forget to run it!


```{r}
# Hint: if you can't remember what those arguments correspond to,
# just look up the ode help file:
?ode

```


```{r}
output <- as.data.frame(ode(y = initial_state_values,  times = times,   func = cohort_model,  parms =  parameters))

```


Printing the model output returns a dataframe with columns time (containing our times vector), I (containing the number of infected people at each timestep) and R (containing the number of recovered people at each timestep):

```{r}
output

```

# Question: Based on the output, how many people have recovered after 4 weeks? What proportion of the total population does this correspond to?

Now, plot your model output in the following cell, with time on the x axis and the number of infected and recovered people on the y axis. You can use the introductory document for help with this.

## base on the output, how many people have recovered afetr 4 weeks

```{r}
output[output$time == 28, c("time","R")]   
# printing the "time" and "R" output columns when the timestep equals 28
# Answer: 939189.9 people are in the recovered compartment at timestep 28
```


## What proportion of the total population does this correspond to?

```{r}
output[output$time == 28,"R"]/(output[output$time == 28,"I"]+
 output[output$time == 28,"R"]) # note this is only on a separate line 
                                # for formatting reasons!

# Answer: 93.9%

# All we are doing here is dividing the number of recovered people at the
# 4 week timestep by the total population size (I + R) at this timestep.
# Note that since there are no births and deaths in our model, 
# the total population size is the same at each timestep, so we could 
# also calculate it by taking sum(initial_state_values).
```

## Plotting the output

```{r}
# First turn the output dataset into a long format, 
# so that the number in each compartment at each timestep
# are all in the same column
output_long <- melt(as.data.frame(output), id = "time")                  

# Plot the number of people in each compartment over time
ggplot(data = output_long,          # specify object containing data to plot
       aes(x = time, y = value, colour = variable, group = variable)) +          # assign columns to axes and groups
  geom_line() +                     # represent data as lines
  xlab("Time (days)")+              # add label for x axis
  ylab("Number of people") +        # add label for y axis
  labs(title = paste("Number infected and recovered over time when gamma =", parameters["gamma"],"days^-1")) # add title
#Using the #paste command, we can combine sentences with the 
#values stored in variables (here gamma)
  
```

## Based on the plot, at what timepoint were infected and recovered individuals equal in number?

```{r}
# Answer: around 7 days - this where the I and R lines intersect.
# You can confirm that by looking at the output table:
# the number in the I and R compartment are very similar at timestep 7.
output[output$time == 7,]
```

# Question: Based on the plot, at what timepoint were infected and recovered individuals equal in number?

For the last part of the etivity, try varying  𝛾  to see how it affects the output. For example, in the cell below change gamma ( 𝛾 ) to correspond to an average infectious period of:
a) 2 days
b) 20 days.

## What is the recovery rate in 2 days?

Varying  𝛾 
Average duration of infection = 2 days so the recovery rate = 1/2 = 0.5 days −1


```{r}
parameters <- c(gamma = 0.5)
output <- as.data.frame(ode(y = initial_state_values,  times = times, func = cohort_model, parms = parameters))

# Plotting the output
output_long <- melt(as.data.frame(output), id = "time")   
# turn output dataset into long format

ggplot(data = output_long,       # specify object containing data to plot
       aes(x = time, 
           y = value, 
           colour = variable, 
           group = variable)) +  # assign columns to axes and groups
  geom_line() +                                 # represent data as lines
  xlab("Time (days)")+                          # add label for x axis
  ylab("Number of people") +                    # add label for y axis
  labs(title = paste("Number infected and recovered over time when gamma =",
                      parameters["gamma"],"days^-1")) + # add title
  scale_color_brewer(palette = "Set1")
                
# Since the initial number in each compartment and 
# the timesteps of interest haven't changed, 
# these are the only parts of the code we need to rerun.

# Now, copy-paste your plot code from above here to visualise the output.
```

## What is the recovery rate in 20 days?
```{r}
parameters <- c(gamma = 0.05)

# Solving the model
output <- as.data.frame(ode(y = initial_state_values, 
                            times = times, 
                            func = cohort_model,
                            parms = parameters))

# Plotting the output
output_long <- melt(as.data.frame(output), id = "time") 

ggplot(data = output_long,                                 
       aes(x = time, 
           y = value, 
           colour = variable, 
           group = variable)) +  # assign columns to axes and groups
  geom_line() +                  # represent data as lines
  xlab("Time (days)")+           # add label for x axis
  ylab("Number of people") +     # add label for y axis
  labs(title = paste("Number infected and recovered over time when gamma =",
                    parameters["gamma"],"days^-1")) # add title
```


# Question: What changes do you observe in the transition to the recovered compartment if  𝛾  is higher or lower? For example, how long does it take for everyone to recover in both cases?

```{r}
#Answer: if the rate is higher ( 𝛾  = 0.5), we can see that infected people recover more quickly: it takes less than 2 days for half of the infected cohort to recover, and by around 8 days, nearly everyone has recovered. A lower rate ( 𝛾  = 0.05) on the other hand corresponds to a slower transition: it takes around 14 days for half of infected people to move into the  𝑅  compartment, and by the end of our 4 week follow-up around a quarter of people still have not recovered.
```



(r): source from coursera course. Only for studying purpose #https://www.coursera.org/learn/developing-the-sir-model/home/welcome
