Executive Summary:

In this study propagation pattern of the COVID19 pandemic in US is made using a widely used SIR model. A R-library is created to fetch the COVID19 data through our website [1] from JHU-CCSE to predict the propagation pattern over a period of 140 days starting from the first US reported death on February 29 up to the projected end sometime in the middle of July, 2020. In this analysis an initial population, N is considered to be 20.23M which is the combined population of New York City (8.399M), Washington (7.615) and Oregon (4.218). We calculated the optimized parameter \(\beta\) that controls the transition between S to I (susceptible (S) to infectious (I)) and \(\gamma\) that controls the transition between I to R ( infectious (I) to recovered (R)) to be 0.6023021 and 0.4169131 respectively. Furthermore, we estimated the reproduction number of 1.44, and using the formula \(1-\frac{1}{R_{0}} = 1 - \frac{1}{1.44} = 0.30556\), about 31.00% of the population of US which is about 102.61 million, need to be vaccinated to stop the pandemic in the future. Based on the available US data for the period of February 29 to May 10, the model predicts what happens if the pandemic continues over a total time span of 140 days or longer. It is estimated from the analysis that peak infected case already happened around May 01, 2020, and the pandemic is expected to come back to the normal state sometime in the middle of July, 2020. This paper is based on the similar case study reported in [2] but adapted for US with modifications with our own library and methods.

Introduction:

Users can choose from a variety of common Ordinary Differential Equation (ODE) models (such as the SIR, SEIR, SIRS, and SIS models), or create on their own to study various kinds of epidemic and pandemic diseases. The objective of this study is to find ways to adapt COVID-19 data to make predictions of the propagation of COVID-19 and take appropriate measures to slow down the spread of the disease to save human lives.

In this analysis we expect to estimate the expected numbers of individuals over time who will be infected, recovered, susceptible, or dead. Infected individuals first pass through an exposed/incubation phase where they are asymptomatic and not infectious, and then move into a symptomatic to infectious stage classified by the clinical status of infection (mild, severe, or critical). The initial population size, initial conditions, and parameter values are used to simulate the spread of infection. In SIR model we consider S susceptible, I infectious and R recovered cases to start with the simulation. Susceptible become infected at a rate equal to the product of an infectious contact rate \(\beta\) and the number of infectious I. Infectious people recover at a rate \(\gamma\).

Timeline of COVID-19 Pandemic:

The first outbreak of novel corona virus was reported sometime in December 2019 from Wuhan, China. In the span of about four months this virus spread to almost all countries in the world. As of May 10, 2020, more than 4.10 million people got infected and about 0.28 million people lost their lives. Many data scientists are trying to predict the nature and timeline of its propagation to save human lives as many as possible by taking all practical measures including social distancing in their models. The first death case was reported by Chinese state media on January 11, 2020, of a 61-year-old man from Wuhan, the capital city of Hubei province. On January 21, 2020, United States declared its first infected case with a man in his 30s from Washington State, who traveled to Wuhan. Oregon, Washington and New York soon report their own cases of possible community transmission. On January 23, 2020, Chinese government imposed strict measures in Wuhan by shutting down almost all activities including travel restrictions in air, trains, subways, buses and put restrictions on public gathering, mass congregation, closing all schools, and colleges. On January 30, 2020, WHO declares global health emergency a “public health emergency of international concern”. On February 11, 2020, novel coronavirus was named as Corona Virus Disease 2019 with its abbreviation as COVID-19. On February 26, 2020, the Centers for Disease Control and Prevention (CDC) confirms the first case of COVID-19 infection in a patient in California with no travel history to an outbreak area. The first reported death in the USA was on February 29, 2020, however, a death in Santa Clara, California, on February 6 is considered to be the first COVID-19 death. In this analysis the timeline is considered to be from February 29, 2020, to May 10, 2020.

The SIR Model for Spread of Disease:

In this study SIR model is used. A graphical representation of SIR Model is shown in the figure below:

SIR model has 3 variables S, I and R which are respectively the numbers of susceptibles, infectious and recovered, and has two parameters \(\beta\) and \(\gamma\) which are respectively the infection rate and the recovery rate.

The basic idea behind the SIR model is based on three groups of people with the following assumptions:

S: is assumed to be the entire population of New York City, Washington State and Oregon who were susceptible to the disease since no one was immued to the disease,

I: the infected population, and

R: individuals who were contaminated but who have either recovered or died.

The dependent variables represent the fraction of the total population in each of the three categories. In this analysis N is cosidered to be 20,232,000 which is the combined population of New York City (8.399M), Washington (7.615) and Oregon (4.218) where the pandemic started in the USA on February 29, 2020.

\[ \begin{equation} \frac{dS}{dt} = -\beta S(t) I(t)/N\ (Eq.1) \end{equation} \]

where \(\beta\) is the rate of infection, which controls the transition between S and I, \[ \begin{equation} \frac{dI}{dt} = \beta S(t) I(t)/N-\gamma I(t)\ (Eq.2) \end{equation} \]

and \(\gamma\) is the recovery rate, which controls the transition between I and R.

\[ \begin{equation} \frac{dR}{dt} = \gamma I(t)\ (Eq.3) \end{equation} \] The first equation (Eq. 1) states that the number of susceptible individuals (S) decreases with the number of newly infected individuals, where new infected cases are the result of the infection rate \(\beta\) multiplied by the number of susceptible individuals (S) who had a contact with infected individuals (I).

The second equation (Eq. 2) states that the number of infectious individuals (I) increases with the newly infected individuals \(\beta IS\), minus the previously infected people who recovered (i.e., \(\gamma I\) which is the removal rate \(\gamma I\) multiplied by the infectious individuals I).

Finally, the last equation (Eq. 3) states that the recovered group (R) increases with the number of individuals who were infectious and who either recovered or died (\(\gamma I\)).

An epidemic develops as follows:

Solving Differential Equations of SIR Model:

Solving a system of differential equations means finding the values of the variables (here S, I and R) at a number of points in time. These values will depend on the parameters’ values. We can numerically solve differential equations in R using ode() function of the deSolve package. If this package is not installed on your system, you need to install it.

Loading covid19 Library for Updated COVID-19 Data that pulls raw data from JHU-CCSE:

Countrywise Tabular Data:

The initial values of the variables need to be defined in a named vector:

Put the daily cumulative incidence numbers for US from February 29 to May 10 into a vector called Infected. Create an incrementing Day vector with the same length as our cases vector.

Define a function to calculate the residual sum of squares (RSS):

Passing in parameters beta and gamma that are to be optimized for the best fit to the incidence data. Fitting a SIR model to the US data we need the following two things:

  1. a solver for solving the three differential equations
  2. an optimizer to find the optimal values for our two unknown parameters, \(\beta\) and \(\gamma\).

The function ode() (for ordinary differential equations) from the {deSolve} R package makes solving the system of equations easy, and to find the optimal values for the parameters we wish to estimate, we can just use the optim() function built into base R.

Specifically, what we need to do is to minimize the sum of the squared differences between I(t), which is the number of infected people at time t, and the corresponding number of cases as predicted by the model \(\hat I\). The quantity to minimize is called residual sum of squares (RSS). Passing in parameters beta and gamma that are to be optimized for the best fit to the incidence data. In order to fit a model to the incidence data for US, we need a value N for the initial uninfected population. To start the simulation we need to specify initial values for N, S, I and R. Here, N is assumed to be the population of New York City (8.399M), Washington (7.615) and Oregon (4.218) combined is N = 20232000.

Next, we need to create a vector with the daily cumulative incidence for US, from February 29 (when first daily incidence data starts in the US), through to May 10 (last available date at the time of publication of this article). We will then compare the predicted incidence from the SIR model fitted to these data with the actual incidence since February 29. The daily cumulative incidence data for US is loaded from the {covid19} R-package developed in this work.

\[RSS(\beta,\gamma) = {\sum_{t} (I(t) - \hat I(t))^2}\ (Eq.4)\]

Now we find the values of beta and gamma that give the smallest RSS, which represents the best fit to the data. First we check for convergence and once it is confirmed then we get the fitted or optimized parameters \(\beta\) and \(\gamma\). Once the convergence is confirmed then we can examine the fitted values for beta and gamma for further analysis and prediction of the spread of COVID19.

## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
##      beta     gamma 
## 0.6023021 0.4169131

With the optimized parameter \(\beta\) and \(\gamma\) that controls the transition between S to I (susceptible (S) to infectious (I)) and controls the transition between I to R ( infectious (I) to recovered (R)) respectively. The plot for the fitted data with the observed data collected from JHU in the time span of February 29 to May 10 is shown in the figure below. The y-axis of the following plot is on a log10 scale versus time in days.

Reproduction number \(R_{0}\)

In this study SIR model with best fit to the observed cumulative incidence data in US, so as to get the correct reproduction number \(R_{0}\), also referred to as basic reproduction ratio, and is given by:

\[ \begin{equation} R_{0}=\frac{\beta}{\gamma}\ Eqn. (5) \end{equation} \]

In other words, the reproduction number \(R_{0}\) refers to the number of healthy people that get infected per number of infected people.

If \(R_{0}\) is greater than \(R_{0}>1\), the infection will spread exponentially. If \(R_{0}\) is less than \(R_{0}<1\), the infection will spread very slowly, and it will eventually die down. The higher the value of \(R_{0}\), the faster an epidemic or a pandemic will spread.

In this simulation \(R_{0}\) comes out to be 1.44 which is relatively lower than reported by others for COVID-19 which is in the range 2.2–2.7 for Wuhan, China [3]. In the literature, it has been estimated that the reproduction number for COVID-19 is approximately 2.7 (with \(\beta\) close to 0.54 and \(\gamma\) close to 0.2). Our reproduction number being lower is mainly due to the fact that the number of confirmed cases stayed constant and equal to 1 at the beginning of the pandemic.

A \(R_{0}\) of 1.44 means that, on average in US, 1.44 persons are infected for each infected person. For simple models, the proportion of the population that needs to be effectively immunized to prevent sustained spread of the disease, known as the “herd immunity threshold”, has to be larger than \(1-\frac{1}{R_{0}}\).

The reproduction number of 1.44 that we estimated suggests that about 31.00% of the population should be immunized to stop the spread of the infection. Considering the population in US of approximately 331 million, about 102.61 million people need to be vaccinated to stop the spread of this disease.

##      beta     gamma 
## 0.6023021 0.4169131
## [1] 1.444671

Now we make time series predictions in days for the period of 140 days and plot them to see their pattern for finding the end date for this pandemic.

Based on the available US data for the period of February 29 to May 10, the model now preddicts what happens if the pandemic continues over a total time span of 140 days. The y-axis is in log10 scale in persons versus the time in days.

The date of peak of pandemic in the US with maximum infected and maximum death. Maximum Infected = confirmed - death - recovered.

##          Date       I
## 62 2020-05-01 1074920
## [1] 80361.01

Conclusions:

This paper is the outcome of a very simple SIR epidemiological model. The numbers we ended up with are very close to the actual data. A better model called SEIR could be used for accomodating the effect of social distancing. SEIR model is similar to the SIR model, where S stands for Susceptible and R stands for Recovered, but the infected people are divided into two categories, E stands for the exposed/infected is asymptomatic and I for the infected and symptomatic. Both SIR and SEIR models belong to the continuous-time dynamic models that assume fixed transition rates. In our analysis we used a fixed reproduction number \(R_{0}\) but it reality it is much more realistic to estimate the current effective reproduction number \(R_{e}\) on a day-by-day basis so as to track the effectiveness of public health interventions, such as social distancing and increased number of tests etc. This article resulted in a numerical comparison with visual presentation of the fitted and observed cumulative incidence in US. The graphs indicated the exponetial nature of the COVID-19 pandemic in US with peak reaching sometime in the beginning of May and is expected to fall and come to a complete normal state sometime in the middle of July, 2020, provided public health intervention is in strict compliance.

We estimated the reproduction number of 1.44 using the optimized \(\beta\) and \(\gamma\) that suggests that about 31.00% of the population should be immunized to stop the spread of the infection. Considering the total population in US of approximately 331 million, about 102.61 million need to be vaccinated. Based on the available US data for the period of February 29 to May 10, the model now preddicts what happens if the pandemic continues over a total time span of 140 days. It is estimated from the analysis that peak infected case will happen around May 01, 2020, and is expected to come down to the normal state sometime in the middle of July, 2020. Humans are expected to fight in against of the natural epidemic or pandemic with their all kinds of resources because life is more important than the material resources and wealth. However, we must also ensure that there are sufficient resources, medical supplies and PPEs are available for treating infected patients that will help them to survive or increase the chances for their survival rates.

References:

  1. https://irays-teknology-ltd.com/
  2. https://www.statsandr.com/blog/covid-19-in-belgium/#more-sophisticated-projections.
  3. High Contagiousness and Rapid Spread of Severe Acute Respiratory Syndrome Coronavirus-2, Volume 26, Number 7—July 2020, Steven Sanche, Yen Ting Lin, Chonggang Xu, Ethan Romero-Severson, Nick Hengartner, and Ruian Ke