Predict the number of COVID-19 cases in North Dakota using epidemiological modelling: Kermack-McKendrick (KM) SIR Model.
Number of new cases per day.
Cumulative number of cases per day.
Observations based on: https://www.health.nd.gov/diseases-conditions/coronavirus/north-dakota-coronavirus-cases-mobile-friendly
Closed SIR Model Limitations | Things to Keep in Mind:
Kermack-McKendrick (KM) SIR Model Assumptions:
Large population size: if populations are small, stochastic effects dominate
Exponentially distributed waiting times in epidemiological compartments
Closed population: no entry into or departure from the population, except possibly by disease-induced death
The time scale of the disease is assumed to be much faster than the time scale of births and deaths (so the impact of demographic effects on the population may be ignored)
Homogeneous mixing: each individual in the population has an equal chance of interacting with any other
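Under these assumptions, the closed Kermack-McKendrick SIR model is the standard system below, where β is the transmission rate and γ the removal rate (the exponential waiting-time assumption makes 1/γ the mean infectious period). This is the textbook form of the model, included for reference:

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\qquad S + I + R = N, \qquad R_0 = \frac{\beta}{\gamma}.
```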
Using the available incidence data (as of June 18, 2020) to estimate R0:
##
## Call:
## lm(formula = log(cumulative) ~ date, data = cases)
##
## Coefficients:
## (Intercept) date
## -1189.02144 0.06502
##
## Call:
## lm(formula = log(cumulative) ~ date, data = subset(cases, status ==
## "confined"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2632 -0.4468 0.3386 0.7837 0.9655
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1189.021443 65.600856 -18.12 <0.0000000000000002 ***
## date 0.065021 0.003569 18.22 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.03 on 98 degrees of freedom
## Multiple R-squared: 0.7721, Adjusted R-squared: 0.7697
## F-statistic: 331.9 on 1 and 98 DF, p-value: < 0.00000000000000022
## estimate of R0 is 1.4876552614195 | CI = ( 0.434538294548643 , 1.54077222829037 )
Estimated R0 is 1.4877 | 95% CI = (1.4345, 1.5408)
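The printed estimate is consistent with the early-growth relation R0 = 1 + rD for an SIR model with an exponentially distributed infectious period, where r is the slope of log(cumulative) on date and D is the mean infectious period. The document does not state D; the sketch below assumes D = 7.5 days only because that value reproduces the printed point estimate. It takes the slope and standard error from the `summary()` output above and maps the 95% t-interval for r through the same formula, giving roughly (1.4345, 1.5408):

```r
# Growth rate r (slope of log(cumulative) on date, per day), its standard
# error and the residual degrees of freedom, copied from summary(lm(...)) above.
r    <- 0.065021
se_r <- 0.003569
df   <- 98

# ASSUMPTION: mean infectious period D = 7.5 days. Not stated in the source;
# chosen because 1 + r * D reproduces the printed estimate 1.4877.
D <- 7.5

# Early-phase SIR relation with an exponentially distributed infectious period.
R0 <- 1 + r * D

# Two-sided 95% t-interval for the slope (uses the 97.5% quantile), mapped
# through the same increasing function: CI for R0 = 1 + D * (CI for r).
r_ci  <- r + c(-1, 1) * qt(0.975, df) * se_r
R0_ci <- 1 + D * r_ci

# Values: R0 = 1.4877, lower = 1.4345, upper = 1.5408
round(c(R0 = R0, lower = R0_ci[1], upper = R0_ci[2]), 4)
```

Because 1 + rD is increasing in r, transforming the interval endpoints directly is valid; both bounds carry the leading 1.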
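According to the Kermack-McKendrick model, knowing R0 tells you how many people will eventually be infected, through the final-size relation $1 - z_\infty/N = e^{-R_0 z_\infty/N}$, where $z_\infty$ is the total number ever infected and $N$ is the population size. Rearranged, $R_0 = -\ln(1 - z_\infty/N) \,/\, (z_\infty/N)$.
Conversely, if you know R0 from the beginning of the epidemic and assume the Kermack-McKendrick model holds, the final size follows: for example, R0 = 1.8 implies that about 73% (roughly three quarters) of the population will be infected by the end.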
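The final-size relation has no closed-form solution for the attack fraction, but it is easy to solve numerically. A minimal base-R sketch, assuming the homogeneous closed model above, solves 1 - f = exp(-R0 f) for f = z_inf / N with `uniroot()` and also gives the inverse map from f back to R0:

```r
# Kermack-McKendrick final-size relation, with f = z_inf / N:
#   1 - f = exp(-R0 * f)
# f = 0 always solves it; for R0 > 1 there is exactly one root in (0, 1).
final_size <- function(R0) {
  if (R0 <= 1) return(0)  # no major outbreak
  g <- function(f) 1 - f - exp(-R0 * f)
  # g > 0 just above 0 and g < 0 near 1, so the interval brackets the root.
  uniroot(g, lower = 1e-9, upper = 1 - 1e-12, tol = 1e-12)$root
}

# Inverse relation: the R0 implied by a final attack fraction f in (0, 1).
r0_from_final_size <- function(f) -log(1 - f) / f

final_size(1.8)           # about 0.732
final_size(1.4877)        # about 0.575
r0_from_final_size(0.75)  # about 1.85
```

The last line shows that an exact 75% final size would correspond to R0 of about 1.85, slightly above 1.8.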
Plot of predicted vs. observed number of cases over time, in days since the first known case.
Table of projected numbers of cases and susceptibles over time, in days.
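Projections like these come from running the closed SIR model forward in time. The author's code is not shown, so the following is only a base-R sketch: it uses fixed-step Euler integration rather than a dedicated ODE solver, and N = 10000, I0 = 1, R0 = 1.4877 and D = 7.5 days are hypothetical, illustrative inputs. It returns one row per day with susceptibles, infectious, removed and cumulative cases:

```r
# Closed SIR model integrated with a fixed-step Euler scheme (base R only).
# Returns one row per day with susceptibles S, infectious I, removed R and the
# cumulative number ever infected (N - S).
simulate_sir <- function(N, I0, R0, D, days, dt = 0.1) {
  gamma <- 1 / D        # removal rate = 1 / mean infectious period
  beta  <- R0 * gamma   # transmission rate implied by R0 = beta / gamma
  S <- N - I0; I <- I0; R <- 0
  out <- data.frame(day = 0:days, S = S, I = I, R = R)
  for (d in seq_len(days)) {
    for (k in seq_len(round(1 / dt))) {
      new_inf <- beta * S * I / N * dt   # S -> I during this step
      new_rec <- gamma * I * dt          # I -> R during this step
      S <- S - new_inf
      I <- I + new_inf - new_rec
      R <- R + new_rec
    }
    out[d + 1, c("S", "I", "R")] <- c(S, I, R)
  }
  out$cumulative <- N - out$S
  out
}

# Hypothetical inputs, for illustration only.
sim <- simulate_sir(N = 10000, I0 = 1, R0 = 1.4877, D = 7.5, days = 730)
tail(sim, 1)
```

Each Euler step moves people between compartments without creating or losing any, so S + I + R stays equal to N, matching the closed-population assumption.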
Estimated R0 is 1.4877 | 95% CI = (1.4345, 1.5408)
Projected maximum number of cases: 7132
Projected number of days to reach the maximum number of cases: ~105
Projected date of the maximum number of cases: 2020-03-11 + 105 days = 2020-06-24
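The calendar date is plain date arithmetic from the first known case. The sketch below also shows one way a "days to reach the maximum" figure could be read off a projected cumulative series; `day_max_reached()` and its one-case tolerance are hypothetical, since the document does not say how ~105 was determined:

```r
# Day 0 is the first known case (2020-03-11); adding a whole number of days
# to a Date gives the calendar date of the projected maximum.
first_case <- as.Date("2020-03-11")
first_case + 105   # "2020-06-24"

# Hypothetical helper: first day on which a projected cumulative series comes
# within `tol` cases of its maximum (the one-case rule is an assumption).
day_max_reached <- function(day, cumulative, tol = 1) {
  day[which(cumulative >= max(cumulative) - tol)[1]]
}
```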
The difference between the predicted and observed counts could have many explanations, including:
Thank you for doing your part in protecting yourself and others.
North Dakota Department of Health (2020), “Coronavirus Cases,” Diseases & Conditions, Retrieved from: https://www.health.nd.gov/diseases-conditions/coronavirus/north-dakota-coronavirus-cases-mobile-friendly
For the R code, questions, or ideas for collaboration, I am happy to hear from you: Rasha Elnimeiry, MPH, MAS, CPH | Relnime1(at)jhu.edu