Coronavirus Data

Coronavirus data pulled on March 18 from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) coronavirus repository, via the R coronavirus package by GitHub user RamiKrispin.

United States coronavirus data includes the daily number of cases from February 24, 2020, through March 25, 2020. The last ten rows are shown below.

Country.Region  date        confirmed  death  DayNumber  Confirmed_Total  Death_Total  Log_Confirmed_Total  DeathRate
US              2020-03-16       1133     22         22             4617           85             8.437500  0.0184102
US              2020-03-17       1789     23         23             6406          108             8.764990  0.0168592
US              2020-03-18       1362     10         24             7768          118             8.957768  0.0151905
US              2020-03-19       5894     82         25            13662          200             9.522374  0.0146391
US              2020-03-20       5423     44         26            19085          244             9.856658  0.0127849
US              2020-03-21       6389     63         27            25474          307            10.145414  0.0120515
US              2020-03-22       7787    110         28            33261          417            10.412141  0.0125372
US              2020-03-23      10571    140         29            43832          557            10.688119  0.0127076
US              2020-03-24       9893    149         30            53725          706            10.891634  0.0131410
US              2020-03-25      12038    236         31            65763          942            11.093813  0.0143242
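The derived columns in the table can be reproduced from the daily counts. The following is a minimal base-R sketch, not the original preparation code: the data frame name `daily` and the two `prior_*` offsets are assumptions introduced here, with the offsets inferred from the first row shown (4617 - 1133 and 85 - 22) because the earlier rows are not listed.

```r
# Daily counts copied from the ten rows shown in the table above.
daily <- data.frame(
  date      = as.Date("2020-03-16") + 0:9,
  confirmed = c(1133, 1789, 1362, 5894, 5423, 6389, 7787, 10571, 9893, 12038),
  death     = c(22, 23, 10, 82, 44, 63, 110, 140, 149, 236)
)

# Assumed totals accumulated before 2020-03-16 (inferred from the first row shown).
prior_confirmed <- 3484
prior_death     <- 63

# Day 1 is February 24, 2020, the first date in the series.
daily$DayNumber           <- as.integer(daily$date - as.Date("2020-02-24")) + 1L
daily$Confirmed_Total     <- prior_confirmed + cumsum(daily$confirmed)
daily$Death_Total         <- prior_death + cumsum(daily$death)
daily$Log_Confirmed_Total <- log(daily$Confirmed_Total)   # natural log
daily$DeathRate           <- daily$Death_Total / daily$Confirmed_Total

print(daily)
```

Note that DeathRate here is the cumulative deaths divided by cumulative confirmed cases on each date, which is why it can fall even while daily deaths rise.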

Number of Coronavirus Cases in the US

Cumulative Death Rate by Date

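library(dplyr)    # provides the %>% pipe
library(ggplot2)  # plotting functions

# Area chart of the cumulative death rate over time
UScoronavirus %>%
  ggplot() +
  geom_area(aes(x = date, y = DeathRate), fill = "salmon") +
  theme_minimal() +
  labs(title = "United States: Cumulative Death Rate by Date")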

Number of Coronavirus Cases in the US, by Date

The line in the graph shows exponential growth. Under linear growth, the count rises by a roughly constant amount each day. Under exponential growth, it rises by a roughly constant percentage each day, so the absolute daily increase gets larger as the total grows.

We can log-transform the Y-values so that equal steps along the y-axis represent equal multiplicative changes instead of equal additive ones. With the natural log used here, each unit step corresponds to a factor of e, about 2.72. By doing this and then re-plotting our line, we can show the exponential increase over time as an approximately straight line.
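To make the effect of the transformation concrete, here is a small sketch using a made-up series rather than the JHU data. The starting value of 100 and the 30% daily growth are purely illustrative assumptions.

```r
# Hypothetical series: 100 cases on day 0, growing 30% per day (illustrative values only).
day   <- 0:10
cases <- 100 * 1.3^day

# On the raw scale, each daily increment is larger than the one before it.
raw_steps <- diff(cases)

# On the natural-log scale, every daily step has the same size, log(1.3).
log_steps <- diff(log(cases))

print(round(raw_steps, 1))
print(round(log_steps, 4))
```

Constant steps on the log scale are what make the transformed series plot as a straight line, with the step size as its slope.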

Predicting Growth: Linear Regression w/Exponential Data

After applying the log transformation to the Y-values in our data, the relationship between cumulative confirmed cases and date is approximately linear. If we fit a least-squares regression line through these points, we can estimate the rate of growth and, assuming the same trend continues, the number of cases to expect on future dates.

exp_model<-lm(UScoronavirus$Log_Confirmed_Total~UScoronavirus$DayNumber)
summary(exp_model)
## 
## Call:
## lm(formula = UScoronavirus$Log_Confirmed_Total ~ UScoronavirus$DayNumber)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34803 -0.15553 -0.05247  0.15960  0.81030 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             2.498482   0.100151   24.95   <2e-16 ***
## UScoronavirus$DayNumber 0.274738   0.005464   50.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2721 on 29 degrees of freedom
## Multiple R-squared:  0.9887, Adjusted R-squared:  0.9883 
## F-statistic:  2529 on 1 and 29 DF,  p-value: < 2.2e-16
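The coefficients are on the natural-log scale, so they need to be exponentiated to be read in terms of case counts. The sketch below copies the two estimates from the summary output and converts them; `predict_cases` is a hypothetical helper introduced here for illustration, not part of the original analysis.

```r
# Coefficients copied from the summary output above.
intercept <- 2.498482
slope     <- 0.274738

# A slope of b on the log scale means the fitted total is multiplied by exp(b) each day.
daily_growth_factor <- exp(slope)

# Days needed for the fitted total to double: solve exp(b * t) = 2 for t.
doubling_time_days <- log(2) / slope

# Hypothetical helper: fitted cumulative confirmed cases for a given DayNumber.
predict_cases <- function(day_number) {
  exp(intercept + slope * day_number)
}

print(daily_growth_factor)
print(doubling_time_days)
print(predict_cases(35))  # DayNumber 35 would be four days past the last observation
```

With these estimates the fitted total grows by roughly 32% per day and doubles about every 2.5 days. Predictions for later days only hold if that growth rate stays constant, which the model cannot guarantee.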

Graph of Fitted vs. Observed Values

Exponential Growth Rate by State

A comparison of exponential growth rates across US states as of March 24th.

Percent of Cases Resulting in Death by Region

Map of Confirmed Cases