Email: jonescordell2@gmail.com
LinkedIn: https://www.linkedin.com/in/cordelljones96/

Introduction & Summary

From the CDC’s COVID-19 (2019 Novel Coronavirus) website:

“COVID-19 (2019 Novel Coronavirus) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people.

The data analysis below is of that of the running current US COVID-19 data. With the multiple plots illustrates the current pandemic presence of the Novel Coronavirus and that of the US placement. With all data ends with that in a 28-day predictive forecast.

The attached plots represents the COVID-19 confirmed cases, deaths, and recovered cases. This is represented in pie and bar charts. In each chart illustrates and shows the top 10 countries in that of the listed category. With each pie chart represents the percentage of the whole which of that country contributes.

# Summary 
report.summary(Nentries = 10,
               graphical.output = T)

US Totals

The attached plot is of the total number of COVID19 cases for the US represented in logarithmic and bar graphs. To see an interactive world map, please click here.

# Totals per location
tots.per.location(tsc, geo.loc = 'US')

Growth Rate

The attached plot from data pulled from 6/30/2020 shows the COVID19 growth data for the US. The top graph represents the number of changes in terms of reported cases. This is shown in two plots, a logarithmic (top) and a linear scale of changes in total cases (bottom). As you can see, as of mid-June with lock down restrictions lessen, cases are exponentially going back up. In the bottom graph, we can see the highest growth rate occurred in March.

# Growth Rate 
growth.rate(tsc, geo.loc = 'US')

US vs World

The below plot compares the US rates and counts of transmission, death, and recoveries to that of the world totals. On the left represents the data on a linear scale and on the right of a logarithmic scale.

totals.plt(tsa, c('US'))

US SIR Model

The below plot shows the current US SIR model. The top charts show the number of infected in the first 25 days, the left linear, and the right logarithmic. The bottom chart represents the US’s SIR model, again, the left linear and the right logarithmic. A SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. Interpreting the graph we can see that the steep decline of the blue line shows how viral COVID is. Then we can see that the epi-curve of the red line shows where the peak occurred back in March.

generate.SIR.model(tsc, 'US', tot.population = 328200000)

Forecasting

The below plots depict the US COVID-19 case forecast and predictions. Below is also an interactive plot that you can isolate transmission data/cases by a date range. As shown, rates of transmissions are statistically predicted to increase. This is with a more than 95% confidence shown with the high R-squared and low p-values.

A drastic increase in reported cases can be seen in the month of June. This can be contributed to country, state, and local restrictions being let go and businesses opening back up.

# Data
tsc <- covid19.data(case = 'ts-confirmed')
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 
## --------------------------------------------------------------------------------
tsc <- tsc %>% filter(Country.Region == 'US')
tsc <- data.frame(t(tsc))
tsc <- cbind(rownames(tsc), data.frame(tsc, row.names = NULL))
colnames(tsc) <- c('Date', 'Confirmed')
tsc <- tsc[-c(1:4),]
tsc$Date <- ymd(tsc$Date)
tsc$Confirmed <- as.numeric(as.character(tsc$Confirmed))
str(tsc)
## 'data.frame':    185 obs. of  2 variables:
##  $ Date     : Date, format: "2020-01-22" "2020-01-23" ...
##  $ Confirmed: num  1 1 2 2 5 5 5 5 5 7 ...
# Plot
qplot(Date, Confirmed, data = tsc,
      main = 'Covid19 Confirmed Cases in US')

# Variables
ds <- tsc$Date
y <- tsc$Confirmed
df <- data.frame(ds, y)

# Forecasting
m <- prophet(df)

# Prediction
future <- make_future_dataframe(m, periods = 28)
forecast <- predict(m, future)

# Plot Forecast
plot(m, forecast)

dyplot.prophet(m, forecast)

Forecast Components

Interestingly, we can see that more cases are reported on a specific week day. Fridays are to be when more cases are reported, as of 07/25/2020. One can attest that this can be contributed to as a statistical error in that there is a time lapse in when an individual is tested and when their results are delivered.

# Forecast Components
prophet_plot_components(m, forecast)

Model Performance

As shown in all plots so far, rates of transmissions are statistically predicted to increase. Running the above summary data we receive a p-value of 2.2e-16 (0.00000000000000022). A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so we can reject the null hypothesis and we can say with a high confidence that this data is statistically significant.

# Model Performance
pred <- forecast$yhat[1:185] # df Observed
actual <- m$history$y
plot(actual, pred)
abline(lm(pred~actual), col ='red')

summary(lm(pred~actual))
## 
## Call:
## lm(formula = pred ~ actual)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -145441   -5251   -1694    2586  103293 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.607e+03  3.826e+03    0.42    0.675    
## actual      9.986e-01  2.326e-03  429.31   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 37390 on 183 degrees of freedom
## Multiple R-squared:  0.999,  Adjusted R-squared:  0.999 
## F-statistic: 1.843e+05 on 1 and 183 DF,  p-value: < 2.2e-16