Email: jonescordell2@gmail.com
LinkedIn: https://www.linkedin.com/in/cordelljones96/
From the CDC’s COVID-19 (2019 Novel Coronavirus) website:
“COVID-19 (2019 Novel Coronavirus) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people.
The data analysis below is of that of the running current US COVID-19 data. With the multiple plots illustrates the current pandemic presence of the Novel Coronavirus and that of the US placement. With all data ends with that in a 28-day predictive forecast.
The attached plots represents the COVID-19 confirmed cases, deaths, and recovered cases. This is represented in pie and bar charts. In each chart illustrates and shows the top 10 countries in that of the listed category. With each pie chart represents the percentage of the whole which of that country contributes.
# Summary
report.summary(Nentries = 10,
graphical.output = T)
The attached plot is of the total number of COVID19 cases for the US represented in logarithmic and bar graphs. To see an interactive world map, please click here.
# Totals per location
tots.per.location(tsc, geo.loc = 'US')
The attached plot from data pulled from 6/30/2020 shows the COVID19 growth data for the US. The top graph represents the number of changes in terms of reported cases. This is shown in two plots, a logarithmic (top) and a linear scale of changes in total cases (bottom). As you can see, as of mid-June with lock down restrictions lessen, cases are exponentially going back up. In the bottom graph, we can see the highest growth rate occurred in March.
# Growth Rate
growth.rate(tsc, geo.loc = 'US')
The below plot compares the US rates and counts of transmission, death, and recoveries to that of the world totals. On the left represents the data on a linear scale and on the right of a logarithmic scale.
totals.plt(tsa, c('US'))
The below plot shows the current US SIR model. The top charts show the number of infected in the first 25 days, the left linear, and the right logarithmic. The bottom chart represents the US’s SIR model, again, the left linear and the right logarithmic. A SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. Interpreting the graph we can see that the steep decline of the blue line shows how viral COVID is. Then we can see that the epi-curve of the red line shows where the peak occurred back in March.
generate.SIR.model(tsc, 'US', tot.population = 328200000)
The below plots depict the US COVID-19 case forecast and predictions. Below is also an interactive plot that you can isolate transmission data/cases by a date range. As shown, rates of transmissions are statistically predicted to increase. This is with a more than 95% confidence shown with the high R-squared and low p-values.
A drastic increase in reported cases can be seen in the month of June. This can be contributed to country, state, and local restrictions being let go and businesses opening back up.
# Data
tsc <- covid19.data(case = 'ts-confirmed')
## ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
## --------------------------------------------------------------------------------
tsc <- tsc %>% filter(Country.Region == 'US')
tsc <- data.frame(t(tsc))
tsc <- cbind(rownames(tsc), data.frame(tsc, row.names = NULL))
colnames(tsc) <- c('Date', 'Confirmed')
tsc <- tsc[-c(1:4),]
tsc$Date <- ymd(tsc$Date)
tsc$Confirmed <- as.numeric(as.character(tsc$Confirmed))
str(tsc)
## 'data.frame': 185 obs. of 2 variables:
## $ Date : Date, format: "2020-01-22" "2020-01-23" ...
## $ Confirmed: num 1 1 2 2 5 5 5 5 5 7 ...
# Plot
qplot(Date, Confirmed, data = tsc,
main = 'Covid19 Confirmed Cases in US')
# Variables
ds <- tsc$Date
y <- tsc$Confirmed
df <- data.frame(ds, y)
# Forecasting
m <- prophet(df)
# Prediction
future <- make_future_dataframe(m, periods = 28)
forecast <- predict(m, future)
# Plot Forecast
plot(m, forecast)
dyplot.prophet(m, forecast)
Interestingly, we can see that more cases are reported on a specific week day. Fridays are to be when more cases are reported, as of 07/25/2020. One can attest that this can be contributed to as a statistical error in that there is a time lapse in when an individual is tested and when their results are delivered.
# Forecast Components
prophet_plot_components(m, forecast)
As shown in all plots so far, rates of transmissions are statistically predicted to increase. Running the above summary data we receive a p-value of 2.2e-16 (0.00000000000000022). A small p-value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so we can reject the null hypothesis and we can say with a high confidence that this data is statistically significant.
# Model Performance
pred <- forecast$yhat[1:185] # df Observed
actual <- m$history$y
plot(actual, pred)
abline(lm(pred~actual), col ='red')
summary(lm(pred~actual))
##
## Call:
## lm(formula = pred ~ actual)
##
## Residuals:
## Min 1Q Median 3Q Max
## -145441 -5251 -1694 2586 103293
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.607e+03 3.826e+03 0.42 0.675
## actual 9.986e-01 2.326e-03 429.31 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37390 on 183 degrees of freedom
## Multiple R-squared: 0.999, Adjusted R-squared: 0.999
## F-statistic: 1.843e+05 on 1 and 183 DF, p-value: < 2.2e-16