COVID-19, commonly referred to simply as the coronavirus, is the respiratory illness caused by SARS-CoV-2, a newly identified strain of coronavirus that has caused a recent outbreak. The virus is believed to have first been contracted by humans in Wuhan, China. The first known case in the US was confirmed on January 21st, 2020, when a 35-year-old man who had been visiting his family in Wuhan was tested after two days in the hospital.
Since then, the virus has spread quickly throughout the country, ravaging cities. Because this is a novel virus, no one has immunity and no vaccine currently exists. The dataset I am exploring includes reported cases of COVID-19 by state and county from January 21st, 2020 to March 30th, 2020. I am writing this analysis on April 14th, 2020, and the general trends you will see here have continued since March 30th. It is worth mentioning, though, that we may be seeing a slowing rate of new cases in the hardest-hit city, NYC. I would love to do another analysis a few months from now, once more data is available, to see how the trends develop over time.
## Importing Libraries
library(dplyr)
library(ggplot2)
library(plotly)
library(gganimate)
## Loading Data
covid_state <- read.csv("us-states.csv")
## Date
rdate <- as.Date(covid_state$date,"%m/%d/%y")
I converted the date column to R's Date class so I could use it for time series plots.
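As a quick check on the conversion described above, here is a minimal sketch of how `as.Date` behaves with the `"%m/%d/%y"` format string. The sample strings are made up for illustration and are not taken from the dataset.

```r
# Parse a month/day/two-digit-year string, the layout "%m/%d/%y" expects.
d <- as.Date("3/30/20", format = "%m/%d/%y")
class(d)    # "Date"
format(d)   # "2020-03-30"

# A string in a different layout yields NA rather than an error,
# so it is worth checking for NAs after converting a whole column.
bad <- as.Date("2020-03-30", format = "%m/%d/%y")
is.na(bad)  # TRUE
```

Running something like `sum(is.na(rdate))` after the conversion is a cheap way to confirm that the format string matches the file.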
# Each date has one row per state; the bars stack, so bar height is the US total
p1<-ggplot(covid_state,aes(rdate,cases))+
geom_bar(stat="identity")+ggtitle("Confirmed Covid-19 Cases in the US")+xlab("Date")+ylab("Confirmed Cases")
ggplotly(p1)
# Same stacking as above, this time for cumulative deaths
p2<-ggplot(covid_state,aes(rdate,deaths))+
geom_bar(stat="identity")+ggtitle("US Deaths Due to Covid-19")+xlab("Date")+ylab("Deaths")
ggplotly(p2)
The plots above show the exponential growth we would expect from the spread of a novel virus. As of the morning of April 14th, 2020, there are 587,173 total confirmed cases and 23,644 deaths in the United States [1]. Comparing these graphs with the most recent reported figures, it is evident that the counts have continued to climb. Case growth appears to have really taken off at the beginning of March, about six weeks after the first case, while deaths seem to have taken off in mid-March. A lag between cases and deaths is typical for viral outbreaks.
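One way to put a number on "exponential growth" is a doubling time. The sketch below is not part of my original analysis: it uses a hypothetical series that doubles every three days and fits a straight line to the log of the counts. The names `day`, `cases`, and `doubling_time` are illustrative only.

```r
# Hypothetical cumulative counts that double every 3 days (not real data).
day   <- 0:14
cases <- 10 * 2^(day / 3)

# On a log scale exponential growth is a straight line, so fit log(cases) ~ day.
fit   <- lm(log(cases) ~ day)
slope <- unname(coef(fit)["day"])

# Doubling time in days follows from exp(slope * t) = 2.
doubling_time <- log(2) / slope
doubling_time
```

With real data the fit would only be meaningful over a window where the log-scale plot looks roughly straight.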
p2.5<-ggplot(covid_state,aes(rdate))+geom_bar()+theme_classic()+ggtitle("Number of States and Territories Reporting Covid-19 Cases")+xlab("Date")+ylab("States and Territories")
ggplotly(p2.5)
The graph above shows the number of US states, territories, and districts reporting positive cases of COVID-19. You can see the sharp increase in regions reporting cases in early March. On March 1st, 15 states were reporting cases. Just two weeks later, 53 states, territories, and districts were. As of March 30th, 2020, all 50 US states, the District of Columbia (DC), and four of the five US territories are reporting cases. The only US territory yet to report a case is American Samoa.
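The bar heights in that graph come from `geom_bar()` counting rows per date. Since each state appears at most once per date, the row count is the number of regions reporting. A minimal sketch of the same count in base R, using a made-up toy table rather than the real file:

```r
# Toy long-format data: one row per state per reporting date (hypothetical values).
toy <- data.frame(
  date  = as.Date(c("2020-03-01", "2020-03-01", "2020-03-02",
                    "2020-03-02", "2020-03-02")),
  state = c("Washington", "Illinois", "Washington", "Illinois", "New York")
)

# Count rows per date; this mirrors what geom_bar() does with its default stat.
counts <- table(toy$date)
counts
```

Here two states report on the first date and three on the second.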
ny<-covid_state[ which(covid_state$state=="New York"), ]
nydate<-as.Date(ny$date,"%m/%d/%y")
p3<-ggplot(ny,aes(nydate,cases))+
geom_bar(stat="identity")+
ggtitle("Confirmed Covid-19 Cases in New York")+xlab("Date")+
# Set the line color outside aes(); inside aes() "red" is treated as data, not a color
ylab("Confirmed Cases")+geom_line(aes(y=deaths),color="red")
ggplotly(p3)
p4<-ggplot(ny,aes(nydate,deaths))+
geom_bar(stat="identity")+
ggtitle("Deaths in New York")+xlab("Date")+
ylab("Count")
ggplotly(p4)
New York, and the New York City area in particular, has been hit hardest by the virus so far, which is why I wanted to look at confirmed cases in New York alone. The graph shows exponential growth during the first half of March. Throughout March, the governor of New York, Andrew Cuomo, pressed citizens and businesses to stay home and practice “social distancing.”
By the end of March, the graph shows growth slowing to something closer to a linear trend. Whether it will continue to slow to a halt or pick back up remains to be seen. Unfortunately, the graph of deaths in New York has yet to show the same slowing in the later part of March; it still resembles exponential growth more than a linear trend.
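The difference between exponential and linear growth is easier to see in daily new cases than in cumulative totals. The sketch below uses a hypothetical cumulative series, not the New York data. The day-over-day differences rise while growth is exponential and flatten once growth becomes linear.

```r
# Hypothetical cumulative case counts (not real data).
cumulative <- c(10, 20, 40, 80, 120, 160, 200)

# Daily new cases are the day-over-day differences of the cumulative series.
new_cases <- diff(cumulative)
new_cases   # 10 20 40 40 40 40
```

The first three differences double each day and the last four are constant, which is the pattern the cumulative graph hints at when it straightens out.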
ml<-covid_state[ which(covid_state$state=="Maryland"), ]
mldate<-as.Date(ml$date,"%m/%d/%y")
p5<-ggplot(ml,aes(mldate,cases))+
geom_bar(stat="identity")+
ggtitle("Confirmed Covid-19 Cases in Maryland")+xlab("Date")+
# Set the line color outside aes(); inside aes() "red" is treated as data, not a color
ylab("Confirmed Cases")+geom_line(aes(y=deaths),color="red")
ggplotly(p5)
Since I have spent the last half year in Maryland, I wanted to look at where the state stands in terms of reported COVID-19 cases. Unfortunately, it looks like Maryland is just beginning the upswing of the outbreak. As of today, April 14th, 2020, there are nearly 9,000 confirmed cases in Maryland, an increase of 600% since March 30th, the last data point on this graph. There are also now 262 deaths, an increase of over 1,700% from two weeks ago [1].
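For readers who want to check figures like these, percent increase is the change divided by the starting value. The helper below, `pct_increase`, is a hypothetical name, and the inputs are round illustrative numbers rather than the Maryland counts.

```r
# Percent increase from an old value to a new value.
pct_increase <- function(old, new) (new - old) / old * 100

# A 600% increase means the new value is seven times the old one.
pct_increase(100, 700)   # 600
```

Note that an increase of 600% is not the same as "six times as many": the total ends up at seven times the starting count.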
## Importing
covid_county <- read.csv("us-counties.csv")
## Date
cdate <- as.Date(covid_county$date,"%m/%d/%y")
covid_city_t <- covid_county %>%
filter(county == "New York City" | county == "Cook" |
county == "Los Angeles" | county == "Philadelphia" | county == "King" | county == "Miami-Dade")
covid_city_t$date<-as.Date(covid_city_t$date)
ggplot(covid_city_t,
aes(x=county, y=cases, label=county, color=county))+
geom_point(stat='identity',size=15)+
geom_segment(aes(
y=0,
x=county,
yend=cases,
xend=county))+
geom_text(color="black",size=3)+
coord_flip()+
theme(legend.position= "none")+
labs(title='Date: {frame_time}',x='County',y='Cases')+
transition_time(date)+
ease_aes('linear')
This is an animated graph of reported cases in major US cities. I chose to look at Philadelphia, New York City, Miami, Los Angeles, Seattle (King County), and Chicago (Cook County). From the animation, you can see that LA and Chicago had cases in early February, nearly a month before any other major city had one. In mid-March you can see New York City take off.
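One caveat with selecting by county name alone is that names such as King and Cook are shared by counties in more than one state. The sketch below shows one way to match on county and state together. It assumes the county file also has a `state` column, as the state file does, and it uses a made-up toy table. The names `toy`, `wanted`, and `cities` are illustrative.

```r
# Toy rows standing in for the county file; the `state` column is an assumption.
toy <- data.frame(
  county = c("King", "King", "Cook", "Cook"),
  state  = c("Washington", "Texas", "Illinois", "Georgia"),
  cases  = c(100, 1, 50, 2)
)

# County/state pairs to keep.
wanted <- data.frame(
  county = c("King", "Cook"),
  state  = c("Washington", "Illinois")
)

# merge() keeps only rows whose county AND state both match a wanted pair.
cities <- merge(toy, wanted, by = c("county", "state"))
cities
```

Only the Washington and Illinois rows survive. With dplyr already loaded, `semi_join(toy, wanted, by = c("county", "state"))` should give the same rows.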
covid_city_t2 <- covid_county %>%
filter(county == "Prince William" | county == "Arlington" |
county == "Prince George's" | county == "District of Columbia" | county == "Frederick" | county == "Baltimore")
covid_city_t2$date<-as.Date(covid_city_t2$date)
ggplot(covid_city_t2,
aes(x=county, y=cases, label=county, color=county))+
geom_point(stat='identity',size=15)+
geom_segment(aes(
y=0,
x=county,
yend=cases,
xend=county))+
geom_text(color="black",size=3)+
coord_flip()+
theme(legend.position= "none")+
labs(title='Date: {frame_time}',x='County',y='Cases')+
transition_time(date)+
ease_aes('linear')
This animation is similar to the previous one, but it shows reported cases in counties that are close to me or that I have lived in. In March, none of these counties had more than 500 cases. During the last couple of days of the animation, you can see the District of Columbia climb much faster than it had up to that point. The district is now reporting nearly 2,000 cases, an increase of 400% from two weeks ago [1]. Most counties in the United States are not experiencing the case levels seen in the largest counties and metropolitan areas. This means that if counties enact stay-at-home and social distancing measures, they can flatten the curve, at least locally.