This exploration is based on a DataCamp.com project, found here: https://projects.datacamp.com/projects/870. The goal is to explore trends in COVID-19 cases over time by country. The data is updated regularly; I pulled the version used here on November 27th, 2020. The data is available here: https://github.com/RamiKrispin/coronavirus/tree/master/csv.
| date | province | country | lat | long | type | cases |
|---|---|---|---|---|---|---|
| 2020-01-22 | Anhui | China | 31.8257 | 117.2264 | confirmed | 1 |
| 2020-01-23 | Anhui | China | 31.8257 | 117.2264 | confirmed | 8 |
| 2020-01-24 | Anhui | China | 31.8257 | 117.2264 | confirmed | 6 |
| 2020-01-25 | Anhui | China | 31.8257 | 117.2264 | confirmed | 24 |
| 2020-01-26 | Anhui | China | 31.8257 | 117.2264 | confirmed | 21 |
To visualize the trend in confirmed COVID-19 cases worldwide, I create a new data frame with three columns: the date, the worldwide total of confirmed cases reported on that date, and the running cumulative number of cases up to that date. Below is a plot of cumulative worldwide cases over time.
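# Packages used by the plots: ggplot2 for plotting, scales for comma(), plotly for ggplotly()
library(ggplot2)
library(scales)
library(plotly)

plot_cumulative <- ggplot(cumulative, aes(x = Date, y = `Cumulative Cases`)) +
  geom_line() +
  xlab("Date") + ylab("Cumulative Cases") +
  labs(title = "Cumulative Confirmed Cases Worldwide",
       caption = "Johns Hopkins University Center for Systems Science and Engineering Coronavirus repository") +
  # date_breaks is the documented argument for an interval string such as "2 months"
  scale_x_date(date_labels = "%b %Y", date_breaks = "2 months") +
  scale_y_continuous(n.breaks = 15, labels = comma)
# Convert to an interactive plot; the tooltip shows the date and the cumulative count
ggplotly(plot_cumulative, tooltip = c("x", "y"))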
Data Source: Johns Hopkins University Center for Systems Science and Engineering Coronavirus repository
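Here is a minimal sketch of how a data frame like `cumulative` could be built with dplyr. It is an illustration under stated assumptions, not necessarily the exact code behind the plot above: the sample rows are made up, the object name `coronavirus` and the `Daily Cases` column name are assumptions, and only `Date` and `Cumulative Cases` are taken from the plotting code.

```r
library(dplyr)

# Made-up sample rows that mirror the columns of the table above
# (the "death" row is hypothetical and only there to show the filter).
coronavirus <- data.frame(
  date    = as.Date(c("2020-01-22", "2020-01-22", "2020-01-23",
                      "2020-01-23", "2020-01-23")),
  country = c("China", "Japan", "China", "Japan", "China"),
  type    = c("confirmed", "confirmed", "confirmed", "confirmed", "death"),
  cases   = c(1, 2, 8, 0, 5)
)

cumulative <- coronavirus %>%
  filter(type == "confirmed") %>%                          # keep confirmed cases only
  group_by(date) %>%
  summarise(`Daily Cases` = sum(cases), .groups = "drop") %>%  # worldwide total per day
  arrange(date) %>%                                        # cumsum() depends on row order
  mutate(`Cumulative Cases` = cumsum(`Daily Cases`)) %>%   # running total up to each date
  rename(Date = date)

print(cumulative)
```

Sorting by date before calling `cumsum()` matters, because the running total is computed in row order.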
The DataCamp.com project that I used for guidance on this exploration prompted a comparison of China versus the rest of the world. Early in the pandemic, that was an interesting comparison because China's case count was ahead of the rest of the world's. But as the year has progressed, the rest-of-world total has grown so much larger than China's that the visualization is no longer very informative:
china_v_world <- ggplot(cumulative_comparison) +
  geom_line(aes(x = Date, y = `Cumulative Cases`, group = `Is China?`, color = `Is China?`)) +
  xlab("Date") + ylab("Cumulative Cases") +
  labs(title = "Cumulative Confirmed Cases Worldwide") +
  scale_x_date(date_labels = "%b %Y", date_breaks = "2 months") +
  scale_y_continuous(n.breaks = 15, labels = comma)
ggplotly(china_v_world, tooltip = c("x", "y"))
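The `cumulative_comparison` data frame used above needs one cumulative series for China and one for everything else. A minimal sketch of one way to build it follows. The sample rows are made up, the object name `coronavirus` and the `Daily Cases` column are assumptions, and the group labels "China" and "Not China" are my guess at what the `Is China?` column might hold.

```r
library(dplyr)

# Made-up sample rows with the same columns as the table near the top.
coronavirus <- data.frame(
  date    = as.Date(c("2020-01-22", "2020-01-22", "2020-01-23",
                      "2020-01-23", "2020-01-23")),
  country = c("China", "Japan", "China", "Japan", "Italy"),
  type    = "confirmed",
  cases   = c(1, 2, 8, 0, 3)
)

cumulative_comparison <- coronavirus %>%
  filter(type == "confirmed") %>%
  # Hypothetical labels for the two groups being compared
  mutate(`Is China?` = if_else(country == "China", "China", "Not China")) %>%
  group_by(`Is China?`, date) %>%
  # Daily total per group; keep the grouping by `Is China?` for the next step
  summarise(`Daily Cases` = sum(cases), .groups = "drop_last") %>%
  arrange(date, .by_group = TRUE) %>%
  # Running total computed separately within each group
  mutate(`Cumulative Cases` = cumsum(`Daily Cases`)) %>%
  ungroup() %>%
  rename(Date = date)

print(cumulative_comparison)
```

Keeping the data grouped by `Is China?` while calling `cumsum()` is what stops the two running totals from bleeding into each other.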
Let’s look at the early part of this comparison, from the start of the data (January 22, 2020) until May 2020. This view is much more interesting because it shows the point at which cases in China began to plateau and cases in other countries began increasing exponentially.
china_v_world_zoom <- ggplot(cumulative_comparison) +
  geom_line(aes(x = Date, y = `Cumulative Cases`, group = `Is China?`, color = `Is China?`)) +
  xlab("Date") + ylab("Cumulative Cases") +
  labs(title = "Cumulative Confirmed Cases Worldwide") +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month",
               limits = c(as.Date("2020-01-22"), as.Date("2020-05-01"))) +
  scale_y_continuous(n.breaks = 10, labels = comma, limits = c(0, 200000))
ggplotly(china_v_world_zoom, tooltip = c("x", "y"))
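One detail worth knowing about the zoom above: by default, `limits` on a ggplot2 position scale treats values outside the range as missing, so a line stops where it leaves the window. If the aim is only to zoom the view, `coord_cartesian()` is an alternative that keeps all the data. The sketch below uses a made-up toy data frame (`toy` and its values are hypothetical) to show the difference.

```r
library(ggplot2)

# Hypothetical toy series that rises past the 200,000 cutoff
toy <- data.frame(
  Date = as.Date("2020-01-22") + 0:4,
  `Cumulative Cases` = c(50000, 100000, 150000, 250000, 400000),
  check.names = FALSE  # keep the space in the column name
)

base <- ggplot(toy, aes(x = Date, y = `Cumulative Cases`)) +
  geom_line()

# Scale limits: values above 200,000 become NA before the line is drawn
zoom_scale <- base + scale_y_continuous(limits = c(0, 200000))

# Coordinate limits: every row is kept and only the visible window changes
zoom_coord <- base + coord_cartesian(ylim = c(0, 200000))

# Inspect the data each plot would actually draw
built_scale <- ggplot_build(zoom_scale)$data[[1]]
built_coord <- ggplot_build(zoom_coord)$data[[1]]
print(sum(!is.na(built_scale$y)))
print(sum(!is.na(built_coord$y)))
```

With `coord_cartesian()`, the line runs off the top edge of the panel instead of ending at the last in-range point, which can make it clearer where a series leaves the window.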