Introduction

This exploration is based on a DataCamp.com project (https://projects.datacamp.com/projects/870). The goal is to explore trends in COVID-19 cases over time, by country. The data set is updated regularly; the version used here was pulled on November 27th, 2020, from https://github.com/RamiKrispin/coronavirus/tree/master/csv.

A piece of the data:
date        province  country  lat      long      type       cases
2020-01-22  Anhui     China    31.8257  117.2264  confirmed  1
2020-01-23  Anhui     China    31.8257  117.2264  confirmed  8
2020-01-24  Anhui     China    31.8257  117.2264  confirmed  6
2020-01-25  Anhui     China    31.8257  117.2264  confirmed  24
2020-01-26  Anhui     China    31.8257  117.2264  confirmed  21

Confirmed Cases Throughout the World

To visualize the trend in confirmed COVID-19 cases worldwide, I build a data frame with one row per date, holding that day's worldwide total of new confirmed cases and the running cumulative total up to that date. Below is a plot of cumulative worldwide cases over time.
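The code that builds `cumulative` is not shown here. A minimal sketch of one way it might be done with dplyr, assuming the column names from the data sample above (`date`, `type`, `cases`). The toy rows, the `coronavirus` object name, and the `Daily Cases` column name are assumptions made for illustration, not the original code:

```r
library(dplyr)

# Toy rows in the same shape as the data sample (the Japan values are invented).
coronavirus <- data.frame(
  date    = as.Date(c("2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23")),
  country = c("China", "Japan", "China", "Japan"),
  type    = "confirmed",
  cases   = c(1, 2, 8, 1)
)

cumulative <- coronavirus %>%
  filter(type == "confirmed") %>%                       # keep confirmed cases only
  group_by(Date = date) %>%                             # one group per calendar day
  summarise(`Daily Cases` = sum(cases), .groups = "drop") %>%
  arrange(Date) %>%                                     # cumsum() depends on row order
  mutate(`Cumulative Cases` = cumsum(`Daily Cases`))    # running worldwide total

print(cumulative)
```

Sorting by date before calling `cumsum()` matters, because the running total is computed in row order. With the real data, the same pipeline would run on the downloaded CSV instead of the toy rows.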

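library(ggplot2)
library(plotly)  # provides ggplotly()
library(scales)  # provides comma()

# Line plot of the running worldwide total of confirmed cases
plot_cumulative <- ggplot(cumulative, aes(x = Date, y = `Cumulative Cases`)) +
  geom_line() +
  xlab("Date") + ylab("Cumulative Cases") +
  labs(title = "Cumulative Confirmed Cases Worldwide",
       caption = "Johns Hopkins University Center for Systems Science and Engineering Coronavirus repository") +
  scale_x_date(date_labels = "%b %Y", breaks = "2 months") +
  scale_y_continuous(n.breaks = 15, labels = comma)

# Convert to an interactive plot whose tooltip shows only the x and y values
ggplotly(plot_cumulative, tooltip = c("x", "y"))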

Data Source: Johns Hopkins University Center for Systems Science and Engineering Coronavirus repository

Comparing China with the Rest of the World

The DataCamp.com project that guided this exploration prompted a comparison of China versus the rest of the world. Early in the pandemic, that was an interesting comparison because China had more confirmed cases than all other countries combined. As the year progressed, though, the rest of the world came to dwarf China's total, and the visualization is no longer so interesting:

# One line per group: China versus all other countries combined
china_v_world <- ggplot(cumulative_comparison) +
  geom_line(aes(x = Date, y = `Cumulative Cases`, group = `Is China?`, color = `Is China?`)) +
  xlab("Date") + ylab("Cumulative Cases") +
  labs(title = "Cumulative Confirmed Cases Worldwide") +
  scale_x_date(date_labels = "%b %Y", breaks = "2 months") +
  scale_y_continuous(n.breaks = 15, labels = comma)

ggplotly(china_v_world, tooltip = c("x", "y"))
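The `cumulative_comparison` data frame used above is not defined in this write-up. A rough sketch of how it could be built with dplyr, assuming the same input columns as the data sample (`date`, `country`, `type`, `cases`). The toy rows, the `coronavirus` object name, the `Daily Cases` column, and the group labels "China" and "Not China" are assumptions made for illustration:

```r
library(dplyr)

# Toy rows in the same shape as the data sample (the Japan values are invented).
coronavirus <- data.frame(
  date    = as.Date(c("2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23")),
  country = c("China", "Japan", "China", "Japan"),
  type    = "confirmed",
  cases   = c(1, 2, 8, 1)
)

cumulative_comparison <- coronavirus %>%
  filter(type == "confirmed") %>%
  # Label every row as China or not (label text is an assumption)
  mutate(`Is China?` = if_else(country == "China", "China", "Not China")) %>%
  group_by(`Is China?`, Date = date) %>%
  summarise(`Daily Cases` = sum(cases), .groups = "drop") %>%
  arrange(`Is China?`, Date) %>%
  group_by(`Is China?`) %>%                             # separate running total per group
  mutate(`Cumulative Cases` = cumsum(`Daily Cases`)) %>%
  ungroup()

print(cumulative_comparison)
```

The second `group_by()` is the important step: without it, `cumsum()` would run across both groups and the "Not China" line would start from China's final total.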

Let’s look at the early part of this comparison, from the start of the data on January 22nd, 2020, until May 1st, 2020. This view is much more interesting because it shows the point at which cases in China began to plateau and cases in other countries began increasing exponentially.

# Same comparison, restricted to the early months.
# Points outside the scale limits are dropped from the plot.
china_v_world_zoom <- ggplot(cumulative_comparison) +
  geom_line(aes(x = Date, y = `Cumulative Cases`, group = `Is China?`, color = `Is China?`)) +
  xlab("Date") + ylab("Cumulative Cases") +
  labs(title = "Cumulative Confirmed Cases Worldwide") +
  scale_x_date(date_labels = "%b %Y", breaks = "1 month",
               limits = c(as.Date("2020-01-22"), as.Date("2020-05-01"))) +
  scale_y_continuous(n.breaks = 10, labels = comma, limits = c(0, 200000))

ggplotly(china_v_world_zoom, tooltip = c("x", "y"))

Which Countries Have Been Hit the Hardest?

Let’s take a look at the ranking of countries based on total cases as of November 27th, 2020.
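A ranking like this can be computed by summing confirmed cases per country and sorting in descending order. The sketch below is one possible approach with dplyr, not the original code. It assumes the input columns from the data sample (`date`, `country`, `type`, `cases`). The country names "A", "B", and "C", the case counts, and the names `coronavirus`, `country_totals`, and `total_cases` are placeholders invented for illustration:

```r
library(dplyr)

# Placeholder rows: countries "A", "B", "C" and their counts are invented.
coronavirus <- data.frame(
  date    = as.Date(c("2020-11-26", "2020-11-27", "2020-11-27", "2020-11-27")),
  country = c("A", "A", "B", "C"),
  type    = "confirmed",
  cases   = c(5, 7, 20, 3)
)

country_totals <- coronavirus %>%
  filter(type == "confirmed", date <= as.Date("2020-11-27")) %>%
  group_by(country) %>%                                  # also collapses province-level rows
  summarise(total_cases = sum(cases), .groups = "drop") %>%
  arrange(desc(total_cases))                             # highest total first

# Show the top of the ranking (up to 10 countries)
head(country_totals, 10)
```

Because the daily counts are new cases rather than running totals, summing them per country gives the cumulative total as of the cutoff date.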

My GitHub repository for this project can be found here.
