We are approaching one full year since the COVID-19 pandemic shut down the United States. We have come to a sort of hybrid state with many people back in school or at work but still meeting remotely, social distancing, and wearing masks as vaccines are slowly distributed. Many are wondering how we got here. Back in the late spring of 2020, the strictest lockdowns were being lifted, but the future of the pandemic was unclear. The University of Minnesota Center for Infectious Disease Research and Policy (CIDRAP) predicted three possible scenarios for the future of the pandemic depending on regional responses and the behavior of the virus itself.
The COVID-19 pandemic in the United States has been well-documented both in terms of cases and response. Now, nearly one year since it began, I want to use this data to determine which of CIDRAP’s predictions most closely matches what has occurred and will continue to occur for the next several months. Not only will I compare the shape of the COVID-19 cases over time curves to the predicted scenarios, but I will also overlay the COVID-19 cases over time curves with data on government policy, holidays, and public events to see how these factors have influenced the number of COVID-19 cases. I want to answer how the government and the public’s responses to the COVID-19 pandemic have affected its course.
COVID-19 case data are available on CRAN and GitHub in the ‘coronavirus’ package. The dataset comes as a tidy format R package or in CSV form and contains information on COVID-19 cases reported by nation-state/province. The data is pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) and is updated daily, which can be retrieved easily with the command ‘update_dataset()’.
Policy data, human activity, vaccination rates, and other similar data are available on GitHub from Our World in Data. The dataset is mostly in Python but is also available in XSLX, CSV, and JSON form, all of which are updated daily. COVID-19 case data is also included, pulled from the JHU CSSE. For the purposes of this study, the dataset was downloaded once in CSV format on March 12, 2021.
The following packages and datasets must be loaded
library(tidyverse)
library(coronavirus)
library(fmsb)
coronavirus::update_dataset(silence=TRUE) # Retrieve the most current data for coronavirus cases from coronavirus dev package
knitr::opts_chunk$set(cache=TRUE) # cache the results for quick compiling
Retrieve ‘coronavirus’ data from the coronavirus package. Import Our World in Data dataset from CSV. The CSV was viewed in Excel to determine and set column types.
data(coronavirus)
owid = read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv",col_names=TRUE,col_types="ccc?nnnnnnnnnnnnnnnnnnnnnnnnnnnnncnnnnnnnnnnnnnnnnnnnnnnnnn",skip_empty_rows = FALSE)
Filter ‘coronavirus’ data to United States confirmed cases
coronavirus <- filter(coronavirus,country=="US"&type=="confirmed")
Change units to thousands of cases for simpler y-axes
coronavirus$cases <- coronavirus$cases/1000
Filter ‘owid’ data to United States
owid_us <- filter(owid,location=="United States")
Add ‘height’ column to determine height of bars for bar graph to act as heat map
max(coronavirus$cases)
## [1] 300.473
owid_us$height <- 301
Define milestones (holidays and events)
memorial_day <- as.Date("2020-05-25")
christmas <- as.Date("2020-12-25")
thanksgiving <- as.Date("2020-11-26")
labor_day <- as.Date("2020-09-07")
election_day <- as.Date("2020-11-03")
first_lockdown <- as.Date("2020-03-13") # AJMC covid timeline
george_floyd <- as.Date("2020-05-25")
capitol_stormed <- as.Date("2021-01-06")
inauguration <- as.Date("2021-01-20")
holidays <- data.frame(Date=c(memorial_day,labor_day,election_day,thanksgiving,christmas,first_lockdown,george_floyd,capitol_stormed,inauguration),Event=c("Memorial Day","Labor Day","Election Day","Thanksgiving","Christmas","Trump declares National Emergency","George Floyd killed","Capitol building stormed","Biden inaugurated"))
Make sure text won’t overlap
baseline = min(coronavirus$cases)
delta = 0.05 * diff(range(coronavirus$cases))
holidays$ymin = baseline
holidays$timelapse = c(diff(holidays$Date),Inf)
holidays$bump = holidays$timelapse < 4*370 # ~4 years
offsets <- rle(holidays$bump)
holidays$offset <- unlist(mapply(function(l,v) {if(v){(l:1)+1}else{rep(1,l)}}, l=offsets$lengths, v=offsets$values, USE.NAMES=FALSE))
holidays$ymax <- holidays$ymin + holidays$offset * delta
Define base line graph of Cases over Time
base_line <- geom_line(data=coronavirus,aes(date,cases,alpha=0.8)) # base map
Define heat map of Government Stringency Index
heat_stringency <- geom_bar(data=owid_us,stat="identity",width=1,aes(date,height,fill=owid_us$stringency_index)) # stringency index heat map
Plot line graph over heat map and add milestones
ggplot()+heat_stringency+base_line+ # heat map of stringency index + line of cases
theme_classic()+ # remove gray background and grid lines
scale_fill_viridis_c("Stringency Index",begin=0.4,end=1,option="magma")+ # scale of heat map
labs(x="Date",y="Cases (thousands)",title="Confirmed Cases of COVID-19 in U.S.",caption=("Source: JHSU CCSE and Our World in Data."))+ # labels
scale_x_date(date_breaks = "1 month",date_labels="%b %y",limits=c(as.Date("2020-03-01"),as.Date("2021-03-01")))+ # change x axis scale
scale_y_continuous(limits=c(0,301))+ # change y axis scale
geom_segment(data = holidays, mapping=aes(x=Date, y=ymin, xend=Date, yend=ymax)) + # add milestone line segments
geom_point(data = holidays, mapping=aes(x=Date,y=ymax), size=1) + # add milestone points
geom_text(data = holidays, mapping=aes(x=Date, y=ymax, label=Event), hjust=-0.1, vjust=0.1, size=3)+ # add milestone labels
guides(alpha=FALSE) # remove default legend for "alpha"
Filter owid dataset for U.S., Italy, and Japan, and calculate cases per capita
owid_us2 <- filter(owid,location=="United States"&date>=as.Date("2020-03-01")) #filter for country and after March 1
owid_us2$total_cases_per_capita <- owid_us2$total_cases/owid_us2$population # calculate cases per capita as cases/population
owid_it <- filter(owid,location=="Italy"&date>=as.Date("2020-03-01")) #filter for country and after March 1
owid_it$total_cases_per_capita <- owid_it$total_cases/owid_it$population # calculate cases per capita as cases/population
owid_jap <- filter(owid,location=="Japan"&date>=as.Date("2020-03-01")) #filter for country and after March 1
owid_jap$total_cases_per_capita <- owid_jap$total_cases/owid_jap$population # calculate cases per capita as cases/population
Set bar graph heights as one third of the top of the graph
owid_us2$height <- max(owid_us2$total_cases_per_capita)/3 # set bar graph height to one third of top of graph
owid_it$height <- max(owid_us2$total_cases_per_capita)/3
owid_jap$height <- max(owid_us2$total_cases_per_capita)/3
Combine three country datasets into one graph for easy plotting
owid_us2$data <- 'United States'
owid_it$data <- 'Italy'
owid_jap$data <- 'Japan'
owid_all2 <- rbind.data.frame(owid_us2,owid_it,owid_jap)
Repeat the above steps with log of total cases per capita to more easily view peaks at lower numbers
owid_us3$total_cases_per_capita <- log(owid_us3$total_cases/owid_us3$population) # example
Plot U.S., Italy, and Japan case per capita data over time overlaid on heat maps of stringency index and label each heat map.
ggplot(owid_all2)+
theme_classic()+
geom_bar(stat="identity",position="stack",width=1,aes(date,height,fill=stringency_index))+
scale_fill_viridis_c("Stringency Index",begin=0.4,end=1,option="magma",aesthetics = "fill")+
geom_line(aes(date,total_cases_per_capita,color=data))+
scale_color_viridis_d("Country",begin=0,end=0.7,option="E",aesthetics = "color")+
labs(x="Date",y="Total Cases per capita",title="COVID-19 Cases in U.S., Italy, and Japan",caption=("Source: JHSU CCSE and Our World in Data."))+
scale_x_date(date_breaks = "1 month",date_labels="%b %y",limits=c(as.Date("2020-03-01"),as.Date("2021-03-01")))+
scale_y_continuous(limits=c(0,max(owid_us2$total_cases_per_capita)))+
annotate("text",x=as.Date("2020-06-01"),y=0.05,label="Japan\n\n\n\n\n\nItaly\n\n\n\n\nUnited States")
Repeat the above graph on a log scale.
ggplot(owid_all3)+
theme_classic()+
geom_bar(stat="identity",position="stack",width=1,aes(date,height,fill=stringency_index))+
scale_fill_viridis_c("Stringency Index",begin=0.4,end=1,option="magma",aesthetics = "fill")+
geom_line(aes(date,total_cases_per_capita,color=data))+
scale_color_viridis_d("Country",begin=0,end=0.7,option="E",aesthetics = "color")+
labs(x="Date",y="log of Total Cases per capita",title="COVID-19 Cases in U.S., Italy, and Japan (log)",caption=("Source: JHSU CCSE and Our World in Data."))+
scale_x_date(date_breaks = "1 month",date_labels="%b %y",limits=c(as.Date("2020-03-01"),as.Date("2021-03-01")))+
scale_y_continuous(limits=c(min(owid_us3$total_cases_per_capita),0))+
annotate("text",x=as.Date("2020-06-01"),y=-7.5,label="United States\n\n\n\n\nItaly\n\n\n\nJapan")
This plot compares the number of confirmed COVID-19 cases in the U.S. to levels of policy stringency, called the stringency index, and milestones in the year like events and holidays.
A strong correlation cannot be drawn between number of cases and stringency index. During some periods of high stringency, cases seem to somewhat stall, but increases and decreases in stringency have no consistent effect on the number of cases.
Some correlation is seen between important holidays in November and December, but this is not seen for other, somewhat less significant holidays.
Some small periodicity is also seen on a weekly basis of peaks and valleys, indicating either peaks in testing or peaks in data aggregation around the work week.
The first multi-heat/line plot indicates the total number of COVID-19 cases per capita in the United States, Italy, and Japan over the last year. Italy and Japan were hit by COVID-19 earlier than the U.S. but also took more extreme measures to prevent its spread, which is not necessarily shown here (pre-March 2020).
Spikes in the increase of COVID-19 cases can be seen in the fall to winter months in both Italy and the United States, but the scale of the graph makes it difficult to determine seasonal changes in Japan.
The second multi-heat/line plot indicates the total number of COVID-19 cases per capita in the United States, Italy, and Japan over the last year on a logarithmic scale. Italy and Japan were hit by COVID-19 earlier than the U.S. but also took more extreme measures to prevent its spread, which the logarithmic scale shows more clearly, though some measures are not necessarily shown here (pre-March 2020).
In Italy and Japan, trends in case data seem to more closely correlate with the stringency of government policy than in the U.S. Periods of high stringency occur in reaction to rising case numbers and are followed by periods of stagnating case numbers; the opposite is seen for periods of low stringency.
The graphs indicate some level of success by government policy in slowing the spread of COVID-19. While it is hard to predict what COVID-19 will look like in the later months of 2021 and 2022, COVID-19 case data in the U.S. most closely resembles CIDRAP’s predicted Scenario 2, which expected large fall peaks. With the widespread distribution of multiple vaccines, hopefully future peaks will be as minimal as possible.