Data

Here I use the data available at https://github.com/nytimes/covid-19-data to highlight a few aspects of the Covid-19 pandemic in Nassau county, New York. I show the R code that I have used.

I begin by making the tidyverse package available for use:

library(tidyverse)

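Next, I download the data provided by The New York Times for all US counties and prepare a smaller data frame for Nassau county, NY, by filtering on its FIPS code, which happens to be 36059. The filter reads the code from params$fips, so that parameter must be defined before this chunk runs: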

corona.us.counties_2020 <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2020.csv")
corona.us.counties_2021 <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2021.csv")
corona.us.counties_2022 <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-2022.csv")
corona.us.counties <- rbind(corona.us.counties_2020, corona.us.counties_2021, corona.us.counties_2022)
mycounty.mystate <- filter(corona.us.counties, fips == params$fips)
head(mycounty.mystate)
## # A tibble: 6 × 6
##   date       county state    fips  cases deaths
##   <date>     <chr>  <chr>    <chr> <dbl>  <dbl>
## 1 2020-03-05 Nassau New York 36059     1      0
## 2 2020-03-06 Nassau New York 36059     4      0
## 3 2020-03-07 Nassau New York 36059     4      0
## 4 2020-03-08 Nassau New York 36059     5      0
## 5 2020-03-09 Nassau New York 36059    17      0
## 6 2020-03-10 Nassau New York 36059    19      0

The crucial variables are cases and deaths, both of which are cumulative counts. Note that the data are arranged chronologically and begin on Thursday, March 5, 2020, the day the first case was recorded in Nassau county, New York.
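The three yearly downloads follow one naming pattern, so the URLs can also be generated from a vector of years. What follows is only a sketch of that idea, not the code used above: the names years, urls, and read_all_years() are assumptions made for illustration, and the helper is defined but not called, so nothing is downloaded.

```r
# Build the three yearly URLs from a vector of years (same files as above).
years <- 2020:2022
urls <- paste0(
  "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties-",
  years, ".csv"
)

# Hypothetical helper: read every file and stack the rows.
# bind_rows() fills columns missing from one file with NA,
# whereas rbind() requires every data frame to have the same columns.
read_all_years <- function(urls) {
  dplyr::bind_rows(lapply(urls, readr::read_csv))
}

# corona.us.counties <- read_all_years(urls)   # uncomment to download
print(urls)
```

Adding a later year would then mean changing only the years vector, assuming the file for that year exists under the same naming pattern.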

Graphs of Cumulative Covid-19 Cases and Deaths

Cumulative Covid-19 Cases

As an example of the use of the xts and dygraphs packages, I present an interactive graph of the cumulative number of Covid-19 cases. The xts package is widely used to work with time series data. The dygraphs package creates, inter alia, interactive graphs when it is fed an xts data object.

library(xts)
library(dygraphs)
mycounty.mystate.coredata <- mycounty.mystate %>%
  select(cases, deaths)
# read_csv() has already parsed `date` as a Date, so no format string is needed
mycounty.mystate.index <- mycounty.mystate$date

mycounty.mystate.xts <- xts(mycounty.mystate.coredata, order.by= mycounty.mystate.index)

dygraph(mycounty.mystate.xts$cases, main = "Cumulative Cases", width = 500, height = 300) %>%
  dyRangeSelector() %>%
  dyHighlight(highlightCircleSize = 5,
              highlightSeriesBackgroundAlpha = 0.2,
              hideOnMouseOut = FALSE)
# To show all the variables in the `xts` object, delete `$cases`.

Note, again, that this graph is interactive! If you glide your cursor over the graph, you should see an ever-changing label giving the cumulative cases of Covid-19 for the relevant day. You should also be able to drag the sliders on the graph’s horizontal axis to choose the beginning and end of the chart’s time period.
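For readers new to xts, a small sketch of what such an object looks like may help. It rebuilds the six rows printed by head() earlier as an xts object and then extracts a date window using a text range, which is roughly the programmatic counterpart of dragging the range selector. The names toy.xts and window are assumptions made for illustration.

```r
library(xts)

# The first six rows shown by head() earlier in this post.
dates  <- as.Date("2020-03-05") + 0:5
counts <- data.frame(cases  = c(1, 4, 4, 5, 17, 19),
                     deaths = rep(0, 6))

# An xts object is the data plus a time index that orders the rows.
toy.xts <- xts(counts, order.by = dates)

# Range subsetting with "start/end" text: March 7 through March 9, 2020.
window <- toy.xts["2020-03-07/2020-03-09"]
print(window)
```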

The next graph shows the same data as the one above, but using a logarithmic scale (with one unit of height along the vertical scale representing a doubling of the plotted variable). Moreover, this is a static – that is, non-interactive – graph.

For my static graphs, I use the ggplot2 package, which is part of the tidyverse package that I have already made ready for use.

ggplot(data = mycounty.mystate) +    
  geom_point(mapping = aes(x = date, y = cases), color = "blue") +
  scale_y_continuous(trans = 'log2') +
  labs(x = "Date", y = "Cumulative Cases", title = "The Spread of the Virus", subtitle = "Logarithmic Scale")
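To see why one unit of height on a log2 axis means a doubling, it may help to look at the transformation on its own. The counts in this sketch are made up so that each is double the previous one.

```r
# On a log2 axis, equal vertical distances correspond to equal ratios.
cases   <- c(100, 200, 400, 800)   # hypothetical counts, each double the last
heights <- log2(cases)             # where each count lands on the log2 axis

# Each doubling raises the plotted height by the same amount: one unit.
print(diff(heights))

# A count of zero has no finite position on a log axis.
print(log2(0))
```

The second result is worth keeping in mind for the deaths series, which starts with several days of zero deaths.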

Cumulative Covid-19 Deaths

The next graph begins as non-interactive, but becomes interactive thanks to the ggplotly() function of the plotly package.

p <- ggplot(data = mycounty.mystate) +  
  geom_line(mapping = aes(x = date, y = deaths)) +
  labs(x = "Date", y = "Cumulative Deaths", title = "The Toll", subtitle = "Linear Scale")

#install.packages("plotly")
library(plotly)
ggplotly(p)

And in logarithmic scale:

ggplot(data = mycounty.mystate) +    
  geom_point(mapping = aes(x = date, y = deaths), color = "blue") +
  scale_y_continuous(trans = 'log2') +
  labs(x = "Date", y = "Cumulative Deaths", title = "The Toll", subtitle = "Logarithmic Scale")

And, having graphed the data for Covid-19 cases and deaths, it is not too much of a detour to look at the Case Fatality Rate, which is deaths as a percent of cases:

ggplot(data = mycounty.mystate) +  
  geom_line(mapping = aes(x = date, y = 100*(deaths/cases))) +
  labs(x = "Date", y = "Deaths as a percent of Cases", title = "Case Fatality Rate")

Note that this rate depends heavily on the number of tests being done and on the criteria used to determine who gets tested. Moreover, this case fatality rate is cumulative deaths as a percent of cumulative cases. As time passes and the pandemic matures, day-to-day changes in these cumulative numbers become relatively inconsequential. Consequently, the CFR, being a ratio of slow-changing numbers, is itself slow to change.

The one exception to this was August 6, 2020, when the cumulative number of deaths actually fell by 512, probably because of some reassessment of the data.
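A downward revision of this kind can be found mechanically, because a cumulative series should never decrease. The sketch below uses made-up totals (not the Nassau figures) and base R only; the names toy, change, and revisions are assumptions made for illustration.

```r
# Hypothetical cumulative death counts containing one downward revision.
toy <- data.frame(
  date   = as.Date("2020-01-01") + 0:4,
  deaths = c(100, 105, 108, 60, 62)
)

# Day-over-day change; the first day has no predecessor, hence the NA.
toy$change <- c(NA, diff(toy$deaths))

# Keep only the rows where the cumulative total went down.
revisions <- toy[!is.na(toy$change) & toy$change < 0, ]
print(revisions)
```

With dplyr loaded, the same check on the real data could be written as filter(mycounty.mystate, deaths < lag(deaths)).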

Daily Numbers of New Covid-19 Cases and Deaths

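The increase in a cumulative total from one date to the next gives the number of new cases (or deaths) recorded on the second of the two dates. The seven-day average of the daily increases is calculated as the change in the cumulative total over the preceding seven days, divided by seven.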

mycounty.mystate <- mycounty.mystate %>%
  arrange(date) %>%   # Not strictly necessary: the data are already in date order, which lag() relies on
  mutate(increase.in.cases = cases - lag(cases), 
         increase.in.deaths = deaths - lag(deaths),
         increase.in.cases.7days = (cases - lag(cases, 7))/7,
         increase.in.deaths.7days = (deaths - lag(deaths, 7))/7)
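The seven-day columns never average anything explicitly; they divide the seven-day change in the cumulative total by 7. That works because the daily increases telescope: their sum over seven days is just the last cumulative total minus the one from seven days earlier. A minimal check on made-up counts, with names (toy, avg7.by.difference, avg7.direct) assumed for illustration:

```r
library(dplyr)

# Hypothetical cumulative counts for ten consecutive days.
toy <- data.frame(day   = 1:10,
                  cases = c(1, 4, 4, 5, 17, 19, 24, 31, 40, 52))

toy <- toy %>%
  mutate(increase           = cases - lag(cases),
         avg7.by.difference = (cases - lag(cases, 7)) / 7)

# Mean of the seven daily increases ending on day 10, computed directly.
avg7.direct <- mean(tail(toy$increase, 7))

# The two numbers agree.
print(c(tail(toy$avg7.by.difference, 1), avg7.direct))
```

Note that the first seven rows of the seven-day column are NA, since there is no observation seven days earlier to subtract.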

Now the daily tallies of new cases and deaths can be graphed, with the seven-day averages overlaid in blue:

Daily Tally of New Covid-19 Cases and the Seven-Day Average

ggplot(data = mycounty.mystate) +  
  geom_line(mapping = aes(x = date, y = increase.in.cases)) +
  geom_line(mapping = aes(x = date, y = increase.in.cases.7days), color = "blue", linetype = 1, linewidth = 1.5) +  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  labs(x = NULL, y = NULL, title = "The Daily Increase in Cases and its Seven-Day Average")

Daily Tally of New Covid-19 Deaths and the Seven-Day Average

ggplot(data = mycounty.mystate) +  
  geom_line(mapping = aes(x = date, y = increase.in.deaths)) +
  geom_line(mapping = aes(x = date, y = increase.in.deaths.7days), color = "blue", linetype = 1, linewidth = 1.5) +  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  labs(x = NULL, y = NULL, title = "The Daily Increase in Deaths and its Seven-Day Average") +
  ylim(0, NA)

The Toughest Days So Far in Nassau County, NY

These Were the Days with the Most New Cases

mycounty.mystate %>% select(date, increase.in.cases) %>% arrange(increase.in.cases) %>% na.omit() %>% tail()
## # A tibble: 6 × 2
##   date       increase.in.cases
##   <date>                 <dbl>
## 1 2022-01-09              6668
## 2 2021-12-30              6861
## 3 2022-01-06              6983
## 4 2021-12-31              7346
## 5 2022-01-01              7716
## 6 2021-12-26              8121

These Were the Days with the Most New Deaths

mycounty.mystate %>% select(date, increase.in.deaths) %>% arrange(increase.in.deaths) %>% na.omit() %>% tail()
## # A tibble: 6 × 2
##   date       increase.in.deaths
##   <date>                  <dbl>
## 1 2020-04-14                108
## 2 2020-04-10                112
## 3 2020-04-06                139
## 4 2020-04-19                221
## 5 2020-04-04                258
## 6 2022-11-11                557
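The arrange() and tail() combination lists the largest values last. An alternative, sketched here on made-up numbers rather than the Nassau data, is dplyr's slice_max(), which returns the top rows directly with the largest first. The names toy and top3 are assumptions made for illustration.

```r
library(dplyr)

# Hypothetical daily increases; the NA mimics the first row, which has no lag.
toy <- data.frame(
  date              = as.Date("2022-01-01") + 0:5,
  increase.in.cases = c(NA, 120, 45, 300, 80, 210)
)

top3 <- toy %>%
  filter(!is.na(increase.in.cases)) %>%   # same role as na.omit() above
  slice_max(increase.in.cases, n = 3)     # three largest values, largest first
print(top3)
```

By default slice_max() keeps ties, so it can return more than n rows when several days share the same count.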

The Last Four Weeks

mycounty.mystate %>% 
  select(date, increase.in.cases, increase.in.deaths) %>% 
  tail(n = 28) %>%
  knitr::kable(caption = paste("The Covid-19 Pandemic During the Last Four Weeks:", params$county, "County,", params$state_short)) %>%
  kableExtra::kable_styling(full_width = FALSE)

The Covid-19 Pandemic During the Last Four Weeks: Nassau County, NY

date        increase.in.cases  increase.in.deaths
2022-11-13                309                   0
2022-11-14                230                   0
2022-11-15                313                   0
2022-11-16                389                   0
2022-11-17                392                   0
2022-11-18                395                   0
2022-11-19                406                   0
2022-11-20                323                   0
2022-11-21                310                   0
2022-11-22                299                   0
2022-11-23                353                   0
2022-11-24                577                   0
2022-11-25                386                   0
2022-11-26                252                   0
2022-11-27                297                   0
2022-11-28                307                   0
2022-11-29                360                   0
2022-11-30                857                   0
2022-12-01                829                   0
2022-12-02                666                   0
2022-12-03                639                   0
2022-12-04                408                   0
2022-12-05                403                   0
2022-12-06                527                   0
2022-12-07                606                   0
2022-12-08                644                  36
2022-12-09                626                   0
2022-12-10                488                   0

Last Four Weeks: Daily Tally of New Cases

ggplot(data = tail(mycounty.mystate, 28)) +  
  geom_line(mapping = aes(x = date, y = increase.in.cases)) +
  expand_limits(y = 0)

Last Four Weeks: Daily Tally of New Deaths

ggplot(data = tail(mycounty.mystate, 28), mapping = aes(x = date, y = increase.in.deaths)) +   
  geom_col() +
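  scale_y_continuous(breaks = scales::breaks_pretty())  # breaks = 0:5 would label nothing above 5, yet one day in this window shows 36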

Conclusion

Needless to say, the code here can be used to present a similar profile for any other US county by supplying that county's FIPS code. The code reads it from params$fips, along with params$county and params$state_short for the table caption.
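One way to make the county switch explicit is to wrap the filtering step in a small function. This is a sketch under assumptions: filter_county() is a hypothetical helper, not part of the code above, and the data frame is a toy stand-in for corona.us.counties with made-up rows.

```r
# Hypothetical helper: keep the rows for one county, identified by FIPS code.
# The code is compared as text; codes with a leading zero must be passed as
# strings (e.g. "01001"), because a number would lose the zero.
filter_county <- function(data, fips_code) {
  keep <- !is.na(data$fips) & data$fips == as.character(fips_code)
  data[keep, , drop = FALSE]
}

# Toy stand-in for corona.us.counties (made-up rows).
toy <- data.frame(
  county = c("Nassau", "Suffolk", "Nassau"),
  fips   = c("36059", "36103", "36059"),
  cases  = c(1, 2, 4),
  stringsAsFactors = FALSE
)

print(filter_county(toy, "36059"))
```

In a parameterized R Markdown document the same value could instead be supplied at render time, for example rmarkdown::render("report.Rmd", params = list(fips = "36059")), where report.Rmd is a placeholder file name.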

This essay is meant to help me remember the R commands I used in it. I am an amateur “data scientist” and I work on simple projects on occasion. As a result of the long gaps between my “projects”, I tend to forget what I learn.

R Techniques Used