Abstract

    This project analyzes and visualizes regional COVID-19 death rates and compares them with regional death rates for several leading causes of death unrelated to the pandemic. The motivation is to make tangible, through a series of visualizations, the impact that COVID-19 has had on different regions of the United States. The analysis uses the COVID-19 dataset from USAFacts, which aggregates COVID-19 cases by county from data collected by US health agencies. The comparison with leading causes of death in the US draws on regional statistics from the Centers for Disease Control and Prevention’s (CDC) public-use database, Wide-ranging ONline Data for Epidemiological Research (WONDER). Through WONDER, the CDC maintains detailed mortality data for many demographic variables. For this analysis, however, the WONDER query was limited to the top 15 causes of death in the US by county and year (2016-2018).

Introduction

    Coverage of the COVID-19 pandemic has been a leading topic of discussion and has generated numerous headlines. One such headline, published on 4/10/20 by livescience.com, read “COVID-19 is now the leading cause of death in the United States”. However, a more recent headline in the Wall Street Journal reads “Leading Cause of Death in U.S.? Hint: It Isn’t Covid-19”. Caught between conflicting headlines, what is the concerned public to assume is correct? The motivation of this analysis is to present numbers and visualizations that clarify the impact the COVID-19 pandemic has had on the US thus far (the data used in this analysis was updated as recently as May 8th 2020).

The Data

  1. COVID-19 by US County: USAFacts is a not-for-profit organization that cleans, organizes and maintains data on the American population, government and society. USAFacts collates data from various government agencies to present clean and coherent datasets and corresponding visualizations free of charge. In light of the COVID-19 pandemic, USAFacts has been presenting data on confirmed cases and related deaths in the US. This analysis makes use of a dataset unique to USAFacts that organizes the data at a local level, with incidences organized by county. Importantly, the data includes the FIPS code for each county; this facilitates comparisons with existing government datasets. The USAFacts Coronavirus Locations dataset can be found here: USAFacts.org. The analysis presented here uses the ‘Deaths’ and ‘County population for population adjustments (2019 Census estimates)’ data files.
  2. Leading Causes of Death by US County: The CDC maintains WONDER, a public-use database with data on various epidemiological topics. For example, births, vaccinations and mortality data are all freely open to the public. Mortality data for the leading causes of death by US county were pulled from WONDER. First, WONDER was queried for the 15 leading causes of death in the US. The resulting text file yielded summary statistics and the corresponding International Classification of Diseases (ICD) codes for each cause of death. Next, WONDER was manually queried for each of the 15 causes of death, grouping the data by US county and year (2016-2018) for the corresponding ICD codes.

Data Acquisition & R Environment

R Environment: This project uses tidyverse methods to transform data, ggplot2 for basic visualizations and usmap for spatial representation of county data.
Load the necessary libraries into the R environment:
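The chunk that loads the libraries is not shown in this rendering. A minimal sketch, assuming the packages named above plus gridExtra (an assumption, inferred from the later calls to grid.arrange), might look like this:

```r
# Packages named in the text, plus gridExtra (assumed: grid.arrange() is called later on)
pkgs <- c("tidyverse", "usmap", "gridExtra")

# Check which packages are installed without attaching them yet
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  message("Install before continuing: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach the packages that are available
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```

Checking with requireNamespace first gives a readable message when a package is missing, where a bare library() call would stop with an error.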

COVID-19 by US County: This analysis uses two .csv files from USAFacts.org: ‘covid_deaths_usafacts.csv’ and ‘covid_county_population_usafacts.csv’. The data files were downloaded from USAFacts.org and are made available via the author’s GitHub account. The following code will load the data into the R environment as a data.frame:
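The loading chunk itself is not shown in this rendering. A hedged sketch of what it might look like follows. The helper read_usafacts is hypothetical, and the assumption that the two .csv files sit under the same GitHub stem used later for the WONDER text files is not confirmed by the text.

```r
# Stem of the author's GitHub repository (the stem used later for the WONDER files);
# that the .csv files live at this path is an assumption
gitStem <- "https://raw.githubusercontent.com/SmilodonCub/DATA607/master/DATA607_finalProject/"

# Hypothetical helper: read one USAFacts file from a stem (a URL or a local folder)
read_usafacts <- function(fileName, stem = gitStem) {
  read.csv(paste0(stem, fileName), stringsAsFactors = FALSE)
}

# Usage (requires network access), with the data.frame names used in the text:
# COVID_DeathbyCounty_df <- read_usafacts("covid_deaths_usafacts.csv")
# COVID_CountyPop_df     <- read_usafacts("covid_county_population_usafacts.csv")
```

Passing the stem as an argument makes it easy to point the same helper at a local copy of the files.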

Leading Causes of Death by US County: The queries for this analysis were made to the CDC’s WONDER database. Output from WONDER was exported as .txt files and is made available via the author’s GitHub account. There is one .txt file with summary statistics and ICD codes for the 15 leading causes of death in the US for the year 2018. An additional 15 .txt files are the result of WONDER queries for each leading cause of death’s ICD code(s), grouped by county and year for the years 2016-2018. Three recent years were pulled in an effort to get mortality estimates for as many counties as possible. The following code will load the leading causes of death data into the R environment as a data.frame:

##                                                  Causes Deaths Population
## 1          #Diseases of heart (I00-I09,I11,I13,I20-I51) 655381  327167434
## 2                        #Malignant neoplasms (C00-C97) 599274  327167434
## 3 #Accidents (unintentional injuries) (V01-X59,Y85-Y86) 167127  327167434
## 4         #Chronic lower respiratory diseases (J40-J47) 159486  327167434
## 5                   #Cerebrovascular diseases (I60-I69) 147810  327167434
## 6                              #Alzheimer disease (G30) 122019  327167434
##   CrudeRate
## 1     200.3
## 2     183.2
## 3      51.1
## 4      48.7
## 5      45.2
## 6      37.3


The above data.frame gives total deaths and national mortality rates for the top 15 leading causes of death in the US according to the CDC. This table was referenced to query WONDER for mortality statistics grouped by year and county for all counties and the years 2016-2018. The information above was necessary to reference the ICD codes for each top cause of death (alphanumerics in parentheses in the feature Causes).
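The ICD codes can be pulled out of the Causes labels programmatically. The sketch below is one possible approach and is not taken from the original analysis; the helper extract_icd and the short names assigned to the result are hypothetical.

```r
# Two labels copied from the table above
causes <- c("#Diseases of heart (I00-I09,I11,I13,I20-I51)",
            "#Accidents (unintentional injuries) (V01-X59,Y85-Y86)")

# Hypothetical helper: keep the text inside the LAST pair of parentheses,
# then split it on commas into individual codes or code ranges
extract_icd <- function(label) {
  inside <- sub("^.*\\(([^()]*)\\)[[:space:]]*$", "\\1", label)
  strsplit(inside, ",", fixed = TRUE)
}

icd <- extract_icd(causes)
names(icd) <- c("Heart", "Accidents")
icd$Accidents
```

Matching the last pair of parentheses matters for labels such as Accidents, which carry an earlier parenthetical that is not an ICD code.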

The following code will access the WONDER query results: text files for each leading cause of death with three years of data for each US county

#filenames for query results text files
fileNames <- c("Accidents.txt", "Alzheimer.txt", "ChonicLowerResp.txt", "Diabetes.txt", "Heart.txt",
               "Hypertension.txt", "InfluenzaPneumonia.txt", "Kidney.txt", "Liver.txt", "Overall.txt",
               "Parkinson.txt", "Pneumonitis.txt", "Septicemia.txt", "Stroke.txt", "malNeoplasms.txt",
               "selfHarm.txt")
#stem address for author's github
gitStem <- "https://raw.githubusercontent.com/SmilodonCub/DATA607/master/DATA607_finalProject/"
#initialize a data.frame with appropriate column names
leadingCausesbyCounty_df <- data.frame(fips=numeric(),
                 Deaths=numeric(), 
                 Population=numeric(), 
                 Crude.Rate=numeric(),
                 Cause=character(), #Cause holds text labels such as "Heart"
                 stringsAsFactors=FALSE) 
#A for loop to read in each text file, extract data, format the data and add to the data.frame
for (afile in seq_along(fileNames)){
fileURL <- paste0(gitStem,fileNames[afile])
disLabel <- str_remove(fileNames[afile], fixed(".txt")) #fixed() treats '.' literally rather than as a regex wildcard
df <- data.frame(read.delim( fileURL, sep = "\t", header = T )) #read a text file to a data.frame
df <- df %>%
  select( -Notes, -Year.Code, -County ) %>% #deselect irrelevant features
  filter( !is.na( County.Code ) ) %>% #filter out records with no county code
  mutate_if(is.factor, as.character) %>%
  mutate( Deaths = na_if( Deaths, "Suppressed"),
          Deaths = na_if( Deaths, "Missing"),
          Crude.Rate = na_if( Crude.Rate, "Suppressed"),
          Crude.Rate = na_if( Crude.Rate, "Unreliable"),
          Crude.Rate = na_if( Crude.Rate, "Missing")) %>% #set flagged (non-numeric) entries to NA
  rename( fips = County.Code ) %>% #rename countycode to work with usmap
  mutate( fips = as.character( fips ) ) %>%
  mutate( fips = ifelse( nchar( fips ) ==4, paste( '0', fips, sep = ""), fips ) ) %>% #some fips numbers need a '0' added in front. usmap requires character of length 5 for county
  mutate_if(is.character, as.numeric) %>%
  group_by( fips ) %>% #group by fips county number and get the best estimates the data will allow
  summarise( Deaths = mean( Deaths, na.rm = T), 
             Population = mean( Population, na.rm = T),
             Crude.Rate = mean( Crude.Rate, na.rm = T)) %>%
  mutate( Cause = disLabel )
  leadingCausesbyCounty_df <- rbind(leadingCausesbyCounty_df, df) #rbind the data.frame to the big guy.
}
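The loop pads four-digit county codes with a leading zero, but the later mutate_if(is.character, as.numeric) step converts fips back to a number, which is why the mapping sections further down pad the codes again. A compact alternative for the padding step is sketched here; the helper pad_fips is hypothetical and is not part of the original code.

```r
# Hypothetical helper: turn numeric county codes into 5-character FIPS strings
pad_fips <- function(fips) {
  sprintf("%05d", as.integer(fips))
}

# Autauga County, AL (1001) and Kings County, NY (36047)
pad_fips(c(1001, 36047))
```

Applying the helper once, immediately before plotting, would avoid repeating the nchar-based ifelse in every mapping block.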

COVID-19 by US County: Exploration & Analysis

Explore the USAFacts COVID-19 Deaths by US County & County Population data frames through a series of visualizations.

Total Deaths per US County

First, explore the basic structure of the COVID_DeathbyCounty_df data.frame:

##   [1] "countyFIPS"  "County.Name" "State"       "stateFIPS"   "X1.22.20"   
##   [6] "X1.23.20"    "X1.24.20"    "X1.25.20"    "X1.26.20"    "X1.27.20"   
##  [11] "X1.28.20"    "X1.29.20"    "X1.30.20"    "X1.31.20"    "X2.1.20"    
##  [16] "X2.2.20"     "X2.3.20"     "X2.4.20"     "X2.5.20"     "X2.6.20"    
##  [21] "X2.7.20"     "X2.8.20"     "X2.9.20"     "X2.10.20"    "X2.11.20"   
##  [26] "X2.12.20"    "X2.13.20"    "X2.14.20"    "X2.15.20"    "X2.16.20"   
##  [31] "X2.17.20"    "X2.18.20"    "X2.19.20"    "X2.20.20"    "X2.21.20"   
##  [36] "X2.22.20"    "X2.23.20"    "X2.24.20"    "X2.25.20"    "X2.26.20"   
##  [41] "X2.27.20"    "X2.28.20"    "X2.29.20"    "X3.1.20"     "X3.2.20"    
##  [46] "X3.3.20"     "X3.4.20"     "X3.5.20"     "X3.6.20"     "X3.7.20"    
##  [51] "X3.8.20"     "X3.9.20"     "X3.10.20"    "X3.11.20"    "X3.12.20"   
##  [56] "X3.13.20"    "X3.14.20"    "X3.15.20"    "X3.16.20"    "X3.17.20"   
##  [61] "X3.18.20"    "X3.19.20"    "X3.20.20"    "X3.21.20"    "X3.22.20"   
##  [66] "X3.23.20"    "X3.24.20"    "X3.25.20"    "X3.26.20"    "X3.27.20"   
##  [71] "X3.28.20"    "X3.29.20"    "X3.30.20"    "X3.31.20"    "X4.1.20"    
##  [76] "X4.2.20"     "X4.3.20"     "X4.4.20"     "X4.5.20"     "X4.6.20"    
##  [81] "X4.7.20"     "X4.8.20"     "X4.9.20"     "X4.10.20"    "X4.11.20"   
##  [86] "X4.12.20"    "X4.13.20"    "X4.14.20"    "X4.15.20"    "X4.16.20"   
##  [91] "X4.17.20"    "X4.18.20"    "X4.19.20"    "X4.20.20"    "X4.21.20"   
##  [96] "X4.22.20"    "X4.23.20"    "X4.24.20"    "X4.25.20"    "X4.26.20"   
## [101] "X4.27.20"    "X4.28.20"    "X4.29.20"    "X4.30.20"    "X5.1.20"    
## [106] "X5.2.20"     "X5.3.20"     "X5.4.20"     "X5.5.20"     "X5.6.20"    
## [111] "X5.7.20"     "X5.8.20"
## [1] 3195  112

From the output above, this data.frame holds 3195 records, one for each US county. Aside from the features that hold label information (countyFIPS, County.Name, State and stateFIPS), there is one column for each successive day from January 22nd 2020 through May 8th 2020, holding the cumulative deaths in the county up to that day. This represents the most complete version of the data from USAFacts at the time of this analysis. However, more recent data may be accessed directly via USAFacts.org.
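The date columns carry names such as X1.22.20 because read.csv prefixes names that start with a digit. A small sketch of turning those names back into dates follows; the helper parse_date_col is hypothetical and is not part of the original analysis.

```r
# A few column names copied from the output above
dateCols <- c("X1.22.20", "X2.29.20", "X5.8.20")

# Hypothetical helper: drop the leading "X" that read.csv adds, then parse month.day.year
parse_date_col <- function(colName) {
  as.Date(sub("^X", "", colName), format = "%m.%d.%y")
}

dates <- parse_date_col(dateCols)
dates

# Number of daily columns expected between the first and last date, inclusive
nDays <- as.integer(diff(range(dates))) + 1
nDays  # 108, which matches the 112 columns minus the 4 label columns
```

Parsed dates are what a time axis needs when the cumulative counts are plotted by day or month.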

Next, examine the summary statistics and visualize the distribution of cumulative deaths by US county for the last column (most recent date) in the data.frame COVID_DeathbyCounty_df:

##           County.Name State X5.8.20
## 1        Kings County    NY    5902
## 2       Queens County    NY    5717
## 3        Bronx County    NY    3784
## 4     New York County    NY    2412
## 5         Cook County    IL    2197
## 6        Wayne County    MI    2028
## 7       Nassau County    NY    1918
## 8      Suffolk County    NY    1568
## 9  Los Angeles County    CA    1468
## 10       Essex County    NJ    1398
## 11      Bergen County    NJ    1329
## 12 Westchester County    NY    1191
## 13   Middlesex County    MA    1132
## 14   Fairfield County    CT    1006
## 15      Hudson County    NJ     940
##     X5.8.20       
##  Min.   :   0.00  
##  1st Qu.:   0.00  
##  Median :   0.00  
##  Mean   :  24.26  
##  3rd Qu.:   4.00  
##  Max.   :5902.00

From the basic summary statistics and the plots above, it can be concluded that the distribution of cumulative deaths by US county is heavily skewed to the right. The skew is so severe that the y-axis must be rescaled as a log10 plot to visualize the ‘box’ in the boxplot. The majority of counties report 0 or very few deaths (Median: 0), while a tail of extreme outliers (Max: 5902) pulls the mean to the right (Mean: 24.26). The data is far from a normal distribution, but perhaps it can be described as a Poisson distribution?
How should this distribution be interpreted? It shows that for the vast majority of US counties, the cumulative deaths as of May 8th 2020 are non-existent or very low. However, there are a relatively small number of counties with a very high number of deaths.
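The effect of the skew on the summary statistics can be reproduced on a small scale. The vector below is illustrative only and is not the real dataset: it combines the 15 largest county totals listed above with an arbitrary 85 zero-death counties.

```r
# Illustrative vector only: 85 counties with no deaths plus the 15 largest totals listed above
deaths <- c(rep(0, 85),
            5902, 5717, 3784, 2412, 2197, 2028, 1918, 1568,
            1468, 1398, 1329, 1191, 1132, 1006, 940)

# A mean far above the median is a quick signature of right skew
c(mean = mean(deaths), median = median(deaths))

# log10(0) is -Inf, so zero-death counties fall off a log10 axis;
# log10(x + 1) is one common workaround that keeps them at 0
summary(log10(deaths + 1))
```

When reading a log10-scaled boxplot of these counts, keep in mind that counties with zero deaths cannot be drawn unless an offset of this kind is applied.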

Visualize the counties by number of deaths:


The figures above demonstrate the earlier finding clearly: the vast majority of US counties report very low incidences of COVID-19 fatalities. However, for several counties, the death totals are much higher.

Next, examine the COVID_CountyPop_df data.frame:

## [1] "countyFIPS"  "County.Name" "State"       "population"
## [1] 3195    4

This data.frame holds 3195 records, one for each US county. Most features carry categorical identification information (e.g. FIPS numbers). However, the feature population carries numerical data: the 2019 Census estimates of the population of each county.

Visualize the population data:

##              County.Name State countyPop
## 1     Los Angeles County    CA  10039107
## 2            Cook County    IL   5150233
## 3          Harris County    TX   4713325
## 4        Maricopa County    AZ   4485414
## 5       San Diego County    CA   3338330
## 6          Orange County    CA   3175692
## 7      Miami-Dade County    FL   2716940
## 8          Dallas County    TX   2635516
## 9           Kings County    NY   2559903
## 10      Riverside County    CA   2470546
## 11          Clark County    NV   2266715
## 12         Queens County    NY   2253858
## 13           King County    WA   2252782
## 14 San Bernardino County    CA   2180085
## 15        Tarrant County    TX   2102515


The figures above show that, similar to the COVID-19 data, the distribution of population by county is heavily skewed to the right. Additionally, and perhaps unsurprisingly, the spatial distribution of densely populated counties closely resembles that of counties with a high incidence of COVID-19 deaths.

Mortality Rates across US counties

How well does population correspond to high numbers of COVID-19 deaths? This figure shows how COVID-19 death totals vary as a function of population:


The figure shows no obvious linear trend between death totals and population. However, knowing the population of each county, a cause-specific mortality rate can be calculated to compare death rates between counties in a way that accounts for population. The CDC defines a mortality rate as: \(\frac{\text{Deaths occurring during a given time period}}{\text{Size of the population among which the deaths occurred}} \times 10^n\)

The following calculates a provisional mortality rate for COVID-19 by county. The rate uses the total deaths as of this analysis (May 8th 2020), with the understanding that it will change. It represents a temporary and conservative estimate of mortality, as the pandemic is still underway and not all deaths have yet been reported.

This code will calculate and visualize a cause-specific mortality rate for COVID-19 deaths by county:
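The chunk itself is not shown in this rendering. As a minimal base-R sketch of the calculation (not the author's code), the rate per 100,000 people can be reproduced for three counties using death totals and 2019 population estimates quoted elsewhere in this report:

```r
# Deaths (as of May 8th 2020) and 2019 population estimates quoted elsewhere in this report
counties <- data.frame(
  County.Name = c("Randolph County", "Queens County", "Kings County"),
  State       = c("GA", "NY", "NY"),
  deaths      = c(21, 5717, 5902),
  population  = c(6778, 2253858, 2559903),
  stringsAsFactors = FALSE
)

# Crude mortality rate per 100,000 people (n = 5 in the CDC formula)
counties$mortRate <- counties$deaths / counties$population * 10^5

# Rank from highest to lowest rate
ranked <- counties[order(-counties$mortRate), c("County.Name", "State", "mortRate")]
ranked
```

Randolph County, with only 21 deaths, ranks above both New York counties once its small population is taken into account.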

##                    County.Name State mortRate
## 1              Randolph County    GA 309.8259
## 2                 Bronx County    NY 266.8158
## 3                 Early County    GA 255.1521
## 4                Queens County    NY 253.6540
## 5               Terrell County    GA 246.1611
## 6                 Kings County    NY 230.5556
## 7  St. John the Baptist Parish    LA 177.4167
## 8                 Essex County    NJ 174.9742
## 9              Richmond County    NY 170.9570
## 10                Union County    NJ 151.7055
## 11               Turner County    GA 150.2818
## 12             New York County    NY 148.0930
## 13            Dougherty County    GA 143.2534
## 14               Bergen County    NJ 142.5657
## 15              Passaic County    NJ 142.4797


The following code will generate a series of .png images to visualize the mortality by county for each date in the dataset. These images can be used to create an animation of the spread of COVID-19. The code is available here, but does not execute in this document. Rather the images were collated as a .gif outside of the RStudio environment.

COVID-19 Mortality by County for NorthEastern US

COVID-19 by County Summary

    This brief overview of the USAFacts COVID-19 Deaths by US County dataset illustrates a powerful point about the nature of the US COVID-19 pandemic: the impact has not been felt equally across the country. The heavily right-skewed distribution of deaths per county (as of 5/8/20) shows that relatively few US counties have been hit particularly hard by the pandemic, while the vast majority have reported either no deaths or very few.
    When only the total number of deaths per county is analyzed, the impact appears isolated to a few urban centers. For instance, the top 15 counties by total number of deaths are all urban counties in major US cities (New York City, Chicago, Los Angeles, Boston) or counties within densely populated megalopolises. This information is critical for action against the pandemic, because it quantifies the overall impact and indicates that supplies should be directed to these heavily hit counties. However, a drawback of visualizing death totals is that, by not taking population into account, it gives the misleading impression that COVID-19 is not penetrating less populous regions.
    Analyzing the data as a mortality rate (e.g. deaths per 100k people) shifts the perspective. In this analysis, the crude mortality rate was calculated as total deaths from pandemic onset to May 8th 2020 divided by the most recent population estimate (the 2019 Census estimates). On the one hand, many of the counties with the highest rates were also counties with the highest total counts (e.g. NY’s Kings and Queens Counties); this illustrates that even when population is factored in, Brooklyn (Kings County) and Queens have exceptionally high mortality. On the other hand, when county population is considered, the mortality rate reveals several other counties that have been hit relatively hard. For example, the county found to have the highest crude mortality rate is Randolph County, Georgia, with 21 deaths and a small population of only 6778 people. Several other small rural counties in Georgia similarly ranked among the highest mortality rates. This information is equally important for understanding the spread of COVID-19 across the US: while the total deaths in a county show where to allocate resources to address immediate impacts, the distribution of mortality rates shows that even less populous counties are affected by the virus and that protective measures should remain in place irrespective of population.

Top 10 leading causes of Death

And now to shift focus to the second data source. Visualize and analyze the leading causes of death by US county.

National Incidence of Death and Mortality rates.

National Mortality Rates for the top 15 Leading Causes of Death in the US:


The figures above reflect the nationwide impact of the leading causes of death. Diseases of the Heart and Malignant neoplasms both account for numbers of deaths and crude mortality rates that are much higher than those of the other leading causes of death.
Next, visualize how the numbers of deaths and mortality rates vary across the US by county for a few of the leading causes of death: Diseases of the Heart, Malignant neoplasms, Accidents and Influenza.

A closer look: Heart Disease

#Visualizing Heart Disease by County across the US
mapCause <- "Heart"
mapDat <- leadingCausesbyCounty_df %>%
  filter( Cause == mapCause ) %>%
  mutate( fips = as.character( fips ) ) %>%
  mutate( fips = ifelse( nchar( fips ) ==4, paste( '0', fips, sep = ""), fips ) )

totMapDat <- mapDat%>%
  select( fips, Deaths )
maxTotDat <- max( leadingCausesbyCounty_df$Deaths, na.rm = T )
p1 <- plot_usmap( regions = "counties", data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'Deaths by Heart Disease Across US Counties') +
  theme( legend.position = "right")
p3 <- plot_usmap( regions = "counties", include = .northeast_region, 
                  data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'NorthEast: Deaths by Heart Disease') +
  theme( legend.position = "right")

rateMapDat <- mapDat%>%
  select( fips, Crude.Rate )
maxRateDat <- max( leadingCausesbyCounty_df$Crude.Rate, na.rm = T )
p2 <- plot_usmap( regions = "counties", data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'Heart Disease Mortality Rate Across US Counties') +
  theme( legend.position = "right")
p4 <- plot_usmap( regions = "counties", include = .northeast_region, 
                  data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'NorthEast: Heart Disease Mortality Rate') +
  theme( legend.position = "right")
grid.arrange(p1, p2, nrow = 2)

A closer look: Cancer

#Visualizing Cancer by County across the US
mapCause <- "malNeoplasms"
mapDat <- leadingCausesbyCounty_df %>%
  filter( Cause == mapCause ) %>%
  mutate( fips = as.character( fips ) ) %>%
  mutate( fips = ifelse( nchar( fips ) ==4, paste( '0', fips, sep = ""), fips ) )

totMapDat <- mapDat%>%
  select( fips, Deaths )
maxTotDat <- max( leadingCausesbyCounty_df$Deaths, na.rm = T )
p1 <- plot_usmap( regions = "counties", data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'Deaths by Cancer Across US Counties') +
  theme( legend.position = "right")
p3 <- plot_usmap( regions = "counties", include = .west_region, 
                  data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'West: Deaths by Cancer') +
  theme( legend.position = "right")

rateMapDat <- mapDat%>%
  select( fips, Crude.Rate )
maxRateDat <- max( leadingCausesbyCounty_df$Crude.Rate, na.rm = T )
p2 <- plot_usmap( regions = "counties", data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'Cancer Mortality Rate Across US Counties') +
  theme( legend.position = "right")
p4 <- plot_usmap( regions = "counties", include = .west_region, 
                  data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'West: Cancer Mortality Rate') +
  theme( legend.position = "right")
grid.arrange(p1, p2, nrow = 2)


A closer look: The Flu

#Visualizing the flu by County across the US
mapCause <- "InfluenzaPneumonia"
mapDat <- leadingCausesbyCounty_df %>%
  filter( Cause == mapCause ) %>%
  mutate( fips = as.character( fips ) ) %>%
  mutate( fips = ifelse( nchar( fips ) ==4, paste( '0', fips, sep = ""), fips ) )

totMapDat <- mapDat%>%
  select( fips, Deaths )
maxTotDat <- max( leadingCausesbyCounty_df$Deaths, na.rm = T )
p1 <- plot_usmap( regions = "counties", data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'Deaths by Influenza and Pneumonia Across US Counties') +
  theme( legend.position = "right")
p3 <- plot_usmap( regions = "counties", include = .south_atlantic, 
                  data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'SouthEast: Deaths by Flu & Pneumonia') +
  theme( legend.position = "right")

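rateMapDat <- mapDat%>%
  select( fips, Crude.Rate ) #rebuild the rate table for this cause; otherwise p2 and p4 reuse the previous section's data
maxRateDat <- max( leadingCausesbyCounty_df$Crude.Rate, na.rm = T )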
p2 <- plot_usmap( regions = "counties", data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'Flu & Pneumonia Mortality Rate Across US Counties') +
  theme( legend.position = "right")
p4 <- plot_usmap( regions = "counties", include = .south_atlantic, 
                  data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'SouthEast: Flu &  Pneumonia Mortality Rate') +
  theme( legend.position = "right")
grid.arrange(p1, p2, nrow = 2)


A closer look: Accidents

#Visualizing Accidents by County across the US
mapCause <- "Accidents"
mapDat <- leadingCausesbyCounty_df %>%
  filter( Cause == mapCause ) %>%
  mutate( fips = as.character( fips ) ) %>%
  mutate( fips = ifelse( nchar( fips ) ==4, paste( '0', fips, sep = ""), fips ) )

totMapDat <- mapDat%>%
  select( fips, Deaths )
maxTotDat <- max( leadingCausesbyCounty_df$Deaths, na.rm = T )
p1 <- plot_usmap( regions = "counties", data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'Accidental Deaths Across US Counties') +
  theme( legend.position = "right")
p3 <- plot_usmap( regions = "counties", include = .north_central_region, 
                  data = totMapDat, values = "Deaths", color = "#0066FFFF") +
  scale_fill_gradientn(name='Deaths',colours = myColors ) +
  labs( title = 'NorthCentral: Accidental Deaths') +
  theme( legend.position = "right")

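rateMapDat <- mapDat%>%
  select( fips, Crude.Rate ) #rebuild the rate table for this cause; otherwise p2 and p4 reuse the previous section's data
maxRateDat <- max( leadingCausesbyCounty_df$Crude.Rate, na.rm = T )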
p2 <- plot_usmap( regions = "counties", data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'Accidental Death Mortality Rate Across US Counties') +
  theme( legend.position = "right")
p4 <- plot_usmap( regions = "counties", include = .north_central_region, 
                  data = rateMapDat, values = "Crude.Rate", color = "#0066FFFF") +
  scale_fill_gradientn(name='deaths/100k',colours = myColors) +
  labs( title = 'NorthCentral: Accidental Death Mortality Rate') +
  theme( legend.position = "right")
grid.arrange(p1, p2, nrow = 2)


Top 10 Leading Causes of Death by US County Summary

    The above figures visualize national and regional deaths and death rates for leading causes of death in the US. The representation of death totals by county reveals a trend similar to the one seen with COVID-19, and a predictable one: more densely populated regions have a higher incidence of deaths due to a given disease. Likewise, when population is taken into account by representing the data as a mortality rate, the spatial pattern changes. Established diseases such as Heart Disease and Cancer penetrate US counties regardless of whether the county is part of a major coastal US city or the rural American heartland. Although the effect is not quantified in this analysis, it is worth pointing out that the leading causes of death affect more counties, and in a relatively more even manner, than COVID-19 did as of May 8th 2020.

Comparison of COVID-19 deaths to leading causes of death in the US

    Below is a table published in a recent news article from Becker’s Hospital Review suggesting that COVID-19 is, as of May 1st 2020, the 3rd leading cause of death in the US. The following analysis expands on this finding by comparing county COVID-19 death statistics to known statistics for leading causes of death in US counties.

Disease           Annual Death Toll
Heart Disease     269,583
Cancer            252,500
COVID-19          88,217 to 293,381
Stroke            60,833
Alzheimer’s       50,417
Drug Overdoses    29,265
Suicide           19,583

    Next, visualize national death totals and mortality rates for COVID-19 and the leading causes of death in the US for comparison. It is important to point out that the death rates calculated for COVID-19 here are provisional and only reflect deaths as of 5/8/20, whereas the statistics for the national leading causes of death are derived from 2018 data and account for an entire year’s worth of deaths. It is also important to consider that COVID-19 is a novel disease and might not yet be near its peak infection or mortality rate in the US. The following visualizations should be read with these cautions in mind.

The data table above ranks COVID-19 as the 3rd leading cause based on the projected number of deaths. The next visualization shows where COVID-19 ranks with the most current death statistics:


The visualization above shows the impact that COVID-19 has had on deaths in the US as of 5/8/20. If deaths were to hold at the May 8th tally, COVID-19 would rank 8th among the leading causes of death nationally, above influenza and below diabetes. Unfortunately, it remains to be seen what the annual total for COVID-19 deaths will eventually be. Will the numbers surpass heart disease and cancer?

COVID-19 currently ranks 8th among leading causes of death in the US when compared here against national mortality rates. However, the mapped visualizations demonstrated that COVID-19 does not affect US counties evenly. The next series of visualizations identifies counties where COVID-19 is, in fact, currently ranked higher as a leading cause of death. How have county COVID-19 mortality rates progressed over time compared to national mortality rates for leading causes of death?


The figure above plots the monthly progression of the crude mortality rate (total deaths / population × 100k). There are a few thousand counties’ worth of data plotted on the same axes. The figure is crowded, but the point is clear: for many counties, the current mortality rate, with only several months’ worth of deaths to sum, already exceeds the national mortality rate for Accidents. For a handful of counties, the rate has even surpassed the annual national mortality rates for Cancer and Heart Disease.

Which counties have mortality rates greater than the national Heart Disease Rate?

##       County.Name State  X5.8.20
## 1 Randolph County    GA 309.8259
## 2    Bronx County    NY 266.8158
## 3    Early County    GA 255.1521
## 4   Queens County    NY 253.6540
## 5  Terrell County    GA 246.1611
## 6    Kings County    NY 230.5556
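The filter behind the table above can be sketched in base R. This is an illustration under stated assumptions, not the author's code: the county rates are copied from the ranked mortality table earlier in this report (top 8 only), and the threshold is the national crude rate for diseases of the heart from the WONDER summary table.

```r
# County rates copied from the ranked table earlier in this report (top 8 shown for brevity)
covidRates <- data.frame(
  County.Name = c("Randolph County", "Bronx County", "Early County", "Queens County",
                  "Terrell County", "Kings County", "St. John the Baptist Parish", "Essex County"),
  State       = c("GA", "NY", "GA", "NY", "GA", "NY", "LA", "NJ"),
  mortRate    = c(309.8259, 266.8158, 255.1521, 253.6540, 246.1611, 230.5556, 177.4167, 174.9742),
  stringsAsFactors = FALSE
)

# National crude rate for diseases of the heart (2018), from the WONDER summary table
heartRate <- 200.3

# Keep only the counties whose COVID-19 rate exceeds the national heart disease rate
aboveHeart <- subset(covidRates, mortRate > heartRate)
aboveHeart
```

Swapping heartRate for another national crude rate from the summary table (for example 51.1 for Accidents) gives the corresponding list for that cause.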

How do county COVID-19 Mortality rates compare to previous estimates for the county’s heart disease mortality rate?

Visualize the spatial relationship of counties where the current COVID-19 mortality rate already exceeds the 3rd leading cause of death: Accidents

Conclusions

  • From the COVID-19 death totals reported by counties: the vast majority of US counties report very low incidences of COVID-19 fatalities. However, for several counties, the death totals are much higher.
  • While the total deaths in a county show where to allocate resources to address immediate impacts, the distribution of mortality rates shows that even less populous counties are affected by the virus.
  • The representation of death totals by county for the leading causes of death reveals a trend similar to the one seen with COVID-19, and a predictable one: more densely populated regions have a higher incidence of deaths due to a given disease.
  • The leading causes of death affect more counties, and in a relatively more even manner, than COVID-19 did as of May 8th 2020.
  • For many counties, the current COVID-19 mortality rate, with only several months’ worth of deaths to sum, already exceeds the mortality rate for many of the leading causes of death in the US.