COVID-19 Inequities

library(tidyverse)
library(rms)
library(plotly)
library(kableExtra)
library(openintro)
library(icon)
library(knitr)
library(circlize)
library(sf)
library(ggpubr)
library(rmdformats)
knitrSet(lang='markdown', fig.path='png/', fig.align='center', w=6, h=4, cache=TRUE)
options(prType='html')

LAST UPDATED February 11, 2021

Introduction

These figures use data (updated twice weekly) from The COVID Tracking Project to present the state-level inequities between proportion of COVID-19 infections and deaths in Black, Latinx, and White individuals in each state as compared to the proportion of Black, Latinx, and White individuals living in each state (estimated from 2019 US Census Bureau Projections). In some figures, I also present median 2015 Area Deprivation Indices by state to align state-wide socioeconomic status with differential COVID-19 infections and deaths in Black, Latinx, and White individuals in each state. The Area Deprivation Index is a measure that accounts for income, education, employment, and housing quality to rank the socioeconomic status of each Census Block Group in the United States.

Please note: On most figures, you can hover over areas of interest or reveal the underlying data (at each time - or the most recent time with data available) or zoom in to view smaller areas of interest.

Limitations

These data have the following limitations (to name a few):

These data represent the distribution of data in time and space, not a causal relationship between any of the measures of interest.
States have incomplete data about the number of people with COVID-19 infections and deaths, especially when stratified race and ethnicity. This is even more relevant when examing data from early in the pandemic.
To call testing uptake slow and variable across states would be an understatement, so earlier dates may reflect heterogeneity in testing uptake instead of heterogeneity in COVID-19 infections and/or deaths.
It is not clear that every state collects race and ethnicity data in the same way, for example, some states may use self-report whereas others may rely on race and ethnicity determinations made by health care professionals.
The COVID Tracking Project and the US Census Bureau used different categories for race and ethnicity.
Many states report COVID-19 infections/deaths in people of unknown race/ethnicity. I have included these data as part of the denominator (total infections/deaths) in my calculations, even though a portion (perhaps even a majority) of these infections/deaths have actually occured in Black or Latinx individuals, so my difference and ratio estimates are underestimates.
COVID-19 infections and deaths in the US have all occurred in 2020, whereas my denominator (number of people living in each state) uses the most recent data available, projections from 2019. Furthermore, the Area Deprivation Index data are from 2015.
As time progresses on each plot, the legend changes to accommodate the most recent data. This can make values that were relatively high appear lower despite remaining constant because, for example, The COVID Tracking Project started tracking data from a new state on that date with higher values.
The Area Deprivation Index represents relative percentiles for each People. To make it comparable with state-level measures of COVID-19 infections and deaths, I took the median of all ADI levels within each state, which obscures the intra-state heterogeneity visible at the Census Block Group level.
Previous iterations of this document dichotomized people to White and Non-White in the primary analyses for convenience, and due to concerns about sparse data in more specific categories, which assumes that I categorized White and Non-White correctly from the slightly different definitions from The COVID Tracking Project and the US Census Bureau. However, on further investigation, the differences in data collection were so heterogenous across states that I do not feel those numbers are comparable. This is, in part, due to differences in how different states report infections/deaths among Latinx individuals. Some states report them as Latinx and include them as a race and others report them as Hispanic and report them as an ethinicity. Including them would therefore inflate the Non-White infections/deaths numerator in some states, making the measurements unreliable.

Disclaimers

Although I occasionally stray into interpretations when providing context, the primary purpose of this report is to present the data. I do not attempt to create models, make predictions, or give advice. If you have questions, suggestions, or notice an error (there are likely a few), please reach out (danny_sack)!

Any views and opinions expressed here strickly represent those of the author and are not necessarily those of Vanderbilt University Medical Center, the Vanderbilt Medical Scientist Training Program, or the Division of Epidemiology.

Disclaimer from The COVID Tracking Project: “Our data will always be an undercount. We can only track tests that states report, and not all states report all tests. More significantly, per-capita testing levels in the US remain low, which means that an unknown but probably very large number of people are sick, but aren’t being tested. But this is the data we can collect, and it provides the most detailed information available about the shape and relative severity of outbreaks in US states and territories.”

Disclaimer from the Area Deprivation Index website: “The ADI is limited insofar as it uses American Community Survey (ACS) Five Year Estimates in its construction. For example, the 2015 ADI uses the ACS data for 2015, which is a 5-year average of ACS data obtained from 2011-2015. All limitations of the source data will persist throughout the ADI. The choice of geographic units will also influence the ADI value. In the case of the ADI the Census Block Group is the geographic unit of construction, as the Census Block Group is considered the closest approximation to a”neighborhood“. All results are subject to the accuracy and errors contained within the American Community Survey data release.”

Resources

While many excellent resources exist with more information on and strategies to address health inequities and the structures that underlie them (such as racism), the following are a good a place to start:

JAMA viewpoint entitled COVID-19 and Health Equity—A New Kind of “Herd Immunity” (Williams & Cooper, 2020)
Dr. Uché Blackstock’s organization, Advancing Health Equity
Tennessean opinion piece entitled, COVID-19 amplifies social inequalities that have driven racial health disparities for decades (Furuyama, Kaur, Lee, Limb, 2020)
NEJM perspective: Stolen Breaths (Hardeman, Medina, Boyd, 2020)
JAMA viewpoint on Responding to the COVID-19 Pandemic: The Need for a Structurally Competent Health Care System (Metzl, Maybank, De Maio, 2020)
Annals of Internal Medicine article on Historical Insights on Coronavirus Disease 2019 (COVID-19), the 1918 Influenza Pandemic, and Racial Disparities: Illuminating a Path Forward (Krishnan, Ogunwole, Cooper, 2020)
The American Association for Public Health’s Health Equity Page
The Conversation article on recognizing the social justice aspects of health (Julian, Hardeman, Huerto, 2020)
The Vanderbilt University Medical Center Office of Health Equity, including their Anti-Racism Resources

# pull data from https://covidtracking.com/race/dashboard
df <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vS8SzaERcKJOD_EzrtCDK1dX1zkoMochlA9iHoHg_RSw3V8bkpfk1mpw4pfL5RdtSOyx_oScsUtyXyk/pub?gid=43720681&single=true&output=csv")

names(df)[1] <- "day"
names(df)[2] <- "state"

df$day <- as.Date(as.character(df$day), "%Y%m%d")

# get rid of random NAs at the bottom
df <- subset(df, !is.na(day))

# just infections
infxn <- df[, c(1:15)]

# clean up columns
inf <- data.frame(day = infxn$day, 
                  date = NA,
                  state = infxn$state,
                  tot_inf = infxn$Cases_Total, 
                  inf_white = infxn$Cases_White,
                  inf_black = infxn$Cases_Black,
                  inf_hisp = NA,
                  inf_asian = infxn$Cases_Asian,
                  inf_aian = infxn$Cases_AIAN,
                  inf_nhpi = infxn$Cases_NHPI,
                  inf_mulra = infxn$Cases_Multiracial,
                  inf_other = infxn$Cases_Other,
                  inf_unk = infxn$Cases_Unknown)
# hispanic, take min of two measurements
for(i in 1:length(inf$inf_hisp)){
  inf$inf_hisp[i] <- min(as.numeric(infxn$Cases_LatinX[i]), 
                     as.numeric(infxn$Cases_Ethnicity_Hispanic[i]), na.rm = TRUE)
  if(is.infinite(inf$inf_hisp[i])) {inf$inf_hisp[i] <- NA}
}

# order dates, don't really know why this works...
inf$date <- factor(inf$day, levels = format(sort(unique(as.Date(inf$day))), "%b %d, %Y"))
for(i in 1:nrow(inf)){
  inf$date[i] <- format(inf$day[i], "%b %d, %Y")
}

# label columns
label(inf$day) <- "Day"
label(inf$date) <- "Date"
label(inf$state) <- "State"

# now make numeric
for(i in c("tot_inf", "inf_white", "inf_black", "inf_hisp", "inf_asian", "inf_aian", 
           "inf_nhpi", "inf_mulra", "inf_other", "inf_unk")) {
  inf[, c(i)] <- as.numeric(inf[, c(i)])
}

label(inf$tot_inf) <- "Total Infections"
label(inf$inf_white) <- "Reported Infections in White People"
label(inf$inf_black) <- "Reported Infections in Black People"
label(inf$inf_hisp) <- "Reported Infections in Latinx People"
label(inf$inf_asian) <- "Reported Infections in Asian People"
label(inf$inf_aian) <- "Reported Infections in AIAN People"
label(inf$inf_nhpi) <- "Reported Infections in NHPI People"
label(inf$inf_mulra) <- "Reported Infections in Multiracial People"
label(inf$inf_other) <- "Reported Infections in People on Another Race"
label(inf$inf_unk) <- "Reported Infections in People with an Unknown Race"

# do the same thing for deaths
# just deaths
deaths <- df[, c(1:2, 16:26)]

# names
names(deaths) <- c("day", "state", "tot_d", "d_white", "d_black", "d_hisp",
                   "d_asian", "d_aian", "d_nhpi", "d_mulra", "d_other", "d_unk",
                   "d_hisp1")

# order dates, don't really know why this works...
deaths$date <- factor(deaths$day, levels = format(sort(unique(as.Date(deaths$day))), "%b %d, %Y"))
for(i in 1:nrow(deaths)){
  deaths$date[i] <- format(deaths$day[i], "%b %d, %Y")
}

# label columns
label(deaths$day) <- "Day"
label(deaths$date) <- "Date"
label(deaths$state) <- "State"

# now make numeric
for(i in c("tot_d", "d_white", "d_black", "d_hisp", "d_asian", 
           "d_aian", "d_nhpi", "d_mulra", "d_other", "d_unk",
                   "d_hisp1")) {
  deaths[, c(i)] <- as.numeric(deaths[, c(i)])
}
# hispanic, take min of two measurements
for(i in 1:length(deaths$d_hisp)){
  deaths$d_hisp[i] <- min(deaths$d_hisp[i], deaths$d_hisp1[i], na.rm = TRUE)
  if(is.infinite(deaths$d_hisp[i])) {deaths$d_hisp[i] <- NA}
}

deaths$d_hisp1 <- NULL

label(deaths$tot_d) <- "Total Deaths"
label(deaths$d_white) <- "Reported Deaths in White People"
label(deaths$d_black) <- "Reported Deaths in Black People"
label(deaths$d_hisp) <- "Reported Deaths in Latinx People"
label(deaths$d_asian) <- "Reported Deaths in Asian People"
label(deaths$d_aian) <- "Reported Deaths in AIAN People"
label(deaths$d_nhpi) <- "Reported Deaths in NHPI People"
label(deaths$d_mulra) <- "Reported Deaths in Multiracial People"
label(deaths$d_other) <- "Reported Deaths in People on Another Race"
label(deaths$d_unk) <- "Reported Deaths in People with an Unknown Race"

# merge inf and death
data <- merge(inf, deaths, by = c("day", "date", "state"))

# https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-detail.html
# Annual County Resident Population Estimates by Age, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2019 (CC-EST2019-ALLDATA), only for 2019, which is 12, the 7/1/2019 population estimate, and make some new categories
# Sex, Race, and Hispanic Origin
# Annual Estimates of the Resident Population by Sex, Race, and Hispanic Origin

all19 <- read_csv("https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/asrh/cc-est2019-alldata.csv") %>% filter(YEAR == 12, AGEGRP == 0) %>%
  mutate(NHWA = NHWA_MALE + NHWA_FEMALE, # non hispanic whites
         NHBA = NHBA_MALE + NHBA_FEMALE, # non hispanic blacks
         NHAA = NHAA_MALE + NHAA_FEMALE, # non hispanic asians
         NHIA = NHIA_MALE + NHIA_FEMALE, # non hispanic american indian
         NHNA = NHNA_MALE + NHNA_FEMALE, # non hispanic native hawaiian
         NHTOM = NHTOM_MALE + NHTOM_FEMALE, # non hispanic two or more
         H = H_MALE + H_FEMALE # hispanic
         )

# condense by couties
pop19 <- all19 %>%
  dplyr::group_by(STATE, STNAME) %>% # get rid of age categories
  summarise(TOT_POP = sum(TOT_POP),
            NHWA = sum(NHWA), # non hispanic whites
            NHBA = sum(NHBA), # non hispanic blacks
            NHAA = sum(NHAA), # non hispanic asians
            NHIA = sum(NHIA), # non hispanic american indian
            NHNA = sum(NHNA), # non hispanic native hawaiian
            NHTOM = sum(NHTOM), # non hispanic two or more
            H = sum(H)) %>% # hispanic
  mutate(NH = sum(NHWA, NHBA, NHAA, NHIA, NHNA, NHTOM),
         prop_w = NHWA / TOT_POP,
         prop_b = NHBA / TOT_POP,
         prop_a = NHAA / TOT_POP,
         prop_o = sum(NHIA, NHNA, NHTOM) / TOT_POP,
         prop_h = H / TOT_POP,
         prop_nw = sum(NHBA, NHAA, NHIA, NHNA, NHTOM, H) / TOT_POP,
         prop_nh = NH / TOT_POP)

# make a new dataset to merge
pop <- data.frame(state = state2abbr(pop19$STNAME),
                  tot_pop = pop19$TOT_POP,
                  tot_white = pop19$NHWA,
                  tot_black = pop19$NHBA,
                  tot_hisp = pop19$H,
                  tot_asian = pop19$NHAA,
                  tot_aian = pop19$NHIA,
                  tot_nhpi = pop19$NHNA,
                  tot_mulra = pop19$NHTOM)

# shape file of US states
# https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
# https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip

# read in shapefile
states <- st_read("cb_2018_us_state_500k/cb_2018_us_state_500k.shp")

# read in adi file
# from https://www.neighborhoodatlas.medicine.wisc.edu
adi <- read.table("adi/US_blockgroup_15.txt", header = TRUE, sep = ",")

# states with geo codes less than 10 only have 11 digits
adi$state <- ifelse(nchar(adi$fips) == 11, substr(adi$fips, 1, 1), substr(adi$fips, 1, 2))

# make adi numeric
adi$adi_natrank <- as.numeric(adi$adi_natrank)

# add adi to state name

states$adi <- NA
# median adi per state
for(i in 1:length(states$adi)){
  # median
  states$adi[i] <- median(adi$adi_natrank[adi$state == as.numeric(states$GEOID[i])], na.rm = TRUE)
  if(is.nan(states$adi[i])) {states$adi[i] <- NA}
}

# merge these population data with "data"

# remove rows with states not in pop (american samoa, guam, etc.)
for(i in 1:nrow(data)){
  if(data$state[i] %nin% pop$state) {
    data <- data[-c(i), ]
  }
}

# initialize columns
data$tot_pop <- data$tot_white <- data$tot_black <- 
  data$tot_hisp <- data$tot_asian <- data$tot_aian <- 
  data$tot_nhpi <- data$tot_mulra <- NA

for(i in 1:nrow(data)) {
  data$tot_pop[i] <- pop[pop$state == data$state[i], "tot_pop"]
  data$tot_white[i] <- pop[pop$state == data$state[i], "tot_white"]
  data$tot_black[i] <- pop[pop$state == data$state[i], "tot_black"]
  data$tot_hisp[i] <- pop[pop$state == data$state[i], "tot_hisp"]
  data$tot_asian[i] <- pop[pop$state == data$state[i], "tot_asian"]
  data$tot_aian[i] <- pop[pop$state == data$state[i], "tot_aian"]
  data$tot_nhpi[i] <- pop[pop$state == data$state[i], "tot_nhpi"]
  data$tot_mulra[i] <- pop[pop$state == data$state[i], "tot_mulra"]
}

# label columns
label(data$tot_pop) <- "2019 Estimated Total Population"
label(data$tot_white) <- "2019 Estimated White Population"
label(data$tot_black) <- "2019 Estimated Black Population"
label(data$tot_hisp) <- "2019 Estimated Hispanic Population"
label(data$tot_asian) <- "2019 Estimated Asian Population"
label(data$tot_aian) <- "2019 Estimated AIAN Population"
label(data$tot_nhpi) <- "2019 Estimated NHPI Population"
label(data$tot_mulra) <- "2019 Estimated Multiracial Population"

# proportion non-white in each state
for(i in 1:nrow(data)){
  data$prop_nw_pop[i] <- sum(data[i, c("tot_black", "tot_hisp", "tot_asian",
                                 "tot_aian", "tot_nhpi", "tot_mulra")]) / data$tot_pop[i]
}
label(data$prop_nw_pop) <- "2019 Estimated Proportion of Non-White Population"

# create column in data dataframe with proportion of infections and deaths in nonwhite, 
# white, black, and latinx people
for(i in 1:nrow(data)){
  # total infections
  # non-white folks
  data$non_white_inf[i] <- sum(data[i, c("inf_black", "inf_asian",
                                    "inf_nhpi", "inf_aian", "inf_mulra", "inf_other")], na.rm = TRUE)
  data$prop_nw_inf[i] <- sum(as.numeric(data[i, c("inf_black", "inf_asian",
                                              "inf_nhpi", "inf_aian", "inf_mulra", "inf_other")]), 
                             na.rm = TRUE) / sum(data[i, c("tot_inf")], na.rm = TRUE)
  # white, black, and latinx folks if more than 25 infections per group
  # white
  if(data$inf_white[i] > 0 & !is.na(data$inf_white[i])) {data$prop_w_inf[i] <- data$inf_white[i] / data$tot_inf[i]}
  else{data$prop_w_inf[i] <- NA}
  # black
  if(data$inf_black[i] > 0 & !is.na(data$inf_black[i])) {data$prop_b_inf[i] <- data$inf_black[i] / data$tot_inf[i]}
  else{data$prop_b_inf[i] <- NA}
  # latinx
  if(data$inf_hisp[i] > 0 & !is.na(data$inf_hisp[i])) {data$prop_l_inf[i] <- data$inf_hisp[i] / data$tot_inf[i]}
  else{data$prop_l_inf[i] <- NA}
  # total deaths in non-white folks
  data$non_white_d[i] <- sum(data[i, c("d_black", "d_asian",
                                    "d_nhpi", "d_aian", "d_mulra", "d_other")], na.rm = TRUE)
  data$prop_nw_d[i] <- sum(as.numeric(data[i, c("d_black", "d_asian",
                                              "d_nhpi", "d_aian", "d_mulra", "d_other")]), 
                        na.rm = TRUE) / sum(data[i, c("tot_d")], na.rm = TRUE)
  # white, black, and latinx folks if more than 25 deaths per group
  # white
  if(data$d_white[i] > 0 & !is.na(data$d_white[i])) {data$prop_w_d[i] <- data$d_white[i] / data$tot_d[i]}
  else{data$prop_w_d[i] <- NA}
  # black
  if(data$d_black[i] > 0 & !is.na(data$d_black[i])) {data$prop_b_d[i] <- data$d_black[i] / data$tot_d[i]}
  else{data$prop_b_d[i] <- NA}
  # latinx
  if(data$d_hisp[i] > 0 & !is.na(data$d_hisp[i])) {data$prop_l_d[i] <- data$d_hisp[i] / data$tot_d[i]}
  else{data$prop_l_d[i] <- NA}
}
label(data$non_white_inf) <- "Reported Infections in Non-White People"
label(data$prop_nw_inf) <- "Proportion of Infections in Non-White People"
label(data$non_white_d) <- "Reported Deaths in Non-White People"
label(data$prop_nw_d) <- "Proportion of Deaths in Non-White People"
label(data$prop_w_inf) <- "Proportion of Infections in White People"
label(data$prop_w_d) <- "Proportion of Deaths in White People"
label(data$prop_b_inf) <- "Proportion of Infections in Black People"
label(data$prop_b_d) <- "Proportion of Deaths in Black People"
label(data$prop_l_inf) <- "Proportion of Infections in Latinx People"
label(data$prop_l_d) <- "Proportion of Deaths in Latinx People"

# get rid of infinte and non-numeric values created by missing data (and inadvertent zeros)
for(i in 1:nrow(data)){
  # infections
  if(is.infinite(data$non_white_inf[i]) | is.nan(data$non_white_inf[i]) | data$non_white_inf[i] == 0) {
    data$non_white_inf[i] <- NA
  }
  if(is.infinite(data$prop_nw_inf[i]) | is.nan(data$prop_nw_inf[i]) | data$prop_nw_inf[i] == 0) {
    data$prop_nw_inf[i] <- NA
  }
  # deaths
  if(is.infinite(data$non_white_d[i]) | is.nan(data$non_white_d[i]) | data$non_white_d[i] == 0) {
    data$non_white_d[i] <- NA
  }
  if(is.infinite(data$prop_nw_d[i]) | is.nan(data$prop_nw_d[i]) | data$prop_nw_d[i] == 0) {
    data$prop_nw_d[i] <- NA
  }
}

# now calculate the proportion of infections/deaths in non-white versus the population proportion of non-whites
# infections
# non-white
data$inf_ratio_nw <- data$prop_nw_inf / data$prop_nw_pop
label(data$inf_ratio_nw) <- "Ratio of Non-White Infection Proportion<br>to Non-White Population Proportion"
# whites
data$inf_ratio_w <- data$prop_w_inf / (data$tot_white / data$tot_pop)
label(data$inf_ratio_w) <- "Ratio of Non-White Infection Proportion<br>to White Population Proportion"
# blacks
data$inf_ratio_b <- data$prop_b_inf / (data$tot_black / data$tot_pop)
label(data$inf_ratio_b) <- "Ratio of Non-White Infection Proportion<br>to Black Population Proportion"
# latinx
data$inf_ratio_l <- data$prop_l_inf / (data$tot_hisp / data$tot_pop)
label(data$inf_ratio_l) <- "Ratio of Non-White Infection Proportion<br>to Latinx Population Proportion"
# deaths
# non-whites
data$d_ratio_nw <- data$prop_nw_d / data$prop_nw_pop
label(data$d_ratio_nw) <- "Ratio of Non-White Death Proportion<br>to Non-White Population Proportion"
# whites
data$d_ratio_w <- data$prop_w_d / (data$tot_white / data$tot_pop)
label(data$d_ratio_w) <- "Ratio of Non-White Death Proportion<br>to White Population Proportion"
# blacks
data$d_ratio_b <- data$prop_b_d / (data$tot_black / data$tot_pop)
label(data$d_ratio_b) <- "Ratio of Non-White Death Proportion<br>to Black Population Proportion"
# latinx
data$d_ratio_l <- data$prop_l_d / (data$tot_hisp / data$tot_pop)
label(data$d_ratio_l) <- "Ratio of Non-White Infection Proportion<br>to Latinx Population Proportion"

# get rid of places that are not reporting infection by race
for(i in 1:length(data$inf_ratio_nw)){
  # infections
  if(data$inf_ratio_nw[i] == 0 & !is.na(data$inf_ratio_nw[i])) {
    data$inf_ratio_nw[i] <- NA
  }
  # deaths
  if(data$d_ratio_nw[i] == 0 & !is.na(data$d_ratio_nw[i])) {
    data$d_ratio_nw[i] <- NA
  }
}

# calculate differences in proportions
# infections
# non-whites
data$inf_diff_nw <- data$prop_nw_inf - data$prop_nw_pop
label(data$inf_diff_nw) <- "Difference between Non-White Infection Proportion<br>and Non-White Population Proportion"
# whites
data$inf_diff_w <- data$prop_w_inf - (data$tot_white / data$tot_pop)
label(data$inf_diff_w) <- "Difference of Non-White Infection Proportion<br>to White Population Proportion"
# blacks
data$inf_diff_b <- data$prop_b_inf - (data$tot_black / data$tot_pop)
label(data$inf_diff_b) <- "Difference of Non-White Infection Proportion<br>to Black Population Proportion"
# latinx
data$inf_diff_l <- data$prop_l_inf - (data$tot_hisp / data$tot_pop)
label(data$inf_diff_l) <- "Ratio of Non-White Infection Proportion<br>to Latinx Population Proportion"
# deaths
# non-whites
data$d_diff_nw <- data$prop_nw_d - data$prop_nw_pop
label(data$d_diff_nw) <- "Difference between Non-White Infection Proportion<br>and Non-White Population Proportion"
# whites
data$d_diff_w <- data$prop_w_d - (data$tot_white / data$tot_pop)
label(data$d_diff_w) <- "Difference of Non-White Death Proportion<br>to White Population Proportion"
# blacks
data$d_diff_b <- data$prop_b_d - (data$tot_black / data$tot_pop)
label(data$d_diff_b) <- "Difference of Non-White Death Proportion<br>to Black Population Proportion"
# latinx
data$d_diff_l <- data$prop_l_d - (data$tot_hisp / data$tot_pop)
label(data$d_diff_l) <- "Difference of Non-White Infection Proportion<br>to Latinx Population Proportion"

# get rid of places that are not reporting infection by race
for(i in 1:length(data$inf_diff_nw)){
  # infections
  if(is.na(data$prop_nw_inf[i])) {
    data$inf_diff_nw[i] <- NA
  }
  # deaths
  if(is.na(data$prop_nw_d[i])) {
    data$d_diff_nw[i] <- NA
  }
}

# calculate infections/deaths per 100000 people in each state
# infections
data$infpp <- (data$tot_inf / data$tot_pop) * 100000
label(data$infpp) <- "Infections per 100,000"
# infections
data$dpp <- (data$tot_d / data$tot_pop) * 100000
label(data$dpp) <- "Infections per 100,000"

# remove territories from states dataframe
states <- states[states$STUSPS %in% data$state, ]

# now add in adi to data
data$adi <- NA
for(i in 1:nrow(data)){
  for(j in 1:nrow(states)){
    if(data$state[i] == states$STUSPS[j]){
      data$adi[i] <- states$adi[j]
    }
  }
}
label(data$adi) <- "Area Deprivation Index, 2015"

# get most recent date deaths an infection dataframes
data_rec <- subset(data, day == max(unique(day)))

# add in infections and death ratio and difference and raw numbers and proportions
states$inf_tot <- NA
states$death_tot <- NA
states$inf_nw <- NA
states$deaths_nw <- NA
states$inf_w <- NA
states$deaths_w <- NA
states$inf_b <- NA
states$deaths_b <- NA
states$inf_l <- NA
states$deaths_l <- NA
states$inf_prop_nw <- NA
states$deaths_prop_nw <- NA
states$inf_prop_w <- NA
states$deaths_prop_w <- NA
states$inf_prop_b <- NA
states$deaths_prop_b <- NA
states$inf_prop_l <- NA
states$deaths_prop_l <- NA
states$inf_ratio_nw <- NA
states$death_ratio_nw <- NA
states$inf_diff_nw <- NA
states$death_diff_nw <- NA
states$inf_ratio_w <- NA
states$death_ratio_w <- NA
states$inf_diff_w <- NA
states$death_diff_w <- NA
states$inf_ratio_b <- NA
states$death_ratio_b <- NA
states$inf_diff_b <- NA
states$death_diff_b <- NA
states$inf_ratio_l <- NA
states$death_ratio_l <- NA
states$inf_diff_l <- NA
states$death_diff_l <- NA
for(i in 1:length(states$death_ratio_nw)){
  # raw numbers
  states$inf_tot[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_inf"]
  states$death_tot[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_d"]
  states$inf_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "non_white_inf"]
  states$death_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "non_white_d"]
  states$inf_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_white"]
  states$deaths_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_white"]
  states$inf_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_black"]
  states$deaths_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_black"]
  states$inf_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_hisp"]
  states$deaths_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_hisp"]
  states$inf_prop_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_nw_inf"]
  states$deaths_prop_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_nw_d"]
  states$inf_prop_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_w_inf"]
  states$deaths_prop_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_w_d"]
  states$inf_prop_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_b_inf"]
  states$deaths_prop_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_b_d"]
  states$inf_prop_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_l_inf"]
  states$deaths_prop_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_l_d"]
  # ratio
  states$inf_ratio_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_nw"]
  states$death_ratio_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_nw"]
  states$inf_ratio_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_w"]
  states$death_ratio_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_w"]
  states$inf_ratio_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_b"]
  states$death_ratio_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_b"]
  states$inf_ratio_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_l"]
  states$death_ratio_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_l"]
  # difference
  states$inf_diff_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_nw"]
  states$death_diff_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_nw"]
  states$inf_diff_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_w"]
  states$death_diff_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_w"]
  states$inf_diff_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_b"]
  states$death_diff_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_b"]
  states$inf_diff_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_l"]
  states$death_diff_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_l"]
}

# add proportion of non-white people in each state
states$prop_nw <- NA
states$prop_w <- NA
states$prop_b <- NA
states$prop_l <- NA
for(i in 1:length(states$prop_nw)){
  states$prop_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_nw_pop"]
  states$prop_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_white"] / 
    data_rec[data_rec$state == states$STUSPS[i], "tot_pop"]
  states$prop_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_black"] / 
    data_rec[data_rec$state == states$STUSPS[i], "tot_pop"]
  states$prop_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_hisp"] / 
    data_rec[data_rec$state == states$STUSPS[i], "tot_pop"]
}

# set up for ploting
# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)

Population by State, 2019

This first map provides context via estimates of how many people lived in each state in 2019 per the US Census Bureau. California (39,512,223), Texas (28,995,881), Florida (21,477,737), and New York (19,453,561) were the four states with the highest estimated population in 2019.

# chlorpleth map of total people
plot_geo(data_rec, locationmode = 'USA-states') %>%
  add_trace(z = ~tot_pop, locations = ~state, text = "", name = " ",
            hovertemplate = "%{location}: %{z}",
            colors = "YlOrRd") %>%
  colorbar(title = "Population") %>%
  layout(geo = g)

COVID-19 by State

These next maps show the raw number of infections and deaths that each state reported per The COVID Tracking Project. These data do not account for the population of each state. As of February 10, 2021, California has the most COVID-19 infections (3362981) and New York has the most deaths from COVID-19 (45567).

# chlorpleth map of total infections
plot_geo(data, locationmode = 'USA-states') %>%
  add_trace(z = ~tot_inf, locations = ~state, text = "", name = " ",
            hovertemplate = "%{location}: %{z}",
            colors = "YlOrRd",
            frame = ~date) %>%
  colorbar(title = "Infections") %>%
  layout(title = 'Total Infections', 
         geo = g)

# chlorpleth map of total deaths
plot_geo(data, locationmode = 'USA-states') %>%
  add_trace(z = ~tot_d, locations = ~state, text = "", name = " ",
            hovertemplate = "%{location}: %{z}",
            colors = "YlOrRd",
            frame = ~date) %>%
  colorbar(title = "Deaths") %>%
  layout(title = 'Total Deaths', 
         geo = g)

COVID-19 per 100,000 People

We can now move to examine the number of infections and deaths per 100,000 people (using 2019 US Census Bureau estimates) that each state reported to The COVID Tracking Project. As of February 10, 2021, North Dakota has the most COVID-19 infections (12887.93 per 100,000 people) and New Jersey has the most deaths from COVID-19 (250.5 per 100,000 people).

# chloropleth map of infections per 1000 people
plot_geo(data, locationmode = 'USA-states') %>%
  add_trace(z = ~infpp, locations = ~state, text = "", name = " ",
            hovertemplate = "%{location}: %{z:.2f}",
            colors = "YlOrRd",
            frame = ~date) %>%
  colorbar(title = "Infections") %>%
  layout(title = 'Infections per 100,000 People', 
         geo = g)

# chloropleth map of deaths per 1000 people
plot_geo(data, locationmode = 'USA-states') %>%
  add_trace(z = ~dpp, locations = ~state, text = "", name = " ",
            hovertemplate = "%{location}: %{z:.2f}",
            colors = "YlOrRd",
            frame = ~date) %>%
  colorbar(title = "Deaths") %>%
  layout(title = 'Deaths per 100,000 People', 
         geo = g)

COVID-19 by Race/Ethinicity

Introduction

Finally, we can examine the ratio of COVID-19 infections and deaths in Black, Latinx, and White people to the proportion of Black, Latinx, and White people in each state. I calculated each difference and ratio with the following formulas, which generalize to calculations for Latinx, and White people:

\(\begin{align} \text{Proportion of Black People }(P_{b}) &= \frac{\text{2019 Estimated Black Population}}{\text{2019 Estimated Total Population}} \\ \text{Proportion of Black Infections }(P_{b\_infs}) &= \frac{\text{Reported COVID-19 Infections in Black Individuals}}{\text{Total Reported COVID-19 Infections}} \\ \text{Proportion of Black Deaths }(P_{b\_deaths}) &= \frac{\text{Reported COVID-19 Deaths in Black Individuals}}{\text{Total Reported COVID-19 Deaths}} \\ \text{Infection Difference} &= P_{b\_infs} - P_{b} \\ \text{Death Difference} &= P_{b\_deaths} - P_{b} \\ \text{Infection Ratio} &= \frac{P_{b\_infs}}{P_{b}} \\ \text{Death Ratio} &= \frac{P_{b\_deaths}}{P_{b}} \end{align}\)

As of February 10, 2021, COVID-19 infections and deaths are more prevalent among Black individuals than expected given their population proportion in 12 and 19 states respectively. Understanding that the data are likely incomplete and prone to measurement error, and that deaths typically lag behind infections, the infection and death difference is greater than 0.10 in 0 and 1 states respectively and greater than 0.20 in 0 and 1 states respectively. Furthermore, the infection and death ratio is greater than 1.5 in 2 and 2 states respectively and greater than 2 in 2 and 0 states respectively.

Among Latinx individuals, COVID-19 infections and deaths are more prevalent than expected given their population proportion in 25 and 5 states respectively. Understanding that the data are likely incomplete and prone to measurement error, and that deaths typically lag behind infections, the infection and death difference is greater than 0.10 in 1 and 0 states respectively and greater than 0.20 in 0 and 0 states respectively. Furthermore, the infection and death ratio is greater than 1.5 in 7 and 0 states respectively and greater than 2 in 2 and 0 states respectively.

Finally, among White individuals, COVID-19 infections and deaths are more prevalent than expected given their population proportion in 0 and 19 states respectively. Understanding that the data are likely incomplete and prone to measurement error, and that deaths typically lag behind infections, the infection and death difference is greater than 0.10 in 0 and 1 states respectively and greater than 0.20 in 0 and 0 states respectively. Furthermore, the infection and death ratio is greater than 1.5 in 0 and 0 states respectively and greater than 2 in 0 and 0 states respectively.

Tabular Data from February 10, 2021

We can also examine a table of each state, with each variables of interest. We can also directly compare the proportion of Black,Latinx, and White people (\(\blacksquare\)) to the proportion of COVID-19 infections (\(\blacktriangle\)) and deaths () among Black, Latinx, and White people. Each marker’s color represents the median 2015 Area Deprivation Index in each state.

comp_tab <-  data_rec %>% filter(day == max(data_rec$day)) %>%
  mutate(State = abbr2state(state), 
         "2018 Estimated Total Population" = format(tot_pop, big.mark = ","),
         "Total Infections" = format(tot_inf, big.mark = ","),
         "Total Deaths" = format(tot_d, big.mark = ","),
         "Total Black (% Population)" = paste0(format(tot_black, big.mark = ","),
                                               " (", round(tot_black / tot_pop * 100, 2), ")"),
         "Infections in Black People (% Infections)"= paste0(ifelse(!is.na(inf_black), format(inf_black, big.mark = ","),
                                                                    "-"), " (", 
                                                             ifelse(!is.na(prop_b_inf), 
                                                                    round(prop_b_inf*100, 2), "-"), ")"),
         "Deaths in Black People (% Deaths)"= paste0(ifelse(!is.na(d_black), format(d_black, big.mark = ","),
                                                            "-"), " (", 
                                                     ifelse(!is.na(prop_b_d), round(prop_b_d*100, 2), "-"), ")"),
         "Total Latinx (% Population)" = paste0(format(tot_hisp, big.mark = ","),
                                                " (", round(tot_hisp / tot_pop * 100, 2), ")"),
         "Infections in Latinx People (% Infections)"= paste0(ifelse(!is.na(inf_hisp), format(inf_hisp, big.mark = ","),
                                                                     "-"), " (", 
                                                              ifelse(!is.na(prop_l_inf), 
                                                                            round(prop_l_inf*100, 2), "-"), ")"),
         "Deaths in Latinx People (% Deaths)"= paste0(ifelse(!is.na(d_hisp), format(d_hisp, big.mark = ","),
                                                             "-"), " (", 
                                                      ifelse(!is.na(prop_l_d), round(prop_l_d*100, 2), "-"), ")"),
         "Total White (% Population)" = paste0(format(tot_white, big.mark = ","),
                                               " (", round(tot_white / tot_pop * 100, 2), ")"),
         "Infections in White People (% Infections)" = paste0(ifelse(!is.na(inf_white), format(inf_white, big.mark = ","),
                                                                     "-"), " (", 
                                                              ifelse(!is.na(prop_w_inf), 
                                                                     round(prop_nw_inf*100, 2), "-"), ")"),
         "Deaths in White People (% Deaths)" = paste0(ifelse(!is.na(d_white), format(d_white, big.mark = ","),
                                                             "-"), " (", 
                                                      ifelse(!is.na(prop_w_d), round(prop_w_d*100, 2), "-"), ")"),
         "Infections in People of Unknown Race/Ethinicity (% Infections)" = paste0(ifelse(!is.na(inf_unk), 
                                                                                          format(inf_unk, big.mark = ","),
                                                                                          "-"), " (", 
                                                                                   ifelse(!is.na(inf_unk), 
                                                                                          round(inf_unk/tot_inf*100, 2),
                                                                                          "-"), ")"),
         "Deaths in People of Unknown Race/Ethinicity (% Deaths)" = paste0(ifelse(!is.na(d_unk), 
                                                                                  format(d_unk, big.mark = ","),
                                                                                  "-"), " (", 
                                                                                   ifelse(!is.na(d_unk), 
                                                                                          round(d_unk/tot_d*100, 2),
                                                                                          "-"), ")"),
         "2015 Area Deprivation Index" = round(adi, 2)) %>%
  select((ncol(data_rec) + 1):(ncol(data_rec) + 16))

# sort states alphabetically
comp_tab <- comp_tab[order(comp_tab$State),]
rownames(comp_tab) <- comp_tab$State
comp_tab$State <- NULL

comp_tab %>%
  kable(align = "c", format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
                full_width = F, font_size = 12) %>%
  footnote(paste0("Data from The COVID Tracking Project (", format(as.Date(data_rec$day[1]), "%B %d, %Y"), ")")) %>%
  scroll_box(height = "400px", fixed_thead = T)

	2018 Estimated Total Population	Total Infections	Total Deaths	Total Black (% Population)	Infections in Black People (% Infections)	Deaths in Black People (% Deaths)	Total Latinx (% Population)	Infections in Latinx People (% Infections)	Deaths in Latinx People (% Deaths)	Total White (% Population)	Infections in White People (% Infections)	Deaths in White People (% Deaths)	Infections in People of Unknown Race/Ethinicity (% Infections)	Deaths in People of Unknown Race/Ethinicity (% Deaths)	2015 Area Deprivation Index
Alabama	4,903,185	476,067	8,888	1,297,775 (26.47)	78,493 (16.49)	2,020 (22.73)	223,278 (4.55)	11,881 (2.5)	126 (1.42)	3,200,828 (65.28)	151,607 (24.72)	4,241 (47.72)	206,777 (43.43)	2,344 (26.37)	73
Alaska	731,545	56,097	280	24,190 (3.31)	1,444 (2.57)	9 (3.21)	53,212 (7.27)	3,090 (5.51)	12 (4.29)	440,078 (60.16)	17,353 (49.97)	115 (41.07)	10,711 (19.09)	3 (1.07)	30
Arizona	7,278,717	789,245	14,462	325,777 (4.48)	24,127 (3.06)	375 (2.59)	2,310,590 (31.74)	232,163 (29.42)	4,146 (28.67)	3,939,690 (54.13)	290,599 (14.83)	7,069 (48.88)	149,464 (18.94)	1,047 (7.24)	56
Arkansas	3,017,804	309,940	5,174	465,209 (15.42)	48,970 (15.8)	725 (14.01)	236,631 (7.84)	33,163 (10.7)	176 (3.4)	2,173,848 (72.03)	198,051 (23.71)	3,802 (73.48)	38,407 (12.39)	341 (6.59)	75
California	39,512,223	3,362,981	44,995	2,221,363 (5.62)	103,870 (3.09)	2,749 (6.11)	15,574,880 (39.42)	1,415,893 (42.1)	20,125 (44.73)	14,423,748 (36.5)	513,077 (19.15)	13,744 (30.55)	789,918 (23.49)	1,538 (3.42)	20
Colorado	5,758,736	407,961	5,770	233,523 (4.06)	11,790 (2.89)	188 (3.26)	1,256,904 (21.83)	113,454 (27.81)	1,159 (20.09)	3,896,103 (67.66)	165,714 (7.43)	3,688 (63.92)	98,482 (24.14)	366 (6.34)	34
Connecticut	3,565,287	265,496	7,326	368,834 (10.35)	18,189 (6.85)	881 (12.03)	600,955 (16.86)	39,250 (14.78)	726 (9.91)	2,350,123 (65.92)	77,745 (19)	5,197 (70.94)	98,063 (36.94)	116 (1.58)	27
Delaware	973,764	81,461	1,245	214,658 (22.04)	18,468 (22.67)	283 (22.73)	93,391 (9.59)	13,595 (16.69)	70 (5.62)	600,349 (61.65)	38,787 (30.67)	866 (69.56)	4,097 (5.03)	2 (0.16)	37
District of Columbia	705,749	38,348	965	313,290 (44.39)	18,546 (48.36)	717 (74.3)	79,477 (11.26)	8,020 (20.91)	115 (11.92)	264,400 (37.46)	9,864 (73.75)	102 (10.57)	202 (0.53)	19 (1.97)	13
Florida	21,477,737	1,798,280	28,691	3,335,263 (15.53)	218,927 (12.17)	4,351 (15.17)	5,663,860 (26.37)	557,320 (30.99)	6,550 (22.83)	11,436,685 (53.25)	605,295 (19.32)	15,323 (53.41)	288,172 (16.02)	1,227 (4.28)	59
Georgia	10,617,423	780,494	15,421	3,359,449 (31.64)	199,608 (25.57)	4,662 (30.23)	1,048,724 (9.88)	80,632 (10.33)	756 (4.9)	5,523,358 (52.02)	293,572 (32.13)	7,718 (50.05)	155,528 (19.93)	1,955 (12.68)	65
Hawaii	1,415,872	26,584	423	27,780 (1.96)	538 (2.02)	5 (1.18)	150,864 (10.66)	(-)	(-)	306,622 (21.66)	3,422 (55.94)	36 (8.51)	8,290 (31.18)	18 (4.26)	11
Idaho	1,787,065	166,553	1,791	13,584 (0.76)	972 (0.58)	9 (0.5)	229,490 (12.84)	16,907 (10.15)	177 (9.88)	1,458,277 (81.6)	95,660 (10.4)	1,680 (93.8)	53,570 (32.16)	15 (0.84)	55
Illinois	12,671,821	1,152,995	21,869	1,783,443 (14.07)	105,375 (9.14)	3,513 (16.06)	2,219,882 (17.52)	211,176 (18.32)	3,131 (14.32)	7,702,651 (60.79)	438,120 (14.94)	11,550 (52.81)	331,414 (28.74)	2,810 (12.85)	50
Indiana	6,732,219	643,305	12,001	645,375 (9.59)	42,798 (6.65)	889 (7.41)	489,353 (7.27)	36,948 (5.74)	248 (2.07)	5,278,982 (78.41)	433,344 (18.96)	8,531 (71.09)	87,967 (13.67)	1,028 (8.57)	66
Iowa	3,155,070	326,575	5,174	122,388 (3.88)	9,797 (3)	121 (2.34)	198,550 (6.29)	19,595 (6)	130 (2.51)	2,682,696 (85.03)	192,679 (5.25)	4,804 (92.85)	116,742 (35.75)	110 (2.13)	61
Kansas	2,913,314	284,894	4,303	167,325 (5.74)	13,777 (4.84)	212 (4.93)	356,073 (12.22)	37,932 (13.31)	281 (6.53)	2,196,863 (75.41)	200,542 (11.39)	3,325 (77.27)	51,889 (18.21)	456 (10.6)	65
Kentucky	4,467,673	383,032	4,175	367,899 (8.23)	24,919 (6.51)	323 (7.74)	174,706 (3.91)	16,231 (4.24)	66 (1.58)	3,759,463 (84.15)	249,944 (11.51)	3,387 (81.13)	88,983 (23.23)	333 (7.98)	71
Louisiana	4,648,794	414,687	9,212	1,500,991 (32.29)	130,870 (31.56)	3,567 (38.72)	246,972 (5.31)	(-)	217 (2.36)	2,715,282 (58.41)	208,107 (43.93)	5,504 (59.75)	24,402 (5.88)	16 (0.17)	66
Maine	1,344,212	41,883	641	21,554 (1.6)	2,155 (5.15)	10 (1.56)	23,700 (1.76)	642 (1.53)	6 (0.94)	1,249,597 (92.96)	32,812 (9.07)	541 (84.4)	5,273 (12.59)	78 (12.17)	51
Maryland	6,045,680	366,666	7,446	1,810,267 (29.94)	104,313 (28.45)	2,574 (34.57)	643,822 (10.65)	59,637 (16.26)	694 (9.32)	3,025,781 (50.05)	125,916 (35.35)	3,796 (50.98)	51,485 (14.04)	53 (0.71)	26
Massachusetts	6,892,503	548,946	15,207	505,757 (7.34)	33,334 (6.07)	1,025 (6.74)	854,907 (12.4)	118,232 (21.54)	1,181 (7.77)	4,897,800 (71.06)	203,625 (16.8)	11,290 (74.24)	134,884 (24.57)	69 (0.45)	22
Michigan	9,986,857	624,970	15,939	1,374,886 (13.77)	70,580 (11.29)	3,543 (22.23)	528,205 (5.29)	32,312 (5.17)	443 (2.78)	7,464,662 (74.74)	357,461 (22.8)	10,668 (66.93)	125,021 (20)	833 (5.23)	67
Minnesota	5,639,632	469,905	6,319	382,621 (6.78)	37,655 (8.01)	307 (4.86)	315,130 (5.59)	42,229 (8.99)	191 (3.02)	4,460,149 (79.09)	311,910 (16.42)	5,255 (83.16)	38,604 (8.22)	116 (1.84)	43
Mississippi	2,976,149	283,753	6,367	1,113,643 (37.42)	93,790 (33.05)	2,508 (39.39)	100,110 (3.36)	6,223 (2.19)	55 (0.86)	1,678,232 (56.39)	133,880 (37.71)	3,425 (53.79)	36,645 (12.91)	160 (2.51)	79
Missouri	6,137,428	468,189	7,161	711,702 (11.6)	40,704 (8.69)	927 (12.95)	268,708 (4.38)	16,548 (3.53)	150 (2.09)	4,857,512 (79.15)	241,154 (23.28)	4,455 (62.21)	118,024 (25.21)	496 (6.93)	65
Montana	1,068,778	96,595	1,320	5,766 (0.54)	433 (0.45)	1 (0.08)	43,289 (4.05)	2,190 (2.27)	23 (1.74)	917,711 (85.87)	59,999 (15.99)	808 (61.21)	21,148 (21.89)	193 (14.62)	50
Nebraska	1,934,408	194,632	1,986	94,830 (4.9)	5,056 (2.6)	56 (2.82)	219,645 (11.35)	23,021 (11.83)	149 (7.5)	1,513,172 (78.22)	102,303 (6.72)	1,275 (64.2)	79,259 (40.72)	577 (29.05)	60
Nevada	3,080,156	285,795	4,582	286,466 (9.3)	21,456 (7.51)	416 (9.08)	900,600 (29.24)	100,700 (35.24)	1,072 (23.4)	1,483,933 (48.18)	104,419 (28.23)	2,510 (54.78)	3 (0)	4 (0.09)	52
New Hampshire	1,359,711	69,247	1,116	20,054 (1.47)	921 (1.33)	12 (1.08)	54,589 (4.01)	3,854 (5.57)	23 (2.06)	1,220,437 (89.76)	34,048 (5.98)	851 (76.25)	27,202 (39.28)	211 (18.91)	33
New Jersey	8,882,190	732,674	22,250	1,148,564 (12.93)	61,809 (8.44)	3,513 (15.79)	1,856,844 (20.91)	144,469 (19.72)	4,063 (18.26)	4,850,995 (54.61)	235,798 (25)	11,932 (53.63)	169,204 (23.09)	880 (3.96)	22
New Mexico	2,096,829	178,790	3,461	39,583 (1.89)	2,181 (1.22)	47 (1.36)	1,032,942 (49.26)	79,329 (44.37)	1,329 (38.4)	772,630 (36.85)	29,267 (18.47)	999 (28.86)	37,173 (20.79)	72 (2.08)	61
New York	19,453,561	1,494,187	45,567	2,813,773 (14.46)	(-)	9,039 (19.84)	3,751,058 (19.28)	(-)	9,328 (20.47)	10,755,420 (55.29)	(-)	15,843 (34.77)	1,494,187 (100)	5,972 (13.11)	27
North Carolina	10,488,084	805,898	10,181	2,240,609 (21.36)	134,169 (16.65)	2,378 (23.36)	1,025,830 (9.78)	111,968 (13.89)	722 (7.09)	6,567,102 (62.61)	405,738 (30.74)	6,327 (62.15)	152,410 (18.91)	627 (6.16)	61
North Dakota	762,062	98,214	1,459	24,866 (3.26)	2,812 (2.86)	(-)	31,532 (4.14)	(-)	(-)	637,513 (83.66)	53,795 (9.85)	(-)	34,744 (35.38)	1,459 (100)	55
Ohio	11,689,100	928,631	11,856	1,481,352 (12.67)	96,827 (10.43)	1,390 (11.72)	470,462 (4.02)	28,929 (3.12)	225 (1.9)	9,169,092 (78.44)	543,623 (19.89)	9,221 (77.77)	200,266 (21.57)	783 (6.6)	65
Oklahoma	3,956,971	407,724	3,900	293,458 (7.42)	21,974 (5.39)	235 (6.03)	438,110 (11.07)	40,288 (9.88)	189 (4.85)	2,572,388 (65.01)	232,435 (20.41)	2,824 (72.41)	92,075 (22.58)	343 (8.79)	71
Oregon	4,217,737	148,475	2,044	82,610 (1.96)	3,586 (2.42)	44 (2.15)	566,847 (13.44)	39,360 (26.51)	194 (9.49)	3,165,484 (75.05)	71,463 (33.47)	1,360 (66.54)	27,312 (18.4)	321 (15.7)	38
Pennsylvania	12,801,989	880,291	22,745	1,394,630 (10.89)	74,669 (8.48)	2,706 (11.9)	1,000,150 (7.81)	54,796 (6.22)	906 (3.98)	9,693,578 (75.72)	442,754 (11.19)	18,444 (81.09)	339,008 (38.51)	506 (2.22)	54
Rhode Island	1,059,361	119,893	2,259	64,966 (6.13)	7,408 (6.18)	100 (4.43)	172,644 (16.3)	29,599 (24.69)	188 (8.32)	755,931 (71.36)	55,408 (10.96)	1,658 (73.4)	21,748 (18.14)	275 (12.17)	38
South Carolina	5,148,714	473,140	7,742	1,360,342 (26.42)	97,238 (20.55)	2,150 (27.77)	307,118 (5.96)	26,590 (5.62)	193 (2.49)	3,277,542 (63.66)	211,672 (32.72)	4,185 (54.06)	106,675 (22.55)	1,036 (13.38)	68
South Dakota	884,659	109,580	1,815	19,447 (2.2)	2,443 (2.23)	7 (0.39)	37,351 (4.22)	4,002 (3.65)	22 (1.21)	721,053 (81.51)	81,419 (17.07)	1,434 (79.01)	5,452 (4.98)	74 (4.08)	62
Tennessee	6,829,174	750,409	10,731	1,141,790 (16.72)	95,163 (12.68)	1,840 (17.15)	391,382 (5.73)	46,623 (6.21)	289 (2.69)	5,019,540 (73.5)	455,259 (21.73)	8,039 (74.91)	132,084 (17.6)	423 (3.94)	66
Texas	28,995,881	2,517,453	39,386	3,501,610 (12.08)	12,018 (0.48)	3,629 (9.21)	11,525,578 (39.75)	27,129 (1.08)	17,824 (45.25)	11,950,774 (41.22)	23,170 (0.53)	15,719 (39.91)	2,453,829 (97.47)	1,279 (3.25)	65
Utah	3,205,958	357,339	1,765	38,056 (1.19)	4,768 (1.33)	17 (0.96)	462,051 (14.41)	74,997 (20.99)	264 (14.96)	2,493,759 (77.79)	226,664 (14.14)	1,187 (67.25)	33,871 (9.48)	104 (5.89)	38
Vermont	623,989	13,122	187	8,152 (1.31)	550 (4.19)	1 (0.53)	12,719 (2.04)	268 (2.04)	0 (-)	577,539 (92.56)	10,400 (10.98)	166 (88.77)	1,281 (9.76)	12 (6.42)	39
Virginia	8,535,519	537,319	6,932	1,632,226 (19.12)	89,047 (16.57)	1,599 (23.07)	834,422 (9.78)	72,244 (13.45)	489 (7.05)	5,227,904 (61.25)	212,179 (23.75)	4,241 (61.18)	125,285 (23.32)	203 (2.93)	36
Washington	7,614,893	324,706	4,603	304,224 (4)	10,455 (3.22)	145 (3.15)	991,721 (13.02)	60,102 (18.51)	549 (11.93)	5,140,589 (67.51)	89,803 (11.41)	3,260 (70.82)	137,746 (42.42)	12 (0.26)	31
West Virginia	1,792,147	125,951	2,175	62,775 (3.5)	3,214 (2.55)	36 (1.66)	31,162 (1.74)	(-)	(-)	1,648,512 (91.99)	81,807 (4.42)	1,540 (70.8)	38,580 (30.63)	573 (26.34)	74
Wisconsin	5,822,434	603,255	6,705	372,273 (6.39)	39,056 (6.47)	443 (6.61)	413,208 (7.1)	61,328 (10.17)	376 (5.61)	4,709,065 (80.88)	460,947 (15.55)	5,751 (85.77)	48,499 (8.04)	198 (2.95)	51
Wyoming	578,759	52,874	647	6,520 (1.13)	408 (0.77)	1 (0.15)	58,609 (10.13)	4,056 (7.67)	50 (7.73)	484,380 (83.69)	27,084 (8.5)	572 (88.41)	20,726 (39.2)	8 (1.24)	40
Note:
Data from The COVID Tracking Project (February 10, 2021)

comp_fig <- data.frame(state = fct_rev(as.factor(states$NAME)),
                       adi = states$adi,
                       t_inf = states$inf_tot,
                       t_death = states$death_tot,
                       w_inf = states$inf_w,
                       w_death = states$deaths_w,
                       b_inf = states$inf_b,
                       b_death = states$deaths_b,
                       l_inf = states$inf_l,
                       l_death = states$deaths_l,
                       prop_nw_inf = states$inf_prop_nw,
                       prop_nw_death = states$deaths_prop_nw,
                       prop_nw = states$prop_nw,
                       prop_w_inf = states$inf_prop_w,
                       prop_w_death = states$deaths_prop_w,
                       prop_w = states$prop_w,
                       prop_b_inf = states$inf_prop_b,
                       prop_b_death = states$deaths_prop_b,
                       prop_b = states$prop_b,
                       prop_l_inf = states$inf_prop_l,
                       prop_l_death = states$deaths_prop_l,
                       prop_l = states$prop_l)

# make it long form
comp_fig_plot <- comp_fig %>% gather("Type", "value", prop_nw_inf:prop_l)

# add "group" column
comp_fig_plot$group <- NA
for(i in 1:nrow(comp_fig_plot)){
  if(str_detect(comp_fig_plot$Type[i], "_nw")){
    comp_fig_plot$group[i] <- "Non-White"
  }
  if(str_detect(comp_fig_plot$Type[i], "_w")){
    comp_fig_plot$group[i] <- "White"
  }
  if(str_detect(comp_fig_plot$Type[i], "_b")){
    comp_fig_plot$group[i] <- "Black"
  }
  if(str_detect(comp_fig_plot$Type[i], "_l")){
    comp_fig_plot$group[i] <- "Latinx"
  }
}

# add rounded colum for label
comp_fig_plot$proportion <- round(comp_fig_plot$value, 2)

# make nicer labels
comp_fig_plot$type <- NA
comp_fig_plot[comp_fig_plot$Type == "prop_nw_inf" |
                comp_fig_plot$Type == "prop_w_inf" |
                comp_fig_plot$Type == "prop_b_inf" |
                comp_fig_plot$Type == "prop_l_inf", ]$type <- "Proportion of Infections"
comp_fig_plot[comp_fig_plot$Type == "prop_nw_death" |
                comp_fig_plot$Type == "prop_w_death" |
                comp_fig_plot$Type == "prop_b_death" |
                comp_fig_plot$Type == "prop_l_death", ]$type <- "Proportion of Deaths"
comp_fig_plot[comp_fig_plot$Type == "prop_nw" |
                comp_fig_plot$Type == "prop_w" |
                comp_fig_plot$Type == "prop_b" |
                comp_fig_plot$Type == "prop_l", ]$type <- "Proportion of People"

# remove non-white
comp_fig_plot<- comp_fig_plot[comp_fig_plot$group != "Non-White", ]

# add column with cases
# little function to do it
sort_cases_comp <- function(case_type, i) {
  if(str_detect(case_type, "inf") & str_detect(case_type, "_w_")) {return(comp_fig_plot$w_inf[i])}
  if(str_detect(case_type, "inf") & str_detect(case_type, "_b")) {return(comp_fig_plot$b_inf[i])}
  if(str_detect(case_type, "inf") & str_detect(case_type, "_l")) {return(comp_fig_plot$l_inf[i])}
  if(str_detect(case_type, "_d") & str_detect(case_type, "_w")) {return(comp_fig_plot$w_death[i])}
  if(str_detect(case_type, "_d") & str_detect(case_type, "_b")) {return(comp_fig_plot$b_death[i])}
  if(str_detect(case_type, "_d") & str_detect(case_type, "_l")) {return(comp_fig_plot$l_death[i])}
  if(str_detect(case_type, "prop")){return(paste0("total infections: ", comp_fig_plot$t_inf[i], "\n", 
                                               "total deaths: ", comp_fig_plot$t_death[i]))}
}

# loop through (this is a slow way to do it...)
for(i in 1:nrow(comp_fig_plot)){
  comp_fig_plot$cases[i] <- sort_cases_comp(comp_fig_plot$Type[i], i)
}

d1 <- ggplot(comp_fig_plot) + 
  geom_point(aes(x = state, y = value, shape = type, label = proportion, color = adi, 
                   text = paste0(ifelse(str_detect(type, "Inf"), "infections: ", 
                                        ifelse(str_detect(type, "Dea"), "deaths:", "")), cases))) + 
  facet_wrap(~group) +
  coord_flip() + 
  ggtitle(paste0("Infection and Death Proportions by Race/Ethnicity (", format(as.Date(data_rec$day[1]), "%B %d, %Y"), ")")) + 
  ylab("") + 
  xlab("") + 
  ylim(0, 1) +
  scale_color_gradient(low = "#ffffcc", high = "#bd0026") +
  theme(plot.title = element_text(hjust = 0.5), 
        panel.background = element_rect(fill = NA), 
        panel.grid = element_line(color = "grey75"), 
        legend.key = element_rect(fill = NA), 
        legend.position = "top", 
        legend.title = element_blank()) +
  labs(color = "Area Deprivation Index, 2015")

ggplotly(d1, width = 1100, height = 1100, tooltip = c("x", "shape", "label", "colour", "text"))

Maps from February 10, 2021

We can also compare the most recent The COVID Tracking Project data with the 2015 Area Deprivation Index and the 2019 estimated proportion of Black, Latinx, and White people in each state. The left column shows the most recent infection difference data (February 10, 2021) for Black, Latinx, and White people as compared to their population proportion in each state (right column). The more red a state is, the greater the difference of Black, Latinx, and White COVID-19 infections to Black, Latinx, and White people in a state.

# exclude AK and HI for map
states_cont <- states[states$STUSPS %nin% c("AK", "HI"), ]

# create common theme
theme_map <- theme(plot.title = element_text(hjust = 0.5), legend.position = "right", 
        legend.key = element_rect(fill = NA), legend.text = element_text(size = 6.5), 
        legend.key.size = unit(0.5,"line"), axis.text.x = element_blank(), 
        axis.text.y = element_blank(), axis.ticks.x = element_blank(), 
        axis.ticks.y = element_blank(), axis.title.x = element_blank(), 
        axis.title.y = element_blank(), panel.background = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank())

p1 <- ggplot() + geom_sf(data = states_cont, aes(fill = inf_diff_b)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) + 
  ggtitle(paste("COVID-19 Infection Difference*\nin Black Individuals\n",
                format(data_rec$day[1], "%B %d, %Y"))) +
  labs(fill = "Difference") +
  theme_map
p2 <- ggplot() + geom_sf(data = states_cont, aes(fill = prop_b)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  ggtitle("Estimated Proportion of Black Individuals, 2019") +
  labs(fill = "Proportion") +
  theme_map
p3 <- ggplot() + geom_sf(data = states_cont, aes(fill = inf_diff_l)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) + 
  ggtitle(paste("COVID-19 Infection Difference*\nin Latinx Individuals\n",
                format(data_rec$day[1], "%B %d, %Y"))) +
  labs(fill = "Difference") +
  theme_map
p4 <- ggplot() + geom_sf(data = states_cont, aes(fill = prop_l)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  ggtitle("Estimated Proportion of Latinx Individuals, 2019") +
  labs(fill = "Proportion") +
  theme_map
p5 <- ggplot() + geom_sf(data = states_cont, aes(fill = inf_diff_w)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) + 
  ggtitle(paste("COVID-19 Infection Difference*\nin White Individuals\n",
                format(data_rec$day[1], "%B %d, %Y"))) +
  labs(fill = "Difference") +
  theme_map
p6 <- ggplot() + geom_sf(data = states_cont, aes(fill = prop_w)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  ggtitle("Estimated Proportion of White Individuals, 2019") +
  labs(fill = "Proportion") +
  theme_map

gridExtra::grid.arrange(p1, p2, p3, p4, p5, p6, 
                        nrow = 3,
                        bottom = text_grob("*Difference in Proportion of COVID-19 Infections\nto Population Proportion",
                                           just = "left"))

Trends in Differences and Ratios

Finally COVID-19 infection and death differences and ratios are plotted by state over time. The colors represent the median 2015 Area Deprivation Index in each state and have the same scale as above, where darker colors indicate a higher percentile of area deprivation. As an example, in Tennessee, where I am based, as of February 10, 2021 there were 46623 infections among Latinx people, which represented 0.06 of all infections in Tennessee. However, Latinx people only make up 0.06 of all people living in Tennessee, so the infection difference is 0.06 - 0.06, which equals 0, and the infection ratio is 0.06/0.06, which equals 1.08.

The infection and death differences/ratios are flat or increasing over time in the majority of states, although any interpretation should be mindful that data may not be reported consistently from one date to the next. Furthermore, given very striking inequities in some states, other states may appear to be doing relatively well by comparison, however, when you hover over the raw values you will see there is still obvious evidence of health inequities in most states, which suggests deeply entrenched forces and/or structures, including systemic racism, underlying these inequities.

To compare COVID-19 health inequities across states on one plot, please visit https://dannysack.shinyapps.io/covid_inequities/ (please note, it can take a minute for the server to wake up).

# make long form verions of infection proportion differences
data_inf_diff <- data %>% gather("type", "value", inf_diff_nw:inf_diff_l) %>%
  select(date, state, inf_white, inf_black, inf_hisp, d_white, d_black, d_hisp, type, value, adi)

# add "group" column
data_inf_diff$group <- NA
for(i in 1:nrow(data_inf_diff)){
  if(str_detect(data_inf_diff$type[i], "_nw")){
    data_inf_diff$group[i] <- "Non-White"
  }
  if(str_detect(data_inf_diff$type[i], "_w")){
    data_inf_diff$group[i] <- "White"
  }
  if(str_detect(data_inf_diff$type[i], "_b")){
    data_inf_diff$group[i] <- "Black"
  }
  if(str_detect(data_inf_diff$type[i], "_l")){
    data_inf_diff$group[i] <- "Latinx"
  }
}

# add rounded colum for label
data_inf_diff$difference <- round(data_inf_diff$value, 2)

# make nicer labels
data_inf_diff$measure <- NA
data_inf_diff[data_inf_diff$type == "inf_diff_nw" |
                data_inf_diff$type == "inf_diff_w" |
                data_inf_diff$type == "inf_diff_b" |
                data_inf_diff$type == "inf_diff_l", ]$measure <- "Infection Difference"

# remove non-white rows
data_inf_diff <- data_inf_diff[data_inf_diff$group != "Non-White", ]

# now get the correct value into a new column, "cases"
for(i in 1:nrow(data_inf_diff)) {
  if(str_detect(data_inf_diff$type[i], "inf") & str_detect(data_inf_diff$type[i], "_w")) {
    data_inf_diff$cases[i] <- data_inf_diff$inf_white[i]}
  if(str_detect(data_inf_diff$type[i], "inf") & str_detect(data_inf_diff$type[i], "_b")) {
    data_inf_diff$cases[i] <- data_inf_diff$inf_black[i]}
  if(str_detect(data_inf_diff$type[i], "inf") & str_detect(data_inf_diff$type[i], "_l")) {
    data_inf_diff$cases[i] <- data_inf_diff$inf_hisp[i]}
}

f1 <- ggplot(data_inf_diff) + 
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_point(aes(x = date, y = value, color = adi, label = difference, 
                                             name = measure, text = paste0("infections: ", cases))) +
  facet_grid(state ~ group, switch = "y") + 
  ggtitle("Differences in COVID-19 Infections") +
  xlab("Date (hover for date)") +
  ylab("Difference") +
  scale_color_gradient(low = "#ffffcc", high = "#bd0026") +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        legend.direction = "horizontal") +
  labs(color = "Area Deprivation Index, 2015") 

# make long form verions of death proportion differences
data_d_diff <- data %>% gather("type", "value", d_diff_nw:d_diff_l) %>%
  select(date, state, inf_white, inf_black, inf_hisp, d_white, d_black, d_hisp, type, value, adi)

# add "group" column
data_d_diff$group <- NA
for(i in 1:nrow(data_d_diff)){
  if(str_detect(data_d_diff$type[i], "_nw")){
    data_d_diff$group[i] <- "Non-White"
  }
  if(str_detect(data_d_diff$type[i], "_w")){
    data_d_diff$group[i] <- "White"
  }
  if(str_detect(data_d_diff$type[i], "_b")){
    data_d_diff$group[i] <- "Black"
  }
  if(str_detect(data_d_diff$type[i], "_l")){
    data_d_diff$group[i] <- "Latinx"
  }
}

# add rounded colum for label
data_d_diff$difference <- round(data_d_diff$value, 2)

# make nicer labels
data_d_diff$measure <- NA
data_d_diff[data_d_diff$type == "d_diff_nw" |
                data_d_diff$type == "d_diff_w" |
                data_d_diff$type == "d_diff_b" |
                data_d_diff$type == "d_diff_l", ]$measure <- "Death Difference"

# remove non-white rows
data_d_diff <- data_d_diff[data_d_diff$group != "Non-White", ]

# now get the correct value into a new column, "cases"
for(i in 1:nrow(data_d_diff)) {
  if(str_detect(data_d_diff$type[i], "d_") & str_detect(data_d_diff$type[i], "_w")) {
    data_d_diff$cases[i] <- data_d_diff$d_white[i]}
  if(str_detect(data_d_diff$type[i], "d_") & str_detect(data_d_diff$type[i], "_b")) {
    data_d_diff$cases[i] <- data_d_diff$d_black[i]}
  if(str_detect(data_d_diff$type[i], "d_") & str_detect(data_d_diff$type[i], "_l")) {
    data_d_diff$cases[i] <- data_d_diff$d_hisp[i]}
}

f2 <- ggplot(data_d_diff) + 
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_point(aes(x = date, y = value, color = adi, label = difference, 
                                             name = measure, text = paste0("deaths: ", cases))) +
  facet_grid(state ~ group, switch = "y") + 
  ggtitle("Differences in COVID-19 Deaths") +
  xlab("Date (hover for date)") +
  ylab("Difference") +
  scale_color_gradient(low = "#ffffcc", high = "#bd0026") +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        legend.direction = "horizontal") +
  labs(color = "Area Deprivation Index, 2015") 

# make long form verions of infection proportion differences
data_inf_ratio <- data %>% gather("type", "value", inf_ratio_nw:inf_ratio_l) %>%
  select(date, state, inf_white, inf_black, inf_hisp, d_white, d_black, d_hisp, type, value, adi)

# add "group" column
data_inf_ratio$group <- NA
for(i in 1:nrow(data_inf_ratio)){
  if(str_detect(data_inf_ratio$type[i], "_nw")){
    data_inf_ratio$group[i] <- "Non-White"
  }
  if(str_detect(data_inf_ratio$type[i], "_w")){
    data_inf_ratio$group[i] <- "White"
  }
  if(str_detect(data_inf_ratio$type[i], "_b")){
    data_inf_ratio$group[i] <- "Black"
  }
  if(str_detect(data_inf_ratio$type[i], "_l")){
    data_inf_ratio$group[i] <- "Latinx"
  }
}

# add rounded colum for label
data_inf_ratio$ratio <- round(data_inf_ratio$value, 2)

# make nicer labels
data_inf_ratio$measure <- NA
data_inf_ratio[data_inf_ratio$type == "inf_ratio_nw" |
                data_inf_ratio$type == "inf_ratio_w" |
                data_inf_ratio$type == "inf_ratio_b" |
                data_inf_ratio$type == "inf_ratio_l", ]$measure <- "Infection Ratio"

# remove non-white rows
data_inf_ratio <- data_inf_ratio[data_inf_ratio$group != "Non-White", ]

# now get the correct value into a new column, "cases"
for(i in 1:nrow(data_inf_ratio)) {
  if(str_detect(data_inf_ratio$type[i], "inf") & str_detect(data_inf_ratio$type[i], "_w")) {
    data_inf_ratio$cases[i] <- data_inf_ratio$inf_white[i]}
  if(str_detect(data_inf_ratio$type[i], "inf") & str_detect(data_inf_ratio$type[i], "_b")) {
    data_inf_ratio$cases[i] <- data_inf_ratio$inf_black[i]}
  if(str_detect(data_inf_ratio$type[i], "inf") & str_detect(data_inf_ratio$type[i], "_l")) {
    data_inf_ratio$cases[i] <- data_inf_ratio$inf_hisp[i]}
}

f3 <- ggplot(data_inf_ratio) + 
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_point(aes(x = date, y = value, color = adi, label = ratio, 
                                             name = measure, text = paste0("infections: ", cases))) +
  facet_grid(state ~ group, switch = "y") + 
  ggtitle("Ratios in COVID-19 Infections") +
  xlab("Date (hover for date)") +
  ylab("Ratio") +
  scale_color_gradient(low = "#ffffcc", high = "#bd0026") +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        legend.direction = "horizontal") +
  labs(color = "Area Deprivation Index, 2015")

# make long form verions of death proportion differences
data_d_ratio <- data %>% gather("type", "value", d_ratio_nw:d_ratio_l) %>%
  select(date, state, inf_white, inf_black, inf_hisp, d_white, d_black, d_hisp, type, value, adi)

# add "group" column
data_d_ratio$group <- NA
for(i in 1:nrow(data_d_ratio)){
  if(str_detect(data_d_ratio$type[i], "_nw")){
    data_d_ratio$group[i] <- "Non-White"
  }
  if(str_detect(data_d_ratio$type[i], "_w")){
    data_d_ratio$group[i] <- "White"
  }
  if(str_detect(data_d_ratio$type[i], "_b")){
    data_d_ratio$group[i] <- "Black"
  }
  if(str_detect(data_d_ratio$type[i], "_l")){
    data_d_ratio$group[i] <- "Latinx"
  }
}

# add rounded colum for label
data_d_ratio$ratio <- round(data_d_ratio$value, 2)

# make nicer labels
data_d_ratio$measure <- NA
data_d_ratio[data_d_ratio$type == "d_ratio_nw" |
                data_d_ratio$type == "d_ratio_w" |
                data_d_ratio$type == "d_ratio_b" |
                data_d_ratio$type == "d_ratio_l", ]$measure <- "Death Ratio"

# remove non-white rows
data_d_ratio <- data_d_ratio[data_d_ratio$group != "Non-White", ]

for(i in 1:nrow(data_d_ratio)) {
  if(str_detect(data_d_ratio$type[i], "d_") & str_detect(data_d_ratio$type[i], "_w")) {
    data_d_ratio$cases[i] <- data_d_ratio$d_white[i]}
  if(str_detect(data_d_ratio$type[i], "d_") & str_detect(data_d_ratio$type[i], "_b")) {
    data_d_ratio$cases[i] <- data_d_ratio$d_black[i]}
  if(str_detect(data_d_ratio$type[i], "d_") & str_detect(data_d_ratio$type[i], "_l")) {
    data_d_ratio$cases[i] <- data_d_ratio$d_hisp[i]}
}

f4 <- ggplot(data_d_ratio) + 
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_point(aes(x = date, y = value, color = adi, label = ratio, 
                                             name = measure, text = paste0("deaths: ", cases))) +
  facet_grid(state ~ group, switch = "y") + 
  ggtitle("Ratios in COVID-19 Deaths") +
  xlab("Date (hover for date)") +
  ylab("Ratio") +
  scale_color_gradient(low = "#ffffcc", high = "#bd0026") +
  theme_bw() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        plot.title = element_text(hjust = 0.5),
        legend.position = "bottom",
        legend.direction = "horizontal") +
  labs(color = "Area Deprivation Index, 2015") 

ggplotly(f1, width = 800, height = 4000, tooltip = c("name", "x", "label", "colour", "text")) %>% hide_colorbar()

ggplotly(f2, width = 800, height = 4000, tooltip = c("name", "x", "label", "colour", "text")) %>% hide_colorbar()

ggplotly(f3, width = 800, height = 4000, tooltip = c("name", "x", "label", "colour", "text")) %>% hide_colorbar()

ggplotly(f4, width = 800, height = 4000, tooltip = c("name", "x", "label", "colour", "text")) %>% hide_colorbar()

Session Info

sessionInfo()

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rmdformats_1.0.0    ggpubr_0.4.0        sf_0.9-6           
 [4] circlize_0.4.11     knitr_1.30          icon_0.1.0         
 [7] openintro_2.0.0     usdata_0.1.0        cherryblossom_0.1.0
[10] airports_0.1.0      kableExtra_1.3.1    plotly_4.9.2.1     
[13] rms_6.0-1           SparseM_1.78        Hmisc_4.4-1        
[16] Formula_1.2-4       survival_3.2-7      lattice_0.20-41    
[19] forcats_0.5.0       stringr_1.4.0       dplyr_1.0.2        
[22] purrr_0.3.4         readr_1.4.0         tidyr_1.1.2        
[25] tibble_3.0.4        ggplot2_3.3.2       tidyverse_1.3.0    

loaded via a namespace (and not attached):
  [1] TH.data_1.0-10      colorspace_2.0-0    ggsignif_0.6.0     
  [4] rio_0.5.16          ellipsis_0.3.1      class_7.3-17       
  [7] htmlTable_2.1.0     GlobalOptions_0.1.2 base64enc_0.1-3    
 [10] fs_1.5.0            rstudioapi_0.13     farver_2.0.3       
 [13] MatrixModels_0.4-1  fansi_0.4.1         mvtnorm_1.1-1      
 [16] lubridate_1.7.9.2   xml2_1.3.2          codetools_0.2-18   
 [19] splines_4.0.3       jsonlite_1.7.2      broom_0.7.2        
 [22] cluster_2.1.0       dbplyr_2.0.0        png_0.1-7          
 [25] compiler_4.0.3      httr_1.4.2          backports_1.2.0    
 [28] assertthat_0.2.1    Matrix_1.2-18       lazyeval_0.2.2     
 [31] cli_2.2.0           htmltools_0.5.0     quantreg_5.75      
 [34] tools_4.0.3         gtable_0.3.0        glue_1.4.2         
 [37] Rcpp_1.0.5          carData_3.0-4       cellranger_1.1.0   
 [40] vctrs_0.3.5         nlme_3.1-150        conquer_1.0.2      
 [43] crosstalk_1.1.0.1   xfun_0.19           openxlsx_4.2.3     
 [46] rvest_0.3.6         lifecycle_0.2.0     rstatix_0.6.0      
 [49] polspline_1.1.19    MASS_7.3-53         zoo_1.8-8          
 [52] scales_1.1.1        hms_0.5.3           sandwich_3.0-0     
 [55] RColorBrewer_1.1-2  curl_4.3            yaml_2.2.1         
 [58] gridExtra_2.3       rpart_4.1-15        latticeExtra_0.6-29
 [61] stringi_1.5.3       highr_0.8           e1071_1.7-4        
 [64] checkmate_2.0.0     zip_2.1.1           shape_1.4.5        
 [67] rlang_0.4.10        pkgconfig_2.0.3     matrixStats_0.57.0 
 [70] evaluate_0.14       labeling_0.4.2      htmlwidgets_1.5.2  
 [73] tidyselect_1.1.0    magrittr_2.0.1      bookdown_0.21      
 [76] R6_2.5.0            generics_0.1.0      multcomp_1.4-15    
 [79] DBI_1.1.0           pillar_1.4.7        haven_2.3.1        
 [82] foreign_0.8-80      withr_2.3.0         units_0.6-7        
 [85] abind_1.4-5         nnet_7.3-14         car_3.0-10         
 [88] modelr_0.1.8        crayon_1.3.4        KernSmooth_2.23-18 
 [91] rmarkdown_2.5       jpeg_0.1-8.1        grid_4.0.3         
 [94] readxl_1.3.1        data.table_1.13.2   reprex_0.3.0       
 [97] digest_0.6.27       classInt_0.4-3      webshot_0.5.2      
[100] munsell_0.5.0       viridisLite_0.3.0