COVID-19 Inequities

LAST UPDATED February 11, 2021

Introduction

These figures use data (updated twice weekly) from The COVID Tracking Project to present the state-level inequities between proportion of COVID-19 infections and deaths in Black, Latinx, and White individuals in each state as compared to the proportion of Black, Latinx, and White individuals living in each state (estimated from 2019 US Census Bureau Projections). In some figures, I also present median 2015 Area Deprivation Indices by state to align state-wide socioeconomic status with differential COVID-19 infections and deaths in Black, Latinx, and White individuals in each state. The Area Deprivation Index is a measure that accounts for income, education, employment, and housing quality to rank the socioeconomic status of each Census Block Group in the United States.

Please note: On most figures, you can hover over areas of interest or reveal the underlying data (at each time - or the most recent time with data available) or zoom in to view smaller areas of interest.

Limitations

These data have the following limitations (to name a few):

  • These data represent the distribution of data in time and space, not a causal relationship between any of the measures of interest.
  • States have incomplete data about the number of people with COVID-19 infections and deaths, especially when stratified race and ethnicity. This is even more relevant when examing data from early in the pandemic.
  • To call testing uptake slow and variable across states would be an understatement, so earlier dates may reflect heterogeneity in testing uptake instead of heterogeneity in COVID-19 infections and/or deaths.
  • It is not clear that every state collects race and ethnicity data in the same way, for example, some states may use self-report whereas others may rely on race and ethnicity determinations made by health care professionals.
  • The COVID Tracking Project and the US Census Bureau used different categories for race and ethnicity.
  • Many states report COVID-19 infections/deaths in people of unknown race/ethnicity. I have included these data as part of the denominator (total infections/deaths) in my calculations, even though a portion (perhaps even a majority) of these infections/deaths have actually occured in Black or Latinx individuals, so my difference and ratio estimates are underestimates.
  • COVID-19 infections and deaths in the US have all occurred in 2020, whereas my denominator (number of people living in each state) uses the most recent data available, projections from 2019. Furthermore, the Area Deprivation Index data are from 2015.
  • As time progresses on each plot, the legend changes to accommodate the most recent data. This can make values that were relatively high appear lower despite remaining constant because, for example, The COVID Tracking Project started tracking data from a new state on that date with higher values.
  • The Area Deprivation Index represents relative percentiles for each People. To make it comparable with state-level measures of COVID-19 infections and deaths, I took the median of all ADI levels within each state, which obscures the intra-state heterogeneity visible at the Census Block Group level.
  • Previous iterations of this document dichotomized people to White and Non-White in the primary analyses for convenience, and due to concerns about sparse data in more specific categories, which assumes that I categorized White and Non-White correctly from the slightly different definitions from The COVID Tracking Project and the US Census Bureau. However, on further investigation, the differences in data collection were so heterogenous across states that I do not feel those numbers are comparable. This is, in part, due to differences in how different states report infections/deaths among Latinx individuals. Some states report them as Latinx and include them as a race and others report them as Hispanic and report them as an ethinicity. Including them would therefore inflate the Non-White infections/deaths numerator in some states, making the measurements unreliable.

Disclaimers

Although I occasionally stray into interpretations when providing context, the primary purpose of this report is to present the data. I do not attempt to create models, make predictions, or give advice. If you have questions, suggestions, or notice an error (there are likely a few), please reach out (danny_sack)!

Any views and opinions expressed here strickly represent those of the author and are not necessarily those of Vanderbilt University Medical Center, the Vanderbilt Medical Scientist Training Program, or the Division of Epidemiology.

Disclaimer from The COVID Tracking Project: “Our data will always be an undercount. We can only track tests that states report, and not all states report all tests. More significantly, per-capita testing levels in the US remain low, which means that an unknown but probably very large number of people are sick, but aren’t being tested. But this is the data we can collect, and it provides the most detailed information available about the shape and relative severity of outbreaks in US states and territories.”

Disclaimer from the Area Deprivation Index website: “The ADI is limited insofar as it uses American Community Survey (ACS) Five Year Estimates in its construction. For example, the 2015 ADI uses the ACS data for 2015, which is a 5-year average of ACS data obtained from 2011-2015. All limitations of the source data will persist throughout the ADI. The choice of geographic units will also influence the ADI value. In the case of the ADI the Census Block Group is the geographic unit of construction, as the Census Block Group is considered the closest approximation to a”neighborhood“. All results are subject to the accuracy and errors contained within the American Community Survey data release.”

Resources

While many excellent resources exist with more information on and strategies to address health inequities and the structures that underlie them (such as racism), the following are a good a place to start:

# pull data from https://covidtracking.com/race/dashboard
df <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vS8SzaERcKJOD_EzrtCDK1dX1zkoMochlA9iHoHg_RSw3V8bkpfk1mpw4pfL5RdtSOyx_oScsUtyXyk/pub?gid=43720681&single=true&output=csv")

names(df)[1] <- "day"
names(df)[2] <- "state"

df$day <- as.Date(as.character(df$day), "%Y%m%d")

# get rid of random NAs at the bottom
df <- subset(df, !is.na(day))

# just infections
infxn <- df[, c(1:15)]

# clean up columns
inf <- data.frame(day = infxn$day, 
                  date = NA,
                  state = infxn$state,
                  tot_inf = infxn$Cases_Total, 
                  inf_white = infxn$Cases_White,
                  inf_black = infxn$Cases_Black,
                  inf_hisp = NA,
                  inf_asian = infxn$Cases_Asian,
                  inf_aian = infxn$Cases_AIAN,
                  inf_nhpi = infxn$Cases_NHPI,
                  inf_mulra = infxn$Cases_Multiracial,
                  inf_other = infxn$Cases_Other,
                  inf_unk = infxn$Cases_Unknown)
# hispanic, take min of two measurements
for(i in 1:length(inf$inf_hisp)){
  inf$inf_hisp[i] <- min(as.numeric(infxn$Cases_LatinX[i]), 
                     as.numeric(infxn$Cases_Ethnicity_Hispanic[i]), na.rm = TRUE)
  if(is.infinite(inf$inf_hisp[i])) {inf$inf_hisp[i] <- NA}
}

# order dates, don't really know why this works...
inf$date <- factor(inf$day, levels = format(sort(unique(as.Date(inf$day))), "%b %d, %Y"))
for(i in 1:nrow(inf)){
  inf$date[i] <- format(inf$day[i], "%b %d, %Y")
}

# label columns
label(inf$day) <- "Day"
label(inf$date) <- "Date"
label(inf$state) <- "State"

# now make numeric
for(i in c("tot_inf", "inf_white", "inf_black", "inf_hisp", "inf_asian", "inf_aian", 
           "inf_nhpi", "inf_mulra", "inf_other", "inf_unk")) {
  inf[, c(i)] <- as.numeric(inf[, c(i)])
}

label(inf$tot_inf) <- "Total Infections"
label(inf$inf_white) <- "Reported Infections in White People"
label(inf$inf_black) <- "Reported Infections in Black People"
label(inf$inf_hisp) <- "Reported Infections in Latinx People"
label(inf$inf_asian) <- "Reported Infections in Asian People"
label(inf$inf_aian) <- "Reported Infections in AIAN People"
label(inf$inf_nhpi) <- "Reported Infections in NHPI People"
label(inf$inf_mulra) <- "Reported Infections in Multiracial People"
label(inf$inf_other) <- "Reported Infections in People on Another Race"
label(inf$inf_unk) <- "Reported Infections in People with an Unknown Race"

# do the same thing for deaths
# just deaths
deaths <- df[, c(1:2, 16:26)]

# names
names(deaths) <- c("day", "state", "tot_d", "d_white", "d_black", "d_hisp",
                   "d_asian", "d_aian", "d_nhpi", "d_mulra", "d_other", "d_unk",
                   "d_hisp1")

# order dates, don't really know why this works...
deaths$date <- factor(deaths$day, levels = format(sort(unique(as.Date(deaths$day))), "%b %d, %Y"))
for(i in 1:nrow(deaths)){
  deaths$date[i] <- format(deaths$day[i], "%b %d, %Y")
}

# label columns
label(deaths$day) <- "Day"
label(deaths$date) <- "Date"
label(deaths$state) <- "State"

# now make numeric
for(i in c("tot_d", "d_white", "d_black", "d_hisp", "d_asian", 
           "d_aian", "d_nhpi", "d_mulra", "d_other", "d_unk",
                   "d_hisp1")) {
  deaths[, c(i)] <- as.numeric(deaths[, c(i)])
}
# hispanic, take min of two measurements
for(i in 1:length(deaths$d_hisp)){
  deaths$d_hisp[i] <- min(deaths$d_hisp[i], deaths$d_hisp1[i], na.rm = TRUE)
  if(is.infinite(deaths$d_hisp[i])) {deaths$d_hisp[i] <- NA}
}

deaths$d_hisp1 <- NULL

label(deaths$tot_d) <- "Total Deaths"
label(deaths$d_white) <- "Reported Deaths in White People"
label(deaths$d_black) <- "Reported Deaths in Black People"
label(deaths$d_hisp) <- "Reported Deaths in Latinx People"
label(deaths$d_asian) <- "Reported Deaths in Asian People"
label(deaths$d_aian) <- "Reported Deaths in AIAN People"
label(deaths$d_nhpi) <- "Reported Deaths in NHPI People"
label(deaths$d_mulra) <- "Reported Deaths in Multiracial People"
label(deaths$d_other) <- "Reported Deaths in People on Another Race"
label(deaths$d_unk) <- "Reported Deaths in People with an Unknown Race"

# merge inf and death
data <- merge(inf, deaths, by = c("day", "date", "state"))

# https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-detail.html
# Annual County Resident Population Estimates by Age, Sex, Race, and Hispanic Origin: April 1, 2010 to July 1, 2019 (CC-EST2019-ALLDATA), only for 2019, which is 12, the 7/1/2019 population estimate, and make some new categories
# Sex, Race, and Hispanic Origin
# Annual Estimates of the Resident Population by Sex, Race, and Hispanic Origin

all19 <- read_csv("https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/asrh/cc-est2019-alldata.csv") %>% filter(YEAR == 12, AGEGRP == 0) %>%
  mutate(NHWA = NHWA_MALE + NHWA_FEMALE, # non hispanic whites
         NHBA = NHBA_MALE + NHBA_FEMALE, # non hispanic blacks
         NHAA = NHAA_MALE + NHAA_FEMALE, # non hispanic asians
         NHIA = NHIA_MALE + NHIA_FEMALE, # non hispanic american indian
         NHNA = NHNA_MALE + NHNA_FEMALE, # non hispanic native hawaiian
         NHTOM = NHTOM_MALE + NHTOM_FEMALE, # non hispanic two or more
         H = H_MALE + H_FEMALE # hispanic
         ) 
# make a new dataset to merge
pop <- data.frame(state = state2abbr(pop19$STNAME),
                  tot_pop = pop19$TOT_POP,
                  tot_white = pop19$NHWA,
                  tot_black = pop19$NHBA,
                  tot_hisp = pop19$H,
                  tot_asian = pop19$NHAA,
                  tot_aian = pop19$NHIA,
                  tot_nhpi = pop19$NHNA,
                  tot_mulra = pop19$NHTOM)

# shape file of US states
# https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
# https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_500k.zip

# read in shapefile
states <- st_read("cb_2018_us_state_500k/cb_2018_us_state_500k.shp")

# read in adi file
# from https://www.neighborhoodatlas.medicine.wisc.edu
adi <- read.table("adi/US_blockgroup_15.txt", header = TRUE, sep = ",")

# states with geo codes less than 10 only have 11 digits
adi$state <- ifelse(nchar(adi$fips) == 11, substr(adi$fips, 1, 1), substr(adi$fips, 1, 2))

# make adi numeric
adi$adi_natrank <- as.numeric(adi$adi_natrank)

# add adi to state name

states$adi <- NA
# median adi per state
for(i in 1:length(states$adi)){
  # median
  states$adi[i] <- median(adi$adi_natrank[adi$state == as.numeric(states$GEOID[i])], na.rm = TRUE)
  if(is.nan(states$adi[i])) {states$adi[i] <- NA}
}

# merge these population data with "data"

# remove rows with states not in pop (american samoa, guam, etc.)
for(i in 1:nrow(data)){
  if(data$state[i] %nin% pop$state) {
    data <- data[-c(i), ]
  }
}

# initialize columns
data$tot_pop <- data$tot_white <- data$tot_black <- 
  data$tot_hisp <- data$tot_asian <- data$tot_aian <- 
  data$tot_nhpi <- data$tot_mulra <- NA

for(i in 1:nrow(data)) {
  data$tot_pop[i] <- pop[pop$state == data$state[i], "tot_pop"]
  data$tot_white[i] <- pop[pop$state == data$state[i], "tot_white"]
  data$tot_black[i] <- pop[pop$state == data$state[i], "tot_black"]
  data$tot_hisp[i] <- pop[pop$state == data$state[i], "tot_hisp"]
  data$tot_asian[i] <- pop[pop$state == data$state[i], "tot_asian"]
  data$tot_aian[i] <- pop[pop$state == data$state[i], "tot_aian"]
  data$tot_nhpi[i] <- pop[pop$state == data$state[i], "tot_nhpi"]
  data$tot_mulra[i] <- pop[pop$state == data$state[i], "tot_mulra"]
}

# label columns
label(data$tot_pop) <- "2019 Estimated Total Population"
label(data$tot_white) <- "2019 Estimated White Population"
label(data$tot_black) <- "2019 Estimated Black Population"
label(data$tot_hisp) <- "2019 Estimated Hispanic Population"
label(data$tot_asian) <- "2019 Estimated Asian Population"
label(data$tot_aian) <- "2019 Estimated AIAN Population"
label(data$tot_nhpi) <- "2019 Estimated NHPI Population"
label(data$tot_mulra) <- "2019 Estimated Multiracial Population"

# proportion non-white in each state
for(i in 1:nrow(data)){
  data$prop_nw_pop[i] <- sum(data[i, c("tot_black", "tot_hisp", "tot_asian",
                                 "tot_aian", "tot_nhpi", "tot_mulra")]) / data$tot_pop[i]
}
label(data$prop_nw_pop) <- "2019 Estimated Proportion of Non-White Population"

# create column in data dataframe with proportion of infections and deaths in nonwhite, 
# white, black, and latinx people
for(i in 1:nrow(data)){
  # total infections
  # non-white folks
  data$non_white_inf[i] <- sum(data[i, c("inf_black", "inf_asian",
                                    "inf_nhpi", "inf_aian", "inf_mulra", "inf_other")], na.rm = TRUE)
  data$prop_nw_inf[i] <- sum(as.numeric(data[i, c("inf_black", "inf_asian",
                                              "inf_nhpi", "inf_aian", "inf_mulra", "inf_other")]), 
                             na.rm = TRUE) / sum(data[i, c("tot_inf")], na.rm = TRUE)
  # white, black, and latinx folks if more than 25 infections per group
  # white
  if(data$inf_white[i] > 0 & !is.na(data$inf_white[i])) {data$prop_w_inf[i] <- data$inf_white[i] / data$tot_inf[i]}
  else{data$prop_w_inf[i] <- NA}
  # black
  if(data$inf_black[i] > 0 & !is.na(data$inf_black[i])) {data$prop_b_inf[i] <- data$inf_black[i] / data$tot_inf[i]}
  else{data$prop_b_inf[i] <- NA}
  # latinx
  if(data$inf_hisp[i] > 0 & !is.na(data$inf_hisp[i])) {data$prop_l_inf[i] <- data$inf_hisp[i] / data$tot_inf[i]}
  else{data$prop_l_inf[i] <- NA}
  # total deaths in non-white folks
  data$non_white_d[i] <- sum(data[i, c("d_black", "d_asian",
                                    "d_nhpi", "d_aian", "d_mulra", "d_other")], na.rm = TRUE)
  data$prop_nw_d[i] <- sum(as.numeric(data[i, c("d_black", "d_asian",
                                              "d_nhpi", "d_aian", "d_mulra", "d_other")]), 
                        na.rm = TRUE) / sum(data[i, c("tot_d")], na.rm = TRUE)
  # white, black, and latinx folks if more than 25 deaths per group
  # white
  if(data$d_white[i] > 0 & !is.na(data$d_white[i])) {data$prop_w_d[i] <- data$d_white[i] / data$tot_d[i]}
  else{data$prop_w_d[i] <- NA}
  # black
  if(data$d_black[i] > 0 & !is.na(data$d_black[i])) {data$prop_b_d[i] <- data$d_black[i] / data$tot_d[i]}
  else{data$prop_b_d[i] <- NA}
  # latinx
  if(data$d_hisp[i] > 0 & !is.na(data$d_hisp[i])) {data$prop_l_d[i] <- data$d_hisp[i] / data$tot_d[i]}
  else{data$prop_l_d[i] <- NA}
}
label(data$non_white_inf) <- "Reported Infections in Non-White People"
label(data$prop_nw_inf) <- "Proportion of Infections in Non-White People"
label(data$non_white_d) <- "Reported Deaths in Non-White People"
label(data$prop_nw_d) <- "Proportion of Deaths in Non-White People"
label(data$prop_w_inf) <- "Proportion of Infections in White People"
label(data$prop_w_d) <- "Proportion of Deaths in White People"
label(data$prop_b_inf) <- "Proportion of Infections in Black People"
label(data$prop_b_d) <- "Proportion of Deaths in Black People"
label(data$prop_l_inf) <- "Proportion of Infections in Latinx People"
label(data$prop_l_d) <- "Proportion of Deaths in Latinx People"

# get rid of infinte and non-numeric values created by missing data (and inadvertent zeros)
for(i in 1:nrow(data)){
  # infections
  if(is.infinite(data$non_white_inf[i]) | is.nan(data$non_white_inf[i]) | data$non_white_inf[i] == 0) {
    data$non_white_inf[i] <- NA
  }
  if(is.infinite(data$prop_nw_inf[i]) | is.nan(data$prop_nw_inf[i]) | data$prop_nw_inf[i] == 0) {
    data$prop_nw_inf[i] <- NA
  }
  # deaths
  if(is.infinite(data$non_white_d[i]) | is.nan(data$non_white_d[i]) | data$non_white_d[i] == 0) {
    data$non_white_d[i] <- NA
  }
  if(is.infinite(data$prop_nw_d[i]) | is.nan(data$prop_nw_d[i]) | data$prop_nw_d[i] == 0) {
    data$prop_nw_d[i] <- NA
  }
}

# now calculate the proportion of infections/deaths in non-white versus the population proportion of non-whites
# infections
# non-white
data$inf_ratio_nw <- data$prop_nw_inf / data$prop_nw_pop
label(data$inf_ratio_nw) <- "Ratio of Non-White Infection Proportion<br>to Non-White Population Proportion"
# whites
data$inf_ratio_w <- data$prop_w_inf / (data$tot_white / data$tot_pop)
label(data$inf_ratio_w) <- "Ratio of Non-White Infection Proportion<br>to White Population Proportion"
# blacks
data$inf_ratio_b <- data$prop_b_inf / (data$tot_black / data$tot_pop)
label(data$inf_ratio_b) <- "Ratio of Non-White Infection Proportion<br>to Black Population Proportion"
# latinx
data$inf_ratio_l <- data$prop_l_inf / (data$tot_hisp / data$tot_pop)
label(data$inf_ratio_l) <- "Ratio of Non-White Infection Proportion<br>to Latinx Population Proportion"
# deaths
# non-whites
data$d_ratio_nw <- data$prop_nw_d / data$prop_nw_pop
label(data$d_ratio_nw) <- "Ratio of Non-White Death Proportion<br>to Non-White Population Proportion"
# whites
data$d_ratio_w <- data$prop_w_d / (data$tot_white / data$tot_pop)
label(data$d_ratio_w) <- "Ratio of Non-White Death Proportion<br>to White Population Proportion"
# blacks
data$d_ratio_b <- data$prop_b_d / (data$tot_black / data$tot_pop)
label(data$d_ratio_b) <- "Ratio of Non-White Death Proportion<br>to Black Population Proportion"
# latinx
data$d_ratio_l <- data$prop_l_d / (data$tot_hisp / data$tot_pop)
label(data$d_ratio_l) <- "Ratio of Non-White Infection Proportion<br>to Latinx Population Proportion"

# get rid of places that are not reporting infection by race
for(i in 1:length(data$inf_ratio_nw)){
  # infections
  if(data$inf_ratio_nw[i] == 0 & !is.na(data$inf_ratio_nw[i])) {
    data$inf_ratio_nw[i] <- NA
  }
  # deaths
  if(data$d_ratio_nw[i] == 0 & !is.na(data$d_ratio_nw[i])) {
    data$d_ratio_nw[i] <- NA
  }
}

# calculate differences in proportions
# infections
# non-whites
data$inf_diff_nw <- data$prop_nw_inf - data$prop_nw_pop
label(data$inf_diff_nw) <- "Difference between Non-White Infection Proportion<br>and Non-White Population Proportion"
# whites
data$inf_diff_w <- data$prop_w_inf - (data$tot_white / data$tot_pop)
label(data$inf_diff_w) <- "Difference of Non-White Infection Proportion<br>to White Population Proportion"
# blacks
data$inf_diff_b <- data$prop_b_inf - (data$tot_black / data$tot_pop)
label(data$inf_diff_b) <- "Difference of Non-White Infection Proportion<br>to Black Population Proportion"
# latinx
data$inf_diff_l <- data$prop_l_inf - (data$tot_hisp / data$tot_pop)
label(data$inf_diff_l) <- "Ratio of Non-White Infection Proportion<br>to Latinx Population Proportion"
# deaths
# non-whites
data$d_diff_nw <- data$prop_nw_d - data$prop_nw_pop
label(data$d_diff_nw) <- "Difference between Non-White Infection Proportion<br>and Non-White Population Proportion"
# whites
data$d_diff_w <- data$prop_w_d - (data$tot_white / data$tot_pop)
label(data$d_diff_w) <- "Difference of Non-White Death Proportion<br>to White Population Proportion"
# blacks
data$d_diff_b <- data$prop_b_d - (data$tot_black / data$tot_pop)
label(data$d_diff_b) <- "Difference of Non-White Death Proportion<br>to Black Population Proportion"
# latinx
data$d_diff_l <- data$prop_l_d - (data$tot_hisp / data$tot_pop)
label(data$d_diff_l) <- "Difference of Non-White Infection Proportion<br>to Latinx Population Proportion"

# get rid of places that are not reporting infection by race
for(i in 1:length(data$inf_diff_nw)){
  # infections
  if(is.na(data$prop_nw_inf[i])) {
    data$inf_diff_nw[i] <- NA
  }
  # deaths
  if(is.na(data$prop_nw_d[i])) {
    data$d_diff_nw[i] <- NA
  }
}

# calculate infections/deaths per 100000 people in each state
# infections
data$infpp <- (data$tot_inf / data$tot_pop) * 100000
label(data$infpp) <- "Infections per 100,000"
# infections
data$dpp <- (data$tot_d / data$tot_pop) * 100000
label(data$dpp) <- "Infections per 100,000"

# remove territories from states dataframe
states <- states[states$STUSPS %in% data$state, ]

# now add in adi to data
data$adi <- NA
for(i in 1:nrow(data)){
  for(j in 1:nrow(states)){
    if(data$state[i] == states$STUSPS[j]){
      data$adi[i] <- states$adi[j]
    }
  }
}
label(data$adi) <- "Area Deprivation Index, 2015"

# get most recent date deaths an infection dataframes
data_rec <- subset(data, day == max(unique(day)))

# add in infections and death ratio and difference and raw numbers and proportions
states$inf_tot <- NA
states$death_tot <- NA
states$inf_nw <- NA
states$deaths_nw <- NA
states$inf_w <- NA
states$deaths_w <- NA
states$inf_b <- NA
states$deaths_b <- NA
states$inf_l <- NA
states$deaths_l <- NA
states$inf_prop_nw <- NA
states$deaths_prop_nw <- NA
states$inf_prop_w <- NA
states$deaths_prop_w <- NA
states$inf_prop_b <- NA
states$deaths_prop_b <- NA
states$inf_prop_l <- NA
states$deaths_prop_l <- NA
states$inf_ratio_nw <- NA
states$death_ratio_nw <- NA
states$inf_diff_nw <- NA
states$death_diff_nw <- NA
states$inf_ratio_w <- NA
states$death_ratio_w <- NA
states$inf_diff_w <- NA
states$death_diff_w <- NA
states$inf_ratio_b <- NA
states$death_ratio_b <- NA
states$inf_diff_b <- NA
states$death_diff_b <- NA
states$inf_ratio_l <- NA
states$death_ratio_l <- NA
states$inf_diff_l <- NA
states$death_diff_l <- NA
for(i in 1:length(states$death_ratio_nw)){
  # raw numbers
  states$inf_tot[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_inf"]
  states$death_tot[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_d"]
  states$inf_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "non_white_inf"]
  states$death_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "non_white_d"]
  states$inf_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_white"]
  states$deaths_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_white"]
  states$inf_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_black"]
  states$deaths_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_black"]
  states$inf_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_hisp"]
  states$deaths_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_hisp"]
  states$inf_prop_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_nw_inf"]
  states$deaths_prop_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_nw_d"]
  states$inf_prop_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_w_inf"]
  states$deaths_prop_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_w_d"]
  states$inf_prop_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_b_inf"]
  states$deaths_prop_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_b_d"]
  states$inf_prop_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_l_inf"]
  states$deaths_prop_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_l_d"]
  # ratio
  states$inf_ratio_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_nw"]
  states$death_ratio_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_nw"]
  states$inf_ratio_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_w"]
  states$death_ratio_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_w"]
  states$inf_ratio_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_b"]
  states$death_ratio_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_b"]
  states$inf_ratio_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_ratio_l"]
  states$death_ratio_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_ratio_l"]
  # difference
  states$inf_diff_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_nw"]
  states$death_diff_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_nw"]
  states$inf_diff_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_w"]
  states$death_diff_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_w"]
  states$inf_diff_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_b"]
  states$death_diff_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_b"]
  states$inf_diff_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "inf_diff_l"]
  states$death_diff_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "d_diff_l"]
}

# add proportion of non-white people in each state
states$prop_nw <- NA
states$prop_w <- NA
states$prop_b <- NA
states$prop_l <- NA
for(i in 1:length(states$prop_nw)){
  states$prop_nw[i] <- data_rec[data_rec$state == states$STUSPS[i], "prop_nw_pop"]
  states$prop_w[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_white"] / 
    data_rec[data_rec$state == states$STUSPS[i], "tot_pop"]
  states$prop_b[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_black"] / 
    data_rec[data_rec$state == states$STUSPS[i], "tot_pop"]
  states$prop_l[i] <- data_rec[data_rec$state == states$STUSPS[i], "tot_hisp"] / 
    data_rec[data_rec$state == states$STUSPS[i], "tot_pop"]
}

# set up for ploting
# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)

Population by State, 2019

This first map provides context via estimates of how many people lived in each state in 2019 per the US Census Bureau. California (39,512,223), Texas (28,995,881), Florida (21,477,737), and New York (19,453,561) were the four states with the highest estimated population in 2019.

COVID-19 by State

These next maps show the raw number of infections and deaths that each state reported per The COVID Tracking Project. These data do not account for the population of each state. As of February 10, 2021, California has the most COVID-19 infections (3362981) and New York has the most deaths from COVID-19 (45567).

COVID-19 per 100,000 People

We can now move to examine the number of infections and deaths per 100,000 people (using 2019 US Census Bureau estimates) that each state reported to The COVID Tracking Project. As of February 10, 2021, North Dakota has the most COVID-19 infections (12887.93 per 100,000 people) and New Jersey has the most deaths from COVID-19 (250.5 per 100,000 people).

COVID-19 by Race/Ethinicity

Introduction

Finally, we can examine the ratio of COVID-19 infections and deaths in Black, Latinx, and White people to the proportion of Black, Latinx, and White people in each state. I calculated each difference and ratio with the following formulas, which generalize to calculations for Latinx, and White people:


\(\begin{align} \text{Proportion of Black People }(P_{b}) &= \frac{\text{2019 Estimated Black Population}}{\text{2019 Estimated Total Population}} \\ \text{Proportion of Black Infections }(P_{b\_infs}) &= \frac{\text{Reported COVID-19 Infections in Black Individuals}}{\text{Total Reported COVID-19 Infections}} \\ \text{Proportion of Black Deaths }(P_{b\_deaths}) &= \frac{\text{Reported COVID-19 Deaths in Black Individuals}}{\text{Total Reported COVID-19 Deaths}} \\ \text{Infection Difference} &= P_{b\_infs} - P_{b} \\ \text{Death Difference} &= P_{b\_deaths} - P_{b} \\ \text{Infection Ratio} &= \frac{P_{b\_infs}}{P_{b}} \\ \text{Death Ratio} &= \frac{P_{b\_deaths}}{P_{b}} \end{align}\)

As of February 10, 2021, COVID-19 infections and deaths are more prevalent among Black individuals than expected given their population proportion in 12 and 19 states respectively. Understanding that the data are likely incomplete and prone to measurement error, and that deaths typically lag behind infections, the infection and death difference is greater than 0.10 in 0 and 1 states respectively and greater than 0.20 in 0 and 1 states respectively. Furthermore, the infection and death ratio is greater than 1.5 in 2 and 2 states respectively and greater than 2 in 2 and 0 states respectively.

Among Latinx individuals, COVID-19 infections and deaths are more prevalent than expected given their population proportion in 25 and 5 states respectively. Understanding that the data are likely incomplete and prone to measurement error, and that deaths typically lag behind infections, the infection and death difference is greater than 0.10 in 1 and 0 states respectively and greater than 0.20 in 0 and 0 states respectively. Furthermore, the infection and death ratio is greater than 1.5 in 7 and 0 states respectively and greater than 2 in 2 and 0 states respectively.

Finally, among White individuals, COVID-19 infections and deaths are more prevalent than expected given their population proportion in 0 and 19 states respectively. Understanding that the data are likely incomplete and prone to measurement error, and that deaths typically lag behind infections, the infection and death difference is greater than 0.10 in 0 and 1 states respectively and greater than 0.20 in 0 and 0 states respectively. Furthermore, the infection and death ratio is greater than 1.5 in 0 and 0 states respectively and greater than 2 in 0 and 0 states respectively.

Tabular Data from February 10, 2021

We can also examine a table of each state, with each variables of interest. We can also directly compare the proportion of Black,Latinx, and White people (\(\blacksquare\)) to the proportion of COVID-19 infections (\(\blacktriangle\)) and deaths () among Black, Latinx, and White people. Each marker’s color represents the median 2015 Area Deprivation Index in each state.

comp_tab <-  data_rec %>% filter(day == max(data_rec$day)) %>%
  mutate(State = abbr2state(state), 
         "2018 Estimated Total Population" = format(tot_pop, big.mark = ","),
         "Total Infections" = format(tot_inf, big.mark = ","),
         "Total Deaths" = format(tot_d, big.mark = ","),
         "Total Black (% Population)" = paste0(format(tot_black, big.mark = ","),
                                               " (", round(tot_black / tot_pop * 100, 2), ")"),
         "Infections in Black People (% Infections)"= paste0(ifelse(!is.na(inf_black), format(inf_black, big.mark = ","),
                                                                    "-"), " (", 
                                                             ifelse(!is.na(prop_b_inf), 
                                                                    round(prop_b_inf*100, 2), "-"), ")"),
         "Deaths in Black People (% Deaths)"= paste0(ifelse(!is.na(d_black), format(d_black, big.mark = ","),
                                                            "-"), " (", 
                                                     ifelse(!is.na(prop_b_d), round(prop_b_d*100, 2), "-"), ")"),
         "Total Latinx (% Population)" = paste0(format(tot_hisp, big.mark = ","),
                                                " (", round(tot_hisp / tot_pop * 100, 2), ")"),
         "Infections in Latinx People (% Infections)"= paste0(ifelse(!is.na(inf_hisp), format(inf_hisp, big.mark = ","),
                                                                     "-"), " (", 
                                                              ifelse(!is.na(prop_l_inf), 
                                                                            round(prop_l_inf*100, 2), "-"), ")"),
         "Deaths in Latinx People (% Deaths)"= paste0(ifelse(!is.na(d_hisp), format(d_hisp, big.mark = ","),
                                                             "-"), " (", 
                                                      ifelse(!is.na(prop_l_d), round(prop_l_d*100, 2), "-"), ")"),
         "Total White (% Population)" = paste0(format(tot_white, big.mark = ","),
                                               " (", round(tot_white / tot_pop * 100, 2), ")"),
         "Infections in White People (% Infections)" = paste0(ifelse(!is.na(inf_white), format(inf_white, big.mark = ","),
                                                                     "-"), " (", 
                                                              ifelse(!is.na(prop_w_inf), 
                                                                     round(prop_nw_inf*100, 2), "-"), ")"),
         "Deaths in White People (% Deaths)" = paste0(ifelse(!is.na(d_white), format(d_white, big.mark = ","),
                                                             "-"), " (", 
                                                      ifelse(!is.na(prop_w_d), round(prop_w_d*100, 2), "-"), ")"),
         "Infections in People of Unknown Race/Ethinicity (% Infections)" = paste0(ifelse(!is.na(inf_unk), 
                                                                                          format(inf_unk, big.mark = ","),
                                                                                          "-"), " (", 
                                                                                   ifelse(!is.na(inf_unk), 
                                                                                          round(inf_unk/tot_inf*100, 2),
                                                                                          "-"), ")"),
         "Deaths in People of Unknown Race/Ethinicity (% Deaths)" = paste0(ifelse(!is.na(d_unk), 
                                                                                  format(d_unk, big.mark = ","),
                                                                                  "-"), " (", 
                                                                                   ifelse(!is.na(d_unk), 
                                                                                          round(d_unk/tot_d*100, 2),
                                                                                          "-"), ")"),
         "2015 Area Deprivation Index" = round(adi, 2)) %>%
  select((ncol(data_rec) + 1):(ncol(data_rec) + 16))

# sort states alphabetically
comp_tab <- comp_tab[order(comp_tab$State),]
rownames(comp_tab) <- comp_tab$State
comp_tab$State <- NULL

comp_tab %>%
  kable(align = "c", format = "html") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
                full_width = F, font_size = 12) %>%
  footnote(paste0("Data from The COVID Tracking Project (", format(as.Date(data_rec$day[1]), "%B %d, %Y"), ")")) %>%
  scroll_box(height = "400px", fixed_thead = T)
2018 Estimated Total Population Total Infections Total Deaths Total Black (% Population) Infections in Black People (% Infections) Deaths in Black People (% Deaths) Total Latinx (% Population) Infections in Latinx People (% Infections) Deaths in Latinx People (% Deaths) Total White (% Population) Infections in White People (% Infections) Deaths in White People (% Deaths) Infections in People of Unknown Race/Ethinicity (% Infections) Deaths in People of Unknown Race/Ethinicity (% Deaths) 2015 Area Deprivation Index
Alabama 4,903,185 476,067 8,888 1,297,775 (26.47) 78,493 (16.49) 2,020 (22.73) 223,278 (4.55) 11,881 (2.5) 126 (1.42) 3,200,828 (65.28) 151,607 (24.72) 4,241 (47.72) 206,777 (43.43) 2,344 (26.37) 73
Alaska 731,545 56,097 280 24,190 (3.31) 1,444 (2.57) 9 (3.21) 53,212 (7.27) 3,090 (5.51) 12 (4.29) 440,078 (60.16) 17,353 (49.97) 115 (41.07) 10,711 (19.09) 3 (1.07) 30
Arizona 7,278,717 789,245 14,462 325,777 (4.48) 24,127 (3.06) 375 (2.59) 2,310,590 (31.74) 232,163 (29.42) 4,146 (28.67) 3,939,690 (54.13) 290,599 (14.83) 7,069 (48.88) 149,464 (18.94) 1,047 (7.24) 56
Arkansas 3,017,804 309,940 5,174 465,209 (15.42) 48,970 (15.8) 725 (14.01) 236,631 (7.84) 33,163 (10.7) 176 (3.4) 2,173,848 (72.03) 198,051 (23.71) 3,802 (73.48) 38,407 (12.39) 341 (6.59) 75
California 39,512,223 3,362,981 44,995 2,221,363 (5.62) 103,870 (3.09) 2,749 (6.11) 15,574,880 (39.42) 1,415,893 (42.1) 20,125 (44.73) 14,423,748 (36.5) 513,077 (19.15) 13,744 (30.55) 789,918 (23.49) 1,538 (3.42) 20
Colorado 5,758,736 407,961 5,770 233,523 (4.06) 11,790 (2.89) 188 (3.26) 1,256,904 (21.83) 113,454 (27.81) 1,159 (20.09) 3,896,103 (67.66) 165,714 (7.43) 3,688 (63.92) 98,482 (24.14) 366 (6.34) 34
Connecticut 3,565,287 265,496 7,326 368,834 (10.35) 18,189 (6.85) 881 (12.03) 600,955 (16.86) 39,250 (14.78) 726 (9.91) 2,350,123 (65.92) 77,745 (19) 5,197 (70.94) 98,063 (36.94) 116 (1.58) 27
Delaware 973,764 81,461 1,245 214,658 (22.04) 18,468 (22.67) 283 (22.73) 93,391 (9.59) 13,595 (16.69) 70 (5.62) 600,349 (61.65) 38,787 (30.67) 866 (69.56) 4,097 (5.03) 2 (0.16) 37
District of Columbia 705,749 38,348 965 313,290 (44.39) 18,546 (48.36) 717 (74.3) 79,477 (11.26) 8,020 (20.91) 115 (11.92) 264,400 (37.46) 9,864 (73.75) 102 (10.57) 202 (0.53) 19 (1.97) 13
Florida 21,477,737 1,798,280 28,691 3,335,263 (15.53) 218,927 (12.17) 4,351 (15.17) 5,663,860 (26.37) 557,320 (30.99) 6,550 (22.83) 11,436,685 (53.25) 605,295 (19.32) 15,323 (53.41) 288,172 (16.02) 1,227 (4.28) 59
Georgia 10,617,423 780,494 15,421 3,359,449 (31.64) 199,608 (25.57) 4,662 (30.23) 1,048,724 (9.88) 80,632 (10.33) 756 (4.9) 5,523,358 (52.02) 293,572 (32.13) 7,718 (50.05) 155,528 (19.93) 1,955 (12.68) 65
Hawaii 1,415,872 26,584 423 27,780 (1.96) 538 (2.02) 5 (1.18) 150,864 (10.66)
  • (-)
  • (-)
306,622 (21.66) 3,422 (55.94) 36 (8.51) 8,290 (31.18) 18 (4.26) 11
Idaho 1,787,065 166,553 1,791 13,584 (0.76) 972 (0.58) 9 (0.5) 229,490 (12.84) 16,907 (10.15) 177 (9.88) 1,458,277 (81.6) 95,660 (10.4) 1,680 (93.8) 53,570 (32.16) 15 (0.84) 55
Illinois 12,671,821 1,152,995 21,869 1,783,443 (14.07) 105,375 (9.14) 3,513 (16.06) 2,219,882 (17.52) 211,176 (18.32) 3,131 (14.32) 7,702,651 (60.79) 438,120 (14.94) 11,550 (52.81) 331,414 (28.74) 2,810 (12.85) 50
Indiana 6,732,219 643,305 12,001 645,375 (9.59) 42,798 (6.65) 889 (7.41) 489,353 (7.27) 36,948 (5.74) 248 (2.07) 5,278,982 (78.41) 433,344 (18.96) 8,531 (71.09) 87,967 (13.67) 1,028 (8.57) 66
Iowa 3,155,070 326,575 5,174 122,388 (3.88) 9,797 (3) 121 (2.34) 198,550 (6.29) 19,595 (6) 130 (2.51) 2,682,696 (85.03) 192,679 (5.25) 4,804 (92.85) 116,742 (35.75) 110 (2.13) 61
Kansas 2,913,314 284,894 4,303 167,325 (5.74) 13,777 (4.84) 212 (4.93) 356,073 (12.22) 37,932 (13.31) 281 (6.53) 2,196,863 (75.41) 200,542 (11.39) 3,325 (77.27) 51,889 (18.21) 456 (10.6) 65
Kentucky 4,467,673 383,032 4,175 367,899 (8.23) 24,919 (6.51) 323 (7.74) 174,706 (3.91) 16,231 (4.24) 66 (1.58) 3,759,463 (84.15) 249,944 (11.51) 3,387 (81.13) 88,983 (23.23) 333 (7.98) 71
Louisiana 4,648,794 414,687 9,212 1,500,991 (32.29) 130,870 (31.56) 3,567 (38.72) 246,972 (5.31)
  • (-)
217 (2.36) 2,715,282 (58.41) 208,107 (43.93) 5,504 (59.75) 24,402 (5.88) 16 (0.17) 66
Maine 1,344,212 41,883 641 21,554 (1.6) 2,155 (5.15) 10 (1.56) 23,700 (1.76) 642 (1.53) 6 (0.94) 1,249,597 (92.96) 32,812 (9.07) 541 (84.4) 5,273 (12.59) 78 (12.17) 51
Maryland 6,045,680 366,666 7,446 1,810,267 (29.94) 104,313 (28.45) 2,574 (34.57) 643,822 (10.65) 59,637 (16.26) 694 (9.32) 3,025,781 (50.05) 125,916 (35.35) 3,796 (50.98) 51,485 (14.04) 53 (0.71) 26
Massachusetts 6,892,503 548,946 15,207 505,757 (7.34) 33,334 (6.07) 1,025 (6.74) 854,907 (12.4) 118,232 (21.54) 1,181 (7.77) 4,897,800 (71.06) 203,625 (16.8) 11,290 (74.24) 134,884 (24.57) 69 (0.45) 22
Michigan 9,986,857 624,970 15,939 1,374,886 (13.77) 70,580 (11.29) 3,543 (22.23) 528,205 (5.29) 32,312 (5.17) 443 (2.78) 7,464,662 (74.74) 357,461 (22.8) 10,668 (66.93) 125,021 (20) 833 (5.23) 67
Minnesota 5,639,632 469,905 6,319 382,621 (6.78) 37,655 (8.01) 307 (4.86) 315,130 (5.59) 42,229 (8.99) 191 (3.02) 4,460,149 (79.09) 311,910 (16.42) 5,255 (83.16) 38,604 (8.22) 116 (1.84) 43
Mississippi 2,976,149 283,753 6,367 1,113,643 (37.42) 93,790 (33.05) 2,508 (39.39) 100,110 (3.36) 6,223 (2.19) 55 (0.86) 1,678,232 (56.39) 133,880 (37.71) 3,425 (53.79) 36,645 (12.91) 160 (2.51) 79
Missouri 6,137,428 468,189 7,161 711,702 (11.6) 40,704 (8.69) 927 (12.95) 268,708 (4.38) 16,548 (3.53) 150 (2.09) 4,857,512 (79.15) 241,154 (23.28) 4,455 (62.21) 118,024 (25.21) 496 (6.93) 65
Montana 1,068,778 96,595 1,320 5,766 (0.54) 433 (0.45) 1 (0.08) 43,289 (4.05) 2,190 (2.27) 23 (1.74) 917,711 (85.87) 59,999 (15.99) 808 (61.21) 21,148 (21.89) 193 (14.62) 50
Nebraska 1,934,408 194,632 1,986 94,830 (4.9) 5,056 (2.6) 56 (2.82) 219,645 (11.35) 23,021 (11.83) 149 (7.5) 1,513,172 (78.22) 102,303 (6.72) 1,275 (64.2) 79,259 (40.72) 577 (29.05) 60
Nevada 3,080,156 285,795 4,582 286,466 (9.3) 21,456 (7.51) 416 (9.08) 900,600 (29.24) 100,700 (35.24) 1,072 (23.4) 1,483,933 (48.18) 104,419 (28.23) 2,510 (54.78) 3 (0) 4 (0.09) 52
New Hampshire 1,359,711 69,247 1,116 20,054 (1.47) 921 (1.33) 12 (1.08) 54,589 (4.01) 3,854 (5.57) 23 (2.06) 1,220,437 (89.76) 34,048 (5.98) 851 (76.25) 27,202 (39.28) 211 (18.91) 33
New Jersey 8,882,190 732,674 22,250 1,148,564 (12.93) 61,809 (8.44) 3,513 (15.79) 1,856,844 (20.91) 144,469 (19.72) 4,063 (18.26) 4,850,995 (54.61) 235,798 (25) 11,932 (53.63) 169,204 (23.09) 880 (3.96) 22
New Mexico 2,096,829 178,790 3,461 39,583 (1.89) 2,181 (1.22) 47 (1.36) 1,032,942 (49.26) 79,329 (44.37) 1,329 (38.4) 772,630 (36.85) 29,267 (18.47) 999 (28.86) 37,173 (20.79) 72 (2.08) 61
New York 19,453,561 1,494,187 45,567 2,813,773 (14.46)
  • (-)
9,039 (19.84) 3,751,058 (19.28)
  • (-)
9,328 (20.47) 10,755,420 (55.29)
  • (-)
15,843 (34.77) 1,494,187 (100) 5,972 (13.11) 27
North Carolina 10,488,084 805,898 10,181 2,240,609 (21.36) 134,169 (16.65) 2,378 (23.36) 1,025,830 (9.78) 111,968 (13.89) 722 (7.09) 6,567,102 (62.61) 405,738 (30.74) 6,327 (62.15) 152,410 (18.91) 627 (6.16) 61
North Dakota 762,062 98,214 1,459 24,866 (3.26) 2,812 (2.86)
  • (-)
31,532 (4.14)
  • (-)
  • (-)
637,513 (83.66) 53,795 (9.85)
  • (-)
34,744 (35.38) 1,459 (100) 55
Ohio 11,689,100 928,631 11,856 1,481,352 (12.67) 96,827 (10.43) 1,390 (11.72) 470,462 (4.02) 28,929 (3.12) 225 (1.9) 9,169,092 (78.44) 543,623 (19.89) 9,221 (77.77) 200,266 (21.57) 783 (6.6) 65
Oklahoma 3,956,971 407,724 3,900 293,458 (7.42) 21,974 (5.39) 235 (6.03) 438,110 (11.07) 40,288 (9.88) 189 (4.85) 2,572,388 (65.01) 232,435 (20.41) 2,824 (72.41) 92,075 (22.58) 343 (8.79) 71
Oregon 4,217,737 148,475 2,044 82,610 (1.96) 3,586 (2.42) 44 (2.15) 566,847 (13.44) 39,360 (26.51) 194 (9.49) 3,165,484 (75.05) 71,463 (33.47) 1,360 (66.54) 27,312 (18.4) 321 (15.7) 38
Pennsylvania 12,801,989 880,291 22,745 1,394,630 (10.89) 74,669 (8.48) 2,706 (11.9) 1,000,150 (7.81) 54,796 (6.22) 906 (3.98) 9,693,578 (75.72) 442,754 (11.19) 18,444 (81.09) 339,008 (38.51) 506 (2.22) 54
Rhode Island 1,059,361 119,893 2,259 64,966 (6.13) 7,408 (6.18) 100 (4.43) 172,644 (16.3) 29,599 (24.69) 188 (8.32) 755,931 (71.36) 55,408 (10.96) 1,658 (73.4) 21,748 (18.14) 275 (12.17) 38
South Carolina 5,148,714 473,140 7,742 1,360,342 (26.42) 97,238 (20.55) 2,150 (27.77) 307,118 (5.96) 26,590 (5.62) 193 (2.49) 3,277,542 (63.66) 211,672 (32.72) 4,185 (54.06) 106,675 (22.55) 1,036 (13.38) 68
South Dakota 884,659 109,580 1,815 19,447 (2.2) 2,443 (2.23) 7 (0.39) 37,351 (4.22) 4,002 (3.65) 22 (1.21) 721,053 (81.51) 81,419 (17.07) 1,434 (79.01) 5,452 (4.98) 74 (4.08) 62
Tennessee 6,829,174 750,409 10,731 1,141,790 (16.72) 95,163 (12.68) 1,840 (17.15) 391,382 (5.73) 46,623 (6.21) 289 (2.69) 5,019,540 (73.5) 455,259 (21.73) 8,039 (74.91) 132,084 (17.6) 423 (3.94) 66
Texas 28,995,881 2,517,453 39,386 3,501,610 (12.08) 12,018 (0.48) 3,629 (9.21) 11,525,578 (39.75) 27,129 (1.08) 17,824 (45.25) 11,950,774 (41.22) 23,170 (0.53) 15,719 (39.91) 2,453,829 (97.47) 1,279 (3.25) 65
Utah 3,205,958 357,339 1,765 38,056 (1.19) 4,768 (1.33) 17 (0.96) 462,051 (14.41) 74,997 (20.99) 264 (14.96) 2,493,759 (77.79) 226,664 (14.14) 1,187 (67.25) 33,871 (9.48) 104 (5.89) 38
Vermont 623,989 13,122 187 8,152 (1.31) 550 (4.19) 1 (0.53) 12,719 (2.04) 268 (2.04) 0 (-) 577,539 (92.56) 10,400 (10.98) 166 (88.77) 1,281 (9.76) 12 (6.42) 39
Virginia 8,535,519 537,319 6,932 1,632,226 (19.12) 89,047 (16.57) 1,599 (23.07) 834,422 (9.78) 72,244 (13.45) 489 (7.05) 5,227,904 (61.25) 212,179 (23.75) 4,241 (61.18) 125,285 (23.32) 203 (2.93) 36
Washington 7,614,893 324,706 4,603 304,224 (4) 10,455 (3.22) 145 (3.15) 991,721 (13.02) 60,102 (18.51) 549 (11.93) 5,140,589 (67.51) 89,803 (11.41) 3,260 (70.82) 137,746 (42.42) 12 (0.26) 31
West Virginia 1,792,147 125,951 2,175 62,775 (3.5) 3,214 (2.55) 36 (1.66) 31,162 (1.74)
  • (-)
  • (-)
1,648,512 (91.99) 81,807 (4.42) 1,540 (70.8) 38,580 (30.63) 573 (26.34) 74
Wisconsin 5,822,434 603,255 6,705 372,273 (6.39) 39,056 (6.47) 443 (6.61) 413,208 (7.1) 61,328 (10.17) 376 (5.61) 4,709,065 (80.88) 460,947 (15.55) 5,751 (85.77) 48,499 (8.04) 198 (2.95) 51
Wyoming 578,759 52,874 647 6,520 (1.13) 408 (0.77) 1 (0.15) 58,609 (10.13) 4,056 (7.67) 50 (7.73) 484,380 (83.69) 27,084 (8.5) 572 (88.41) 20,726 (39.2) 8 (1.24) 40
Note:
Data from The COVID Tracking Project (February 10, 2021)
comp_fig <- data.frame(state = fct_rev(as.factor(states$NAME)),
                       adi = states$adi,
                       t_inf = states$inf_tot,
                       t_death = states$death_tot,
                       w_inf = states$inf_w,
                       w_death = states$deaths_w,
                       b_inf = states$inf_b,
                       b_death = states$deaths_b,
                       l_inf = states$inf_l,
                       l_death = states$deaths_l,
                       prop_nw_inf = states$inf_prop_nw,
                       prop_nw_death = states$deaths_prop_nw,
                       prop_nw = states$prop_nw,
                       prop_w_inf = states$inf_prop_w,
                       prop_w_death = states$deaths_prop_w,
                       prop_w = states$prop_w,
                       prop_b_inf = states$inf_prop_b,
                       prop_b_death = states$deaths_prop_b,
                       prop_b = states$prop_b,
                       prop_l_inf = states$inf_prop_l,
                       prop_l_death = states$deaths_prop_l,
                       prop_l = states$prop_l)

# make it long form
comp_fig_plot <- comp_fig %>% gather("Type", "value", prop_nw_inf:prop_l)

# add "group" column
comp_fig_plot$group <- NA
for(i in 1:nrow(comp_fig_plot)){
  if(str_detect(comp_fig_plot$Type[i], "_nw")){
    comp_fig_plot$group[i] <- "Non-White"
  }
  if(str_detect(comp_fig_plot$Type[i], "_w")){
    comp_fig_plot$group[i] <- "White"
  }
  if(str_detect(comp_fig_plot$Type[i], "_b")){
    comp_fig_plot$group[i] <- "Black"
  }
  if(str_detect(comp_fig_plot$Type[i], "_l")){
    comp_fig_plot$group[i] <- "Latinx"
  }
}

# add rounded colum for label
comp_fig_plot$proportion <- round(comp_fig_plot$value, 2)

# make nicer labels
comp_fig_plot$type <- NA
comp_fig_plot[comp_fig_plot$Type == "prop_nw_inf" |
                comp_fig_plot$Type == "prop_w_inf" |
                comp_fig_plot$Type == "prop_b_inf" |
                comp_fig_plot$Type == "prop_l_inf", ]$type <- "Proportion of Infections"
comp_fig_plot[comp_fig_plot$Type == "prop_nw_death" |
                comp_fig_plot$Type == "prop_w_death" |
                comp_fig_plot$Type == "prop_b_death" |
                comp_fig_plot$Type == "prop_l_death", ]$type <- "Proportion of Deaths"
comp_fig_plot[comp_fig_plot$Type == "prop_nw" |
                comp_fig_plot$Type == "prop_w" |
                comp_fig_plot$Type == "prop_b" |
                comp_fig_plot$Type == "prop_l", ]$type <- "Proportion of People"

# remove non-white
comp_fig_plot<- comp_fig_plot[comp_fig_plot$group != "Non-White", ]

# add column with cases
# little function to do it
sort_cases_comp <- function(case_type, i) {
  if(str_detect(case_type, "inf") & str_detect(case_type, "_w_")) {return(comp_fig_plot$w_inf[i])}
  if(str_detect(case_type, "inf") & str_detect(case_type, "_b")) {return(comp_fig_plot$b_inf[i])}
  if(str_detect(case_type, "inf") & str_detect(case_type, "_l")) {return(comp_fig_plot$l_inf[i])}
  if(str_detect(case_type, "_d") & str_detect(case_type, "_w")) {return(comp_fig_plot$w_death[i])}
  if(str_detect(case_type, "_d") & str_detect(case_type, "_b")) {return(comp_fig_plot$b_death[i])}
  if(str_detect(case_type, "_d") & str_detect(case_type, "_l")) {return(comp_fig_plot$l_death[i])}
  if(str_detect(case_type, "prop")){return(paste0("total infections: ", comp_fig_plot$t_inf[i], "\n", 
                                               "total deaths: ", comp_fig_plot$t_death[i]))}
}

# loop through (this is a slow way to do it...)
for(i in 1:nrow(comp_fig_plot)){
  comp_fig_plot$cases[i] <- sort_cases_comp(comp_fig_plot$Type[i], i)
}

d1 <- ggplot(comp_fig_plot) + 
  geom_point(aes(x = state, y = value, shape = type, label = proportion, color = adi, 
                   text = paste0(ifelse(str_detect(type, "Inf"), "infections: ", 
                                        ifelse(str_detect(type, "Dea"), "deaths:", "")), cases))) + 
  facet_wrap(~group) +
  coord_flip() + 
  ggtitle(paste0("Infection and Death Proportions by Race/Ethnicity (", format(as.Date(data_rec$day[1]), "%B %d, %Y"), ")")) + 
  ylab("") + 
  xlab("") + 
  ylim(0, 1) +
  scale_color_gradient(low = "#ffffcc", high = "#bd0026") +
  theme(plot.title = element_text(hjust = 0.5), 
        panel.background = element_rect(fill = NA), 
        panel.grid = element_line(color = "grey75"), 
        legend.key = element_rect(fill = NA), 
        legend.position = "top", 
        legend.title = element_blank()) +
  labs(color = "Area Deprivation Index, 2015")

ggplotly(d1, width = 1100, height = 1100, tooltip = c("x", "shape", "label", "colour", "text"))

Maps from February 10, 2021

We can also compare the most recent The COVID Tracking Project data with the 2015 Area Deprivation Index and the 2019 estimated proportion of Black, Latinx, and White people in each state. The left column shows the most recent infection difference data (February 10, 2021) for Black, Latinx, and White people as compared to their population proportion in each state (right column). The more red a state is, the greater the difference of Black, Latinx, and White COVID-19 infections to Black, Latinx, and White people in a state.

# exclude AK and HI for map
states_cont <- states[states$STUSPS %nin% c("AK", "HI"), ]

# create common theme
theme_map <- theme(plot.title = element_text(hjust = 0.5), legend.position = "right", 
        legend.key = element_rect(fill = NA), legend.text = element_text(size = 6.5), 
        legend.key.size = unit(0.5,"line"), axis.text.x = element_blank(), 
        axis.text.y = element_blank(), axis.ticks.x = element_blank(), 
        axis.ticks.y = element_blank(), axis.title.x = element_blank(), 
        axis.title.y = element_blank(), panel.background = element_blank(),
        panel.grid.major = element_blank(), panel.grid.minor = element_blank())

p1 <- ggplot() + geom_sf(data = states_cont, aes(fill = inf_diff_b)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) + 
  ggtitle(paste("COVID-19 Infection Difference*\nin Black Individuals\n",
                format(data_rec$day[1], "%B %d, %Y"))) +
  labs(fill = "Difference") +
  theme_map
p2 <- ggplot() + geom_sf(data = states_cont, aes(fill = prop_b)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  ggtitle("Estimated Proportion of Black Individuals, 2019") +
  labs(fill = "Proportion") +
  theme_map
p3 <- ggplot() + geom_sf(data = states_cont, aes(fill = inf_diff_l)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) + 
  ggtitle(paste("COVID-19 Infection Difference*\nin Latinx Individuals\n",
                format(data_rec$day[1], "%B %d, %Y"))) +
  labs(fill = "Difference") +
  theme_map
p4 <- ggplot() + geom_sf(data = states_cont, aes(fill = prop_l)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  ggtitle("Estimated Proportion of Latinx Individuals, 2019") +
  labs(fill = "Proportion") +
  theme_map
p5 <- ggplot() + geom_sf(data = states_cont, aes(fill = inf_diff_w)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) + 
  ggtitle(paste("COVID-19 Infection Difference*\nin White Individuals\n",
                format(data_rec$day[1], "%B %d, %Y"))) +
  labs(fill = "Difference") +
  theme_map
p6 <- ggplot() + geom_sf(data = states_cont, aes(fill = prop_w)) +
  scale_fill_distiller(palette = "YlOrRd", direction = 1) +
  ggtitle("Estimated Proportion of White Individuals, 2019") +
  labs(fill = "Proportion") +
  theme_map

gridExtra::grid.arrange(p1, p2, p3, p4, p5, p6, 
                        nrow = 3,
                        bottom = text_grob("*Difference in Proportion of COVID-19 Infections\nto Population Proportion",
                                           just = "left"))

Session Info

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rmdformats_1.0.0    ggpubr_0.4.0        sf_0.9-6           
 [4] circlize_0.4.11     knitr_1.30          icon_0.1.0         
 [7] openintro_2.0.0     usdata_0.1.0        cherryblossom_0.1.0
[10] airports_0.1.0      kableExtra_1.3.1    plotly_4.9.2.1     
[13] rms_6.0-1           SparseM_1.78        Hmisc_4.4-1        
[16] Formula_1.2-4       survival_3.2-7      lattice_0.20-41    
[19] forcats_0.5.0       stringr_1.4.0       dplyr_1.0.2        
[22] purrr_0.3.4         readr_1.4.0         tidyr_1.1.2        
[25] tibble_3.0.4        ggplot2_3.3.2       tidyverse_1.3.0    

loaded via a namespace (and not attached):
  [1] TH.data_1.0-10      colorspace_2.0-0    ggsignif_0.6.0     
  [4] rio_0.5.16          ellipsis_0.3.1      class_7.3-17       
  [7] htmlTable_2.1.0     GlobalOptions_0.1.2 base64enc_0.1-3    
 [10] fs_1.5.0            rstudioapi_0.13     farver_2.0.3       
 [13] MatrixModels_0.4-1  fansi_0.4.1         mvtnorm_1.1-1      
 [16] lubridate_1.7.9.2   xml2_1.3.2          codetools_0.2-18   
 [19] splines_4.0.3       jsonlite_1.7.2      broom_0.7.2        
 [22] cluster_2.1.0       dbplyr_2.0.0        png_0.1-7          
 [25] compiler_4.0.3      httr_1.4.2          backports_1.2.0    
 [28] assertthat_0.2.1    Matrix_1.2-18       lazyeval_0.2.2     
 [31] cli_2.2.0           htmltools_0.5.0     quantreg_5.75      
 [34] tools_4.0.3         gtable_0.3.0        glue_1.4.2         
 [37] Rcpp_1.0.5          carData_3.0-4       cellranger_1.1.0   
 [40] vctrs_0.3.5         nlme_3.1-150        conquer_1.0.2      
 [43] crosstalk_1.1.0.1   xfun_0.19           openxlsx_4.2.3     
 [46] rvest_0.3.6         lifecycle_0.2.0     rstatix_0.6.0      
 [49] polspline_1.1.19    MASS_7.3-53         zoo_1.8-8          
 [52] scales_1.1.1        hms_0.5.3           sandwich_3.0-0     
 [55] RColorBrewer_1.1-2  curl_4.3            yaml_2.2.1         
 [58] gridExtra_2.3       rpart_4.1-15        latticeExtra_0.6-29
 [61] stringi_1.5.3       highr_0.8           e1071_1.7-4        
 [64] checkmate_2.0.0     zip_2.1.1           shape_1.4.5        
 [67] rlang_0.4.10        pkgconfig_2.0.3     matrixStats_0.57.0 
 [70] evaluate_0.14       labeling_0.4.2      htmlwidgets_1.5.2  
 [73] tidyselect_1.1.0    magrittr_2.0.1      bookdown_0.21      
 [76] R6_2.5.0            generics_0.1.0      multcomp_1.4-15    
 [79] DBI_1.1.0           pillar_1.4.7        haven_2.3.1        
 [82] foreign_0.8-80      withr_2.3.0         units_0.6-7        
 [85] abind_1.4-5         nnet_7.3-14         car_3.0-10         
 [88] modelr_0.1.8        crayon_1.3.4        KernSmooth_2.23-18 
 [91] rmarkdown_2.5       jpeg_0.1-8.1        grid_4.0.3         
 [94] readxl_1.3.1        data.table_1.13.2   reprex_0.3.0       
 [97] digest_0.6.27       classInt_0.4-3      webshot_0.5.2      
[100] munsell_0.5.0       viridisLite_0.3.0