We examine various correlates of community resilience estimates (CREs) to understand and assess their comparative performance across different quantitative models. The CRE data sets were developed by the U.S. Census Bureau to assess the social vulnerability and resilience of neighborhoods across the United States in response to disasters, such as COVID-19, extreme weather events, and economic shocks. These estimates measure the capacity of individuals and households within a community to absorb, endure, and recover from external stresses. The CREs combine granular data from the American Community Survey (ACS) and the Population Estimates Program (PEP) to identify social and economic vulnerabilities at a detailed geographic level. We argue, through the concept of the proxy war that estimates such as the CREs provide useful tools to help add substantive insights over the use of single indicators that are largely used to split individuals into a small set of discrete groupings. We model, however, the importance of maintaining the critical language and analyses that serve the historical linkages of ideas to their maturation in the research literature, such as would be found in qualitative studies. We assess a set of models of social vulnerability and examine their relationships to a series of single indicators.
In this draft case analysis, we explore relationships in DC and NC.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(tidycensus)
library(sf)
library(tidyverse)
library(ggplot2)
library(weights)
library(dplyr)
library(stringr)
Information on the community resilience esitmates datasets can be found online here. Update (Feb 6 2025), the CRE data was removed from the Census site, so we will not have access to 2023 data for now; modifying the code below to provide access to the 2022 data.
Pulling in 2022 CRE data directly from GitHub.
# I am adding a link to the raw data on Github
link <- "https://raw.githubusercontent.com/quant-shop/census/refs/heads/main/data/cre.2022.tract.csv"
df <- read.csv(link)
str(df)
FIPS codes by state.
## Warning: 'xfun::attr()' is deprecated.
## Use 'xfun::attr2()' instead.
## See help("Deprecated")
AL (01) | HI (15) | MI (26) | NC (37) | UT (49) |
AK (02) | ID (16) | MN (27) | ND (38) | VT (50) |
AZ (04) | IL (17) | MS (28) | OH (39) | VA (51) |
AR (05) | IN (18) | MO (29) | OK (40) | WA (53) |
CA (06) | IA (19) | MT (30) | OR (41) | WV (54) |
CO (08) | KS (20) | NE (31) | PA (42) | WI (55) |
CT (09) | KY (21) | NV (32) | RI (44) | WY (56) |
DE (10) | LA (22) | NH (33) | SC (45) | |
DC (11) | ME (23) | NJ (34) | SD (46) | |
FL (12) | MD (24) | NM (35) | TN (47) | |
GA (13) | MA (25) | NY (36) | TX (48) |
Adding the state codes for DC and NC. You should change this to match your state of analysis.
# CRE data for DC
cre.2022.dc <- df %>%
filter(STATE == "11") # FIPS code for DC
# CRE data for NC
cre.2022.nc <- df %>%
filter(STATE == "37") # FIPS code for NC
Given the size of DC, we will analyze all of DC and the Anacostia neighborhood in Southeast (SE).
# CRE data for Anacostia, DC
cre.2022.anacostia <- df %>%
filter(STATE == "11" & COUNTY == "001" & TRACT %in% c("7503", "7504", "7903"))
To find census tract information for any state, you can use the U.S. Census Bureau’s official website (census.gov). Specifically, their “Geography” section provides access to TIGER/Line Shapefiles and Reference Maps, which contain detailed census tract data for all states and can be downloaded or viewed online.
In the case of NC, we will focus on Mecklenburg County.
# CRE data for Mecklenburg County, NC
cre.2022.mecklenburg <- df %>%
filter(STATE == "37" & COUNTY == "119") # FIPS code for NC and Mecklenburg County
You will need to View
the state-level df
to
examine county-level info. The county numbers in FIPS codes are derived
from the alphabetical order of counties within each state. County FIPS
codes are assigned in alphabetic order, counting by odd numbers. The
first county alphabetically in a state is always 001, the second is 003,
the third is 005, and so on. This method leaves space between each
county for insertion of renamed or newly created counties while
preserving the alphabetic ordering.
I am now adding a draft list of potential census variables that we would like to assess in relation to the CRE.
# draft list of CRE correlates
cre_correlates_dc <- get_acs(
geography = "tract",
state = "DC",
year = 2023,
survey = "acs5",
variables = c(
median_income = "B19013_001", # Median household income in the past 12 months
poverty_rate = "B17001_002", # Number of people below poverty level
unemployment_rate = "B23025_005", # Number of civilians (16 years and over) unemployed
no_health_insurance = "B27010_033", # Number of people with no health insurance coverage
disability_status = "B18101_001", # Total civilian non-institutionalized population for whom disability status is determined
education_less_than_hs = "B15003_002", # Population 25 years and over with less than 9th grade education
median_age = "B01002_001", # Median age
housing_cost_burden = "B25070_010", # Housing units spending 50% or more of income on rent
no_vehicle = "B08201_002", # Households with no vehicle available
black_population = "B02001_003", # Black or African American alone population
median_rent = "B25058_001" # Median contract rent
),
summary_var = "B02001_001", # Total population (for calculating proportions)
output = "wide",
geometry = FALSE # set to FALSE since CRE data doesn't have geometry
)
## Getting data from the 2019-2023 5-year ACS
## Warning: • You have not set a Census API key. Users without a key are limited to 500
## queries per day and may experience performance limitations.
## ℹ For best results, get a Census API key at
## http://api.census.gov/data/key_signup.html and then supply the key to the
## `census_api_key()` function to use it throughout your tidycensus session.
## This warning is displayed once per session.
# Calculate proportion of Black population
cre_correlates_dc <- cre_correlates_dc %>%
mutate(prop_black = black_populationE / summary_est)
# Print first few rows
head(cre_correlates_dc)
## # A tibble: 6 × 27
## GEOID NAME median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 11001000101 Census … 135708 43153 37 41
## 2 11001000102 Census … 159583 70430 108 109
## 3 11001000201 Census … NA NA 64 5
## 4 11001000202 Census … 152059 60482 465 180
## 5 11001000300 Census … 174470 21374 372 173
## 6 11001000400 Census … 188929 71412 242 115
## # ℹ 21 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
# draft list of CRE correlates for Mecklenburg County, NC
cre_correlates_mecklenburg <- get_acs(
geography = "tract",
state = "NC",
county = "Mecklenburg",
year = 2023,
survey = "acs5",
variables = c(
median_income = "B19013_001", # Median household income in the past 12 months
poverty_rate = "B17001_002", # Number of people below poverty level
unemployment_rate = "B23025_005", # Number of civilians (16 years and over) unemployed
no_health_insurance = "B27010_033", # Number of people with no health insurance coverage
disability_status = "B18101_001", # Total civilian non-institutionalized population for whom disability status is determined
education_less_than_hs = "B15003_002", # Population 25 years and over with less than 9th grade education
median_age = "B01002_001", # Median age
housing_cost_burden = "B25070_010", # Housing units spending 50% or more of income on rent
no_vehicle = "B08201_002", # Households with no vehicle available
black_population = "B02001_003", # Black or African American alone population
median_rent = "B25058_001" # Median contract rent
),
summary_var = "B02001_001", # Total population (for calculating proportions)
output = "wide",
geometry = FALSE # set to FALSE since CRE data doesn't have geometry
)
## Getting data from the 2019-2023 5-year ACS
# Calculate proportion of Black population
cre_correlates_mecklenburg <- cre_correlates_mecklenburg %>%
mutate(prop_black = black_populationE / summary_est)
# Print first few rows
head(cre_correlates_mecklenburg)
## # A tibble: 6 × 27
## GEOID NAME median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 37119000101 Census … 106601 16694 170 155
## 2 37119000102 Census … 129693 16100 195 153
## 3 37119000103 Census … 135533 25153 0 14
## 4 37119000104 Census … 109805 8019 158 94
## 5 37119000301 Census … 97411 16618 69 34
## 6 37119000302 Census … 78341 13225 110 70
## # ℹ 21 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
We then merge the CRE estimates with the CRE correlates, which will provide us with some talking points as we consider the technical documentation.
We first need to modify the GEO_ID in the CRE data by extracting some of the characters.
# Modify the GEO_ID in cre.2022.dc
cre.2022.dc_modified <- cre.2022.dc %>%
mutate(GEOID_modified = str_extract(GEO_ID, "\\d+$")) %>%
mutate(GEOID_modified = str_sub(GEOID_modified, -11)) %>%
relocate(GEOID_modified)
We then join the modified data with the correlates.
# Join data
merged_dc <- cre_correlates_dc %>%
left_join(cre.2022.dc_modified, by = c("GEOID" = "GEOID_modified"))
# Print the first few rows of the merged dataset
head(merged_dc)
## # A tibble: 6 × 46
## GEOID NAME.x median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 11001000101 Census … 135708 43153 37 41
## 2 11001000102 Census … 159583 70430 108 109
## 3 11001000201 Census … NA NA 64 5
## 4 11001000202 Census … 152059 60482 465 180
## 5 11001000300 Census … 174470 21374 372 173
## 6 11001000400 Census … 188929 71412 242 115
## # ℹ 40 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
We first need to modify the GEO_ID in the CRE data by extracting some of the characters.
# Modify the GEO_ID in cre.2022.nc
cre.2022.nc.mecklenburg_modified <- cre.2022.mecklenburg %>%
mutate(GEOID_modified = str_extract(GEO_ID, "\\d+$")) %>%
mutate(GEOID_modified = str_sub(GEOID_modified, -11)) %>%
relocate(GEOID_modified)
We then join the modified data with the correlates.
# Join data
merged_nc_mecklenburg <- cre_correlates_mecklenburg %>%
left_join(cre.2022.nc.mecklenburg_modified, by = c("GEOID" = "GEOID_modified"))
# Print the first few rows of the merged dataset
head(merged_nc_mecklenburg)
## # A tibble: 6 × 46
## GEOID NAME.x median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 37119000101 Census … 106601 16694 170 155
## 2 37119000102 Census … 129693 16100 195 153
## 3 37119000103 Census … 135533 25153 0 14
## 4 37119000104 Census … 109805 8019 158 94
## 5 37119000301 Census … 97411 16618 69 34
## 6 37119000302 Census … 78341 13225 110 70
## # ℹ 40 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
We explore some initial visualizations to test initial hypotheses.
ggplot(merged_dc, aes(x = median_incomeE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "DC: Median Income vs. Community Resilience Estimate",
x = "Median Income",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
ggplot(merged_nc_mecklenburg, aes(x = median_incomeE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "Mecklenburg County, NC: Median Income vs. Community Resilience Estimate",
x = "Median Income",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For DC
ggplot(merged_dc, aes(x = median_rentE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "DC: Median Rent vs. Community Resilience Estimate",
x = "Median Rent Estimate",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For Mecklenburg County, NC
ggplot(merged_nc_mecklenburg, aes(x = median_rentE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "Mecklenburg County, NC: Median Rent vs. Community Resilience Estimate",
x = "Median Rent Estimate",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For DC
ggplot(merged_dc, aes(x = prop_black, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "DC: Proportion of Black Population vs. Community Resilience Estimate",
x = "Proportion of Black Population",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For Mecklenburg County, NC
ggplot(merged_nc_mecklenburg, aes(x = prop_black, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "Mecklenburg County, NC: Proportion of Black Population vs. Community Resilience Estimate",
x = "Proportion of Black Population",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
ggplot(df, aes(x = median_incomeE)) +
geom_point(aes(y = PRED3_PE, color = "CRE"), alpha = 0.5) +
geom_point(aes(y = percent_black, color = "Black"), alpha = 0.5) +
geom_smooth(aes(y = PRED3_PE, color = "CRE"), method = "lm", se = FALSE) +
geom_smooth(aes(y = percent_black, color = "Black"), method = "lm", se = FALSE) +
scale_color_manual(values = c("CRE" = "red", "Black" = "blue")) +
labs(title = "Median Income vs. CRE and % Black Population in DC",
x = "Median Income Estimate",
y = "Percentage",
color = "Variable") +
theme_minimal()
Non-linear relationship between the variables.
ggplot(df, aes(x = estimate)) +
geom_point(aes(y = PRED3_PE, color = "CRE"), alpha = 0.5) +
geom_point(aes(y = percent_black, color = "Black"), alpha = 0.5) +
geom_smooth(aes(y = PRED3_PE, color = "CRE"), method = "loess", se = FALSE) +
geom_smooth(aes(y = percent_black, color = "% Black"), method = "loess", se = FALSE) +
scale_color_manual(values = c("CRE" = "red", "Black" = "blue")) +
labs(title = "Median Rent vs. CRE and % Black Population in DC (Non-linear)",
x = "Median Rent Estimate",
y = "Percentage",
color = "Variable") +
theme_minimal()