We examine various correlates of community resilience estimates (CREs) to understand and assess their comparative performance across different quantitative models. The CRE datasets were developed by the U.S. Census Bureau to assess the social vulnerability and resilience of neighborhoods across the United States in response to disasters, such as COVID-19, extreme weather events, and economic shocks. These estimates measure the capacity of individuals and households within a community to absorb, endure, and recover from external stresses. The CREs combine granular data from the American Community Survey (ACS) and the Population Estimates Program (PEP) to identify social and economic vulnerabilities at a detailed geographic level.
In this case analysis, we explore relationships in DC and NC.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(tidycensus)
library(sf)
library(tidyverse)
library(ggplot2)
library(weights)
library(dplyr)
library(stringr)
Information on the community resilience esitmates datasets can be found online here. Update (Feb 6 2025), the CRE data was removed from the Census site, so we will not have access to 2023 data for now; modifying the code below to provide access to the 2022 data.
Pulling in 2022 CRE data directly from GitHub.
# I am adding a link to the raw data on Github
link <- "https://raw.githubusercontent.com/quant-shop/census/refs/heads/main/data/cre.2022.tract.csv"
df <- read.csv(link)
str(df)
## 'data.frame': 84415 obs. of 19 variables:
## $ GEO_ID : chr "1400000US01001020100" "1400000US01001020200" "1400000US01001020300" "1400000US01001020400" ...
## $ STATE : int 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY : int 1 1 1 1 1 1 1 1 1 1 ...
## $ TRACT : int 20100 20200 20300 20400 20501 20502 20503 20600 20700 20801 ...
## $ NAME : chr "Census Tract 201, Autauga County, Alabama" "Census Tract 202, Autauga County, Alabama" "Census Tract 203, Autauga County, Alabama" "Census Tract 204, Autauga County, Alabama" ...
## $ WATER_TRACT: int NA NA NA NA NA NA NA NA NA NA ...
## $ POPUNI : int 1797 1984 3278 4297 4393 3364 3680 3807 3470 3163 ...
## $ PRED0_E : int 545 751 1283 1836 2002 1268 1586 1787 1399 927 ...
## $ PRED0_M : num 218 259 425 501 504 ...
## $ PRED0_PE : num 30.3 37.9 39.1 42.7 45.6 ...
## $ PRED0_PM : num 12.1 13.1 13 11.7 11.5 ...
## $ PRED12_E : int 912 850 1327 1736 1622 1578 1444 1088 1162 1393 ...
## $ PRED12_M : num 230 266 424 499 500 ...
## $ PRED12_PE : num 50.8 42.8 40.5 40.4 36.9 ...
## $ PRED12_PM : num 12.8 13.4 12.9 11.6 11.4 ...
## $ PRED3_E : int 340 383 668 725 769 518 650 932 909 843 ...
## $ PRED3_M : num 199 222 338 395 389 ...
## $ PRED3_PE : num 18.9 19.3 20.4 16.9 17.5 ...
## $ PRED3_PM : num 11.1 11.2 10.31 9.2 8.86 ...
FIPS codes by state.
AL (01) | HI (15) | MI (26) | NC (37) | UT (49) |
AK (02) | ID (16) | MN (27) | ND (38) | VT (50) |
AZ (04) | IL (17) | MS (28) | OH (39) | VA (51) |
AR (05) | IN (18) | MO (29) | OK (40) | WA (53) |
CA (06) | IA (19) | MT (30) | OR (41) | WV (54) |
CO (08) | KS (20) | NE (31) | PA (42) | WI (55) |
CT (09) | KY (21) | NV (32) | RI (44) | WY (56) |
DE (10) | LA (22) | NH (33) | SC (45) | |
DC (11) | ME (23) | NJ (34) | SD (46) | |
FL (12) | MD (24) | NM (35) | TN (47) | |
GA (13) | MA (25) | NY (36) | TX (48) |
Adding the state codes for DC and NC. You should change this to match your state of analysis.
# CRE data for DC
cre.2022.dc <- df %>%
filter(STATE == "11") # FIPS code for DC
# CRE data for NC
cre.2022.nc <- df %>%
filter(STATE == "37") # FIPS code for NC
Given the size of DC, we will analyze all of DC and the Anacostia neighborhood in Southeast (SE).
# CRE data for Anacostia, DC
cre.2022.anacostia <- df %>%
filter(STATE == "11" & COUNTY == "001" & TRACT %in% c("7503", "7504", "7903"))
To find census tract information for any state, you can use the U.S. Census Bureau’s official website (census.gov). Specifically, their “Geography” section provides access to TIGER/Line Shapefiles and Reference Maps, which contain detailed census tract data for all states and can be downloaded or viewed online.
In the case of NC, we will focus on Mecklenburg county.
# CRE data for Mecklenburg County, NC
cre.2022.mecklenburg <- df %>%
filter(STATE == "37" & COUNTY == "119") # FIPS code for NC and Mecklenburg County
You will need to View
the state-level df
to
examine county-level info. The county numbers in FIPS codes are derived
from the alphabetical order of counties within each state. County FIPS
codes are assigned in alphabetic order, counting by odd numbers. The
first county alphabetically in a state is always 001, the second is 003,
the third is 005, and so on. This method leaves space between each
county for insertion of renamed or newly created counties while
preserving the alphabetic ordering.
I am now adding a draft list of potential census variables that we would like to assess in relation to the CRE.
# draft list of CRE correlates
cre_correlates_dc <- get_acs(
geography = "tract",
state = "DC",
year = 2023,
survey = "acs5",
variables = c(
median_income = "B19013_001", # Median household income in the past 12 months
poverty_rate = "B17001_002", # Number of people below poverty level
unemployment_rate = "B23025_005", # Number of civilians (16 years and over) unemployed
no_health_insurance = "B27010_033", # Number of people with no health insurance coverage
disability_status = "B18101_001", # Total civilian non-institutionalized population for whom disability status is determined
education_less_than_hs = "B15003_002", # Population 25 years and over with less than 9th grade education
median_age = "B01002_001", # Median age
housing_cost_burden = "B25070_010", # Housing units spending 50% or more of income on rent
no_vehicle = "B08201_002", # Households with no vehicle available
black_population = "B02001_003", # Black or African American alone population
median_rent = "B25058_001" # Median contract rent
),
summary_var = "B02001_001", # Total population (for calculating proportions)
output = "wide",
geometry = FALSE # set to FALSE since CRE data doesn't have geometry
)
## Getting data from the 2019-2023 5-year ACS
# Calculate proportion of Black population
cre_correlates_dc <- cre_correlates_dc %>%
mutate(prop_black = black_populationE / summary_est)
# Print first few rows
head(cre_correlates_dc)
## # A tibble: 6 × 27
## GEOID NAME median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 11001000101 Census … 135708 43153 37 41
## 2 11001000102 Census … 159583 70430 108 109
## 3 11001000201 Census … NA NA 64 5
## 4 11001000202 Census … 152059 60482 465 180
## 5 11001000300 Census … 174470 21374 372 173
## 6 11001000400 Census … 188929 71412 242 115
## # ℹ 21 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
# draft list of CRE correlates for Mecklenburg County, NC
cre_correlates_mecklenburg <- get_acs(
geography = "tract",
state = "NC",
county = "Mecklenburg",
year = 2023,
survey = "acs5",
variables = c(
median_income = "B19013_001", # Median household income in the past 12 months
poverty_rate = "B17001_002", # Number of people below poverty level
unemployment_rate = "B23025_005", # Number of civilians (16 years and over) unemployed
no_health_insurance = "B27010_033", # Number of people with no health insurance coverage
disability_status = "B18101_001", # Total civilian non-institutionalized population for whom disability status is determined
education_less_than_hs = "B15003_002", # Population 25 years and over with less than 9th grade education
median_age = "B01002_001", # Median age
housing_cost_burden = "B25070_010", # Housing units spending 50% or more of income on rent
no_vehicle = "B08201_002", # Households with no vehicle available
black_population = "B02001_003", # Black or African American alone population
median_rent = "B25058_001" # Median contract rent
),
summary_var = "B02001_001", # Total population (for calculating proportions)
output = "wide",
geometry = FALSE # set to FALSE since CRE data doesn't have geometry
)
## Getting data from the 2019-2023 5-year ACS
# Calculate proportion of Black population
cre_correlates_mecklenburg <- cre_correlates_mecklenburg %>%
mutate(prop_black = black_populationE / summary_est)
# Print first few rows
head(cre_correlates_mecklenburg)
## # A tibble: 6 × 27
## GEOID NAME median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 37119000101 Census … 106601 16694 170 155
## 2 37119000102 Census … 129693 16100 195 153
## 3 37119000103 Census … 135533 25153 0 14
## 4 37119000104 Census … 109805 8019 158 94
## 5 37119000301 Census … 97411 16618 69 34
## 6 37119000302 Census … 78341 13225 110 70
## # ℹ 21 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
We then merge the CRE estimates with the CRE correlates, which will provide us with some talking points as we consider the technical documentation.
We first need to modify the GEO_ID in the CRE data by extracting some of the characters.
# Modify the GEO_ID in cre.2022.dc
cre.2022.dc_modified <- cre.2022.dc %>%
mutate(GEOID_modified = str_extract(GEO_ID, "\\d+$")) %>%
mutate(GEOID_modified = str_sub(GEOID_modified, -11)) %>%
relocate(GEOID_modified)
We then join the modified data with the correlates.
# Join data
merged_dc <- cre_correlates_dc %>%
left_join(cre.2022.dc_modified, by = c("GEOID" = "GEOID_modified"))
# Print the first few rows of the merged dataset
head(merged_dc)
## # A tibble: 6 × 46
## GEOID NAME.x median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 11001000101 Census … 135708 43153 37 41
## 2 11001000102 Census … 159583 70430 108 109
## 3 11001000201 Census … NA NA 64 5
## 4 11001000202 Census … 152059 60482 465 180
## 5 11001000300 Census … 174470 21374 372 173
## 6 11001000400 Census … 188929 71412 242 115
## # ℹ 40 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
We first need to modify the GEO_ID in the CRE data by extracting some of the characters.
# Modify the GEO_ID in cre.2022.nc
cre.2022.nc.mecklenburg_modified <- cre.2022.mecklenburg %>%
mutate(GEOID_modified = str_extract(GEO_ID, "\\d+$")) %>%
mutate(GEOID_modified = str_sub(GEOID_modified, -11)) %>%
relocate(GEOID_modified)
We then join the modified data with the correlates.
# Join data
merged_nc_mecklenburg <- cre_correlates_mecklenburg %>%
left_join(cre.2022.nc.mecklenburg_modified, by = c("GEOID" = "GEOID_modified"))
# Print the first few rows of the merged dataset
head(merged_nc_mecklenburg)
## # A tibble: 6 × 46
## GEOID NAME.x median_incomeE median_incomeM poverty_rateE poverty_rateM
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 37119000101 Census … 106601 16694 170 155
## 2 37119000102 Census … 129693 16100 195 153
## 3 37119000103 Census … 135533 25153 0 14
## 4 37119000104 Census … 109805 8019 158 94
## 5 37119000301 Census … 97411 16618 69 34
## 6 37119000302 Census … 78341 13225 110 70
## # ℹ 40 more variables: unemployment_rateE <dbl>, unemployment_rateM <dbl>,
## # no_health_insuranceE <dbl>, no_health_insuranceM <dbl>,
## # disability_statusE <dbl>, disability_statusM <dbl>,
## # education_less_than_hsE <dbl>, education_less_than_hsM <dbl>,
## # median_ageE <dbl>, median_ageM <dbl>, housing_cost_burdenE <dbl>,
## # housing_cost_burdenM <dbl>, no_vehicleE <dbl>, no_vehicleM <dbl>,
## # black_populationE <dbl>, black_populationM <dbl>, median_rentE <dbl>, …
We explore some initial visualizations to test initial hypotheses.
ggplot(merged_dc, aes(x = median_incomeE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "DC: Median Income vs. Community Resilience Estimate",
x = "Median Income",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
ggplot(merged_nc_mecklenburg, aes(x = median_incomeE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "Mecklenburg County, NC: Median Income vs. Community Resilience Estimate",
x = "Median Income",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For DC
ggplot(merged_dc, aes(x = median_rentE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "DC: Median Rent vs. Community Resilience Estimate",
x = "Median Rent Estimate",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For Mecklenburg County, NC
ggplot(merged_nc_mecklenburg, aes(x = median_rentE, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "Mecklenburg County, NC: Median Rent vs. Community Resilience Estimate",
x = "Median Rent Estimate",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For DC
ggplot(merged_dc, aes(x = prop_black, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "DC: Proportion of Black Population vs. Community Resilience Estimate",
x = "Proportion of Black Population",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
# For Mecklenburg County, NC
ggplot(merged_nc_mecklenburg, aes(x = prop_black, y = PRED3_PE)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm", aes(color = "Linear"), se = FALSE) +
geom_smooth(method = "loess", aes(color = "Non-linear"), se = FALSE) +
labs(title = "Mecklenburg County, NC: Proportion of Black Population vs. Community Resilience Estimate",
x = "Proportion of Black Population",
y = "% Population with 3+ Risk Factors",
color = "Fit Type") +
scale_color_manual(values = c("Linear" = "red", "Non-linear" = "blue")) +
theme_minimal()
Forthcoming.