# -----------------------------
# Load Required Libraries
# -----------------------------
library(tidyverse) # data manipulation and plotting
library(sf) # spatial data handling
library(readxl) # importing Excel files
library(stringr) # string manipulation for GEOID cleaning
library(tidycensus) # census data and tract geometry
library(scales) # transparency settings for maps
This R Markdown document presents the current progress for my spatial analysis project focused on food-related need in Brooklyn, New York. The project examines patterns of economic vulnerability using American Community Survey data, especially SNAP participation and poverty rates. These indicators are mapped at the census tract level and displayed with Neighborhood Tabulation Area boundaries for neighborhood-level context.
At this stage, the project focuses on preparing the spatial and demographic data, joining ACS indicators to census tract geometry, and creating initial visualizations. Later steps will incorporate food infrastructure data, such as farmers markets, community gardens, community fridges, and other food resources, to compare areas of need with areas of food access.
The first dataset used in this project is the 2020 Neighborhood
Tabulation Area boundary file. This file contains a geometry column
stored as Well-Known Text. I convert the data into an sf
spatial object so that it can be mapped and used as a geographic
reference layer.
# Import NTA data
nta <- read_csv("Data/2020_Neighborhood_Tabulation_Areas_(NTAs)_20260501.csv", show_col_types = FALSE)
# Convert to spatial object
nta_sf <- st_as_sf(nta, wkt = "the_geom", crs = 4326)
# Filter to Brooklyn only
bk_nta <- nta_sf %>%
filter(BoroName == "Brooklyn")
This first map is used to confirm that the NTA spatial data imported correctly and that the Brooklyn filter worked. This is an important early data validation step because it confirms that the project’s geographic boundary layer is usable.
# Plot Brooklyn NTAs
ggplot() +
geom_sf(data = bk_nta, fill = "lightblue", color = "black") +
labs(title = "Brooklyn NTA Boundaries") +
theme_minimal()
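Alongside the visual check, the same validation can be done programmatically. The sketch below is not part of the project code: it builds a one-row toy table with a made-up polygon, reuses the `the_geom` and `BoroName` column names from the NTA file, and assumes the `sf` package is installed.

```r
library(sf)

# Toy stand-in for the NTA table: one made-up polygon stored as WKT
toy_nta <- data.frame(
  BoroName = "Brooklyn",
  the_geom = "POLYGON ((-74.00 40.60, -73.90 40.60, -73.90 40.70, -74.00 40.70, -74.00 40.60))"
)

# Same conversion pattern as above: parse the WKT column into sf geometry
toy_sf <- st_as_sf(toy_nta, wkt = "the_geom", crs = 4326)

# Programmatic checks that complement the map
geoms_valid <- all(st_is_valid(toy_sf))           # TRUE when every geometry is valid
crs_code    <- st_crs(toy_sf)$epsg                # EPSG code assigned at conversion
n_brooklyn  <- sum(toy_sf$BoroName == "Brooklyn") # rows that survive the borough filter
```

Running the same three checks on `bk_nta` would give a quick pass/fail signal before any mapping.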
The next step is to import American Community Survey data. The ACS file contains demographic and economic indicators related to SNAP participation, poverty, and race and ethnicity. These variables are used to measure food-related need across Brooklyn.
The Excel file includes an extra descriptive row that is not part of the actual data, so that row is removed. Then, only the variables needed for this stage of the analysis are selected and renamed.
# -----------------------------
# Import and Clean ACS Data
# -----------------------------
# Import ACS Excel file (Census data)
acs <- read_excel("Data/MINE_ACSST5Y2020.S2201.xlsx")
# Remove the first data row (row 2 of the spreadsheet), which holds descriptive labels rather than data
acs_clean <- acs %>%
slice(-1)
# Select only relevant variables and rename for easier use
acs_clean <- acs_clean %>%
select(
GEO_ID,
NAME,
snap = S2201_C04_001E, # % households receiving SNAP
poverty = S2201_C02_021E, # % households below poverty
pct_black = S2201_C02_026E, # % Black households
pct_hispanic = S2201_C02_032E # % Hispanic households
)
The selected ACS variables were originally imported as character values.
Since these variables need to be mapped and analyzed as percentages,
they are converted into numeric format. Some non-numeric Census values
may become NA, which is expected when the original dataset
contains missing or suppressed values.
# Convert selected variables from character to numeric
# (necessary for mapping and analysis)
acs_clean <- acs_clean %>%
mutate(
snap = as.numeric(snap),
poverty = as.numeric(poverty),
pct_black = as.numeric(pct_black),
pct_hispanic = as.numeric(pct_hispanic)
)
# Filter ACS data to Brooklyn (Kings County census tracts only)
acs_bk <- acs_clean %>%
filter(str_detect(NAME, "Kings County"))
# Check cleaned dataset
#glimpse(acs_bk)
# Check summary statistics (data validation step)
# summary(acs_clean$snap)
# summary(acs_clean$poverty)
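The effect of the numeric conversion on non-numeric entries can be illustrated with a small base R sketch. The values are made up, and `"-"` and `"(X)"` are used only as examples of placeholders that may appear in a Census export; I have not confirmed which placeholders occur in this particular file.

```r
# Hypothetical raw values as they might appear in a Census export
raw_snap <- c("12.5", "30.1", "-", "(X)", "8.0")

# as.numeric() returns NA (with a warning) for anything it cannot parse
snap_num <- suppressWarnings(as.numeric(raw_snap))

# Count the values that became NA so the missing share can be reported
n_missing <- sum(is.na(snap_num))

# Summaries need na.rm = TRUE once NAs are present
mean_snap <- mean(snap_num, na.rm = TRUE)
```

Counting the NA values before and after conversion is one way to confirm that nothing beyond the expected placeholders was lost.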
The ACS Excel data contains tract-level values but does not contain
spatial geometry. To map these values, I use tidycensus to
retrieve Brooklyn census tract boundaries. The population variable is
pulled mainly so that the function returns tract-level geometry.
options(tigris_use_cache = TRUE)
# -----------------------------
# Get Brooklyn Census Tract Geometry
# -----------------------------
# Get Brooklyn census tract boundaries using ACS 2020 data
bk_tracts <- get_acs(
geography = "tract",
variables = "B01003_001",
state = "NY",
county = "Kings",
year = 2020,
survey = "acs5",
geometry = TRUE
)
# Check tract geometry
#glimpse(bk_tracts)
The ACS file uses a GEO_ID field that includes a Census
prefix. The tract geometry file uses a shorter GEOID field.
To join the two datasets correctly, I create a matching
GEOID field by removing the Census prefix from the ACS
identifier.
I also remove duplicate tract records as a data validation step, keeping the first row for each GEOID. This ensures that each tract appears only once before the ACS table is joined to the tract geometry.
# -----------------------------
# Prepare ACS Data for Joining
# -----------------------------
# Create a GEOID column from the ACS GEO_ID field
# This removes the Census prefix and keeps only the tract ID
acs_bk <- acs_bk %>%
mutate(GEOID = str_remove(GEO_ID, "1400000US"))
# Keep one row per GEOID (the first occurrence) in case any tract is repeated
# This prevents the join from duplicating tract rows
acs_bk <- acs_bk %>%
distinct(GEOID, .keep_all = TRUE)
# Check that each census tract appears only once
#acs_bk %>%
#count(GEOID) %>%
#filter(n > 1)
# Check that ACS GEOID was created correctly
#glimpse(acs_bk)
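The identifier cleaning and the duplicate check can be tried on a few made-up IDs first. This is a base R sketch rather than the project code: it uses `sub()` with a `^` anchor as a stand-in for `str_remove()`, and the 11-digit pattern assumes tract GEOIDs in Kings County start with the state and county FIPS codes `36047`.

```r
# Hypothetical ACS-style identifiers (the second one is repeated on purpose)
geo_id <- c("1400000US36047000100", "1400000US36047000200", "1400000US36047000200")

# Strip the prefix only when it appears at the start of the string
geoid <- sub("^1400000US", "", geo_id)

# A cleaned tract GEOID should be 11 digits beginning with "36047"
looks_ok <- grepl("^36047[0-9]{6}$", geoid)

# List repeated IDs before dropping them, so duplicates can be inspected first
dup_ids <- unique(geoid[duplicated(geoid)])
```

Inspecting `dup_ids` before calling `distinct()` shows whether the repeated rows are truly identical or carry different values.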
This step joins the cleaned ACS variables to the Brooklyn census tract geometry. The result is a spatial dataset that contains both geometry and demographic indicators. This joined dataset is the main dataset used for the maps in the progress report.
# -----------------------------
# Join ACS Data to Census Tract Geometry
# -----------------------------
# Join SNAP, poverty, and demographic variables to the Brooklyn census tract shapes
# This creates one spatial dataset that has both geometry and ACS data
bk_tracts_acs <- bk_tracts %>%
left_join(acs_bk, by = "GEOID")
# Confirm the join did not duplicate rows
#nrow(bk_tracts_acs)
# Check the joined spatial dataset
# This lets us confirm that the ACS variables were added to the tract geometry
#glimpse(bk_tracts_acs)
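A small sketch of how the join could be validated, using toy tables instead of the real data. The GEOID values and `snap` numbers are made up, and the sketch assumes `dplyr` is installed.

```r
library(dplyr)

# Toy stand-ins: three tract IDs on the geometry side, two on the ACS side
tracts <- tibble(GEOID = c("36047000100", "36047000200", "36047000300"))
acs    <- tibble(GEOID = c("36047000100", "36047000200"), snap = c(12.5, 30.1))

joined <- left_join(tracts, acs, by = "GEOID")

# A left join on a unique key should keep the row count of the left table
same_rows <- nrow(joined) == nrow(tracts)

# Tracts with no ACS match get NA and will be drawn in the NA color on the map
unmatched <- anti_join(tracts, acs, by = "GEOID")
```

Applied to the real tables, a non-empty `unmatched` result would point to GEOIDs that did not line up between the two sources.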
This map shows the percentage of households receiving SNAP benefits by census tract in Brooklyn. SNAP participation is used as one indicator of food-related economic need. Darker areas represent higher SNAP participation, while lighter areas represent lower SNAP participation. NTA boundaries are included as a black outline to provide neighborhood-level context.
# -----------------------------
# Map SNAP Participation with Visible Tract Boundaries
# -----------------------------
# Create a SNAP choropleth map and draw tract boundaries on top
# This keeps missing-data tracts visible while showing their individual outlines
ggplot() +
# Fill census tracts by SNAP percentage
geom_sf(data = bk_tracts_acs, aes(fill = snap), color = NA) +
# Draw all census tract boundaries on top of the fill layer
geom_sf(data = bk_tracts_acs, fill = NA, color = "white", linewidth = 0.08) +
# Draw NTA boundaries in black for neighborhood context
geom_sf(data = bk_nta, fill = NA, color = "black", linewidth = 0.5) +
# Set color scale
scale_fill_gradient(
low = "lightblue",
high = "darkblue",
na.value = scales::alpha("grey60", 0.45)
) +
labs(
title = "Households Receiving SNAP Benefits by Census Tract\nin Brooklyn (with NTA Boundaries)",
fill = "% SNAP"
) +
theme_minimal()
This map shows the percentage of households below poverty by census tract in Brooklyn. Poverty is used as a broader measure of economic vulnerability. Comparing this map with the SNAP map helps identify whether food assistance participation and poverty follow similar spatial patterns.
# -----------------------------
# Map Poverty Rate by Census Tract
# -----------------------------
# Create a choropleth map showing percent of households below poverty
ggplot() +
geom_sf(data = bk_tracts_acs, aes(fill = poverty), color = NA) +
# Add tract boundaries
geom_sf(data = bk_tracts_acs, fill = NA, color = "white", linewidth = 0.08) +
# Add NTA boundaries
geom_sf(data = bk_nta, fill = NA, color = "black", linewidth = 0.4) +
scale_fill_gradient(
low = "lightgreen",
high = "darkgreen",
na.value = scales::alpha("grey60", 0.45)
) +
labs(
title = "Households Below Poverty by Census Tract in Brooklyn (with NTA Boundaries)",
fill = "% Poverty"
) +
theme_minimal()
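The visual comparison of the two maps can be supplemented with a simple correlation. The sketch below uses made-up tract values, not project results, and it is a non-spatial summary: it says whether the two indicators move together across tracts, not where they do.

```r
# Made-up tract values; NA stands for a tract with missing or suppressed data
snap    <- c(5, 12, 20, 35, NA, 28)
poverty <- c(8, 10, 18, 30, 22, NA)

# Use only tracts where both indicators are present
r_pearson  <- cor(snap, poverty, use = "complete.obs")
r_spearman <- cor(snap, poverty, use = "complete.obs", method = "spearman")
```

With the real data, the same call would take the `snap` and `poverty` columns of the joined tract dataset.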
To combine SNAP and poverty into one index, both variables need to be placed on the same scale. I standardize each variable using z-scores. This makes the two indicators comparable before combining them.
# -----------------------------
# Create Standardized Need Variables
# -----------------------------
bk_tracts_acs <- bk_tracts_acs %>%
mutate(
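# scale() returns a one-column matrix, so as.numeric() is used to keep plain numeric columns
snap_z = as.numeric(scale(snap)),
poverty_z = as.numeric(scale(poverty))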
)
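To make the standardization concrete, here is a minimal sketch that computes z-scores by hand and then combines them. The input values are made up, and the equal-weight average is only an assumption about how the two indicators might be combined; the final weighting for the project has not been decided here.

```r
# Made-up percentages for five tracts; NA stands for a tract with missing data
snap    <- c(10, 20, 30, 40, NA)
poverty <- c(5, 15, 25, 35, 45)

# Z-score by hand: subtract the mean, then divide by the standard deviation
z <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

snap_z    <- z(snap)
poverty_z <- z(poverty)

# ASSUMPTION: equal-weight average of the two z-scores as a composite index
# A tract missing either indicator gets NA for the index
need_index <- (snap_z + poverty_z) / 2
```

Positive index values mark tracts above the Brooklyn average on the combined measure, and negative values mark tracts below it.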
So far, I have completed the main data preparation steps for the need side of the project. I imported and converted Brooklyn NTA boundaries, cleaned ACS demographic data, pulled Brooklyn census tract geometry, joined the ACS variables to the tract boundaries, created initial maps of SNAP participation and poverty, and standardized both indicators as the first step toward a composite need index.
The next step will be to incorporate food infrastructure datasets, including CEANYC food resource data and farmers market locations. These point datasets will be converted into spatial objects and compared against the need index to identify areas where high need may not be matched by nearby food infrastructure.
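One possible shape for that comparison is sketched below with toy data. Everything in it is hypothetical: the two square "tracts", the `need_index` values, the resource names, and the `lon`/`lat` column names are placeholders, since the real food infrastructure files have not been imported yet. It assumes the `sf` package is installed.

```r
library(sf)

# Helper that builds a small square polygon from its lower-left corner
square <- function(xmin, ymin, size = 0.1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin), c(xmin + size, ymin + size),
    c(xmin, ymin + size), c(xmin, ymin)
  )))
}

# Two toy "tracts" with a hypothetical need_index value each
tracts <- st_sf(
  GEOID      = c("A", "B"),
  need_index = c(-0.4, 1.2),
  geometry   = st_sfc(square(-74.0, 40.6), square(-73.8, 40.6), crs = 4326)
)

# Hypothetical food resource points with longitude/latitude columns
resources <- data.frame(
  name = c("market 1", "garden 1", "fridge 1"),
  lon  = c(-73.95, -73.92, -73.50),
  lat  = c(40.65, 40.68, 40.90)
)
resources_sf <- st_as_sf(resources, coords = c("lon", "lat"), crs = 4326)

# Count how many points fall inside each tract polygon
tracts$n_resources <- lengths(st_intersects(tracts, resources_sf))

# Tracts with above-average need and no resources are candidate gaps
gaps <- tracts$GEOID[tracts$need_index > 0 & tracts$n_resources == 0]
```

Counting points inside each tract ignores resources just across a tract boundary, so a distance-based measure may be a better fit once the real data are in place.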