DATA 606 Data Project Proposal

Part 1 - Introduction

This project explores food access and pricing inequality in NYC, focusing on differences between Brownsville (Brooklyn) and Lower Manhattan. The motivating observation is that essential items (e.g., eggs) can be more expensive in Brooklyn bodegas than at “expensive” chains (Whole Foods, Trader Joe’s) in Manhattan.

Research question: Are staple grocery items more expensive and/or less accessible in low-income neighborhoods compared with higher-income neighborhoods in NYC?

Why it matters: The results bear on food justice, urban planning, and the allocation of public resources.

## Part 2 - Data

Cases. Each case is a geographic area (ZIP code or census tract), summarized by store counts and density and by demographics (income, poverty rate, SNAP participation). Optionally, the data will include a small self-collected price table (4–6 staple items).

Collection method. Public administrative datasets (NYC Open Data, USDA), plus optional self-collected prices.

Type of study. Observational.

Data sources (for later):

NYC Open Data – food retailers / FRESH

USDA Food Access Research Atlas

U.S. Census / ACS

For this test, we’ll use a tiny toy dataset so knitting works without any external files.

# ---------------------------------------------------------------
# NOTE: This "toy_store_counts" dataset is a TEMPORARY PLACEHOLDER
# It allows the R Markdown file to knit successfully before I
# download and import the real datasets (NYC Open Data, USDA, ACS).
#
# The toy data below simply mimics the structure of the actual data
# I will use later (area_id, n_stores, population, income, poverty).
# Once I have real data, I will replace this section with code that
# reads and joins my CSVs, as shown in later instructions.
# ---------------------------------------------------------------

library(tibble)   # tribble()
library(dplyr)    # mutate()
library(ggplot2)  # ggplot()

toy_store_counts <- tribble(
  ~area_id,               ~n_stores, ~population, ~median_income, ~poverty_rate,
  "Brownsville_11212",           35,       86000,          36500,          0.27,
  "LowerManhattan_10013",        80,       62000,         112000,          0.11
) |>
  mutate(store_density_per_10k = n_stores / (population / 10000))

# Quick summary of store density
toy_store_counts

## # A tibble: 2 × 6
##   area_id   n_stores population median_income poverty_rate store_density_per_10k
##   <chr>        <dbl>      <dbl>         <dbl>        <dbl>                 <dbl>
## 1 Brownsvi…       35      86000         36500         0.27                  4.07
## 2 LowerMan…       80      62000        112000         0.11                 12.9

summary(toy_store_counts$store_density_per_10k)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   4.070   6.278   8.486   8.486  10.695  12.903
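
The placeholder comment above says the toy table will later be replaced by code that reads and joins CSVs. A minimal sketch of that step follows, assuming one file of store counts and one of ACS demographics, both keyed by ZIP code. The file layout and the column name `zip` are hypothetical; the two temporary CSVs only reuse the toy values so the sketch runs without external files.

```r
library(readr)
library(dplyr)

# Hypothetical stand-ins for the real downloads: two tiny CSVs written to
# temporary files, holding the same placeholder values as the toy table.
stores_path <- tempfile(fileext = ".csv")
acs_path    <- tempfile(fileext = ".csv")
writeLines(c("zip,n_stores", "11212,35", "10013,80"), stores_path)
writeLines(c("zip,population,median_income,poverty_rate",
             "11212,86000,36500,0.27",
             "10013,62000,112000,0.11"), acs_path)

# Read ZIP codes as character so they are treated as identifiers, not numbers.
stores <- read_csv(stores_path, col_types = cols(zip = col_character()))
acs    <- read_csv(acs_path,    col_types = cols(zip = col_character()))

# One row per area: join on ZIP, then compute stores per 10,000 residents.
area_data <- stores |>
  left_join(acs, by = "zip") |>
  mutate(store_density_per_10k = n_stores / (population / 10000))

area_data
```

Using `left_join()` keeps every area that has a store count even when its demographics are missing, which makes unmatched ZIP codes visible as `NA` rows instead of silently dropping them.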

# Simple visualization (toy example)
ggplot(toy_store_counts, aes(median_income, store_density_per_10k, label = area_id)) +
  geom_point(size = 3) +
  geom_text(nudge_y = 0.2, size = 3) +
  labs(
    title = "Income vs Store Density (toy example)",
    x = "Median household income ($)",
    y = "Stores per 10,000 residents"
  )

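# Toy model to illustrate structure only. With just two areas the fit is
# saturated: poverty_rate cannot be estimated (NA) and no standard errors exist.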
toy_model <- lm(store_density_per_10k ~ median_income + poverty_rate, data = toy_store_counts)
summary(toy_model)

## 
## Call:
## lm(formula = store_density_per_10k ~ median_income + poverty_rate, 
##     data = toy_store_counts)
## 
## Residuals:
## ALL 2 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (1 not defined because of singularities)
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -0.200712        NaN     NaN      NaN
## median_income  0.000117        NaN     NaN      NaN
## poverty_rate         NA         NA      NA       NA
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 1 and 0 DF,  p-value: NA
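
The `NaN` and `NA` entries in the output above follow from the residual degrees of freedom, which equal the number of areas minus the number of coefficients that can be estimated. The sketch below rebuilds the two toy rows in a base R data frame (placeholder values, not real measurements) and checks that count directly; the object names `toy` and `fit` are illustrative.

```r
# Toy rows copied from the table above (placeholders, not real data).
toy <- data.frame(
  median_income         = c(36500, 112000),
  poverty_rate          = c(0.27, 0.11),
  store_density_per_10k = c(35 / 8.6, 80 / 6.2)
)

fit <- lm(store_density_per_10k ~ median_income + poverty_rate, data = toy)

n_obs  <- nrow(toy)   # number of areas
n_coef <- fit$rank    # coefficients that could actually be estimated

n_obs - n_coef        # residual degrees of freedom
df.residual(fit)      # same quantity, as reported by lm()
is.na(coef(fit))      # flags the coefficient dropped as not estimable
```

For this model (an intercept plus two predictors), at least four areas are needed before any residual degree of freedom is left, so the real analysis needs many more ZIP codes or tracts than the two used here.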

## Part 5 - Conclusion

This analysis will quantify disparities in grocery access (and optionally staple prices) between NYC neighborhoods, informing equity and policy discussions.

## References

NYC Open Data – Food retail/FRESH

USDA ERS – Food Access Research Atlas

U.S. Census Bureau – ACS

Kevin Martin

2025-10-13
