Beyond the Bullet: How Structural Disadvantage Shapes Gun Violence in NYC Neighborhoods (2023)

Introduction

This analysis explores the relationship between structural disadvantage and gun violence across New York City neighborhoods in 2023. While shootings have declined overall, violence remains concentrated in specific areas—often those facing overlapping environmental and social burdens.

To examine this connection, the study uses the Disadvantage Score developed by NYSERDA and the New York State Climate Justice Working Group. This composite score incorporates 45 indicators across domains such as pollution exposure, healthcare access, socioeconomic vulnerability, and climate risk—generating a percentile-based metric (0 to 1) that reflects neighborhood-level structural disadvantage.

By applying beta regression modeling and simulation, the analysis identifies which factors most strongly influence predicted disadvantage scores. Findings reveal that both environmental conditions (e.g., particulate matter, limited green space, benzene exposure) and population-level challenges (e.g., asthma rates, healthcare access) significantly raise a neighborhood’s Disadvantage Score.

This data-driven approach highlights how structural factors intersect to shape community vulnerability. The Disadvantage Score proves to be a valuable tool for identifying high-need neighborhoods and informing targeted public health, planning, and safety interventions.

Research Objectives

This research investigates how social and environmental factors intersect to influence gun violence across New York City neighborhoods, using the Disadvantage Score as a central tool for analysis.

- Evaluate the Disadvantage Score as an appropriate measure for this analysis

The study assesses whether NYSERDA’s Disadvantage Score accurately captures the types of neighborhood vulnerability associated with gun violence. Results confirm that the score reflects both environmental and population-level risk factors, making it a meaningful tool for identifying high-need areas and guiding interventions.

Data Sources

This analysis relies on two primary data sets:

NYPD Shooting Incident Data (2023):
• Obtained from the New York City Police Department (NYPD), this data set provides detailed information on all recorded shootings in NYC from 2006 to 2023.
• Includes variables such as incident date/time, location (latitude/longitude), victim demographics (age, race, sex), and whether the shooting was fatal or non-fatal.
• This data set enables the calculation of shooting rates per 100,000 residents at the neighborhood level by matching incidents to NYC census tracts.
New York City Police Department (n.d.)

New York State Disadvantaged Communities Data (2023):
• Published by the New York State Energy Research and Development Authority (NYSERDA), this dataset identifies disadvantaged communities based on a combination of environmental burdens, climate risks, and socioeconomic and health-related vulnerabilities.
• Neighborhoods are assigned a Disadvantage Score ranging from 0 (least disadvantaged) to 1 (most disadvantaged), determined by indicators such as poverty rates, unemployment, median income, racial composition, housing burden, and environmental risks.
• This dataset provides a quantitative measure of neighborhood-level disadvantage, which is used to assess its relationship with gun violence.
New York State Energy Research and Development Authority (NYSERDA) (2023b)

By integrating these two datasets, this study evaluates whether higher levels of neighborhood disadvantage correlate with increased rates of gun violence.

Hypothesis:

This study tests whether structural disadvantage, measured through combined environmental and social burdens, can help explain where gun violence is most likely to occur in New York City.

Gun violence is more likely to be concentrated in tracts with both environmental and social issues, as these overlapping burdens reflect long-standing neglect and structural vulnerabilities. Neighborhoods facing poor air quality, lack of green space, and limited access to healthcare, combined with socioeconomic stressors, are likely to experience the compounding effects that contribute to higher rates of violence. These areas represent priority zones for targeted intervention.

Neighborhoods with elevated Disadvantage Scores are more likely to experience gun violence, reinforcing the score’s value as an essential indicator. Because the Disadvantage Score reflects cumulative burdens across environmental, health, and economic domains, it captures the structural conditions that leave communities more exposed to violence. If used strategically, the score can guide resource allocation and policy to address root causes.

Methodology

This research was carried out in three key stages: spatial analysis of gun violence, regression analysis of gun violence and disadvantage, and simulation modeling of structural factors contributing to disadvantage.

1. Geo-Spatial Analysis: Mapping Shootings to Census Tracts

To enable neighborhood-level comparisons, individual NYPD shooting incidents were spatially joined to NYC census tracts using R’s sf package. Since Disadvantage Scores are calculated at the census tract level, this spatial join allowed for the total number of shootings per tract to be computed. From there, tract-level shooting rates (per 100,000 people) were generated to standardize comparison across communities.
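The counting and rate step can be sketched in base R. This is a minimal illustration rather than the study's code. It assumes the spatial join has already attached a tract `GEOID` to each shooting, as `st_join()` does in the top-15 table code. It also assumes that tract populations are stored in a column named `Population`, which is a hypothetical name. The tract IDs `T1` to `T3` and their populations are made up.

```r
# Minimal sketch: shootings per 100,000 residents by census tract.
# Assumes each shooting already carries its tract GEOID (e.g., after sf::st_join()).
# Tract IDs, the Population column name, and all values are hypothetical.
shootings_joined <- data.frame(GEOID = c("T1", "T1", "T2", "T1"))
tracts <- data.frame(
  GEOID      = c("T1", "T2", "T3"),
  Population = c(4000, 2500, 0)
)

# Count shootings per tract
counts <- as.data.frame(table(shootings_joined$GEOID), stringsAsFactors = FALSE)
names(counts) <- c("GEOID", "Shooting_Count")

# Keep every tract; tracts with no shootings get a count of 0
rates <- merge(tracts, counts, by = "GEOID", all.x = TRUE)
rates$Shooting_Count[is.na(rates$Shooting_Count)] <- 0

# Standardize to a rate per 100,000; undefined (NA) for zero-population tracts
rates$Shootings_per_100k <- ifelse(
  rates$Population > 0,
  rates$Shooting_Count / rates$Population * 1e5,
  NA_real_
)
rates
```

Keeping zero-shooting tracts matters because dropping them would exclude exactly the low-violence tracts the regression needs for comparison.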

2. Regression Analysis: Disadvantage Score & Shootings per 100k

To quantify the relationship between neighborhood disadvantage and gun violence, a simple linear regression was performed. Disadvantage Score was the independent variable and shootings per 100,000 residents was the outcome, using tracts at or below the 95th percentile of shooting rates. The results show that each 0.01 increase in Disadvantage Score (one percentile point) is associated with approximately 0.30 additional shootings per 100,000 people. A 0.10 increase therefore corresponds to about 3 additional shootings per 100,000. The slope was statistically significant (t ≈ 17.3, p < 0.001), indicating a positive association between structural disadvantage and elevated gun violence.

3. Beta Regression & Simulation Modeling

To explore which structural factors most influence Disadvantage Scores, a beta regression model was used, which is appropriate for outcomes bounded between 0 and 1. The model evaluated 44 social, health, and environmental predictors. For each predictor, simulations estimated how the predicted Disadvantage Score changes as that predictor moves from its minimum to its maximum observed value, while the other variables are held constant. This approach isolates the effect of each variable so that predictors can be ranked by impact, revealing key drivers of structural vulnerability across NYC neighborhoods.
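The beta regression and min-to-max simulation can be illustrated with a self-contained base R sketch. It is not the study's model, for four reasons:

- The data are simulated.
- Only two hypothetical predictors (`x1`, `x2`) are used instead of 44.
- The model is fitted by maximum likelihood with `optim()` rather than a dedicated package such as betareg.
- "Holding other variables constant" is implemented by fixing them at their means, which is an assumption.

A percentile rank can equal exactly 0 or 1 (the top tract in the table below scores 1.000000), so the sketch also includes the Smithson and Verkuilen (2006) adjustment that squeezes such values inside (0, 1).

```r
# Minimal sketch of beta regression plus min-to-max simulation (base R only).
# Simulated data and predictor names (x1, x2) are hypothetical, not the study's 44 predictors.
set.seed(42)

# Beta regression needs y strictly inside (0, 1). Percentile ranks can equal 0 or 1,
# so apply the Smithson & Verkuilen (2006) squeeze toward 0.5.
squeeze <- function(y) (y * (length(y) - 1) + 0.5) / length(y)

# Simulate an outcome whose mean depends strongly on x1 and weakly on x2
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
mu_true  <- plogis(-1 + 2 * x1 + 0.5 * x2)
phi_true <- 20  # precision: larger values mean less spread around the mean
y <- squeeze(rbeta(n, mu_true * phi_true, (1 - mu_true) * phi_true))

X <- cbind(Intercept = 1, x1 = x1, x2 = x2)
k <- ncol(X)

# Negative log-likelihood for y ~ Beta(mu * phi, (1 - mu) * phi) with logit(mu) = X %*% beta
neg_loglik <- function(par) {
  beta <- par[1:k]
  phi  <- exp(par[k + 1])  # optimize log(phi) so phi stays positive
  mu   <- plogis(drop(X %*% beta))
  -sum(dbeta(y, mu * phi, (1 - mu) * phi, log = TRUE))
}

# Start from OLS on the logit-transformed outcome, then maximize the likelihood
start <- c(unname(coef(lm(qlogis(y) ~ x1 + x2))), log(10))
fit <- optim(start, neg_loglik, method = "BFGS", control = list(maxit = 500))
beta_hat <- setNames(fit$par[1:k], colnames(X))

# Min-to-max simulation: move one predictor from its observed minimum to its maximum,
# hold the other predictors at their means, and record the change in the predicted mean
predict_mu <- function(x_row) plogis(sum(x_row * beta_hat))
effects <- sapply(c("x1", "x2"), function(v) {
  low  <- colMeans(X); low[v]  <- min(X[, v])
  high <- colMeans(X); high[v] <- max(X[, v])
  predict_mu(high) - predict_mu(low)
})

sort(effects, decreasing = TRUE)  # predictors ranked by simulated impact
```

Because the logit link is nonlinear, the simulated effect of a predictor depends on where the other predictors are held. Fixing them at their means is one convention; other reference points would give somewhat different rankings.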

2023 Most Disadvantaged Neighborhoods in NYC with Score

# Load necessary libraries
library(sf)
library(leaflet)
library(dplyr)


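# Assumes `neighborhoods` (the NYSERDA DAC table, with WKT polygons in `the_geom`) is already loaded
# Convert WKT geometry to sf object
neighborhoods_sf <- st_as_sf(neighborhoods, wkt = "the_geom", crs = 4326)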

# Ensure geometries are valid
neighborhoods_sf <- st_make_valid(neighborhoods_sf)

# Filter for NYC bounding box (approximate boundaries)
nyc_sf <- neighborhoods_sf %>%
  st_filter(st_as_sfc(st_bbox(c(xmin = -74.2591, xmax = -73.7004, ymin = 40.4774, ymax = 40.9176), crs = 4326)))

# Filter only areas that are "Designated as DAC"
nyc_disadvantaged_sf <- nyc_sf %>%
  filter(DAC_Designation == "Designated as DAC")

# Check if there are valid rows left
if (nrow(nyc_disadvantaged_sf) == 0) {
  stop("No areas found with DAC_Designation == 'Designated as DAC'. Verify column values.")
}

# Create an interactive leaflet map highlighting DAC-designated areas
leaflet(nyc_disadvantaged_sf) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(
    fillColor = "lightblue",
    color = "blue",
    weight = 1,
    opacity = 0.7,
    fillOpacity = 0.6,
    highlightOptions = highlightOptions(
      color = "yellow", weight = 2, bringToFront = TRUE
    ),
    label = lapply(
      paste0(
        "<div style='font-size:14px; background-color:white; padding:5px; border-radius:5px; border:1px solid #ccc;'>",
        "<b>GEOID:</b> ", nyc_disadvantaged_sf$GEOID, "<br>",
        "<b>County:</b> ", nyc_disadvantaged_sf$County, "<br>",
        "<b>Percentile Rank Combined NYC:</b> ", nyc_disadvantaged_sf$Percentile_Rank_Combined_NYC,
        "</div>"
      ),
      htmltools::HTML
    ),
    labelOptions = labelOptions(
      direction = "auto",
      textsize = "12px",
      style = list(
        "color" = "black",
        "background-color" = "white",
        "border" = "1px solid gray",
        "padding" = "6px",
        "border-radius" = "6px"
      )
    )
  ) %>%
  addLegend(
    position = "bottomright",
    colors = "lightblue",
    labels = "Most Disadvantaged Neighborhoods",
    opacity = 1.0
  ) %>%
  setView(lng = -74.0060, lat = 40.7128, zoom = 11)

The map above highlights all New York City census tracts identified as disadvantaged based on the 2023 criteria established by the New York State Energy Research and Development Authority (NYSERDA). A tract is designated as disadvantaged if it receives a combined percentile score of 0.30 or higher, reflecting cumulative burdens across 45 indicators spanning environmental exposure, health vulnerabilities, and socioeconomic conditions. “Final Disadvantaged Communities (DAC) Overview” (2023)

These highlighted tracts are geographically concentrated in historically underserved areas, including large portions of the Bronx, Central and East Brooklyn, Southeast Queens, Upper Manhattan, and the North Shore of Staten Island. The spatial clustering of disadvantaged neighborhoods underscores systemic patterns of disinvestment and structural inequality. This geographic visualization serves as a foundational layer for analyzing how neighborhood disadvantage overlaps with public health and safety concerns, including patterns of gun violence.

Shootings in NYC Since 2006 with Statistical Markers

This map visualizes 18 years of shooting incidents across New York City (2006 through 2023), using point-level data from the NYPD Shooting Incident (Historic) dataset. Each red dot represents a single shooting event. These incidents are geocoded and include detailed statistical markers such as:

Date & time of the incident
Police precinct where the shooting occurred
Whether the shooting was fatal or non-fatal
Victim demographics, including age group, sex, and race

By displaying each incident spatially, the map highlights the geographic concentration of gun violence in specific neighborhoods, particularly in the Bronx, Brooklyn, and parts of Queens. This spatial context helps to better understand how gun violence is distributed across the city and serves as a foundation for correlating shootings with structural factors such as neighborhood disadvantage.

# Load necessary libraries
library(leaflet)
library(dplyr)
library(lubridate)

# Clean and prepare the shooting data
location.data <- shootings %>%
  mutate(
    Latitude = as.numeric(Latitude),
    Longitude = as.numeric(Longitude),
    OCCUR_DATE = mdy(OCCUR_DATE),
    OCCUR_DATE_FORMATTED = format(OCCUR_DATE, "%B %d, %Y"),
    STATISTICAL_MURDER_FLAG = ifelse(STATISTICAL_MURDER_FLAG, "Fatal", "Non-Fatal")
  ) %>%
  filter(
    !is.na(Latitude) & !is.na(Longitude),
    Latitude >= 40.4774 & Latitude <= 40.9176,
    Longitude >= -74.2591 & Longitude <= -73.7004
  ) %>%
  select(
    Longitude, Latitude, OCCUR_DATE_FORMATTED, OCCUR_TIME, PRECINCT, 
    STATISTICAL_MURDER_FLAG, VIC_AGE_GROUP, VIC_SEX, VIC_RACE
  )

# Create the Leaflet map
leaflet(location.data) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    ~Longitude, ~Latitude,
    radius = 2,
    color = "red",
    fillColor = "red",
    fillOpacity = 0.2,
    popup = ~paste0(
      "<div style='font-size:14px; background-color:white; padding:5px; border-radius:5px; border:1px solid #ccc;'>",
      "<b>Date:</b> ", OCCUR_DATE_FORMATTED, "<br>",
      "<b>Time:</b> ", OCCUR_TIME, "<br>",
      "<b>Precinct:</b> ", PRECINCT, "<br>",
      "<b>Outcome:</b> ", STATISTICAL_MURDER_FLAG, "<br>",
      "<b>Victim Age Group:</b> ", VIC_AGE_GROUP, "<br>",
      "<b>Victim Sex:</b> ", VIC_SEX, "<br>",
      "<b>Victim Race:</b> ", VIC_RACE,
      "</div>"
    )
  ) %>%
  setView(lng = -74.0060, lat = 40.7128, zoom = 12)

Heatmap of Shootings in NYC since 2006

# Load necessary libraries
library(leaflet)
library(leaflet.extras)  # Enables heatmap functionality
library(dplyr)
library(lubridate)  # For date handling

# Process data: filter and format relevant columns
location.data <- shootings %>%
  filter(!is.na(Latitude) & !is.na(Longitude)) %>%  # Remove rows with missing coordinates
  mutate(
    Latitude = as.numeric(Latitude),
    Longitude = as.numeric(Longitude),
    OCCUR_DATE = mdy(OCCUR_DATE)  # Convert date format
  ) %>%
  select(Longitude, Latitude, OCCUR_DATE)

# NYC Map with a red-intense heatmap layer
leaflet(data = location.data) %>%
  addProviderTiles("CartoDB.Positron") %>%  # Cleaner map style
  addHeatmap(
    lng = ~Longitude, lat = ~Latitude,
    intensity = 7,  # Constant weight applied to every shooting point
    blur = 8,  # Amount of blur; lower values give sharper-edged hot spots
    max = 15,  # Intensity at which the color scale saturates
    radius = 25,  # Radius of each point's influence, in pixels
    gradient = c("transparent", "#ffd700", "#ffa500", "#990000")  # Low to high: transparent, gold, orange, dark red
  ) %>%
  setView(lng = -74.0060, lat = 40.7128, zoom = 12)

Using the same point-level data from the NYPD Shooting Incident dataset, I created this heat map to emphasize where gun violence has been most concentrated in New York City between 2006 and 2023. Each incident contributes equal weight, so darker red areas indicate higher densities of shootings.

This spatial representation highlights Brooklyn and the Bronx as the city’s most heavily impacted boroughs—showing persistent hotspots of gun violence across nearly two decades. These findings align with historical patterns of disinvestment, population vulnerability, and structural disadvantage. Areas of Queens and Staten Island show far lower concentrations, while parts of Manhattan display moderate density in select neighborhoods.

By shifting from a traditional dot map to this heat map format, the spatial disparities in exposure to gun violence become even more visible—making it easier to identify priority zones for intervention and resources.

Statistical Summary of Shootings in NYC by Borough and Fatality Status

library(modelsummary)

datasummary_crosstab(BORO ~ STATISTICAL_MURDER_FLAG, data = shootings)

BORO		false	true	All
BRONX	N	6742	1634	8376
	% row	80.5	19.5	100.0
BROOKLYN	N	9136	2210	11346
	% row	80.5	19.5	100.0
MANHATTAN	N	3090	672	3762
	% row	82.1	17.9	100.0
QUEENS	N	3431	840	4271
	% row	80.3	19.7	100.0
STATEN ISLAND	N	637	170	807
	% row	78.9	21.1	100.0
All	N	23036	5526	28562
	% row	80.7	19.3	100.0

This table builds on the spatial analysis by providing a breakdown of all reported shootings in New York City from 2006 to 2023, categorized by borough and whether or not the incident was fatal. The columns labeled “false” and “true” indicate non-fatal and fatal shootings, respectively, while the “% row” values display the percentage distribution within each borough.

Brooklyn and the Bronx stand out not only for their overall number of shootings—11,346 and 8,376, respectively—but also for having similar fatality rates, with about 19.5% of all recorded incidents resulting in death. Manhattan had fewer total shootings (3,762), and the lowest fatality rate at just under 18%. Queens and Staten Island experienced fewer shootings overall, with Staten Island seeing the fewest (807), though its share of fatal shootings was slightly higher at 21.1%.
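The "% row" values are row proportions of the count table. The base R sketch below uses the counts shown in the crosstab above and reproduces those percentages with `prop.table()`. The column labels "Non-fatal" and "Fatal" correspond to "false" and "true" in the original output.

```r
# Shooting counts by borough and fatality status, copied from the crosstab above
counts <- matrix(
  c(6742, 1634,
    9136, 2210,
    3090,  672,
    3431,  840,
     637,  170),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    BORO    = c("BRONX", "BROOKLYN", "MANHATTAN", "QUEENS", "STATEN ISLAND"),
    Outcome = c("Non-fatal", "Fatal")
  )
)

# "% row": each borough's counts divided by that borough's total
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

# Citywide share of shootings that were fatal
city_fatal_pct <- round(100 * sum(counts[, "Fatal"]) / sum(counts), 1)

row_pct
city_fatal_pct
```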

This breakdown adds another layer to the citywide picture of gun violence, revealing how not only the volume but also the severity of incidents can vary across boroughs. It reinforces the need for place-based strategies that are responsive to both the frequency and fatality of gun violence in different parts of the city.

2023 NYC Shootings with Demographics and Statistical Markers

# Load necessary libraries
library(leaflet)
library(dplyr)
library(lubridate)

# Prepare and filter the shooting data for 2023
location.data <- shootings %>%
  mutate(
    OCCUR_DATE = mdy(OCCUR_DATE),
    Latitude = as.numeric(Latitude),
    Longitude = as.numeric(Longitude),
    OCCUR_DATE_FORMATTED = format(OCCUR_DATE, "%B %d, %Y"),
    STATISTICAL_MURDER_FLAG = ifelse(STATISTICAL_MURDER_FLAG, "Fatal", "Non-Fatal")
  ) %>%
  filter(
    year(OCCUR_DATE) == 2023,
    !is.na(Latitude) & !is.na(Longitude),
    Latitude >= 40.4774 & Latitude <= 40.9176,
    Longitude >= -74.2591 & Longitude <= -73.7004
  ) %>%
  select(Longitude, Latitude, OCCUR_DATE_FORMATTED, OCCUR_TIME, PRECINCT,
         STATISTICAL_MURDER_FLAG, VIC_AGE_GROUP, VIC_SEX, VIC_RACE)

# Create the Leaflet map
leaflet(data = location.data) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addCircleMarkers(
    ~Longitude, ~Latitude,
    radius = 4,
    color = "red",
    fillColor = "red",
    fillOpacity = 0.5,
    popup = ~paste0(
      "<b>Date:</b> ", OCCUR_DATE_FORMATTED, "<br>",
      "<b>Time:</b> ", OCCUR_TIME, "<br>",
      "<b>Precinct:</b> ", PRECINCT, "<br>",
      "<b>Outcome:</b> ", STATISTICAL_MURDER_FLAG, "<br>",
      "<b>Victim Age Group:</b> ", VIC_AGE_GROUP, "<br>",
      "<b>Victim Sex:</b> ", VIC_SEX, "<br>",
      "<b>Victim Race:</b> ", VIC_RACE
    )
  ) %>%
  setView(lng = -74.0060, lat = 40.7128, zoom = 12)

This map visualizes all recorded shooting incidents in New York City during 2023, using point-level data from the NYPD Shooting Incident dataset. Each red dot represents a single event and includes key statistical markers such as:

Date and time of the incident

Police precinct where it occurred

Fatality status (fatal or non-fatal)

Victim demographics including age group, sex, and race

By mapping these incidents, this visualization provides insight into the characteristics of gun violence citywide in 2023, supporting further analysis of who is most affected and under what conditions these events take place.

2023 NYC Shootings and Disadvantaged Neighborhoods Map

# Load necessary libraries
library(sf)
library(leaflet)
library(dplyr)
library(lubridate)

# --- Process Shooting Data ---
shooting_data <- shootings %>%
  mutate(
    OCCUR_DATE = mdy(OCCUR_DATE),
    Latitude = as.numeric(Latitude),
    Longitude = as.numeric(Longitude),
    STATISTICAL_MURDER_FLAG = ifelse(STATISTICAL_MURDER_FLAG, "Fatal", "Non-Fatal")
  ) %>%
  filter(
    year(OCCUR_DATE) == 2023,
    !is.na(Latitude) & !is.na(Longitude),
    Latitude >= 40.4774 & Latitude <= 40.9176,
    Longitude >= -74.2591 & Longitude <= -73.7004
  ) %>%
  select(Longitude, Latitude, OCCUR_DATE, OCCUR_TIME, PRECINCT,
         STATISTICAL_MURDER_FLAG, VIC_AGE_GROUP, VIC_SEX, VIC_RACE)

# --- Process DAC Data ---
neighborhoods_sf <- st_as_sf(neighborhoods, wkt = "the_geom", crs = 4326)
neighborhoods_sf <- st_make_valid(neighborhoods_sf)

nyc_sf <- neighborhoods_sf %>%
  st_filter(st_as_sfc(st_bbox(c(
    xmin = -74.2591, xmax = -73.7004,
    ymin = 40.4774, ymax = 40.9176
  ), crs = 4326)))

nyc_disadvantaged_sf <- nyc_sf %>%
  filter(
    DAC_Designation == "Designated as DAC",
    Percentile_Rank_Combined_NYC > 0
  )

# --- Create Leaflet Map ---
leaflet() %>%
  addProviderTiles("CartoDB.Positron") %>%

  # Add DAC polygons
  addPolygons(
    data = nyc_disadvantaged_sf,
    fillColor = "lightblue",
    color = "blue",
    weight = 1,
    opacity = 0.7,
    fillOpacity = 0.6,
    group = "Disadvantaged Neighborhoods",
    label = ~paste0(
      "<b>DAC Status:</b> ", DAC_Designation, "<br>",
      "<b>Percentile Rank Combined NYC:</b> ", Percentile_Rank_Combined_NYC
    ) %>% lapply(htmltools::HTML),
    highlightOptions = highlightOptions(color = "yellow", weight = 2, bringToFront = TRUE)
  ) %>%

  # Add shooting points
  addCircleMarkers(
    data = shooting_data,
    lng = ~Longitude,
    lat = ~Latitude,
    radius = 2,
    color = "red",
    fillOpacity = 0.5,
    group = "Shooting Incidents",
    popup = ~paste0(
      "<b>Date:</b> ", OCCUR_DATE, "<br>",
      "<b>Time:</b> ", OCCUR_TIME, "<br>",
      "<b>Precinct:</b> ", PRECINCT, "<br>",
      "<b>Outcome:</b> ", STATISTICAL_MURDER_FLAG, "<br>",
      "<b>Victim Age Group:</b> ", VIC_AGE_GROUP, "<br>",
      "<b>Victim Sex:</b> ", VIC_SEX, "<br>",
      "<b>Victim Race:</b> ", VIC_RACE
    )
  ) %>%

  # Add map controls
  addLayersControl(
    overlayGroups = c("Disadvantaged Neighborhoods", "Shooting Incidents"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  addLegend(
    position = "bottomleft",
    colors = c("lightblue", "red"),
    labels = c("Disadvantaged Neighborhoods", "Shooting Incidents"),
    opacity = 1.0
  ) %>%

  # Center the map
  setView(lng = -74.0060, lat = 40.7128, zoom = 12)

This map displays all recorded shooting incidents in New York City during 2023 (marked as red dots), overlaid on census tracts classified as disadvantaged by the New York State Energy Research and Development Authority (NYSERDA), shown as light-blue polygons with blue outlines.

By combining these two layers, the map provides a clear visual representation of where gun violence overlaps with structural disadvantage. At a glance, it appears that a large portion of shootings in 2023 occurred within areas officially designated as disadvantaged neighborhoods. Many of the red markers fall directly within or along the borders of these tracts, suggesting a strong spatial relationship between the two.
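The visual impression that many 2023 shootings fall inside DAC tracts can be checked with a single proportion. The sketch below is a hypothetical base R illustration. It assumes each shooting has already been spatially joined to its tract, as in the `st_join()` step used for the top-15 table, so that it carries the tract's `DAC_Designation` value. The toy rows and the "Not Designated as DAC" label are made up. The share of tracts that are designated serves as a rough baseline for comparison.

```r
# Minimal sketch: what share of shootings fall in DAC-designated tracts?
# Assumes shootings were already joined to tracts (e.g., via sf::st_join()),
# so each row carries that tract's DAC_Designation. All values are hypothetical.
joined <- data.frame(
  GEOID = c("T1", "T1", "T2", "T3", "T1"),
  DAC_Designation = c("Designated as DAC", "Designated as DAC",
                      "Not Designated as DAC", "Designated as DAC",
                      "Designated as DAC")
)
tracts <- data.frame(
  GEOID = c("T1", "T2", "T3", "T4"),
  DAC_Designation = c("Designated as DAC", "Not Designated as DAC",
                      "Designated as DAC", "Not Designated as DAC")
)

# Proportion of entries flagged as DAC
share_in_dac <- function(flag, dac_value = "Designated as DAC") {
  mean(flag == dac_value, na.rm = TRUE)
}

shooting_share <- share_in_dac(joined$DAC_Designation)  # share of shootings in DAC tracts
tract_share    <- share_in_dac(tracts$DAC_Designation)  # baseline: share of tracts designated
c(shootings = shooting_share, tracts = tract_share)
```

If the shooting share clearly exceeds the tract share, shootings are over-represented in DAC tracts relative to their number. A population-weighted baseline would refine this comparison.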

This visualization serves as a useful tool for exploring how the geography of gun violence may align with existing patterns of disadvantage across the city. It helps contextualize incident-level data within broader neighborhood conditions and supports further analysis of the structural factors contributing to violence.

Top 15 Most Disadvantaged Census Tracts in NYC (2023) with Shooting Counts

# Load necessary libraries
library(dplyr)
library(sf)
library(lubridate)
library(knitr)

library(kableExtra)

# NYC county FIPS codes (first five digits of a tract GEOID):
# Bronx 36005, Brooklyn/Kings 36047, Manhattan/New York 36061, Queens 36081, Staten Island/Richmond 36085
nyc_county_fips <- c("36005", "36047", "36061", "36081", "36085")

# Convert OCCUR_DATE to Date, coordinates to numeric, and keep 2023 shootings
shootings_2023 <- shootings %>%
  mutate(
    OCCUR_DATE = mdy(OCCUR_DATE),
    Latitude = as.numeric(Latitude),
    Longitude = as.numeric(Longitude)
  ) %>%
  filter(year(OCCUR_DATE) == 2023, !is.na(Latitude), !is.na(Longitude))

# Convert shootings data to spatial points using sf package
shootings_sf <- st_as_sf(shootings_2023, coords = c("Longitude", "Latitude"), crs = 4326)

# Keep NYC tracts by GEOID prefix (works even if County uses names such as "Kings")
nyc_data <- neighborhoods %>%
  filter(substr(as.character(GEOID), 1, 5) %in% nyc_county_fips)

# Convert NYC tracts to an sf object and repair any invalid geometries
neighborhoods_sf <- st_as_sf(nyc_data, wkt = "the_geom", crs = 4326) %>%
  st_make_valid()

# Perform spatial join to assign each shooting to a census tract
shootings_with_tracts <- st_join(shootings_sf, neighborhoods_sf, left = FALSE) %>%
  st_drop_geometry()  # Remove geometry column

# Group shootings by census tract
shootings_count <- shootings_with_tracts %>%
  group_by(GEOID) %>%
  summarise(Shooting_Count = n())

# Select relevant columns and join with shooting counts
relevant_columns <- c("GEOID", "County", "Percentile_Rank_Combined_NYC")

top_15_disadvantaged <- nyc_data %>%
  select(all_of(relevant_columns)) %>%
  left_join(shootings_count, by = "GEOID") %>%
  mutate(Shooting_Count = ifelse(is.na(Shooting_Count), 0, Shooting_Count)) %>%
  arrange(desc(Percentile_Rank_Combined_NYC)) %>%
  head(15)

# Display the table with formatting
top_15_disadvantaged %>%
  kable("html", caption = "<span style='font-weight: bold; color: black;'>Top 15 Most Disadvantaged Census Tracts in NYC (with Shooting Counts)</span>") %>%
  kable_styling("striped", full_width = FALSE, position = "center") %>%
  column_spec(1, bold = TRUE) %>%  # Bold GEOIDs
  column_spec(3, color = "red", bold = TRUE) %>%  # Highlight disadvantage scores in red
  column_spec(4, color = "blue", bold = TRUE)  # Highlight shooting counts in blue

Top 15 Most Disadvantaged Census Tracts in NYC (with Shooting Counts)
GEOID	County	Percentile_Rank_Combined_NYC	Shooting_Count
36005009300	Bronx	1.000000	2
36005011700	Bronx	0.999522	0
36005001900	Bronx	0.999045	0
36005011502	Bronx	0.998090	3
36005005100	Bronx	0.997135	3
36005009000	Bronx	0.996657	2
36005005300	Bronx	0.995702	1
36005006300	Bronx	0.994747	2
36005002702	Bronx	0.993314	1
36005009600	Bronx	0.990926	0
36005018900	Bronx	0.989971	1
36005003300	Bronx	0.989494	3
36005019300	Bronx	0.989016	4
36005020100	Bronx	0.987584	2
36005005200	Bronx	0.987106	0

This table highlights the 15 most disadvantaged census tracts in New York City based on NYSERDA’s 2023 Percentile Rank Combined Score. All of these census tracts are located in the Bronx, and each has a disadvantage score of 0.98 or higher, placing them at the very top of the citywide distribution in terms of structural burden.

In addition to listing each tract's score, the table includes the number of shooting incidents reported within each tract in 2023. Eleven of these 15 tracts recorded at least one shooting, for 24 shootings in total. Some tracts saw repeated incidents within the same year, such as 36005019300 with 4 reported shootings.
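The summary figures for these 15 tracts can be reproduced from the table. This snippet copies the `Shooting_Count` column in table order:

```r
# Shooting_Count values for the 15 most disadvantaged tracts, in table order
shooting_count <- c(2, 0, 0, 3, 3, 2, 1, 2, 1, 0, 1, 3, 4, 2, 0)

tracts_with_shooting <- sum(shooting_count > 0)  # tracts with at least one 2023 shooting
total_shootings      <- sum(shooting_count)      # all 2023 shootings across these tracts
c(tracts_with_shooting = tracts_with_shooting, total = total_shootings)
```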

Understanding Percentile_Rank_Combined_NYC

What Does This Score Represent?

The Percentile_Rank_Combined_NYC variable represents the relative disadvantaged ranking of a census tract within New York City (NYC).

Range: 0 to 1 (or 0% to 100%).
Higher values (closer to 1.00) indicate that the neighborhood is more disadvantaged relative to other NYC census tracts.
Lower values (closer to 0.00) indicate that the neighborhood is less disadvantaged…

How Is It Calculated?

The percentile rank is based on a composite index that combines two major factors:

Environmental Burden Score:
- Measures pollution exposure (e.g., air quality, hazardous waste sites, toxic releases).
- Includes climate risks (e.g., flood vulnerability, extreme heat).
Population Vulnerability Score:
- Includes socioeconomic factors (e.g., poverty rate, education levels, housing burden).
- Considers health risks (e.g., asthma rates, access to healthcare).

Each census tract is ranked based on these combined scores, and the percentile rank is assigned accordingly.
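The ranking step can be illustrated with one common percentile-rank definition, (rank - 1) / (n - 1). This is what dplyr's `percent_rank()` computes, and it yields exactly 0 for the lowest score and 1 for the highest. Whether NYSERDA uses this exact formula is an assumption, and the five tract scores below are made up.

```r
# Hypothetical combined scores for five tracts (illustrative values only)
combined_score <- c(tract_a = 12.4, tract_b = 30.1, tract_c = 18.7,
                    tract_d = 45.0, tract_e = 5.2)

# Percentile rank in [0, 1]: (rank - 1) / (n - 1); ties share the lowest rank.
# Mirrors dplyr::percent_rank(); NYSERDA's exact procedure may differ.
percentile_rank <- function(x) {
  r <- rank(x, ties.method = "min", na.last = "keep")
  (r - 1) / (sum(!is.na(x)) - 1)
}

round(percentile_rank(combined_score), 2)
```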

Interpretation of the Score

The table below explains how to interpret the percentile rank:

`Percentile_Rank_Combined_NYC`	Interpretation
1.00 (100%)	Most Disadvantaged Neighborhood in NYC
0.90 (90%)	Among Top 10% Most Disadvantaged Areas
0.50 (50%)	Median Level of Disadvantage
0.10 (10%)	Among Least Disadvantaged Areas
0.00 (0%)	Least Disadvantaged Neighborhood in NYC

For example:
- A percentile rank of 0.98 means the tract is more disadvantaged than about 98% of all census tracts in NYC.
- A percentile rank of 0.25 means the tract is less disadvantaged than about 75% of census tracts.

Why is This Score Important?

Helps identify neighborhoods that need the most resources (e.g., funding, clean energy initiatives, public health interventions).
Used to prioritize disadvantaged communities for government programs and policy planning.
Supports environmental justice efforts by addressing communities with high pollution and socioeconomic struggles.

Example Interpretation

Consider two neighborhoods:

A Bronx census tract with:
- GEOID: 36005009300
- Percentile_Rank_Combined_NYC: 1.00 (100%)
👉 This means it is the most disadvantaged neighborhood in all of NYC.

A Manhattan census tract with:
- GEOID: 36061000100
- Percentile_Rank_Combined_NYC: 0.15 (15%)
👉 This means it is among the least disadvantaged neighborhoods in NYC.

Key Takeaways

✔ The higher the percentile rank, the more disadvantaged the neighborhood.
✔ The score is relative to NYC, so it compares neighborhoods only within the city.
✔ Used by policymakers and researchers to target resources where they are needed most.

Regression Analysis Scatterplot of Disadvantaged Score vs. Shootings per 100,000

library(ggplot2)

# Assumes `per100` (tract-level Disadvantaged_Score and Shootings_per_100k) is already loaded,
# e.g., from "Shootings_Table_With_Population_Disadvantage.csv"

# Remove rows with missing values
per100 <- per100[complete.cases(per100$Disadvantaged_Score, per100$Shootings_per_100k), ]

# Define outlier threshold (95th percentile)
outlier_threshold <- quantile(per100$Shootings_per_100k, 0.95, na.rm = TRUE)

# Remove extreme outliers (top 5% of shooting rates)
per100_filtered <- per100[per100$Shootings_per_100k <= outlier_threshold, ]

# Create scatter plot with a linear regression line and adjusted y-axis
ggplot(per100_filtered, aes(x = Disadvantaged_Score, y = Shootings_per_100k)) +
  geom_point(alpha = 0.4, color = "blue", size = 1.5) +  # Semi-transparent points reduce overplotting
  geom_smooth(method = "lm", se = FALSE, color = "red", linewidth = 1.2) +  # Ordinary least squares fit
  labs(
    title = "Disadvantage Score vs. Shootings per 100k",
    x = "Disadvantaged Score",
    y = "Shootings per 100,000"
  ) +
  scale_x_continuous(breaks = seq(0, 1, by = 0.1), limits = c(0, 1)) +  # Ensure full range on x-axis
  coord_cartesian(ylim = c(0, outlier_threshold)) +  # Lower y-axis and remove extreme outliers from view
  theme_minimal()

Scatterplot Explanation

Each blue dot represents a census tract (GEOID).
The X-axis (Disadvantaged Score) → How disadvantaged an area is (higher = more disadvantaged).
The Y-axis (Shootings per 100,000 people) → The number of shootings in that area, adjusted for population size.
The Red Line (Trend Line) → The fitted linear regression, showing the overall pattern.

Scatterplot Analysis

As the Disadvantaged Score increases, the number of shootings per 100,000 people tends to increase as well. This positive correlation suggests that more disadvantaged areas tend to have higher rates of gun violence.

Scatterplot Conclusion

Key Takeaway: The scatter plot supports the idea that neighborhoods with higher disadvantage experience more shootings per 100,000 people. Although individual tracts vary widely, the upward trend (red line) indicates that disadvantage is associated with higher gun violence.

Regression Analysis Table

# Load necessary libraries
library(ggplot2)
library(broom)  # For tidy regression results
library(dplyr)  # For filter(), select(), and rename()
library(knitr)  # For kable table formatting
library(kableExtra)  # For enhanced table styling

# Read the data
per100 <- read.csv("Shootings_Table_With_Population_Disadvantage.csv", stringsAsFactors = FALSE)

# Remove missing values
per100 <- per100[complete.cases(per100$Disadvantaged_Score, per100$Shootings_per_100k), ]

# Define outlier threshold (95th percentile) to match scatterplot adjustments
outlier_threshold <- quantile(per100$Shootings_per_100k, 0.95, na.rm = TRUE)

# Filter data to remove extreme outliers
per100_filtered <- per100 %>% filter(Shootings_per_100k <= outlier_threshold)

# Run linear regression
model <- lm(Shootings_per_100k ~ Disadvantaged_Score, data = per100_filtered)

# Create a formatted regression table
regression_table <- broom::tidy(model) %>%
  select(term, estimate, std.error, statistic, p.value) %>%
  rename(
    Variable = term,
    Estimate = estimate,
    "Std. Error" = std.error,
    "T-Statistic" = statistic,
    "P-Value" = p.value
  )

# Format the table neatly using kableExtra
regression_table %>%
  kable("html", caption = "Regression Results: Disadvantaged Score vs. Shootings per 100k") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed", "responsive")) %>%
  column_spec(2:5, width = "3cm") %>%  # Adjust column width
  row_spec(0, bold = TRUE)  # Make header bold

Regression Results: Disadvantaged Score vs. Shootings per 100k
Variable	Estimate	Std. Error	T-Statistic	P-Value
(Intercept)	-2.301339	1.042038	-2.208498	0.0273608
Disadvantaged_Score	30.426136	1.756877	17.318305	0.0000000

Interpretation of Regression Results

Relationship Between Disadvantaged Score & Shootings per 100k

Because the Disadvantaged Score is a percentile-based proportion (0 to 1), the slope of 30.43 describes the change in shootings per 100,000 people across the full range of the score, so it is easiest to read in percentage-point steps. Each 1 percentage point increase in the score (0.01) is associated with ~0.30 additional shootings per 100,000 people. A 10 percentage point increase (e.g., from 20% to 30%) is associated with ~3 additional shootings per 100,000 people, and a 50 percentage point increase (e.g., from 20% to 70%) with ~15 additional shootings per 100,000 people.

Conclusion: There is a clear positive association between neighborhood disadvantage and gun violence.
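The percentage-point arithmetic above follows directly from the reported slope. This minimal base-R sketch copies the coefficient from the regression table rather than extracting it from `model`, so it assumes the estimate shown above; because the model is linear, the same multiplication applies at any starting score within the fitted range.

```r
# Slope from the regression table above: change in shootings per 100k
# for a full 0-to-1 change in the Disadvantaged Score
slope <- 30.426136

# Score changes on the 0-1 scale (0.01 = one percentage point)
score_change <- c(one_point = 0.01, ten_points = 0.10, fifty_points = 0.50)

# Implied change in shootings per 100,000 people for each score change
implied_change <- slope * score_change
round(implied_change, 2)  # approx. 0.30, 3.04, 15.21
```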

Statistical Significance (T-Statistic & P-Value)

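The t-statistic (17.32) is the slope estimate divided by its standard error: the estimate lies more than 17 standard errors away from zero, so it is estimated very precisely. The p-value (displayed as 0.000 because it is smaller than the table's precision) means that, if there were truly no relationship between the Disadvantaged Score and shootings per 100k, a slope this large would almost never arise by chance. The relationship is therefore highly statistically significant.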

Conclusion: The data strongly support the idea that higher disadvantage is associated with higher rates of shootings per 100,000 people.
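Both statistics can be recomputed from the estimates and standard errors in the table. This is a sketch under one stated assumption: `summary(lm)` uses a t distribution with n - 2 residual degrees of freedom, which the table does not report, so the p-values here use the normal approximation. For samples of this size the two agree closely (the approximate intercept p-value is about 0.027 versus the reported 0.0274).

```r
# Estimates and standard errors copied from the regression table above
est <- c(intercept = -2.301339, slope = 30.426136)
se  <- c(intercept =  1.042038, slope =  1.756877)

# t-statistic: how many standard errors each estimate lies from zero
t_stat <- est / se

# Two-sided p-values via the normal approximation (assumption: the residual
# degrees of freedom are large; lm itself uses the t distribution)
p_approx <- 2 * pnorm(-abs(t_stat))

round(t_stat, 3)  # approx. -2.208 and 17.318
```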

Interpreting the Intercept (-2.30)

The intercept represents the expected number of shootings per 100,000 people when the Disadvantaged Score is 0 (i.e., in the least disadvantaged areas). The intercept is negative (-2.30 shootings per 100k), which has no real-world meaning because shootings cannot be negative. A straight line fitted to the data is not constrained to stay above zero, and because very few areas have a Disadvantaged Score near 0, the line's value at that point is an extrapolation rather than a description of actual neighborhoods.

Conclusion: Interpretation should focus on the slope (30.43 per full unit of the score, or ~0.30 per percentage point), which describes how shootings change as disadvantage increases.
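To see where the fitted line stops making physical sense, one can solve for the score at which it crosses zero. This base-R sketch uses the coefficients copied from the table above; the example scores are illustrative only.

```r
intercept <- -2.301339
slope     <- 30.426136

# Score at which the fitted line predicts zero shootings per 100k
zero_crossing <- -intercept / slope
zero_crossing  # about 0.076

# Fitted values at a few illustrative scores; those below the crossing are negative
scores <- c(0, 0.05, 0.25, 0.75)
fitted_rate <- intercept + slope * scores
round(fitted_rate, 2)
```

Any neighborhood scoring below roughly 0.076 therefore receives a negative fitted value, which is why the intercept is best treated as an artifact of the straight-line fit.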

Regression Analysis Takeaways

More disadvantaged areas experience significantly higher rates of gun violence.
A modest increase in disadvantage (e.g., 10 percentage points) is associated with a noticeable increase (~3) in shootings per 100k.
The statistical significance (high t-statistic and low p-value) makes it very unlikely that this relationship is due to chance.
The intercept is not meaningful on its own, but the slope shows how much shootings increase with disadvantage.

Beta Regression Modeling Approach

This portion of the analysis focuses on identifying and evaluating the key predictors that most strongly influence the Disadvantage Score across New York City neighborhoods. Using beta regression modeling, I aim to quantify how environmental burdens, health vulnerabilities, and population-level factors individually shape structural disadvantage. This statistical exploration supports a broader investigation into the connection between neighborhood conditions and gun violence. By isolating the strongest drivers of disadvantage, the analysis seeks to inform more equitable and targeted policy responses—ones that address not only crime itself, but also the deeper systemic conditions that allow it to persist.

Cleaning NYC Disadvantaged Neighborhood Data For Beta Regression Analysis

Before running the beta regression, I examined the Percentile_Rank_Combined_NYC variable to ensure it met the assumptions of the model. Beta regression requires the dependent variable to lie strictly between 0 and 1. To check for violations of this assumption, I used a logical test to count how many neighborhoods had scores that were exactly 0 or 1.

table(neighborhoods$Percentile_Rank_Combined_NYC == 0 | neighborhoods$Percentile_Rank_Combined_NYC == 1)

## 
## FALSE  TRUE 
##  2093  2825

The result showed that 2,825 observations had values at the boundaries, while only 2,093 fell strictly within the valid range. Because more than half of the observations (about 57%) violated this requirement, the variable had to be transformed so that all values fall within the open interval (0, 1) before fitting the model.

Beta regression requires the dependent variable to fall strictly within the open interval (0, 1), so I applied a standard transformation for this purpose (described by Smithson and Verkuilen, 2006) that shifts every value slightly inward while preserving rank order. This adjustment is especially important for proportion or percentile data that include boundary values.

In the code, I first calculated the number of observations using n <- nrow(neighborhoods), then applied the transformation as follows:

n <- nrow(neighborhoods)
neighborhoods$score_transformed <- (neighborhoods$Percentile_Rank_Combined_NYC * (n - 1) + 0.5) / n

This formula adjusts the original values so that any 0s or 1s are slightly shifted inward, ensuring all transformed values fall strictly within the range (0, 1). Specifically, a value of 0 becomes 0.5 / n and a value of 1 becomes (n - 0.5) / n. In this data set, where n = 4918, a value of 0 is transformed to approximately 0.0001, and a value of 1 is transformed to approximately 0.9999. These adjustments are minimal but crucial. They allow the data to satisfy the assumptions of the beta distribution without altering the overall structure or relative rankings of the scores. The resulting score_transformed variable is now compatible with beta regression and still accurately reflects the original disadvantage levels across neighborhoods.
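The behavior described above can be checked on a few toy values. This sketch wraps the same formula as the code above in a small helper; the function name `squeeze_unit_interval` and the toy scores are hypothetical, and `n` is set to the 4,918 tracts reported earlier so the endpoints match the text.

```r
# Same transformation as above: y' = (y * (n - 1) + 0.5) / n
# n should be the number of observations the variable comes from
squeeze_unit_interval <- function(y, n = length(y)) {
  (y * (n - 1) + 0.5) / n
}

n     <- 4918                   # number of tracts reported above
toy   <- c(0, 0.25, 0.5, 1)     # hypothetical example scores
toy_t <- squeeze_unit_interval(toy, n)

all(toy_t > 0 & toy_t < 1)          # TRUE: boundary values moved inside (0, 1)
identical(order(toy), order(toy_t)) # TRUE: rank order preserved
signif(toy_t, 4)                    # 0 -> ~0.0001017, 1 -> ~0.9999
```

A side effect of this linear squeeze is that 0.5 maps exactly to itself, so values near the middle of the distribution are essentially unchanged.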

With score_transformed now strictly within the (0, 1) interval, the dataset was ready for regression modeling. Given the bounded, continuous nature of the outcome, beta regression was selected as the analytical approach; it is well suited to proportion-based outcomes and allows interpretation of how each predictor influences neighborhood-level disadvantage. Before fitting the model, however, I first defined the variables included in the NYSERDA Disadvantaged Communities measurement, to ground the analysis in the specific environmental, health, and socioeconomic factors that shape the score (NYSERDA, 2023a).

Defining the Variables within the NYSERDA Disadvantaged Communities Measurement (2023)

library(tibble)
library(knitr)       # kable()
library(kableExtra)  # table styling and the %>% pipe

variable_definitions_df <- tibble(
  Variable = c("Benzene_Concentration",
    "Particulate_Matter_25", "Traffic_Truck_Highways", "Traffic_Number_Vehicles",
    "Wastewater_Discharge", "Industrial_Land_Use", "Landfills", "Oil_Storage",
    "Municipal_Waste_Combustors", "Power_Generation_Facilities", "RMP_Sites",
    "Remediation_Sites", "Scrap_Metal_Processing", "Agricultural_Land_Use",
    "Days_Above_90_Degrees_2050", "Low_Vegetative_Cover", "Drive_Time_Healthcare",
    "Asian_Percent", "Black_African_American_Percent", "Redlining_Updated",
    "Latino_Percent", "English_Proficiency", "Native_Indigenous", "LMI_80_AMI",
    "LMI_Poverty_Federal", "Population_No_College", "Household_Single_Parent",
    "Unemployment_Rate", "Asthma_ED_Rate", "COPD_ED_Rate", "Households_Disabled",
    "Low_Birth_Weight", "MI_Hospitalization_Rate", "Health_Insurance_Rate",
    "Age_Over_65", "Premature_Deaths", "Internet_Access", "Home_Energy_Affordability",
    "Homes_Built_Before_1960", "Rent_Percent_Income", "Renter_Percent"
  ),
  
  Definition = c("Percentile ranking of the average annual concentration of benzene (C6H6) in air.",
    "Percentile ranking of the average annual concentration of PM2.5 (particulate matter ≤ 2.5 microns) per cubic meter.",
    "Percentile ranking of average daily truck traffic on highways (Classes 4–13 vehicles).",
    "Percentile ranking of average daily vehicle traffic on major roads within 500 meters of census block centroids, weighted by population.",
    "Percentile ranking of toxicity-weighted concentrations in stream segments near the tract, indicating potential water pollution.",
    "Percentile ranking of census tract land area zoned for industrial, mining, or manufacturing use.",
    "Percentile ranking of land area within 500 meters of an active landfill.",
    "Percentile ranking of land area within 500 meters of major oil storage facilities.",
    "Percentile ranking of land area within 500 meters of a municipal waste combustor.",
    "Percentile ranking of land area within 1 mile of fossil-fuel-burning power plants or peaker units.",
    "Percentile ranking of proximity to chemical accident risk sites (EPA Risk Management Plan sites), weighted by distance and population.",
    "Percentile ranking of the number of state/federal environmental remediation sites (e.g., Superfund, Brownfield).",
    "Percentile ranking of the number of scrap metal and vehicle dismantling facilities.",
    "Percentile ranking of land area used for crops or pasture.",
    "Projected percentile ranking of the average annual number of days above 90°F in the year 2050.",
    "Percentile ranking of the census tract land area classified as developed or barren (low vegetation).",
    "Percentile ranking of average drive time from the tract center to the three nearest healthcare facilities.",
    "Percent of population identifying as Asian.",
    "Percent of population identifying as Black or African American.",
    "Indicator for whether the area was historically redlined (HOLC maps).",
    "Percent of population identifying as Latino or Hispanic.",
    "Percent of population with limited English proficiency.",
    "Percent of population identifying as Native or Indigenous.",
    "Percent of households below 80% of Area Median Income (AMI).",
    "Percent of population below the federal poverty level.",
    "Percent of adult population with no college education.",
    "Percent of households led by a single parent.",
    "Unemployment rate in the tract.",
    "Emergency department visits due to asthma (per capita).",
    "Emergency department visits due to COPD (per capita).",
    "Percent of households with at least one person with a disability.",
    "Percent of births considered low birth weight.",
    "Hospitalization rate due to myocardial infarction (heart attacks).",
    "Percent of population without health insurance coverage.",
    "Percent of population age 65 or older.",
    "Rate of premature deaths in the tract.",
    "Percent of households with access to internet service.",
    "Estimated household energy cost burden as a percent of income.",
    "Percent of housing units built before 1960.",
    "Median rent as a percent of household income.",
    "Percent of housing units that are renter-occupied."
  )
)

# Styled HTML table output
kable(variable_definitions_df, 
      caption = "Data Dictionary for All Factors within NYSERDA Disadvantaged Communities Measurement (2023)") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "left"
  )

Data Dictionary for All Factors within NYSERDA Disadvantaged Communities Measurement (2023)
Variable	Definition
Benzene_Concentration	Percentile ranking of the average annual concentration of benzene (C6H6) in air.
Particulate_Matter_25	Percentile ranking of the average annual concentration of PM2.5 (particulate matter ≤ 2.5 microns) per cubic meter.
Traffic_Truck_Highways	Percentile ranking of average daily truck traffic on highways (Classes 4–13 vehicles).
Traffic_Number_Vehicles	Percentile ranking of average daily vehicle traffic on major roads within 500 meters of census block centroids, weighted by population.
Wastewater_Discharge	Percentile ranking of toxicity-weighted concentrations in stream segments near the tract, indicating potential water pollution.
Industrial_Land_Use	Percentile ranking of census tract land area zoned for industrial, mining, or manufacturing use.
Landfills	Percentile ranking of land area within 500 meters of an active landfill.
Oil_Storage	Percentile ranking of land area within 500 meters of major oil storage facilities.
Municipal_Waste_Combustors	Percentile ranking of land area within 500 meters of a municipal waste combustor.
Power_Generation_Facilities	Percentile ranking of land area within 1 mile of fossil-fuel-burning power plants or peaker units.
RMP_Sites	Percentile ranking of proximity to chemical accident risk sites (EPA Risk Management Plan sites), weighted by distance and population.
Remediation_Sites	Percentile ranking of the number of state/federal environmental remediation sites (e.g., Superfund, Brownfield).
Scrap_Metal_Processing	Percentile ranking of the number of scrap metal and vehicle dismantling facilities.
Agricultural_Land_Use	Percentile ranking of land area used for crops or pasture.
Days_Above_90_Degrees_2050	Projected percentile ranking of the average annual number of days above 90°F in the year 2050.
Low_Vegetative_Cover	Percentile ranking of the census tract land area classified as developed or barren (low vegetation).
Drive_Time_Healthcare	Percentile ranking of average drive time from the tract center to the three nearest healthcare facilities.
Asian_Percent	Percent of population identifying as Asian.
Black_African_American_Percent	Percent of population identifying as Black or African American.
Redlining_Updated	Indicator for whether the area was historically redlined (HOLC maps).
Latino_Percent	Percent of population identifying as Latino or Hispanic.
English_Proficiency	Percent of population with limited English proficiency.
Native_Indigenous	Percent of population identifying as Native or Indigenous.
LMI_80_AMI	Percent of households below 80% of Area Median Income (AMI).
LMI_Poverty_Federal	Percent of population below the federal poverty level.
Population_No_College	Percent of adult population with no college education.
Household_Single_Parent	Percent of households led by a single parent.
Unemployment_Rate	Unemployment rate in the tract.
Asthma_ED_Rate	Emergency department visits due to asthma (per capita).
COPD_ED_Rate	Emergency department visits due to COPD (per capita).
Households_Disabled	Percent of households with at least one person with a disability.
Low_Birth_Weight	Percent of births considered low birth weight.
MI_Hospitalization_Rate	Hospitalization rate due to myocardial infarction (heart attacks).
Health_Insurance_Rate	Percent of population without health insurance coverage.
Age_Over_65	Percent of population age 65 or older.
Premature_Deaths	Rate of premature deaths in the tract.
Internet_Access	Percent of households with access to internet service.
Home_Energy_Affordability	Estimated household energy cost burden as a percent of income.
Homes_Built_Before_1960	Percent of housing units built before 1960.
Rent_Percent_Income	Median rent as a percent of household income.
Renter_Percent	Percent of housing units that are renter-occupied.

Beta Regression Model of All Variables from NYSERDA Disadvantaged Communities Measurement (2023)

# Load necessary package
library(betareg)

# Fit the full model m1 (betareg uses a logit link for the mean by default)
m1 <- betareg(score_transformed ~ 
  Asian_Percent +
  Black_African_American_Percent +
  Redlining_Updated +
  Latino_Percent +
  English_Proficiency +
  Native_Indigenous +
  LMI_80_AMI +
  LMI_Poverty_Federal +
  Population_No_College +
  Household_Single_Parent +
  Unemployment_Rate +
  Asthma_ED_Rate +
  COPD_ED_Rate +
  Households_Disabled +
  Low_Birth_Weight +
  MI_Hospitalization_Rate +
  Health_Insurance_Rate +
  Age_Over_65 +
  Premature_Deaths +
  Internet_Access +
  Home_Energy_Affordability +
  Homes_Built_Before_1960 +
  Rent_Percent_Income +
  Renter_Percent +
  Benzene_Concentration +
  Particulate_Matter_25 +
  Traffic_Truck_Highways +
  Traffic_Number_Vehicles +
  Wastewater_Discharge +
  Industrial_Land_Use +
  Landfills +
  Oil_Storage +
  Municipal_Waste_Combustors +
  Power_Generation_Facilities +
  RMP_Sites +
  Remediation_Sites +
  Scrap_Metal_Processing +
  Agricultural_Land_Use +
  Days_Above_90_Degrees_2050 +
  Low_Vegetative_Cover +
  Drive_Time_Healthcare,
  data = neighborhoods
)

summary(m1)

## 
## Call:
## betareg(formula = score_transformed ~ Asian_Percent + Black_African_American_Percent + 
##     Redlining_Updated + Latino_Percent + English_Proficiency + Native_Indigenous + 
##     LMI_80_AMI + LMI_Poverty_Federal + Population_No_College + Household_Single_Parent + 
##     Unemployment_Rate + Asthma_ED_Rate + COPD_ED_Rate + Households_Disabled + 
##     Low_Birth_Weight + MI_Hospitalization_Rate + Health_Insurance_Rate + 
##     Age_Over_65 + Premature_Deaths + Internet_Access + Home_Energy_Affordability + 
##     Homes_Built_Before_1960 + Rent_Percent_Income + Renter_Percent + 
##     Benzene_Concentration + Particulate_Matter_25 + Traffic_Truck_Highways + 
##     Traffic_Number_Vehicles + Wastewater_Discharge + Industrial_Land_Use + 
##     Landfills + Oil_Storage + Municipal_Waste_Combustors + Power_Generation_Facilities + 
##     RMP_Sites + Remediation_Sites + Scrap_Metal_Processing + Agricultural_Land_Use + 
##     Days_Above_90_Degrees_2050 + Low_Vegetative_Cover + Drive_Time_Healthcare, 
##     data = neighborhoods)
## 
## Quantile residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6421 -0.4525  0.1230  0.6197  3.2221 
## 
## Coefficients (mean model with logit link):
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    -9.746839   0.224835 -43.351  < 2e-16 ***
## Asian_Percent                   0.217516   0.082494   2.637 0.008371 ** 
## Black_African_American_Percent -0.022877   0.115176  -0.199 0.842556    
## Redlining_Updated               0.652138   0.077071   8.462  < 2e-16 ***
## Latino_Percent                  0.939315   0.115388   8.140 3.94e-16 ***
## English_Proficiency             0.175543   0.108094   1.624 0.104381    
## Native_Indigenous               0.223366   0.056798   3.933 8.40e-05 ***
## LMI_80_AMI                     -0.001635   0.001388  -1.178 0.238801    
## LMI_Poverty_Federal             0.946858   0.127769   7.411 1.26e-13 ***
## Population_No_College           0.456083   0.123412   3.696 0.000219 ***
## Household_Single_Parent         0.323564   0.089123   3.631 0.000283 ***
## Unemployment_Rate               0.257525   0.075442   3.414 0.000641 ***
## Asthma_ED_Rate                  2.789531   0.186917  14.924  < 2e-16 ***
## COPD_ED_Rate                   -1.785271   0.134325 -13.291  < 2e-16 ***
## Households_Disabled             0.126910   0.089143   1.424 0.154544    
## Low_Birth_Weight               -0.221339   0.133364  -1.660 0.096984 .  
## MI_Hospitalization_Rate         0.488968   0.089209   5.481 4.23e-08 ***
## Health_Insurance_Rate           0.230486   0.085805   2.686 0.007228 ** 
## Age_Over_65                     0.348188   0.096993   3.590 0.000331 ***
## Premature_Deaths                0.098926   0.126804   0.780 0.435301    
## Internet_Access                 0.246708   0.098264   2.511 0.012050 *  
## Home_Energy_Affordability      -0.183482   0.113930  -1.610 0.107292    
## Homes_Built_Before_1960        -0.312885   0.085465  -3.661 0.000251 ***
## Rent_Percent_Income             0.161798   0.086428   1.872 0.061199 .  
## Renter_Percent                 -0.174920   0.157827  -1.108 0.267730    
## Benzene_Concentration           2.732720   0.193082  14.153  < 2e-16 ***
## Particulate_Matter_25           1.837374   0.139828  13.140  < 2e-16 ***
## Traffic_Truck_Highways          0.821760   0.122161   6.727 1.73e-11 ***
## Traffic_Number_Vehicles        -0.085815   0.132528  -0.648 0.517293    
## Wastewater_Discharge           -0.397587   0.066573  -5.972 2.34e-09 ***
## Industrial_Land_Use            -0.210959   0.067822  -3.110 0.001868 ** 
## Landfills                       0.062486   0.562599   0.111 0.911564    
## Oil_Storage                     0.011964   0.139238   0.086 0.931525    
## Municipal_Waste_Combustors     -1.720002   0.745330  -2.308 0.021016 *  
## Power_Generation_Facilities     0.430738   0.098664   4.366 1.27e-05 ***
## RMP_Sites                      -0.004957   0.001082  -4.582 4.60e-06 ***
## Remediation_Sites               0.030336   0.073296   0.414 0.678960    
## Scrap_Metal_Processing          0.013614   0.144423   0.094 0.924901    
## Agricultural_Land_Use           1.788091   0.196015   9.122  < 2e-16 ***
## Days_Above_90_Degrees_2050      0.618118   0.103926   5.948 2.72e-09 ***
## Low_Vegetative_Cover            1.575699   0.153213  10.284  < 2e-16 ***
## Drive_Time_Healthcare           1.671355   0.114713  14.570  < 2e-16 ***
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)    
## (phi)   4.3515     0.1229   35.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood:  4466 on 43 Df
## Pseudo R-squared: 0.786
## Number of iterations: 143 (BFGS) + 8 (Fisher scoring)

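This beta regression model examines how a wide range of environmental, health, and socioeconomic variables influence neighborhood-level disadvantage across New York City. The dependent variable is score_transformed, the Disadvantage Score rescaled to lie strictly between 0 and 1. The model includes predictors from multiple domains: demographics, pollution, land use, housing, and healthcare access. Because the mean model uses a logit link, each coefficient is the change in the log-odds of the expected score for a one-unit increase in that predictor, holding the other predictors constant; the coefficients show direction and relative strength, not direct changes in the score itself.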
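The summary output reports a mean model with a logit link, so exponentiating a coefficient gives the factor by which the odds of the expected score, mu / (1 - mu), are multiplied for a one-unit increase in that predictor. This base-R sketch copies a few estimates from `summary(m1)`; the baseline score of 0.25 in the last step is a hypothetical value chosen only for illustration. For predictors scaled 0 to 1, "one unit" spans the full range; for predictors on wider scales it is a much smaller step.

```r
# Selected logit-scale coefficients copied from summary(m1)
coefs <- c(Benzene_Concentration =  2.732720,
           Particulate_Matter_25 =  1.837374,
           Drive_Time_Healthcare =  1.671355,
           COPD_ED_Rate          = -1.785271)

# exp(beta): multiplicative change in the odds mu / (1 - mu) of the expected
# score per one-unit increase in the predictor, other predictors held fixed
odds_multiplier <- exp(coefs)
round(odds_multiplier, 2)

# Effect on the score scale from a hypothetical baseline of 0.25:
# move to the logit scale, add the coefficient, and transform back
baseline <- 0.25
shifted  <- plogis(qlogis(baseline) + coefs[["Benzene_Concentration"]])
round(shifted, 3)  # about 0.837
```

Because the logit is nonlinear, the same coefficient moves the score by different amounts depending on the starting value, which is why the min/max predictions later in this section are computed with the other predictors fixed.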

Several variables stand out as top predictors due to their strong statistical significance and high coefficient estimates:

🔥 Benzene Concentration

Estimate: 2.732720
p-value: < 2e-16
This is one of the strongest predictors in the model. It suggests that higher levels of ambient benzene, a known hazardous air pollutant, are strongly associated with increased disadvantage scores. This finding underscores the role of toxic air exposures in shaping structural vulnerability.

🌳 Low Vegetative Cover

Estimate: 1.575699
p-value: < 2e-16
Neighborhoods with less green space are predicted to experience significantly higher disadvantage. Green space is often tied to public investment, health, and heat mitigation, so its absence may reflect environmental neglect.

🏥 Drive Time to Healthcare

Estimate: 1.671355
p-value: < 2e-16
Longer average travel times to healthcare services are a major predictor of disadvantage. This variable captures systemic healthcare access barriers, especially in under-resourced communities.

🏭 Particulate Matter 2.5 (PM2.5)

Estimate: 1.837374
p-value: < 2e-16
This pollutant is another strong predictor, further reinforcing how air quality and chronic exposure to pollution contribute to structural disadvantage.

🌡️ Days Above 90 Degrees (2050 Projection)

Estimate: 0.618118
p-value: 2.72e-09
This climate risk indicator is statistically significant: tracts projected to face more days above 90°F by 2050 already tend to have higher Disadvantage Scores, suggesting that future heat burdens are likely to fall on communities that are disadvantaged today.

✳️ Additional Insights

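Other significant variables include Asthma ED Rate (the largest positive coefficient in the model, 2.789531), MI Hospitalization Rate, Agricultural Land Use, Redlining, and the federal poverty rate, all showing meaningful positive associations with disadvantage. Households with Disabilities, by contrast, is not statistically significant (p = 0.155). Several significant coefficients are negative, most notably COPD ED Rate (-1.785271); because many health indicators are correlated, such signs may reflect overlap among predictors rather than a protective effect.
The model's pseudo R-squared is 0.786, indicating strong overall fit.
The optimizer required 143 BFGS iterations followed by 8 Fisher scoring iterations. The iteration count alone does not confirm convergence; the fitted object's converged element (m1$converged) reports whether optimization succeeded.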
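For reference, betareg defines its pseudo R-squared as the squared correlation between the linear predictor and the link-transformed response. This base-R sketch reproduces that definition on hypothetical toy values (not the NYC data): responses lying exactly on the fitted curve give 1, and perturbed responses give a value below 1.

```r
# Pseudo R-squared as defined by betareg: squared correlation between the
# linear predictor (eta) and the logit-transformed response
pseudo_r2 <- function(eta, y) {
  cor(eta, qlogis(y))^2
}

# Hypothetical toy values, not taken from the NYC data
eta     <- c(-2, -1, 0, 1, 2)
y_exact <- plogis(eta)                                # exactly on the fitted curve
y_noisy <- plogis(eta + c(0.3, -0.4, 0.2, -0.1, 0.5)) # perturbed responses

pseudo_r2(eta, y_exact)  # 1: perfect agreement on the logit scale
pseudo_r2(eta, y_noisy)  # about 0.96
```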

Cleaning and Formatting the Beta Regression Output

To improve readability and streamline interpretation, I created a cleaned version of the beta regression output using the broom and kableExtra packages in R. The tidy() function was used to extract the key model components—predictor names, coefficient estimates, standard errors, z-values, and p-values—and organize them into a user-friendly table. I then styled the table for clarity using kable(), making it easier to scan and compare the significance and strength of each variable in the model.

library(broom)
library(dplyr)
library(knitr)       # kable()
library(kableExtra)


m1_tidy <- tidy(m1) %>%
  rename(
    Predictor = term,
    Estimate = estimate,
    `Standard Error` = std.error,
    `Z Value` = statistic,
    `P Value` = p.value
  )

# Create a styled regression table
kable(m1_tidy,
      caption = "Beta Regression Results for All Factors",
      digits = 4) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "left"
  )

Beta Regression Results for All Factors
component	Predictor	Estimate	Standard Error	Z Value	P Value
mean	(Intercept)	-9.7468	0.2248	-43.3510	0.0000
mean	Asian_Percent	0.2175	0.0825	2.6367	0.0084
mean	Black_African_American_Percent	-0.0229	0.1152	-0.1986	0.8426
mean	Redlining_Updated	0.6521	0.0771	8.4616	0.0000
mean	Latino_Percent	0.9393	0.1154	8.1405	0.0000
mean	English_Proficiency	0.1755	0.1081	1.6240	0.1044
mean	Native_Indigenous	0.2234	0.0568	3.9326	0.0001
mean	LMI_80_AMI	-0.0016	0.0014	-1.1780	0.2388
mean	LMI_Poverty_Federal	0.9469	0.1278	7.4107	0.0000
mean	Population_No_College	0.4561	0.1234	3.6956	0.0002
mean	Household_Single_Parent	0.3236	0.0891	3.6305	0.0003
mean	Unemployment_Rate	0.2575	0.0754	3.4135	0.0006
mean	Asthma_ED_Rate	2.7895	0.1869	14.9239	0.0000
mean	COPD_ED_Rate	-1.7853	0.1343	-13.2907	0.0000
mean	Households_Disabled	0.1269	0.0891	1.4237	0.1545
mean	Low_Birth_Weight	-0.2213	0.1334	-1.6597	0.0970
mean	MI_Hospitalization_Rate	0.4890	0.0892	5.4812	0.0000
mean	Health_Insurance_Rate	0.2305	0.0858	2.6862	0.0072
mean	Age_Over_65	0.3482	0.0970	3.5898	0.0003
mean	Premature_Deaths	0.0989	0.1268	0.7802	0.4353
mean	Internet_Access	0.2467	0.0983	2.5107	0.0121
mean	Home_Energy_Affordability	-0.1835	0.1139	-1.6105	0.1073
mean	Homes_Built_Before_1960	-0.3129	0.0855	-3.6610	0.0003
mean	Rent_Percent_Income	0.1618	0.0864	1.8721	0.0612
mean	Renter_Percent	-0.1749	0.1578	-1.1083	0.2677
mean	Benzene_Concentration	2.7327	0.1931	14.1532	0.0000
mean	Particulate_Matter_25	1.8374	0.1398	13.1403	0.0000
mean	Traffic_Truck_Highways	0.8218	0.1222	6.7268	0.0000
mean	Traffic_Number_Vehicles	-0.0858	0.1325	-0.6475	0.5173
mean	Wastewater_Discharge	-0.3976	0.0666	-5.9722	0.0000
mean	Industrial_Land_Use	-0.2110	0.0678	-3.1105	0.0019
mean	Landfills	0.0625	0.5626	0.1111	0.9116
mean	Oil_Storage	0.0120	0.1392	0.0859	0.9315
mean	Municipal_Waste_Combustors	-1.7200	0.7453	-2.3077	0.0210
mean	Power_Generation_Facilities	0.4307	0.0987	4.3657	0.0000
mean	RMP_Sites	-0.0050	0.0011	-4.5824	0.0000
mean	Remediation_Sites	0.0303	0.0733	0.4139	0.6790
mean	Scrap_Metal_Processing	0.0136	0.1444	0.0943	0.9249
mean	Agricultural_Land_Use	1.7881	0.1960	9.1222	0.0000
mean	Days_Above_90_Degrees_2050	0.6181	0.1039	5.9477	0.0000
mean	Low_Vegetative_Cover	1.5757	0.1532	10.2844	0.0000
mean	Drive_Time_Healthcare	1.6714	0.1147	14.5699	0.0000
precision	(phi)	4.3515	0.1229	35.3941	0.0000
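Because the predictors are measured on different scales, raw estimates are not directly comparable, whereas z-values scale each estimate by its uncertainty. This base-R sketch copies a handful of rows from the table above and ranks the significant ones by absolute z-value; the same filter-and-sort could be applied to the full `m1_tidy` data frame.

```r
# A few rows copied from the tidied table above
coef_tab <- data.frame(
  Predictor = c("Asthma_ED_Rate", "Drive_Time_Healthcare", "Benzene_Concentration",
                "COPD_ED_Rate", "Households_Disabled", "Landfills"),
  Estimate  = c(2.7895, 1.6714, 2.7327, -1.7853, 0.1269, 0.0625),
  Z         = c(14.9239, 14.5699, 14.1532, -13.2907, 1.4237, 0.1111),
  P         = c(0.0000, 0.0000, 0.0000, 0.0000, 0.1545, 0.9116),
  stringsAsFactors = FALSE
)

# Keep predictors significant at the 5% level, then sort by |z|, largest first
significant <- subset(coef_tab, P < 0.05)
significant <- significant[order(-abs(significant$Z)), ]
significant$Predictor
```

Sorting by |z| rather than by the signed estimate keeps strong negative predictors such as COPD_ED_Rate in view alongside the positive ones.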

Predicted Disadvantage Scores at Minimum and Maximum Predictor Values

library(betareg)
library(marginaleffects)
library(dplyr)
library(purrr)
library(knitr)       # kable()
library(kableExtra)

# Extract all predictor variables from m1
vars <- names(attr(terms(m1), "dataClasses"))[-1]

# Generate prediction grid with min and max values for each variable
grids <- map_dfr(vars, function(var) {
  vals <- range(neighborhoods[[var]], na.rm = TRUE)
  # datagrid() holds every other predictor at its sample mean
  newdat <- datagrid(model = m1)
  newdat_min <- newdat
  newdat_max <- newdat
  newdat_min[[var]] <- vals[1]
  newdat_max[[var]] <- vals[2]
  # `Score Value` is a nominal label (0 = observed minimum, 1 = observed maximum),
  # not the predictor's actual value in that row
  bind_rows(
    mutate(newdat_min, Variable = var, `Score Scaling` = "Minimum", `Score Value` = 0.0),
    mutate(newdat_max, Variable = var, `Score Scaling` = "Maximum", `Score Value` = 1.0)
  )
})

# Generate predictions and bind labels
preds <- predictions(m1, newdata = grids, type = "response") %>%
  as.data.frame() %>%
  bind_cols(grids[, c("Variable", "Score Scaling", "Score Value")])

# Clean and rename columns
m1_table <- preds %>%
  select(
    Variable,
    `Score Scaling`,
    `Score Value`,
    estimate,
    conf.low,
    conf.high
  ) %>%
  rename(
    `Predicted Score` = estimate,
    `Lower Level Confidence Interval` = conf.low,
    `Upper Level Confidence Interval` = conf.high
  ) %>%
  arrange(Variable, `Score Scaling`)

# Display table
kable(m1_table, caption = "Predicted Disadvantage Scores at Minimum and Maximum Predictor Values (Model m1)") %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "left"
  )

Predicted Disadvantage Scores at Minimum and Maximum Predictor Values (Model m1)
Variable	Score Scaling	Score Value	Predicted Score	Lower Level Confidence Interval	Upper Level Confidence Interval
Age_Over_65	Maximum	1	0.3169953	0.2919580	0.3420326
Age_Over_65	Minimum	0	0.2467906	0.2302943	0.2632869
Agricultural_Land_Use	Maximum	1	0.6778088	0.5972965	0.7583212
Agricultural_Land_Use	Minimum	0	0.2603076	0.2524121	0.2682031
Asian_Percent	Maximum	1	0.2942436	0.2778504	0.3106368
Asian_Percent	Minimum	0	0.2511706	0.2325027	0.2698385
Asthma_ED_Rate	Maximum	1	0.5142606	0.4797860	0.5487352
Asthma_ED_Rate	Minimum	0	0.0611851	0.0476217	0.0747486
Benzene_Concentration	Maximum	1	0.4728278	0.4420257	0.5036299
Benzene_Concentration	Minimum	0	0.0551508	0.0414394	0.0688621
Black_African_American_Percent	Maximum	1	0.2732268	0.2536735	0.2927800
Black_African_American_Percent	Minimum	0	0.2777930	0.2497884	0.3057976
COPD_ED_Rate	Maximum	1	0.1272561	0.1108624	0.1436497
COPD_ED_Rate	Minimum	0	0.4647346	0.4334317	0.4960375
Days_Above_90_Degrees_2050	Maximum	1	0.3202429	0.3028175	0.3376683
Days_Above_90_Degrees_2050	Minimum	0	0.2026168	0.1803778	0.2248558
Drive_Time_Healthcare	Maximum	1	0.5378312	0.4996121	0.5760502
Drive_Time_Healthcare	Minimum	0	0.1795499	0.1671454	0.1919545
English_Proficiency	Maximum	1	0.2901005	0.2701603	0.3100407
English_Proficiency	Minimum	0	0.2553197	0.2309896	0.2796497
Health_Insurance_Rate	Maximum	1	0.2947070	0.2782711	0.3111428
Health_Insurance_Rate	Minimum	0	0.2491562	0.2294841	0.2688284
Home_Energy_Affordability	Maximum	1	0.2609410	0.2425460	0.2793361
Home_Energy_Affordability	Minimum	0	0.2954211	0.2690080	0.3218342
Homes_Built_Before_1960	Maximum	1	0.2521283	0.2382201	0.2660366
Homes_Built_Before_1960	Minimum	0	0.3155123	0.2917118	0.3393127
Household_Single_Parent	Maximum	1	0.3050331	0.2866531	0.3234131
Household_Single_Parent	Minimum	0	0.2410360	0.2221465	0.2599255
Households_Disabled	Maximum	1	0.2882917	0.2682689	0.3083146
Households_Disabled	Minimum	0	0.2629719	0.2450906	0.2808532
Industrial_Land_Use	Maximum	1	0.2431812	0.2226837	0.2636788
Industrial_Land_Use	Minimum	0	0.2854728	0.2754338	0.2955117
Internet_Access	Maximum	1	0.2972808	0.2779582	0.3166034
Internet_Access	Minimum	0	0.2484328	0.2271270	0.2697386
LMI_80_AMI	Maximum	1	0.2626837	0.2411114	0.2842561
LMI_80_AMI	Minimum	0	0.2955604	0.2599206	0.3312002
LMI_Poverty_Federal	Maximum	1	0.3581314	0.3333387	0.3829242
LMI_Poverty_Federal	Minimum	0	0.1779437	0.1554567	0.2004308
Landfills	Maximum	1	0.2876764	0.0619274	0.5134253
Landfills	Minimum	0	0.2750435	0.2676272	0.2824598
Latino_Percent	Maximum	1	0.3551197	0.3331162	0.3771233
Latino_Percent	Minimum	0	0.1771285	0.1563628	0.1978943
Low_Birth_Weight	Maximum	1	0.2579203	0.2368399	0.2790007
Low_Birth_Weight	Minimum	0	0.3024611	0.2683109	0.3366112
Low_Vegetative_Cover	Maximum	1	0.3643614	0.3445205	0.3842023
Low_Vegetative_Cover	Minimum	0	0.1060080	0.0845750	0.1274409
MI_Hospitalization_Rate	Maximum	1	0.3277221	0.3064909	0.3489533
MI_Hospitalization_Rate	Minimum	0	0.2301848	0.2136079	0.2467618
Municipal_Waste_Combustors	Maximum	1	0.0637277	-0.0233861	0.1508414
Municipal_Waste_Combustors	Minimum	0	0.2754219	0.2680023	0.2828415
Native_Indigenous	Maximum	1	0.3023197	0.2863350	0.3183044
Native_Indigenous	Minimum	0	0.2573778	0.2461835	0.2685721
Oil_Storage	Maximum	1	0.2773949	0.2235616	0.3312282
Oil_Storage	Minimum	0	0.2750031	0.2674675	0.2825387
Particulate_Matter_25	Maximum	1	0.4093395	0.3861806	0.4324984
Particulate_Matter_25	Minimum	0	0.0994194	0.0824634	0.1163754
Population_No_College	Maximum	1	0.3208474	0.2942878	0.3474069
Population_No_College	Minimum	0	0.2304329	0.2071805	0.2536853
Power_Generation_Facilities	Maximum	1	0.3611873	0.3190178	0.4033567
Power_Generation_Facilities	Minimum	0	0.2687542	0.2609204	0.2765879
Premature_Deaths	Maximum	1	0.2833437	0.2610461	0.3056412
Premature_Deaths	Minimum	0	0.2636969	0.2346059	0.2927879
RMP_Sites	Maximum	1	0.2402450	0.2244347	0.2560552
RMP_Sites	Minimum	0	0.3417144	0.3104858	0.3729430
Redlining_Updated	Maximum	1	0.3355555	0.3188169	0.3522942
Redlining_Updated	Minimum	0	0.2095616	0.1943443	0.2247788
Remediation_Sites	Maximum	1	0.2802611	0.2543555	0.3061666
Remediation_Sites	Minimum	0	0.2741829	0.2657160	0.2826497
Rent_Percent_Income	Maximum	1	0.2899575	0.2724145	0.3075004
Rent_Percent_Income	Minimum	0	0.2578088	0.2386968	0.2769209
Renter_Percent	Maximum	1	0.2629663	0.2407047	0.2852278
Renter_Percent	Minimum	0	0.2982412	0.2555321	0.3409502
Scrap_Metal_Processing	Maximum	1	0.2777251	0.2216999	0.3337503
Scrap_Metal_Processing	Minimum	0	0.2750026	0.2674902	0.2825150
Traffic_Number_Vehicles	Maximum	1	0.2689636	0.2491497	0.2887775
Traffic_Number_Vehicles	Minimum	0	0.2861673	0.2514029	0.3209317
Traffic_Truck_Highways	Maximum	1	0.3481538	0.3242100	0.3720976
Traffic_Truck_Highways	Minimum	0	0.1901671	0.1677590	0.2125752
Unemployment_Rate	Maximum	1	0.2981123	0.2825411	0.3136835
Unemployment_Rate	Minimum	0	0.2471579	0.2301714	0.2641445
Wastewater_Discharge	Maximum	1	0.2226318	0.2052831	0.2399806
Wastewater_Discharge	Minimum	0	0.2988443	0.2878022	0.3098863

The table above shows predicted Disadvantage Scores from the beta regression model (m1) when each predictor is set to its minimum and maximum observed values (coded 0 and 1 in the table) while all other variables are held constant. This simulation-based approach isolates how a change in a single factor moves the predicted score. For each prediction, the table also reports the lower and upper bounds of the 95% confidence interval, giving a range of plausible values. Comparing the two predictions for a variable shows its effect size across its full observed range and highlights which factors contribute most to structural disadvantage at the neighborhood level.
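To illustrate the min/max simulation, the sketch below computes predicted scores in the same general way under stated assumptions. It assumes m1 uses a logit link for the mean (the default in R's betareg package) and that predictors are rescaled so their observed minimum and maximum are 0 and 1, as the table's coding suggests. The intercept, coefficients, reference means, and the helper names `predict_mean()` and `min_max_prediction()` are hypothetical placeholders, not values or functions from m1.

```r
# Minimal sketch of min/max prediction under a logit-link beta regression.
# All numbers below are HYPOTHETICAL placeholders, not estimates from m1.
intercept <- -2.5
beta_hat  <- c(PM25 = 1.4, Low_Veg = 1.2, Redlining = 0.6)  # logit-scale slopes

# Reference values for "held constant" (e.g., means of 0-to-1 scaled predictors)
x_means <- c(PM25 = 0.45, Low_Veg = 0.50, Redlining = 0.30)

# Predicted mean score = inverse logit of the linear predictor
predict_mean <- function(x, beta, b0) {
  plogis(b0 + sum(beta[names(x)] * x))
}

# Move one predictor to 0 (minimum) and 1 (maximum); hold the others at x_ref
min_max_prediction <- function(var, x_ref, beta, b0) {
  x_min <- x_ref
  x_min[[var]] <- 0
  x_max <- x_ref
  x_max[[var]] <- 1
  c(Minimum = predict_mean(x_min, beta, b0),
    Maximum = predict_mean(x_max, beta, b0))
}

results <- t(sapply(names(x_means), min_max_prediction,
                    x_ref = x_means, beta = beta_hat, b0 = intercept))
results <- cbind(results, Diff = results[, "Maximum"] - results[, "Minimum"])
print(round(results, 3))

# The same slope yields a smaller shift when the other predictors sit lower,
# because plogis() is nonlinear on the 0-to-1 score scale
low_ref <- c(PM25 = 0.1, Low_Veg = 0.1, Redlining = 0.1)
redlining_low <- min_max_prediction("Redlining", low_ref, beta_hat, intercept)
print(round(redlining_low, 3))
```

Because `plogis()` is nonlinear, each min/max shift depends on the reference values chosen for the other predictors, so results of this kind are conditional on how the "held constant" values were set.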

Dumbbell Plot of Predicted Disadvantage Scores of Top 15 Variables

This dumbbell plot summarizes how the predicted Disadvantage Score changes when each of the top 15 predictors (ranked by the absolute size of the change) is moved from its minimum to its maximum observed value, holding all other variables constant. Each row represents one variable: the blue dot marks the predicted score at the minimum and the red dot marks the predicted score at the maximum. The length of the connecting line shows the magnitude of the change, and rows are sorted so that the variables with the largest shifts appear at the top. Because every variable is plotted on the same predicted-score axis, the plot makes it easy to compare variable impacts from the beta regression simulation.

library(ggplot2)
library(dplyr)
library(tidyr)

# Prepare plot data: one row per variable, with predicted scores at the minimum and maximum
plot_data <- m1_table %>%
  select(Variable, `Score Scaling`, `Predicted Score`) %>%
  pivot_wider(names_from = `Score Scaling`, values_from = `Predicted Score`) %>%
  mutate(Diff = abs(Maximum - Minimum)) %>%      # size of the shift, ignoring direction
  arrange(desc(Diff)) %>%
  slice_head(n = 15) %>%                          # keep the top 15 variables
  mutate(Variable = reorder(Variable, Diff))      # order the y-axis by shift size

# Create the dumbbell plot (largest shifts at the top); y and yend share one ordered factor
ggplot(plot_data, aes(y = Variable)) +
  geom_segment(aes(x = Minimum, xend = Maximum, yend = Variable), color = "black") +
  geom_point(aes(x = Minimum), color = "blue", size = 3) +
  geom_point(aes(x = Maximum), color = "red", size = 3) +
  labs(
    title = "Change in Predicted Disadvantage Score\nFrom Minimum to Maximum Predictor Values\n(Top 15 Factors)",
    x = "Predicted Score",
    y = "Variable",
    caption = "Blue = Score at Minimum | Red = Score at Maximum"
  ) +
  theme_bw() +
  theme(
    axis.text.y = element_text(face = "bold", color = "black", size = 11),
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    axis.title = element_text(size = 12),
    plot.caption = element_text(size = 9, color = "gray30", hjust = 1),
    plot.margin = margin(t = 10, r = 60, b = 10, l = 10)
  )

Strongest-Impact Predictors

Asthma_ED_Rate
- Interpretation: Higher rates of asthma-related emergency department (ED) visits predict a large increase in the Disadvantage Score.
- Possible Explanation: This variable captures both chronic health disparities and poor environmental conditions (e.g., air quality, housing quality), which are central to structural inequity in urban settings.

Benzene_Concentration
- Interpretation: Higher modeled concentrations of airborne benzene are associated with a substantial increase in predicted disadvantage scores.
- Possible Explanation: Benzene is a known carcinogen that is often emitted by industrial activity and vehicle traffic. Its presence reinforces environmental injustice in areas with legacy pollution or proximity to hazardous land uses.

Agricultural_Land_Use
- Interpretation: This variable shows the largest predicted score increase, suggesting that neighborhoods with a higher share of land classified as agricultural (or with limited urban development) face considerably more structural disadvantage.
- Possible Explanation: In the NYC context, this may reflect under-resourced or industrial-adjacent areas that carry an agricultural classification, or land-use legacies that isolate residents from public resources and infrastructure.

Drive_Time_Healthcare
- Interpretation: Longer travel times to healthcare facilities are associated with higher predicted disadvantage scores.
- Possible Explanation: Barriers to accessing care likely amplify existing social, economic, and health-related vulnerabilities.

COPD_ED_Rate
- Interpretation: Unlike asthma, higher rates of COPD-related emergency visits are associated with lower predicted disadvantage.
- Possible Explanation: This result is counterintuitive and may stem from overlap (collinearity) with other health or demographic indicators in the model. It may also reflect older populations living in somewhat more stable or medically supported communities.

Particulate_Matter_25
- Interpretation: High levels of PM2.5 pollution are strongly associated with elevated disadvantage: the predicted score rises from about 0.10 at the minimum to about 0.41 at the maximum.
- Possible Explanation: This reinforces the impact of environmental hazards on structural vulnerability, since air pollution is directly tied to health disparities, respiratory illness, and environmental injustice.

Low_Vegetative_Cover
- Interpretation: Areas with little green space show higher predicted scores, rising from about 0.11 at the minimum to about 0.36 at the maximum.
- Possible Explanation: Green infrastructure often tracks public investment and supports heat mitigation and mental and physical health, so its absence contributes to cumulative disadvantage.

Redlining_Updated
- Interpretation: Historically redlined neighborhoods show substantially higher predicted scores (about 0.21 at the minimum versus about 0.34 at the maximum, with non-overlapping 95% confidence intervals).
- Possible Explanation: This reflects the enduring legacy of discriminatory housing policies and disinvestment, generations after the official practices ended.

Variables Where Higher Values Lower the Predicted Score

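Municipal_Waste_Combustors
- Interpretation: Surprisingly, areas with the highest exposure to this facility type show a much lower predicted score (about 0.06 at the maximum versus about 0.28 at the minimum). However, the 95% confidence interval at the maximum is very wide (about -0.02 to 0.15) and dips below zero, outside the range a beta-regression mean can take, so this estimate is imprecise and should be read with caution.
- Possible Explanation: This may be a geographic artifact or the result of collinearity with other industrial land-use features. It is also possible that these facilities sit in already heavily industrial areas where some community resources exist as well.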
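The negative lower bound for Municipal_Waste_Combustors at its maximum (-0.0233861 in the table) deserves a closer look. A beta-regression mean lies strictly between 0 and 1, so a bound below zero suggests the interval was formed symmetrically on the response scale. The sketch below contrasts that kind of normal-approximation (delta-method) interval with one common alternative: simulating on the logit scale and back-transforming each draw. The link-scale estimate `eta_hat` and its standard error `se_eta` are hypothetical values chosen to mimic a small, uncertain prediction; this is not necessarily the procedure used to build the table.

```r
set.seed(2023)  # reproducible draws

# HYPOTHETICAL link-scale (logit) prediction and its standard error
eta_hat <- -2.7
se_eta  <- 0.9

# (1) Normal approximation on the response scale (delta method):
#     d plogis(eta) / d eta = mu * (1 - mu)
mu_hat  <- plogis(eta_hat)
se_mu   <- se_eta * mu_hat * (1 - mu_hat)
wald_ci <- mu_hat + c(-1, 1) * qnorm(0.975) * se_mu

# (2) Simulate on the logit scale, then map each draw back to (0, 1)
eta_draws <- rnorm(10000, mean = eta_hat, sd = se_eta)
sim_ci    <- quantile(plogis(eta_draws), probs = c(0.025, 0.975))

cat("Point estimate:     ", round(mu_hat, 3), "\n")
cat("Normal-approx. 95%: ", round(wald_ci, 3), "\n")  # lower bound < 0 here
cat("Simulated 95%:      ", round(sim_ci, 3), "\n")   # always inside (0, 1)
```

The simulated interval is asymmetric around the point estimate and stays inside the valid range, which usually describes uncertainty better for scores close to 0 or 1.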

Why This Matters

This analysis shows that certain environmental and health-related variables carry substantial weight in predicting structural disadvantage across NYC neighborhoods. Not every relationship is linear or intuitive: while asthma and PM2.5 are clearly linked to greater vulnerability, some industrial indicators (e.g., Municipal_Waste_Combustors) and health indicators (e.g., COPD_ED_Rate) show inverse patterns, underscoring the complexity of urban systems. These findings can guide targeted interventions in the most affected communities and support policies focused on reducing environmental harm, improving healthcare access, and addressing historical injustices.
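The inverse patterns can be read directly from the min/max prediction table. The sketch below copies seven rows from that table (predicted scores only) into a base-R data frame, computes the signed change from minimum to maximum, and ranks variables by the size of the change while keeping its direction. The object names `tbl` and `inverse` are illustrative; with the full table, the same steps apply to every predictor.

```r
# Seven rows copied from the min/max prediction table (predicted scores only)
tbl <- data.frame(
  Variable = c("Particulate_Matter_25", "Low_Vegetative_Cover", "Redlining_Updated",
               "Industrial_Land_Use", "RMP_Sites", "Wastewater_Discharge",
               "Municipal_Waste_Combustors"),
  Minimum  = c(0.0994194, 0.1060080, 0.2095616, 0.2854728, 0.3417144, 0.2988443, 0.2754219),
  Maximum  = c(0.4093395, 0.3643614, 0.3355555, 0.2431812, 0.2402450, 0.2226318, 0.0637277),
  stringsAsFactors = FALSE
)

# Signed change: positive means a higher predictor value raises the predicted score
tbl$Change <- tbl$Maximum - tbl$Minimum

# Rank by magnitude (as in the dumbbell plot) while keeping the direction
tbl <- tbl[order(-abs(tbl$Change)), ]
tbl$Direction <- ifelse(tbl$Change > 0, "raises score", "lowers score")
print(tbl, row.names = FALSE, digits = 3)

# Variables whose predicted score is lower at the maximum than at the minimum
inverse <- tbl$Variable[tbl$Change < 0]
print(inverse)
```

Among these seven rows, four predictors (Municipal_Waste_Combustors, RMP_Sites, Wastewater_Discharge, and Industrial_Land_Use) have lower predicted scores at their maximum, and Municipal_Waste_Combustors has the third-largest absolute shift.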

Conclusion

The goal of this analysis was to identify which environmental, health, and population-level factors have the greatest influence on New York City’s Disadvantage Score, and to evaluate the score’s potential as a tool for understanding structural conditions that may influence patterns of gun violence. Using beta regression and simulation modeling across two thematic models, and then combining the most impactful variables into a unified model (m1), I was able to isolate the predictors associated with the largest changes in predicted disadvantage across NYC neighborhoods.

The findings strongly support both hypotheses. First, the analysis shows that areas with combined environmental and social burdens—such as poor air quality, limited green space, high asthma-related emergency visits, and long travel times to healthcare—are associated with significantly higher predicted Disadvantage Scores. This suggests that neighborhoods facing overlapping vulnerabilities are more structurally disadvantaged and may, as hypothesized, be more susceptible to gun violence due to cumulative stressors, limited access to resources, and chronic disinvestment.

Second, the Disadvantage Score itself emerges as a meaningful and reliable indicator. The variables that contributed most to elevated scores, such as agricultural land use, particulate matter exposure, and the legacy of redlining, correspond with patterns of marginalization that have long been associated with community-level violence and instability. In this way, the score not only reflects theoretical dimensions of structural disadvantage but also holds practical value for identifying high-need areas for policy action.

Ultimately, this analysis affirms the utility of the Disadvantage Score as both a diagnostic and strategic tool. By pinpointing which structural factors most influence disadvantage—and recognizing their overlap with public health and safety concerns like gun violence—this work supports targeted, equity-driven interventions that address root causes rather than symptoms. Moving forward, this research provides a data-informed foundation for public health, urban planning, and community safety efforts aimed at reducing violence and promoting neighborhood resilience across New York City.

References

New York State Energy Research and Development Authority (NYSERDA). 2023a. “2023 NY Disadvantaged Neighborhood Data Dictionary.” New York State Energy Research and Development Authority (NYSERDA). https://data.ny.gov/Energy-Environment/Final-Disadvantaged-Communities-DAC-2023/2e6c-s6fp.

———. 2023b. “Final Disadvantaged Communities (DAC) 2023.” data.ny.gov. https://data.ny.gov/Energy-Environment/Final-Disadvantaged-Communities-DAC-2023/2e6c-s6fp.

“Final Disadvantaged Communities (DAC) Overview.” 2023. New York State Energy Research and Development Authority (NYSERDA). https://climate.ny.gov/Resources/Disadvantaged-Communities-Criteria.

New York City Police Department. n.d. “NYPD Shooting Incident Data (Historic).”

Beyond the Bullet: How Structural Disadvantage Shapes Gun Violence in NYC Neighborhoods (2023)

Michael Morello

2025-02-17
