Neighborhood Disadvantage and Gun Violence in NYC: A 2023 Analysis

Introduction

This study examines the relationship between neighborhood disadvantage scores and gun violence in New York City in 2023. With ongoing discussions about public safety and crime rates, this research provides a data-driven approach to understanding the link between socioeconomic conditions and shootings at the neighborhood level. Using shooting data from the New York Police Department (NYPD) and disadvantaged neighborhood classifications from the New York State Energy Research and Development Authority (NYSERDA), this study seeks to determine whether neighborhoods with higher disadvantage scores experience more gun violence. The findings reveal a strong correlation between these two factors, indicating that disadvantaged communities face disproportionately higher rates of shootings per 100,000 people.

Research Objectives

This analysis aims to:

  1. Assess the relationship between neighborhood disadvantage score and gun violence in NYC.

  2. Identify the most influential predictors contributing to neighborhood disadvantage score.

  3. Evaluate the relative impact of environmental versus socioeconomic factors on neighborhood disadvantage.

  4. Provide actionable insights for policymakers and community organizations to address gun violence in disadvantaged neighborhoods.

Hypothesis

The hypothesis guiding this analysis is that neighborhoods with higher disadvantage scores will experience higher rates of gun violence. This hypothesis is based on the premise that socioeconomic factors, such as poverty, unemployment, and lack of access to resources, contribute to an environment conducive to gun violence. By examining the correlation between disadvantage scores and shooting incidents, this study aims to validate or refute this hypothesis. We will also explore the potential impact of environmental factors, such as pollution and hazardous waste sites, on neighborhood disadvantage and gun violence rates. The analysis will provide a comprehensive understanding of the complex interplay between socioeconomic and environmental factors in shaping gun violence patterns in NYC.

Data Sources

This analysis relies on two primary data sets:

  1. NYPD Shooting Incident Data (2023)
     • Obtained from the New York Police Department (NYPD) (Department, n.d.), this data set provides detailed information on all recorded shootings in NYC from 2006 to 2023.
     • Includes variables such as incident date/time, location (latitude/longitude), victim demographics (age, race, sex), and whether the shooting was fatal or non-fatal.
     • This data set enables the calculation of shooting rates per 100,000 residents at the neighborhood level by matching incidents to NYC census tracts.

  2. New York State Disadvantaged Communities Data (2023)
     • Defines disadvantaged neighborhoods based on environmental burdens, climate risks, and socioeconomic vulnerabilities.
     • Published by the New York State Energy Research and Development Authority (NYSERDA) (Development Authority (NYSERDA) 2023b), this data set identifies disadvantaged communities based on a combination of socioeconomic, environmental, and health-related factors.
     • Neighborhoods are assigned a Disadvantaged Score ranging from 0 (least disadvantaged) to 1 (most disadvantaged), determined by indicators such as poverty rates, unemployment, median income, racial composition, housing burden, and environmental risks.
     • This data set allows for a quantitative measure of neighborhood-level disadvantage, which is used to assess its relationship with gun violence.

By integrating these two data sets, this study evaluates whether higher levels of neighborhood disadvantage correlate with increased rates of gun violence. The analysis also explores the potential impact of environmental and socioeconomic factors on the Disadvantaged Score, providing insights into the underlying causes of gun violence in NYC.

Methodology

  1. Geo-Spatial Analysis: Mapping Shootings to Census Tracts
     • Because the two data sets use different geographic units (point-level data for shootings vs. census tract-level data for disadvantage), shootings were spatially matched to NYC census tracts using GIS tools and R’s sf package.
     • The total number of shootings within each census tract was calculated, allowing for neighborhood-level comparison.

  2. Statistical Analysis: Correlation and Regression
     • A correlation analysis was conducted to assess the relationship between the Disadvantaged Score and the number of shootings per 100,000 residents.
     • A linear regression model was fitted to quantify the relationship between the Disadvantaged Score and shootings per 100,000 residents, controlling for other demographic factors.
     • The model was evaluated for statistical significance, and the coefficients were interpreted to understand the impact of neighborhood disadvantage on gun violence.

  3. Beta-Regression Analysis
     • A beta regression model was applied to analyze the most impactful predictors of neighborhood disadvantage.
     • This model is suitable for bounded response variables, such as the Disadvantaged Score, which ranges from 0 to 1.
     • The beta regression model was fitted to identify the most significant predictors of neighborhood disadvantage for both environmental and socioeconomic factors.
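
The correlation and regression step described above can be sketched in base R as follows. This is only an illustration: the data frame, its column names, and every value in it are made up, and the study's actual model code is not shown in this report.

```r
# Toy tract-level data; every value here is invented for illustration only.
tracts <- data.frame(
  disadvantage_score = c(0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95),
  shootings_per_100k = c(0, 4, 6, 15, 12, 25, 30)
)

# Pearson correlation between disadvantage and the shooting rate.
r <- cor(tracts$disadvantage_score, tracts$shootings_per_100k)

# Simple linear regression of the shooting rate on the disadvantage score.
fit <- lm(shootings_per_100k ~ disadvantage_score, data = tracts)
slope <- coef(fit)[["disadvantage_score"]]

# In a one-predictor model, R-squared equals the squared correlation.
r_squared <- summary(fit)$r.squared

print(round(c(r = r, slope = slope, r_squared = r_squared), 3))
```

With a single predictor, the share of variance explained is simply the square of the correlation coefficient, which is a quick consistency check between the two statistics.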

Limitations of the Analysis

  • Data Quality: The analysis relies on the accuracy and completeness of the data sources used. Any errors or omissions in the data could affect the results.

  • Ecological Fallacy: The analysis is based on aggregate data at the neighborhood level, which may not accurately reflect individual-level relationships. Caution should be taken when generalizing findings to individuals.

  • Temporal Changes: The analysis is based on data from a specific time period (2023). Changes in neighborhood conditions or gun violence trends over time may not be captured.

  • Confounding Variables: While the analysis controls for various factors, there may still be unmeasured confounding variables that influence the relationship between neighborhood disadvantage and gun violence.

Key Findings

  1. Correlation Between Disadvantaged Score and Shootings: The analysis revealed a strong positive correlation between the Disadvantaged Score and the number of shootings per 100,000 residents. Neighborhoods with higher disadvantage scores experienced significantly more gun violence. The correlation coefficient was found to be 0.75, indicating a strong relationship. We concluded that with every 0.10 increase in a neighborhood’s disadvantage score, the number of shootings per 100,000 people increased by 3.04.

  2. Statistical Significance: The linear regression model showed that the Disadvantaged Score was a statistically significant predictor of shootings per 100,000 residents (p < 0.001). The model explained approximately 60% of the variance in shooting rates, indicating a strong relationship between neighborhood disadvantage and gun violence.

  3. Beta Regression Analysis: The beta regression model identified several significant predictors of neighborhood disadvantage, including environmental factors such as particulate matter exposure, traffic density, and proximity to hazardous waste sites. Socioeconomic factors such as poverty rates and housing burden also emerged as significant predictors. Percentile ranking of the average annual age-adjusted emergency department visit rate for asthma was the most significant predictor of neighborhood disadvantage (Asthma_ED_Rate), followed by percentile ranking of the modeled annual average ambient benzene concentration based on emissions from 2014 (Benzene_Concentration).

Research

Understanding Gun Violence in NYC during 2023

Gun violence is a significant public safety concern in New York City, and the NYPD works to address it through various initiatives and community engagement efforts. The NYPD shooting data set covers recorded incidents from 2006 onward and includes the date, time, and location of each incident, along with demographic information about the victims and suspects involved. The data is collected from various sources, including police reports, 911 calls, and hospital records. In 2023, New York City recorded 1,250 shooting incidents across its five boroughs: Manhattan, Brooklyn, Queens, the Bronx, and Staten Island. Shootings occurred in every borough but in very different numbers, with more than two-thirds (68%) taking place in either the Bronx or Brooklyn. The Bronx had the highest number of shootings, with 439 recorded incidents, while Staten Island had the lowest, with only 31. The following table summarizes the number of shootings in each borough in 2023:
NYC Shootings by Borough in 2023
Borough          Shootings in 2023   Percentage of Total Shootings (%)
BRONX            439                 35.1
BROOKLYN         413                 33.0
MANHATTAN        190                 15.2
QUEENS           177                 14.2
STATEN ISLAND    31                  2.5
Total            1250                100.0
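
As a quick check on the table, the percentages can be recomputed from the borough counts. The sketch below uses only the counts reported above; the variable names are illustrative.

```r
# Shooting counts by borough, copied from the table above.
shootings <- c(Bronx = 439, Brooklyn = 413, Manhattan = 190,
               Queens = 177, `Staten Island` = 31)

# Citywide total and each borough's share of it, in percent.
total <- sum(shootings)
pct <- round(100 * shootings / total, 1)

# Combined share of the Bronx and Brooklyn, in percent.
bronx_brooklyn_share <- 100 * sum(shootings[c("Bronx", "Brooklyn")]) / total

print(total)
print(pct)
print(round(bronx_brooklyn_share, 1))
```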
One pattern is constant across all boroughs: the most victimized demographic group. The table below summarizes that group in each borough, along with the number of victims and the percentage of total shootings in that borough. In every borough, Black males aged 25 to 44 are the most victimized group. Brooklyn had both the highest number of victims in this category (164) and the highest share of its borough total, with just under 40% (39.7%) of all shooting victims in Brooklyn being Black males aged 25 to 44.
Most Victimized Demographic Group by NYC Borough in 2023
Borough          Top Victim Group     Number of Victims   Percent of Borough Total (%)
BRONX            BLACK / M / 25-44    103                 23.5
BROOKLYN         BLACK / M / 25-44    164                 39.7
MANHATTAN        BLACK / M / 25-44    65                  34.2
QUEENS           BLACK / M / 25-44    47                  26.6
STATEN ISLAND    BLACK / M / 25-44    9                   29.0

To better visualize the data, below is a map of NYC, with all shooting instances from 2023 plotted on the map. The map is interactive, allowing you to zoom in and out, and click on each shooting instance to see more details about the incident including the date, time, precinct, and victim demographics and fatality status. The map uses the R package leaflet to create an interactive map of NYC, with shooting instances represented as red dots. The map is centered on NYC, and the zoom level can be adjusted for better visibility of specific areas.

While this visualization provides a detailed view of the exact locations of individual gun violence incidents, it does not effectively convey where gun violence is most concentrated in NYC. To better illustrate these concentrations during 2023, the data from this map was used to create a heat map.

The heat map highlights the geographic concentration of gun violence incidents across New York City in 2023. Areas with higher concentrations appear in warmer colors, indicating where shootings were most frequent. Notably, the Bronx and Brooklyn stand out as the boroughs with the highest density of gun violence, suggesting these areas experienced a disproportionate share of incidents compared to the rest of the city.

Examining NYC’s Disadvantaged Neighborhoods in 2023

This section explores the structural and environmental challenges faced by New York City communities through the lens of the 2023 NYC Disadvantaged Neighborhoods data set. The Disadvantage Score is a composite metric designed to capture the cumulative burdens—environmental, health-related, and socioeconomic—experienced by neighborhoods. Scores range from 0 to 1, with higher values indicating greater levels of disadvantage and deeper systemic and structural barriers to well-being.

The data set, developed by the New York State Department of Environmental Conservation (NYSDEC) in collaboration with the New York State Climate Justice Working Group, incorporates 45 indicators spanning domains such as pollution exposure, housing quality, healthcare access, poverty, and historical disinvestment. This analysis is part of a broader project investigating the relationship between neighborhood disadvantage and gun violence in New York City, aiming to uncover patterns of systemic inequality and support the development of more equitable policy solutions.

In this portion of the analysis, we will be focusing on the measurement Percentile_Rank_Combined_NYC to identify the most disadvantaged neighborhoods in New York City. This variable provides a standardized percentile score for each census tract, ranging from 0.00 to 1.00, where higher values indicate greater relative disadvantage. The score is calculated based on a composite of two indices: environmental burden and population vulnerability. The environmental burden index includes indicators such as exposure to air pollution, proximity to hazardous waste sites, and climate-related risks like flooding or extreme heat. The population vulnerability index incorporates socioeconomic and health-related factors such as poverty, housing burden, chronic disease prevalence, and access to healthcare.

By combining these measures into a single percentile rank, Percentile_Rank_Combined_NYC allows for a comprehensive and comparative assessment of neighborhood-level disadvantage across the five boroughs. A tract ranked at 0.90, for example, is more disadvantaged than 90% of tracts citywide, placing it among the top 10% most burdened communities. This ranking system plays an essential role in environmental justice initiatives and equitable policy design, as it enables policymakers and researchers to target resources, interventions, and funding to neighborhoods facing the greatest cumulative challenges.

The map below visualizes the most disadvantaged neighborhoods in New York City during 2023, as defined by the Disadvantaged Score. It highlights census tracts flagged “Designated as DAC” (Disadvantaged Communities) and shows the geographic distribution of disadvantage across the city. The map is interactive, allowing users to explore specific neighborhoods and their corresponding disadvantage scores. It was created using the R package leaflet, which allows for interactive mapping and visualization of spatial data, and is centered on New York City with a zoom level that provides a clear view of the neighborhoods.

While the map does a good job of visualizing the most disadvantaged neighborhoods in New York City, it does not provide a clear understanding of how these neighborhoods compare to one another. To address this, we can create a table that summarizes the average disadvantage score for each borough, along with the number of census tracts and the percentage of disadvantaged tracts within each borough. This table will help us understand the relative levels of disadvantage across different areas of the city.

Disadvantage by NYC Borough: Average Score and Tract Stats (2023)
Borough          Avg. Neighborhood Disadvantage Score   Total Tracts   Disadvantaged Tracts   % Disadvantaged
Bronx            0.75                                   339            283                    83.5
Brooklyn         0.47                                   761            308                    40.5
Manhattan        0.43                                   288            120                    41.7
Queens           0.40                                   669            211                    31.5
Staten Island    0.38                                   110            36                     32.7
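
The "% Disadvantaged" column is the number of disadvantaged tracts divided by the total tracts in each borough. A minimal sketch using the counts from the table (the column names are illustrative, and the citywide figure is derived here rather than reported in the source):

```r
# Tract counts by borough, copied from the table above.
boroughs <- data.frame(
  borough = c("Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"),
  total_tracts = c(339, 761, 288, 669, 110),
  disadvantaged_tracts = c(283, 308, 120, 211, 36)
)

# Share of tracts designated as disadvantaged, in percent.
boroughs$pct_disadvantaged <- round(
  100 * boroughs$disadvantaged_tracts / boroughs$total_tracts, 1
)

# Derived citywide share across all five boroughs.
citywide_pct <- 100 * sum(boroughs$disadvantaged_tracts) / sum(boroughs$total_tracts)

print(boroughs)
print(round(citywide_pct, 1))
```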

As we can see from the table, the Bronx has the highest average disadvantage score (0.75), indicating that it is the most disadvantaged borough in New York City. This is followed by Brooklyn (0.47). The table also shows that the Bronx has the highest percentage of disadvantaged tracts (83.5%). This highlights the significant disparities in disadvantage across different boroughs in New York City. While the Bronx and Brooklyn have the highest average disadvantage scores, they also have the highest number of disadvantaged tracts, indicating that these areas are facing significant challenges related to environmental burdens, socioeconomic vulnerabilities, and health risks.

Relationship Between Disadvantage Score and Gun Violence in NYC During 2023

To better understand the relationship between neighborhood disadvantage and gun violence in New York City, we can create a summary table that combines the shooting data with the disadvantage data. This table will provide insights into the number of shootings in each borough, the percentage of total shootings, and the average disadvantage score for each borough. The table will also include a total row that summarizes the overall statistics for all boroughs combined.
NYC Borough Summary: 2023 Shootings and Disadvantage
Borough          Shootings in 2023   % of Shootings   Avg. Disadvantage Score   % Disadvantaged
Bronx            439                 35.1             0.75                      83.5
Brooklyn         413                 33.0             0.47                      40.5
Manhattan        190                 15.2             0.43                      41.7
Queens           177                 14.2             0.40                      31.5
Staten Island    31                  2.5              0.38                      32.7
Total            1250                100.0            NA                        NA

The table above summarizes the relationship between neighborhood disadvantage and gun violence in New York City during 2023. It includes the number of shootings in each borough, the percentage of total shootings, the average disadvantage score, and the percentage of disadvantaged neighborhoods. The Bronx has both the highest number of shootings (439) and the highest average disadvantage score (0.75). At the borough level, shootings fall in step with the average disadvantage score: Staten Island has the lowest average score (0.38) and the fewest shootings (31). This pattern is consistent with a positive association between neighborhood disadvantage and gun violence, although five boroughs are too few data points to establish it on their own.

For a more detailed understanding of the relationship between neighborhood disadvantage and gun violence, we can create a map that overlays the shooting data on top of the disadvantaged neighborhoods. This map will allow us to visualize the geographic distribution of gun violence in relation to the most disadvantaged neighborhoods in New York City. This map was created using the R package leaflet, which allows for interactive mapping and visualization of spatial data. The map is centered on New York City, with a zoom level that provides a clear view of the neighborhoods. The shooting incidents are represented as red dots, while the disadvantaged neighborhoods are highlighted in blue. The map allows users to explore the geographic distribution of gun violence in relation to the most disadvantaged neighborhoods in New York City.

However, while these descriptive findings provide important insight, they do not in themselves establish a causal or statistically significant relationship. To move beyond visual trends and descriptive summaries, a regression analysis was conducted to formally test the strength and direction of the relationship between neighborhood disadvantage and gun violence. This approach allows us to control for potential confounding factors and assess whether the association observed in the map and table holds true in a statistical model. By incorporating this additional layer of analysis, we are better equipped to evaluate whether neighborhood disadvantage is a meaningful predictor of gun violence, and to what extent the two variables are systematically linked.

Regression Analysis of Disadvantage Score and Shootings per 100,000 Persons

To determine the relationship between neighborhood disadvantage and gun violence in New York City, we conduct a regression analysis. This analysis helps us understand how the Disadvantaged Score (Percentile_Rank_Combined_NYC) relates to the number of shootings per 100,000 persons in each neighborhood. The Disadvantaged Score serves as the independent variable because it captures a range of socioeconomic and environmental factors that may contribute to gun violence. The dependent variable is the number of shootings per 100,000 persons in each neighborhood, which accounts for population size and provides a standardized measure of gun violence that can be compared across neighborhoods. Because census tracts are the unit of analysis, the shooting data must be merged with census data to obtain population counts for each tract. Rates per 100,000 persons are a common metric in public health and criminology for standardizing event counts (like shootings) across populations of different sizes.
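
The merge and rate calculation described above might look like the following base R sketch. The tract identifiers, populations, and counts are hypothetical, and treating zero-population tracts as missing is an assumption rather than a documented choice of this study.

```r
# Hypothetical tract populations (GEOID values and counts are made up).
population <- data.frame(
  GEOID = c("A", "B", "C", "D"),
  population = c(4000, 2500, 5000, 0)
)

# Hypothetical shooting counts; tracts "C" and "D" have no recorded shootings.
shooting_counts <- data.frame(
  GEOID = c("A", "B"),
  shootings = c(2, 1)
)

# Left join keeps every tract, including those without shootings.
rates <- merge(population, shooting_counts, by = "GEOID", all.x = TRUE)
rates$shootings[is.na(rates$shootings)] <- 0

# Rate per 100,000 residents; zero-population tracts are set to NA.
rates$shootings_per_100k <- ifelse(
  rates$population > 0,
  rates$shootings / rates$population * 100000,
  NA_real_
)

print(rates)
```

Keeping tracts with zero shootings in the data matters here: dropping them would leave only tracts with at least one incident and distort the fitted relationship.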

Scatterplot of Disadvantaged Score vs. Shootings per 100,000

Scatterplot Explanation

  • Each blue dot represents a geographic area (GEOID).
  • The X-axis (Disadvantaged Score) shows how disadvantaged an area is (higher = more disadvantaged).
  • The Y-axis (Shootings per 100,000 people) shows the number of shootings in that area, adjusted for population size.
  • The red line (trend line) shows the overall pattern.

Scatterplot Analysis

As the Disadvantaged Score increases, the number of shootings per 100,000 people also increases. This positive correlation suggests that more disadvantaged areas tend to have higher rates of gun violence.

Scatterplot Conclusion

Key Takeaway: The scatter plot supports the idea that neighborhoods with higher social disadvantage experience more shootings per 100,000 people. While there is some variation, the upward trend (red line) indicates that disadvantage is linked to higher gun violence.

Regression Analysis Table

Regression Results: Disadvantaged Score vs. Shootings per 100k
Variable              Estimate     Std. Error   T-Statistic   P-Value
(Intercept)           -2.301339    1.042038     -2.208498     0.0273608
Disadvantaged_Score   30.426136    1.756877     17.318305     0.0000000

Interpretation of Regression Results

Because the Disadvantaged Score is a percentile rank expressed as a proportion (0 to 1), we interpret the coefficient in percentile-point changes. Each 0.01 increase in the Disadvantaged Score (one percentile point) is associated with an increase of ~0.30 shootings per 100,000 people. A 10-point increase (e.g., from 0.20 to 0.30) is linked to ~3 additional shootings per 100,000 people. A 50-point increase (e.g., from 0.20 to 0.70) is linked to ~15 additional shootings per 100,000 people.
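
The arithmetic behind these figures is the slope multiplied by the change in score. A short sketch using the coefficients from the regression table (variable names are illustrative):

```r
# Coefficients copied from the regression table above.
intercept <- -2.301339
slope <- 30.426136

# Additional shootings per 100,000 for score increases of 0.01, 0.10, and 0.50.
delta_score <- c(0.01, 0.10, 0.50)
extra_shootings <- slope * delta_score

# Fitted rates at scores of 0.20 and 0.70; their difference is the 0.50 effect.
predicted <- intercept + slope * c(0.20, 0.70)

print(round(extra_shootings, 2))
print(round(predicted, 2))
```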

Regression Analysis Conclusion

There is a strong positive correlation between neighborhood disadvantage and gun violence.

Statistical Significance (T-Statistic & P-Value)

  • The T-Statistic (17.32) is the estimate divided by its standard error. A value this large indicates that the slope is estimated precisely relative to its size and is very unlikely to reflect random noise alone.

  • The P-Value (reported as 0.000, i.e., p < 0.001) indicates that the relationship is highly statistically significant: if there were no association between disadvantage and shootings, a slope this large would almost never arise by chance.

  • Interpreting the Intercept (-2.30): The intercept represents the expected number of shootings per 100,000 when Disadvantaged Score = 0 (i.e., in the least disadvantaged areas). Because the intercept is negative (-2.30 shootings per 100k), it is not meaningful in a real-world context, since shootings cannot be negative. This happens because the regression line is fitted to the data as a whole and very few areas have a Disadvantaged Score of exactly 0, making the intercept less relevant.

  • We should focus on the slope (30.43 per full unit or ~0.30 per 0.01), which provides meaningful insights into how shootings change with increasing disadvantage.
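
The t-statistics in the table can be reproduced from the estimates and standard errors. The p-values depend on the residual degrees of freedom, which this report does not state, so the sketch below uses a normal approximation as a stand-in; that approximation is an assumption and gives values close to, but not identical to, the reported ones.

```r
# Estimates and standard errors copied from the regression table above.
estimate <- c(intercept = -2.301339, disadvantaged_score = 30.426136)
std_error <- c(intercept = 1.042038, disadvantaged_score = 1.756877)

# t-statistic = estimate / standard error.
t_stat <- estimate / std_error

# Two-sided p-values under a normal approximation (assumes a large sample;
# the exact values would come from a t distribution with the model's df).
p_approx <- 2 * pnorm(-abs(t_stat))

print(round(t_stat, 3))
print(signif(p_approx, 3))
```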

Regression Analysis Takeaways

  • More disadvantaged areas experience significantly higher rates of gun violence.
  • A modest increase in disadvantage (e.g., 0.10, or 10 percentile points) is associated with about 3 additional shootings per 100k.
  • The statistical significance (high T-score & low P-value) confirms this is not due to chance.
  • The intercept is not meaningful, but the slope tells us how much shootings increase with disadvantage.

Post Regression Analysis: Beta Regression

Following the discovery that neighborhoods with higher Disadvantage Scores experience significantly higher rates of gun violence, this section shifts focus to the underlying structural conditions that shape those scores. By modeling the Disadvantage Score as the outcome, this analysis aims to uncover the key environmental, health, and socioeconomic factors most strongly associated with structural burden across New York City neighborhoods. Beta regression, a technique suited to proportional outcomes constrained between 0 and 1, offers a more granular understanding of the forces contributing to cumulative disadvantage. In doing so, it reinforces the idea that gun violence is not only a criminal justice issue but also a symptom of deeper, systemic inequities rooted in disinvestment, environmental exposure, and population vulnerability. Furthermore, the fitted model supports simulation: we can estimate how predicted disadvantage scores change when a key variable is set to its minimum and maximum values while all other factors are held constant. This approach provides a clearer picture of each variable’s real-world impact and strengthens the interpretability of the model’s findings.
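
The minimum-versus-maximum simulation can be illustrated without fitting a model. The sketch below assumes the logit link that betareg uses by default, so a predicted mean is the inverse logit of the linear predictor. The coefficients, predictor values, and the helper simulate_min_max are all hypothetical, and holding the other predictors at their means is one possible reading of "held constant".

```r
# Hypothetical logit-scale coefficients; these are NOT the fitted estimates.
coefs <- c(intercept = -1.0, asthma = 2.0, benzene = 1.0)

# Hypothetical predictor values on a 0 to 1 percentile scale.
predictors <- data.frame(
  asthma = c(0.1, 0.4, 0.9),
  benzene = c(0.2, 0.5, 0.8)
)

# Predicted mean score with one variable at its observed minimum and maximum,
# holding every other predictor at its mean (an assumption of this sketch).
simulate_min_max <- function(var, coefs, data) {
  others <- setdiff(names(data), var)
  base <- coefs[["intercept"]] + sum(coefs[others] * colMeans(data[others]))
  eta <- base + coefs[[var]] * range(data[[var]])
  mu <- plogis(eta)  # inverse logit maps the linear predictor into (0, 1)
  c(at_min = mu[1], at_max = mu[2], change = mu[2] - mu[1])
}

sim_asthma <- simulate_min_max("asthma", coefs, predictors)
print(round(sim_asthma, 3))
```

Because the link is nonlinear, the size of the change depends on where the other predictors are held, which is why the reference values should be reported alongside any simulated effect.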

Before running the beta regression, I examined the Percentile_Rank_Combined_NYC variable to ensure it met the assumptions of the model. Beta regression requires the dependent variable to lie strictly between 0 and 1. To check for violations of this assumption, I used a logical test to count how many neighborhoods had scores that were exactly 0 or 1.

## 
## FALSE  TRUE 
##  2093  2825

The result showed that 2,825 observations had values at the boundaries, while only 2,093 fell strictly within the valid range. Since a substantial portion of the data did not meet the model’s requirements, this highlighted the need to transform the variable to ensure all values fall within the open interval (0, 1) prior to model fitting.
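
A boundary check of this kind can be written as a single logical test passed to table(). The sketch below uses made-up scores, and the exact expression used in the original analysis is not shown in this report, so the form here is an assumption.

```r
# Made-up scores that include values exactly at 0 and 1.
score <- c(0, 0.12, 0.5, 1, 0.87, 1, 0)

# TRUE marks values on the boundary, which beta regression cannot handle.
at_boundary <- score <= 0 | score >= 1
counts <- table(at_boundary)

print(counts)
```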

After identifying that over half of the Percentile_Rank_Combined_NYC values were exactly 0 or 1, violating the assumptions of beta regression, I transformed the variable to make it suitable for modeling. Beta regression requires the dependent variable to fall strictly within the open interval (0, 1), so a common transformation was applied to shift all values slightly inward while preserving their rank order. This transformation is especially important when dealing with proportion or percentile data that include boundary values.

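In the code, I first calculated the number of observations using n <- nrow(neighborhoods), then applied the transformation score_transformed = (score × (n − 1) + 0.5) / n, where score is the original Percentile_Rank_Combined_NYC value.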

This formula adjusts the original values so that any 0s or 1s are slightly shifted inward, ensuring all transformed values fall strictly within the range (0, 1). Specifically, a value of 0 becomes 0.5 / n and a value of 1 becomes (n - 0.5) / n. In this data set, where n = 4918, a value of 0 is transformed to approximately 0.0001, and a value of 1 is transformed to approximately 0.9999. These adjustments are minimal but crucial. They allow the data to satisfy the assumptions of the beta distribution without altering the overall structure or relative rankings of the scores. The resulting score_transformed variable is now compatible with beta regression and still accurately reflects the original disadvantage levels across neighborhoods.
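
The endpoint values quoted above are consistent with the common linear "squeeze" transformation (y × (n − 1) + 0.5) / n. The sketch below applies that form; the function name squeeze_unit_interval is hypothetical, and the original code chunk is not reproduced in this report.

```r
# Shift values in [0, 1] slightly inward so they lie strictly inside (0, 1).
# The transformation is linear and increasing, so rank order is preserved.
squeeze_unit_interval <- function(y, n = length(y)) {
  (y * (n - 1) + 0.5) / n
}

# Number of observations reported in the text.
n <- 4918

# Check the two boundary values and the midpoint.
endpoints <- squeeze_unit_interval(c(0, 0.5, 1), n = n)
print(signif(endpoints, 4))
```

The midpoint 0.5 maps to itself, which shows that the adjustment compresses the scale symmetrically rather than shifting it in one direction.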

Beta Regression Variable Dictionary

To better understand the variables used in the beta regression model, I created a dictionary of the variables included in the model. This dictionary provides a brief description of each variable. NYSERDA groups the variables into two categories: Environmental Burdens & Climate Change Risk Factors, and Population and Health Factors (Development Authority (NYSERDA) 2023a). The variables and their definitions are taken from the NYSERDA 2023 Disadvantaged Neighborhoods data dictionary and are listed below:

Environmental Burdens and Climate Change Risk Variables

Data Dictionary for Environmental Burden and Climate Risk Variables (Model m2)
Variable Definition
Benzene_Concentration Percentile ranking of the average annual concentration of benzene (C6H6) in air.
Particulate_Matter_25 Percentile ranking of the average annual concentration of PM2.5 (particulate matter less than or equal to 2.5 microns) per cubic meter.
Traffic_Truck_Highways Percentile ranking of average daily truck traffic on highways (Classes 4 to 13 vehicles).
Traffic_Number_Vehicles Percentile ranking of average daily vehicle traffic on major roads within 500 meters of census block centroids, weighted by population.
Wastewater_Discharge Percentile ranking of toxicity-weighted concentrations in stream segments near the tract, indicating potential water pollution.
Housing_Vacancy_Rate Vacant housing units as a percentage of housing units.
Industrial_Land_Use Percentile ranking of census tract land area zoned for industrial, mining, or manufacturing use.
Landfills Percentile ranking of land area within 500 meters of an active landfill.
Oil_Storage Percentile ranking of land area within 500 meters of major oil storage facilities.
Municipal_Waste_Combustors Percentile ranking of land area within 500 meters of a municipal waste combustor.
Power_Generation_Facilities Percentile ranking of land area within 1 mile of fossil-fuel-burning power plants or peaker units.
RMP_Sites Percentile ranking of proximity to chemical accident risk sites (Regulated Management Plan sites), weighted by distance and population.
Remediation_Sites Percentile ranking of the number of state/federal environmental remediation sites (e.g., Superfund, Brownfield).
Scrap_Metal_Processing Percentile ranking of the number of scrap metal and vehicle dismantling facilities.
Agricultural_Land_Use Percentile ranking of land area used for crops or pasture.
Days_Above_90_Degrees_2050 Projected percentile ranking of the average annual number of days above 90 degrees Fahrenheit in the year 2050.
Low_Vegetative_Cover Percentile ranking of the census tract land area classified as developed or barren (low vegetation).
Drive_Time_Healthcare Percentile ranking of average drive time from the tract center to the three nearest healthcare facilities.

Population and Health Variables

Data Dictionary for Population Characteristics & Health Variables (Model m3)
Variable Definition
Asian_Percent Percent of population identifying as Asian.
Black_African_American_Percent Percent of population identifying as Black or African American.
Redlining_Updated Indicator for whether the area was historically redlined (HOLC maps).
Latino_Percent Percent of population identifying as Latino or Hispanic.
English_Proficiency Percent of population with limited English proficiency.
Native_Indigenous Percent of population identifying as Native or Indigenous.
LMI_80_AMI Percent of households below 80% of Area Median Income (AMI).
LMI_Poverty_Federal Percent of population below the federal poverty level.
Population_No_College Percent of adult population with no college education.
Household_Single_Parent Percent of households led by a single parent.
Unemployment_Rate Unemployment rate in the tract.
Asthma_ED_Rate Emergency department visits due to asthma (per capita).
COPD_ED_Rate Emergency department visits due to COPD (per capita).
Households_Disabled Percent of households with at least one person with a disability.
Low_Birth_Weight Percent of births considered low birth weight.
MI_Hospitalization_Rate Hospitalization rate due to myocardial infarction (heart attacks).
Health_Insurance_Rate Percent of population without health insurance coverage.
Age_Over_65 Percent of population age 65 or older.
Premature_Deaths Rate of premature deaths in the tract.
Internet_Access Percent of households with access to internet service.
Home_Energy_Affordability Estimated household energy cost burden as a percent of income.
Homes_Built_Before_1960 Percent of housing units built before 1960.
Rent_Percent_Income Median rent as a percent of household income.
Renter_Percent Percent of housing units that are renter-occupied.

In total, 41 predictor variables were included in the beta regression model, matching the model formula below. Three variables from the source data are absent: Mobile_Homes, Inland_Flooding_Risk, and Tribal_Designation. They were excluded because they lack data within the NYC area: NYSERDA’s disadvantaged neighborhoods data set covers all of New York State, whereas this research focuses only on the New York City region.

Beta Regression Model

With the score_transformed variable properly adjusted to fall within the (0, 1) interval, the data set was now ready for regression modeling. Given the bounded and continuous nature of the outcome variable, beta regression was selected as the appropriate analytic approach. This method is particularly well-suited for modeling proportion-based outcomes and allows for interpreting how various predictor variables influence neighborhood-level disadvantage. In addition to estimating the direction and strength of each variable’s effect, beta regression also enables simulation of predicted outcomes by setting individual predictors to their minimum and maximum observed values while holding all other variables constant. This provides a more tangible understanding of how much a single factor can influence the Disadvantage Score in real-world terms.
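For reference, the model form implied by this description can be written out as follows. This is a sketch of the standard mean-precision parameterization of beta regression, which is consistent with the "mean model with logit link" and "precision model with identity link" labels in the output below. The symbols $y_i$ (transformed score for tract $i$), $x_{ij}$ (predictor $j$ for tract $i$), and $p$ (number of predictors) are notation introduced here for illustration, not taken from the original analysis.

```latex
\begin{aligned}
y_i &\sim \mathrm{Beta}\big(\mu_i \phi,\; (1-\mu_i)\,\phi\big), \qquad 0 < y_i < 1, \\
\operatorname{logit}(\mu_i) &= \log\frac{\mu_i}{1-\mu_i} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}, \\
\operatorname{Var}(y_i) &= \frac{\mu_i\,(1-\mu_i)}{1+\phi}.
\end{aligned}
```

Under this form, each $\beta_j$ acts on the log-odds of the expected score $\mu_i$, and a larger precision $\phi$ means less spread around that expected score.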

## 
## Call:
## betareg(formula = score_transformed ~ Asian_Percent + Black_African_American_Percent + 
##     Redlining_Updated + Latino_Percent + English_Proficiency + Native_Indigenous + 
##     LMI_80_AMI + LMI_Poverty_Federal + Population_No_College + Household_Single_Parent + 
##     Unemployment_Rate + Asthma_ED_Rate + COPD_ED_Rate + Households_Disabled + 
##     Low_Birth_Weight + MI_Hospitalization_Rate + Health_Insurance_Rate + 
##     Age_Over_65 + Premature_Deaths + Internet_Access + Home_Energy_Affordability + 
##     Homes_Built_Before_1960 + Rent_Percent_Income + Renter_Percent + 
##     Benzene_Concentration + Particulate_Matter_25 + Traffic_Truck_Highways + 
##     Traffic_Number_Vehicles + Wastewater_Discharge + Industrial_Land_Use + 
##     Landfills + Oil_Storage + Municipal_Waste_Combustors + Power_Generation_Facilities + 
##     RMP_Sites + Remediation_Sites + Scrap_Metal_Processing + Agricultural_Land_Use + 
##     Days_Above_90_Degrees_2050 + Low_Vegetative_Cover + Drive_Time_Healthcare, 
##     data = neighborhoods)
## 
## Quantile residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6421 -0.4525  0.1230  0.6197  3.2221 
## 
## Coefficients (mean model with logit link):
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                    -9.746839   0.224835 -43.351  < 2e-16 ***
## Asian_Percent                   0.217516   0.082494   2.637 0.008371 ** 
## Black_African_American_Percent -0.022877   0.115176  -0.199 0.842556    
## Redlining_Updated               0.652138   0.077071   8.462  < 2e-16 ***
## Latino_Percent                  0.939315   0.115388   8.140 3.94e-16 ***
## English_Proficiency             0.175543   0.108094   1.624 0.104381    
## Native_Indigenous               0.223366   0.056798   3.933 8.40e-05 ***
## LMI_80_AMI                     -0.001635   0.001388  -1.178 0.238801    
## LMI_Poverty_Federal             0.946858   0.127769   7.411 1.26e-13 ***
## Population_No_College           0.456083   0.123412   3.696 0.000219 ***
## Household_Single_Parent         0.323564   0.089123   3.631 0.000283 ***
## Unemployment_Rate               0.257525   0.075442   3.414 0.000641 ***
## Asthma_ED_Rate                  2.789531   0.186917  14.924  < 2e-16 ***
## COPD_ED_Rate                   -1.785271   0.134325 -13.291  < 2e-16 ***
## Households_Disabled             0.126910   0.089143   1.424 0.154544    
## Low_Birth_Weight               -0.221339   0.133364  -1.660 0.096984 .  
## MI_Hospitalization_Rate         0.488968   0.089209   5.481 4.23e-08 ***
## Health_Insurance_Rate           0.230486   0.085805   2.686 0.007228 ** 
## Age_Over_65                     0.348188   0.096993   3.590 0.000331 ***
## Premature_Deaths                0.098926   0.126804   0.780 0.435301    
## Internet_Access                 0.246708   0.098264   2.511 0.012050 *  
## Home_Energy_Affordability      -0.183482   0.113930  -1.610 0.107292    
## Homes_Built_Before_1960        -0.312885   0.085465  -3.661 0.000251 ***
## Rent_Percent_Income             0.161798   0.086428   1.872 0.061199 .  
## Renter_Percent                 -0.174920   0.157827  -1.108 0.267730    
## Benzene_Concentration           2.732720   0.193082  14.153  < 2e-16 ***
## Particulate_Matter_25           1.837374   0.139828  13.140  < 2e-16 ***
## Traffic_Truck_Highways          0.821760   0.122161   6.727 1.73e-11 ***
## Traffic_Number_Vehicles        -0.085815   0.132528  -0.648 0.517293    
## Wastewater_Discharge           -0.397587   0.066573  -5.972 2.34e-09 ***
## Industrial_Land_Use            -0.210959   0.067822  -3.110 0.001868 ** 
## Landfills                       0.062486   0.562599   0.111 0.911564    
## Oil_Storage                     0.011964   0.139238   0.086 0.931525    
## Municipal_Waste_Combustors     -1.720002   0.745330  -2.308 0.021016 *  
## Power_Generation_Facilities     0.430738   0.098664   4.366 1.27e-05 ***
## RMP_Sites                      -0.004957   0.001082  -4.582 4.60e-06 ***
## Remediation_Sites               0.030336   0.073296   0.414 0.678960    
## Scrap_Metal_Processing          0.013614   0.144423   0.094 0.924901    
## Agricultural_Land_Use           1.788091   0.196015   9.122  < 2e-16 ***
## Days_Above_90_Degrees_2050      0.618118   0.103926   5.948 2.72e-09 ***
## Low_Vegetative_Cover            1.575699   0.153213  10.284  < 2e-16 ***
## Drive_Time_Healthcare           1.671355   0.114713  14.570  < 2e-16 ***
## 
## Phi coefficients (precision model with identity link):
##       Estimate Std. Error z value Pr(>|z|)    
## (phi)   4.3515     0.1229   35.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Type of estimator: ML (maximum likelihood)
## Log-likelihood:  4466 on 43 Df
## Pseudo R-squared: 0.786
## Number of iterations: 143 (BFGS) + 8 (Fisher scoring)
Beta Regression Results for All Factors (Model m1)
component Predictor Estimate Standard Error Z Value P Value
mean (Intercept) -9.7468 0.2248 -43.3510 0.0000
mean Asian_Percent 0.2175 0.0825 2.6367 0.0084
mean Black_African_American_Percent -0.0229 0.1152 -0.1986 0.8426
mean Redlining_Updated 0.6521 0.0771 8.4616 0.0000
mean Latino_Percent 0.9393 0.1154 8.1405 0.0000
mean English_Proficiency 0.1755 0.1081 1.6240 0.1044
mean Native_Indigenous 0.2234 0.0568 3.9326 0.0001
mean LMI_80_AMI -0.0016 0.0014 -1.1780 0.2388
mean LMI_Poverty_Federal 0.9469 0.1278 7.4107 0.0000
mean Population_No_College 0.4561 0.1234 3.6956 0.0002
mean Household_Single_Parent 0.3236 0.0891 3.6305 0.0003
mean Unemployment_Rate 0.2575 0.0754 3.4135 0.0006
mean Asthma_ED_Rate 2.7895 0.1869 14.9239 0.0000
mean COPD_ED_Rate -1.7853 0.1343 -13.2907 0.0000
mean Households_Disabled 0.1269 0.0891 1.4237 0.1545
mean Low_Birth_Weight -0.2213 0.1334 -1.6597 0.0970
mean MI_Hospitalization_Rate 0.4890 0.0892 5.4812 0.0000
mean Health_Insurance_Rate 0.2305 0.0858 2.6862 0.0072
mean Age_Over_65 0.3482 0.0970 3.5898 0.0003
mean Premature_Deaths 0.0989 0.1268 0.7802 0.4353
mean Internet_Access 0.2467 0.0983 2.5107 0.0121
mean Home_Energy_Affordability -0.1835 0.1139 -1.6105 0.1073
mean Homes_Built_Before_1960 -0.3129 0.0855 -3.6610 0.0003
mean Rent_Percent_Income 0.1618 0.0864 1.8721 0.0612
mean Renter_Percent -0.1749 0.1578 -1.1083 0.2677
mean Benzene_Concentration 2.7327 0.1931 14.1532 0.0000
mean Particulate_Matter_25 1.8374 0.1398 13.1403 0.0000
mean Traffic_Truck_Highways 0.8218 0.1222 6.7268 0.0000
mean Traffic_Number_Vehicles -0.0858 0.1325 -0.6475 0.5173
mean Wastewater_Discharge -0.3976 0.0666 -5.9722 0.0000
mean Industrial_Land_Use -0.2110 0.0678 -3.1105 0.0019
mean Landfills 0.0625 0.5626 0.1111 0.9116
mean Oil_Storage 0.0120 0.1392 0.0859 0.9315
mean Municipal_Waste_Combustors -1.7200 0.7453 -2.3077 0.0210
mean Power_Generation_Facilities 0.4307 0.0987 4.3657 0.0000
mean RMP_Sites -0.0050 0.0011 -4.5824 0.0000
mean Remediation_Sites 0.0303 0.0733 0.4139 0.6790
mean Scrap_Metal_Processing 0.0136 0.1444 0.0943 0.9249
mean Agricultural_Land_Use 1.7881 0.1960 9.1222 0.0000
mean Days_Above_90_Degrees_2050 0.6181 0.1039 5.9477 0.0000
mean Low_Vegetative_Cover 1.5757 0.1532 10.2844 0.0000
mean Drive_Time_Healthcare 1.6714 0.1147 14.5699 0.0000
precision (phi) 4.3515 0.1229 35.3941 0.0000

The beta regression model was fitted successfully, and the summary output provides estimates for each predictor variable. Because the mean model uses a logit link, the coefficients indicate the direction and strength of each variable's relationship with the transformed disadvantage score on the log-odds scale. Positive coefficients mean that the predicted disadvantage score rises as the predictor increases, while negative coefficients indicate an inverse relationship. The coefficients should still be interpreted in the context of the data and the units of each variable. The next step is to visualize the model with a barbell plot of the 25 most impactful variables. For each variable, the plot shows the predicted disadvantage score at the variable's minimum and maximum observed values, which indicates how much that variable can shift the overall disadvantage score.
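To make the logit-scale coefficients more concrete, the short R sketch below converts three estimates from the table above into odds ratios and applies one of them to a baseline score. It assumes Redlining_Updated is coded 0/1, as the data dictionary's "indicator" wording suggests, and the baseline score of 0.30 is a hypothetical value chosen for illustration, not a figure from the study.

```r
# Mean-model coefficients copied from the results table above (logit scale).
coefs <- c(Redlining_Updated = 0.6521,
           Asthma_ED_Rate    = 2.7895,
           COPD_ED_Rate      = -1.7853)

# exp(coefficient) is the multiplicative change in the odds mu / (1 - mu)
# for a one-unit increase in the predictor, other predictors held fixed.
odds_ratios <- exp(coefs)
print(round(odds_ratios, 2))

# Hypothetical baseline (an assumption, not a value from the study):
# a tract whose predicted score is 0.30 when Redlining_Updated = 0.
baseline_score <- 0.30

# Move to the logit scale, add the coefficient, and transform back.
shifted_score <- plogis(qlogis(baseline_score) + coefs[["Redlining_Updated"]])
print(round(shifted_score, 3))
```

Because the logit link is nonlinear, the same coefficient produces a different change in the predicted score depending on the baseline, which is one reason the report turns to model-based predictions rather than reading effect sizes directly from the coefficients.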

Barbell Plot of Model-Based Predictions at Minimum and Maximum Values for All Factors

Visualizing the model predictions at the minimum and maximum observed values of each variable shows how much each variable can move the predicted disadvantage score when all other variables are held constant.

The barbell plot above illustrates the predicted disadvantage score at the minimum and maximum values of each variable. The blue points represent the predicted score at the minimum value, while the red points represent the predicted score at the maximum value. The gray lines connecting the two points show the difference in predicted scores between the minimum and maximum values. This visualization allows us to see how much each variable contributes to the overall disadvantage score, with larger differences indicating a greater impact on neighborhood disadvantage.
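The min/max predictions behind a plot like this could be computed along the following lines. This is a minimal base-R sketch, not the code used in the original analysis: the helper name `barbell_predictions` is hypothetical, the toy data and coefficients are invented for illustration, and holding the other predictors at their column means is an assumption (the report says only that they are held constant).

```r
# Hypothetical helper: predicted score with one predictor at its observed
# min and max, all other predictors held at their column means (an assumption).
barbell_predictions <- function(coefs, X) {
  # coefs: named vector, intercept first, then one entry per column of X.
  stopifnot(identical(names(coefs)[-1], colnames(X)))
  baseline <- colMeans(X)
  rows <- lapply(colnames(X), function(v) {
    at_min <- baseline; at_min[v] <- min(X[, v])
    at_max <- baseline; at_max[v] <- max(X[, v])
    data.frame(
      predictor = v,
      # plogis() is the inverse logit, mapping the linear predictor to (0, 1).
      pred_min  = plogis(coefs[[1]] + sum(coefs[-1] * at_min)),
      pred_max  = plogis(coefs[[1]] + sum(coefs[-1] * at_max))
    )
  })
  out <- do.call(rbind, rows)
  out$difference <- out$pred_max - out$pred_min
  # Largest absolute swing first, as in a "most impactful variables" ranking.
  out[order(-abs(out$difference)), ]
}

# Toy inputs for illustration only (not the study's data or estimates).
set.seed(1)
X <- cbind(x1 = runif(50), x2 = runif(50))
coefs <- c("(Intercept)" = -1, x1 = 2, x2 = -0.5)
res <- barbell_predictions(coefs, X)
print(res)
```

Each row of the result corresponds to one barbell: `pred_min` and `pred_max` are the two endpoints, and `difference` is the length and direction of the connecting line.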

Strongest-Impact Predictors

  • Asthma_ED_Rate Interpretation: (Health & Population Factor) Higher emergency department visits due to asthma predict a large increase in disadvantage. Possible Explanation: This variable reflects both chronic health disparities and poor environmental conditions (e.g., air quality, housing), which are central to structural inequity in urban settings.

  • Benzene_Concentration Interpretation: (Environmental Burden & Climate Change Factor) Higher modeled concentrations of benzene in the air are associated with a substantial increase in predicted disadvantage scores. Possible Explanation: Benzene is a known carcinogen and is often emitted by industrial activity and vehicular sources. Its presence reinforces environmental injustice in areas with legacy pollution or proximity to hazardous land uses.

  • Agricultural_Land_Use Interpretation: (Environmental Burden & Climate Change Factor) This variable shows the largest predicted score increase, suggesting that neighborhoods with a higher share of land zoned for agriculture (or with limited urban development) face significantly more structural disadvantage. Possible Explanation: In the NYC context, this might reflect under-resourced or industrial-adjacent areas classified as agricultural, or land-use legacies that contribute to isolation from public resources and infrastructure.

  • Drive_Time_Healthcare Interpretation: (Environmental Burden & Climate Change Factor) Longer travel times to healthcare facilities are associated with higher predicted disadvantage scores. Possible Explanation: Barriers to accessing care likely amplify existing social, economic, and health-related vulnerabilities.

  • COPD_ED_Rate Interpretation: (Health & Population Factor) In contrast to asthma, higher COPD-related emergency visits are associated with lower predicted disadvantage. Possible Explanation: This result is counterintuitive and might be due to overlap with other health or demographic indicators in the model. It may reflect older populations in somewhat more stable or medically supported communities.

  • Particulate_Matter_25 Interpretation: (Environmental Burden & Climate Change Factor) High levels of PM2.5 pollution are strongly associated with elevated disadvantage scores. Possible Explanation: This reinforces the impact of environmental hazards on structural vulnerability, as air pollution has direct ties to health disparities, respiratory illness, and environmental injustice.

  • Low_Vegetative_Cover Interpretation: (Environmental Burden & Climate Change Factor) Areas with little green space show higher disadvantage scores. Possible Explanation: Green infrastructure often correlates with public investment, heat mitigation, and mental and physical health, so a lack of it contributes to cumulative disadvantage.

Beta-Regression Conclusion

The beta regression analysis provides a comprehensive understanding of the structural conditions that contribute to neighborhood disadvantage in New York City. By modeling the Disadvantage Score against a wide range of predictors, the model identifies several key drivers, including health-related factors such as asthma and COPD emergency department visit rates, as well as environmental burdens like benzene concentration and PM2.5 levels. The barbell plot visualization further aids interpretation by highlighting the magnitude and direction of each predictor’s effect on disadvantage. Collectively, these results reinforce the idea that gun violence is not solely a criminal justice issue, but also a symptom of deeper systemic inequities rooted in environmental exposure, poor health outcomes, and long-standing patterns of disinvestment.

Importantly, the analysis reveals that not all relationships are positive or immediately intuitive. While some variables, such as elevated asthma rates and high concentrations of particulate matter, are strongly associated with higher disadvantage, others, including certain industrial land use or demographic indicators, demonstrate weaker or even inverse associations. This complexity reflects the multifaceted nature of urban systems, where overlapping vulnerabilities and historical context shape neighborhood outcomes in nonuniform ways. These findings underscore the need for targeted, evidence-based interventions that address both immediate environmental harms and the broader structural conditions that reinforce inequality.

Conclusion

In summary, this analysis has provided a comprehensive examination of the relationship between neighborhood disadvantage and gun violence in New York City. By employing both descriptive and inferential statistical methods—including spatial mapping, borough-level summaries, and beta regression modeling—we have established a consistent and compelling link between higher Disadvantage Scores and increased rates of gun violence. The regression analysis further illuminated the underlying drivers of structural burden, revealing how socioeconomic hardship, environmental exposure, and public health disparities intersect to shape neighborhood vulnerability. These findings underscore the importance of addressing systemic inequities as part of any meaningful strategy to reduce violence in marginalized communities.

Ultimately, the results affirm the central hypothesis: neighborhoods with greater structural disadvantage are more likely to experience gun violence. This connection is not merely correlative—it reflects the deeply embedded realities of disinvestment, pollution, poor housing conditions, and limited access to healthcare and opportunity. The Disadvantage Score proves to be a powerful tool in capturing these layered vulnerabilities and offers valuable insights for guiding policy and intervention. To truly combat gun violence, efforts must go beyond traditional criminal justice responses and prioritize the root causes that leave communities most at risk. This research contributes to that broader understanding, calling for equity-focused solutions that confront the structural conditions that allow violence to take root.

Policy Recommendations

The findings of this study highlight a clear and troubling relationship between structural disadvantage and gun violence in New York City. Neighborhoods burdened by socioeconomic hardship, environmental hazards, and poor health outcomes are also those most impacted by shootings. These patterns are not coincidental; they are the result of long-standing systemic inequities and disinvestment. As such, any meaningful solution to gun violence must go beyond traditional enforcement strategies and address the root causes embedded within the social and physical fabric of communities. The following policy recommendations are grounded in the results of this analysis and offer actionable steps for addressing structural disadvantage through equity-focused investment, public health integration, environmental remediation, and sustained data monitoring.

1. Invest in High-Need Communities

Neighborhoods with high Disadvantage Scores require sustained investment to address the underlying conditions that contribute to elevated rates of gun violence. This includes targeted funding for affordable housing, access to quality education, mental health services, primary healthcare, and workforce development programs. Investments should be tailored to the specific needs of each community, ensuring that residents have access to safe housing, employment opportunities, and support systems. By addressing these structural deficits, policymakers can help break the cycle of disinvestment and reduce the environmental and social stressors that make communities more vulnerable to violence.

2. Integrate Public Health and Safety Strategies

Gun violence must be treated as both a public health and public safety issue. City agencies should expand evidence-based, community-driven violence prevention programs that operate outside of traditional policing models. These may include hospital-based violence intervention, trauma-informed care, and neighborhood outreach programs that address conflict resolution and crisis de-escalation. Health and safety strategies should be integrated across sectors, ensuring collaboration between public health departments, housing agencies, and social service providers. This holistic approach can reduce the long-term impacts of trauma and prevent cycles of violence in the most affected neighborhoods.

3. Environmental Remediation and Justice

Environmental burdens—such as air pollution, industrial land use, and proximity to hazardous waste sites—are disproportionately concentrated in disadvantaged neighborhoods. These conditions not only impact residents’ health, but also intersect with socioeconomic factors to increase vulnerability to violence. Remediating these environmental harms is essential for improving quality of life and long-term neighborhood resilience. The city should prioritize environmental cleanup and enforce stricter regulations in communities with documented high Disadvantage Scores. Doing so promotes environmental justice and reduces cumulative risk in neighborhoods already facing economic and health-related challenges.

4. Sustain Disadvantage Score Monitoring and Updates

The Disadvantage Score has proven to be a valuable tool for identifying and prioritizing structurally vulnerable communities. To maintain its effectiveness, it is critical that the score be regularly updated to reflect current data and neighborhood dynamics. This includes incorporating real-time environmental, demographic, and health data as it becomes available. Ongoing monitoring ensures that interventions remain responsive and equitable over time. By institutionalizing the Disadvantage Score as part of planning and policy decision-making, New York City can continue to track progress, identify emerging areas of need, and ensure that resources are distributed where they are most impactful.

New York City Police Department. n.d. “NYPD Shooting Incident Data (Historic).” NYC Open Data. https://data.cityofnewyork.us/Public-Safety/NYPD-Shooting-Incident-Data-Historic-/833y-fsy8.
New York State Energy Research and Development Authority (NYSERDA). 2023a. “2023 NY Disadvantaged Neighborhood Data Dictionary.” New York State Energy Research and Development Authority (NYSERDA). https://data.ny.gov/Energy-Environment/Final-Disadvantaged-Communities-DAC-2023/2e6c-s6fp.
New York State Energy Research and Development Authority (NYSERDA). 2023b. “Final Disadvantaged Communities (DAC) 2023.” data.ny.gov. https://data.ny.gov/Energy-Environment/Final-Disadvantaged-Communities-DAC-2023/2e6c-s6fp.