Approach -
For my final project I will investigate the relationship between urban infrastructure investment, such as public transit access and green spaces, and the demographic makeup of different city districts. From my own observation of the city, I have long believed that wealthier districts receive more infrastructure investment, and I want to test whether the data support this claim. To do this I will follow the OSEMN data science workflow (Obtain, Scrub, Explore, Model, iNterpret). I will obtain publicly available data from two distinct sources and scrub it so that it is easier to explore and interpret. I will generate visuals with the sf package, which integrates well with the Tidyverse, the primary package I will use for this project.
Data Sources -
I first tried to find a breakdown of demographics and income level by city council district, but I could not find a data source that contained all of the necessary information. Eventually I found the “Median Incomes in NYC” dataset on the data.cccnewyork.org website, which, according to the site, was last updated in 2023. The site let me select the indicators I wanted and produced separate data files for them. For this analysis I focused on median income levels across New York City's neighborhoods. Because income is distributed unequally, the median reflects typical income better than the mean (average): a small number of very high incomes pulls the mean upward, while the median stays near the middle of the distribution.
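To illustrate why the median is the better summary, here is a small Python sketch with made-up incomes (not values from the CCC dataset). The project itself works in R, but the arithmetic is the same: a single very high income drags the mean far above what most households earn, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical household incomes (USD) for one district: mostly modest
# incomes plus one very high earner. These numbers are illustrative only.
incomes = [28_000, 31_000, 35_000, 40_000, 45_000, 52_000, 1_200_000]

avg = mean(incomes)    # pulled upward by the single outlier
mid = median(incomes)  # the middle value of the sorted list

print(f"mean:   ${avg:,.0f}")  # mean:   $204,429
print(f"median: ${mid:,.0f}")  # median: $40,000
```

Six of the seven households earn $52K or less, yet the mean is above $200K; the median of $40K is much closer to what a typical household in this made-up district earns.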
The second dataset is the latest “MTA Subway Stations” data, available on the data.ny.gov website. I will pull this data directly from the portal as GeoJSON using the sf package.
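GeoJSON stores each station as a Feature with a Point geometry whose coordinates are ordered longitude, then latitude. In R, sf handles this parsing. As a rough, language-neutral sketch of what that step involves, the Python below downloads and parses a GeoJSON FeatureCollection using only the standard library. The dataset's actual export URL is not given here, so `fetch_geojson` is not called. The property name `stop_name` is an assumption to check against the real columns, and the inline sample is made up.

```python
import json
import urllib.request


def fetch_geojson(url: str) -> dict:
    """Download a GeoJSON document and parse it into a Python dict."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def station_points(geojson: dict, name_field: str = "stop_name") -> list[tuple[str, float, float]]:
    """Extract (name, longitude, latitude) for every Point feature.

    `name_field` is an assumed property name for the station name;
    check the dataset's actual column names before relying on it.
    """
    points = []
    for feature in geojson.get("features", []):
        geom = feature.get("geometry") or {}
        if geom.get("type") != "Point":
            continue  # skip features without point geometry
        lon, lat = geom["coordinates"][:2]  # GeoJSON order is [lon, lat]
        name = (feature.get("properties") or {}).get(name_field, "")
        points.append((name, float(lon), float(lat)))
    return points


# Tiny inline FeatureCollection with made-up stations and coordinates.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"stop_name": "Station A"},
         "geometry": {"type": "Point", "coordinates": [-74.011, 40.706]}},
        {"type": "Feature",
         "properties": {"stop_name": "Station B"},
         "geometry": {"type": "Point", "coordinates": [-73.914, 40.835]}},
    ],
}
print(station_points(sample))
# [('Station A', -74.011, 40.706), ('Station B', -73.914, 40.835)]
```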
Conclusion -
After fitting my linear regression model using median income for all households, the model summary reported a p-value of 0.0005787. Because this p-value is below the 0.05 significance level, I can conclude that there is a statistically significant relationship between a district's wealth and its subway access. The fitted line in my scatter plot slopes slightly upward, which indicates that wealthier neighborhoods tend to have more stations. This supports my initial assumption, although a significant association on its own does not show that wealth causes greater subway access.
The heatmap I produced shows that lower Manhattan, around the Financial District, has both the highest median income (over $198K) and the most access to MTA subway stations, with district 101 alone containing 23 stations. The Bronx has the lowest median incomes, below $31K in districts 203 and 206, along with only 2 subway stations.
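Station counts per district, such as the 23 in district 101, come from checking which district polygon each station point falls inside. In sf this is usually done with a spatial join. The Python sketch below instead shows the underlying even-odd (ray casting) point-in-polygon test, applied to two made-up rectangular districts named "A" and "B". It treats longitude and latitude as flat x/y coordinates and ignores polygons with holes or multiple parts. A real district boundary file would need both of these simplifications handled.

```python
def point_in_polygon(x: float, y: float, ring: list[tuple[float, float]]) -> bool:
    """Even-odd ray casting test for a simple polygon (no holes).

    Coordinates are treated as planar (x = lon, y = lat), an
    approximation that is usually acceptable at city scale.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips state
    return inside


def count_stations(districts: dict[str, list[tuple[float, float]]],
                   stations: list[tuple[float, float]]) -> dict[str, int]:
    """Count how many station points fall inside each district polygon."""
    counts = {name: 0 for name in districts}
    for x, y in stations:
        for name, ring in districts.items():
            if point_in_polygon(x, y, ring):
                counts[name] += 1
                break  # assumes districts do not overlap
    return counts


# Two hypothetical adjacent districts and four hypothetical stations.
districts = {
    "A": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "B": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
station_xy = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

print(count_stations(districts, station_xy))  # {'A': 2, 'B': 1}
```

The station at (5, 5) lies outside both districts and is not counted. This mirrors how stations beyond the boundary file's coverage would drop out of a spatial join.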
This result is especially interesting to me because, in my experience, people with lower incomes rely heavily on public transit to get through their day-to-day lives. Unequal access to the subway therefore affects the quality of life of residents in these neighborhoods, whose income constraints leave them fewer alternative transportation options. To extend this analysis, I would look for data on household vehicle ownership to compare further.