The New York City MTA prides itself on being the largest transportation system in North America, serving approximately 15 million people a day, which makes it an example that other developing cities in America may emulate. As commuters travel to and through subway stations, there has been growing commentary on the extreme conditions in many of them. New York City has begun taking legislative measures to address rising heat and debilitating air quality through infrastructure and development in many areas, but little to no action has been taken on these conditions within the MTA system. Research has shown that many train stations experience not only serious extreme heat but also poor air quality, which together make for an unpleasant experience for travelers. Some scientists have begun examining how this affects the health of everyone who enters these stations. Overall, this project aims to show that the longer a person travels, the more they are exposed to hazardous air quality. R is used to find trends in air quality, relate them to specific MTA stations, and use income as a variable to show disparities across backgrounds.
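The claim that longer trips mean more exposure follows from a simple exposure model. Cumulative exposure is the sum, over each part of a trip, of the pollutant concentration multiplied by the time spent there. The sketch below is a minimal R version of that model. The function name `trip_exposure`, the choice of PM2.5 as the pollutant, and every number are hypothetical placeholders, not measurements from this project.

```r
# Hypothetical sketch: cumulative exposure = sum(concentration x time) over trip segments.
# All concentrations and durations below are placeholder values, not measurements.
trip_exposure <- function(conc_ugm3, minutes) {
  # conc_ugm3: assumed PM2.5 concentration (micrograms per cubic metre) in each segment
  # minutes:   time spent in each segment
  stopifnot(length(conc_ugm3) == length(minutes))
  sum(conc_ugm3 * minutes)  # units: (ug/m^3) * min
}

# Two hypothetical riders: same platform wait, different time on the train
short_trip <- trip_exposure(conc_ugm3 = c(100, 40), minutes = c(5, 10))
long_trip  <- trip_exposure(conc_ugm3 = c(100, 40), minutes = c(5, 40))
c(short = short_trip, long = long_trip)
```

Under this model, identical conditions combined with a longer ride always give a larger cumulative dose. That is the relationship the income comparison later in the project relies on.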
Packages
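The code in this project uses the following R packages: sf for reading and plotting the shapefiles, readr for loading CSV files, and mapview for the interactive map.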
Part 1: MTA Data Analysis
The first part of this project focused on cleaning and examining the MTA data, using 2022 ridership figures sourced from the MTA's open data and organized by ridership. Limitation: the dataset did not provide GeoIDs, so I added them manually, looking up each station with the Census Geocoder.
Cleaning and Loading the MTA Ridership Transit Data
RD2022 <- read.csv("D:/Gtech-331/RD2022.csv", stringsAsFactors = FALSE)
names(RD2022) <- c("SubwayStation", "Lines", "Ridership")
RD2022 <- RD2022[-c(1, 12), ]
# Adding GeoIDs to the top subway stations
RD2022$GeoID <- c("36061011300", "36061007600", "36061009200", "36061005000", "36061011100",
                  "36061001501", "36081026700", "36061011300", "36081087100", "36061010100")
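Because these GeoIDs were typed in by hand, a quick format check can catch typos before any join. The sketch below is a minimal base-R helper. It assumes standard 11-digit census tract GEOIDs made of a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. It maps the five NYC county codes to borough names. The helper name `parse_tract_geoid` is hypothetical. The list above also repeats 36061011300 for the first and eighth stations. That may be correct if two of the busiest stations fall in the same tract, but it is worth confirming.

```r
# Hypothetical helper: split 11-digit tract GEOIDs into state, county, and tract parts
# and label the NYC borough from the county FIPS code.
parse_tract_geoid <- function(geoid) {
  geoid <- as.character(geoid)
  stopifnot(all(grepl("^[0-9]{11}$", geoid)))  # tract GEOIDs are exactly 11 digits
  county_to_borough <- c("005" = "Bronx", "047" = "Brooklyn",
                         "061" = "Manhattan", "081" = "Queens",
                         "085" = "Staten Island")
  county <- substr(geoid, 3, 5)
  data.frame(
    GeoID   = geoid,
    state   = substr(geoid, 1, 2),
    county  = county,
    tract   = substr(geoid, 6, 11),
    borough = unname(county_to_borough[county]),  # NA if the county is outside NYC
    stringsAsFactors = FALSE
  )
}

# Example using two of the manually entered GeoIDs
res <- parse_tract_geoid(c("36061011300", "36081026700"))
res
```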
Understanding and Visualizing the MTA Transit Data
Originally, I hoped to create a choropleth map showing that the busiest stations had the worst surrounding air quality. To do this, I loaded a shapefile of all MTA subway stations.
Limitation: the station points include only their coordinates, not GeoIDs.
# Loading locational data of all stations.
MTA_SubStations <- st_read("D:/Gtech-331/geo_export_07ac367e-9e83-4387-8cc5-246e23fc1c93.shp")
Reading layer `geo_export_07ac367e-9e83-4387-8cc5-246e23fc1c93' from data source `D:\Gtech-331\geo_export_07ac367e-9e83-4387-8cc5-246e23fc1c93.shp'
using driver `ESRI Shapefile'
Simple feature collection with 496 features and 0 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -74.25196 ymin: 40.51276 xmax: -73.7554 ymax: 40.90313
CRS: NA
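Since the station points carry only coordinates, one way to give each station a GeoID is a point-in-polygon test against the census tract boundaries. In sf, this is normally done as a spatial join, for example `sf::st_join(stations, tracts, join = sf::st_within)`. First, the stations need a CRS assigned with `sf::st_set_crs()` and must then be reprojected with `sf::st_transform()` to match the tracts. The output above reports `CRS: NA`, and the bounding box looks like longitude/latitude degrees, so EPSG:4326 is a plausible but unverified assumption. The base-R sketch below only illustrates the underlying even-odd ray-casting test on made-up square "tracts". The names `point_in_polygon` and `assign_tract`, along with all coordinates, are hypothetical.

```r
# Ray-casting point-in-polygon test (even-odd rule).
# px, py: point coordinates; vx, vy: polygon vertices in order (ring not closed).
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    # Toggle if a horizontal ray from the point to the right crosses edge j -> i
    if (((vy[i] > py) != (vy[j] > py)) &&
        (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical tracts as unit squares, standing in for census tract polygons
tracts <- list(
  "TRACT_A" = list(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  "TRACT_B" = list(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)

# Return the ID of the first tract containing the point, or NA if none does
assign_tract <- function(px, py, tracts) {
  for (id in names(tracts)) {
    p <- tracts[[id]]
    if (point_in_polygon(px, py, p$x, p$y)) return(id)
  }
  NA_character_
}

# Hypothetical station points; the third lies outside both tracts
stations <- data.frame(name = c("Station 1", "Station 2", "Station 3"),
                       x = c(0.5, 1.5, 3.0), y = c(0.5, 0.5, 0.5))
stations$GeoID <- mapply(assign_tract, stations$x, stations$y,
                         MoreArgs = list(tracts = tracts))
stations
```

Looping over every tract for every point is fine for a few points. With roughly 500 stations and 2,300 tracts, sf's spatial join (which uses a spatial index) is the practical choice.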
# pure geometry
plot(st_geometry(MTA_SubStations), main = 'Plotting of MTA Subway Stations')
# Save the same plot to a PNG file
png("mta_substations_geometry.png", width = 800, height = 600)
plot(st_geometry(MTA_SubStations), main = "Plotting of MTA Subway Stations")
dev.off()
Part 2: Census Tract Data Manipulation
With the MTA data cleaned, I turned to the census data, loading the 2020 census tracts as the main layer that all of my other data frames are related and joined to.
# Loading in the census tracts.
nyc_census_tracts_sf <- st_read("D:/Gtech-331/nyct2020_25a/nyct2020.shp")
Reading layer `nyct2020' from data source `D:\Gtech-331\nyct2020_25a\nyct2020.shp' using driver `ESRI Shapefile'
Simple feature collection with 2325 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 913175.1 ymin: 120128.4 xmax: 1067383 ymax: 272844.3
Projected CRS: NAD83 / New York Long Island (ftUS)
Part 3: Demographic Data
In my project thesis, I argued that people from a "worse" socioeconomic background experience poor air quality in the MTA for longer because they have to travel farther. I was unsure whether to base this on income or race, so I examined both for comparison. (This part is still in progress, as the data I worked with is incorrect. This data shows per capita income by race.)
NOTE: The darker the color, the lower the income of that census tract. "Total" is the per capita income of all races combined.
# Loading and cleaning the income (mean per capita) data for New York City
Population <- read.csv("D:/Gtech-331/IncomeRace.csv")
Population <- Population[, -c(3:5, 7:9, 11:13, 15:17, 19:21)]
# Note: the join below uses nyc_pop; Population is not used again in the code shown
nyc_pop <- readr::read_csv("D:/Gtech-331/IncomeRace.csv", lazy = FALSE)
Rows: 2201 Columns: 13
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): GEO_ID, Name
dbl (11): GEOID, Total, White, Black, American Indian/Alaska Native, Asian, ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
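Two details of the GEOID join are worth checking. First, readr guessed `GEOID` as a double (`dbl`). If the tract shapefile stores `GEOID` as text, the two keys have different types, so it is safer to read the column as character. For example, you could pass `col_types = readr::cols(GEOID = readr::col_character())` to `read_csv`, or convert it afterwards. Second, `merge()` performs an inner join by default. The income file has 2,201 rows while the tract layer has 2,325 features, so tracts without an income row are silently dropped unless `all.x = TRUE` is set. sf's `merge` method should pass this argument through to the data-frame merge. The base-R sketch below uses small, hypothetical tables and placeholder income values to show both fixes.

```r
# Hypothetical mini tables standing in for the tract layer and the income data
tracts <- data.frame(GEOID = c("36061000100", "36061000201", "36061000600"),
                     stringsAsFactors = FALSE)
income <- data.frame(GEOID = c(36061000100, 36061000600),  # numeric, as readr guessed
                     Total = c(50000, 30000))               # placeholder values

# Store GEOIDs as zero-padded 11-character strings so both keys share one type
income$GEOID <- sprintf("%011.0f", income$GEOID)

# all.x = TRUE keeps tracts that have no income record (their Total becomes NA)
merged <- merge(tracts, income, by = "GEOID", all.x = TRUE)
merged
sum(is.na(merged$Total))  # number of tracts without income data
```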
# To join the income data frame with the census tracts
nyc_sf_merged <- base::merge(nyc_census_tracts_sf, nyc_pop, by.x = "GEOID", by.y = "GEOID")
names(nyc_sf_merged)
[1] "GEOID"
[2] "CTLabel"
[3] "BoroCode"
[4] "BoroName"
[5] "CT2020"
[6] "BoroCT2020"
[7] "CDEligibil"
[8] "NTAName"
[9] "NTA2020"
[10] "CDTA2020"
[11] "CDTANAME"
[12] "PUMA"
[13] "Shape_Leng"
[14] "Shape_Area"
[15] "GEO_ID"
[16] "Name"
[17] "Total"
[18] "White"
[19] "Black"
[20] "American Indian/Alaska Native"
[21] "Asian"
[22] "Native Hawaiian and Other Pacific Islander"
[23] "Other"
[24] "Two or more races"
[25] "Hispanic or Latino origin (of any race)"
[26] "White alone, not Hispanic or Latino"
[27] "geometry"
# Loading the map
mapview(nyc_sf_merged, zcol = c('Total'))
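The map above shades tracts by `Total` on a continuous scale. A classed choropleth is one common alternative that makes the "darker means lower income" reading explicit. Quantile breaks are a typical choice, since each class then holds roughly the same number of tracts. The base-R sketch below shows that classification step on eight hypothetical income values, which are placeholders rather than census figures. The four-class split and the viridis ramp are assumptions. In the project, the same steps would apply to `nyc_sf_merged$Total`.

```r
# Hypothetical per capita income values for eight tracts (placeholders, not census data)
total_income <- c(18000, 22000, 25000, 31000, 40000, 52000, 75000, 120000)

# Quantile (equal-count) breaks: four classes with roughly equal numbers of tracts
breaks <- quantile(total_income, probs = seq(0, 1, by = 0.25), names = FALSE)
income_class <- cut(total_income, breaks = breaks, include.lowest = TRUE,
                    labels = c("Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"))

# Viridis runs dark to light, so the lowest-income class gets the darkest color
palette <- hcl.colors(4, palette = "viridis")
tract_colors <- palette[as.integer(income_class)]
table(income_class)
```

Quantile breaks can split nearly identical incomes into different classes when values cluster. Equal-interval or natural-breaks classes are reasonable alternatives, depending on the distribution.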