MTA R Progress Report Assignment

Author

Victoria Campbell

Published

May 2, 2025

Project Thesis

The NYC MTA prides itself on being the largest transportation system in North America, serving approximately 15 million people a day, which makes it a model that other developing cities around America may emulate. As commuters travel to and through subway stations, there has been growing commentary on the extreme conditions experienced in many of them. New York City has begun taking legislative measures to address rising heat and deteriorating air quality through infrastructure and development in many areas, but little to no action has been taken on these issues within the MTA. Research has shown that many stations experience extreme heat and that poor air quality also contributes to an unpleasant experience for travelers. Some scientists have begun looking into how this affects the health of everyone who enters these stations. This project aims to show that the longer you travel, the more you are exposed to hazardous air quality. R is used to find trends in air quality, relate them to specific MTA stations, and use income as a variable to show disparities by socioeconomic background.

Packages

The following packages were used for this project.

Part 1: MTA Data Analysis

The first part of this project focused on cleaning and examining the MTA data. The table was built from the 2022 ridership data, sourced from the MTA's open data. Limitation: GeoIDs had to be added manually because they were not provided. I used the Census Geocoder to find the GeoID for each station.
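Because the GeoIDs are entered by hand, one way to make that step less fragile is to keep them in a small lookup table keyed by station name and match on the name instead of relying on row order. The sketch below is only an illustration under that assumption: the station names, ridership counts, and GeoIDs are placeholders, and `ridership` and `geoid_lookup` are hypothetical names, not objects from this project.

```r
# Toy ridership table; station names and counts are placeholders, not MTA values.
ridership <- data.frame(
  SubwayStation = c("Station A", "Station B", "Station C"),
  Ridership = c(300, 200, 100),
  stringsAsFactors = FALSE
)

# Lookup table of manually geocoded tract IDs (placeholder 11-digit values).
# Stored as character so the IDs are never converted to numbers.
geoid_lookup <- data.frame(
  SubwayStation = c("Station C", "Station A", "Station B"),
  GeoID = c("36000000003", "36000000001", "36000000002"),
  stringsAsFactors = FALSE
)

# match() aligns by station name, so the row order of the lookup does not matter.
idx <- match(ridership$SubwayStation, geoid_lookup$SubwayStation)
ridership$GeoID <- geoid_lookup$GeoID[idx]

# Stop early if any station was never geocoded.
stopifnot(!anyNA(ridership$GeoID))
```

Matching by name means that reordering or filtering the ridership table later does not silently shift GeoIDs onto the wrong stations.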

Cleaning and Loading the MTA Ridership Data

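# Load the 2022 ridership table exported from MTA open data.
RD2022 <- read.csv("D:/Gtech-331/RD2022.csv", stringsAsFactors = FALSE)
names(RD2022) <- c("SubwayStation", "Lines", "Ridership")
# Drop rows 1 and 12, which leaves the ten stations that receive GeoIDs below.
RD2022 <- RD2022[-c(1, 12), ]
# Add GeoIDs to the top subway stations (one per row, in row order).
RD2022$GeoID <- c("36061011300", "36061007600", "36061009200",
                  "36061005000", "36061011100", "36061001501", "36081026700",
                  "36061011300", "36081087100", "36061010100")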

Understanding and Visualizing the MTA Transit Data

Originally, I hoped to create a choropleth map to show that the busiest subway stations had the worst surrounding air quality. To do this, I loaded a shapefile of all the MTA subway stations.

Limitation: the station point features did not include GeoIDs, only their coordinates.

#Loading locational data of all stations.
MTA_SubStations <- st_read("D:/Gtech-331/geo_export_07ac367e-9e83-4387-8cc5-246e23fc1c93.shp")

Reading layer `geo_export_07ac367e-9e83-4387-8cc5-246e23fc1c93' from data source `D:\Gtech-331\geo_export_07ac367e-9e83-4387-8cc5-246e23fc1c93.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 496 features and 0 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -74.25196 ymin: 40.51276 xmax: -73.7554 ymax: 40.90313
CRS:           NA

# pure geometry
plot(st_geometry(MTA_SubStations), main='Plotting of MTA Subway Stations')

st_crs(MTA_SubStations)

Coordinate Reference System: NA
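The missing CRS and the missing GeoIDs could, in principle, be handled together. The bounding box above looks like longitude and latitude in degrees, so a plausible (but unverified) assumption is that the points are in WGS84 (EPSG:4326). Under that assumption, the sketch below assigns the CRS, reprojects the points to the tract layer's CRS, and uses a spatial join to pick up each station's tract ID. The toy `stations` and `tracts` objects, their coordinates, and the GEOID value are placeholders, not project data.

```r
library(sf)

# Toy station points with no CRS, mimicking the shapefile as read.
stations <- st_sf(
  station = c("A", "B"),
  geometry = st_sfc(st_point(c(-73.99, 40.75)), st_point(c(-73.95, 40.80)))
)

# Toy tract polygon (placeholder GEOID), built in lon/lat and then projected
# to EPSG:2263, NAD83 / New York Long Island (ftUS).
ring <- rbind(c(-74.00, 40.74), c(-73.98, 40.74), c(-73.98, 40.76),
              c(-74.00, 40.76), c(-74.00, 40.74))
tracts <- st_sf(GEOID = "00000000001",
                geometry = st_sfc(st_polygon(list(ring)), crs = 4326))
tracts <- st_transform(tracts, 2263)

# Assumption: the station coordinates are WGS84 longitude/latitude.
stations <- st_set_crs(stations, 4326)

# Reproject the points so both layers share one CRS before joining.
stations <- st_transform(stations, st_crs(tracts))

# Left spatial join: each point receives the GEOID of the tract it falls within,
# or NA if it falls outside every tract.
joined <- st_join(stations, tracts["GEOID"], join = st_within)
```

If the assumed CRS is wrong, the points would land in the wrong place after reprojection, so it is worth plotting the stations over the tracts before trusting the joined IDs.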

Part 2: Census Tract Data Manipulation

Once the MTA data was cleaned, I loaded the census tract data, which serves as the base layer that my other data frames are joined to.

# Load the census tract boundaries.
nyc_census_tracts_sf <- st_read("D:/Gtech-331/nyct2020_25a/nyct2020.shp")

Reading layer `nyct2020' from data source `D:\Gtech-331\nyct2020_25a\nyct2020.shp' using driver `ESRI Shapefile'
Simple feature collection with 2325 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 913175.1 ymin: 120128.4 xmax: 1067383 ymax: 272844.3
Projected CRS: NAD83 / New York Long Island (ftUS)

str(nyc_census_tracts_sf)

Classes 'sf' and 'data.frame':  2325 obs. of  15 variables:
 $ CTLabel   : chr  "1" "14.01" "14.02" "18" ...
 $ BoroCode  : chr  "1" "1" "1" "1" ...
 $ BoroName  : chr  "Manhattan" "Manhattan" "Manhattan" "Manhattan" ...
 $ CT2020    : chr  "000100" "001401" "001402" "001800" ...
 $ BoroCT2020: chr  "1000100" "1001401" "1001402" "1001800" ...
 $ CDEligibil: chr  "I" "I" "E" "I" ...
 $ NTAName   : chr  "The Battery-Governors Island-Ellis Island-Liberty Island" "Lower East Side" "Lower East Side" "Lower East Side" ...
 $ NTA2020   : chr  "MN0191" "MN0302" "MN0302" "MN0302" ...
 $ CDTA2020  : chr  "MN01" "MN03" "MN03" "MN03" ...
 $ CDTANAME  : chr  "MN01 Financial District-Tribeca (CD 1 Equivalent)" "MN03 Lower East Side-Chinatown (CD 3 Equivalent)" "MN03 Lower East Side-Chinatown (CD 3 Equivalent)" "MN03 Lower East Side-Chinatown (CD 3 Equivalent)" ...
 $ GEOID     : chr  "36061000100" "36061001401" "36061001402" "36061001800" ...
 $ PUMA      : chr  "4121" "4103" "4103" "4103" ...
 $ Shape_Leng: num  10833 5075 4459 6392 5779 ...
 $ Shape_Area: num  1843005 1006117 1226206 2399277 1740174 ...
 $ geometry  :sfc_MULTIPOLYGON of length 2325; first list element: List of 2
  ..$ :List of 1
  .. ..$ : num [1:48, 1:2] 972082 972185 972399 972385 972407 ...
  ..$ :List of 1
  .. ..$ : num [1:24, 1:2] 973173 973311 973330 973793 973686 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:14] "CTLabel" "BoroCode" "BoroName" "CT2020" ...
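Since the tract layer carries a character `GEOID` column, the ridership table can be related to it through the manually added `GeoID` values. The sketch below illustrates one way to do that with plain data frames. The ridership numbers are placeholders, and `tracts`, `stations`, `per_tract`, and `tract_ridership` are hypothetical names. Because one GeoID appears twice in the station list, ridership is summed per tract before joining.

```r
# Toy tract table standing in for the attribute columns of the tract layer.
tracts <- data.frame(
  GEOID = c("36061011300", "36061007600", "36061000100"),
  BoroName = "Manhattan",
  stringsAsFactors = FALSE
)

# Toy station table; ridership values are placeholders.
stations <- data.frame(
  GeoID = c("36061011300", "36061007600", "36061011300"),
  Ridership = c(100, 50, 25),
  stringsAsFactors = FALSE
)

# Two stations can fall in the same tract, so sum ridership per tract first.
per_tract <- aggregate(Ridership ~ GeoID, data = stations, FUN = sum)

# Left join: keep every tract; tracts without a listed station get NA ridership.
tract_ridership <- merge(tracts, per_tract,
                         by.x = "GEOID", by.y = "GeoID", all.x = TRUE)
```

Keeping both ID columns as character avoids mismatches caused by numeric conversion, and the left join preserves the full tract layer for mapping.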