Are the MTA's Constant Delays Benefitting Ridesharing Companies?

Natalie, Luis, Iden and Alejandro
12/20/2018

Abstract

  • We believe New York City's subway performance and ridership are inversely related: delays have increased as ridership has increased.
  • We fit a regression model on indicators such as ridership and mean distance between failures (MDBF) to test this hypothesis.
  • We also try to predict the impact on-time performance (OTP) has on ridership, given the rise of ride-sharing services like Uber and Lyft.

Introduction

  • The NYC subway system is one of the oldest and most widely used public transportation systems in the world.
  • The MTA has seen ridership increase by 9.94%, from 1.53 million rides per year in 2015 to 1.72 million rides in 2017.
  • “Delays caused by overcrowding have quadrupled since 2012 to more than 20,000 each month.” This prompted New York Governor Andrew Cuomo to declare a state of emergency for the MTA.

Literature review

  • A team at Virginia Tech examined Washington Metropolitan Area Transit Authority data to predict on-time performance in real-time. The team used “RNN (recurrent neural network) predictions of the future network state to make OTP predictions for passengers who have not yet entered the network”, using real-time passenger and train movement data provided by WMATA.
  • Although their goal is similar to ours, we did not have comparable data at hand for this project, as not all New York City trains are equipped with real-time tracking.
  • Much of the literature is also clear that real-time tracking offers the best results, and the MTA is far behind in this regard. Poor data collection is therefore a major obstacle to predicting subway on-time performance for New York City.

Methodology

  • Our primary source is the MTA's Key Performance Indicators dataset taken directly from their website.
  • Using the dplyr package, we stretch the INDICATOR_NAME column into several columns so that each row of the data frame is the recorded data for a specific month and year combination.
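
The reshaping step can be illustrated with a minimal sketch. This is not the code used for the project, which was done in R; it is a standard-library Python illustration of the same long-to-wide idea. Only INDICATOR_NAME comes from the dataset description above. The YEAR, MONTH, and VALUE field names and all numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format records: one row per (year, month, indicator).
# Field names other than INDICATOR_NAME, and all values, are made up.
long_rows = [
    {"YEAR": 2017, "MONTH": 2, "INDICATOR_NAME": "OTP", "VALUE": 63.0},
    {"YEAR": 2017, "MONTH": 1, "INDICATOR_NAME": "OTP", "VALUE": 64.5},
    {"YEAR": 2017, "MONTH": 1, "INDICATOR_NAME": "RIDERSHIP", "VALUE": 140.0},
    {"YEAR": 2017, "MONTH": 2, "INDICATOR_NAME": "RIDERSHIP", "VALUE": 138.5},
]


def spread_indicators(rows):
    """Pivot long rows into one wide record per (year, month)."""
    wide = defaultdict(dict)
    for row in rows:
        key = (row["YEAR"], row["MONTH"])
        # Each distinct indicator name becomes its own column.
        wide[key][row["INDICATOR_NAME"]] = row["VALUE"]
    # Sort by (year, month) so the output is in chronological order.
    return [
        {"YEAR": year, "MONTH": month, **indicators}
        for (year, month), indicators in sorted(wide.items())
    ]


wide_rows = spread_indicators(long_rows)
for record in wide_rows:
    print(record)
```

Each output record then holds every indicator for a single month and year, which is the shape a regression on monthly observations needs.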

Methodology (cont...)

  • Ridership peaked three years ago and has been in sharp decline since. If the MDBF and OTP were more closely linked, then we might see a slight improvement in OTP as ridership eases.
  • As subway OTP has decreased in recent years, a quick look at the for-hire vehicle (FHV) data shows that FHV usage has been increasing. This may be a sign that when delays or poor performance strike a station, people turn to ride-sharing alternatives. Those companies are also adding more vehicles to keep up with the increased demand.

Experimentation/Results

  • Our first attempt utilized linear regression on all available variables (FAILURE, RIDERSHIP, INJURY, ELEV, ESCA).
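
As a rough illustration of what fitting a linear regression on several predictors involves, here is a dependency-free sketch of ordinary least squares via the normal equations. It is an assumption-laden toy, not the model reported here: it uses only two of the predictors (FAILURE and RIDERSHIP), the numbers are synthetic, and the helper names `solve_linear_system` and `fit_ols` are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictors, response):
    """Return [intercept, b1, ..., bk] minimizing the squared error."""
    design = [[1.0] + list(row) for row in predictors]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic (FAILURE, RIDERSHIP) pairs, with OTP generated exactly as
# OTP = 80 - 2 * FAILURE + 0.5 * RIDERSHIP. None of this is MTA data.
predictors = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
otp = [80.0 - 2.0 * f + 0.5 * r for f, r in predictors]

coefficients = fit_ols(predictors, otp)
print(coefficients)
```

Because the synthetic response is an exact linear function of the predictors, the fit recovers the generating coefficients up to floating-point error. Real monthly data would leave residuals, and significance tests such as the ones discussed next would be read from a statistics package rather than from this sketch.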

Experimentation/Results (cont...)

  • Although the model deemed both INJURY and ESCA insignificant, we keep them: any customer injury would surely affect a subway's OTP, as would the availability of escalators. If people have to sprint for a train in a station with broken or missing escalators, others may be more likely to hold the doors as the latecomers trickle in at the last moment.

  • A second model incorporates the FHV data, adding TOTAL.TRIPS, SHARED.TRIPS, and VEHICLES to the linear regression. Unfortunately, the additional data affects the model negatively. While commuters may use Uber and Lyft more frequently in the face of falling OTP, FHV usage does not help predict OTP on a monthly basis.

Experimentation/Results (cont...)

  • Our last model predicts ridership instead, testing how both FHV usage and OTP affect commuters. It goes through several iterations in an attempt to build a naive linear model that accounts for serial correlation via a generalized least squares (GLS) approach. This mitigates the seasonality in the data; removing seasonality through cleaning would have left us with too few observations to work with (< 30). We build a time series model on the first- and second-order lags of each predictor, then fit the model on the lagged predictors.
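
The lagging step can be sketched as follows. This only shows how first- and second-order lags of each predictor might be lined up against the response; it does not perform the GLS fit itself. The helper name `lagged_design` and all values are hypothetical, while OTP and TOTAL.TRIPS are variable names taken from the models above.

```python
def lagged_design(predictors, response, max_lag=2):
    """Build rows of lagged predictor values aligned with the response.

    For each time t >= max_lag, the row holds x[t-1], ..., x[t-max_lag]
    for every predictor, and is paired with response[t].
    """
    names = sorted(predictors)
    rows, targets = [], []
    for t in range(max_lag, len(response)):
        row = {}
        for name in names:
            for lag in range(1, max_lag + 1):
                row[f"{name}_lag{lag}"] = predictors[name][t - lag]
        rows.append(row)
        targets.append(response[t])
    return rows, targets


# Made-up monthly values, for illustration only.
predictors = {
    "OTP": [65.0, 63.5, 64.2, 62.8, 61.9],
    "TOTAL.TRIPS": [10.0, 11.0, 12.5, 13.0, 14.2],
}
ridership = [140.0, 142.0, 141.0, 139.5, 138.0]

rows, targets = lagged_design(predictors, ridership)
for row, target in zip(rows, targets):
    print(target, row)
```

Note that each extra lag order costs one observation at the start of the series, which matters when there are only a few dozen monthly data points.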

Experimentation/Results (cont...)

  • To correct for the lag, we take the log of the response variable, RIDERSHIP, then refit the model on the month-over-month growth of each variable.
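
The two transformations mentioned here can be sketched in a few lines. This assumes that "month-over-month growth" means the relative change from one month to the next, which is one common definition and may not match the exact one used in the project. The helper names and values are hypothetical.

```python
import math


def log_transform(series):
    """Natural log of each (strictly positive) observation."""
    return [math.log(value) for value in series]


def month_over_month_growth(series):
    """Relative change between consecutive months: (x_t - x_{t-1}) / x_{t-1}."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


# Made-up monthly ridership values, for illustration only.
ridership = [100.0, 110.0, 99.0]

log_ridership = log_transform(ridership)
growth = month_over_month_growth(ridership)
print(log_ridership)
print(growth)
```

The growth series is one element shorter than its input, so the response and predictors need to be trimmed to the same months before refitting.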

Conclusion

  • Our attempt at predicting the MTA's monthly on-time performance score performed reasonably well with a very naive linear model. More sophisticated models exist that allow real-time prediction given the real-time positions of subway cars and passenger flows. As the MTA continues to update its subway cars and stations, such data could help commuters better plan their routes.
  • The model itself could be improved if more features from the MTA are provided, such as the age of the line, the car models in use, or the last time track work had been done on that particular line.
  • Regardless, as subway work and maintenance increase, and as New York City attracts more businesses, such as Amazon's second headquarters in Long Island City, it is in the MTA's best interest to improve on-time performance more efficiently.