In 2018, SEPTA experienced its lowest ridership in nearly 20 years, losing almost 14 million bus trips from 2017 to 2018. This is troubling, but not unique among North American cities: transit ridership has been declining steadily across the United States and Canada since 2014. Still, Philadelphia had one of the largest declines in bus ridership over the last year, larger than those in Los Angeles, Dallas, Chicago, and Boston.
Multiple studies have evaluated the factors that affect transit ridership. Some examine external factors, ranging from income to gas prices to personal preferences. Others look at internal factors that transit agencies and cities control, such as fare prices and level of service.
Our analysis builds on this growing body of research by examining the spatial relationship between bus stop spacing and ridership. We feel this topic is of great importance as Philadelphia grapples with strategies to improve ridership and embarks on a comprehensive bus network redesign in the coming years.
Our predictive model is meant to show how the distance between SEPTA’s bus stops affects ridership. We built this model on Route 21, which runs east-west on Chestnut and Walnut streets between 69th Street Transportation Center and Penn’s Landing, and tested it on Route 16, which runs north-south on Broad Street between Cheltenham and City Hall. We chose these routes because of their relatively high ridership and their service between high-density neighborhoods and job centers. We also chose lines that have different socio-economic and ethnic ridership profiles. Ultimately, this choice complicated our test model results, but it provides some important insights about modeling spatial factors along SEPTA’s bus system, which serves a large and diverse geographic area.
Many studies have explored the determinants of transit ridership, looking at a wide array of internal and external factors that influence ridership.
Internal factors relate to decisions, policies, and conditions that are determined directly by the transit agency or municipality. Fare pricing is one of the most frequently analyzed internal factors influencing transit ridership, but improvements in service (frequency, coverage, reliability) are often more important than fare prices in determining ridership.
External factors, on the other hand, relate to broader economic and geographic conditions such as gas prices and unemployment rates. Studies have found some external factors to be more significant than others. For example, population size and employment rate have demonstrated statistical significance across multiple studies.
Many studies of internal influences find that, of all the factors transit operators control, the most important are quality of service, quantity of service, and fare pricing. A very recent study from Transportation Research at McGill found that reductions in the quantity of bus service in particular were a key determinant of ridership. This result was statistically significant across 25 North American cities.
When it comes to spatial factors, studies find that housing density and employment density are the most important spatial variables for determining transit demand. However, while spatial factors like these are used in studies to explain variation in transit ridership, the collinearity among spatial variables, and between spatial variables and other socio-economic factors, raises further questions about causes and effects and their relative influence on ridership.
We used the 2017 American Community Survey (ACS) and Longitudinal Employer-Household Dynamics (LEHD) datasets to obtain demographic and employment data at the census block group level for the City of Philadelphia. This data was then mapped in ArcMap in preparation for a spatial join. We also acquired bus stop and average weekday ridership data from the Southeastern Pennsylvania Transportation Authority (SEPTA). The bus stop points were then projected onto a map of the City of Philadelphia with census block groups from the Open Data Philly site. With the three datasets from the ACS, LEHD, and SEPTA projected in ArcMap, we were ready to conduct our analysis.
We used the Editor toolbar to draw lines for Route 21 and Route 16, placing the vertices of each line at the bus stop locations. We then used the Split Line at Vertices tool to divide each route line into smaller segments. After doing this, we calculated the geometry of each line segment and associated its length with a bus stop point using the spatial join function. Every bus stop now has a distance, in feet, to the adjacent bus stop on its route. Next, we merged the LEHD dataset with the ACS dataset to produce one shapefile containing all the census block groups and the demographics associated with each. The remaining step was to calculate our buffers.
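The segment-length step can be sketched outside ArcMap as well. The snippet below is a minimal illustration, not the workflow used above: it assumes stop coordinates in a projected coordinate system measured in feet, and it uses straight-line distance between consecutive stops as a stand-in for the length of the drawn route segment. The `stops` table and its values are hypothetical.

```r
# Hypothetical stop coordinates in a projected CRS with units of feet;
# the values are made up for illustration and listed in route order.
stops <- data.frame(
  stop_id = c("A", "B", "C", "D"),
  x = c(0, 300, 300, 900),
  y = c(0, 400, 1000, 1000)
)

# Straight-line length of each segment between consecutive stops
seg_len <- sqrt(diff(stops$x)^2 + diff(stops$y)^2)

# Attach to each stop the distance to the next stop on the route;
# the last stop has no following segment, so it gets NA
stops$dist_next_ft <- c(seg_len, NA)

print(stops)
```

On a route that bends between stops, the true segment length along the street would be longer than this straight-line approximation.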
Scholars still debate whether Euclidean distance or network distance is more effective when analyzing spatial components in a model. We decided to use the Network Analyst tool in GIS because we wanted to consider only areas that pedestrians can traverse. Because pedestrians cannot move freely in every direction, we must account for areas they cannot access, such as private property. We created a network dataset using the street network of Philadelphia acquired from the Open Data Philly site and then created a series of three buffers of 300, 600, and 900 feet that describe how accessible a bus stop is along the existing street grid. The map is displayed in figure x. We used the 900-foot buffer around each bus stop, which extends less than a quarter mile, to represent the catchment area of the stop.
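To make the difference between the two distance measures concrete, here is a small sketch. It does not reproduce Network Analyst; it assumes an idealized rectangular street grid, where walking distance can be approximated by Manhattan distance, and compares that with straight-line distance for a few hypothetical points around a stop at the origin.

```r
# Hypothetical trip origins around a stop located at (0, 0), in feet
pts <- data.frame(
  id = c("p1", "p2", "p3", "p4"),
  x = c(600, 600, 900, 500),
  y = c(0, 600, 0, 500)
)

buffer_ft <- 900

# Straight-line (Euclidean) distance to the stop
pts$euclid_ft <- sqrt(pts$x^2 + pts$y^2)

# Walking distance on an idealized rectangular grid (Manhattan distance);
# this is an assumption standing in for a true network analysis
pts$grid_ft <- abs(pts$x) + abs(pts$y)

# Which points fall inside each kind of 900-foot catchment
pts$in_euclid  <- pts$euclid_ft <= buffer_ft
pts$in_network <- pts$grid_ft <= buffer_ft

print(pts)
```

Points that lie diagonally from the stop (p2 and p4 here) fall inside the circular buffer but outside the grid-based one, which is why a network-based catchment is generally smaller than a circle of the same nominal distance.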
Each 900-foot buffer has a unique ID that matches that of its bus stop point. We used a spatial join to join each buffer to all the census block groups that it intersects and then averaged the data of those block groups. For instance, if a 900-foot buffer around a stop intersects three census block groups, then ArcMap averages the demographic data from the intersected block groups and adds the result to the row for that buffer. After doing this, we used a join based on the unique ID attribute column to transfer the demographic data from the buffers to the bus stop points. The attribute table of our bus stop points dataset then has all of the data needed to conduct our analysis, and we can transfer the data into R.
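The averaging and ID-based join described above can be illustrated with a small table-only sketch. It assumes the spatial intersection has already been done and produced one row per buffer and block group pair. All table names, column names, and values are hypothetical.

```r
# Hypothetical output of intersecting buffers with census block groups:
# one row per (buffer, block group) pair, keyed by the stop's unique ID
hits <- data.frame(
  stop_id    = c(1, 1, 1, 2, 2),
  med_income = c(30000, 45000, 60000, 50000, 70000),
  jobs       = c(100, 200, 600, 50, 150)
)

# Unweighted mean of each demographic column within each buffer
buffer_avg <- aggregate(cbind(med_income, jobs) ~ stop_id,
                        data = hits, FUN = mean)

# Hypothetical stop-level table with ridership
stops <- data.frame(stop_id = c(1, 2), weekday_boardings = c(120, 85))

# Transfer the averaged demographics to the stops via the shared ID
stops_joined <- merge(stops, buffer_avg, by = "stop_id")

print(stops_joined)
```

An unweighted mean treats every intersected block group equally, regardless of how much of it the buffer actually covers.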
In R, we will visualize the distributions of our data using histograms, density plots, and scatter plots. We will then create a correlation matrix, analyze the means of all of our variables, and map the ridership numbers to gain a better idea of how strong our model is and how well it will perform. If we see that the independent variables are too highly correlated, then we will not include them in our final model. After running these preliminary tests, we will fit several Ordinary Least Squares (OLS) regressions and compare them using both a VIF test and an ANOVA test across all the models. We will then choose the model that best predicts our data and plot the residuals with a line of best fit. This will tell us how well our original model, with the Route 21 variables as our training set, predicted ridership on the Route 16 bus line, helping us understand the generalizability of our model. If the predicted and observed values are close, then we can have more confidence in our final model.
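The modeling steps in this plan can be sketched in base R as follows. This is an illustration on synthetic data, not our actual model: the `make_route` helper, the column names, and all coefficients are hypothetical, and the VIF is computed by hand as 1 / (1 - R²) rather than with a package function.

```r
set.seed(1)

# Synthetic stand-ins for the two routes; names and values are hypothetical
make_route <- function(n) {
  dist_ft  <- runif(n, 300, 1500)
  pop_dens <- runif(n, 5, 40)
  jobs     <- 50 + 10 * pop_dens + rnorm(n, sd = 60)
  riders   <- 20 + 0.05 * dist_ft + 3 * pop_dens + 0.1 * jobs + rnorm(n, sd = 15)
  data.frame(riders, dist_ft, pop_dens, jobs)
}
train <- make_route(80)  # plays the role of Route 21
test  <- make_route(60)  # plays the role of Route 16

predictors <- c("dist_ft", "pop_dens", "jobs")

# Correlation matrix of the candidate predictors
print(round(cor(train[, predictors]), 2))

# Two nested OLS models
m1 <- lm(riders ~ dist_ft, data = train)
m2 <- lm(riders ~ dist_ft + pop_dens + jobs, data = train)

# VIF by hand: regress each predictor on the others, then 1 / (1 - R^2)
vif_manual <- function(data, vars) {
  sapply(vars, function(v) {
    others <- setdiff(vars, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}
vifs <- vif_manual(train, predictors)
print(vifs)

# F-test comparing the nested models
print(anova(m1, m2))

# Out-of-sample check: predict the held-out route and summarize the error
pred <- predict(m2, newdata = test)
rmse <- sqrt(mean((test$riders - pred)^2))
print(rmse)
```

A lower out-of-sample error on the held-out route suggests the model generalizes better. Plotting `pred` against `test$riders` gives the visual version of the same check.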
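library(sf)     # st_read, st_transform, st_as_sf
library(dplyr)  # %>%, select, all_of
library(tmap)   # tmap_mode
# Read two layers from the Philadelphia open data portal and transform them to EPSG:6318
boundingBox <- st_read("http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson") %>%
  st_transform(crs = 6318)
cg <- st_read("http://data.phl.opendata.arcgis.com/datasets/1eed3c9b6d3c4561aaa62e1fc2dd81c4_0.geojson") %>%
  st_transform(crs = 6318)
# Read the bus route lines from a local shapefile
busroutes <- st_read("C:/Users/AnthonyJ/Box/Spring 2019/PlanningByNumbers/FinalProject/Stop Data/BusRoutes.shp") %>%
  st_transform(crs = 6318)
# Columns to keep from the stop-level table
mpd <- c("OBJECTID_12", "Weekday_Bo", "Route", "Latitude", "Longitude")
# NOTE: `c` must be a data frame holding these columns, defined before this point;
# it is not created in this excerpt, and base R's c() function would fail here
mp <- c %>%
  dplyr::select(all_of(mpd))
tmap_mode("view")
# make spatial
dat_sf <- st_as_sf(mp, coords = c("Longitude", "Latitude"), crs = 6318)
# Read the Route 21 and Route 16 distance shapefile
xyz <- st_read("C:/Users/AnthonyJ/Box/Spring 2019/PlanningByNumbers/FinalProject/Route_21_16_Dist.shp")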