Introduction

We develop regression models for analyzing the impact of ride-sharing services on transit demand. Our analysis is motivated by the loss of transit ridership over the past several years. Those losses coincide with the introduction of Transportation Network Companies (TNCs). The apparent correlation between the two has led to speculation that TNCs have caused the loss of transit ridership. However, many factors other than TNC ridership could be driving that loss, including changing populations, shifting land use patterns, transit service changes, and macroeconomic factors. Our model allows for disaggregate analysis of long-term transit ridership data from Chicago between 2010 and 2020, before and after the introduction of TNC services, along with TNC trip patterns, to determine relationships between TNC ridership and transit ridership for similar areas.

Our work builds on the previous analysis of Ward et al. (2021), who analyzed the effect of TNCs on transit ridership and vehicle ownership across U.S. cities during the 2011-2017 period. Consistent with our results, they found no effect of TNCs on transit use on average, but they did find local effects in areas with high transit ridership and high income. Ward et al. (2019) and Ward et al. (2018) analyzed 2005-2015 data and found that vehicle registrations declined by 3% on average as a result of TNCs entering the transportation market.

We analyze TNC and transit data collected in Chicago at monthly time resolution, with community areas as the spatial units of analysis. Data on non-overlapping areal units have been studied in statistics, agriculture, and epidemiology (Wall (2004), Brewer and Nolan (2007), Besag and Higdon (1999), Lesage (1997)).

Data

Transportation Network Companies (TNC)

The TNC data provided by the City of Chicago contain the hourly number of trips between each pair of origin and destination community areas. We first summarized this table to daily trips; the resulting data set has 794179 records and ranges from 2018-11-01 to 2020-02-29. Each record has an origin, a destination, a date, and averages of the trip attributes.

tripdate    pickupca  dropoffca  ntrips  cost      ttime
2018-11-01  1         5          6       13.08333  1.333333
2018-11-01  1         7          12      16.16667  79.750000
2018-11-01  1         8          179     18.24453  60.082353
2018-11-01  1         10         2       20.00000  1.000000
2018-11-01  1         11         2       13.75000  1.500000
2018-11-01  1         12         1       20.00000  1.000000

We then aggregated TNC trips by month and drop-off location, so the resulting table has the daily average number of trips for each community area (CA) and each month of the observed period.

ca  my          tnc
1   2018-11-01  567.6364
1   2018-12-01  609.3333
1   2019-01-01  453.8696
1   2019-02-01  636.5238
1   2019-03-01  701.9524
1   2019-04-01  648.4762
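
The two aggregation steps can be sketched in base R as follows. This is a minimal illustration on a made-up four-row table, not the code used for the analysis: the column names follow the tables above, the toy values are invented, and the choice to average only over days that have at least one recorded trip is an assumption.

```r
# Toy daily origin-destination table with the columns shown above
# (values are made up for illustration).
daily <- data.frame(
  tripdate  = as.Date(c("2018-11-01", "2018-11-01", "2018-11-02", "2018-12-01")),
  pickupca  = c(1, 2, 1, 1),
  dropoffca = c(5, 5, 5, 5),
  ntrips    = c(6, 4, 20, 8)
)

# Label each record with the first day of its month (stored as text).
daily$my <- format(daily$tripdate, "%Y-%m-01")

# Step 1: total trips per drop-off area and day, summing over origins.
by_day <- aggregate(ntrips ~ dropoffca + tripdate + my, data = daily, FUN = sum)

# Step 2: average the daily totals within each area and month.
# Assumption: days without any recorded trip do not enter the average.
monthly <- aggregate(ntrips ~ dropoffca + my, data = by_day, FUN = mean)
names(monthly) <- c("ca", "my", "tnc")

print(monthly)
```

Grouping by pickupca instead of dropoffca in both steps would give the corresponding averages by origin.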

The average number of daily trips across the region is \(1.6082857 \times 10^{5}\), and the following bar plot shows daily averages for each month of the observed period.

The map below shows daily averages (on the log scale) across the entire observed period for each of the community areas

L-System

The average number of daily trips across the region is \(4.6336576 \times 10^{5}\), and the daily plot is given below.

# Bar plot of the values in tmp$ct, labeled by tmp$my
barplot(height = tmp$ct, names.arg = tmp$my)

The map below shows daily averages (on the log scale) across the entire observed period for each of the community areas

Bus

The average number of daily trips across the region is \(7.0193534 \times 10^{5}\), and the daily plot is given below.

# Bar plot of the values in tmp$ct, labeled by tmp$my
barplot(height = tmp$ct, names.arg = tmp$my)

The map below shows daily averages (on the log scale) across the entire observed period for each of the community areas

Community Area Analysis

We use a model proposed by Bernardinelli et al. (1995), which represents the spatio-temporal pattern in the mean response with spatially varying linear time trends. We assume the data is Gaussian. The model estimates autocorrelated linear time trends for each community area (areal unit). Thus it is appropriate if the goal of the analysis is to estimate which areas are exhibiting increasing or decreasing (linear) trends in the response over time. The full model specification is given below.

We assume a set of \(K\) non-overlapping areal units \(S_1,\ldots,S_K\), with data recorded for each areal unit at \(N\) consecutive time steps \(t = 1,\ldots,N\). The general hierarchical model is \[ \begin{aligned} Y_{kt} &\sim f(y_{kt} \mid \mu_{kt},\nu^2)\\ g(\mu_{kt}) &= x^T_{kt}\beta+O_{kt} + \psi_{kt}\\ \beta &\sim N(\mu_{\beta},\Sigma_{\beta}) \end{aligned} \] Here \(x_{kt}\) is the vector of covariates, \(\beta\) the vector of regression coefficients, \(O_{kt}\) a known offset, and \(\psi_{kt}\) a latent spatio-temporal component. Given that we assume Normal (continuous) data, we take \(f\) to be the Normal density and the link function \(g\) to be the identity.

The spatio-temporal correlations are modeled using \[ \psi_{kt} = \phi_k + (\alpha + \delta_k)\left(\dfrac{t - \bar t}{N}\right) \] Thus, the temporal pattern is modeled by a local trend \(\delta_k\) and the spatial correlations are modeled via \(\phi_k\). Both are random effects that are conditionally normal given their neighbors, where \(W = (w_{kj})\) is the neighborhood matrix of the areal units: \[ \begin{aligned} \phi_k \mid \phi_{-k}, W &\sim N\left(\dfrac{\rho_{int}\sum_{j=1}^K w_{kj}\phi_j}{\rho_{int}\sum_{j=1}^K w_{kj} + 1 - \rho_{int}}, \dfrac{\tau^2_{int}}{\rho_{int}\sum_{j=1}^K w_{kj} + 1 - \rho_{int}}\right)\\ \delta_k \mid \delta_{-k}, W &\sim N\left(\dfrac{\rho_{slo}\sum_{j=1}^K w_{kj}\delta_j}{\rho_{slo}\sum_{j=1}^K w_{kj} + 1 - \rho_{slo}}, \dfrac{\tau^2_{slo}}{\rho_{slo}\sum_{j=1}^K w_{kj} + 1 - \rho_{slo}}\right)\\ \tau^2_{int},\tau^2_{slo} &\sim IG(a,b)\\ \rho_{int},\rho_{slo} &\sim Uniform(0,1)\\ \alpha &\sim N(\mu_{\alpha},\sigma^2_{\alpha}) \end{aligned} \] Each community area has its own linear trend with intercept \(\phi_k\) and slope \(\alpha + \delta_k\). Here \(\rho_{int},\rho_{slo}\) are spatial dependence parameters. A value of one corresponds to strong spatial smoothness and is equivalent to the intrinsic CAR prior proposed by Besag et al. (1991), while a value of zero corresponds to independence (for example, if \(\rho_{slo} = 0\) then \(\delta_k \sim N(0, \tau^2_{slo})\)).
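
The role of the spatial dependence parameter can be checked numerically. The sketch below is a minimal base R illustration, assuming the conditional mean and variance take the form \(\rho\sum_j w_{kj}\phi_j / (\rho\sum_j w_{kj} + 1 - \rho)\) and \(\tau^2 / (\rho\sum_j w_{kj} + 1 - \rho)\). The helper car_conditional, the four-area neighborhood matrix, and the values of phi are hypothetical and serve only as an example.

```r
# Conditional mean and variance of phi_k given the other random effects
# (illustrative helper, not part of any package).
car_conditional <- function(k, phi, W, rho, tau2) {
  denom <- rho * sum(W[k, ]) + 1 - rho
  list(mean = rho * sum(W[k, ] * phi) / denom,
       var  = tau2 / denom)
}

# Toy neighborhood structure: four areas on a line, 1-2-3-4.
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

phi <- c(0.5, -0.2, 0.1, 0.4)

# rho = 0: neighbors are ignored, so phi_2 has mean 0 and variance tau2.
indep <- car_conditional(2, phi, W, rho = 0, tau2 = 2)

# rho = 1: the mean is the average over the neighbors of area 2 and the
# variance is tau2 divided by the number of neighbors (intrinsic CAR).
icar <- car_conditional(2, phi, W, rho = 1, tau2 = 2)

print(unlist(indep))
print(unlist(icar))
```

Intermediate values of rho shrink the conditional mean toward zero and interpolate between these two limiting cases.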

Analysis Results

The model below takes \(y\) to be TNC counts (Poisson), with average daily bus and rail counts on the log scale as predictors.

             Median   2.5%     97.5%
(Intercept)  0.8069   0.1768   1.4466
log(bus)     0.2332   0.1837   0.2796
log(rail)    0.3932   0.3489   0.4402
alpha        -0.0847  -0.1069  -0.0605
tau2.int     3.3423   2.2283   5.3536
tau2.slo     0.0406   0.0237   0.0712
rho.int      0.7319   0.3979   0.9496
rho.slo      0.4749   0.1174   0.8667

     var1        var2       var3       var4        var5       var6        var7
5%   -0.0177575  0.0351669  -0.042837  -0.0259428  0.0135247  -0.0230802  0.0162286
95%  0.1139849   0.1142256  0.064057   0.0664844   0.0714912  0.0321995   0.0648056
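
One way to read the log(bus) and log(rail) coefficients is as elasticities. The sketch below is a rough interpretation aid, assuming the Poisson model uses a log link, so that multiplying a predictor by a factor multiplies the expected TNC count by that factor raised to the power of the coefficient. The helper pct_effect and the 10% change are illustrative choices, and the calculation uses only the posterior medians, ignoring posterior uncertainty.

```r
# Posterior medians reported for the two transit predictors.
beta <- c(bus = 0.2332, rail = 0.3932)

# Under a log link with log-transformed predictors, multiplying a predictor
# by (1 + change) multiplies the expected count by (1 + change)^beta.
pct_effect <- function(beta, change = 0.10) {
  100 * ((1 + change)^beta - 1)
}

# Percent difference in expected TNC trips associated with 10% higher
# bus or rail ridership, holding the other terms fixed.
effects <- pct_effect(beta)
print(round(effects, 1))
```

Under these assumptions, 10% higher bus or rail ridership is associated with roughly 2% and 4% more TNC trips, respectively. This is an association within the fitted model, not a causal effect.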

We can also look at the medians on the map.

Finally, we can look at the random spatial effects \(\phi_k\), which represent spatial correlation not captured by the main effects.

We can see that the spatial effect is stronger in the central and northern parts of the city.

Origin-Destination Flows

Another hypothesis considered in this analysis is that a temporal analysis of flows between community areas will reveal the effects of specific areas. We modeled the flows using a gravity model, a class of log-linear regression that has previously been shown to be effective in modeling traffic flows (Chen, Banks, and West (2019)).

Our gravity model is based on the class of Poisson log-linear models of area-to-area flows. Given community areas \(1,\ldots,n\), we model \(y_{ij}\), the number of TNC trips from area \(i\) to area \(j\), as conditionally independent Poisson variables with mean \(t_{ij}\) \[y_{ij}\sim Poisson(t_{ij}).\]

The means \(t_{ij}\) depend on the characteristics of the origin and destination zones and on the characteristics of the flow. In the log-Poisson model we model the logarithm of the mean \[\theta_{ij} = \log t_{ij}.\]

In our particular version of the gravity model, we assume \[\theta_{ij} = m+\log a_i + \log b_{j} + \log f_{ij}\]

The characteristics of the areas as origins and destinations are incorporated into the parameters \(a_i\) and \(b_j\) respectively, and the interaction term \(f_{ij}\) represents additional factors arising from network characteristics. Specifically, we consider \[\log f_{ij} = g^Tx_{ij} + \log h_{ij}.\] Here \(x_{ij}\) are known predictors that characterize the flow on the network. Specifically, we use the cost of TNC travel and the cost of transit travel between areas \(i\) and \(j\), and \(h_{ij}\) represents a positive random interaction effect. The \(h_{ij}\) term can be seen as a way to allow additional variation in the Poisson model when the \(h_{ij}\) are assumed to be log-normal or gamma distributed with a mean of 1. With gamma-distributed \(h_{ij}\) and no regressors, the resulting distribution of \(y_{ij}\) is Negative Binomial, which is often used to model transportation flows.
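
The extra variation introduced by \(h_{ij}\) can be illustrated with a small simulation. This is a minimal base R sketch on synthetic data, assuming \(\log f_{ij} = g\,x_{ij} + \log h_{ij}\) with a single cost predictor and gamma-distributed \(h_{ij}\) of mean 1. All parameter values, the number of areas, and the variable names are hypothetical, and the plain Poisson fit is used only to expose the overdispersion; it is not the estimation procedure of this report.

```r
set.seed(1)

n  <- 10                                  # number of areas (toy value)
od <- expand.grid(i = 1:n, j = 1:n)       # all origin-destination pairs

# Hypothetical "true" parameters used only to generate toy data.
m <- 3                                    # overall level
a <- runif(n, 0.5, 2)                     # origin effects a_i
b <- runif(n, 0.5, 2)                     # destination effects b_j
g <- -0.2                                 # effect of travel cost
od$cost <- runif(nrow(od), 1, 10)         # stand-in for the predictor x_ij

# Gamma interaction effects h_ij with mean 1 (shape equal to rate).
h <- rgamma(nrow(od), shape = 2, rate = 2)

# Log-mean of the gravity model and simulated Poisson flows.
theta <- m + log(a[od$i]) + log(b[od$j]) + g * od$cost + log(h)
od$y  <- rpois(nrow(od), lambda = exp(theta))

# Plain Poisson log-linear fit that ignores h_ij.
fit <- glm(y ~ factor(i) + factor(j) + cost, family = poisson(), data = od)

# Pearson dispersion: near 1 for Poisson data, larger when h_ij adds variation.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
cost_coef  <- unname(coef(fit)["cost"])

print(c(dispersion = dispersion, cost = cost_coef))
```

A dispersion well above 1 indicates that the simulated counts vary more than a Poisson model with the same mean would allow, which is the behavior the random interaction effect is meant to capture.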

The figure below shows a heat plot of the TNC flows

We used the observed TNC flows (averages are shown in the figure above) as our observed data set and the POLARIS-estimated area-to-area generalized travel cost as our input \(x_{ij}\). The heat map for the generalized travel cost is shown below

Below we analyze the random effect of a destination on the number of TNC trips (on the log scale), while controlling for generalized cost and TNC. The bar plot of the number of trips for each destination shows that the variation is large

Thus, we expect the random effect to be of non-trivial size for each of the destinations.

Summary of Findings

Bibliography

Bernardinelli, L., D. Clayton, C. Pascutto, C. Montomoli, M. Ghislandi, and M. Songini. 1995. “Bayesian Analysis of Space-Time Variation in Disease Risk.” Statistics in Medicine 14 (21-22): 2433–43. https://doi.org/10.1002/sim.4780142112.
Besag, J., and D. Higdon. 1999. “Bayesian Analysis of Agricultural Field Experiments.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61 (4): 691–746. https://doi.org/10.1111/1467-9868.00201.
Brewer, Mark J., and Andrew J. Nolan. 2007. “Variable Smoothing in Bayesian Intrinsic Autoregressions.” Environmetrics 18 (8): 841–57. https://doi.org/10.1002/env.844.
Chen, Xi, David Banks, and Mike West. 2019. “Bayesian Dynamic Modeling and Monitoring of Network Flows.” Network Science 7 (3): 292–318. https://doi.org/10.1017/nws.2019.10.
Chen, Xi, Kaoru Irie, David Banks, Robert Haslinger, Jewell Thomas, and Mike West. 2018. “Scalable Bayesian Modeling, Monitoring, and Analysis of Dynamic Network Flow Data.” Journal of the American Statistical Association 113 (522): 519–33. https://doi.org/10.1080/01621459.2017.1345742.
Lesage, James P. 1997. “Bayesian Estimation of Spatial Autoregressive Models.” International Regional Science Review 20 (1-2): 113–29. https://doi.org/10.1177/016001769702000107.
Wall, Melanie M. 2004. “A Close Look at the Spatial Structure Implied by the CAR and SAR Models.” Journal of Statistical Planning and Inference 121 (2): 311–24. https://doi.org/10.1016/S0378-3758(03)00111-3.
Ward, Jacob W., Jeremy J. Michalek, Inês L. Azevedo, Constantine Samaras, and Pedro Ferreira. 2019. “Effects of On-Demand Ridesourcing on Vehicle Ownership, Fuel Consumption, Vehicle Miles Traveled, and Emissions Per Capita in U.S. States.” Transportation Research Part C: Emerging Technologies 108: 289–301. https://doi.org/10.1016/j.trc.2019.07.026.
Ward, Jacob W., Jeremy J. Michalek, Constantine Samaras, Inês L. Azevedo, Alejandro Henao, Clement Rames, and Tom Wenzel. 2021. “The Impact of Uber and Lyft on Vehicle Ownership, Fuel Economy, and Transit Across U.S. Cities.” iScience 24 (1): 101933. https://doi.org/10.1016/j.isci.2020.101933.
Ward, Jacob W., Jeremy J. Michalek, Inês L. Azevedo, Constantine Samaras, and Pedro Ferreira. 2018. “On-Demand Ridesourcing Has Reduced Per-Capita Vehicle Registrations and Gasoline Use in US States.”
West, Mike. 1994. “Statistical Inference for Gravity Models in Transportation Flow Forecasting.”