In recent years, public sentiment and user experience have become central to evaluating the success and efficiency of services, including public transit systems.
This paper explores factors that correlate with sentiment toward key U.S. public transit systems. By combining sentiment data gathered from Reddit with external tabular data, the study examines the relationship between user perceptions and the operational characteristics of these transit systems. The goal is to identify the areas most strongly associated with sentiment, giving agencies concrete starting points for improving the user experience of public transportation.
This research contributes to the growing field of sentiment analysis and offers practical implications for public transit authorities seeking to improve their services using existing data together with user feedback.
https://smartasset.com/mortgage/best-cities-for-public-transportation
Below, the volume of Reddit threads for each city is illustrated. From 2010 to 2024, there is a noticeable increase in conversations over the years, reflecting a growing interest in public transit topics. However, the volume of discussions varies significantly across cities.
For instance, Atlanta’s thread volume has grown far less than that of Los Angeles, which shows a sharp rise in conversations. Atlanta’s total volume of conversations is also much lower than that of cities like Baltimore or Miami, which have seen higher and more sustained levels of discussion over the years.
These graphs provide essential context for our analysis, highlighting the dynamic and evolving nature of the Reddit data source. They emphasize how public interest and engagement in public transit topics can differ widely between cities, influenced by local transit developments, policies, and other socio-economic factors. Understanding these trends is crucial for interpreting the sentiment and operational data in our study.
The bigram analysis of Reddit threads shows significant interest in improving Atlanta’s public transit infrastructure. Heavy rail discussions focus on urban development, streetcar extensions, express lanes, and rail stations. Bus-related conversations highlight rapid transit, network redesign, and projects like the Clifton Corridor. These discussions reflect the community’s engagement in enhancing urban mobility and connectivity through sustainable development and improved transit options, underscoring the evolving nature of public transit discourse in Atlanta.
| word1 | word2 | n |
|---|---|---|
| urbanize | atlanta | 93 |
| atlanta | streetcar | 47 |
| streetcar | extension | 32 |
| express | lanes | 29 |
| rail | stations | 26 |
| line | rail | 25 |
| word1 | word2 | n |
|---|---|---|
| marta | bus | 170 |
| rapid | transit | 119 |
| santa | marta | 118 |
| network | redesign | 100 |
| clifton | corridor | 92 |
| light | rail | 83 |
The bigram analysis of Baltimore’s Reddit threads shows heavy rail discussions focusing on metro stations like Silver Spring and the Red Line, highlighting station accessibility and development. Bus-related topics emphasize light rail, school buses, and commuter buses, with a notable interest in Montgomery County and electric school buses. This analysis seems to underscore the community’s focus on transit accessibility and sustainable solutions.
| word1 | word2 | n |
|---|---|---|
| metro | station | 143 |
| silver | spring | 141 |
| metro | stations | 138 |
| red | line | 127 |
| college | park | 118 |
| grosvenor | strathmore | 96 |
| word1 | word2 | n |
|---|---|---|
| light | rail | 82 |
| school | bus | 82 |
| montgomery | county | 77 |
| commuter | bus | 70 |
| silver | spring | 68 |
| electric | school | 66 |
The bigram analysis of Boston’s Reddit threads reveals distinct focus areas for heavy rail and bus systems. Heavy rail discussions prominently feature the Green Line, Commuter Rail, and Red Line, with attention to slow zones and line extensions. For buses, the emphasis is on the MBTA bus network, its redesign, and specific lines like the Green, Orange, and Red Lines. This seems to highlight the community’s engagement with improving and expanding Boston’s transit infrastructure.
| word1 | word2 | n |
|---|---|---|
| green | line | 196 |
| commuter | rail | 184 |
| red | line | 159 |
| boston | globe | 127 |
| slow | zones | 109 |
| line | extension | 89 |
| word1 | word2 | n |
|---|---|---|
| mbta | bus | 257 |
| network | redesign | 138 |
| mbta | driver | 107 |
| green | line | 89 |
| orange | line | 89 |
| red | line | 68 |
In Chicago, Reddit discussions about the heavy rail system often center on the Red, Blue, Brown, and Green Lines, with mentions of the Cross-Town and issues like unruly behavior. For buses, the focus is on the CTA bus network and drivers, along with media coverage from CBS News and the Chicago Sun-Times. This reflects the community’s interest in both the operational aspects and media representation of Chicago’s transit systems.
| word1 | word2 | n |
|---|---|---|
| red | line | 82 |
| blue | line | 52 |
| brown | line | 43 |
| green | line | 33 |
| cross | town | 28 |
| unruly | behavior | 25 |
| word1 | word2 | n |
|---|---|---|
| cta | bus | 329 |
| cbs | news | 162 |
| cta | driver | 121 |
| news | cta | 117 |
| chicago | sun | 112 |
| sun | times | 112 |
In Cleveland, Reddit discussions about the heavy rail system prominently feature the Red Line, police department, and Waterfront Line, along with occasional mentions of events like the solar eclipse. Bus-related conversations highlight Public Square, Tower City, and the Red Line, as well as notable locations like University Circle and downtown Cleveland. This shows a community engaged in both daily transit operations and significant local events.
| word1 | word2 | n |
|---|---|---|
| red | line | 122 |
| police | department | 59 |
| waterfront | line | 46 |
| solar | eclipse | 34 |
| eclipse | cleveland | 32 |
| cleveland | rta | 31 |
| word1 | word2 | n |
|---|---|---|
| public | square | 281 |
| tower | city | 140 |
| red | line | 103 |
| lighting | ceremony | 95 |
| university | circle | 79 |
| downtown | cleveland | 76 |
In Philadelphia, discussions about the heavy rail system focus on the master plan, SEPTA Lemore, and Center City, with notable mentions of the Regional Rail and Transit Police. Meanwhile, conversations about the bus system highlight topics such as the Bus Revolution, transit system challenges, and network redesigns, indicating a community engaged in both infrastructure planning and addressing transit issues.
| word1 | word2 | n |
|---|---|---|
| master | plan | 88 |
| septa | lemore | 76 |
| center | city | 69 |
| regional | rail | 50 |
| transit | police | 44 |
| frankford | line | 43 |
| word1 | word2 | n |
|---|---|---|
| bus | revolution | 154 |
| mass | shooting | 60 |
| transit | system | 53 |
| network | redesign | 52 |
| public | transit | 46 |
| septa | bus | 45 |
In discussions about Brooklyn’s heavy rail system, key topics include congestion pricing, NYC subway operations, fare evasion, and subway stations. For the bus system, notable topics include the NY Post, Staten Island, articulated doors, and commuter routes. These paired word analyses reflect a strong community focus on both operational issues and policy debates related to public transit in Brooklyn.
| word1 | word2 | n |
|---|---|---|
| congestion | pricing | 140 |
| nyc | subway | 101 |
| fare | evasion | 91 |
| subway | stations | 52 |
| york | city | 33 |
| subway | gates | 32 |
| word1 | word2 | n |
|---|---|---|
| ny | post | 120 |
| staten | island | 70 |
| door | articulated | 64 |
| local | driver | 60 |
| commuter | route | 48 |
| proposes | commuter | 48 |
In Los Angeles, heavy rail discussions center on rolling stock, light rail, and the Del Metro system. In contrast, bus-related conversations frequently address San Diego, estimated costs, and Twitter presentations. This analysis reflects varied but significant discussions about both transit infrastructure and administrative matters in Los Angeles.
| word1 | word2 | n |
|---|---|---|
| de | la | 238 |
| en | la | 171 |
| rolling | stock | 117 |
| light | rail | 111 |
| del | metro | 97 |
| heavy | rail | 91 |
| word1 | word2 | n |
|---|---|---|
| san | diego | 117 |
| estimated | cost | 109 |
| twitter | presentation | 108 |
| pjpgauto | webps | 104 |
| pngauto | webps | 103 |
| light | rail | 99 |
In Miami, heavy rail discussions primarily focus on Miami-Dade, Tri-Rail, and Metrorail stations, with notable mentions of downtown Miami and public transit. Bus-related conversations emphasize Miami-Dade, Miami Beach, South Beach, and various aspects of public transportation. These analyses indicate a strong focus on local transit infrastructure and service areas in Miami.
| word1 | word2 | n |
|---|---|---|
| miami | dade | 219 |
| tri | rail | 124 |
| metrorail | station | 100 |
| downtown | miami | 85 |
| public | transit | 79 |
| miami | beach | 68 |
| word1 | word2 | n |
|---|---|---|
| miami | dade | 181 |
| miami | beach | 153 |
| south | beach | 96 |
| public | transportation | 82 |
| public | transit | 76 |
| rapid | transit | 63 |
In Washington, D.C., heavy rail discussions feature innovative technologies such as augmented reality and smart glasses alongside core topics like the Washington Metro system and station upgrades. For the bus system, the focus is on metro service improvements and administrative aspects such as SmarTrip cards.
| word1 | word2 | n |
|---|---|---|
| washington | metro | 102 |
| metro | police | 52 |
| metro | station | 47 |
| augmented | reality | 44 |
| smart | glasses | 38 |
| red | line | 37 |
| word1 | word2 | n |
|---|---|---|
| metro | service | 156 |
| inauguration | day | 60 |
| pngauto | webps | 56 |
| monthly | pass | 53 |
| smartrip | card | 53 |
| metro | driver | 52 |
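The bigram tables above count how often two words appear next to each other in the thread text. The sketch below shows one way such counts could be produced in base R. It is an illustration only: the function name `count_bigrams`, the sample texts, and the stop-word list are hypothetical, and the tokenization used in the study may differ.

```r
# Count adjacent word pairs (bigrams) across a character vector of texts.
# Pairs containing a stop word are dropped after pairing, so only words
# that were truly adjacent in the original text are counted together.
count_bigrams <- function(texts, stop_words = character(0)) {
  empty <- data.frame(word1 = character(0), word2 = character(0), n = integer(0))
  pairs <- lapply(texts, function(txt) {
    # Lowercase and replace everything except letters, digits, and spaces.
    clean <- gsub("[^a-z0-9 ]", " ", tolower(txt))
    words <- strsplit(trimws(clean), " +")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < 2) return(NULL)
    data.frame(word1 = head(words, -1), word2 = tail(words, -1),
               stringsAsFactors = FALSE)
  })
  pairs <- do.call(rbind, pairs)
  if (is.null(pairs)) return(empty)
  keep <- !(pairs$word1 %in% stop_words) & !(pairs$word2 %in% stop_words)
  pairs <- pairs[keep, , drop = FALSE]
  if (nrow(pairs) == 0) return(empty)
  pairs$n <- 1L
  counts <- aggregate(n ~ word1 + word2, data = pairs, FUN = sum)
  counts <- counts[order(-counts$n, counts$word1, counts$word2), ]
  rownames(counts) <- NULL
  counts
}

# Hypothetical example texts, not taken from the study data.
example_texts <- c("The Red Line is slow today.",
                   "Red line delays again",
                   "Take the red line")
bigrams <- count_bigrams(example_texts, stop_words = c("the", "is"))
print(head(bigrams))
```

Filtering stop words after pairing, rather than before, avoids creating artificial bigrams from words that were separated by a removed word.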
The BERT method was selected for its demonstrated accuracy and flexibility, as emphasized in the literature review. Unlike traditional NLP approaches like Word2Vec, BERT, a transformer model, captures the contextual meaning of words by considering the entire sentence at once (Bello et al., 2023). Research shows that BERT and other machine learning models outperform traditional lexicon-based methods in terms of accuracy and adaptability across various domains (Birjali et al., 2021; Devika et al., 2016). Consequently, this project prioritized the use of the BERT model for sentiment analysis.
After analyzing the sentiment of each of the 200-250 Reddit threads per agency and mode, the scores were averaged for each agency/mode, as shown below. Bus systems (MB) tend to receive higher sentiment scores (indicating more positive sentiment) than heavy rail systems (HR): in seven of the ten cities, the MB average exceeds the HR average. Additionally, Brooklyn and Philadelphia exhibit higher sentiment scores than the other cities in both modes.
| City | Mode | BERT |
|---|---|---|
| Brooklyn | MB | 0.4739550 |
| Philadelphia | MB | 0.4714314 |
| Brooklyn | HR | 0.4629644 |
| Philadelphia | HR | 0.4509231 |
| Cleveland | MB | 0.4490624 |
| Chicago | MB | 0.4489357 |
| Boston | MB | 0.4481430 |
| Washington | HR | 0.4476094 |
| Baltimore | MB | 0.4469073 |
| Washington | MB | 0.4401778 |
| Miami | MB | 0.4311090 |
| Los Angeles | HR | 0.4307615 |
| Atlanta | HR | 0.4300739 |
| Boston | HR | 0.4253136 |
| Baltimore | HR | 0.4225642 |
| Los Angeles | MB | 0.4194857 |
| Cleveland | HR | 0.4176193 |
| Atlanta | MB | 0.4165565 |
| Chicago | HR | 0.4119503 |
| Miami | HR | 0.4089177 |
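The averaging step behind this table can be sketched as follows. This is a minimal illustration in base R, assuming a data frame with one row per thread and a numeric per-thread score column; the data frame `thread_scores`, its column `bert_score`, and the values are hypothetical.

```r
# Hypothetical per-thread scores; the study used 200-250 threads per city/mode.
thread_scores <- data.frame(
  City = c("Brooklyn", "Brooklyn", "Brooklyn", "Miami", "Miami"),
  Mode = c("MB", "MB", "HR", "HR", "HR"),
  bert_score = c(0.50, 0.44, 0.45, 0.35, 0.45),
  stringsAsFactors = FALSE
)

# Average the per-thread scores within each city/mode combination.
avg_scores <- aggregate(bert_score ~ City + Mode, data = thread_scores, FUN = mean)
names(avg_scores)[3] <- "avg_bert_score"

# Sort from most to least positive, as in the table above.
avg_scores <- avg_scores[order(-avg_scores$avg_bert_score), ]
rownames(avg_scores) <- NULL
print(avg_scores)
```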
The BERT score distribution among the 200-250 threads per agency and mode is shown below. Most cities and modes tend to have scores primarily distributed between 0.25 and 0.5. However, some, like Boston MB and Brooklyn MB, exhibit a wider distribution of scores.
To demonstrate the reliability of the BERT sentiment analysis, we present example threads with scores of 1 star and 5 stars. The agency and mode were selected at random.
Examining the sample texts and sentiment scores, it appears that the BERT method used in our analysis effectively captures not only the sentiments of individual words but also negations.
| City_Mode | Thread text | BERT |
|---|---|---|
| Brooklyn_HR | Per Cuomo’s press conference. Very recent development so no news article yet. Definitely gives credence to the theory that as Cuomo comes under more and more scrutiny, he will reopen more and more sectors. ETA: source now that a news article has been published | 1 star |
| Brooklyn_HR | For those who still use a Metrocard, but also have a digital wallet, be aware of any unauthorized charges from the MTA. I was charged $2.75 twice, even though I don’t ever use my phone/the OMNY “tap to go” to pay for the subway. My phone is also always secure in a bag/pocket when paying with a Metrocard, which makes it even more bizarre. | 1 star |
| Brooklyn_HR | Hello! Im not sure this fits here but I thought of all people you might like it. I designed and made this clock that tells me when the next three trains are coming to my station to take me to work. It updates every 30 seconds with MTA data which is surprisingly very accurate. The X, ! and E mean no more trains, transit alert and an error with the clock itself. | 5 stars |
Agency : The transit agency’s legal name.
City : The city in which the agency is headquartered.
Mode : A system for carrying transit passengers described by specific right-of-way (ROW), technology and operational features.
Primary.UZA.Population : The population of the urbanized area primarily served by the agency.
Service.Area.Sq.Miles : A measure of transit service in terms of area coverage (square miles).
Total.Mechanical.Failures : The sum of major and other mechanical failures.
Maintenance_Facilities : Total of Under 200 Vehicles, 200 to 300 Vehicles, Over 300 Vehicles, and Heavy Maintenance Facilities.
Total_Miles : The sum of total track miles (the sum of the preceding track-mile columns) and total roadway miles.
Total.Stations : The total number of stations.
Average Passenger Trip Length (APTL) : The average distance ridden by each passenger in a single trip, computed as passenger miles traveled (PMT) divided by unlinked passenger trips (UPT). May be determined by sampling, or calculated based on actual data.
Passengers per Vehicle Revenue Hour : The average number of passengers to board a vehicle/passenger car in one hour of service. For trains, this applies to passengers per hour on a single train car.
Total_Hours : Actual Vehicle/Passenger Car Hours + Train Revenue Hours, that is, the hours that vehicles/passenger cars travel while in revenue service plus deadhead hours, plus the hours that trains travel while in revenue service.
Vehicle Length : The total length of the transit vehicles, measured in feet.
Seating Capacity : The number of seats that are actually installed in the vehicle, not including the driver, except for Vanpool modes.
Standing Capacity : The number of standing passengers that can be accommodated aboard the revenue vehicle during a normal full load (non-crush) in accordance with established loading policy or, in absence of a policy, the manufacturer’s rated standing capacity figures.
Total Employee Count : The number of Full Time or Part Time employees of the transit agency at the end of the fiscal year.
Vehicles Operated at Maximum Service (VOMS) : The number of revenue vehicles operated to meet the annual maximum service requirement. This is the revenue vehicle count during the peak season of the year, on the week and day that maximum service is provided.
Percent Agency Capital Responsibility : The percentage of capital responsibility borne by the agency. Transit agencies have direct capital responsibility for assets that they own, jointly own with another entity, or are responsible for replacing, overhauling, refurbishing, or conducting major repairs on, or for which the cost of those activities is itemized as a capital line item in the agency’s budget.
Total Modal VRM : Agency allocation of vehicle revenue miles to the respective UZA (see Vehicle Revenue Miles).
Total Modal UPT : Agency allocation of unlinked passenger trips to the respective UZA (see Unlinked Passenger Trips).
%UnderThreshold : (Units Under Performance Threshold (2022) / Total Units (2022)) * 100. The numerator is the total units of a particular asset that have met or exceeded their useful life benchmark (vehicle assets), score below 3.0 on the TERM Scale (facilities), or are operating under a performance restriction (rail); the denominator is the total units of that asset.
avg_bert_score : The average BERT score was obtained by running the Reddit threads through the BERT algorithm then aggregating the scores by each transit agency and mode.
Capacity : The Capacity Score was determined by calculating the per capita seat-miles, which involves the following formula: Per capita capacity = (Total daily seats on a transit line × Route-miles of transit line in the zone) / (Total resident population + Employment). This score was computed as part of the MS-GIST Capstone Project using ArcGIS Pro.
Frequency : The Frequency Score represents the total daily number of transit services traversing the zone. It was determined using the ‘Calculate Transit Service Frequency’ tool, which accounted for a 24-hour window on a Tuesday. This score was computed as part of the MS-GIST Capstone Project using ArcGIS Pro.
Coverage : The Route Coverage Score was calculated by dividing the number of stops within a Traffic Analysis Zone (TAZ) by its land area in square miles. The score for each tract was derived using the ‘Summarize Within’ tool available in ArcGIS Pro. This score was computed as part of the MS-GIST Capstone Project.
LITA : The Local Index of Transit Availability (LITA), developed by Rood (1998), provides a comprehensive measure of transit service intensity or accessibility within a specific area. It integrates three essential aspects of transit service (route coverage, frequency, and capacity) to offer a detailed understanding of how effectively public transit meets the needs of the population.
The LITA Score was computed by averaging three standardized metrics: Frequency, Coverage, and Capacity, then adding 5.5. This approach guarantees a positive score and provides a holistic measure of how route coverage, frequency, and capacity contribute to transit accessibility. This score was computed for each agency and mode as part of the MS-GIST Capstone Project.
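The computation described above can be sketched as follows. This is a minimal illustration, assuming that "standardized" means z-scores (mean 0, standard deviation 1) computed across all agency/mode rows; the function name `lita_score` and the sample values are hypothetical.

```r
# LITA-style score: average of three z-scored components, shifted by 5.5.
lita_score <- function(frequency, coverage, capacity, offset = 5.5) {
  z <- cbind(
    frequency = as.numeric(scale(frequency)),  # z-score of each component
    coverage  = as.numeric(scale(coverage)),
    capacity  = as.numeric(scale(capacity))
  )
  rowMeans(z) + offset
}

# Hypothetical component scores for four agency/mode rows.
lita <- lita_score(
  frequency = c(120, 300, 80, 210),
  coverage  = c(4.2, 9.8, 2.1, 6.5),
  capacity  = c(0.8, 2.4, 0.5, 1.6)
)
print(round(lita, 3))
```

Because each z-scored component has mean zero, the resulting scores are centered on the offset of 5.5.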
Collinearity is a linear association between two predictors. Multicollinearity is a situation where two or more predictors are highly linearly related. In general, an absolute correlation coefficient above 0.7 between predictors indicates the presence of multicollinearity.
(M, R. (2019, July 15). Correlation and collinearity - how they can make or break a model. Medium. https://blog.clairvoyantsoft.com/correlation-and-collinearity-how-they-can-make-or-break-a-model-9135fbe6936a)
After removing variables from the dataset that are contextually
correlated with each other — such as
Seating Capacity/Standing Capacity,
Coverage score, and LITA score — and iterating
through the other variables that showed high correlations, the project
identified a subset of variables where the maximum correlation remained
below 0.7.
kable(head(top_correlated))
| Var1 | Var2 | Correlation |
|---|---|---|
| Total Employee Count | Total.Mechanical.Failures | 0.6668643 |
| Total.Mechanical.Failures | Total Employee Count | 0.6668643 |
| Maintenance_Facilities | Total.Mechanical.Failures | 0.6321443 |
| Total.Mechanical.Failures | Maintenance_Facilities | 0.6321443 |
| Vehicle Length | Average Passenger Trip Length (APTL) | 0.5835076 |
| Average Passenger Trip Length (APTL) | Vehicle Length | 0.5835076 |
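A table of this shape, with each pair listed in both orders, can be produced by flattening a correlation matrix. The sketch below is an illustration under stated assumptions: the function name `top_correlations` is hypothetical, and the built-in `mtcars` dataset stands in for the transit data, which is not reproduced here.

```r
# List all pairwise correlations between numeric columns, strongest first.
top_correlations <- function(df, threshold = 0.7) {
  cm <- cor(df, use = "pairwise.complete.obs")
  diag(cm) <- NA  # ignore each variable's correlation with itself
  pairs <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
  names(pairs) <- c("Var1", "Var2", "Correlation")
  pairs <- pairs[!is.na(pairs$Correlation), ]
  pairs <- pairs[order(-abs(pairs$Correlation)), ]
  rownames(pairs) <- NULL
  # Flag pairs that exceed the multicollinearity rule of thumb.
  pairs$AboveThreshold <- abs(pairs$Correlation) > threshold
  pairs
}

# Stand-in data: four numeric columns from the built-in mtcars dataset.
top_correlated <- top_correlations(mtcars[, c("mpg", "disp", "hp", "wt")])
print(head(top_correlated))
```

Variables can then be removed one at a time until no remaining pair is flagged.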
The variance inflation factor (VIF) provides a formal way to detect multicollinearity. A VIF of 5 or 10 and above (depending on the business problem) indicates a multicollinearity problem.
(M, R. (2019, July 15). Correlation and collinearity - how they can make or break a model. Medium. https://blog.clairvoyantsoft.com/correlation-and-collinearity-how-they-can-make-or-break-a-model-9135fbe6936a)
After iterating through the variables with high VIF values, the project identified a subset of variables where the maximum VIF value remained below 10.
lm(avg_bert_score ~ . - City - Mode - State - Agency - Service.Area.Sq.Miles - Primary..UZA.Population - Total_Miles - `Total Stations` - `Seating Capacity` - `Standing Capacity` - Norm..Coverage - Norm..Frequency - Norm..Capacity - LITA.normalized -Capacity-Frequency-Coverage -`ModeTOS Vehicles Operated in Maximum Service (2022)`- `Total Modal UPT (2022)` - `Total Modal VRM (2022)`,
data = DATA)
| Variable | VIF |
|---|---|
| Total_Hours | 59.629661 |
| Total Employee Count | 56.983021 |
| Vehicle Length | 10.659583 |
| Passengers per Vehicle Revenue Hour | 5.684679 |
| %UnderThreshold | 4.244305 |
| Average Passenger Trip Length (APTL) | 3.772985 |
| Maintenance_Facilities | 3.552925 |
| Total.Mechanical.Failures | 3.227442 |
| Average Percent Agency Capital Responsibility (2022) | 2.548931 |
| LITA | 1.862205 |
lm(avg_bert_score ~ . - City - Mode - State - Agency - Service.Area.Sq.Miles - Primary..UZA.Population - Total_Miles - `Total Stations` - `Seating Capacity` - `Standing Capacity` - Norm..Coverage - Norm..Frequency - Norm..Capacity - LITA.normalized -Capacity-Frequency-Coverage -`ModeTOS Vehicles Operated in Maximum Service (2022)`- `Total Modal UPT (2022)` - `Total Modal VRM (2022)`- Total_Hours,
data = DATA)
| Variable | VIF |
|---|---|
| Vehicle Length | 9.003063 |
| Passengers per Vehicle Revenue Hour | 4.502329 |
| Total Employee Count | 3.938892 |
| %UnderThreshold | 3.530695 |
| Maintenance_Facilities | 3.489145 |
| Average Passenger Trip Length (APTL) | 3.400331 |
| Total.Mechanical.Failures | 3.167238 |
| Average Percent Agency Capital Responsibility (2022) | 2.151768 |
| LITA | 1.860315 |
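The iterative VIF screening can be sketched in base R. For numeric predictors, the VIF of a variable equals 1 / (1 - R²), where R² comes from regressing that variable on the other predictors. The function names `vif_manual` and `drop_high_vif` are hypothetical, the simulated data is a stand-in for the transit variables, and the code assumes syntactic column names; the study itself may have used a packaged VIF function.

```r
# VIF for each column: regress it on all other columns, then 1 / (1 - R^2).
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Repeatedly drop the predictor with the largest VIF until all are below limit.
drop_high_vif <- function(df, limit = 10) {
  repeat {
    v <- vif_manual(df)
    if (max(v) < limit || ncol(df) <= 2) return(list(data = df, vif = v))
    df <- df[, names(v)[-which.max(v)], drop = FALSE]
  }
}

# Simulated predictors: x2 is nearly a copy of x1, x3 is unrelated.
set.seed(1)
x1 <- rnorm(50)
predictors <- data.frame(x1 = x1,
                         x2 = x1 + rnorm(50, sd = 0.05),
                         x3 = rnorm(50))
initial_vif <- vif_manual(predictors)
screened <- drop_high_vif(predictors, limit = 10)
print(round(initial_vif, 2))
print(round(screened$vif, 2))
```

Dropping one variable at a time matters because removing a single collinear predictor can lower the VIF of several others, as happens with Total Employee Count once Total_Hours is removed.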
Once a subset of variables with a correlation below 0.7 and a maximum VIF value below 10 was identified, this subset was used to fit the regression model with the BERT score as the dependent variable.
The adjusted R-squared value is 0.626 (R-squared 0.803), indicating that the model explains about 62.6% of the variance in the dependent variable after adjusting for the number of predictors. The variables Total.Mechanical.Failures and Total Employee Count are statistically significant at the 5% level in this model.
model <- lm(avg_bert_score ~ . - City - Mode - State - Agency - Service.Area.Sq.Miles - Primary..UZA.Population - Total_Miles - `Total Stations` - `Seating Capacity` - `Standing Capacity` - Norm..Coverage - Norm..Frequency - Norm..Capacity - LITA.normalized -Capacity-Frequency-Coverage -`ModeTOS Vehicles Operated in Maximum Service (2022)`- `Total Modal UPT (2022)` - `Total Modal VRM (2022)`- Total_Hours,
data = DATA)
save(model, file = "/Users/helenalindsay/Documents/Spring_24/model.RData")
| Variable | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.4561786 | 0.0226049 | 20.1805136 | 0.0000* |
| Total.Mechanical.Failures | -0.0000017 | 0.0000007 | -2.4223331 | 0.0359* |
| Maintenance_Facilities | 0.0001825 | 0.0006503 | 0.2806929 | 0.7847 |
| Average Passenger Trip Length (APTL) | -0.0043969 | 0.0031196 | -1.4094705 | 0.1890 |
| Passengers per Vehicle Revenue Hour | -0.0000083 | 0.0002747 | -0.0302324 | 0.9765 |
| Vehicle Length | -0.0000471 | 0.0004880 | -0.0965867 | 0.9250 |
| Total Employee Count | 0.0000019 | 0.0000008 | 2.3468416 | 0.0409* |
| Average Percent Agency Capital Responsibility (2022) | -0.0000191 | 0.0000096 | -2.0043494 | 0.0729 |
| LITA | 0.0024902 | 0.0016112 | 1.5455817 | 0.1532 |
| %UnderThreshold | -0.0002114 | 0.0004025 | -0.5251330 | 0.6109 |
| Metric | Value |
|---|---|
| R-squared | 0.8033871 |
| Adjusted R-squared | 0.6264356 |
| F-statistic | 4.5401525 |
| F-statistic p-value | 0.0134888 |
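The adjusted R-squared and F-statistic in this table follow directly from the R-squared value, the number of observations, and the number of predictors. The sketch below reproduces them, assuming n = 20 observations (one per city/mode combination in the average sentiment table) and p = 9 predictors (the non-intercept rows of the coefficient table). The helper names are hypothetical.

```r
# Adjusted R-squared penalizes R-squared for the number of predictors p.
adjusted_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic for a linear model with p predictors and n observations.
f_statistic <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

# Assumed: n = 20 city/mode rows, p = 9 predictors in the full model.
full_adj <- adjusted_r2(0.8033871, n = 20, p = 9)
full_f   <- f_statistic(0.8033871, n = 20, p = 9)
print(c(adjusted_r2 = full_adj, f_statistic = full_f))
```

With only 20 observations and 9 predictors, the gap between R-squared (0.803) and adjusted R-squared (0.626) is large, which is the motivation for trying a smaller model.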
The project also fit a stepwise regression model, which improved the adjusted R-squared value to 0.707. In this model, the variables significant at the 5% level are Total.Mechanical.Failures, Average Passenger Trip Length (APTL), Total Employee Count, and Average Percent Agency Capital Responsibility (2022). The LITA variable, with a p-value of 0.072, would be considered statistically significant at the 10% significance level.
step_model <- step(
lm(avg_bert_score ~ . - City - Mode - State - Agency - Service.Area.Sq.Miles - Primary..UZA.Population - Total_Miles - `Total Stations` - `Seating Capacity` - `Standing Capacity` - Norm..Coverage - Norm..Frequency - Norm..Capacity - LITA.normalized -Capacity-Frequency-Coverage -`ModeTOS Vehicles Operated in Maximum Service (2022)`- `Total Modal UPT (2022)` - `Total Modal VRM (2022)`- Total_Hours,
data = DATA),
direction = "both",
trace = 0
)
save(step_model, file = "/Users/helenalindsay/Documents/Spring_24/step_model.RData")
| Variable | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.4562710 | 0.0158994 | 28.697426 | 0.0000* |
| Total.Mechanical.Failures | -0.0000015 | 0.0000005 | -3.120816 | 0.0081* |
| Average Passenger Trip Length (APTL) | -0.0050351 | 0.0017206 | -2.926331 | 0.0118* |
| Total Employee Count | 0.0000018 | 0.0000005 | 3.423495 | 0.0045* |
| Average Percent Agency Capital Responsibility (2022) | -0.0000181 | 0.0000063 | -2.887716 | 0.0127* |
| LITA | 0.0025107 | 0.0012815 | 1.959174 | 0.0719 |
| %UnderThreshold | -0.0002459 | 0.0001977 | -1.243825 | 0.2355 |
| Metric | Value |
|---|---|
| R-squared | 0.7998535 |
| Adjusted R-squared | 0.7074781 |
| F-statistic | 8.6587354 |
| F-statistic p-value | 0.0006267 |
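To illustrate how `step()` behaves, the sketch below runs the same kind of bidirectional, AIC-based selection on the built-in `mtcars` dataset. The dataset and the chosen predictors are stand-ins and have nothing to do with the transit data.

```r
# Full model on stand-in data: fuel economy explained by five car attributes.
full <- lm(mpg ~ wt + hp + disp + drat + qsec, data = mtcars)

# step() adds or removes one term at a time, keeping a change only when it
# lowers the AIC; trace = 0 suppresses the per-step log.
reduced <- step(full, direction = "both", trace = 0)

# Compare the two fits on model size, AIC, and adjusted R-squared.
comparison <- data.frame(
  model      = c("full", "stepwise"),
  n_terms    = c(length(coef(full)), length(coef(reduced))),
  aic        = c(extractAIC(full)[2], extractAIC(reduced)[2]),
  adj_r2     = c(summary(full)$adj.r.squared, summary(reduced)$adj.r.squared)
)
print(comparison)
print(formula(reduced))
```

Because selection is driven by AIC rather than p-values, a retained term is not necessarily significant at the 5% level, which matches the stepwise table above, where LITA and %UnderThreshold remain in the model.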
The regression model was developed using a subset of variables that maintained a correlation below 0.7 and had a maximum Variance Inflation Factor (VIF) value under 10. With the BERT score as the dependent variable, the model aimed to explain the sentiment towards various public transit modes and agencies based on operational data, including:
Vehicle Length
Passengers per Vehicle Revenue Hour
Total Employee Count
%UnderThreshold
Maintenance_Facilities
Average Passenger Trip Length (APTL)
Total.Mechanical.Failures
Average Percent Agency Capital Responsibility (2022)
LITA
The adjusted R-squared value for this model was 0.626 (R-squared 0.803), indicating that the model explains about 62.6% of the variance in the dependent variable after adjusting for the number of predictors. The variables Total.Mechanical.Failures and Total Employee Count were statistically significant at the 5% level in this model.
A stepwise regression model was also developed in an attempt to increase the adjusted R-squared value and model strength. Using the following variables, the model achieved an improved adjusted R-squared value of 0.707.
Total.Mechanical.Failures
Average Passenger Trip Length (APTL)
Total Employee Count
Average Percent Agency Capital Responsibility (2022)
LITA
%UnderThreshold
In this model, the variables significant at the 5% level were Total.Mechanical.Failures, Average Passenger Trip Length (APTL), Total Employee Count, and Average Percent Agency Capital Responsibility (2022). The LITA variable, with a p-value of 0.072, would be considered statistically significant at the 10% significance level.
The stepwise regression model shows that Total.Mechanical.Failures has a small negative coefficient, indicating that fewer mechanical failures are associated with a more positive public perception of the transit system. Average Passenger Trip Length (APTL) also has a negative coefficient, meaning that shorter passenger trips are associated with more favorable public sentiment. Average Percent Agency Capital Responsibility (2022) is negatively associated with the BERT score as well, suggesting that as an agency’s capital responsibility increases, public sentiment tends to decline. In contrast, Total Employee Count is positively associated with the sentiment score, indicating that agencies with more employees tend to be perceived more positively. Finally, the LITA score, which measures the spatial accessibility, frequency, and capacity of a transit system, has a positive coefficient that is significant only at the 10% significance level (p = 0.072). This suggests that as the LITA score improves, public sentiment towards the transit system becomes more positive.
Based on the findings from the stepwise regression model, it appears that transit agencies could enhance public perception by focusing on several key areas. These areas include minimizing mechanical failures, optimizing passenger trip lengths, managing capital responsibilities, increasing employee numbers, and improving the LITA score. The following suggestions are derived from the results:
Reduce Mechanical Failures:
Preventative Maintenance: It may be beneficial to implement and maintain rigorous preventative maintenance schedules to reduce mechanical failures. Regular vehicle inspections and servicing can help prevent breakdowns and improve reliability.
Modernization of Fleet: Investing in modern, more reliable vehicles and equipment might significantly decrease mechanical issues and enhance service dependability.
Optimize Average Passenger Trip Length:
Network Design: Agencies might consider redesigning routes to minimize the distance passengers need to travel. Implementing more direct routes or express services could help reduce trip lengths.
Transit-Oriented Development: Promoting and supporting transit-oriented development could bring key destinations closer to transit hubs, thereby reducing the need for long commutes.
Manage Capital Responsibilities:
Strategic Partnerships: Collaborating with local, state, and federal agencies to share the burden of capital costs might be effective. Leveraging partnerships and securing grants can help manage financial responsibilities more efficiently.
Cost Management: Adopting efficient budgeting and cost management practices could ensure that capital projects are completed on time and within budget, thereby mitigating negative public perception related to high capital responsibilities.
Increase Employee Count:
Hiring Initiatives: It may be advantageous to invest in hiring more personnel, particularly in customer service and operational roles, to improve service delivery and responsiveness.
Training and Development: Providing comprehensive training and development programs could ensure employees are well-equipped to deliver high-quality service and handle customer inquiries effectively.
Improve LITA Score:
Accessibility Enhancements: Ensuring that transit services are easily accessible to all populations, including those with disabilities, might improve public perception.
Frequency and Capacity: Increasing the frequency and capacity of services could reduce waiting times and overcrowding, leading to a better overall experience for passengers.
Coverage Expansion: Expanding service coverage to underserved areas might ensure that more people have access to reliable public transit options.
By focusing on these areas, transit agencies could potentially enhance the overall quality and reliability of their services, leading to improved public perception and greater satisfaction among riders.