This sentiment analysis was performed using PBOT’s 2019 E-scooter Survey and RStudio.
First, the survey data were parsed into a comma-separated value (CSV) file, then pared down by removing stopwords, punctuation, and numbers from the text to form a corpus of meaningful words.
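The cleaning step can be sketched as follows. The original work was done in R. This is a minimal Python sketch that assumes whitespace tokenization and a small, hypothetical stopword list, where a real analysis would use a complete list. The function names meaningful_words and build_corpus are illustrative only.

```python
import re

# HYPOTHETICAL, abbreviated stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "are", "to", "of", "i", "it",
             "in", "on", "they", "for"}


def meaningful_words(response: str) -> list[str]:
    """Lowercase one response, strip numbers and punctuation, drop stopwords."""
    text = response.lower()
    text = re.sub(r"[0-9]+", " ", text)   # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)  # remove punctuation
    return [w for w in text.split() if w not in STOPWORDS]


def build_corpus(responses: list[str]) -> list[str]:
    """Concatenate the meaningful words from every survey response."""
    corpus: list[str] = []
    for response in responses:
        corpus.extend(meaningful_words(response))
    return corpus


if __name__ == "__main__":
    sample = ["I love the scooters!", "Rode 3 times; they are fun and fast."]
    print(build_corpus(sample))
```

Numbers are removed before punctuation because the word-character class \w also matches digits.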
Meaningful words from survey
Three natural language processing lexicons were used to score the positive-negative sentiment of each word: AFINN, Bing, and Syuzhet. Because the lexicons score words on different scales, R’s sign() function was used to map each method’s scores onto the same scale of -1, 0, and 1 so the methods could be compared. Each word can then be read as negative (-1), neutral (0), or positive (1), so a mean near 1 indicates mostly positive wording and a mean near -1 mostly negative wording. The mean sentiment ranges from 0.2437 to 0.3731 across the three methods, indicating a moderately positive sentiment in the survey responses.
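The normalization idea can be illustrated with a small Python sketch. The three lexicons here are hypothetical toy dictionaries with invented values, not the real AFINN, Bing, or Syuzhet entries. The sketch only shows how applying a sign function puts an integer scale, a label scale, and a fractional scale on the same -1/0/1 footing before averaging.

```python
from statistics import mean

# HYPOTHETICAL toy lexicons standing in for AFINN (integer scores),
# Bing (positive/negative labels) and Syuzhet (fractional scores).
# The values are invented for illustration, not copied from the real lexicons.
TOY_AFINN = {"love": 3, "fun": 4, "dangerous": -2, "fast": 1}
TOY_BING = {"love": "positive", "fun": "positive", "dangerous": "negative"}
TOY_SYUZHET = {"love": 0.75, "fun": 0.5, "dangerous": -0.8, "fast": 0.25}

BING_VALUES = {"positive": 1, "negative": -1}


def sign(x: float) -> int:
    """Mimic R's sign(): 1 for positive, 0 for zero, -1 for negative."""
    return (x > 0) - (x < 0)


def raw_score(word: str, method: str) -> float:
    """Score one word with one toy lexicon; unknown words score 0 (neutral)."""
    if method == "afinn":
        return TOY_AFINN.get(word, 0)
    if method == "bing":
        return BING_VALUES.get(TOY_BING.get(word, ""), 0)
    if method == "syuzhet":
        return TOY_SYUZHET.get(word, 0.0)
    raise ValueError(f"unknown method: {method}")


def mean_signed_sentiment(words: list[str]) -> dict[str, float]:
    """Put every method on the same -1/0/1 scale with sign(), then average."""
    return {
        method: mean(sign(raw_score(w, method)) for w in words)
        for method in ("afinn", "bing", "syuzhet")
    }


if __name__ == "__main__":
    corpus = ["love", "scooters", "fun", "dangerous", "fast"]
    print(mean_signed_sentiment(corpus))
```

With this toy corpus, AFINN and Syuzhet both give a mean of 0.4 and Bing gives 0.2, because the toy Bing lexicon has no entry for "fast". The real lexicons also disagree on coverage, which is one reason the three means in the survey differ.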
Summary statistics of each method
The NRC Word-Emotion Association Lexicon was used to categorize the meaningful words into emotions beyond positive or negative sentiment; the graph below shows each emotion as a percentage of all the meaningful words in the survey responses.
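The percentage calculation can be sketched in Python. The sketch assumes a hypothetical NRC-style mapping in which a word can carry several emotions or none. The entries are invented, not taken from the NRC lexicon, and emotion_percentages is an illustrative name.

```python
from collections import Counter

# HYPOTHETICAL word-to-emotion mapping in the style of the NRC lexicon:
# a word can be tagged with several emotions, or with none.
# These entries are invented for illustration, not taken from NRC.
TOY_NRC = {
    "love": {"joy", "trust"},
    "safe": {"trust"},
    "soon": {"anticipation"},
    "dangerous": {"fear"},
    "fun": {"joy", "anticipation"},
}


def emotion_percentages(words: list[str]) -> dict[str, float]:
    """Share of all meaningful words associated with each emotion, in percent."""
    if not words:
        return {}
    counts = Counter()
    for word in words:
        for emotion in TOY_NRC.get(word, set()):
            counts[emotion] += 1
    return {emotion: 100 * n / len(words) for emotion, n in counts.items()}


if __name__ == "__main__":
    words = ["love", "safe", "soon", "dangerous", "fun", "scooters", "ride", "love"]
    print(emotion_percentages(words))
```

The denominator is all meaningful words, so the percentages do not have to sum to 100. Some words carry no emotion, and others count toward several emotions.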
Percentage of words with each emotion expressed
The results show that written responses to the pilot program are positive overall; respondents are expressing trust and anticipation!
This analysis was conducted using the geospatial data that scooter companies were required to report to PBOT, along with ArcMap and the RStudio integrated development environment.
First, the data were imported into RStudio as a comma-separated value (CSV) file, and R was used to count the number of trips beginning and ending in each census block. Maps were created to show these start and end counts for each census block.
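The counting step can be sketched in Python, although the original analysis used R. The CSV text, its column names (start_block, end_block), and the short block IDs are hypothetical placeholders. The real PBOT extract and census block identifiers will look different.

```python
import csv
import io
from collections import Counter

# HYPOTHETICAL extract of the trip data; the real PBOT column names and
# census block IDs will differ.
SAMPLE_CSV = """trip_id,start_block,end_block
1,B1,B2
2,B1,B3
3,B2,B1
4,B1,B2
"""


def count_trips_by_block(csv_text: str) -> tuple[Counter, Counter]:
    """Return (trips starting in each block, trips ending in each block)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    starts = Counter(row["start_block"] for row in rows)
    ends = Counter(row["end_block"] for row in rows)
    return starts, ends


if __name__ == "__main__":
    starts, ends = count_trips_by_block(SAMPLE_CSV)
    print("starts:", dict(starts))
    print("ends:", dict(ends))
```

The two counters correspond to the start and end maps. Each one maps a block ID to a trip count, ready to be joined to the block polygons.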
Start of Trips
End of Trips
These counts were joined to the census block shapefile in ArcMap to visualize the data. RStudio was then used to pair each trip’s start block with its end block and count how often each start-end pair occurred. These pair counts were joined with the x and y coordinates of both blocks in each pair, and the pairs were drawn as lines using the XY To Line tool in ArcMap.
I used this initial mapping to create a map of the most common start-end pairs, which showed that some of the most common routes travel outward from downtown or stay within downtown.
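The pair-counting step and the table it feeds can be sketched in Python. The block centroids are hypothetical coordinates, and od_pair_table is an illustrative name. The output layout, with one row per pair plus start and end x/y columns, is the kind of table a line-building tool such as XY To Line works from. The optional top_n keeps only the most frequent pairs.

```python
from collections import Counter

# HYPOTHETICAL centroid coordinates (x, y) for each census block,
# e.g. in a projected coordinate system.
CENTROIDS = {"B1": (0.0, 0.0), "B2": (1.0, 0.5), "B3": (2.0, 2.0)}


def od_pair_table(trips, centroids, top_n=None):
    """Count (start, end) block pairs and attach both blocks' coordinates.

    trips is a list of (start_block, end_block) tuples. Rows come back
    ordered from most to least frequent; top_n=None keeps every pair.
    """
    pair_counts = Counter(trips)
    rows = []
    for (start, end), n in pair_counts.most_common(top_n):
        start_x, start_y = centroids[start]
        end_x, end_y = centroids[end]
        rows.append({
            "start_block": start, "end_block": end, "trips": n,
            "start_x": start_x, "start_y": start_y,
            "end_x": end_x, "end_y": end_y,
        })
    return rows


if __name__ == "__main__":
    trips = [("B1", "B2"), ("B1", "B3"), ("B2", "B1"), ("B1", "B2")]
    for row in od_pair_table(trips, CENTROIDS, top_n=2):
        print(row)
```

Pairs are directional here, so B1 to B2 and B2 to B1 are counted separately, matching the idea of separate start and end blocks.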
Spatial data of TriMet stops, publicly provided by TriMet, were used to determine whether the density of TriMet stops was correlated with the density of scooter trip origins or destinations. ArcMap was used to spatially join the TriMet stop points to the census block polygons, and R was then used to calculate the number of stops per polygon. I exported these counts as a CSV, imported it back into GIS, and re-joined it to the census block polygons to visualize the densities.
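In the original workflow, ArcMap’s spatial join handled the geometry. The Python sketch below swaps in a plain ray-casting point-in-polygon test so that the stops-per-block count can be seen end to end. The block IDs, square polygons, and stop coordinates are made up. Points lying exactly on a shared boundary may be assigned to either neighboring block or to neither.

```python
# Ray-casting point-in-polygon test: cast a horizontal ray to the right of
# the point and count how many polygon edges it crosses; odd means inside.
def point_in_polygon(pt, poly):
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def stops_per_block(stops, blocks):
    """Count stop points falling inside each block polygon."""
    counts = {block_id: 0 for block_id in blocks}
    for pt in stops:
        for block_id, poly in blocks.items():
            if point_in_polygon(pt, poly):
                counts[block_id] += 1
                break  # assumes census blocks do not overlap
    return counts


if __name__ == "__main__":
    # HYPOTHETICAL blocks and stops.
    blocks = {
        "B1": [(0, 0), (1, 0), (1, 1), (0, 1)],
        "B2": [(1, 0), (2, 0), (2, 1), (1, 1)],
    }
    stops = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5), (3.0, 3.0)]
    print(stops_per_block(stops, blocks))
```

The stop at (3.0, 3.0) falls outside every block and is not counted. This check runs against every polygon for every point, which is fine for a sketch; GIS software uses spatial indexes to do the same join at city scale.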
To find out whether the number of TriMet stops in each census block was correlated with the number of scooter trips started or ended in each block, I conducted a linear regression analysis. I plotted the number of TriMet stops in each census block against the number of e-scooter trip starts or ends and fit a line of best fit. I also tested how well the number of TriMet stops could predict the number of trip starts or ends by checking the significance of the model’s coefficients.
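The quantities behind that test can be sketched in Python. The function below fits an ordinary least-squares line and returns the slope, the intercept, the t statistic for each coefficient, and R². These are the same figures R reports in summary(lm(...)), apart from the p-values, which come from a t distribution with n − 2 degrees of freedom. The standard library has no t-distribution CDF, so p-values are not computed here. The name simple_ols and the sample numbers are hypothetical.

```python
import math


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares and report test statistics.

    Assumes at least 3 points, some variation in x, and points that do not
    all lie exactly on one line (otherwise the standard errors are zero).
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    sigma2 = sse / (n - 2)  # residual variance
    se_slope = math.sqrt(sigma2 / sxx)
    se_intercept = math.sqrt(sigma2 * (1 / n + mx ** 2 / sxx))
    return {
        "slope": slope,
        "intercept": intercept,
        "t_slope": slope / se_slope,
        "t_intercept": intercept / se_intercept,
        "r_squared": 1 - sse / sst,  # share of variation in y explained
    }


if __name__ == "__main__":
    stops = [0, 1, 2, 3, 4]      # HYPOTHETICAL stops per block
    trip_ends = [1, 3, 2, 5, 4]  # HYPOTHETICAL trip ends per block
    print(simple_ols(stops, trip_ends))
```

The slope’s t statistic tests whether the number of stops is related to the number of trips. The intercept’s t statistic only tests whether the fitted line passes through zero trips at zero stops.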
While the p-value for the slope (the number of TriMet stops) was significant, the p-value for the intercept was not less than 0.05. A non-significant intercept does not by itself show that the model lacks predictive power; however, the scatterplot shows that the data are very spread out, with many outliers in the scooter trip data. These outliers pull the line of best fit away from the rest of the data, so the line does not accurately model most census blocks, which makes it hard to determine how strongly the two variables are correlated.
Visually, the concentration of TriMet’s public transportation stops does not appear to correlate with the areas where scooter trips start and end. Although scooters may be ridden through areas with many TriMet stops, trips do not appear to begin or end where riders could most easily transfer between a parked scooter and TriMet service.