Yelp Sentiment Analysis: Hockey Stadium Reviews

Author

Thomas Bonnici

Introduction

I chose to run a sentiment analysis on two NHL arenas: Bridgestone Arena (home of the Nashville Predators) and Florida Live Arena (home of the Florida Panthers). In this analysis, I’ll look at the different emotions fans express and make some in-depth comparisons between the two arenas. My guess is that sentiment scores will generally be higher for Bridgestone, since it’s widely regarded as one of the top stadiums in the NHL, whereas Florida Live is on the opposite end of the spectrum. Here are some photos of the two stadiums:

Quick Disclaimer:

I originally scraped the Scotiabank Saddledome’s Yelp page as well, but decided not to use it because its number of reviews was not comparable to those of Bridgestone and FLA Live.

Data Collection

I scraped three separate Yelp pages for this analysis: the Scotiabank Saddledome, Bridgestone Arena, and Florida Live Arena. I combined the data into one file, which you can see here: NSH_FLA_comb_data
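As a rough illustration of the combining step, here is a minimal sketch of stacking per-arena frames and checking how many reviews each arena contributes. The frames `nsh` and `fla` and their rows are made up for illustration; they are not the scraped data, and the real scrape may have been combined differently.

```r
library(dplyr)

# Hypothetical stand-ins for two scraped data frames (made-up rows, not real reviews)
nsh <- data.frame(
  review_content = c("Great arena", "Loud crowd"),
  location = "Bridgestone Arena"
)
fla <- data.frame(
  review_content = c("Long lines", "Good seats", "Far drive"),
  location = "Florida Live Arena"
)

# Stack the per-arena frames into one combined frame
combined <- bind_rows(nsh, fla)

# Count reviews per arena to check that the samples are comparable in size
review_counts <- combined %>%
  count(location, name = "n_reviews")

print(review_counts)
```

A check like this is one way to spot an arena whose review count is out of line with the others before comparing them.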

Data Dictionary

  • reviewer_name = Name of the person who reviewed the stadium on Yelp

  • reviewer_location = Where the reviewer is from

  • review_date = Shows when the review was uploaded to Yelp

  • review_content = The review itself

  • location = Name of the arena being reviewed

Data Cleaning/Data Editing

Here’s the code I used:

# Load the packages used below
library(tidyverse)  # dplyr verbs and the %>% pipe
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments()
library(lubridate)  # mdy()

# Data cleaning: parse the review date with mdy() into a Date
nsh_fla_reviews$date <- mdy(nsh_fla_reviews$review_date)

# Create a review ID variable based on the date
nsh_fla_reviews <- 
  nsh_fla_reviews %>% 
  arrange(date) %>% 
  mutate(review_id = row_number())

# Unnest tokens (one word per row) and remove stop words
tidy_nsh_fla_reviews <- 
  nsh_fla_reviews %>% 
  unnest_tokens(word, review_content) %>% 
  anti_join(stop_words, by = "word")
## Simple positivity (valence) scoring ##
# Start with simple positivity and negativity.
# We use the Bing lexicon for simplicity.
bing <- 
  get_sentiments("bing")

# Count each word per location, then keep only words found in the Bing lexicon
nsh_fla_counts <- 
  tidy_nsh_fla_reviews %>% 
  group_by(location, word) %>% 
  summarize(n = n(), .groups = "drop_last") %>% 
  inner_join(bing, by = "word")
# First, count the total number of words per location
word_counts <- tidy_nsh_fla_reviews %>%
  group_by(location) %>%
  summarise(total_words = n())

# Join the word counts back to the main dataset
nsh_fla_word_counts_with_total_words <- tidy_nsh_fla_reviews %>%
  left_join(word_counts, by = "location")

# Then count each word per location and express it as a
# percentage of that location's total words
nsh_fla_word_counts_with_percentage <- nsh_fla_word_counts_with_total_words %>%
  group_by(location, word) %>%
  summarise(n = n(), total_words = first(total_words), .groups = "drop") %>%
  mutate(percentage = n / total_words * 100)  # Calculate percentage

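This code parses the review dates, assigns each review an ID, splits the reviews into individual words, removes stop words, tags words with the Bing lexicon, and computes each word’s share of its location’s total words. That sets up the analysis below.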

Analysis

Question 1: How many positive and negative words are being expressed at Bridgestone Arena and Florida Live Arena?

Explanation

Looking at this graph, we can see that FLA Live Arena has more negative words than Bridgestone Arena and also one more positive word (287 vs. 286). The graph also shows that more words were counted overall for FLA Live than for Bridgestone. Because of this, I created a percent-of-total calculation to compare “apples to apples”.
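The percent-of-total idea can be sketched as follows. This is a minimal example under assumptions: the table `sentiment_counts` and its numbers are invented for illustration, and it stands in for the Bing-matched counts after summing by sentiment. It is not necessarily the exact calculation behind the chart.

```r
library(dplyr)

# Hypothetical per-location counts of Bing-matched words (made-up numbers)
sentiment_counts <- data.frame(
  location  = c("Arena A", "Arena A", "Arena B", "Arena B"),
  sentiment = c("negative", "positive", "negative", "positive"),
  n         = c(50, 150, 100, 150)
)

# Divide each count by its location's total so that arenas with
# different numbers of words can be compared directly
sentiment_share <- sentiment_counts %>%
  group_by(location) %>%
  mutate(share = n / sum(n) * 100) %>%
  ungroup()

print(sentiment_share)
```

In this toy data both arenas have the same number of positive words, yet the positive share differs because Arena B has more words overall, which is the situation the raw counts above can hide.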

Question 2: What emotions are people expressing while at these stadiums?

Explanation

From this graph, we can see that the most common emotions expressed are trust, joy, and anticipation. This is logical: whether you are in the stadium for a concert, a game, or another event, you should be feeling trust (in the facilities, concessions, and fan experience), joy (cheering for the team or artist), and anticipation (between periods, before the game, before the concert). Breaking it down between Bridgestone Arena and Florida Live Arena, my earlier assumption appears to be incorrect. FLA Live scores higher on overall positive sentiment as well as on most of the positive emotions (trust, anticipation, surprise). The one emotion where Bridgestone did slightly better was joy. Overall, these results were shocking to me, and they lead me to believe that other factors are driving this outcome (concerts, fan experience, etc.).
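The code behind this chart is not shown, so the following is only a sketch of how emotion counts per arena could be computed. It assumes an emotion lexicon shaped like the one returned by `get_sentiments("nrc")` (columns `word` and `sentiment`). Because that lexicon needs the textdata package and a one-time download, a tiny made-up lexicon `toy_nrc` and made-up tokens are used here instead.

```r
library(dplyr)

# Hypothetical tokenized reviews (one row per word), shaped like tidy_nsh_fla_reviews
tokens <- data.frame(
  location = c("Arena A", "Arena A", "Arena A", "Arena B", "Arena B"),
  word     = c("happy", "excited", "clean", "happy", "dirty")
)

# Hypothetical mini-lexicon standing in for get_sentiments("nrc");
# a word can map to more than one emotion
toy_nrc <- data.frame(
  word      = c("happy", "happy", "excited", "clean", "dirty"),
  sentiment = c("joy", "trust", "anticipation", "trust", "disgust")
)

emotion_share <- tokens %>%
  # A word may repeat in the reviews and carry several emotions,
  # so the join is declared many-to-many (needs dplyr >= 1.1.0)
  inner_join(toy_nrc, by = "word", relationship = "many-to-many") %>%
  count(location, sentiment) %>%
  group_by(location) %>%
  # Share of each emotion among that location's matched words
  mutate(share = n / sum(n) * 100) %>%
  ungroup()

print(emotion_share)
```

Using shares rather than raw counts keeps the comparison fair when one arena has more reviewed words than the other.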

Question 3: How have positivity scores changed over an eight-year span for each stadium?

Explanation

For this graph, I looked at average positivity scores from 2016 to 2024 for each stadium. The first thing I noticed was that there were no reviews from 2016 for Bridgestone. Bridgestone’s score is relatively stable at around 0.65, peaking at almost 0.8 in 2020. For FLA Live, there seems to be a general downward trend since 2016. Since the arena isn’t known for having the strongest facilities, this makes sense: the building may be deteriorating, and areas like the bathrooms or concessions may need an upgrade. Additionally, the fan experience at the two arenas is vastly different in the regular season, with Bridgestone providing a fun and rowdy atmosphere and Florida struggling to fill seats when the team is struggling to win games.
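The exact definition of the positivity score is not shown here. One plausible reading, sketched below as an assumption, is the share of Bing-matched words tagged positive per arena and year. The frame `tagged` and its rows are made up for illustration, and the date format is assumed to be month/day/year to match the use of `mdy()`.

```r
library(dplyr)
library(lubridate)

# Hypothetical Bing-tagged words with month/day/year dates (made-up rows)
tagged <- data.frame(
  location    = c("Arena A", "Arena A", "Arena A", "Arena A", "Arena B", "Arena B"),
  review_date = c("1/5/2016", "3/9/2016", "7/1/2017", "7/1/2017", "2/2/2016", "2/2/2016"),
  sentiment   = c("positive", "negative", "positive", "positive", "negative", "negative")
)

yearly_positivity <- tagged %>%
  # Extract the review year from the parsed date
  mutate(year = year(mdy(review_date))) %>%
  group_by(location, year) %>%
  # Assumed score: fraction of matched words that are positive (0 to 1)
  summarise(positivity = mean(sentiment == "positive"), .groups = "drop")

print(yearly_positivity)
```

Under this definition a year with no reviews simply has no row, which would explain a missing point such as Bridgestone in 2016.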

Conclusion

Overall, it was very interesting diving into Yelp stadium data. When interpreted correctly, sentiment is a good way to gauge the fan experience. To make a deeper analysis, I would have to look at factors beyond the ones I mentioned and the ones I found through the reviews to truly paint the whole picture. I believe this kind of analysis would be very useful in other areas, like business or film, to truly understand who you are selling your products or films to. All in all, this was very insightful and I learned a lot through the process.