Introduction

The Yankees and Red Sox have arguably the best rivalry in sports. As a Yankees fan who grew up in Maine, I have experienced this firsthand. My goal is to look at the current mood of the two fanbases and see how they feel about their teams. To do this, I will run a sentiment analysis on recent tweets using the NRC lexicon’s sentiment groupings.

Analysis

The first step is to search Twitter for recent tweets, starting with #Yankees. I will pull the 1,000 most recent tweets to analyze.

#twitteR must be authenticated with setup_twitter_oauth() before searching
library(twitteR)
nyy <- searchTwitter('#Yankees', n = 1000)
nyy_df <- twListToDF(nyy)

Next I will use tidytext to break the tweets into individual words. I will remove stop words like “and” and “the” since they are not relevant to the analysis, then display the 10 most tweeted words to see what fans have been focusing on.

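library(dplyr)
library(stringr)
library(tidytext)
library(knitr)

#use tidytext to isolate individual words and remove stop words
#the pattern splits on anything that is not a letter, digit, #, @, or in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
nyy_words <- nyy_df %>%
  filter(!str_detect(text, '^"')) %>%   #drop tweets that start with a quotation mark
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%   #strip links and &amp;
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))     #keep only tokens that contain a letter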

nyy_words %>%
  group_by(word) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  top_n(10) %>%
  kable()
## Selecting by n
word n
#yankees 943
rt 629
fans 143
#auction 140
autographed 139
#derekjeter 138
time 135
hit 130
win 127
@nyysportstalk 126
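
To make the tokenizing step above more concrete, here is a minimal sketch in base R that approximates what the regex split and the two filters do to a single tweet. The tweet text and the three-word stop list are made up for illustration, and `strsplit()` is only a stand-in for `unnest_tokens()`, so treat it as an approximation rather than the exact tidytext behavior.

```r
# Same pattern as in the analysis: split on anything that is not a letter,
# digit, #, @, or an apostrophe that sits inside a word
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Hypothetical tweet text, made up for illustration
tweet <- "RT @nyysportstalk: Fans' mood after Judge's homer! #Yankees"

# Lowercase, then split on the pattern (perl = TRUE enables the lookahead)
tokens <- strsplit(tolower(tweet), reg, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]  # drop empty strings left between separators

# Tiny stand-in for stop_words$word; the real list is much longer
toy_stop_words <- c("after", "the", "and")

# Remove stop words and keep only tokens that contain a letter
tokens <- tokens[!tokens %in% toy_stop_words & grepl("[a-z]", tokens)]
print(tokens)
```

Hashtags and handles survive as single tokens, the trailing apostrophe in “Fans'” is dropped, and the one inside “Judge's” is kept. This also explains why #yankees and @nyysportstalk show up in the top 10 list.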

To do the sentiment analysis I will bring in the NRC lexicon, which maps words to sentiments and emotions. After joining it to the Yankees words, I will display the word count per sentiment.

#load the nrc lexicon (recent tidytext versions download it through the textdata package)
nrc <- get_sentiments("nrc")

nyy_word_sentiments <- nyy_words %>% inner_join(nrc, by = "word")
nyy_word_sentiments$team <- "Yankees"

nyy_word_sentiments %>%
  group_by(sentiment) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  kable()
sentiment n
positive 723
trust 530
anticipation 412
negative 312
joy 215
anger 178
surprise 79
fear 53
sadness 48
disgust 35
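
One detail of the join is worth spelling out: it keeps only words that appear in the lexicon, and it returns one row per word-sentiment pair, so a word tagged with two sentiments is counted under both. The sketch below shows this on toy data. The words and the lexicon entries are hypothetical, not actual NRC entries, and base R `merge()` is swapped in for `inner_join()` so the example runs without any packages.

```r
# Hypothetical tokens standing in for nyy_words (one row per word occurrence)
toy_words <- data.frame(
  word = c("win", "win", "hit", "autographed", "loss"),
  stringsAsFactors = FALSE
)

# Made-up lexicon in the same word/sentiment shape as nrc
toy_lexicon <- data.frame(
  word      = c("win", "win", "loss"),
  sentiment = c("positive", "joy", "negative"),
  stringsAsFactors = FALSE
)

# merge() defaults to an inner join: words missing from the lexicon drop out
toy_sentiments <- merge(toy_words, toy_lexicon, by = "word")

# Count rows per sentiment, largest first
sentiment_counts <- sort(table(toy_sentiments$sentiment), decreasing = TRUE)
print(sentiment_counts)
```

Here the two occurrences of “win” produce four rows (two positive, two joy), while “hit” and “autographed” disappear. The counts in the table above are therefore counts of word-sentiment matches, not of tweets.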

With the #Yankees analysis complete, I will repeat the same steps for #RedSox and then combine the two data frames.

#grab the last 1000 redsox tweets
bos <- searchTwitter('#RedSox', n = 1000)
bos_df <- twListToDF(bos)

#use tidytext to isolate individual words and remove stop words
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
bos_words <- bos_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))

#combine with nrc to get sentiment score
bos_word_sentiments <- bos_words %>% inner_join(nrc, by = "word")
bos_word_sentiments$team <- "RedSox"

#combine two teams
team_sentiments <- rbind(nyy_word_sentiments, bos_word_sentiments)

The final step is to take the combined data frame and plot each sentiment as a share of each team’s sentiment-tagged words.

sent_df <- team_sentiments %>% 
  group_by(team, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n/sum(n))

library(ggplot2)

ggplot(sent_df, aes(x = sentiment, y = frequency, fill = team)) + 
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("Share of team's sentiment words") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
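
Because the frequency is computed within each team, the bars for a team sum to 1, which makes the two teams comparable even if one has more matched words. A minimal base R sketch of that normalization, using made-up rows in place of team_sentiments, might look like this:

```r
# Hypothetical word-sentiment rows standing in for team_sentiments
toy <- data.frame(
  team      = c("Yankees", "Yankees", "Yankees", "Yankees", "RedSox", "RedSox"),
  sentiment = c("positive", "positive", "positive", "anger", "positive", "anger"),
  stringsAsFactors = FALSE
)

# Count rows per team/sentiment pair, in long form like sent_df
counts <- as.data.frame(
  table(team = toy$team, sentiment = toy$sentiment),
  responseName = "n"
)

# Divide each count by its team's total, so shares sum to 1 within a team
counts$frequency <- counts$n / ave(counts$n, counts$team, FUN = sum)
print(counts)
```

In this toy data the Yankees have three positive rows out of four and the Red Sox one out of two, so the shares are 0.75 and 0.5 even though the raw counts differ.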

Conclusion

Looking at the sentiment analysis for both #Yankees and #RedSox, there doesn’t seem to be a clear consensus in how fans feel about their teams. Yankees fans’ tweets are 10% more likely to be positive than Red Sox fans’ tweets; however, they are also more likely to express anger, disgust, or fear. Red Sox fans are more likely to be sad, but also express joy and anticipation. Anyone who has listened to sports radio callers or followed a team on Twitter would not be surprised by the high variability in fan reaction. Since it is the offseason, feelings of anticipation are to be expected. Given that both teams have been successful franchises lately, the high trust sentiment also seems reasonable. Overall, fans of both teams seem to feel more positive than negative, which makes sense given the strong seasons both teams just had.