Next, I use tidytext to break the tweets into individual words before analyzing them. I remove stop words such as “and” and “the”, since they are not relevant to the analysis, and then display the 10 most-tweeted words to see what fans have been focusing on.
#use tidytext to isolate individual words and remove stop words
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
nyy_words <- nyy_df %>%
  #skip tweets that open with a quotation mark
  filter(!str_detect(text, '^"')) %>%
  #strip shortened links and HTML-escaped ampersands
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  #drop stop words and tokens that contain no letters
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
#show the 10 most frequent words
nyy_words %>%
  group_by(word) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  top_n(10) %>%
  kable()
## Selecting by n
| word | n |
|:--|--:|
| #yankees | 943 |
| rt | 629 |
| fans | 143 |
| #auction | 140 |
| autographed | 139 |
| #derekjeter | 138 |
| time | 135 |
| hit | 130 |
| win | 127 |
| @nyysportstalk | 126 |
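To see why hashtags and mentions such as #yankees and @nyysportstalk survive as single tokens, it helps to run the splitting pattern on one made-up tweet. The sketch below is only an approximation of what `unnest_tokens()` does: it swaps in base R's `strsplit()` with `perl = TRUE` in place of tidytext, and the example tweet is invented for illustration.

```r
# Same splitting pattern as above: break on any character that is not a
# letter, digit, #, @ or apostrophe, and on apostrophes that do not start
# a word-like continuation (for example a trailing possessive apostrophe).
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Made-up tweet, for illustration only
tweet <- "RT @fan: Let's go #Yankees! The fans' 27 titles"

# Lowercase, split on the pattern, then drop the empty strings that
# appear when two separators sit next to each other
tokens <- strsplit(tolower(tweet), reg, perl = TRUE)[[1]]
tokens <- tokens[tokens != ""]

# Keep only tokens with at least one letter (mirrors str_detect(word, "[a-z]"))
words <- tokens[grepl("[a-z]", tokens)]
print(words)
```

The contraction "let's" stays whole because its apostrophe is followed by a letter, while the trailing apostrophe in "fans'" is treated as a separator and the purely numeric token "27" is filtered out. Stop-word removal is not shown here.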
With the analysis completed for #Yankees, I repeat the same steps for #RedSox and then combine the two data frames.
#grab the last 1000 redsox tweets
bos <- searchTwitter('#RedSox', n = 1000)
bos_df <- twListToDF(bos)
#use tidytext to isolate individual words and remove stop words
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
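bos_words <- bos_df %>%
  #skip tweets that open with a quotation mark
  filter(!str_detect(text, '^"')) %>%
  #strip shortened links and HTML-escaped ampersands
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  #drop stop words and tokens that contain no letters
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))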
#combine with nrc to get sentiment score
bos_word_sentiments <- bos_words %>% inner_join(nrc, by = "word")
bos_word_sentiments$team <- "RedSox"
#combine two teams
team_sentiments <- rbind(nyy_word_sentiments, bos_word_sentiments)
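The join-and-label step is the same for both teams, so it can be illustrated once on toy data. This is a minimal sketch, not the code from the analysis: `toy_nrc` is a made-up three-row stand-in for the NRC lexicon, and `tag_team()` is a hypothetical helper introduced here only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the NRC lexicon: one row per word-sentiment pair,
# so a single word can carry more than one sentiment.
toy_nrc <- tibble(
  word      = c("win", "win", "lose"),
  sentiment = c("joy", "positive", "negative")
)

# Hypothetical helper (not part of the original analysis): attach sentiments
# to a vector of words and label every row with the team name.
tag_team <- function(words, lexicon, team_name) {
  tibble(word = words) %>%
    inner_join(lexicon, by = "word") %>%  # words missing from the lexicon are dropped
    mutate(team = team_name)
}

toy_nyy <- tag_team(c("win", "hit"), toy_nrc, "Yankees")
toy_bos <- tag_team(c("lose", "fans"), toy_nrc, "RedSox")

# Stack the two teams, as in the rbind() call above
toy_team_sentiments <- rbind(toy_nyy, toy_bos)
print(toy_team_sentiments)
```

Two things are worth noticing in the result: "hit" and "fans" disappear because `inner_join()` keeps only words found in the lexicon, and "win" appears twice because it maps to two sentiments.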
The final step is to take the combined data frame and plot each sentiment as a share of each team's sentiment-tagged words.
sent_df <- team_sentiments %>%
  group_by(team, sentiment) %>%
  summarize(n = n()) %>%
  #summarize() drops the sentiment grouping, so sum(n) is the per-team total
  mutate(frequency = n/sum(n))
ggplot(sent_df, aes(x = sentiment, y = frequency, fill = team)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("Share of sentiment-tagged words") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
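The frequency calculation relies on the counts being normalized within each team, not across the whole data frame. The toy example below makes that explicit. It is a sketch on invented data, and it uses `count()` followed by an explicit `group_by(team)`, which should be equivalent to the `group_by()` and `summarize()` pair above.

```r
library(dplyr)

# Invented data: one row per word-sentiment match
toy <- tibble(
  team      = c("Yankees", "Yankees", "Yankees", "RedSox", "RedSox"),
  sentiment = c("joy", "joy", "anger", "joy", "anger")
)

toy_freq <- toy %>%
  count(team, sentiment) %>%            # rows per team and sentiment
  group_by(team) %>%                    # normalize within each team
  mutate(frequency = n / sum(n)) %>%
  ungroup()

print(toy_freq)

# Each team's frequencies should add up to 1
team_totals <- tapply(toy_freq$frequency, toy_freq$team, sum)
print(team_totals)
```

Because each team's shares sum to 1, the bars for the two teams are comparable even when one hashtag yields more sentiment-tagged words than the other.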
