McGowan’s Assignment 4

Analysis

The first step is to search Twitter for recent tweets tagged #Yankees. I will take the 1,000 most recent tweets for analysis.

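#grab the last 1000 yankees tweets (requires prior setup_twitter_oauth() authentication)
library(twitteR)
nyy <- searchTwitter('#Yankees', n = 1000)
nyy_df <- twListToDF(nyy)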

Next I will use tidytext to break the tweets into individual words. I will remove stop words like “and” and “the” since they are not relevant to the analysis, then display the 10 most tweeted words to see what fans have been focusing on.

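#use tidytext to isolate individual words and remove stop words
library(dplyr)
library(stringr)
library(tidytext)
library(knitr)  #for kable()

#split on anything that is not a letter, digit, #, @ or in-word apostrophe
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
nyy_words <- nyy_df %>%
  filter(!str_detect(text, '^"')) %>%  #drop tweets that open with a quotation mark
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%  #strip links and &amp;
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))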

nyy_words %>% count(word, sort = TRUE) %>% top_n(10, n) %>% kable()

word	n
#yankees	943
rt	629
fans	143
#auction	140
autographed	139
#derekjeter	138
time	135
hit	130
win	127
@nyysportstalk	126
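
To see what the splitting pattern does on its own, here is a minimal base R sketch applied to one made-up tweet. It assumes that `tolower()` followed by `strsplit()` with `perl = TRUE` is a fair stand-in for the regex tokenizer used above. The sample text and the two-word stop list are hypothetical and are not taken from the collected tweets.

```r
# Same pattern as in the analysis: split on anything that is not a letter,
# digit, #, @ or apostrophe, and on apostrophes not followed by one of those.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Hypothetical tweet, for illustration only
tweet <- "Great win by the #Yankees tonight! Can't wait, @nyysportstalk"

# Lowercase, then split; perl = TRUE is needed for the lookahead (?!...)
tokens <- strsplit(tolower(tweet), reg, perl = TRUE)[[1]]

# Adjacent delimiters (such as "! ") leave empty strings behind; drop them
tokens <- tokens[tokens != ""]

# Tiny stand-in for stop_words$word (assumption: the real list is far longer)
toy_stop_words <- c("by", "the")

# Keep tokens that are not stop words and contain at least one letter
tokens <- tokens[!tokens %in% toy_stop_words & grepl("[a-z]", tokens)]
print(tokens)
```

Hashtags and mentions survive as single tokens because # and @ are excluded from the delimiter class, which is why entries such as #yankees and @nyysportstalk can appear in the top-10 table.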

To do the sentiment analysis I will bring in the NRC lexicon, which maps words to sentiments such as joy, anger, and trust. After joining the lexicon to the Yankees words, I will display the word count per sentiment.

#load the NRC lexicon; newer tidytext versions no longer ship it inside `sentiments`
nrc <- get_sentiments("nrc")

nyy_word_sentiments <- nyy_words %>% inner_join(nrc, by = "word")
nyy_word_sentiments$team <- "Yankees"

nyy_word_sentiments %>% group_by(sentiment) %>% summarise(n = n()) %>% arrange(desc(n)) %>% kable()

sentiment	n
positive	723
trust	530
anticipation	412
negative	312
joy	215
anger	178
surprise	79
fear	53
sadness	48
disgust	35
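
Because the join drives these counts, a small sketch may help. It uses base R `merge()` in place of `inner_join()` so that it runs without any packages. The word list and the three-row lexicon are invented for illustration and should not be read as actual NRC entries.

```r
# Hypothetical tokens, one row per word occurrence (as in nyy_words)
toy_words <- data.frame(word = c("win", "win", "hit", "fans"),
                        stringsAsFactors = FALSE)

# Hypothetical lexicon: a word may carry more than one sentiment
toy_lexicon <- data.frame(
  word      = c("win", "win", "hit"),
  sentiment = c("positive", "joy", "anger"),
  stringsAsFactors = FALSE
)

# merge() with the default all = FALSE behaves like an inner join:
# words missing from the lexicon ("fans") are dropped, and each occurrence
# is repeated once per matching sentiment
toy_joined <- merge(toy_words, toy_lexicon, by = "word")

# Count rows per sentiment, largest first
toy_counts <- sort(table(toy_joined$sentiment), decreasing = TRUE)
print(toy_counts)
```

One consequence is that the per-sentiment totals count word-sentiment pairs, so a single tweeted word can contribute to several rows of the table.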

With the analysis completed for #Yankees, I will repeat the same steps for #RedSox and then combine the two data frames.

#grab the last 1000 redsox tweets
bos <- searchTwitter('#RedSox', n = 1000)
bos_df <- twListToDF(bos)

#use tidytext to isolate individual words and remove stop words
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
bos_words <- bos_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))

#combine with nrc to get sentiment score
bos_word_sentiments <- bos_words %>% inner_join(nrc, by = "word")
bos_word_sentiments$team <- "RedSox"

#combine two teams
team_sentiments <- rbind(nyy_word_sentiments, bos_word_sentiments)

The final step is to take the combined data frame and plot each sentiment as a share of each team's sentiment-matched words.

sent_df <- team_sentiments %>% 
  group_by(team, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n/sum(n))

library(ggplot2)

ggplot(sent_df, aes(x = sentiment, y = frequency, fill = team)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_y_continuous(labels = scales::percent) +  #show proportions as percentages
  xlab("Sentiment") +
  ylab("Percent of sentiment words") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
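
The share calculation can be checked on a small example. This sketch uses base R `ave()` instead of the grouped `mutate()` so that it runs without packages. The counts are made up and only illustrate the arithmetic.

```r
# Hypothetical counts per team and sentiment
toy_sent <- data.frame(
  team      = c("Yankees", "Yankees", "RedSox", "RedSox"),
  sentiment = c("positive", "negative", "positive", "negative"),
  n         = c(30, 10, 15, 15),
  stringsAsFactors = FALSE
)

# Divide each count by its own team's total, so teams with different
# numbers of matched words are comparable
toy_sent$frequency <- toy_sent$n / ave(toy_sent$n, toy_sent$team, FUN = sum)
print(toy_sent)
```

Within each team the frequencies sum to 1, so the bars compare how each fan base's sentiment words are distributed rather than how many words each team produced.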

Sean McGowan

November 16, 2017

Introduction

Conclusion