Sample: Sentiment Analysis with Tweets

Evaluating sentiment on Twitter related to Mac Jones heading into his 2nd season as the New England Patriots’ starting quarterback

Megan Georges
2022-08-26

Searching for Tweets related to Mac Jones

The New England Patriots played their first NFL preseason game at Gillette Stadium on August 11th, 2022. Mac Jones, a quarterback out of the University of Alabama, became the Pats’ starting quarterback in his rookie season last year. Heading into year 2 for Jones, there is plenty of discussion and speculation about how the Pats’ season will go and how Jones will perform. Let’s see how the QB has been discussed on Twitter since the start of this year’s preseason.

Mac Jones in Week 1 of 2021 NFL Preseason, Source: Primero y Diez
# Scrape Twitter (requires the rtweet package and API credentials)
# library(rtweet)
# RawData <- search_tweets(q = "Mac+Jones OR mac+jones",
#                          n = 10000,
#                          type = "recent",
#                          lang = "en",
#                          include_rts = FALSE,
#                          retryonratelimit = TRUE)
# Save to files
# write_csv(RawData, "MacJones_Tweets.csv")

Unfortunately, due to limited authorization, I could only pull tweets as far back as 08/20, but that still leaves plenty to work with (6,294 tweets). So we’ll be looking at tweets mentioning Mac Jones by name between August 20th, 2022 and August 29th, 2022.

Explore Trends and Top Tweets

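# Packages used in this section
library(tidyverse)   # read_csv(), dplyr verbs, ggplot2, stringr
library(lubridate)   # hour(), minute(), second()
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box()

Mac <- read_csv("MacJones_Tweets.csv")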
# 10 most retweeted
MacTopRT <- Mac %>% arrange(desc(retweet_count))
MacTopRT <- MacTopRT[1:10,]

select(MacTopRT, "text", "retweet_count", "favorite_count") %>% 
  kable(col.names = c("Tweet", "Retweets", "Likes")) %>% 
  kable_styling() %>% scroll_box(width = "100%", height = "500px")
| Tweet | Retweets | Likes |
| --- | --- | --- |
| Jayon Brown with his 2nd INT at Raiders joint practice against Mac Jones and takes hit to the house! #RaiderNation https://t.co/xMXfCO3XJX | 299 | 2765 |
| Mac Jones what you doing? 👀 https://t.co/WhtOxpzDdW | 262 | 2272 |
| MaC jOnEs Is A pRo BoWlEr 😂😂😂 https://t.co/FBOdw7I4ad | 259 | 2027 |
| Mac Jones with a laser to Luke Masterson https://t.co/ptLbai7rdf | 202 | 2236 |
| 2nd year starters preseason PFF grades: 1: Justin Fields (90.4) 2: Trevor Lawrence (60.2) 3: Trey Lance (59.4) 4: Mac Jones (57.4) 5: Davis Mills (54.5) 6: Zach Wilson (48.4) | 197 | 1729 |
| Let’s settle this once and for all. Who’s your ride or die? Like - Mac Jones Retweet - Tom Brady | 180 | 738 |
| #Patriots Mac Jones throws a pick to #Raiders rookie LB Luke Masterson. https://t.co/7CmBjwaQPZ | 178 | 1598 |
| Top 10 in NFL Jersey Sales 1 - Josh Allen 2 - Joe Burrow 3 - Jonathan Taylor 4 - Justin Herbert 5 - TJ Watt 6 - Tom Brady 7 - Mac Jones 8 - Maxx Crosby 9 - Kenny Pickett 10 - Baker Mayfield | 157 | 2082 |
| Ravens released WR Slade Bolden, one of Mac Jones’ favorite targets at Alabama. Need the reunion asap. https://t.co/prYKwNZxsO | 136 | 1911 |
| Remember Mac Jones is an NFL Top 100 player. Here he is showing off his elite pocket presence and elusiveness!🤣🤣🤣 https://t.co/DGkLNi5H42 | 128 | 1145 |
# summary stats of retweets and likes for tweets about Mac
summary(Mac$retweet_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   1.243   0.000 299.000 
summary(Mac$favorite_count)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    1.00   13.75    2.00 2765.00 
# daily tweet popularity related to Mac
MacDate <- Mac %>% 
  mutate(
    date = as.Date(created_at),
    hour = hour(created_at),
    minute = minute(created_at),
    second = second(created_at)
  ) %>% 
  mutate(
    format_date = format(date, "%m/%d/%Y"),
    format_hour = paste(hour, minute, second, sep = ":")
  )

TopDateMac <- MacDate %>%
  group_by(date) %>%
  slice(which.max(retweet_count))

ggplot(data = TopDateMac) + 
  geom_line(mapping = aes(x = date, y = retweet_count), size = 1.2) +
  theme_bw() +
  labs(title = "Retweet Count for Most Popular Daily Tweet About Mac Jones", 
       x = "Date", 
       y = "Retweet Count")

There was a preseason game on Aug 27th, which explains the peak there! Let’s see what the top tweet from Aug 23rd was about.

# keep the most retweeted tweet from Aug 23rd
Aug23 <- MacDate %>%
  filter(date == as.Date("2022-08-23")) %>%
  slice_max(retweet_count, n = 1) %>%
  select(date, text)
Aug23$text
[1] "Jayon Brown with his 2nd INT at Raiders joint practice against Mac Jones and takes hit to the house!\n\n#RaiderNation https://t.co/xMXfCO3XJX"

It was a tweet and video about a joint practice session with the Patriots and Raiders.

Most Retweeted Tweet from Aug 23rd
# Total number of daily tweets about Mac Jones
DailyMac <- MacDate %>%
  group_by(date) %>%
  count(date)

ggplot(data = DailyMac) + 
  geom_line(mapping = aes(x = date, y = n), size = 1.2) +
  theme_bw() +
  labs(title = "Daily Total Number of Tweets About Mac Jones", 
       x = "Date", 
       y = "Tweet Count")

A majority of the Tweets from this time period occurred on Aug 27th, when the Patriots played the Raiders in their final preseason match of the year.
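
One thing to keep in mind when bucketing tweets by day: assuming `created_at` is stored in UTC (as Twitter timestamps normally are), `as.Date()` can assign evening U.S. tweets to the next calendar day. A small sketch with a made-up timestamp, not one taken from the dataset:

```r
# Hypothetical timestamp: 9:30 PM Eastern on Aug 26, 2022
stamp <- as.POSIXct("2022-08-26 21:30:00", tz = "America/New_York")

# The same instant falls on different calendar days depending on the time zone
utc_day     <- as.Date(stamp, tz = "UTC")
eastern_day <- as.Date(stamp, tz = "America/New_York")

print(utc_day)
print(eastern_day)
```

Passing `tz` explicitly when deriving the date column is one way to make the daily counts line up with local game days.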

Data clean-up

library(tidytext)  # unnest_tokens()

# plot the top 20 words
Jones <- read_csv("MacJones_Tweets.csv")
Jones %>%
  dplyr::select(text) %>%
  unnest_tokens(word, text) %>%
  count(word, sort = TRUE) %>%
  top_n(20, n) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Top Word",
       y = "Count",
       title = "Count of top words found in tweets containing 'Mac Jones'") +
  theme_classic()

The text definitely needs to be cleaned up, removing stopwords and links.

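library(quanteda)            # corpus(), tokens(), dfm(), fcm()
library(quanteda.textplots)  # textplot_wordcloud(), textplot_network()
library(quanteda.textstats)  # textstat_frequency()

Jones_clean <- Jones
# remove links (t.co short URLs); the dot is escaped and the character class has no stray commas
Jones_clean$text <- str_replace_all(Jones_clean$text, "https://t\\.co/[A-Za-z0-9]+", "")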
# remove twitter handles
Jones_clean$text <- str_replace_all(Jones_clean$text, "@[[:alnum:]_]{4,}", "")
# create corpus
Jones_corpus <- corpus(Jones_clean$text)
summary(Jones_corpus, n = 10)
Corpus consisting of 6294 documents, showing 10 documents:

   Text Types Tokens Sentences
  text1    24     27         1
  text2    35     44         1
  text3    10     11         1
  text4    14     17         2
  text5    35     45         2
  text6    13     16         1
  text7    24     25         2
  text8    25     28         3
  text9    20     22         1
 text10     7      8         1
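
To see what link and handle removal does to a single tweet, here is a minimal sketch using base R's `gsub()` on a made-up tweet. The handle pattern matches the one above; the link pattern is a slightly tightened variant and should be read as an assumption rather than the exact expression used on the data.

```r
# Hypothetical example tweet (not from the dataset)
tweet <- "@PatsFan123 Mac Jones looked sharp tonight https://t.co/AbC123xyz"

# Drop t.co short links, then drop handles of four or more characters
no_links   <- gsub("https://t\\.co/[A-Za-z0-9]+", "", tweet)
no_handles <- gsub("@[[:alnum:]_]{4,}", "", no_links)

# Trim the whitespace left behind at either end
cleaned <- trimws(no_handles)
print(cleaned)
```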
# create custom stopwords for Mac and Jones
mystopwords <- c("Mac", "Jones", "mac", "jones")

# create document-feature matrix and clean up data
Jones_dfm <- tokens(Jones_corpus, 
                      remove_punct= TRUE,
                      remove_numbers = TRUE,
                      remove_symbols = TRUE) %>%
  tokens_tolower() %>%
  tokens_select(pattern=stopwords("en"),
                selection="remove") %>%
  tokens_remove(pattern = phrase(mystopwords), 
                valuetype = 'fixed') %>%
  dfm() 

textplot_wordcloud(Jones_dfm, max_words = 60, 
                   min_size = 2, max_size = 5.5)

# feature co-occurrence matrix (named Jones_fcm to avoid shadowing the fcm() function)
Jones_fcm <- fcm(Jones_dfm)
# Pull top features
fcm_feats <- names(topfeatures(Jones_fcm, 40))
# Retain top features in fcm
Jones_fcm <- fcm_select(Jones_fcm, pattern = fcm_feats, selection = "keep")

textplot_network(Jones_fcm, edge_color = "indianred1", edge_alpha = .3, 
                 vertex_labelcolor = "darkblue", vertex_color = "darkred", 
                 vertex_labelsize = 5.5, vertex_size = 3)

# plot the top 20 words after cleaning text
features_Jones_dfm <- textstat_frequency(Jones_dfm, n = 20)

# Sort by reverse frequency order
features_Jones_dfm$feature <- with(features_Jones_dfm, reorder(feature, -frequency))

ggplot(features_Jones_dfm, aes(x = feature, y = frequency)) +
  geom_col() +
  coord_flip() +
  labs(x = "Top Word",
       y = "Count",
       title = "Top words found in tweets containing 'Mac Jones'") +
  theme_classic()

Okay, now we have some more distinctive words related to Mac.
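
The effect of the clean-up can be illustrated on a tiny scale. The sketch below is base R only, with two made-up tweets and a made-up stopword list (far shorter than `stopwords("en")`), so it shows the idea rather than the exact pipeline used above.

```r
# Hypothetical tweets and a deliberately tiny stopword list
tweets <- c("Mac Jones with a laser to the end zone",
            "The Patriots and Mac Jones are back in the end zone")
toy_stopwords <- c("mac", "jones", "the", "a", "with", "to", "and", "are", "in")

# Lower-case, split on anything that is not a letter, drop empty strings and stopwords
tokens <- unlist(strsplit(tolower(tweets), "[^a-z]+"))
tokens <- tokens[nzchar(tokens) & !(tokens %in% toy_stopwords)]

# Count what is left
word_counts <- table(tokens)
print(sort(word_counts, decreasing = TRUE))
```

With the name and the filler words gone, the counts are dominated by content words, which is the same shift seen between the two bar charts.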

Sentiment Analysis

I’ll use the NRC Sentiment and Emotion Lexicons as a way to see how people are discussing Mac Jones.
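
Conceptually, a dictionary-based approach like this counts how many words in the text appear in each category's word list. A minimal base R sketch of that idea, using a made-up two-category lexicon and a made-up tokenized tweet (neither comes from the NRC lexicon or the dataset):

```r
# Toy lexicon: made-up word lists, NOT the actual NRC entries
toy_lexicon <- list(
  positive = c("win", "great", "sharp"),
  negative = c("pick", "bad", "sacked")
)

# Hypothetical tweet, already tokenized and lower-cased
tokens <- c("great", "throw", "then", "a", "bad", "pick")

# For each category, count the tokens that appear in its word list
toy_counts <- sapply(toy_lexicon, function(words) sum(tokens %in% words))
print(toy_counts)
```

A word can belong to more than one category in a real lexicon, so category totals are match counts rather than tweet counts.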

# data_dictionary_NRC is not part of core quanteda; it is assumed to come from an
# add-on package such as quanteda.dictionaries loaded in the setup chunk
NRC_Mac <- dfm_lookup(Jones_dfm, dictionary = data_dictionary_NRC)

# Total matches per category across all tweets; the labels are taken from the
# dictionary keys instead of being typed by hand, so they cannot get out of order
MacSummary <- data.frame(sentiment = featnames(NRC_Mac),
                         value = colSums(NRC_Mac),
                         row.names = NULL)
MacSummary
      sentiment value
1         anger  2130
2  anticipation  2751
3       disgust  1579
4          fear  1933
5           joy  1774
6      negative  3671
7      positive  4329
8       sadness  1881
9      surprise  1043
10        trust  2656
ggplot(data = MacSummary, mapping = aes(x = sentiment, y = value)) + 
  geom_col(colour = "red", fill = "darkblue") +
  theme_bw(base_size = 12) +
  theme(axis.text.x = element_text(angle=30, vjust = 0.7, size = 11)) +
  labs(title = "Sentiment/Emotion Analysis for Tweets Mentioning Mac Jones", 
       subtitle = "Using the NRC Data Dictionary",
       x = "Sentiment", y = "Sentiment Occurrence Count")

There is a high frequency of positive sentiment (4,329 matches), anticipation (2,751), and trust (2,656) heading into Jones’ 2nd season! There is also a large amount of negative sentiment (3,671)… we’ll see how this changes as the season progresses.
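
To put the positive and negative totals on a common scale, the counts from the summary table can be turned into shares. This is a small sketch in base R; the numbers are copied from the table above, and treating positive plus negative as the denominator is an illustrative choice, not something the NRC lexicon prescribes.

```r
# Category totals copied from the MacSummary table
counts <- c(anger = 2130, anticipation = 2751, disgust = 1579, fear = 1933,
            joy = 1774, negative = 3671, positive = 4329, sadness = 1881,
            surprise = 1043, trust = 2656)

# Share of polarity matches that are positive vs. negative
polarity  <- counts[c("positive", "negative")]
pos_share <- polarity[["positive"]] / sum(polarity)
neg_share <- polarity[["negative"]] / sum(polarity)
print(round(c(positive = pos_share, negative = neg_share), 3))

# The eight emotion categories ranked from most to least frequent
emotions <- counts[setdiff(names(counts), c("positive", "negative"))]
ranked   <- sort(emotions, decreasing = TRUE)
print(ranked)
```

On these totals, positive matches make up about 54% of the polarity matches, and anticipation and trust lead the emotion categories.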