Evaluating Twitter sentiment about Mac Jones heading into his 2nd season as the New England Patriots’ starting quarterback
The New England Patriots played their first NFL preseason game at Gillette Stadium on August 11th, 2022. Mac Jones, a quarterback out of the University of Alabama, became the Pats’ starter as a rookie last year. Heading into year 2 for Jones, there is plenty of discussion and speculation about how the Pats’ season will go and how Jones will perform. Let’s see how the QB has been discussed on Twitter since the start of this year’s preseason.
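library(tidyverse)              # dplyr, ggplot2, readr, stringr
library(lubridate)              # hour(), minute(), second()
library(knitr)                  # kable()
library(kableExtra)             # kable_styling(), scroll_box()
library(tidytext)               # unnest_tokens()
library(quanteda)               # corpus(), tokens(), dfm(), fcm()
library(quanteda.textplots)     # textplot_wordcloud(), textplot_network()
library(quanteda.textstats)     # textstat_frequency()
library(quanteda.dictionaries)  # data_dictionary_NRC (assumed source of the NRC dictionary)
# Scrape Twitter (requires the rtweet package)
# RawData <- rtweet::search_tweets(q = "Mac+Jones OR mac+jones",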
# n = 10000,
# type = "recent",
# lang = "en",
# include_rts = FALSE,
# retryonratelimit = TRUE)
# Save to files
# write_csv(RawData,"MacJones_Tweets.csv")
Unfortunately, due to limited authorization, I could only pull tweets as far back as 08/20, but there are still plenty of tweets to work with here (6,294). So we’ll be looking at Tweets mentioning Mac Jones by name between August 20th, 2022 and August 29th, 2022.
Mac <- read_csv("MacJones_Tweets.csv")
# 10 most retweeted
MacTopRT <- Mac %>% arrange(desc(retweet_count))
MacTopRT <- MacTopRT[1:10,]
select(MacTopRT, "text", "retweet_count", "favorite_count") %>%
kable(col.names = c("Tweet", "Retweets", "Likes")) %>%
kable_styling() %>% scroll_box(width = "100%", height = "500px")
| Tweet | Retweets | Likes |
|---|---|---|
| Jayon Brown with his 2nd INT at Raiders joint practice against Mac Jones and takes hit to the house! #RaiderNation https://t.co/xMXfCO3XJX | 299 | 2765 |
| Mac Jones what you doing? 👀 https://t.co/WhtOxpzDdW | 262 | 2272 |
| MaC jOnEs Is A pRo BoWlEr 😂😂😂 https://t.co/FBOdw7I4ad | 259 | 2027 |
| Mac Jones with a laser to Luke Masterson https://t.co/ptLbai7rdf | 202 | 2236 |
| 2nd year starters preseason PFF grades: 1: Justin Fields (90.4) 2: Trevor Lawrence (60.2) 3: Trey Lance (59.4) 4: Mac Jones (57.4) 5: Davis Mills (54.5) 6: Zach Wilson (48.4) | 197 | 1729 |
| Let’s settle this once and for all. Who’s your ride or die? Like - Mac Jones Retweet - Tom Brady | 180 | 738 |
| #Patriots Mac Jones throws a pick to #Raiders rookie LB Luke Masterson. https://t.co/7CmBjwaQPZ | 178 | 1598 |
| Top 10 in NFL Jersey Sales 1 - Josh Allen 2 - Joe Burrow 3 - Jonathan Taylor 4 - Justin Herbert 5 - TJ Watt 6 - Tom Brady 7 - Mac Jones 8 - Maxx Crosby 9 - Kenny Pickett 10 - Baker Mayfield | 157 | 2082 |
| Ravens released WR Slade Bolden, one of Mac Jones’ favorite targets at Alabama. Need the reunion asap. https://t.co/prYKwNZxsO | 136 | 1911 |
| Remember Mac Jones is an NFL Top 100 player. Here he is showing off his elite pocket presence and elusiveness!🤣🤣🤣 https://t.co/DGkLNi5H42 | 128 | 1145 |
# summary stats of retweets and likes for tweets about Mac
summary(Mac$retweet_count)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 0.000 0.000 1.243 0.000 299.000
summary(Mac$favorite_count)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 1.00 13.75 2.00 2765.00
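Both summaries are heavily right-skewed: the median tweet has 0 retweets and 1 like, while the means (1.243 and 13.75) are pulled up by a handful of popular tweets. Here is a minimal sketch of how that skew could be checked, using made-up counts rather than the real data. The `retweets` vector is hypothetical; with the real data the same calls would be applied to `Mac$retweet_count`.

```r
# made-up retweet counts with a similar shape: mostly zeros, one viral tweet
retweets <- c(0, 0, 0, 0, 0, 0, 0, 1, 2, 299)

avg_rt     <- mean(retweets)       # 30.2, dominated by the single large value
median_rt  <- median(retweets)     # 0
share_zero <- mean(retweets == 0)  # 0.7, i.e. 70% of tweets were never retweeted

# upper quantiles show where the long tail starts
quantile(retweets, probs = c(0.5, 0.9, 0.99))
```

With a distribution like this, the median and the share of zeros probably say more about a typical tweet than the mean does.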
# daily tweet popularity related to Mac
MacDate <- Mac %>%
mutate(
date = as.Date(created_at),
hour = hour(created_at),
minute = minute(created_at),
second = second(created_at)
) %>%
mutate(
format_date = format(date, "%m/%d/%Y"),
format_hour = paste(hour, minute, second, sep = ":")
)
TopDateMac <- MacDate %>%
group_by(date) %>%
slice(which.max(retweet_count))
ggplot(data = TopDateMac) +
geom_line(mapping = aes(x = date, y = retweet_count), size = 1.2) +
theme_bw() +
labs(title = "Retweet Count for Most Popular Daily Tweet About Mac Jones",
x = "Date",
y = "Retweet Count")
There was a preseason game on Aug 27th, which explains the peak there! Let’s see what the top tweet was about from Aug 23rd.
# compare the Date column directly instead of pattern-matching its string form
Aug23 <- MacDate %>%
  filter(date == as.Date("2022-08-23")) %>%
  slice_max(retweet_count, n = 1) %>%
  select(date, text)
Aug23$text
[1] "Jayon Brown with his 2nd INT at Raiders joint practice against Mac Jones and takes hit to the house!\n\n#RaiderNation https://t.co/xMXfCO3XJX"
It was a tweet and video about a joint practice session with the Patriots and Raiders.
# Total number of daily tweets about Mac Jones
DailyMac <- MacDate %>%
group_by(date) %>%
count(date)
ggplot(data = DailyMac) +
geom_line(mapping = aes(x = date, y = n), size = 1.2) +
theme_bw() +
labs(title = "Daily Total Number of Tweets About Mac Jones",
x = "Date",
y = "Tweet Count")
A majority of the Tweets from this time period occurred on Aug 27th, when the Patriots played the Raiders in their final preseason match of the year.
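One thing worth keeping in mind when reading the daily plots: `as.Date()` on a date-time uses UTC by default, so if `created_at` is stored in UTC (an assumption here, though it is typical for Twitter data), a tweet posted on a U.S. evening is counted on the next calendar day. A small sketch with a hypothetical timestamp shows the difference:

```r
# hypothetical timestamp: 8:15 pm Eastern on Aug 26th, stored as UTC
posted_utc <- as.POSIXct("2022-08-27 00:15:00", tz = "UTC")

utc_day     <- as.Date(posted_utc)                           # "2022-08-27"
eastern_day <- as.Date(posted_utc, tz = "America/New_York")  # "2022-08-26"

utc_day
eastern_day
```

Passing `tz = "America/New_York"` when building the `date` column would group tweets by the local day instead, which may shift some evening tweets back by one day.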
# plot the top 20 words
Jones <- read_csv("MacJones_Tweets.csv")
Jones %>%
dplyr::select(text) %>%
unnest_tokens(word, text) %>%
count(word, sort = TRUE) %>%
top_n(20) %>%
mutate(word = reorder(word, n)) %>%
ggplot(aes(x = word, y = n)) +
geom_col() +
xlab(NULL) +
coord_flip() +
labs(x = "Top Word",
y = "Count",
title = "Count of top words found in tweets containing 'Mac Jones'") +
theme_classic()
The text definitely needs to be cleaned up, removing stopwords and links.
Jones_clean <- Jones
# remove links
Jones_clean$text <- str_replace_all(Jones_clean$text, "https://t\\.co/[A-Za-z0-9]+", "")
# remove twitter handles
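Jones_clean$text <- str_replace_all(Jones_clean$text, "@[[:alnum:]_]{4,}", "")
# build the corpus used below (assumed call, inferred from the summary output that follows)
Jones_corpus <- corpus(Jones_clean$text)
summary(Jones_corpus, 10)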
Corpus consisting of 6294 documents, showing 10 documents:
Text Types Tokens Sentences
text1 24 27 1
text2 35 44 1
text3 10 11 1
text4 14 17 2
text5 35 45 2
text6 13 16 1
text7 24 25 2
text8 25 28 3
text9 20 22 1
text10 7 8 1
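To sanity-check the link and handle removal, it can help to run the patterns on a couple of made-up strings first. This is only a sketch: `clean_tweet()` is a hypothetical helper, the example tweets are invented, and `str_squish()` is an extra step to tidy leftover whitespace. The link pattern is written with `[A-Za-z0-9]` because commas inside a character class are matched literally.

```r
library(stringr)

# made-up examples, not taken from the dataset
raw <- c("Mac Jones with a laser https://t.co/AbC123 @Patriots",
         "Watch this https://t.co/xYz789, then decide")

# hypothetical helper wrapping the cleaning steps
clean_tweet <- function(x) {
  x <- str_replace_all(x, "https://t\\.co/[A-Za-z0-9]+", "")  # drop t.co links
  x <- str_replace_all(x, "@[[:alnum:]_]{4,}", "")            # drop handles of 4+ characters
  str_squish(x)                                               # collapse leftover whitespace
}

clean_tweet(raw)
# [1] "Mac Jones with a laser"   "Watch this , then decide"
```

Note that the comma after the link in the second example survives, so punctuation is left for the tokenizer to handle.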
# create custom stopwords for Mac and Jones
mystopwords <- c("Mac", "Jones", "mac", "jones")
# create document-feature matrix and clean up data
Jones_dfm <- tokens(Jones_corpus,
remove_punct= TRUE,
remove_numbers = TRUE,
remove_symbols = TRUE) %>%
tokens_tolower() %>%
tokens_select(pattern=stopwords("en"),
selection="remove") %>%
tokens_remove(pattern = phrase(mystopwords),
valuetype = 'fixed') %>%
dfm()
textplot_wordcloud(Jones_dfm, max_words = 60,
min_size = 2, max_size = 5.5)
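To make the tokenize, filter, and count steps easier to follow, here is a minimal sketch of the same kind of pipeline on two invented tweets. The `toy` texts are hypothetical, and the steps are written as separate assignments rather than a pipe only to keep each one visible.

```r
library(quanteda)

# toy tweets (made up for illustration, not from the dataset)
toy <- c("Mac Jones looks great in the pocket today",
         "Not a great throw by Mac Jones, that is a pick")

toks <- tokens(toy, remove_punct = TRUE)                  # split into words, drop punctuation
toks <- tokens_tolower(toks)                              # normalise case
toks <- tokens_remove(toks, pattern = stopwords("en"))    # drop common English stopwords
toks <- tokens_remove(toks, pattern = c("mac", "jones"))  # drop the search terms themselves

toy_dfm <- dfm(toks)                                      # one row per tweet, one column per word
top_words <- topfeatures(toy_dfm, 5)
top_words
```

In this toy case "great" is the only word that appears in both tweets, so it comes out on top with a count of 2.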
# feature co-occurrence matrix
Jones_fcm <- fcm(Jones_dfm)
# Pull top features
fcm_feats <- names(topfeatures(Jones_fcm, 40))
# Retain top features in fcm
Jones_fcm <- fcm_select(Jones_fcm, pattern = fcm_feats, selection = "keep")
textplot_network(Jones_fcm, edge_color = "indianred1", edge_alpha = .3,
vertex_labelcolor = "darkblue", vertex_color = "darkred",
vertex_labelsize = 5.5, vertex_size = 3)
# plot the top 20 words after cleaning text
features_Jones_dfm <- textstat_frequency(Jones_dfm, n = 20)
# Sort by reverse frequency order
features_Jones_dfm$feature <- with(features_Jones_dfm, reorder(feature, -frequency))
ggplot(features_Jones_dfm, aes(x = feature, y = frequency)) +
geom_col() +
xlab(NULL) +
coord_flip() +
labs(x = "Top Word",
y = "Count",
title = "Top words found in tweets containing 'Mac Jones'") +
theme_classic()
Okay, now we have some more meaningful words related to Mac.
I’ll use the NRC Sentiment and Emotion Lexicons as a way to see how people are discussing Mac Jones.
NRC_Mac <- dfm_lookup(Jones_dfm, dictionary = data_dictionary_NRC)
Mac_Sent <- convert(NRC_Mac, to = "data.frame")
# total matches per NRC category across all tweets (drop the doc_id column first)
Mac_Totals <- colSums(subset(Mac_Sent, select = -c(doc_id)))
# take the category names from the lookup result rather than hardcoding their order
MacSummary <- data.frame(sentiment = names(Mac_Totals),
                         value = unname(Mac_Totals))
MacSummary
sentiment value
1 anger 2130
2 anticipation 2751
3 disgust 1579
4 fear 1933
5 joy 1774
6 negative 3671
7 positive 4329
8 sadness 1881
9 surprise 1043
10 trust 2656
ggplot(data = MacSummary, mapping = aes(x = sentiment, y = value)) +
geom_col(colour = "red", fill = "darkblue") +
theme_bw(base_size = 12) +
theme(axis.text.x = element_text(angle=30, vjust = 0.7, size = 11)) +
labs(title = "Sentiment/Emotion Analysis for Tweets Mentioning Mac Jones",
subtitle = "Using the NRC Data Dictionary",
x = "Sentiment", y = "Sentiment Occurrence Count")
Positive sentiment (4,329 matches), anticipation (2,751), and trust (2,656) all show up frequently heading into Jones’ 2nd season! But so does negative sentiment (3,671)… we’ll see how this changes as the season progresses.
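Raw counts are hard to compare at a glance, so here is a rough sketch that turns the totals from the table above into shares. The numbers are copied from that output; the variable names are just for illustration. The two polarity categories (positive, negative) are treated separately from the eight emotions.

```r
# totals copied from the NRC summary table above
counts <- c(anger = 2130, anticipation = 2751, disgust = 1579, fear = 1933,
            joy = 1774, negative = 3671, positive = 4329, sadness = 1881,
            surprise = 1043, trust = 2656)

# share of positive vs. negative among polarity matches
polarity <- counts[c("positive", "negative")]
polarity_share <- round(polarity / sum(polarity), 3)
polarity_share
# positive negative
#    0.541    0.459

# share of each emotion among emotion matches, largest first
emotions <- counts[setdiff(names(counts), c("positive", "negative"))]
emotion_share <- round(sort(emotions / sum(emotions), decreasing = TRUE), 3)
emotion_share
```

So positive matches edge out negative ones by roughly 54% to 46%, with anticipation and trust as the two most common emotions.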