library(tidyverse) # All the tidy things
library(lubridate) # Easily fixing pesky dates
library(tidytext) # Tidy text mining
library(textdata) # Lexicons of sentiment data
library(widyr) # Easily calculating pairwise counts
library(igraph) # Special graphs for network analysis
library(ggraph) # An extension of ggplot for relational data

Messi vs Ronaldo 2022 World Cup Sentiments
Introduction
During the 2022 World Cup knockout stage, social media activity surrounding two of the tournament’s most iconic players, Lionel Messi and Cristiano Ronaldo, intensified dramatically. As millions of fans reacted in real time to match results, individual performances, and off-field narratives, Twitter became a valuable source for measuring public sentiment toward each player. This analysis examines how sentiment toward Messi and Ronaldo evolved throughout the knockout rounds by tokenizing tweets, applying lexicon-based sentiment scoring, and aggregating trends over time. By visualizing daily sentiment patterns, we aim to determine whether Messi received more positive sentiment than Ronaldo during this high-stakes period.
My hypothesis is that Lionel Messi experienced a stronger net-positive sentiment across the knockout stage, reflecting both his on-field success and the global narrative surrounding his pursuit of a first World Cup title.
The Data
The data for this project comes from a collection of tweets posted during the knockout stage of the 2022 World Cup. Each row in the dataset represents a single tweet, including the full text of the tweet, the date it was posted, and whether it referred to Lionel Messi or Cristiano Ronaldo. In other words, every line in the table corresponds to one person’s reaction, opinion, or comment about one of the two players. This tweet-level structure allowed me to analyze what people were saying, when they were saying it, and how positive or negative their reactions were throughout the tournament. You can download it below!
First, let’s load the necessary packages
Now we can load in the raw data!
worldcup_tweets <-
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/pelles1_xavier_edu/IQBTPymsGDhpSKAm_JfxhrE8AUQFl8urkJq72FFQ1zP-8Bo?download=1")

Data Cleaning & Wrangling
1) Date needs to be treated as a date variable
worldcup_tweets$Date <- as_date(worldcup_tweets$Date)

2) Let’s add a column to indicate whether the tweet mentions Ronaldo, Messi, Both, or Neither
worldcup_tweets <- worldcup_tweets %>%
  mutate(
    text_lower = str_to_lower(Tweet),
    mentions_ronaldo = str_detect(text_lower, "ronaldo"),
    mentions_messi = str_detect(text_lower, "messi"),
    player_category = case_when(
      mentions_ronaldo & mentions_messi ~ "Both",
      mentions_ronaldo ~ "Ronaldo",
      mentions_messi ~ "Messi",
      TRUE ~ "Neither"
    )
  )

3) Now we can filter the data to only tweets about either Ronaldo or Messi
worldcup_tweets <- worldcup_tweets %>%
  filter(player_category %in% c("Ronaldo", "Messi"))

The Visualizations
Q1) Most Used Words in Tweets about Each Player
1) Extract individual words from each tweet
tidy_worldcup <-
  worldcup_tweets %>%
  unnest_tokens(word, Tweet) %>%
  anti_join(stop_words)

2) Simple word counts for each player
tidy_worldcup %>%
  group_by(player_category, word) %>%
  summarize(n = n()) %>%
  arrange(-n)

# A tibble: 44,130 × 3
# Groups:   player_category [2]
   player_category word          n
   <chr>           <chr>     <int>
 1 Messi           messi     21581
 2 Messi           world     20654
 3 Messi           cup       19932
 4 Messi           goal      13120
 5 Messi           goals     12170
 6 Messi           t.co       8346
 7 Messi           https      8345
 8 Ronaldo         world      7051
 9 Messi           argentina  6832
10 Ronaldo         cup        6743
# ℹ 44,120 more rows
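The counts above include "t.co" and "https", which are fragments of shortened links rather than words fans chose. One possible way to keep them out of the rankings is to strip URLs before tokenizing. The sketch below is only an illustration under stated assumptions: `toy_tweets` is made-up data standing in for `worldcup_tweets`, and the regular expression is a simple guess at what a link looks like, not a complete URL matcher.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Made-up tweets standing in for worldcup_tweets (same Tweet column name)
toy_tweets <- tibble(
  Tweet = c(
    "Messi scores again! https://t.co/abc123",
    "Ronaldo misses the penalty https://t.co/xyz789"
  )
)

toy_tidy <- toy_tweets %>%
  # Drop anything that starts with http:// or https:// up to the next space
  mutate(Tweet = str_remove_all(Tweet, "https?://\\S+")) %>%
  # Split into lower-cased words, then remove common stop words
  unnest_tokens(word, Tweet) %>%
  anti_join(stop_words, by = "word")

toy_tidy
```

Applied to the real data, the same `mutate()` step would sit just before `unnest_tokens()`; the word counts reported in this post were produced without it.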
3) Create word count bar graphs for Ronaldo & Messi
ronaldo_words <- tidy_worldcup %>%
  filter(player_category == "Ronaldo") %>%
  count(word, sort = TRUE) %>%
  slice_max(order_by = n, n = 15)

messi_words <- tidy_worldcup %>%
  filter(player_category == "Messi") %>%
  count(word, sort = TRUE) %>%
  slice_max(order_by = n, n = 15)

A1) Most Used Words in Tweets about Each Player
ggplot(ronaldo_words, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(title = "Most Used Words in Ronaldo Tweets",
       x = "Word", y = "Count")

ggplot(messi_words, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "cyan") +
  coord_flip() +
  labs(title = "Most Used Words in Messi Tweets",
       x = "Word", y = "Count")

The top half of each player’s list is dominated by generic terms such as the player’s name, “world”, “cup”, and “goal”. The latter half of Messi’s list, however, includes more words referring to his victory in the final, such as “Argentina”, “Mbappe”, and “worldcupfinal”. These read as more positive than the latter half of Ronaldo’s list: “knockout”, “0”, and “penalty”.
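Comparing the two lists is easier when both players share one figure. A minimal sketch of that idea follows, using `reorder_within()` and `scale_y_reordered()` from tidytext so each panel sorts its own bars. The `toy_counts` table and its numbers are invented for illustration; with the real data, the input would be word counts per `player_category`.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Invented word counts standing in for counts per player and word
toy_counts <- tibble(
  player_category = c("Messi", "Messi", "Messi", "Ronaldo", "Ronaldo", "Ronaldo"),
  word = c("goal", "argentina", "final", "goal", "penalty", "knockout"),
  n = c(30, 20, 10, 12, 9, 5)
)

top_words <- toy_counts %>%
  group_by(player_category) %>%
  slice_max(order_by = n, n = 2) %>%   # keep the top 2 words per player
  ungroup() %>%
  # Order words by n separately inside each player's panel
  mutate(word_ordered = reorder_within(word, n, player_category))

both_players_plot <- ggplot(top_words, aes(x = n, y = word_ordered, fill = player_category)) +
  geom_col(show.legend = FALSE) +
  scale_y_reordered() +                # strips the suffix added by reorder_within()
  facet_wrap(~ player_category, scales = "free_y") +
  scale_fill_manual(values = c("Messi" = "cyan", "Ronaldo" = "darkred")) +
  labs(title = "Most Used Words by Player", x = "Count", y = "Word")

both_players_plot
```

The colors match the ones used in the separate charts, so the faceted version can be read the same way.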
Q2) Bing Sentiment Analysis
1) Import Bing index & group by word
bing <-
  get_sentiments("bing")

worldcup_counts <-
  tidy_worldcup %>%
  group_by(player_category, word) %>%
  summarize(n = n()) %>%
  inner_join(bing)

2) Assign positive/negative values for each word
worldcup_counts %>%
  group_by(player_category, sentiment) %>%
  summarize(n = n()) %>%
  arrange(-n)

# A tibble: 4 × 3
# Groups:   player_category [2]
  player_category sentiment     n
  <chr>           <chr>     <int>
1 Messi           negative    980
2 Messi           positive    682
3 Ronaldo         negative    556
4 Ronaldo         positive    363
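Because `worldcup_counts` already has one row per player and word, `summarize(n = n())` in the step above counts how many distinct sentiment words appear, not how often they were used. If total usage is of interest, the frequency column can be summed instead. The sketch below shows both side by side on an invented table; `toy_counts` and its values are assumptions for illustration only.

```r
library(dplyr)

# Invented stand-in for worldcup_counts: one row per player/word with frequency n
toy_counts <- tibble(
  player_category = c("Messi", "Messi", "Messi", "Ronaldo", "Ronaldo"),
  word = c("champion", "win", "lost", "miss", "great"),
  n = c(500, 300, 40, 120, 80),
  sentiment = c("positive", "positive", "negative", "negative", "positive")
)

sentiment_totals <- toy_counts %>%
  group_by(player_category, sentiment) %>%
  summarize(
    distinct_words = n(),        # number of different words (what n() reports)
    total_occurrences = sum(n),  # how many times those words were used
    .groups = "drop"
  )

sentiment_totals
```

The two columns can tell different stories: a player may have many distinct negative words that are each used rarely, and a few positive words that are used very often.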
A2) Bing Sentiment Analysis
worldcup_counts %>%
  filter(player_category == "Ronaldo", n > 50) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 40, size = 3) +
  labs(title = "Top 20 Positive and Negative Words for Ronaldo",
       subtitle = "Only showing words appearing more than 50 times")

worldcup_counts %>%
  filter(player_category == "Messi", n > 200) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col(fill = "cyan") +
  coord_flip() +
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 250, size = 3) +
  labs(title = "Top 20 Positive and Negative Words for Messi",
       subtitle = "Only showing words appearing more than 200 times")

While roughly half of the most common sentiment words in tweets about Ronaldo are negative, only 4 of Messi’s are. This is largely due to words like “champion” and “congratulations” peaking in popularity after Argentina won the World Cup.
Q3) NRC Lexicon Sentiment Analysis
1) Import NRC Lexicon Index & Create Sentiment Tables
nrc <- get_sentiments("nrc")

ronaldo_sentiment <- tidy_worldcup %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  filter(player_category == "Ronaldo") %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  arrange(desc(n))

messi_sentiment <- tidy_worldcup %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  filter(player_category == "Messi") %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  arrange(desc(n))

A3) NRC Lexicon Sentiment Analysis
ggplot(ronaldo_sentiment, aes(x = reorder(sentiment, n), y = n)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(title = "Ronaldo Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment")

ggplot(messi_sentiment, aes(x = reorder(sentiment, n), y = n)) +
  geom_col(fill = "cyan") +
  coord_flip() +
  labs(title = "Messi Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment")

Simply put, Messi’s top 3 emotional sentiment categories are Positive, Anticipation, and Trust, while Ronaldo’s are Positive, Negative, and Anticipation. Not only is “Negative” more prominent in tweets about Ronaldo, but “Sadness” also ranks higher for him than for Messi. The bottom 4 categories are otherwise the same for both players.
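The two charts plot raw word totals, and the dataset contains far more Messi tweets than Ronaldo tweets, so the bar lengths are not directly comparable across players. One way to put them on the same footing is to convert each category to a share of that player’s scored words. The sketch below assumes a combined table with a `player_category` column; `toy_sentiment` and its numbers are invented for illustration.

```r
library(dplyr)

# Invented category totals; the real input would combine the two sentiment tables
toy_sentiment <- tibble(
  player_category = c("Messi", "Messi", "Messi", "Ronaldo", "Ronaldo", "Ronaldo"),
  sentiment = c("positive", "negative", "trust", "positive", "negative", "trust"),
  n = c(600, 200, 200, 150, 100, 50)
)

sentiment_share <- toy_sentiment %>%
  group_by(player_category) %>%
  mutate(share = n / sum(n)) %>%   # fraction of this player's scored words
  ungroup() %>%
  arrange(player_category, desc(share))

sentiment_share
```

In this toy example Ronaldo has fewer negative words than Messi in absolute terms, but a larger negative share, which is the kind of difference the rankings in the text are pointing at.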
Q4) Positive vs Negative Sentiment Over Time
1) Import NRC Lexicon Index and add a column to classify each word as Positive/Negative
nrc <- get_sentiments("nrc")

nrc_posneg <- nrc %>%
  filter(sentiment %in% c("positive", "negative"))

tidy_worldcup_sentiments <- tidy_worldcup %>%
  inner_join(nrc_posneg, by = "word")

2) Assign point scores to sentiments and group by date
tidy_worldcup_sentiments <- tidy_worldcup_sentiments %>%
  mutate(score = if_else(sentiment == "positive", 1, -1))

tweet_sentiments <- tidy_worldcup_sentiments %>%
  group_by(Date, player_category) %>%
  summarise(tweet_score = sum(score), .groups = "drop")

3) Aggregate to group by Day
daily_sentiment <- tweet_sentiments %>%
  group_by(Date, player_category) %>%
  summarise(avg_sentiment = mean(tweet_score), .groups = "drop")

A4) Positive vs Negative Sentiment Over Time
daily_sentiment %>%
  ggplot(aes(x = Date, y = avg_sentiment, color = player_category)) +
  geom_line(linewidth = 1.2) +
  theme_minimal() +
  labs(title = "Average Sentiment Through the Knockout Stage",
       x = "Date",
       y = "Average Tweet Sentiment",
       color = NULL) +
  scale_color_manual(values = c("Messi" = "cyan", "Ronaldo" = "darkred"))

This graph shows that at the start of the knockout stage and after the World Cup ended, average sentiment scores for both players remained fairly consistent. However, as Argentina progressed through the knockout stage, Messi’s average sentiment score shot up with each big win, while Ronaldo’s stayed the same.
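One detail about the steps above: both summaries group by `Date` and `player_category`, so each group in the second step holds a single row and the mean equals the daily net score. To get a true per-tweet average, each tweet needs its own identifier before tokenizing. The sketch below assumes a hypothetical `tweet_id` column (for example, one created with `mutate(tweet_id = row_number())` on the raw tweets); the toy rows are invented for illustration.

```r
library(dplyr)

# Invented scored tokens; tweet_id is a hypothetical per-tweet identifier
toy_scored <- tibble(
  tweet_id = c(1, 1, 2, 3, 3, 3),
  Date = as.Date(rep("2022-12-18", 6)),
  player_category = c("Messi", "Messi", "Messi", "Ronaldo", "Ronaldo", "Ronaldo"),
  score = c(1, 1, -1, -1, -1, 1)
)

# Step 1: net score of each individual tweet
per_tweet <- toy_scored %>%
  group_by(tweet_id, Date, player_category) %>%
  summarise(tweet_score = sum(score), .groups = "drop")

# Step 2: average of those tweet scores per day and player
daily_avg <- per_tweet %>%
  group_by(Date, player_category) %>%
  summarise(avg_sentiment = mean(tweet_score), .groups = "drop")

daily_avg
```

Averaging per tweet keeps days with heavy tweet volume from dominating the line purely because more words were posted.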
Conclusion
In summary, the analysis of tweets during the 2022 World Cup knockout stage shows that overall public sentiment was higher for Lionel Messi than for Cristiano Ronaldo.
Both the Bing and NRC sentiment analyses indicate that tweets about Messi contained more positive words, while tweets about Ronaldo had a more neutral or mixed tone. Examining the most commonly used words further highlights this difference, with fans frequently celebrating Messi’s performances and achievements. When looking at sentiment over time, the scores for both players were fairly similar throughout most of the tournament. However, during the later stages, and especially in the final, Messi’s sentiment spiked dramatically, reflecting the excitement and admiration of fans as he led his team to victory. Overall, these results suggest that Messi inspired stronger positive reactions on social media than Ronaldo during this high-stakes period.