Messi vs Ronaldo 2022 World Cup Sentiments

Introduction

During the 2022 World Cup knockout stage, social media activity surrounding two of the tournament’s most iconic players, Lionel Messi and Cristiano Ronaldo, intensified dramatically. As millions of fans reacted in real time to match results, individual performances, and off-field narratives, Twitter became a valuable source for measuring public sentiment toward each player. This analysis examines how sentiment toward Messi and Ronaldo evolved throughout the knockout rounds by tokenizing tweets, applying lexicon-based sentiment scoring, and aggregating trends over time. By visualizing daily sentiment patterns, we aim to determine whether Messi received more positive sentiment than Ronaldo during this high-stakes period.

My hypothesis is that Lionel Messi experienced stronger net-positive sentiment across the knockout stage, reflecting both his on-field success and the global narrative surrounding his pursuit of a first World Cup title.

The Data

The data for this project comes from a collection of tweets posted during the knockout stage of the 2022 World Cup. Each row in the dataset represents a single tweet, including the full text of the tweet, the date it was posted, and whether it referred to Lionel Messi or Cristiano Ronaldo. In other words, every line in the table corresponds to one person’s reaction, opinion, or comment about one of the two players. This tweet-level structure allowed me to analyze what people were saying, when they were saying it, and how positive or negative their reactions were throughout the tournament. You can download it below!

First, let’s load the necessary packages

library(tidyverse) # All the tidy things
library(lubridate) # Easily fixing pesky dates
library(tidytext)  # Tidy text mining
library(textdata)  # Lexicons of sentiment data
library(widyr)     # Easily calculating pairwise counts
library(igraph)    # Special graphs for network analysis
library(ggraph)    # An extension of ggplot for relational data

Now we can load in the raw data!

worldcup_tweets <- 
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/pelles1_xavier_edu/IQBTPymsGDhpSKAm_JfxhrE8AUQFl8urkJq72FFQ1zP-8Bo?download=1")

Data Cleaning & Wrangling

1) Date needs to be treated as a date variable

worldcup_tweets$Date <- as_date(worldcup_tweets$Date)

2) Let’s add a column to indicate whether the tweet mentions Ronaldo, Messi, Both, or Neither

worldcup_tweets <- worldcup_tweets %>%
  mutate(
    text_lower = str_to_lower(Tweet),  
    mentions_ronaldo = str_detect(text_lower, "ronaldo"),
    mentions_messi   = str_detect(text_lower, "messi"),
    player_category = case_when(
      mentions_ronaldo & mentions_messi ~ "Both",
      mentions_ronaldo ~ "Ronaldo",
      mentions_messi   ~ "Messi",
      TRUE             ~ "Neither"
    )
  )
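
To see how this labeling behaves, here is a minimal sketch on four made-up tweets. The names toy_tweets and toy_labeled are hypothetical and the tweets are not from the dataset; the classification logic is the same as in the step above.

```r
library(dplyr)
library(stringr)

# Made-up example tweets (not from the real dataset)
toy_tweets <- tibble(
  Tweet = c(
    "Messi scores again!",
    "RONALDO left on the bench",
    "Messi or Ronaldo, who is the GOAT?",
    "What a match tonight"
  )
)

toy_labeled <- toy_tweets %>%
  mutate(
    # Lowercase first so "RONALDO" and "ronaldo" are treated the same
    text_lower = str_to_lower(Tweet),
    mentions_ronaldo = str_detect(text_lower, "ronaldo"),
    mentions_messi   = str_detect(text_lower, "messi"),
    # case_when() keeps the first condition that matches,
    # so the "Both" check has to come before the single-player checks
    player_category = case_when(
      mentions_ronaldo & mentions_messi ~ "Both",
      mentions_ronaldo ~ "Ronaldo",
      mentions_messi   ~ "Messi",
      TRUE             ~ "Neither"
    )
  )

toy_labeled$player_category
```

The order of the conditions matters: if the "Both" line came last, a tweet naming both players would be labeled "Ronaldo" instead.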

3) Now we can filter the data to only tweets about either Ronaldo or Messi

worldcup_tweets <- worldcup_tweets %>%
  filter(player_category %in% c("Ronaldo", "Messi"))

The Visualizations

Q1) Most Used Words in Tweets about Each Player

1) Extract individual words from each tweet

tidy_worldcup <- 
  worldcup_tweets %>% 
  unnest_tokens(word, Tweet) %>%        # one row per word
  anti_join(stop_words, by = "word")    # drop common filler words

2) Simple word counts for each player

tidy_worldcup %>% 
  group_by(player_category, word) %>% 
  summarize(n = n()) %>% 
  arrange(-n)

# A tibble: 44,130 × 3
# Groups:   player_category [2]
   player_category word          n
   <chr>           <chr>     <int>
 1 Messi           messi     21581
 2 Messi           world     20654
 3 Messi           cup       19932
 4 Messi           goal      13120
 5 Messi           goals     12170
 6 Messi           t.co       8346
 7 Messi           https      8345
 8 Ronaldo         world      7051
 9 Messi           argentina  6832
10 Ronaldo         cup        6743
# ℹ 44,120 more rows
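
The tokens “t.co” and “https” in this table come from links inside the tweets rather than from anything fans wrote. One possible way to drop them, sketched here on two made-up tweets, is to strip URLs before tokenizing. The names toy_tweets and toy_tidy and the URL pattern are assumptions for illustration, and this step was not part of the analysis above.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Made-up tweets with shortened links (not from the real dataset)
toy_tweets <- tibble(
  player_category = c("Messi", "Ronaldo"),
  Tweet = c(
    "Messi wins the World Cup https://t.co/abc123",
    "Ronaldo misses the penalty https://t.co/xyz789"
  )
)

toy_tidy <- toy_tweets %>%
  # Remove anything that looks like a link before splitting into words
  mutate(Tweet = str_remove_all(Tweet, "https?://\\S+")) %>%
  unnest_tokens(word, Tweet) %>%
  anti_join(stop_words, by = "word")

toy_tidy
```

Applying something like this to the full dataset would change the word counts and rankings shown above, so the results would need to be rerun.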

3) Create word count bar graphs for Ronaldo & Messi

ronaldo_words <- tidy_worldcup %>%
  filter(player_category == "Ronaldo") %>%
  count(word, sort = TRUE) %>%           
  slice_max(order_by = n, n = 15)

messi_words <- tidy_worldcup %>%
  filter(player_category == "Messi") %>%
  count(word, sort = TRUE) %>%
  slice_max(order_by = n, n = 15)

A1) Most Used Words in Tweets about Each Player

ggplot(ronaldo_words, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(title = "Most Used Words in Ronaldo Tweets",
       x = "Word", y = "Count")

ggplot(messi_words, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "cyan") +
  coord_flip() +
  labs(title = "Most Used Words in Messi Tweets",
       x = "Word", y = "Count")

While the first half of the most common words for both players includes shared terms like the players’ names, “world”, “cup”, and “goal”, the latter half of Messi’s top words refers more to his victory in the final, including “Argentina”, “Mbappe”, and “worldcupfinal”. These read as more positive than the latter half of Ronaldo’s top words: “knockout”, “0”, and “penalty”.

Q2) Bing Sentiment Analysis

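1) Import the Bing lexicon & count words by player

bing <- 
  get_sentiments("bing")

worldcup_counts <- 
  tidy_worldcup %>% 
  group_by(player_category, word) %>% 
  summarize(n = n()) %>%          # times each word appears per player
  inner_join(bing, by = "word")   # keep only words found in the Bing lexicon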

2) Count the distinct positive and negative words for each player

worldcup_counts %>%
  group_by(player_category, sentiment) %>% 
  summarize(n = n()) %>% 
  arrange(-n)

# A tibble: 4 × 3
# Groups:   player_category [2]
  player_category sentiment     n
  <chr>           <chr>     <int>
1 Messi           negative    980
2 Messi           positive    682
3 Ronaldo         negative    556
4 Ronaldo         positive    363

A2) Bing Sentiment Analysis

worldcup_counts %>%
  filter(player_category == "Ronaldo", n > 50) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 40, size = 3) +
  labs(title = "Top 20 Positive and Negative Words for Ronaldo",
       subtitle = "Only showing words appearing more than 50 times")

worldcup_counts %>%
  filter(player_category == "Messi", n > 200) %>%
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col(fill = "cyan") +
  coord_flip() +
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 250, size = 3) +
  labs(title = "Top 20 Positive and Negative Words for Messi",
       subtitle = "Only showing words appearing more than 200 times")

While roughly half of the most common sentiment words in tweets about Ronaldo are negative, Messi has only 4 negative sentiment words. This is largely due to words like “champion” and “congratulations” peaking in popularity after Argentina won the World Cup.

Q3) NRC Lexicon Sentiment Analysis

1) Import NRC Lexicon Index & Create Sentiment Tables

nrc <- get_sentiments("nrc")

ronaldo_sentiment <- tidy_worldcup %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  filter(player_category == "Ronaldo") %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) 


messi_sentiment <- tidy_worldcup %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  filter(player_category == "Messi") %>%
  group_by(sentiment) %>%
  summarize(n = n()) %>%
  arrange(desc(n))

A3) NRC Lexicon Sentiment Analysis

ggplot(ronaldo_sentiment, aes(x = reorder(sentiment, n), y = n)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(title = "Ronaldo Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment")

ggplot(messi_sentiment, aes(x = reorder(sentiment, n), y = n)) +
  geom_col(fill = "cyan") +
  coord_flip() +
  labs(title = "Messi Sentiment Scores",
       subtitle = "Total number of emotive words scored",
       y = "Total Number of Words",
       x = "Emotional Sentiment")

Simply put, Messi’s top 3 emotional sentiment categories are Positive, Anticipation, and Trust, while Ronaldo’s are Positive, Negative, and Anticipation. Not only is “Negative” more common in Ronaldo tweets, but “Sadness” also ranks higher for him than for Messi, while the bottom 4 categories are otherwise the same for each.
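
Since the word counts earlier suggest there is considerably more Messi text than Ronaldo text, raw totals are hard to compare across the two charts. One option is to convert each category to a share of that player’s total. This is a minimal sketch on made-up totals; toy_sentiment and its numbers are hypothetical and only mimic the shape of the sentiment tables above.

```r
library(dplyr)

# Made-up sentiment totals for two players (illustration only)
toy_sentiment <- tibble(
  player_category = c("Messi", "Messi", "Ronaldo", "Ronaldo"),
  sentiment       = c("positive", "negative", "positive", "negative"),
  n               = c(6000, 2000, 1500, 1500)
)

toy_shares <- toy_sentiment %>%
  group_by(player_category) %>%
  # Divide each category by the player's own total so shares sum to 1 per player
  mutate(share = n / sum(n)) %>%
  ungroup()

toy_shares
```

With shares, a category like “Negative” can be compared between players even when one of them is mentioned far more often.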

Q4) Positive vs Negative Sentiment Over Time

1) Import the NRC lexicon, keep only its Positive/Negative categories, and join them to the tokenized tweets

nrc <- get_sentiments("nrc")

nrc_posneg <- nrc %>% 
  filter(sentiment %in% c("positive", "negative"))

# A word can repeat on both sides of this join, so declare it many-to-many
tidy_worldcup_sentiments <- tidy_worldcup %>%
  inner_join(nrc_posneg, by = "word", relationship = "many-to-many")

2) Assign point scores to sentiments and sum them by date and player

tidy_worldcup_sentiments <- tidy_worldcup_sentiments %>%
  mutate(score = if_else(sentiment == "positive", 1, -1))  

tweet_sentiments <- tidy_worldcup_sentiments %>%
  group_by(Date, player_category) %>%
  summarise(tweet_score = sum(score), .groups = "drop")

3) Aggregate by day (each date and player already has one row at this point, so the mean simply returns the daily net score)

daily_sentiment <- tweet_sentiments %>%
  group_by(Date, player_category) %>%
  summarise(avg_sentiment = mean(tweet_score), .groups = "drop")
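
For a true per-tweet average, each tweet needs its own identifier to group on before averaging by day. Here is a minimal sketch of that two-step aggregation on made-up scores. It assumes a hypothetical tweet_id column, which the dataset may not have, and the names toy_scored and toy_daily are also only for illustration.

```r
library(dplyr)

# Made-up scored words; tweet_id is a hypothetical identifier column
toy_scored <- tibble(
  tweet_id        = c(1, 1, 2, 3, 3, 3),
  Date            = as.Date(rep("2022-12-18", 6)),
  player_category = "Messi",
  score           = c(1, 1, -1, 1, 1, 1)
)

toy_daily <- toy_scored %>%
  # Step 1: one net score per tweet
  group_by(tweet_id, Date, player_category) %>%
  summarise(tweet_score = sum(score), .groups = "drop") %>%
  # Step 2: summarise those tweet scores within each day and player
  group_by(Date, player_category) %>%
  summarise(
    total_sentiment = sum(tweet_score),  # grows with the number of tweets
    avg_sentiment   = mean(tweet_score), # typical score of a single tweet
    n_tweets        = n(),
    .groups = "drop"
  )

toy_daily
```

The daily total rises whenever more people tweet, while the per-tweet mean only rises when individual tweets become more positive, so the two can tell different stories on busy match days.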

A4) Positive vs Negative Sentiment Over Time

daily_sentiment %>% 
  ggplot(aes(x = Date, y = avg_sentiment, color = player_category)) +
  geom_line(linewidth = 1.2) +   # linewidth replaces size for lines in ggplot2 3.4.0+
  theme_minimal() +
  labs(title = "Net Sentiment Through the Knockout Stage",
       x = "Date",
       y = "Daily Net Sentiment Score",
       color = NULL) +
  scale_color_manual(values = c("Messi" = "cyan", "Ronaldo" = "darkred"))

This graph shows that at the beginning of the knockout stage and after the World Cup ended, sentiment scores for both players remained fairly consistent. However, as Argentina progressed through the knockout stage, Messi’s score shot up with each big win, while Ronaldo’s stayed roughly flat. Because each daily score is a sum of word scores, part of Messi’s rise may also reflect a higher volume of tweets on match days.

Conclusion

In summary, the analysis of tweets during the 2022 World Cup knockout stage shows that overall public sentiment was higher for Lionel Messi than for Cristiano Ronaldo.

Both the Bing and NRC sentiment analyses indicate that tweets about Messi contained more positive words, while tweets about Ronaldo had a more neutral or mixed tone. Examining the most commonly used words further highlights this difference, with fans frequently celebrating Messi’s performances and achievements. When looking at sentiment over time, the scores for both players were fairly similar throughout most of the tournament. However, during the later stages, and especially in the final, Messi’s sentiment spiked dramatically, reflecting the excitement and admiration of fans as he led his team to victory. Overall, these results suggest that Messi inspired stronger positive reactions on social media than Ronaldo during this high-stakes period.