Predicting whether Liverpool FC will make the top 4 using text analytics

Introduction

The English Premier League is one of the biggest soccer leagues in the world, watched by millions of fans worldwide. Of the 20 clubs in the league, the four that finish at the top qualify for the UEFA Champions League. Every club in the Premier League aims to win the title, but a top-four finish is an achievement in its own right. Six clubs are recognized as “the big six”, and Liverpool Football Club (LFC) is one of them. Liverpool are the reigning Premier League champions, having won the title last season.

This season has not gone as planned for Liverpool. With the club as reigning champions, fans expected it to challenge for the trophy again. Instead, Liverpool have had a roller-coaster season, with the squad depleted by injuries to many big-name players. For a club as successful as Liverpool, a top-four finish is now the minimum goal. It would earn them a place in the Champions League, where the top clubs from across Europe compete to be crowned champions of Europe. Qualification is not only about playing in the competition but also about the large revenue it generates. Big clubs like Liverpool depend on that revenue to strengthen their squad, facilities, coaching staff, and so on. Missing out on the Champions League would hurt the club's revenue streams and make its finances more difficult, especially after the financial damage caused by the COVID-19 pandemic.

For this project, I scraped articles from three websites: a neutral source, The Guardian newspaper; a Liverpool-favoring blog, This is Anfield; and a betting odds discussion in The Telegraph newspaper. I chose these three sources because they offer different perspectives, and it will be interesting to compare the sentiment expressed in each.

1. The Guardian: Jürgen Klopp’s fallen Premier League kings have five games to save season

# Packages used throughout the analysis
library(rvest)     # read_html(), html_nodes(), html_text()
library(quanteda)  # corpus(), tokens(), dictionary(), dfm(), dfm_lookup(), dfm_weight(), convert()
library(dplyr)     # mutate(), rename(), select(); also re-exports %>%
library(tidyr)     # gather()
library(forcats)   # as_factor()
library(ggplot2)

webpage_guardian = read_html("https://www.theguardian.com/football/2021/may/01/liverpool-jurgen-klopp-fallen-premier-league-kings-five-games-to-save-season-manchester-united-jurgen-klopp")

# Each <p> element becomes one document (paragraph) in the corpus
data_guardian = webpage_guardian %>% html_nodes("p") %>% html_text()

guardian = data.frame(text = data_guardian)
my.corpus.guardian = corpus(guardian)

# Tokenize, dropping punctuation, symbols and numbers and splitting hyphenated words
toks.guardian = tokens(my.corpus.guardian,
                       what = "word",
                       remove_punct = TRUE,
                       remove_symbols = TRUE,
                       remove_numbers = TRUE,
                       remove_url = FALSE,
                       remove_separators = TRUE,
                       split_hyphens = TRUE,
                       include_docvars = TRUE,
                       padding = FALSE,
                       verbose = quanteda_options("verbose"))

# Remove English stopwords before the dictionary lookup
toks.guardian = tokens_remove(toks.guardian, stopwords("english"))

#Accessing the dictionaries: Quanteda approach
positive.words.bl <- scan("dictionaries/bingliu/positive-words.txt", what = "char", sep = "\n", skip = 35, quiet = TRUE)
negative.words.bl <- scan("dictionaries/bingliu/negative-words.txt", what = "char", sep = "\n", skip = 35, quiet = TRUE)

#Assemble the sentiment dictionary
sentiment.dictionary <- dictionary(list(positive = positive.words.bl, negative = negative.words.bl))

#Creation of a DFM using the sentiment dictionary: count words, then map them to positive/negative
dfm.sentiment.guardian <- dfm(toks.guardian) %>%
  dfm_lookup(dictionary = sentiment.dictionary)
head(toks.guardian)
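For readers unfamiliar with quanteda's dictionary lookup, the base R sketch below shows roughly what it computes for each paragraph: every token is compared against the positive and negative word lists, and the matches are counted. The two word lists here are hypothetical toy stand-ins for the Bing Liu lexicon, and the regex tokenizer is a simplification of quanteda's, so counts on real text can differ.

```r
# Toy lexicons standing in for the Bing Liu lists (hypothetical, for illustration only)
positive_words <- c("win", "good", "enough", "strong")
negative_words <- c("injury", "fallen", "defeat", "poor")

# Count dictionary hits per paragraph, roughly what a dfm plus dictionary lookup produces
count_polarity <- function(paragraphs, positive, negative) {
  # Lower-case, then split on anything that is not a letter or an apostrophe
  tokens <- strsplit(tolower(paragraphs), "[^a-z']+")
  pos <- vapply(tokens, function(tk) sum(tk %in% positive), integer(1))
  neg <- vapply(tokens, function(tk) sum(tk %in% negative), integer(1))
  data.frame(Text = paste0("text", seq_along(paragraphs)),
             positive = pos, negative = neg)
}

# Two made-up paragraphs
paragraphs <- c("Liverpool's fallen kings suffered another defeat.",
                "A strong win would be good enough for the top four.")
res <- count_polarity(paragraphs, positive_words, negative_words)
res
```

On these two sentences the first paragraph gets 0 positive and 2 negative hits (“fallen”, “defeat”), and the second gets 4 positive and 0 negative hits.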

The Guardian’s Sentiment Analysis

( sentiment.guardian <- convert(dfm.sentiment.guardian, "data.frame") %>%
    gather(positive, negative, key = "Polarity", value = "Words") %>% 
    mutate(doc_id = as_factor(doc_id)) %>% 
    rename(Text = doc_id)   )

##      Text Polarity Words
## 1   text1 positive     0
## 2   text2 positive     4
## 3   text3 positive     5
## 4   text4 positive     0
## 5   text5 positive     5
## 6   text6 positive     4
## 7   text7 positive     4
## 8   text8 positive     3
## 9   text9 positive     3
## 10 text10 positive     0
## 11 text11 positive     1
## 12 text12 positive     6
## 13 text13 positive     3
## 14 text14 positive     1
## 15 text15 positive     3
## 16 text16 positive     1
## 17 text17 positive     5
## 18 text18 positive     2
## 19  text1 negative     0
## 20  text2 negative     2
## 21  text3 negative     2
## 22  text4 negative     0
## 23  text5 negative     1
## 24  text6 negative     1
## 25  text7 negative     5
## 26  text8 negative     2
## 27  text9 negative     0
## 28 text10 negative     7
## 29 text11 negative     0
## 30 text12 negative     3
## 31 text13 negative     1
## 32 text14 negative     4
## 33 text15 negative     0
## 34 text16 negative     0
## 35 text17 negative     1
## 36 text18 negative     2

ggplot(sentiment.guardian, aes(Text, Words, fill = Polarity, group = Polarity)) + 
  geom_col(position = "dodge")+
  scale_fill_brewer(palette = "Set1") + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) + 
  ggtitle("Sentiment Scores Across the Article", subtitle = "source: The Guardian") + xlab("")

dfm.sentiment.prop.guardian = dfm_weight(dfm.sentiment.guardian, scheme = "prop")
dfm.sentiment.prop.guardian

## Document-feature matrix of: 18 documents, 2 features (25.0% sparse).
##        features
## docs    positive negative
##   text1    0        0    
##   text2    0.667    0.333
##   text3    0.714    0.286
##   text4    0        0    
##   text5    0.833    0.167
##   text6    0.800    0.200
## [ reached max_ndoc ... 12 more documents ]
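What `scheme = "prop"` does can be reproduced by hand: each paragraph's counts are divided by that paragraph's total number of sentiment words. The base R sketch below uses the counts for text1 to text6 copied from the earlier table and reproduces the six rows printed above. Note that text1 and text4 contain no dictionary words at all. Plain division would give 0/0 = NaN, whereas quanteda leaves such rows at 0, as the output shows. In the combined chart later in this section, which plots the positive share, these paragraphs therefore sit at 0, the same value a fully negative paragraph would get.

```r
# Positive/negative counts for text1-text6, copied from the printed table above
counts <- matrix(c(0, 4, 5, 0, 5, 4,   # positive
                   0, 2, 2, 0, 1, 1),  # negative
                 ncol = 2,
                 dimnames = list(paste0("text", 1:6), c("positive", "negative")))

# Divide every count by its row (paragraph) total
share <- counts / rowSums(counts)

# Paragraphs with no dictionary hits give NaN here; set them to 0 to match quanteda's output
share[is.nan(share)] <- 0

round(share, 3)
```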

sentiment.guardian = convert(dfm.sentiment.prop.guardian, "data.frame") %>%
  gather(positive, negative, key = "Polarity", value = "Share") %>% 
  mutate(doc_id = as_factor(doc_id)) %>% 
  rename(Text = doc_id)   

ggplot(sentiment.guardian, aes(Text, Share, fill = Polarity, group = Polarity)) + 
  geom_col(position = "dodge")+
  scale_fill_brewer(palette = "Set1") + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) + 
  ggtitle("Sentiment Scores Across the Article (Relative)", subtitle = "source: The Guardian")

Combined Sentiment Score of the Article

sentiment.guardian <- convert(dfm.sentiment.prop.guardian, "data.frame") %>%
  rename(Text = doc_id, Sentiment = positive) %>%
  select(Text, Sentiment) %>%
  mutate(Text = as_factor(Text))

# Plot the positive share per paragraph; the dashed line at 0.5 marks an even split
ggplot(sentiment.guardian, aes(Text, Sentiment, group = 1)) + 
  geom_line(linewidth = 1, col = "blue") + 
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "darkred") + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) + 
  ggtitle("Combined sentiment scores in the Guardian's article", subtitle = "source: The Guardian") + xlab("")

The article published in The Guardian mixes positive and negative sentiment. In the combined sentiment graph, 12 of the 18 paragraphs sit above the dashed 0.5 line, meaning positive words outnumber negative ones in most paragraphs, so overall the article leans positive.
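Because the combined graph works paragraph by paragraph, it can help to pool the counts into a single article-level figure. A minimal base R sketch, using the counts copied from the table printed earlier; the net score (positive minus negative, divided by their sum) is a common convention added here for illustration, not part of the original analysis.

```r
# Paragraph-level counts for text1-text18, copied from the printed table above
positive <- c(0, 4, 5, 0, 5, 4, 4, 3, 3, 0, 1, 6, 3, 1, 3, 1, 5, 2)
negative <- c(0, 2, 2, 0, 1, 1, 5, 2, 0, 7, 0, 3, 1, 4, 0, 0, 1, 2)

# Pool the counts over the whole article
total_pos <- sum(positive)  # 50
total_neg <- sum(negative)  # 31

# Share of sentiment words that are positive (same scale as the 0.5 reference line)
pos_share <- total_pos / (total_pos + total_neg)

# Net score in [-1, 1]: +1 means only positive words, -1 only negative words
net_score <- (total_pos - total_neg) / (total_pos + total_neg)

round(c(pos_share = pos_share, net_score = net_score), 3)
```

With these counts, 50 of the 81 sentiment words in the article are positive (about 62%), giving a net score of roughly 0.23, which is consistent with the article leaning positive overall.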

2. This is Anfield: Jurgen Klopp makes bold prediction on top 4 – “I think it will be enough”

This is Anfield’s Sentiment Analysis

Combined Sentiment Score of the Article

Because This is Anfield is a fan-run Liverpool blog, it is no surprise that the analysis above shows its article is made up mostly of positive words, with negative words used only rarely. The difference is most visible in the relative sentiment scores graph, which shows the article to be around 90% positive. So although both The Guardian and This is Anfield lean positive, there is a clear gap between them, most likely because The Guardian is a neutral source while This is Anfield favors Liverpool.

3. The Telegraph: Premier League top-four odds: Leicester City drift after shock defeat

The Telegraph’s Sentiment Analysis

Combined Sentiment Score of the Article

The Telegraph’s article covers the betting market, assessing the odds of each team competing for a top-four finish. It also carries more positive than negative sentiment, possibly because it is written in an even-handed way across all the teams involved.

Conclusion and Prediction

All three web pages scraped for this analysis show more positive than negative sentiment. Nevertheless, after doing the analysis, my personal prediction is that Liverpool FC will not make the top 4 this season. This runs against the articles, which all scored positive in the sentiment analysis. I suspect that is because Liverpool FC is a big club whose squad has performed very well over the past four years, and because they have the easiest remaining fixtures of the teams in contention. These factors may have encouraged the writers to take a positive tone, but I still feel it will not be enough for Liverpool to cross the line this season.

Acknowledgement

I want to express my highest gratitude to my professor, Dr. Armando Rodriguez. Without Dr. Rodriguez’s guidance and help, this project would not have been possible. I am also grateful to him for giving me a project related to the sports industry.

Mohammed Anas Ali

5/10/2021
