This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

0.Introduction

Liverpool's results in the current Premier League (PL) season have not been acceptable. This is Jurgen Klopp's second season, and the fans expected much better results given the PL experience he has gained.

This study tries to evaluate the changing image of Klopp through the eyes of football fans. More specifically, I am going to extract tweets about Klopp posted in the last three months, then analyze their content using word frequency and sentiment analysis. The frequency analysis is done in the first part of this study, and the sentiment analysis follows in the second part.

1.Data Extraction

## [1] "Using direct authentication"

After authenticating with setup_twitter_oauth(), we can go further and extract tweets. I chose to extract 1000 tweets. Unfortunately, Twitter's search does not return tweets older than about one week, so the time span is fixed. The geocode is set to the city of Liverpool, since I want to know the opinion of Liverpoolians about Klopp! This will include fans of the rival team, Everton; in this preliminary study, I do not try to detect and filter out the Everton fans.
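As a small illustration of the location filter, here is a minimal sketch of how the "latitude,longitude,radius" string for the geocode argument can be built. The helper name make_geocode is hypothetical (it is not part of twitteR); the coordinates and the 5-mile radius are the ones used in the code at the end of this report.

```r
# Hypothetical helper (not part of twitteR): builds the
# "latitude,longitude,radius" string used as a geocode filter.
make_geocode <- function(lat, lon, radius, unit = c("mi", "km")) {
  unit <- match.arg(unit)                      # only "mi" or "km" are accepted
  paste0(lat, ",", lon, ",", radius, unit)
}

# Liverpool city centre with a 5-mile radius
liverpool_geocode <- make_geocode(53.4084, -2.9916, 5, "mi")
print(liverpool_geocode)
```

A larger radius would catch more tweets, but also more users who are not in Liverpool.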

Some of the tweets, before and after cleaning, are shown below.

## [1] "------Before Cleaning------"

## [1] RT @LivEchoLFC: Was this the game which makes Klopp finally see the light? https://t.co/Hd0y4RQC0l                                           
## [2] @MatthewCharle3 @EFC13wan92 @TheEFCForum The equivalent to the FA Cup. Which Klopp lost 2 finals in. Gotze got inju… https://t.co/oHnF4c5Bzg 
## [3] RT @thejamessutton: @TheAnfieldWrap I still believe Klopp will get it right. He just needs to accept that he can’t turn water into wine. \nS…
## [4] RT @LivEchoLFC: 'It's Klopp's responsibility, but also his players, as well as sporting director Michael Edwards.' #LFC https://t.co/Hd0y4R… 
## [5] RT @LivEchoLFC: 'Klopp has failed to address the biggest issue - and that is the defence...' #LFC \nhttps://t.co/eKbTRR0dYg                  
## 507 Levels: 'Dejan Lovren looks shot to bits... Klopp may have to give Gomez a go at centre back' https://t.co/GR9JaWvDtA ...

## [1] "------After Cleaning------"

## [1] " Was this the game which makes Klopp finally see the light "                                           
## [2] "The equivalent to the FA Cup Which Klopp lost  finals in Gotze got inju "                              
## [3] " I still believe Klopp will get it right He just needs to accept that he cant turn water into wine \nS"
## [4] " Its Klopps responsibility but also his players as well as sporting director Michael Edwards LFC "     
## [5] " Klopp has failed to address the biggest issue  and that is the defence LFC \n"
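The cleaning steps can be sketched in base R as follows. This is only an illustration under my own assumptions: the function name clean_tweet is hypothetical, and, unlike the output above, it also collapses repeated spaces and trims the ends.

```r
# Hypothetical helper: strips retweet markers, mentions, URLs,
# digits and punctuation from a character vector of tweets.
clean_tweet <- function(x) {
  x <- gsub("^RT\\s+", "", x, perl = TRUE)         # leading retweet marker
  x <- gsub("@\\S+", "", x, perl = TRUE)           # @mentions
  x <- gsub("https?://\\S+", "", x, perl = TRUE)   # URLs
  x <- gsub("[[:digit:]]", "", x, perl = TRUE)     # digits
  x <- gsub("[[:punct:]]", "", x, perl = TRUE)     # punctuation
  x <- gsub("\\s+", " ", x, perl = TRUE)           # collapse whitespace
  trimws(x)
}

raw <- c(
  "RT @LivEchoLFC: Was this the game which makes Klopp finally see the light? https://t.co/Hd0y4RQC0l",
  "The equivalent to the FA Cup. Which Klopp lost 2 finals in."
)
cleaned <- clean_tweet(raw)
print(cleaned)
```

Anchoring the retweet pattern to the start of the tweet avoids deleting the letters "RT" inside other words.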

Now we remove the stopwords.

The first word count is ready:

Among the top 20 words, some are uninformative: they add nothing to our understanding. For instance, klopp is the most frequent word, but we already know that our tweets are about Klopp. Such words are removed from the list, and the refined version is shown in the second figure.
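To make the counting steps concrete, here is a minimal base R sketch on two made-up tweets. The stop word list and the uninformative word list are tiny hypothetical examples, not the lists used for the figures.

```r
# Toy data: two invented, already-cleaned tweets
tweets <- c("klopp has failed to address the defence",
            "the defence is the biggest issue for klopp")

stop_list     <- c("the", "is", "to", "has", "for")  # hypothetical stop words
uninformative <- c("klopp")                          # words we already expect

# Split into words, drop both lists, then count what is left
words <- unlist(strsplit(tolower(tweets), "\\s+"))
words <- words[!words %in% c(stop_list, uninformative)]
word_counts <- sort(table(words), decreasing = TRUE)

print(head(word_counts, 20))
```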

We can also draw a cloud of these words, where the frequency of a word is reflected in its size and color.

As we can see, Liverpoolians on Twitter are talking mainly about Lovren, a player who performed very poorly in the team's last defeat. Their verbs point to the past (wouldn’t, happened), so it seems they are talking about the last match. The name Rodgers also appears: he is the former coach of Liverpool, fired in 2015 in a situation similar to today’s. Together with words like pressure and hell, this lets us feel the pressure on the current coach, Klopp. However, there may still be some believers in Klopp!

2.Sentiment Analysis

Some highlights from Julia Silge's tidytextmining.com:

One way to analyze the sentiment of a text is to consider the text as a combination of its individual words and the sentiment content of the whole text as the sum of the sentiment content of the individual words. This isn’t the only way to approach sentiment analysis, but it is an often-used approach (Julia Silge)

Well, we know that the whole is more than the sum of its words! But for the sake of this preliminary study, let’s do the sentiment analysis this way.

It is important to keep in mind that these methods do not take into account qualifiers before a word, such as in “no good” or “not true”; a lexicon-based method like this is based on unigrams only.

Maybe later I will use n-grams, but for now unigrams are totally fine!
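The unigram limitation is easy to see in a toy example. The sketch below sums word scores from a tiny made-up lexicon (not afinn, bing or nrc); the name score_unigrams is hypothetical. Because "no" is not in the lexicon, "no good" gets the same score as "good".

```r
# Hypothetical three-word lexicon, for illustration only
toy_lexicon <- c(good = 1, true = 1, awful = -1)

# Sum the scores of the individual words; words that are not
# in the lexicon (including qualifiers such as "no") count as 0.
score_unigrams <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  sum(toy_lexicon[words], na.rm = TRUE)
}

print(score_unigrams("good"))
print(score_unigrams("no good"))
```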

One last caveat is that the size of the chunk of text that we use to add up unigram sentiment scores can have an effect on an analysis. A text the size of many paragraphs can often have positive and negative sentiment averaged out to about zero, while sentence-sized or paragraph-sized text often works better.

Good! Our tweets are short, so this should work better on them than on Shakespeare's books, in every respect!

Of the three available lexicons, afinn, bing and nrc, I prefer to work with nrc because its sentiments include trust, fear, sadness, negative, anger, and more, while the other two are less expressive. Nothing stops me from using all three lexicons, except the length of this report.

There are different approaches to sentiment analysis. In other words, various analyses can be done using the sentiment data. I start with figuring out the overall sentiment of our 1000 tweets.

The graph is pretty interesting. The feelings seem to be in a “so-so” situation: words with positive and negative sentiment are almost equally frequent, with a slight edge for the positive ones. In contrast, words expressing fear are more prevalent than words expressing trust.

But I would like to analyze the sentiments per tweet. Each tweet may carry several sentiments, and I want to know which combinations of sentiments are most frequent. Knowing that gives a better understanding of the tweets’ sentiments than evaluating the words independently.
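The idea can be sketched on toy data in base R. Assume each row is one word of a tweet that already has a sentiment attached (the ids and sentiments below are made up): for every tweet we keep its distinct sentiments, sort them, paste them into one label, and then count the labels.

```r
# Toy data: one row per sentiment-bearing word, keyed by tweet id
tagged <- data.frame(
  id        = c(1, 1, 1, 2, 3, 3, 4),
  sentiment = c("negative", "fear", "negative",
                "negative", "positive", "trust", "negative"),
  stringsAsFactors = FALSE
)

# One label per tweet: its distinct sentiments, sorted and pasted together
combo <- vapply(
  split(tagged$sentiment, tagged$id),
  function(s) paste(sort(unique(s)), collapse = " "),
  character(1)
)

# How often does each combination occur?
combo_counts <- sort(table(combo), decreasing = TRUE)
print(combo_counts)
```

Sorting the sentiments before pasting matters: otherwise "fear negative" and "negative fear" would be counted as two different combinations.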

Wow! The result surprised even me! It is far more insightful than the sentiment analysis at word level. Now we can see that the situation is not good. Tweets with only negative sentiment are the most prevalent, followed by tweets with the sentiment combination anger-disgust-fear-negative-positive-sadness. Tweets with only positive sentiment take the third spot. You are invited to explore the rest.

There are two points here. From the point of view of text analysis, the important lesson is that sentiment analysis, and text analysis in general, should take a holistic approach. Breaking the text into words when the unit of interest is the tweet may be misleading. When we have various tools and approaches, why not use them?

From the point of view of a Klopp fan, the situation is critical! It seems that the inhabitants of Liverpool still have some trust in Klopp and his team, but the negative feeling is very strong. I wish I could show him these results, so he could talk to the fans in a more informed way!

The sentiment analysis could be continued with very different ideas: for instance, assessing the dominant sentiments of tweets that mention “Rodgers” or “Lovren”, or cross-checking the current results against the results from other lexicons.
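A minimal sketch of the first idea, on invented data: pick the tweets that mention a name, then look up the most frequent sentiment among their words. The function name dominant_sentiment and both toy tables are hypothetical; ties are broken alphabetically, which a real analysis would want to handle more carefully.

```r
# Toy data: tweet texts and their sentiment-bearing words (all invented)
tweets <- data.frame(
  id   = 1:3,
  text = c("Lovren looks shot to bits",
           "Rodgers was under the same pressure",
           "Lovren had an awful game"),
  stringsAsFactors = FALSE
)
tagged <- data.frame(
  id        = c(1, 2, 2, 2, 3, 3),
  sentiment = c("negative", "fear", "fear", "negative", "negative", "disgust"),
  stringsAsFactors = FALSE
)

# Most frequent sentiment among the tweets that mention `name`
dominant_sentiment <- function(name) {
  ids    <- tweets$id[grepl(name, tweets$text, ignore.case = TRUE)]
  counts <- table(tagged$sentiment[tagged$id %in% ids])
  names(counts)[which.max(counts)]
}

print(dominant_sentiment("Lovren"))
print(dominant_sentiment("Rodgers"))
```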

3.Conclusion

The goal of this study was to practice data extraction from Twitter’s API and text analysis. For the text analysis, I tried word frequency first, and then sentiment analysis. The lesson learned from the sentiment analysis is a systemic one: the importance of a holistic view. Still, the more comprehensive the research, the more trustworthy the results.

In the next text analysis study, I will go further and try methods such as term frequency analysis.

Codes

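library(dplyr)
library(ggplot2)
library(tidyr)
library(tidytext)    # unnest_tokens(), stop_words, get_sentiments()
library(twitteR)
library(stringr)
library(wordcloud)   # wordcloud()
library(colorRamps)  # matlab.like()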

klopp_tweets <- searchTwitter(searchString = "Klopp",
              n = 500 ,
              lang = "en",
              geocode = '53.4084,-2.9916,5mi')


tweetsdf <- twListToDF(klopp_tweets)
# Optionally save the raw tweets and reload them later instead of re-querying:
# write.csv(x = tweetsdf, file = "klopp_tweets.csv")
# tweetsdf <- read.csv(file = "klopp_tweets.csv", stringsAsFactors = FALSE)
head(tweetsdf)

tweetsdf$text[1:10]

# Clean the tweet text step by step
txt <- str_replace_all(string = tweetsdf$text, pattern = "\\bRT\\b", replacement = "")  # retweet marker
txt <- str_replace_all(string = txt, pattern = "@\\S+\\s?", replacement = "")           # @mentions
txt <- str_replace_all(string = txt, pattern = "https?://\\S+", replacement = "")       # URLs
txt <- str_replace_all(string = txt, pattern = "\\d", replacement = "")                 # digits
txt <- str_replace_all(string = txt, pattern = "[[:punct:]]", replacement = "")         # punctuation
tweetsdf$text <- txt



klopp_tw_token <- tweetsdf %>% unnest_tokens(input = text ,
                                           output = word ,
                                           token = "words")

klopp_tw_token <- klopp_tw_token %>%
        select(word, 
               favorited,
               favoriteCount,
               created,
               id,retweetCount,isRetweet,retweeted )

head(klopp_tw_token)

data("stop_words")
# Drop common English stop words; the join key is stated explicitly
klopp_tw_token <- klopp_tw_token %>%
        anti_join(stop_words, by = "word")

klopp_tw_token %>% 
        #filter(! word %in% noninformative_words) %>%
        count(word, sort = TRUE) %>% 
        head(n= 20 ) %>%
        
        ggplot() + 
        geom_col(aes(y = n , x = reorder(word,n)),
                 fill = "skyblue") +
        coord_flip() + 
        theme_linedraw() + 
        xlab(label = "words") + 
        ggtitle("20 most frequent words in the tweets about Klopp")

noninformative_words <- c("klopp","lfc","klopps","liverpool","hes","jurgen","yr")

klopp_tw_token %>% 
        filter(! word %in% noninformative_words) %>%
        count(word, sort = TRUE) %>% 
        head(n= 20 ) %>%
        
        ggplot() + 
        geom_col(aes(y = n , x = reorder(word,n)),
                 fill = "skyblue") +
        coord_flip() + 
        theme_linedraw() + 
        xlab(label = "words") + 
        ggtitle("20 most frequent words in the tweets about Klopp- refined")

klopp_tw_token %>%
        filter(! word %in% noninformative_words) %>%
        count(word, sort = TRUE) %>%
         head(20) %>% 
         with(wordcloud(word,n,random.order = FALSE,
                        scale = c(2,0.5),
                        colors = matlab.like(20)))

# Note: recent tidytext versions download the NRC lexicon through the textdata package
nrc <- get_sentiments("nrc")

# Word-level view: how often does each sentiment occur over all tweets?
klopp_tw_token %>%
        inner_join(nrc, by = "word") %>% 
        count(sentiment, sort = TRUE) %>% 
        ggplot() + 
        geom_col(aes(y = n , x = reorder(sentiment,n)),
                 fill = "green") +
        coord_flip() + 
        theme_linedraw() + 
        xlab(label = "sentiments") + 
        ggtitle("Sentiments by frequency in the whole tweets")

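# Tweet-level view: combine the distinct sentiments of each tweet into one label
aggregated_sentiment <- klopp_tw_token %>%
        inner_join(nrc, by = "word") %>% 
        distinct(id, sentiment) %>%     # one row per tweet and sentiment
        arrange(id, sentiment) %>%      # alphabetical order within each tweet
        group_by(id) %>% 
        # paste() replaces glue::collapse(), which newer glue versions no longer provide
        summarise(overal_sentiment = paste(sentiment, collapse = " ")) %>%
        count(overal_sentiment, sort = TRUE) %>% 
        head(n = 20)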

aggregated_sentiment %>%
        ggplot() + 
        geom_col(aes(y = n , x = reorder(overal_sentiment,n)),
                 fill = "orange") +
        coord_flip() + 
        theme_linedraw() + 
        xlab(label = "aggregated sentiments") + 
        ggtitle("A plot for Zist86")

Jurgen Klopp from the eyes of Liverpool’s tweets

Shahin Ashkiani: Contact@Shahin-Ashkiani.com

10/25/2017
