Text Mining and Sentiment Analysis Report

Danilo Scorzoni Ré

2022-10-21

Report Specs:

  • Client name: Sample Client.
  • Goal: This report presents summary statistics for a text database of 300 tweets about Superman, extracted from the Twitter API on October 21, 2022, together with a sentiment analysis covering the polarity distribution and a sentiment word cloud.

Data Sample:

Here is a sample of 10 texts from this project:

##  [1] "Henry Cavill says he got the call to film his #BlackAdam cameo as Superman while he was filming Netflix fantasy series #TheWitcher. “I went to Warner Bros.’ studio in the UK and got back in the suit. It was a very powerful moment for me.” https://t.co/rPvWv5KFIy https://t.co/4WzeyuqsCt"
##  [2] "\"Our commitment to Superman, Batman, Wonder Woman, Aquaman, Harley Quinn...is only equaled by our commitment to the wonder of human possibility these characters represent,\" said James Gunn and Peter Safran in a statement https://t.co/ZPFvTHXFok https://t.co/j4VQkkJ9Bh"                
##  [3] "Henry Cavill wore the #Superman suit from #ManofSteel for #BlackAdam: \n\n“I chose that one in particular because of the nostalgia attached to the suit...It was incredibly important to me to be standing there and enjoying that moment\" https://t.co/QFf2Yu0qUW https://t.co/p5MXdCFjgu"   
##  [4] "RT @Itssan17: Zack Snyder has a message for Henry Cavill aka His fav Superman https://t.co/w7DKtctQZ4"                                                                                                                                                                                         
##  [5] "RT @THR: Henry Cavill wore the #Superman suit from #ManofSteel for #BlackAdam: \n\n“I chose that one in particular because of the nostalgia a…"                                                                                                                                                
##  [6] "RT @SupermanEnjoyer: Do not sleep on Superman and Lois.\nBest science Fiction TV series https://t.co/7KX5dYWsX9"                                                                                                                                                                               
##  [7] "RT @FilmUpdates: Henry Cavill says he never lost hope that he’d return as Superman and that he is excited to play a more “joyful Superman”…"                                                                                                                                                   
##  [8] "RT @TheDC_Syndicate: \"A very small taste of what's to come, my friends. The dawn of hope renewed. Thank you for your patience, it will be r…"                                                                                                                                                 
##  [9] "RT @MoviesThatMaher: Zack Snyder’s Message to Henry Cavill \n\n“I can’t wait to work with you in the future and you are of course the greates…"                                                                                                                                                
## [10] "RT @DiscussingFilm: Henry Cavill says “there is such a bright future ahead for the character, and I’m so excited to tell a story with an en…"

Descriptive Stats

The following table shows descriptive statistics on the size of this data source and on the distribution of document length (in characters):

text_count  average_length  sd_length  sd_ratio  min_length  max_length  median_length
300         136.9           56.73      0.41      37          316         140
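
The metrics in this table can be reproduced from the raw texts along the following lines. This is a minimal Python sketch, not the code used to build this report. The function name `length_stats` and the three example texts are hypothetical. It assumes that `sd_length` is the sample standard deviation and that `sd_ratio` is `sd_length` divided by `average_length`, which agrees with the reported values (56.73 / 136.9 ≈ 0.41).

```python
import statistics

def length_stats(texts):
    """Summarise document lengths (in characters) for a list of strings."""
    lengths = [len(t) for t in texts]
    mean = statistics.mean(lengths)
    sd = statistics.stdev(lengths)  # sample standard deviation (assumption)
    return {
        "text_count": len(lengths),
        "average_length": round(mean, 1),
        "sd_length": round(sd, 2),
        "sd_ratio": round(sd / mean, 2),  # spread relative to the mean
        "min_length": min(lengths),
        "max_length": max(lengths),
        "median_length": statistics.median(lengths),
    }

# Hypothetical mini-corpus, not taken from the report's data
example_texts = ["Superman returns", "Up, up and away!", "Man of Steel"]
stats = length_stats(example_texts)
print(stats)
```

An `sd_ratio` of 0.41 means the standard deviation is about 41% of the mean length, so tweet lengths in this sample vary considerably around the average.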

Word Frequency

This section presents the 20 most frequent words in this group of documents after removing stop-words, that is, common words that add little meaning to a sentence.
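
As an illustration of this counting step, here is a minimal Python sketch. It is not the pipeline used for this report: the stop-word list is a tiny hypothetical one, the tokenizer is a simple regular expression, and the name `top_words` and the example texts are invented for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real analysis would use a full list
STOP_WORDS = {"a", "an", "and", "as", "he", "is", "of", "the", "to"}

def top_words(texts, n=20):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())  # lowercase word tokens
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Hypothetical example texts, not taken from the report's data
example_texts = [
    "Henry Cavill returns as Superman",
    "Superman is the symbol of hope",
    "Henry Cavill and the Superman suit",
]
top = top_words(example_texts, n=3)
print(top)
```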

Most Frequent Bigrams

This section presents the most frequent bigrams, that is, pairs of consecutive words in a sentence, after excluding pairs that contain a stop-word.
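
The bigram count can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the report's own code: pairs are formed from adjacent tokens first and a pair is dropped if either word is a stop-word. The stop-word list, the name `top_bigrams`, and the example texts are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real analysis would use a full list
STOP_WORDS = {"a", "an", "and", "as", "he", "is", "of", "the", "to"}

def top_bigrams(texts, n=10):
    """Return the n most frequent word pairs that contain no stop-word."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        # Pair each token with its right-hand neighbour
        for first, second in zip(tokens, tokens[1:]):
            if first not in STOP_WORDS and second not in STOP_WORDS:
                counts[(first, second)] += 1
    return counts.most_common(n)

# Hypothetical example texts, not taken from the report's data
example_texts = [
    "Henry Cavill returns as Superman",
    "Superman is the symbol of hope",
    "Henry Cavill and the Superman suit",
]
bigrams = top_bigrams(example_texts)
print(bigrams)
```

Forming pairs before filtering keeps only words that were truly adjacent in the original text; removing stop-words first would create pairs such as "returns superman" that never occurred.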

Sentiment Analysis

This first table shows how many positive and negative words appear across the full list of documents. The classification is based on the Bing lexicon.

sentiment  n    n_pct
positive   241  68.9
negative   109  31.1
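
The percentages are each count divided by the total number of classified words, for example 241 / (241 + 109) ≈ 68.9%. The Python sketch below shows one way such a table could be produced. It is illustrative only: the `LEXICON` dictionary is a hypothetical stand-in whose labels are not taken from the Bing lexicon, and the name `sentiment_counts` and the example texts are invented.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; NOT the Bing lexicon
LEXICON = {
    "excited": "positive",
    "powerful": "positive",
    "bright": "positive",
    "lost": "negative",
    "boring": "negative",
}

def sentiment_counts(texts):
    """Count classified words per sentiment and their share of the total (%)."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in LEXICON:
                counts[LEXICON[token]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: (n, round(100 * n / total, 1)) for s, n in counts.items()}

# Hypothetical example texts, not taken from the report's data
example_texts = [
    "A powerful and bright moment",
    "So excited for this",
    "A boring, lost cause",
]
table = sentiment_counts(example_texts)
print(table)
```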

The next chart shows the distribution of polarity, which is the net score between positive and negative words in each document. Each negative word counts as -1 and each positive word as +1; the polarity score of a document is the sum over all of its classified words.
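
The scoring rule can be written as a short Python sketch. The `LEXICON` dictionary below is a hypothetical stand-in (its labels are not taken from the Bing lexicon), and the function name `polarity` and the example texts are assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon for illustration only; NOT the Bing lexicon
LEXICON = {
    "excited": "positive",
    "powerful": "positive",
    "bright": "positive",
    "lost": "negative",
    "boring": "negative",
}

def polarity(text):
    """Net sentiment score: +1 per positive word, -1 per negative word."""
    score = 0
    for token in re.findall(r"[a-z']+", text.lower()):
        label = LEXICON.get(token)
        if label == "positive":
            score += 1
        elif label == "negative":
            score -= 1
    return score

# Hypothetical example texts, not taken from the report's data
example_texts = [
    "A powerful and bright moment",  # two positive words
    "A boring, lost cause",          # two negative words
    "Excited but lost",              # one of each
    "Superman flies",                # no classified words
]
scores = [polarity(t) for t in example_texts]
print(scores)
```

Note that a score of 0 is ambiguous under this rule: it covers both documents with no classified words and documents whose positive and negative words cancel out.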

Sentiment WordCloud

Finally, the sentiment word cloud presents all the words, colored by sentiment and sized by how often each word appears in the text:
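
The data behind such a word cloud can be prepared roughly as in the Python sketch below, which stops short of drawing anything. Everything in it is an assumption made for illustration: the `LEXICON` stand-in (not the Bing lexicon), the color names, the linear font-size scaling, and the name `cloud_entries`. The plotting tool used for this report may size and color words differently.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon and color mapping, for illustration only
LEXICON = {"excited": "positive", "bright": "positive", "boring": "negative"}
COLORS = {"positive": "green", "negative": "red"}

def cloud_entries(texts, min_size=12, max_size=48):
    """Return (word, color, count, font_size) tuples for classified words."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in LEXICON:
                counts[token] += 1
    if not counts:
        return []
    top = max(counts.values())
    entries = []
    for word, n in counts.most_common():
        # Linear scaling: the most frequent word gets max_size
        size = round(min_size + (max_size - min_size) * n / top)
        entries.append((word, COLORS[LEXICON[word]], n, size))
    return entries

# Hypothetical example texts, not taken from the report's data
example_texts = ["excited excited excited boring", "excited boring bright"]
entries = cloud_entries(example_texts)
print(entries)
```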