2/26/2020

Project Description

  • Sentiment analysis (opinion mining) is a machine learning subtask in which we determine the overall sentiment of a given document.
  • Using machine learning (ML) and natural language processing (NLP) techniques, we can extract the subjective information in a document and classify it by polarity: positive, neutral, or negative.
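As a rough illustration of polarity classification, the sketch below uses a tiny hand-made word lexicon and labels a document positive, neutral, or negative by the sign of its summed score. The words and weights in `LEXICON` are hypothetical placeholders, not the lexicon used in this project.

```python
# Hypothetical toy lexicon: word -> polarity weight (not the project's actual lexicon)
LEXICON = {"good": 1, "great": 2, "win": 1, "bad": -1, "terrible": -2, "lose": -1}


def polarity(text: str) -> str:
    """Classify text as positive, neutral, or negative by summing lexicon weights."""
    words = text.lower().split()
    score = sum(LEXICON.get(w, 0) for w in words)  # unknown words contribute 0
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("What a great debate"))   # positive
print(polarity("This is terrible"))      # negative
print(polarity("Polls open tomorrow"))   # neutral
```

Real lexicons are far larger and usually handle punctuation, negation, and intensifiers, which this sketch ignores.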

Twitter

  • An American microblogging and social networking service on which users post and interact with messages known as “tweets”. Users can post, like, and retweet tweets.

Project Goal

  • Perform sentiment analysis of tweets related to the US Elections 2020. For this, we extracted tweets containing hashtags such as #USElections2020, #Trump, and #VoteforAmerica.

  • Determine the overall public response expressed in tweets about the “US Elections 2020”.
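The hashtag-based selection described above can be sketched as a simple filter. The `TARGET_HASHTAGS` set and the `select_tweets` helper are illustrative assumptions; the actual selection may instead have been done by the API query itself.

```python
import re

# Hashtags named in the project goal, stored lowercase for case-insensitive matching
TARGET_HASHTAGS = {"#uselections2020", "#trump", "#voteforamerica"}

HASHTAG_RE = re.compile(r"#\w+")


def select_tweets(tweets: list[str]) -> list[str]:
    """Keep only tweets that contain at least one target hashtag (hypothetical helper)."""
    selected = []
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if tags & TARGET_HASHTAGS:  # non-empty intersection means a match
            selected.append(text)
    return selected


sample = [
    "Big night ahead #USElections2020",
    "Lunch was great #food",
    "Rally tonight #Trump #MAGA",
]
print(select_tweets(sample))
# ['Big night ahead #USElections2020', 'Rally tonight #Trump #MAGA']
```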

Data Extraction

For data collection we used the Twitter API. The endpoints used include:

  • GET lists/list
  • GET lists/members
  • GET lists/memberships
  • GET lists/ownerships

Peer Recommendation

Peer Comments and Action:

  • Created specific visualizations
  • Calculated sentiment scores for positive and negative sentiment
  • Included a visualization showing the percentage of positive and negative tweets
  • In addition to generic hashtags such as #USElections and #VoteForAmerica, we also used specific hashtags such as #Trump and #BernieSanders
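The percentage of positive and negative tweets reduces to counting labels. A minimal sketch, assuming each tweet already carries a numeric sentiment score (the scores below are made-up inputs) and that scores above zero count as positive, below zero as negative, and exactly zero as neutral:

```python
from collections import Counter


def sentiment_percentages(scores: list[float]) -> dict[str, float]:
    """Return the percentage of positive, negative, and neutral scores."""
    categories = ("positive", "negative", "neutral")
    if not scores:  # avoid division by zero on empty input
        return {label: 0.0 for label in categories}
    labels = Counter(
        "positive" if s > 0 else "negative" if s < 0 else "neutral" for s in scores
    )
    total = len(scores)
    return {label: 100 * labels[label] / total for label in categories}


print(sentiment_percentages([2, 1, -1, 0, 3]))
# {'positive': 60.0, 'negative': 20.0, 'neutral': 20.0}
```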

Steps

  • Data Collection
  • Data Wrangling
  • Tokenization
  • Stop words and number removal
  • Creating bigrams
  • Document term matrix
  • Sentiment Analysis
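Two of these steps, bigram creation and the document-term matrix, can be illustrated with the standard library alone. This is a sketch over already-tokenized tweets (the token lists are hypothetical); a real pipeline would more likely use a library such as scikit-learn's `CountVectorizer`, which supports bigrams through its `ngram_range` parameter.

```python
from collections import Counter


def bigrams(tokens: list[str]) -> list[str]:
    """Join each pair of adjacent tokens into a single 'w1 w2' term."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def document_term_matrix(docs: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a sorted vocabulary and a count matrix with one row per document."""
    counts = [Counter(doc) for doc in docs]
    vocab = sorted(set().union(*docs))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = [["vote", "early", "vote"], ["early", "results"]]
print(bigrams(docs[0]))  # ['vote early', 'early vote']
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['early', 'results', 'vote']
print(dtm)    # [[1, 0, 2], [1, 1, 0]]
```

Bigram terms can be fed into the same matrix builder, for example `document_term_matrix([bigrams(d) for d in docs])`.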

Data preprocessing

The following steps were used:

  • Remove URLs from the tweets before sentiment analysis
  • Drop null values
  • Drop special characters
  • Drop numeric characters
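These cleaning steps can be sketched with regular expressions. The exact patterns are assumptions: this version keeps only ASCII letters and whitespace (so it also drops emoji and accented letters), and it handles null values by skipping `None` or empty entries. The original project may have used different rules, for example keeping hashtags or mentions.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_RE = re.compile(r"[^A-Za-z\s]")  # matches special and numeric characters
SPACE_RE = re.compile(r"\s+")


def clean_tweets(tweets: list) -> list[str]:
    """Drop null/empty tweets, then strip URLs, special characters, and digits."""
    cleaned = []
    for text in tweets:
        if not text:  # drop null (None) and empty values
            continue
        text = URL_RE.sub(" ", text)
        text = SPECIAL_RE.sub(" ", text)
        text = SPACE_RE.sub(" ", text).strip()  # collapse leftover whitespace
        if text:
            cleaned.append(text)
    return cleaned


print(clean_tweets(["Vote in 2020! https://t.co/abc123", None, "#Debate night!!"]))
# ['Vote in', 'Debate night']
```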

Tokenization:

  • The process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens.
  • The resulting list of tokens becomes the input for further processing such as text analysis.
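Tokenization, together with the stop-word and number removal step, can be sketched as follows. The lowercase-and-regex tokenizer and the short `STOP_WORDS` set are simplifying assumptions; libraries such as NLTK ship fuller tokenizers and stop-word lists.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to", "for"}

TOKEN_RE = re.compile(r"[a-z]+")  # alphabetic runs only, so numbers are dropped


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stop words."""
    return [tok for tok in TOKEN_RE.findall(text.lower()) if tok not in STOP_WORDS]


print(tokenize("The race for the White House is on in 2020"))
# ['race', 'white', 'house', 'on']
```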

Histogram of cleaned tokens

Word Cloud

  • A collection, or cluster, of words depicted in different sizes.
  • The more often a word appears in a source of textual data, the bigger and bolder it appears in the word cloud.
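The sizing rule of a word cloud is essentially a mapping from word frequency to font size. A sketch of one possible mapping, linear scaling between a minimum and a maximum font size; the sizes and the linear rule are assumptions, and word cloud libraries such as `wordcloud` use their own scaling and layout.

```python
from collections import Counter


def font_sizes(tokens: list[str], min_size: int = 10, max_size: int = 60) -> dict[str, int]:
    """Scale each word's font size linearly with its frequency."""
    freq = Counter(tokens)
    if not freq:
        return {}
    lo, hi = min(freq.values()), max(freq.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (count - lo) * (max_size - min_size) / span)
        for word, count in freq.items()
    }


tokens = ["trump", "vote", "trump", "debate", "trump", "vote"]
print(font_sizes(tokens))
# {'trump': 60, 'vote': 35, 'debate': 10}
```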

Sentiment Graph

Contributions to sentiments
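A word's contribution to overall sentiment is commonly computed as its frequency multiplied by its lexicon score, so frequent, strongly scored words dominate. A sketch with a hypothetical toy lexicon (not the one used in this project), ranking words by the magnitude of their contribution:

```python
from collections import Counter

# Hypothetical toy lexicon: word -> polarity weight
LEXICON = {"win": 2, "great": 3, "fake": -2, "lose": -2}


def contributions(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (word, frequency * weight) for lexicon words, largest magnitude first."""
    freq = Counter(t for t in tokens if t in LEXICON)
    contrib = [(word, n * LEXICON[word]) for word, n in freq.items()]
    return sorted(contrib, key=lambda pair: abs(pair[1]), reverse=True)


tokens = ["great", "win", "fake", "great", "news", "lose", "fake", "fake"]
print(contributions(tokens))
# [('great', 6), ('fake', -6), ('win', 2), ('lose', -2)]
```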

Emotional Density:

  • Most of the tweets expressed surprise, while comparatively few expressed disgust.
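Emotion counts like these typically come from an emotion lexicon, such as the NRC Word-Emotion Association Lexicon, that maps words to one or more emotions. A minimal sketch with a hypothetical mini-lexicon, counting emotion associations across tokenized tweets:

```python
from collections import Counter

# Hypothetical mini emotion lexicon: word -> emotions it is associated with
EMOTION_LEXICON = {
    "shock": ["surprise"],
    "unexpected": ["surprise"],
    "celebrate": ["joy"],
    "gross": ["disgust"],
    "win": ["joy", "anticipation"],
}


def emotion_counts(tweets: list[list[str]]) -> Counter:
    """Count how often each emotion is evoked across tokenized tweets."""
    counts = Counter()
    for tokens in tweets:
        for tok in tokens:
            counts.update(EMOTION_LEXICON.get(tok, []))  # a word may map to several emotions
    return counts


tweets = [["shock", "result"], ["unexpected", "win"], ["celebrate", "win"]]
print(dict(emotion_counts(tweets)))
# {'surprise': 2, 'joy': 3, 'anticipation': 2}
```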

Word Cloud based on Emotions

Number of positive and negative tweets

Tweets by device
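In the Twitter API v1.1, each tweet object has a `source` field holding an HTML anchor that names the posting client, for example `<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>`. A sketch that extracts that client name and counts tweets per device; the sample records are made up, and treating the client label as the "device" is an assumption.

```python
import re
from collections import Counter

ANCHOR_TEXT_RE = re.compile(r">([^<]+)</a>")  # text between the opening tag and </a>


def device_name(source: str) -> str:
    """Extract the client label from a tweet's HTML 'source' string."""
    match = ANCHOR_TEXT_RE.search(source)
    return match.group(1) if match else source


def tweets_by_device(tweets: list[dict]) -> Counter:
    """Count tweets per posting client."""
    return Counter(device_name(t.get("source", "unknown")) for t in tweets)


sample = [
    {"source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
    {"source": '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>'},
    {"source": '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>'},
]
print(tweets_by_device(sample).most_common())
```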

Conclusion

From our analysis, we see that the tweets are mostly positive and mostly about Trump. In terms of emotion, joy and surprise dominate when Trump is mentioned in the tweets. Though more analysis needs to be done, based on our observations we predict that Donald Trump is a likely winner of the US Elections 2020.