Sentiment Analysis of Tweets in R

Description

In this project, we fetched the latest 100 tweets matching the keyword “Coronavirus”. Our aim is to find out what people are tweeting about it. Their emotions and feelings are explored and plotted using ten sentiment categories, e.g. positive, negative, anger, etc. The analysis is fully reproducible and can therefore be verified.

Tools

1. RStudio

2. R Language

3. R Markdown and Knitr for preparing this report document.

Procedure

1. Loading the required Libraries

We need four packages in total to perform the analysis.

twitteR is used to authenticate with the Twitter account and to fetch tweets.

ggplot2 is used to draw the graphs.

syuzhet is used to extract the sentiments, i.e. to perform the sentiment analysis.

tm provides the text-mining (NLP) utilities, e.g. removing stopwords.

library(twitteR)
library(tm)
## Loading required package: NLP
library(syuzhet)
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:NLP':
## 
##     annotate

2. Setting the tokens and authenticating

These four tokens are taken from our Twitter developer account. We can regenerate them later.

consumer_key <-"3kViWlWuPQ9TMnhdDYn7hxmeb"
consumer_secret <- "hPZJmxD5Zhp3BCGPaQ9l1iPZOoEoL6Plmh82CuMlEAn6HkwqB0"
access_token <- "3150218942-XQN1lMkqzrHWafBySp4lWBbLyMs0qrLMUn6IPPB"
access_secret <- "OyRzypjlRrL5Tq8dgMn0BsrORkCr40fqS7kYKBWbv1J7M"
setup_twitter_oauth(consumer_key,consumer_secret,access_token,access_secret)
## [1] "Using direct authentication"
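Because the report is meant to be shared, it may be safer to keep the tokens out of the document itself. The following is a minimal sketch, assuming the four values are stored in environment variables (for example in ~/.Renviron). The helper read_credential and the variable names are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: read one credential from an environment variable
# and stop with a clear message if it is missing or empty.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# Assumed variable names; choose any names you like and set them in ~/.Renviron.
credential_names <- c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                      "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET")

# Usage (commented out because it needs real credentials and twitteR):
# creds <- vapply(credential_names, read_credential, character(1))
# setup_twitter_oauth(creds[[1]], creds[[2]], creds[[3]], creds[[4]])
```

With this approach the knitted report shows only the variable names, never the secrets.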

3. Fetching Tweets from twitter

100 tweets in English are fetched and converted into a data frame.

tweets<-searchTwitter("Coronavirus",n=100,lang = "en")
dftweets<-twListToDF(tweets)

The data frame contains many columns, but we are interested only in the “text” column. So we extract that column and convert the text to lower case (for ease of analysis).

text<-dftweets$text
text<-tolower(text)

4. Cleaning the tweets for Analysis

Next we use the tm package to remove the stopwords (a, an, the, etc.). One important thing to note is that the data must be wrapped in a Corpus before tm's transformations can be applied.

corpusList<-Corpus(VectorSource(text))
listt<-tm_map(corpusList,function(x) removeWords(x,stopwords()))
## Warning in tm_map.SimpleCorpus(corpusList, function(x) removeWords(x,
## stopwords())): transformation drops documents
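Stopword removal is only one part of cleaning. Tweets usually also carry URLs, @mentions, retweet markers and punctuation, none of which help the sentiment lexicon. The following base-R sketch shows one way such a step could look. It is not part of the original pipeline, the helper clean_tweets is hypothetical, and the sample tweets are made up for illustration.

```r
# Hypothetical helper: strip URLs, @mentions, the "rt" retweet marker and
# every character that is not a lower-case letter or a space.
# Assumes the input has already been lower-cased with tolower().
clean_tweets <- function(x) {
  x <- gsub("https?://\\S+", " ", x, perl = TRUE)  # URLs
  x <- gsub("@\\w+", " ", x, perl = TRUE)          # @mentions
  x <- gsub("\\brt\\b", " ", x, perl = TRUE)       # retweet marker
  x <- gsub("[^a-z ]", " ", x, perl = TRUE)        # punctuation, digits, symbols
  x <- gsub(" +", " ", x, perl = TRUE)             # collapse repeated spaces
  trimws(x)
}

# Made-up sample tweets (already lower-cased)
sample_text <- c("rt @who: coronavirus cases rise https://t.co/abc123",
                 "stay safe, everyone! #coronavirus")
clean_tweets(sample_text)
```

The cleaned vector could then be passed to VectorSource() in place of the raw text.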

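5. Performing Sentiment Analysis

An important thing to note is that get_nrc_sentiment operates on a character vector, so we convert the Corpus into a character vector first. Applying as.character() to the whole Corpus collapses it into three strings (its content, meta and dmeta components), which is why the output below has three rows and only the first row, holding the text of all tweets, carries scores.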

sentiments<-get_nrc_sentiment(as.character(listt))
## Warning: `filter_()` is deprecated as of dplyr 0.7.0.
## Please use `filter()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `group_by_()` is deprecated as of dplyr 0.7.0.
## Please use `group_by()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `data_frame()` is deprecated as of tibble 1.1.0.
## Please use `tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
sentiments
##   anger anticipation disgust fear joy sadness surprise trust negative positive
## 1    27           22      12   39  19      27       14    34       50       55
## 2     0            0       0    0   0       0        0     0        0        0
## 3     0            0       0    0   0       0        0     0        0        0

6. Data manipulation for the graph

Next we reshape the scores into the format required for plotting: we sum each sentiment column and store the totals in a two-column data frame (Sentiments, Score).

data<-as.data.frame(colSums(sentiments))
names(data)<-"Score"
data<-cbind("Sentiments"= rownames(data),data)
rownames(data)<-NULL
data
##      Sentiments Score
## 1         anger    27
## 2  anticipation    22
## 3       disgust    12
## 4          fear    39
## 5           joy    19
## 6       sadness    27
## 7      surprise    14
## 8         trust    34
## 9      negative    50
## 10     positive    55
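The same reshaping can be followed on a small example. The sketch below uses a toy score table with made-up numbers (not taken from the tweets) and builds the two-column data frame directly from the named vector that colSums returns.

```r
# Toy per-document scores (made-up numbers, for illustration only)
toy <- data.frame(anger = c(1, 0, 2), joy = c(0, 3, 1))

# colSums() returns a named numeric vector: one total per sentiment column
totals <- colSums(toy)

# Turn the names into a column and the totals into a second column
long <- data.frame(Sentiments = names(totals),
                   Score = unname(totals),
                   stringsAsFactors = FALSE)
long
```

This gives the same shape as the data frame above without the intermediate cbind and rownames steps.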

7. Exploratory Analysis

After all the operations described above, we can plot the sentiments.

g<-ggplot(data=data, aes(x=Sentiments,y=Score))
g<-g+geom_bar(aes(fill=Sentiments),stat="identity")
g<-g+ ggtitle("Sentimental Analysis about Corona Virus")
g<-g+theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
print(g)
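By default the bars appear in alphabetical order. If a ranking is easier to read, the sentiment labels can be reordered by score before plotting. The sketch below re-enters the totals from the table in step 6 and uses base R's reorder(). The object name scores is an assumption, and the plot is drawn only if ggplot2 is installed.

```r
# Totals copied from the table in step 6
scores <- data.frame(
  Sentiments = c("anger", "anticipation", "disgust", "fear", "joy",
                 "sadness", "surprise", "trust", "negative", "positive"),
  Score = c(27, 22, 12, 39, 19, 27, 14, 34, 50, 55),
  stringsAsFactors = FALSE
)

# Order the factor levels by decreasing score so the tallest bar comes first
scores$Sentiments <- reorder(scores$Sentiments, -scores$Score)

# Draw the ranked bar chart (skipped when ggplot2 is not available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(scores, ggplot2::aes(x = Sentiments, y = Score)) +
    ggplot2::geom_col(ggplot2::aes(fill = Sentiments)) +
    ggplot2::theme(legend.position = "none")
  print(p)
}
```

geom_col() is equivalent to geom_bar(stat = "identity"), so the chart differs from the one above only in bar order.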

You can also try it yourself.