Predicting the sentiment of people's reactions to a particular topic on social media is becoming increasingly important. Twitter has become one of the major platforms where people express themselves openly, and it is also a rich source from which analysts can scrape tweets to perform market research or sentiment analysis.
The purpose of this document is to show how to train a machine learning model to analyze sentiment. Using tweets extracted from Twitter on six different topics, models are trained on the sentiment of each individual tweet, and their accuracy on unseen test data is compared to find the predictive model that fits best.
The following packages are loaded to extract data from the Twitter API:
The “twitteR” package provides an interface to the Twitter web API.
The “base64enc” package provides tools for base64 encoding and decoding.
#require("twitteR")||install.packages("twitteR")
#require("base64enc")||install.packages("base64enc")
library(twitteR)
library(base64enc)
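The commented `require(...) || install.packages(...)` lines install a package only when it is missing. A slightly more explicit variant is sketched below. `missing_packages` is a hypothetical helper name; it relies on base R's `requireNamespace`, which checks whether a package can be loaded without attaching it to the search path.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# requireNamespace() tries to load each namespace quietly and returns
# TRUE/FALSE; it does not attach the package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage (commented out so that nothing is installed by accident):
# needed <- missing_packages(c("twitteR", "base64enc"))
# if (length(needed) > 0) install.packages(needed)
```

Calling `library()` afterwards is still needed, since `requireNamespace` deliberately leaves the search path unchanged.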
Next, the credentials required to access the Twitter API are defined as four variables: the consumer key, consumer secret, access token and access token secret.
The function “setup_twitter_oauth” takes these four values as inputs and establishes the connection between R and the Twitter API.
After this step, we are ready to extract tweets by hashtag.
api_key <- ""              # Consumer key
api_secret <- ""           # Consumer secret
access_token <- ""         # Access token
access_token_secret <- ""  # Access token secret
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)  # authenticate with the Twitter API
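Hard-coding the keys in the script makes them easy to leak when the code is shared. A common alternative, sketched below under the assumption that the four values have been exported as environment variables, is to read them with base R's `Sys.getenv`. The variable names (`TWITTER_API_KEY` and so on) and the helper `get_twitter_credentials` are hypothetical choices, not something twitteR requires.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables. The variable names are illustrative assumptions.
get_twitter_credentials <- function(
    vars = c(api_key             = "TWITTER_API_KEY",
             api_secret          = "TWITTER_API_SECRET",
             access_token        = "TWITTER_ACCESS_TOKEN",
             access_token_secret = "TWITTER_ACCESS_TOKEN_SECRET")) {
  # Sys.getenv() returns "" for unset variables; vapply keeps the names of `vars`
  creds <- vapply(vars, Sys.getenv, character(1))
  missing <- vars[creds == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(creds)
}

# Usage (requires network access and valid keys):
# creds <- get_twitter_credentials()
# setup_twitter_oauth(creds$api_key, creds$api_secret,
#                     creds$access_token, creds$access_token_secret)
```

Failing early with the names of the unset variables gives a clearer error than an authentication failure from the API.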
Defining Hashtags
A mix of topics was chosen: Amma, Rio2016, UPElections, IndiaStrikesBack, USElections, Jio, Ipl and MamathaAgainstNation.
Only one hashtag is assigned to the variable below, because the Twitter API limits how many tweets general users can extract. The remaining hashtags are kept in the comment that follows.
hashtags = c('#amma')
# '#ipl','#jio','#rio2016','#uselections','#upelections','#indiastrikesback','#mamathaagainstnation'
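When more hashtags are enabled, each can be processed with the `for (hashtag in hashtags)` loop in the extraction code and saved to its own CSV file, as the commented `write.csv` line intends. The sketch below is a minimal version under stated assumptions: `save_hashtag_tweets` is a hypothetical helper, and `fetch_fn` stands in for the real API call (for example `function(h) twListToDF(searchTwitter(h, n = 600))`) so that the loop can be exercised without credentials. Given the extraction limits described above, fetching all eight hashtags in one session may not succeed.

```r
# Hypothetical helper: fetch tweets for each hashtag with `fetch_fn` and write
# one CSV per hashtag into `out_dir`. Returns the paths that were written.
save_hashtag_tweets <- function(hashtags, fetch_fn, out_dir = ".") {
  paths <- character(0)
  for (hashtag in hashtags) {
    tweets <- fetch_fn(hashtag)  # a data frame of tweets for this hashtag
    # "#rio2016" -> "rio2016.csv", mirroring the commented write.csv call
    path <- file.path(out_dir, paste0(gsub("#", "", hashtag), ".csv"))
    write.csv(tweets, path, row.names = FALSE)
    paths <- c(paths, path)
  }
  paths
}
```

Passing the fetch step in as a function keeps the file-naming and saving logic separate from the API call; `row.names = FALSE` is a choice made here to avoid an extra index column, whereas the original `write.csv` call keeps the default row names.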
The code below extracts the raw Twitter data, requesting at most 600 tweets per hashtag, and converts the result into a data frame.
The raw text is then preprocessed to remove unnecessary content (retweet markers, @mentions, punctuation, digits, links, extra whitespace and non-ASCII characters), producing a cleaned data frame at the end.
for (hashtag in hashtags) {
  tweets = searchTwitter(hashtag, n = 600)  # search one hashtag, up to 600 tweets
  tweets = twListToDF(tweets)               # convert the list of statuses to a data frame
  tweets.df = tweets$text                   # tweet text to be cleaned
  tweets.df = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", tweets.df, perl = TRUE)  # remove "RT @user" / "via @user" headers
  tweets.df = gsub("@\\w+", "", tweets.df)               # remove remaining @user mentions
  tweets.df = gsub("[[:punct:]]", "", tweets.df)         # remove punctuation (including '#')
  tweets.df = gsub("[[:digit:]]", "", tweets.df)         # remove digits
  tweets.df = gsub("http\\w+", "", tweets.df)            # remove links (punctuation already stripped)
  tweets.df = gsub("\n", " ", tweets.df)                 # replace newlines with spaces
  tweets.df = gsub("[ \t]{2,}", " ", tweets.df)          # collapse runs of spaces/tabs into one space
  tweets.df = gsub("[^[:alnum:]///' ]", " ", tweets.df)  # replace any remaining non-alphanumeric characters
  tweets.df = iconv(tweets.df, "latin1", "ASCII", sub = "")  # drop non-ASCII characters
  tweets.df = gsub("^\\s+|\\s+$", "", tweets.df)         # trim leading and trailing whitespace
  tweets$text = tweets.df                                # store the cleaned text in the data frame
  #write.csv(tweets, paste0(gsub('#', '', hashtag), '.csv'))  # optionally save one CSV per hashtag
}
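Because every cleaning step is a plain regular-expression substitution, the same sequence can be wrapped in a function and checked offline on hand-written strings, without calling the API. `clean_tweet_text` is a hypothetical name; the function simply replays the substitutions above in the same order, with `perl = TRUE` on the first pattern so that its non-capturing group is handled by PCRE.

```r
# Hypothetical helper: apply the tweet-cleaning substitutions, in order,
# to a character vector of tweet texts.
clean_tweet_text <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x, perl = TRUE)  # retweet/via headers
  x <- gsub("@\\w+", "", x)                 # remaining @mentions
  x <- gsub("[[:punct:]]", "", x)           # punctuation, including '#'
  x <- gsub("[[:digit:]]", "", x)           # digits
  x <- gsub("http\\w+", "", x)              # links, already stripped of punctuation
  x <- gsub("\n", " ", x)                   # newlines -> spaces
  x <- gsub("[ \t]{2,}", " ", x)            # collapse runs of blanks
  x <- gsub("[^[:alnum:]///' ]", " ", x)    # other non-alphanumeric characters
  x <- iconv(x, "latin1", "ASCII", sub = "") # drop non-ASCII bytes
  gsub("^\\s+|\\s+$", "", x)                # trim surrounding whitespace
}

clean_tweet_text("RT @user1: Happy birthday #Amma 2016! https://t.co/abc123")
# "Happy birthday Amma"
```

One limitation is visible with this helper: a bare "http" with nothing after it survives, because `http\\w+` needs at least one word character after "http". Truncated links of that kind explain the trailing "http" that can remain in a cleaned tweet.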
The first six cleaned tweets from the extracted data frame are displayed below.
head(tweets$text)
## [1] "Amma AIADMK TamilNadu"
## [2] "Grand jeux concours pour gagner un massage dh Rendezvous sur FB massage bienetre amma Ayurveda"
## [3] "ADMK Amma ate Idly at last translationservices Arabictranslationinchennai"
## [4] "Coolness is the nature of water and heat the nature of fire Similarly joy and sorrow are the nature of life Amma"
## [5] "Consider life and everything that happens in life as a gift from God All true devotees have this attitude Amma http"
## [6] "foodforthought morningmeditations lifelessons mindbodyspirit Amma Love UnconditionalLove"