Executive Summary:

A word cloud is built from 3,000 tweets about Fast and Furious 7 (#FastFurious7) collected on the release day, i.e., 04-03-2015. As expected, the late Paul Walker is well remembered: the words “paul”, “rip” and “walker” feature prominently in the cloud. The movie seems to have drawn a positive response, with many people tweeting words such as “awesome”, “amazing”, “great”, “best” and “better”. Sentiment analysis is then run on the cleaned text to classify each tweet as Negative, Neutral or Positive. To reproduce it, the sentiment.R file and the text files of positive and negative words (positive.txt and negative.txt) must be in the current working directory. Tweets on the release day split into roughly 10% Negative, 41% Positive and 48% Neutral opinions.

Load Libraries:

plyr
twitteR
tm
wordcloud
stringr
ggplot2

Read Tweets:

Tweets are downloaded and their text is extracted into a character vector. Characters outside the [:graph:] class, such as control characters, are then replaced with spaces so that the later text-processing functions do not fail on them.

setup_twitter_oauth(api_key,api_secret,access_token,access_secret)

## [1] "Using direct authentication"

tweets <- searchTwitter("#FastFurious7",n=3000,lang="en")
tweets.txt <- sapply(tweets, function(t)t$getText())
# Replace non-graphical characters with spaces to avoid input errors
tweets.txt <- str_replace_all(tweets.txt,"[^[:graph:]]", " ")
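
To illustrate what the `[^[:graph:]]` replacement does, here is a minimal sketch on a made-up string. It uses base R `gsub` in place of `str_replace_all` so that it runs without extra packages; the sample text is an assumption, not one of the collected tweets.

```r
# Hypothetical sample tweet containing a newline and a tab
raw <- "Great\nmovie\t#FastFurious7"

# Replace every character outside the [:graph:] class with a single space
cleaned <- gsub("[^[:graph:]]", " ", raw)

print(cleaned)
```

Spaces themselves are outside `[:graph:]`, so they are replaced by spaces and the word boundaries are preserved.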

Process Text

The text now has to be preprocessed to remove retweet markers, user mentions, punctuation, numbers and links, followed by routine English words such as pronouns. The cleaned text is then collapsed into a single string in order to plot the word cloud. Any words to drop in addition to the default stopwords go in the remwords vector in the code.

clean.text = function(x)
{
   # convert to lower case
   x = tolower(x)
   # remove the retweet marker "rt" as a whole word only
   x = gsub("\\brt\\b", "", x)
   # remove @mentions
   x = gsub("@\\w+", "", x)
   # remove punctuation
   x = gsub("[[:punct:]]", "", x)
   # remove numbers
   x = gsub("[[:digit:]]", "", x)
   # remove links (with punctuation gone, a URL is a single "http..." token)
   x = gsub("http\\w+", "", x)
   # collapse runs of spaces and tabs into a single space
   x = gsub("[ \t]{2,}", " ", x)
   # remove blank spaces at the beginning and at the end
   x = gsub("^ +| +$", "", x)
   return(x)
}

cleanText <- clean.text(tweets.txt)
vector <- paste(cleanText,collapse=" ")
remwords <- c("movie","fast","watching")
vector <- removeWords(vector,c(stopwords("english"),remwords))
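
One thing to watch in this kind of cleaning is that a bare pattern such as `"rt"` also matches inside longer words. The following minimal sketch compares it with a word-boundary pattern on a made-up string (the sample text is an assumption):

```r
# Made-up lowercase tweet text
sample_text <- "rt party starts"

# Bare pattern: also strips "rt" from inside "party" and "starts"
naive <- gsub("rt", "", sample_text)

# Word-boundary pattern: only strips the standalone retweet marker
bounded <- gsub("\\brt\\b", "", sample_text)

print(naive)
print(bounded)
```

The leading space left behind by the removed marker is handled by the whitespace-trimming steps at the end of the cleaning function.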

Word Cloud

wordcloud(vector, scale=c(6,0.7), max.words=150, 
           random.order=FALSE, rot.per=0.35,colors=brewer.pal(8,"Dark2"))
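
The cloud sizes words by how often they occur, so the same information can be checked numerically. A minimal sketch in base R, assuming a short made-up string in place of the real collapsed tweet text:

```r
# Hypothetical stand-in for the collapsed, cleaned tweet text
text <- "paul walker rip paul awesome"

# Split on spaces and count how often each word occurs
words <- strsplit(text, " +")[[1]]
freq <- sort(table(words), decreasing = TRUE)

# Most frequent words first
print(head(freq, 3))
```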

Sentiment Analysis

pos <- scan("positive.txt",what="character",comment.char=";")
neg <- scan("negative.txt",what="character",comment.char=";")
source("sentiment.R")

analysis <- score.sentiment(cleanText,pos,neg)
table(analysis$score)

## 
##   -5   -3   -2   -1    0    1    2    3    4    5 
##    2    2   27  284 1448  985  184   57    9    2
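
The contents of sentiment.R are not shown here. A common lexicon-based approach, and a plausible reading of the integer scores above, is to count the positive words in a tweet and subtract the number of negative words. The sketch below illustrates that idea only; `score_one`, the short word lists and the sample sentences are all hypothetical and are not taken from sentiment.R.

```r
# Hypothetical stand-ins for the lists read from positive.txt and negative.txt
pos_words <- c("awesome", "amazing", "great", "best")
neg_words <- c("bad", "boring", "worst")

# Score one cleaned tweet: (number of positive words) - (number of negative words)
score_one <- function(sentence, pos_words, neg_words) {
  words <- unlist(strsplit(sentence, "\\s+"))
  sum(words %in% pos_words) - sum(words %in% neg_words)
}

sample_tweets <- c("awesome movie best ever", "boring and bad", "saw it today")
scores <- sapply(sample_tweets, score_one,
                 pos_words = pos_words, neg_words = neg_words)
print(unname(scores))
```

Under this scheme a score of 0 means either no lexicon words were found or the positive and negative words cancelled out, which is worth keeping in mind when reading the large Neutral group.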

neutral <- length(which(analysis$score == 0))
positive <- length(which(analysis$score > 0))
negative <- length(which(analysis$score < 0))
Sentiment <- c("Negative","Neutral","Positive")
Count <- c(negative,neutral,positive)
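# data.frame() stores Count as a column; as.data.frame(Sentiment,Count) would use it as row names
output <- data.frame(Sentiment,Count)
# bar chart of the three counts; stat="identity" plots Count as given
ggplot(output, aes(x=Sentiment, y=Count, fill=Sentiment)) +
      geom_bar(stat="identity") +
      ggtitle("Fast&Furious7 Sentiment Analysis")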
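
The percentage split can be re-derived from the score table printed above. A minimal sketch that re-enters those counts by hand (the variable names are chosen here for illustration):

```r
# Counts copied from the table(analysis$score) output above
score_counts <- c(`-5` = 2, `-3` = 2, `-2` = 27, `-1` = 284, `0` = 1448,
                  `1` = 985, `2` = 184, `3` = 57, `4` = 9, `5` = 2)
score_values <- as.integer(names(score_counts))

# Group the scores into the three sentiment classes
Count <- c(Negative = sum(score_counts[score_values < 0]),
           Neutral  = sum(score_counts[score_values == 0]),
           Positive = sum(score_counts[score_values > 0]))

# Share of each class in percent, rounded to one decimal place
Share <- round(100 * Count / sum(Count), 1)
print(Count)
print(Share)
```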

References

Minqing Hu and Bing Liu. “Mining and Summarizing Customer Reviews.” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.
MiningTwitter: https://sites.google.com/site/miningtwitter/