Initial Proposal

Week ten was an interesting week in which sentiment analysis was introduced. In the week ten discussion I posted about Game of Thrones and sentiment analysis for the season 6 premiere. I feel that we did not cover the topic in depth, so my proposal is the following. The election is over, and unless the Electoral College votes against Donald Trump he will be president. I want to do sentiment analysis using Twitter. My primary goal is to capture the mood of the people during November and December, classify tweets as positive, negative, or neutral, and identify the words behind each classification. I will apply the material we learned, in the spirit of the class.

1 Scrape Twitter for data regarding the election (message, date, maybe geographical location)

2 After cleaning the data, I will use MongoDB to store the information

3 Analysis will be performed by querying MongoDB and using ggplot2

I will use R, MongoDB, Twitter, and R packages (tidyr, dplyr, tm, ggplot2)

Intro

The goal of my final project was to gather information from a social media website, clean and transform the data, and classify it. The topic intrigued me because of its many possible applications. I see this project as a small step toward implementing a sentiment-based market research tool that would include many other social media sites.

Twitter

I start by connecting to the Twitter API. I first tried to connect using library("ROAuth"), but because the API did not validate my access code I searched for a different implementation. What worked for me was direct access authentication with the Twitter API.

#options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
consumerKey = "" 
consumerSecret = ""
accessToken =""
accessTokenSecret=""

#reqURL = "https://api.twitter.com/oauth/request_token" #important at the moment that it is https  Twitter needs a secure connection
#accessURL = "https://api.twitter.com/oauth/access_token"
#authURL = "https://api.twitter.com/oauth/authorize"
#twitCred = OAuthFactory$new(consumerKey=consumerKey,consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)
#twitCred$handshake() 
#registerTwitterOAuth(twitCred)


#setup_twitter_oauth(consumerKey,consumerSecret,accessToken,accessTokenSecret)

Data Acquisition

I used two search methods from the twitteR package over the API connection. The first was searchTwitter and the second was getUser. This allowed me to get two different data sources and apply sentiment analysis to each. In the first method I searched for Trump and downloaded 10,000 tweets. In the second method I used Donald Trump's Twitter handle to collect the tweets from his timeline. I also tried getting his 17.7 M followers, but the direct connection only returned 56 followers. Finally, each data set was exported to a CSV file that was later uploaded to GitHub.

#tweets=searchTwitter("trump", n=10000,lang = "en")
#df = do.call("rbind", lapply(tweets, as.data.frame))

#write.csv(df, "Trump10000Tweets.csv", row.names=FALSE)


#TrumpTwiterAcct <- getUser("realDonaldTrump")
#donaldtweetslist = userTimeline(TrumpTwiterAcct, n=3200, includeRts=TRUE, excludeReplies=TRUE)
#tumpprofiletweetsdf = do.call("rbind", lapply(donaldtweetslist, as.data.frame))
#write.csv(tumpprofiletweetsdf, "realDonaldTrump3200Tweets.csv", row.names=FALSE)

Github

After uploading the data to GitHub I used RCurl to bring it back into my project. This was done to make the example reproducible.

url1 = "https://raw.githubusercontent.com/chrisestevez/DataAnalyticsProjects/master/FinalProject/Trump10000Tweets.csv"
Rdata1 = getURL(url1)
TrumpSearch = read.csv(text = Rdata1,header = TRUE,stringsAsFactors = F,sep=",")
head(TrumpSearch,5)
                                                                                                                                                     text
1                                                      RT @AboveTopSecret: ACLU Threatens Donald Trump Via New York Times Ad #ATS https://t.co/WPc6NC5Ci3
2                                                #Breaking News: A Gang Of Trump Fans Just Viciously Attacked Peaceful Protesters https://t.co/bqqyPz2Vai
3            @Siclittlemonkey As far as I know Trump hasn't done anything illegal. Hillary, on the other hand, should be behind bars for her many crimes.
4     RT @bannerite: #Shameless Donald Trump's sons behind nonprofit selling access to president-elect | Center for Public Integrity https://t.co<U+0085>
5 RT @feistybunnygirl: When u voted for Trump bc he promised to deport all the brown people, but then u realize he's going to cut ur SS &amp; Med<U+0085>
  favorited favoriteCount       replyToSN             created truncated
1     FALSE             0            <NA> 2016-12-20 05:23:19     FALSE
2     FALSE             1            <NA> 2016-12-20 05:23:19     FALSE
3     FALSE             0 Siclittlemonkey 2016-12-20 05:23:19     FALSE
4     FALSE             0            <NA> 2016-12-20 05:23:19     FALSE
5     FALSE             0            <NA> 2016-12-20 05:23:19     FALSE
    replyToSID           id replyToUID
1           NA 8.110795e+17         NA
2           NA 8.110795e+17         NA
3 8.110783e+17 8.110795e+17  429591885
4           NA 8.110795e+17         NA
5           NA 8.110795e+17         NA
                                                                          statusSource
1 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
2                                  <a href="http://ifttt.com" rel="nofollow">IFTTT</a>
3    <a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>
4   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
5 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
       screenName retweetCount isRetweet retweeted longitude latitude
1  OSDlirectioner            2      TRUE     FALSE        NA       NA
2 OccupyDemocrats            0     FALSE     FALSE        NA       NA
3   LibsAreInsane            0     FALSE     FALSE        NA       NA
4       akeithism           69      TRUE     FALSE        NA       NA
5       thelaynee          175      TRUE     FALSE        NA       NA
TrumpSearchText  = as.vector(TrumpSearch$text)

url2 = "https://raw.githubusercontent.com/chrisestevez/DataAnalyticsProjects/master/FinalProject/realDonaldTrump3200Tweets.csv"
Rdata2 = getURL(url2)
TrumpPersonal = read.csv(text = Rdata2,header = TRUE,stringsAsFactors = F,sep=",")
TrumpPersonalText  = as.vector(TrumpPersonal$text)
head(TrumpPersonal,5)
                                                                                                                                                                                                                                                                                                                                                                                                                                                                 text
1                                                                                                                                                                                                                                                                                                                               "@mike_pence: Congratulations to @RealDonaldTrump; officially elected President of the United States today by the Electoral College!"
2                                                                                                                                                                                                                                                                                                                          "@Franklin_Graham: Congratulations to President-elect @realDonaldTrump--the electoral votes are in and it's official." Thank you Franklin!
3 RT @DanScavino: #TrumpTrain<ed><U+00A0><U+00BD><ed><U+00BA><U+0082><ed><U+00A0><U+00BD><ed><U+00B2><U+00A8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8> https://t.co/qAQdBGEwSv
4                                                                                                                                                                                                                                                                                                                        We did it! Thank you to all of my great supporters, we just officially won the election (despite all of the distorted and inaccurate media).
5                                                                                                                                                                                                                                                                                                                        Today there were terror attacks in Turkey, Switzerland and Germany - and it is only getting worse. The civilized world must change thinking!
  favorited favoriteCount replyToSN             created truncated
1     FALSE          3714        NA 2016-12-20 02:50:25     FALSE
2     FALSE          5234        NA 2016-12-20 02:46:01     FALSE
3     FALSE             0        NA 2016-12-20 01:31:21     FALSE
4     FALSE        106442        NA 2016-12-19 23:51:41     FALSE
5     FALSE         58597        NA 2016-12-19 23:21:11     FALSE
  replyToSID           id replyToUID
1         NA 8.110410e+17         NA
2         NA 8.110399e+17         NA
3         NA 8.110211e+17         NA
4         NA 8.109961e+17         NA
5         NA 8.109884e+17         NA
                                                                          statusSource
1   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
2   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
3   <a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>
4 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
5 <a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>
       screenName retweetCount isRetweet retweeted longitude latitude
1 realDonaldTrump         1041     FALSE     FALSE        NA       NA
2 realDonaldTrump         1257     FALSE     FALSE        NA       NA
3 realDonaldTrump         3621      TRUE     FALSE        NA       NA
4 realDonaldTrump        32627     FALSE     FALSE        NA       NA
5 realDonaldTrump        21416     FALSE     FALSE        NA       NA

syuzhet

The sentiment analysis algorithm used here is based on the NRC Word-Emotion Association Lexicon of Saif Mohammad and Peter Turney. It uses a dictionary that associates words with eight different emotions and a negative/positive sentiment. Please see the examples below.

get_nrc_sentiment("Donal Trump is awesome and amazing I'm happy he is running for president")
  anger anticipation disgust fear joy sadness surprise trust negative
1     0            1       0    0   1       0        1     2        0
  positive
1        2
get_nrc_sentiment("I hate Donal Trump he is a liar and deceiving person")
  anger anticipation disgust fear joy sadness surprise trust negative
1     1            0       2    1   0       1        1     1        3
  positive
1        0

10,000 Tweets

In this part of the project I investigate whether there is any pattern in emotion or sentiment in the Twitter community. I begin with the data acquired by searching for the term Trump. The tweets were converted into a vector in order to process the data effectively. I used gsub to remove various unwanted terms. I then applied the sentiment algorithm and merged the results with the original data. After merging, I used tidyr and dplyr to transform the data and ggplot2 to plot it.

head(TrumpSearchText,5)
[1] "RT @AboveTopSecret: ACLU Threatens Donald Trump Via New York Times Ad #ATS https://t.co/WPc6NC5Ci3"                                                   
[2] "#Breaking News: A Gang Of Trump Fans Just Viciously Attacked Peaceful Protesters https://t.co/bqqyPz2Vai"                                             
[3] "@Siclittlemonkey As far as I know Trump hasn't done anything illegal. Hillary, on the other hand, should be behind bars for her many crimes."         
[4] "RT @bannerite: #Shameless Donald Trump's sons behind nonprofit selling access to president-elect | Center for Public Integrity https://t.co<U+0085>"  
[5] "RT @feistybunnygirl: When u voted for Trump bc he promised to deport all the brown people, but then u realize he's going to cut ur SS &amp; Med<U+0085>"
cleanTweet = gsub("rt|RT", "", TrumpSearchText) # remove retweet markers (note: also strips "rt" inside words, e.g. "deport")
cleanTweet = gsub("http\\w+", "", cleanTweet)  # remove the "http"/"https" prefix of links (the match stops at ":")
cleanTweet = gsub("<.*?>", "", cleanTweet) # remove HTML tags
cleanTweet = gsub("@\\w+", "", cleanTweet) # remove @mentions
cleanTweet = gsub("[[:punct:]]", "", cleanTweet) # remove punctuation
cleanTweet = gsub("\r?\n|\r", " ", cleanTweet) # replace line breaks with a space
cleanTweet = gsub("[[:digit:]]", "", cleanTweet) # remove digits
cleanTweet = gsub("㠼|㸵|㤼|㸲|㸱|㸳|㸴|㸶|攼|㹤", "", cleanTweet) # remove mis-encoded characters (encoding debris that renders as CJK glyphs)
cleanTweet = gsub("[ |\t]{2,}", "", cleanTweet) # delete runs of two or more spaces, tabs, or "|" (note: this joins the neighboring words)
cleanTweet = gsub("^ ", "", cleanTweet)  # remove a leading space
cleanTweet = gsub(" $", "", cleanTweet) # remove a trailing space

TrumpSearchSentiment = get_nrc_sentiment(cleanTweet)
head(TrumpSearchSentiment,5)
  anger anticipation disgust fear joy sadness surprise trust negative
1     0            0       0    0   0       0        1     0        0
2     1            1       0    1   1       0        2     1        1
3     1            0       1    1   0       1        1     0        1
4     0            1       1    0   0       0        0     1        1
5     0            0       0    0   0       0        1     0        0
  positive
1        0
2        1
3        0
4        2
5        0
TrumpSearchFinalData = cbind(TrumpSearch,TrumpSearchSentiment)

plotData1 =gather(TrumpSearchFinalData,"sentiment","values",17:24)  %>% 
  group_by( sentiment) %>%
  summarise(Total = sum(values))

# Columns are referenced by name inside aes(); plotData1$... is not needed
ggplot(data = plotData1, aes(x = sentiment, y = Total)) +
  geom_bar(aes(fill = sentiment), stat = "identity") +
  theme(legend.position = "none") +
  xlab("Emotions") + ylab("Total") + ggtitle("Emotion for Search Term Trump") +
  geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)

plotData2 =gather(TrumpSearchFinalData,"Polarity","values",25:26)  %>% 
  group_by( Polarity) %>%
  summarise(Total = sum(values))

# Columns are referenced by name inside aes(); plotData2$... is not needed
ggplot(data = plotData2, aes(x = Polarity, y = Total)) +
  geom_bar(aes(fill = Polarity), stat = "identity") +
  theme(legend.position = "none") +
  xlab("Sentiment") + ylab("Total") + ggtitle("Sentiment for Search Term Trump") +
  geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)

Sentiment @realDonaldTrump

In this section I focus on Donald Trump's personal Twitter handle. The data set includes retweets and ranges from 2/2016 to 12/2016. I also try to make sense of the emotions and sentiment by plotting the data monthly.

head( TrumpPersonalText,5)
[1] "\"@mike_pence: Congratulations to @RealDonaldTrump; officially elected President of the United States today by the Electoral College!\""                                                                                                                                                                                                                                                                                                                            
[2] "\"@Franklin_Graham: Congratulations to President-elect @realDonaldTrump--the electoral votes are in and it's official.\" Thank you Franklin!"                                                                                                                                                                                                                                                                                                                       
[3] "RT @DanScavino: #TrumpTrain<ed><U+00A0><U+00BD><ed><U+00BA><U+0082><ed><U+00A0><U+00BD><ed><U+00B2><U+00A8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8><ed><U+00A0><U+00BC><ed><U+00B7><U+00BA><ed><U+00A0><U+00BC><ed><U+00B7><U+00B8> https://t.co/qAQdBGEwSv"
[4] "We did it! Thank you to all of my great supporters, we just officially won the election (despite all of the distorted and inaccurate media)."                                                                                                                                                                                                                                                                                                                       
[5] "Today there were terror attacks in Turkey, Switzerland and Germany - and it is only getting worse. The civilized world must change thinking!"
cleanTweetp = gsub("rt|RT", "", TrumpPersonalText) # remove retweet markers (note: also strips "rt" inside words, e.g. "supporters")
cleanTweetp = gsub("http\\w+", "", cleanTweetp)  # remove the "http"/"https" prefix of links (the match stops at ":")
cleanTweetp = gsub("<.*?>", "", cleanTweetp) # remove HTML tags
cleanTweetp = gsub("@\\w+", "", cleanTweetp) # remove @mentions
cleanTweetp = gsub("[[:punct:]]", "", cleanTweetp) # remove punctuation
cleanTweetp = gsub("\r?\n|\r", " ", cleanTweetp) # replace line breaks with a space
cleanTweetp = gsub("[[:digit:]]", "", cleanTweetp) # remove digits
cleanTweetp = gsub("㠼|㸵|㤼|㸲|㸱|㸳|㸴|㸶|攼|㹤", "", cleanTweetp) # remove mis-encoded characters (encoding debris that renders as CJK glyphs)
cleanTweetp = gsub("[ |\t]{2,}", "", cleanTweetp) # delete runs of two or more spaces, tabs, or "|" (note: this joins the neighboring words)
cleanTweetp = gsub("^ ", "", cleanTweetp)  # remove a leading space
cleanTweetp = gsub(" $", "", cleanTweetp) # remove a trailing space

TrumpPersonalSentiment = get_nrc_sentiment(cleanTweetp)
head(TrumpPersonalSentiment,5)
  anger anticipation disgust fear joy sadness surprise trust negative
1     0            0       0    0   0       0        0     2        0
2     0            0       0    0   0       0        0     1        0
3     0            0       0    0   0       0        0     0        0
4     0            0       0    0   0       0        0     0        1
5     0            0       0    3   1       1        0     1        2
  positive
1        2
2        0
3        0
4        0
5        1
TrumpPersonalFinalData = cbind(TrumpPersonal,TrumpPersonalSentiment)

plotData3 =gather(TrumpPersonalFinalData,"sentiment","values",17:24)  %>% 
  group_by( sentiment) %>%
  summarise(Total = sum(values))

# Columns are referenced by name inside aes(); the x axis shows emotions, not polarity
ggplot(data = plotData3, aes(x = sentiment, y = Total)) +
  geom_bar(aes(fill = sentiment), stat = "identity") +
  theme(legend.position = "none") +
  xlab("Emotions") + ylab("Total") + ggtitle("Emotions for @realDonaldTrump") +
  geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)

plotData4 =gather(TrumpPersonalFinalData,"Polarity","values",25:26)  %>% 
  group_by( Polarity) %>%
  summarise(Total = sum(values))

# Columns are referenced by name inside aes(); plotData4$... is not needed
ggplot(data = plotData4, aes(x = Polarity, y = Total)) +
  geom_bar(aes(fill = Polarity), stat = "identity") +
  theme(legend.position = "none") +
  xlab("Sentiment") + ylab("Total") + ggtitle("Sentiment for @realDonaldTrump") +
  geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)

plotData5 = select(TrumpPersonalFinalData,created,17:24)
 plotData5 = separate(plotData5,created,c("date","Time")," ") %>%
  group_by(date)%>%
   summarise(Anger=sum(anger), Anticipation=sum(anticipation), Disgust=sum(disgust), Fear=sum(fear), Joy=sum(joy), Sadness=sum(sadness), Surprise=sum(surprise), Trust=sum(trust))
 
 plotData5$date = as.Date(plotData5$date,"%Y-%m-%d") 

 plotData5$date <- as.Date(cut(plotData5$date, breaks = "month"))
 
  plotData5 = gather(plotData5,"sentiment","values",2:9)%>%
        group_by(date,sentiment)%>%
    summarise(Total=sum(values))
  
# stat is not an aesthetic, so it is dropped from aes(); columns are referenced by name
ggplot(data = plotData5, aes(x = date, y = Total, group = sentiment)) +
  geom_line(aes(color = sentiment), size = 2.5, alpha = 0.7) +
  geom_point(size = 0.5) +
  theme(legend.title = element_blank(), axis.title.x = element_blank()) +
  ylab("Total") +
  ggtitle("Emotions of @realDonaldTrump 2/2016-12/2016") +
  scale_y_continuous(limits = c(0, 300))

plotData6 =gather(TrumpPersonalFinalData,"Polarity","values",25:26)  %>% 
  group_by( created,Polarity) %>%
  summarise(Total = sum(values))
 plotData6 = separate(plotData6,created,c("date","Time")," ")
 plotData6$date = as.Date(plotData6$date,"%Y-%m-%d") 
 plotData6$date <- as.Date(cut(plotData6$date, breaks = "month"))


  plotData6 = select(plotData6,date,Polarity,Total)%>%
    group_by(date,Polarity)%>%
    summarise(Total = sum(Total))

# stat is not an aesthetic, so it is dropped from aes(); columns are referenced by name
ggplot(data = plotData6, aes(x = date, y = Total, group = Polarity)) +
  geom_line(aes(color = Polarity), size = 2.5, alpha = 0.7) +
  geom_point(size = 0.5) +
  theme(legend.title = element_blank(), axis.title.x = element_blank()) +
  ylab("Total") +
  ggtitle("Sentiment of @realDonaldTrump 2/2016-12/2016") +
  scale_y_continuous(limits = c(0, 500))
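
The monthly plots rely on mapping each timestamp to the first day of its month with cut(..., breaks = "month") and then summing per month. The following is a small base R sketch of that step on made-up values. The toy data frame is invented for illustration, and tapply stands in for the dplyr group_by/summarise calls used in the notebook.

```r
# Made-up timestamps and trust counts, in the same format as the `created` column
toy <- data.frame(
  created = c("2016-11-03 10:15:00", "2016-11-28 23:59:59",
              "2016-12-01 00:00:01", "2016-12-19 23:51:41"),
  trust   = c(2, 1, 0, 3),
  stringsAsFactors = FALSE
)

# Keep the date part, then snap every date to the first day of its month
toy$date  <- as.Date(substr(toy$created, 1, 10), "%Y-%m-%d")
toy$month <- as.Date(cut(toy$date, breaks = "month"))

# Sum the counts per month
monthly <- tapply(toy$trust, as.character(toy$month), sum)
print(monthly)
```

Because every tweet in a month shares the same bucket date, the line plots show one point per emotion per month.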

Wordcloud @realDonaldTrump

vector = TrumpPersonal$text
Corpus <- Corpus(VectorSource(vector))
Corpus = tm_map(Corpus,removeNumbers)
Corpus = tm_map(Corpus,str_replace_all,pattern = "http\\w+", replacement =" ")
Corpus = tm_map(Corpus,str_replace_all,pattern = "<.*?>", replacement =" ")
Corpus = tm_map(Corpus,str_replace_all,pattern = "@\\w+", replacement =" ")
Corpus = tm_map(Corpus,str_replace_all,pattern ="\\=", replacement =" ")
Corpus = tm_map(Corpus,str_replace_all,pattern = "[[:punct:]]", replacement =" ")
Corpus = tm_map(Corpus,str_replace_all,pattern = "amp", replacement =" ")
Corpus = tm_map(Corpus,removeWords, words= stopwords("en"))
Corpus = tm_map(Corpus,tolower)
Corpus = tm_map(Corpus,stripWhitespace)
Corpus = tm_map(Corpus, PlainTextDocument)

tdm = TermDocumentMatrix(Corpus)
tdm
<<TermDocumentMatrix (terms: 6678, documents: 3195)>>
Non-/sparse entries: 31745/21304465
Sparsity           : 100%
Maximal term length: 28
Weighting          : term frequency (tf)
wordcloud(words = Corpus, 
          max.words=200, random.order=FALSE, rot.per=0.35, 
          colors=brewer.pal(8, "Dark2"))
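
The word cloud sizes each word by how often it occurs across the tweets. The following base R sketch shows that count on made-up text. The docs vector and the three-word stop word list are invented for illustration; the notebook itself uses stopwords("en") from tm.

```r
# Made-up documents standing in for the tweet text
docs <- c("Thank you Ohio thank you",
          "Make America great again",
          "Thank you America")
stop_words <- c("you", "the", "a")  # tiny illustrative stop word list

# Lowercase first, then split into words and drop the stop words
words <- unlist(strsplit(tolower(docs), "\\s+"))
words <- words[!words %in% stop_words]

# Term frequencies, most frequent first
freq <- sort(table(words), decreasing = TRUE)
print(freq)
```

The sketch lowercases before removing stop words. In the pipeline above removeWords runs before tolower, so capitalized stop words such as "The" probably survive the filter; swapping those two steps may be worth trying.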

Failed Attempt

# 
# 
# #Data Manipulation and Algorithm Implementation
# #tweets.text = laply(tweets, function(t)t$getText())
# 
# #now if you haven’t download the documents that Michael mention on his video, you definitely need to do it now. Remember to save them in the same folder that your R code
# 
# score.sentiment = function(sentences, positiveWords, negativeWords, .progress='none')
# {
# require(plyr)
# require(stringr)
# 
# # we got a vector of sentences. plyr will handle a list or a vector as an “l” for us
# # we want a simple array of scores back, so we use “l” + “a” + “ply” = laply:
# scores = laply(sentences, function(sentence, positiveWords, negativeWords) {
# 
# # clean up sentences with R’s regex-driven global substitute, gsub():
# sentence = gsub('[[:punct:]]', '', sentence)
# sentence = gsub('[[:cntrl:]]', '', sentence)
# sentence = gsub('\\d+', '', sentence)
# # and convert to lower case:
# sentence = tolower(sentence)
# 
# # split into words. str_split is in the stringr package
# word.list = str_split(sentence, '\\s+')
# # sometimes a list() is one level of hierarchy too much
# words = unlist(word.list)
# 
# # compare our words to the dictionaries of positive & negative terms
# pos.matches = match(words, positiveWords)
# neg.matches = match(words, negativeWords)
# 
# # match() returns the position of the matched term or NA
# # we just want a TRUE/FALSE:
# pos.matches = !is.na(pos.matches)
# neg.matches = !is.na(neg.matches)
# 
# # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
# score = sum(pos.matches) - sum(neg.matches)
# 
# return(score)
# }, positiveWords,negativeWords, .progress=.progress )
# 
# scores.df = data.frame(score=scores, text=sentences)
# return(scores.df)
# }
# 
# #this positive and negative words are related to abortion
# positiveWords = scan('positive.txt',what = 'character', comment.char = ';')
# negativeWords = scan('negative.txt',what = 'character', comment.char = ';')
# 
# #Analyse the results
# 
# analysis = score.sentiment(mydf, pos.words, neg.words,.progress='none')
#  table(analysis$score)
#  mean(analysis$score)
#  median(analysis$score)
#  hist(analysis$score)
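
The commented-out attempt above scores each sentence as the number of positive-word matches minus the number of negative-word matches. One thing visible in the listing is that the word lists are read into positiveWords and negativeWords, while the final call passes pos.words and neg.words, which are never defined. Below is a dependency-free sketch of the same scoring idea. The function name score_sentiment and the tiny word lists are made up for illustration and stand in for positive.txt and negative.txt.

```r
# Score = (# positive words) - (# negative words) for each sentence
score_sentiment <- function(sentences, positive_words, negative_words) {
  scores <- vapply(sentences, function(sentence) {
    # strip punctuation, control characters and digits, then lowercase
    sentence <- gsub("[[:punct:]]|[[:cntrl:]]|\\d+", "", sentence)
    words <- unlist(strsplit(tolower(sentence), "\\s+"))
    sum(words %in% positive_words) - sum(words %in% negative_words)
  }, numeric(1), USE.NAMES = FALSE)
  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Illustrative word lists (assumptions, not the files used in class)
pos <- c("great", "happy", "win")
neg <- c("bad", "liar", "worse")

result <- score_sentiment(
  c("What a great, happy day", "It is only getting worse", "No opinion"),
  pos, neg
)
print(result)
```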

Conclusion

Sentiment analysis can be applied to many topics. It was interesting to see that he conveyed a mostly positive message on his Twitter handle, which overshadowed the negativity. Also, toward November the trust emotion was very high, indicating support and self-confidence. When the term Trump was searched, surprise seemed to be the overwhelming emotion.

---
title: "607 Final Project - Sentiment Analysis Twitter "
output: 
  html_notebook:
    theme: cosmo
    toc: true
    toc_float: true
    code_folding: show
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library("tidyr")
library("dplyr")
library("tm")
library("RCurl")
#library("ROAuth")
library("twitteR")
library("stringr")
library("syuzhet")
library("lubridate")
library("ggplot2")
library("wordcloud")
library("RColorBrewer")
```
#Initial Proposal
Week ten was an interesting week in which sentiment analysis was introduced. In the week ten discussion I posted about Game of Thrones and sentiment analysis for the season 6 premiere. I feel that we did not cover the topic in depth, so my proposal is the following.
The election is over, and unless the Electoral College votes against Donald Trump he will be president. I want to do sentiment analysis using Twitter. My primary goal is to capture the mood of the people during November and December, classify tweets as positive, negative, or neutral, and identify the words behind each classification. I will apply the material we learned, in the spirit of the class.

1	Scrape Twitter for data regarding the election (message, date, maybe geographical location) 

2	After cleaning the data, I will use MongoDB to store the information 

3	Analysis will be performed by querying MongoDB and using ggplot2 

I will use R, MongoDB, Twitter, and R packages (tidyr, dplyr, tm, ggplot2)


#Intro
The goal of my final project was to gather information from a social media website, clean and transform the data, and classify it. The topic intrigued me because of its many possible applications. I see this project as a small step toward implementing a sentiment-based market research tool that would include many other social media sites.



#Twitter
I start by connecting to the Twitter API. I first tried to connect using library("ROAuth"), but because the API did not validate my access code I searched for a different implementation. What worked for me was direct access authentication with the Twitter API.
```{r}



#options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
consumerKey = "" 
consumerSecret = ""
accessToken =""
accessTokenSecret=""

#reqURL = "https://api.twitter.com/oauth/request_token" #important at the moment that it is https  Twitter needs a secure connection
#accessURL = "https://api.twitter.com/oauth/access_token"
#authURL = "https://api.twitter.com/oauth/authorize"
#twitCred = OAuthFactory$new(consumerKey=consumerKey,consumerSecret=consumerSecret,requestURL=reqURL,accessURL=accessURL,authURL=authURL)
#twitCred$handshake() 
#registerTwitterOAuth(twitCred)


#setup_twitter_oauth(consumerKey,consumerSecret,accessToken,accessTokenSecret)
```
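
Hard-coding keys in a notebook makes it easy to publish them by accident. Below is a minimal sketch, assuming the four credentials are stored in environment variables. The variable names and the helper read_twitter_credentials are hypothetical and not part of twitteR.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables instead of typing them into the notebook.
# The environment variable names are assumptions made for this sketch.
read_twitter_credentials <- function() {
  vars <- c(consumerKey       = "TWITTER_CONSUMER_KEY",
            consumerSecret    = "TWITTER_CONSUMER_SECRET",
            accessToken       = "TWITTER_ACCESS_TOKEN",
            accessTokenSecret = "TWITTER_ACCESS_TOKEN_SECRET")
  creds <- Sys.getenv(vars, unset = NA)
  names(creds) <- names(vars)
  missing <- vars[is.na(creds) | creds == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(creds)
}

# Usage (not run here: it needs valid keys and the twitteR package):
# creds <- read_twitter_credentials()
# setup_twitter_oauth(creds$consumerKey, creds$consumerSecret,
#                     creds$accessToken, creds$accessTokenSecret)
```

The helper stops with a clear message when a variable is unset, which is easier to diagnose than a failed handshake.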

#Data Acquisition
I used two search methods from the twitteR package over the API connection. The first was searchTwitter and the second was getUser. This allowed me to get two different data sources and apply sentiment analysis to each. In the first method I searched for Trump and downloaded 10,000 tweets. In the second method I used Donald Trump's Twitter handle to collect the tweets from his timeline. I also tried getting his 17.7 M followers, but the direct connection only returned 56 followers. Finally, each data set was exported to a CSV file that was later uploaded to GitHub.
```{r}

#tweets=searchTwitter("trump", n=10000,lang = "en")
#df = do.call("rbind", lapply(tweets, as.data.frame))

#write.csv(df, "Trump10000Tweets.csv", row.names=FALSE)


#TrumpTwiterAcct <- getUser("realDonaldTrump")
#donaldtweetslist = userTimeline(TrumpTwiterAcct, n=3200, includeRts=TRUE, excludeReplies=TRUE)
#tumpprofiletweetsdf = do.call("rbind", lapply(donaldtweetslist, as.data.frame))
#write.csv(tumpprofiletweetsdf, "realDonaldTrump3200Tweets.csv", row.names=FALSE)

```
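
The do.call("rbind", lapply(...)) idiom used above turns a list of records into one data frame with one row per record. Here is a small sketch on made-up records. The fake_tweets list and its two fields are invented for illustration; real twitteR status objects carry many more fields.

```r
# Made-up records standing in for the list returned by searchTwitter()
fake_tweets <- list(
  list(text = "first tweet",  retweetCount = 2L),
  list(text = "second tweet", retweetCount = 0L),
  list(text = "third tweet",  retweetCount = 7L)
)

# Turn each record into a one-row data frame, then stack the rows
rows <- lapply(fake_tweets, function(t) as.data.frame(t, stringsAsFactors = FALSE))
df <- do.call("rbind", rows)
print(df)
```

twitteR also provides a twListToDF() helper intended for this conversion, which may be a shorter alternative.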


#Github
After uploading the data to GitHub I used RCurl to bring it back into my project. This was done to make the example reproducible.
```{r}
url1 = "https://raw.githubusercontent.com/chrisestevez/DataAnalyticsProjects/master/FinalProject/Trump10000Tweets.csv"
Rdata1 = getURL(url1)
TrumpSearch = read.csv(text = Rdata1,header = TRUE,stringsAsFactors = F,sep=",")
head(TrumpSearch,5)

TrumpSearchText  = as.vector(TrumpSearch$text)

url2 = "https://raw.githubusercontent.com/chrisestevez/DataAnalyticsProjects/master/FinalProject/realDonaldTrump3200Tweets.csv"
Rdata2 = getURL(url2)
TrumpPersonal = read.csv(text = Rdata2,header = TRUE,stringsAsFactors = F,sep=",")
TrumpPersonalText  = as.vector(TrumpPersonal$text)
head(TrumpPersonal,5)

```
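
The round trip above relies on `read.csv(text = ...)`, which parses CSV content held in a character string instead of a file on disk. A minimal sketch of that step without a network call, assuming a made-up two-row CSV string (`csvText`, `demo` and their contents are hypothetical and not the project data):

```r
# Hypothetical CSV content standing in for the single string that getURL() returns
csvText = "text,created\n\"Hello, world\",2016-11-08 10:00:00\nSecond tweet,2016-12-01 09:30:00"

# text = parses the string directly; stringsAsFactors = FALSE keeps tweets as character
demo = read.csv(text = csvText, header = TRUE, stringsAsFactors = FALSE)

str(demo)
demoText = as.vector(demo$text)
```

Quoted fields keep their embedded commas, which matters for tweet text.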

# syuzhet
The sentiment analysis algorithm used here is based on the Word-Emotion Association Lexicon of Saif Mohammad and Peter Turney. It uses a dictionary that associates words with eight different emotions and a negative/positive sentiment. Please see the examples below.
```{r}
get_nrc_sentiment("Donald Trump is awesome and amazing I'm happy he is running for president")

get_nrc_sentiment("I hate Donald Trump he is a liar and deceiving person")
```
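
To make the dictionary idea concrete, here is a minimal base R sketch of lexicon-based scoring. It is not how syuzhet is implemented; the tiny lexicon `toyLexicon`, its word-to-label assignments, and the helper `score_sentence` are all hypothetical and only illustrate the lookup-and-count principle.

```r
# Hypothetical mini lexicon: each word maps to one or more emotion/sentiment labels
toyLexicon = list(
  awesome = c("positive", "joy"),
  happy   = c("positive", "joy", "trust"),
  hate    = c("negative", "anger", "disgust"),
  liar    = c("negative", "disgust")
)

# Count how many times each label is triggered by the words of one sentence
score_sentence = function(sentence, lexicon) {
  words  = unlist(strsplit(tolower(sentence), "[^a-z']+"))
  hits   = words[words %in% names(lexicon)]
  labels = unlist(lexicon[hits], use.names = FALSE)
  # fixed levels so that labels with no match still show up as 0
  table(factor(labels, levels = sort(unique(unlist(lexicon)))))
}

score_sentence("I am happy, this is awesome", toyLexicon)
score_sentence("I hate that liar", toyLexicon)
```

Words that are not in the lexicon contribute nothing, so the counts depend entirely on the coverage of the dictionary.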

# 10,000 Tweets
In this part of the project I investigate whether there is any pattern in emotion or sentiment in the Twitter community. I begin with the data that was acquired by searching for the term Trump. The tweets were converted into a vector in order to process the data effectively. I used gsub to remove various unwanted terms. I then applied the sentiment algorithm and merged the results with the original data. After merging the data I used dplyr and tidyr to transform the data and ggplot2 to plot it.
```{r}
head(TrumpSearchText,5)

cleanTweet = gsub("\\b(rt|RT)\\b", "", TrumpSearchText) # remove the standalone retweet marker, not "rt" inside words
cleanTweet = gsub("http\\S+", "", cleanTweet)  # remove links (the whole URL, not only "https")
cleanTweet = gsub("<.*?>", "", cleanTweet) # remove html tags
cleanTweet = gsub("@\\w+", "", cleanTweet) # remove @mentions
cleanTweet = gsub("[[:punct:]]", "", cleanTweet) # remove punctuation
cleanTweet = gsub("\r?\n|\r", " ", cleanTweet) # replace line breaks with a space
cleanTweet = gsub("[[:digit:]]", "", cleanTweet) # remove numbers/digits
cleanTweet = gsub("㠼|㸵|㤼|㸲|㸱|㸳|㸴|㸶|攼|㹤", "", cleanTweet) # remove stray CJK characters
cleanTweet = gsub("[ \t]{2,}", " ", cleanTweet) # collapse repeated spaces/tabs into one space
cleanTweet = gsub("^ ", "", cleanTweet)  # remove blank space at the beginning
cleanTweet = gsub(" $", "", cleanTweet) # remove blank space at the end

TrumpSearchSentiment = get_nrc_sentiment(cleanTweet)
head(TrumpSearchSentiment,5)
TrumpSearchFinalData = cbind(TrumpSearch,TrumpSearchSentiment)

plotData1 =gather(TrumpSearchFinalData,"sentiment","values",17:24)  %>% 
  group_by( sentiment) %>%
  summarise(Total = sum(values))

ggplot(data = plotData1, aes(x = sentiment, y = Total)) +
        geom_bar(aes(fill = sentiment), stat = "identity") +
        theme(legend.position = "none") +
        xlab("Emotions") + ylab("Total") + ggtitle("Emotion for Search Term Trump") +
        geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)

plotData2 =gather(TrumpSearchFinalData,"Polarity","values",25:26)  %>% 
  group_by( Polarity) %>%
  summarise(Total = sum(values))

ggplot(data = plotData2, aes(x = Polarity, y = Total)) +
        geom_bar(aes(fill = Polarity), stat = "identity") +
        theme(legend.position = "none") +
        xlab("Sentiment") + ylab("Total") + ggtitle("Sentiment for Search Term Trump") +
        geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)

```
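
The cleaning steps described above can be bundled into one function so that the same rules are applied to every data set. A minimal sketch in base R, assuming a hypothetical helper name `clean_tweet` and a made-up sample tweet; it matches whole URLs and only a standalone RT marker, so its patterns may differ slightly from those in the chunk above.

```r
# Hypothetical helper bundling the cleaning steps; order matters because
# links and @mentions must be removed before punctuation is stripped
clean_tweet = function(x) {
  x = gsub("http\\S+", " ", x)        # links (the whole URL)
  x = gsub("@\\w+", " ", x)           # @mentions
  x = gsub("\\b(rt|RT)\\b", " ", x)   # standalone retweet marker only
  x = gsub("[[:punct:]]", " ", x)     # punctuation
  x = gsub("[[:digit:]]", " ", x)     # digits
  x = gsub("[[:space:]]+", " ", x)    # collapse spaces, tabs and line breaks
  trimws(x)                           # drop leading and trailing blanks
}

clean_tweet("RT @someone: Great party in 2016!\nhttps://t.co/abc123")
# returns "Great party in"
```

Replacing matches with a space and collapsing whitespace at the end avoids gluing neighbouring words together.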

# Sentiment @realDonaldTrump
In this section I focus on Donald Trump's personal Twitter handle. The data set includes retweets and ranges from 2/2016 to 12/2016. I also try to make sense of the emotions and sentiment by plotting the data monthly.
```{r}
head( TrumpPersonalText,5)
cleanTweetp = gsub("\\b(rt|RT)\\b", "", TrumpPersonalText) # remove the standalone retweet marker, not "rt" inside words
cleanTweetp = gsub("http\\S+", "", cleanTweetp)  # remove links (the whole URL, not only "https")
cleanTweetp = gsub("<.*?>", "", cleanTweetp) # remove html tags
cleanTweetp = gsub("@\\w+", "", cleanTweetp) # remove @mentions
cleanTweetp = gsub("[[:punct:]]", "", cleanTweetp) # remove punctuation
cleanTweetp = gsub("\r?\n|\r", " ", cleanTweetp) # replace line breaks with a space
cleanTweetp = gsub("[[:digit:]]", "", cleanTweetp) # remove numbers/digits
cleanTweetp = gsub("㠼|㸵|㤼|㸲|㸱|㸳|㸴|㸶|攼|㹤", "", cleanTweetp) # remove stray CJK characters
cleanTweetp = gsub("[ \t]{2,}", " ", cleanTweetp) # collapse repeated spaces/tabs into one space
cleanTweetp = gsub("^ ", "", cleanTweetp)  # remove blank space at the beginning
cleanTweetp = gsub(" $", "", cleanTweetp) # remove blank space at the end

TrumpPersonalSentiment = get_nrc_sentiment(cleanTweetp)
head(TrumpPersonalSentiment,5)
TrumpPersonalFinalData = cbind(TrumpPersonal,TrumpPersonalSentiment)

plotData3 =gather(TrumpPersonalFinalData,"sentiment","values",17:24)  %>% 
  group_by( sentiment) %>%
  summarise(Total = sum(values))

ggplot(data = plotData3, aes(x = sentiment, y = Total)) +
        geom_bar(aes(fill = sentiment), stat = "identity") +
        theme(legend.position = "none") +
        xlab("Emotions") + ylab("Total") + ggtitle("Emotions for @realDonaldTrump") +
        geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)

plotData4 =gather(TrumpPersonalFinalData,"Polarity","values",25:26)  %>% 
  group_by( Polarity) %>%
  summarise(Total = sum(values))

ggplot(data = plotData4, aes(x = Polarity, y = Total)) +
        geom_bar(aes(fill = Polarity), stat = "identity") +
        theme(legend.position = "none") +
        xlab("Sentiment") + ylab("Total") + ggtitle("Sentiment for @realDonaldTrump") +
        geom_text(aes(label = Total), position = position_dodge(width = 0.75), vjust = -0.25)


plotData5 = select(TrumpPersonalFinalData,created,17:24)
 plotData5 = separate(plotData5,created,c("date","Time")," ") %>%
  group_by(date)%>%
   summarise(Anger=sum(anger), Anticipation=sum(anticipation), Disgust=sum(disgust), Fear=sum(fear), Joy=sum(joy), Sadness=sum(sadness), Surprise=sum(surprise), Trust=sum(trust))
 
 plotData5$date = as.Date(plotData5$date,"%Y-%m-%d") 

 plotData5$date <- as.Date(cut(plotData5$date, breaks = "month"))
 
  plotData5 = gather(plotData5,"sentiment","values",2:9)%>%
        group_by(date,sentiment)%>%
    summarise(Total=sum(values))
  
ggplot(data = plotData5, aes(x = date, y = Total, group = sentiment)) +
        geom_line(size = 2.5, alpha = 0.7, aes(color = sentiment)) + # stat is not an aesthetic, so it is dropped from aes()
        geom_point(size = 0.5) +
        #ylim(0, 0.6) +
        theme(legend.title=element_blank(), axis.title.x = element_blank()) +
        ylab("Total") + 
        ggtitle("Emotions of @realDonaldTrump 2/2016-12/2016") +
        scale_y_continuous(limits=c(0,300)) 


plotData6 =gather(TrumpPersonalFinalData,"Polarity","values",25:26)  %>% 
  group_by( created,Polarity) %>%
  summarise(Total = sum(values))
 plotData6 = separate(plotData6,created,c("date","Time")," ")
 plotData6$date = as.Date(plotData6$date,"%Y-%m-%d") 
 plotData6$date <- as.Date(cut(plotData6$date, breaks = "month"))


  plotData6 = select(plotData6,date,Polarity,Total)%>%
    group_by(date,Polarity)%>%
    summarise(Total = sum(Total))

  ggplot(data = plotData6, aes(x = date, y = Total, group = Polarity)) +
        geom_line(size = 2.5, alpha = 0.7, aes(color = Polarity)) + # stat is not an aesthetic, so it is dropped from aes()
        geom_point(size = 0.5) +
        #ylim(0, 0.6) +
        theme(legend.title=element_blank(), axis.title.x = element_blank()) +
        ylab("Total") + 
        ggtitle("Sentiment of @realDonaldTrump 2/2016-12/2016") +
        scale_y_continuous(limits=c(0,500)) 
  
  
```
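
The monthly plots depend on mapping every tweet date to the first day of its month and then summing within each month. A minimal base R sketch of that step, assuming a made-up data frame `scores` with a single emotion column (the dates and counts are hypothetical):

```r
# Hypothetical per-tweet scores; dates and counts are made up for illustration
scores = data.frame(
  created = c("2016-11-03 08:15:00", "2016-11-20 21:40:00", "2016-12-05 12:00:00"),
  trust   = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Keep the date part, then map every date to the first day of its month
scores$date  = as.Date(substr(scores$created, 1, 10), "%Y-%m-%d")
scores$month = as.Date(cut(scores$date, breaks = "month"))

# Sum the emotion counts within each month
monthly = aggregate(trust ~ month, data = scores, FUN = sum)
monthly
```

Because `cut()` labels each bucket with the first day of the month, the x axis of a monthly plot shows one point per month regardless of how many tweets fall in it.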

# Wordcloud @realDonaldTrump
```{r}
tweetText = TrumpPersonal$text
# content_transformer() lets plain string functions be used with tm_map()
toSpace = content_transformer(function(x, pattern) str_replace_all(x, pattern, " "))

tweetCorpus = Corpus(VectorSource(tweetText))
tweetCorpus = tm_map(tweetCorpus, removeNumbers)
tweetCorpus = tm_map(tweetCorpus, toSpace, "http\\S+")     # links
tweetCorpus = tm_map(tweetCorpus, toSpace, "<.*?>")        # html tags
tweetCorpus = tm_map(tweetCorpus, toSpace, "@\\w+")        # @mentions
tweetCorpus = tm_map(tweetCorpus, toSpace, "\\=")          # equal signs
tweetCorpus = tm_map(tweetCorpus, toSpace, "[[:punct:]]")  # punctuation
tweetCorpus = tm_map(tweetCorpus, toSpace, "\\bamp\\b")    # leftover of "&amp;", whole word only
tweetCorpus = tm_map(tweetCorpus, content_transformer(tolower))
tweetCorpus = tm_map(tweetCorpus, removeWords, words = stopwords("en")) # after lowercasing so capitalized stopwords are caught
tweetCorpus = tm_map(tweetCorpus, stripWhitespace)

tdm = TermDocumentMatrix(tweetCorpus)
tdm
wordcloud(words = tweetCorpus,
          max.words=200, random.order=FALSE, rot.per=0.35,
          colors=brewer.pal(8, "Dark2"))

```


# Failed Attempt
```{r}


# 
# 
# #Data Manipulation and Algorithm Implementation
# #tweets.text = laply(tweets, function(t)t$getText())
# 
# #now if you haven’t download the documents that Michael mention on his video, you definitely need to do it now. Remember to save them in the same folder that your R code
# 
# score.sentiment = function(sentences, positiveWords, negativeWords, .progress='none')
# {
# require(plyr)
# require(stringr)
# 
# # we got a vector of sentences. plyr will handle a list or a vector as an “l” for us
# # we want a simple array of scores back, so we use “l” + “a” + “ply” = laply:
# scores = laply(sentences, function(sentence, positiveWords, negativeWords) {
# 
# # clean up sentences with R’s regex-driven global substitute, gsub():
# sentence = gsub('[[:punct:]]', '', sentence)
# sentence = gsub('[[:cntrl:]]', '', sentence)
# sentence = gsub('\\d+', '', sentence)
# # and convert to lower case:
# sentence = tolower(sentence)
# 
# # split into words. str_split is in the stringr package
# word.list = str_split(sentence, '\\s+')
# # sometimes a list() is one level of hierarchy too much
# words = unlist(word.list)
# 
# # compare our words to the dictionaries of positive & negative terms
# pos.matches = match(words, positiveWords)
# neg.matches = match(words, negativeWords)
# 
# # match() returns the position of the matched term or NA
# # we just want a TRUE/FALSE:
# pos.matches = !is.na(pos.matches)
# neg.matches = !is.na(neg.matches)
# 
# # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
# score = sum(pos.matches) - sum(neg.matches)
# 
# return(score)
# }, positiveWords,negativeWords, .progress=.progress )
# 
# scores.df = data.frame(score=scores, text=sentences)
# return(scores.df)
# }
# 
# #this positive and negative words are related to abortion
# positiveWords = scan('positive.txt',what = 'character', comment.char = ';')
# negativeWords = scan('negative.txt',what = 'character', comment.char = ';')
# 
# #Analyse the results
# 
# analysis = score.sentiment(mydf, pos.words, neg.words,.progress='none')
#  table(analysis$score)
#  mean(analysis$score)
#  median(analysis$score)
#  hist(analysis$score)




```
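
The commented-out function above scores a sentence as the number of positive word matches minus the number of negative word matches. A minimal base R sketch of that idea without plyr or stringr, assuming tiny made-up word lists in place of `positive.txt` and `negative.txt` (the helper name `polarity_score` and the sample sentences are hypothetical):

```r
# Hypothetical word lists; the commented code reads them from positive.txt / negative.txt
positiveWords = c("great", "good", "win")
negativeWords = c("bad", "sad", "lose")

# Score = number of positive matches minus number of negative matches
polarity_score = function(sentence, positiveWords, negativeWords) {
  # strip punctuation, control characters and digits, then lower-case
  sentence = tolower(gsub("[[:punct:][:cntrl:][:digit:]]", "", sentence))
  words = unlist(strsplit(sentence, "\\s+"))
  sum(words %in% positiveWords) - sum(words %in% negativeWords)
}

sentences = c("Great win, good job!", "Sad and bad.", "Nothing to see")
scores = vapply(sentences, polarity_score, numeric(1),
                positiveWords = positiveWords, negativeWords = negativeWords,
                USE.NAMES = FALSE)
data.frame(score = scores, text = sentences)
```

Unlike the emotion lexicon used in the rest of the report, this approach yields a single signed number per sentence.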


# Conclusion
Sentiment analysis can be applied to many topics. It was interesting to see how he conveyed a positive message through his Twitter handle, which overshadowed the negativity. Also, towards November the trust emotion was very high, indicating support and self-confidence. When the term Trump was searched, surprise seemed to be the overwhelming emotion.


# References

http://stackoverflow.com/questions/21781014/remove-all-line-breaks-enter-symbols-from-the-string-using-r

http://technokarak.com/how-to-clean-the-twitter-data-using-r-twitter-mining-tutorial.html

http://juliasilge.com/blog/Joy-to-the-World/

https://www.r-bloggers.com/plot-weekly-or-monthly-totals-in-r/

http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm

https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

https://github.com/jeffreybreen/twitter-sentiment-analysis-tutorial-201107
