Assignment-4 MBA 676

0.1 Introduction

“IF DONALD TRUMP loves one thing in this world, it’s Donald Trump. If he loves two things, a close second-place goes to his Twitter feed. Unfortunately, in addition to Trump’s loyal armies of patriots and trolls, Twitter also hosts both liberals and members of the lying liberal media—people who do not show President Trump the respect he deserves. And for that, they must be blocked.” (Wired) Everyone has their own opinion, of course. Since almost all media coverage mentions the relationship between President Trump and Twitter, and since this assignment is based on using Twitter, I picked two hashtags, #trump and #obama, and carried out a Twitter data analysis comparing them. In this assignment, we compare the current tweets on these two hashtags.

0.2 Access Twitter Data

library(twitteR)  # provides setup_twitter_oauth(), searchTwitter() and twListToDF()
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

## [1] "Using direct authentication"

0.3 Search Twitter

The Twitter search API returns matching tweets as JSON, which searchTwitter() turns into a list of tweet objects. Here we search Twitter for the two hashtags, #trump and #obama, requesting up to 1,000 of the most recent tweets for each, that is, tweets from roughly the last 24 hours.

#hashtag trump
trump_tweets <- 1000
trump <- searchTwitter('#trump', n = trump_tweets)
head(trump)

## [[1]]
## [1] "ElderJade: RT @TalbertSwan: @COGICFamily @realDonaldTrump Its only a big deal with #Trump when a Democrat gets accused of it! He dismisses the 16 wom"
## 
## [[2]]
## [1] "TimmieTadpole: RT @Emolclause: #BREAKING:.@GOP is actively HELPING #Trump PROFIT from his Presidency. This is the UNFETTERED #CORRUPTION we're accustomed"
## 
## [[3]]
## [1] "Anita67789399: RT @kwilli1046: \"I cannot get enough of Donald #Trump's Twitter feed. If you are not  enjoying it, I feel bad. This is the best thing ever"
## 
## [[4]]
## [1] "SGuilbaud22: RT @amjoyshow: Women who have accused #Trump of sexual misconduct. #AMJoy https://t.co/MSTf6bjETG"
## 
## [[5]]
## [1] "cdeltess: RT @TalbertSwan: @COGICFamily @realDonaldTrump Its only a big deal with #Trump when a Democrat gets accused of it! He dismisses the 16 wom"
## 
## [[6]]
## [1] "bgood12345: RT @Trumpfan1995: How will you mostly spend your Sunday? Do vote and retweet. #MAGA #Trump #NFL"

#hashtag obama
obama_tweets <- 1000
obama <- searchTwitter('#obama', n = obama_tweets)
head(obama)

## [[1]]
## [1] "RealJohhnieDoe: 🚨MUST WATCH!  The 7th Floor CIA officer that developed the CIA Anti-Terror program speaks out on:\n#Obama \n#Clinton https://t.co/XhjQR2FnsK"
## 
## [[2]]
## [1] "Earthboy4life: RT @KNP2BP: #Obama left #Tahmooressi in Mexico &amp; #Wambier in NK\n\nMarine released b/c of @greta , Gov Richardson NM &amp; Reps. Royce (R-CA) Sal"
## 
## [[3]]
## [1] "jemz1113: RT @IncognitoPatrio: HERE'S THE REAL #RussianCollusion!  #Obama s Russian Collusion is ON VIDEO https://t.co/EyW8sfuSyh via @truthfeednews"
## 
## [[4]]
## [1] "kakeenan: RT @wavetossed: #qanon #Obama Barry Soetoro used a #Kenya story to hide the fact that he was #born in #Indonesia and a #Columbia #Universit"
## 
## [[5]]
## [1] "Kayem623: RT @KNP2BP: #Obama left #Tahmooressi in Mexico &amp; #Wambier in NK\n\nMarine released b/c of @greta , Gov Richardson NM &amp; Reps. Royce (R-CA) Sal"
## 
## [[6]]
## [1] "PatriotSteve4U: RT @KNP2BP: #Obama left #Tahmooressi in Mexico &amp; #Wambier in NK\n\nMarine released b/c of @greta , Gov Richardson NM &amp; Reps. Royce (R-CA) Sal"

0.4 Convert the list

Convert the list to a data frame using twListToDF().

trump_df <- twListToDF(trump)
obama_df <- twListToDF(obama)
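
As a rough illustration of what this conversion does, the sketch below flattens a list of tweet-like records into a data frame using base R only. The records here are made-up plain lists with just three fields; twListToDF() itself works on twitteR status objects and returns more columns, so this is an analogy rather than the package's implementation.

```r
# Hypothetical stand-ins for tweet objects: plain lists with three fields.
tweets <- list(
  list(text = "First #trump tweet",  screenName = "user_a", retweetCount = 3),
  list(text = "Second #trump tweet", screenName = "user_b", retweetCount = 0)
)

# Turn each record into a one-row data frame, then stack the rows.
rows <- lapply(tweets, function(tw) as.data.frame(tw, stringsAsFactors = FALSE))
tweets_df <- do.call(rbind, rows)

print(tweets_df)
```

Each list element becomes one row and each field becomes one column, which is the shape the later dplyr steps expect.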

0.5 Data Frame Contains

The data frame contains the tweet text along with 15 other metadata items, including the username, the retweet count, and the platform or application the tweet was created on. The platform is stored in the statusSource column as an HTML link tag holding the application's website address and its name. The tables below group the tweets by statusSource to show the most common platforms for each hashtag.

Trump

trump_df %>% group_by(statusSource) %>% 
  summarize(n = n()) %>% 
  arrange(desc(n)) %>% 
  top_n(10)

## Selecting by n

## # A tibble: 10 x 2
##                                                                   statusSource
##                                                                          <chr>
##  1 "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter fo
##  2 "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter f
##  3    "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"
##  4 "<a href=\"http://twitter.com/#!/download/ipad\" rel=\"nofollow\">Twitter f
##  5  "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Lite</a>"
##  6 "<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">
##  7                  "<a href=\"https://ifttt.com\" rel=\"nofollow\">IFTTT</a>"
##  8 "<a href=\"https://github.com/dekuplant?tab=repositories\" rel=\"nofollow\"
##  9         "<a href=\"http://grrrgrumbles.me/\" rel=\"nofollow\">Robocuck</a>"
## 10          "<a href=\"http://www.labnol.org\" rel=\"nofollow\">Jess Ella</a>"
## # ... with 1 more variables: n <int>

Obama

obama_df %>% group_by(statusSource) %>% 
  summarize(n = n()) %>% 
  arrange(desc(n)) %>% 
  top_n(10)

## Selecting by n

## # A tibble: 13 x 2
##                                                                   statusSource
##                                                                          <chr>
##  1    "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>"
##  2 "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter fo
##  3 "<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter f
##  4 "<a href=\"http://twitter.com/#!/download/ipad\" rel=\"nofollow\">Twitter f
##  5  "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Lite</a>"
##  6                  "<a href=\"https://ifttt.com\" rel=\"nofollow\">IFTTT</a>"
##  7 "<a href=\"https://about.twitter.com/products/tweetdeck\" rel=\"nofollow\">
##  8       "<a href=\"http://www.hootsuite.com\" rel=\"nofollow\">Hootsuite</a>"
##  9              "<a href=\"http://bufferapp.com\" rel=\"nofollow\">Buffer</a>"
## 10               "<a href=\"http://ingminds.com\" rel=\"nofollow\">Zlando</a>"
## 11 "<a href=\"http://leadstories.com\" rel=\"nofollow\">Lead Stories Feed Publ
## 12    "<a href=\"http://www.brianbrown.net/\" rel=\"nofollow\">BBN - SNAP</a>"
## 13 "<a href=\"http://www.twitter.com\" rel=\"nofollow\">Twitter for Windows</a
## # ... with 1 more variables: n <int>
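
Because statusSource holds a whole HTML link, the platform names in the tables above are hard to read and get truncated. One possible way to pull out just the name, sketched here in base R on three values copied from the output above, is to strip the tags with a regular expression. This step is an addition for illustration and is not part of the original analysis.

```r
# Sample statusSource values copied from the tables above.
sources <- c(
  "<a href=\"http://twitter.com\" rel=\"nofollow\">Twitter Web Client</a>",
  "<a href=\"https://ifttt.com\" rel=\"nofollow\">IFTTT</a>",
  "<a href=\"http://www.hootsuite.com\" rel=\"nofollow\">Hootsuite</a>"
)

# Remove anything that looks like an HTML tag, leaving only the link text.
platform <- gsub("<[^>]+>", "", sources)
print(platform)
```

The same gsub() call could be applied to the statusSource column before grouping, so that the summary tables show short platform names instead of raw HTML.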

# Total tweets per hashtag, used as the denominator for the hourly percentages
totalTrump <- nrow(trump_df)
totalObama <- nrow(obama_df)
# Make sure `created` is a date-time, then bin tweets by the nearest hour
trump_df$created <- as.POSIXct(trump_df$created, format = "%Y-%m-%d %H:%M:%S")
trump_df_hour <- trump_df %>%
  mutate(formatD = format(round(created, "hours"), format = "%H:%M")) %>%
  group_by(formatD) %>%
  summarize(n = n(), pct = n / totalTrump * 100)

obama_df$created <- as.POSIXct(obama_df$created, format = "%Y-%m-%d %H:%M:%S")
obama_df_hour <- obama_df %>%
  mutate(formatD = format(round(created, "hours"), format = "%H:%M")) %>%
  group_by(formatD) %>%
  summarize(n = n(), pct = n / totalObama * 100)
trump_df_hour$U.S.President <- "Donald Trump"
obama_df_hour$U.S.President <- "Barack Obama"
Total_hour <- rbind(trump_df_hour, obama_df_hour)
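
The hourly binning can be hard to follow inside a pipeline, so here is a minimal base R sketch of the same idea on four made-up timestamps. Note one deliberate difference: it labels each tweet with the hour it falls in (truncation), whereas the code above rounds to the nearest hour. The variable names are chosen for the example only.

```r
# Made-up timestamps standing in for the `created` column.
created <- as.POSIXct(
  c("2017-12-10 09:05:00", "2017-12-10 10:10:00",
    "2017-12-10 10:20:00", "2017-12-10 10:40:00"),
  tz = "UTC"
)

# Label each tweet with the hour it falls in (truncating rather than rounding).
hour_label <- format(created, "%H:00", tz = "UTC")

# Count tweets per hour and express each count as a percentage of the total.
n_per_hour <- table(hour_label)
hourly <- data.frame(
  hour = names(n_per_hour),
  n    = as.vector(n_per_hour),
  pct  = as.vector(n_per_hour) / length(created) * 100,
  stringsAsFactors = FALSE
)
print(hourly)
```

Dividing by the number of tweets actually retrieved, rather than by a fixed number, keeps the percentages summing to 100 even when the search returns fewer tweets than requested.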

0.6 Plotting the data

0.6.1 By Hour

Compare the share of #trump and #obama tweets posted in each hour of the day, based on the current tweets collected above.

ggplot(Total_hour, aes(x = formatD , y = pct, fill = U.S.President)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Today") +
ylab("% tweets") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

nrc <- sentiments %>%
  filter(lexicon == "nrc") %>%
  select(word, sentiment)

0.6.2 Clean Data

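Cleaning is the process of detecting and removing anything that is not a word. Using the tidytext library, we drop quoted tweets, strip links and HTML entities, split the text into words, and remove common stop words.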

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

trump_words <- trump_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))

obama_words <- obama_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))
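
To see what the cleaning steps do to a single tweet, the following base R sketch applies the same two patterns to one made-up example. It uses strsplit() in place of unnest_tokens() and a tiny hypothetical stop-word list in place of stop_words, so it only approximates the pipeline above.

```r
# Same splitting pattern as above: break on any character that is not a
# letter, digit, #, @ or an apostrophe inside a word.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# A made-up tweet for illustration.
tweet <- "RT @user: #Trump's feed is GREAT!! &amp; more https://t.co/AbC123"

# Strip shortened links and the HTML entity for "&", then lowercase.
clean <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", tweet, perl = TRUE)
clean <- tolower(clean)

# Split on the pattern (perl = TRUE is needed for the lookahead) and
# drop the empty strings left between consecutive separators.
tokens <- strsplit(clean, reg, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]

# Hypothetical mini stop-word list; keep only tokens containing a letter.
stop_list <- c("is", "the", "a")
words <- tokens[!tokens %in% stop_list & grepl("[a-z]", tokens)]
print(words)
```

Hashtags, mentions, and possessives such as #trump's survive as single tokens, which is the reason for using a custom pattern instead of plain word splitting.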

#Join the nrc lexicon to the words data frames for Trump and Obama
trump_sentiments <- trump_words %>% inner_join(nrc, by = "word")
obama_sentiments <- obama_words %>% inner_join(nrc, by = "word")

trump_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))

## # A tibble: 10 x 2
##       sentiment     n
##           <chr> <int>
##  1     negative   636
##  2     positive   593
##  3        trust   471
##  4        anger   431
##  5         fear   425
##  6 anticipation   356
##  7          joy   327
##  8      sadness   294
##  9     surprise   267
## 10      disgust   229

obama_sentiments %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))

## # A tibble: 10 x 2
##       sentiment     n
##           <chr> <int>
##  1     negative   578
##  2        trust   436
##  3         fear   412
##  4     surprise   367
##  5     positive   363
##  6 anticipation   341
##  7      sadness   200
##  8        anger   193
##  9      disgust   193
## 10          joy   135
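
The counts above come from an inner join between the cleaned words and the lexicon. The sketch below shows that join on toy data in base R, using merge() in place of inner_join(). The mini-lexicon is hypothetical and its word-to-sentiment pairs are invented for the example, not taken from the real NRC lexicon.

```r
# Hypothetical mini-lexicon: a word can carry more than one sentiment.
lexicon <- data.frame(
  word      = c("great", "great", "fear", "win"),
  sentiment = c("positive", "joy", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Words as they might come out of the cleaning step, one row per occurrence.
words <- data.frame(word = c("great", "feed", "fear", "win", "great"),
                    stringsAsFactors = FALSE)

# Inner join: words missing from the lexicon ("feed") are dropped, and a word
# with two sentiments ("great") contributes one row per sentiment.
matched <- merge(words, lexicon, by = "word")

# Count rows per sentiment and convert to a share of all matched rows.
counts <- table(matched$sentiment)
share <- round(100 * counts / sum(counts), 1)
print(share)
```

Because one word can map to several sentiments, the sentiment totals count word-sentiment pairs rather than tweets, which is worth keeping in mind when reading the tables and plots.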

0.6.3 By Sentiments

trump_sentiments$U.S.President <- "Donald Trump"
obama_sentiments$U.S.President <- "Barack Obama"

Sentiments <- rbind(trump_sentiments, obama_sentiments)

Sentiments_df <- Sentiments %>%
  group_by(U.S.President, sentiment) %>%
  summarize(n = n()) %>%
  mutate(pct = n / sum(n) * 100)  # share within each president, in percent

ggplot(Sentiments_df, aes(x = sentiment, y = pct, fill = U.S.President)) +
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("% of sentiment words") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

0.7 Wordclouds

0.7.1 Trump Words

trump_wc <- trump_sentiments %>% group_by(word) %>% summarize(count = n()) %>% arrange(desc(count))
wordcloud(trump_wc$word, trump_wc$count, random.order=FALSE, scale=c(5, .8), use.r.layout=TRUE,colors = brewer.pal(6, "Dark2"), max.words=90)

0.7.2 Obama Words

obama_wc <- obama_sentiments %>% group_by(word) %>% summarize(count = n()) %>% arrange(desc(count))
wordcloud(obama_wc$word, obama_wc$count, random.order=FALSE, scale=c(5, .8), use.r.layout=TRUE,colors = brewer.pal(6, "Dark2"), max.words=90)

0.8 Sentiment Score

0.8.1 Time of Day

ggplot(data=Total_hour,aes(x =formatD,y = n))+
  geom_bar(aes(fill=formatD),stat="identity")+
  theme(legend.position = "none")+
  xlab("Hour of day")+ylab("Number of tweets")+ggtitle("Total Tweets During the Day")

0.8.2 Words Score

ggplot(data=Sentiments_df,aes(x =sentiment,y = n))+
  geom_bar(aes(fill=sentiment),stat="identity")+
  theme(legend.position = "none")+
  xlab("Sentiment")+ylab("Score")+ggtitle("Total Sentiment Score Based on Words in Tweets")