Project 4: Twitter Scraping

Reactions to the 2016 Presidential Election

For this assignment, I decided to look at some Twitter reactions to the 2016 Presidential Election. In this case, we’re querying Twitter for two hashtags: “MAGA” and “LoveTrumpsHate”.

The MAGA hashtag is short for “Make America Great Again” and is a hashtag frequently used by Trump supporters.

The LoveTrumpsHate hashtag, conversely, is frequently used by Trump opponents.

We’ll begin by loading libraries and establishing the connection to Twitter. Note that the code establishing the connection to Twitter is hidden for security purposes.

###Project Work
setwd("~/Documents/MBA 676/Unit 11 Stuff")
getwd()
library(plyr)
library(dplyr)
library(jsonlite)
library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(wordcloud)
library(knitr)
library(colorspace)
library(RColorBrewer)
library(tm)
library(maps)

Next we gather the data:

num_tweets <- 3000
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
#Get love tweets
love <- searchTwitter('#lovetrumpshate', n = num_tweets)
love_df <- twListToDF(love)
#Get #MAGA tweets
maga <- searchTwitter('#MAGA', n = num_tweets)
maga_df <- twListToDF(maga)

Next, I wanted to see whether any screen names appeared among the top users of both hashtags. Here, I kept users who tweeted a given hashtag more than twice:

#View the Top Screen Names by number of #LoveTrumpsHate tweets
top_love<-love_df %>% 
        group_by(screenName) %>% 
        summarize(n = n()) %>%
        arrange(desc(n)) %>%
        filter(n > 2)

#View the Top Screen Names by number of #MAGA tweets
top_maga<-maga_df %>% 
        group_by(screenName) %>% 
        summarize(n = n()) %>%
        arrange(desc(n)) %>%
        filter(n > 2)
common_users_love_maga<-inner_join(top_love,top_maga, by = "screenName")
kable(common_users_love_maga)

screenName	n.x	n.y
JorgeLZambrana1	4	6

Perhaps not surprisingly, there is very little overlap: only one user (JorgeLZambrana1) used both hashtags more than twice.
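
The overlap check can be tried on a small scale. The following is a minimal sketch with made-up screen names (`user_a` to `user_d` are hypothetical, as are the `*_toy` object names). It also uses the `suffix` argument of inner_join() so the count columns get readable names instead of `n.x` and `n.y`:

```r
library(dplyr)

# Hypothetical tweets: one row per tweet, NOT real Twitter data
love_toy <- data.frame(screenName = c(rep("user_a", 3), "user_b", rep("user_c", 4)))
maga_toy <- data.frame(screenName = c(rep("user_a", 5), rep("user_b", 2), rep("user_d", 3)))

# Count tweets per user and keep users with more than two tweets
top_love_toy <- love_toy %>% count(screenName) %>% filter(n > 2)
top_maga_toy <- maga_toy %>% count(screenName) %>% filter(n > 2)

# Keep only the users present in both lists
common_toy <- inner_join(top_love_toy, top_maga_toy,
                         by = "screenName", suffix = c("_love", "_maga"))
print(common_toy)
```

Only `user_a` clears the threshold in both toy frames, so the join returns a single row.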

Fun in the clouds

Now, let’s take a deeper look at the contents of the two different groups of tweets. Inspired by Jill Waterhouse’s project, I decided to start with word clouds from the two groups.
First, we build the word frames and the #LoveTrumpsHate word cloud:

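#Split on anything that is not a letter, digit, #, @ or apostrophe,
#and on apostrophes that are not followed by one of those characters
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
#Create the love_words frame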
love_words <- love_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
#Create the maga_words frame
maga_words <- maga_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
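
The pattern in `reg` is easier to read with a small example. The sketch below uses base R's strsplit() as a stand-in for unnest_tokens() (an assumption made so the example runs without tidytext), and the sample sentence is made up:

```r
# Same pattern as above: split on any character that is not a letter, digit,
# #, @ or apostrophe, and on an apostrophe that is not followed by one of those
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

sample_text <- "Don't stop, voters' #MAGA @user!"   # made-up text, not a real tweet

# perl = TRUE is needed for the lookahead (?!...)
pieces <- strsplit(tolower(sample_text), reg, perl = TRUE)[[1]]

# Drop the empty strings left between adjacent separators
tokens <- pieces[nzchar(pieces)]
print(tokens)
```

Hashtags and mentions survive as single tokens, and the apostrophe inside "don't" is kept while the trailing one in "voters'" is dropped.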

#Add the nrc and bing sentiment frames
#Current tidytext releases load lexicons with get_sentiments();
#the nrc lexicon is fetched through the textdata package on first use
nrc <- get_sentiments("nrc") %>%
  select(word, sentiment)
bing <- get_sentiments("bing") %>%
  select(word, sentiment)

#Create the Love and Maga nrc sentiment frames:
love_sentiments <- love_words %>% inner_join(nrc, by = "word")
maga_sentiments <- maga_words %>% inner_join(nrc, by = "word")

#Create Love Cloud
love_sentiments %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100, scale = c(5, .5), min.freq = 3,
                 random.order = FALSE, rot.per = .15,
                 colors = brewer.pal(8, "RdBu")))

And here is the #MakeAmericaGreatAgain word cloud:

#Create MAGA Cloud
maga_sentiments %>%
  count(word) %>%
  with(wordcloud(word, n, max.words = 100, scale = c(5, .5), min.freq = 3,
                 random.order = FALSE, rot.per = .15,
                 colors = brewer.pal(8, "RdBu")))

The differences between the clouds are quite striking.

In the #LoveTrumpsHate cloud, we find a mixed bag of “positive” and “negative” words, with many words like “liberty”, “peaceful”, “beautiful”, and “solidarity.” These are mixed with darker words like “rape”, “hate”, and “riot.”

In the #MakeAmericaGreatAgain cloud, we find a spartan, more authoritarian worldview, with words like “police”, “swamp”, “money”, “violence”, and “shameful.”

Are These Differences Real?

While these differences seem stark, let’s apply one more layer of sentiment analysis to the data to see whether they show up quantitatively.

love_sentiments$Hashtag <- "#LoveTrumpsHate"
maga_sentiments$Hashtag <- "#MakeAmericaGreatAgain"
combined_sentiments <-rbind(love_sentiments, maga_sentiments)
combined_df <-combined_sentiments %>% 
  group_by(Hashtag, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n/sum(n)*100)
ggplot(combined_df, aes(x = sentiment, y = frequency, fill = Hashtag)) + 
  geom_bar(stat = "identity", position = "dodge") +
  xlab("Sentiment") +
  ylab("Sentiment Frequency within tweets") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_fill_manual(values=c("#0072B2", "#D55E00"))

Though imperfect, our quantitative sentiment analysis shows some noticeable differences between the #LoveTrumpsHate and #MakeAmericaGreatAgain tweets. The #LoveTrumpsHate tweets contain a greater frequency of fear and joy words, while the #MakeAmericaGreatAgain tweets contain more trust and positive words.
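
The frequencies in the chart are percentages within each hashtag, so the two groups can be compared even when they contribute different numbers of sentiment words. As a minimal sketch of that normalization, the block below uses base R's table() and prop.table() on a small made-up data frame (the rows in `toy` are hypothetical and are not taken from the tweets):

```r
# Hypothetical word-level sentiment labels; NOT data from the tweets
toy <- data.frame(
  Hashtag   = c(rep("#LoveTrumpsHate", 4), rep("#MakeAmericaGreatAgain", 6)),
  sentiment = c("fear", "joy", "joy", "trust",
                "trust", "trust", "positive", "positive", "fear", "joy")
)

# Count sentiment words per hashtag, then convert each row to percentages
counts <- table(toy$Hashtag, toy$sentiment)
freq   <- prop.table(counts, margin = 1) * 100

print(round(freq, 1))
```

Each row sums to 100, which mirrors what `mutate(frequency = n/sum(n)*100)` does on the data grouped by Hashtag.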

Overall, we see a great deal of unrest among users of both hashtags. I hope that, over time, both groups find reason to feel more optimistic about America’s direction.

J. McHenry

11/13/2016