Who’s more negative on Twitter: Paul LePage or Donald Trump?

Methodology

First, I loaded all necessary packages and connected to the Twitter API.

library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(dplyr)
library(knitr)
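
The connection step is not shown above. A minimal sketch of how it could look with twitteR's `setup_twitter_oauth()` follows; the four environment variable names are hypothetical placeholders, not something the original analysis specifies, and the call is skipped when they are not set.

```r
# Hypothetical environment variable names; in practice they would be set
# in ~/.Renviron so the keys never appear in the script itself.
creds <- Sys.getenv(c("TWITTER_CONSUMER_KEY", "TWITTER_CONSUMER_SECRET",
                      "TWITTER_ACCESS_TOKEN", "TWITTER_ACCESS_SECRET"))

# Authenticate only when all four values are present and twitteR is installed.
if (all(nzchar(creds)) && requireNamespace("twitteR", quietly = TRUE)) {
  twitteR::setup_twitter_oauth(creds[[1]], creds[[2]], creds[[3]], creds[[4]])
} else {
  message("Twitter credentials not set; skipping authentication.")
}
```

Keeping the keys outside the script means the analysis can be shared without exposing them.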

To keep the data manageable and comparable between LePage and Trump, I requested only the last 200 tweets from each. I set the function’s arguments to exclude retweets and replies so the analysis focuses on language they chose themselves. Because of how the “userTimeline” command handles those exclusions, this returned fewer than 200 tweets for each politician.

num_tweets <- 200
# Request the same number of tweets for each account, without retweets or replies
lepage_tweets <- userTimeline('@Governor_LePage', n = num_tweets, includeRts = FALSE, excludeReplies = TRUE)
trump_tweets <- userTimeline('@realDonaldTrump', n = num_tweets, includeRts = FALSE, excludeReplies = TRUE)
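
One plausible reason for the shortfall, offered here as an assumption rather than something this analysis verified, is that the request counts retweets and replies toward the limit before they are filtered out. The toy example below uses a made-up timeline (the mix of retweets and replies is invented) to show how a request for 200 statuses can leave far fewer original tweets.

```r
# Made-up timeline of 200 statuses; the retweet/reply pattern is illustrative only.
timeline <- data.frame(
  id         = 1:200,
  is_retweet = rep(c(TRUE, FALSE, FALSE, FALSE), 50),
  is_reply   = rep(c(FALSE, FALSE, TRUE, FALSE, FALSE), 40)
)

# Drop retweets and replies after the 200 statuses have been fetched.
kept <- timeline[!timeline$is_retweet & !timeline$is_reply, ]

nrow(timeline)  # statuses requested
nrow(kept)      # original tweets left for analysis
```

Checking `nrow()` on the real data frames is a quick way to see how many tweets each politician actually contributes.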

Those raw tweets now need to go into a data frame. I made one for each politician. Those will serve as the base datasets that I narrow down.

pl_df <- twListToDF(lepage_tweets)
dt_df <- twListToDF(trump_tweets)

It’s worth noting that Trump tweets far more than LePage. At a classmate’s suggestion, I made a time variable for each politician and printed its range. When I updated this analysis on Nov. 19, R had to go back to December 2015 to fill LePage’s sample. For Trump, it only went back to late October 2016.

pl_time <- as.POSIXct(pl_df$created, format = "%Y-%m-%d %H:%M:%S")
range(pl_time)

## [1] "2015-12-07 18:12:05 UTC" "2016-11-11 14:08:54 UTC"

dt_time <- as.POSIXct(dt_df$created, format = "%Y-%m-%d %H:%M:%S")
range(dt_time)

## [1] "2016-10-25 21:03:40 UTC" "2016-11-19 13:56:30 UTC"
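
When timestamps arrive as text rather than as date-time objects, the format string has to match them exactly, colons included. The sketch below reuses the two Trump timestamps from the output above as plain strings (an assumption about how they might be stored, for example after a CSV export) and also measures the span they cover.

```r
# Timestamps copied from the range output above, stored as text.
stamps <- c("2016-10-25 21:03:40", "2016-11-19 13:56:30")

# A format that matches the strings parses cleanly.
parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
range(parsed)

# Length of the window covered by the sample, in days.
span <- difftime(max(parsed), min(parsed), units = "days")
span

# A format without the colons does not match and yields NA values.
mismatched <- as.POSIXct(stamps, format = "%Y-%m-%d %H%M", tz = "UTC")
mismatched
```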

Next, I broke each politician’s tweets down into their component words using the code from our lecture.

reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
lepage_words <- pl_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))

trump_words <- dt_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
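
To see what this pipeline does to a single tweet, here is a rough base R equivalent of the same steps: strip links and “&amp;”, lowercase, split on the regular expression, then drop stop words. The tweet text and the three-word stop list are made up for illustration; the real pipeline uses the full `stop_words` table from tidytext.

```r
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Made-up tweet, not taken from either account.
tweet <- "Join me in Ohio tomorrow! #MAGA https://t.co/abc123 &amp; more"

# Remove shortened links and HTML-escaped ampersands.
clean <- gsub("https://t.co/[A-Za-z\\d]+|&amp;", "", tweet, perl = TRUE)

# Lowercase, split wherever the pattern matches, and drop empty pieces.
tokens <- strsplit(tolower(clean), reg, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]

# Stand-in for stop_words$word; the real list is much longer.
toy_stop_words <- c("me", "in", "more")
words <- tokens[!tokens %in% toy_stop_words & grepl("[a-z]", tokens)]
words
```

Because “#” and “@” are excluded from the split pattern, hashtags and mentions survive as single tokens, which is why #mepolitics and #maga show up in the word counts.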

Here, I count LePage’s most used words and list the top 10 in a table to provide a flavor of what his language looks like.

lepage_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n))

LePage’s most common words

lepage_table <- lepage_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)

## Selecting by n

kable(lepage_table)

word	n
#mepolitics	174
maine	41
pleased	29
hall	25
town	24
talking	15
veterans	15
students	13
energy	12
questions	12

A hashtag, #mepolitics, is by far his most common word, but it will be filtered out of the analysis later. His other words are quite benign, such as “Maine,” “pleased,” “veterans” and “questions.”

trump_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n))

Trump’s most common words

trump_table <- trump_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)

## Selecting by n

kable(trump_table)

word	n
watch	25
join	22
vote	20
tomorrow	19
#draintheswamp	17
time	15
#icymi	14
hillary	14
win	14
#maga	12
clinton	12
ohio	12

Trump’s words are more varied, with “watch” narrowly in the No. 1 spot. Both “Hillary” and “Clinton” also appear. Those are likely negative mentions that the sentiment analysis won’t capture, but they matter in the larger scope of his language.
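
Trump’s table has 12 rows rather than 10 because `top_n()` keeps every row tied with the last qualifying value. The base R sketch below applies the same tie-keeping rule to the counts from the table; it is an illustration of the rule, not the dplyr implementation.

```r
# Counts copied from Trump's table above.
counts <- c("watch" = 25, "join" = 22, "vote" = 20, "tomorrow" = 19,
            "#draintheswamp" = 17, "time" = 15, "#icymi" = 14,
            "hillary" = 14, "win" = 14, "#maga" = 12,
            "clinton" = 12, "ohio" = 12)

# The 10th-largest count sets the cutoff.
cutoff <- sort(counts, decreasing = TRUE)[[10]]

# Keeping everything at or above the cutoff pulls in all tied words.
kept <- counts[counts >= cutoff]
length(kept)
```

If exactly 10 rows are wanted, recent dplyr versions offer `slice_max(n, n = 10, with_ties = FALSE)`, which breaks ties by row order instead.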

Then, I imported the NRC lexicon, which tags words with 10 sentiment categories: eight emotions (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) plus positive and negative.

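# In newer tidytext releases, where `sentiments` no longer has a lexicon
# column, get_sentiments("nrc") provides the word/sentiment pairs instead.
nrc <- sentiments %>%
  filter(lexicon == "nrc") %>%
  select(word, sentiment)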

Then, I joined each politician’s word list with the sentiment lexicon. Because the join keeps only words that appear in the lexicon, it weeds out the proper names and hashtags that I flagged earlier.

lepage_sentiment <- lepage_words %>%
  inner_join(nrc, by = "word")

trump_sentiment <- trump_words %>%
  inner_join(nrc, by = "word")
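
An inner join has two effects worth keeping in mind: words missing from the lexicon disappear, and a word tagged with several sentiments produces one row per tag. The base R sketch below shows both with `merge()`, which performs an inner join by default. The lexicon rows are made up for illustration and are not copied from NRC.

```r
# A few of LePage's top words, taken from the table above.
words <- data.frame(word = c("#mepolitics", "maine", "pleased", "veterans"),
                    stringsAsFactors = FALSE)

# Toy lexicon: invented word/sentiment pairs, not the real NRC entries.
toy_lexicon <- data.frame(
  word      = c("pleased", "pleased", "veterans"),
  sentiment = c("joy", "positive", "trust"),
  stringsAsFactors = FALSE
)

# merge() keeps only words found in both tables (an inner join).
joined <- merge(words, toy_lexicon, by = "word")
joined
```

This is why the sentiment totals count word-sentiment pairs rather than tweets or distinct words.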

In this step, I grouped the data by sentiment to summarize it, providing tables of the number of times each politician used words that carry each of our 10 sentiments.

lepage_sentiment %>% group_by(sentiment) %>%
  summarize(n = n())

LePage sentiments

lepage_chart <- lepage_sentiment %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))
kable(lepage_chart)

sentiment	n
positive	233
trust	161
joy	108
anticipation	100
negative	91
anger	52
sadness	49
fear	45
surprise	32
disgust	19

trump_sentiment %>% group_by(sentiment) %>%
  summarize(n = n())

Trump sentiments

trump_chart <- trump_sentiment %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))
kable(trump_chart)

sentiment	n
positive	155
anticipation	121
trust	104
negative	90
fear	78
joy	65
anger	56
sadness	54
surprise	41
disgust	31

We see that LePage used positive words 78 more times than Trump (233 to 155) across these samples. Their negative-word counts were nearly identical (91 and 90), but Trump used fearful words nearly twice as often as LePage (78 to 45). Trump also used slightly more words associated with anger and sadness.
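
The comparisons in that paragraph can be checked directly from the two tables. The sketch below copies the relevant counts and computes the raw gaps and the Trump-to-LePage ratios.

```r
# Counts copied from the two sentiment tables above.
lepage <- c(positive = 233, negative = 91, fear = 45, anger = 52, sadness = 49)
trump  <- c(positive = 155, negative = 90, fear = 78, anger = 56, sadness = 54)

# Raw gap in word counts, LePage minus Trump.
gap <- lepage - trump
gap

# How many times as often Trump used each category, relative to LePage.
ratio <- round(trump / lepage, 2)
ratio
```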

Next, we prepare the data for visualization by adding each politician’s name as a grouping variable and stacking the two data sets.

lepage_sentiment$Politician <- "LePage"
trump_sentiment$Politician <- "Trump"
lpdt_sentiments <- rbind(lepage_sentiment, trump_sentiment)

To conclude, I plotted each politician’s use of words from each sentiment category as a share of all their sentiment-tagged words. I used dark shading and angled the labels on the x-axis to make the graph easier to read.

final_sent <- lpdt_sentiments %>% 
  group_by(Politician, sentiment) %>% 
  summarize(n = n()) %>%
  mutate(frequency = n / sum(n))

ggplot(final_sent, aes(x = sentiment, y = frequency, fill = Politician)) +
  geom_bar(stat = "identity", position = "dodge") + scale_fill_hue(l=30) +
  xlab("Sentiment") +
  ylab("Share of sentiment-tagged words") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))
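
The frequency plotted here is each category’s count divided by that politician’s total across all 10 categories. As a check on the arithmetic, the base R sketch below recomputes those shares from the counts in the two sentiment tables; it stands in for the `group_by()` and `mutate()` step rather than reproducing it.

```r
# Counts copied from the two sentiment tables above.
lepage <- c(positive = 233, trust = 161, joy = 108, anticipation = 100,
            negative = 91, anger = 52, sadness = 49, fear = 45,
            surprise = 32, disgust = 19)
trump  <- c(positive = 155, anticipation = 121, trust = 104, negative = 90,
            fear = 78, joy = 65, anger = 56, sadness = 54,
            surprise = 41, disgust = 31)

# Divide each count by the politician's total; align Trump's columns to LePage's.
shares <- rbind(LePage = prop.table(lepage),
                Trump  = prop.table(trump)[names(lepage)])

round(shares[, c("positive", "negative", "fear")], 3)
```

On these counts, negative words make up roughly 10 percent of LePage’s tagged words and roughly 11 percent of Trump’s, so normalizing by total barely changes the picture for that category, while the gaps in positive and fearful words remain.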