Maine Gov. Paul LePage endorsed Donald Trump in February, saying, “I was Donald Trump before Donald Trump became popular,” a reference to their similar, controversial politics.
But that doesn’t hold for their use of Twitter. Trump is well-known for his bombastic use of the platform, which he has said may continue when he is president, while LePage’s account is more rote. LePage also uses words that are more associated with positivity and less associated with negative sentiments, including anger and fear, according to a review of recent tweets.
First, I loaded all necessary packages and connected to the Twitter API.
library(twitteR)
library(tidytext)
library(stringr)
library(ggplot2)
library(dplyr)
library(knitr)
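The connection step itself is not shown above. A minimal sketch of one way to do it, assuming the four API credentials are stored in environment variables (the variable names and the helper `read_twitter_credentials` are hypothetical), might look like this:

```r
# Hypothetical environment variable names; substitute whatever your setup uses.
credential_vars <- c(
  consumer_key    = "TWITTER_CONSUMER_KEY",
  consumer_secret = "TWITTER_CONSUMER_SECRET",
  access_token    = "TWITTER_ACCESS_TOKEN",
  access_secret   = "TWITTER_ACCESS_SECRET"
)

# Hypothetical helper: read the credentials and fail loudly if any are missing.
read_twitter_credentials <- function(vars = credential_vars) {
  values <- Sys.getenv(vars, unset = NA)
  missing <- vars[is.na(values) | values == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  setNames(as.list(unname(values)), names(vars))
}

# With the credentials in place, twitteR connects in one call:
# creds <- read_twitter_credentials()
# twitteR::setup_twitter_oauth(creds$consumer_key, creds$consumer_secret,
#                              creds$access_token, creds$access_secret)
```

Keeping the keys out of the script means the analysis can be shared without exposing them.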
To keep the data manageable and comparable between LePage and Trump, I requested only the last 200 tweets from each. I used the function’s options to exclude retweets and replies so the analysis focuses on their chosen language. Because “userTimeline” appears to apply the 200-tweet limit before it drops retweets and replies, this returned fewer than 200 tweets for each politician.
num_tweets <- 200
lepage_tweets <- userTimeline('@Governor_LePage', n = num_tweets, includeRts = FALSE, excludeReplies = TRUE)
trump_tweets <- userTimeline('@realDonaldTrump', n = num_tweets, includeRts = FALSE, excludeReplies = TRUE)
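A request for 200 tweets can come back short if the limit is applied before retweets and replies are removed. The toy example below shows that effect with base R. It is built on made-up data, not real API output, and the column name `is_retweet` is an assumption for illustration.

```r
# Made-up timeline of 10 tweets, newest first; TRUE marks a retweet.
timeline <- data.frame(
  id = 1:10,
  is_retweet = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE)
)

n_requested <- 6

# Step 1: the limit is applied first.
fetched <- head(timeline, n_requested)

# Step 2: retweets are dropped afterward, so fewer rows survive.
kept <- fetched[!fetched$is_retweet, ]

nrow(kept)
```

Here six tweets are requested but only four remain, because two of the six were retweets.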
Those raw tweets now need to go into a data frame. I made one for each politician. Those will serve as the base datasets that I narrow down.
pl_df <- twListToDF(lepage_tweets)
dt_df <- twListToDF(trump_tweets)
It’s worth noting that Trump tweets far more than LePage. At a classmate’s suggestion, I made a time variable and printed its range. When I updated this analysis on Nov. 19, R had to go back to December 2015 to fill the 200-tweet request for LePage. For Trump, it only went back to late October 2016.
pl_time <- as.POSIXct(pl_df$created, format="%Y-%m-%d %H:%M:%S")
range(pl_time)
## [1] "2015-12-07 18:12:05 UTC" "2016-11-11 14:08:54 UTC"
dt_time <- as.POSIXct(dt_df$created, format="%Y-%m-%d %H:%M:%S")
range(dt_time)
## [1] "2016-10-25 21:03:40 UTC" "2016-11-19 13:56:30 UTC"
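To put the two ranges on the same footing, the length of each window can be computed directly from the printed timestamps. This is a small sketch using base R only; the variable names are mine, and the timestamps are copied from the output above.

```r
# Timestamps copied from the range() output above.
lepage_range <- as.POSIXct(c("2015-12-07 18:12:05", "2016-11-11 14:08:54"), tz = "UTC")
trump_range  <- as.POSIXct(c("2016-10-25 21:03:40", "2016-11-19 13:56:30"), tz = "UTC")

# Length of each window in days.
lepage_days <- as.numeric(difftime(lepage_range[2], lepage_range[1], units = "days"))
trump_days  <- as.numeric(difftime(trump_range[2], trump_range[1], units = "days"))

round(c(LePage = lepage_days, Trump = trump_days), 1)

# How many times longer LePage's window is than Trump's.
round(lepage_days / trump_days, 1)
```

LePage's window spans roughly 340 days and Trump's roughly 25, so the same request reaches back about 14 times further for LePage.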
Next, I broke each politician’s tweets down into their component words using the code from our lecture.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
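lepage_words <- pl_df %>%
  # Drop tweets that open with a quotation mark
  filter(!str_detect(text, '^"')) %>%
  # Strip shortened links and HTML-encoded ampersands
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  # One row per word, keeping hashtags and @mentions intact
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))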
# Same steps for Trump, reusing the reg pattern defined above
trump_words <- dt_df %>%
  filter(!str_detect(text, '^"')) %>%
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|&amp;", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) %>%
  filter(!word %in% stop_words$word,
         str_detect(word, "[a-z]"))
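To see what the splitting pattern does, here is a base-R approximation of the tokenizing step on a single invented sentence. It is not a real tweet, and `strsplit` stands in for `unnest_tokens`, so treat it as an illustration rather than the exact pipeline. Stop-word removal would follow as a separate step.

```r
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"

# Invented sample text, not a real tweet.
tweet <- "Don't miss tomorrow's town hall in Bangor! #mepolitics https://t.co/AbC123"

# Strip the shortened link.
cleaned <- gsub("https://t.co/[A-Za-z\\d]+", "", tweet, perl = TRUE)

# Lowercase, split wherever the pattern matches, and drop empty strings.
tokens <- strsplit(tolower(cleaned), reg, perl = TRUE)[[1]]
tokens <- tokens[nzchar(tokens)]
tokens
```

The pattern splits on anything that is not a letter, digit, `#`, `@` or an apostrophe inside a word, which is why the contraction and the hashtag come through as single tokens.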
Here, I counted LePage’s words and tabulated the 10 he used most to provide a flavor of what his language looks like.
lepage_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n))
lepage_table <- lepage_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
## Selecting by n
kable(lepage_table)
word | n |
---|---|
#mepolitics | 174 |
maine | 41 |
pleased | 29 |
hall | 25 |
town | 24 |
talking | 15 |
veterans | 15 |
students | 13 |
energy | 12 |
questions | 12 |
A hashtag, #mepolitics, is by far his most common word, but it will be filtered out of the analysis later. His other words are quite benign, such as “Maine,” “pleased,” “veterans” and “questions.”
trump_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n))
trump_table <- trump_words %>% group_by(word) %>% summarize(n = n()) %>% arrange(desc(n)) %>% top_n(10)
## Selecting by n
kable(trump_table)
word | n |
---|---|
watch | 25 |
join | 22 |
vote | 20 |
tomorrow | 19 |
#draintheswamp | 17 |
time | 15 |
#icymi | 14 |
hillary | 14 |
win | 14 |
#maga | 12 |
clinton | 12 |
ohio | 12 |
Trump’s words are more diverse, with “watch” narrowly in the No. 1 spot. (His table has 12 rows because top_n keeps ties, and three words are tied at 12 uses.) Both “Hillary” and “Clinton” also show up, indicating likely negative mentions that the sentiment analysis won’t capture but that matter in the larger scope of his language.
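Trump's table lists 12 words even though the code asks for the top 10, because `top_n` keeps every row tied with the 10th value. The base-R sketch below reproduces that with the counts from the table. The 13th word, "placeholder", is hypothetical and is there only to show a row being cut.

```r
trump_counts <- data.frame(
  word = c("watch", "join", "vote", "tomorrow", "#draintheswamp", "time",
           "#icymi", "hillary", "win", "#maga", "clinton", "ohio",
           "placeholder"),  # hypothetical 13th word, added for illustration
  n = c(25, 22, 20, 19, 17, 15, 14, 14, 14, 12, 12, 12, 11),
  stringsAsFactors = FALSE
)

# Minimum ranking: tied counts all share the best rank in their group.
ranks <- rank(-trump_counts$n, ties.method = "min")

# Keep everything ranked 10th or better, ties included.
top10 <- trump_counts[ranks <= 10, ]
nrow(top10)
```

The three words with 12 uses all rank 10th, so all three stay and the result has 12 rows.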
Then, I imported the NRC lexicon, which connects words with 10 sentiment categories: anger, anticipation, disgust, fear, joy, negative, positive, sadness, surprise and trust.
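# Older tidytext releases bundled NRC in the `sentiments` table; newer ones
# provide it through get_sentiments(), which may prompt for a one-time download.
nrc <- get_sentiments("nrc") %>%
  select(word, sentiment)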
Then, I joined each politician’s word data with the sentiment lexicon. Because the inner join keeps only words that appear in the lexicon, this weeds out the proper names and hashtags that I flagged earlier.
lepage_sentiment <- lepage_words %>%
inner_join(nrc, by = "word")
trump_sentiment <- trump_words %>%
inner_join(nrc, by = "word")
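Two properties of an inner join matter for the counts that follow: words missing from the lexicon disappear, and a word tagged with several sentiments produces one row per sentiment. The sketch below shows both with base R's `merge`. The lexicon rows are invented for illustration and are not actual NRC entries.

```r
# Four word occurrences, as they might come out of the tokenizing step.
words <- data.frame(
  word = c("pleased", "maine", "#mepolitics", "pleased"),
  stringsAsFactors = FALSE
)

# Invented lexicon rows; not actual NRC entries.
toy_lexicon <- data.frame(
  word = c("pleased", "pleased", "attack"),
  sentiment = c("positive", "joy", "anger"),
  stringsAsFactors = FALSE
)

# merge() performs an inner join by default.
joined <- merge(words, toy_lexicon, by = "word")
joined
table(joined$sentiment)
```

Two occurrences of one word become four rows, two per sentiment, so the sentiment totals count word-sentiment pairs rather than distinct words or tweets.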
In this step, I grouped the data by sentiment to summarize it, producing tables of how many times each politician used words that carry each of the 10 sentiments.
lepage_sentiment %>% group_by(sentiment) %>%
summarize(n = n())
lepage_chart <- lepage_sentiment %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))
kable(lepage_chart)
sentiment | n |
---|---|
positive | 233 |
trust | 161 |
joy | 108 |
anticipation | 100 |
negative | 91 |
anger | 52 |
sadness | 49 |
fear | 45 |
surprise | 32 |
disgust | 19 |
trump_sentiment %>% group_by(sentiment) %>%
summarize(n = n())
trump_chart <- trump_sentiment %>% group_by(sentiment) %>% summarize(n = n()) %>% arrange(desc(n))
kable(trump_chart)
sentiment | n |
---|---|
positive | 155 |
anticipation | 121 |
trust | 104 |
negative | 90 |
fear | 78 |
joy | 65 |
anger | 56 |
sadness | 54 |
surprise | 41 |
disgust | 31 |
We see that LePage used positive words 78 more times than Trump in this sample (233 to 155). Their use of negative words was nearly identical (91 to 90), but Trump used fearful words about 1.7 times as often as LePage (78 to 45). Trump also used slightly more words associated with anger and sadness.
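The comparisons in this paragraph can be checked directly against the two tables. The sketch below uses base R and the published counts; the column names are mine.

```r
# Counts copied from the two sentiment tables above.
sentiment_counts <- data.frame(
  sentiment = c("positive", "negative", "fear", "anger", "sadness"),
  lepage = c(233, 91, 45, 52, 49),
  trump  = c(155, 90, 78, 56, 54)
)

# Raw gap and Trump-to-LePage ratio for each sentiment.
sentiment_counts$difference <- sentiment_counts$lepage - sentiment_counts$trump
sentiment_counts$trump_to_lepage <- round(sentiment_counts$trump / sentiment_counts$lepage, 2)

sentiment_counts
```

These are raw counts, so they do not account for the two samples containing different numbers of words.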
Next, I prepared the data for visualization by adding each politician’s name as a grouping variable and combining the two data frames.
lepage_sentiment$Politician <- "LePage"
trump_sentiment$Politician <- "Trump"
lpdt_sentiments <- rbind(lepage_sentiment, trump_sentiment)
To conclude, I plotted each sentiment as a share of each politician’s sentiment-tagged words. I used dark shading and angled the labels on the X-axis to make the graph easier to read.
final_sent <- lpdt_sentiments %>%
  group_by(Politician, sentiment) %>%
  summarize(n = n()) %>%
  # Still grouped by Politician here, so each share is within one politician
  mutate(frequency = n / sum(n))
ggplot(final_sent, aes(x = sentiment, y = frequency, fill = Politician)) +
  geom_bar(stat = "identity", position = "dodge") + scale_fill_hue(l=30) +
  xlab("Sentiment") +
  ylab("Share of sentiment words") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))
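The shares behind the plot can be reproduced from the two sentiment tables without the tweet data. The sketch below uses base R's `ave` in place of the grouped `mutate`, so it is an equivalent calculation and not the code that drew the chart.

```r
# Counts copied from the two sentiment tables above.
counts <- data.frame(
  Politician = rep(c("LePage", "Trump"), each = 10),
  sentiment = c("positive", "trust", "joy", "anticipation", "negative",
                "anger", "sadness", "fear", "surprise", "disgust",
                "positive", "anticipation", "trust", "negative", "fear",
                "joy", "anger", "sadness", "surprise", "disgust"),
  n = c(233, 161, 108, 100, 91, 52, 49, 45, 32, 19,
        155, 121, 104, 90, 78, 65, 56, 54, 41, 31),
  stringsAsFactors = FALSE
)

# Divide each count by that politician's total.
counts$frequency <- counts$n / ave(counts$n, counts$Politician, FUN = sum)

# Compare the two politicians on positive and fear words.
subset(counts, sentiment %in% c("positive", "fear"))
```

On this basis, positive words make up roughly 26 percent of LePage's sentiment-tagged words and roughly 19 percent of Trump's, while fear words account for about 5 percent and 10 percent.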
LePage uses Twitter far less than Trump, but his language is more positive and less negative across several categories, including fear, anger and sadness. This may be more of a reflection of who’s running their accounts, and of how differently those accounts present the two politicians, than of their politics, which aren’t so different.
Photo credit: Bangor Daily News