The R code for this post can be found here. A less technical version of this post without code snippets can be found here.
How do presidential candidates differ in their use of Twitter to publicize policy positions, garner support, and participate in the political arena? Are certain types of Tweets more effective at attracting support, as evidenced by a greater number of Tweet interactions?
One way to explore these questions is by using sentiment analysis, or the process of computationally categorizing opinions and emotionality in a piece of text. Some data scientists, for example, have used this technique to argue that Donald Trump’s Twitter timeline is populated by different Tweets from different people.
Sentiment analysis has yet to be utilized to explore how different candidates use Twitter in different ways. By utilizing sentence-level sentiment analysis on individual Tweets as well as word-by-word analysis using the NRC Word-Emotion Association Lexicon, we’re able to explore these questions. The analysis leverages the power of the sentimentr package to better understand what types of emotions “sell” on Twitter and how candidates use the platform as a mechanism to attract political support.
library(dplyr)
library(purrr)
library(twitteR)
library(tidyr)
library(lubridate)
library(scales)
library(tidyverse)
library(ggplot2)
library(tidytext)
library(readr)
library(sentimentr)
library(dplyr)
library(syuzhet)
library(broom)
library(cleanNLP)
library(textclean)
library(ggrepel)
library(topicmodels)
library(tm)
library(stm)
library(quanteda)
library(knitr)
We begin our analysis with the data collection process. We use Python to gather a massive amount of Tweets (37,218!) from 12 presidential candidates’ Twitter feeds. Specifically, we use two different methods for pulling Tweets. The main method involved using Twitter’s Tweepy Python library. We created an API that allowed us to pull various information about the most recent 3200 tweets by a user. The API link could be found here.
In a later step, we will go further back in time to pull Tweets from 2016 presidential candidates for a comparative analysis. In order to do this, we use a public GitHub repository that creates a scraper to pull candidate Tweets. The scraper automatically scrolls through the inputed date range on a given candidate’s website and mines data while scrolling.
You can find further documentation of our data pulling process on our Github.
The data we pull contains the information we need but is not quite ready for analysis.
uncleaned <- read_csv("customTwitterUsersCombinedUncleaned.csv")
kable(head(uncleaned$text))
| x |
|---|
| @samanthapetrick Hey Rocco, happy 9th birthday! Sorry I missed the party. I hope you got some ice cream with that #Biden2020 cookie. |
| Congratulations to the Duke and Duchess of Sussex, Prince Harry and Meghan Markle, on the birth of their baby boy! The truth is there’s no better job than being a parent. Jill and I wish you the best of luck on your next chapter. |
| On Saturday, Jill and I visited Columbia, South Carolina, to discuss how we’ll build an inclusive middle class where everyone — regardless of race, gender, sexual orientation, religion or disability — comes along. Thank you to everyone who joined us, we’ll be back soon! https://t.co/jly3piXzPE |
| Nevada, we’re taking our campaign to Henderson tomorrow, May 7th! Head to https://t.co/zTS0WldilG to RSVP and come join us. |
| On this first day of Ramadan, Jill and I wish peace and happiness to Muslims around the world celebrating the holy month. Ramadan Kareem. |
| Our country may seem more divided than ever, but I know our best days are still ahead. It’s time we lift our heads up and remember who we are. If you’re ready to get to work, join our campaign today. https://t.co/lyhRyWPhvS |
Specifically, the presence of links and images may cause some problems as they are high in frequency but provide no insights into Tweet sentiment. Moreover, Tweets often include a large number of Twitter-specific characters such as @s and #s.
We can clean this type of text using some RegEx and the stringr package. Then, using the tweetBinary column in our dataset, we filter out Retweets (because we are interested in candidates’ sentiment, not their followers’).
cleaned <- uncleaned %>%
filter(!str_detect(text, '^"')) %>%
mutate(text = str_replace_all(text, "http://t.co/[A-Za-z\\d]+|&", "")) %>%
mutate(text = str_replace_all(text, "@[A-Za-z0-9_]+[A-Za-z0-9-_]+", "")) %>%
mutate(text = str_replace_all(text, "@", "")) %>%
mutate(text = str_replace_all(text, "—", " ")) %>%
mutate(text = str_replace_all(text, "http\\S+", "")) %>%
mutate(createdDate = parse_date_time(createdDate, 'mdy')) %>%
filter(!is.na(text)) %>%
filter(text!="")
cleaned$candidate <- recode(cleaned$candidate, "amyklobuchar"="Amy Klobuchar")
cleaned$candidate <- recode(cleaned$candidate, "ewarren"="Elizabeth Warren")
cleaned$candidate <- recode(cleaned$candidate, "BetoORourke"="Beto O'Rourke")
cleaned$candidate <- recode(cleaned$candidate, "JoeBiden"="Joe Biden")
cleaned$candidate <- recode(cleaned$candidate, "JulianCastro"="Julian Castro")
cleaned$candidate <- recode(cleaned$candidate, "TulsiGabbard"="Tulsi Gabbard")
cleaned$candidate <- recode(cleaned$candidate, "PeteButtigieg"="Pete Buttigieg")
cleaned$candidate <- recode(cleaned$candidate, "KamalaHarris"="Kamala Harris")
cleaned$candidate <- recode(cleaned$candidate, "BernieSanders"="Bernie Sanders")
cleaned$candidate <- recode(cleaned$candidate, "CoryBooker"="Cory Booker")
cleaned$candidate <- recode(cleaned$candidate, "AndrewYang"="Andrew Yang")
cleaned$candidate <- recode(cleaned$candidate, "SenGillibrand"="Kirsten Gillibrand")
cleanedtweets <- cleaned %>%
filter(tweetBinary == 1)
cleanedretweets <- cleaned %>%
filter(tweetBinary == 0)
This leaves us with 28,347 cleaned Tweets:
| candidate | id | text | createdDate | favoriteCount | retweetCount | tweetBinary |
|---|---|---|---|---|---|---|
| Joe Biden | 1.12554e+18 | Hey Rocco, happy 9th birthday! Sorry I missed the party. I hope you got some ice cream with that #Biden2020 cookie. | 2019-05-06 | 39 | 5 | 1 |
| Joe Biden | 1.12553e+18 | Congratulations to the Duke and Duchess of Sussex, Prince Harry and Meghan Markle, on the birth of their baby boy! The truth is there’s no better job than being a parent. Jill and I wish you the best of luck on your next chapter. | 2019-05-06 | 13842 | 948 | 1 |
| Joe Biden | 1.12552e+18 | On Saturday, Jill and I visited Columbia, South Carolina, to discuss how we’ll build an inclusive middle class where everyone regardless of race, gender, sexual orientation, religion or disability comes along. Thank you to everyone who joined us, we’ll be back soon! | 2019-05-06 | 2783 | 410 | 1 |
| Joe Biden | 1.12550e+18 | Nevada, we’re taking our campaign to Henderson tomorrow, May 7th! Head to to RSVP and come join us. | 2019-05-06 | 1059 | 213 | 1 |
| Joe Biden | 1.12520e+18 | On this first day of Ramadan, Jill and I wish peace and happiness to Muslims around the world celebrating the holy month. Ramadan Kareem. | 2019-05-06 | 14876 | 1754 | 1 |
| Joe Biden | 1.12518e+18 | Our country may seem more divided than ever, but I know our best days are still ahead. It’s time we lift our heads up and remember who we are. If you’re ready to get to work, join our campaign today. | 2019-05-05 | 3631 | 595 | 1 |
We begin with a preliminary look at the structure of our data. First, how equal is the distribution of data among candidates?
cleanedtweets %>%
group_by(candidate) %>%
distinct(id, .keep_all = TRUE) %>%
ggplot(aes(x=forcats::fct_infreq(candidate))) +
geom_bar(stat="count") +
coord_flip() +
labs(x="Number of Tweets",
y="Candidate")
Our dataset is lacking in Tweets from Julian Castro and Joe Biden, and to some extent Cory Booker. This is mostly because those candidates don’t Tweet as much; Joe Biden’s Twitter account only has 1,700 Tweets. But it’s also because some candidates Retweet more than others, and we filter those out.
Next, how does the distribution of Tweets over time look?
cleanedtweets %>%
ggplot(aes(x=month(createdDate))) +
geom_histogram(position = "identity", bins = 12, show.legend = FALSE) +
labs(x="Month of Creation",
y="Number of Tweets") +
scale_x_continuous(name = "Month", breaks=c(3,6,9,12), labels=c("Mar.", "June","Sept.", "Dec."))
cleanedtweets %>%
ggplot(aes(x = month(createdDate), fill = candidate)) +
geom_histogram(position = "identity", bins = 12, show.legend = FALSE) +
facet_wrap(~candidate, scales = "free") +
labs(x="Month of Creation",
y="Number of Tweets") +
scale_x_continuous(name = "Month", breaks=c(3,6,9,12), labels=c("Mar.", "June","Sept.", "Dec.")) +
theme(axis.text.x = element_text(size = 7))
From this, it is obvious that Andrew Yang Tweets way more than everyone else, given that most of his tweets (>2000) are from the last three months (February, March, and April).
Following this preliminary look at our data, we can begin with what we’re really interested in: sentiment analysis of candidate Tweets.
We initially performed our analysis using the syuzhet package in R, but found its output to be imprecise and misleading. This was because the syuzhet library was specifically made for sentiment analysis of fiction text (and the domain-specifity of sentiment analyses is very important). We also ran into problems because the syuzhet library failed to account for valence shifters (such as “not”, “very”, or “doesn’t”).
The sentimentr package addresses both of these concerns; it is fine-tuned for sentence-level analysis, has been used on Tweets in prior analyses, and it accounts for contextual valence shifters.
As an example, the syuzhet package would code the sentence “I don’t hate ice cream” as relatively negative, because the keyword hate determines the overall polarity of the sentence. The sentimentr package accounts for the negator don’t and reverses the polarity score to be slightly positive. For a more thorough review of sentimentr, see this post from its creator.
Let’s add sentiment to the dataset:
tweets <- cleanedtweets %>%
sentimentr::get_sentences() %>%
sentimentr::sentiment(polarity_dt = lexicon::hash_sentiment_jockers_rinker,
valence_shifters_dt = lexicon::hash_valence_shifters) %>%
group_by(element_id) %>%
mutate(text = add_comma_space(text)) %>%
mutate(tweet = paste(text, collapse=" ")) %>%
ungroup() %>%
mutate(mentiontrump = ifelse(str_detect(tweet, "Trump")==TRUE,yes=1,no=0)) %>%
mutate(text = replace_white(text)) %>%
mutate(tweet = replace_white(tweet)) %>%
group_by(tweet) %>%
mutate(tweetlevelsentiment = mean(sentiment)) %>%
ungroup %>%
filter(!is.na(word_count)) %>%
filter(text!=" ") %>%
mutate(sentimentbinary = ifelse(sentiment>0,"Positive",
ifelse(sentiment<0,"Negative","Neutral"))) %>%
mutate(tweetsentimentbinary = ifelse(tweetlevelsentiment>0,"Positive",
ifelse(tweetlevelsentiment<0,"Negative","Neutral"))) %>%
mutate(distancefromzero = abs(sentiment)) %>%
select(candidate, id, tweet, text, sentiment, sentimentbinary,
tweetlevelsentiment, tweetsentimentbinary, distancefromzero,
favoriteCount, retweetCount, createdDate, mentiontrump,
element_id, sentence_id, word_count)
summary(tweets)
## candidate id tweet
## Length:60670 Min. :3.614e+08 Length:60670
## Class :character 1st Qu.:9.794e+17 Class :character
## Mode :character Median :1.046e+18 Mode :character
## Mean :1.007e+18
## 3rd Qu.:1.097e+18
## Max. :1.126e+18
## text sentiment sentimentbinary
## Length:60670 Min. :-1.68846 Length:60670
## Class :character 1st Qu.: 0.00000 Class :character
## Mode :character Median : 0.04588 Mode :character
## Mean : 0.10259
## 3rd Qu.: 0.28868
## Max. : 2.07458
## tweetlevelsentiment tweetsentimentbinary distancefromzero
## Min. :-1.22202 Length:60670 Min. :0.00000
## 1st Qu.:-0.02614 Class :character 1st Qu.:0.02132
## Median : 0.09133 Mode :character Median :0.18549
## Mean : 0.10039 Mean :0.23045
## 3rd Qu.: 0.23024 3rd Qu.:0.35355
## Max. : 1.89171 Max. :2.07458
## favoriteCount retweetCount createdDate
## Min. : 0 Min. : 0 Min. :2007-10-24 00:00:00
## 1st Qu.: 333 1st Qu.: 83 1st Qu.:2018-03-29 00:00:00
## Median : 1255 Median : 319 Median :2018-09-29 00:00:00
## Mean : 5414 Mean : 1418 Mean :2018-06-12 14:50:52
## 3rd Qu.: 4400 3rd Qu.: 1140 3rd Qu.:2019-02-16 00:00:00
## Max. :689487 Max. :265015 Max. :2019-05-07 00:00:00
## mentiontrump element_id sentence_id word_count
## Min. :0.00000 Min. : 1 Min. : 1.000 Min. : 1.00
## 1st Qu.:0.00000 1st Qu.: 7621 1st Qu.: 1.000 1st Qu.: 6.00
## Median :0.00000 Median :14213 Median : 2.000 Median :11.00
## Mean :0.05032 Mean :14244 Mean : 1.851 Mean :12.58
## 3rd Qu.:0.00000 3rd Qu.:20889 3rd Qu.: 2.000 3rd Qu.:18.00
## Max. :1.00000 Max. :28347 Max. :11.000 Max. :68.00
The above code creates a few variables to determine sentiment:
First, we can explore overall sentiment by depicting what proportion of sentences fall into the categories of negative (sentiment < 0), positive (sentiment > 0), or neutral (sentiment = 0).
tweets %>%
ggplot(aes(x=sentimentbinary, fill=sentimentbinary)) +
geom_bar(stat="count", show.legend = FALSE) +
labs(x="Sentiment",
y="Number of Sentences",
title = "Frequency of Sentiment by Sentence")
Individual sentences tend to be overwhelmingly positive. Does a different trend arise when we look at entire Tweets (by averaging the scores of each individual sentence)?
tweets %>%
distinct(id, .keep_all = TRUE) %>%
ggplot(aes(x=tweetsentimentbinary, fill=tweetsentimentbinary)) +
geom_bar(stat="count", show.legend = FALSE) +
labs(x="Sentiment",
y="Number of Tweets",
title = "Frequency of Sentiment by Tweet")
This reveals that there is less neutrality in Tweets than there is in sentences. This makes sense when we consider the likelihood of a Tweet being entirely composed of neutral sentences (sentiment = 0) is relatively low.
An additional question is if sentiment differs by candidate.
tweets %>%
distinct(id, .keep_all = TRUE) %>%
group_by(candidate, tweetsentimentbinary) %>%
summarise(n = n()) %>%
mutate(percentsentiment = 100*(n/sum(n))) %>%
ggplot(aes(x=tweetsentimentbinary, y= percentsentiment, fill=tweetsentimentbinary)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ candidate) +
labs(x="Sentiment",
y="Number of Tweets",
title = "Frequency of Sentiment by Tweet")
Evidently, Kamala Harris, Kirsten Gillibrand, and Bernie Sanders are almost never neutral. By contrast, Joe Biden, Andrew Yang, and Beto O’Rourke tend to be neutral more often than most candidates.
Aside from the binary classification of positive/negative, we may also be curious which candidates tend to be the most positive and the most negative:
tweets %>%
group_by(candidate) %>%
summarise(sentiment = mean(sentiment)) %>%
arrange(desc(sentiment)) %>%
ggplot(aes(x = reorder(candidate, sentiment), y = sentiment, fill=candidate)) +
geom_col(show.legend = FALSE) +
xlab("Candidate") +
ylab("Sentiment") +
ggtitle("Average Sentiment by Candidate", subtitle = "With higher scores indicating more positive content") +
coord_flip()
There is considerable variance in positivity across candidates. Pete Buttigieg’s Tweets tend to be, on average, 3.74x more positive than those from Kamala Harris.
After exploring how sentiment differs by candidate, we can perform additional analyses by looking at the individual words a candidate uses. The inspiration for this decision was the book Text Mining with R, written by Julia Silge and David Robinson.
We use the NRC Word-Emotion Association Lexicon, which is a crowdsourced list of 14,182 words and their most frequent association with eight emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust). This allows us to probe further into candidate Twitter feeds to see how candidates differ in their use of specific emotions.
We create the tweet_words dataset with the unnest_tokens function found in the tidytext package. We then inner join the dataset to the NRC lexicon under the name tweet_words_w_nrc.
reg <- "([^A-Za-z\\d#@']|'(?![A-Za-z\\d#@]))"
tweet_words <- tweets %>%
unnest_tokens(word, text, token = "regex", pattern = reg, drop = FALSE) %>%
filter(!word %in% stop_words$word,
str_detect(word, "[a-z]"))
nrc <- sentiments %>%
filter(lexicon == "nrc") %>%
select(word, sentiment)
colnames(nrc)[colnames(nrc)=="sentiment"] <- "wordsentiment"
tweet_words_w_nrc <- tweet_words %>%
inner_join(nrc, by = "word")
One interesting question is which words garner the greatest Tweet attention, as evidenced by Favorites and Retweets. We answer this question by finding the median number of interactions (Retweets + Favorites) for non-stop-words (e.g. “the”, “of”, and “are”) and plotting the most popular.
word_by_interactions <- tweet_words %>%
group_by(id, word, candidate) %>%
mutate(interactions = favoriteCount+retweetCount) %>%
summarise(interactions = first(interactions)) %>%
group_by(candidate, word) %>%
summarise(interactions = median(interactions), uses = n()) %>%
filter(interactions != 0) %>%
ungroup()
word_by_interactions %>%
filter(uses > 5) %>%
group_by(candidate) %>%
top_n(10, interactions) %>%
arrange(interactions) %>%
ungroup() %>%
mutate(word = factor(word, unique(word))) %>%
ungroup() %>%
ggplot(aes(word, interactions, fill = candidate)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ candidate, scales = "free") +
coord_flip() +
labs(x = NULL,
y = "Median # of Interactions for Tweets containing each word",
title = "Most Popular Words Used on Twitter") +
theme(axis.text.x = element_text(size = 6, angle = 20),
axis.text.y = element_text(size = 5),
strip.text.x = element_text(size = 8))
The plot reveals some interesting insights about what topics are especially popular for certain candidates. It seems as if Tulsi Gabbard performs best when she mentions issues related to international security (terrorism, waging, jihadists, genocidal). Cory Booker received the most Retweets and Favorites during the Kavanaugh hearings in September 2018. Joe Biden is an especially interesting case; nearly all of his most popular words come from one phrase: “a battle for the soul of this nation.” He uses that phrase—or some variation of it—a lot.
More generally, words tangentially related to some legal issue in the Trump administration tend to perform the best on Twitter:
word_by_interactions %>%
filter(uses > 5) %>%
top_n(10, interactions) %>%
arrange(interactions) %>%
ungroup() %>%
mutate(word = factor(word, unique(word))) %>%
ungroup() %>%
ggplot(aes(word, interactions)) +
geom_col(show.legend = FALSE) +
coord_flip() +
labs(x = NULL,
y = "Median # of Interactions for Tweets containing each word",
title = "Most Popular Words Used on Twitter")
By incorporating the NRC lexicon, we can see if any specific emotions “sell.” In other words, we can see if some emotions outperform others when it comes to getting Tweet attention.
candidate_interactions_by_sentiment <- tweet_words_w_nrc %>%
group_by(id, wordsentiment, candidate) %>%
mutate(interactions = favoriteCount+retweetCount) %>%
summarise(interactions = first(interactions)) %>%
group_by(candidate, wordsentiment) %>%
summarise(interactions = mean(interactions), uses = n()) %>%
ungroup()
candidate_interactions_by_sentiment %>%
group_by(candidate) %>%
ggplot(aes(x=reorder(wordsentiment, interactions), y=interactions, fill=wordsentiment)) +
geom_col(show.legend = FALSE) +
facet_wrap(~ candidate, scales = "free_x") +
coord_flip() +
ylab("Average # of Interactions") +
xlab(element_blank()) +
ggtitle("Popularity of Tweet Sentiments") +
theme(axis.text.x = element_text(size = 6, angle = 20),
axis.text.y = element_text(size = 6))
candidate_interactions_by_sentiment %>%
group_by(wordsentiment) %>%
summarise(meaninteractions = mean(interactions)) %>%
ggplot(aes(x=reorder(wordsentiment, meaninteractions), y=meaninteractions, fill=wordsentiment)) +
geom_col(show.legend = FALSE) +
coord_flip() +
ylab("Average # of Interactions") +
xlab(element_blank()) +
ggtitle("Popularity of Tweet Sentiments")
The results from this are really consistent: negative emotions—like disgust, anger, sadness, and fear—sell.
Even when we stop looking at specific emotions and turn back to our initial sentiment scale of -2 to 2, we find the trend is consistent and statistically significant.
tweets %>%
distinct(id, .keep_all = TRUE) %>%
mutate(interactions = favoriteCount + retweetCount) %>%
filter(interactions < 90000) %>%
filter(sentiment<1.5) %>%
ggplot(aes(x=tweetlevelsentiment, y=interactions)) +
geom_point(alpha = .05) +
geom_smooth(method=lm, se=TRUE) +
ylab("Number of Interactions") +
xlab("Sentiment Score") +
ggtitle("Relationship Between Tweet Sentiment and Interactions")
A linear regression of Tweet interactions by sentiment reveals that a one unit increase in sentiment results in a loss of 4492 Tweet interactions (p < 0.0001).
A final topic of interest is how Donald Trump affects the sentiment of candidate Tweets. Knowing that Donald Trump is a unique president with uniquely low levels of popularity, one may hypothesize that Tweets focusing on or mentioning the president may be different from most others.
Earlier, we created the variable mentiontrump using the str_detect function from the stringr package. By analyzing differences between Tweets which mention Trump and those that don’t, we can understand how the “Trump variable” impacts Tweet behavior and popularity.
First, who talks about Trump? Are some candidates (e.g. those who need to access Trump’s base in order to secure the Democratic nomination) less likely to attack the president than others?
tweets %>%
distinct(id, .keep_all = TRUE) %>%
group_by(candidate) %>%
summarise(mentiontrump = sum(mentiontrump==1),
totaltweets = n(),
percent = 100*(mentiontrump/totaltweets)) %>%
ggplot(aes(x=reorder(candidate, percent), y=percent, fill = candidate)) +
geom_col(show.legend = FALSE) +
xlab(element_blank()) +
ylab("Percent of Tweets Mentioning Trump") +
ggtitle("Candidate Frequency of Mentioning Trump") +
coord_flip()
Joe Biden, who pitches himself as a moderate and will likely see electoral success by engaging “Never Trump” voters and moderate conservatives, mentions Trump much less frequently than more “radical” candidates like Bernie Sanders.
Next, how does sentiment change in Tweets that mention Trump compared to those that don’t?
tweets %>%
group_by(mentiontrump) %>%
summarise(meansentiment = mean(sentiment)) %>%
ggplot(aes(x=mentiontrump, y=meansentiment, fill=mentiontrump)) +
geom_col(show.legend = FALSE) +
ggtitle("Average Sentiment", subtitle = "Tweets that mention Trump vs those that don't") +
ylab("Sentiment") +
xlab(element_blank()) +
scale_x_continuous(breaks=c(0,1), labels = c("Does Not Mention", "Does Mention"))
Unsurprisingly, Tweets about Donald Trump are significantly more negative than those about other topics.
Again, our word-by-word analysis allows us to get more fine-tuned than this simple binary classification:
tweet_words_w_nrc %>%
mutate(mentiontrump = as.factor(mentiontrump)) %>%
group_by(mentiontrump, wordsentiment) %>%
summarise(n = n()) %>%
mutate(percentsentiment = 100*(n/sum(n))) %>%
ggplot(aes(group=mentiontrump, x=reorder(wordsentiment,percentsentiment),
fill=mentiontrump, y=percentsentiment)) +
geom_bar(position=position_dodge(), stat="identity") +
scale_fill_discrete(name="Mentions Trump",
breaks=c(0,1),
labels=c("No", "Yes")) +
labs(x=element_blank(),
y = "Percent of Words",
title = "Sentiment as a Proportion of Overall Tweets",
subtitle = "Comparing Tweets mentioning Trump with those that don't")
The trend holds true: positive emotions like joy, trust, and anticipation are less common in Tweets that mention Donald Trump. The corollary is also true. Negative emotions such as disgust, sadness, anger, and fear are more prevalent in Tweets that mention the president.
Aside from emotion, does the “Trump variable” affect Tweet popularity? One may hypothesize that Trump’s historically low popularity may lend itself to higher popularity in Tweets which disparage him.
tweets %>%
distinct(id, .keep_all = TRUE) %>%
mutate(interactions = retweetCount+favoriteCount) %>%
mutate(mentiontrump = as.factor(mentiontrump)) %>%
group_by(mentiontrump) %>%
summarise(interactions = mean(interactions)) %>%
ggplot(aes(x=mentiontrump, y=interactions, fill=mentiontrump)) +
geom_col(show.legend = FALSE) +
scale_x_discrete(breaks=c(0,1), labels = c("Does Not Mention", "Does Mention")) +
labs(x="Mentions Trump",
y = "Average # of Interactions",
title = "Average Tweet Popularity",
subtitle = "Comparing Tweets mentioning Trump with those that don't")
Tweets which mention the president tend to get 2.2x the number of interactions (13,087) than those which don’t (5,912).
We want to continue this work by expanding upon current analyses of 2020 presidential candidates. We are curious about how other variables (such as position in race) affect the use of certain emotions and sentiment, or what kind of differences arise when comparing candidates of different demographic backgrounds.
Finally, we would like to see how Twitter has evolved as a platform by using 2016 presidential candidates’ Twitter feeds as a comparison. We are curious if Tweet content has changed in a way which reflects general trends in politics, such as growing partisanship, heightened polarization, and more identity-centric notions of political identity.