On September 19th, 2017, Chuck Todd hosted a debate between Republican Ed Gillespie and Democrat Ralph Northam. This analysis uses the `tidytext`, `rtweet`, `ggplot2`, `dplyr`, and a few other packages to analyze Twitter reactions through the #VAGov and #VAGovDebate hashtags. As a first step, we need to load (and in some cases install) the necessary R packages.
```r
# install.packages("rtweet")
library(rtweet)
library(tidytext)
library(dplyr)
library(ggplot2)
# yaztheme is installed from GitHub; install_github() takes "user/repo"
# devtools::install_github("joshyazman/yaztheme")
library(yaztheme)
library(stringr)
```
Now I need to set up access to the Twitter API (my credentials are redacted, so you will need your own Twitter developer account to run this).
```r
appname <- "redacted"
key <- "redacted"
secret <- "redacted"
twitter_token <- create_token(
  app = appname,
  consumer_key = key,
  consumer_secret = secret)
```
Now that the API token is set up, I’ll use the `search_tweets()` function to search for any tweets that use either “#VAGovDebate” or “#VAGov”. I want to get as many tweets as possible, so I set `n = 18000`, the most the API will return in a single 15-minute rate-limit window.
```r
tweets_raw <- search_tweets(q = "#VAGovDebate OR #VAGov", n = 18000)
```
The tweets come back as a nice, tidy data frame, which is great! But I don’t need all of them, so I’m going to limit the data to tweets from the day of the debate and drop the columns I don’t need using the `select()` and `filter()` functions and the `%>%` pipe from the `dplyr` package.
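The filtering code itself is not shown here, so the following is only a minimal sketch of what such a step might look like. It runs on a small made-up data frame (`tweets_raw_demo`) rather than the real `tweets_raw`, and the UTC time zone and the exact columns kept are assumptions for illustration.

```r
library(dplyr)

# Hypothetical stand-in for tweets_raw; the real object comes from search_tweets().
tweets_raw_demo <- data.frame(
  screen_name = c("a", "b", "c"),
  created_at = as.POSIXct(
    c("2017-09-18 23:10:00", "2017-09-19 19:05:00", "2017-09-19 19:40:00"),
    tz = "UTC"
  ),
  text = c("Ready for tomorrow #VAGov",
           "Here we go #VAGovDebate",
           "Good answer #VAGovDebate"),
  retweet_count = c(0L, 3L, 1L),
  favorite_count = c(1L, 5L, 0L),
  source = c("web", "iPhone", "Android"),
  stringsAsFactors = FALSE
)

tweets_demo <- tweets_raw_demo %>%
  # keep only tweets posted on the day of the debate (assumes UTC timestamps)
  filter(as.Date(created_at, tz = "UTC") == as.Date("2017-09-19")) %>%
  # keep only the columns needed for the analysis
  select(screen_name, created_at, text, retweet_count, favorite_count)
```

On the real data, the same two verbs would be applied to `tweets_raw`, with the date comparison adjusted to whatever time zone the timestamps are stored in.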
```
str(tweets)
'data.frame': 0 obs. of 7 variables:
 $ screen_name : Factor w/ 2664 levels "__kelsey","_AlexThomps",..:
 $ created_at : Factor w/ 5355 levels "2017-09-19 04:42:19",..:
 $ text : Factor w/ 3794 levels "'Atta boy .@chucktodd - couldn't resist a cheap shot @POTUS. Sleazy. #VAGovDebate",..:
 $ retweet_count : int
 $ favorite_count : int
 $ mentions_screen_name: Factor w/ 958 levels "_AlexThomps NOVAChamber",..:
 $ hashtags : Factor w/ 504 levels "3rdPerson VAGovDebate",..:
```
Mentions
Now we have a clean, rich set of tweets to analyze! First, I want to just look at tweet volume over the course of the day. I can do that with a line graph using ggplot. First I’ll aggregate tweets by minute and then plot them. Not surprisingly, there was a spike in tweets related to the debate during the debate!
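The aggregation and plotting code is not reproduced here. As a rough sketch under stated assumptions, tweets can be binned by truncating each timestamp to the minute and counting rows per bin. The toy data frame and the names `tweets_by_minute`, `n_tweets`, and `volume_plot` are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical timestamps standing in for the created_at column.
tweets_demo <- data.frame(
  created_at = as.POSIXct(
    c("2017-09-19 19:00:05", "2017-09-19 19:00:40", "2017-09-19 19:01:10"),
    tz = "UTC"
  )
)

tweets_by_minute <- tweets_demo %>%
  # drop the seconds so every tweet falls into a one-minute bin
  mutate(mins = as.POSIXct(trunc(created_at, units = "mins"))) %>%
  # one row per minute with the number of tweets in that minute
  count(mins, name = "n_tweets")

# Line graph of tweet volume per minute
volume_plot <- ggplot(tweets_by_minute, aes(x = mins, y = n_tweets)) +
  geom_line() +
  labs(x = NULL, y = "Tweets per minute")
```

If `created_at` is stored as a factor or character column, as in the `str()` output, it would first need converting with `as.POSIXct()`.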

Let’s dig a bit deeper and look at how many tweets mention each candidate. Again, I’ll create an aggregated data frame of tweets by candidate by minute, but this time I need to take one extra step and create a flag variable for mentions of [@RalphNortham](https://twitter.com/RalphNortham), [@EdWGillespie](https://twitter.com/EdWGillespie), or both.
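One way to build such a flag is with `str_detect()` and `case_when()`. This is a sketch rather than the original code: the example tweets and the group labels ("Both", "Northam", "Gillespie", "Neither") are assumptions, and matching is done on the two handles only.

```r
library(dplyr)
library(stringr)

# Hypothetical tweets covering each of the four cases.
tweets_demo <- data.frame(
  text = c(
    "Strong opening from @RalphNortham #VAGovDebate",
    "@EdWGillespie on taxes #VAGov",
    "@RalphNortham and @EdWGillespie trading shots #VAGovDebate",
    "Watching the #VAGovDebate tonight"
  ),
  stringsAsFactors = FALSE
)

tweets_flagged <- tweets_demo %>%
  mutate(
    # case-insensitive match on each candidate's handle
    northam = str_detect(text, regex("@RalphNortham", ignore_case = TRUE)),
    gillespie = str_detect(text, regex("@EdWGillespie", ignore_case = TRUE)),
    # case_when() takes the first matching condition, so "Both" must come first
    candidate_mention = case_when(
      northam & gillespie ~ "Both",
      northam ~ "Northam",
      gillespie ~ "Gillespie",
      TRUE ~ "Neither"
    )
  )
```

A stricter version could also match the candidates' names in plain text, at the cost of more false positives.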

Most of the time, people tweeting about the debate didn’t mention either candidate, but there were some clear moments when Gillespie or Northam popped. I didn’t keep a timed transcript, though, so if you’re reading this and have some ideas of what the candidates were talking about at those times, let me know!
Word Distinctiveness and Sentiment Analysis
Lastly, I want to look at tweet sentiments. The `tidytext` package has four sentiment lexicons available. The `bing` and `nrc` lexicons both offer pretty good classifications of words as either positive or negative, although `nrc` tends to err on the side of being overly positive. The `nrc` lexicon also tags words with eight emotions: “trust”, “fear”, “sadness”, “anger”, “surprise”, “disgust”, “joy”, and “anticipation”. We can unnest tweet words the same way we did with hashtags and then use a simple join to append the positive/negative scores from the `bing` lexicon and all of the sentiments from the `nrc` lexicon.
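The unnest-and-join step can be sketched as follows. It uses two made-up tweets and only the `bing` lexicon, which ships with `tidytext`. In recent `tidytext` versions, `get_sentiments("nrc")` downloads the lexicon through the `textdata` package and asks you to accept its license first, so it is left as a comment. The object names here are illustrative.

```r
library(dplyr)
library(tidytext)

# Hypothetical tweets; the id column stands in for a tweet identifier.
tweets_demo <- data.frame(
  id = 1:2,
  text = c("Great answer, strong and honest", "Terrible, weak dodge"),
  stringsAsFactors = FALSE
)

# One row per word; unnest_tokens() lowercases and strips punctuation by default
tweet_words <- tweets_demo %>%
  unnest_tokens(word, text)

# Keep only words that appear in the bing lexicon, with their polarity attached
bing_tweet_words <- tweet_words %>%
  inner_join(get_sentiments("bing"), by = "word")

# The nrc join has the same shape, but needs the lexicon downloaded first:
# nrc_tweet_words <- tweet_words %>%
#   inner_join(get_sentiments("nrc"), by = "word")
```

Because `inner_join()` drops unmatched words, everything downstream is a share of sentiment-bearing words, not of all words tweeted.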
First I want to look at tweet sentiment over time broken out by candidate mention. It looks like net sentiments were generally more positive before the debate than during or after, which makes sense because pre-debate tweets were mostly (anecdotally) about hyping one’s candidate rather than hitting the other. During the debate, everyone’s tweets got more negative, but Northam mentions remained the most positive.
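Net sentiment here can be read as positive words minus negative words within each time bin and candidate group. That definition is an assumption, since the original calculation is not shown. A small sketch on made-up data, with `bing_demo` and `net_sentiment` as illustrative names:

```r
library(dplyr)

# Hypothetical word-level rows, as they might look after joining the bing lexicon.
bing_demo <- data.frame(
  mins = as.POSIXct(
    c("2017-09-19 19:00:00", "2017-09-19 19:00:00", "2017-09-19 19:00:00",
      "2017-09-19 19:01:00", "2017-09-19 19:01:00"),
    tz = "UTC"
  ),
  candidate_mention = c("Northam", "Northam", "Gillespie", "Northam", "Northam"),
  sentiment = c("positive", "positive", "negative", "positive", "negative"),
  stringsAsFactors = FALSE
)

net_sentiment <- bing_demo %>%
  group_by(mins, candidate_mention) %>%
  # positive word count minus negative word count in each minute and group
  summarise(
    net = sum(sentiment == "positive") - sum(sentiment == "negative"),
    .groups = "drop"
  )
```

Minute-level bins are noisy when few tweets land in them, so a wider bin or a smoothed line may be easier to read on the real data.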

To examine NRC sentiments, we calculate the percentage of words within each candidate mention grouping that match each sentiment. Anticipation pops across the board, likely because of tweets leading up to the debate. Aside from that, it’s notable that Gillespie leads slightly on Trust and Northam leads slightly on Joy. Tweets that mention both candidates are the angriest.
```r
# Share of sentiment-tagged words in each NRC category, within each candidate group
nrc_shares <- nrc_tweet_words %>%
  count(candidate_mention, nrc) %>%
  group_by(candidate_mention) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

ggplot(nrc_shares, aes(x = nrc, y = pct, fill = candidate_mention)) +
  geom_col() +
  coord_flip() +  # horizontal bars, with sentiments listed down the side
  facet_wrap(~candidate_mention) +
  theme_yaz() +
  # after coord_flip(), y (the share of words) is drawn on the horizontal axis
  labs(x = NULL, y = 'Share of words',
       title = 'Tweet Sentiments by Candidate') +
  scale_fill_manual(name = 'Candidate Mentioned', values = yaz_cols[c(4,2,3,1)])
```

