On March 15, I collected the most recent 3,000 tweets from each of the 15 people actively campaigning for the Democratic Party’s nomination, plus Joe Biden. I chose to include Biden because, although he hasn’t officially announced his candidacy, he has been the favorite of Democratic voters in many polls, and he has not ruled out a run.
The 15 declared candidates are: Amy Klobuchar, Andrew Yang, Bernie Sanders (although he serves in the Senate as an independent), Beto O’Rourke, Cory Booker, Elizabeth Warren, John Hickenlooper, Jay Inslee, John Delaney, Julián Castro, Kamala Harris, Marianne Williamson, Pete Buttigieg, Kirsten Gillibrand, and Tulsi Gabbard. Some of these candidates currently hold office and have multiple Twitter accounts (e.g., Elizabeth Warren has two accounts: @SenWarren and @ewarren); in those instances, I scraped the account related to their campaign.
For this project, I used the following packages: rtweet, tidyverse, tidytext, ggplot2, dplyr, and ggthemes. After connecting to the Twitter API, I executed the following code for each candidate.
jack_hickenlooper <- get_timeline("Hickenlooper", n=3000, retryonratelimit=TRUE)
Not every account had 3,000 tweets to collect; in those cases, I collected as many as were available. In total, 42,918 tweets were collected. I then filtered the data to include only tweets created since the beginning of 2019, so as to exclude tweets irrelevant to current events, which left 8,610 tweets.
For each tweet, a good deal of information beyond the text itself was collected, such as whether the tweet contained a quote, whether it was a retweet, how many favorites it received, how many times it was retweeted, and the hashtags, symbols, and URLs it contained. All of the column names are listed below.
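The date filter described above could look something like the sketch below. It runs on a tiny made-up data frame (the rows and the `tweets` and `tweets_2019` names are hypothetical) that mimics the `created_at` column, which rtweet returns as a POSIXct timestamp in UTC; in the real analysis the input would be the combined `get_timeline()` results.

```r
library(dplyr)

# Hypothetical stand-in for the combined timelines (not real tweets)
tweets <- tibble(
  screen_name = c("candidate_a", "candidate_a", "candidate_b"),
  created_at  = as.POSIXct(
    c("2018-12-31 23:59:00", "2019-01-01 00:00:00", "2019-03-10 12:00:00"),
    tz = "UTC"
  ),
  text = c("old tweet", "first tweet of 2019", "recent tweet")
)

# Keep only tweets created on or after January 1, 2019 (UTC)
cutoff <- as.POSIXct("2019-01-01 00:00:00", tz = "UTC")
tweets_2019 <- tweets %>%
  filter(created_at >= cutoff)
```

Comparing against an explicit UTC cutoff avoids surprises from the local time zone of the machine running the script.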
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "hashtags" "symbols"
## [17] "urls_url" "urls_t.co"
## [19] "urls_expanded_url" "media_url"
## [21] "media_t.co" "media_expanded_url"
## [23] "media_type" "ext_media_url"
## [25] "ext_media_t.co" "ext_media_expanded_url"
## [27] "ext_media_type" "mentions_user_id"
## [29] "mentions_screen_name" "lang"
## [31] "quoted_status_id" "quoted_text"
## [33] "quoted_created_at" "quoted_source"
## [35] "quoted_favorite_count" "quoted_retweet_count"
## [37] "quoted_user_id" "quoted_screen_name"
## [39] "quoted_name" "quoted_followers_count"
## [41] "quoted_friends_count" "quoted_statuses_count"
## [43] "quoted_location" "quoted_description"
## [45] "quoted_verified" "retweet_status_id"
## [47] "retweet_text" "retweet_created_at"
## [49] "retweet_source" "retweet_favorite_count"
## [51] "retweet_retweet_count" "retweet_user_id"
## [53] "retweet_screen_name" "retweet_name"
## [55] "retweet_followers_count" "retweet_friends_count"
## [57] "retweet_statuses_count" "retweet_location"
## [59] "retweet_description" "retweet_verified"
## [61] "place_url" "place_name"
## [63] "place_full_name" "place_type"
## [65] "country" "country_code"
## [67] "geo_coords" "coords_coords"
## [69] "bbox_coords" "status_url"
## [71] "name" "location"
## [73] "description" "url"
## [75] "protected" "followers_count"
## [77] "friends_count" "listed_count"
## [79] "statuses_count" "favourites_count"
## [81] "account_created_at" "verified"
## [83] "profile_url" "profile_expanded_url"
## [85] "account_lang" "profile_banner_url"
## [87] "profile_background_url" "profile_image_url"
Since 2019 began, Andrew Yang has tweeted more than twice as often as any of the other candidates - a total of 2,032 times (an average of 27 tweets per day). Marianne Williamson, John Delaney, and Kamala Harris each tweeted more than 700 times. Elizabeth Warren, Julián Castro, and Pete Buttigieg tweeted more than 500 times each. Joe Biden tweeted the least frequently - just 46 times.
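The per-day figure can be checked with quick arithmetic. This sketch assumes the window runs from January 1 through the March 15 collection date, inclusive; the exact window is my assumption, not something stated above.

```r
# Assumed window: January 1 through March 15, 2019, counting both end days
start_date <- as.Date("2019-01-01")
end_date   <- as.Date("2019-03-15")
n_days     <- as.integer(end_date - start_date) + 1

# Yang's tweet count, as reported in the text
yang_tweets    <- 2032
tweets_per_day <- yang_tweets / n_days
round(tweets_per_day)
```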
One might hypothesize that the discrepancies across the board could be attributed to lagging candidates’ attempts to increase name recognition or raise donations. However, there is only a weak correlation, and not a statistically significant one (r ≈ 0.38, p ≈ 0.15), between the candidates’ current performance and the number of tweets they’ve published this year. This was determined using a correlation analysis comparing the data above with each candidate’s ranking in Rolling Stone’s 2020 Democratic Primary Leaderboard, which takes into account numerous factors.
##
## Pearson's product-moment correlation
##
## data: tweet_counts$n and tweet_counts$rank
## t = 1.5264, df = 14, p-value = 0.1492
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1451479 0.7356869
## sample estimates:
## cor
## 0.3777357
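As a sanity check on how to read this output, the t statistic and p-value can be recomputed from the reported correlation and degrees of freedom using the standard relationship t = r * sqrt(df) / sqrt(1 - r^2). The sketch below only reuses the two numbers printed above; the variable names are mine.

```r
# Values copied from the cor.test() output above
r  <- 0.3777357
df <- 14

# t statistic for a Pearson correlation with df = n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

# Share of variance in one variable linearly associated with the other
r_squared <- r^2
```

With r squared at roughly 0.14, tweet volume and leaderboard rank share only a small part of their variation, and the confidence interval spanning zero means even the direction of the relationship is uncertain.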
But Yang’s background might shed some light on this situation. Unlike most of his competitors, Andrew Yang is not a politician - he’s a businessman whose claim to fame is Venture for America, an organization that fosters entrepreneurial opportunities in struggling cities. This young (44) candidate’s campaign is grassroots in nature and very modern in its approach to digital interactions and social media. He appeared on comedian Joe Rogan’s YouTube channel in February (the video has since been viewed more than 2 million times), and his supporters identify themselves as the #YangGang.
Yang’s campaign has been successful in accruing more than 65,000 donors, the minimum required to qualify him for the upcoming nationally televised debate. For a candidate who started with almost no name recognition, that is a significant accomplishment. It’s therefore logical that Yang would continue to do what he’s been doing - tweeting, a lot. Perhaps in coming weeks, as campaigns pick up speed, we will see other candidates adopt Yang’s method of reaching potential supporters.
The ten words that appeared most often in the candidates’ tweets were unsurprising given the nature of modern politics and political campaigns, and this analysis did little to clarify the candidates’ priorities.
An analysis of the most common pairs of words (bigrams) that appeared in these tweets (after stop words were filtered out to avoid less-substantial bigrams such as “I am” or “this is”) did highlight some of the topics pertinent to these candidates’ platforms.
## # A tibble: 10 x 3
## word1 word2 n
## <chr> <chr> <int>
## 1 climate change 262
## 2 health care 138
## 3 presidential candidate 106
## 4 american people 92
## 5 basic income 71
## 6 gun violence 70
## 7 town hall 70
## 8 white house 70
## 9 donald trump 64
## 10 universal basic 59
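For readers who want to try this themselves, a bigram count with stop words removed can be sketched with tidytext roughly as follows. This is a common pattern that produces the same `word1`, `word2`, `n` columns as the table above, but I'm assuming the details here; the three example sentences and the `tweets` and `bigram_counts` names are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical example text (not real tweets)
tweets <- tibble(text = c(
  "Climate change is the fight of our time",
  "We must act on climate change now",
  "Health care is a human right"
))

bigram_counts <- tweets %>%
  # Split each tweet into overlapping two-word sequences
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  # Put each word of the pair in its own column
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  # Drop any pair in which either word is a stop word
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  # Count the remaining pairs, most frequent first
  count(word1, word2, sort = TRUE)
```

Filtering after splitting (rather than removing stop words from the text first) keeps only pairs of words that were actually adjacent in the tweet.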
“Climate change” appeared 262 times, almost twice as many times as the next most common bigram, “health care,” which appeared 138 times. Both of these subjects were significant to former President Barack Obama’s administration. One of the staples of Obama’s presidency was Obamacare, a law designed to expand and improve healthcare and lower health insurance costs in the United States. Obama also instituted numerous regulations designed to reduce pollution and slow climate change - policies Trump has since rescinded. Evidently, the Democratic candidates plan to make these issues central to their campaigns as well, likely in part as a response to Trump’s actions since Obama left office.
“Basic income” and “universal basic” most likely make an appearance thanks to Andrew Yang, who is running on a platform of universal basic income.
Surprisingly, “gun violence” did not appear more frequently: just 70 times in more than 8,000 tweets. This could be evidence that these candidates are avoiding particularly divisive subjects early in their campaigns, or that they assume voters will take for granted that, as Democrats, they hold the party line (pro-stricter gun control).
Bernie Sanders blows the other candidates out of the water when it comes to Twitter followings with more than 9.1 million followers, more than twice as many as runner-up Cory Booker, who has 4.2 million followers.
So, that is who has the followers, but the act of following someone on Twitter is very passive. Users become active audience members when they favorite and retweet tweets. By the mean number of favorites per tweet, Beto O’Rourke actually leads the pack with an average of more than 18,000. Joe Biden is the runner-up with an average of 16.8 thousand favorites per tweet, followed by Kamala Harris with 14.1 thousand. O’Rourke also leads in average retweets per tweet; there, however, Harris narrowly surpasses Biden.
## favorites candidate
## 1 5478.70712 Amy Klobuchar
## 2 350.58760 Andrew Yang
## 3 9311.05685 Bernie Sanders
## 4 18213.15686 Beto O'Rourke
## 5 3759.11587 Cory Booker
## 6 4714.80214 Elizabeth Warren
## 7 377.67521 Jack Hickenlooper
## 8 246.79940 Jay Inslee
## 9 16818.32609 Joe Biden
## 10 22.38441 John Delaney
## 11 459.03461 Julian Castro
## 12 14130.61411 Kamala Harris
## 13 145.80528 Marianne Williamson
## 14 1090.03992 Pete Buttigieg
## 15 2361.16027 Tulsi Gubbard
## 16 3061.40564 Kristen Gillibrand
## retweets candidate
## 1 1256.395778 Amy Klobuchar
## 2 96.154528 Andrew Yang
## 3 2325.865633 Bernie Sanders
## 4 4507.686275 Beto O'Rourke
## 5 1115.700252 Cory Booker
## 6 1172.153298 Elizabeth Warren
## 7 55.564103 Jack Hickenlooper
## 8 88.023952 Jay Inslee
## 9 3049.260870 Joe Biden
## 10 9.774108 John Delaney
## 11 184.606557 Julian Castro
## 12 3303.396957 Kamala Harris
## 13 47.649497 Marianne Williamson
## 14 343.051331 Pete Buttigieg
## 15 849.367946 Tulsi Gubbard
## 16 715.203905 Kristen Gillibrand
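Per-candidate averages like the ones above can be computed with a grouped summary. The sketch below uses a small made-up data frame; the `candidate` column and the numbers are hypothetical, while `favorite_count` and `retweet_count` are the rtweet column names listed earlier.

```r
library(dplyr)

# Hypothetical tweets for two made-up candidates
tweets <- tibble(
  candidate      = c("A", "A", "B", "B", "B"),
  favorite_count = c(100, 300, 10, 20, 30),
  retweet_count  = c(20, 40, 1, 2, 3)
)

engagement <- tweets %>%
  group_by(candidate) %>%
  # Mean favorites and retweets per tweet for each candidate
  summarise(
    favorites = mean(favorite_count),
    retweets  = mean(retweet_count)
  ) %>%
  arrange(desc(favorites))
```

Means are sensitive to a handful of viral tweets, so a median per candidate may be worth checking alongside them.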
The strong correlation between the candidates’ average retweets and average favorites per tweet (r ≈ 0.98) suggests that whatever it is about their tweets that prompts followers to favorite them also encourages followers to retweet them (or vice versa).
##
## Pearson's product-moment correlation
##
## data: favs_rts_csv$favorites and favs_rts_csv$retweets
## t = 20.782, df = 14, p-value = 6.391e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9537840 0.9946364
## sample estimates:
## cor
## 0.9841753
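This test can be re-run directly from the two tables of averages printed above. The sketch below types those values into a data frame in table order (the `favs_rts` name is mine) and calls `cor.test()`. If the printed averages match the data the test was originally run on, the estimate should land close to the reported one.

```r
# Mean favorites and retweets per tweet, copied from the tables above
# (same row order as the tables)
favs_rts <- data.frame(
  favorites = c(5478.70712, 350.58760, 9311.05685, 18213.15686,
                3759.11587, 4714.80214, 377.67521, 246.79940,
                16818.32609, 22.38441, 459.03461, 14130.61411,
                145.80528, 1090.03992, 2361.16027, 3061.40564),
  retweets  = c(1256.395778, 96.154528, 2325.865633, 4507.686275,
                1115.700252, 1172.153298, 55.564103, 88.023952,
                3049.260870, 9.774108, 184.606557, 3303.396957,
                47.649497, 343.051331, 849.367946, 715.203905)
)

# Pearson correlation test (the default method)
result <- cor.test(favs_rts$favorites, favs_rts$retweets)
result$estimate
```

With only 16 candidates, the three highest-engagement accounts have a lot of influence on the estimate, so a rank-based check such as `method = "spearman"` is a reasonable follow-up.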
The lack of a correlation between retweets and follower count is notable because it indicates that a large following doesn’t necessarily correspond to a strong, supportive following. This analysis suggests O’Rourke, Biden, and Harris have the most engaged followings; they are doing something right.
It’s not how often they tweet: although Harris was the third most frequent tweeter, Biden and O’Rourke came in the last two places in that regard. That said, it is possible that the rarity of Biden’s and O’Rourke’s tweets makes them that much more significant to their followers.
Could it be what they are tweeting? Their most popular words are not that different from those most popular in all of the candidates’ tweets — stereotypical campaign language. The most significant difference is the inclusion of “women.”
The most popular bigrams in these three candidates’ tweets, however, are very different. In this analysis, “gun violence” and “health care” tied as the most frequent bigrams (13 appearances each), followed by “climate change.” “Background checks” is also related to the gun control debate.
## # A tibble: 10 x 3
## word1 word2 n
## <chr> <chr> <int>
## 1 gun violence 13
## 2 health care 13
## 3 climate change 9
## 4 background checks 8
## 5 federal workers 8
## 6 el paso 7
## 7 national emergency 7
## 8 corporate pacs 6
## 9 american people 5
## 10 civil rights 5
It is important to note that these “popular” bigrams did not appear hundreds of times like the most popular overall bigrams, but again, that is because O’Rourke and Biden have not tweeted frequently this year.
It took me a few minutes to figure out why using ‘anti_join(stop_words)’ wasn’t removing some words from my data frame of unique words. Turns out, it was because stop_words includes contractions with regular apostrophes, but not contractions with curly apostrophes, which is how contractions are written on Twitter apparently. Once I realized this was the issue, I could filter those contractions separately.
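One alternative to filtering those contractions separately is to convert curly apostrophes to straight ones before tokenizing, so that `anti_join(stop_words)` can match them. This is a sketch of that swapped-in approach, not the fix I used above; the example sentences and the `normalize_apostrophes()` helper are made up for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical tweets containing curly apostrophes (U+2019)
tweets <- tibble(text = c(
  "We don\u2019t have time to wait on climate change",
  "I\u2019m ready to fight for health care"
))

# Hypothetical helper: replace the right single quotation mark with a
# plain ASCII apostrophe
normalize_apostrophes <- function(x) {
  gsub("\u2019", "'", x, fixed = TRUE)
}

words <- tweets %>%
  mutate(text = normalize_apostrophes(text)) %>%
  # One lowercase word per row
  unnest_tokens(word, text) %>%
  # Contractions such as "don't" and "i'm" now match stop_words
  anti_join(stop_words, by = "word")
```

Passing `by = "word"` explicitly also silences the "Joining, by" message that `anti_join()` otherwise prints.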