The advent of social media has fundamentally changed the way that we interact with politics. Modern politicians are placing more and more emphasis on their social media presence, especially on Twitter. While a candidate’s number of Twitter followers is an imperfect metric in many respects, it does give us functional insight into how a candidate’s messaging is resonating with voters in real time. This report will explore how social media trends have played out for each of the remaining Democratic primary candidates. Data was collected from February 8th to February 25th, which coincided with several major dates on the primary calendar (see below for details).
The data in this report was collected via a custom Python HTML scraper, which can be found on my GitHub account: Link to Code
library(tidyverse)
library(reshape2)
library(knitr)
library(kableExtra)
To assess the data scraped with the GetFollowers.py script, we’ll read in each CSV file and append its rows to a composite data frame.
setwd(data.path)
# List CSVs in local directory (pattern is a regular expression, not a glob)
files <- list.files(data.path, pattern = "\\.csv$")
# Set first CSV as a template
data <- read_csv(files[1])
# Loop through all other CSVs and attach them to template
for (f in files[-1]) {
  temp <- read_csv(f)
  data <- dplyr::bind_rows(data, temp)
}
# Drop first column (no data included)
data <- data[, -1]
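For readers who prefer to avoid the template-plus-loop pattern, here is a minimal sketch of the same read-and-stack step. It is not the code used for this report: the folder `demo.path`, the file names, and the toy follower counts are all made up for illustration.

```r
library(readr)
library(dplyr)

# Hypothetical stand-in for data.path: a fresh temp folder with two toy CSVs
demo.path <- file.path(tempdir(), "followers_demo")
unlink(demo.path, recursive = TRUE)
dir.create(demo.path)
write_csv(data.frame(X = 0, Date = "day 1", SANDERS = 100),
          file.path(demo.path, "day1.csv"))
write_csv(data.frame(X = 0, Date = "day 2", SANDERS = 110, BIDEN = 40),
          file.path(demo.path, "day2.csv"))

# full.names = TRUE returns complete paths, so no setwd() is needed
files <- list.files(demo.path, pattern = "\\.csv$", full.names = TRUE)

# Read every file, then stack them; columns missing from a file become NA
demo <- bind_rows(lapply(files, read_csv, col_types = cols()))

# Drop the empty first column
demo <- demo[, -1]
```

In this toy example the BIDEN column is absent from the first file, so it comes back as NA for that row after stacking. That is presumably how gaps arise when a candidate is added to the scraper partway through.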
Great! We’ve joined all 9 files in the directory. Let’s peek at the resulting data frame to see what we’re working with:
| Date | SANDERS | WARREN | STEYER | KLOBUCHAR | YANG | BUTTIGIEG | BIDEN | BLOOMBERG |
|---|---|---|---|---|---|---|---|---|
| 02/08/20 | 10498559 | 3691755 | 287987 | 900882 | 1248266 | 1656684 | NA | NA |
| 02/11/20 | 10522309 | 3699414 | 289148 | 910268 | 1252604 | 1667339 | NA | NA |
| 02/12/20 | 10557968 | 3704317 | 290542 | 929321 | 1270705 | 1680931 | NA | NA |
| 02/13/20 | 10566261 | 3706246 | 290874 | 934209 | 1272570 | 1685605 | NA | NA |
| 02/14/20 | 10578410 | 3708710 | 291095 | 937777 | 1273095 | 1691560 | NA | NA |
| 02/15/20 | 10588850 | 3710642 | 291404 | 940620 | 1274283 | 1695486 | NA | NA |
| 02/16/20 | 10599460 | 3712591 | 291538 | 943690 | 1275530 | 1699151 | 4172379 | 2622003 |
| 02/17/20 | 10608699 | 3714030 | 291717 | 945883 | 1276076 | 1702306 | 4173456 | 2630092 |
| 02/18/20 | 10623229 | 3718284 | 292181 | 950865 | 1277067 | 1706293 | 4175734 | 2639620 |
| 02/19/20 | 10635184 | 3722861 | 292603 | 954417 | 1278479 | 1710725 | 4177707 | 2654326 |
| 02/20/20 | 10654258 | 3746212 | 293678 | 961558 | 1286132 | 1721218 | 4181367 | 2674008 |
| 02/21/20 | 10670364 | 3760735 | 294457 | 965423 | 1289119 | 1726250 | 4183692 | 2685369 |
| 02/22/20 | 10687655 | 3768585 | 294942 | 968037 | 1291023 | 1730566 | 4185596 | 2690994 |
| 02/23/20 | 10721130 | 3774394 | 295514 | 970091 | 1292820 | 1733745 | 4186773 | 2694724 |
| 02/24/20 | 10743206 | 3778664 | 295957 | 971341 | 1294225 | 1736083 | 4188131 | 2697162 |
| 02/25/20 | 10764025 | 3783482 | 296358 | 972906 | 1295307 | 1737569 | 4190061 | 2700231 |
We see that Biden and Bloomberg were added into the script later than the others. This was unintentional, but it is something we can try to address. First, let’s visualize the differences in Twitter following per candidate. We’ll use the latest possible data to give us an up-to-date look:
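The chart itself is not reproduced here. Below is a minimal sketch of how such a comparison could be drawn with ggplot2, using the 02/25/20 row of the table above. The object names (`latest`, `p`) and the styling are assumptions, not the original plotting code.

```r
library(ggplot2)

# Follower counts copied from the 02/25/20 row of the table
latest <- data.frame(
  candidate = c("SANDERS", "WARREN", "STEYER", "KLOBUCHAR",
                "YANG", "BUTTIGIEG", "BIDEN", "BLOOMBERG"),
  followers = c(10764025, 3783482, 296358, 972906,
                1295307, 1737569, 4190061, 2700231)
)

# Bars ordered from largest to smallest following, in millions
p <- ggplot(latest, aes(x = reorder(candidate, -followers),
                        y = followers / 1e6)) +
  geom_col() +
  labs(x = NULL, y = "Twitter followers (millions)",
       title = "Twitter followers per candidate, 02/25/20")

# Ratio of the largest following to the second largest
sanders.vs.biden <- latest$followers[1] / latest$followers[7]
```

Printing `p` renders the chart, and `sanders.vs.biden` gives the size of the gap between the top two accounts as a ratio.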
Clearly, Bernie Sanders has a huge advantage in Twitter presence: his following is more than double that of the next closest candidate (Joe Biden). Some of this may be driven by his popularity with younger voters, who may be more likely to be active on social media. His platform is also polarizing, so it’s possible that many people follow him who do not agree with his agenda, similar to Donald Trump’s Twitter following. That said, assessing why people follow Sanders is beyond the scope of this report.
Before moving on to any analysis, we should clean up some of these NA entries in the dataset. We want to keep every other candidate’s entry across this time frame, so we’ll simply impute the minimum value for Biden and Bloomberg over the date range in question. This is an imperfect approach in some respects, but given the lack of insight into the latter candidates’ trajectory it’s the best we have to work with.
# Replace NA values with min values per column
for (k in 2:ncol(data)) {
  col.min <- min(data[[k]], na.rm = TRUE)
  data[[k]][is.na(data[[k]])] <- col.min
}
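The same minimum-value imputation can also be written without an explicit loop. The sketch below uses a made-up three-row tibble (`toy`, with placeholder columns A and B) and assumes dplyr and tidyr are available. It illustrates the technique and is not the code behind this report.

```r
library(dplyr)
library(tidyr)

# Toy data: column B is missing its first observation
toy <- tibble(
  Date = c("day 1", "day 2", "day 3"),
  A    = c(10, 12, 15),
  B    = c(NA, 7, 9)
)

# For every column except Date, swap each NA for that column's minimum
filled <- toy %>%
  mutate(across(-Date, ~ replace_na(.x, min(.x, na.rm = TRUE))))
```

One consequence of this approach is that consecutive imputed values are identical, so any day-over-day difference computed across the imputed stretch comes out as 0.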
Now that we have a clean set of data, let’s dig into who has picked up the most steam on Twitter.
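Important dates to note during this time period:
- February 11th: New Hampshire primary
- February 19th: 9th Democratic Debate
- February 22nd: Nevada caucuses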
We’ll create two new summary statistics:
- follower.increase will indicate the raw number of new followers that each candidate has picked up in this time span
- percentage.increase will provide context for the raw value, as a percentage increase over the minimum value
# Melt data to make Date in range the grouping variable
tall.data <- melt(data, id = "Date")
# Group by candidate + calculate summary statistics
tracking.data <- tall.data %>%
  group_by(variable) %>%
  summarise(follower.increase = max(value) - min(value),
            percentage.increase = (round(max(value) / min(value), 3) - 1)) %>%
  arrange(desc(percentage.increase))
# Display results
tracking.data %>%
  kable() %>% kable_styling(kable.format)
| variable | follower.increase | percentage.increase |
|---|---|---|
| KLOBUCHAR | 72024 | 0.080 |
| BUTTIGIEG | 80885 | 0.049 |
| YANG | 47041 | 0.038 |
| BLOOMBERG | 78228 | 0.030 |
| STEYER | 8371 | 0.029 |
| SANDERS | 265466 | 0.025 |
| WARREN | 91727 | 0.025 |
| BIDEN | 17682 | 0.004 |
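To make the two statistics concrete, here is the arithmetic for a single candidate, using Klobuchar’s first (02/08/20) and last (02/25/20) follower counts from the table further up. The variable names are illustrative only.

```r
# Klobuchar's follower counts on 02/08/20 and 02/25/20
first.count <- 900882
last.count  <- 972906

# Raw number of new followers over the period
follower.increase <- last.count - first.count

# Same formula as the summary code: a proportion, not a percent
percentage.increase <- round(last.count / first.count, 3) - 1

# The unrounded figure expressed as a percent, for comparison
percent.gain <- 100 * (last.count / first.count - 1)
```

Note that the percentage.increase column is stored as a proportion, so Klobuchar’s 0.080 corresponds to a gain of roughly 8 percent.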
There are several significant takeaways from these data:
Judging the percentage increase in each candidate’s Twitter following alone is a fundamentally flawed approach - it fails to account for candidates with large Twitter followings at the start of the analysis, so it’s essentially like comparing apples to oranges. A more meaningful way to gauge candidates’ relative success on Twitter is by tracking the day-over-day increases in Twitter followers.
Before we dive in, let’s narrow the field slightly. Andrew Yang dropped out of the race on February 11th, and Tom Steyer is trending far below the other candidates leading up to South Carolina, per Nate Silver / FiveThirtyEight. For the sake of parsimony, we’ll exclude their data from here on out.
# Remove Yang and Steyer ... sorry lads
data <- data[, !(colnames(data) %in% c("YANG", "STEYER"))]
# Find row-by-row differences in Follower count
daily.increases <- data.frame(diff(as.matrix(data[, 2:ncol(data)])))
# Attach Date column from base data frame
daily.increases <- cbind(data$Date[2:length(data$Date)], daily.increases)
# Rename Date column
colnames(daily.increases)[1] <- "Date"
# Display results
daily.increases %>%
  kable() %>% kable_styling(kable.format)
| Date | SANDERS | WARREN | KLOBUCHAR | BUTTIGIEG | BIDEN | BLOOMBERG |
|---|---|---|---|---|---|---|
| 02/11/20 | 23750 | 7659 | 9386 | 10655 | 0 | 0 |
| 02/12/20 | 35659 | 4903 | 19053 | 13592 | 0 | 0 |
| 02/13/20 | 8293 | 1929 | 4888 | 4674 | 0 | 0 |
| 02/14/20 | 12149 | 2464 | 3568 | 5955 | 0 | 0 |
| 02/15/20 | 10440 | 1932 | 2843 | 3926 | 0 | 0 |
| 02/16/20 | 10610 | 1949 | 3070 | 3665 | 0 | 0 |
| 02/17/20 | 9239 | 1439 | 2193 | 3155 | 1077 | 8089 |
| 02/18/20 | 14530 | 4254 | 4982 | 3987 | 2278 | 9528 |
| 02/19/20 | 11955 | 4577 | 3552 | 4432 | 1973 | 14706 |
| 02/20/20 | 19074 | 23351 | 7141 | 10493 | 3660 | 19682 |
| 02/21/20 | 16106 | 14523 | 3865 | 5032 | 2325 | 11361 |
| 02/22/20 | 17291 | 7850 | 2614 | 4316 | 1904 | 5625 |
| 02/23/20 | 33475 | 5809 | 2054 | 3179 | 1177 | 3730 |
| 02/24/20 | 22076 | 4270 | 1250 | 2338 | 1358 | 2438 |
| 02/25/20 | 20819 | 4818 | 1565 | 1486 | 1930 | 3069 |
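The differencing step can be checked by hand on a small excerpt. The sketch below takes the first three Sanders and Warren rows from the raw table and also computes the number of days between scrapes. That second step is an addition of mine, not part of the original analysis: the table has no rows for 02/09 or 02/10, so the 02/11 entry covers a three-day gap.

```r
# First three scrape dates and follower counts from the raw table
dates  <- as.Date(c("02/08/20", "02/11/20", "02/12/20"), format = "%m/%d/%y")
counts <- cbind(SANDERS = c(10498559, 10522309, 10557968),
                WARREN  = c(3691755, 3699414, 3704317))

# diff() on a matrix subtracts each row from the one below it
gains <- diff(counts)

# Days elapsed between consecutive scrapes
days <- as.numeric(diff(dates), units = "days")

# Optional normalization: gain per calendar day (vector recycles down rows)
per.day <- gains / days
```

Under this normalization, Sanders’ 23,750 new followers recorded on 02/11 works out to roughly 7,917 per day. Whether to normalize is a judgment call. The tables in this report use the raw differences.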
Again, we have to deal with some missing data for Biden and Bloomberg, but this isn’t the end of the world. The 0 entries are artifacts of the minimum-value imputation rather than real observations, so we’ll calculate each candidate’s average daily increase in Twitter followers with those entries removed.
# Melt to long format, keeping Date as the id column
tall.daily <- melt(daily.increases, id = "Date")
# Calculate average increases (excluding 0s)
tall.daily %>%
  group_by(variable) %>%
  summarise(daily.av = round(mean(value[value != 0]), 2)) %>%
  arrange(desc(daily.av)) %>%
  kable() %>% kable_styling(kable.format)
| variable | daily.av |
|---|---|
| SANDERS | 17697.73 |
| BLOOMBERG | 8692.00 |
| WARREN | 6115.13 |
| BUTTIGIEG | 5392.33 |
| KLOBUCHAR | 4801.60 |
| BIDEN | 1964.67 |
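As a small worked example of the zero-exclusion step, take Biden’s entries for 02/15 through 02/19 from the daily-increase table. The sketch also shows an alternative that marks unobserved days as NA instead of filtering on the value 0. The `observed` flag is a hypothetical helper, not something in the original data.

```r
# Biden's daily increases for 02/15 through 02/19 (from the table)
biden <- c(0, 0, 1077, 2278, 1973)

# Approach used in this report: drop the zeros, then average
mean.nonzero <- mean(biden[biden != 0])

# Alternative: flag which days were actually scraped (hypothetical helper)
observed <- c(FALSE, FALSE, TRUE, TRUE, TRUE)
biden.na <- ifelse(observed, biden, NA)
mean.observed <- mean(biden.na, na.rm = TRUE)
```

Both approaches give the same answer here. They would only differ if a candidate genuinely gained zero followers on an observed day, since the value-based filter would drop that day as well.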
Notably, Bernie continues to pick up the most new Twitter followers despite his already-robust digital coalition. Bloomberg is a distant second in this metric, likely fueled by the (not-so-small) fortune he’s poured into his ad campaign throughout the early primaries. Warren, Buttigieg, and Klobuchar are close behind, with fewer than 2,000 daily new followers separating them. Biden is averaging fewer than 2,000 new followers a day, perhaps a byproduct of his hot-and-cold performance over the month of February.
Observing this trend at a more granular level gives us some added insight into how each candidate is tracking on Twitter. Bernie’s median pickup in new followers is higher than the upper end of any other candidate’s interquartile range (IQR). Warren, Buttigieg, and Klobuchar each enjoyed at least two days of high Twitter traffic (likely after a debate, but we’ll dig into that more). Biden is a model of consistency, with little deviation in the number of new followers gained daily (alas, that figure also happens to be the lowest of the bunch).
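The median-versus-IQR comparison can be checked numerically against the daily-increase table. The sketch below compares Sanders with Bloomberg, the runner-up by daily average, using Bloomberg’s non-zero days only. It relies on base R’s default quantile method, which may differ slightly from the hinges a boxplot draws.

```r
# Daily increases copied from the table (02/11 through 02/25)
sanders <- c(23750, 35659, 8293, 12149, 10440, 10610, 9239, 14530,
             11955, 19074, 16106, 17291, 33475, 22076, 20819)

# Bloomberg's observed (non-zero) days only: 02/17 through 02/25
bloomberg <- c(8089, 9528, 14706, 19682, 11361, 5625, 3730, 2438, 3069)

sanders.median <- median(sanders)

# Upper quartile, i.e. the top edge of Bloomberg's IQR
bloomberg.q3 <- unname(quantile(bloomberg, 0.75))

# TRUE if Sanders' typical day beats Bloomberg's better days
median.above.q3 <- sanders.median > bloomberg.q3
```

The same comparison can be repeated for the other candidates by swapping in their columns.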
Here we see an obvious trend - following the New Hampshire Primary (2/11) and 9th Democratic Debate (2/19), each candidate saw a major pickup in Twitter followers. This trend carries across all six candidates.
# Get largest follower increase + its respective date per candidate
tall.daily %>%
  group_by(variable) %>%
  summarise(biggest.pickup.val = max(value),
            # Index the group's own Date column, not the full data frame
            biggest.pickup.date = Date[which.max(value)]) %>%
  arrange(desc(biggest.pickup.val)) %>%
  kable() %>% kable_styling(kable.format)
| variable | biggest.pickup.val | biggest.pickup.date |
|---|---|---|
| SANDERS | 35659 | 02/12/20 |
| WARREN | 23351 | 02/20/20 |
| BLOOMBERG | 19682 | 02/20/20 |
| KLOBUCHAR | 19053 | 02/12/20 |
| BUTTIGIEG | 13592 | 02/12/20 |
| BIDEN | 3660 | 02/20/20 |
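A related way to pull each candidate’s best day is dplyr’s `slice_max()`, which keeps the whole row, so the date travels with the value automatically. The sketch below uses a four-row excerpt of the long-format data (two dates each for Sanders and Warren, values from the daily-increase table). The object names `toy` and `best` are illustrative.

```r
library(dplyr)

# Four rows in the same long layout as the melted daily data
toy <- tibble(
  Date     = rep(c("02/12/20", "02/20/20"), times = 2),
  variable = rep(c("SANDERS", "WARREN"), each = 2),
  value    = c(35659, 19074, 4903, 23351)
)

# Keep the single largest row per candidate
best <- toy %>%
  group_by(variable) %>%
  slice_max(value, n = 1) %>%
  ungroup()
```

Because `which.max()` inside a grouped summary returns a position within the group, any date lookup has to index that group’s own Date values. Keeping the full row, as here, sidesteps the issue.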
While Bloomberg has the second-highest average of new Twitter followers, Sanders and Warren posted the two largest single-day increases of any candidate (Sanders after the NH primary, and Warren after the 9th Democratic Debate, held just before the Nevada caucuses).
Factoring out Warren and Sanders, let’s observe how this distribution looks for the remaining moderate candidates: Klobuchar, Buttigieg, Biden and Bloomberg. Since Bloomberg and Biden only entered the dataset on February 16th, their first day-over-day increase falls on February 17th, so we’ll only assess data from that date on.
Of these four candidates, Bloomberg appeared to be trending in the right direction until February 21st. At this point he began to wane slightly in new Twitter followers. Pete Buttigieg has remained very consistent over this time period. Amy Klobuchar saw nice returns after the 9th Democratic Debate, and has averaged slightly less in the days since.
Speaking of Amy Klobuchar…
Senator Amy Klobuchar has gained significant momentum over the last several weeks after a strong performance in the New Hampshire primary, finishing third behind Sanders and Buttigieg. She occupies the crowded moderate territory with Biden, Buttigieg, and Bloomberg, which underscores the importance of her online presence. Let’s unpack the trajectory of her Twitter presence a little further.
We observe a massive increase in her Twitter following between February 11th and February 12th, clearly an effect of her finish in the New Hampshire primary.
# Evaluate difference in Twitter followers before / after NH primary
klobuchar %>%
  filter(Date %in% c("02/11/20", "02/12/20")) %>%
  summarise(increase.after.NH = max(value) - min(value)) %>%
  kable() %>% kable_styling(kable.format)
| increase.after.NH |
|---|
| 19053 |
Case in point! 19,053 new Twitter followers after the New Hampshire primary is certainly nothing to scoff at. Amy has continued a linear trajectory with new Twitter followers since. It’ll be interesting to see if and how she’s able to translate these new followers into votes come Super Tuesday.