The advent of social media has fundamentally changed the way that we interact with politics. Modern politicians are placing more and more emphasis on their social media presence, especially on Twitter. While a candidate’s number of Twitter followers is an imperfect metric in many respects, it does give us functional insight into how a candidate’s messaging is resonating with voters in real time. This report explores how social media trends have played out for each of the remaining Democratic primary candidates. Data was collected from February 8th to February 25th, 2020, a window that coincided with several major dates on the primary calendar (see below for details).


The data in this report was collected via a custom Python HTML scraper, which can be found on my GitHub account: Link to Code


Imports

library(tidyverse)
library(reshape2)
library(knitr)
library(kableExtra)

Loading and Cleaning Data


To assess the data scraped with the GetFollowers.py script, we’ll read in each CSV file and append its rows to a composite data frame.

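# data.path: directory holding the scraped CSVs
# (assumed to be defined in a setup chunk that is not shown here)
setwd(data.path)

# List CSVs in local directory (pattern is a regular expression, not a glob)
files <- list.files(data.path, pattern = "\\.csv$")

# Set first CSV as a template
data <- read_csv(files[1])

# Loop through all other CSVs and attach them to template
# (files[-1] is empty, and the loop is skipped, if there is only one file)
for (f in files[-1]) {
        
        temp <- read_csv(f)
        data <- dplyr::bind_rows(data, temp)
}

# Drop first column (no data included)
data <- data[, -1]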


Great! We’ve joined all 9 files in the directory. Let’s peek at the resulting data frame to see what we’re working with:

Date SANDERS WARREN STEYER KLOBUCHAR YANG BUTTIGIEG BIDEN BLOOMBERG
02/08/20 10498559 3691755 287987 900882 1248266 1656684 NA NA
02/11/20 10522309 3699414 289148 910268 1252604 1667339 NA NA
02/12/20 10557968 3704317 290542 929321 1270705 1680931 NA NA
02/13/20 10566261 3706246 290874 934209 1272570 1685605 NA NA
02/14/20 10578410 3708710 291095 937777 1273095 1691560 NA NA
02/15/20 10588850 3710642 291404 940620 1274283 1695486 NA NA
02/16/20 10599460 3712591 291538 943690 1275530 1699151 4172379 2622003
02/17/20 10608699 3714030 291717 945883 1276076 1702306 4173456 2630092
02/18/20 10623229 3718284 292181 950865 1277067 1706293 4175734 2639620
02/19/20 10635184 3722861 292603 954417 1278479 1710725 4177707 2654326
02/20/20 10654258 3746212 293678 961558 1286132 1721218 4181367 2674008
02/21/20 10670364 3760735 294457 965423 1289119 1726250 4183692 2685369
02/22/20 10687655 3768585 294942 968037 1291023 1730566 4185596 2690994
02/23/20 10721130 3774394 295514 970091 1292820 1733745 4186773 2694724
02/24/20 10743206 3778664 295957 971341 1294225 1736083 4188131 2697162
02/25/20 10764025 3783482 296358 972906 1295307 1737569 4190061 2700231


We see that Biden and Bloomberg were added into the script later than the others. This was unintentional, but it is something we can try to address. First, let’s visualize the differences in Twitter following per candidate. We’ll use the latest possible data to give us an up-to-date look:
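The chart itself is not reproduced in this copy of the report. A minimal sketch of how a comparable bar chart could be drawn with ggplot2 follows. The `latest` data frame hard-codes the 02/25/20 row of the table above so the snippet runs on its own, and the names `latest` and `p.followers` are illustrative assumptions, not taken from the original script.

```r
library(ggplot2)

# Latest follower counts (02/25/20 row of the table above), hard-coded
# here so the sketch is self-contained
latest <- data.frame(
        candidate = c("SANDERS", "WARREN", "STEYER", "KLOBUCHAR",
                      "YANG", "BUTTIGIEG", "BIDEN", "BLOOMBERG"),
        followers = c(10764025, 3783482, 296358, 972906,
                      1295307, 1737569, 4190061, 2700231)
)

# Horizontal bar chart, candidates ordered by follower count
p.followers <- ggplot(latest, aes(x = reorder(candidate, followers),
                                  y = followers / 1e6)) +
        geom_col() +
        coord_flip() +
        labs(x = NULL,
             y = "Twitter followers (millions)",
             title = "Twitter followers per candidate, 02/25/20")

print(p.followers)
```

With the composite data frame in hand, the same `latest` frame could presumably be built from the last row of `data` instead of being typed in by hand.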


Clearly, Bernie Sanders has a huge advantage in terms of Twitter presence: his following is more than double that of the next closest candidate (Joe Biden). Some of this may be driven by his popularity with younger voters, who may be more likely to be active on social media. His platform is also polarizing - it’s possible that many people follow him who do not agree with his agenda, similar to Donald Trump’s Twitter following. That said, assessing why people follow Sanders is beyond the scope of this report.


Before moving on to any analysis, we should clean up some of these NA entries in the dataset. We want to keep every other candidate’s entry across this time frame, so we’ll simply impute the minimum value for Biden and Bloomberg over the date range in question. This is an imperfect approach in some respects, but given the lack of insight into the latter candidates’ trajectory it’s the best we have to work with.


## Replace NA values with min values per column
for (k in 2:ncol(data)) {
        
        # Use [[k]] to work on the column vector itself rather than a one-column data frame
        data[[k]][is.na(data[[k]])] <- min(data[[k]], na.rm = TRUE)
}

Increase in Twitter Following Over Time


Now that we have a clean set of data, let’s dig into who has picked up the most steam on Twitter.

Important dates to note during this time period:


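We’ll create two new summary statistics: follower.increase (each candidate’s maximum follower count minus their minimum) and percentage.increase (the ratio of maximum to minimum, minus one, reported as a proportion so that 0.080 means 8%).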


# Melt data to make Date in range the grouping variable
tall.data <- melt(data, id = "Date")

# Group by candidate + calculate summary statistics
tracking.data <- tall.data %>% 
                        group_by(variable) %>% 
                        summarise(follower.increase = max(value) - min(value),
                                  percentage.increase = (round(max(value) / min(value), 3) - 1)) %>% 
                        arrange(desc(percentage.increase))

# Display results
tracking.data %>% 
        kable() %>% kable_styling(kable.format)
variable follower.increase percentage.increase
KLOBUCHAR 72024 0.080
BUTTIGIEG 80885 0.049
YANG 47041 0.038
BLOOMBERG 78228 0.030
STEYER 8371 0.029
SANDERS 265466 0.025
WARREN 91727 0.025
BIDEN 17682 0.004



There are several significant takeaways from these data:


Daily Increases in Followers


Judging the percentage increase in each candidate’s Twitter following alone is a fundamentally flawed approach - it fails to account for candidates with large Twitter followings at the start of the analysis, so it’s essentially like comparing apples to oranges. A more meaningful way to gauge candidates’ relative success on Twitter is by tracking the day-over-day increases in Twitter followers.

Before we dive in, let’s narrow the field slightly. Andrew Yang dropped out of the race on February 11th, and Tom Steyer is trending far below the other candidates leading up to South Carolina, per Nate Silver / FiveThirtyEight. For the sake of parsimony, we’ll exclude their data from here on out.

# Remove Yang and Steyer ... sorry lads
data <- data[, !(colnames(data) %in% c("YANG", "STEYER"))]

# Find row-by-row differences in Follower count
daily.increases <- data.frame(diff(as.matrix(data[, 2:ncol(data)])))

# Attach Date column from base data frame
daily.increases <- cbind(data$Date[2:length(data$Date)], daily.increases)

# Rename Date column
colnames(daily.increases)[1] <- "Date"

# Display results
daily.increases %>% 
        kable() %>% kable_styling(kable.format)
Date SANDERS WARREN KLOBUCHAR BUTTIGIEG BIDEN BLOOMBERG
02/11/20 23750 7659 9386 10655 0 0
02/12/20 35659 4903 19053 13592 0 0
02/13/20 8293 1929 4888 4674 0 0
02/14/20 12149 2464 3568 5955 0 0
02/15/20 10440 1932 2843 3926 0 0
02/16/20 10610 1949 3070 3665 0 0
02/17/20 9239 1439 2193 3155 1077 8089
02/18/20 14530 4254 4982 3987 2278 9528
02/19/20 11955 4577 3552 4432 1973 14706
02/20/20 19074 23351 7141 10493 3660 19682
02/21/20 16106 14523 3865 5032 2325 11361
02/22/20 17291 7850 2614 4316 1904 5625
02/23/20 33475 5809 2054 3179 1177 3730
02/24/20 22076 4270 1250 2338 1358 2438
02/25/20 20819 4818 1565 1486 1930 3069


Again, we have to deal with some missing data for Biden and Bloomberg, but this isn’t the end of the world. The 0 entries are a side effect of the minimum-value imputation above rather than real observations, so we’ll calculate each candidate’s average daily increase in Twitter followers with entries of 0 removed.

# Group by Date in range
tall.daily <- melt(daily.increases, id = "Date")

# Calculate average increases (excluding 0s)
tall.daily %>% 
        group_by(variable) %>% 
        summarise(daily.av = round(mean(value[value != 0]), 2)) %>% 
        arrange(desc(daily.av)) %>% 
        kable() %>% kable_styling(kable.format)
variable daily.av
SANDERS 17697.73
BLOOMBERG 8692.00
WARREN 6115.13
BUTTIGIEG 5392.33
KLOBUCHAR 4801.60
BIDEN 1964.67
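To see how much the placeholder zeros matter, the average can be checked by hand for a single candidate. The sketch below uses Biden’s column from the daily-increase table above; the names `biden`, `mean.with.zeros` and `mean.without.zeros` are illustrative only.

```r
# Biden's daily increases from the table above; the first six entries are
# placeholder zeros produced by the minimum-value imputation
biden <- c(0, 0, 0, 0, 0, 0,
           1077, 2278, 1973, 3660, 2325, 1904, 1177, 1358, 1930)

# Mean over all 15 rows: the zeros drag the average down
mean.with.zeros <- round(mean(biden), 2)

# Mean over the 9 rows with real observations (the approach used above)
mean.without.zeros <- round(mean(biden[biden != 0]), 2)

print(c(with.zeros = mean.with.zeros, without.zeros = mean.without.zeros))
```

Including the zeros would put Biden at 1178.8 new followers a day, versus the 1964.67 reported in the table, so filtering them out changes the figure considerably even though it does not change his rank.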


Notably, Bernie continues to pick up the most new Twitter followers despite his already-robust digital coalition. Bloomberg is a distant second in this metric, likely fueled by the (not-so-small) fortune he’s poured into his ad campaign throughout the early primaries. Warren, Buttigieg, and Klobuchar are close behind, with fewer than 2,000 daily new followers separating them. Biden is averaging fewer than 2,000 new followers a day - perhaps a byproduct of his hot-and-cold performance over the month of February.


Observing this trend at a more granular level gives us some added insight into how each candidate is tracking on Twitter. Bernie’s median pickup in new followers is higher than any other candidate’s IQR. Warren, Buttigieg, and Klobuchar each enjoyed at least two days of high Twitter traffic (likely after a debate, but we’ll dig into that more). Biden is a model of consistency, with little deviation in the number of new followers gained daily (alas, that figure also happens to be the lowest of the bunch).
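The box plot this paragraph describes is not reproduced here. The following is a rough sketch of how one could be rebuilt and the median claim checked numerically, assuming the placeholder zeros for Biden and Bloomberg are left out (whether the original figure excluded them is not known). The values are copied from the daily-increase table above, and the names `increases`, `box.data`, `p.box`, `sanders.median` and `upper.quartiles` are illustrative.

```r
library(ggplot2)

# Daily follower increases copied from the table above; the placeholder
# zeros for Biden and Bloomberg (02/11 - 02/16) are left out
increases <- list(
        SANDERS   = c(23750, 35659, 8293, 12149, 10440, 10610, 9239, 14530,
                      11955, 19074, 16106, 17291, 33475, 22076, 20819),
        WARREN    = c(7659, 4903, 1929, 2464, 1932, 1949, 1439, 4254,
                      4577, 23351, 14523, 7850, 5809, 4270, 4818),
        KLOBUCHAR = c(9386, 19053, 4888, 3568, 2843, 3070, 2193, 4982,
                      3552, 7141, 3865, 2614, 2054, 1250, 1565),
        BUTTIGIEG = c(10655, 13592, 4674, 5955, 3926, 3665, 3155, 3987,
                      4432, 10493, 5032, 4316, 3179, 2338, 1486),
        BIDEN     = c(1077, 2278, 1973, 3660, 2325, 1904, 1177, 1358, 1930),
        BLOOMBERG = c(8089, 9528, 14706, 19682, 11361, 5625, 3730, 2438, 3069)
)

# Stack into a long data frame: one row per candidate-day
box.data <- stack(increases)
colnames(box.data) <- c("increase", "candidate")

# One box per candidate
p.box <- ggplot(box.data, aes(x = candidate, y = increase)) +
        geom_boxplot() +
        labs(x = NULL, y = "New followers per day")
print(p.box)

# Numeric check: Sanders' median against every other candidate's
# upper quartile (the top edge of their box)
sanders.median <- median(increases$SANDERS)
upper.quartiles <- sapply(increases[names(increases) != "SANDERS"],
                          quantile, probs = 0.75)
print(sanders.median)
print(upper.quartiles)
```

Under these assumptions Sanders’ median sits above the upper quartile of every other candidate, which is the comparison the paragraph draws.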


Here we see an obvious trend - following the New Hampshire Primary (2/11) and the 9th Democratic Debate (2/19), each candidate saw a major pickup in Twitter followers. This trend carries across all six candidates, although Biden and Bloomberg have no data for the days around New Hampshire.

# Get largest follower increase + its respective date per candidate
# (Date[which.max(value)] indexes within each group; indexing the full
# tall.daily frame only works while every candidate shares the same row order)
tall.daily %>% 
        group_by(variable) %>% 
        summarise(biggest.pickup.val = max(value),
                  biggest.pickup.date = Date[which.max(value)]) %>% 
        arrange(desc(biggest.pickup.val)) %>% 
        kable() %>% kable_styling(kable.format)
variable biggest.pickup.val biggest.pickup.date
SANDERS 35659 02/12/20
WARREN 23351 02/20/20
BLOOMBERG 19682 02/20/20
KLOBUCHAR 19053 02/12/20
BUTTIGIEG 13592 02/12/20
BIDEN 3660 02/20/20


While Bloomberg has the second-highest average of new Twitter followers, Sanders and Warren have had the most dynamic increases of any two candidates (Sanders after the NH primary, and Warren after the 9th Democratic Debate prior to the Nevada caucuses).

Factoring out Warren and Sanders, let’s observe how this distribution looks for the moderate candidates remaining - Klobuchar, Buttigieg, Biden and Bloomberg. Since we only started including data from Bloomberg and Biden on February 17th, we’ll only assess data from that date on.
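The chart for this comparison is not reproduced here. A sketch of how the moderates’ daily increases from 02/17/20 onward could be plotted is given below. The values are copied from the daily-increase table above, and the names `dates`, `moderates` and `p.moderates` are illustrative rather than taken from the original script.

```r
library(ggplot2)

# Dates from 02/17/20 onward, parsed from the table's mm/dd/yy format
dates <- as.Date(c("02/17/20", "02/18/20", "02/19/20", "02/20/20", "02/21/20",
                   "02/22/20", "02/23/20", "02/24/20", "02/25/20"),
                 format = "%m/%d/%y")

# Daily increases for the four moderates, copied from the table above
moderates <- data.frame(
        Date      = rep(dates, times = 4),
        candidate = rep(c("KLOBUCHAR", "BUTTIGIEG", "BIDEN", "BLOOMBERG"),
                        each = length(dates)),
        increase  = c(2193, 4982, 3552, 7141, 3865, 2614, 2054, 1250, 1565,
                      3155, 3987, 4432, 10493, 5032, 4316, 3179, 2338, 1486,
                      1077, 2278, 1973, 3660, 2325, 1904, 1177, 1358, 1930,
                      8089, 9528, 14706, 19682, 11361, 5625, 3730, 2438, 3069)
)

# One line per candidate
p.moderates <- ggplot(moderates,
                      aes(x = Date, y = increase, colour = candidate)) +
        geom_line() +
        geom_point() +
        labs(x = NULL, y = "New followers per day", colour = NULL)

print(p.moderates)
```

In practice the same frame could be obtained by filtering `tall.daily` to these four candidates and dates instead of retyping the values.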


Of these four candidates, Bloomberg appeared to be trending in the right direction until February 21st. At this point he began to wane slightly in new Twitter followers. Pete Buttigieg has remained very consistent over this time period. Amy Klobuchar saw nice returns after the 9th Democratic Debate, and has averaged slightly less in the days since.

Speaking of Amy Klobuchar…


Summary