Please do not cite without permission. This example serves a pedagogical purpose.
In this project, we aim to demonstrate how one can plot the geographic distribution of donors and donations in the 2016 Democratic Primary Contests for President. The document ends with a set of maps that show the county-level geographic distribution of individual itemized donations to the presidential campaign committees. All of the data used are publicly available through the Federal Election Commission's website.
The project proceeds in four steps:
1. Downloading and readying the data for both remaining candidates
2. Collapsing the data to the county level
3. Merging both candidates together and preparing the data for mapping
4. Mapping the output using the choroplethr package
I have set up a function named gettingData(cand.id), which uses the candidate identification produced by FEC Form 2 to download data from the FEC FTP server. Within the function, I also convert dollar amounts to numeric data, remove any instances of refunds (NOTE: a more in-depth project would treat refunds more carefully), and merge the dataset with ZIP code data provided by the noncensus package.
gettingData <- function(cand.id){
  # Build the download URL from the FEC candidate ID
  url.name <- paste0("ftp://ftp.fec.gov/FEC/fecviewer/FEC_2016_", cand.id, "_F3P_17A_CSV.zip")
  loc <- getwd()
  download.file(url = url.name, destfile = paste0(loc, "/file.zip"))
  require(data.table, quietly = TRUE)
  # Derive the CSV name from the archive name
  file.name <- basename(url.name)
  file.name <- gsub(pattern = "_CSV.zip", replacement = ".csv", x = file.name, fixed = TRUE)
  unzip("file.zip", file.name)
  temp.data <- fread(file.name)
  # Strip the dollar sign and convert amounts to numeric
  temp.data$Amount <- as.numeric(sub("$", "", as.character(temp.data$Amount), fixed = TRUE))
  # Filter out refunds and any non-positive amounts
  temp.data <- temp.data[!grepl("refund", temp.data$Description, ignore.case = TRUE), ]
  temp.data <- temp.data[temp.data$Amount > 0, ]
  # Keep the five-digit ZIP code for matching
  temp.data$zip <- substr(x = temp.data$Zip, start = 1, stop = 5)
  require(noncensus, quietly = TRUE)
  data(zip_codes)
  require(plyr)
  # Attach county FIPS codes by ZIP code
  temp.data <- join(x = temp.data, y = zip_codes, by = "zip", type = "left")
  return(temp.data)
}
Next, I use the function to gather data on the two remaining candidates.
Clinton.data <- gettingData(cand.id = "P00003392")
Sanders.data <- gettingData(cand.id = "P60007168")
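The cleaning steps inside gettingData() can be tried without downloading anything. The following is a minimal sketch in base R, assuming a small invented table named toy whose column names (Amount, Description, Zip) mirror those used in the function; the values are made up and do not come from the FEC files.

```r
# Toy stand-in for one FEC itemized-receipts file (all values invented)
toy <- data.frame(
  Amount      = c("$25.00", "$100.00", "$-50.00", "$10.00"),
  Description = c("", "Contribution", "Refund of contribution", ""),
  Zip         = c("100011234", "94110", "60614", "021391111"),
  stringsAsFactors = FALSE
)

# Strip the literal dollar sign; fixed = TRUE stops "$" acting as a regex anchor
toy$Amount <- as.numeric(sub("$", "", toy$Amount, fixed = TRUE))

# Drop rows described as refunds, then any remaining non-positive amounts
toy <- toy[!grepl("refund", toy$Description, ignore.case = TRUE), ]
toy <- toy[toy$Amount > 0, ]

# Keep the five-digit ZIP code so it can be matched to a ZIP-to-county table
toy$zip <- substr(toy$Zip, start = 1, stop = 5)

print(toy)
```

Keeping Zip as character matters here: a ZIP code read as a number would lose its leading zero before substr() is applied.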
In this section, I create a function named byCounty(data.list), which takes the candidate-level data and produces sums and counts of the donations at the county level.
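library(doBy, quietly = TRUE)
byCounty <- function(data.list){
  # Amount.length is the number of donations; Amount.sum is the total dollars
  county.data <- summaryBy(Amount ~ fips, data = data.list, FUN = c(length, sum))
  # choroplethr expects the county identifier in a column named region
  county.data$region <- county.data$fips
  return(county.data)
}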
Sanders.county <- byCounty(Sanders.data)
Clinton.county <- byCounty(Clinton.data)
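For readers without the doBy package, the same county-level collapse can be sketched in base R. This is an illustrative alternative, not the method used above; the table toy and its FIPS codes and amounts are invented.

```r
# Invented donation records already tagged with a county FIPS code
toy <- data.frame(
  fips   = c(36061, 36061, 6075, 6075, 6075),
  Amount = c(25, 100, 10, 50, 5)
)

# One row per county: number of donations and total dollars
county <- aggregate(Amount ~ fips, data = toy,
                    FUN = function(x) c(length = length(x), sum = sum(x)))

# aggregate() stores both statistics in one matrix column; flatten it so the
# columns are named Amount.length and Amount.sum, as summaryBy() names them
county <- do.call(data.frame, county)
county$region <- county$fips

print(county)
```

Note that length counts rows, so the first statistic is the number of itemized donations in a county rather than the number of distinct donors.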
In this section, I organize the data more conveniently by renaming the variables for each candidate and then merging the two tables together. (This process could be done more efficiently.)
Sanders.county$Sanders.Number <- Sanders.county$Amount.length
Sanders.county$Sanders.Amount <- Sanders.county$Amount.sum
Sanders.county$Amount.sum <- NULL
Sanders.county$Amount.length <- NULL
Clinton.county$Clinton.Number <- Clinton.county$Amount.length
Clinton.county$Clinton.Amount <- Clinton.county$Amount.sum
Clinton.county$Amount.sum <- NULL
Clinton.county$Amount.length <- NULL
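# Merging Clinton and Sanders
library(plyr)
# Joining on region as well as fips avoids ending up with two region columns
Democrats <- join(x = Sanders.county, y = Clinton.county, by = c("fips", "region"), type = "full")
# If a county is missing for a candidate, fill in 0 for counts and amounts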
Democrats$Sanders.Number[is.na(Democrats$Sanders.Number)] <- 0
Democrats$Sanders.Amount[is.na(Democrats$Sanders.Amount)] <- 0
Democrats$Clinton.Number[is.na(Democrats$Clinton.Number)] <- 0
Democrats$Clinton.Amount[is.na(Democrats$Clinton.Amount)] <- 0
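The full join and zero-fill can also be illustrated on two tiny tables. The sketch below uses base R's merge() with all = TRUE as a stand-in for the plyr full join; the data frames sanders and clinton and their values are invented for illustration.

```r
# Invented county summaries; each candidate has one county the other lacks
sanders <- data.frame(fips = c(6075, 36061),
                      Sanders.Number = c(3, 2),
                      Sanders.Amount = c(65, 125))
clinton <- data.frame(fips = c(36061, 17031),
                      Clinton.Number = c(4, 1),
                      Clinton.Amount = c(900, 250))

# all = TRUE keeps counties that appear for only one candidate (a full join)
both <- merge(sanders, clinton, by = "fips", all = TRUE)

# A county absent for a candidate has no recorded donations, so NA becomes 0
both[is.na(both)] <- 0

# Recreate the region column after merging so it is never missing
both$region <- both$fips

print(both)
```

Creating region after the merge, rather than carrying it through from each input, is one way to guarantee that every row has a county identifier.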
In this section, I create two maps for the Democrats: one comparing the candidates by the number of donations (the code counts itemized donations, not unique donors) and another by the dollar amount donated in each county. For both calculations, I first subtract Sanders’s total from Clinton’s total and then use the scale() function with center = 0, which rescales the differences without shifting them, so zero still marks parity. Any positive value indicates Clinton’s advantage and any negative value indicates Sanders’s advantage. In the maps themselves, however, the standardized values are grouped into discrete bins.
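# Mapping Democrats by Number of Donors
Democrats$Number <- Democrats$Clinton.Number - Democrats$Sanders.Number
# center = 0 leaves the sign of each difference intact; only the scale changes
Democrats$temp <- as.numeric(scale(Democrats$Number, center = 0))
# Scaled values above 2 (or at or below -55) fall outside the breaks and become NA
Democrats$value <- cut(Democrats$temp, breaks = c(-55, -2, -1, -.5, 0, .5, 1, 2))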
library(ggplot2)
library(choroplethr)
choro <- CountyChoropleth$new(Democrats)
choro$title = "2016 Democratic Primary, Candidates by Number of Donations"
choro$ggplot_scale <- scale_fill_brewer(name = "Number of Donations", palette = "PuBuGn", guide = "legend", labels = c("Sanders", "", "", "", "", "Clinton"))
choro$render()
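# Mapping Democrats by Donation Amounts
Democrats$Number <- Democrats$Clinton.Amount - Democrats$Sanders.Amount
# center = 0 leaves the sign of each difference intact; only the scale changes
Democrats$temp <- as.numeric(scale(Democrats$Number, center = 0))
# Scaled values above 2 (or at or below -55) fall outside the breaks and become NA
Democrats$value <- cut(Democrats$temp, breaks = c(-55, -2, -1.5, -1, -.5, 0, .5, 1, 1.5, 2))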
choro <- CountyChoropleth$new(Democrats)
choro$title = "2016 Democratic Primary, Candidates by Donation Amounts"
choro$ggplot_scale <- scale_fill_brewer(name = "Donation Amounts", palette = "PuBuGn", guide = "legend", labels = c("Sanders", "", "", "", "", "Clinton"))
choro$render()
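To see what the scale() and cut() steps do to the differences, here is a small base R sketch on an invented vector of Clinton-minus-Sanders differences. With center = 0 and the default scale = TRUE, scale() divides by the root mean square of the values (computed with n - 1 in the denominator) and subtracts nothing, so signs are preserved. The breaks are the ones used for the first map.

```r
# Invented Clinton-minus-Sanders differences for six counties
diff <- c(-300, -20, 0, 15, 40, 900)

# center = 0 subtracts nothing; scale = TRUE divides by the root mean square
z <- as.numeric(scale(diff, center = 0))

# The same quantity computed by hand, for comparison
rms <- sqrt(sum(diff^2) / (length(diff) - 1))
by.hand <- diff / rms

# Group the scaled values into the bins used for the donor-count map
bins <- cut(z, breaks = c(-55, -2, -1, -.5, 0, .5, 1, 2))

print(data.frame(diff, z = round(z, 2), by.hand = round(by.hand, 2), bins))
```

In this toy vector the largest difference scales to a value above 2, so cut() returns NA for it. A wider outer break would assign such a county to a bin instead.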
Overall, the visual result runs contrary to conventional wisdom. Conventional wisdom would likely posit that Mr. Sanders has an advantage when it comes to the number of donors but not the amount of donations. Here, we find he has an advantage in both when we visualize these figures geographically. The next step would be to do so using mapping features that weight counties by population. Doing so would demonstrate Clinton’s presumed advantage in urban centers.