In this project we will examine campaign expenditures for the 2016 US presidential election. We will concentrate on the two major party nominees, and will look at expenditures by the month, starting in Feb 2016.
Because of the nuances of campaign finance law, while we will be able to see the intake for the campaigns, we will not be able to see fund intake for the independent PACs. As a result, we will look at expenditures.
The summary stats by month for each campaign are already available. We will tidy the independant PAC data to create monthy summary expenditures for groups that favor or oppose the two major party nominees.
Our goal is to compare monthly expenditures for the campaign and for PACs across the two candidates. We will further attempt to use the “sup_opp” variable in the PAC data to divide PAC expenditures across “negative” or “positive” efforts.
We will attempt to determine if there is a difference in average monthly expenditure across the 6 categories.
if (!require('dplyr')) install.packages('dplyr')
if (!require('stringr')) install.packages('stringr')
if (!require('ggplot2')) install.packages('ggplot2')
if (!require('DT')) install.packages('DT')# load data
camp <- read.csv("FEC_campaign_summary.csv")
pac <- read.csv("FEC_independent-expenditure.csv")In our raw data, there are 239 observations of the campaign data and 168703 of the PAC data.
To effectively conduct our analysis, we’ll only want the monthly observations for our two candidates so once the data is cleaned and merged into a long format dataset, I would expect the number of cases = n months * 6, where 6 represents ClintonCampaign, ProClintonPAC, AntiClintonPAC, TrumpCampaign, ProTrumpPAC, AntiTrumpPAC.
The data is delivered to the FEC and redistributed to the public as part of disclosure law.
“The Federal Election Commission (FEC) is an independent regulatory agency established in 1975 to administer and enforce the Federal Election Campaign Act (FECA). That statute limits the sources and amounts of the contributions used to finance federal elections, requires public disclosure of campaign finance information and–in tandem with the Primary Matching Payment Act and the Presidential Election Campaign Fund Act–provides for the public funding of Presidential elections. For more information on the FEC’s role in regulating federal elections, see the brochure”The FEC and the Federal Campaign Finance Law." - From the FEC Site
This is an observational data study.
Data is source from is publicly available datasets from the Federal Election Commission. The data we will examine includes summary data on the 2016 US Presidential election from both the campaigns and independant groups or PACs.
The response variable will be the expenditure by the month. This will be a numerical variable.
We have a few explanatory variables: candidate + campaign/PAC type (Categorical) and month (Categorical).
#subset to DT and HC
campsub <- filter(camp, CMTE_NM =="HILLARY FOR AMERICA" | CMTE_NM =="DONALD J. TRUMP FOR PRESIDENT, INC.") %>%
filter(str_detect(RPT_TP, "M")) %>%
arrange(CMTE_NM, RPT_TP) %>%
select(CMTE_NM, RPT_TP, TTL_DISB_SUM_PAGE_PER) %>%
droplevels()
#change vars
campsub$CMTE_NM <- as.character(campsub$CMTE_NM)
campsub$CMTE_NM[campsub$CMTE_NM == "HILLARY FOR AMERICA"] <- "HRCCamp"
campsub$CMTE_NM[campsub$CMTE_NM == "DONALD J. TRUMP FOR PRESIDENT, INC."] <- "DJTCamp"
#change to month
campsub$RPT_TP <- as.character(campsub$RPT_TP)
campsub$RPT_TP[campsub$RPT_TP == "M2"] <- "Feb"
campsub$RPT_TP[campsub$RPT_TP == "M3"] <- "Mar"
campsub$RPT_TP[campsub$RPT_TP == "M4"] <- "Apr"
campsub$RPT_TP[campsub$RPT_TP == "M5"] <- "May"
campsub$RPT_TP[campsub$RPT_TP == "M6"] <- "Jun"
campsub$RPT_TP[campsub$RPT_TP == "M7"] <- "Jul"
campsub$RPT_TP[campsub$RPT_TP == "M8"] <- "Aug"
campsub$RPT_TP[campsub$RPT_TP == "M9"] <- "Sep"
campsub$RPT_TP[campsub$RPT_TP == "M10"] <- "Oct"
campsub$RPT_TP <- factor(campsub$RPT_TP, level=
c("Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct"))
campsub$direction[campsub$CMTE_NM =="DJTCamp"] <- "Trump"
campsub$direction[campsub$CMTE_NM =="HRCCamp"] <- "Clinton"ggplot(campsub, aes(x=RPT_TP, y=TTL_DISB_SUM_PAGE_PER,
group=CMTE_NM, colour=CMTE_NM)) +
geom_line() + ylab("Disbursements") + xlab("") +
ggtitle("Total Disbursements by Campaigns") Below we will begin to clean, subset, and rename to find PACs associated with HRC and DJT, noting whether they support or oppose a candidate.
# ready numeric vector
pac$expenditure <- str_replace_all(as.character(pac$exp_amo), fixed("$"), "")
pac$expenditure <- as.numeric(str_replace_all(pac$expenditure, ',', ''))
# there are dupes if we leave in amn_ind (amended filing) vars, let's remove them. while imprecise, just reviewing the original filing will give us what we need for this study.
pac <- pac[pac$amn_ind == "N",]
# droplevels
pac$can_nam <- as.character(pac$can_nam)
pac$can_nam <- tolower(pac$can_nam)
pac$spe_nam <- tolower(pac$spe_nam)
# subset only those that refer to HRC
pacsubHC <- pac[grepl("inton|hil", pac$can_nam), ]
pacsubHC <- pacsubHC[!grepl("kirk|james|george", pacsubHC$can_nam), ]
pacsubHC$org[pacsubHC$sup_opp == "Oppose"] <- "HRC_Oppose"
pacsubHC$org[pacsubHC$sup_opp == "Support"] <- "HRC_Support"
# what's missing?
rmhrc <- pacsubHC[!complete.cases(pacsubHC$org),]
pacsubHC <- pacsubHC[complete.cases(pacsubHC$org),]
rmhrc$org[rmhrc$can_par_aff == "Dem"] <- "HRC_Support"
rmhrc <- rmhrc[complete.cases(rmhrc$org),]
pacsubHC <- rbind(pacsubHC, rmhrc)
# subset only those that refer to DJT
pacsubDT <- pac[grepl("trump", pac$can_nam),]
pacsubDT$org[pacsubDT$sup_opp == "Oppose"] <- "DJT_Oppose"
pacsubDT$org[pacsubDT$sup_opp == "Support"] <- "DJT_Support"
# what's missing?
rmdjt <- pacsubDT[!complete.cases(pacsubDT$org),]
pacsubDT <- pacsubDT[complete.cases(pacsubDT$org),]
rmdjt$org[rmdjt$can_par_aff == "Rep"] <- "DJT_Support"
rmdjt <- rmdjt[complete.cases(rmdjt$org),]
pacsubDT <- rbind(pacsubDT, rmdjt)
# combine, subset
pacsub <- rbind(pacsubHC, pacsubDT)
pacsub <- select(pacsub, org, spe_nam, rec_dat, expenditure)
# clean type
pacsub$org <- as.factor(pacsub$org)
pacsub$spe_nam <- as.character(pacsub$spe_nam)
pacsub$rec_dat <- as.Date(pacsub$rec_dat, "%m/%d/%Y")
# remove obs < feb 2016 > nov (same as campaign data summary)
pacsub <- pacsub[pacsub$rec_dat >= "2016-02-01" & pacsub$rec_dat < "2016-11-01",]
#clean house
rm(pacsubHC, pacsubDT, rmdjt, rmhrc)length(unique(pacsub$spe_nam))## [1] 247
While we see there about 247 unique Super PACs associated with HRC and DJT, apparently many of them use multiple submitting parties (who can’t spell) to send reports to the FEC. There are a number of duplicate records in here, but a quick scan suggests maybe 20-40 are dupes. Cleaning this field is beyond the scope of this project.
pacsub$month[pacsub$rec_dat >= "2016/02/01" & pacsub$rec_dat < "2016/03/01"] <- "Feb"
pacsub$month[pacsub$rec_dat >= "2016/03/01" & pacsub$rec_dat < "2016/04/01"] <- "Mar"
pacsub$month[pacsub$rec_dat >= "2016/04/01" & pacsub$rec_dat < "2016/05/01"] <- "Apr"
pacsub$month[pacsub$rec_dat >= "2016/05/01" & pacsub$rec_dat < "2016/06/01"] <- "May"
pacsub$month[pacsub$rec_dat >= "2016/06/01" & pacsub$rec_dat < "2016/07/01"] <- "Jun"
pacsub$month[pacsub$rec_dat >= "2016/07/01" & pacsub$rec_dat < "2016/08/01"] <- "Jul"
pacsub$month[pacsub$rec_dat >= "2016/08/01" & pacsub$rec_dat < "2016/09/01"] <- "Aug"
pacsub$month[pacsub$rec_dat >= "2016/09/01" & pacsub$rec_dat < "2016/10/01"] <- "Sep"
pacsub$month[pacsub$rec_dat >= "2016/10/01" & pacsub$rec_dat < "2016/11/01"] <- "Oct"HRCOp <- pacsub[pacsub$org=="HRC_Oppose", ]
DJTOp <- pacsub[pacsub$org=="DJT_Oppose", ]
HRCSu <- pacsub[pacsub$org=="HRC_Support", ]
DJTSu <- pacsub[pacsub$org=="DJT_Support", ]
monthsum <- function(df){
df %>%
group_by(month) %>%
summarize(sum(expenditure))
}
HOp <- monthsum(HRCOp)
DOp <- monthsum(DJTOp)
HSu <- monthsum(HRCSu)
DSu <- monthsum(DJTSu)
HOp$org <- "HRC_Oppose"
DOp$org <- "DJT_Oppose"
HSu$org <- "HRC_Support"
DSu$org <- "DJT_Support"
pacsummary <- rbind(HOp, DOp, HSu, DSu)
pacsummary$month <- factor(pacsummary$month, level=
c("Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct"))
names(pacsummary) <- c("RPT_TP", "TTL_DISB_SUM_PAGE_PER", "CMTE_NM")
pacsummary <- pacsummary[,c(3,1,2)]
# create summary var for direction of effort
pacsummary$direction[pacsummary$CMTE_NM =="DJT_Support" | pacsummary$CMTE_NM =="HRC_Oppose"] <- "Trump"
pacsummary$direction[pacsummary$CMTE_NM =="DJT_Oppose" | pacsummary$CMTE_NM =="HRC_Support"] <- "Clinton"
rm(HOp, DOp, HSu, DSu, HRCOp, DJTOp, HRCSu)ggplot(pacsummary, aes(x=RPT_TP, y=TTL_DISB_SUM_PAGE_PER,
group=CMTE_NM, colour=CMTE_NM)) +
geom_line() + ylab("Disbursements") + xlab("") +
ggtitle("Total Disbursements by PACS") # why are June expenditures so high for Trump supporting PACs?
review <- DJTSu[DJTSu$month=="Jun",]
review <- review[order(-review$expenditure),]
datatable(review, options=list(pageLength=5))# what was this $50mil expenditure from Jun 14?
whatonearth <- pac[pac$expenditure==50000000,]
whatonearth <- select(whatonearth, spe_nam, pur, pay, exp_amo)
knitr::kable(whatonearth)| spe_nam | pur | pay | exp_amo | |
|---|---|---|---|---|
| 15924 | get our jobs back, inc | Digital Media Marketing, Revenue Sharing | Statware, Inc. | $50,000,000.00 |
rm(DJTSu, review, whatonearth)Ok, this is quite something. This is a very very strange outlier. More information on this particular observation can be found here. We should leave this in, but note it is an outlier likely to skew statistical analysis.
final <- rbind(pacsummary, campsub)
names(final) <- c("category", "month", "expenditure", "direction")
final$category <- as.factor(final$category)
final$direction <- as.factor(final$direction)
ggplot(final, aes(x=month, y=expenditure,
group=category, colour=category)) +
geom_line() + ylab("Disbursements") + xlab("") +
ggtitle("Total Disbursements by Campaigns and PACs") ggplot(final, aes(x=direction, y=expenditure, fill=category)) +
geom_bar(position="dodge", stat="identity") +
ylab("Total Disbursements") + xlab("") +
ggtitle("Total Disbursements by Org Type") \(H_0: \mu_{TCamp} = \mu_{HCamp} = \mu_{TPACOp} = \mu_{HPACOp} = \mu_{TPACSu} = \mu_{HPACSu}\). The avg monthly expenditures does not vary across all types.
\(H_A:\) The avg monthly expenditures across some or all groups does vary.
ggplot(final, aes(factor(category), expenditure)) + geom_boxplot(aes(fill=category)) + xlab("") + theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())finalsum <- group_by(final, category)
finalsum2 <- summarise(finalsum, mean = round(mean(expenditure),0), stdev = round(sd(expenditure), 0))
knitr::kable(finalsum2)| category | mean | stdev |
|---|---|---|
| DJT_Oppose | 24138816 | 23916610 |
| DJT_Support | 9915562 | 17023951 |
| DJTCamp | 19692503 | 20249352 |
| HRC_Oppose | 7111931 | 10329706 |
| HRC_Support | 5238978 | 6447687 |
| HRCCamp | 35879179 | 20373035 |
We believe the data from each category is independant (it is unlikely that a similar expenditure shows up in each category). We also estimate that while the variability and distributions are not ideal, they should also be adequate for conducting a simple ANOVA test.
anova <- aov(final$expenditure ~ final$category)
summary(anova)## Df Sum Sq Mean Sq F value Pr(>F)
## final$category 5 6.308e+15 1.262e+15 4.125 0.00339 **
## Residuals 48 1.468e+16 3.059e+14
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tukey <- TukeyHSD(anova, ordered=TRUE)
tukey## Tukey multiple comparisons of means
## 95% family-wise confidence level
## factor levels have been ordered
##
## Fit: aov(formula = final$expenditure ~ final$category)
##
## $`final$category`
## diff lwr upr p adj
## HRC_Oppose-HRC_Support 1872953 -22595545 26341451 0.9999114
## DJT_Support-HRC_Support 4676583 -19791914 29145081 0.9926928
## DJTCamp-HRC_Support 14453525 -10014973 38922023 0.5047741
## DJT_Oppose-HRC_Support 18899837 -5568661 43368335 0.2171664
## HRCCamp-HRC_Support 30640200 6171702 55108698 0.0066044
## DJT_Support-HRC_Oppose 2803630 -21664867 27272128 0.9993597
## DJTCamp-HRC_Oppose 12580572 -11887926 37049070 0.6496221
## DJT_Oppose-HRC_Oppose 17026884 -7441614 41495382 0.3224373
## HRCCamp-HRC_Oppose 28767247 4298749 53235745 0.0126490
## DJTCamp-DJT_Support 9776941 -14691556 34245439 0.8413586
## DJT_Oppose-DJT_Support 14223254 -10245244 38691751 0.5224657
## HRCCamp-DJT_Support 25963617 1495119 50432114 0.0316406
## DJT_Oppose-DJTCamp 4446312 -20022185 28914810 0.9942179
## HRCCamp-DJTCamp 16186675 -8281822 40655173 0.3779452
## HRCCamp-DJT_Oppose 11740363 -12728135 36208861 0.7124701
We can see from our test that the p value is extremely low = 0.0033878. This causes us to reject \(H_0\) and conclude that there are some differences in monthly expenditures across categories.
To help determine where the biggest differences in mean might be occurring, we conducted a Tukey range test. The lowest ‘p adj’ values indicate the biggest difference in mean and often refer to the Clinton Campaign (HRCCamp) and any of the PAC groups.
Our study of the 2016 presidential election campaign revealed a bit about the nature of expenditures related to the two major party candidates. We looked at subgroups representing the campaigns and PACs that spent to either support or oppose a candidate.
The majority of our lessons learned about the nuances of the data came from our exploratory data preparation process. Finding outliers like the $50mil donation caused us to find additional duplicates (amended records), learn more about the many things that could be considered a campaign expenditure, and illuminate the inconsistent nature of the self-reported data.
We set out to show expenditures across PACs and campaigns and determine whether there are differences in the average monthly expenditures for each group type we reviewed. Using ANOVA and a Tukey test were able to determine which means were most alike and which were most different.
A few other approaches that could be used to analyze the same problem include: