This project will examine the expenditures of independent groups supporting or opposing electoral candidates as reported to the Federal Election Commission (FEC) for elections from 2010 to 2020.
These groups, commonly called Political Action Committees (PACs), may be formed by individuals, corporations, and organizations, including political parties. They can advocate for issues as well as support or oppose candidates for office. The FEC requires that when they take actions supporting or opposing a candidate, they operate independently of any candidates’ campaign. They must also report to the FEC all expenditures they make advocating in regard to specific candidates. The FEC terms these as “indpendent expenditures”. For each election cycle, the FEC extracts and gathers the data of each independent expenditure and makes it available to download.
Bear in mind that PAC’s are not required to report expenditures for purposes performed not associated with a candidate for office. Additionally, PAC’s have motives to obscure the purposes of their reported expenditures.
The other purpose of this project is to explore the ggalluvial package for constructing flow diagrams that connect flows (alluvia) of continuous variables across strata of multiple categories. In this case, the alluvia will be money and the strata will consist of political affiliations, support and opposition, traditional and Internet enabled political activities.
The FEC collects twenty three comma separated values describing each expenditure. The following are of interest here:
| field | desc | class |
|---|---|---|
| spe_id | Spender ID: PAC uid | Character |
| spe_nam | Spender name | Character |
| ele_typ | Election type | Character |
| can_off | Office | Character |
| cand_par_aff | Party | Character |
| exp_amo | Expenditure amount | Currency |
| exp_dat | Expenditure date | Date |
| rcp_dat | Filing receipt date | Date |
| sup_opp | Support/Oppose | Character |
| pur | Purpose of expenditure | Character |
| fec_election_yr | Two-year period | Date |
| tran_id | expenditure within filing | Character |
| file_num | Filing uid | Character |
| NB: | The tran_id and file_num uniquely identify the expenditure |
load(url("https://github.com/sdutky/mcData110/raw/master/pacExpenditures/alluvialPacTableAndGraphs.rdata"))
cat(firstNAcount.pander,sep = "\n")
| columnNames | naCount | uniqueCount |
|---|---|---|
| Total Rows and Columns | 450039 | 23 |
| cand_id | 36474 | 1907 |
| cand_name | 22 | 4552 |
| spe_id | 0 | 1740 |
| spe_nam | 73 | 2191 |
| ele_type | 166 | 9 |
| can_office_state | 123470 | 63 |
| can_office_dis | 332 | 65 |
| can_office | 252 | 4 |
| cand_pty_aff | 40931 | 13 |
| exp_amo | 13 | 67400 |
| exp_date | 13921 | 2905 |
| agg_amo | 776 | 85895 |
| sup_opp | 171 | 5 |
| pur | 81 | 18102 |
| pay | 565 | 21367 |
| file_num | 0 | 32197 |
| amndt_ind | 0 | 10 |
| tran_id | 0 | 352943 |
| image_num | 0 | 107485 |
| receipt_dat | 0 | 2691 |
| fec_election_yr | 0 | 6 |
| prev_file_num | 401491 | 3444 |
| dissem_dt | 348574 | 1871 |
We will need to drop the rows where either of an expenditure’s amount (exp_amo) or whether it was to support or oppose (sup_opp) a candidate is unknown. We can pigeonhole unknown purposes (pur) as other. The date a filing was received by the FEC (receipt_dat) can substitute for unknown expenditure dates (exp_dat). Nearly 10% of expenditures have missing party affiliations. Where the spender (spe_id) has other expenditures, we can try to use the distributions of the parties they support or oppose to impute the value for expenditures where this missing. The remaining fields don’t particularly concern us.
cat(activityDistribution.pander,sep = "\n")
| activity | count | aggSpend | meanSpend | sdSpend |
|---|---|---|---|---|
| PMedia | 1248 | 244,433,893 | 195,860 | 457,272 |
| 1115 | 23,610,815 | 21,176 | 35,347 | |
| Other | 128 | 9,453,373 | 73,854 | 734,394 |
| Canvass | 805 | 7,618,535 | 9,464 | 43,073 |
| Adv | 155 | 7,151,514 | 46,139 | 180,391 |
| Phone | 540 | 4,617,834 | 8,552 | 27,131 |
| Tv | 25 | 3,760,731 | 150,429 | 276,512 |
| Stuff | 822 | 3,662,051 | 4,455 | 21,330 |
| Staff | 27159 | 2,649,062 | 98 | 907 |
| Data | 207 | 2,391,795 | 11,555 | 21,819 |
| Content | 283 | 1,892,367 | 6,687 | 34,423 |
| Radio | 31 | 1,374,529 | 44,340 | 79,941 |
| Consult | 229 | 1,092,029 | 4,769 | 11,455 |
| Field | 23 | 617,991 | 26,869 | 32,789 |
| Events | 3955 | 592,869 | 150 | 2,363 |
| Web | 139 | 451,448 | 3,248 | 10,902 |
| 41 | 84,891 | 2,071 | 4,572 | |
| Donor | 138 | 79,802 | 578 | 1,409 |
| Outdoor | 6 | 68,972 | 11,495 | 4,390 |
| Mgmt | 402 | 46,065 | 115 | 432 |
| Text | 13 | 19,796 | 1,523 | 1,621 |
| SMedia | 6 | 6,603 | 1,101 | 853 |
| Search | 1 | 2,762 | 2,762 | NA |
| GOTV | 3 | 317 | 106 | 46 |
Cleaning the entire expenditure dataset of 450 thousand row took eighteen hours on my laptop and significant coaxing. I saved the cleaned dataset from which I produced plots, but I used a 50K sample to run the parts of the script that did the cleaning. The script completes in a reasonable time, but I believe the smaller sample introduced some NA’s. I have not tracked down the problems.
We extracted characteristic strings from the purpose of each expenditure to assign them to one or more of these activities. This table breaks out those expenditures that have only a single activity. We will take the expenditures that we classified with multiple activities and generate multiple rows of expenditures each having a single activity. We will apportion the amount of each “new” expenditure by applying a factor derived from a truncated normal sample (eliminating negative amounts) to the amount of the original expenditure.
cat(singleActivityAggSpend.pander,sep = "\n")
| activity | count | kDollar |
|---|---|---|
| PMedia | 2114 | $280450.27K |
| Tv | 808 | $94659.79K |
| 1600 | $25644.89K | |
| Canvass | 1103 | $10987.62K |
| Content | 2104 | $10223.20K |
| Phone | 872 | $5928.58K |
| Stuff | 1227 | $5859.88K |
| Data | 917 | $5369.05K |
| Radio | 489 | $5204.31K |
| Field | 148 | $2891.66K |
| Staff | 27394 | $2451.96K |
| Consult | 269 | $1213.82K |
| Other | 93 | $920.68K |
| Events | 4157 | $759.03K |
| 155 | $144.63K | |
| Outdoor | 22 | $93.97K |
| Donor | 195 | $85.02K |
| Mgmt | 428 | $45.75K |
| GOTV | 92 | $24.21K |
| Text | 82 | $22.60K |
| Adv | 3010 | $NAK |
| Search | 43 | $NAK |
| SMedia | 81 | $NAK |
| Web | 1797 | $NAK |
This shows the complete sample (more or less) with single activities apportioned
cat(summaryOfRowsAndNAs.pander,sep = "\n")
| columnNames | naCount | uniqueCount |
|---|---|---|
| Total Rows and Columns | 451772 | 27 |
| cand_id | 22446 | 1873 |
| cand_name | 19 | 3982 |
| spe_id | 0 | 1420 |
| spe_nam | 1 | 1779 |
| ele_type | 0 | 9 |
| can_office_state | 118519 | 63 |
| can_office_dis | 266 | 55 |
| can_office | 188 | 4 |
| cand_pty_aff | 0 | 11 |
| exp_amo | 0 | 147449 |
| exp_date | 0 | 2840 |
| agg_amo | 807 | 71179 |
| sup_opp | 0 | 2 |
| pur | 0 | 14817 |
| pay | 0 | 18759 |
| file_num | 0 | 24935 |
| amndt_ind | 0 | 1 |
| tran_id | 0 | 340043 |
| image_num | 0 | 90698 |
| receipt_dat | 0 | 2500 |
| fec_election_yr | 0 | 6 |
| prev_file_num | 451717 | 4 |
| dissem_dt | 332521 | 1831 |
| uid | 0 | 384624 |
| tags | 0 | 12872 |
| activity | 0 | 25 |
| activityGroup | 0 | 6 |
After the sample run, I imported the entire cleaned expenditure dataset. Only a very few NA’s appear in columns that concern us
cat(expendituresByActivityGroup.pander,sep = "\n")
| activity | activityGroup | count | aggSpend ($K) | meanSpend | sdSpend |
|---|---|---|---|---|---|
| PMedia | Advertising | 20,365 | 2,753,480 | 135,206 | 401,204 |
| Tv | Advertising | 7,470 | 945,257 | 126,540 | 347,299 |
| Adv | Advertising | 26,937 | 656,265 | 24,363 | 114,512 |
| Outreach | 15,458 | 285,978 | 18,500 | 51,338 | |
| Canvass | Outreach | 10,847 | 108,001 | 9,957 | 52,318 |
| Content | Strategy | 20,032 | 65,932 | 3,291 | 16,079 |
| Web | AdvertisingV2 | 16,267 | 64,452 | 3,962 | 17,673 |
| Radio | Advertising | 4,618 | 61,020 | 13,214 | 37,067 |
| Data | Strategy | 8,342 | 50,557 | 6,061 | 18,150 |
| Phone | Outreach | 8,389 | 49,150 | 5,859 | 18,480 |
| Stuff | Outreach | 11,552 | 39,075 | 3,383 | 14,258 |
| Staff | Overhead | 247,730 | 22,092 | 89 | 865 |
| Field | Outreach | 1,362 | 18,736 | 13,756 | 53,668 |
| Consult | Strategy | 2,558 | 13,379 | 5,230 | 17,552 |
| GOTV | Outreach | 915 | 7,403 | 8,091 | 27,474 |
| Events | Outreach | 37,324 | 7,123 | 191 | 2,369 |
| Other | Overhead | 1,130 | 5,572 | 4,931 | 26,471 |
| Outdoor | Advertising | 256 | 3,404 | 13,295 | 67,183 |
| OutreachV2 | 1,565 | 2,409 | 1,539 | 9,245 | |
| Text | OutreachV2 | 772 | 1,019 | 1,319 | 7,138 |
| Donor | Strategy | 1,779 | 770 | 433 | 2,023 |
| Mgmt | Overhead | 3,921 | 444 | 113 | 803 |
| Media | Advertising | 1,013 | 442 | 436 | 2,929 |
| SMedia | AdvertisingV2 | 785 | 177 | 226 | 781 |
| Search | AdvertisingV2 | 385 | 69 | 179 | 905 |
I consolidated the twenty five activities into six activityGroups that can be meaningfully represented on a plot. Outreach consists of connecting with voters, V2 includes email and texting. Advertising is a broadcast medium, V2 includes, social media and Internet search. Strategy combines polling, fundraising (here called Donor so that it can fit on a plot), and consulting. Everything else is consigned to Overhead.
cat(summaryPACexpenditures.pander,sep = "\n")
| For/Against | Target | count | aggSpend ($K) | meanSpend | sdSpend |
|---|---|---|---|---|---|
| Against | REPUBLICAN PARTY | 143,853 | 2,054,377 | 14,281 | 105,278 |
| Against | DEMOCRATIC PARTY | 56,943 | 1,753,791 | 30,799 | 218,979 |
| For | REPUBLICAN PARTY | 49,684 | 760,864 | 15,314 | 101,539 |
| For | DEMOCRATIC PARTY | 197,613 | 506,035 | 2,561 | 30,463 |
| Against | DEMOCRATIC-FARMER-LABOR | 346 | 38,640 | 111,675 | 211,588 |
| Against | INDEPENDENT | 1,652 | 13,095 | 7,927 | 44,229 |
| For | LIBERTARIAN PARTY | 235 | 12,526 | 53,303 | 168,517 |
| For | DEMOCRATIC-FARMER-LABOR | 660 | 9,778 | 14,815 | 57,783 |
| Against | UNKNOWN | 130 | 4,251 | 32,698 | 62,889 |
| For | INDEPENDENT | 201 | 2,899 | 14,422 | 23,363 |
| For | UNKNOWN | 326 | 2,019 | 6,193 | 16,996 |
| Against | NONE | 57 | 1,837 | 32,228 | 73,357 |
| Against | CONSTITUTION PARTY | 14 | 1,340 | 95,713 | 219,706 |
| Against | LIBERTARIAN PARTY | 7 | 396 | 56,551 | 83,693 |
| Against | OTHER | 15 | 129 | 8,605 | 23,606 |
| For | NONE | 19 | 124 | 6,538 | 7,425 |
| For | GREEN PARTY | 12 | 102 | 8,507 | 10,988 |
| For | CONSTITUTION PARTY | 2 | 1 | 749 | 433 |
| For | OTHER | 1 | 0 | 466 | NA |
| For | CONSERVATIVE PARTY | 2 | 0 | 1 | 0 |
The nays have it
cat(summaryPACbyElectionRound.pander,sep = "\n")
| Election Cycle | Round | For/Against | count | aggSpend ($K) | meanSpend | sdSpend |
|---|---|---|---|---|---|---|
| 2020 | general | Against | 51 | 278 | 5,454 | 9,886 |
| 2020 | general | For | 1,014 | 9,903 | 9,767 | 17,483 |
| 2020 | other | Against | 26 | 81 | 3,112 | 3,211 |
| 2020 | other | For | 167 | 2,987 | 17,884 | 59,240 |
| 2020 | primary | Against | 1,865 | 6,614 | 3,547 | 41,404 |
| 2020 | primary | For | 1,372 | 5,255 | 3,830 | 20,199 |
| 2020 | run off | Against | 7 | 285 | 40,767 | 45,707 |
| 2020 | run off | For | 5 | 42 | 8,308 | 8,548 |
| 2020 | special | Against | 186 | 5,855 | 31,480 | 96,821 |
| 2020 | special | For | 201 | 2,407 | 11,977 | 35,223 |
| 2018 | general | Against | 14,379 | 733,458 | 51,009 | 186,954 |
| 2018 | general | For | 23,251 | 206,712 | 8,890 | 57,299 |
| 2018 | other | Against | 513 | 6,856 | 13,365 | 58,718 |
| 2018 | other | For | 1,680 | 6,743 | 4,014 | 21,871 |
| 2018 | primary | Against | 2,939 | 56,368 | 19,179 | 89,028 |
| 2018 | primary | For | 4,217 | 64,726 | 15,349 | 74,312 |
| 2018 | run off | Against | 483 | 12,044 | 24,935 | 84,582 |
| 2018 | run off | For | 550 | 6,075 | 11,045 | 30,670 |
| 2018 | special | Against | 858 | 28,359 | 33,053 | 104,413 |
| 2018 | special | For | 934 | 11,360 | 12,162 | 41,020 |
| 2016 | general | Against | 69,523 | 944,632 | 13,587 | 125,793 |
| 2016 | general | For | 86,962 | 228,646 | 2,629 | 39,993 |
| 2016 | other | Against | 31 | 1,455 | 46,949 | 188,834 |
| 2016 | other | For | 180 | 1,585 | 8,808 | 20,762 |
| 2016 | primary | Against | 4,038 | 119,120 | 29,500 | 120,307 |
| 2016 | primary | For | 11,306 | 268,854 | 23,780 | 150,640 |
| 2016 | run off | Against | 173 | 2,501 | 14,459 | 46,574 |
| 2016 | run off | For | 206 | 922 | 4,477 | 17,753 |
| 2016 | special | Against | 4 | 1 | 361 | 484 |
| 2016 | special | For | 44 | 46 | 1,054 | 2,084 |
| 2014 | general | Against | 40,235 | 504,614 | 12,542 | 75,207 |
| 2014 | general | For | 24,257 | 111,841 | 4,611 | 34,631 |
| 2014 | other | Against | 167 | 6,523 | 39,062 | 106,886 |
| 2014 | other | For | 620 | 5,769 | 9,304 | 29,354 |
| 2014 | primary | Against | 1,945 | 50,132 | 25,775 | 74,947 |
| 2014 | primary | For | 3,386 | 33,860 | 10,000 | 35,651 |
| 2014 | run off | Against | 282 | 7,045 | 24,983 | 91,195 |
| 2014 | run off | For | 549 | 1,527 | 2,782 | 7,796 |
| 2014 | special | Against | 472 | 10,448 | 22,135 | 70,743 |
| 2014 | special | For | 513 | 5,143 | 10,025 | 48,684 |
| 2012 | general | Against | 51,553 | 903,675 | 17,529 | 202,616 |
| 2012 | general | For | 51,536 | 178,875 | 3,471 | 51,792 |
| 2012 | other | Against | 268 | 36,996 | 138,044 | 540,450 |
| 2012 | other | For | 177 | 6,834 | 38,609 | 99,930 |
| 2012 | primary | Against | 2,277 | 78,750 | 34,585 | 170,440 |
| 2012 | primary | For | 3,943 | 50,381 | 12,777 | 65,992 |
| 2012 | run off | Against | 133 | 7,435 | 55,899 | 167,544 |
| 2012 | run off | For | 234 | 3,012 | 12,871 | 43,206 |
| 2012 | special | Against | 125 | 4,377 | 35,018 | 64,050 |
| 2012 | special | For | 100 | 1,494 | 14,943 | 38,417 |
| 2010 | general | Against | 9,776 | 330,396 | 33,797 | 109,317 |
| 2010 | general | For | 26,162 | 60,252 | 2,303 | 19,627 |
| 2010 | other | Against | 19 | 272 | 14,335 | 30,017 |
| 2010 | other | For | 93 | 2,205 | 23,714 | 59,487 |
| 2010 | primary | Against | 471 | 3,239 | 6,876 | 21,778 |
| 2010 | primary | For | 3,769 | 8,855 | 2,350 | 14,903 |
| 2010 | run off | Against | 24 | 151 | 6,286 | 7,574 |
| 2010 | run off | For | 946 | 1,501 | 1,586 | 17,447 |
| 2010 | special | Against | 194 | 5,894 | 30,379 | 55,885 |
| 2010 | special | For | 381 | 6,537 | 17,158 | 48,948 |
proAntiByParty.ggplot
placeholder
Some vocabulary: the horizontal bars are called axes, the categories they contain are called strata. Alluvial are the flows between strata. Where they cross is called a lode. The size of the alluvium at the lode gives its y value. Its thickness between lodes follows aesthetic practicalities.
All told, more than five billion was reported by PAC’s for election rounds between 2010 and 2020. 2020 and Other parties barely register.
proAntiByCampaignActivity.ggplot
placeholder
Using a log scale on the dollars spent makes it difficult to gauge the differences but it keeps all categories on the plots. I could not coerce the ggplot calls to produce a scale on the y-axis. A map distance scale bar calibrated to log intervals would have let a viewer judge the relative thickness of the alluvia.
proAntiByInetAdv.ggplot
placeholder
Again, the log scale prevents categories from getting squeezed off the plot, but over emphasizes their actual size. I can’t explain the y-scale. To get classified as social media, the expenditure’s purpose had to contain facebook, twitter etc. For search, it had to contain search, google, or bing. Web sucked up everything else.
proAntiByInetOutreach.ggplot
placeholder
Same issues with the dangers of log scale.
Cleaning will entail dropping rows where NA’s appear in any of expenditure amount (exp_amo), support/opposition (supp_opp), or election year (fec_election_yr).
Wherever the expenditure date (exp_dat) is missing the filing receipt date (rcp_dat) will be substituted.
Wherever the candidate’s party (cand_par_aff) is NA, we will try to impute it from the PAC’s (spe_id) other expenditures.
Dollar amounts will be adjusted for inflation to constant 2019 dollars. The purpose of each expenditure will be mutated to include codes indicating the activities involved.
qw similar to perl function of same name: split barewords in string into vector, default type character (what=character())
qw<-function (a, ... ) {
if (!hasArg(what)) return(scan(text = a, quiet = TRUE, what=character(),...))
scan(text = a, quiet = TRUE, ...)
#what = character(),
###########################################################
#### examples:
#####
##### vector<-qw(" a b c \"embedded space\" def")
##### vector
##### [1] "a" "b" "c" "embedded space" "def"
#####
##### vector<-qw("1 2 3")
##### vector
##### [1] "1" "2" "3"
#####
##### ... represents arguments to be passed to scan which may or may not work
###########################################################
}
nWc convert a numeric value to a string formatted with commas with an optional number of fractional digits.
nWc<-function(amt,precision=0) format(round(as.numeric(amt), precision), nsmall=precision, big.mark=",",justify = "right")
tallyNAsUniqs will return, by column, the number of NA’s and unique values in a tibble/dataframe. pretty=TRUE returns the skeleton of a knitable table.
tallyNAsUniqs <- function(exp, pretty=FALSE, caption="default") {
# pass argument caption or take the default
require(tidyverse)
# will require(pander) if pretty=TRUE
if (caption=="default") {
caption=paste("Tally of NA's and unique values in",
"all columns of",
deparse(substitute(exp)),
collapse = " ")
}
nas <- exp %>%
ungroup() %>%
summarise_all(~ sum(is.na(.)),~ n_distinct(.)) %>%
#summarise_all(~ n_distinct(.)) %>%
gather()
uniques <- exp %>%
ungroup() %>%
#summarise_all(~ sum(is.na(.)),~ n_distinct(.)) %>%
summarise_all(~ n_distinct(.)) %>%
gather()
nas<-cbind(nas,uniques$value)
names(nas)<-c("columnNames","naCount","uniqueCount")
nrows<-nrow(exp)
ncols<-ncol(exp)
nas<-nas %>%
add_row(columnNames="Total Rows and Columns",
naCount=nrows,
uniqueCount=ncols,
.before = 1)
# specifically for Rmarkdown:
if (pretty) {
#pretty print tally:
require(pander)
panderOptions('table.split.table', Inf)
set.caption(caption)
return(pander_return(nas, style = 'rmarkdown'))
return(nas)
##########################################################
#
#
# To print tally use command cat(nas, sep="\n")
# To knit, enclose in its own chunk:
##### ```{r, results='asis'}
##### cat(nas,"\n")
##### ```
#
#
##########################################################
}
return(nas)
}
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(pander)
library(readxl)
library(httr)
library(digest)
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(ggalluvial)
library(ggthemes)
for election years 2010-2020
urls<-as.character(c(
"https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2010/independent_expenditure_2010.csv",
"https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2012/independent_expenditure_2012.csv",
"https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2014/independent_expenditure_2014.csv",
"https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2016/independent_expenditure_2016.csv",
"https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2018/independent_expenditure_2018.csv",
"https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2020/independent_expenditure_2020.csv"
))
expenditure<-read_csv(urls[1])
for ( i in urls[-1] ) {
a<-read_csv(i)
expenditure<-bind_rows(expenditure,a)
}
# and do it now
firstNAcount.pander<-tallyNAsUniqs(expenditure,pretty = TRUE)
cat(firstNAcount.pander, sep="\n")
[1] “| columnNames | naCount | uniqueCount ||:———————-:|:——-:|:———–:|| Total Rows and Columns | 450039 | 23 || cand_id | 36474 | 1907 || cand_name | 22 | 4552 || spe_id | 0 | 1740 || spe_nam | 73 | 2191 || ele_type | 166 | 9 || can_office_state | 123470 | 63 || can_office_dis | 332 | 65 || can_office | 252 | 4 || cand_pty_aff | 40931 | 13 || exp_amo | 13 | 67400 || exp_date | 13921 | 2905 || agg_amo | 776 | 85895 || sup_opp | 171 | 5 || pur | 81 | 18102 || pay | 565 | 21367 || file_num | 0 | 32197 || amndt_ind | 0 | 10 || tran_id | 0 | 352943 || image_num | 0 | 107485 || receipt_dat | 0 | 2691 || fec_election_yr | 0 | 6 || prev_file_num | 401491 | 3444 || dissem_dt | 348574 | 1871 |: Tally of NA’s and unique values in all columns of expenditure” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA
expenditure<-sample_n(expenditure,50000)
expenditure<-expenditure %>%
# mutate coerce to "s" or "o"
mutate(sup_opp= tolower(sup_opp)) %>%
mutate(sup_opp= gsub("0","o",sup_opp)) %>%
filter(sup_opp=="o"|sup_opp=="s") %>%
drop_na(exp_amo,sup_opp) %>% # no point in dealing with no money or no point of view
mutate(exp_amo=round(exp_amo)) %>%
replace_na(list(pur="other",ele_type="other",pay="unknown")) %>%
# form unique identifier for row
mutate(uid=paste(tran_id,file_num)) %>%
filter(exp_amo>0) %>% # need expenditure amount to be positive
filter(exp_amo!=50000000) %>% # bogus $50M in-kind donation by Ponzi scheme suspect
filter(amndt_ind=="N") # new transactions only, filter ammendments
# replace na in exp_dat with corresponding receipt_dat
a<-is.na(expenditure$exp_date)
expenditure$exp_date[a]<-expenditure$receipt_dat[a]
# convert two digit year 20-NOV-10 --> 20-NOV-2010
a<-gsub("(?:(..)$)","20\\1",expenditure$exp_date,perl = TRUE)
expenditure$exp_date<-parse_date_time(a,"dmy")
Step 1: Get Consumer Price Index table from the Bureau of Labor Statistics
require(readxl)
require(httr)
url <- "https://www.bls.gov/cpi/research-series/allitems.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [https://www.bls.gov/cpi/research-series/allitems.xlsx]
## Date: 2019-12-03 08:23
## Status: 200
## Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
## Size: 16.6 kB
## <ON DISK> /tmp/RtmpeaIOAH/file260e2556c93b.xlsx
d<-read_excel(tf,skip=6,n_max=75)
# extract items of interest from 2008 to 2018
inflationFactor<-d[32:42,c("YEAR","AVG")]
lm<-lm(AVG~YEAR,inflationFactor)
f2019<-predict(lm,data.frame(YEAR=2019))
# add 2019 & f2019 to the factor tibble:
inflationFactor<-rbind(inflationFactor,as_tibble(list(YEAR=2019,AVG=f2019)))
Step 2: Set up helper functions to do conversion:
consDollar<-function(expAmount,expDate,toYear,factor=inflationFactor) {
f<-function(amt,date,toYear) {
# convert one dollar spent on date to toYear constant dollars:
# get from year factor:
fromYear=year(date)
fromFactor=factor$AVG[factor$YEAR==fromYear]
toFactor=factor$AVG[factor$YEAR==toYear]
return(amt*toFactor/fromFactor)
}
# use mapply to convert all vector quantities:
return( mapply(f,expAmount,expDate,toYear) )
}
Step3 perform the conversion to 2019 dollars
expenditure$exp_amo<-consDollar(expenditure$exp_amo,expenditure$exp_date,2019)
activityDistribution<- expenditure %>%
select(activity,exp_amo) %>%
# allow only rows of a single activity
filter(grepl("^[A-Za-z0-9]+$",activity)) %>%
group_by(activity) %>%
summarise(count=n(),aggSpend=sum(exp_amo,na.rm = TRUE),meanSpend=mean(exp_amo),sdSpend=sd(exp_amo)) %>%
arrange(desc(aggSpend))
a<-activityDistribution %>%
mutate(aggSpend=nWc(aggSpend), meanSpend=nWc(meanSpend),sdSpend=nWc(sdSpend))
require(pander)
panderOptions('table.split.table', Inf)
set.caption("Tally and aggregate amounts of single activities")
activityDistribution.pander<-pander_return(a, style = 'rmarkdown')
cat(activityDistribution.pander, sep="\n")
[1] “| activity | count | aggSpend | meanSpend | sdSpend ||:——–:|:—–:|:———–:|:———:|:——-:|| PMedia | 1319 | 272,327,529 | 206,465 | 459,728 || Mail | 1112 | 25,289,886 | 22,743 | 50,358 || Canvass | 743 | 6,700,579 | 9,018 | 40,747 || Adv | 171 | 5,475,692 | 32,022 | 82,587 || Tv | 26 | 4,434,845 | 170,571 | 224,841 || Phone | 560 | 3,982,140 | 7,111 | 17,567 || Stuff | 815 | 3,129,714 | 3,840 | 13,210 || Staff | 27128 | 2,822,474 | 104 | 1,068 || Data | 191 | 2,050,980 | 10,738 | 12,687 || Radio | 44 | 1,633,670 | 37,129 | 44,553 || Content | 243 | 1,330,640 | 5,476 | 10,730 || Other | 154 | 1,160,839 | 7,538 | 45,711 || Field | 35 | 932,087 | 26,631 | 44,637 || Consult | 250 | 908,087 | 3,632 | 6,134 || Events | 4033 | 452,281 | 112 | 1,077 || Web | 139 | 357,067 | 2,569 | 7,163 || Outdoor | 9 | 126,191 | 14,021 | 12,859 || Email | 43 | 117,577 | 2,734 | 4,635 || Donor | 156 | 72,515 | 465 | 821 || Mgmt | 373 | 43,365 | 116 | 361 || Text | 10 | 28,660 | 2,866 | 4,113 || SMedia | 4 | 15,376 | 3,844 | 4,902 || GOTV | 5 | 12,566 | 2,513 | 3,634 || Media | 3 | 10,378 | 3,459 | 5,508 || Search | 1 | 1,278 | 1,278 | NA |: Tally and aggregate amounts of single activities” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA
# generate N samples from a truncated normal distribution between between values a and b
rTruncNorm <- function(N, mean = 0, sd = 1, a = -Inf, b = Inf) {
if (a > b) stop('Error: Truncation range is empty');
U <- runif(N, pnorm(a, mean, sd), pnorm(b, mean, sd));
qnorm(U, mean, sd); }
# generate table of 1000 trunc Norm distributed values for each activity:
trNorm<-matrix(0,nrow=nrow(activityDistribution),ncol=1000)
row.names(trNorm)<-activityDistribution$activity
genTrNorm<-function(activity,mean,sd){
trNorm[activity,1:1000]<<-rTruncNorm(1000,mean=mean,sd=sd,a=0)
TRUE
}
a<- activityDistribution %>%
mutate(a=mapply(genTrNorm,activity,meanSpend,sdSpend)) %>%
filter(FALSE)
## Warning in runif(N, pnorm(a, mean, sd), pnorm(b, mean, sd)): NAs produced
getTrNorm<-function(activity) trNorm[activity,ceiling(runif(1)*1000)]
apportionActs<-function(activityString,exp_amo) {
a<-(strsplit(activityString," "))[[1]]
dist<- activityDistribution %>%
filter(activity %in% c(a) ) %>%
mutate(ct=n()) %>%
mutate(fracAmo=0) %>%
mutate(randomTruncatedNormal=getTrNorm(activity)) %>%
#mutate(randomTruncatedNormal=rTruncNorm(ct,mean=meanSpend,sd=sdSpend,a=0)) %>%
mutate(fracAmo=exp_amo*randomTruncatedNormal/sum(randomTruncatedNormal))
return(dist[,c("activity","fracAmo")])
}
# initialize tibble to hold apportioned rows
applyAppor<-function(Row) {
Row<-as.data.frame(Row,stringsAsFactors = FALSE)
appo<-apportionActs(as.character(Row$activity),Row$exp_amo)
for (i in 1:nrow(appo)) {
Row$activity<-appo$activity[i]
Row$exp_amo<-appo$fracAmo[i]
rowBoundApportions<<-rbind(rowBoundApportions,Row)
}
Row[-1,]
}
rowBoundApportions<-expenditure[1,]
rowBoundApportions<-rowBoundApportions[-1,]
# generate apportioned rows, add them to rowBoundApportions
discard <- expenditure %>%
filter(grepl(" ",activity)) %>%
rowwise() %>%
do(applyAppor(.)) %>%
filter(FALSE)
# add the apportioned rows to expenditures
expenditure<-rbind(expenditure,rowBoundApportions)
# drop the rows containing spaces that were apportioned
expenditure<-expenditure %>%
filter(!grepl(" ",activity))
# identify PACs (spe_id's) with expenditures for unspecified party affiliations
spesWoPartyAffiation<- expenditure %>%
filter(is.na(cand_pty_aff)) %>%
group_by(spe_id) %>%
summarise(ct=n())
# for these, find their expenditures where party affiliation is specified, then summarize
spesWpartyAffiliation<- expenditure %>%
filter(!is.na(cand_pty_aff), spe_id %in% spesWoPartyAffiation$spe_id) %>%
select(cand_pty_aff, spe_id,sup_opp, ele_type) %>%
group_by(spe_id,sup_opp,ele_type,cand_pty_aff) %>%
summarise(ct=n()) %>%
arrange(spe_id,sup_opp,ele_type,cand_pty_aff,ct)
# Function for imputing party affiliation
imputePartyAffiliation<-function(candPtyAff,speId,suppOpp,eleType ) {
# from spender id, support/opposition, party affiliation != NA construct frequency table for spender:
spenderFreq<- spesWpartyAffiliation %>%
filter(spe_id==speId, sup_opp==suppOpp, ele_type==eleType) %>%
group_by(cand_pty_aff) %>%
summarise(ct=sum(ct)) %>%
arrange(ct)
if (nrow(spenderFreq) == 0) return(NA) # no other affiliations
if (nrow(spenderFreq) == 1) return(spenderFreq$cand_pty_aff[1]) # only one part affiliation
# PAC supports/oppose candidates of multiple parties:
# pick one from a probability weighted summary of PAC's parties
bins<-c(0,spenderFreq$ct)[1:nrow(spenderFreq)]
i<-findInterval(runif(1)*sum(spenderFreq$ct),bins)
# return imputed party affiliation
return(spenderFreq$cand_pty_aff[i])
}
mapplyImputePartyAffiliation<-function(cand_pty_aff, spe_id,sup_opp,ele_type) {
mapply(imputePartyAffiliation,cand_pty_aff, spe_id,sup_opp,ele_type)
}
affils <- expenditure %>%
filter(is.na(cand_pty_aff))
affils <- affils %>%
mutate(cand_pty_aff= mapplyImputePartyAffiliation(cand_pty_aff, spe_id,sup_opp,ele_type)) %>%
#mutate(cand_pty_aff= mapply(imputePartyAffiliation,cand_pty_aff, spe_id,sup_opp,ele_type)) %>%
drop_na(cand_pty_aff)
expenditure<-rbind(expenditure,affils)
expenditure<-expenditure %>% drop_na(cand_pty_aff)
# summarise activities:
a<- expenditure %>%
select(activity,exp_amo) %>%
group_by(activity) %>%
summarise(count=n(),spend=sum(exp_amo)) %>%
arrange(desc(spend)) %>%
mutate(kDollar=sprintf("$%0.2fK",spend/1000)) %>%
select(activity,count,kDollar)
panderOptions('table.split.table', Inf)
set.caption("Tally and aggregate amounts of single activities")
singleActivityAggSpend.pander<-pander_return(a, style = 'rmarkdown')
cat(singleActivityAggSpend.pander, sep="\n")
[1] “| activity | count | kDollar ||:——–:|:—–:|:———–:|| PMedia | 2252 | $319036.50K || Tv | 803 | $104953.40K || Mail | 1680 | $29750.41K || Content | 2115 | $10104.95K || Canvass | 1104 | $9192.28K || Radio | 481 | $6528.01K || Data | 938 | $5566.16K || Phone | 913 | $5307.36K || Stuff | 1220 | $4118.92K || Staff | 27371 | $2581.45K || Field | 150 | $1666.47K || Consult | 329 | $1243.06K || Other | 120 | $1053.55K || Events | 4269 | $572.92K || Outdoor | 26 | $310.22K || GOTV | 95 | $174.53K || Email | 150 | $155.12K || Text | 75 | $109.18K || Donor | 225 | $78.10K || Mgmt | 389 | $42.53K || Media | 103 | $33.61K || Adv | 2904 | $NAK || Search | 40 | $NAK || SMedia | 73 | $NAK || Web | 1791 | $NAK |: Tally and aggregate amounts of single activities” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA
a<-sort(unique(expenditure$activity))
names(a)<-a
a[qw(" GOTV Events Field Phone Stuff Mail Canvass")]<-"Outreach"
a[qw("Text Email")]<-"OutreachV2"
a[qw("Radio PMedia Tv Adv Outdoor Media")]<-"Advertising"
a[qw("Web SMedia Search")]<-"AdvertisingV2"
a[qw("Data Consult Donor Content")]<-"Strategy"
a[qw("Staff Other Mgmt")]<-"Overhead"
expenditure[,"activityGroup"]<-a[expenditure$activity]
load(url("https://github.com/sdutky/mcData110/raw/master/pacExpenditures/expenditure.rdata"))
Let’s see what we have got:
# look at the NA's:
summaryOfRowsAndNAs.pander<-tallyNAsUniqs(expenditure,pretty=TRUE,
caption="Summary of Row and NA's in cleaned expenditure\nMost of NA's are in rows that don't matter")
cat(summaryOfRowsAndNAs.pander, sep="\n")
[1] “| columnNames | naCount | uniqueCount ||:———————-:|:——-:|:———–:|| Total Rows and Columns | 451772 | 27 || cand_id | 22446 | 1873 || cand_name | 19 | 3982 || spe_id | 0 | 1420 || spe_nam | 1 | 1779 || ele_type | 0 | 9 || can_office_state | 118519 | 63 || can_office_dis | 266 | 55 || can_office | 188 | 4 || cand_pty_aff | 0 | 11 || exp_amo | 0 | 147449 || exp_date | 0 | 2840 || agg_amo | 807 | 71179 || sup_opp | 0 | 2 || pur | 0 | 14817 || pay | 0 | 18759 || file_num | 0 | 24935 || amndt_ind | 0 | 1 || tran_id | 0 | 340043 || image_num | 0 | 90698 || receipt_dat | 0 | 2500 || fec_election_yr | 0 | 6 || prev_file_num | 451717 | 4 || dissem_dt | 332521 | 1831 || uid | 0 | 384624 || tags | 0 | 12872 || activity | 0 | 25 || activityGroup | 0 | 6 |: Summary of Row and NA’s in cleaned expenditureof NA’s are in rows that don’t matter” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA
a<-expenditure %>%
group_by(activity, activityGroup) %>%
summarise(count=nWc(n(),0),
aggOrder=sum(exp_amo),
aggSpend=nWc(sum(exp_amo)/1000,0),
meanSpend=nWc(mean(exp_amo),0),
sdSpend=nWc(sd(exp_amo),0)) %>%
arrange(desc(aggOrder)) %>%
select(activity,activityGroup,count,aggSpend,meanSpend,sdSpend)
names(a)<-qw("activity activityGroup count \"aggSpend ($K)\" meanSpend sdSpend")
panderOptions('table.split.table', Inf)
set.caption("Summary of Independent Expenditures by Activity Group elections 2010-2020\n(constant 2019 dollars)")
expendituresByActivityGroup.pander<- pander_return(a, style = 'rmarkdown')
cat(expendituresByActivityGroup.pander, sep="\n")
[1] “| activity | activityGroup | count | aggSpend ($K) | meanSpend | sdSpend ||:——–:|:————-:|:——-:|:————-:|:———:|:——-:|| PMedia | Advertising | 20,365 | 2,753,480 | 135,206 | 401,204 || Tv | Advertising | 7,470 | 945,257 | 126,540 | 347,299 || Adv | Advertising | 26,937 | 656,265 | 24,363 | 114,512 || Mail | Outreach | 15,458 | 285,978 | 18,500 | 51,338 || Canvass | Outreach | 10,847 | 108,001 | 9,957 | 52,318 || Content | Strategy | 20,032 | 65,932 | 3,291 | 16,079 || Web | AdvertisingV2 | 16,267 | 64,452 | 3,962 | 17,673 || Radio | Advertising | 4,618 | 61,020 | 13,214 | 37,067 || Data | Strategy | 8,342 | 50,557 | 6,061 | 18,150 || Phone | Outreach | 8,389 | 49,150 | 5,859 | 18,480 || Stuff | Outreach | 11,552 | 39,075 | 3,383 | 14,258 || Staff | Overhead | 247,730 | 22,092 | 89 | 865 || Field | Outreach | 1,362 | 18,736 | 13,756 | 53,668 || Consult | Strategy | 2,558 | 13,379 | 5,230 | 17,552 || GOTV | Outreach | 915 | 7,403 | 8,091 | 27,474 || Events | Outreach | 37,324 | 7,123 | 191 | 2,369 || Other | Overhead | 1,130 | 5,572 | 4,931 | 26,471 || Outdoor | Advertising | 256 | 3,404 | 13,295 | 67,183 || Email | OutreachV2 | 1,565 | 2,409 | 1,539 | 9,245 || Text | OutreachV2 | 772 | 1,019 | 1,319 | 7,138 || Donor | Strategy | 1,779 | 770 | 433 | 2,023 || Mgmt | Overhead | 3,921 | 444 | 113 | 803 || Media | Advertising | 1,013 | 442 | 436 | 2,929 || SMedia | AdvertisingV2 | 785 | 177 | 226 | 781 || Search | AdvertisingV2 | 385 | 69 | 179 | 905 |: Summary of Independent Expenditures by Activity Group elections 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA
supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")
a<-expenditure %>%
group_by(cand_pty_aff, sup_opp) %>%
summarise(count=nWc(n(),0),
aggOrder=sum(exp_amo),
aggSpend=nWc(sum(exp_amo)/1000,0),
meanSpend=nWc(mean(exp_amo),0),
sdSpend=nWc(sd(exp_amo),0)) %>%
arrange(desc(aggOrder)) %>%
mutate(oS=supOppRecode[sup_opp]) %>%
select(oS,cand_pty_aff,count,aggSpend,meanSpend,sdSpend)
names(a)<-qw("For/Against Target count \"aggSpend ($K)\" meanSpend sdSpend")
panderOptions('table.split.table', Inf)
set.caption("Summary of PAC Expenditures U.S. elections 2010-2020\n(constant 2019 dollars)")
summaryPACexpenditures.pander<-pander_return(a, style = 'rmarkdown')
cat(summaryPACexpenditures.pander, sep="\n")
[1] “| For/Against | Target | count | aggSpend ($K) | meanSpend | sdSpend ||:———–:|:———————–:|:——-:|:————-:|:———:|:——-:|| Against | REPUBLICAN PARTY | 143,853 | 2,054,377 | 14,281 | 105,278 || Against | DEMOCRATIC PARTY | 56,943 | 1,753,791 | 30,799 | 218,979 || For | REPUBLICAN PARTY | 49,684 | 760,864 | 15,314 | 101,539 || For | DEMOCRATIC PARTY | 197,613 | 506,035 | 2,561 | 30,463 || Against | DEMOCRATIC-FARMER-LABOR | 346 | 38,640 | 111,675 | 211,588 || Against | INDEPENDENT | 1,652 | 13,095 | 7,927 | 44,229 || For | LIBERTARIAN PARTY | 235 | 12,526 | 53,303 | 168,517 || For | DEMOCRATIC-FARMER-LABOR | 660 | 9,778 | 14,815 | 57,783 || Against | UNKNOWN | 130 | 4,251 | 32,698 | 62,889 || For | INDEPENDENT | 201 | 2,899 | 14,422 | 23,363 || For | UNKNOWN | 326 | 2,019 | 6,193 | 16,996 || Against | NONE | 57 | 1,837 | 32,228 | 73,357 || Against | CONSTITUTION PARTY | 14 | 1,340 | 95,713 | 219,706 || Against | LIBERTARIAN PARTY | 7 | 396 | 56,551 | 83,693 || Against | OTHER | 15 | 129 | 8,605 | 23,606 || For | NONE | 19 | 124 | 6,538 | 7,425 || For | GREEN PARTY | 12 | 102 | 8,507 | 10,988 || For | CONSTITUTION PARTY | 2 | 1 | 749 | 433 || For | OTHER | 1 | 0 | 466 | NA || For | CONSERVATIVE PARTY | 2 | 0 | 1 | 0 |: Summary of PAC Expenditures U.S. elections 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA
# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)] <-"other"
supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")
a<-expenditure %>%
mutate(target=partyRecode[cand_pty_aff]) %>%
mutate(oS=supOppRecode[sup_opp]) %>%
mutate(cycle=as.character(fec_election_yr)) %>%
group_by(cycle,oS,target) %>%
#group_by(target, sup_opp) %>%
summarise(
aggSpend=sum(exp_amo)
) %>%
#filter(target=="Dem") %>%
select(cycle,oS,target,aggSpend) %>%
arrange(cycle,oS,target)
# ggplot it:
proAntiByParty.ggplot <- ggplot(as.data.frame(a),
aes(y = aggSpend,
axis1 = cycle, axis2 = oS, axis3 = target)) +
#geom_alluvium(aes(fill =cycle))+ #, width = -1, knot.pos = 1/10, reverse = FALSE) +
geom_alluvium(aes(fill =cycle) , width = 1/6, knot.pos = 1/6, reverse = FALSE) +
guides(fill = FALSE) +
geom_stratum(width = 1/5, reverse = FALSE) +
geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","For/Against", "Target")) +
scale_y_continuous(name="total spent")+
#scale_y_continuous(trans = "log10",name="total spent") +
#scale_y_log10(name="total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
coord_flip() +
theme_economist()+
ggtitle("Money Flows for U.S. Elections 2010-2020")
proAntiByParty.ggplot
# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)] <-"Other"
supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")
# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr
a<-expenditure %>%
#mutate(activityGroup=agRecode[activityGroup]) %>%
mutate(target=partyRecode[cand_pty_aff]) %>%
mutate(oS=supOppRecode[sup_opp]) %>%
mutate(proAnti=paste(oS,target,sep="")) %>%
mutate(cycle=as.character(fec_election_yr)) %>%
group_by(cycle,proAnti,activityGroup) %>%
summarise(
aggSpend=sum(exp_amo)
) %>%
filter(aggSpend>1) %>%
filter(cycle %in% qw( "2010 2018")) %>%
select(cycle,proAnti,activityGroup,aggSpend) %>%
arrange(cycle,proAnti,activityGroup)
# ggplot it:
proAntiByCampaignActivity.ggplot <- ggplot(as.data.frame(a),
aes(y = aggSpend,
axis1 = cycle, axis3 = proAnti, axis2 = activityGroup)) +
#geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
guides(fill = FALSE) +
geom_stratum(width = 1/5, reverse = FALSE) +
#geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
#geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
#geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
#geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
#scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means", "Pro/Anti")) +
scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means", "Pro/Anti")) +
#scale_y_continuous(name="total spent")+
#scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
#coord_trans(y="log10")+
#scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
#scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
#coord_flip() +
theme_economist()+
ggtitle("Money Flows for U.S. Elections By Activity 2010 & 2018")
proAntiByCampaignActivity.ggplot
# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)] <-"Other"
supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")
# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr
a<-expenditure %>%
filter(activityGroup=="AdvertisingV2") %>%
mutate(target=partyRecode[cand_pty_aff]) %>%
mutate(oS=supOppRecode[sup_opp]) %>%
mutate(proAnti=paste(oS,target,sep="")) %>%
mutate(cycle=as.character(fec_election_yr)) %>%
group_by(cycle,proAnti,activity) %>%
summarise(
aggSpend=sum(exp_amo)
) %>%
filter(aggSpend>1) %>%
filter(cycle %in% qw( "2010 2018")) %>%
select(cycle,proAnti,activity,aggSpend) %>%
arrange(cycle,proAnti,activity)
# ggplot it:
proAntiByInetAdv.ggplot <- ggplot(as.data.frame(a),
aes(y = aggSpend,
axis1 = cycle, axis3 = proAnti, axis2 = activity)) +
#geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
guides(fill = FALSE) +
geom_stratum(width = 1/5, reverse = FALSE) +
#geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
#geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
#geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
#geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
#scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means", "Pro/Anti")) +
scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means", "Pro/Anti")) +
#scale_y_continuous(name="total spent")+
#scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
#coord_trans(y="log10")+
#scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
#scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
#coord_flip() +
theme_economist()+
ggtitle("Money Flows for U.S. Elections By Internet Advertising 2010-2020")
proAntiByInetAdv.ggplot
# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)] <-"Other"
supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")
# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr
a<-expenditure %>%
filter(activityGroup=="OutreachV2") %>%
mutate(target=partyRecode[cand_pty_aff]) %>%
mutate(oS=supOppRecode[sup_opp]) %>%
mutate(proAnti=paste(oS,target,sep="")) %>%
mutate(cycle=as.character(fec_election_yr)) %>%
group_by(cycle,proAnti,activity) %>%
summarise(
aggSpend=sum(exp_amo)
) %>%
filter(aggSpend>1) %>%
filter(cycle %in% qw( "2010 2018")) %>%
select(cycle,proAnti,activity,aggSpend) %>%
arrange(cycle,proAnti,activity)
# ggplot it:
proAntiByInetOutreach.ggplot <- ggplot(as.data.frame(a),
aes(y = aggSpend,
axis1 = cycle, axis3 = proAnti, axis2 = activity)) +
#geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
guides(fill = FALSE) +
geom_stratum(width = 1/5, reverse = FALSE) +
#geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
#geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
#geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
#geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
#scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means", "Pro/Anti")) +
scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means", "Pro/Anti")) +
#scale_y_continuous(name="total spent")+
#scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
#coord_trans(y="log10")+
#scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
#scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
#coord_flip() +
theme_economist()+
ggtitle("Money Flows for U.S. Elections By Internet Outreach 2010-2020")
proAntiByInetOutreach.ggplot
affils <- expenditure %>%
select(spe_id,ele_type,cand_pty_aff,sup_opp) %>%
group_by(spe_id,ele_type,sup_opp,cand_pty_aff ) %>%
summarize(ct=n())
naAffils<-affils %>% filter(is.na(cand_pty_aff)) %>% group_by(spe_id) %>% summarise()
affils<-affils %>% filter(spe_id %in% naAffils$spe_id) %>% select(everything())
oneOrMoreAffils <- affils %>% filter(!is.na(cand_pty_aff)) %>% group_by(spe_id) %>% summarise()
affils<-affils %>% filter(spe_id %in% oneOrMoreAffils$spe_id ) %>% select(everything())
nasImputable<- affils %>% filter(is.na(cand_pty_aff)) %>% group_by(sup_opp) %>% summarise(sum(ct))
nasSum<-expenditure %>% select(everything()) %>% filter(is.na(cand_pty_aff)) %>% group_by(sup_opp) %>% summarise(ct=n())
b<-grep("internet|digital|onlin|web",a$tags,value = TRUE)
b<-grep("print",a$tags,value = TRUE)
b<-grep("survey|research|poll( |ing)|data",a$tags,value = TRUE)
b<-grep("canv|door[ ]*to",a$tags,value = TRUE)
b<-grep("[A-Z]",a$tags,value = TRUE)
b<-grep("phone|calls",a$tags,value = TRUE)
#b<-grep("",a$tags,value = TRUE)
view(b)
b<- a %>%
select(cand_id,cand_name,cand_pty_aff,activity) %>%
filter(is.na(cand_pty_aff) & !is.na(cand_name)) %>%
group_by(cand_name) %>%
summarise(ct=n()) %>%
arrange(desc(ct))
f<-function(a) a %in% b$cand_name
c<- a %>%
select(cand_id,cand_name,cand_pty_aff,activity) %>%
filter( cand_name %in% b$cand_name) %>%
group_by(cand_name, cand_pty_aff) %>%
summarise(ct=n())
#c<-sort(unique(b$cand_name))
view(c)
headers <- as_tibble(data.frame(matrix(nrow=0,ncol=50)))
colnames(headers)<-paste("field",1:50,sep="")
for ( i in urls ) {
a<-read.csv(i,nrows=1, header = FALSE)
a[1,(length(a)+1):50]<-NA
names(a)=paste("field",1:50,sep="")
headers<-rbind(headers,a)
}
summary(headers) # Summary above shows all column headers identical in all urls
require(digest)
md5<-function(a) {
data.frame(md5=digest(a,algo="md5",serialize = FALSE),stringsAsFactors = FALSE)
}
cksums<- expenditure %>%
rowwise() %>%
do(md5(paste(.,collapse = " ")))
ckSumCount <- cksums %>%
group_by(md5) %>%
summarise(ct=n()) %>%
filter(ct>1)
cat(paste("\n",nrow(ckSumCount)," duplicate rows found", collapse=" "))
a<-ungroup(expenditure[1:100,])
c<-a[1,]
c<-c[-1,]
prt<-function(a) {
c<<-rbind(c,a)
TRUE
}
b<- expenditure %>%
filter(is.na(cand_pty_aff)) %>%
mutate(prt(as.data.frame(.)))
speId<-"C00000935"; suppOpp<-"o"; eleType<-"G"
spenderFreq<- spesWpartyAffiliation %>%
filter(spe_id==speId, sup_opp==suppOpp, ele_type==eleType) %>%
group_by(cand_pty_aff) %>%
summarise(ct=sum(ct)) %>%
arrange(ct)
a<-expenditure %>%
select(exp_amo,pay,spe_nam,sup_opp,cand_pty_aff) %>%
group_by(pay,sup_opp,cand_pty_aff) %>%
summarise(total=sum(exp_amo)) %>%
arrange(desc(total),pay,sup_opp,cand_pty_aff)
view(a)
supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")
electionRoundRecode<-qw("other other general other other primary \"run off\" special other")
names(electionRoundRecode)<-qw("2 C G O other P R S X")
a<-expenditure %>%
select(fec_election_yr,ele_type,sup_opp,exp_amo) %>%
mutate(sup_opp=supOppRecode[sup_opp]) %>%
mutate(ele_type=electionRoundRecode[ele_type])
#select(-ele_type,-sup_opp)
a<-a %>%
group_by(fec_election_yr,ele_type,sup_opp) %>%
summarise(count=nWc(n(),0),
aggOrder=sum(exp_amo),
aggSpend=nWc(sum(exp_amo)/1000,0),
meanSpend=nWc(mean(exp_amo),0),
sdSpend=nWc(sd(exp_amo),0)) %>%
arrange(desc(fec_election_yr)) %>%
select(fec_election_yr,ele_type,sup_opp,count,aggSpend,meanSpend,sdSpend)
names(a)<-qw("\"Election Cycle\" Round For/Against count \"aggSpend ($K)\" meanSpend sdSpend")
panderOptions('table.split.table', Inf)
set.caption("Summary of PAC Expenditures by Election Round 2010-2020\n(constant 2019 dollars)")
summaryPACbyElectionRound.pander<-pander_return(a, style = 'rmarkdown')
cat(summaryPACbyElectionRound.pander, sep="\n")
[1] “| Election Cycle | Round | For/Against | count | aggSpend ($K) | meanSpend | sdSpend ||:————–:|:——-:|:———–:|:——:|:————-:|:———:|:——-:|| 2020 | general | Against | 51 | 278 | 5,454 | 9,886 || 2020 | general | For | 1,014 | 9,903 | 9,767 | 17,483 || 2020 | other | Against | 26 | 81 | 3,112 | 3,211 || 2020 | other | For | 167 | 2,987 | 17,884 | 59,240 || 2020 | primary | Against | 1,865 | 6,614 | 3,547 | 41,404 || 2020 | primary | For | 1,372 | 5,255 | 3,830 | 20,199 || 2020 | run off | Against | 7 | 285 | 40,767 | 45,707 || 2020 | run off | For | 5 | 42 | 8,308 | 8,548 || 2020 | special | Against | 186 | 5,855 | 31,480 | 96,821 || 2020 | special | For | 201 | 2,407 | 11,977 | 35,223 || 2018 | general | Against | 14,379 | 733,458 | 51,009 | 186,954 || 2018 | general | For | 23,251 | 206,712 | 8,890 | 57,299 || 2018 | other | Against | 513 | 6,856 | 13,365 | 58,718 || 2018 | other | For | 1,680 | 6,743 | 4,014 | 21,871 || 2018 | primary | Against | 2,939 | 56,368 | 19,179 | 89,028 || 2018 | primary | For | 4,217 | 64,726 | 15,349 | 74,312 || 2018 | run off | Against | 483 | 12,044 | 24,935 | 84,582 || 2018 | run off | For | 550 | 6,075 | 11,045 | 30,670 || 2018 | special | Against | 858 | 28,359 | 33,053 | 104,413 || 2018 | special | For | 934 | 11,360 | 12,162 | 41,020 || 2016 | general | Against | 69,523 | 944,632 | 13,587 | 125,793 || 2016 | general | For | 86,962 | 228,646 | 2,629 | 39,993 || 2016 | other | Against | 31 | 1,455 | 46,949 | 188,834 || 2016 | other | For | 180 | 1,585 | 8,808 | 20,762 || 2016 | primary | Against | 4,038 | 119,120 | 29,500 | 120,307 || 2016 | primary | For | 11,306 | 268,854 | 23,780 | 150,640 || 2016 | run off | Against | 173 | 2,501 | 14,459 | 46,574 || 2016 | run off | For | 206 | 922 | 4,477 | 17,753 || 2016 | special | Against | 4 | 1 | 361 | 484 || 2016 | special | For | 44 | 46 | 1,054 | 2,084 || 2014 | general | Against | 40,235 | 504,614 | 12,542 | 75,207 || 2014 | general | For | 24,257 | 111,841 | 4,611 | 34,631 || 2014 | other | Against | 167 | 6,523 | 39,062 | 106,886 || 2014 | other | For | 620 | 5,769 | 9,304 | 29,354 || 2014 | primary | Against | 1,945 | 50,132 | 25,775 | 74,947 || 2014 | primary | For | 3,386 | 33,860 | 10,000 | 35,651 || 2014 | run off | Against | 282 | 7,045 | 24,983 | 91,195 || 2014 | run off | For | 549 | 1,527 | 2,782 | 7,796 || 2014 | special | Against | 472 | 10,448 | 22,135 | 70,743 || 2014 | special | For | 513 | 5,143 | 10,025 | 48,684 || 2012 | general | Against | 51,553 | 903,675 | 17,529 | 202,616 || 2012 | general | For | 51,536 | 178,875 | 3,471 | 51,792 || 2012 | other | Against | 268 | 36,996 | 138,044 | 540,450 || 2012 | other | For | 177 | 6,834 | 38,609 | 99,930 || 2012 | primary | Against | 2,277 | 78,750 | 34,585 | 170,440 || 2012 | primary | For | 3,943 | 50,381 | 12,777 | 65,992 || 2012 | run off | Against | 133 | 7,435 | 55,899 | 167,544 || 2012 | run off | For | 234 | 3,012 | 12,871 | 43,206 || 2012 | special | Against | 125 | 4,377 | 35,018 | 64,050 || 2012 | special | For | 100 | 1,494 | 14,943 | 38,417 || 2010 | general | Against | 9,776 | 330,396 | 33,797 | 109,317 || 2010 | general | For | 26,162 | 60,252 | 2,303 | 19,627 || 2010 | other | Against | 19 | 272 | 14,335 | 30,017 || 2010 | other | For | 93 | 2,205 | 23,714 | 59,487 || 2010 | primary | Against | 471 | 3,239 | 6,876 | 21,778 || 2010 | primary | For | 3,769 | 8,855 | 2,350 | 14,903 || 2010 | run off | Against | 24 | 151 | 6,286 | 7,574 || 2010 | run off | For | 946 | 1,501 | 1,586 | 17,447 || 2010 | special | Against | 194 | 5,894 | 30,379 | 55,885 || 2010 | special | For | 381 | 6,537 | 17,158 | 48,948 |: Summary of PAC Expenditures by Election Round 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA