Purpose: Describe the Money Flows of Independent Political Campaigns

This project will examine the expenditures of independent groups supporting or opposing electoral candidates as reported to the Federal Election Commission (FEC) for elections from 2010 to 2020.

These groups, commonly called Political Action Committees (PACs), may be formed by individuals, corporations, and organizations, including political parties. They can advocate for issues as well as support or oppose candidates for office. The FEC requires that when they take actions supporting or opposing a candidate, they operate independently of any candidates’ campaign. They must also report to the FEC all expenditures they make advocating in regard to specific candidates. The FEC terms these as “indpendent expenditures”. For each election cycle, the FEC extracts and gathers the data of each independent expenditure and makes it available to download.

Bear in mind that PAC’s are not required to report expenditures for purposes performed not associated with a candidate for office. Additionally, PAC’s have motives to obscure the purposes of their reported expenditures.

The other purpose of this project is to explore the ggalluvial package for constructing flow diagrams that connect flows (alluvia) of continuous variables across strata of multiple categories. In this case, the alluvia will be money and the strata will consist of political affiliations, support and opposition, traditional and Internet enabled political activities.

Expenditure Data Items: Cleaning and Preparation

The FEC collects twenty three comma separated values describing each expenditure. The following are of interest here:

Expenditure Data Fields to be Described
field desc class
spe_id Spender ID: PAC uid Character
spe_nam Spender name Character
ele_typ Election type Character
can_off Office Character
cand_par_aff Party Character
exp_amo Expenditure amount Currency
exp_dat Expenditure date Date
rcp_dat Filing receipt date Date
sup_opp Support/Oppose Character
pur Purpose of expenditure Character
fec_election_yr Two-year period Date
tran_id expenditure within filing Character
file_num Filing uid Character
NB: The tran_id and file_num uniquely identify the expenditure

Load and display pre-processed tables and plots from gitHub:

load(url("https://github.com/sdutky/mcData110/raw/master/pacExpenditures/alluvialPacTableAndGraphs.rdata"))

Display the pander tables:

cat(firstNAcount.pander,sep = "\n")
Tally of NA’s and unique values in all columns of expenditure
columnNames naCount uniqueCount
Total Rows and Columns 450039 23
cand_id 36474 1907
cand_name 22 4552
spe_id 0 1740
spe_nam 73 2191
ele_type 166 9
can_office_state 123470 63
can_office_dis 332 65
can_office 252 4
cand_pty_aff 40931 13
exp_amo 13 67400
exp_date 13921 2905
agg_amo 776 85895
sup_opp 171 5
pur 81 18102
pay 565 21367
file_num 0 32197
amndt_ind 0 10
tran_id 0 352943
image_num 0 107485
receipt_dat 0 2691
fec_election_yr 0 6
prev_file_num 401491 3444
dissem_dt 348574 1871

The above table describes the data comprising expenditure as downloaded from the FEC.

We will need to drop the rows where either of an expenditure’s amount (exp_amo) or whether it was to support or oppose (sup_opp) a candidate is unknown. We can pigeonhole unknown purposes (pur) as other. The date a filing was received by the FEC (receipt_dat) can substitute for unknown expenditure dates (exp_dat). Nearly 10% of expenditures have missing party affiliations. Where the spender (spe_id) has other expenditures, we can try to use the distributions of the parties they support or oppose to impute the value for expenditures where this missing. The remaining fields don’t particularly concern us.

cat(activityDistribution.pander,sep = "\n")
Tally and aggregate amounts of single activities
activity count aggSpend meanSpend sdSpend
PMedia 1248 244,433,893 195,860 457,272
Mail 1115 23,610,815 21,176 35,347
Other 128 9,453,373 73,854 734,394
Canvass 805 7,618,535 9,464 43,073
Adv 155 7,151,514 46,139 180,391
Phone 540 4,617,834 8,552 27,131
Tv 25 3,760,731 150,429 276,512
Stuff 822 3,662,051 4,455 21,330
Staff 27159 2,649,062 98 907
Data 207 2,391,795 11,555 21,819
Content 283 1,892,367 6,687 34,423
Radio 31 1,374,529 44,340 79,941
Consult 229 1,092,029 4,769 11,455
Field 23 617,991 26,869 32,789
Events 3955 592,869 150 2,363
Web 139 451,448 3,248 10,902
Email 41 84,891 2,071 4,572
Donor 138 79,802 578 1,409
Outdoor 6 68,972 11,495 4,390
Mgmt 402 46,065 115 432
Text 13 19,796 1,523 1,621
SMedia 6 6,603 1,101 853
Search 1 2,762 2,762 NA
GOTV 3 317 106 46

Cleaning the entire expenditure dataset of 450 thousand row took eighteen hours on my laptop and significant coaxing. I saved the cleaned dataset from which I produced plots, but I used a 50K sample to run the parts of the script that did the cleaning. The script completes in a reasonable time, but I believe the smaller sample introduced some NA’s. I have not tracked down the problems.

We extracted characteristic strings from the purpose of each expenditure to assign them to one or more of these activities. This table breaks out those expenditures that have only a single activity. We will take the expenditures that we classified with multiple activities and generate multiple rows of expenditures each having a single activity. We will apportion the amount of each “new” expenditure by applying a factor derived from a truncated normal sample (eliminating negative amounts) to the amount of the original expenditure.

cat(singleActivityAggSpend.pander,sep = "\n")
Tally and aggregate amounts of single activities
activity count kDollar
PMedia 2114 $280450.27K
Tv 808 $94659.79K
Mail 1600 $25644.89K
Canvass 1103 $10987.62K
Content 2104 $10223.20K
Phone 872 $5928.58K
Stuff 1227 $5859.88K
Data 917 $5369.05K
Radio 489 $5204.31K
Field 148 $2891.66K
Staff 27394 $2451.96K
Consult 269 $1213.82K
Other 93 $920.68K
Events 4157 $759.03K
Email 155 $144.63K
Outdoor 22 $93.97K
Donor 195 $85.02K
Mgmt 428 $45.75K
GOTV 92 $24.21K
Text 82 $22.60K
Adv 3010 $NAK
Search 43 $NAK
SMedia 81 $NAK
Web 1797 $NAK

This shows the complete sample (more or less) with single activities apportioned

cat(summaryOfRowsAndNAs.pander,sep = "\n")
Summary of Row and NA’s in cleaned expenditure Most of NA’s are in rows that don’t matter
columnNames naCount uniqueCount
Total Rows and Columns 451772 27
cand_id 22446 1873
cand_name 19 3982
spe_id 0 1420
spe_nam 1 1779
ele_type 0 9
can_office_state 118519 63
can_office_dis 266 55
can_office 188 4
cand_pty_aff 0 11
exp_amo 0 147449
exp_date 0 2840
agg_amo 807 71179
sup_opp 0 2
pur 0 14817
pay 0 18759
file_num 0 24935
amndt_ind 0 1
tran_id 0 340043
image_num 0 90698
receipt_dat 0 2500
fec_election_yr 0 6
prev_file_num 451717 4
dissem_dt 332521 1831
uid 0 384624
tags 0 12872
activity 0 25
activityGroup 0 6

After the sample run, I imported the entire cleaned expenditure dataset. Only a very few NA’s appear in columns that concern us

cat(expendituresByActivityGroup.pander,sep = "\n")
Summary of Independent Expenditures by Activity Group elections 2010-2020 (constant 2019 dollars)
activity activityGroup count aggSpend ($K) meanSpend sdSpend
PMedia Advertising 20,365 2,753,480 135,206 401,204
Tv Advertising 7,470 945,257 126,540 347,299
Adv Advertising 26,937 656,265 24,363 114,512
Mail Outreach 15,458 285,978 18,500 51,338
Canvass Outreach 10,847 108,001 9,957 52,318
Content Strategy 20,032 65,932 3,291 16,079
Web AdvertisingV2 16,267 64,452 3,962 17,673
Radio Advertising 4,618 61,020 13,214 37,067
Data Strategy 8,342 50,557 6,061 18,150
Phone Outreach 8,389 49,150 5,859 18,480
Stuff Outreach 11,552 39,075 3,383 14,258
Staff Overhead 247,730 22,092 89 865
Field Outreach 1,362 18,736 13,756 53,668
Consult Strategy 2,558 13,379 5,230 17,552
GOTV Outreach 915 7,403 8,091 27,474
Events Outreach 37,324 7,123 191 2,369
Other Overhead 1,130 5,572 4,931 26,471
Outdoor Advertising 256 3,404 13,295 67,183
Email OutreachV2 1,565 2,409 1,539 9,245
Text OutreachV2 772 1,019 1,319 7,138
Donor Strategy 1,779 770 433 2,023
Mgmt Overhead 3,921 444 113 803
Media Advertising 1,013 442 436 2,929
SMedia AdvertisingV2 785 177 226 781
Search AdvertisingV2 385 69 179 905

I consolidated the twenty five activities into six activityGroups that can be meaningfully represented on a plot. Outreach consists of connecting with voters, V2 includes email and texting. Advertising is a broadcast medium, V2 includes, social media and Internet search. Strategy combines polling, fundraising (here called Donor so that it can fit on a plot), and consulting. Everything else is consigned to Overhead.

cat(summaryPACexpenditures.pander,sep = "\n")
Summary of PAC Expenditures U.S. elections 2010-2020 (constant 2019 dollars)
For/Against Target count aggSpend ($K) meanSpend sdSpend
Against REPUBLICAN PARTY 143,853 2,054,377 14,281 105,278
Against DEMOCRATIC PARTY 56,943 1,753,791 30,799 218,979
For REPUBLICAN PARTY 49,684 760,864 15,314 101,539
For DEMOCRATIC PARTY 197,613 506,035 2,561 30,463
Against DEMOCRATIC-FARMER-LABOR 346 38,640 111,675 211,588
Against INDEPENDENT 1,652 13,095 7,927 44,229
For LIBERTARIAN PARTY 235 12,526 53,303 168,517
For DEMOCRATIC-FARMER-LABOR 660 9,778 14,815 57,783
Against UNKNOWN 130 4,251 32,698 62,889
For INDEPENDENT 201 2,899 14,422 23,363
For UNKNOWN 326 2,019 6,193 16,996
Against NONE 57 1,837 32,228 73,357
Against CONSTITUTION PARTY 14 1,340 95,713 219,706
Against LIBERTARIAN PARTY 7 396 56,551 83,693
Against OTHER 15 129 8,605 23,606
For NONE 19 124 6,538 7,425
For GREEN PARTY 12 102 8,507 10,988
For CONSTITUTION PARTY 2 1 749 433
For OTHER 1 0 466 NA
For CONSERVATIVE PARTY 2 0 1 0

The nays have it

cat(summaryPACbyElectionRound.pander,sep = "\n")
Summary of PAC Expenditures by Election Round 2010-2020 (constant 2019 dollars)
Election Cycle Round For/Against count aggSpend ($K) meanSpend sdSpend
2020 general Against 51 278 5,454 9,886
2020 general For 1,014 9,903 9,767 17,483
2020 other Against 26 81 3,112 3,211
2020 other For 167 2,987 17,884 59,240
2020 primary Against 1,865 6,614 3,547 41,404
2020 primary For 1,372 5,255 3,830 20,199
2020 run off Against 7 285 40,767 45,707
2020 run off For 5 42 8,308 8,548
2020 special Against 186 5,855 31,480 96,821
2020 special For 201 2,407 11,977 35,223
2018 general Against 14,379 733,458 51,009 186,954
2018 general For 23,251 206,712 8,890 57,299
2018 other Against 513 6,856 13,365 58,718
2018 other For 1,680 6,743 4,014 21,871
2018 primary Against 2,939 56,368 19,179 89,028
2018 primary For 4,217 64,726 15,349 74,312
2018 run off Against 483 12,044 24,935 84,582
2018 run off For 550 6,075 11,045 30,670
2018 special Against 858 28,359 33,053 104,413
2018 special For 934 11,360 12,162 41,020
2016 general Against 69,523 944,632 13,587 125,793
2016 general For 86,962 228,646 2,629 39,993
2016 other Against 31 1,455 46,949 188,834
2016 other For 180 1,585 8,808 20,762
2016 primary Against 4,038 119,120 29,500 120,307
2016 primary For 11,306 268,854 23,780 150,640
2016 run off Against 173 2,501 14,459 46,574
2016 run off For 206 922 4,477 17,753
2016 special Against 4 1 361 484
2016 special For 44 46 1,054 2,084
2014 general Against 40,235 504,614 12,542 75,207
2014 general For 24,257 111,841 4,611 34,631
2014 other Against 167 6,523 39,062 106,886
2014 other For 620 5,769 9,304 29,354
2014 primary Against 1,945 50,132 25,775 74,947
2014 primary For 3,386 33,860 10,000 35,651
2014 run off Against 282 7,045 24,983 91,195
2014 run off For 549 1,527 2,782 7,796
2014 special Against 472 10,448 22,135 70,743
2014 special For 513 5,143 10,025 48,684
2012 general Against 51,553 903,675 17,529 202,616
2012 general For 51,536 178,875 3,471 51,792
2012 other Against 268 36,996 138,044 540,450
2012 other For 177 6,834 38,609 99,930
2012 primary Against 2,277 78,750 34,585 170,440
2012 primary For 3,943 50,381 12,777 65,992
2012 run off Against 133 7,435 55,899 167,544
2012 run off For 234 3,012 12,871 43,206
2012 special Against 125 4,377 35,018 64,050
2012 special For 100 1,494 14,943 38,417
2010 general Against 9,776 330,396 33,797 109,317
2010 general For 26,162 60,252 2,303 19,627
2010 other Against 19 272 14,335 30,017
2010 other For 93 2,205 23,714 59,487
2010 primary Against 471 3,239 6,876 21,778
2010 primary For 3,769 8,855 2,350 14,903
2010 run off Against 24 151 6,286 7,574
2010 run off For 946 1,501 1,586 17,447
2010 special Against 194 5,894 30,379 55,885
2010 special For 381 6,537 17,158 48,948
proAntiByParty.ggplot
placeholder

placeholder

The first alluvial plot:

Some vocabulary: the horizontal bars are called axes, the categories they contain are called strata. Alluvial are the flows between strata. Where they cross is called a lode. The size of the alluvium at the lode gives its y value. Its thickness between lodes follows aesthetic practicalities.
All told, more than five billion was reported by PAC’s for election rounds between 2010 and 2020. 2020 and Other parties barely register.

proAntiByCampaignActivity.ggplot
placeholder

placeholder

Using a log scale on the dollars spent makes it difficult to gauge the differences but it keeps all categories on the plots. I could not coerce the ggplot calls to produce a scale on the y-axis. A map distance scale bar calibrated to log intervals would have let a viewer judge the relative thickness of the alluvia.

proAntiByInetAdv.ggplot
placeholder

placeholder

Again, the log scale prevents categories from getting squeezed off the plot, but over emphasizes their actual size. I can’t explain the y-scale. To get classified as social media, the expenditure’s purpose had to contain facebook, twitter etc. For search, it had to contain search, google, or bing. Web sucked up everything else.

proAntiByInetOutreach.ggplot
placeholder

placeholder

Same issues with the dangers of log scale.

Cleaning will entail dropping rows where NA’s appear in any of expenditure amount (exp_amo), support/opposition (supp_opp), or election year (fec_election_yr).

Wherever the expenditure date (exp_dat) is missing the filing receipt date (rcp_dat) will be substituted.

Wherever the candidate’s party (cand_par_aff) is NA, we will try to impute it from the PAC’s (spe_id) other expenditures.

Dollar amounts will be adjusted for inflation to constant 2019 dollars. The purpose of each expenditure will be mutated to include codes indicating the activities involved.

utility functions which may hold interest or help to others:

qw similar to perl function of same name: split barewords in string into vector, default type character (what=character())

qw<-function (a, ... ) {
  if (!hasArg(what)) return(scan(text = a, quiet = TRUE, what=character(),...))
  scan(text = a, quiet = TRUE, ...)
#what = character(),
###########################################################
#### examples:
#####
#####   vector<-qw(" a b c \"embedded space\" def")
#####   vector
#####  [1] "a" "b" "c" "embedded space" "def"
#####   
#####   vector<-qw("1 2 3")
#####   vector
#####  [1] "1" "2" "3"
#####
#####  ... represents arguments to be passed to scan which may or may not work
###########################################################

}

nWc convert a numeric value to a string formatted with commas with an optional number of fractional digits.

nWc<-function(amt,precision=0) format(round(as.numeric(amt), precision), nsmall=precision, big.mark=",",justify = "right")

tallyNAsUniqs will return, by column, the number of NA’s and unique values in a tibble/dataframe. pretty=TRUE returns the skeleton of a knitable table.

tallyNAsUniqs <- function(exp, pretty=FALSE, caption="default") {
  
# pass argument caption or take the default  
                 
require(tidyverse)               
# will require(pander) if pretty=TRUE
if (caption=="default")  {
      caption=paste("Tally of NA's and unique values in",
                    "all columns of",
                    deparse(substitute(exp)),
                    collapse = " ") 
}
nas <- exp %>% 
        ungroup() %>%
        summarise_all(~ sum(is.na(.)),~ n_distinct(.)) %>%
        #summarise_all(~ n_distinct(.)) %>%
        gather()

uniques <- exp %>% 
        ungroup() %>%
        #summarise_all(~ sum(is.na(.)),~ n_distinct(.)) %>%
        summarise_all(~ n_distinct(.)) %>%
        gather()

nas<-cbind(nas,uniques$value)

names(nas)<-c("columnNames","naCount","uniqueCount")

nrows<-nrow(exp)
ncols<-ncol(exp)

nas<-nas %>%
  add_row(columnNames="Total Rows and Columns",
          naCount=nrows,
          uniqueCount=ncols,
          .before = 1)

# specifically for Rmarkdown:
if (pretty) { 
  #pretty print tally:
  require(pander)
  panderOptions('table.split.table', Inf)
  set.caption(caption)
  return(pander_return(nas, style = 'rmarkdown'))
  return(nas)
##########################################################
#
#  
# To print tally use command cat(nas, sep="\n")
# To knit, enclose in its own chunk:
#####   ```{r, results='asis'}  
#####       cat(nas,"\n")
#####   ```
#
#
##########################################################
 }

return(nas)
}

load libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(pander)
library(readxl)
library(httr)
library(digest)
library(ggplot2)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(ggalluvial)
library(ggthemes)

Urls to FEC’s PAC independent expenditures

for election years 2010-2020

urls<-as.character(c(
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2010/independent_expenditure_2010.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2012/independent_expenditure_2012.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2014/independent_expenditure_2014.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2016/independent_expenditure_2016.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2018/independent_expenditure_2018.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2020/independent_expenditure_2020.csv"
))

Combine all election years

expenditure<-read_csv(urls[1])

for ( i in urls[-1] ) {
  a<-read_csv(i)
  expenditure<-bind_rows(expenditure,a)
}

Tally the NA’s and unique values in the dataset as downloaded:

# and do it now
firstNAcount.pander<-tallyNAsUniqs(expenditure,pretty = TRUE)
cat(firstNAcount.pander, sep="\n")

[1] “| columnNames | naCount | uniqueCount ||:———————-:|:——-:|:———–:|| Total Rows and Columns | 450039 | 23 || cand_id | 36474 | 1907 || cand_name | 22 | 4552 || spe_id | 0 | 1740 || spe_nam | 73 | 2191 || ele_type | 166 | 9 || can_office_state | 123470 | 63 || can_office_dis | 332 | 65 || can_office | 252 | 4 || cand_pty_aff | 40931 | 13 || exp_amo | 13 | 67400 || exp_date | 13921 | 2905 || agg_amo | 776 | 85895 || sup_opp | 171 | 5 || pur | 81 | 18102 || pay | 565 | 21367 || file_num | 0 | 32197 || amndt_ind | 0 | 10 || tran_id | 0 | 352943 || image_num | 0 | 107485 || receipt_dat | 0 | 2691 || fec_election_yr | 0 | 6 || prev_file_num | 401491 | 3444 || dissem_dt | 348574 | 1871 |: Tally of NA’s and unique values in all columns of expenditure” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Take a 50,000 row sample of expenditure so that knitting can finish this November

expenditure<-sample_n(expenditure,50000)

Handle the simple NA’s: replace, mutate or drop row:

expenditure<-expenditure %>%
  # mutate coerce to "s" or "o"
  mutate(sup_opp= tolower(sup_opp)) %>%
  mutate(sup_opp= gsub("0","o",sup_opp)) %>%
  filter(sup_opp=="o"|sup_opp=="s") %>%
  drop_na(exp_amo,sup_opp) %>% # no point in dealing with no money or no point of view
  mutate(exp_amo=round(exp_amo)) %>%
  replace_na(list(pur="other",ele_type="other",pay="unknown")) %>%
  
  # form unique identifier for row
  mutate(uid=paste(tran_id,file_num)) %>%
  
  filter(exp_amo>0) %>% # need expenditure amount to be positive
  filter(exp_amo!=50000000) %>% # bogus $50M in-kind donation by Ponzi scheme suspect 
  filter(amndt_ind=="N") # new transactions only, filter ammendments

# replace na in exp_dat with corresponding receipt_dat
a<-is.na(expenditure$exp_date)
expenditure$exp_date[a]<-expenditure$receipt_dat[a]

convert exp_date [currently dd-mmm-yy ] into standard date format via lubridate/parse

# convert two digit year 20-NOV-10 --> 20-NOV-2010
a<-gsub("(?:(..)$)","20\\1",expenditure$exp_date,perl = TRUE)
expenditure$exp_date<-parse_date_time(a,"dmy")

Convert to constant 2019 dollars

Step 1: Get Consumer Price Index table from the Bureau of Labor Statistics

require(readxl)
require(httr)
url <- "https://www.bls.gov/cpi/research-series/allitems.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [https://www.bls.gov/cpi/research-series/allitems.xlsx]
##   Date: 2019-12-03 08:23
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 16.6 kB
## <ON DISK>  /tmp/RtmpeaIOAH/file260e2556c93b.xlsx
d<-read_excel(tf,skip=6,n_max=75)

# extract items of interest from 2008 to 2018
inflationFactor<-d[32:42,c("YEAR","AVG")]

we need to construct a linear model to predict inflation for 2019

lm<-lm(AVG~YEAR,inflationFactor)
f2019<-predict(lm,data.frame(YEAR=2019))
# add 2019 & f2019 to the factor tibble:
inflationFactor<-rbind(inflationFactor,as_tibble(list(YEAR=2019,AVG=f2019)))

Step 2: Set up helper functions to do conversion:

consDollar<-function(expAmount,expDate,toYear,factor=inflationFactor) {
  f<-function(amt,date,toYear) {
    # convert one dollar spent on date to toYear constant dollars:
    # get from year factor:
    fromYear=year(date)
    fromFactor=factor$AVG[factor$YEAR==fromYear]
    toFactor=factor$AVG[factor$YEAR==toYear]
    return(amt*toFactor/fromFactor)
  }
  # use mapply to convert all vector quantities:
  return( mapply(f,expAmount,expDate,toYear) )
}

Step3 perform the conversion to 2019 dollars

expenditure$exp_amo<-consDollar(expenditure$exp_amo,expenditure$exp_date,2019)

Recode expenditure purpose into categorical tags

expenditure <- expenditure %>%
    # remove all character not lower case, digit, or space: ie. get rid of punctuation
     mutate( tags=gsub("[^a-z0-9 ]","",tolower(pur)))  %>%

    mutate(tags=gsub("(^| )ad(s |s$| |$|v)|meida|meda buy|direct marke|targeti|air time|prerecorded messages|voter education"," Adv ",tags)) %>%
    mutate(tags=gsub("canv|vass|walk|door[ ]*to|outreach|clipboards|distribution|file|organizer|translation"," Canvass ",tags)) %>% mutate(tags=gsub("consult|logistics|management|marketing|planning|strategic|strategy"," Consult ",tags)) %>%

    mutate(tags=gsub("video|writing|teleconference|public relations|press release|creative|proofreading|prodcution|production|(^| )art|design|messag|communication|( |^)pr( |$)|propaganda|copywritng|editing|images|photo|photograph|photography|photos"," Content ",tags)) %>%

    mutate(tags=gsub("voter contact|survey|research|poll( |ing)|data|list|opinion| id|message testing|poll"," Data ",tags)) %>%

    mutate(tags=gsub("donation|fundraising|donor"," Donor ",tags)) %>%

    mutate(tags=gsub("(^| )e mail|( |^)email *"," Email ",tags)) %>%
    mutate(tags=gsub("event|tour|rental|townhall|rally|hotel|bus[^i]|travel|transportation|flight|airfare|parking|toll|food|bever|crowd building|catering|costumes|meal|refreshments|teleprompter|telethon"," Events ",tags)) %>%
    mutate(tags=gsub("field|groundgame|grassroots|organizing|coordinating|lunch|meals|stipends|volunteer|volunteers"," Field ",tags)) %>%
    mutate(tags=gsub("gotv|get out the vote"," GOTV ",tags)) %>%
    mutate(tags=gsub("(^| )mail|(^|[^eE])mailing|usps|postage|zip|ship|courier|envelope|envelopes|stamps"," Mail ",tags)) %>%
    mutate(tags=gsub("news|newsletter|telepresser|tipsheet"," Media ",tags)) %>%

    mutate(tags=gsub("rent|utilities|office|supplies|administrative|toner"," Mgmt ",tags)) %>%

    mutate(tags=gsub("outdoor|billboard"," Outdoor ",tags)) %>%

    mutate(tags=gsub("phone|call|robo|dialer|predictive|telemarket|telemarking|telephoning"," Phone ",tags)) %>%

    mutate(tags=gsub("newsp|magaz|media|journal|print ad"," PMedia ",tags)) %>%
    mutate(tags=gsub("radio"," Radio ",tags)) %>%
    mutate(tags=gsub("( |^)search|google|bing"," Search ",tags)) %>%
    mutate(tags=gsub("social m|fb|twitter|facebook|instagram|snapc"," SMedia ",tags)) %>%
    mutate(tags=gsub("wages|per diem|payroll|salary|salaries|hourly pay|benefits|staff|mileage|gas|lodging|housing"," Staff ",tags))  %>%

    mutate(tags=gsub("print|liter|hanger|ballot|brochure|sticker|leafl|flier|flyer|card|yard sign|postcard|banner|sign|books|booklets|button|shirt|handbills|(^| )sign|material|guide|priniting|magnet|pins|clothing|apparel|decals|fulfillment|handouts|hats|labels|lilterature|( |^)lit( |$)|paraphernalia|posters|reflectors"," Stuff ",tags)) %>%
    mutate(tags=gsub("text(ing)*|mobile device"," Text ",tags)) %>%
    mutate(tags=gsub("cable|tv|television|tv"," Tv ",tags)) %>%
    mutate(tags=gsub("internet|digital|onlin|web|emarketing|electronic"," Web ",tags)) %>%
      mutate( tags=gsub("^other$","Other",tags)) %>%
      mutate( tags=gsub("^[ ]*$","Other",tags)) 

#      mutate( tags=gsub("","",tags)) %>%

Extract activities from tags

cleanActivity<-function(a){
  applyr<-function(a) 
      paste(sort(unique(str_split(a," ")[[1]])),collapse=" ")
  sapply(a, applyr)
}
    
expenditure <- expenditure %>%
  # remove all words consisting of lower case letters and numbers
  mutate(activity=gsub("(^| )[a-z0-9]*"," ",tags)) %>%
  # collapse all runs of multiple spaces
  mutate(activity=gsub("  +"," ",activity)) %>%
  # remove leading and trailing space
  mutate(activity=gsub("(^ | $)","",activity)) %>%
  # remove duplicates and sort
  mutate(activity=cleanActivity(activity)) %>%
  # replace null activities with Other
  mutate(activity=gsub("^$","Other",activity))

Create activityDistribution for apportioning expenditure amounts (exp_amo) over multiple activities

activityDistribution<- expenditure %>%
  select(activity,exp_amo) %>%
  # allow only rows of a single activity
  filter(grepl("^[A-Za-z0-9]+$",activity)) %>%
  group_by(activity) %>%
  summarise(count=n(),aggSpend=sum(exp_amo,na.rm = TRUE),meanSpend=mean(exp_amo),sdSpend=sd(exp_amo)) %>%
  arrange(desc(aggSpend))

a<-activityDistribution %>%
  mutate(aggSpend=nWc(aggSpend), meanSpend=nWc(meanSpend),sdSpend=nWc(sdSpend))

  require(pander)
  panderOptions('table.split.table', Inf)
  set.caption("Tally and aggregate amounts of single activities")
  activityDistribution.pander<-pander_return(a, style = 'rmarkdown')
  cat(activityDistribution.pander, sep="\n")

[1] “| activity | count | aggSpend | meanSpend | sdSpend ||:——–:|:—–:|:———–:|:———:|:——-:|| PMedia | 1319 | 272,327,529 | 206,465 | 459,728 || Mail | 1112 | 25,289,886 | 22,743 | 50,358 || Canvass | 743 | 6,700,579 | 9,018 | 40,747 || Adv | 171 | 5,475,692 | 32,022 | 82,587 || Tv | 26 | 4,434,845 | 170,571 | 224,841 || Phone | 560 | 3,982,140 | 7,111 | 17,567 || Stuff | 815 | 3,129,714 | 3,840 | 13,210 || Staff | 27128 | 2,822,474 | 104 | 1,068 || Data | 191 | 2,050,980 | 10,738 | 12,687 || Radio | 44 | 1,633,670 | 37,129 | 44,553 || Content | 243 | 1,330,640 | 5,476 | 10,730 || Other | 154 | 1,160,839 | 7,538 | 45,711 || Field | 35 | 932,087 | 26,631 | 44,637 || Consult | 250 | 908,087 | 3,632 | 6,134 || Events | 4033 | 452,281 | 112 | 1,077 || Web | 139 | 357,067 | 2,569 | 7,163 || Outdoor | 9 | 126,191 | 14,021 | 12,859 || Email | 43 | 117,577 | 2,734 | 4,635 || Donor | 156 | 72,515 | 465 | 821 || Mgmt | 373 | 43,365 | 116 | 361 || Text | 10 | 28,660 | 2,866 | 4,113 || SMedia | 4 | 15,376 | 3,844 | 4,902 || GOTV | 5 | 12,566 | 2,513 | 3,634 || Media | 3 | 10,378 | 3,459 | 5,508 || Search | 1 | 1,278 | 1,278 | NA |: Tally and aggregate amounts of single activities” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Functions for apportioning expenditures over multiple activities weighted by the aggregate expenditures of single activities

# generate N samples from a truncated normal distribution between between values a and b  
rTruncNorm <- function(N, mean = 0, sd = 1, a = -Inf, b = Inf) {
  if (a > b) stop('Error: Truncation range is empty');
  U <- runif(N, pnorm(a, mean, sd), pnorm(b, mean, sd));
  qnorm(U, mean, sd); }

# generate table of 1000 trunc Norm distributed values for each activity:
trNorm<-matrix(0,nrow=nrow(activityDistribution),ncol=1000)
row.names(trNorm)<-activityDistribution$activity

genTrNorm<-function(activity,mean,sd){
  trNorm[activity,1:1000]<<-rTruncNorm(1000,mean=mean,sd=sd,a=0)
  TRUE
}

a<- activityDistribution %>%
  mutate(a=mapply(genTrNorm,activity,meanSpend,sdSpend)) %>%
  filter(FALSE)
## Warning in runif(N, pnorm(a, mean, sd), pnorm(b, mean, sd)): NAs produced
getTrNorm<-function(activity) trNorm[activity,ceiling(runif(1)*1000)]

apportionActs<-function(activityString,exp_amo) {
  a<-(strsplit(activityString," "))[[1]]
  dist<- activityDistribution %>%
   filter(activity %in% c(a) ) %>%
    mutate(ct=n()) %>%
    mutate(fracAmo=0) %>%
    mutate(randomTruncatedNormal=getTrNorm(activity)) %>%
    #mutate(randomTruncatedNormal=rTruncNorm(ct,mean=meanSpend,sd=sdSpend,a=0)) %>%
    mutate(fracAmo=exp_amo*randomTruncatedNormal/sum(randomTruncatedNormal))
  return(dist[,c("activity","fracAmo")])
}

# initialize tibble to hold apportioned rows

applyAppor<-function(Row) {
  Row<-as.data.frame(Row,stringsAsFactors = FALSE)
  appo<-apportionActs(as.character(Row$activity),Row$exp_amo)
  for (i in 1:nrow(appo)) {
    Row$activity<-appo$activity[i]
    Row$exp_amo<-appo$fracAmo[i]
    rowBoundApportions<<-rbind(rowBoundApportions,Row)
  }
  Row[-1,]
}

Allocate expenditure amounts for rows containing multiple activities

rowBoundApportions<-expenditure[1,]
rowBoundApportions<-rowBoundApportions[-1,]

# generate apportioned rows, add them to rowBoundApportions
discard <- expenditure %>%
  filter(grepl(" ",activity)) %>%
  rowwise() %>%
  do(applyAppor(.)) %>%
  filter(FALSE)
  

# add the apportioned rows to expenditures
expenditure<-rbind(expenditure,rowBoundApportions)

# drop the rows containing spaces that were apportioned
expenditure<-expenditure %>%
  filter(!grepl(" ",activity))

Build functions and indices for imputing political affiliations where missing

# identify PACs (spe_id's) with expenditures for unspecified party affiliations
spesWoPartyAffiation<- expenditure %>%
  filter(is.na(cand_pty_aff)) %>%
  group_by(spe_id) %>%
  summarise(ct=n())

# for these, find their expenditures where party affiliation is specified, then summarize
spesWpartyAffiliation<- expenditure %>%
  filter(!is.na(cand_pty_aff), spe_id %in% spesWoPartyAffiation$spe_id) %>%
  select(cand_pty_aff, spe_id,sup_opp, ele_type) %>%
  group_by(spe_id,sup_opp,ele_type,cand_pty_aff) %>%
  summarise(ct=n()) %>%
  arrange(spe_id,sup_opp,ele_type,cand_pty_aff,ct)
  
#  Function for imputing party affiliation  
imputePartyAffiliation<-function(candPtyAff,speId,suppOpp,eleType ) {
  
# from spender id, support/opposition, party affiliation != NA construct frequency table for spender:

spenderFreq<- spesWpartyAffiliation %>%
  filter(spe_id==speId, sup_opp==suppOpp, ele_type==eleType) %>%
  group_by(cand_pty_aff) %>%
  summarise(ct=sum(ct)) %>%
  arrange(ct)

  if (nrow(spenderFreq) == 0) return(NA)  # no other affiliations
  if (nrow(spenderFreq) == 1) return(spenderFreq$cand_pty_aff[1]) # only one part affiliation

# PAC supports/oppose candidates of multiple parties:
# pick one from a probability weighted summary of PAC's parties 
  bins<-c(0,spenderFreq$ct)[1:nrow(spenderFreq)]
  i<-findInterval(runif(1)*sum(spenderFreq$ct),bins)
# return imputed party affiliation
  return(spenderFreq$cand_pty_aff[i])
}

mapplyImputePartyAffiliation<-function(cand_pty_aff, spe_id,sup_opp,ele_type) {
  mapply(imputePartyAffiliation,cand_pty_aff, spe_id,sup_opp,ele_type)
}

Apply candidate party affiliation imputation

affils <- expenditure %>%
  filter(is.na(cand_pty_aff))


affils <- affils %>%
  mutate(cand_pty_aff= mapplyImputePartyAffiliation(cand_pty_aff, spe_id,sup_opp,ele_type)) %>%
  #mutate(cand_pty_aff= mapply(imputePartyAffiliation,cand_pty_aff, spe_id,sup_opp,ele_type)) %>%
  drop_na(cand_pty_aff)

Bind imputed affiliation rows to expenditure, drop rows where expenditure$can_pty_aff has NA’s

expenditure<-rbind(expenditure,affils)

expenditure<-expenditure %>% drop_na(cand_pty_aff)
# summarise activities:
a<- expenditure %>%
  select(activity,exp_amo) %>%
  group_by(activity) %>%
  summarise(count=n(),spend=sum(exp_amo)) %>%
  arrange(desc(spend)) %>%
  mutate(kDollar=sprintf("$%0.2fK",spend/1000)) %>%
  select(activity,count,kDollar)


panderOptions('table.split.table', Inf)
  set.caption("Tally and aggregate amounts of single activities")
  singleActivityAggSpend.pander<-pander_return(a, style = 'rmarkdown')
  cat(singleActivityAggSpend.pander, sep="\n")

[1] “| activity | count | kDollar ||:——–:|:—–:|:———–:|| PMedia | 2252 | $319036.50K || Tv | 803 | $104953.40K || Mail | 1680 | $29750.41K || Content | 2115 | $10104.95K || Canvass | 1104 | $9192.28K || Radio | 481 | $6528.01K || Data | 938 | $5566.16K || Phone | 913 | $5307.36K || Stuff | 1220 | $4118.92K || Staff | 27371 | $2581.45K || Field | 150 | $1666.47K || Consult | 329 | $1243.06K || Other | 120 | $1053.55K || Events | 4269 | $572.92K || Outdoor | 26 | $310.22K || GOTV | 95 | $174.53K || Email | 150 | $155.12K || Text | 75 | $109.18K || Donor | 225 | $78.10K || Mgmt | 389 | $42.53K || Media | 103 | $33.61K || Adv | 2904 | $NAK || Search | 40 | $NAK || SMedia | 73 | $NAK || Web | 1791 | $NAK |: Tally and aggregate amounts of single activities” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Combine activities into groups:

a<-sort(unique(expenditure$activity))
names(a)<-a
a[qw(" GOTV Events  Field  Phone  Stuff Mail   Canvass")]<-"Outreach"
a[qw("Text Email")]<-"OutreachV2"
a[qw("Radio PMedia Tv Adv Outdoor Media")]<-"Advertising"
a[qw("Web SMedia Search")]<-"AdvertisingV2"
a[qw("Data Consult Donor Content")]<-"Strategy"
a[qw("Staff Other Mgmt")]<-"Overhead"

expenditure[,"activityGroup"]<-a[expenditure$activity]

At this point the sampled data are as clean as they are going to get. And, now, fresh from the oven, let’s get the complete cleaned dataset.

 load(url("https://github.com/sdutky/mcData110/raw/master/pacExpenditures/expenditure.rdata"))

Let’s see what we have got:

# look at the NA's:
summaryOfRowsAndNAs.pander<-tallyNAsUniqs(expenditure,pretty=TRUE,
    caption="Summary of Row and NA's in cleaned expenditure\nMost of NA's are in rows that don't matter")
  cat(summaryOfRowsAndNAs.pander, sep="\n")

[1] “| columnNames | naCount | uniqueCount ||:———————-:|:——-:|:———–:|| Total Rows and Columns | 451772 | 27 || cand_id | 22446 | 1873 || cand_name | 19 | 3982 || spe_id | 0 | 1420 || spe_nam | 1 | 1779 || ele_type | 0 | 9 || can_office_state | 118519 | 63 || can_office_dis | 266 | 55 || can_office | 188 | 4 || cand_pty_aff | 0 | 11 || exp_amo | 0 | 147449 || exp_date | 0 | 2840 || agg_amo | 807 | 71179 || sup_opp | 0 | 2 || pur | 0 | 14817 || pay | 0 | 18759 || file_num | 0 | 24935 || amndt_ind | 0 | 1 || tran_id | 0 | 340043 || image_num | 0 | 90698 || receipt_dat | 0 | 2500 || fec_election_yr | 0 | 6 || prev_file_num | 451717 | 4 || dissem_dt | 332521 | 1831 || uid | 0 | 384624 || tags | 0 | 12872 || activity | 0 | 25 || activityGroup | 0 | 6 |: Summary of Row and NA’s in cleaned expenditureof NA’s are in rows that don’t matter” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Now let’s see how money was spent:

a<-expenditure %>%
  group_by(activity, activityGroup) %>%
  summarise(count=nWc(n(),0),
            aggOrder=sum(exp_amo),
            aggSpend=nWc(sum(exp_amo)/1000,0),
            meanSpend=nWc(mean(exp_amo),0),
            sdSpend=nWc(sd(exp_amo),0)) %>%
  arrange(desc(aggOrder)) %>%
  select(activity,activityGroup,count,aggSpend,meanSpend,sdSpend)
  
names(a)<-qw("activity  activityGroup  count   \"aggSpend ($K)\"   meanSpend  sdSpend")

panderOptions('table.split.table', Inf)
  set.caption("Summary of Independent Expenditures by Activity Group elections 2010-2020\n(constant 2019 dollars)")
 expendituresByActivityGroup.pander<- pander_return(a, style = 'rmarkdown')
 cat(expendituresByActivityGroup.pander, sep="\n")

[1] “| activity | activityGroup | count | aggSpend ($K) | meanSpend | sdSpend ||:——–:|:————-:|:——-:|:————-:|:———:|:——-:|| PMedia | Advertising | 20,365 | 2,753,480 | 135,206 | 401,204 || Tv | Advertising | 7,470 | 945,257 | 126,540 | 347,299 || Adv | Advertising | 26,937 | 656,265 | 24,363 | 114,512 || Mail | Outreach | 15,458 | 285,978 | 18,500 | 51,338 || Canvass | Outreach | 10,847 | 108,001 | 9,957 | 52,318 || Content | Strategy | 20,032 | 65,932 | 3,291 | 16,079 || Web | AdvertisingV2 | 16,267 | 64,452 | 3,962 | 17,673 || Radio | Advertising | 4,618 | 61,020 | 13,214 | 37,067 || Data | Strategy | 8,342 | 50,557 | 6,061 | 18,150 || Phone | Outreach | 8,389 | 49,150 | 5,859 | 18,480 || Stuff | Outreach | 11,552 | 39,075 | 3,383 | 14,258 || Staff | Overhead | 247,730 | 22,092 | 89 | 865 || Field | Outreach | 1,362 | 18,736 | 13,756 | 53,668 || Consult | Strategy | 2,558 | 13,379 | 5,230 | 17,552 || GOTV | Outreach | 915 | 7,403 | 8,091 | 27,474 || Events | Outreach | 37,324 | 7,123 | 191 | 2,369 || Other | Overhead | 1,130 | 5,572 | 4,931 | 26,471 || Outdoor | Advertising | 256 | 3,404 | 13,295 | 67,183 || Email | OutreachV2 | 1,565 | 2,409 | 1,539 | 9,245 || Text | OutreachV2 | 772 | 1,019 | 1,319 | 7,138 || Donor | Strategy | 1,779 | 770 | 433 | 2,023 || Mgmt | Overhead | 3,921 | 444 | 113 | 803 || Media | Advertising | 1,013 | 442 | 436 | 2,929 || SMedia | AdvertisingV2 | 785 | 177 | 226 | 781 || Search | AdvertisingV2 | 385 | 69 | 179 | 905 |: Summary of Independent Expenditures by Activity Group elections 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

And what were the targets?

supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")

a<-expenditure %>%
  group_by(cand_pty_aff, sup_opp) %>%
  summarise(count=nWc(n(),0),
            aggOrder=sum(exp_amo),
            aggSpend=nWc(sum(exp_amo)/1000,0),
            meanSpend=nWc(mean(exp_amo),0),
            sdSpend=nWc(sd(exp_amo),0)) %>%
  arrange(desc(aggOrder)) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  select(oS,cand_pty_aff,count,aggSpend,meanSpend,sdSpend)

  
names(a)<-qw("For/Against  Target  count   \"aggSpend ($K)\"   meanSpend  sdSpend")

panderOptions('table.split.table', Inf)
  set.caption("Summary of PAC Expenditures U.S. elections 2010-2020\n(constant 2019 dollars)")
  summaryPACexpenditures.pander<-pander_return(a, style = 'rmarkdown')
  cat(summaryPACexpenditures.pander, sep="\n")

[1] “| For/Against | Target | count | aggSpend ($K) | meanSpend | sdSpend ||:———–:|:———————–:|:——-:|:————-:|:———:|:——-:|| Against | REPUBLICAN PARTY | 143,853 | 2,054,377 | 14,281 | 105,278 || Against | DEMOCRATIC PARTY | 56,943 | 1,753,791 | 30,799 | 218,979 || For | REPUBLICAN PARTY | 49,684 | 760,864 | 15,314 | 101,539 || For | DEMOCRATIC PARTY | 197,613 | 506,035 | 2,561 | 30,463 || Against | DEMOCRATIC-FARMER-LABOR | 346 | 38,640 | 111,675 | 211,588 || Against | INDEPENDENT | 1,652 | 13,095 | 7,927 | 44,229 || For | LIBERTARIAN PARTY | 235 | 12,526 | 53,303 | 168,517 || For | DEMOCRATIC-FARMER-LABOR | 660 | 9,778 | 14,815 | 57,783 || Against | UNKNOWN | 130 | 4,251 | 32,698 | 62,889 || For | INDEPENDENT | 201 | 2,899 | 14,422 | 23,363 || For | UNKNOWN | 326 | 2,019 | 6,193 | 16,996 || Against | NONE | 57 | 1,837 | 32,228 | 73,357 || Against | CONSTITUTION PARTY | 14 | 1,340 | 95,713 | 219,706 || Against | LIBERTARIAN PARTY | 7 | 396 | 56,551 | 83,693 || Against | OTHER | 15 | 129 | 8,605 | 23,606 || For | NONE | 19 | 124 | 6,538 | 7,425 || For | GREEN PARTY | 12 | 102 | 8,507 | 10,988 || For | CONSTITUTION PARTY | 2 | 1 | 749 | 433 || For | OTHER | 1 | 0 | 466 | NA || For | CONSERVATIVE PARTY | 2 | 0 | 1 | 0 |: Summary of PAC Expenditures U.S. elections 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Lining up an alluvial plot #1:

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"other"

supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")

a<-expenditure %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,oS,target) %>%
  #group_by(target, sup_opp) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  #filter(target=="Dem") %>%
  select(cycle,oS,target,aggSpend) %>%
  arrange(cycle,oS,target) 
# ggplot it:

proAntiByParty.ggplot  <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis2 = oS, axis3 = target)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = -1, knot.pos = 1/10, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle)  , width = 1/6, knot.pos = 1/6, reverse = FALSE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","For/Against",  "Target")) +
  scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent") +
  #scale_y_log10(name="total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
  coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections 2010-2020")

proAntiByParty.ggplot

And now incorporating activityGroups

Lining up an alluvial plot #2:

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"Other"

supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")

# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr

a<-expenditure %>%
  #mutate(activityGroup=agRecode[activityGroup]) %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(proAnti=paste(oS,target,sep="")) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,proAnti,activityGroup) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  filter(aggSpend>1) %>%
  filter(cycle %in% qw( "2010 2018")) %>%
  select(cycle,proAnti,activityGroup,aggSpend) %>%
  arrange(cycle,proAnti,activityGroup) 
# ggplot it:

proAntiByCampaignActivity.ggplot <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis3 = proAnti, axis2 = activityGroup)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  #geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
  geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
  #geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
  #scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  #scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
  #coord_trans(y="log10")+
  #scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
  scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
  #scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
  #coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections By Activity 2010 & 2018")

proAntiByCampaignActivity.ggplot

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"Other"

supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")

# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr

a<-expenditure %>%
  filter(activityGroup=="AdvertisingV2") %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(proAnti=paste(oS,target,sep="")) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,proAnti,activity) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  filter(aggSpend>1) %>%
  filter(cycle %in% qw( "2010 2018")) %>%
  select(cycle,proAnti,activity,aggSpend) %>%
  arrange(cycle,proAnti,activity) 
# ggplot it:

proAntiByInetAdv.ggplot <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis3 = proAnti, axis2 = activity)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  #geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
  geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
  #geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
  #scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  #scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
  #coord_trans(y="log10")+
  #scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
 scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
 #scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
  #coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections By Internet Advertising                                                                     2010-2020")

proAntiByInetAdv.ggplot

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"Other"

supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")

# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr

a<-expenditure %>%
  filter(activityGroup=="OutreachV2") %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(proAnti=paste(oS,target,sep="")) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,proAnti,activity) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  filter(aggSpend>1) %>%
  filter(cycle %in% qw( "2010 2018")) %>%
  select(cycle,proAnti,activity,aggSpend) %>%
  arrange(cycle,proAnti,activity) 
# ggplot it:

proAntiByInetOutreach.ggplot <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis3 = proAnti, axis2 = activity)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  #geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
  geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
  #geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
  #scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  #scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
  #coord_trans(y="log10")+
  #scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
  scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
  #scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
  #coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections By Internet Outreach                                                                     2010-2020")

proAntiByInetOutreach.ggplot

Chunks for surveying and assessing expenditures:

Can we impute party affiliations?

affils <- expenditure %>%
  select(spe_id,ele_type,cand_pty_aff,sup_opp) %>%
  group_by(spe_id,ele_type,sup_opp,cand_pty_aff ) %>%
  summarize(ct=n())

naAffils<-affils %>% filter(is.na(cand_pty_aff)) %>% group_by(spe_id) %>% summarise()

affils<-affils %>% filter(spe_id %in% naAffils$spe_id) %>% select(everything())

oneOrMoreAffils <- affils %>% filter(!is.na(cand_pty_aff)) %>% group_by(spe_id) %>% summarise()

affils<-affils %>% filter(spe_id %in% oneOrMoreAffils$spe_id ) %>% select(everything())

nasImputable<- affils %>% filter(is.na(cand_pty_aff)) %>% group_by(sup_opp) %>% summarise(sum(ct))
nasSum<-expenditure %>% select(everything()) %>%  filter(is.na(cand_pty_aff)) %>% group_by(sup_opp) %>% summarise(ct=n())

greps for consolidating tags:

b<-grep("salary|benefits|staff|mileage|gas|lodging",a$tags,value = TRUE)
b<-grep("(^| )mail|usps|postage|zip",a$tags,value = TRUE)
b<-grep("print|liter|hanger|brochure|sticker|leafl|flier|pamphlet",a$tags,value = TRUE)
b<-sample(grep("[A-Z]",a$tags,value = TRUE,perl = TRUE,invert = TRUE),1000)
b<-grep("event|tour|field",a$tags,value = TRUE)
b<-grep("[A-Z]",a$tags,value = TRUE,perl = TRUE,invert = TRUE)

#b<-grep("",a$tags,value = TRUE)
view(b)

more greps

b<-grep("internet|digital|onlin|web",a$tags,value = TRUE)
b<-grep("print",a$tags,value = TRUE)
b<-grep("survey|research|poll( |ing)|data",a$tags,value = TRUE)
b<-grep("canv|door[ ]*to",a$tags,value = TRUE)
b<-grep("[A-Z]",a$tags,value = TRUE)
b<-grep("phone|calls",a$tags,value = TRUE)

#b<-grep("",a$tags,value = TRUE)
view(b)

yet more greps of tags

b<-grep("[^e]mail",a$tags,value = TRUE)
b<-grep(" e mail",a$tags,value = TRUE,perl = TRUE)
b<-grep(" e mail",a$tags,value = TRUE)
b<-grep("Text",a$tags,value = TRUE)
b<-grep("(^|[^O]*)[tT][vV]",a$tags,value = TRUE)
b<-grep("newsp|magaz|media|journal|print ad",a$tags,value = TRUE)
b<-grep("social n|fb|twitter|facebook|instagram|snapc",a$tags,value = TRUE)
b<-grep("( |^)search|google",a$tags,value = TRUE)
view(b)

look for NA party affiliation

 b<- a %>%
 select(cand_id,cand_name,cand_pty_aff,activity) %>%
 filter(is.na(cand_pty_aff) & !is.na(cand_name)) %>%
  group_by(cand_name) %>%
  summarise(ct=n()) %>%
  arrange(desc(ct))

f<-function(a) a %in% b$cand_name

c<- a %>%
  select(cand_id,cand_name,cand_pty_aff,activity) %>%
  filter( cand_name %in% b$cand_name) %>%
  group_by(cand_name, cand_pty_aff) %>%
  summarise(ct=n())
  

#c<-sort(unique(b$cand_name))
 view(c)

Check that name headers of each csv file match

headers <- as_tibble(data.frame(matrix(nrow=0,ncol=50)))
colnames(headers)<-paste("field",1:50,sep="")

for ( i in urls ) {
  a<-read.csv(i,nrows=1, header = FALSE)
  a[1,(length(a)+1):50]<-NA 
  names(a)=paste("field",1:50,sep="")
  
  headers<-rbind(headers,a)
}
summary(headers) #  Summary above shows all column headers identical in all urls

Using MD5 checksums, look for duplicate rows: None found

require(digest)

md5<-function(a) {
  data.frame(md5=digest(a,algo="md5",serialize = FALSE),stringsAsFactors = FALSE)
}

cksums<- expenditure %>%
  rowwise() %>%
  do(md5(paste(.,collapse = " ")))

tally cksums, isolate those with more than two occurrences: None Found

ckSumCount <- cksums %>%
  group_by(md5) %>%
  summarise(ct=n()) %>%
  filter(ct>1)

cat(paste("\n",nrow(ckSumCount)," duplicate rows found", collapse=" "))

Look for alternative to rowwise(), do() for walking through rows

a<-ungroup(expenditure[1:100,])
c<-a[1,]
c<-c[-1,]

prt<-function(a) {
  c<<-rbind(c,a)
  TRUE
}

b<- expenditure %>%
  filter(is.na(cand_pty_aff)) %>%
  mutate(prt(as.data.frame(.)))

Scrapbook for applying candidate party affiliation imputation

  speId<-"C00000935"; suppOpp<-"o"; eleType<-"G" 
  
spenderFreq<- spesWpartyAffiliation %>%
  filter(spe_id==speId, sup_opp==suppOpp, ele_type==eleType) %>%
  group_by(cand_pty_aff) %>%
  summarise(ct=sum(ct)) %>%
  arrange(ct)

browse spenders, payees, money

a<-expenditure %>% 
  select(exp_amo,pay,spe_nam,sup_opp,cand_pty_aff) %>% 
  group_by(pay,sup_opp,cand_pty_aff) %>% 
  summarise(total=sum(exp_amo)) %>% 
  arrange(desc(total),pay,sup_opp,cand_pty_aff)

view(a)
supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")

electionRoundRecode<-qw("other other general other other primary \"run off\" special other")
names(electionRoundRecode)<-qw("2 C G O other P R S X")

a<-expenditure %>%
  select(fec_election_yr,ele_type,sup_opp,exp_amo) %>%
  mutate(sup_opp=supOppRecode[sup_opp]) %>%
  mutate(ele_type=electionRoundRecode[ele_type]) 
  #select(-ele_type,-sup_opp)
    
  
a<-a %>%
  group_by(fec_election_yr,ele_type,sup_opp) %>%
  summarise(count=nWc(n(),0),
            aggOrder=sum(exp_amo),
            aggSpend=nWc(sum(exp_amo)/1000,0),
            meanSpend=nWc(mean(exp_amo),0),
            sdSpend=nWc(sd(exp_amo),0)) %>%
  arrange(desc(fec_election_yr)) %>%
  select(fec_election_yr,ele_type,sup_opp,count,aggSpend,meanSpend,sdSpend)


names(a)<-qw("\"Election Cycle\" Round For/Against  count   \"aggSpend ($K)\"   meanSpend  sdSpend")

panderOptions('table.split.table', Inf)
  set.caption("Summary of PAC Expenditures by Election Round 2010-2020\n(constant 2019 dollars)")
  summaryPACbyElectionRound.pander<-pander_return(a, style = 'rmarkdown')
  cat(summaryPACbyElectionRound.pander, sep="\n")

[1] “| Election Cycle | Round | For/Against | count | aggSpend ($K) | meanSpend | sdSpend ||:————–:|:——-:|:———–:|:——:|:————-:|:———:|:——-:|| 2020 | general | Against | 51 | 278 | 5,454 | 9,886 || 2020 | general | For | 1,014 | 9,903 | 9,767 | 17,483 || 2020 | other | Against | 26 | 81 | 3,112 | 3,211 || 2020 | other | For | 167 | 2,987 | 17,884 | 59,240 || 2020 | primary | Against | 1,865 | 6,614 | 3,547 | 41,404 || 2020 | primary | For | 1,372 | 5,255 | 3,830 | 20,199 || 2020 | run off | Against | 7 | 285 | 40,767 | 45,707 || 2020 | run off | For | 5 | 42 | 8,308 | 8,548 || 2020 | special | Against | 186 | 5,855 | 31,480 | 96,821 || 2020 | special | For | 201 | 2,407 | 11,977 | 35,223 || 2018 | general | Against | 14,379 | 733,458 | 51,009 | 186,954 || 2018 | general | For | 23,251 | 206,712 | 8,890 | 57,299 || 2018 | other | Against | 513 | 6,856 | 13,365 | 58,718 || 2018 | other | For | 1,680 | 6,743 | 4,014 | 21,871 || 2018 | primary | Against | 2,939 | 56,368 | 19,179 | 89,028 || 2018 | primary | For | 4,217 | 64,726 | 15,349 | 74,312 || 2018 | run off | Against | 483 | 12,044 | 24,935 | 84,582 || 2018 | run off | For | 550 | 6,075 | 11,045 | 30,670 || 2018 | special | Against | 858 | 28,359 | 33,053 | 104,413 || 2018 | special | For | 934 | 11,360 | 12,162 | 41,020 || 2016 | general | Against | 69,523 | 944,632 | 13,587 | 125,793 || 2016 | general | For | 86,962 | 228,646 | 2,629 | 39,993 || 2016 | other | Against | 31 | 1,455 | 46,949 | 188,834 || 2016 | other | For | 180 | 1,585 | 8,808 | 20,762 || 2016 | primary | Against | 4,038 | 119,120 | 29,500 | 120,307 || 2016 | primary | For | 11,306 | 268,854 | 23,780 | 150,640 || 2016 | run off | Against | 173 | 2,501 | 14,459 | 46,574 || 2016 | run off | For | 206 | 922 | 4,477 | 17,753 || 2016 | special | Against | 4 | 1 | 361 | 484 || 2016 | special | For | 44 | 46 | 1,054 | 2,084 || 2014 | general | Against | 40,235 | 504,614 | 12,542 | 75,207 || 2014 | general | For | 24,257 | 111,841 | 4,611 | 34,631 || 2014 | other | Against | 167 | 6,523 | 39,062 | 106,886 || 2014 | other | For | 620 | 5,769 | 9,304 | 29,354 || 2014 | primary | Against | 1,945 | 50,132 | 25,775 | 74,947 || 2014 | primary | For | 3,386 | 33,860 | 10,000 | 35,651 || 2014 | run off | Against | 282 | 7,045 | 24,983 | 91,195 || 2014 | run off | For | 549 | 1,527 | 2,782 | 7,796 || 2014 | special | Against | 472 | 10,448 | 22,135 | 70,743 || 2014 | special | For | 513 | 5,143 | 10,025 | 48,684 || 2012 | general | Against | 51,553 | 903,675 | 17,529 | 202,616 || 2012 | general | For | 51,536 | 178,875 | 3,471 | 51,792 || 2012 | other | Against | 268 | 36,996 | 138,044 | 540,450 || 2012 | other | For | 177 | 6,834 | 38,609 | 99,930 || 2012 | primary | Against | 2,277 | 78,750 | 34,585 | 170,440 || 2012 | primary | For | 3,943 | 50,381 | 12,777 | 65,992 || 2012 | run off | Against | 133 | 7,435 | 55,899 | 167,544 || 2012 | run off | For | 234 | 3,012 | 12,871 | 43,206 || 2012 | special | Against | 125 | 4,377 | 35,018 | 64,050 || 2012 | special | For | 100 | 1,494 | 14,943 | 38,417 || 2010 | general | Against | 9,776 | 330,396 | 33,797 | 109,317 || 2010 | general | For | 26,162 | 60,252 | 2,303 | 19,627 || 2010 | other | Against | 19 | 272 | 14,335 | 30,017 || 2010 | other | For | 93 | 2,205 | 23,714 | 59,487 || 2010 | primary | Against | 471 | 3,239 | 6,876 | 21,778 || 2010 | primary | For | 3,769 | 8,855 | 2,350 | 14,903 || 2010 | run off | Against | 24 | 151 | 6,286 | 7,574 || 2010 | run off | For | 946 | 1,501 | 1,586 | 17,447 || 2010 | special | Against | 194 | 5,894 | 30,379 | 55,885 || 2010 | special | For | 381 | 6,537 | 17,158 | 48,948 |: Summary of PAC Expenditures by Election Round 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA