Purpose: Describe the Money Flows of Independent Political Campaigns

This project will examine the expenditures of independent groups supporting or opposing electoral candidates as reported to the Federal Election Commission (FEC) for elections from 2010 to 2020.

These groups, commonly called Political Action Committees (PACs), may be formed by individuals, corporations, and organizations, including political parties. They can advocate for issues as well as support or oppose candidates for office. The FEC requires that when they take actions supporting or opposing a candidate, they operate independently of any candidates’ campaign. They must also report to the FEC all expenditures they make advocating in regard to specific candidates. The FEC terms these as “indpendent expenditures”. For each election cycle, the FEC extracts and gathers the data of each independent expenditure and makes it available to download.

Bear in mind that PAC’s are not required to report expenditures for purposes performed not associated with a candidate for office. Additionally, PAC’s have motives to obscure the purposes of their reported expenditures.

The other purpose of this project is to explore the ggalluvial package for constructing flow diagrams that connect flows (alluvia) of continuous variables across strata of multiple categories. In this case, the alluvia will be money and the strata will consist of political affiliations, support and opposition, traditional and Internet enabled political activities.

Expenditure Data Items: Cleaning and Preparation

The FEC collects twenty three comma separated values describing each expenditure. The following are of interest here:

Expenditure Data Fields to be Described
field	desc	class
spe_id	Spender ID: PAC uid	Character
spe_nam	Spender name	Character
ele_typ	Election type	Character
can_off	Office	Character
cand_par_aff	Party	Character
exp_amo	Expenditure amount	Currency
exp_dat	Expenditure date	Date
rcp_dat	Filing receipt date	Date
sup_opp	Support/Oppose	Character
pur	Purpose of expenditure	Character
fec_election_yr	Two-year period	Date
tran_id	expenditure within filing	Character
file_num	Filing uid	Character
NB:	The tran_id and file_num uniquely identify the expenditure

Load and display pre-processed tables and plots from gitHub:

load(url("https://github.com/sdutky/mcData110/raw/master/pacExpenditures/alluvialPacTableAndGraphs.rdata"))

Display the pander tables:

cat(firstNAcount.pander,sep = "\n")

Tally of NA’s and unique values in all columns of expenditure
columnNames	naCount	uniqueCount
Total Rows and Columns	450039	23
cand_id	36474	1907
cand_name	22	4552
spe_id	0	1740
spe_nam	73	2191
ele_type	166	9
can_office_state	123470	63
can_office_dis	332	65
can_office	252	4
cand_pty_aff	40931	13
exp_amo	13	67400
exp_date	13921	2905
agg_amo	776	85895
sup_opp	171	5
pur	81	18102
pay	565	21367
file_num	0	32197
amndt_ind	0	10
tran_id	0	352943
image_num	0	107485
receipt_dat	0	2691
fec_election_yr	0	6
prev_file_num	401491	3444
dissem_dt	348574	1871

The above table describes the data comprising expenditure as downloaded from the FEC.

We will need to drop the rows where either of an expenditure’s amount (exp_amo) or whether it was to support or oppose (sup_opp) a candidate is unknown. We can pigeonhole unknown purposes (pur) as other. The date a filing was received by the FEC (receipt_dat) can substitute for unknown expenditure dates (exp_dat). Nearly 10% of expenditures have missing party affiliations. Where the spender (spe_id) has other expenditures, we can try to use the distributions of the parties they support or oppose to impute the value for expenditures where this missing. The remaining fields don’t particularly concern us.

cat(activityDistribution.pander,sep = "\n")

Tally and aggregate amounts of single activities
activity	count	aggSpend	meanSpend	sdSpend
PMedia	1248	244,433,893	195,860	457,272
Mail	1115	23,610,815	21,176	35,347
Other	128	9,453,373	73,854	734,394
Canvass	805	7,618,535	9,464	43,073
Adv	155	7,151,514	46,139	180,391
Phone	540	4,617,834	8,552	27,131
Tv	25	3,760,731	150,429	276,512
Stuff	822	3,662,051	4,455	21,330
Staff	27159	2,649,062	98	907
Data	207	2,391,795	11,555	21,819
Content	283	1,892,367	6,687	34,423
Radio	31	1,374,529	44,340	79,941
Consult	229	1,092,029	4,769	11,455
Field	23	617,991	26,869	32,789
Events	3955	592,869	150	2,363
Web	139	451,448	3,248	10,902
Email	41	84,891	2,071	4,572
Donor	138	79,802	578	1,409
Outdoor	6	68,972	11,495	4,390
Mgmt	402	46,065	115	432
Text	13	19,796	1,523	1,621
SMedia	6	6,603	1,101	853
Search	1	2,762	2,762	NA
GOTV	3	317	106	46

Cleaning the entire expenditure dataset of 450 thousand row took eighteen hours on my laptop and significant coaxing. I saved the cleaned dataset from which I produced plots, but I used a 50K sample to run the parts of the script that did the cleaning. The script completes in a reasonable time, but I believe the smaller sample introduced some NA’s. I have not tracked down the problems.

We extracted characteristic strings from the purpose of each expenditure to assign them to one or more of these activities. This table breaks out those expenditures that have only a single activity. We will take the expenditures that we classified with multiple activities and generate multiple rows of expenditures each having a single activity. We will apportion the amount of each “new” expenditure by applying a factor derived from a truncated normal sample (eliminating negative amounts) to the amount of the original expenditure.

cat(singleActivityAggSpend.pander,sep = "\n")

Tally and aggregate amounts of single activities
activity	count	kDollar
PMedia	2114	$280450.27K
Tv	808	$94659.79K
Mail	1600	$25644.89K
Canvass	1103	$10987.62K
Content	2104	$10223.20K
Phone	872	$5928.58K
Stuff	1227	$5859.88K
Data	917	$5369.05K
Radio	489	$5204.31K
Field	148	$2891.66K
Staff	27394	$2451.96K
Consult	269	$1213.82K
Other	93	$920.68K
Events	4157	$759.03K
Email	155	$144.63K
Outdoor	22	$93.97K
Donor	195	$85.02K
Mgmt	428	$45.75K
GOTV	92	$24.21K
Text	82	$22.60K
Adv	3010	$NAK
Search	43	$NAK
SMedia	81	$NAK
Web	1797	$NAK

This shows the complete sample (more or less) with single activities apportioned

cat(summaryOfRowsAndNAs.pander,sep = "\n")

Summary of Row and NA’s in cleaned expenditure Most of NA’s are in rows that don’t matter
columnNames	naCount	uniqueCount
Total Rows and Columns	451772	27
cand_id	22446	1873
cand_name	19	3982
spe_id	0	1420
spe_nam	1	1779
ele_type	0	9
can_office_state	118519	63
can_office_dis	266	55
can_office	188	4
cand_pty_aff	0	11
exp_amo	0	147449
exp_date	0	2840
agg_amo	807	71179
sup_opp	0	2
pur	0	14817
pay	0	18759
file_num	0	24935
amndt_ind	0	1
tran_id	0	340043
image_num	0	90698
receipt_dat	0	2500
fec_election_yr	0	6
prev_file_num	451717	4
dissem_dt	332521	1831
uid	0	384624
tags	0	12872
activity	0	25
activityGroup	0	6

After the sample run, I imported the entire cleaned expenditure dataset. Only a very few NA’s appear in columns that concern us

cat(expendituresByActivityGroup.pander,sep = "\n")

Summary of Independent Expenditures by Activity Group elections 2010-2020 (constant 2019 dollars)
activity	activityGroup	count	aggSpend ($K)	meanSpend	sdSpend
PMedia	Advertising	20,365	2,753,480	135,206	401,204
Tv	Advertising	7,470	945,257	126,540	347,299
Adv	Advertising	26,937	656,265	24,363	114,512
Mail	Outreach	15,458	285,978	18,500	51,338
Canvass	Outreach	10,847	108,001	9,957	52,318
Content	Strategy	20,032	65,932	3,291	16,079
Web	AdvertisingV2	16,267	64,452	3,962	17,673
Radio	Advertising	4,618	61,020	13,214	37,067
Data	Strategy	8,342	50,557	6,061	18,150
Phone	Outreach	8,389	49,150	5,859	18,480
Stuff	Outreach	11,552	39,075	3,383	14,258
Staff	Overhead	247,730	22,092	89	865
Field	Outreach	1,362	18,736	13,756	53,668
Consult	Strategy	2,558	13,379	5,230	17,552
GOTV	Outreach	915	7,403	8,091	27,474
Events	Outreach	37,324	7,123	191	2,369
Other	Overhead	1,130	5,572	4,931	26,471
Outdoor	Advertising	256	3,404	13,295	67,183
Email	OutreachV2	1,565	2,409	1,539	9,245
Text	OutreachV2	772	1,019	1,319	7,138
Donor	Strategy	1,779	770	433	2,023
Mgmt	Overhead	3,921	444	113	803
Media	Advertising	1,013	442	436	2,929
SMedia	AdvertisingV2	785	177	226	781
Search	AdvertisingV2	385	69	179	905

I consolidated the twenty five activities into six activityGroups that can be meaningfully represented on a plot. Outreach consists of connecting with voters, V2 includes email and texting. Advertising is a broadcast medium, V2 includes, social media and Internet search. Strategy combines polling, fundraising (here called Donor so that it can fit on a plot), and consulting. Everything else is consigned to Overhead.

cat(summaryPACexpenditures.pander,sep = "\n")

Summary of PAC Expenditures U.S. elections 2010-2020 (constant 2019 dollars)
For/Against	Target	count	aggSpend ($K)	meanSpend	sdSpend
Against	REPUBLICAN PARTY	143,853	2,054,377	14,281	105,278
Against	DEMOCRATIC PARTY	56,943	1,753,791	30,799	218,979
For	REPUBLICAN PARTY	49,684	760,864	15,314	101,539
For	DEMOCRATIC PARTY	197,613	506,035	2,561	30,463
Against	DEMOCRATIC-FARMER-LABOR	346	38,640	111,675	211,588
Against	INDEPENDENT	1,652	13,095	7,927	44,229
For	LIBERTARIAN PARTY	235	12,526	53,303	168,517
For	DEMOCRATIC-FARMER-LABOR	660	9,778	14,815	57,783
Against	UNKNOWN	130	4,251	32,698	62,889
For	INDEPENDENT	201	2,899	14,422	23,363
For	UNKNOWN	326	2,019	6,193	16,996
Against	NONE	57	1,837	32,228	73,357
Against	CONSTITUTION PARTY	14	1,340	95,713	219,706
Against	LIBERTARIAN PARTY	7	396	56,551	83,693
Against	OTHER	15	129	8,605	23,606
For	NONE	19	124	6,538	7,425
For	GREEN PARTY	12	102	8,507	10,988
For	CONSTITUTION PARTY	2	1	749	433
For	OTHER	1	0	466	NA
For	CONSERVATIVE PARTY	2	0	1	0

The nays have it

cat(summaryPACbyElectionRound.pander,sep = "\n")

Summary of PAC Expenditures by Election Round 2010-2020 (constant 2019 dollars)
Election Cycle	Round	For/Against	count	aggSpend ($K)	meanSpend	sdSpend
2020	general	Against	51	278	5,454	9,886
2020	general	For	1,014	9,903	9,767	17,483
2020	other	Against	26	81	3,112	3,211
2020	other	For	167	2,987	17,884	59,240
2020	primary	Against	1,865	6,614	3,547	41,404
2020	primary	For	1,372	5,255	3,830	20,199
2020	run off	Against	7	285	40,767	45,707
2020	run off	For	5	42	8,308	8,548
2020	special	Against	186	5,855	31,480	96,821
2020	special	For	201	2,407	11,977	35,223
2018	general	Against	14,379	733,458	51,009	186,954
2018	general	For	23,251	206,712	8,890	57,299
2018	other	Against	513	6,856	13,365	58,718
2018	other	For	1,680	6,743	4,014	21,871
2018	primary	Against	2,939	56,368	19,179	89,028
2018	primary	For	4,217	64,726	15,349	74,312
2018	run off	Against	483	12,044	24,935	84,582
2018	run off	For	550	6,075	11,045	30,670
2018	special	Against	858	28,359	33,053	104,413
2018	special	For	934	11,360	12,162	41,020
2016	general	Against	69,523	944,632	13,587	125,793
2016	general	For	86,962	228,646	2,629	39,993
2016	other	Against	31	1,455	46,949	188,834
2016	other	For	180	1,585	8,808	20,762
2016	primary	Against	4,038	119,120	29,500	120,307
2016	primary	For	11,306	268,854	23,780	150,640
2016	run off	Against	173	2,501	14,459	46,574
2016	run off	For	206	922	4,477	17,753
2016	special	Against	4	1	361	484
2016	special	For	44	46	1,054	2,084
2014	general	Against	40,235	504,614	12,542	75,207
2014	general	For	24,257	111,841	4,611	34,631
2014	other	Against	167	6,523	39,062	106,886
2014	other	For	620	5,769	9,304	29,354
2014	primary	Against	1,945	50,132	25,775	74,947
2014	primary	For	3,386	33,860	10,000	35,651
2014	run off	Against	282	7,045	24,983	91,195
2014	run off	For	549	1,527	2,782	7,796
2014	special	Against	472	10,448	22,135	70,743
2014	special	For	513	5,143	10,025	48,684
2012	general	Against	51,553	903,675	17,529	202,616
2012	general	For	51,536	178,875	3,471	51,792
2012	other	Against	268	36,996	138,044	540,450
2012	other	For	177	6,834	38,609	99,930
2012	primary	Against	2,277	78,750	34,585	170,440
2012	primary	For	3,943	50,381	12,777	65,992
2012	run off	Against	133	7,435	55,899	167,544
2012	run off	For	234	3,012	12,871	43,206
2012	special	Against	125	4,377	35,018	64,050
2012	special	For	100	1,494	14,943	38,417
2010	general	Against	9,776	330,396	33,797	109,317
2010	general	For	26,162	60,252	2,303	19,627
2010	other	Against	19	272	14,335	30,017
2010	other	For	93	2,205	23,714	59,487
2010	primary	Against	471	3,239	6,876	21,778
2010	primary	For	3,769	8,855	2,350	14,903
2010	run off	Against	24	151	6,286	7,574
2010	run off	For	946	1,501	1,586	17,447
2010	special	Against	194	5,894	30,379	55,885
2010	special	For	381	6,537	17,158	48,948

proAntiByParty.ggplot

placeholder

The first alluvial plot:

Some vocabulary: the horizontal bars are called axes, the categories they contain are called strata. Alluvial are the flows between strata. Where they cross is called a lode. The size of the alluvium at the lode gives its y value. Its thickness between lodes follows aesthetic practicalities.
All told, more than five billion was reported by PAC’s for election rounds between 2010 and 2020. 2020 and Other parties barely register.

proAntiByCampaignActivity.ggplot

$placeholder$

placeholder

Using a log scale on the dollars spent makes it difficult to gauge the differences but it keeps all categories on the plots. I could not coerce the ggplot calls to produce a scale on the y-axis. A map distance scale bar calibrated to log intervals would have let a viewer judge the relative thickness of the alluvia.

proAntiByInetAdv.ggplot

placeholder

Again, the log scale prevents categories from getting squeezed off the plot, but over emphasizes their actual size. I can’t explain the y-scale. To get classified as social media, the expenditure’s purpose had to contain facebook, twitter etc. For search, it had to contain search, google, or bing. Web sucked up everything else.

proAntiByInetOutreach.ggplot

placeholder

Same issues with the dangers of log scale.

Cleaning will entail dropping rows where NA’s appear in any of expenditure amount (exp_amo), support/opposition (supp_opp), or election year (fec_election_yr).

Wherever the expenditure date (exp_dat) is missing the filing receipt date (rcp_dat) will be substituted.

Wherever the candidate’s party (cand_par_aff) is NA, we will try to impute it from the PAC’s (spe_id) other expenditures.

Dollar amounts will be adjusted for inflation to constant 2019 dollars. The purpose of each expenditure will be mutated to include codes indicating the activities involved.

utility functions which may hold interest or help to others:

qw similar to perl function of same name: split barewords in string into vector, default type character (what=character())

qw<-function (a, ... ) {
  if (!hasArg(what)) return(scan(text = a, quiet = TRUE, what=character(),...))
  scan(text = a, quiet = TRUE, ...)
#what = character(),
###########################################################
#### examples:
#####
#####   vector<-qw(" a b c \"embedded space\" def")
#####   vector
#####  [1] "a" "b" "c" "embedded space" "def"
#####   
#####   vector<-qw("1 2 3")
#####   vector
#####  [1] "1" "2" "3"
#####
#####  ... represents arguments to be passed to scan which may or may not work
###########################################################

}

nWc convert a numeric value to a string formatted with commas with an optional number of fractional digits.

nWc<-function(amt,precision=0) format(round(as.numeric(amt), precision), nsmall=precision, big.mark=",",justify = "right")

tallyNAsUniqs will return, by column, the number of NA’s and unique values in a tibble/dataframe. pretty=TRUE returns the skeleton of a knitable table.

tallyNAsUniqs <- function(exp, pretty=FALSE, caption="default") {
  
# pass argument caption or take the default  
                 
require(tidyverse)               
# will require(pander) if pretty=TRUE
if (caption=="default")  {
      caption=paste("Tally of NA's and unique values in",
                    "all columns of",
                    deparse(substitute(exp)),
                    collapse = " ") 
}
nas <- exp %>% 
        ungroup() %>%
        summarise_all(~ sum(is.na(.)),~ n_distinct(.)) %>%
        #summarise_all(~ n_distinct(.)) %>%
        gather()

uniques <- exp %>% 
        ungroup() %>%
        #summarise_all(~ sum(is.na(.)),~ n_distinct(.)) %>%
        summarise_all(~ n_distinct(.)) %>%
        gather()

nas<-cbind(nas,uniques$value)

names(nas)<-c("columnNames","naCount","uniqueCount")

nrows<-nrow(exp)
ncols<-ncol(exp)

nas<-nas %>%
  add_row(columnNames="Total Rows and Columns",
          naCount=nrows,
          uniqueCount=ncols,
          .before = 1)

# specifically for Rmarkdown:
if (pretty) { 
  #pretty print tally:
  require(pander)
  panderOptions('table.split.table', Inf)
  set.caption(caption)
  return(pander_return(nas, style = 'rmarkdown'))
  return(nas)
##########################################################
#
#  
# To print tally use command cat(nas, sep="\n")
# To knit, enclose in its own chunk:
#####   ```{r, results='asis'}  
#####       cat(nas,"\n")
#####   ```
#
#
##########################################################
 }

return(nas)
}

load libraries

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 3.2.1     ✔ purrr   0.3.2
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0

## ── Conflicts ────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

library(pander)
library(readxl)
library(httr)
library(digest)
library(ggplot2)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

library(ggalluvial)
library(ggthemes)

Urls to FEC’s PAC independent expenditures

for election years 2010-2020

urls<-as.character(c(
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2010/independent_expenditure_2010.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2012/independent_expenditure_2012.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2014/independent_expenditure_2014.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2016/independent_expenditure_2016.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2018/independent_expenditure_2018.csv",
  "https://cg-519a459a-0ea3-42c2-b7bc-fa1143481f74.s3-us-gov-west-1.amazonaws.com/bulk-downloads/2020/independent_expenditure_2020.csv"
))

Combine all election years

expenditure<-read_csv(urls[1])

for ( i in urls[-1] ) {
  a<-read_csv(i)
  expenditure<-bind_rows(expenditure,a)
}

Tally the NA’s and unique values in the dataset as downloaded:

# and do it now
firstNAcount.pander<-tallyNAsUniqs(expenditure,pretty = TRUE)

cat(firstNAcount.pander, sep="\n")

[1] “| columnNames | naCount | uniqueCount ||:———————-:|:——-:|:———–:|| Total Rows and Columns | 450039 | 23 || cand_id | 36474 | 1907 || cand_name | 22 | 4552 || spe_id | 0 | 1740 || spe_nam | 73 | 2191 || ele_type | 166 | 9 || can_office_state | 123470 | 63 || can_office_dis | 332 | 65 || can_office | 252 | 4 || cand_pty_aff | 40931 | 13 || exp_amo | 13 | 67400 || exp_date | 13921 | 2905 || agg_amo | 776 | 85895 || sup_opp | 171 | 5 || pur | 81 | 18102 || pay | 565 | 21367 || file_num | 0 | 32197 || amndt_ind | 0 | 10 || tran_id | 0 | 352943 || image_num | 0 | 107485 || receipt_dat | 0 | 2691 || fec_election_yr | 0 | 6 || prev_file_num | 401491 | 3444 || dissem_dt | 348574 | 1871 |: Tally of NA’s and unique values in all columns of expenditure” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Take a 50,000 row sample of expenditure so that knitting can finish this November

expenditure<-sample_n(expenditure,50000)

Handle the simple NA’s: replace, mutate or drop row:

expenditure<-expenditure %>%
  # mutate coerce to "s" or "o"
  mutate(sup_opp= tolower(sup_opp)) %>%
  mutate(sup_opp= gsub("0","o",sup_opp)) %>%
  filter(sup_opp=="o"|sup_opp=="s") %>%
  drop_na(exp_amo,sup_opp) %>% # no point in dealing with no money or no point of view
  mutate(exp_amo=round(exp_amo)) %>%
  replace_na(list(pur="other",ele_type="other",pay="unknown")) %>%
  
  # form unique identifier for row
  mutate(uid=paste(tran_id,file_num)) %>%
  
  filter(exp_amo>0) %>% # need expenditure amount to be positive
  filter(exp_amo!=50000000) %>% # bogus $50M in-kind donation by Ponzi scheme suspect 
  filter(amndt_ind=="N") # new transactions only, filter ammendments

# replace na in exp_dat with corresponding receipt_dat
a<-is.na(expenditure$exp_date)
expenditure$exp_date[a]<-expenditure$receipt_dat[a]

convert exp_date [currently dd-mmm-yy ] into standard date format via lubridate/parse

# convert two digit year 20-NOV-10 --> 20-NOV-2010
a<-gsub("(?:(..)$)","20\\1",expenditure$exp_date,perl = TRUE)
expenditure$exp_date<-parse_date_time(a,"dmy")

Convert to constant 2019 dollars

Step 1: Get Consumer Price Index table from the Bureau of Labor Statistics

require(readxl)
require(httr)
url <- "https://www.bls.gov/cpi/research-series/allitems.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))

## Response [https://www.bls.gov/cpi/research-series/allitems.xlsx]
##   Date: 2019-12-03 08:23
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 16.6 kB
## <ON DISK>  /tmp/RtmpeaIOAH/file260e2556c93b.xlsx

d<-read_excel(tf,skip=6,n_max=75)

# extract items of interest from 2008 to 2018
inflationFactor<-d[32:42,c("YEAR","AVG")]

we need to construct a linear model to predict inflation for 2019

lm<-lm(AVG~YEAR,inflationFactor)
f2019<-predict(lm,data.frame(YEAR=2019))
# add 2019 & f2019 to the factor tibble:
inflationFactor<-rbind(inflationFactor,as_tibble(list(YEAR=2019,AVG=f2019)))

Step 2: Set up helper functions to do conversion:

consDollar<-function(expAmount,expDate,toYear,factor=inflationFactor) {
  f<-function(amt,date,toYear) {
    # convert one dollar spent on date to toYear constant dollars:
    # get from year factor:
    fromYear=year(date)
    fromFactor=factor$AVG[factor$YEAR==fromYear]
    toFactor=factor$AVG[factor$YEAR==toYear]
    return(amt*toFactor/fromFactor)
  }
  # use mapply to convert all vector quantities:
  return( mapply(f,expAmount,expDate,toYear) )
}

Step3 perform the conversion to 2019 dollars

expenditure$exp_amo<-consDollar(expenditure$exp_amo,expenditure$exp_date,2019)

Recode expenditure purpose into categorical tags

expenditure <- expenditure %>%
    # remove all character not lower case, digit, or space: ie. get rid of punctuation
     mutate( tags=gsub("[^a-z0-9 ]","",tolower(pur)))  %>%

    mutate(tags=gsub("(^| )ad(s |s$| |$|v)|meida|meda buy|direct marke|targeti|air time|prerecorded messages|voter education"," Adv ",tags)) %>%
    mutate(tags=gsub("canv|vass|walk|door[ ]*to|outreach|clipboards|distribution|file|organizer|translation"," Canvass ",tags)) %>% mutate(tags=gsub("consult|logistics|management|marketing|planning|strategic|strategy"," Consult ",tags)) %>%

    mutate(tags=gsub("video|writing|teleconference|public relations|press release|creative|proofreading|prodcution|production|(^| )art|design|messag|communication|( |^)pr( |$)|propaganda|copywritng|editing|images|photo|photograph|photography|photos"," Content ",tags)) %>%

    mutate(tags=gsub("voter contact|survey|research|poll( |ing)|data|list|opinion| id|message testing|poll"," Data ",tags)) %>%

    mutate(tags=gsub("donation|fundraising|donor"," Donor ",tags)) %>%

    mutate(tags=gsub("(^| )e mail|( |^)email *"," Email ",tags)) %>%
    mutate(tags=gsub("event|tour|rental|townhall|rally|hotel|bus[^i]|travel|transportation|flight|airfare|parking|toll|food|bever|crowd building|catering|costumes|meal|refreshments|teleprompter|telethon"," Events ",tags)) %>%
    mutate(tags=gsub("field|groundgame|grassroots|organizing|coordinating|lunch|meals|stipends|volunteer|volunteers"," Field ",tags)) %>%
    mutate(tags=gsub("gotv|get out the vote"," GOTV ",tags)) %>%
    mutate(tags=gsub("(^| )mail|(^|[^eE])mailing|usps|postage|zip|ship|courier|envelope|envelopes|stamps"," Mail ",tags)) %>%
    mutate(tags=gsub("news|newsletter|telepresser|tipsheet"," Media ",tags)) %>%

    mutate(tags=gsub("rent|utilities|office|supplies|administrative|toner"," Mgmt ",tags)) %>%

    mutate(tags=gsub("outdoor|billboard"," Outdoor ",tags)) %>%

    mutate(tags=gsub("phone|call|robo|dialer|predictive|telemarket|telemarking|telephoning"," Phone ",tags)) %>%

    mutate(tags=gsub("newsp|magaz|media|journal|print ad"," PMedia ",tags)) %>%
    mutate(tags=gsub("radio"," Radio ",tags)) %>%
    mutate(tags=gsub("( |^)search|google|bing"," Search ",tags)) %>%
    mutate(tags=gsub("social m|fb|twitter|facebook|instagram|snapc"," SMedia ",tags)) %>%
    mutate(tags=gsub("wages|per diem|payroll|salary|salaries|hourly pay|benefits|staff|mileage|gas|lodging|housing"," Staff ",tags))  %>%

    mutate(tags=gsub("print|liter|hanger|ballot|brochure|sticker|leafl|flier|flyer|card|yard sign|postcard|banner|sign|books|booklets|button|shirt|handbills|(^| )sign|material|guide|priniting|magnet|pins|clothing|apparel|decals|fulfillment|handouts|hats|labels|lilterature|( |^)lit( |$)|paraphernalia|posters|reflectors"," Stuff ",tags)) %>%
    mutate(tags=gsub("text(ing)*|mobile device"," Text ",tags)) %>%
    mutate(tags=gsub("cable|tv|television|tv"," Tv ",tags)) %>%
    mutate(tags=gsub("internet|digital|onlin|web|emarketing|electronic"," Web ",tags)) %>%
      mutate( tags=gsub("^other$","Other",tags)) %>%
      mutate( tags=gsub("^[ ]*$","Other",tags)) 

#      mutate( tags=gsub("","",tags)) %>%

Extract activities from tags

cleanActivity<-function(a){
  applyr<-function(a) 
      paste(sort(unique(str_split(a," ")[[1]])),collapse=" ")
  sapply(a, applyr)
}
    
expenditure <- expenditure %>%
  # remove all words consisting of lower case letters and numbers
  mutate(activity=gsub("(^| )[a-z0-9]*"," ",tags)) %>%
  # collapse all runs of multiple spaces
  mutate(activity=gsub("  +"," ",activity)) %>%
  # remove leading and trailing space
  mutate(activity=gsub("(^ | $)","",activity)) %>%
  # remove duplicates and sort
  mutate(activity=cleanActivity(activity)) %>%
  # replace null activities with Other
  mutate(activity=gsub("^$","Other",activity))

Create activityDistribution for apportioning expenditure amounts (exp_amo) over multiple activities

activityDistribution<- expenditure %>%
  select(activity,exp_amo) %>%
  # allow only rows of a single activity
  filter(grepl("^[A-Za-z0-9]+$",activity)) %>%
  group_by(activity) %>%
  summarise(count=n(),aggSpend=sum(exp_amo,na.rm = TRUE),meanSpend=mean(exp_amo),sdSpend=sd(exp_amo)) %>%
  arrange(desc(aggSpend))

a<-activityDistribution %>%
  mutate(aggSpend=nWc(aggSpend), meanSpend=nWc(meanSpend),sdSpend=nWc(sdSpend))

  require(pander)
  panderOptions('table.split.table', Inf)
  set.caption("Tally and aggregate amounts of single activities")
  activityDistribution.pander<-pander_return(a, style = 'rmarkdown')

  cat(activityDistribution.pander, sep="\n")

[1] “| activity | count | aggSpend | meanSpend | sdSpend ||:——–:|:—–:|:———–:|:———:|:——-:|| PMedia | 1319 | 272,327,529 | 206,465 | 459,728 || Mail | 1112 | 25,289,886 | 22,743 | 50,358 || Canvass | 743 | 6,700,579 | 9,018 | 40,747 || Adv | 171 | 5,475,692 | 32,022 | 82,587 || Tv | 26 | 4,434,845 | 170,571 | 224,841 || Phone | 560 | 3,982,140 | 7,111 | 17,567 || Stuff | 815 | 3,129,714 | 3,840 | 13,210 || Staff | 27128 | 2,822,474 | 104 | 1,068 || Data | 191 | 2,050,980 | 10,738 | 12,687 || Radio | 44 | 1,633,670 | 37,129 | 44,553 || Content | 243 | 1,330,640 | 5,476 | 10,730 || Other | 154 | 1,160,839 | 7,538 | 45,711 || Field | 35 | 932,087 | 26,631 | 44,637 || Consult | 250 | 908,087 | 3,632 | 6,134 || Events | 4033 | 452,281 | 112 | 1,077 || Web | 139 | 357,067 | 2,569 | 7,163 || Outdoor | 9 | 126,191 | 14,021 | 12,859 || Email | 43 | 117,577 | 2,734 | 4,635 || Donor | 156 | 72,515 | 465 | 821 || Mgmt | 373 | 43,365 | 116 | 361 || Text | 10 | 28,660 | 2,866 | 4,113 || SMedia | 4 | 15,376 | 3,844 | 4,902 || GOTV | 5 | 12,566 | 2,513 | 3,634 || Media | 3 | 10,378 | 3,459 | 5,508 || Search | 1 | 1,278 | 1,278 | NA |: Tally and aggregate amounts of single activities” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Functions for apportioning expenditures over multiple activities weighted by the aggregate expenditures of single activities

# generate N samples from a truncated normal distribution between between values a and b  
rTruncNorm <- function(N, mean = 0, sd = 1, a = -Inf, b = Inf) {
  if (a > b) stop('Error: Truncation range is empty');
  U <- runif(N, pnorm(a, mean, sd), pnorm(b, mean, sd));
  qnorm(U, mean, sd); }

# generate table of 1000 trunc Norm distributed values for each activity:
trNorm<-matrix(0,nrow=nrow(activityDistribution),ncol=1000)
row.names(trNorm)<-activityDistribution$activity

genTrNorm<-function(activity,mean,sd){
  trNorm[activity,1:1000]<<-rTruncNorm(1000,mean=mean,sd=sd,a=0)
  TRUE
}

a<- activityDistribution %>%
  mutate(a=mapply(genTrNorm,activity,meanSpend,sdSpend)) %>%
  filter(FALSE)

## Warning in runif(N, pnorm(a, mean, sd), pnorm(b, mean, sd)): NAs produced

getTrNorm<-function(activity) trNorm[activity,ceiling(runif(1)*1000)]

apportionActs<-function(activityString,exp_amo) {
  a<-(strsplit(activityString," "))[[1]]
  dist<- activityDistribution %>%
   filter(activity %in% c(a) ) %>%
    mutate(ct=n()) %>%
    mutate(fracAmo=0) %>%
    mutate(randomTruncatedNormal=getTrNorm(activity)) %>%
    #mutate(randomTruncatedNormal=rTruncNorm(ct,mean=meanSpend,sd=sdSpend,a=0)) %>%
    mutate(fracAmo=exp_amo*randomTruncatedNormal/sum(randomTruncatedNormal))
  return(dist[,c("activity","fracAmo")])
}

# initialize tibble to hold apportioned rows

applyAppor<-function(Row) {
  Row<-as.data.frame(Row,stringsAsFactors = FALSE)
  appo<-apportionActs(as.character(Row$activity),Row$exp_amo)
  for (i in 1:nrow(appo)) {
    Row$activity<-appo$activity[i]
    Row$exp_amo<-appo$fracAmo[i]
    rowBoundApportions<<-rbind(rowBoundApportions,Row)
  }
  Row[-1,]
}

Allocate expenditure amounts for rows containing multiple activities

rowBoundApportions<-expenditure[1,]
rowBoundApportions<-rowBoundApportions[-1,]

# generate apportioned rows, add them to rowBoundApportions
discard <- expenditure %>%
  filter(grepl(" ",activity)) %>%
  rowwise() %>%
  do(applyAppor(.)) %>%
  filter(FALSE)
  

# add the apportioned rows to expenditures
expenditure<-rbind(expenditure,rowBoundApportions)

# drop the rows containing spaces that were apportioned
expenditure<-expenditure %>%
  filter(!grepl(" ",activity))

Build functions and indices for imputing political affiliations where missing

# identify PACs (spe_id's) with expenditures for unspecified party affiliations
spesWoPartyAffiation<- expenditure %>%
  filter(is.na(cand_pty_aff)) %>%
  group_by(spe_id) %>%
  summarise(ct=n())

# for these, find their expenditures where party affiliation is specified, then summarize
spesWpartyAffiliation<- expenditure %>%
  filter(!is.na(cand_pty_aff), spe_id %in% spesWoPartyAffiation$spe_id) %>%
  select(cand_pty_aff, spe_id,sup_opp, ele_type) %>%
  group_by(spe_id,sup_opp,ele_type,cand_pty_aff) %>%
  summarise(ct=n()) %>%
  arrange(spe_id,sup_opp,ele_type,cand_pty_aff,ct)
  
#  Function for imputing party affiliation  
imputePartyAffiliation<-function(candPtyAff,speId,suppOpp,eleType ) {
  
# from spender id, support/opposition, party affiliation != NA construct frequency table for spender:

spenderFreq<- spesWpartyAffiliation %>%
  filter(spe_id==speId, sup_opp==suppOpp, ele_type==eleType) %>%
  group_by(cand_pty_aff) %>%
  summarise(ct=sum(ct)) %>%
  arrange(ct)

  if (nrow(spenderFreq) == 0) return(NA)  # no other affiliations
  if (nrow(spenderFreq) == 1) return(spenderFreq$cand_pty_aff[1]) # only one part affiliation

# PAC supports/oppose candidates of multiple parties:
# pick one from a probability weighted summary of PAC's parties 
  bins<-c(0,spenderFreq$ct)[1:nrow(spenderFreq)]
  i<-findInterval(runif(1)*sum(spenderFreq$ct),bins)
# return imputed party affiliation
  return(spenderFreq$cand_pty_aff[i])
}

mapplyImputePartyAffiliation<-function(cand_pty_aff, spe_id,sup_opp,ele_type) {
  mapply(imputePartyAffiliation,cand_pty_aff, spe_id,sup_opp,ele_type)
}

Apply candidate party affiliation imputation

affils <- expenditure %>%
  filter(is.na(cand_pty_aff))


affils <- affils %>%
  mutate(cand_pty_aff= mapplyImputePartyAffiliation(cand_pty_aff, spe_id,sup_opp,ele_type)) %>%
  #mutate(cand_pty_aff= mapply(imputePartyAffiliation,cand_pty_aff, spe_id,sup_opp,ele_type)) %>%
  drop_na(cand_pty_aff)

Bind imputed affiliation rows to expenditure, drop rows where expenditure$can_pty_aff has NA’s

expenditure<-rbind(expenditure,affils)

expenditure<-expenditure %>% drop_na(cand_pty_aff)

# summarise activities:
a<- expenditure %>%
  select(activity,exp_amo) %>%
  group_by(activity) %>%
  summarise(count=n(),spend=sum(exp_amo)) %>%
  arrange(desc(spend)) %>%
  mutate(kDollar=sprintf("$%0.2fK",spend/1000)) %>%
  select(activity,count,kDollar)


panderOptions('table.split.table', Inf)
  set.caption("Tally and aggregate amounts of single activities")
  singleActivityAggSpend.pander<-pander_return(a, style = 'rmarkdown')

  cat(singleActivityAggSpend.pander, sep="\n")

[1] “| activity | count | kDollar ||:——–:|:—–:|:———–:|| PMedia | 2252 | $319036.50K || Tv | 803 | $104953.40K || Mail | 1680 | $29750.41K || Content | 2115 | $10104.95K || Canvass | 1104 | $9192.28K || Radio | 481 | $6528.01K || Data | 938 | $5566.16K || Phone | 913 | $5307.36K || Stuff | 1220 | $4118.92K || Staff | 27371 | $2581.45K || Field | 150 | $1666.47K || Consult | 329 | $1243.06K || Other | 120 | $1053.55K || Events | 4269 | $572.92K || Outdoor | 26 | $310.22K || GOTV | 95 | $174.53K || Email | 150 | $155.12K || Text | 75 | $109.18K || Donor | 225 | $78.10K || Mgmt | 389 | $42.53K || Media | 103 | $33.61K || Adv | 2904 | $NAK || Search | 40 | $NAK || SMedia | 73 | $NAK || Web | 1791 | $NAK |: Tally and aggregate amounts of single activities” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Combine activities into groups:

Outreach traditional means
GOTV
Events
Field
Phone
Materials
Mail
Canvass
OutreachV2 using the Internet & text messaging
Text
Email
Advertising tradiontal means
Radio
PrintMedia
Tv
Adv
Outdoor
AdvertisingV2 using Internet
Web
SocialMedia
Search
Strategy
Video
Data
Consult
Funding
Messaging
Overhead
Staff
Other
Mgmt

a<-sort(unique(expenditure$activity))
names(a)<-a
a[qw(" GOTV Events  Field  Phone  Stuff Mail   Canvass")]<-"Outreach"
a[qw("Text Email")]<-"OutreachV2"
a[qw("Radio PMedia Tv Adv Outdoor Media")]<-"Advertising"
a[qw("Web SMedia Search")]<-"AdvertisingV2"
a[qw("Data Consult Donor Content")]<-"Strategy"
a[qw("Staff Other Mgmt")]<-"Overhead"

expenditure[,"activityGroup"]<-a[expenditure$activity]

At this point the sampled data are as clean as they are going to get. And, now, fresh from the oven, let’s get the complete cleaned dataset.

 load(url("https://github.com/sdutky/mcData110/raw/master/pacExpenditures/expenditure.rdata"))

Let’s see what we have got:

# look at the NA's:
summaryOfRowsAndNAs.pander<-tallyNAsUniqs(expenditure,pretty=TRUE,
    caption="Summary of Row and NA's in cleaned expenditure\nMost of NA's are in rows that don't matter")

  cat(summaryOfRowsAndNAs.pander, sep="\n")

[1] “| columnNames | naCount | uniqueCount ||:———————-:|:——-:|:———–:|| Total Rows and Columns | 451772 | 27 || cand_id | 22446 | 1873 || cand_name | 19 | 3982 || spe_id | 0 | 1420 || spe_nam | 1 | 1779 || ele_type | 0 | 9 || can_office_state | 118519 | 63 || can_office_dis | 266 | 55 || can_office | 188 | 4 || cand_pty_aff | 0 | 11 || exp_amo | 0 | 147449 || exp_date | 0 | 2840 || agg_amo | 807 | 71179 || sup_opp | 0 | 2 || pur | 0 | 14817 || pay | 0 | 18759 || file_num | 0 | 24935 || amndt_ind | 0 | 1 || tran_id | 0 | 340043 || image_num | 0 | 90698 || receipt_dat | 0 | 2500 || fec_election_yr | 0 | 6 || prev_file_num | 451717 | 4 || dissem_dt | 332521 | 1831 || uid | 0 | 384624 || tags | 0 | 12872 || activity | 0 | 25 || activityGroup | 0 | 6 |: Summary of Row and NA’s in cleaned expenditureof NA’s are in rows that don’t matter” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Now let’s see how money was spent:

a<-expenditure %>%
  group_by(activity, activityGroup) %>%
  summarise(count=nWc(n(),0),
            aggOrder=sum(exp_amo),
            aggSpend=nWc(sum(exp_amo)/1000,0),
            meanSpend=nWc(mean(exp_amo),0),
            sdSpend=nWc(sd(exp_amo),0)) %>%
  arrange(desc(aggOrder)) %>%
  select(activity,activityGroup,count,aggSpend,meanSpend,sdSpend)
  
names(a)<-qw("activity  activityGroup  count   \"aggSpend ($K)\"   meanSpend  sdSpend")

panderOptions('table.split.table', Inf)
  set.caption("Summary of Independent Expenditures by Activity Group elections 2010-2020\n(constant 2019 dollars)")
 expendituresByActivityGroup.pander<- pander_return(a, style = 'rmarkdown')

 cat(expendituresByActivityGroup.pander, sep="\n")

[1] “| activity | activityGroup | count | aggSpend ($K) | meanSpend | sdSpend ||:——–:|:————-:|:——-:|:————-:|:———:|:——-:|| PMedia | Advertising | 20,365 | 2,753,480 | 135,206 | 401,204 || Tv | Advertising | 7,470 | 945,257 | 126,540 | 347,299 || Adv | Advertising | 26,937 | 656,265 | 24,363 | 114,512 || Mail | Outreach | 15,458 | 285,978 | 18,500 | 51,338 || Canvass | Outreach | 10,847 | 108,001 | 9,957 | 52,318 || Content | Strategy | 20,032 | 65,932 | 3,291 | 16,079 || Web | AdvertisingV2 | 16,267 | 64,452 | 3,962 | 17,673 || Radio | Advertising | 4,618 | 61,020 | 13,214 | 37,067 || Data | Strategy | 8,342 | 50,557 | 6,061 | 18,150 || Phone | Outreach | 8,389 | 49,150 | 5,859 | 18,480 || Stuff | Outreach | 11,552 | 39,075 | 3,383 | 14,258 || Staff | Overhead | 247,730 | 22,092 | 89 | 865 || Field | Outreach | 1,362 | 18,736 | 13,756 | 53,668 || Consult | Strategy | 2,558 | 13,379 | 5,230 | 17,552 || GOTV | Outreach | 915 | 7,403 | 8,091 | 27,474 || Events | Outreach | 37,324 | 7,123 | 191 | 2,369 || Other | Overhead | 1,130 | 5,572 | 4,931 | 26,471 || Outdoor | Advertising | 256 | 3,404 | 13,295 | 67,183 || Email | OutreachV2 | 1,565 | 2,409 | 1,539 | 9,245 || Text | OutreachV2 | 772 | 1,019 | 1,319 | 7,138 || Donor | Strategy | 1,779 | 770 | 433 | 2,023 || Mgmt | Overhead | 3,921 | 444 | 113 | 803 || Media | Advertising | 1,013 | 442 | 436 | 2,929 || SMedia | AdvertisingV2 | 785 | 177 | 226 | 781 || Search | AdvertisingV2 | 385 | 69 | 179 | 905 |: Summary of Independent Expenditures by Activity Group elections 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

And what were the targets?

supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")

a<-expenditure %>%
  group_by(cand_pty_aff, sup_opp) %>%
  summarise(count=nWc(n(),0),
            aggOrder=sum(exp_amo),
            aggSpend=nWc(sum(exp_amo)/1000,0),
            meanSpend=nWc(mean(exp_amo),0),
            sdSpend=nWc(sd(exp_amo),0)) %>%
  arrange(desc(aggOrder)) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  select(oS,cand_pty_aff,count,aggSpend,meanSpend,sdSpend)

  
names(a)<-qw("For/Against  Target  count   \"aggSpend ($K)\"   meanSpend  sdSpend")

panderOptions('table.split.table', Inf)
  set.caption("Summary of PAC Expenditures U.S. elections 2010-2020\n(constant 2019 dollars)")
  summaryPACexpenditures.pander<-pander_return(a, style = 'rmarkdown')

  cat(summaryPACexpenditures.pander, sep="\n")

[1] “| For/Against | Target | count | aggSpend ($K) | meanSpend | sdSpend ||:———–:|:———————–:|:——-:|:————-:|:———:|:——-:|| Against | REPUBLICAN PARTY | 143,853 | 2,054,377 | 14,281 | 105,278 || Against | DEMOCRATIC PARTY | 56,943 | 1,753,791 | 30,799 | 218,979 || For | REPUBLICAN PARTY | 49,684 | 760,864 | 15,314 | 101,539 || For | DEMOCRATIC PARTY | 197,613 | 506,035 | 2,561 | 30,463 || Against | DEMOCRATIC-FARMER-LABOR | 346 | 38,640 | 111,675 | 211,588 || Against | INDEPENDENT | 1,652 | 13,095 | 7,927 | 44,229 || For | LIBERTARIAN PARTY | 235 | 12,526 | 53,303 | 168,517 || For | DEMOCRATIC-FARMER-LABOR | 660 | 9,778 | 14,815 | 57,783 || Against | UNKNOWN | 130 | 4,251 | 32,698 | 62,889 || For | INDEPENDENT | 201 | 2,899 | 14,422 | 23,363 || For | UNKNOWN | 326 | 2,019 | 6,193 | 16,996 || Against | NONE | 57 | 1,837 | 32,228 | 73,357 || Against | CONSTITUTION PARTY | 14 | 1,340 | 95,713 | 219,706 || Against | LIBERTARIAN PARTY | 7 | 396 | 56,551 | 83,693 || Against | OTHER | 15 | 129 | 8,605 | 23,606 || For | NONE | 19 | 124 | 6,538 | 7,425 || For | GREEN PARTY | 12 | 102 | 8,507 | 10,988 || For | CONSTITUTION PARTY | 2 | 1 | 749 | 433 || For | OTHER | 1 | 0 | 466 | NA || For | CONSERVATIVE PARTY | 2 | 0 | 1 | 0 |: Summary of PAC Expenditures U.S. elections 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

Lining up an alluvial plot #1:

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"other"

supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")

a<-expenditure %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,oS,target) %>%
  #group_by(target, sup_opp) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  #filter(target=="Dem") %>%
  select(cycle,oS,target,aggSpend) %>%
  arrange(cycle,oS,target)

# ggplot it:

proAntiByParty.ggplot  <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis2 = oS, axis3 = target)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = -1, knot.pos = 1/10, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle)  , width = 1/6, knot.pos = 1/6, reverse = FALSE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","For/Against",  "Target")) +
  scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent") +
  #scale_y_log10(name="total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
  coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections 2010-2020")

proAntiByParty.ggplot

And now incorporating activityGroups

Lining up an alluvial plot #2:

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"Other"

supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")

# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr

a<-expenditure %>%
  #mutate(activityGroup=agRecode[activityGroup]) %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(proAnti=paste(oS,target,sep="")) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,proAnti,activityGroup) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  filter(aggSpend>1) %>%
  filter(cycle %in% qw( "2010 2018")) %>%
  select(cycle,proAnti,activityGroup,aggSpend) %>%
  arrange(cycle,proAnti,activityGroup)

# ggplot it:

proAntiByCampaignActivity.ggplot <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis3 = proAnti, axis2 = activityGroup)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  #geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
  geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
  #geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
  #scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  #scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
  #coord_trans(y="log10")+
  #scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
  scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
  #scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
  #coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections By Activity 2010 & 2018")

proAntiByCampaignActivity.ggplot

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"Other"

supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")

# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr

a<-expenditure %>%
  filter(activityGroup=="AdvertisingV2") %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(proAnti=paste(oS,target,sep="")) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,proAnti,activity) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  filter(aggSpend>1) %>%
  filter(cycle %in% qw( "2010 2018")) %>%
  select(cycle,proAnti,activity,aggSpend) %>%
  arrange(cycle,proAnti,activity)

# ggplot it:

proAntiByInetAdv.ggplot <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis3 = proAnti, axis2 = activity)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  #geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
  geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
  #geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
  #scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  #scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
  #coord_trans(y="log10")+
  #scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
 scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
 #scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(.x))) +
  #coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections By Internet Advertising                                                                     2010-2020")

proAntiByInetAdv.ggplot

# Like always, there's no room to fit anybody who is not Dem or GOP:
pty<-sort(unique(expenditure$cand_pty_aff))
partyRecode<-pty
names(partyRecode)<-pty
partyRecode<-gsub("DEMOCRATIC PARTY","Dem",partyRecode)
partyRecode<-gsub("REPUBLICAN PARTY","GOP",partyRecode)
partyRecode[!grepl("Dem|GOP",partyRecode)]  <-"Other"

supOppRecode<-qw("anti pro")
names(supOppRecode)<-qw("o s")

# recode activityGroup so it fits in statum
agr<-sort(unique(expenditure$activityGroup))
agRecode<-qw("Advert AdvertV2 F2f F2fV2 Overhead Strategy")
names(agRecode)<-agr

a<-expenditure %>%
  filter(activityGroup=="OutreachV2") %>%
  mutate(target=partyRecode[cand_pty_aff]) %>%
  mutate(oS=supOppRecode[sup_opp]) %>%
  mutate(proAnti=paste(oS,target,sep="")) %>%
  mutate(cycle=as.character(fec_election_yr)) %>%
  group_by(cycle,proAnti,activity) %>%
  summarise(
            aggSpend=sum(exp_amo)
  ) %>%
  filter(aggSpend>1) %>%
  filter(cycle %in% qw( "2010 2018")) %>%
  select(cycle,proAnti,activity,aggSpend) %>%
  arrange(cycle,proAnti,activity)

# ggplot it:

proAntiByInetOutreach.ggplot <- ggplot(as.data.frame(a),
       aes(y = aggSpend,
           axis1 = cycle, axis3 = proAnti, axis2 = activity)) +
  #geom_alluvium(aes(fill =cycle))+ #, width = 0, knot.pos = 0, reverse = FALSE) +
  geom_alluvium(aes(fill =cycle), width = 1/12, knot.pos = 1/6, reverse = FALSE, show.legend = TRUE) +
  guides(fill = FALSE) +
  geom_stratum(width = 1/5, reverse = FALSE) +
  #geom_text(stat = "stratum", vjust="inward",label.strata = TRUE, reverse = FALSE) +
  geom_text(stat = "stratum", vjust="center",label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata=TRUE,fill="green",vjust="center") +
  #geom_text(stat = "stratum", label.strata = TRUE, reverse = FALSE) +
  #geom_label(stat = "stratum", label.strata = TRUE,vjust="inward") +
  #scale_x_discrete(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  scale_x_continuous(breaks = 1:3, labels = c("Election Cycle","Means",  "Pro/Anti")) +
  #scale_y_continuous(name="total spent")+
  #scale_y_continuous(trans = "log10",name="total spent",limits=NULL) +
  #coord_trans(y="log10")+
  #scale_y_log10(name="log total spent",breaks = 1e+100*c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09), labels = c(2e+03,2e+04,2e+05,2e+06,2e+07,2e+08,2e+09)) +
  scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", scientific_format())) +
  #scale_y_log10(name="log total spent",breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
  #coord_flip() +
  theme_economist()+
  ggtitle("Money Flows for U.S. Elections By Internet Outreach                                                                     2010-2020")

proAntiByInetOutreach.ggplot

Chunks for surveying and assessing expenditures:

Can we impute party affiliations?

affils <- expenditure %>%
  select(spe_id,ele_type,cand_pty_aff,sup_opp) %>%
  group_by(spe_id,ele_type,sup_opp,cand_pty_aff ) %>%
  summarize(ct=n())

naAffils<-affils %>% filter(is.na(cand_pty_aff)) %>% group_by(spe_id) %>% summarise()

affils<-affils %>% filter(spe_id %in% naAffils$spe_id) %>% select(everything())

oneOrMoreAffils <- affils %>% filter(!is.na(cand_pty_aff)) %>% group_by(spe_id) %>% summarise()

affils<-affils %>% filter(spe_id %in% oneOrMoreAffils$spe_id ) %>% select(everything())

nasImputable<- affils %>% filter(is.na(cand_pty_aff)) %>% group_by(sup_opp) %>% summarise(sum(ct))
nasSum<-expenditure %>% select(everything()) %>%  filter(is.na(cand_pty_aff)) %>% group_by(sup_opp) %>% summarise(ct=n())

greps for consolidating tags:

b<-grep("salary|benefits|staff|mileage|gas|lodging",a$tags,value = TRUE)
b<-grep("(^| )mail|usps|postage|zip",a$tags,value = TRUE)
b<-grep("print|liter|hanger|brochure|sticker|leafl|flier|pamphlet",a$tags,value = TRUE)
b<-sample(grep("[A-Z]",a$tags,value = TRUE,perl = TRUE,invert = TRUE),1000)
b<-grep("event|tour|field",a$tags,value = TRUE)
b<-grep("[A-Z]",a$tags,value = TRUE,perl = TRUE,invert = TRUE)

#b<-grep("",a$tags,value = TRUE)
view(b)

more greps

b<-grep("internet|digital|onlin|web",a$tags,value = TRUE)
b<-grep("print",a$tags,value = TRUE)
b<-grep("survey|research|poll( |ing)|data",a$tags,value = TRUE)
b<-grep("canv|door[ ]*to",a$tags,value = TRUE)
b<-grep("[A-Z]",a$tags,value = TRUE)
b<-grep("phone|calls",a$tags,value = TRUE)

#b<-grep("",a$tags,value = TRUE)
view(b)

yet more greps of tags

b<-grep("[^e]mail",a$tags,value = TRUE)
b<-grep(" e mail",a$tags,value = TRUE,perl = TRUE)
b<-grep(" e mail",a$tags,value = TRUE)
b<-grep("Text",a$tags,value = TRUE)
b<-grep("(^|[^O]*)[tT][vV]",a$tags,value = TRUE)
b<-grep("newsp|magaz|media|journal|print ad",a$tags,value = TRUE)
b<-grep("social n|fb|twitter|facebook|instagram|snapc",a$tags,value = TRUE)
b<-grep("( |^)search|google",a$tags,value = TRUE)
view(b)

look for NA party affiliation

 b<- a %>%
 select(cand_id,cand_name,cand_pty_aff,activity) %>%
 filter(is.na(cand_pty_aff) & !is.na(cand_name)) %>%
  group_by(cand_name) %>%
  summarise(ct=n()) %>%
  arrange(desc(ct))

f<-function(a) a %in% b$cand_name

c<- a %>%
  select(cand_id,cand_name,cand_pty_aff,activity) %>%
  filter( cand_name %in% b$cand_name) %>%
  group_by(cand_name, cand_pty_aff) %>%
  summarise(ct=n())
  

#c<-sort(unique(b$cand_name))
 view(c)

Check that name headers of each csv file match

headers <- as_tibble(data.frame(matrix(nrow=0,ncol=50)))
colnames(headers)<-paste("field",1:50,sep="")

for ( i in urls ) {
  a<-read.csv(i,nrows=1, header = FALSE)
  a[1,(length(a)+1):50]<-NA 
  names(a)=paste("field",1:50,sep="")
  
  headers<-rbind(headers,a)
}
summary(headers) #  Summary above shows all column headers identical in all urls

Using MD5 checksums, look for duplicate rows: None found

require(digest)

md5<-function(a) {
  data.frame(md5=digest(a,algo="md5",serialize = FALSE),stringsAsFactors = FALSE)
}

cksums<- expenditure %>%
  rowwise() %>%
  do(md5(paste(.,collapse = " ")))

tally cksums, isolate those with more than two occurrences: None Found

ckSumCount <- cksums %>%
  group_by(md5) %>%
  summarise(ct=n()) %>%
  filter(ct>1)

cat(paste("\n",nrow(ckSumCount)," duplicate rows found", collapse=" "))

Look for alternative to rowwise(), do() for walking through rows

a<-ungroup(expenditure[1:100,])
c<-a[1,]
c<-c[-1,]

prt<-function(a) {
  c<<-rbind(c,a)
  TRUE
}

b<- expenditure %>%
  filter(is.na(cand_pty_aff)) %>%
  mutate(prt(as.data.frame(.)))

Scrapbook for applying candidate party affiliation imputation

  speId<-"C00000935"; suppOpp<-"o"; eleType<-"G" 
  
spenderFreq<- spesWpartyAffiliation %>%
  filter(spe_id==speId, sup_opp==suppOpp, ele_type==eleType) %>%
  group_by(cand_pty_aff) %>%
  summarise(ct=sum(ct)) %>%
  arrange(ct)

browse spenders, payees, money

a<-expenditure %>% 
  select(exp_amo,pay,spe_nam,sup_opp,cand_pty_aff) %>% 
  group_by(pay,sup_opp,cand_pty_aff) %>% 
  summarise(total=sum(exp_amo)) %>% 
  arrange(desc(total),pay,sup_opp,cand_pty_aff)

view(a)

supOppRecode<-qw("Against For")
names(supOppRecode)<-qw("o s")

electionRoundRecode<-qw("other other general other other primary \"run off\" special other")
names(electionRoundRecode)<-qw("2 C G O other P R S X")

a<-expenditure %>%
  select(fec_election_yr,ele_type,sup_opp,exp_amo) %>%
  mutate(sup_opp=supOppRecode[sup_opp]) %>%
  mutate(ele_type=electionRoundRecode[ele_type]) 
  #select(-ele_type,-sup_opp)
    
  
a<-a %>%
  group_by(fec_election_yr,ele_type,sup_opp) %>%
  summarise(count=nWc(n(),0),
            aggOrder=sum(exp_amo),
            aggSpend=nWc(sum(exp_amo)/1000,0),
            meanSpend=nWc(mean(exp_amo),0),
            sdSpend=nWc(sd(exp_amo),0)) %>%
  arrange(desc(fec_election_yr)) %>%
  select(fec_election_yr,ele_type,sup_opp,count,aggSpend,meanSpend,sdSpend)


names(a)<-qw("\"Election Cycle\" Round For/Against  count   \"aggSpend ($K)\"   meanSpend  sdSpend")

panderOptions('table.split.table', Inf)
  set.caption("Summary of PAC Expenditures by Election Round 2010-2020\n(constant 2019 dollars)")
  summaryPACbyElectionRound.pander<-pander_return(a, style = 'rmarkdown')

  cat(summaryPACbyElectionRound.pander, sep="\n")

[1] “| Election Cycle | Round | For/Against | count | aggSpend ($K) | meanSpend | sdSpend ||:————–:|:——-:|:———–:|:——:|:————-:|:———:|:——-:|| 2020 | general | Against | 51 | 278 | 5,454 | 9,886 || 2020 | general | For | 1,014 | 9,903 | 9,767 | 17,483 || 2020 | other | Against | 26 | 81 | 3,112 | 3,211 || 2020 | other | For | 167 | 2,987 | 17,884 | 59,240 || 2020 | primary | Against | 1,865 | 6,614 | 3,547 | 41,404 || 2020 | primary | For | 1,372 | 5,255 | 3,830 | 20,199 || 2020 | run off | Against | 7 | 285 | 40,767 | 45,707 || 2020 | run off | For | 5 | 42 | 8,308 | 8,548 || 2020 | special | Against | 186 | 5,855 | 31,480 | 96,821 || 2020 | special | For | 201 | 2,407 | 11,977 | 35,223 || 2018 | general | Against | 14,379 | 733,458 | 51,009 | 186,954 || 2018 | general | For | 23,251 | 206,712 | 8,890 | 57,299 || 2018 | other | Against | 513 | 6,856 | 13,365 | 58,718 || 2018 | other | For | 1,680 | 6,743 | 4,014 | 21,871 || 2018 | primary | Against | 2,939 | 56,368 | 19,179 | 89,028 || 2018 | primary | For | 4,217 | 64,726 | 15,349 | 74,312 || 2018 | run off | Against | 483 | 12,044 | 24,935 | 84,582 || 2018 | run off | For | 550 | 6,075 | 11,045 | 30,670 || 2018 | special | Against | 858 | 28,359 | 33,053 | 104,413 || 2018 | special | For | 934 | 11,360 | 12,162 | 41,020 || 2016 | general | Against | 69,523 | 944,632 | 13,587 | 125,793 || 2016 | general | For | 86,962 | 228,646 | 2,629 | 39,993 || 2016 | other | Against | 31 | 1,455 | 46,949 | 188,834 || 2016 | other | For | 180 | 1,585 | 8,808 | 20,762 || 2016 | primary | Against | 4,038 | 119,120 | 29,500 | 120,307 || 2016 | primary | For | 11,306 | 268,854 | 23,780 | 150,640 || 2016 | run off | Against | 173 | 2,501 | 14,459 | 46,574 || 2016 | run off | For | 206 | 922 | 4,477 | 17,753 || 2016 | special | Against | 4 | 1 | 361 | 484 || 2016 | special | For | 44 | 46 | 1,054 | 2,084 || 2014 | general | Against | 40,235 | 504,614 | 12,542 | 75,207 || 2014 | general | For | 24,257 | 111,841 | 4,611 | 34,631 || 2014 | other | Against | 167 | 6,523 | 39,062 | 106,886 || 2014 | other | For | 620 | 5,769 | 9,304 | 29,354 || 2014 | primary | Against | 1,945 | 50,132 | 25,775 | 74,947 || 2014 | primary | For | 3,386 | 33,860 | 10,000 | 35,651 || 2014 | run off | Against | 282 | 7,045 | 24,983 | 91,195 || 2014 | run off | For | 549 | 1,527 | 2,782 | 7,796 || 2014 | special | Against | 472 | 10,448 | 22,135 | 70,743 || 2014 | special | For | 513 | 5,143 | 10,025 | 48,684 || 2012 | general | Against | 51,553 | 903,675 | 17,529 | 202,616 || 2012 | general | For | 51,536 | 178,875 | 3,471 | 51,792 || 2012 | other | Against | 268 | 36,996 | 138,044 | 540,450 || 2012 | other | For | 177 | 6,834 | 38,609 | 99,930 || 2012 | primary | Against | 2,277 | 78,750 | 34,585 | 170,440 || 2012 | primary | For | 3,943 | 50,381 | 12,777 | 65,992 || 2012 | run off | Against | 133 | 7,435 | 55,899 | 167,544 || 2012 | run off | For | 234 | 3,012 | 12,871 | 43,206 || 2012 | special | Against | 125 | 4,377 | 35,018 | 64,050 || 2012 | special | For | 100 | 1,494 | 14,943 | 38,417 || 2010 | general | Against | 9,776 | 330,396 | 33,797 | 109,317 || 2010 | general | For | 26,162 | 60,252 | 2,303 | 19,627 || 2010 | other | Against | 19 | 272 | 14,335 | 30,017 || 2010 | other | For | 93 | 2,205 | 23,714 | 59,487 || 2010 | primary | Against | 471 | 3,239 | 6,876 | 21,778 || 2010 | primary | For | 3,769 | 8,855 | 2,350 | 14,903 || 2010 | run off | Against | 24 | 151 | 6,286 | 7,574 || 2010 | run off | For | 946 | 1,501 | 1,586 | 17,447 || 2010 | special | Against | 194 | 5,894 | 30,379 | 55,885 || 2010 | special | For | 381 | 6,537 | 17,158 | 48,948 |: Summary of PAC Expenditures by Election Round 2010-2020(constant 2019 dollars)” attr(,“class”) [1] “knit_asis” attr(,“knit_cacheable”) [1] NA

alluvialsFECindependentExpenditures

Steve Dutky McData110 Project2

11/2/2019

Purpose: Describe the Money Flows of Independent Political Campaigns

Expenditure Data Items: Cleaning and Preparation

Load and display pre-processed tables and plots from gitHub:

Display the pander tables:

The above table describes the data comprising expenditure as downloaded from the FEC.

The first alluvial plot:

utility functions which may hold interest or help to others:

load libraries

Urls to FEC’s PAC independent expenditures

Combine all election years

Tally the NA’s and unique values in the dataset as downloaded:

Take a 50,000 row sample of expenditure so that knitting can finish this November

Handle the simple NA’s: replace, mutate or drop row:

convert exp_date [currently dd-mmm-yy ] into standard date format via lubridate/parse

Convert to constant 2019 dollars

we need to construct a linear model to predict inflation for 2019

Recode expenditure purpose into categorical tags

Extract activities from tags

Create activityDistribution for apportioning expenditure amounts (exp_amo) over multiple activities

Functions for apportioning expenditures over multiple activities weighted by the aggregate expenditures of single activities

Allocate expenditure amounts for rows containing multiple activities

Build functions and indices for imputing political affiliations where missing

Apply candidate party affiliation imputation

Bind imputed affiliation rows to expenditure, drop rows where expenditure$can_pty_aff has NA’s

Combine activities into groups:

At this point the sampled data are as clean as they are going to get. And, now, fresh from the oven, let’s get the complete cleaned dataset.

Now let’s see how money was spent:

And what were the targets?

Lining up an alluvial plot #1:

And now incorporating activityGroups

Lining up an alluvial plot #2:

Chunks for surveying and assessing expenditures:

Can we impute party affiliations?

greps for consolidating tags:

more greps

yet more greps of tags

look for NA party affiliation

Check that name headers of each csv file match

Using MD5 checksums, look for duplicate rows: None found

tally cksums, isolate those with more than two occurrences: None Found

Look for alternative to rowwise(), do() for walking through rows

Scrapbook for applying candidate party affiliation imputation

browse spenders, payees, money