Project 1: Factorial Design Experiment

1. Setting

This is a factorial design experiment, in which multiple factors are varied at the same time. It uses a fixed-effects model; thus, the aim is to determine the effect of the chosen levels of four categorical variables. No inferences will be made about the wider population in this experiment. The experiment uses candidate disbursement records for the 2016 presidential candidates to determine whether candidate, state, type of disbursement, and year have an effect on disbursement amount. The null hypothesis is that disbursement amount does not differ across the levels of each factor; under the null hypothesis, observed variation is due to randomization alone. The dataset was chosen from the list of 100 interesting datasets for Statistics: http://www.fec.gov/disclosurep/PDownload.do

#Load the CSV data using the utils package
library(utils)
setwd("C:/Users/Alexis/Documents/R/win-library/3.3")
election_data <- read.csv("Project1_PresidentialElectionData3.csv", header = TRUE)

#Show first and last ten rows of data table
head(election_data, 10)

##                  cand_nm    disb_dt recipient_st
## 1          Walker, Scott  7/17/2014           IA
## 2  Webb, James Henry Jr. 11/25/2014           VA
## 3  Webb, James Henry Jr. 11/25/2014           VA
## 4  Webb, James Henry Jr.  12/1/2014           DC
## 5  Webb, James Henry Jr.  12/1/2014           VA
## 6  Webb, James Henry Jr. 12/22/2014           VA
## 7  Webb, James Henry Jr.       2015           VA
## 8        Kasich, John R.       2015           TX
## 9        Kasich, John R.       2015           TX
## 10       Kasich, John R.       2015           TX
##                                  disb_desc disb_amt   cmte_id   cand_id
## 1  HAHN 9/9 REIMBURSEMENT: MEETING EXPENSE   228.83 C00580480 P60006046
## 2                                   SALARY  1000.00 C00581215 P60008885
## 3                ADMINISTRATIVE CONSULTING  1000.00 C00581215 P60008885
## 4                        DATABASE SERVICES   750.00 C00581215 P60008885
## 5                            MERCHANT FEES  2923.86 C00581215 P60008885
## 6                ADMINISTRATIVE CONSULTING  1000.00 C00581215 P60008885
## 7                            MERCHANT FEES  1755.78 C00581215 P60008885
## 8                                  AIRFARE   278.10 C00581876 P60003670
## 9                                  AIRFARE   163.10 C00581876 P60003670
## 10                                 AIRFARE   269.60 C00581876 P60003670
##              recipient_nm recipient_city recipient_zip memo_cd memo_text
## 1                  HY VEE     DES MOINES         50266       X          
## 2            STANLEY, JOE       CALLAWAY     240673414                  
## 3               WEBB, AMY          BURKE     220152949                  
## 4                 NGP VAN     WASHINGTON     200055006                  
## 5  SAGE PAYMENT SOLUTIONS         RESTON     201905858                  
## 6               WEBB, AMY          BURKE     220152949                  
## 7  SAGE PAYMENT SOLUTIONS         RESTON     201905858                  
## 8       AMERICAN AIRLINES     FORT WORTH     761552605       X          
## 9       AMERICAN AIRLINES     FORT WORTH     761552605       X          
## 10      AMERICAN AIRLINES     FORT WORTH     761552605       X          
##    form_tp file_num              tran_id election_tp
## 1     SB23  1057127            SB23.5056       P2016
## 2     SB23  1057535 B8DB5C9B769244633A1F       P2016
## 3     SB23  1057535 B2116ED8A6E7443ACB97       P2016
## 4     SB23  1057535 BD81F6F22406D40CAB7B       P2016
## 5     SB23  1057535 B48B71AFC20D54D6CBFB       P2016
## 6     SB23  1057535 B769F25E80F014565BC9       P2016
## 7     SB23  1057535 B574650C0BC614DC194D       P2016
## 8     SB23  1070348 B42F15A0C9670493A8AA       P2016
## 9     SB23  1070348 B7A74D9DB5DDE439284F       P2016
## 10    SB23  1070348 B1EC2F9CAFEF048D49E3       P2016

tail(election_data, 10)

##              cand_nm disb_dt recipient_st
## 219570 Trump, Donald    2016           WA
## 219571 Trump, Donald    2016           WA
## 219572 Trump, Donald    2016           WA
## 219573 Trump, Donald    2016           WA
## 219574 Trump, Donald    2016           WA
## 219575 Trump, Donald    2016           WA
## 219576 Trump, Donald    2016           WI
## 219577 Trump, Donald    2016           WV
## 219578 Walker, Scott    2016           DC
## 219579 Trump, Donald    2015           AL
##                                       disb_desc disb_amt   cmte_id
## 219570                 IN-KIND: OFFICE SUPPLIES   518.20 C00580100
## 219571                         FIELD CONSULTING 10500.00 C00580100
## 219572                    EVENT STAGING EXPENSE   774.17 C00580100
## 219573                         FIELD CONSULTING 10500.00 C00580100
## 219574 TRAVEL EXPENSE REIMBURSEMENT: ITEMIZATIO   668.83 C00580100
## 219575          IN-KIND: MEETING EXPENSE: MEALS   234.24 C00580100
## 219576                         FIELD CONSULTING 10000.00 C00580100
## 219577                            IN-KIND: RENT  1500.00 C00580100
## 219578              CREDIT CARD PROCESSING FEES   435.00 C00580480
## 219579    TRAVEL: LODGING [TUCKER: SB23.289541]   528.19 C00580100
##          cand_id               recipient_nm recipient_city recipient_zip
## 219570 P80001571                BENTON, DON      VANCOUVER         98668
## 219571 P80001571           THE BENTON GROUP      VANCOUVER         98668
## 219572 P80001571 SALISH CONSULTANT SERVICES    WOODINVILLE         98077
## 219573 P80001571                BENTON, DON      VANCOUVER         98668
## 219574 P80001571                BENTON, DON      VANCOUVER         98668
## 219575 P80001571                BENTON, DON      VANCOUVER         98668
## 219576 P80001571             TROVATO, VINCE       WAUKESHA         53188
## 219577 P80001571             ROGERS, ROBERT     CLARKSBURG         26301
## 219578 P60006046                     STRIPE     WASHINGTON         20045
## 219579 P80001571                 DOUBLETREE     BIRMINGHAM         35205
##        memo_cd memo_text form_tp file_num      tran_id election_tp
## 219570                      SB23  1100920 SB23.2143059       G2016
## 219571                      SB23  1100920 SB23.2141371       G2016
## 219572                      SB23  1100920 SB23.2140787       G2016
## 219573                      SB23  1100920 SB23.2141327       G2016
## 219574                      SB23  1100920 SB23.2141354       G2016
## 219575                      SB23  1100920 SB23.2143056       G2016
## 219576                      SB23  1100920 SB23.2141385       G2016
## 219577                      SB23  1100920 SB23.2225539       G2016
## 219578                      SB23  1100469  SB23.150087       P2016
## 219579       X              SB23  1051572  SB23.289887       P2016

The four factors (independent variables) being studied are presidential candidate, state where the money was disbursed (recipient state), description of the disbursement, and year of the disbursement. disb_dt was changed from a full date (m/d/y) to the year only, so none of the independent variables is continuous.
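The conversion of disb_dt to a year is not shown in the code in this report. A minimal sketch of one way it could be done, assuming the raw values are strings such as "7/17/2014" that end in a four-digit year; the helper name `to_year` is hypothetical and not part of the original analysis:

```r
# Hypothetical helper (an assumption, not from the original analysis):
# keep only the trailing four-digit year of a date string.
# Values that are already a bare year pass through unchanged.
to_year <- function(x) {
  sub(".*([0-9]{4})$", "\\1", as.character(x))
}

dates <- c("7/17/2014", "11/25/2014", "2015")
years <- to_year(dates)
print(years)
```

The result is a character vector, which can then be filtered or converted with `factor()` in the same way as the other categorical columns.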

cand_nm levels: Hillary Clinton, Bernie Sanders, Donald Trump, Ted Cruz
recipient_st levels: DC, CA, NY, MA, KY, UT, GA
disb_desc levels: in-kind contribution, office supplies, travel, online advertising
disb_dt levels: 2015, 2016

The response variable (dependent variable) is disb_amt (disbursement amount).

Select the final two Democrats and the final two Republicans, then select the recipient states, disbursement descriptions, and disbursement dates (simplified to year) of interest.

election_data_subset <- subset(election_data, cand_nm %in% c("Clinton, Hillary Rodham", "Sanders, Bernard", "Cruz, Ted", "Trump, Donald"), select = cand_nm:disb_amt)
election_data_subset2 <- subset(election_data_subset, recipient_st %in% c("DC", "MA", "NY", "CA", "KY", "GA", "UT"))
election_data_subset3 <- subset(election_data_subset2, disb_desc %in% c("IN-KIND CONTRIBUTION", "OFFICE SUPPLIES", "TRAVEL", "ONLINE ADVERTISING"))
election_data_subset4 <- subset(election_data_subset3, disb_dt %in% c("2015", "2016"))

#Show first and last ten rows of new subsetted data table
head(election_data_subset4, 10)

##                     cand_nm disb_dt recipient_st disb_desc disb_amt
## 14  Clinton, Hillary Rodham    2015           NY    TRAVEL   903.70
## 15  Clinton, Hillary Rodham    2015           NY    TRAVEL  1320.39
## 17  Clinton, Hillary Rodham    2015           NY    TRAVEL    51.15
## 44  Clinton, Hillary Rodham    2015           NY    TRAVEL  4032.00
## 46  Clinton, Hillary Rodham    2015           NY    TRAVEL   242.33
## 68  Clinton, Hillary Rodham    2015           CA    TRAVEL   320.82
## 75  Clinton, Hillary Rodham    2015           CA    TRAVEL   185.00
## 118 Clinton, Hillary Rodham    2015           NY    TRAVEL   400.10
## 119 Clinton, Hillary Rodham    2015           NY    TRAVEL  2287.60
## 120 Clinton, Hillary Rodham    2015           NY    TRAVEL  1173.70

tail(election_data_subset4, 10)

##                        cand_nm disb_dt recipient_st       disb_desc
## 217386 Clinton, Hillary Rodham    2016           GA          TRAVEL
## 217395 Clinton, Hillary Rodham    2016           NY          TRAVEL
## 217418 Clinton, Hillary Rodham    2016           DC          TRAVEL
## 217419 Clinton, Hillary Rodham    2016           DC          TRAVEL
## 217420 Clinton, Hillary Rodham    2016           DC          TRAVEL
## 217423 Clinton, Hillary Rodham    2016           GA          TRAVEL
## 217424 Clinton, Hillary Rodham    2016           GA          TRAVEL
## 217462               Cruz, Ted    2016           MA OFFICE SUPPLIES
## 217577           Trump, Donald    2016           MA OFFICE SUPPLIES
## 219511           Trump, Donald    2016           NY OFFICE SUPPLIES
##        disb_amt
## 217386   496.33
## 217395   595.10
## 217418   542.70
## 217419  1090.44
## 217420   917.03
## 217423   201.60
## 217424   497.60
## 217462   121.22
## 217577  2062.14
## 219511  1633.13

sample_size = nrow(election_data_subset4)
sample_size

## [1] 9662

#The data has five columns, 1 for each of the factors and 1 for the response variable. For the subsetted data, the factors are all categorical variables with a set number of levels. The data has 9662 rows.

2. (Experimental) Design

The main effect will be estimated for each factor, along with the interaction effects for all two-factor interactions.

Randomization:

As there was no control over randomization during data collection, the row order of the data will be shuffled (sampled without replacement) before analysis. Factorial design experiments assume randomization in the selection of experimental units, in the assignment to treatments, and in the experimental run order.

election_data_randomized = election_data_subset4[sample(1:nrow(election_data_subset4), size = sample_size, replace = FALSE),]

#Show first and last ten rows of randomized data table
head(election_data_randomized, 10)

##                        cand_nm disb_dt recipient_st       disb_desc
## 90968         Sanders, Bernard    2015           MA OFFICE SUPPLIES
## 80751  Clinton, Hillary Rodham    2015           GA          TRAVEL
## 164349 Clinton, Hillary Rodham    2016           NY          TRAVEL
## 115606 Clinton, Hillary Rodham    2016           GA OFFICE SUPPLIES
## 207510           Trump, Donald    2016           NY OFFICE SUPPLIES
## 54510  Clinton, Hillary Rodham    2015           NY          TRAVEL
## 105706 Clinton, Hillary Rodham    2016           NY          TRAVEL
## 216630           Trump, Donald    2016           NY OFFICE SUPPLIES
## 123123 Clinton, Hillary Rodham    2016           UT          TRAVEL
## 160497 Clinton, Hillary Rodham    2016           GA          TRAVEL
##        disb_amt
## 90968   3634.34
## 80751    342.10
## 164349 19260.00
## 115606    25.00
## 207510   445.93
## 54510     18.25
## 105706    17.25
## 216630   955.56
## 123123   134.47
## 160497    25.00

tail(election_data_randomized, 10)

##                        cand_nm disb_dt recipient_st       disb_desc
## 137452 Clinton, Hillary Rodham    2016           NY          TRAVEL
## 55678  Clinton, Hillary Rodham    2015           CA          TRAVEL
## 198704 Clinton, Hillary Rodham    2016           CA          TRAVEL
## 157851 Clinton, Hillary Rodham    2016           GA          TRAVEL
## 108397        Sanders, Bernard    2016           CA OFFICE SUPPLIES
## 103969 Clinton, Hillary Rodham    2016           GA          TRAVEL
## 33950  Clinton, Hillary Rodham    2015           NY          TRAVEL
## 180694 Clinton, Hillary Rodham    2016           CA          TRAVEL
## 106126 Clinton, Hillary Rodham    2016           NY          TRAVEL
## 156279 Clinton, Hillary Rodham    2016           NY          TRAVEL
##        disb_amt
## 137452  1760.40
## 55678    481.60
## 198704     9.00
## 157851   371.80
## 108397   475.94
## 103969   415.10
## 33950    248.10
## 180694   239.39
## 106126    83.10
## 156279    49.16
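The shuffle above draws a new permutation every time it is run, so the row order shown will differ between runs. A minimal sketch of a reproducible version on toy data, assuming a fixed seed is acceptable; the data frame `toy` and the seed value are illustrative assumptions, not part of the original analysis:

```r
# Toy stand-in for election_data_subset4 (values are illustrative only)
toy <- data.frame(id = 1:6, disb_amt = c(10, 20, 30, 40, 50, 60))

set.seed(2016)                         # arbitrary seed, chosen for illustration
shuffled <- toy[sample(nrow(toy)), ]   # sample(n) returns a permutation of 1:n

# Shuffling reorders the rows but keeps every observation exactly once
print(shuffled)
```

Because sampling is without replacement, the shuffled table contains the same rows as the input, only in a different order.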

Replication:

As all disbursements by each candidate were made on different days in different states, there are no repeated measurements. However, as the candidates purchase the same items on different days in different states, those disbursements can be counted as replicates.
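One way to see how many replicates each treatment combination has is to cross-tabulate the factors. The sketch below uses a small made-up data frame whose column names follow the subset above; the values are assumptions for illustration only:

```r
# Toy stand-in data; column names follow the subset used in this report
toy <- data.frame(
  cand_nm      = c("A", "A", "A", "B", "B", "B"),
  recipient_st = c("NY", "NY", "CA", "NY", "CA", "CA"),
  disb_amt     = c(100, 150, 75, 20, 30, 45)
)

# Number of disbursements (replicates) in each candidate-by-state cell
cell_counts <- table(toy$cand_nm, toy$recipient_st)
print(cell_counts)

# Count the factor combinations that have no observations at all
empty_cells <- sum(cell_counts == 0)
print(empty_cells)
```

Unequal or empty cells indicate an unbalanced design, which is worth knowing before interpreting interaction terms.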

Blocking:

Blocking is used to reduce the variability of a sample. Typically, there are nuisance factors in a given experiment that are suspected to have an effect on the response variable but are not among the main factors of interest. To block for a nuisance factor, it is held constant during the experiment. As the data for this study were selected after collection was complete and the original data included many other variables, the data were blocked to include only the four factors listed above.

3. (Statistical) Analysis

Exploratory Boxplots

fifteen_data = na.omit(subset(election_data_randomized, disb_dt == "2015"))
fifteen_data$cand_nm <- factor(fifteen_data$cand_nm)
fifteen_data$recipient_st <- factor(fifteen_data$recipient_st)
fifteen_data$disb_desc <- factor(fifteen_data$disb_desc)
fifteen_data$disb_dt <- factor(fifteen_data$disb_dt)

#Boxplots examining levels of each factor for exploratory analysis for 2015
boxplotcand <- boxplot(fifteen_data$disb_amt~fifteen_data$cand_nm, xlab = "Candidate", ylab = "Disbursement Amount", main = "Candidate Effect on Disbursement in 2015")

boxplotstate <- boxplot(fifteen_data$disb_amt~fifteen_data$recipient_st, xlab = "State", ylab = "Disbursement Amount", main = "State Effect on Disbursement in 2015")

boxplottype <- boxplot(fifteen_data$disb_amt~fifteen_data$disb_desc, xlab = "Disbursement Type", ylab = "Disbursement Amount", main = "Disbursement Type Effect on Disbursement in 2015")

sixteen_data = na.omit(subset(election_data_randomized, disb_dt == "2016"))
sixteen_data$cand_nm <- factor(sixteen_data$cand_nm)
sixteen_data$recipient_st <- factor(sixteen_data$recipient_st)
sixteen_data$disb_desc <- factor(sixteen_data$disb_desc)
sixteen_data$disb_dt <- factor(sixteen_data$disb_dt)

#Boxplots examining levels of each factor for exploratory analysis for 2016
boxplotcand <- boxplot(sixteen_data$disb_amt~sixteen_data$cand_nm, xlab = "Candidate", ylab = "Disbursement Amount", main = "Candidate Effect on Disbursement in 2016")

boxplotstate <- boxplot(sixteen_data$disb_amt~sixteen_data$recipient_st, xlab = "State", ylab = "Disbursement Amount", main = "State Effect on Disbursement in 2016")

boxplottype <- boxplot(sixteen_data$disb_amt~sixteen_data$disb_desc, xlab = "Disbursement Type", ylab = "Disbursement Amount", main = "Disbursement Type Effect on Disbursement in 2016")

#Calculate the main effect of candidate
HC_data = subset(election_data_randomized, cand_nm == "Clinton, Hillary Rodham")
max_HC = max(HC_data$disb_amt)
max_HC

## [1] 1567000

min_HC = min(HC_data$disb_amt)
min_HC

## [1] -1769.32

mean_HC = mean(HC_data$disb_amt)
mean_HC

## [1] 3086.88

median_HC = median(HC_data$disb_amt)
median_HC

## [1] 132.55

DT_data = subset(election_data_randomized, cand_nm == "Trump, Donald")
max_DT = max(DT_data$disb_amt)
max_DT

## [1] 3e+05

min_DT = min(DT_data$disb_amt)
min_DT

## [1] 3.27

mean_DT = mean(DT_data$disb_amt)
mean_DT

## [1] 9126.512

median_DT = median(DT_data$disb_amt)
median_DT

## [1] 1048

TC_data = subset(election_data_randomized, cand_nm == "Cruz, Ted")
max_TC = max(TC_data$disb_amt)
max_TC

## [1] 98360.26

min_TC = min(TC_data$disb_amt)
min_TC

## [1] 2.98

mean_TC = mean(TC_data$disb_amt)
mean_TC

## [1] 870.9839

median_TC = median(TC_data$disb_amt)
median_TC

## [1] 136.93

BS_data = subset(election_data_randomized, cand_nm == "Sanders, Bernard")
max_BS = max(BS_data$disb_amt)
max_BS

## [1] 13046.97

min_BS = min(BS_data$disb_amt)
min_BS

## [1] -1388.91

mean_BS = mean(BS_data$disb_amt)
mean_BS

## [1] 904.3147

median_BS = median(BS_data$disb_amt)
median_BS

## [1] 374.69

max_cand = max(median_HC, median_DT, median_BS, median_TC)
max_cand

## [1] 1048

min_cand = min(median_HC, median_DT, median_BS, median_TC)
min_cand

## [1] 132.55

Based on the maximum and minimum, Hillary Clinton had both the highest and the lowest single disbursement, so the extremes do not separate the candidates. Thus, means were used as a second assessment. As the mean is strongly skewed by outliers, the median was also used, and it determined which levels would be used to calculate the main effect. Donald Trump had the highest median disbursement, while Hillary Clinton had the lowest median disbursement. Thus, these two levels were used to calculate the main effect of candidate.

main_effect_candidate = median_DT - median_HC
main_effect_candidate

## [1] 915.45
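The per-level summaries above can also be computed in one step. The sketch below shows the same calculation (largest level median minus smallest level median) with `tapply`; the helper name `median_range_effect` and the toy data are assumptions for illustration, while the four medians in `reported` are copied from the output above:

```r
# Hypothetical helper: difference between the largest and smallest
# per-level medians of y, grouped by the factor f
median_range_effect <- function(y, f) {
  level_medians <- tapply(y, f, median)
  max(level_medians) - min(level_medians)
}

# Toy data (illustrative values only); level medians are 20, 200 and 6
amt  <- c(10, 20, 30, 100, 200, 300, 5, 6, 7)
cand <- rep(c("A", "B", "C"), each = 3)
toy_effect <- median_range_effect(amt, cand)

# Same arithmetic applied to the four candidate medians reported above
reported <- c(HC = 132.55, DT = 1048, TC = 136.93, BS = 374.69)
reported_effect <- max(reported) - min(reported)
print(c(toy = toy_effect, reported = reported_effect))
```

On the real data, the call would take the form `median_range_effect(election_data_randomized$disb_amt, election_data_randomized$cand_nm)`.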

Calculate the main effect of state

NY_data = subset(election_data_randomized, recipient_st == "NY")
max_NY = max(NY_data$disb_amt)
max_NY

## [1] 377111.2

min_NY = min(NY_data$disb_amt)
min_NY

## [1] -1769.32

mean_NY = mean(NY_data$disb_amt)
mean_NY

## [1] 3318.91

median_NY = median(NY_data$disb_amt)
median_NY

## [1] 51

MA_data = subset(election_data_randomized, recipient_st == "MA")
max_MA = max(MA_data$disb_amt)
max_MA

## [1] 18670.97

min_MA = min(MA_data$disb_amt)
min_MA

## [1] -113.4

mean_MA = mean(MA_data$disb_amt)
mean_MA

## [1] 314.0279

median_MA = median(MA_data$disb_amt)
median_MA

## [1] 55.3

CA_data = subset(election_data_randomized, recipient_st == "CA")
max_CA = max(CA_data$disb_amt)
max_CA

## [1] 3e+05

min_CA = min(CA_data$disb_amt)
min_CA

## [1] -272.55

mean_CA = mean(CA_data$disb_amt)
mean_CA

## [1] 578.0414

median_CA = median(CA_data$disb_amt)
median_CA

## [1] 82.07

GA_data = subset(election_data_randomized, recipient_st == "GA")
max_GA = max(GA_data$disb_amt)
max_GA

## [1] 31212.53

min_GA = min(GA_data$disb_amt)
min_GA

## [1] -1388.91

mean_GA = mean(GA_data$disb_amt)
mean_GA

## [1] 459.4689

median_GA = median(GA_data$disb_amt)
median_GA

## [1] 330.54

KY_data = subset(election_data_randomized, recipient_st == "KY")
max_KY = max(KY_data$disb_amt)
max_KY

## [1] 13046.97

min_KY = min(KY_data$disb_amt)
min_KY

## [1] -384.42

mean_KY = mean(KY_data$disb_amt)
mean_KY

## [1] 1347.883

median_KY = median(KY_data$disb_amt)
median_KY

## [1] 360.155

UT_data = subset(election_data_randomized, recipient_st == "UT")
max_UT = max(UT_data$disb_amt)
max_UT

## [1] 18288.98

min_UT = min(UT_data$disb_amt)
min_UT

## [1] 5.28

mean_UT = mean(UT_data$disb_amt)
mean_UT

## [1] 569.2759

median_UT = median(UT_data$disb_amt)
median_UT

## [1] 291.03

max_st = max(median_NY, median_MA, median_CA, median_GA, median_UT, median_KY)
max_st

## [1] 360.155

min_st = min(median_NY, median_MA, median_CA, median_GA, median_UT, median_KY)
min_st

## [1] 51

Based on the maximum and minimum, NY had both the highest and the lowest single disbursement among the six states summarized above (DC was not summarized). Thus, means were used as a second assessment. As the mean is strongly skewed by outliers, the median was also used, and it determined which levels would be used to calculate the main effect. KY had the highest median disbursement, while NY had the lowest median disbursement. Thus, these two levels were used to calculate the main effect of state.

main_effect_state = median_KY - median_NY
main_effect_state

## [1] 309.155

Calculate the main effect of the type of disbursement

inkind_data = na.omit(subset(election_data_randomized, disb_desc == "IN-KIND CONTRIBUTION"))
max_inkind = max(inkind_data$disb_amt)
max_inkind

## [1] 2700

min_inkind = min(inkind_data$disb_amt)
min_inkind

## [1] 50

mean_inkind = mean(inkind_data$disb_amt)
mean_inkind

## [1] 581.7833

median_inkind = median(inkind_data$disb_amt)
median_inkind

## [1] 300

office_data = na.omit(subset(election_data_randomized, disb_desc == "OFFICE SUPPLIES"))
max_office = max(office_data$disb_amt)
max_office

## [1] 31212.53

min_office = min(office_data$disb_amt)
min_office

## [1] -1388.91

mean_office = mean(office_data$disb_amt)
mean_office

## [1] 924.6212

median_office = median(office_data$disb_amt)
median_office

## [1] 118.87

travel_data = na.omit(subset(election_data_randomized, disb_desc == "TRAVEL"))
max_travel = max(travel_data$disb_amt)
max_travel

## [1] 377111.2

min_travel = min(travel_data$disb_amt)
min_travel

## [1] -1769.32

mean_travel = mean(travel_data$disb_amt)
mean_travel

## [1] 1599.595

median_travel = median(travel_data$disb_amt)
median_travel

## [1] 137.17

online_data = na.omit(subset(election_data_randomized, disb_desc == "ONLINE ADVERTISING"))
max_online = max(online_data$disb_amt)
max_online

## [1] 1567000

min_online = min(online_data$disb_amt)
min_online

## [1] 9.92

mean_online = mean(online_data$disb_amt)
mean_online

## [1] 289364.2

median_online = median(online_data$disb_amt)
median_online

## [1] 43461

max_type = max(median_travel, median_online, median_office, median_inkind)
max_type

## [1] 43461

min_type = min(median_travel, median_online, median_office, median_inkind)
min_type

## [1] 118.87

To be consistent with the other factors, the median was the final measure used to determine which levels would be used to calculate the main effect. Online advertising had the highest median disbursement, while office supplies had the lowest median disbursement. Thus, these levels were used to calculate the main effect of type of disbursement.

main_effect_type = median_online - median_office
main_effect_type

## [1] 43342.13

Calculate the main effect of year

fifteen_data = subset(election_data_randomized, disb_dt == "2015")
max_fifteen = max(fifteen_data$disb_amt)
max_fifteen

## [1] 1138063

min_fifteen= min(fifteen_data$disb_amt)
min_fifteen

## [1] -1769.32

mean_fifteen= mean(fifteen_data$disb_amt)
mean_fifteen

## [1] 2581.242

median_fifteen= median(fifteen_data$disb_amt)
median_fifteen

## [1] 126

sixteen_data = subset(election_data_randomized, disb_dt == "2016")
max_sixteen= max(sixteen_data$disb_amt)
max_sixteen

## [1] 1567000

min_sixteen= min(sixteen_data$disb_amt)
min_sixteen

## [1] -1388.91

mean_sixteen= mean(sixteen_data$disb_amt)
mean_sixteen

## [1] 2921.319

median_sixteen = median(sixteen_data$disb_amt)
median_sixteen

## [1] 143.19

max_year = max(median_fifteen, median_sixteen)
max_year

## [1] 143.19

#To be consistent with the other factors, the median was the final measure used to determine which levels would be used to calculate the main effect. Year 2016 had the highest median disbursement, while year 2015 had the lowest median disbursement. Thus, these levels were used to calculate the main effect of year.

main_effect_year = median_sixteen - median_fifteen
main_effect_year

## [1] 17.19

# All main effects
main_effect_candidate

## [1] 915.45

main_effect_state

## [1] 309.155

main_effect_type

## [1] 43342.13

main_effect_year

## [1] 17.19

# The largest main effect is due to the type of disbursement.
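As a small sketch of the comparison, the four main effects can be collected in a named vector and sorted. The values are copied from the output above; the names `main_effects`, `ranked`, and `largest` are illustrative:

```r
# Main effects as reported above (differences between extreme level medians)
main_effects <- c(candidate = 915.45, state = 309.155,
                  type = 43342.13, year = 17.19)

# Sort from largest to smallest and pick out the top factor
ranked  <- sort(main_effects, decreasing = TRUE)
largest <- names(ranked)[1]
print(ranked)
print(largest)
```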

ANOVA

Compute the analysis of variance for all main effects (me) and two-factor interactions (2fi)
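The models that follow are fit one factor or one pair of factors at a time. As an alternative sketch, a single model containing all four main effects and all six two-factor interactions can be written with the `^2` formula operator and the `data` argument of `aov`. The data below are simulated stand-ins (an assumption for illustration), so the resulting table says nothing about the election data:

```r
set.seed(1)  # arbitrary seed so the simulated data are reproducible

# Simulated stand-in data: 4 x 3 x 2 x 2 factor levels, 3 replicates per cell
toy <- expand.grid(
  cand_nm      = c("A", "B", "C", "D"),
  recipient_st = c("NY", "CA", "DC"),
  disb_desc    = c("TRAVEL", "OFFICE SUPPLIES"),
  disb_dt      = c("2015", "2016"),
  rep          = 1:3
)
toy$disb_amt <- rnorm(nrow(toy), mean = 500, sd = 100)

# (a + b + c + d)^2 expands to all main effects plus all two-factor interactions
fit <- aov(disb_amt ~ (cand_nm + recipient_st + disb_desc + disb_dt)^2,
           data = toy)
term_labels <- attr(terms(fit), "term.labels")
print(term_labels)
print(summary(fit))
```

Note that `aov` reports sequential sums of squares, so with unbalanced data the order in which terms enter the formula affects the sum of squares attributed to each term.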

#cand_nm
anova_cand <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm)
summary(anova_cand)

##                                    Df    Sum Sq   Mean Sq F value Pr(>F)
## election_data_randomized$cand_nm    3 7.592e+09 2.531e+09   2.058  0.104
## Residuals                        9658 1.188e+13 1.230e+09
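To show how the columns of this table relate, the F value is the factor mean square divided by the residual mean square, and Pr(>F) is the upper-tail probability of the F distribution with the two degrees of freedom. A short check using the rounded values printed above (so the results match the table only approximately):

```r
# Values copied from the cand_nm ANOVA table above (rounded as printed)
ms_factor <- 2.531e9
ms_resid  <- 1.230e9
df_factor <- 3
df_resid  <- 9658

# F value is the ratio of the two mean squares
f_value <- ms_factor / ms_resid

# Upper-tail probability of the F distribution gives Pr(>F)
p_value <- pf(f_value, df_factor, df_resid, lower.tail = FALSE)
print(c(F = f_value, p = p_value))
```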

# recipient_st
anova_state <- aov(election_data_randomized$disb_amt ~ election_data_randomized$recipient_st)
summary(anova_state)

##                                         Df    Sum Sq   Mean Sq F value
## election_data_randomized$recipient_st    6 9.835e+10 1.639e+10   13.43
## Residuals                             9655 1.178e+13 1.221e+09        
##                                         Pr(>F)    
## election_data_randomized$recipient_st 3.16e-15 ***
## Residuals                                         
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# disb_desc
anova_description <- aov(election_data_randomized$disb_amt ~ election_data_randomized$disb_desc)
summary(anova_description)

##                                      Df    Sum Sq   Mean Sq F value Pr(>F)
## election_data_randomized$disb_desc    3 3.464e+12 1.155e+12    1325 <2e-16
## Residuals                          9658 8.419e+12 8.717e+08               
##                                       
## election_data_randomized$disb_desc ***
## Residuals                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# disb_dt
anova_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$disb_dt)
summary(anova_date)

##                                    Df    Sum Sq  Mean Sq F value Pr(>F)
## election_data_randomized$disb_dt    1 2.510e+08 2.51e+08   0.204  0.651
## Residuals                        9660 1.188e+13 1.23e+09

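#**By one-way ANOVA, the main effects of state (p = 3.16e-15) and type of disbursement (p < 2e-16) are significant, so the variation in disbursement amount across their levels is unlikely to be due to randomization alone. Candidate (p = 0.104) and year (p = 0.651) are not significant at the 0.05 level.**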

# cand_nm and recipient_st
anova_cand_state <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm*election_data_randomized$recipient_st)
summary(anova_cand_state)

##                                                                          Df
## election_data_randomized$cand_nm                                          3
## election_data_randomized$recipient_st                                     6
## election_data_randomized$cand_nm:election_data_randomized$recipient_st   14
## Residuals                                                              9638
##                                                                           Sum Sq
## election_data_randomized$cand_nm                                       7.592e+09
## election_data_randomized$recipient_st                                  9.478e+10
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 9.028e+10
## Residuals                                                              1.169e+13
##                                                                          Mean Sq
## election_data_randomized$cand_nm                                       2.531e+09
## election_data_randomized$recipient_st                                  1.580e+10
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 6.448e+09
## Residuals                                                              1.213e+09
##                                                                        F value
## election_data_randomized$cand_nm                                         2.086
## election_data_randomized$recipient_st                                   13.023
## election_data_randomized$cand_nm:election_data_randomized$recipient_st   5.316
## Residuals                                                                     
##                                                                          Pr(>F)
## election_data_randomized$cand_nm                                         0.0998
## election_data_randomized$recipient_st                                  9.99e-15
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 3.34e-10
## Residuals                                                                      
##                                                                           
## election_data_randomized$cand_nm                                       .  
## election_data_randomized$recipient_st                                  ***
## election_data_randomized$cand_nm:election_data_randomized$recipient_st ***
## Residuals                                                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# cand_nm and disb_desc
anova_cand_description <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm*election_data_randomized$disb_desc)
summary(anova_cand_description)

##                                                                       Df
## election_data_randomized$cand_nm                                       3
## election_data_randomized$disb_desc                                     3
## election_data_randomized$cand_nm:election_data_randomized$disb_desc    3
## Residuals                                                           9652
##                                                                        Sum Sq
## election_data_randomized$cand_nm                                    7.592e+09
## election_data_randomized$disb_desc                                  3.458e+12
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 1.170e+08
## Residuals                                                           8.418e+12
##                                                                       Mean Sq
## election_data_randomized$cand_nm                                    2.531e+09
## election_data_randomized$disb_desc                                  1.153e+12
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 3.898e+07
## Residuals                                                           8.722e+08
##                                                                      F value
## election_data_randomized$cand_nm                                       2.901
## election_data_randomized$disb_desc                                  1321.447
## election_data_randomized$cand_nm:election_data_randomized$disb_desc    0.045
## Residuals                                                                   
##                                                                     Pr(>F)
## election_data_randomized$cand_nm                                    0.0335
## election_data_randomized$disb_desc                                  <2e-16
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 0.9875
## Residuals                                                                 
##                                                                        
## election_data_randomized$cand_nm                                    *  
## election_data_randomized$disb_desc                                  ***
## election_data_randomized$cand_nm:election_data_randomized$disb_desc    
## Residuals                                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# cand_nm and disb_dt
anova_cand_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm*election_data_randomized$disb_dt)
summary(anova_cand_date)

##                                                                     Df
## election_data_randomized$cand_nm                                     3
## election_data_randomized$disb_dt                                     1
## election_data_randomized$cand_nm:election_data_randomized$disb_dt    3
## Residuals                                                         9654
##                                                                      Sum Sq
## election_data_randomized$cand_nm                                  7.592e+09
## election_data_randomized$disb_dt                                  5.696e+07
## election_data_randomized$cand_nm:election_data_randomized$disb_dt 5.866e+08
## Residuals                                                         1.188e+13
##                                                                     Mean Sq
## election_data_randomized$cand_nm                                  2.531e+09
## election_data_randomized$disb_dt                                  5.696e+07
## election_data_randomized$cand_nm:election_data_randomized$disb_dt 1.955e+08
## Residuals                                                         1.230e+09
##                                                                   F value
## election_data_randomized$cand_nm                                    2.057
## election_data_randomized$disb_dt                                    0.046
## election_data_randomized$cand_nm:election_data_randomized$disb_dt   0.159
## Residuals                                                                
##                                                                   Pr(>F)
## election_data_randomized$cand_nm                                   0.104
## election_data_randomized$disb_dt                                   0.830
## election_data_randomized$cand_nm:election_data_randomized$disb_dt  0.924
## Residuals

# recipient_st and disb_desc
anova_state_description <- aov(election_data_randomized$disb_amt ~ election_data_randomized$recipient_st*election_data_randomized$disb_desc)
summary(anova_state_description)

##                                                                            Df
## election_data_randomized$recipient_st                                       6
## election_data_randomized$disb_desc                                          3
## election_data_randomized$recipient_st:election_data_randomized$disb_desc    9
## Residuals                                                                9643
##                                                                             Sum Sq
## election_data_randomized$recipient_st                                    9.835e+10
## election_data_randomized$disb_desc                                       3.401e+12
## election_data_randomized$recipient_st:election_data_randomized$disb_desc 1.008e+12
## Residuals                                                                7.376e+12
##                                                                            Mean Sq
## election_data_randomized$recipient_st                                    1.639e+10
## election_data_randomized$disb_desc                                       1.134e+12
## election_data_randomized$recipient_st:election_data_randomized$disb_desc 1.120e+11
## Residuals                                                                7.649e+08
##                                                                          F value
## election_data_randomized$recipient_st                                      21.43
## election_data_randomized$disb_desc                                       1482.14
## election_data_randomized$recipient_st:election_data_randomized$disb_desc  146.48
## Residuals                                                                       
##                                                                          Pr(>F)
## election_data_randomized$recipient_st                                    <2e-16
## election_data_randomized$disb_desc                                       <2e-16
## election_data_randomized$recipient_st:election_data_randomized$disb_desc <2e-16
## Residuals                                                                      
##                                                                             
## election_data_randomized$recipient_st                                    ***
## election_data_randomized$disb_desc                                       ***
## election_data_randomized$recipient_st:election_data_randomized$disb_desc ***
## Residuals                                                                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# recipient_st and disb_dt
anova_state_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$recipient_st*election_data_randomized$disb_dt)
summary(anova_state_date)

##                                                                          Df
## election_data_randomized$recipient_st                                     6
## election_data_randomized$disb_dt                                          1
## election_data_randomized$recipient_st:election_data_randomized$disb_dt    6
## Residuals                                                              9648
##                                                                           Sum Sq
## election_data_randomized$recipient_st                                  9.835e+10
## election_data_randomized$disb_dt                                       3.062e+08
## election_data_randomized$recipient_st:election_data_randomized$disb_dt 1.226e+10
## Residuals                                                              1.177e+13
##                                                                          Mean Sq
## election_data_randomized$recipient_st                                  1.639e+10
## election_data_randomized$disb_dt                                       3.062e+08
## election_data_randomized$recipient_st:election_data_randomized$disb_dt 2.043e+09
## Residuals                                                              1.220e+09
##                                                                        F value
## election_data_randomized$recipient_st                                   13.434
## election_data_randomized$disb_dt                                         0.251
## election_data_randomized$recipient_st:election_data_randomized$disb_dt   1.674
## Residuals                                                                     
##                                                                          Pr(>F)
## election_data_randomized$recipient_st                                  3.12e-15
## election_data_randomized$disb_dt                                          0.616
## election_data_randomized$recipient_st:election_data_randomized$disb_dt    0.123
## Residuals                                                                      
##                                                                           
## election_data_randomized$recipient_st                                  ***
## election_data_randomized$disb_dt                                          
## election_data_randomized$recipient_st:election_data_randomized$disb_dt    
## Residuals                                                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# disb_desc and disb_dt
anova_description_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$disb_desc*election_data_randomized$disb_dt)
summary(anova_description_date)

##                                                                       Df
## election_data_randomized$disb_desc                                     3
## election_data_randomized$disb_dt                                       1
## election_data_randomized$disb_desc:election_data_randomized$disb_dt    3
## Residuals                                                           9654
##                                                                        Sum Sq
## election_data_randomized$disb_desc                                  3.464e+12
## election_data_randomized$disb_dt                                    5.686e+09
## election_data_randomized$disb_desc:election_data_randomized$disb_dt 1.156e+11
## Residuals                                                           8.298e+12
##                                                                       Mean Sq
## election_data_randomized$disb_desc                                  1.155e+12
## election_data_randomized$disb_dt                                    5.686e+09
## election_data_randomized$disb_desc:election_data_randomized$disb_dt 3.852e+10
## Residuals                                                           8.595e+08
##                                                                      F value
## election_data_randomized$disb_desc                                  1343.470
## election_data_randomized$disb_dt                                       6.615
## election_data_randomized$disb_desc:election_data_randomized$disb_dt   44.818
## Residuals                                                                   
##                                                                     Pr(>F)
## election_data_randomized$disb_desc                                  <2e-16
## election_data_randomized$disb_dt                                    0.0101
## election_data_randomized$disb_desc:election_data_randomized$disb_dt <2e-16
## Residuals                                                                 
##                                                                        
## election_data_randomized$disb_desc                                  ***
## election_data_randomized$disb_dt                                    *  
## election_data_randomized$disb_desc:election_data_randomized$disb_dt ***
## Residuals                                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

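# The F statistics in the ANOVA tables above show that several terms have large F values and very small p-values, for example recipient_st:disb_desc (F = 146.48, p < 2e-16) and disb_desc:disb_dt (F = 44.818, p < 2e-16). For these terms we can reject the null hypothesis: the observed variation is larger than would be expected from randomization alone. Other terms, such as cand_nm:disb_dt (F = 0.159, p = 0.924), give no evidence against the null hypothesis.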
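As a check on how the table entries relate to each other: the F value of a term is its mean square divided by the residual mean square, and Pr(>F) is the upper-tail probability of the F distribution with the term's and the residual degrees of freedom. The sketch below redoes this arithmetic for the recipient_st:disb_desc row of the recipient_st and disb_desc table. It uses the rounded values printed in that table, so it matches the reported F value of 146.48 only approximately. The variable names are illustrative and do not appear elsewhere in the analysis.

```r
# Values copied (already rounded) from the recipient_st and disb_desc ANOVA table
ms_interaction <- 1.120e+11  # Mean Sq, recipient_st:disb_desc
ms_residual    <- 7.649e+08  # Mean Sq, Residuals
df_interaction <- 9          # Df, recipient_st:disb_desc
df_residual    <- 9643       # Df, Residuals

# F statistic: mean square of the term over the residual mean square
f_value <- ms_interaction / ms_residual

# p-value: upper-tail probability of the F distribution
p_value <- pf(f_value, df_interaction, df_residual, lower.tail = FALSE)

print(f_value)
print(p_value)
```

The same calculation applies to any other row; only the mean square and degrees of freedom of the term change.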

Interaction Plots

The data are first split by year (2015 and 2016). For each year, the pairwise interactions between candidate, state, and type of disbursement are then plotted.

par(mfrow = c(1, 1))

# Drop factor levels that no longer occur in the 2015 subset
fifteen_data <- droplevels(fifteen_data)

# interaction.plot() is called for its plot and returns NULL, so the result is not assigned.
# Using with() also gives short axis labels (e.g. "disb_desc" instead of "fifteen_data$disb_desc").

# Interaction of candidate and type of disbursement in 2015
with(fifteen_data, interaction.plot(disb_desc, cand_nm, disb_amt))

# Interaction of candidate and state in 2015
with(fifteen_data, interaction.plot(cand_nm, recipient_st, disb_amt))

# Interaction of type of disbursement and state in 2015
with(fifteen_data, interaction.plot(disb_desc, recipient_st, disb_amt))

# Drop factor levels that no longer occur in the 2016 subset
sixteen_data <- droplevels(sixteen_data)

# Interaction of candidate and state in 2016
with(sixteen_data, interaction.plot(cand_nm, recipient_st, disb_amt))

# Interaction of candidate and type of disbursement in 2016
with(sixteen_data, interaction.plot(disb_desc, cand_nm, disb_amt))

# Interaction of state and type of disbursement in 2016
with(sixteen_data, interaction.plot(disb_desc, recipient_st, disb_amt))
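To make clear what these plots display: interaction.plot draws, for each level of the trace factor, the mean response at each level of the x factor (its default summary function is mean). The sketch below uses a small made-up data frame to show the table of cell means behind such a plot. The object toy_data and all of its values are hypothetical and are not taken from the FEC data. Roughly parallel lines suggest little interaction, while crossing or diverging lines suggest that the effect of one factor depends on the level of the other.

```r
# Hypothetical toy data: two candidates, two disbursement types, two rows per cell
toy_data <- data.frame(
  cand_nm   = factor(rep(c("A", "B"), each = 4)),
  disb_desc = factor(rep(c("SALARY", "TRAVEL"), times = 4)),
  disb_amt  = c(100, 300, 120, 280, 200, 210, 220, 190)
)

# Cell means: rows are disb_desc levels (x axis), columns are cand_nm levels (traces)
cell_means <- with(toy_data, tapply(disb_amt, list(disb_desc, cand_nm), mean))
print(cell_means)

# The same means drawn as an interaction plot
with(toy_data, interaction.plot(disb_desc, cand_nm, disb_amt,
                                xlab = "disb_desc",
                                trace.label = "cand_nm",
                                ylab = "mean of disb_amt"))
```

In this toy example the mean for candidate A rises from SALARY to TRAVEL while the mean for candidate B falls slightly, so the two lines are far from parallel.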

In conclusion, this was a factorial design experiment analyzed with a fixed effect model. The factors candidate, state, type of disbursement, and disbursement year were studied to determine whether they had an effect on disbursement amount. Ultimately, many of these factors showed main effects as well as interaction effects. ANOVA was used to test whether the observed variation was larger than would be expected from randomization alone. However, the results must be interpreted with caution, as the adequacy of the model was not checked.
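The model check that was left out here is usually carried out on the residuals of the fitted model. The sketch below shows one possible starting point on simulated data: a plot of residuals against fitted values to look for non-constant variance, and a normal Q-Q plot of the residuals. The object sim_data and its values are hypothetical stand-ins. With the real data, the fitted object would be one of the aov models from the analysis.

```r
set.seed(1)

# Simulated stand-in data (hypothetical): three disbursement types, 30 rows each
sim_data <- data.frame(
  disb_desc = factor(rep(c("SALARY", "TRAVEL", "RENT"), each = 30)),
  disb_amt  = c(rnorm(30, mean = 1000, sd = 100),
                rnorm(30, mean = 500,  sd = 100),
                rnorm(30, mean = 800,  sd = 100))
)

# One-way fixed effect ANOVA on the simulated data
fit <- aov(disb_amt ~ disb_desc, data = sim_data)
res <- residuals(fit)

# Residuals vs fitted values (constant variance) and normal Q-Q plot (normality)
par(mfrow = c(1, 2))
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)
par(mfrow = c(1, 1))
```

A funnel shape in the first plot or strong curvature in the second would suggest that a transformation of disb_amt (for example a log transform) is worth considering before interpreting the F tests.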

4. References

Federal Election Commission, “Presidential Campaign Finance Download,” 2016 Presidential Campaign Finance. [Online]. Available: http://www.fec.gov/disclosurep/PDownload.do. [Accessed: 12-Oct-2016].
D. C. Montgomery, Design and Analysis of Experiments, 8th ed. Hoboken, NJ: John Wiley & Sons, Inc., 2013.

5. Appendix

No additional commenting in the appendix.