This is a factorial design experiment, in which multiple factors are varied at the same time. It uses a fixed effects model: the aim is to determine the effects of the specific levels of the four categorical variables studied, and no inferences will be made about a wider population of levels. The experiment studies disbursements by the 2016 presidential candidates to determine whether candidate, recipient state, type of disbursement, and year have an effect on disbursement amount. The null hypothesis is that mean disbursement amount does not differ across the levels of these factors; under the null hypothesis, any observed variation is due to random error. The dataset was chosen from a list of 100 interesting datasets for statistics: http://www.fec.gov/disclosurep/PDownload.do
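One way to write the full fixed effects model with all two-factor interactions is sketched below. This is a standard textbook form, assuming sum-to-zero constraints on the effects; the symbols are introduced here for illustration, and the analysis that follows fits one- and two-factor submodels of it rather than this full model.

$$
y_{ijklm} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l
  + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\alpha\delta)_{il}
  + (\beta\gamma)_{jk} + (\beta\delta)_{jl} + (\gamma\delta)_{kl}
  + \varepsilon_{ijklm}, \qquad \varepsilon_{ijklm} \sim N(0, \sigma^2)
$$

Here $\alpha_i$ is the effect of candidate $i$ (4 levels), $\beta_j$ of recipient state $j$ (7 levels), $\gamma_k$ of disbursement type $k$ (4 levels), $\delta_l$ of year $l$ (2 levels), and $m$ indexes replicate disbursements within a cell. The null hypothesis for the candidate main effect, for example, is $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0$.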
#Load the CSV data with read.csv() from the utils package (attached by default in R)
library(utils)
setwd("C:/Users/Alexis/Documents/R/win-library/3.3")
election_data <- read.csv("Project1_PresidentialElectionData3.csv", header = TRUE)
#Show first and last ten rows of data table
head(election_data, 10)
## cand_nm disb_dt recipient_st
## 1 Walker, Scott 7/17/2014 IA
## 2 Webb, James Henry Jr. 11/25/2014 VA
## 3 Webb, James Henry Jr. 11/25/2014 VA
## 4 Webb, James Henry Jr. 12/1/2014 DC
## 5 Webb, James Henry Jr. 12/1/2014 VA
## 6 Webb, James Henry Jr. 12/22/2014 VA
## 7 Webb, James Henry Jr. 2015 VA
## 8 Kasich, John R. 2015 TX
## 9 Kasich, John R. 2015 TX
## 10 Kasich, John R. 2015 TX
## disb_desc disb_amt cmte_id cand_id
## 1 HAHN 9/9 REIMBURSEMENT: MEETING EXPENSE 228.83 C00580480 P60006046
## 2 SALARY 1000.00 C00581215 P60008885
## 3 ADMINISTRATIVE CONSULTING 1000.00 C00581215 P60008885
## 4 DATABASE SERVICES 750.00 C00581215 P60008885
## 5 MERCHANT FEES 2923.86 C00581215 P60008885
## 6 ADMINISTRATIVE CONSULTING 1000.00 C00581215 P60008885
## 7 MERCHANT FEES 1755.78 C00581215 P60008885
## 8 AIRFARE 278.10 C00581876 P60003670
## 9 AIRFARE 163.10 C00581876 P60003670
## 10 AIRFARE 269.60 C00581876 P60003670
## recipient_nm recipient_city recipient_zip memo_cd memo_text
## 1 HY VEE DES MOINES 50266 X
## 2 STANLEY, JOE CALLAWAY 240673414
## 3 WEBB, AMY BURKE 220152949
## 4 NGP VAN WASHINGTON 200055006
## 5 SAGE PAYMENT SOLUTIONS RESTON 201905858
## 6 WEBB, AMY BURKE 220152949
## 7 SAGE PAYMENT SOLUTIONS RESTON 201905858
## 8 AMERICAN AIRLINES FORT WORTH 761552605 X
## 9 AMERICAN AIRLINES FORT WORTH 761552605 X
## 10 AMERICAN AIRLINES FORT WORTH 761552605 X
## form_tp file_num tran_id election_tp
## 1 SB23 1057127 SB23.5056 P2016
## 2 SB23 1057535 B8DB5C9B769244633A1F P2016
## 3 SB23 1057535 B2116ED8A6E7443ACB97 P2016
## 4 SB23 1057535 BD81F6F22406D40CAB7B P2016
## 5 SB23 1057535 B48B71AFC20D54D6CBFB P2016
## 6 SB23 1057535 B769F25E80F014565BC9 P2016
## 7 SB23 1057535 B574650C0BC614DC194D P2016
## 8 SB23 1070348 B42F15A0C9670493A8AA P2016
## 9 SB23 1070348 B7A74D9DB5DDE439284F P2016
## 10 SB23 1070348 B1EC2F9CAFEF048D49E3 P2016
tail(election_data, 10)
## cand_nm disb_dt recipient_st
## 219570 Trump, Donald 2016 WA
## 219571 Trump, Donald 2016 WA
## 219572 Trump, Donald 2016 WA
## 219573 Trump, Donald 2016 WA
## 219574 Trump, Donald 2016 WA
## 219575 Trump, Donald 2016 WA
## 219576 Trump, Donald 2016 WI
## 219577 Trump, Donald 2016 WV
## 219578 Walker, Scott 2016 DC
## 219579 Trump, Donald 2015 AL
## disb_desc disb_amt cmte_id
## 219570 IN-KIND: OFFICE SUPPLIES 518.20 C00580100
## 219571 FIELD CONSULTING 10500.00 C00580100
## 219572 EVENT STAGING EXPENSE 774.17 C00580100
## 219573 FIELD CONSULTING 10500.00 C00580100
## 219574 TRAVEL EXPENSE REIMBURSEMENT: ITEMIZATIO 668.83 C00580100
## 219575 IN-KIND: MEETING EXPENSE: MEALS 234.24 C00580100
## 219576 FIELD CONSULTING 10000.00 C00580100
## 219577 IN-KIND: RENT 1500.00 C00580100
## 219578 CREDIT CARD PROCESSING FEES 435.00 C00580480
## 219579 TRAVEL: LODGING [TUCKER: SB23.289541] 528.19 C00580100
## cand_id recipient_nm recipient_city recipient_zip
## 219570 P80001571 BENTON, DON VANCOUVER 98668
## 219571 P80001571 THE BENTON GROUP VANCOUVER 98668
## 219572 P80001571 SALISH CONSULTANT SERVICES WOODINVILLE 98077
## 219573 P80001571 BENTON, DON VANCOUVER 98668
## 219574 P80001571 BENTON, DON VANCOUVER 98668
## 219575 P80001571 BENTON, DON VANCOUVER 98668
## 219576 P80001571 TROVATO, VINCE WAUKESHA 53188
## 219577 P80001571 ROGERS, ROBERT CLARKSBURG 26301
## 219578 P60006046 STRIPE WASHINGTON 20045
## 219579 P80001571 DOUBLETREE BIRMINGHAM 35205
## memo_cd memo_text form_tp file_num tran_id election_tp
## 219570 SB23 1100920 SB23.2143059 G2016
## 219571 SB23 1100920 SB23.2141371 G2016
## 219572 SB23 1100920 SB23.2140787 G2016
## 219573 SB23 1100920 SB23.2141327 G2016
## 219574 SB23 1100920 SB23.2141354 G2016
## 219575 SB23 1100920 SB23.2143056 G2016
## 219576 SB23 1100920 SB23.2141385 G2016
## 219577 SB23 1100920 SB23.2225539 G2016
## 219578 SB23 1100469 SB23.150087 P2016
## 219579 X SB23 1051572 SB23.289887 P2016
The four factors (independent variables) being studied are presidential candidate, the state of the disbursement recipient, the description (type) of the disbursement, and the year the money was disbursed. disb_dt was changed from m/d/yyyy dates to the year alone, so all four factors are categorical; the only continuous variable is the response.
Factor levels:
cand_nm: Hillary Clinton, Bernie Sanders, Donald Trump, Ted Cruz
recipient_st: DC, CA, NY, MA, KY, UT, GA
disb_desc: in-kind contribution, office supplies, travel, online advertising
disb_dt: 2015, 2016
The response variable (dependent variable) is disb_amt (disbursement amount).
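The original disb_dt values mix full m/d/yyyy dates (for example 7/17/2014 in the first rows printed above) with bare years. A minimal sketch of the year simplification follows; the helper name to_year is hypothetical, and it assumes every usable date ends in a four-digit year.

```r
# Hypothetical helper: reduce a disbursement date to its four-digit year.
# Assumes values look like "7/17/2014" (m/d/yyyy) or are already "2015".
to_year <- function(x) {
  x <- trimws(as.character(x))
  has_year <- grepl("[0-9]{4}$", x)
  # Keep the trailing four digits; values without a trailing year become NA
  ifelse(has_year, sub("^.*([0-9]{4})$", "\\1", x), NA_character_)
}

print(to_year(c("7/17/2014", "11/25/2014", "2015")))

# Usage on the full data (assumes election_data from above):
# election_data$disb_dt <- factor(to_year(election_data$disb_dt))
```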
Select the final two Democratic and final two Republican candidates, then restrict recipient state, disbursement description, and disbursement date (simplified to year) to the levels listed above.
election_data_subset <- subset(election_data, cand_nm == "Clinton, Hillary Rodham" | cand_nm == "Sanders, Bernard" | cand_nm == "Cruz, Ted" | cand_nm == "Trump, Donald", select = cand_nm:disb_amt)
election_data_subset2 <- subset(election_data_subset, recipient_st == "DC" | recipient_st == "MA" | recipient_st == "NY" | recipient_st == "CA" | recipient_st == "KY" | recipient_st == "GA" | recipient_st == "UT")
election_data_subset3 <- subset(election_data_subset2, disb_desc == "IN-KIND CONTRIBUTION" | disb_desc == "OFFICE SUPPLIES" | disb_desc == "TRAVEL" | disb_desc == "ONLINE ADVERTISING")
election_data_subset4 <- subset(election_data_subset3, disb_dt == "2015" | disb_dt == "2016")
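The four chained subset() calls above can also be collapsed into a single filter using %in%. The sketch below is an alternative, not the author's code: filter_election is a hypothetical name, and the level strings are copied from the subset() calls. It should return the same rows and the same five columns (cand_nm through disb_amt) as election_data_subset4.

```r
# Level strings copied from the subset() calls above
keep_cand  <- c("Clinton, Hillary Rodham", "Sanders, Bernard", "Cruz, Ted", "Trump, Donald")
keep_state <- c("DC", "MA", "NY", "CA", "KY", "GA", "UT")
keep_desc  <- c("IN-KIND CONTRIBUTION", "OFFICE SUPPLIES", "TRAVEL", "ONLINE ADVERTISING")
keep_year  <- c("2015", "2016")

# Hypothetical helper: one logical filter instead of four subset() calls.
# %in% never returns NA, so no rows of NAs are introduced.
filter_election <- function(df) {
  rows <- df$cand_nm %in% keep_cand &
    df$recipient_st %in% keep_state &
    df$disb_desc %in% keep_desc &
    df$disb_dt %in% keep_year
  df[rows, c("cand_nm", "disb_dt", "recipient_st", "disb_desc", "disb_amt")]
}

# Usage (assumes election_data from above):
# election_filtered <- filter_election(election_data)
```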
#Show first and last ten rows of new subsetted data table
head(election_data_subset4, 10)
## cand_nm disb_dt recipient_st disb_desc disb_amt
## 14 Clinton, Hillary Rodham 2015 NY TRAVEL 903.70
## 15 Clinton, Hillary Rodham 2015 NY TRAVEL 1320.39
## 17 Clinton, Hillary Rodham 2015 NY TRAVEL 51.15
## 44 Clinton, Hillary Rodham 2015 NY TRAVEL 4032.00
## 46 Clinton, Hillary Rodham 2015 NY TRAVEL 242.33
## 68 Clinton, Hillary Rodham 2015 CA TRAVEL 320.82
## 75 Clinton, Hillary Rodham 2015 CA TRAVEL 185.00
## 118 Clinton, Hillary Rodham 2015 NY TRAVEL 400.10
## 119 Clinton, Hillary Rodham 2015 NY TRAVEL 2287.60
## 120 Clinton, Hillary Rodham 2015 NY TRAVEL 1173.70
tail(election_data_subset4, 10)
## cand_nm disb_dt recipient_st disb_desc
## 217386 Clinton, Hillary Rodham 2016 GA TRAVEL
## 217395 Clinton, Hillary Rodham 2016 NY TRAVEL
## 217418 Clinton, Hillary Rodham 2016 DC TRAVEL
## 217419 Clinton, Hillary Rodham 2016 DC TRAVEL
## 217420 Clinton, Hillary Rodham 2016 DC TRAVEL
## 217423 Clinton, Hillary Rodham 2016 GA TRAVEL
## 217424 Clinton, Hillary Rodham 2016 GA TRAVEL
## 217462 Cruz, Ted 2016 MA OFFICE SUPPLIES
## 217577 Trump, Donald 2016 MA OFFICE SUPPLIES
## 219511 Trump, Donald 2016 NY OFFICE SUPPLIES
## disb_amt
## 217386 496.33
## 217395 595.10
## 217418 542.70
## 217419 1090.44
## 217420 917.03
## 217423 201.60
## 217424 497.60
## 217462 121.22
## 217577 2062.14
## 219511 1633.13
sample_size = nrow(election_data_subset4)
sample_size
## [1] 9662
#The data has five columns: one for each of the four factors and one for the response variable. In the subsetted data, all factors are categorical variables with a fixed number of levels. The data has 9662 rows.
The main effect will be calculated for each factor, along with the interaction effect for every two-factor interaction.
As there was no control over randomization in data collection, the data will be randomized without replacement for analysis. Factorial design experiments assume that data are randomized (in object selection, assignment to treatment, and experimental run order).
election_data_randomized = election_data_subset4[sample(1:nrow(election_data_subset4), size = sample_size, replace = FALSE),]
#Show first and last ten rows of randomized data table
head(election_data_randomized, 10)
## cand_nm disb_dt recipient_st disb_desc
## 90968 Sanders, Bernard 2015 MA OFFICE SUPPLIES
## 80751 Clinton, Hillary Rodham 2015 GA TRAVEL
## 164349 Clinton, Hillary Rodham 2016 NY TRAVEL
## 115606 Clinton, Hillary Rodham 2016 GA OFFICE SUPPLIES
## 207510 Trump, Donald 2016 NY OFFICE SUPPLIES
## 54510 Clinton, Hillary Rodham 2015 NY TRAVEL
## 105706 Clinton, Hillary Rodham 2016 NY TRAVEL
## 216630 Trump, Donald 2016 NY OFFICE SUPPLIES
## 123123 Clinton, Hillary Rodham 2016 UT TRAVEL
## 160497 Clinton, Hillary Rodham 2016 GA TRAVEL
## disb_amt
## 90968 3634.34
## 80751 342.10
## 164349 19260.00
## 115606 25.00
## 207510 445.93
## 54510 18.25
## 105706 17.25
## 216630 955.56
## 123123 134.47
## 160497 25.00
tail(election_data_randomized, 10)
## cand_nm disb_dt recipient_st disb_desc
## 137452 Clinton, Hillary Rodham 2016 NY TRAVEL
## 55678 Clinton, Hillary Rodham 2015 CA TRAVEL
## 198704 Clinton, Hillary Rodham 2016 CA TRAVEL
## 157851 Clinton, Hillary Rodham 2016 GA TRAVEL
## 108397 Sanders, Bernard 2016 CA OFFICE SUPPLIES
## 103969 Clinton, Hillary Rodham 2016 GA TRAVEL
## 33950 Clinton, Hillary Rodham 2015 NY TRAVEL
## 180694 Clinton, Hillary Rodham 2016 CA TRAVEL
## 106126 Clinton, Hillary Rodham 2016 NY TRAVEL
## 156279 Clinton, Hillary Rodham 2016 NY TRAVEL
## disb_amt
## 137452 1760.40
## 55678 481.60
## 198704 9.00
## 157851 371.80
## 108397 475.94
## 103969 415.10
## 33950 248.10
## 180694 239.39
## 106126 83.10
## 156279 49.16
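The shuffle above sets no seed, so the row order (and the head/tail output) will differ on every run. A reproducible sketch follows; shuffle_rows and the seed value 2016 are arbitrary, hypothetical choices. sample(nrow(df)) draws a full permutation without replacement, which is what sample(1:n, size = n, replace = FALSE) does. Permuting rows does not change the medians, means, or (up to floating-point rounding) the ANOVA sums of squares computed later; it only changes the order in which rows are listed.

```r
# Reproducible row shuffle; the default seed is an arbitrary assumption
shuffle_rows <- function(df, seed = 2016) {
  set.seed(seed)
  # sample(n) returns a random permutation of 1:n (no replacement)
  df[sample(nrow(df)), , drop = FALSE]
}

toy <- data.frame(g = rep(c("a", "b"), each = 5), y = 1:10)
shuffled <- shuffle_rows(toy)
# Same values, different order: summaries are unaffected
print(c(median(toy$y), median(shuffled$y)))

# Usage (assumes election_data_subset4 from above):
# election_data_randomized <- shuffle_rows(election_data_subset4)
```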
Because each candidate's disbursements were made on different days and in different states, there are no repeated measurements. However, because the candidates purchase the same types of items on different days in different states, those disbursements can be counted as replicates.
Blocking is used to reduce the variability caused by nuisance factors: factors that are suspected to affect the response variable but are not among the main factors of interest. A nuisance factor can be controlled by holding it constant, or by grouping observations into blocks within which it is constant. Because the data for this study were selected after collection and the original file included many other variables, the analysis was restricted to the four factors listed above, and the exploratory boxplots below are drawn separately for each year.
fifteen_data = na.omit(subset(election_data_randomized, disb_dt == "2015"))
fifteen_data$cand_nm <- factor(fifteen_data$cand_nm)
fifteen_data$recipient_st <- factor(fifteen_data$recipient_st)
fifteen_data$disb_desc <- factor(fifteen_data$disb_desc)
fifteen_data$disb_dt <- factor(fifteen_data$disb_dt)
#Boxplots examining levels of each factor for exploratory analysis for 2015
boxplotcand <- boxplot(fifteen_data$disb_amt~fifteen_data$cand_nm, xlab = "Candidate", ylab = "Disbursement Amount", main = "Candidate Effect on Disbursement in 2015")
boxplotstate <- boxplot(fifteen_data$disb_amt~fifteen_data$recipient_st, xlab = "State", ylab = "Disbursement Amount", main = "State Effect on Disbursement in 2015")
boxplottype <- boxplot(fifteen_data$disb_amt~fifteen_data$disb_desc, xlab = "Disbursement Type", ylab = "Disbursement Amount", main = "Disbursement Type Effect on Disbursement in 2015")
sixteen_data = na.omit(subset(election_data_randomized, disb_dt == "2016"))
sixteen_data$cand_nm <- factor(sixteen_data$cand_nm)
sixteen_data$recipient_st <- factor(sixteen_data$recipient_st)
sixteen_data$disb_desc <- factor(sixteen_data$disb_desc)
sixteen_data$disb_dt <- factor(sixteen_data$disb_dt)
#Boxplots examining levels of each factor for exploratory analysis for 2016
boxplotcand <- boxplot(sixteen_data$disb_amt~sixteen_data$cand_nm, xlab = "Candidate", ylab = "Disbursement Amount", main = "Candidate Effect on Disbursement in 2016")
boxplotstate <- boxplot(sixteen_data$disb_amt~sixteen_data$recipient_st, xlab = "State", ylab = "Disbursement Amount", main = "State Effect on Disbursement in 2016")
boxplottype <- boxplot(sixteen_data$disb_amt~sixteen_data$disb_desc, xlab = "Disbursement Type", ylab = "Disbursement Amount", main = "Disbursement Type Effect on Disbursement in 2016")
#Calculate the main effect of candidate
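The 2015 and 2016 blocks above repeat the same na.omit() and factor() steps. As a sketch, the split can be done once with split(); split_by_year is a hypothetical helper, and converting via as.character() before factor() is an assumption made so that unused levels are dropped whether or not the columns were read as factors.

```r
# Hypothetical helper: drop incomplete rows, split by year, and rebuild the
# factor columns within each year so only levels present in that year remain
split_by_year <- function(df, cols = c("cand_nm", "recipient_st", "disb_desc", "disb_dt")) {
  df <- na.omit(df)
  pieces <- split(df, as.character(df$disb_dt))
  lapply(pieces, function(d) {
    d[cols] <- lapply(d[cols], function(x) factor(as.character(x)))
    d
  })
}

# Usage (assumes election_data_randomized from above):
# by_year <- split_by_year(election_data_randomized)
# boxplot(disb_amt ~ cand_nm, data = by_year[["2015"]], xlab = "Candidate",
#         ylab = "Disbursement Amount", main = "Candidate Effect on Disbursement in 2015")
```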
HC_data = subset(election_data_randomized, cand_nm == "Clinton, Hillary Rodham")
max_HC = max(HC_data$disb_amt)
max_HC
## [1] 1567000
min_HC = min(HC_data$disb_amt)
min_HC
## [1] -1769.32
mean_HC = mean(HC_data$disb_amt)
mean_HC
## [1] 3086.88
median_HC = median(HC_data$disb_amt)
median_HC
## [1] 132.55
DT_data = subset(election_data_randomized, cand_nm == "Trump, Donald")
max_DT = max(DT_data$disb_amt)
max_DT
## [1] 3e+05
min_DT = min(DT_data$disb_amt)
min_DT
## [1] 3.27
mean_DT = mean(DT_data$disb_amt)
mean_DT
## [1] 9126.512
median_DT = median(DT_data$disb_amt)
median_DT
## [1] 1048
TC_data = subset(election_data_randomized, cand_nm == "Cruz, Ted")
max_TC = max(TC_data$disb_amt)
max_TC
## [1] 98360.26
min_TC = min(TC_data$disb_amt)
min_TC
## [1] 2.98
mean_TC = mean(TC_data$disb_amt)
mean_TC
## [1] 870.9839
median_TC = median(TC_data$disb_amt)
median_TC
## [1] 136.93
BS_data = subset(election_data_randomized, cand_nm == "Sanders, Bernard")
max_BS = max(BS_data$disb_amt)
max_BS
## [1] 13046.97
min_BS = min(BS_data$disb_amt)
min_BS
## [1] -1388.91
mean_BS = mean(BS_data$disb_amt)
mean_BS
## [1] 904.3147
median_BS = median(BS_data$disb_amt)
median_BS
## [1] 374.69
max_cand = max(median_HC, median_DT, median_BS, median_TC)
max_cand
## [1] 1048
min_cand = min(median_HC, median_DT, median_BS, median_TC)
min_cand
## [1] 132.55
Based on the maximum and minimum, Hillary Clinton had both the highest and the lowest single disbursement, so these extremes cannot identify which levels to compare. Means were used as a second assessment, but because the mean is strongly affected by outliers, the median was used to determine which levels to use for the main effect. Donald Trump had the highest median disbursement and Hillary Clinton the lowest, so the main effect of candidate was calculated as the difference between these two medians.
main_effect_candidate = median_DT - median_HC
main_effect_candidate
## [1] 915.45
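The candidate blocks above repeat the same four summary calls for each level. A compact sketch of the same median-based main effect (largest level median minus smallest level median) using tapply() follows; median_range_effect is a hypothetical helper name. Applied to election_data_randomized$disb_amt and cand_nm, it should reproduce the 915.45 above, with Trump as the high level and Clinton as the low level, assuming no missing disbursement amounts.

```r
# Hypothetical helper: the document's median-based main effect for one factor
median_range_effect <- function(y, f) {
  meds <- tapply(y, f, median, na.rm = TRUE)
  meds <- meds[!is.na(meds)]   # drop levels with no observations
  list(
    medians = sort(meds),
    effect  = unname(max(meds) - min(meds)),
    high    = names(which.max(meds)),
    low     = names(which.min(meds))
  )
}

toy <- median_range_effect(c(1, 3, 10, 20, 5), factor(c("a", "a", "b", "b", "c")))
print(toy$effect)

# Usage (assumes election_data_randomized from above):
# median_range_effect(election_data_randomized$disb_amt,
#                     election_data_randomized$cand_nm)
```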
Calculate the main effect of state
NY_data = subset(election_data_randomized, recipient_st == "NY")
max_NY = max(NY_data$disb_amt)
max_NY
## [1] 377111.2
min_NY = min(NY_data$disb_amt)
min_NY
## [1] -1769.32
mean_NY = mean(NY_data$disb_amt)
mean_NY
## [1] 3318.91
median_NY = median(NY_data$disb_amt)
median_NY
## [1] 51
MA_data = subset(election_data_randomized, recipient_st == "MA")
max_MA = max(MA_data$disb_amt)
max_MA
## [1] 18670.97
min_MA = min(MA_data$disb_amt)
min_MA
## [1] -113.4
mean_MA = mean(MA_data$disb_amt)
mean_MA
## [1] 314.0279
median_MA = median(MA_data$disb_amt)
median_MA
## [1] 55.3
CA_data = subset(election_data_randomized, recipient_st == "CA")
max_CA = max(CA_data$disb_amt)
max_CA
## [1] 3e+05
min_CA = min(CA_data$disb_amt)
min_CA
## [1] -272.55
mean_CA = mean(CA_data$disb_amt)
mean_CA
## [1] 578.0414
median_CA = median(CA_data$disb_amt)
median_CA
## [1] 82.07
GA_data = subset(election_data_randomized, recipient_st == "GA")
max_GA = max(GA_data$disb_amt)
max_GA
## [1] 31212.53
min_GA = min(GA_data$disb_amt)
min_GA
## [1] -1388.91
mean_GA = mean(GA_data$disb_amt)
mean_GA
## [1] 459.4689
median_GA = median(GA_data$disb_amt)
median_GA
## [1] 330.54
KY_data = subset(election_data_randomized, recipient_st == "KY")
max_KY = max(KY_data$disb_amt)
max_KY
## [1] 13046.97
min_KY = min(KY_data$disb_amt)
min_KY
## [1] -384.42
mean_KY = mean(KY_data$disb_amt)
mean_KY
## [1] 1347.883
median_KY = median(KY_data$disb_amt)
median_KY
## [1] 360.155
UT_data = subset(election_data_randomized, recipient_st == "UT")
max_UT = max(UT_data$disb_amt)
max_UT
## [1] 18288.98
min_UT = min(UT_data$disb_amt)
min_UT
## [1] 5.28
mean_UT = mean(UT_data$disb_amt)
mean_UT
## [1] 569.2759
median_UT = median(UT_data$disb_amt)
median_UT
## [1] 291.03
max_st = max(median_NY, median_MA, median_CA, median_GA, median_UT, median_KY)
max_st
## [1] 360.155
min_st = min(median_NY, median_MA, median_CA, median_GA, median_UT, median_KY)
min_st
## [1] 51
Based on the maximum and minimum, NY had both the highest and the lowest single disbursement, so these extremes cannot identify which levels to compare. Means were used as a second assessment, but because the mean is strongly affected by outliers, the median was used to determine which levels to use for the main effect. KY had the highest median disbursement and NY the lowest, so these levels were used to calculate the main effect of state.
main_effect_state = median_KY - median_NY
main_effect_state
## [1] 309.155
Calculate the main effect of the type of disbursement
inkind_data = na.omit(subset(election_data_randomized, disb_desc == "IN-KIND CONTRIBUTION"))
max_inkind = max(inkind_data$disb_amt)
max_inkind
## [1] 2700
min_inkind = min(inkind_data$disb_amt)
min_inkind
## [1] 50
mean_inkind = mean(inkind_data$disb_amt)
mean_inkind
## [1] 581.7833
median_inkind = median(inkind_data$disb_amt)
median_inkind
## [1] 300
office_data = na.omit(subset(election_data_randomized, disb_desc == "OFFICE SUPPLIES"))
max_office = max(office_data$disb_amt)
max_office
## [1] 31212.53
min_office = min(office_data$disb_amt)
min_office
## [1] -1388.91
mean_office = mean(office_data$disb_amt)
mean_office
## [1] 924.6212
median_office = median(office_data$disb_amt)
median_office
## [1] 118.87
travel_data = na.omit(subset(election_data_randomized, disb_desc == "TRAVEL"))
max_travel = max(travel_data$disb_amt)
max_travel
## [1] 377111.2
min_travel = min(travel_data$disb_amt)
min_travel
## [1] -1769.32
mean_travel = mean(travel_data$disb_amt)
mean_travel
## [1] 1599.595
median_travel = median(travel_data$disb_amt)
median_travel
## [1] 137.17
online_data = na.omit(subset(election_data_randomized, disb_desc == "ONLINE ADVERTISING"))
max_online = max(online_data$disb_amt)
max_online
## [1] 1567000
min_online = min(online_data$disb_amt)
min_online
## [1] 9.92
mean_online = mean(online_data$disb_amt)
mean_online
## [1] 289364.2
median_online = median(online_data$disb_amt)
median_online
## [1] 43461
max_type = max(median_travel, median_online, median_office, median_inkind)
max_type
## [1] 43461
min_type = min(median_travel, median_online, median_office, median_inkind)
min_type
## [1] 118.87
To be consistent with the other factors, the median was the final method used to determine which levels to use for the main effect. Online advertising had the highest median disbursement and office supplies the lowest, so these levels were used to calculate the main effect of type of disbursement.
main_effect_type = median_online - median_office
main_effect_type
## [1] 43342.13
Calculate the main effect of year
fifteen_data = subset(election_data_randomized, disb_dt == "2015")
max_fifteen = max(fifteen_data$disb_amt)
max_fifteen
## [1] 1138063
min_fifteen = min(fifteen_data$disb_amt)
min_fifteen
## [1] -1769.32
mean_fifteen = mean(fifteen_data$disb_amt)
mean_fifteen
## [1] 2581.242
median_fifteen = median(fifteen_data$disb_amt)
median_fifteen
## [1] 126
sixteen_data = subset(election_data_randomized, disb_dt == "2016")
max_sixteen = max(sixteen_data$disb_amt)
max_sixteen
## [1] 1567000
min_sixteen = min(sixteen_data$disb_amt)
min_sixteen
## [1] -1388.91
mean_sixteen = mean(sixteen_data$disb_amt)
mean_sixteen
## [1] 2921.319
median_sixteen = median(sixteen_data$disb_amt)
median_sixteen
## [1] 143.19
max_year = max(median_fifteen, median_sixteen)
max_year
## [1] 143.19
#To be consistent with the other factors, the median was the final method used to determine which levels to use for the main effect. Year 2016 had the higher median disbursement and 2015 the lower, so these levels were used to calculate the main effect of year.
main_effect_year = median_sixteen - median_fifteen
main_effect_year
## [1] 17.19
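In the classical two-level factorial design, the main effect of a factor is the difference between the mean responses at its two levels, whereas this analysis uses medians. Using the means printed above, the mean-based year effect would be 2921.319 - 2581.242 = 340.077, compared with the median-based 17.19; the gap reflects the large outliers. A small sketch that computes either version follows; two_level_effect is a hypothetical helper.

```r
# Hypothetical helper: difference in a summary statistic between the
# second and first level of a two-level factor (mean = classical effect)
two_level_effect <- function(y, f, stat = mean) {
  f <- droplevels(factor(f))
  stopifnot(nlevels(f) == 2)
  s <- tapply(y, f, stat)
  as.numeric(s[2] - s[1])
}

y  <- c(1, 2, 9, 10, 20, 60)
yr <- c("2015", "2015", "2015", "2016", "2016", "2016")
print(two_level_effect(y, yr))          # mean-based
print(two_level_effect(y, yr, median))  # median-based, as in this analysis
```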
# All main effects
main_effect_candidate
## [1] 915.45
main_effect_state
## [1] 309.155
main_effect_type
## [1] 43342.13
main_effect_year
## [1] 17.19
# The largest main effect is due to the type of disbursement.
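The four median-based main effects can also be computed and ranked in one pass. This sketch assumes the data frame has the column names used above; all_median_effects is a hypothetical helper. Ranking the printed values (915.45, 309.155, 43342.13, 17.19) the same way puts type of disbursement first.

```r
# Hypothetical helper: median-based main effect (largest level median minus
# smallest level median) for each factor, sorted from largest to smallest
all_median_effects <- function(df, factors, response = "disb_amt") {
  effects <- sapply(factors, function(v) {
    meds <- tapply(df[[response]], df[[v]], median, na.rm = TRUE)
    max(meds, na.rm = TRUE) - min(meds, na.rm = TRUE)
  })
  sort(effects, decreasing = TRUE)
}

toy <- data.frame(
  cand_nm      = c("A", "A", "B", "B"),
  recipient_st = c("NY", "CA", "NY", "CA"),
  disb_desc    = c("TRAVEL", "TRAVEL", "TRAVEL", "ONLINE ADVERTISING"),
  disb_dt      = c("2015", "2016", "2016", "2016"),
  disb_amt     = c(10, 20, 100, 300)
)
print(all_median_effects(toy, c("cand_nm", "recipient_st", "disb_desc", "disb_dt")))
```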
Compute an analysis of variance (ANOVA) for all main effects (me) and two-factor interactions (2fi)
#cand_nm
anova_cand <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm)
summary.aov(anova_cand)
## Df Sum Sq Mean Sq F value Pr(>F)
## election_data_randomized$cand_nm 3 7.592e+09 2.531e+09 2.058 0.104
## Residuals 9658 1.188e+13 1.230e+09
# recipient_st
anova_state <- aov(election_data_randomized$disb_amt ~ election_data_randomized$recipient_st)
summary(anova_state)
## Df Sum Sq Mean Sq F value
## election_data_randomized$recipient_st 6 9.835e+10 1.639e+10 13.43
## Residuals 9655 1.178e+13 1.221e+09
## Pr(>F)
## election_data_randomized$recipient_st 3.16e-15 ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# disb_desc
anova_description <- aov(election_data_randomized$disb_amt ~ election_data_randomized$disb_desc)
summary(anova_description)
## Df Sum Sq Mean Sq F value Pr(>F)
## election_data_randomized$disb_desc 3 3.464e+12 1.155e+12 1325 <2e-16
## Residuals 9658 8.419e+12 8.717e+08
##
## election_data_randomized$disb_desc ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# disb_dt
anova_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$disb_dt)
summary(anova_date)
## Df Sum Sq Mean Sq F value Pr(>F)
## election_data_randomized$disb_dt 1 2.510e+08 2.51e+08 0.204 0.651
## Residuals 9660 1.188e+13 1.23e+09
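#**By ANOVA, the main effects of type of disbursement and state are both significant (p < 0.001): the variation they explain is larger than would be expected from random error alone. The main effects of candidate (p = 0.104) and year (p = 0.651) are not significant in these one-way models.**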
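The one-way ANOVAs above pass full column paths (election_data_randomized$...) into the formula, which makes the term labels long. A sketch using the data argument and reformulate() instead, collecting each factor's p-value into one named vector (one_way_pvalues is a hypothetical helper):

```r
# Hypothetical helper: one-way ANOVA p-value for each factor, using the
# data argument so term labels are just the column names
one_way_pvalues <- function(df, factors, response = "disb_amt") {
  sapply(factors, function(v) {
    fit <- aov(reformulate(v, response = response), data = df)
    summary(fit)[[1]][["Pr(>F)"]][1]   # first row is the factor term
  })
}

set.seed(1)
sim <- data.frame(
  disb_amt = c(rnorm(30, 0), rnorm(30, 5), rnorm(30, 10)),
  g        = factor(rep(c("a", "b", "c"), each = 30)),  # real group differences
  h        = factor(rep(c("x", "y"), 45))               # no built-in effect
)
print(one_way_pvalues(sim, c("g", "h")))

# Usage (assumes election_data_randomized from above):
# one_way_pvalues(election_data_randomized,
#                 c("cand_nm", "recipient_st", "disb_desc", "disb_dt"))
```

Applied to election_data_randomized, the p-values should match the Pr(>F) column above (0.104 for cand_nm, 0.651 for disb_dt, and so on).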
# cand_nm and recipient_st
anova_cand_state <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm*election_data_randomized$recipient_st)
summary(anova_cand_state)
## Df
## election_data_randomized$cand_nm 3
## election_data_randomized$recipient_st 6
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 14
## Residuals 9638
## Sum Sq
## election_data_randomized$cand_nm 7.592e+09
## election_data_randomized$recipient_st 9.478e+10
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 9.028e+10
## Residuals 1.169e+13
## Mean Sq
## election_data_randomized$cand_nm 2.531e+09
## election_data_randomized$recipient_st 1.580e+10
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 6.448e+09
## Residuals 1.213e+09
## F value
## election_data_randomized$cand_nm 2.086
## election_data_randomized$recipient_st 13.023
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 5.316
## Residuals
## Pr(>F)
## election_data_randomized$cand_nm 0.0998
## election_data_randomized$recipient_st 9.99e-15
## election_data_randomized$cand_nm:election_data_randomized$recipient_st 3.34e-10
## Residuals
##
## election_data_randomized$cand_nm .
## election_data_randomized$recipient_st ***
## election_data_randomized$cand_nm:election_data_randomized$recipient_st ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# cand_nm and disb_desc
anova_cand_description <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm*election_data_randomized$disb_desc)
summary(anova_cand_description)
## Df
## election_data_randomized$cand_nm 3
## election_data_randomized$disb_desc 3
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 3
## Residuals 9652
## Sum Sq
## election_data_randomized$cand_nm 7.592e+09
## election_data_randomized$disb_desc 3.458e+12
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 1.170e+08
## Residuals 8.418e+12
## Mean Sq
## election_data_randomized$cand_nm 2.531e+09
## election_data_randomized$disb_desc 1.153e+12
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 3.898e+07
## Residuals 8.722e+08
## F value
## election_data_randomized$cand_nm 2.901
## election_data_randomized$disb_desc 1321.447
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 0.045
## Residuals
## Pr(>F)
## election_data_randomized$cand_nm 0.0335
## election_data_randomized$disb_desc <2e-16
## election_data_randomized$cand_nm:election_data_randomized$disb_desc 0.9875
## Residuals
##
## election_data_randomized$cand_nm *
## election_data_randomized$disb_desc ***
## election_data_randomized$cand_nm:election_data_randomized$disb_desc
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# cand_nm and disb_dt
anova_cand_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$cand_nm*election_data_randomized$disb_dt)
summary(anova_cand_date)
## Df
## election_data_randomized$cand_nm 3
## election_data_randomized$disb_dt 1
## election_data_randomized$cand_nm:election_data_randomized$disb_dt 3
## Residuals 9654
## Sum Sq
## election_data_randomized$cand_nm 7.592e+09
## election_data_randomized$disb_dt 5.696e+07
## election_data_randomized$cand_nm:election_data_randomized$disb_dt 5.866e+08
## Residuals 1.188e+13
## Mean Sq
## election_data_randomized$cand_nm 2.531e+09
## election_data_randomized$disb_dt 5.696e+07
## election_data_randomized$cand_nm:election_data_randomized$disb_dt 1.955e+08
## Residuals 1.230e+09
## F value
## election_data_randomized$cand_nm 2.057
## election_data_randomized$disb_dt 0.046
## election_data_randomized$cand_nm:election_data_randomized$disb_dt 0.159
## Residuals
## Pr(>F)
## election_data_randomized$cand_nm 0.104
## election_data_randomized$disb_dt 0.830
## election_data_randomized$cand_nm:election_data_randomized$disb_dt 0.924
## Residuals
# recipient_st and disb_desc
anova_state_description <- aov(election_data_randomized$disb_amt ~ election_data_randomized$recipient_st*election_data_randomized$disb_desc)
summary(anova_state_description)
## Df
## election_data_randomized$recipient_st 6
## election_data_randomized$disb_desc 3
## election_data_randomized$recipient_st:election_data_randomized$disb_desc 9
## Residuals 9643
## Sum Sq
## election_data_randomized$recipient_st 9.835e+10
## election_data_randomized$disb_desc 3.401e+12
## election_data_randomized$recipient_st:election_data_randomized$disb_desc 1.008e+12
## Residuals 7.376e+12
## Mean Sq
## election_data_randomized$recipient_st 1.639e+10
## election_data_randomized$disb_desc 1.134e+12
## election_data_randomized$recipient_st:election_data_randomized$disb_desc 1.120e+11
## Residuals 7.649e+08
## F value
## election_data_randomized$recipient_st 21.43
## election_data_randomized$disb_desc 1482.14
## election_data_randomized$recipient_st:election_data_randomized$disb_desc 146.48
## Residuals
## Pr(>F)
## election_data_randomized$recipient_st <2e-16
## election_data_randomized$disb_desc <2e-16
## election_data_randomized$recipient_st:election_data_randomized$disb_desc <2e-16
## Residuals
##
## election_data_randomized$recipient_st ***
## election_data_randomized$disb_desc ***
## election_data_randomized$recipient_st:election_data_randomized$disb_desc ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# recipient_st and disb_dt
anova_state_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$recipient_st*election_data_randomized$disb_dt)
summary(anova_state_date)
## Df
## election_data_randomized$recipient_st 6
## election_data_randomized$disb_dt 1
## election_data_randomized$recipient_st:election_data_randomized$disb_dt 6
## Residuals 9648
## Sum Sq
## election_data_randomized$recipient_st 9.835e+10
## election_data_randomized$disb_dt 3.062e+08
## election_data_randomized$recipient_st:election_data_randomized$disb_dt 1.226e+10
## Residuals 1.177e+13
## Mean Sq
## election_data_randomized$recipient_st 1.639e+10
## election_data_randomized$disb_dt 3.062e+08
## election_data_randomized$recipient_st:election_data_randomized$disb_dt 2.043e+09
## Residuals 1.220e+09
## F value
## election_data_randomized$recipient_st 13.434
## election_data_randomized$disb_dt 0.251
## election_data_randomized$recipient_st:election_data_randomized$disb_dt 1.674
## Residuals
## Pr(>F)
## election_data_randomized$recipient_st 3.12e-15
## election_data_randomized$disb_dt 0.616
## election_data_randomized$recipient_st:election_data_randomized$disb_dt 0.123
## Residuals
##
## election_data_randomized$recipient_st ***
## election_data_randomized$disb_dt
## election_data_randomized$recipient_st:election_data_randomized$disb_dt
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#disb_desc and disb_dt
anova_description_date <- aov(election_data_randomized$disb_amt ~ election_data_randomized$disb_desc*election_data_randomized$disb_dt)
summary(anova_description_date)
## Df
## election_data_randomized$disb_desc 3
## election_data_randomized$disb_dt 1
## election_data_randomized$disb_desc:election_data_randomized$disb_dt 3
## Residuals 9654
## Sum Sq
## election_data_randomized$disb_desc 3.464e+12
## election_data_randomized$disb_dt 5.686e+09
## election_data_randomized$disb_desc:election_data_randomized$disb_dt 1.156e+11
## Residuals 8.298e+12
## Mean Sq
## election_data_randomized$disb_desc 1.155e+12
## election_data_randomized$disb_dt 5.686e+09
## election_data_randomized$disb_desc:election_data_randomized$disb_dt 3.852e+10
## Residuals 8.595e+08
## F value
## election_data_randomized$disb_desc 1343.470
## election_data_randomized$disb_dt 6.615
## election_data_randomized$disb_desc:election_data_randomized$disb_dt 44.818
## Residuals
## Pr(>F)
## election_data_randomized$disb_desc <2e-16
## election_data_randomized$disb_dt 0.0101
## election_data_randomized$disb_desc:election_data_randomized$disb_dt <2e-16
## Residuals
##
## election_data_randomized$disb_desc ***
## election_data_randomized$disb_dt *
## election_data_randomized$disb_desc:election_data_randomized$disb_dt ***
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
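Each two-factor ANOVA above fits only two factors at a time. A sketch of a single model containing all four main effects and all six two-factor interactions uses the ^2 formula operator. This is an alternative, not the analysis reported above: because the real data are unbalanced (and some cells are empty), aov() uses sequential (Type I) sums of squares whose values depend on term order, so the F statistics will generally differ from the separate fits. fit_2fi is a hypothetical name, and the simulated data are made up.

```r
# Hypothetical wrapper: all four main effects plus all six two-factor
# interactions in one aov() fit; (a + b + c + d)^2 expands to exactly that
fit_2fi <- function(df) {
  aov(disb_amt ~ (cand_nm + recipient_st + disb_desc + disb_dt)^2, data = df)
}

# Simulated balanced data with the same column names (values are made up)
set.seed(42)
sim <- expand.grid(
  cand_nm      = c("A", "B"),
  recipient_st = c("NY", "CA", "GA"),
  disb_desc    = c("TRAVEL", "OFFICE SUPPLIES"),
  disb_dt      = c("2015", "2016")
)
sim <- sim[rep(seq_len(nrow(sim)), 3), ]   # three replicates per cell
sim$disb_amt <- rnorm(nrow(sim), mean = 100, sd = 10)

fit <- fit_2fi(sim)
print(attr(terms(fit), "term.labels"))
print(summary(fit))

# Usage (assumes election_data_randomized from above):
# summary(fit_2fi(election_data_randomized))
```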
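#**By the F statistics from the two-factor ANOVAs, the candidate × state, state × type, and type × year interactions are significant (p < 0.001), so we can reject the null hypothesis of no interaction for these pairs: the variation they explain is more than would result from random error alone. The candidate × type (p = 0.9875), candidate × year (p = 0.924), and state × year (p = 0.123) interactions are not significant.**
Interaction plots are drawn separately for each year (2015 and 2016); within each year, the interactions between candidate, state, and type of disbursement are plotted.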
par(mfrow=c(1,1))
fifteen_data <- droplevels(fifteen_data)
#interaction of candidate and type of disbursement in 2015
cand_type_2015_plot = interaction.plot(fifteen_data$disb_desc, fifteen_data$cand_nm, fifteen_data$disb_amt)
#interaction of candidate and state in 2015
cand_state_2015_plot = interaction.plot(fifteen_data$cand_nm, fifteen_data$recipient_st, fifteen_data$disb_amt)
#interaction of type of disbursement and state in 2015
type_state_2015_plot = interaction.plot(fifteen_data$disb_desc, fifteen_data$recipient_st, fifteen_data$disb_amt)
sixteen_data <- droplevels(sixteen_data)
#interaction of candidate and state in 2016
cand_state_2016_plot = interaction.plot(sixteen_data$cand_nm, sixteen_data$recipient_st, sixteen_data$disb_amt)
#interaction of candidate and type of disbursement in 2016
cand_type_2016_plot = interaction.plot(sixteen_data$disb_desc, sixteen_data$cand_nm, sixteen_data$disb_amt)
#interaction of state and type of disbursement in 2016
type_state_2016_plot = interaction.plot(sixteen_data$disb_desc, sixteen_data$recipient_st, sixteen_data$disb_amt)
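interaction.plot() summarizes each cell with the mean by default (fun = mean), while the main effects above were based on medians. As a sketch, passing fun = median gives plots consistent with that choice, and the underlying table of cell medians can be inspected directly; median_table is a hypothetical helper, and empty cells (for example, candidate and state combinations with no disbursements) appear as NA.

```r
# Hypothetical helper: table of cell medians (rows = x levels,
# columns = trace levels); empty cells come out as NA
median_table <- function(y, x, trace) {
  tapply(y, list(x, trace), median)
}

tab <- median_table(c(1, 3, 10, 20), c("a", "a", "b", "b"), c("p", "p", "p", "q"))
print(tab)

# Usage (assumes sixteen_data from above):
# median_table(sixteen_data$disb_amt, sixteen_data$disb_desc, sixteen_data$cand_nm)
# interaction.plot(sixteen_data$disb_desc, sixteen_data$cand_nm,
#                  sixteen_data$disb_amt, fun = median)
```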
In conclusion, this was a fixed effects, factorial design experiment. The factors candidate, state, type of disbursement, and disbursement year were studied to determine whether they affect disbursement amount. State and type of disbursement had significant main effects, and several two-factor interactions (candidate × state, state × type, and type × year) were also significant. ANOVA was used to confirm that the variation explained by these terms was more than would be expected from random error alone. However, the results should be interpreted with caution because the appropriateness of the model (for example, normality and constant variance of the residuals) was not checked.
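A minimal sketch of such checks is below, under the assumption that normality of residuals and equal variance across levels are the assumptions of interest; check_assumptions is a hypothetical helper. shapiro.test() accepts at most 5000 values, so with 9662 rows the sketch tests a random subsample of residuals. kruskal.test() is included as one rank-based alternative that does not assume normality. Residual plots are also available with plot(fit, which = 1:2) (residuals vs. fitted, and normal Q-Q).

```r
# Hypothetical helper: quick checks of one-way ANOVA assumptions
check_assumptions <- function(fit, df, factor_name, response = "disb_amt") {
  f <- reformulate(factor_name, response = response)
  res <- residuals(fit)
  # shapiro.test() accepts 3 to 5000 values, so subsample larger residual sets
  if (length(res) > 5000) res <- sample(res, 5000)
  c(
    shapiro_p  = shapiro.test(res)$p.value,            # normality of residuals
    bartlett_p = bartlett.test(f, data = df)$p.value,  # equal variances
    kruskal_p  = kruskal.test(f, data = df)$p.value    # rank-based alternative
  )
}

set.seed(7)
sim <- data.frame(disb_amt = rnorm(60), g = factor(rep(c("a", "b", "c"), 20)))
fit <- aov(disb_amt ~ g, data = sim)
print(check_assumptions(fit, sim, "g"))

# Usage (assumes anova_description and election_data_randomized from above):
# check_assumptions(anova_description, election_data_randomized, "disb_desc")
```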
No additional commenting in the appendix.