##
## Welcome to CUNY DATA606 Statistics and Probability for Data Analytics
## This package is designed to support this course. The text book used
## is OpenIntro Statistics, 3rd Edition. You can read this by typing
## vignette('os3') or visit www.OpenIntro.org.
##
## The getLabs() function will return a list of the labs available.
##
## The demo(package='DATA606') will list the demos that are available.
In this project we are going to use the Federal Election Commission data set to research the campaign contributions and expenditures for election 2016 vs 2012.
We are going to analyze the data of 4 major party candidates, 2 Democrats and 2 Republicans. Would like to see how the Democrats and Republicans have spent in both the election cycle.
Further would like to do some analysis on how the Political Action Committes have spend their amounts in 2012 and 2016. Would like to investigate how the PACs have sepnd for/against a particular candidates.
Would like to present a heat map with the statewise contributions or expenditure and represent in plot_ly or choropleth.
Reading the Data from FEC Website. We have downloaded the contribution files of Hillary Clinton and Donald J Trump. Owing to pretty huge data for 2012, we are skipping the data for 2012 and going to compare the contributions for 2016 between two major party candidates.
First we are loading the data HRC data contributions and we are cleaning the data by removing unneccessary data attributes.
# The HRC DataSet to data.frame
hrccontributions=fread('HRC_Cont.csv',stringsAsFactors = TRUE)
## Warning in fread("HRC_Cont.csv", stringsAsFactors = TRUE):
## Starting data input on line 2 and discarding line 1 because
## it has too few or too many items to be column names or data:
## cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
##
Read 2.3% of 3506081 rows
Read 14.0% of 3506081 rows
Read 23.1% of 3506081 rows
Read 31.7% of 3506081 rows
Read 41.9% of 3506081 rows
Read 53.9% of 3506081 rows
Read 67.9% of 3506081 rows
Read 85.6% of 3506081 rows
Read 86.1% of 3506081 rows
Read 3506081 rows and 19 (of 19) columns from 0.621 GB file in 00:00:14
hrccontributions=data.frame(hrccontributions)
hrccontributions = subset(hrccontributions, hrccontributions$V10 > 0)
hrccontributions = subset(hrccontributions, hrccontributions$V4 != "HILLARY VICTORY FUND - UNITEMIZED")
hrccontributions=subset(hrccontributions,select = c(V3:V13))
Now having loaded the HRC data set the dimensions are 3471105, 11
Next we will be loading the data set for DJT and cleaning the data by removing unneccessary values.
djtcontributions=fread('DJT_Cont.csv',stringsAsFactors = TRUE )
## Warning in fread("DJT_Cont.csv", stringsAsFactors = TRUE):
## Starting data input on line 2 and discarding line 1 because
## it has too few or too many items to be column names or data:
## cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
djtcontributions=data.frame(djtcontributions)
djtcontributions = subset(djtcontributions, djtcontributions$V10 > 0)
djtcontributions=subset(djtcontributions,select = c(V3:V13))
hrccontributions$V11=str_replace_all(hrccontributions$V11, "[:digit:]","")
hrccontributions$V11=str_replace_all(hrccontributions$V11, "-","")
djtcontributions$V11=str_replace_all(djtcontributions$V11, "[:digit:]","")
djtcontributions$V11=str_replace_all(djtcontributions$V11, "-","")
Loaded the DJT data set the dimensions are 762871, 11
Now for the two data set we will do some summary statistics using summary. The only variable of interest in this data set is that of contributions.
# Will do a box plot to see how the contributions are. We could see a outlier in DJT contributions, which is way out of reach.
boxplot(hrccontributions$V10 , djtcontributions$V10)
We will see who are the top contributors for both the campaigns.
#HRC Campaign.
top_fifty = hrccontributions %>%
filter(rank(dplyr::desc(hrccontributions$V10))<=100)
# Top Fifty Individual Contributor's to HRC Campaign.
knitr::kable(head(plyr::arrange(top_fifty,desc(top_fifty$V10)), n = 20))
V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 |
---|---|---|---|---|---|---|---|---|---|---|
Clinton, Hillary Rodham | GOCKE, THOMAS | JUPITER | FL | 334691584 | SELF-EMPLOYED | PHYSICIAN | 20000 | JUN | ||
Clinton, Hillary Rodham | YOUNG, SAMUEL J. | TEHACHAPI | CA | 935618652 | SELF-EMPLOYED | REAL ESTATE INVESTOR | 10000 | JUN | X | |
Clinton, Hillary Rodham | YOUNG, SAMUEL J. | TEHACHAPI | CA | 935618652 | INFORMATION REQUESTED | INFORMATION REQUESTED | 7300 | JUN | ||
Clinton, Hillary Rodham | HILLARY ACTION FUND - UNITEMIZED | NEW YORK | NY | 101855256 | 5620 | SEP | X | |||
Clinton, Hillary Rodham | PROPPER, GREG | LOS ANGELES | CA | 900691429 | PROPPER DALEY | PHILANTHROPIC CONSULTING | 5400 | JUN | ||
Clinton, Hillary Rodham | PICKER, MICHAEL | SACRAMENTO | CA | 958413111 | STATE OF CALIFORNIA | COMMISSIONER | 5400 | NOV | ||
Clinton, Hillary Rodham | AUSTIN, ALAN | ATHERTON | CA | 940275458 | N/A | RETIRED | 5400 | AUG | ||
Clinton, Hillary Rodham | FOX, ALAN | STUDIO CITY | CA | 916042407 | ACF PROPERTY MANAGEMENT | PRESIDENT | 5400 | MAR | ||
Clinton, Hillary Rodham | BEAUBIEN, JAMES | SANTA MONICA | CA | 904023024 | LATHAM & WATKINS LLP | ATTORNEY | 5400 | APR | ||
Clinton, Hillary Rodham | CARROLL, DANIEL ASHTON | SAN FRANCISCO | CA | 941151125 | TPG CAPITAL | INVESTMENT MANAGER | 5400 | MAR | ||
Clinton, Hillary Rodham | HOLZMAN, WINNIE | BURBANK | CA | 915054005 | SELF-EMPLOYED | WRITER | 5400 | JUN | ||
Clinton, Hillary Rodham | ITKIN, MARK | LOS ANGELES | CA | 900691128 | WILLIAM MORRIS ENDEAVOR | TALENT AGENT | 5400 | APR | ||
Clinton, Hillary Rodham | FENG, JIONG | CLAREMONT | CA | 917116500 | W. CALIFORNIA ART ACADEMY | TEACHER | 5400 | MAY | ||
Clinton, Hillary Rodham | WALLACE, RICHARD | SHERMAN OAKS | CA | 914032915 | WARNER BROS TV | WARNER BROTHERS TV | 5400 | APR | ||
Clinton, Hillary Rodham | HANNA, MONA Z. | VILLA PARK | CA | 928615322 | MICHELMAN & ROBINSON, LLP | ATTORNEY | 5400 | JUN | ||
Clinton, Hillary Rodham | BARSZCZ, MICHAEL | WINTER PARK | FL | 327893347 | SELF-EMPLOYED | ATTORNEY | 5400 | MAY | ||
Clinton, Hillary Rodham | GHAZVINI, MEHRAN | TALLAHASSEE | FL | 323081547 | PREMIER HEALTH CLINIC & REHAB OF TALLA | DOCTOR OF CHIRPORACTIC | 5400 | MAY | ||
Clinton, Hillary Rodham | FINK KOHL, BINA | WESTON | FL | 333262726 | SELF-EMPLOYED | PUBLIC RELATIONS | 5400 | MAY | ||
Clinton, Hillary Rodham | SCHRAGIE, GOLDBLATT | MIAMI BEACH | FL | 331403429 | PROMED MANAGEMENT, INC. | PRESIDENT | 5400 | APR | X | |
Clinton, Hillary Rodham | TOLL, ROBERT | MIAMI BEACH | FL | 331404226 | TOLL BROTHERS, INC | EXECUTIVE CHAIRMAN | 5400 | JUN | X |
# DJT Campaign.
top_fifty_djt = djtcontributions %>%
filter(rank(desc(djtcontributions$V10))<=100)
# Top Fifty Individual Contributor's to DJT Campaign.
knitr::kable(head(plyr::arrange(top_fifty_djt,desc(top_fifty_djt$V10)), n = 20))
V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 |
---|---|---|---|---|---|---|---|---|---|---|
Trump, Donald J. | BOCH, ERNIE | NORWOOD | MA | 02062 | BOCH AUTOMOTIVE GROUP | EXECUTIVE | 86936.80 | AUG | ||
Trump, Donald J. | FERRERO, LOUIS P MR. | CANTON | GA | 30115 | INFORMATION REQUESTED | INFORMATION REQUESTED | 12500.00 | JUL | ||
Trump, Donald J. | CONSERVATIVE ACTION FUND | ALEXANDRIA | VA | 22314 | 10030.24 | DEC | X | |||
Trump, Donald J. | COBB, ROBERT | BIRMINGHAM | AL | 35209 | COBB THEATERS | OWNER | 10000.00 | NOV | ||
Trump, Donald J. | DOBSKI, ROBERT | BLOOMINGTON | IL | 61704 | INFORMATION REQUESTED | INFORMATION REQUESTED | 10000.00 | NOV | ||
Trump, Donald J. | GORMAN, L.D. MR. | HAZARD | KY | 41702 | INFORMATION REQUESTED | INFORMATION REQUESTED | 10000.00 | JUL | ||
Trump, Donald J. | ROVT, ALEXANDER MR. | BROOKLYN | NY | 11234 | INFORMATION REQUESTED | INFORMATION REQUESTED | 10000.00 | NOV | ||
Trump, Donald J. | GIGANTE, PETER | BELLINGHAM | WA | 98225 | SELF-EMPLOYED | INTERNATIONAL TRADE | 10000.00 | AUG | ||
Trump, Donald J. | TANZER, LEONARD J MR. | SCARSDALE | NY | 10583 | PATIENT CARE ASSOCIATES | PRESIDENT | 7300.00 | NOV | ||
Trump, Donald J. | NORTHCUTT, JOHN D MR. III | FAIRHOPE | AL | 36532 | INFORMATION REQUESTED | INFORMATION REQUESTED | 5400.00 | JUL | ||
Trump, Donald J. | HARBERT, NORMAN C MR. | SOTTSDALE | AZ | 85262 | RETIRED | RETIRED | 5400.00 | SEP | ||
Trump, Donald J. | DAY, TIMOTHY T MR. | PHOENIX | AZ | 85018 | INFORMATION REQUESTED | INFORMATION REQUESTED | 5400.00 | JUL | ||
Trump, Donald J. | STERN, MARC MR. | MALIBU | CA | 90265 | THE TCW GROUP, INC | CHAIRMAN | 5400.00 | JUL | ||
Trump, Donald J. | LITTLEFAIR, ANDREW | NEW PORT BEACH | CA | 92663 | CLEAN ENERGY FUELD | PRESIDENT | 5400.00 | OCT | ||
Trump, Donald J. | SCHUMANN, ROBERT | MANHATTAN BEACH | CA | 90266 | REAL ESTATE WEST INC. | BROKER | 5400.00 | NOV | ||
Trump, Donald J. | LISENKO, ROSE H MS. | SAN DIEGO | CA | 92107 | INFORMATION REQUESTED | INFORMATION REQUESTED | 5400.00 | AUG | ||
Trump, Donald J. | NEE, ALICE MS. | NEWPORT COAST | CA | 92657 | INFORMATION REQUESTED | INFORMATION REQUESTED | 5400.00 | AUG | ||
Trump, Donald J. | MIZEL, CAROL MS. | DENVER | CO | 80237 | HOMEMAKER | HOMEMAKER | 5400.00 | SEP | ||
Trump, Donald J. | MIZEL, LARRY A MR. | DENVER | CO | 80237 | M.D.C HOLDINGS | INVESTMENT | 5400.00 | SEP | ||
Trump, Donald J. | TRAVIS, CHARLOTTE MS. | BRIGHTON | CO | 80601 | INFORMATION REQUESTED | INFORMATION REQUESTED | 5400.00 | AUG |
Group By Profession.
We will look at who were the top contributor for both the campaigns based on the profession.
top_contributor_prof <- hrccontributions %>%
group_by(V9) %>%
summarise(V10=n())
top_contributor_djt <- djtcontributions %>%
group_by(V9) %>%
summarise(V10=n())
knitr::kable(head(plyr::arrange(top_contributor_prof,desc(top_contributor_prof$V10)), n = 10))
V9 | V10 |
---|---|
RETIRED | 862272 |
ATTORNEY | 145237 |
INFORMATION REQUESTED | 80238 |
TEACHER | 79455 |
PROFESSOR | 73324 |
PHYSICIAN | 67938 |
HOMEMAKER | 60355 |
NOT EMPLOYED | 56072 |
CONSULTANT | 54765 |
LAWYER | 43813 |
knitr::kable(head(plyr::arrange(top_contributor_djt,desc(top_contributor_djt$V10)), n = 10))
V9 | V10 |
---|---|
RETIRED | 292655 |
INFORMATION REQUESTED | 153128 |
SELF-EMPLOYED | 11118 |
SALES | 10047 |
PHYSICIAN | 8837 |
HOMEMAKER | 8188 |
BUSINESS OWNER | 7438 |
ENGINEER | 6915 |
ATTORNEY | 6071 |
OWNER | 5401 |
Group By State
We will look at who were the top contributor for both the campaigns based on the state were in they reside.
top_state_hrc <- hrccontributions %>%
group_by(V6) %>%
summarise(V10=n())
top_state_djt <- djtcontributions %>%
group_by(V6) %>%
summarise(V10=n())
knitr::kable(head(plyr::arrange(top_state_hrc,desc(top_state_hrc$V10)), n = 10))
V6 | V10 |
---|---|
CA | 681518 |
NY | 394653 |
TX | 201805 |
FL | 182113 |
MA | 152570 |
WA | 125099 |
IL | 121487 |
PA | 118255 |
NJ | 113906 |
VA | 109987 |
knitr::kable(head(plyr::arrange(top_state_djt,desc(top_state_djt$V10)), n = 10))
V6 | V10 |
---|---|
CA | 83741 |
FL | 76961 |
TX | 76143 |
NY | 35762 |
PA | 30046 |
GA | 29055 |
OH | 26281 |
VA | 23630 |
IL | 22748 |
AZ | 21429 |
We are going to plot the states where these top contributors are residing by using a Plotly Map.
top_state_hrc$hover = with(top_state_hrc, paste(top_state_hrc$V6, '<br>', "Total contributions", top_state_hrc$V10))
l=list(color = toRGB("white"), width = 2)
g=list(
scope = 'usa',
projection = list(type = 'albers usa'),
showlakes = TRUE,
lakecolor = toRGB('white')
)
p=plot_geo(top_state_hrc, locationmode = 'USA-states') %>%
add_trace(
z = ~V10, text = ~hover, locations = ~V6,
color = ~V10, colors = 'Blues'
) %>%
colorbar(title = "Contributions") %>%
layout(
title = 'Total HRC Contributions by State<br>(Hover for breakdown)',
geo = g
)
top_state_djt$hover=with(top_state_djt, paste(top_state_djt$V6, '<br>', "Total contributions", top_state_djt$V10))
p_dgt=plot_geo(top_state_djt, locationmode = 'USA-states') %>%
add_trace(
z = ~V10, text = ~hover, locations = ~V6,
color = ~V10, colors = 'Reds'
) %>%
colorbar(title = "Contributions") %>%
layout(
title = 'Total DJT Contributions by State<br>(Hover for breakdown)',
geo = g
)
#Sys.setenv("plotly_username"="data607g3")
#Sys.setenv("plotly_api_key"="dDZhymyplVxhzUuXv2MX")
#chart_link = api_create(p, filename="HRCStateOne",id="HRC")
#chart_link_djt = api_create(p_dgt,filename = "DJTStateOne")
#chart_link_djt
p
p_dgt
\(H_0\) Null Hypothesis : The average Contributions during last four months does not vary across the campaigns.
\(H_A\) Alternative Hypothesis : The average Contributions during last four months varies across the campaigns.
For this first we are cleaning the data and picking data only those we are interested in.
So we are picking contributions only for the month of JUN, JUL, AUG, SET and OCT for both the campaigns.
hrcfinal = data.frame(hrccontributions)
hrcfinal=subset(hrcfinal,select = c(V10,V11))
hrclast=subset(hrcfinal, hrcfinal$V11 == "JUN" | hrcfinal$V11 == "JUL" | hrcfinal$V11 == "AUG" |hrcfinal$V11 == "SEP" |hrcfinal$V11 == "OCT" )
djtfinal = data.frame(djtcontributions)
djtfinal=subset(djtfinal,select = c(V10,V11))
djtlast=subset(djtfinal, djtfinal$V11 == "JUN" | djtfinal$V11 == "JUL" | djtfinal$V11 == "AUG" |djtfinal$V11 == "SEP" |djtfinal$V11 == "OCT" )
Box Plot of last four months contributions.
bwplot(hrclast$V10 ~ hrclast$V11 ,main="Contributions across final four Months",xlab="Month",ylab="Contributions")
bwplot(djtlast$V10 ~ djtlast$V11 ,main="Contributions across final four Months",xlab="Month",ylab="Contributions")
Since the box plot is not able to give a clear picture, we are going to compare the means across four months.
hrc_summary_four = aggregate(hrclast$V10 ~ hrclast$V11, data = hrclast, function(x) c(Total = sum(x), Mean = mean(x), SD = sd(x) ))
hrc_summary_four=cbind(hrc_summary_four[-ncol(hrc_summary_four)], hrc_summary_four[[ncol(hrc_summary_four)]])
knitr::kable(hrc_summary_four)
hrclast$V11 | Total | Mean | SD |
---|---|---|---|
AUG | 52298582 | 134.17462 | 408.1672 |
JUL | 45677073 | 119.42655 | 366.8881 |
JUN | 41466446 | 170.44040 | 509.5670 |
OCT | 72668987 | 91.65413 | 269.8099 |
SEP | 71749414 | 125.26522 | 364.4244 |
djt_summary_four <- aggregate(djtlast$V10 ~ djtlast$V11, data = djtlast, function(x) c(Total = sum(x), Mean = mean(x), SD = sd(x) ))
djt_summary_four<- cbind(djt_summary_four[-ncol(djt_summary_four)], djt_summary_four[[ncol(djt_summary_four)]])
knitr::kable(djt_summary_four)
djtlast$V11 | Total | Mean | SD |
---|---|---|---|
AUG | 24049608 | 128.28373 | 372.4752 |
JUL | 25068072 | 95.85821 | 248.8110 |
JUN | 16474648 | 145.63096 | 367.7656 |
OCT | 25179304 | 340.62451 | 399.2310 |
SEP | 25577068 | 377.37647 | 517.3588 |
From the above analysis it is pretty clear that average monthly contributions across the last four between two candidates are not same. As a result we are going to reject the Null Hypothesis and we are going to accept the Alternative Hypothesis.
These are plots of Contributions across all the months.
ggplot(hrcfinal, aes(x=hrcfinal$V11, y=hrcfinal$V10 ,group=hrcfinal$V11, colour=hrcfinal$V10)) +
geom_point() + ylab("Total Contributions") +
xlab("Month")
ggplot(djtfinal, aes(x=djtfinal$V11, y=djtfinal$V10 ,group=djtfinal$V11, colour=djtfinal$V10)) +
geom_point() + ylab("Total Contributions") +
xlab("Month")
Having worked on the campaign contributions from individuals of both the campaigns, now we are going to work on expenditures associated with PAC’s PACs spend a huge amount of money for and against the candidates. In this experiment, we are going to see how PACs have done their monthly expenditure heading up to the election.
pacexpenditure=fread('FEC_independent-expenditure.csv')
## Warning in fread("FEC_independent-expenditure.csv"): Bumped column 7 to
## type character on data row 12046, field contains 'AG'. Coercing previously
## read values in this column from logical, integer or numeric back to
## character which may not be lossless; e.g., if '00' and '000' occurred
## before they will now be just '0', and there may be inconsistencies with
## treatment of ',,' and ',NA,' too (if they occurred in this column before
## the bump). If this matters please rerun and set 'colClasses' to 'character'
## for this column. Please note that column type detection uses a sample of
## 1,000 rows (100 rows at 10 points) so hopefully this message should be very
## rare. If reporting to datatable-help, please rerun and include the output
## from verbose=TRUE.
pacexpenditure=data.frame(pacexpenditure)
# ready numeric vector
pacexpenditure$expenditure=str_replace_all(as.character(pacexpenditure$exp_amo), fixed("$"), "")
pacexpenditure$expenditure = as.numeric(str_replace_all(pacexpenditure$expenditure, ',', ''))
pacexpenditure=pacexpenditure[pacexpenditure$amn_ind == "N",]
pacexpenditure$can_nam <- as.character(pacexpenditure$can_nam)
pacexpenditure$can_nam <- tolower(pacexpenditure$can_nam)
pacexpenditure$spe_nam <- tolower(pacexpenditure$spe_nam)
# HRC PAC
hrcpac=pacexpenditure[grepl("inton|hil", pacexpenditure$can_nam), ]
hrcpac=hrcpac[!grepl("kirk|james|george", hrcpac$can_nam), ]
hrcpac$org = "HRC"
# DJT PAC
djtpac=pacexpenditure[grepl("trump", pacexpenditure$can_nam),]
djtpac$org = "DJT"
# combine, subset
pactot=rbind(hrcpac, djtpac)
pactot=select(pactot,org, spe_nam, rec_dat, expenditure)
# clean type
pactot$org <- as.factor(pactot$org)
pactot$spe_nam <- as.character(pactot$spe_nam)
pactot$rec_dat <- as.Date(pactot$rec_dat, "%m/%d/%Y")
dateval=as.month(pactot$rec_dat)
pactot$rec_dat=dateval$month
Now having the two PAC Contributions sorted out. Next we will try to Summarize the monthly expenditures across both the PAC and then we can try to compare the average monthly expenditure across both the PAC’s
hrcmonthlyexp = pactot %>%
filter(pactot$org == "HRC") %>%
group_by(rec_dat) %>%
summarize(sum(expenditure))
hrcmonthlyexp$org="HRC"
djtpacmonthlyepx = pactot %>%
filter(pactot$org == "DJT") %>%
group_by(rec_dat) %>%
summarize(sum(expenditure))
djtpacmonthlyepx$org = "DJT"
totpacmnthexp = rbind(hrcmonthlyexp, djtpacmonthlyepx)
ggplot(totpacmnthexp, aes(x=rec_dat, y=`sum(expenditure)`,
group=org, colour=org)) +
geom_line() + ylab("Expenditure") + xlab("Month") +
ggtitle("PAC Expenditure across both the campaigns")
\(H_O\): Null Hypothesis : Average monthly expenditure across both the PAC’s does not vary
\(H_A\): Alternative Hypothesis : Average monthly expenditure across both the PAC’s varies
ggplot(totpacmnthexp, aes(factor(org), `sum(expenditure)`)) + geom_boxplot(aes(fill=org)) + xlab("") + theme(axis.title.x=element_blank(),
axis.text.x=element_blank(),
axis.ticks.x=element_blank())
pacmonexpfinal = aggregate(totpacmnthexp$`sum(expenditure)` ~ totpacmnthexp$org, data = totpacmnthexp, function(x) c(Total = sum(x), Mean = mean(x), SD = sd(x) ))
pacmonexpfinal=cbind(pacmonexpfinal[-ncol(pacmonexpfinal)], pacmonexpfinal[[ncol(pacmonexpfinal)]])
knitr::kable(pacmonexpfinal)
totpacmnthexp$org | Total | Mean | SD |
---|---|---|---|
DJT | 349184499 | 29098708 | 30902356 |
HRC | 160661431 | 13388453 | 17769403 |
From the box plots and our calculations above it is pretty clear that both the averages across both the PACs are not same, so we can reject Null Hypothesis and Accept the Alternative Hypothesis.
For this project as per my initial proposal we wanted to compare the contributions and expenditure of presidential elections between 2012 and 2016. Since the volume of data involved in both the campaigns was so huge to even load in for our project, so we decided to go ahead with only the analysis of 2016.
As a first experiment, we read the individual contributions across both the major party candidates in the presidential election. We did some initial analysis of data like top contributors, top contributors across the states and we even plot the campaign contributors in the map of USA to see where the maximum contributions are from.
We also did a hypothesis testing by comparing the average contributions of the last five months heading up to election, to see whether the average contributions across both the candidates are same. From our testing we rejected out Null Hypothesis and determined that average contributions between two candidates in the last four months were different.
Once having finished working on contributions, then we loaded the expenditure of PACs. We wanted to see how the PACs associated to HRC and DJT have spent the money during this election. We did some munging of data for both the candidates associated PAC’s.
For PAC expenditure we did a hypothesis testing to see whether the monthly expenditures are same across both the candidates. Finaly the means were way off limit so we concluded by rejecting null hypothesis and accepting the alternative hypothesis.
Overall this was a great experience in working on this project and DATA606 class as a whole. For a new comer as me in Statistics was a great learning experience.
Reference : Research analysis inspired from Kaggle 2016 Election Analysis