Further sharpen data wrangling and data visualization skills
You will analyze Federal Election Commission data and report your findings in a blog post. The topic is up to you. The best projects will:
mutate()s)You may work with a partner, but are not required to.
As with the first mini project, you will use GitHub to collaborate with your, as well as to submit the final product. To create a repository for your mini project, complete the following steps:
github_id.github_id fork the mini-project-2 repository from cmsc205github_id add other teammates as "Collaborators" (under "Settings")github_id/mini-project-2’s repoTo submit your final blog post, please follow the instructions from HW-0
The link to blog post is here
The data for this project come from the Federal Election Commission, and are based on the 2011-2012 federal election cycle. These data were collected using the fec R package.
Be sure to read the supporting documentation for these data!! It is your responsibility to know what data you are working with!
There are four tables to consider: house_elections, candidates, committees, and contributions. You can use the following code to load in the data sets.
library(tidyverse)
house_elections <- read_csv("house_elections.csv")
committees <- read_csv("committees.csv")
candidates <- read_csv("candidates.csv")
# The contributions file was too big for GitHub, so it's
# stored slightly differently.
load("contributions.rda") After loading the data sets, verify that your data sets look like those displayed below (be sure that the row and column dimensions match):
glimpse(house_elections)## Observations: 2,178
## Variables: 10
## $ fec_id <chr> "B2CA08156", "H0AK00097", "H0AL01030", "H0AL020...
## $ state <chr> "CA", "AK", "AL", "AL", "AL", "AL", "AL", "AR",...
## $ district <chr> "08", "00", "01", "02", "05", "07", "07", "01",...
## $ incumbent <lgl> FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, T...
## $ candidate_name <chr> "Mitzelfelt, Brad", "Cox, John R.", "Gounares, ...
## $ party <chr> "R", "R", "R", "R", "R", "D", "R", "R", "R", "R...
## $ primary_votes <int> 8801, 11179, 3854, 0, 65163, 0, 11537, 0, 0, 0,...
## $ runoff_votes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ general_votes <int> 0, 0, 0, 180591, 189185, 232520, 73835, 138800,...
## $ ge_winner <chr> NA, NA, NA, "W", "W", "W", "N", "W", "W", "W", ...
glimpse(committees)## Observations: 14,454
## Variables: 15
## $ cmte_id <chr> "C00000042", "C00000059", "C00000422", ...
## $ cmte_name <chr> "ILLINOIS TOOL WORKS INC. FOR BETTER GO...
## $ tres_name <chr> "LYNCH, MICHAEL J. MR.", "GREG SWARENS"...
## $ cmte_st1 <chr> "3600 WEST LAKE AVENUE", "2501 MCGEE", ...
## $ cmte_st2 <chr> NA, "MD#288", "SUITE 600", NA, NA, NA, ...
## $ cmte_city <chr> "GLENVIEW", "KANSAS CITY", "WASHINGTON"...
## $ cmte_state <chr> "IL", "MO", "DC", "OK", "KS", "IN", "DC...
## $ cmte_zip <int> 60026, 64108, 20001, 73107, 66612, 4620...
## $ cmte_dsgn <chr> "B", "U", "B", "U", "U", "U", "B", "B",...
## $ cmte_type <chr> "Q", "Q", "Q", "N", "Q", "Q", "Q", "Q",...
## $ cmte_party_affiliation <chr> NA, "UNK", NA, NA, "UNK", NA, "UNK", "U...
## $ cmte_filing_freq <chr> "Q", "M", "M", "Q", "Q", "Q", "M", "M",...
## $ org_type <chr> "C", "C", "M", "L", "T", "M", "M", "L",...
## $ connected_org_name <chr> "ILLINOIS TOOL WORKS INC.", NA, "AMERIC...
## $ cand_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
glimpse(contributions)## Observations: 396,369
## Variables: 22
## $ cmte_id <chr> "C00478404", "C00140855", "C00140855", "C0014...
## $ amndt_ind <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", ...
## $ rpt_type <chr> "M3", "M3", "M3", "M3", "M3", "M3", "M3", "M3...
## $ transaction_pgi <chr> "P", "P", "P", "P", "P", "P", "G", "P", "P", ...
## $ image_num <chr> "11930476751.0", "11930476826.0", "1193047682...
## $ transaction_type <chr> "24K", "24K", "24K", "24K", "24K", "24K", "24...
## $ entity_type <chr> "COM", "CCM", "CCM", "CCM", "CCM", "CCM", "CC...
## $ name <chr> "KLINE FOR CONGRESS", "TIM RYAN FOR U.S. CONG...
## $ city <chr> "BURNSVILLE", "WASHINGTON", "WASHINGTON", "BO...
## $ state <chr> "MN", "DC", "DC", "MD", "ND", "MI", "MN", "IA...
## $ zip_code <chr> "55337", "20013", "20005", "20716", "58106", ...
## $ employer <chr> "", "", "", "", "", "", "", "", "", "", "", "...
## $ occupation <chr> "", "", "", "", "", "", "", "", "", "", "", "...
## $ transaction_dt <chr> "02252011", "02012011", "02012011", "02222011...
## $ transaction_amt <dbl> 2400, 1000, 1000, 2500, 1000, 5000, 1000, 100...
## $ other_id <chr> "C00326629", "C00373464", "C00289983", "C0014...
## $ cand_id <chr> "H8MN06047", "H2OH17109", "H4KY01040", "H2MD0...
## $ tran_id <chr> "B37FBC79414E54DD7A1C", "38595006", "38595007...
## $ file_num <int> 717033, 717042, 717042, 717042, 717043, 71704...
## $ memo_cd <chr> "", "", "", "", "", "", "", "", "", "X", "", ...
## $ memo_text <chr> "", "", "", "", "", "", "", "", "", "CHECK 23...
## $ sub_id <dbl> 4.03182e+18, 4.03172e+18, 4.03172e+18, 4.0317...
glimpse(candidates)## Observations: 5,628
## Variables: 15
## $ cand_id <chr> "H0AK00089", "H0AK00097", "H0AL00016", ...
## $ cand_name <chr> "CRAWFORD, HARRY T JR", "COX, JOHN ROBE...
## $ cand_party_affiliation <chr> "DEM", "REP", "UNK", "REP", "REP", "DEM...
## $ cand_election_yr <int> 2010, 2012, 2010, 2012, 2012, 2008, 201...
## $ cand_office_state <chr> "AK", "AK", "AL", "AL", "AL", "AL", "AL...
## $ cand_office <chr> "H", "H", "H", "H", "H", "H", "H", "H",...
## $ cand_office_district <int> 0, 0, 7, 1, 2, 5, 5, 5, 5, 5, 6, 7, 7, ...
## $ cand_ici <chr> "C", "C", "O", "C", "I", "C", "C", "I",...
## $ cand_status <chr> "P", "N", "C", "C", "C", "C", "C", "C",...
## $ cand_pcc <chr> "C00466698", "C00525261", "C00464040", ...
## $ cand_st1 <chr> "4350 BUTTE CIR", "PO BOX 1092", "PO BO...
## $ cand_st2 <chr> NA, NA, NA, NA, NA, NA, NA, NA, "SUITE ...
## $ cand_city <chr> "ANCHORAGE", "ANCHOR POINT", "BIRMINGHA...
## $ cand_state <chr> "AK", "AK", "AL", "AL", "AL", "AL", "AL...
## $ cand_zip <int> 99504, 8388607, 35201, 36561, 36106, 35...
Our first data represents the 15 candidates with highest transactions amounts. Candidates transaction amounts are different in different time of the year. We took sum of the transaction amounts for each candidate, to compare who had the most contribution.
Our second data is about relationship between primary votes and general votes for Republicans and Democrats. Some candidate has zero primary votes or some candidate have zero general vote for some reasons (for example, for zero primary votes, the candidate might be the only candidate in the party.) Our data only includes votes larger than 0.
library(tidyverse)
committees_transaction_amt <- full_join(candidates, contributions, by = "cand_id") %>%
select(cmte_id, transaction_dt, transaction_amt, cand_name, cand_election_yr, cand_party_affiliation)%>%
filter(transaction_amt > 0) %>%
group_by(cand_name, cand_election_yr, cand_party_affiliation) %>%
summarize(total_trans_amt = sum(transaction_amt)) %>%
arrange(desc(total_trans_amt))
head(committees_transaction_amt, 15)## Source: local data frame [15 x 4]
## Groups: cand_name, cand_election_yr [15]
##
## cand_name cand_election_yr cand_party_affiliation
## <chr> <int> <chr>
## 1 OBAMA, BARACK 2012 DEM
## 2 ROMNEY, MITT / RYAN, PAUL D. 2012 REP
## 3 KAINE, TIMOTHY MICHAEL 2012 DEM
## 4 GINGRICH, NEWT 2012 REP
## 5 SANTORUM, RICHARD J. 2012 REP
## 6 BROWN, SHERROD 2012 DEM
## 7 THOMPSON, TOMMY G 2012 REP
## 8 BALDWIN, TAMMY 2012 DEM
## 9 ALLEN, GEORGE 2012 REP
## 10 MANDEL, JOSH 2012 REP
## 11 MOURDOCK, RICHARD E 2012 REP
## 12 BERKLEY, SHELLEY 2012 DEM
## 13 HELLER, DEAN 2012 REP
## 14 NELSON, BILL 2012 DEM
## 15 TESTER, JON 2012 DEM
## # ... with 1 more variables: total_trans_amt <dbl>
transaction_15 <- head(committees_transaction_amt, 15)The reason of why we chose a bar chart is to see the difference between the transaction amounts.
transaction_15$cand_party_affiliation <- as.factor(transaction_15$cand_party_affiliation)
graph_1 <-
ggplot(transaction_15,
aes(x = reorder(cand_name, total_trans_amt),
y = total_trans_amt)) +
geom_bar(stat = "identity", aes(fill = factor(cand_party_affiliation))) +
geom_text(aes(label = total_trans_amt, hjust = -.1)) +
coord_flip() +
expand_limits(y = 500000000) +
scale_y_continuous(labels = scales::comma) +
scale_fill_manual("Party Affiliation", values = c("red", "blue")) +
labs(x = NULL, y = NULL,
title = "Top 15 Transaction Amount in 2012",
subtitle = "Highest Transction Amount of Candidates in 2012 ")
graph_1First graph shows that in 2012, the highest transaction amount belongs to Barack Obama, and the second highest transaction amount belongs to Mitt Romney. In the bar chart, we can see 8 Republican transactions and 7 Democrats transactions. Furthermore, we represented the Democrats in red, and Republicans in blue to make the party affiliations easier to recognize visually.
Second graph shows that relationship between general votes and primary votes are directly proportional.
The reason of why we chose a scatter plot is to see the primary and general votes and their relations.
house_elections %>%
filter(general_votes > 0, primary_votes >0 , party=="R"| party=="D" ) %>%
select(party, primary_votes, general_votes, ge_winner) %>%
filter(!is.na("ge_winner")) %>%
ggplot(aes(x= primary_votes, y= general_votes)) +
facet_wrap(~party) + geom_point(stat = "identity", aes(color = factor(ge_winner))) + geom_smooth() +
scale_y_continuous(labels = scales::comma) +
labs(x = "Primary Votes", y = "General Votes",
title = "Relationship between Primary Votes and General Votes",
subtitle = "House Elections in the US") +
scale_color_manual("Win?", values = c("goldenrod1", "dodgerblue1"))## `geom_smooth()` using method = 'loess'
Here, we can see the relationship between Primary votes and General votes. The graph suggests a strong correlation between the two, and as the number of votes get bigger, they are become more disperse. The election winners are represented as blue, and losers as yellow. There is clear correlation between the number of both votes and whether they are elected, however, the correlation seems weaker in the Democratic party, which we may investigate further in the future.
This prompt was adapted from a prompt written by Ben Baumer and R. Jordan Crouser.