Further sharpen data wrangling and data visualization skills
You will analyze Federal Election Commission data and report your findings in a blog post. The topic is up to you. The best projects will:
mutate()s)You may work with a partner, but are not required to.
As with the first mini project, you will use GitHub to collaborate with your, as well as to submit the final product. To create a repository for your mini project, complete the following steps:
github_id.github_id fork the mini-project-2 repository from cmsc205github_id add other teammates as "Collaborators" (under "Settings")github_id/mini-project-2’s repoTo submit your final blog post, please follow the instructions from HW-0
The data for this project come from the Federal Election Commission, and are based on the 2011-2012 federal election cycle. These data were collected using the fec R package.
Be sure to read the supporting documentation for these data!! It is your responsibility to know what data you are working with!
There are four tables to consider: house_elections, candidates, committees, and contributions. You can use the following code to load in the data sets.
library(tidyverse)
house_elections <- read_csv("house_elections.csv")
committees <- read_csv("committees.csv")
candidates <- read_csv("candidates.csv")
# The contributions file was too big for GitHub, so it's
# stored slightly differently.
load("contributions.rda") After loading the data sets, verify that your data sets look like those displayed below (be sure that the row and column dimensions match):
glimpse(house_elections)## Observations: 2,178
## Variables: 10
## $ fec_id <chr> "B2CA08156", "H0AK00097", "H0AL01030", "H0AL020...
## $ state <chr> "CA", "AK", "AL", "AL", "AL", "AL", "AL", "AR",...
## $ district <chr> "08", "00", "01", "02", "05", "07", "07", "01",...
## $ incumbent <lgl> FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, T...
## $ candidate_name <chr> "Mitzelfelt, Brad", "Cox, John R.", "Gounares, ...
## $ party <chr> "R", "R", "R", "R", "R", "D", "R", "R", "R", "R...
## $ primary_votes <int> 8801, 11179, 3854, 0, 65163, 0, 11537, 0, 0, 0,...
## $ runoff_votes <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ general_votes <int> 0, 0, 0, 180591, 189185, 232520, 73835, 138800,...
## $ ge_winner <chr> NA, NA, NA, "W", "W", "W", "N", "W", "W", "W", ...
glimpse(committees)## Observations: 14,454
## Variables: 15
## $ cmte_id <chr> "C00000042", "C00000059", "C00000422", ...
## $ cmte_name <chr> "ILLINOIS TOOL WORKS INC. FOR BETTER GO...
## $ tres_name <chr> "LYNCH, MICHAEL J. MR.", "GREG SWARENS"...
## $ cmte_st1 <chr> "3600 WEST LAKE AVENUE", "2501 MCGEE", ...
## $ cmte_st2 <chr> NA, "MD#288", "SUITE 600", NA, NA, NA, ...
## $ cmte_city <chr> "GLENVIEW", "KANSAS CITY", "WASHINGTON"...
## $ cmte_state <chr> "IL", "MO", "DC", "OK", "KS", "IN", "DC...
## $ cmte_zip <int> 60026, 64108, 20001, 73107, 66612, 4620...
## $ cmte_dsgn <chr> "B", "U", "B", "U", "U", "U", "B", "B",...
## $ cmte_type <chr> "Q", "Q", "Q", "N", "Q", "Q", "Q", "Q",...
## $ cmte_party_affiliation <chr> NA, "UNK", NA, NA, "UNK", NA, "UNK", "U...
## $ cmte_filing_freq <chr> "Q", "M", "M", "Q", "Q", "Q", "M", "M",...
## $ org_type <chr> "C", "C", "M", "L", "T", "M", "M", "L",...
## $ connected_org_name <chr> "ILLINOIS TOOL WORKS INC.", NA, "AMERIC...
## $ cand_id <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
glimpse(contributions)## Observations: 396,369
## Variables: 22
## $ cmte_id <chr> "C00478404", "C00140855", "C00140855", "C0014...
## $ amndt_ind <chr> "N", "N", "N", "N", "N", "N", "N", "N", "N", ...
## $ rpt_type <chr> "M3", "M3", "M3", "M3", "M3", "M3", "M3", "M3...
## $ transaction_pgi <chr> "P", "P", "P", "P", "P", "P", "G", "P", "P", ...
## $ image_num <chr> "11930476751.0", "11930476826.0", "1193047682...
## $ transaction_type <chr> "24K", "24K", "24K", "24K", "24K", "24K", "24...
## $ entity_type <chr> "COM", "CCM", "CCM", "CCM", "CCM", "CCM", "CC...
## $ name <chr> "KLINE FOR CONGRESS", "TIM RYAN FOR U.S. CONG...
## $ city <chr> "BURNSVILLE", "WASHINGTON", "WASHINGTON", "BO...
## $ state <chr> "MN", "DC", "DC", "MD", "ND", "MI", "MN", "IA...
## $ zip_code <chr> "55337", "20013", "20005", "20716", "58106", ...
## $ employer <chr> "", "", "", "", "", "", "", "", "", "", "", "...
## $ occupation <chr> "", "", "", "", "", "", "", "", "", "", "", "...
## $ transaction_dt <chr> "02252011", "02012011", "02012011", "02222011...
## $ transaction_amt <dbl> 2400, 1000, 1000, 2500, 1000, 5000, 1000, 100...
## $ other_id <chr> "C00326629", "C00373464", "C00289983", "C0014...
## $ cand_id <chr> "H8MN06047", "H2OH17109", "H4KY01040", "H2MD0...
## $ tran_id <chr> "B37FBC79414E54DD7A1C", "38595006", "38595007...
## $ file_num <int> 717033, 717042, 717042, 717042, 717043, 71704...
## $ memo_cd <chr> "", "", "", "", "", "", "", "", "", "X", "", ...
## $ memo_text <chr> "", "", "", "", "", "", "", "", "", "CHECK 23...
## $ sub_id <dbl> 4.03182e+18, 4.03172e+18, 4.03172e+18, 4.0317...
glimpse(candidates)## Observations: 5,628
## Variables: 15
## $ cand_id <chr> "H0AK00089", "H0AK00097", "H0AL00016", ...
## $ cand_name <chr> "CRAWFORD, HARRY T JR", "COX, JOHN ROBE...
## $ cand_party_affiliation <chr> "DEM", "REP", "UNK", "REP", "REP", "DEM...
## $ cand_election_yr <int> 2010, 2012, 2010, 2012, 2012, 2008, 201...
## $ cand_office_state <chr> "AK", "AK", "AL", "AL", "AL", "AL", "AL...
## $ cand_office <chr> "H", "H", "H", "H", "H", "H", "H", "H",...
## $ cand_office_district <int> 0, 0, 7, 1, 2, 5, 5, 5, 5, 5, 6, 7, 7, ...
## $ cand_ici <chr> "C", "C", "O", "C", "I", "C", "C", "I",...
## $ cand_status <chr> "P", "N", "C", "C", "C", "C", "C", "C",...
## $ cand_pcc <chr> "C00466698", "C00525261", "C00464040", ...
## $ cand_st1 <chr> "4350 BUTTE CIR", "PO BOX 1092", "PO BO...
## $ cand_st2 <chr> NA, NA, NA, NA, NA, NA, NA, NA, "SUITE ...
## $ cand_city <chr> "ANCHORAGE", "ANCHOR POINT", "BIRMINGHA...
## $ cand_state <chr> "AK", "AK", "AL", "AL", "AL", "AL", "AL...
## $ cand_zip <int> 99504, 8388607, 35201, 36561, 36106, 35...
library(tidyverse)
committees_transaction_amt <- full_join(candidates, contributions, by = "cand_id") %>%
select(cmte_id, transaction_dt, transaction_amt, cand_name, cand_election_yr, cand_party_affiliation)%>%
filter(transaction_amt > 0 ) %>%
group_by(cand_name, cand_election_yr, cand_party_affiliation) %>%
summarize(total_trans_amt = sum(transaction_amt)) %>%
arrange(desc(total_trans_amt))
head(committees_transaction_amt, 15)## Source: local data frame [15 x 4]
## Groups: cand_name, cand_election_yr [15]
##
## cand_name cand_election_yr cand_party_affiliation
## <chr> <int> <chr>
## 1 OBAMA, BARACK 2012 DEM
## 2 ROMNEY, MITT / RYAN, PAUL D. 2012 REP
## 3 KAINE, TIMOTHY MICHAEL 2012 DEM
## 4 GINGRICH, NEWT 2012 REP
## 5 SANTORUM, RICHARD J. 2012 REP
## 6 BROWN, SHERROD 2012 DEM
## 7 THOMPSON, TOMMY G 2012 REP
## 8 BALDWIN, TAMMY 2012 DEM
## 9 ALLEN, GEORGE 2012 REP
## 10 MANDEL, JOSH 2012 REP
## 11 MOURDOCK, RICHARD E 2012 REP
## 12 BERKLEY, SHELLEY 2012 DEM
## 13 HELLER, DEAN 2012 REP
## 14 NELSON, BILL 2012 DEM
## 15 TESTER, JON 2012 DEM
## # ... with 1 more variables: total_trans_amt <dbl>
transaction_15 <- head(committees_transaction_amt, 15)transaction_15$cand_party_affiliation <- as.factor(transaction_15$cand_party_affiliation)
graph_1 <-
ggplot(transaction_15,
aes(x = reorder(cand_name, total_trans_amt),
y = total_trans_amt)) +
geom_bar(stat = "identity", aes(fill = factor(cand_party_affiliation))) +
geom_text(aes(label = total_trans_amt, hjust = -.1)) +
coord_flip() +
expand_limits(y = 500000000) +
scale_fill_manual("Party Affiliation", values = c("red", "blue")) +
labs(x = NULL, y = NULL,
title = "Top 15 Transaction Amount in 2012",
subtitle = "Highest Transction Amount of Candidates in 2012 ")
graph_1house_elections$ge_winner <- (house_elections$ge_winner)
house_elections %>%
filter(general_votes > 0, primary_votes >0 , party=="R"| party=="D" ) %>%
select(party, primary_votes, general_votes, ge_winner) %>%
filter(!is.na("ge_winner")) %>%
ggplot(aes(x= primary_votes, y= general_votes)) +
facet_wrap(~party) + geom_point(stat = "identity", aes(color = factor(ge_winner))) + geom_smooth() +
labs(x = "General Votes", y = "Primary Votes",
title = "Relationship between Primary Votes and General Votes",
subtitle = "House Elections in the US") +
scale_color_manual("Win?", values = c("goldenrod1", "dodgerblue1"))## `geom_smooth()` using method = 'loess'
This prompt was adapted from a prompt written by Ben Baumer and R. Jordan Crouser.