The data for this assignment comes from the latest governor polls dataset published on the FiveThirtyEight site.
#install.packages("RCurl")
library(RCurl)
library(ggplot2)
x <- getURL("https://raw.githubusercontent.com/ltcancel/DATA607_HWK1/master/Week1/polls/governor_polls.csv")
df <- read.csv(text = x)
head(df)
## question_id poll_id cycle state pollster_id
## 1 127660 68156 2020 North Carolina 1562
## 2 127660 68156 2020 North Carolina 1562
## 3 127496 68072 2020 Missouri 1056
## 4 127496 68072 2020 Missouri 1056
## 5 127505 68076 2020 North Carolina 1516
## 6 127505 68076 2020 North Carolina 1516
## pollster sponsor_ids sponsors
## 1 Redfield & Wilton Strategies
## 2 Redfield & Wilton Strategies
## 3 Remington Research Group 421 Missouri Scout
## 4 Remington Research Group 421 Missouri Scout
## 5 East Carolina University
## 6 East Carolina University
## display_name pollster_rating_id pollster_rating_name
## 1 Redfield & Wilton Strategies 562 Redfield & Wilton Strategies
## 2 Redfield & Wilton Strategies 562 Redfield & Wilton Strategies
## 3 Remington Research Group 279 Remington Research Group
## 4 Remington Research Group 279 Remington Research Group
## 5 East Carolina University 523 East Carolina University
## 6 East Carolina University 523 East Carolina University
## fte_grade sample_size population population_full methodology office_type
## 1 967 lv lv Online Governor
## 2 967 lv lv Online Governor
## 3 C- 1112 lv lv Automated Phone Governor
## 4 C- 1112 lv lv Automated Phone Governor
## 5 B/C 1255 rv rv IVR/Online Governor
## 6 B/C 1255 rv rv IVR/Online Governor
## seat_number seat_name start_date end_date election_date sponsor_candidate
## 1 0 NA 8/16/20 8/17/20 11/3/20
## 2 0 NA 8/16/20 8/17/20 11/3/20
## 3 0 NA 8/12/20 8/13/20 11/3/20
## 4 0 NA 8/12/20 8/13/20 11/3/20
## 5 0 NA 8/12/20 8/13/20 11/3/20
## 6 0 NA 8/12/20 8/13/20 11/3/20
## internal partisan tracking nationwide_batch ranked_choice_reallocated
## 1 false NA false false
## 2 false NA false false
## 3 false NA false false
## 4 false NA false false
## 5 false NA false false
## 6 false NA false false
## created_at notes
## 1 8/21/20 19:00
## 2 8/21/20 19:00
## 3 8/15/20 08:39
## 4 8/15/20 08:39
## 5 8/16/20 17:06
## 6 8/16/20 17:06
## url
## 1 https://redfieldandwiltonstrategies.com/latest-usa-swing-state-senate-and-governor-voting-intention-16-to-19-august/
## 2 https://redfieldandwiltonstrategies.com/latest-usa-swing-state-senate-and-governor-voting-intention-16-to-19-august/
## 3 https://moscout.com/s/MOSCOUT-Weekly-081420.pptx
## 4 https://moscout.com/s/MOSCOUT-Weekly-081420.pptx
## 5 https://surveyresearch-ecu.reportablenews.com/pr/latest-ecu-poll-shows-trump-and-biden-tied-in-north-carolina-democrats-leading-in-contests-for-governor-and-u-s-senate-kamala-harris-selection-draws-mixed-reaction
## 6 https://surveyresearch-ecu.reportablenews.com/pr/latest-ecu-poll-shows-trump-and-biden-tied-in-north-carolina-democrats-leading-in-contests-for-governor-and-u-s-senate-kamala-harris-selection-draws-mixed-reaction
## stage race_id answer candidate_name candidate_party pct
## 1 general 7824 Cooper Roy A. Cooper DEM 51.0
## 2 general 7824 Forest Dan Forest REP 38.0
## 3 general 7820 Galloway Nicole Galloway DEM 43.0
## 4 general 7820 Parson Mike Parson REP 50.0
## 5 general 7824 Cooper Roy A. Cooper DEM 51.5
## 6 general 7824 Forest Dan Forest REP 38.4
I would use bar charts to explore this data, starting with a count of poll rows by candidate party (example below). Since most parties have very low counts, I would then focus on the five most frequent parties and break down which candidates make up each of them.
ggplot(df) +
  geom_bar(mapping = aes(x = candidate_party)) +
  coord_flip()
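The top-five breakdown described above could be sketched roughly as follows. This is a minimal sketch in base R, assuming the `candidate_party` and `candidate_name` columns shown in the `head(df)` output. The helper name `top_parties` and the small `toy` data frame (copied from the six rows printed above) are illustrative and not part of the original analysis.

```r
# Hypothetical helper: return the n most frequent party labels in a data frame.
top_parties <- function(data, n = 5) {
  counts <- sort(table(data$candidate_party), decreasing = TRUE)
  names(counts)[seq_len(min(n, length(counts)))]
}

# Toy data copied from the six rows shown by head(df) above.
toy <- data.frame(
  candidate_name  = c("Roy A. Cooper", "Dan Forest", "Nicole Galloway",
                      "Mike Parson", "Roy A. Cooper", "Dan Forest"),
  candidate_party = c("DEM", "REP", "DEM", "REP", "DEM", "REP"),
  stringsAsFactors = FALSE
)

top <- top_parties(toy, n = 5)

# Keep only rows from the top parties, then tabulate candidates within each party.
subset_top <- toy[toy$candidate_party %in% top, ]
breakdown <- table(subset_top$candidate_party, subset_top$candidate_name)
print(breakdown)
```

On the full data, `top_parties(df)` would be used in place of the toy frame, and `subset_top` could be passed to `ggplot()` to chart only those parties.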
The polling methodology (the methodology column, for example Online or Automated Phone) would also be an interesting variable to explore.
ggplot(df) +
  geom_bar(mapping = aes(x = methodology)) +
  coord_flip()
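Each poll question appears once per candidate (rows 1 and 2 above share `question_id` 127660), so the bar chart above counts rows rather than polls. Below is a minimal sketch of counting each question only once, assuming the `question_id` and `methodology` columns from the `head(df)` output. The `toy` data frame reuses the six rows printed above and is illustrative only.

```r
# Toy data copied from the six rows shown by head(df) above.
toy <- data.frame(
  question_id = c(127660, 127660, 127496, 127496, 127505, 127505),
  methodology = c("Online", "Online", "Automated Phone",
                  "Automated Phone", "IVR/Online", "IVR/Online"),
  stringsAsFactors = FALSE
)

# Keep the first row for each question_id so every question is counted once.
questions <- toy[!duplicated(toy$question_id), c("question_id", "methodology")]

# Number of distinct poll questions per methodology.
method_counts <- table(questions$methodology)
print(method_counts)
```

With the full data, the de-duplicated `questions` frame could be passed to `ggplot()` in place of `df` to plot questions per methodology.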