Intro

The data for this assignment comes from the latest governor polls data published on the FiveThirtyEight site.

Load libraries

#install.packages("RCurl")
library(RCurl)
library(ggplot2)

Load Governors Polls Data from GitHub and create a dataframe

# Download the raw CSV text from GitHub
x <- getURL("https://raw.githubusercontent.com/ltcancel/DATA607_HWK1/master/Week1/polls/governor_polls.csv")

# Parse the downloaded text into a data frame
df <- read.csv(text = x)

head(df)
##   question_id poll_id cycle          state pollster_id
## 1      127660   68156  2020 North Carolina        1562
## 2      127660   68156  2020 North Carolina        1562
## 3      127496   68072  2020       Missouri        1056
## 4      127496   68072  2020       Missouri        1056
## 5      127505   68076  2020 North Carolina        1516
## 6      127505   68076  2020 North Carolina        1516
##                       pollster sponsor_ids       sponsors
## 1 Redfield & Wilton Strategies                           
## 2 Redfield & Wilton Strategies                           
## 3     Remington Research Group         421 Missouri Scout
## 4     Remington Research Group         421 Missouri Scout
## 5     East Carolina University                           
## 6     East Carolina University                           
##                   display_name pollster_rating_id         pollster_rating_name
## 1 Redfield & Wilton Strategies                562 Redfield & Wilton Strategies
## 2 Redfield & Wilton Strategies                562 Redfield & Wilton Strategies
## 3     Remington Research Group                279     Remington Research Group
## 4     Remington Research Group                279     Remington Research Group
## 5     East Carolina University                523     East Carolina University
## 6     East Carolina University                523     East Carolina University
##   fte_grade sample_size population population_full     methodology office_type
## 1                   967         lv              lv          Online    Governor
## 2                   967         lv              lv          Online    Governor
## 3        C-        1112         lv              lv Automated Phone    Governor
## 4        C-        1112         lv              lv Automated Phone    Governor
## 5       B/C        1255         rv              rv      IVR/Online    Governor
## 6       B/C        1255         rv              rv      IVR/Online    Governor
##   seat_number seat_name start_date end_date election_date sponsor_candidate
## 1           0        NA    8/16/20  8/17/20       11/3/20                  
## 2           0        NA    8/16/20  8/17/20       11/3/20                  
## 3           0        NA    8/12/20  8/13/20       11/3/20                  
## 4           0        NA    8/12/20  8/13/20       11/3/20                  
## 5           0        NA    8/12/20  8/13/20       11/3/20                  
## 6           0        NA    8/12/20  8/13/20       11/3/20                  
##   internal partisan tracking nationwide_batch ranked_choice_reallocated
## 1    false                NA            false                     false
## 2    false                NA            false                     false
## 3    false                NA            false                     false
## 4    false                NA            false                     false
## 5    false                NA            false                     false
## 6    false                NA            false                     false
##      created_at notes
## 1 8/21/20 19:00      
## 2 8/21/20 19:00      
## 3 8/15/20 08:39      
## 4 8/15/20 08:39      
## 5 8/16/20 17:06      
## 6 8/16/20 17:06      
##                                                                                                                                                                                                                    url
## 1                                                                                                 https://redfieldandwiltonstrategies.com/latest-usa-swing-state-senate-and-governor-voting-intention-16-to-19-august/
## 2                                                                                                 https://redfieldandwiltonstrategies.com/latest-usa-swing-state-senate-and-governor-voting-intention-16-to-19-august/
## 3                                                                                                                                                                     https://moscout.com/s/MOSCOUT-Weekly-081420.pptx
## 4                                                                                                                                                                     https://moscout.com/s/MOSCOUT-Weekly-081420.pptx
## 5 https://surveyresearch-ecu.reportablenews.com/pr/latest-ecu-poll-shows-trump-and-biden-tied-in-north-carolina-democrats-leading-in-contests-for-governor-and-u-s-senate-kamala-harris-selection-draws-mixed-reaction
## 6 https://surveyresearch-ecu.reportablenews.com/pr/latest-ecu-poll-shows-trump-and-biden-tied-in-north-carolina-democrats-leading-in-contests-for-governor-and-u-s-senate-kamala-harris-selection-draws-mixed-reaction
##     stage race_id   answer  candidate_name candidate_party  pct
## 1 general    7824   Cooper   Roy A. Cooper             DEM 51.0
## 2 general    7824   Forest      Dan Forest             REP 38.0
## 3 general    7820 Galloway Nicole Galloway             DEM 43.0
## 4 general    7820   Parson     Mike Parson             REP 50.0
## 5 general    7824   Cooper   Roy A. Cooper             DEM 51.5
## 6 general    7824   Forest      Dan Forest             REP 38.4
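The load step above could also be wrapped in a small helper. This is only a sketch: `load_polls` is a hypothetical name, not part of the original assignment, and it assumes that passing a file path or URL straight to `read.csv` is acceptable in place of `getURL`. Setting `stringsAsFactors = FALSE` keeps text columns such as `state` and `pollster` as character vectors regardless of R version.

```r
# Hypothetical helper (not from the original report): read a polls CSV
# from a local path or a URL into a data frame.
load_polls <- function(src) {
  # read.csv accepts a file path, a URL, or a connection as its first argument;
  # stringsAsFactors = FALSE keeps text columns as plain character vectors
  read.csv(src, stringsAsFactors = FALSE)
}
```

With the GitHub URL used earlier, the call would look like `df <- load_polls("https://raw.githubusercontent.com/ltcancel/DATA607_HWK1/master/Week1/polls/governor_polls.csv")`.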

Conclusion

I would use bar charts to explore this data, starting with a count of rows by candidate party (example below). Each row is one candidate's result in one poll, so the bars count poll results rather than distinct candidates. Since most parties have low counts, I would then focus on the five most frequent parties and break down which candidates make up each of them.

ggplot(df) +
  geom_bar(mapping = aes(x = candidate_party)) +
  coord_flip()
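One possible way to find the top five parties is sketched below. The helper name `top_parties` is hypothetical, and the sketch assumes that a candidate is identified by the pair of `candidate_name` and `candidate_party`. It removes repeated rows for the same candidate first, so a candidate who appears in many polls is counted once.

```r
# Hypothetical helper: count distinct candidates per party and keep the n largest.
top_parties <- function(polls, n = 5) {
  # One row per (candidate, party) pair, so repeated polls do not inflate the count
  candidates <- unique(polls[, c("candidate_name", "candidate_party")])

  # Tabulate candidates by party, largest first
  counts <- sort(table(as.character(candidates$candidate_party)), decreasing = TRUE)

  # Keep only the n most common parties
  head(counts, n)
}
```

Applied to the polls data, `top_parties(df)` would return the counts, and `df[df$candidate_party %in% names(top_parties(df)), ]` would keep only the rows for those parties.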

The polling methodology (for example online, automated phone, or IVR/online) would also be an interesting data point to explore.

ggplot(df) +
  geom_bar(mapping = aes(x = methodology)) +
  coord_flip()