The goal of this project is to analyze the 2020 US presidential election polls covering Trump, Biden, and all of the other candidates. The dataset combines many different polls and records the percentage each candidate received in each of them.
The original data is from https://data.fivethirtyeight.com/. I added the raw data to a new GitHub repository, available at https://raw.githubusercontent.com/akarimhammoud/Recen2020PollsUS/master/president_polls.csv
url <- "https://raw.githubusercontent.com/akarimhammoud/Recen2020PollsUS/master/president_polls.csv"
Polls <- read.csv(file= url, header=TRUE)
head(Polls)
## question_id poll_id cycle state pollster_id pollster
## 1 127807 68235 2020 1610 USC Dornsife/Los Angeles Times
## 2 127807 68235 2020 1610 USC Dornsife/Los Angeles Times
## 3 127808 68235 2020 1610 USC Dornsife/Los Angeles Times
## 4 127808 68235 2020 1610 USC Dornsife/Los Angeles Times
## 5 127825 68237 2020 1562 Redfield & Wilton Strategies
## 6 127825 68237 2020 1562 Redfield & Wilton Strategies
## sponsor_ids sponsors display_name pollster_rating_id
## 1 USC Dornsife 343
## 2 USC Dornsife 343
## 3 USC Dornsife 343
## 4 USC Dornsife 343
## 5 Redfield & Wilton Strategies 562
## 6 Redfield & Wilton Strategies 562
## pollster_rating_name fte_grade sample_size population
## 1 USC Dornsife/Los Angeles Times B/C 2545 lv
## 2 USC Dornsife/Los Angeles Times B/C 2545 lv
## 3 USC Dornsife/Los Angeles Times B/C 2544 lv
## 4 USC Dornsife/Los Angeles Times B/C 2544 lv
## 5 Redfield & Wilton Strategies 1834 lv
## 6 Redfield & Wilton Strategies 1834 lv
## population_full methodology office_type seat_number seat_name start_date
## 1 lv Online U.S. President 0 NA 8/21/20
## 2 lv Online U.S. President 0 NA 8/21/20
## 3 lv Online U.S. President 0 NA 8/21/20
## 4 lv Online U.S. President 0 NA 8/21/20
## 5 lv Online U.S. President 0 NA 8/25/20
## 6 lv Online U.S. President 0 NA 8/25/20
## end_date election_date sponsor_candidate internal partisan tracking
## 1 8/27/20 11/3/20 FALSE TRUE
## 2 8/27/20 11/3/20 FALSE TRUE
## 3 8/27/20 11/3/20 FALSE TRUE
## 4 8/27/20 11/3/20 FALSE TRUE
## 5 8/26/20 11/3/20 FALSE NA
## 6 8/26/20 11/3/20 FALSE NA
## nationwide_batch ranked_choice_reallocated created_at
## 1 FALSE FALSE 8/28/20 6:02
## 2 FALSE FALSE 8/28/20 6:02
## 3 FALSE FALSE 8/28/20 6:02
## 4 FALSE FALSE 8/28/20 6:02
## 5 FALSE FALSE 8/28/20 10:08
## 6 FALSE FALSE 8/28/20 10:08
## notes
## 1 probabilistic voting question
## 2 probabilistic voting question
## 3 traditional voting question
## 4 traditional voting question
## 5
## 6
## url
## 1 https://election.usc.edu/
## 2 https://election.usc.edu/
## 3 https://election.usc.edu/
## 4 https://election.usc.edu/
## 5 https://redfieldandwiltonstrategies.com/latest-usa-voting-intention-august-25-26/
## 6 https://redfieldandwiltonstrategies.com/latest-usa-voting-intention-august-25-26/
## stage race_id answer candidate_id candidate_name candidate_party pct
## 1 general 6210 Biden 13256 Joseph R. Biden Jr. DEM 52.73
## 2 general 6210 Trump 13254 Donald Trump REP 40.32
## 3 general 6210 Biden 13256 Joseph R. Biden Jr. DEM 54.24
## 4 general 6210 Trump 13254 Donald Trump REP 39.68
## 5 general 6210 Biden 13256 Joseph R. Biden Jr. DEM 48.85
## 6 general 6210 Trump 13254 Donald Trump REP 38.83
summary(Polls)
## question_id poll_id cycle state
## Min. : 92078 Min. :57025 Min. :2020 Length:6244
## 1st Qu.:102947 1st Qu.:59595 1st Qu.:2020 Class :character
## Median :116839 Median :63470 Median :2020 Mode :character
## Mean :114085 Mean :63307 Mean :2020
## 3rd Qu.:124539 3rd Qu.:66696 3rd Qu.:2020
## Max. :127826 Max. :68238 Max. :2020
##
## pollster_id pollster sponsor_ids sponsors
## Min. : 11.0 Length:6244 Length:6244 Length:6244
## 1st Qu.: 509.0 Class :character Class :character Class :character
## Median :1102.0 Mode :character Mode :character Mode :character
## Mean : 964.9
## 3rd Qu.:1416.0
## Max. :1610.0
##
## display_name pollster_rating_id pollster_rating_name fte_grade
## Length:6244 Min. : 3.0 Length:6244 Length:6244
## Class :character 1st Qu.:133.0 Class :character Class :character
## Mode :character Median :245.0 Mode :character Mode :character
## Mean :264.3
## 3rd Qu.:391.0
## Max. :606.0
## NA's :2
## sample_size population population_full methodology
## Min. : 140 Length:6244 Length:6244 Length:6244
## 1st Qu.: 767 Class :character Class :character Class :character
## Median : 1000 Mode :character Mode :character Mode :character
## Mean : 1900
## 3rd Qu.: 1279
## Max. :33549
##
## office_type seat_number seat_name start_date
## Length:6244 Min. :0 Mode:logical Length:6244
## Class :character 1st Qu.:0 NA's:6244 Class :character
## Mode :character Median :0 Mode :character
## Mean :0
## 3rd Qu.:0
## Max. :0
##
## end_date election_date sponsor_candidate internal
## Length:6244 Length:6244 Length:6244 Mode :logical
## Class :character Class :character Class :character FALSE:6228
## Mode :character Mode :character Mode :character TRUE :16
##
##
##
##
## partisan tracking nationwide_batch ranked_choice_reallocated
## Length:6244 Mode:logical Mode :logical Mode :logical
## Class :character TRUE:450 FALSE:6244 FALSE:6244
## Mode :character NA's:5794
##
##
##
##
## created_at notes url stage
## Length:6244 Length:6244 Length:6244 Length:6244
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## race_id answer candidate_id candidate_name
## Min. :6210 Length:6244 Min. :13253 Length:6244
## 1st Qu.:6210 Class :character 1st Qu.:13254 Class :character
## Median :6214 Mode :character Median :13256 Mode :character
## Mean :6238 Mean :13301
## 3rd Qu.:6238 3rd Qu.:13257
## Max. :8718 Max. :15856
##
## candidate_party pct
## Length:6244 Min. : 0.00
## Class :character 1st Qu.:41.00
## Mode :character Median :45.00
## Mean :43.24
## 3rd Qu.:48.10
## Max. :68.80
##
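The summary shows that start_date, end_date, and election_date were read as character columns. Below is a minimal sketch of converting them to Date, assuming the month/day/two-digit-year layout seen in the head() output (for example 8/21/20) holds for every row. The small `sample_polls` data frame is a hypothetical stand-in for `Polls`, and `days_to_election` is an illustrative column that is not part of the original analysis.

```r
# Hypothetical stand-in for a few rows of Polls (values copied from the head() output)
sample_polls <- data.frame(
  start_date    = c("8/21/20", "8/25/20"),
  end_date      = c("8/27/20", "8/26/20"),
  election_date = c("11/3/20", "11/3/20"),
  stringsAsFactors = FALSE
)

# Convert each date column from character to Date (assumed format: m/d/yy)
date_cols <- c("start_date", "end_date", "election_date")
sample_polls[date_cols] <- lapply(sample_polls[date_cols], as.Date, format = "%m/%d/%y")

# Illustrative derived column: days between the end of each poll and election day
sample_polls$days_to_election <- as.numeric(sample_polls$election_date - sample_polls$end_date)
str(sample_polls)
```

Once the columns are Date values they can be sorted and subtracted, which is not possible with the original character strings.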
myframe <- subset(Polls, select = c(state, pollster, sample_size, created_at, answer, pct ))
head(myframe)
## state pollster sample_size created_at answer pct
## 1 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Biden 52.73
## 2 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Trump 40.32
## 3 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Biden 54.24
## 4 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Trump 39.68
## 5 Redfield & Wilton Strategies 1834 8/28/20 10:08 Biden 48.85
## 6 Redfield & Wilton Strategies 1834 8/28/20 10:08 Trump 38.83
colnames(myframe) <- c("state", "pollster", "size", "date", "candidate", "percentage")
head(myframe)
## state pollster size date candidate percentage
## 1 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Biden 52.73
## 2 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Trump 40.32
## 3 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Biden 54.24
## 4 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Trump 39.68
## 5 Redfield & Wilton Strategies 1834 8/28/20 10:08 Biden 48.85
## 6 Redfield & Wilton Strategies 1834 8/28/20 10:08 Trump 38.83
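Assigning a full vector to colnames() relies on the columns being in a known order. Here is a sketch of a name-based alternative, using a hypothetical two-row stand-in for `myframe`. The `rename_map` vector and the `hits` helper are illustrative names, not part of the original analysis.

```r
# Hypothetical stand-in for myframe before renaming (values copied from the head() output)
sample_frame <- data.frame(
  state       = c("", ""),
  pollster    = c("USC Dornsife/Los Angeles Times", "Redfield & Wilton Strategies"),
  sample_size = c(2545, 1834),
  created_at  = c("8/28/20 6:02", "8/28/20 10:08"),
  answer      = c("Biden", "Biden"),
  pct         = c(52.73, 48.85),
  stringsAsFactors = FALSE
)

# Map of old column name -> new column name
rename_map <- c(sample_size = "size", created_at = "date",
                answer = "candidate", pct = "percentage")

# Rename only the columns listed in rename_map; all others keep their names
hits <- names(sample_frame) %in% names(rename_map)
names(sample_frame)[hits] <- unname(rename_map[names(sample_frame)[hits]])
names(sample_frame)
```

Because each column is matched by its old name, the result stays the same if the order of the columns in select = c(...) changes.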
write.csv(myframe, file="2020polls.csv", row.names=FALSE)
getwd()
## [1] "/Users/karimh/Documents/Google Drive/R"
url <- "https://raw.githubusercontent.com/akarimhammoud/Recen2020PollsUS/master/2020polls.csv"
Pollsfile <- read.csv(file= url, header=TRUE)
head(Pollsfile)
## state pollster size date candidate percentage
## 1 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Biden 52.73
## 2 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Trump 40.32
## 3 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Biden 54.24
## 4 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Trump 39.68
## 5 Redfield & Wilton Strategies 1834 8/28/20 10:08 Biden 48.85
## 6 Redfield & Wilton Strategies 1834 8/28/20 10:08 Trump 38.83
Pollsfile$candidate <- sub("Biden", "Joe Biden", Pollsfile$candidate)
Pollsfile$candidate <- sub("Trump", "Donald Trump", Pollsfile$candidate)
head(Pollsfile)
## state pollster size date candidate
## 1 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Joe Biden
## 2 USC Dornsife/Los Angeles Times 2545 8/28/20 6:02 Donald Trump
## 3 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Joe Biden
## 4 USC Dornsife/Los Angeles Times 2544 8/28/20 6:02 Donald Trump
## 5 Redfield & Wilton Strategies 1834 8/28/20 10:08 Joe Biden
## 6 Redfield & Wilton Strategies 1834 8/28/20 10:08 Donald Trump
## percentage
## 1 52.73
## 2 40.32
## 3 54.24
## 4 39.68
## 5 48.85
## 6 38.83
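One possible alternative to the two sub() calls is a named lookup vector, which replaces only exact matches and gives the same result if the code is run a second time. This is a minimal sketch on a hypothetical stand-in vector. `full_names` is an illustrative helper and "Other" is a placeholder value, not a name taken from the dataset.

```r
# Hypothetical stand-in for Pollsfile$candidate; "Other" is a placeholder value
candidate <- c("Biden", "Trump", "Biden", "Other")

# Lookup table: short answer -> full display name
full_names <- c(Biden = "Joe Biden", Trump = "Donald Trump")

# Replace only values that exactly match a key in the lookup table
hits <- candidate %in% names(full_names)
candidate[hits] <- unname(full_names[candidate[hits]])
candidate
```

Values that are not keys in `full_names` pass through unchanged, so further candidates can be added to the table one at a time.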
require(ggplot2)
## Loading required package: ggplot2
# barplot() is base R graphics: count the rows for each candidate and plot the counts
barplot(table(Pollsfile$candidate), main = "candidate")
As the chart shows, some presidential candidates dropped out of the race earlier than others, which is why some of them appear in more polls than others. The two major candidates, who appear in most of the polls, are Biden and Trump.
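To put numbers behind the chart, the rows per candidate can be tabulated and sorted. In this dataset each row appears to hold one candidate's result for one poll question, so the counts are rows rather than distinct polls. This is a minimal sketch on a hypothetical stand-in vector. The values, including "Candidate C", are illustrative and not taken from the dataset.

```r
# Hypothetical stand-in for Pollsfile$candidate (illustrative values only)
candidate <- c("Joe Biden", "Donald Trump", "Joe Biden", "Donald Trump",
               "Joe Biden", "Candidate C")

# Number of rows per candidate, most frequent first
counts <- sort(table(candidate), decreasing = TRUE)
counts

# Share of all rows per candidate, in percent
shares <- round(100 * prop.table(counts), 1)
shares
```

Passing `counts` to barplot() would draw the same kind of chart with the bars ordered from most to least frequent.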
The data is from https://data.fivethirtyeight.com/. The original data was saved to: https://raw.githubusercontent.com/akarimhammoud/Recen2020PollsUS/master/president_polls.csv. The new data frame was saved at: https://raw.githubusercontent.com/akarimhammoud/Recen2020PollsUS/master/2020polls.csv. GitHub link for the assignment: https://github.com/akarimhammoud/Recen2020PollsUS/blob/master/CUNY%20SPS%20-%20607%20Week%201%20Assignment..Rmd