Homework1

Dataset

The dataset I chose is titled “How Unpopular is Donald Trump?” It can be found here: https://projects.fivethirtyeight.com/trump-approval-ratings/

A copy of the dataset is hosted on my GitHub account.

approval_poll <- read.csv("https://raw.githubusercontent.com/jglendrange/DATA607/main/approval_polllist.csv", header = TRUE, sep = ",")

head(approval_poll)

##      president  subgroup modeldate startdate   enddate        pollster grade
## 1 Donald Trump All polls 1/20/2021 1/20/2017 1/22/2017 Morning Consult   B/C
## 2 Donald Trump All polls 1/20/2021 1/20/2017 1/22/2017          Gallup     B
## 3 Donald Trump All polls 1/20/2021 1/20/2017 1/24/2017           Ipsos    B-
## 4 Donald Trump All polls 1/20/2021 1/21/2017 1/23/2017          Gallup     B
## 5 Donald Trump All polls 1/20/2021 1/22/2017 1/24/2017          Gallup     B
## 6 Donald Trump All polls 1/20/2021 1/21/2017 1/25/2017           Ipsos    B-
##   samplesize population    weight influence approve disapprove adjusted_approve
## 1       1992         rv 0.6800286         0    46.0       37.0         45.68678
## 2       1500          a 0.2623230         0    45.0       45.0         45.86144
## 3       1632          a 0.1534812         0    42.1       45.2         43.45156
## 4       1500          a 0.2428446         0    45.0       46.0         45.86144
## 5       1500          a 0.2273795         0    46.0       45.0         46.86144
## 6       1651          a 0.1415310         0    42.3       45.8         43.65156
##   adjusted_disapprove multiversions tracking
## 1            38.05580                     NA
## 2            43.53919                   TRUE
## 3            43.78039                   TRUE
## 4            44.53919                   TRUE
## 5            43.53919                   TRUE
## 6            44.38039                   TRUE
##                                                                                               url
## 1 http://static.politico.com/9b/13/82a3baf542ae9018e5b6e1008379/170103-topline-politico-v3-kd.pdf
## 2                          http://www.gallup.com/poll/201617/gallup-daily-trump-job-approval.aspx
## 3                                                         http://polling.reuters.com/#poll/CP3_2/
## 4                          http://www.gallup.com/poll/201617/gallup-daily-trump-job-approval.aspx
## 5                          http://www.gallup.com/poll/201617/gallup-daily-trump-job-approval.aspx
## 6                                                         http://polling.reuters.com/#poll/CP3_2/
##   poll_id question_id createddate            timestamp
## 1   49249       77261   1/23/2017 11:47:59 20 Jan 2021
## 2   49253       77265   1/23/2017 11:47:59 20 Jan 2021
## 3   49426       77599    3/1/2017 11:47:59 20 Jan 2021
## 4   49262       77274   1/24/2017 11:47:59 20 Jan 2021
## 5   49236       77248   1/25/2017 11:47:59 20 Jan 2021
## 6   49425       77598    3/1/2017 11:47:59 20 Jan 2021

The column names are clear and straightforward. I decided to drop some columns because they were redundant or mostly null.

wanted_columns <- c("pollster", "subgroup", "startdate", "enddate", "grade",
                    "samplesize", "population", "adjusted_approve",
                    "adjusted_disapprove", "poll_id", "question_id", "createddate")
approval_poll <- approval_poll[wanted_columns]
head(approval_poll)

##          pollster  subgroup startdate   enddate grade samplesize population
## 1 Morning Consult All polls 1/20/2017 1/22/2017   B/C       1992         rv
## 2          Gallup All polls 1/20/2017 1/22/2017     B       1500          a
## 3           Ipsos All polls 1/20/2017 1/24/2017    B-       1632          a
## 4          Gallup All polls 1/21/2017 1/23/2017     B       1500          a
## 5          Gallup All polls 1/22/2017 1/24/2017     B       1500          a
## 6           Ipsos All polls 1/21/2017 1/25/2017    B-       1651          a
##   adjusted_approve adjusted_disapprove poll_id question_id createddate
## 1         45.68678            38.05580   49249       77261   1/23/2017
## 2         45.86144            43.53919   49253       77265   1/23/2017
## 3         43.45156            43.78039   49426       77599    3/1/2017
## 4         45.86144            44.53919   49262       77274   1/24/2017
## 5         46.86144            43.53919   49236       77248   1/25/2017
## 6         43.65156            44.38039   49425       77598    3/1/2017
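The choice of which columns to drop could be backed up by measuring how much of each column is missing. The following is a minimal sketch, not the method actually used for this homework: it rebuilds only the six rows of three columns shown in the first head() output, and the names sample_rows, is_missing, and missing_share are made up for illustration. It assumes that empty strings should count as missing alongside NA.

```r
# Six rows of three columns, copied from the first head() output.
sample_rows <- data.frame(
  pollster      = c("Morning Consult", "Gallup", "Ipsos", "Gallup", "Gallup", "Ipsos"),
  multiversions = c("", "", "", "", "", ""),
  tracking      = c(NA, TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Assumption: both NA and empty strings count as missing.
is_missing <- is.na(sample_rows) | sample_rows == ""

# Share of missing values in each column (0 = complete, 1 = entirely missing).
missing_share <- colMeans(is_missing)
print(round(missing_share, 2))
```

Running the same two lines on the full approval_poll data frame as first loaded (before any columns are dropped) would show whether columns such as multiversions are empty throughout, or only in the first few rows.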

I decided to take a look at the approval and disapproval ratings using histograms.

hist(approval_poll$adjusted_approve)

hist(approval_poll$adjusted_disapprove)

You can easily see that the disapproval histogram skews further to the right than the approval histogram.
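Two separate hist() calls choose their own bins and axis limits, which can make a visual comparison harder. One possible approach, sketched here under assumptions, is to give both histograms the same bin edges and stack them. The vectors below contain only the six values printed in the head() output, so the shapes will not match the full dataset; the one-point bin width and the names approve, disapprove, h_app, and h_dis are illustrative choices.

```r
# Values copied from the six rows shown in the head() output.
approve    <- c(45.68678, 45.86144, 43.45156, 45.86144, 46.86144, 43.65156)
disapprove <- c(38.05580, 43.53919, 43.78039, 44.53919, 43.53919, 44.38039)

# Shared bin edges (assumed width: 1 percentage point) covering both vectors.
value_range <- range(c(approve, disapprove))
breaks <- seq(floor(value_range[1]), ceiling(value_range[2]), by = 1)

# Compute the histograms without drawing, so both use identical bins.
h_app <- hist(approve, breaks = breaks, plot = FALSE)
h_dis <- hist(disapprove, breaks = breaks, plot = FALSE)

# Stack the two plots so they share a common x-axis scale.
old_par <- par(mfrow = c(2, 1))
plot(h_app, main = "Adjusted approval", xlab = "Percent")
plot(h_dis, main = "Adjusted disapproval", xlab = "Percent")
par(old_par)
```

With the full data, approve and disapprove would be replaced by approval_poll$adjusted_approve and approval_poll$adjusted_disapprove.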

For fun, I decided to plot the disapproval rating against the approval rating. I expected to see a 1-to-1 relationship, but was surprised to see a few outliers. It would be an interesting follow-up to see which pollsters reported those numbers.

plot(approval_poll$adjusted_disapprove, approval_poll$adjusted_approve)
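The follow-up mentioned above could be approached by flagging polls whose approval and disapproval figures add up to an unusual total, then listing their pollsters. This is only a sketch of one possible rule, not an analysis that was run on the full dataset: it uses the six rows from the head() output, and the 3-point cutoff, the total column, and the names sample_polls and outliers are assumptions made for illustration.

```r
# Six rows copied from the head() output.
sample_polls <- data.frame(
  pollster            = c("Morning Consult", "Gallup", "Ipsos", "Gallup", "Gallup", "Ipsos"),
  adjusted_approve    = c(45.68678, 45.86144, 43.45156, 45.86144, 46.86144, 43.65156),
  adjusted_disapprove = c(38.05580, 43.53919, 43.78039, 44.53919, 43.53919, 44.38039),
  stringsAsFactors = FALSE
)

# If the two ratings move together, their sum should be roughly constant.
sample_polls$total <- sample_polls$adjusted_approve + sample_polls$adjusted_disapprove

# Assumed rule: flag polls whose total is more than 3 points from the median total.
cutoff <- 3
deviation <- abs(sample_polls$total - median(sample_polls$total))
outliers <- sample_polls[deviation > cutoff, ]

print(outliers[, c("pollster", "adjusted_approve", "adjusted_disapprove", "total")])
```

On these six rows the rule flags only the Morning Consult poll, whose total is about 5 points below the median. Applying the same steps to approval_poll, and then calling table(outliers$pollster), would show whether particular pollsters account for most of the outlying points.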

Jordan Glendrange

February 6, 2021
