This article is based on polling averages compiled by FiveThirtyEight on President Biden’s handling of the COVID-19 crisis. The polls indicate that Americans approved of Biden’s handling of the coronavirus crisis in the first year of his presidency. However, approval came mainly from Democrats and Independents, while Republicans disapproved. Approval of his handling of the virus also dropped over time. The links to the article and the data are given below.
Link to article: https://projects.fivethirtyeight.com/coronavirus-polls/
Link to data frame: https://raw.githubusercontent.com/hawa1983/Week1_Assignment/main/covid_approval_polls_adjusted.csv
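We start by loading the required libraries (the packages must already be installed). Loading tidyverse already attaches dplyr, ggplot2, and lubridate, so the explicit calls that follow are harmless but redundant.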
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(lubridate)
Next, we import the COVID polls data. The code below reads the COVID approval polls data from my GitHub page into the covid_approval_polls data frame.
covid_approval_polls <- read_csv(
"https://raw.githubusercontent.com/hawa1983/Week1_Assignment/main/covid_approval_polls_adjusted.csv"
)
## Rows: 1626 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): subject, modeldate, party, startdate, enddate, pollster, grade, po...
## dbl (7): samplesize, weight, influence, approve, disapprove, approve_adjust...
## lgl (1): tracking
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(covid_approval_polls, n=3)
## # A tibble: 3 × 19
## subject modeldate party startdate enddate pollster grade samplesize population
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 Biden 11/27/20… D 1/24/2021 1/26/2… YouGov B+ 477 a
## 2 Biden 11/27/20… D 1/28/2021 2/1/20… Quinnip… A- 333. a
## 3 Biden 11/27/20… D 1/29/2021 2/1/20… Morning… B 808 rv
## # ℹ 10 more variables: weight <dbl>, influence <dbl>, multiversions <chr>,
## # tracking <lgl>, approve <dbl>, disapprove <dbl>, approve_adjusted <dbl>,
## # disapprove_adjusted <dbl>, timestamp <chr>, url <chr>
Next, we preview the data to ensure that each variable (column) is imported with the correct data type.
The preview shows that modeldate, startdate, enddate, and timestamp are loaded as character instead of date (the first three) and datetime (timestamp). We must convert these to the appropriate data types. Similarly, party, pollster, and population will be converted to factors.
glimpse(covid_approval_polls)
## Rows: 1,626
## Columns: 19
## $ subject <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ modeldate <chr> "11/27/2022", "11/27/2022", "11/27/2022", "11/27/2…
## $ party <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", …
## $ startdate <chr> "1/24/2021", "1/28/2021", "1/29/2021", "1/31/2021"…
## $ enddate <chr> "1/26/2021", "2/1/2021", "2/1/2021", "2/2/2021", "…
## $ pollster <chr> "YouGov", "Quinnipiac", "Morning Consult", "YouGov…
## $ grade <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population <chr> "a", "a", "rv", "a", "a", "a", "a", "a", "a", "rv"…
## $ weight <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ tracking <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ timestamp <chr> "02:31:11 27 Nov 2022", "02:31:11 27 Nov 2022", "0…
## $ url <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…
Next, we convert the columns to the appropriate types. The timestamp strings are written as time, day, abbreviated month, and year (for example "02:31:11 27 Nov 2022"), so as_datetime() needs a matching format string. Without one, the conversion fills the column with NA.
covid_approval_polls <- covid_approval_polls |>
  mutate(
    party = as_factor(party),
    modeldate = as_date(modeldate, format = "%m/%d/%Y"),
    startdate = as_date(startdate, format = "%m/%d/%Y"),
    enddate = as_date(enddate, format = "%m/%d/%Y"),
    pollster = as_factor(pollster),
    population = as_factor(population),
    # Time comes first, then day, abbreviated month name (%b, English locale assumed), and year
    timestamp = as_datetime(timestamp, format = "%H:%M:%S %d %b %Y")
  )
glimpse(covid_approval_polls)
## Rows: 1,626
## Columns: 19
## $ subject <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ modeldate <date> 2022-11-27, 2022-11-27, 2022-11-27, 2022-11-27, 2…
## $ party <fct> D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D,…
## $ startdate <date> 2021-01-24, 2021-01-28, 2021-01-29, 2021-01-31, 2…
## $ enddate <date> 2021-01-26, 2021-02-01, 2021-02-01, 2021-02-02, 2…
## $ pollster <fct> YouGov, Quinnipiac, Morning Consult, YouGov, Data …
## $ grade <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population <fct> a, a, rv, a, a, a, a, a, a, rv, rv, a, a, rv, a, a…
## $ weight <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ tracking <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ timestamp <dttm> 2022-11-27 02:31:11, 2022-11-27 02:31:11, 2022-11…
## $ url <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…
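The same type conversions can also be requested at import time, which avoids a separate mutate() step. The following is a minimal sketch, assuming readr's col_date() and col_datetime() column specifications; it uses two rows copied from the glimpse() output above (five columns only) in place of the full file, and the names sample_csv and polls_typed are illustrative.

```r
library(readr)

# Two rows taken from the glimpse() output above, restricted to five columns.
# I() tells read_csv() to treat the string as literal data rather than a path.
sample_csv <- I(paste(
  "modeldate,startdate,enddate,timestamp,approve",
  "11/27/2022,1/24/2021,1/26/2021,02:31:11 27 Nov 2022,84",
  "11/27/2022,1/28/2021,2/1/2021,02:31:11 27 Nov 2022,93",
  sep = "\n"
))

polls_typed <- read_csv(
  sample_csv,
  col_types = cols(
    modeldate = col_date(format = "%m/%d/%Y"),
    startdate = col_date(format = "%m/%d/%Y"),
    enddate   = col_date(format = "%m/%d/%Y"),
    # Time first, then day, abbreviated month name, and four-digit year
    timestamp = col_datetime(format = "%H:%M:%S %d %b %Y"),
    approve   = col_double()
  )
)

polls_typed
```

For the real file, the same cols() specification could be passed to the read_csv() call that reads the GitHub URL, with the remaining columns left to readr's type guessing.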
The preview above reveals many missing values in the multiversions and tracking columns. Filtering the rows as shown below shows that only 4 records have a multiversions value, meaning that multiple versions of the raw data were combined for them. The tracking column has no values at all, implying that none of the polls is flagged as a tracking poll.
multiversions_records_with_values <- covid_approval_polls |>
  filter(!is.na(multiversions))
multiversions_records_with_values
## # A tibble: 4 × 19
## subject modeldate party startdate enddate pollster grade samplesize
## <chr> <date> <fct> <date> <date> <fct> <chr> <dbl>
## 1 Biden 2022-11-27 D 2022-01-29 2022-02-01 YouGov B+ 514
## 2 Biden 2022-11-27 I 2022-01-29 2022-02-01 YouGov B+ 458
## 3 Biden 2022-11-27 R 2022-01-29 2022-02-01 YouGov B+ 384.
## 4 Biden 2022-11-27 all 2022-01-29 2022-02-01 YouGov B+ 1500
## # ℹ 11 more variables: population <fct>, weight <dbl>, influence <dbl>,
## # multiversions <chr>, tracking <lgl>, approve <dbl>, disapprove <dbl>,
## # approve_adjusted <dbl>, disapprove_adjusted <dbl>, timestamp <dttm>,
## # url <chr>
tracking_records_with_values <- covid_approval_polls |>
  filter(!is.na(tracking))
tracking_records_with_values
## # A tibble: 0 × 19
## # ℹ 19 variables: subject <chr>, modeldate <date>, party <fct>,
## # startdate <date>, enddate <date>, pollster <fct>, grade <chr>,
## # samplesize <dbl>, population <fct>, weight <dbl>, influence <dbl>,
## # multiversions <chr>, tracking <lgl>, approve <dbl>, disapprove <dbl>,
## # approve_adjusted <dbl>, disapprove_adjusted <dbl>, timestamp <dttm>,
## # url <chr>
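Rather than filtering one column at a time, the missing values in every column can be counted in a single pass. The sketch below assumes dplyr's across() and uses a tiny made-up tibble instead of the poll data; count_missing() and toy are hypothetical names introduced only for illustration.

```r
library(dplyr)

# Count the NA values in every column of a data frame.
count_missing <- function(df) {
  df |>
    summarise(across(everything(), ~ sum(is.na(.x))))
}

# Made-up values for illustration only (not taken from the poll data).
toy <- tibble(
  multiversions = c(NA, "*", NA),
  tracking      = c(NA, NA, NA),
  approve       = c(84, 93, 89)
)

count_missing(toy)
```

Calling count_missing(covid_approval_polls) would return one row with the NA count for each of the columns.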
The tracking column will be removed since it has no values. The modeldate and timestamp columns are also not relevant to our analysis. These columns will also be removed as shown below.
covid_approval_polls <- covid_approval_polls |> select(-modeldate, -tracking, -timestamp)
glimpse(covid_approval_polls)
## Rows: 1,626
## Columns: 16
## $ subject <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ party <fct> D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D,…
## $ startdate <date> 2021-01-24, 2021-01-28, 2021-01-29, 2021-01-31, 2…
## $ enddate <date> 2021-01-26, 2021-02-01, 2021-02-01, 2021-02-02, 2…
## $ pollster <fct> YouGov, Quinnipiac, Morning Consult, YouGov, Data …
## $ grade <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population <fct> a, a, rv, a, a, a, a, a, a, rv, rv, a, a, rv, a, a…
## $ weight <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ url <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…
The population column contains the following non-intuitive abbreviations: a = adults, rv = registered voters, lv = likely voters, and v = voters. These values will be replaced by adult, registered voter, likely voter, and voter respectively. Likewise, the party abbreviations D, R, and I will be replaced by Democrat, Republican, and Independent, and all will become All.
covid_approval_polls <- covid_approval_polls |>
  mutate(
    # Any code not listed becomes a true missing value rather than the string "NA"
    population = case_when(
      population == "a" ~ "adult",
      population == "rv" ~ "registered voter",
      population == "lv" ~ "likely voter",
      population == "v" ~ "voter",
      .default = NA_character_
    ),
    party = case_when(
      party == "D" ~ "Democrat",
      party == "R" ~ "Republican",
      party == "I" ~ "Independent",
      party == "all" ~ "All",
      .default = NA_character_
    )
  )
glimpse(covid_approval_polls)
## Rows: 1,626
## Columns: 16
## $ subject <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ party <chr> "Democrat", "Democrat", "Democrat", "Democrat", "D…
## $ startdate <date> 2021-01-24, 2021-01-28, 2021-01-29, 2021-01-31, 2…
## $ enddate <date> 2021-01-26, 2021-02-01, 2021-02-01, 2021-02-02, 2…
## $ pollster <fct> YouGov, Quinnipiac, Morning Consult, YouGov, Data …
## $ grade <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population <chr> "adult", "adult", "registered voter", "adult", "ad…
## $ weight <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ url <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…
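Note that the recoding above turns party and population from factors back into character columns. If the factor type should be kept, forcats::fct_recode() relabels the levels in place. The sketch below runs on a small stand-in vector; party_codes and party_labels are illustrative names.

```r
library(forcats)

# Stand-in for the party column before recoding.
party_codes <- factor(c("D", "R", "I", "all", "D"))

# fct_recode() takes new_label = "old_level" pairs and returns a factor.
party_labels <- fct_recode(
  party_codes,
  Democrat    = "D",
  Republican  = "R",
  Independent = "I",
  All         = "all"
)

as.character(party_labels)
```

Keeping a factor also makes it possible to control the order of the groups on a plot axis, for example with fct_relevel().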
Approval is highest among Democrats and lowest among Republicans, while Independents are divided.
ggplot(covid_approval_polls, aes(x = party, y = approve_adjusted)) + geom_boxplot() +
labs(title = "Biden's Covid Handling Approval by Political Party", x = "Political Party", y = "Approve")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
Disapproval is highest among Republicans and lowest among Democrats. Independents are split, with disapproval below 50 percent.
ggplot(covid_approval_polls, aes(x = party, y = disapprove_adjusted)) + geom_boxplot() +
labs(title = "Biden's Covid Handling Disapproval by Political Party", x = "Political Party", y = "Disapprove")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
Overall approval ratings among the polled populations are high.
ggplot(covid_approval_polls, aes(x = population, y = approve_adjusted)) + geom_boxplot()+
labs(title = "Biden's Covid Handling Approval by Population", x = "Population", y = "Approve")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
Overall disapproval rating is relatively low.
ggplot(covid_approval_polls, aes(x = population, y = disapprove_adjusted)) + geom_boxplot() +
labs(title = "Biden's Covid Handling Disapproval by Population", x = "Population", y = "Disapprove")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
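The readings taken from the boxplots can be backed with numbers by summarising the ratings within each group. The sketch below is one possible approach, not part of the original analysis: it reports the median and a mean weighted by the data set's weight column. The helper summarise_rating() and the toy tibble are hypothetical, and the toy values are made up for illustration.

```r
library(dplyr)

# Median and weighted mean of a rating column within each group.
summarise_rating <- function(df, group_col, rating_col, weight_col) {
  df |>
    group_by({{ group_col }}) |>
    summarise(
      n_polls       = n(),
      median_rating = median({{ rating_col }}),
      weighted_mean = weighted.mean({{ rating_col }}, {{ weight_col }}),
      .groups = "drop"
    )
}

# Made-up values for illustration only (not taken from the poll data).
toy <- tibble(
  party            = c("Democrat", "Democrat", "Republican", "Republican"),
  approve_adjusted = c(90, 80, 20, 40),
  weight           = c(1, 3, 1, 1)
)

summarise_rating(toy, party, approve_adjusted, weight)
```

On the real data, summarise_rating(covid_approval_polls, party, approve_adjusted, weight) would give one row per party, and passing population as the grouping column gives the same summary by population.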
## Approval ratings over time
Approval ratings dropped over time across all political parties and population groups, but remained high among Democrats.
ggplot(covid_approval_polls, aes(x = startdate, y = approve_adjusted, color = party)) + geom_point() +
labs(title = "Biden's Covid Handling Approval by Political Party", x = "Poll Date", y = "Approve")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
ggplot(covid_approval_polls, aes(x = startdate, y = disapprove_adjusted, color = party)) + geom_point() +
labs(title = "Biden's Covid Handling Disapproval by Political Party", x = "Poll Date", y = "Disapprove")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
ggplot(covid_approval_polls, aes(x = startdate, y = approve_adjusted, color = population)) + geom_point() +
labs(title = "Biden's Covid Handling Approval by Population", x = "Poll Date", y = "Approve")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
ggplot(covid_approval_polls, aes(x = startdate, y = disapprove_adjusted, color = population)) + geom_point() +
labs(title = "Biden's Covid Handling Disapproval by Population", x = "Poll Date", y = "Disapprove")+ theme(
plot.title = element_text(hjust = 0.5) # Set the horizontal justification to center (0.5)
)
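A trend is hard to read from a dense scatter plot, so one way to check the decline is to average the ratings by month within each party. The sketch below assumes lubridate's floor_date() for the monthly binning; monthly_mean() and the toy tibble are hypothetical, and the toy values are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Mean adjusted approval per party and calendar month of the poll start date.
monthly_mean <- function(df) {
  df |>
    mutate(month = floor_date(startdate, unit = "month")) |>
    group_by(party, month) |>
    summarise(mean_approve = mean(approve_adjusted), .groups = "drop")
}

# Made-up values for illustration only (not taken from the poll data).
toy <- tibble(
  party            = c("All", "All", "All"),
  startdate        = as.Date(c("2021-01-24", "2021-01-28", "2021-02-03")),
  approve_adjusted = c(60, 62, 58)
)

monthly_mean(toy)
```

The result of monthly_mean(covid_approval_polls) could then be drawn with geom_line() to show one trend line per party.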
## Conclusions
The aggregated polls and the graphics above show that approval of President Biden’s handling of the COVID pandemic fell largely along party lines: high among Democrats, low among Republicans, with Independents somewhere in between. This suggests that partisanship may have significantly influenced the approval and disapproval ratings.
The polls should include more variables to enable further disaggregation of the data. For example, the participants’ state of residence would help to highlight opinion by state or region. Pollsters should also include a weight for partisan bias when they collect information about participants’ party registration. Finally, given that approval and disapproval fall along party lines, pollsters should state the proportion of respondents from each party included in each poll.