How Americans View Biden’s Response To The Coronavirus Crisis

Overview of the article

This report is based on polling data compiled by FiveThirtyEight on President Biden’s handling of the COVID-19 crisis. The polls indicate that, on balance, Americans approved of Biden’s handling of the coronavirus crisis during the first year of his presidency. That approval came mainly from Democrats and Independents, while Republicans disapproved. Approval also declined over time. The links to the article and the data are given below.

Link to article: https://projects.fivethirtyeight.com/coronavirus-polls/

Link to data frame: https://raw.githubusercontent.com/hawa1983/Week1_Assignment/main/covid_approval_polls_adjusted.csv

Load the relevant libraries

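We start by loading the required libraries. Loading tidyverse already attaches dplyr, ggplot2, and lubridate (see the startup message), so the explicit calls that follow it are redundant but harmless.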

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(ggplot2)
library(lubridate)

Import the covid polls data into a data frame

Next, the code below reads the covid approval polls data from my GitHub repository into the covid_approval_polls data frame.

covid_approval_polls <- read_csv(
  "https://raw.githubusercontent.com/hawa1983/Week1_Assignment/main/covid_approval_polls_adjusted.csv"
  )

## Rows: 1626 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): subject, modeldate, party, startdate, enddate, pollster, grade, po...
## dbl  (7): samplesize, weight, influence, approve, disapprove, approve_adjust...
## lgl  (1): tracking
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(covid_approval_polls, n=3)

## # A tibble: 3 × 19
##   subject modeldate party startdate enddate pollster grade samplesize population
##   <chr>   <chr>     <chr> <chr>     <chr>   <chr>    <chr>      <dbl> <chr>     
## 1 Biden   11/27/20… D     1/24/2021 1/26/2… YouGov   B+          477  a         
## 2 Biden   11/27/20… D     1/28/2021 2/1/20… Quinnip… A-          333. a         
## 3 Biden   11/27/20… D     1/29/2021 2/1/20… Morning… B           808  rv        
## # ℹ 10 more variables: weight <dbl>, influence <dbl>, multiversions <chr>,
## #   tracking <lgl>, approve <dbl>, disapprove <dbl>, approve_adjusted <dbl>,
## #   disapprove_adjusted <dbl>, timestamp <chr>, url <chr>
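
As an alternative to converting columns after the import, the types could be declared while reading the file. The following is only a minimal sketch: it assumes readr's `col_types` argument and uses a small inline sample (the second row is made up for illustration) instead of the real URL, so the column set is not the full 19 columns of the polls file.

```r
library(readr)

# Two illustrative rows in the same layout as the polls file.
# The second row is invented; the real file has 19 columns.
sample_csv <- I("party,startdate,enddate,approve\nD,1/24/2021,1/26/2021,84\nR,1/28/2021,2/1/2021,20\n")

# Declare the desired type of each column at import time.
sample_polls <- read_csv(
  sample_csv,
  col_types = cols(
    party     = col_factor(),
    startdate = col_date(format = "%m/%d/%Y"),
    enddate   = col_date(format = "%m/%d/%Y"),
    approve   = col_double()
  )
)

sample_polls
```

Declaring the types up front keeps the parsing rules in one place, at the cost of having to list every column whose default guess is not wanted.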

Preview the data

Next, we will take a preview of the data to ensure that each variable/column is imported as the correct or desired data type.

A preview of the data shows that the modeldate, startdate, enddate, and timestamp variables are loaded as character data rather than as dates (the first three) and a datetime (timestamp). We must convert these to the appropriate data types. Similarly, party, pollster, and population will be converted to factors.

glimpse(covid_approval_polls)

## Rows: 1,626
## Columns: 19
## $ subject             <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ modeldate           <chr> "11/27/2022", "11/27/2022", "11/27/2022", "11/27/2…
## $ party               <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", …
## $ startdate           <chr> "1/24/2021", "1/28/2021", "1/29/2021", "1/31/2021"…
## $ enddate             <chr> "1/26/2021", "2/1/2021", "2/1/2021", "2/2/2021", "…
## $ pollster            <chr> "YouGov", "Quinnipiac", "Morning Consult", "YouGov…
## $ grade               <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize          <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population          <chr> "a", "a", "rv", "a", "a", "a", "a", "a", "a", "rv"…
## $ weight              <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ tracking            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve             <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove          <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted    <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ timestamp           <chr> "02:31:11 27 Nov 2022", "02:31:11 27 Nov 2022", "0…
## $ url                 <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…

Change data type to the correct type for each variable/column

The date columns are in month/day/year order, and the timestamp column has the form "02:31:11 27 Nov 2022". Converting timestamp to a datetime without a matching format string fills the column with NA, so each conversion below spells out its format explicitly.

covid_approval_polls <- covid_approval_polls |>
  mutate(
    party = as_factor(party),
    modeldate = as_date(modeldate, format = "%m/%d/%Y"),
    startdate = as_date(startdate, format = "%m/%d/%Y"),
    enddate = as_date(enddate, format = "%m/%d/%Y"),
    pollster = as_factor(pollster),
    population = as_factor(population),
    # timestamp is time first, then day, abbreviated month name, and year
    timestamp = parse_datetime(timestamp, format = "%H:%M:%S %d %b %Y")
  )
glimpse(covid_approval_polls)

## Rows: 1,626
## Columns: 19
## $ subject             <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ modeldate           <date> 2022-11-27, 2022-11-27, 2022-11-27, 2022-11-27, 2…
## $ party               <fct> D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D,…
## $ startdate           <date> 2021-01-24, 2021-01-28, 2021-01-29, 2021-01-31, 2…
## $ enddate             <date> 2021-01-26, 2021-02-01, 2021-02-01, 2021-02-02, 2…
## $ pollster            <fct> YouGov, Quinnipiac, Morning Consult, YouGov, Data …
## $ grade               <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize          <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population          <fct> a, a, rv, a, a, a, a, a, a, rv, rv, a, a, rv, a, a…
## $ weight              <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ tracking            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve             <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove          <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted    <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ timestamp           <dttm> 2022-11-27 02:31:11, 2022-11-27 02:31:11, 2022-11…
## $ url                 <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…

Determine columns that have all missing values

The preview above shows many missing values in the multiversions and tracking columns. Filtering the rows as shown below reveals only 4 records with a multiversions value, that is, records where multiple versions of the raw data were combined. The tracking column has no values at all, which suggests that none of the polls is flagged as a tracking poll.

multiversions_records_with_values <- covid_approval_polls |>
  filter(!is.na(multiversions))
multiversions_records_with_values

## # A tibble: 4 × 19
##   subject modeldate  party startdate  enddate    pollster grade samplesize
##   <chr>   <date>     <fct> <date>     <date>     <fct>    <chr>      <dbl>
## 1 Biden   2022-11-27 D     2022-01-29 2022-02-01 YouGov   B+          514 
## 2 Biden   2022-11-27 I     2022-01-29 2022-02-01 YouGov   B+          458 
## 3 Biden   2022-11-27 R     2022-01-29 2022-02-01 YouGov   B+          384.
## 4 Biden   2022-11-27 all   2022-01-29 2022-02-01 YouGov   B+         1500 
## # ℹ 11 more variables: population <fct>, weight <dbl>, influence <dbl>,
## #   multiversions <chr>, tracking <lgl>, approve <dbl>, disapprove <dbl>,
## #   approve_adjusted <dbl>, disapprove_adjusted <dbl>, timestamp <dttm>,
## #   url <chr>

tracking_records_with_values <- covid_approval_polls |>
  filter(!is.na(tracking))
tracking_records_with_values

## # A tibble: 0 × 19
## # ℹ 19 variables: subject <chr>, modeldate <date>, party <fct>,
## #   startdate <date>, enddate <date>, pollster <fct>, grade <chr>,
## #   samplesize <dbl>, population <fct>, weight <dbl>, influence <dbl>,
## #   multiversions <chr>, tracking <lgl>, approve <dbl>, disapprove <dbl>,
## #   approve_adjusted <dbl>, disapprove_adjusted <dbl>, timestamp <dttm>,
## #   url <chr>
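
Rather than filtering one column at a time, the missing values could be counted for every column in a single pass. The sketch below assumes a small made-up data frame (`toy_polls`, a hypothetical stand-in for covid_approval_polls) and flags columns in which every value is missing.

```r
library(dplyr)
library(tidyr)

# Toy data frame standing in for covid_approval_polls (values are made up).
toy_polls <- tibble(
  pollster      = c("YouGov", "Quinnipiac", "YouGov"),
  multiversions = c(NA, NA, "*"),
  tracking      = c(NA, NA, NA)
)

# Count missing values in every column, then reshape to one row per column
# and flag the columns that contain nothing but NA.
na_summary <- toy_polls |>
  summarise(across(everything(), ~ sum(is.na(.x)))) |>
  pivot_longer(everything(), names_to = "column", values_to = "n_missing") |>
  mutate(all_missing = n_missing == nrow(toy_polls))

na_summary
```

Applied to the real data frame, the same pipeline would list all 19 columns, which makes it easier to spot candidates for removal than inspecting the glimpse output by eye.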

Remove columns that are not relevant

The tracking column will be removed since it has no values. The modeldate and timestamp columns are also not relevant to our analysis. These columns will also be removed as shown below.

covid_approval_polls <- covid_approval_polls |> select(-modeldate, -tracking, -timestamp)
glimpse(covid_approval_polls)

## Rows: 1,626
## Columns: 16
## $ subject             <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ party               <fct> D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D, D,…
## $ startdate           <date> 2021-01-24, 2021-01-28, 2021-01-29, 2021-01-31, 2…
## $ enddate             <date> 2021-01-26, 2021-02-01, 2021-02-01, 2021-02-02, 2…
## $ pollster            <fct> YouGov, Quinnipiac, Morning Consult, YouGov, Data …
## $ grade               <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize          <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population          <fct> a, a, rv, a, a, a, a, a, a, rv, rv, a, a, rv, a, a…
## $ weight              <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve             <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove          <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted    <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ url                 <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…

Replace non-intuitive abbreviations

The population column contains the following non-intuitive abbreviations: a = adults, rv = registered voters, lv = likely voters, and v = voters. These values will be replaced by adult, registered voter, likely voter, and voter respectively. Likewise, the party abbreviations D, R, I, and all will be replaced by Democrat, Republican, Independent, and All.

covid_approval_polls <- covid_approval_polls |>
  mutate(
    # Spell out the population codes; any unexpected code becomes a true NA
    population = case_when(
      population == "a"  ~ "adult",
      population == "rv" ~ "registered voter",
      population == "lv" ~ "likely voter",
      population == "v"  ~ "voter",
      .default = NA_character_
    ),
    # Spell out the party codes in the same way
    party = case_when(
      party == "D"   ~ "Democrat",
      party == "R"   ~ "Republican",
      party == "I"   ~ "Independent",
      party == "all" ~ "All",
      .default = NA_character_
    )
  )

glimpse(covid_approval_polls)

## Rows: 1,626
## Columns: 16
## $ subject             <chr> "Biden", "Biden", "Biden", "Biden", "Biden", "Bide…
## $ party               <chr> "Democrat", "Democrat", "Democrat", "Democrat", "D…
## $ startdate           <date> 2021-01-24, 2021-01-28, 2021-01-29, 2021-01-31, 2…
## $ enddate             <date> 2021-01-26, 2021-02-01, 2021-02-01, 2021-02-02, 2…
## $ pollster            <fct> YouGov, Quinnipiac, Morning Consult, YouGov, Data …
## $ grade               <chr> "B+", "A-", "B", "B+", "B", "B+", "B+", "B+", "A-"…
## $ samplesize          <dbl> 477.00, 333.25, 808.00, 484.00, 564.00, 336.00, 56…
## $ population          <chr> "adult", "adult", "registered voter", "adult", "ad…
## $ weight              <dbl> 0.6285238, 0.6317152, 0.8337467, 0.5493243, 0.8883…
## $ influence           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ multiversions       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ approve             <dbl> 84.00, 93.00, 89.00, 88.00, 89.22, 88.00, 89.00, 9…
## $ disapprove          <dbl> 3.00, 5.00, 7.00, 7.00, 7.14, 8.00, 5.00, 5.00, 2.…
## $ approve_adjusted    <dbl> 87.08801, 90.93074, 90.81520, 91.08801, 89.80104, …
## $ disapprove_adjusted <dbl> 2.595882, 6.402365, 5.901370, 6.595882, 6.647952, …
## $ url                 <chr> "https://docs.cdn.yougov.com/ld46rgtdlz/econTabRep…

Optional Exploratory Data Analysis Graphics

Box Plot of Biden’s covid handling by political party and population

Approval is highest among Democrats and lowest among Republicans. Independents are divided.

ggplot(covid_approval_polls, aes(x = party, y = approve_adjusted)) +
  geom_boxplot() +
  labs(title = "Biden's Covid Handling Approval by Political Party", x = "Political Party", y = "Approve") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)

Disapproval is highest among Republicans and lowest among Democrats. Independents are split, but their disapproval stays below 50 percent.

ggplot(covid_approval_polls, aes(x = party, y = disapprove_adjusted)) +
  geom_boxplot() +
  labs(title = "Biden's Covid Handling Disapproval by Political Party", x = "Political Party", y = "Disapprove") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)

Approval ratings are generally high across the polled populations.

ggplot(covid_approval_polls, aes(x = population, y = approve_adjusted)) +
  geom_boxplot() +
  labs(title = "Biden's Covid Handling Approval by Population", x = "Population", y = "Approve") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)

Disapproval ratings are relatively low across the polled populations.

ggplot(covid_approval_polls, aes(x = population, y = disapprove_adjusted)) +
  geom_boxplot() +
  labs(title = "Biden's Covid Handling Disapproval by Population", x = "Population", y = "Disapprove") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)
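
The box plots could be backed by a numeric summary per group. The sketch below is illustrative only: `toy_polls` is a hypothetical data frame with made-up values, so the figures it prints say nothing about the real polls. It computes the number of polls and the median adjusted approval and disapproval for each party.

```r
library(dplyr)

# Made-up values standing in for the real polls (illustration only).
toy_polls <- tibble(
  party = c("Democrat", "Democrat", "Independent", "Independent",
            "Republican", "Republican"),
  approve_adjusted    = c(90, 88, 55, 45, 20, 30),
  disapprove_adjusted = c(6, 8, 40, 48, 75, 65)
)

# One row per party: poll count plus median approval and disapproval.
party_summary <- toy_polls |>
  group_by(party) |>
  summarise(
    n_polls           = n(),
    median_approve    = median(approve_adjusted),
    median_disapprove = median(disapprove_adjusted),
    .groups = "drop"
  )

party_summary
```

Replacing `party` with `population` in `group_by()` would give the corresponding summary by population type.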

Approval ratings over time

Approval ratings declined over time across all political parties and populations, but they remained high among Democrats.

ggplot(covid_approval_polls, aes(x = startdate, y = approve_adjusted, color = party)) +
  geom_point() +
  labs(title = "Biden's Covid Handling Approval by Political Party", x = "Poll Date", y = "Approve") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)

ggplot(covid_approval_polls, aes(x = startdate, y = disapprove_adjusted, color = party)) +
  geom_point() +
  labs(title = "Biden's Covid Handling Disapproval by Political Party", x = "Poll Date", y = "Disapprove") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)

ggplot(covid_approval_polls, aes(x = startdate, y = approve_adjusted, color = population)) +
  geom_point() +
  labs(title = "Biden's Covid Handling Approval by Population", x = "Poll Date", y = "Approve") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)

ggplot(covid_approval_polls, aes(x = startdate, y = disapprove_adjusted, color = population)) +
  geom_point() +
  labs(title = "Biden's Covid Handling Disapproval by Population", x = "Poll Date", y = "Disapprove") +
  theme(plot.title = element_text(hjust = 0.5))  # center the title (hjust = 0.5)
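
A trend is easier to judge from monthly averages than from a cloud of points. The sketch below is one possible approach, not part of the original analysis: it assumes that the `weight` column is a reasonable averaging weight, and it uses a hypothetical `toy_polls` data frame with made-up values in place of the real data.

```r
library(dplyr)
library(lubridate)

# Made-up polls standing in for covid_approval_polls (illustration only).
toy_polls <- tibble(
  party = c("Democrat", "Democrat", "Democrat",
            "Republican", "Republican", "Republican"),
  startdate = as.Date(c("2021-02-01", "2021-02-15", "2022-02-01",
                        "2021-02-03", "2021-02-20", "2022-02-05")),
  approve_adjusted = c(90, 92, 80, 30, 34, 20),
  weight = c(1, 1, 1, 1, 3, 1)
)

# Assign each poll to the first day of its month, then take the
# weighted mean of adjusted approval per party and month.
monthly_trend <- toy_polls |>
  mutate(month = floor_date(startdate, unit = "month")) |>
  group_by(party, month) |>
  summarise(
    approve_wmean = weighted.mean(approve_adjusted, w = weight),
    n_polls       = n(),
    .groups = "drop"
  )

monthly_trend
```

The resulting `monthly_trend` table could then be drawn with `geom_line()` instead of `geom_point()`, mapping `month` to the x axis and `party` to color, to show one line per party.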

Conclusions

The aggregated polls and the graphics above show that approval of President Biden’s handling of the covid pandemic fell largely along party lines, with high approval among Democrats, low approval among Republicans, and Independents somewhere in between. This indicates that partisanship may have significantly influenced the approval and disapproval ratings.

Recommendations

The polls should include more data to enable further disaggregation. For example, the state of residence of the poll participants could help to highlight opinion by state or region. The pollsters should also include a weight for partisan bias when they collect information about the participants’ party registration. Also, given that approval and disapproval fall along party lines, the pollsters should state the proportion of each party’s members included in the poll.

Week1 Assignment: Loading Data into a Data Frame

Fomba Kassoh

2023-09-08