library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.3     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidyr)

Data

I wanted a dataset that allows quantitative comparisons between groups. This dataset provides the approval ratings of Presidents Biden and Trump for their handling of the coronavirus pandemic. For this study, I focused only on President Biden.

1

The data comes from FiveThirtyEight (https://data.fivethirtyeight.com/), where it accompanies the article “How Americans View Biden’s Response To The Coronavirus Crisis”. Once downloaded, the CSV file was added to a GitHub repository, from which it is read for analysis.

polls <- read.csv("https://raw.githubusercontent.com/Patel-Krutika/Tidyverse/main/covid_approval_polls_adjusted.csv")
colnames(polls)
##  [1] "subject"             "modeldate"           "party"              
##  [4] "startdate"           "enddate"             "pollster"           
##  [7] "grade"               "samplesize"          "population"         
## [10] "weight"              "influence"           "multiversions"      
## [13] "tracking"            "approve"             "disapprove"         
## [16] "approve_adjusted"    "disapprove_adjusted" "timestamp"          
## [19] "url"

2

The variables of interest were subject, party, startdate, and approve, renamed in the code to Subject, Party, Date, and Approve. Approve holds the percentage of respondents in a poll who approve, not a count of approvals. The selected data was filtered to keep only the rows about President Biden and the three party groups of interest (D for Democrat, R for Republican, I for Independent). The month was extracted from the start of each date string and stored in a new Month column to group by.

# Keep the columns of interest, then restrict to Biden and the three party groups
poll <- polls %>%
  select(Subject = subject, Party = party, Date = startdate, Approve = approve) %>%
  filter(Subject == "Biden", Party %in% c("D", "R", "I"))

# Take the first two characters of the date and drop any "/" to get the month
poll <- poll %>% mutate(Month = gsub("/", "", substr(Date, start = 1, stop = 2)))
poll$Month <- factor(poll$Month, levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
head(poll)
##   Subject Party      Date Approve Month
## 1   Biden     D 1/24/2021   84.00     1
## 2   Biden     D 1/28/2021   93.00     1
## 3   Biden     D 1/29/2021   89.00     1
## 4   Biden     D 1/31/2021   88.00     1
## 5   Biden     D  2/2/2021   89.22     2
## 6   Biden     D  2/5/2021   88.00     2
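
The month extraction above depends on character positions in the date string. As a minimal alternative sketch, assuming the dates are always written as month/day/year like the rows shown, the strings can be parsed with base R's as.Date() instead. The names dates, parsed, month_num, and month_fct are made up for this illustration, and the third date is a hypothetical October value added only to show a two-digit month.

```r
# Sample date strings in the same m/d/Y form as the Date column;
# the last one is a hypothetical value, not taken from the data
dates <- c("1/24/2021", "2/5/2021", "10/3/2021")

# Parse the strings as dates, then pull out the month as an integer
parsed <- as.Date(dates, format = "%m/%d/%Y")
month_num <- as.integer(format(parsed, "%m"))

# Convert to a factor with the same January-to-October levels used in the analysis
month_fct <- factor(month_num, levels = 1:10)
month_fct
```

Parsing the date first avoids the separate gsub() step and would fail visibly (with NA) on a malformed date rather than silently producing an odd month label.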

3

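A stacked bar graph was created from the transformed data. Each bar covers one month from January to October 2021 and stacks the approval percentages of every poll in that month, colored by party, so the height of a bar reflects both the approval level and the number of polls taken that month.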

ggplot(poll, aes(x = Month, y = Approve, fill = Party)) +
    geom_bar(position = "stack", stat = "identity") +
    ggtitle("Biden Approval By Party")
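
Because the stacked bars add up every poll in a month, months with more polls produce taller bars. One possible alternative, sketched here rather than taken from the original analysis, is to average Approve by party and month first and then plot the means side by side. The example re-enters the six rows printed by head(poll) so that it runs on its own; poll_sample, monthly_mean, and MeanApprove are names introduced for this illustration.

```r
library(dplyr)
library(ggplot2)

# The six rows shown in the head(poll) output, re-entered by hand
poll_sample <- data.frame(
  Party   = rep("D", 6),
  Month   = factor(c(1, 1, 1, 1, 2, 2), levels = 1:10),
  Approve = c(84, 93, 89, 88, 89.22, 88)
)

# Average the approval percentage within each party and month
monthly_mean <- poll_sample %>%
  group_by(Party, Month) %>%
  summarise(MeanApprove = mean(Approve), .groups = "drop")

# Side-by-side bars: one bar per party within each month
p <- ggplot(monthly_mean, aes(x = Month, y = MeanApprove, fill = Party)) +
  geom_col(position = "dodge") +
  ggtitle("Mean Biden Approval By Party")
p
```

With the full poll data frame in place of poll_sample, the same pipeline would give one mean per party per month, which keeps the y-axis on the 0 to 100 percentage scale.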