library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.3 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
I wanted a dataset that allowed quantitative comparisons between different groups. This dataset provides approval ratings for Presidents Biden and Trump on their handling of the coronavirus pandemic. For this study, I focus only on President Biden.
The data comes from FiveThirtyEight (https://data.fivethirtyeight.com/) and accompanies their article “How Americans View Biden’s Response To The Coronavirus Crisis”. Once acquired, the CSV file was added to a GitHub repository, from which it was read for analysis.
polls <- read.csv("https://raw.githubusercontent.com/Patel-Krutika/Tidyverse/main/covid_approval_polls_adjusted.csv")
colnames(polls)
## [1] "subject" "modeldate" "party"
## [4] "startdate" "enddate" "pollster"
## [7] "grade" "samplesize" "population"
## [10] "weight" "influence" "multiversions"
## [13] "tracking" "approve" "disapprove"
## [16] "approve_adjusted" "disapprove_adjusted" "timestamp"
## [19] "url"
The variables of interest were subject, party, startdate, and approve (the approval percentage reported by each poll), renamed to Subject, Party, Date, and Approve. The data was filtered to keep only rows about President Biden and the three parties of interest (Democrat, Republican, Independent). The month was extracted from the startdate column and stored in a new Month column to group by.
poll <- polls %>%
  select(Subject = subject, Party = party, Date = startdate, Approve = approve) %>%
  filter(Subject == "Biden", Party %in% c("D", "R", "I"))
poll <- poll %>% mutate(Month = substr(Date, start = 1, stop = 2))
poll$Month <- gsub("/","",as.character(poll$Month))
poll$Month <- factor(poll$Month,levels = c(1,2,3,4,5,6,7,8,9,10))
head(poll)
## Subject Party Date Approve Month
## 1 Biden D 1/24/2021 84.00 1
## 2 Biden D 1/28/2021 93.00 1
## 3 Biden D 1/29/2021 89.00 1
## 4 Biden D 1/31/2021 88.00 1
## 5 Biden D 2/2/2021 89.22 2
## 6 Biden D 2/5/2021 88.00 2
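The month extraction above relies on the position of characters in the date string. As a hedged alternative, the sketch below parses the strings as real dates first, assuming every date follows the m/d/yyyy pattern seen in the output. The `sample_poll` data frame is a hypothetical stand-in built from the six rows printed by head(poll); it is not the full dataset.

```r
# Hypothetical sample: the six rows shown in the head(poll) output
sample_poll <- data.frame(
  Subject = "Biden",
  Party   = "D",
  Date    = c("1/24/2021", "1/28/2021", "1/29/2021",
              "1/31/2021", "2/2/2021", "2/5/2021"),
  Approve = c(84.00, 93.00, 89.00, 88.00, 89.22, 88.00)
)

# Parse the m/d/yyyy strings as Date objects (assumed format)
parsed_date <- as.Date(sample_poll$Date, format = "%m/%d/%Y")

# Extract the month number and store it as a factor with levels 1 to 10,
# matching the levels used in the original code
sample_poll$Month <- factor(as.integer(format(parsed_date, "%m")), levels = 1:10)

sample_poll$Month
```

Parsing to a Date first means a malformed entry becomes NA instead of silently producing a wrong month, which can make data problems easier to spot.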
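A stacked bar graph was created from the transformed data. Because the plot uses stat = "identity" with position = "stack", each bar segment is the sum of the approval percentages reported by all polls for that party in that month, from January to October 2021, so bar heights reflect how many polls were taken as well as the approval level.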
ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") + ggtitle("Biden Approval By Party")
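Since the Month column was created to group by, one possible follow-up is to average the approval percentages per party and month before plotting, so that bar heights do not depend on the number of polls. The sketch below is an illustration, not part of the original analysis: `sample_poll`, `monthly`, and `MeanApprove` are hypothetical names, and the sample contains only the six Democrat rows printed by head(poll).

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample: the six rows shown in the head(poll) output
sample_poll <- data.frame(
  Subject = "Biden",
  Party   = "D",
  Approve = c(84.00, 93.00, 89.00, 88.00, 89.22, 88.00),
  Month   = factor(c(1, 1, 1, 1, 2, 2), levels = 1:10)
)

# Average the approval percentage for each party within each month
monthly <- sample_poll %>%
  group_by(Party, Month) %>%
  summarise(MeanApprove = mean(Approve), .groups = "drop")

# Side-by-side bars, one per party, showing the mean approval per month
p <- ggplot(monthly, aes(x = Month, y = MeanApprove, fill = Party)) +
  geom_col(position = "dodge") +
  labs(title = "Biden Approval By Party", y = "Mean approval (%)")

print(p)
```

With the full `poll` data frame in place of the sample, the same pipeline would yield one bar per party per month, which makes the three parties directly comparable.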