library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
I wanted to find a data set that allowed me to do quantitative comparisons for different groups. This dataset provides the approval ratings for Presidents Biden and Trump on their handling of the Coronavirus pandemic. For this study, I have focused only on president Biden.
The data was extracted from fivethirtyeight.com (https://data.fivethirtyeight.com/) related to their article “How Americans View Biden’s Response To The Coronavirus Crisis”. Once acquired, the csv file was added to a github repository, from where it was read for analysis.
polls <- data.frame(read.csv("https://raw.githubusercontent.com/Patel-Krutika/Tidyverse/main/covid_approval_polls_adjusted.csv"))
colnames(polls)
## [1] "subject" "modeldate" "party"
## [4] "startdate" "enddate" "pollster"
## [7] "grade" "samplesize" "population"
## [10] "weight" "influence" "multiversions"
## [13] "tracking" "approve" "disapprove"
## [16] "approve_adjusted" "disapprove_adjusted" "timestamp"
## [19] "url"
The variables of interest for us were: Subject, Party, StartDate, and Approve (number of approvals). The selected data was filtered to only contain information about president Biden and the three parties of interest (Democrat, Republican, Independent). The month part was extracted from the startDate column and used to create a month column to group by.
poll<- polls %>% select(Subject = subject, Party = party, Date = startdate, Approve = approve) %>% filter(Subject=="Biden", Party == "D" | Party == "R" | Party == "I")
poll <- poll %>% mutate(Month = substr(Date, start = 1, stop = 2))
poll$Month <- gsub("/","",as.character(poll$Month))
poll$Month <- factor(poll$Month,levels = c(1,2,3,4,5,6,7,8,9,10))
head(poll)
## Subject Party Date Approve Month
## 1 Biden D 1/24/2021 84.00 1
## 2 Biden D 1/28/2021 93.00 1
## 3 Biden D 1/29/2021 89.00 1
## 4 Biden D 1/31/2021 88.00 1
## 5 Biden D 2/2/2021 89.22 2
## 6 Biden D 2/5/2021 88.00 2
A stacked bar graph was created with the changed data. Each bar represents the number of approvals from members of each party for each month from January to Oct 2021.
ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") + ggtitle("Biden Approval By Party")
ggplot has built in color palettes that can be used for customizing different plots. The scale_fill_brewer() function can be used for Stacked Bar Graphs.
ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") +
ggtitle("Biden Approval By Party") +
scale_fill_brewer()
ggplot allows for the use of many different custom color combinations to be applied to different plots. The for custom colors call the function scale_fill_manual(). Below are a handful of examples of the color combinations that are possible.
Here is a link to a comprehensive color list available in R: http://derekogle.com/NCGraphing/resources/colors
library(ggpubr) # for arranging plots
# ex. 1
color1 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") +
ggtitle("Biden Approval By Party") +
scale_fill_manual(values = c("cyan3","black","brown"))
# ex. 2
color2 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") +
ggtitle("Biden Approval By Party") +
scale_fill_manual(values = c("greenyellow","grey80","orangered"))
# ex. 3
color3 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") +
ggtitle("Biden Approval By Party") +
scale_fill_manual(values = c("orchid","indianred","lightgreen"))
# ex.4
color4 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") +
ggtitle("Biden Approval By Party") +
scale_fill_manual(values = c("navy","deeppink4","tomato"))
# Put plots together
ggarrange(color1, color2, color3, color4,
ncol = 2, nrow = 2)
To change the Legend values you can use scale_fill_discrete() or scale_fill_manual(). Examples are shown below using both functions.
legend1 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") +
ggtitle("Biden Approval By Party") +
scale_fill_discrete(labels = c("Left", "Middle", "Right"))
legend2 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) +
geom_bar(position="stack", stat="identity") +
ggtitle("Biden Approval By Party") +
scale_fill_manual(labels = c("Dem", "Ind", "Rep"), values = c("cyan3","black","brown"))
ggarrange(legend1, legend2,
ncol = 2, nrow = 1)