Krutika Patel’s Tidyverse CREATE Assignment

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(tidyr)

Data

I wanted a data set that supports quantitative comparisons across groups. This data set provides the approval ratings for Presidents Biden and Trump on their handling of the coronavirus pandemic. For this study, I focus only on President Biden.

1

The data comes from FiveThirtyEight (https://data.fivethirtyeight.com/) and accompanies their article “How Americans View Biden’s Response To The Coronavirus Crisis”. Once downloaded, the CSV file was added to a GitHub repository, from which it is read for analysis.

polls <- read.csv("https://raw.githubusercontent.com/Patel-Krutika/Tidyverse/main/covid_approval_polls_adjusted.csv")
colnames(polls)
##  [1] "subject"             "modeldate"           "party"              
##  [4] "startdate"           "enddate"             "pollster"           
##  [7] "grade"               "samplesize"          "population"         
## [10] "weight"              "influence"           "multiversions"      
## [13] "tracking"            "approve"             "disapprove"         
## [16] "approve_adjusted"    "disapprove_adjusted" "timestamp"          
## [19] "url"

2

The variables of interest were subject, party, startdate, and approve (the approval rating reported by each poll, in percent). They were selected and renamed to Subject, Party, Date, and Approve, and the rows were filtered to keep only President Biden and the three parties of interest (Democrat, Republican, Independent). The month was then extracted from the start date and stored in a Month column to group by.

# Keep the columns of interest (renamed) and restrict to Biden and the three parties
poll <- polls %>%
  select(Subject = subject, Party = party, Date = startdate, Approve = approve) %>%
  filter(Subject == "Biden", Party %in% c("D", "R", "I")) %>%
  # Month is the text before the first "/" in an m/d/yyyy date
  mutate(Month = factor(sub("/.*", "", Date), levels = 1:10))
head(poll)
##   Subject Party      Date Approve Month
## 1   Biden     D 1/24/2021   84.00     1
## 2   Biden     D 1/28/2021   93.00     1
## 3   Biden     D 1/29/2021   89.00     1
## 4   Biden     D 1/31/2021   88.00     1
## 5   Biden     D  2/2/2021   89.22     2
## 6   Biden     D  2/5/2021   88.00     2
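
The month extraction above depends on where the "/" falls in the date string. As a minimal sketch of an alternative, the dates can be parsed with base R's as.Date() and the month read from the parsed value. The sample dates and the names dates, parsed, month_num, and month_fct are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical sample dates in the same m/d/yyyy layout as the startdate column
dates <- c("1/24/2021", "2/5/2021", "10/12/2021")

# Parse the strings as dates, then pull out the month as an integer
parsed <- as.Date(dates, format = "%m/%d/%Y")
month_num <- as.integer(format(parsed, "%m"))

# Use the same factor levels (1 to 10) so the x-axis order is unchanged
month_fct <- factor(month_num, levels = 1:10)
month_fct
```

Parsing the date first also makes it straightforward to extract the day or year later if the analysis needs them.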

3

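A stacked bar graph was created from the transformed data. Each bar stacks the Approve values of every poll taken in that month, colored by party, for January through October 2021. Because stat = "identity" adds up all the rows, the height of each segment reflects both the approval ratings and the number of polls in that month.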

ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
    geom_bar(position="stack", stat="identity") + ggtitle("Biden Approval By Party")
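
Since the stacked bars sum every poll in a month, it may help to see how the same columns could be averaged instead. The following is only a sketch under stated assumptions: the toy data frame and the names toy, monthly, and MeanApprove are made up for illustration and do not come from the poll data.

```r
library(dplyr)

# Made-up values for illustration only; not taken from the poll data
toy <- data.frame(
  Party   = c("D", "D", "R", "R"),
  Month   = factor(c(1, 1, 1, 1), levels = 1:10),
  Approve = c(80, 90, 20, 30)
)

# One row per party and month, holding the mean approval rating
monthly <- toy %>%
  group_by(Party, Month) %>%
  summarise(MeanApprove = mean(Approve), .groups = "drop")
monthly
```

A summary table like this could be passed to ggplot() in place of poll if one bar segment per party and month is preferred over a sum of polls.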

Eric Lehmphul - Extension

Customizing Stacked Bar Graph: Colors

Predefined Palette

ggplot2 has built-in color palettes that can be used to customize plots. For stacked bar graphs, the scale_fill_brewer() function applies a ColorBrewer palette to the fill aesthetic.

ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
  geom_bar(position="stack", stat="identity") + 
  ggtitle("Biden Approval By Party") +
  scale_fill_brewer()
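
Called without arguments, scale_fill_brewer() uses its default palette. As a small sketch, a specific ColorBrewer palette can be chosen with the palette argument. The toy data frame and the name p_set2 are assumptions for illustration, and "Set2" is just one of the available qualitative palettes.

```r
library(ggplot2)

# Made-up values for illustration only; not taken from the poll data
toy <- data.frame(
  Party   = c("D", "I", "R"),
  Month   = factor(c(1, 1, 1)),
  Approve = c(85, 50, 25)
)

# Same plot structure as above, with a named ColorBrewer palette
p_set2 <- ggplot(toy, aes(fill = Party, y = Approve, x = Month)) +
  geom_bar(position = "stack", stat = "identity") +
  ggtitle("Biden Approval By Party") +
  scale_fill_brewer(palette = "Set2")
p_set2
```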

Manual Palette

ggplot2 also lets you apply your own color combinations to a plot. For custom colors, call the function scale_fill_manual(). Below are a handful of examples of the color combinations that are possible.

Here is a link to a comprehensive color list available in R: http://derekogle.com/NCGraphing/resources/colors

library(ggpubr) # for arranging plots

# ex. 1
color1 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
  geom_bar(position="stack", stat="identity") +
  ggtitle("Biden Approval By Party") +
  scale_fill_manual(values = c("cyan3","black","brown"))

# ex. 2
color2 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
  geom_bar(position="stack", stat="identity") +
  ggtitle("Biden Approval By Party") +
  scale_fill_manual(values = c("greenyellow","grey80","orangered"))

# ex. 3
color3 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
  geom_bar(position="stack", stat="identity") +
  ggtitle("Biden Approval By Party") +
  scale_fill_manual(values = c("orchid","indianred","lightgreen"))

# ex. 4
color4 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
  geom_bar(position="stack", stat="identity") +
  ggtitle("Biden Approval By Party") +
  scale_fill_manual(values = c("navy","deeppink4","tomato"))

# Put plots together
ggarrange(color1, color2, color3, color4,
          ncol = 2, nrow = 2)
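
In the examples above, the colors are assigned to the parties by position. As a sketch of a more explicit variant, scale_fill_manual() also accepts a named vector, so each color is tied to a Party value regardless of the order in which it is listed. The toy data frame, the name p_named, and the particular colors are illustrative assumptions.

```r
library(ggplot2)

# Made-up values for illustration only; not taken from the poll data
toy <- data.frame(
  Party   = c("D", "I", "R"),
  Month   = factor(c(1, 1, 1)),
  Approve = c(85, 50, 25)
)

# Names on the vector match the Party values, so listing order does not matter
p_named <- ggplot(toy, aes(fill = Party, y = Approve, x = Month)) +
  geom_bar(position = "stack", stat = "identity") +
  ggtitle("Biden Approval By Party") +
  scale_fill_manual(values = c(R = "red", D = "blue", I = "grey50"))
p_named
```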

Customizing Stacked Bar Graph: Legend

To change the legend labels, you can use scale_fill_discrete() or scale_fill_manual(). The labels are assigned in the order of the Party values (D, I, R). Examples using both functions are shown below.

legend1 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
  geom_bar(position="stack", stat="identity") + 
  ggtitle("Biden Approval By Party") +
  scale_fill_discrete(labels = c("Left", "Middle", "Right"))

legend2 <- ggplot(poll, aes(fill=Party, y=Approve, x=Month)) + 
  geom_bar(position="stack", stat="identity") + 
  ggtitle("Biden Approval By Party") +
  scale_fill_manual(labels = c("Dem", "Ind", "Rep"), values = c("cyan3","black","brown"))

ggarrange(legend1, legend2,
          ncol = 2, nrow = 1)
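
Besides the labels, the legend title and position can be adjusted as well. The following is a minimal sketch, assuming a toy data frame: labs(fill = ...) sets the title of the fill legend and theme(legend.position = ...) moves it. The name p_legend and the title text are hypothetical.

```r
library(ggplot2)

# Made-up values for illustration only; not taken from the poll data
toy <- data.frame(
  Party   = c("D", "I", "R"),
  Month   = factor(c(1, 1, 1)),
  Approve = c(85, 50, 25)
)

# Rename the legend title and place the legend under the plot
p_legend <- ggplot(toy, aes(fill = Party, y = Approve, x = Month)) +
  geom_bar(position = "stack", stat = "identity") +
  ggtitle("Biden Approval By Party") +
  labs(fill = "Party affiliation") +
  theme(legend.position = "bottom")
p_legend
```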