TIDYVERSE PROJECT

This project uses FiveThirtyEight's data on Republican candidates' stances on the 2020 election to demonstrate what the tidyverse packages can do.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

THE DATA

url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/election-deniers/fivethirtyeight_election_deniers.csv"
df <- read.csv(url) 
head(df)
##         Candidate Incumbent   State         Office District       Stance
## 1     Katie Britt        No Alabama        Senator      N/A Fully denied
## 2      Jerry Carl       Yes Alabama Representative        1 Fully denied
## 3     Barry Moore       Yes Alabama Representative        2 Fully denied
## 4     Mike Rogers       Yes Alabama Representative        3 Fully denied
## 5 Robert Aderholt       Yes Alabama Representative        4 Fully denied
## 6     Dale Strong        No Alabama Representative        5 Fully denied
##                                                Source
## 1                                            NBC News
## 2 Congressional roll call, Alabama Political Reporter
## 3                             Congressional roll call
## 4                             Congressional roll call
## 5                             Congressional roll call
## 6                                            Facebook
##                                                                                                                                                                                                                                                                                                                                                                             URL
## 1                                                                                                                                                                                                                                                                                                                 https://twitter.com/VaughnHillyard/status/1528918324192608257
## 2                                                                                                                                                                                                                                                                                                                      https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 3                                                                                                                                                                                                                                                                                                                      https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 4                                                                                                                                                                                                                                                                                                                      https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 5                                                                                                                                                                                                                                                                                                                      https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 6 https://www.facebook.com/strongforalabama/posts/pfbid0Km4TZaJPUTYuSAnwLNeezfrQSaL3uCNDLhvWFwRkMDpKPByKPTXDFYEdvP2m1rvkl?__cft__[0]=AZVE-LgrXbka-w1K8Z5IBxGpBh61GPUdZxnovjIPky71lJcPQWMquAkzM7W8jVhSl7p6idLzrG9toCil8_CQJUa_1NsiaaLVvWTqhVnrllFsloNysJKgcXmgeC0g3E_M4nJnPgsbVJ4ATcVTKt3uUJsRCZRWe028wDT8sg4hQaIRJsMxEFe46wgxhd-UBLyMZ5OKiWp2sNUyyvnmr92-UpxH&__tn__=%2CO%2CP-R
##   Note
## 1     
## 2     
## 3     
## 4     
## 5     
## 6

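The dplyr package provides verbs such as mutate() for transforming data. They chain together with a pipe; the |> used here is R's native pipe (available since R 4.1), not part of dplyr: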

df <- df |>
  mutate(Stance = toupper(Stance))
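
As a minimal sketch of what the pipe does, the example below uses a made-up three-row tibble (`toy` and its values are illustrative and not taken from the FiveThirtyEight file). Since `x |> f(y)` is another way of writing `f(x, y)`, the piped and nested forms should give the same result.

```r
library(dplyr)

# Hypothetical toy data standing in for the real data frame
toy <- tibble(
  Candidate = c("A", "B", "C"),
  Stance    = c("Fully denied", "No comment", "Fully denied")
)

# Piped form: the left-hand side becomes the first argument of mutate()
piped <- toy |>
  mutate(Stance = toupper(Stance))

# Equivalent nested call without the pipe
nested <- mutate(toy, Stance = toupper(Stance))

# Compare the two results
identical(piped$Stance, nested$Stance)
```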

It also lets us count the number of times each value occurs:

stance_ct <- df |>
      count(Stance) |>
      arrange(desc(n))
stance_ct
##                       Stance   n
## 1               FULLY DENIED 199
## 2                 NO COMMENT 104
## 3 ACCEPTED WITH RESERVATIONS  93
## 4             FULLY ACCEPTED  77
## 5           RAISED QUESTIONS  61
## 6          AVOIDED ANSWERING  18
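
The count-then-sort pattern above can probably be shortened with the `sort` argument of count(). The sketch below checks this on a small invented tibble (`toy` is an assumption for illustration, with no tied counts).

```r
library(dplyr)

# Hypothetical toy data: three stances with different frequencies
toy <- tibble(Stance = c("NO COMMENT", "FULLY DENIED", "FULLY DENIED",
                         "FULLY DENIED", "NO COMMENT", "RAISED QUESTIONS"))

# Two-step version, as in the text: count, then sort descending
two_step <- toy |>
  count(Stance) |>
  arrange(desc(n))

# One-step version: count() sorts by n when sort = TRUE
one_step <- toy |>
  count(Stance, sort = TRUE)

one_step
```

Either form works; the two-step version keeps the sorting explicit, which can be easier to read in a longer pipeline.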

The ggplot2 package can visualize these counts directly from the raw data, without first computing the count table:

ggplot(df) +
  geom_bar(aes(x = Stance)) +
  coord_flip()
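
To illustrate the two routes to the same chart, here is a small sketch on invented data (`toy` is hypothetical). One plot lets ggplot2 count the rows itself; the other plots a pre-computed count table with geom_col(). layer_data() is used only to inspect the bar heights each plot would draw.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per candidate
toy <- tibble(Stance = c("FULLY DENIED", "FULLY DENIED", "NO COMMENT"))

# Option 1: ggplot2 counts the rows for each stance
p_raw <- ggplot(toy) +
  geom_bar(aes(x = Stance)) +
  coord_flip()

# Option 2: count first with dplyr, then plot the counts as given
toy_ct <- count(toy, Stance)
p_counted <- ggplot(toy_ct) +
  geom_col(aes(x = Stance, y = n)) +
  coord_flip()

# Bar heights computed for each plot
heights_raw     <- as.numeric(layer_data(p_raw)$y)
heights_counted <- as.numeric(layer_data(p_counted)$y)
```

The first option is shorter; the second is handy when the count table is needed elsewhere in the analysis anyway.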

The tidyr package lets us pivot data:

by_state <- df |> # native pipe; the verbs come from dplyr and tidyr
  group_by(State, Stance) |>
  summarise(count = n()) |>
  pivot_wider(names_from = Stance, values_from = count)
## `summarise()` has grouped output by 'State'. You can override using the
## `.groups` argument.
by_state
## # A tibble: 50 × 7
## # Groups:   State [50]
##    State       ACCEPTED WITH RESERVATI…¹ FULLY…² RAISE…³ FULLY…⁴ NO CO…⁵ AVOID…⁶
##    <chr>                           <int>   <int>   <int>   <int>   <int>   <int>
##  1 Alabama                             1       8       2      NA      NA      NA
##  2 Alaska                              1       1       2       1      NA      NA
##  3 Arizona                             1       9       2      NA       1      NA
##  4 Arkansas                            1       1       1       3       1       1
##  5 California                          6      13       2       9      18       1
##  6 Colorado                            1       4      NA       6       1      NA
##  7 Connecticut                         2      NA       2       2       3      NA
##  8 Delaware                           NA      NA      NA      NA       2      NA
##  9 Florida                             2      19       5       2       2       1
## 10 Georgia                             6       8       2       2      NA      NA
## # … with 40 more rows, and abbreviated variable names
## #   ¹​`ACCEPTED WITH RESERVATIONS`, ²​`FULLY DENIED`, ³​`RAISED QUESTIONS`,
## #   ⁴​`FULLY ACCEPTED`, ⁵​`NO COMMENT`, ⁶​`AVOIDED ANSWERING`
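
The NA cells above mark state and stance combinations with no candidates. If zeros are preferred, one option is the `values_fill` argument of pivot_wider(); adding `.groups = "drop"` to summarise() also avoids the grouping message. The sketch below uses an invented two-state tibble (`toy` is hypothetical).

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Alabama has no "NO COMMENT" candidates
toy <- tibble(
  State  = c("Alabama", "Alabama", "Alaska", "Alaska", "Alaska"),
  Stance = c("FULLY DENIED", "FULLY DENIED", "FULLY DENIED",
             "NO COMMENT", "NO COMMENT")
)

by_state_toy <- toy |>
  group_by(State, Stance) |>
  # .groups = "drop" returns an ungrouped tibble and silences the message
  summarise(count = n(), .groups = "drop") |>
  # values_fill = 0L writes 0 instead of NA for missing combinations
  pivot_wider(names_from = Stance, values_from = count, values_fill = 0L)

by_state_toy
```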

EXTENDED PART SUMMARY

This extension first counts the Republican nominees for each office, then counts, by office, the nominees who fully denied the 2020 election result. Finally, it counts incumbents and non-incumbents for each stance and shows those counts in a grouped bar plot.

Subsetting the df data frame to the Incumbent, Office, and Stance columns with dplyr's select() function:

df1 <- df %>% select(Incumbent, Office, Stance)
head(df1)
##   Incumbent         Office       Stance
## 1        No        Senator FULLY DENIED
## 2       Yes Representative FULLY DENIED
## 3       Yes Representative FULLY DENIED
## 4       Yes Representative FULLY DENIED
## 5       Yes Representative FULLY DENIED
## 6        No Representative FULLY DENIED

Counting the nominees for each office, across all stances, with dplyr's group_by() and summarize() functions:

df2 <- df1 %>% 
  group_by(Office) %>% 
  summarize(count = n())
df2
## # A tibble: 6 × 2
##   Office                   count
##   <chr>                    <int>
## 1 Attorney general            30
## 2 Governor                    36
## 3 Representative             424
## 4 Secretary of state          26
## 5 Senator                     35
## 6 Senator (unexpired term)     1
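
The table above gives one total per office. To break those totals down by stance as well, both columns can be passed to the grouping step. The sketch below does this on invented data (`toy` is hypothetical) and also shows count() with its `name` argument as a shorter equivalent of group_by() plus summarize().

```r
library(dplyr)

# Hypothetical toy data: two offices, two stances
toy <- tibble(
  Office = c("Governor", "Governor", "Senator", "Senator", "Senator"),
  Stance = c("FULLY DENIED", "NO COMMENT", "FULLY DENIED",
             "FULLY DENIED", "NO COMMENT")
)

# Long form: group by both columns, then count rows per group
long_form <- toy %>%
  group_by(Office, Stance) %>%
  summarize(count = n(), .groups = "drop")

# Short form: count() groups, tallies, and ungroups in one call
short_form <- toy %>%
  count(Office, Stance, name = "count")

short_form
```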

Counting, by office, the Republican nominees who fully denied the 2020 election result, using dplyr's filter() function to keep only those rows:

df3 <- df1 %>%
  filter(Stance == "FULLY DENIED") %>%
  group_by(Office) %>%
  summarize(count = n())
df3
## # A tibble: 6 × 2
##   Office                   count
##   <chr>                    <int>
## 1 Attorney general             7
## 2 Governor                     7
## 3 Representative             170
## 4 Secretary of state           7
## 5 Senator                      7
## 6 Senator (unexpired term)     1
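
One thing to keep in mind with this filter-then-count pattern is that an office with no matching rows would drop out of the result entirely. Every office appears in the table above, so that does not happen here, but the sketch below shows the difference on invented data (`toy` is hypothetical) and one way to keep zero counts.

```r
library(dplyr)

# Hypothetical toy data: no Senator row has the "FULLY DENIED" stance
toy <- tibble(
  Office = c("Governor", "Governor", "Senator"),
  Stance = c("FULLY DENIED", "NO COMMENT", "NO COMMENT")
)

# Filter first: Senator disappears because none of its rows survive
filtered <- toy %>%
  filter(Stance == "FULLY DENIED") %>%
  group_by(Office) %>%
  summarize(count = n())

# Sum a logical condition instead: every office is kept, with 0 where needed
kept <- toy %>%
  group_by(Office) %>%
  summarize(count = sum(Stance == "FULLY DENIED"))

kept
```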

Counting incumbents and non-incumbents for each stance category with dplyr's group_by() and summarize() functions:

df3 <- df1 %>% 
  group_by(Stance,Incumbent) %>% 
  summarize(count = n())
## `summarise()` has grouped output by 'Stance'. You can override using the
## `.groups` argument.
df3
## # A tibble: 12 × 3
## # Groups:   Stance [6]
##    Stance                     Incumbent count
##    <chr>                      <chr>     <int>
##  1 ACCEPTED WITH RESERVATIONS No           42
##  2 ACCEPTED WITH RESERVATIONS Yes          51
##  3 AVOIDED ANSWERING          No           15
##  4 AVOIDED ANSWERING          Yes           3
##  5 FULLY ACCEPTED             No           38
##  6 FULLY ACCEPTED             Yes          39
##  7 FULLY DENIED               No           80
##  8 FULLY DENIED               Yes         119
##  9 NO COMMENT                 No           98
## 10 NO COMMENT                 Yes           6
## 11 RAISED QUESTIONS           No           51
## 12 RAISED QUESTIONS           Yes          10

Visualizing the incumbent and non-incumbent counts for each stance category as a grouped bar plot with ggplot2:

ggplot(df3, aes(factor(Stance), count, fill = Incumbent)) +
  geom_col(position = "dodge") + # same as geom_bar(stat = "identity")
  theme(axis.text.x = element_text(angle = 90, size = 10)) +
  scale_fill_brewer(palette = "Set1")
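
By default the stances appear on the axis in alphabetical order. As an optional variation, the forcats package (also part of the tidyverse) can order them by total count instead. The sketch below uses invented stance labels and counts (`toy` and its values are hypothetical) and assumes fct_reorder() with `.fun = sum` is a suitable ordering rule.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical toy counts: totals are A = 3, B = 10, C = 1
toy <- tibble(
  Stance    = c("A", "A", "B", "B", "C", "C"),
  Incumbent = c("No", "Yes", "No", "Yes", "No", "Yes"),
  count     = c(1, 2, 5, 5, 1, 0)
)

# Reorder the stance levels by their summed count, largest first
toy <- toy |>
  mutate(Stance = fct_reorder(Stance, count, .fun = sum, .desc = TRUE))

levels(toy$Stance)

# Grouped bar plot; bars follow the new level order
p <- ggplot(toy, aes(Stance, count, fill = Incumbent)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set1")
```

Calling print(p) draws the plot.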