This project uses data from fivethirtyeight.com to demonstrate tidyverse package capabilities.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/election-deniers/fivethirtyeight_election_deniers.csv"
df <- read.csv(url)
head(df)
## Candidate Incumbent State Office District Stance
## 1 Katie Britt No Alabama Senator N/A Fully denied
## 2 Jerry Carl Yes Alabama Representative 1 Fully denied
## 3 Barry Moore Yes Alabama Representative 2 Fully denied
## 4 Mike Rogers Yes Alabama Representative 3 Fully denied
## 5 Robert Aderholt Yes Alabama Representative 4 Fully denied
## 6 Dale Strong No Alabama Representative 5 Fully denied
## Source
## 1 NBC News
## 2 Congressional roll call, Alabama Political Reporter
## 3 Congressional roll call
## 4 Congressional roll call
## 5 Congressional roll call
## 6 Facebook
## URL
## 1 https://twitter.com/VaughnHillyard/status/1528918324192608257
## 2 https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 3 https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 4 https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 5 https://clerk.house.gov/Votes/202111?Date=01%2F07%2F2021
## 6 https://www.facebook.com/strongforalabama/posts/pfbid0Km4TZaJPUTYuSAnwLNeezfrQSaL3uCNDLhvWFwRkMDpKPByKPTXDFYEdvP2m1rvkl?__cft__[0]=AZVE-LgrXbka-w1K8Z5IBxGpBh61GPUdZxnovjIPky71lJcPQWMquAkzM7W8jVhSl7p6idLzrG9toCil8_CQJUa_1NsiaaLVvWTqhVnrllFsloNysJKgcXmgeC0g3E_M4nJnPgsbVJ4ATcVTKt3uUJsRCZRWe028wDT8sg4hQaIRJsMxEFe46wgxhd-UBLyMZ5OKiWp2sNUyyvnmr92-UpxH&__tn__=%2CO%2CP-R
## Note
## 1
## 2
## 3
## 4
## 5
## 6
The dplyr package provides verbs such as mutate() for transforming the data, chained here with R's native pipe (|>):
df <- df |>
  mutate(Stance = toupper(Stance))
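The `|>` operator is part of base R (4.1 and later), while the `%>%` pipe used later in this document comes from magrittr and is re-exported by dplyr. A minimal sketch, assuming a small made-up data frame (`toy` is hypothetical and not the FiveThirtyEight data), suggests the two are interchangeable for a simple mutate() call:

```r
library(dplyr)

# Hypothetical toy data standing in for the real data set
toy <- data.frame(Stance = c("Fully denied", "No comment"))

# Native pipe (base R >= 4.1)
native <- toy |> mutate(Stance = toupper(Stance))

# magrittr pipe, re-exported by dplyr
magrittr <- toy %>% mutate(Stance = toupper(Stance))

# Compare the two results
identical(native, magrittr)
```

The two pipes do differ in some details (placeholder syntax, for example), so this equivalence should only be assumed for straightforward calls like the one shown.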
It also lets us count the number of times each value occurs:
stance_ct <- df |>
  count(Stance) |>
  arrange(desc(n))
stance_ct
## Stance n
## 1 FULLY DENIED 199
## 2 NO COMMENT 104
## 3 ACCEPTED WITH RESERVATIONS 93
## 4 FULLY ACCEPTED 77
## 5 RAISED QUESTIONS 61
## 6 AVOIDED ANSWERING 18
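The count-then-sort pattern above can probably be shortened with the `sort` argument of count(). A small sketch on made-up data (`toy` and its values are hypothetical) comparing the two forms:

```r
library(dplyr)

# Hypothetical toy data, not the real data set
toy <- data.frame(Stance = c("A", "B", "B", "C", "B", "C"))

# Two-step form: count, then sort by descending frequency
long_form <- toy |>
  count(Stance) |>
  arrange(desc(n))

# One-step form: let count() do the sorting
short_form <- toy |>
  count(Stance, sort = TRUE)

# Compare the two results
all.equal(long_form, short_form)
```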
The ggplot2 package can plot these counts directly from df, without computing stance_ct first. geom_bar() counts the rows in each category by default:
ggplot(df) +
  geom_bar(aes(x = Stance)) +
  coord_flip()
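By default the bars appear in alphabetical order of the stance labels. One possible refinement, sketched here on made-up data (`toy` is hypothetical), is to reorder the factor levels by frequency with forcats so that the most common category ends up at the top of the flipped chart:

```r
library(ggplot2)
library(forcats)

# Hypothetical toy data, not the real data set
toy <- data.frame(Stance = c("A", "B", "B", "C", "B", "C"))

# fct_infreq() orders levels from most to least frequent;
# fct_rev() reverses them because coord_flip() draws the first level at the bottom
toy$Stance <- fct_rev(fct_infreq(toy$Stance))

p <- ggplot(toy) +
  geom_bar(aes(x = Stance)) +
  coord_flip()

levels(toy$Stance)
```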
The tidyr package lets us pivot data:
by_state <- df |> # native pipe; group_by() and summarise() are dplyr verbs
  group_by(State, Stance) |>
  summarise(count = n()) |>
  pivot_wider(names_from = Stance, values_from = count)
## `summarise()` has grouped output by 'State'. You can override using the
## `.groups` argument.
by_state
## # A tibble: 50 × 7
## # Groups: State [50]
## State ACCEPTED WITH RESERVATI…¹ FULLY…² RAISE…³ FULLY…⁴ NO CO…⁵ AVOID…⁶
## <chr> <int> <int> <int> <int> <int> <int>
## 1 Alabama 1 8 2 NA NA NA
## 2 Alaska 1 1 2 1 NA NA
## 3 Arizona 1 9 2 NA 1 NA
## 4 Arkansas 1 1 1 3 1 1
## 5 California 6 13 2 9 18 1
## 6 Colorado 1 4 NA 6 1 NA
## 7 Connecticut 2 NA 2 2 3 NA
## 8 Delaware NA NA NA NA 2 NA
## 9 Florida 2 19 5 2 2 1
## 10 Georgia 6 8 2 2 NA NA
## # … with 40 more rows, and abbreviated variable names
## # ¹`ACCEPTED WITH RESERVATIONS`, ²`FULLY DENIED`, ³`RAISED QUESTIONS`,
## # ⁴`FULLY ACCEPTED`, ⁵`NO COMMENT`, ⁶`AVOIDED ANSWERING`
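The NA entries in the wide table mark state and stance combinations that never occur in the data. If zeros are preferred, the `values_fill` argument of pivot_wider() can supply them. A minimal sketch on made-up data (`toy`, its states, and its stance labels are hypothetical); it uses count(), which returns ungrouped output and so avoids the grouping message:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: state Y has no "ACCEPTED" candidate
toy <- data.frame(
  State  = c("X", "X", "Y"),
  Stance = c("DENIED", "ACCEPTED", "DENIED")
)

wide <- toy |>
  count(State, Stance) |>
  # Fill missing state/stance combinations with 0 instead of NA
  pivot_wider(names_from = Stance, values_from = n, values_fill = 0)

wide
```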
Next, I count the Republican nominees for each office, and then the nominees for each office who fully denied the 2020 election result. Finally, I break down each stance by incumbency and show those counts as a grouped bar plot.
df1 <- df %>% select(Incumbent, Office, Stance)
head(df1)
## Incumbent Office Stance
## 1 No Senator FULLY DENIED
## 2 Yes Representative FULLY DENIED
## 3 Yes Representative FULLY DENIED
## 4 Yes Representative FULLY DENIED
## 5 Yes Representative FULLY DENIED
## 6 No Representative FULLY DENIED
df2 <- df1 %>%
  group_by(Office) %>%
  summarize(count = n())
df2
## # A tibble: 6 × 2
## Office count
## <chr> <int>
## 1 Attorney general 30
## 2 Governor 36
## 3 Representative 424
## 4 Secretary of state 26
## 5 Senator 35
## 6 Senator (unexpired term) 1
df3 <- df1 %>%
  filter(Stance == "FULLY DENIED") %>%
  group_by(Office) %>%
  summarize(count = n())
df3
## # A tibble: 6 × 2
## Office count
## <chr> <int>
## 1 Attorney general 7
## 2 Governor 7
## 3 Representative 170
## 4 Secretary of state 7
## 5 Senator 7
## 6 Senator (unexpired term) 1
# df3 is reused here: it now holds the counts by stance and incumbency
df3 <- df1 %>%
  group_by(Stance, Incumbent) %>%
  summarize(count = n())
## `summarise()` has grouped output by 'Stance'. You can override using the
## `.groups` argument.
df3
## # A tibble: 12 × 3
## # Groups: Stance [6]
## Stance Incumbent count
## <chr> <chr> <int>
## 1 ACCEPTED WITH RESERVATIONS No 42
## 2 ACCEPTED WITH RESERVATIONS Yes 51
## 3 AVOIDED ANSWERING No 15
## 4 AVOIDED ANSWERING Yes 3
## 5 FULLY ACCEPTED No 38
## 6 FULLY ACCEPTED Yes 39
## 7 FULLY DENIED No 80
## 8 FULLY DENIED Yes 119
## 9 NO COMMENT No 98
## 10 NO COMMENT Yes 6
## 11 RAISED QUESTIONS No 51
## 12 RAISED QUESTIONS Yes 10
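The "grouped output" message appears because summarise() drops only the last grouping variable and keeps the rest. Passing `.groups = "drop"` is one way to return an ungrouped result and silence the message. A small sketch on made-up data (`toy` is hypothetical):

```r
library(dplyr)

# Hypothetical toy data, not the real data set
toy <- data.frame(
  Stance    = c("A", "A", "B", "B", "B"),
  Incumbent = c("Yes", "No", "Yes", "Yes", "No")
)

counts <- toy |>
  group_by(Stance, Incumbent) |>
  # .groups = "drop" removes all grouping from the result
  summarise(count = n(), .groups = "drop")

# Check whether the result still carries grouping
is_grouped_df(counts)
```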
ggplot(df3, aes(factor(Stance), count, fill = Incumbent)) +
  geom_col(position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, size = 10)) +
  scale_fill_brewer(palette = "Set1")