The article and accompanying data that I chose to work with is titled “Congress Today Is Older Than It’s Ever Been: OK, boomer? More like boomer, OK!”, published on FiveThirtyEight.com. The article can be found here
The article presents some basic statistics and identifies and visualizes trends in the ages of the members of the House of Representatives and the Senate in the US, from the 66th Congress (1919-1921) to the 118th Congress (2023-2025).
# loading in the necessary libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Reading in the data via a link to the raw data on GitHub
congress_data <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-demographics/data_aging_congress.csv")
head(congress_data)
I wanted to restrict the data to the Congresses of the last 20 years (those starting on or after 2003-01-03) and keep only the columns for Congress number, start date, age, chamber, party, and generation. I used age_years to represent the age of each member of Congress.
The code below shows how I filtered the rows and renamed the columns.
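# keep Congresses starting on or after 2003-01-03 and only the columns of interest
congress_sub <- congress_data %>% filter(start_date >= "2003-01-03") %>%
select(congress, start_date, age_years, chamber, party_code, generation)
# give the columns more descriptive names
names(congress_sub) <- c("congress_served", "start_date", "age_years", "chamber", "party", "generation")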
dim(congress_sub)
## [1] 6002 6
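The filter, select, and rename steps above can also be written as a single pipeline, since `select()` accepts `new_name = old_name` pairs. The sketch below is only an illustration: it runs on a small made-up data frame (`toy`, with hypothetical values) rather than the real file, and it converts `start_date` to a `Date` before filtering so the comparison does not rely on string ordering.

```r
library(dplyr)

# Toy stand-in for congress_data (hypothetical values, not taken from the real file)
toy <- data.frame(
  congress   = c(107, 108, 109),
  start_date = c("2001-01-03", "2003-01-03", "2005-01-03"),
  age_years  = c(55.2, 61.7, 48.9),
  chamber    = c("House", "Senate", "House"),
  party_code = c(100, 200, 328),
  generation = c("Silent", "Boomers", "Gen X"),
  stringsAsFactors = FALSE
)

toy_sub <- toy %>%
  # parse the dates first so the filter compares dates, not strings
  mutate(start_date = as.Date(start_date)) %>%
  filter(start_date >= as.Date("2003-01-03")) %>%
  # select() can rename while selecting: new_name = old_name
  select(congress_served = congress, start_date, age_years, chamber,
         party = party_code, generation)

dim(toy_sub)
```

With the toy data, only the two rows starting in 2003 or later should remain, under the same six column names used in `congress_sub`.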
# convert each column to an appropriate type
congress_sub$chamber <- as.factor(congress_sub$chamber)
congress_sub$generation <- as.factor(congress_sub$generation)
congress_sub$party <- as.character(congress_sub$party)
congress_sub$start_date <- as.Date(congress_sub$start_date, tryFormats = "%Y-%m-%d")
glimpse(congress_sub)
## Rows: 6,002
## Columns: 6
## $ congress_served <int> 108, 109, 110, 111, 108, 109, 110, 111, 112, 108, 109,…
## $ start_date <date> 2003-01-03, 2005-01-03, 2007-01-03, 2009-01-03, 2003-…
## $ age_years <dbl> 64.52293, 66.52430, 68.52293, 70.52430, 60.12320, 62.1…
## $ chamber <fct> House, House, House, House, House, House, House, House…
## $ party <chr> "100", "100", "100", "100", "100", "100", "100", "100"…
## $ generation <fct> Silent, Silent, Silent, Silent, Silent, Silent, Silent…
# replace the numeric party codes with readable labels
congress_sub$party[congress_sub$party == "100"] <- "Democrat"
congress_sub$party[congress_sub$party == "200"] <- "Republican"
congress_sub$party[congress_sub$party == "328"] <- "Independent"
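As an alternative to one assignment per code, the same recoding could be sketched with a named lookup vector. This is a minimal base R illustration on a hypothetical vector of codes (`codes`), not the real column. The name `party_lookup` is my own. Any code missing from the lookup comes back as `NA`, which makes unexpected party codes easy to spot.

```r
# Lookup from party code to label (the three codes recoded above)
party_lookup <- c("100" = "Democrat", "200" = "Republican", "328" = "Independent")

# Hypothetical example codes, stored as character like congress_sub$party
codes <- c("100", "200", "328", "100")

# Index the lookup by code; unname() drops the codes from the result
labels <- unname(party_lookup[codes])
labels

# A code that is not in the lookup yields NA
unname(party_lookup["999"])
```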
# bar chart of member counts per generation, ordered from least to most frequent
congress_sub %>%
mutate(generation = generation %>% fct_infreq() %>% fct_rev()) %>%
ggplot(aes(x = generation)) +
geom_bar()
# median age per party for each Congress start date
plot2 <- congress_sub %>% group_by(start_date, party) %>% summarize(med_age_years = round(median(age_years), 2), .groups = "drop")
ggplot(data = plot2, mapping = aes(x = start_date, y = med_age_years)) +
geom_line(aes(color = party)) +
geom_point() +
scale_color_manual(values=c('Blue','Green', 'Red'))
The article mentions several reasons for the increased age of the US Congress. Although not an exhaustive list, some of these reasons include
Other areas that would be interesting to visualize are the bills introduced by the members of Congress and the ages of their constituents.