Introduction
The article I chose from 538 examines an aging Congress. It discusses several reasons the average age of Congress keeps rising, including the large share of Baby Boomers among its members and their still-strong drive to remain part of lawmaking. Here is a link to the article: https://fivethirtyeight.com/features/aging-congress-boomers/
Getting Started
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Here we have loaded the packages that the code below relies on. Note that library() only attaches a package that is already installed; it does not install anything.
Finding our data
congress <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/congress-demographics/data_aging_congress.csv")
## Rows: 29120 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): chamber, state_abbrev, bioname, bioguide_id, generation
## dbl (6): congress, party_code, cmltv_cong, cmltv_chamber, age_days, age_years
## date (2): start_date, birthday
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Here we have read the data directly from GitHub, where 538 stores the raw data as a .csv file, and saved it as a tibble in the object “congress”.
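The message printed by read_csv() suggests specifying the column types. A minimal sketch of what that could look like is below. To stay self-contained it reads two rows copied from the table output in this document (four columns only) from an inline string rather than from the URL; the name congress_sample is an assumption made for illustration.

```r
library(readr)

# Two rows copied from the output shown in this document, limited to
# four columns. I() tells readr to treat the string as literal data.
sample_csv <- I("congress,start_date,chamber,age_days
82,1951-01-03,House,19626
80,1947-01-03,House,14106
")

# Declaring the column types up front silences the column
# specification message and guards against a column being guessed
# as the wrong type.
congress_sample <- read_csv(
  sample_csv,
  col_types = cols(
    congress   = col_double(),
    start_date = col_date(),
    chamber    = col_character(),
    age_days   = col_double()
  )
)

congress_sample
```

The same col_types argument should carry over to the full file, provided all 13 columns are listed or the remaining ones are left to a default.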
Table
congress
## # A tibble: 29,120 × 13
## congress start_date chamber state_abbrev party_code bioname bioguide_id
## <dbl> <date> <chr> <chr> <dbl> <chr> <chr>
## 1 82 1951-01-03 House ND 200 AANDAHL, Fre… A000001
## 2 80 1947-01-03 House VA 100 ABBITT, Watk… A000002
## 3 81 1949-01-03 House VA 100 ABBITT, Watk… A000002
## 4 82 1951-01-03 House VA 100 ABBITT, Watk… A000002
## 5 83 1953-01-03 House VA 100 ABBITT, Watk… A000002
## 6 84 1955-01-03 House VA 100 ABBITT, Watk… A000002
## 7 85 1957-01-03 House VA 100 ABBITT, Watk… A000002
## 8 86 1959-01-03 House VA 100 ABBITT, Watk… A000002
## 9 87 1961-01-03 House VA 100 ABBITT, Watk… A000002
## 10 88 1963-01-03 House VA 100 ABBITT, Watk… A000002
## # ℹ 29,110 more rows
## # ℹ 6 more variables: birthday <date>, cmltv_cong <dbl>, cmltv_chamber <dbl>,
## # age_days <dbl>, age_years <dbl>, generation <chr>
From here, we want to manipulate the data into something more concise by cutting out some variables as well as clarifying others.
Trimming Columns
congress <- congress |>
  select(-bioguide_id,
         -cmltv_cong,
         -cmltv_chamber)
congress
## # A tibble: 29,120 × 10
## congress start_date chamber state_abbrev party_code bioname birthday
## <dbl> <date> <chr> <chr> <dbl> <chr> <date>
## 1 82 1951-01-03 House ND 200 AANDAHL, Fred… 1897-04-09
## 2 80 1947-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 3 81 1949-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 4 82 1951-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 5 83 1953-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 6 84 1955-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 7 85 1957-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 8 86 1959-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 9 87 1961-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## 10 88 1963-01-03 House VA 100 ABBITT, Watki… 1908-05-21
## # ℹ 29,110 more rows
## # ℹ 3 more variables: age_days <dbl>, age_years <dbl>, generation <chr>
The first column I identify as extraneous to a study of age is “bioguide_id”. Additionally, while they are interesting, we can drop “cmltv_cong” and “cmltv_chamber”, since they reflect the length of a member's career rather than their age. The remaining columns tell us who each member is and how old they were, so we keep them. Here, we cut the extraneous columns and overwrite “congress” with a table that no longer contains them.
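Because the chunk above overwrites “congress”, running it a second time would fail: the dropped columns no longer exist. One possible way around this, sketched below, is the tidyselect helper any_of(), which ignores names that are not present. The sample row uses values shown in this document; the two cumulative columns are filled with NA placeholders because their values are not displayed, and the names congress_sample and drop_cols are assumptions for illustration.

```r
library(dplyr)

# One row based on the first record in the table output.
# cmltv_cong and cmltv_chamber are NA placeholders, not real values.
congress_sample <- tibble(
  congress      = 82,
  chamber       = "House",
  state_abbrev  = "ND",
  bioguide_id   = "A000001",
  cmltv_cong    = NA_real_,
  cmltv_chamber = NA_real_
)

drop_cols <- c("bioguide_id", "cmltv_cong", "cmltv_chamber")

# !any_of() keeps every column that is not named in drop_cols and
# does not complain when a name is already missing.
trimmed <- congress_sample |>
  select(!any_of(drop_cols))

# Running the same step again is harmless.
trimmed_again <- trimmed |>
  select(!any_of(drop_cols))

names(trimmed_again)
```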
Identifying and Converting Data
congress |>
  distinct(party_code)
## # A tibble: 14 × 1
## party_code
## <dbl>
## 1 200
## 2 100
## 3 329
## 4 370
## 5 537
## 6 328
## 7 380
## 8 112
## 9 356
## 10 522
## 11 331
## 12 523
## 13 347
## 14 402
From the previous table, we could see that party_code holds only numeric codes. The information provided on GitHub tells us that these codes identify each member's political party. Instead of looking up each code every time, we can determine how many distinct codes there are (as we did above), match each one with its party once, and then replace them all within the data (as we will do below).
Information about the party codes is available here: https://voteview.com/articles/data_help_parties
congress <- congress |>
  mutate(party_code = recode(party_code,
                             "200" = "Republican",
                             "100" = "Democrat",
                             "329" = "Independent Democrat",
                             "370" = "Progressive Party",
                             "537" = "Farmer-Labor Party",
                             "328" = "Independent",
                             "380" = "Socialist Party",
                             "112" = "Conservative Party",
                             "356" = "Union Labor Party",
                             "522" = "American Labor Party",
                             "331" = "Independent Republican",
                             "523" = "American Labor Party (La Guardia)",
                             "347" = "Prohibitionist Party",
                             "402" = "Liberal Party"))
congress
## # A tibble: 29,120 × 10
## congress start_date chamber state_abbrev party_code bioname birthday
## <dbl> <date> <chr> <chr> <chr> <chr> <date>
## 1 82 1951-01-03 House ND Republican AANDAHL, Fred… 1897-04-09
## 2 80 1947-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 3 81 1949-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 4 82 1951-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 5 83 1953-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 6 84 1955-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 7 85 1957-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 8 86 1959-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 9 87 1961-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## 10 88 1963-01-03 House VA Democrat ABBITT, Watki… 1908-05-21
## # ℹ 29,110 more rows
## # ℹ 3 more variables: age_days <dbl>, age_years <dbl>, generation <chr>
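As an alternative to listing every code inside recode(), the mapping could be kept in its own small table and joined on. This is only a sketch of that idea, not the approach used in this write-up: party_lookup and codes are hypothetical names, only three of the fourteen codes are included, and 999 is a made-up code added to show what happens to a value with no match.

```r
library(dplyr)

# A partial lookup table built from three of the codes listed above.
party_lookup <- tibble(
  party_code = c(200, 100, 328),
  party      = c("Republican", "Democrat", "Independent")
)

# A small stand-in for the party_code column; 999 is hypothetical
# and does not appear in the real data.
codes <- tibble(party_code = c(200, 100, 100, 999))

# left_join() keeps every row of codes and fills party with NA
# wherever the lookup table has no matching code.
labelled <- codes |>
  left_join(party_lookup, by = "party_code")

labelled
```

Keeping the mapping in a table makes unmatched codes visible as NA, so a missing entry is easy to spot with a filter on is.na(party).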
The new table presents each member's party by name, with no need to look the codes up elsewhere. Now that the column no longer holds numeric codes, we can also rename it.
congress <- congress |>
  rename(party = party_code)
congress
## # A tibble: 29,120 × 10
## congress start_date chamber state_abbrev party bioname birthday age_days
## <dbl> <date> <chr> <chr> <chr> <chr> <date> <dbl>
## 1 82 1951-01-03 House ND Republi… AANDAH… 1897-04-09 19626
## 2 80 1947-01-03 House VA Democrat ABBITT… 1908-05-21 14106
## 3 81 1949-01-03 House VA Democrat ABBITT… 1908-05-21 14837
## 4 82 1951-01-03 House VA Democrat ABBITT… 1908-05-21 15567
## 5 83 1953-01-03 House VA Democrat ABBITT… 1908-05-21 16298
## 6 84 1955-01-03 House VA Democrat ABBITT… 1908-05-21 17028
## 7 85 1957-01-03 House VA Democrat ABBITT… 1908-05-21 17759
## 8 86 1959-01-03 House VA Democrat ABBITT… 1908-05-21 18489
## 9 87 1961-01-03 House VA Democrat ABBITT… 1908-05-21 19220
## 10 88 1963-01-03 House VA Democrat ABBITT… 1908-05-21 19950
## # ℹ 29,110 more rows
## # ℹ 2 more variables: age_years <dbl>, generation <chr>
Now we have a much more concise set of data that is easy to navigate.
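With the table tidied, a natural next step would be to summarise age per Congress. The sketch below is only an illustration under stated assumptions: it uses three rows whose congress and age_days values appear in the output above, converts days to years by dividing by 365.25 (an approximation; the real data already has an age_years column), and the names congress_sample and age_by_congress are hypothetical.

```r
library(dplyr)

# Three rows taken from the output above: two members of the
# 82nd Congress and one member of the 80th.
congress_sample <- tibble(
  congress = c(82, 82, 80),
  age_days = c(19626, 15567, 14106)
)

age_by_congress <- congress_sample |>
  # Approximate conversion from days to years (assumption).
  mutate(age_years = age_days / 365.25) |>
  group_by(congress) |>
  summarise(
    mean_age_years = mean(age_years),
    members        = n()
  )

age_by_congress
```

Applied to the full table, the same group_by() and summarise() calls could also include chamber to compare the House and the Senate.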
Conclusions
The article makes the case that Congress is currently older than it has ever been, and I believe we can expand our view of the age of our public servants by looking at presidential ages. Joe Biden and Donald Trump are our two oldest presidents as well as our two most recent, so perhaps their staff reflect a similar phenomenon of being older on average. I also think it would be interesting to look at local politicians to see whether they skew older or younger on average.