Data 607 Week 1 Assignment

Introduction

The article I chose from 538 is about an aging Congress. It discusses a number of reasons the average age of Congress is rising, including the large number of Baby Boomers among its constituents and their continued drive to remain a part of lawmaking. Here is a link to the article: https://fivethirtyeight.com/features/aging-congress-boomers/

Getting Started

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

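Here we have loaded the packages used in the code that follows. Note that library() only attaches packages that are already installed, and that dplyr is part of the core tidyverse, so loading tidyverse alone would have been enough.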

Finding our data

congress <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/congress-demographics/data_aging_congress.csv")

## Rows: 29120 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): chamber, state_abbrev, bioname, bioguide_id, generation
## dbl  (6): congress, party_code, cmltv_cong, cmltv_chamber, age_days, age_years
## date (2): start_date, birthday
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Here we have taken our data from GitHub, where 538 has stored the raw data in a .csv file, and loaded it into a tibble named “congress”.
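The column specification message printed by read_csv() can be avoided by declaring the column types up front. The following is a minimal sketch, not the code used for this report: it reads a small inline sample (two rows and six columns copied from the table printed in this report) instead of the full file, and the names sample_csv and congress_sample are illustrative.

```r
library(readr)

# Inline sample mimicking a few columns of the real file (illustrative only).
# I() tells read_csv() to treat the string as literal data, not a file path.
sample_csv <- I("congress,start_date,chamber,party_code,birthday,age_days
82,1951-01-03,House,200,1897-04-09,19626
80,1947-01-03,House,100,1908-05-21,14106")

# Declaring the types explicitly means read_csv() does not have to guess them
# and does not print the column specification message.
congress_sample <- read_csv(
  sample_csv,
  col_types = cols(
    congress   = col_integer(),
    start_date = col_date(format = "%Y-%m-%d"),
    chamber    = col_character(),
    party_code = col_integer(),
    birthday   = col_date(format = "%Y-%m-%d"),
    age_days   = col_double()
  )
)

congress_sample
```

If guessing the types is acceptable and only the message is unwanted, passing show_col_types = FALSE to read_csv() is the shorter route, as the message itself suggests.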

Table

congress

## # A tibble: 29,120 × 13
##    congress start_date chamber state_abbrev party_code bioname       bioguide_id
##       <dbl> <date>     <chr>   <chr>             <dbl> <chr>         <chr>      
##  1       82 1951-01-03 House   ND                  200 AANDAHL, Fre… A000001    
##  2       80 1947-01-03 House   VA                  100 ABBITT, Watk… A000002    
##  3       81 1949-01-03 House   VA                  100 ABBITT, Watk… A000002    
##  4       82 1951-01-03 House   VA                  100 ABBITT, Watk… A000002    
##  5       83 1953-01-03 House   VA                  100 ABBITT, Watk… A000002    
##  6       84 1955-01-03 House   VA                  100 ABBITT, Watk… A000002    
##  7       85 1957-01-03 House   VA                  100 ABBITT, Watk… A000002    
##  8       86 1959-01-03 House   VA                  100 ABBITT, Watk… A000002    
##  9       87 1961-01-03 House   VA                  100 ABBITT, Watk… A000002    
## 10       88 1963-01-03 House   VA                  100 ABBITT, Watk… A000002    
## # ℹ 29,110 more rows
## # ℹ 6 more variables: birthday <date>, cmltv_cong <dbl>, cmltv_chamber <dbl>,
## #   age_days <dbl>, age_years <dbl>, generation <chr>

From here, we want to manipulate the data into something more concise by cutting out some variables as well as clarifying others.

Trimming Columns

# Drop the three columns that are not needed for an age analysis
congress <- congress |>
  select(-bioguide_id,
         -cmltv_cong,
         -cmltv_chamber)
congress

## # A tibble: 29,120 × 10
##    congress start_date chamber state_abbrev party_code bioname        birthday  
##       <dbl> <date>     <chr>   <chr>             <dbl> <chr>          <date>    
##  1       82 1951-01-03 House   ND                  200 AANDAHL, Fred… 1897-04-09
##  2       80 1947-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
##  3       81 1949-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
##  4       82 1951-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
##  5       83 1953-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
##  6       84 1955-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
##  7       85 1957-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
##  8       86 1959-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
##  9       87 1961-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
## 10       88 1963-01-03 House   VA                  100 ABBITT, Watki… 1908-05-21
## # ℹ 29,110 more rows
## # ℹ 3 more variables: age_days <dbl>, age_years <dbl>, generation <chr>

The first column I identify as extraneous with regard to age is “bioguide_id”. Additionally, while they are interesting, we can drop “cmltv_cong” and “cmltv_chamber”, as those reflect members’ careers rather than their ages. The remaining columns identify each member and trace their age, so we keep them. Here, we cut the extraneous columns and overwrite “congress” with the trimmed table.
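When the columns to drop are kept in a character vector, the tidyselect helpers all_of() and any_of() make the intent explicit. This is a minimal sketch on a toy tibble: the cmltv_cong and cmltv_chamber values are made up for illustration, and toy, drop_cols and trimmed are hypothetical names.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the congress table; the cmltv_* values are made up.
toy <- tibble(
  congress      = c(82, 80),
  bioguide_id   = c("A000001", "A000002"),
  cmltv_cong    = c(1, 1),
  cmltv_chamber = c(1, 1),
  age_days      = c(19626, 14106)
)

drop_cols <- c("bioguide_id", "cmltv_cong", "cmltv_chamber")

# all_of() raises an error if any listed column is missing, which catches typos.
trimmed <- toy |>
  select(-all_of(drop_cols))

# any_of() silently skips missing columns, so re-running on an already
# trimmed table does not fail.
trimmed_again <- trimmed |>
  select(-any_of(drop_cols))

names(trimmed)
```

The strict all_of() form is the safer default in a one-off script, while any_of() suits a chunk that may be re-run after the table has already been overwritten.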

Identifying and Converting Data

congress |>
  distinct(party_code)

## # A tibble: 14 × 1
##    party_code
##         <dbl>
##  1        200
##  2        100
##  3        329
##  4        370
##  5        537
##  6        328
##  7        380
##  8        112
##  9        356
## 10        522
## 11        331
## 12        523
## 13        347
## 14        402

From the previous table, we can see that party_code contains only numeric codes. The documentation on GitHub tells us that these codes identify each member’s political party. Rather than looking up a code every time, we can determine how many distinct codes there are (as we did above), match each one with its party once, and replace them all within the data (as we will do below).

Information about party codes here: https://voteview.com/articles/data_help_parties
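One alternative way to do this matching is to keep the code-to-party pairs in a small lookup table and join it onto the data. The sketch below is a hedged illustration, not the approach used in this report: it runs on a toy tibble, includes only three of the fourteen codes, and uses a made-up code (999) to show what an unmapped value looks like. The names party_lookup, members, labelled and unmapped are hypothetical.

```r
library(dplyr)
library(tibble)

# Lookup table: one row per party code (only three codes shown here).
party_lookup <- tribble(
  ~party_code, ~party,
  200,         "Republican",
  100,         "Democrat",
  328,         "Independent"
)

# Toy stand-in for the congress table; 999 is a made-up, unmapped code.
members <- tibble(party_code = c(200, 100, 100, 328, 999))

# Attach the party label to every row by joining on the code.
labelled <- members |>
  left_join(party_lookup, by = "party_code")

# Codes missing from the lookup table come back as NA, so gaps are easy to find.
unmapped <- labelled |>
  filter(is.na(party)) |>
  distinct(party_code)

labelled
```

A lookup table keeps the mapping in one place as data, which is convenient if more codes appear later. The recode() call that follows takes the more direct route of listing each pair inline.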

congress <- congress |>
  mutate(party_code = recode(party_code, "200" = "Republican",
                             "100" = "Democrat",
                             "329" = "Independent Democrat",
                             "370" = "Progressive Party",
                             "537" = "Farmer-Labor Party",
                             "328" = "Independent",
                             "380" = "Socialist Party",
                             "112" = "Conservative Party",
                             "356" = "Union Labor Party",
                             "522" = "American Labor Party",
                             "331" = "Independent Republican",
                             "523" = "American Labor Party (La Guardia)",
                             "347" = "Prohibitionist Party",
                             "402" = "Liberal Party"))
congress

## # A tibble: 29,120 × 10
##    congress start_date chamber state_abbrev party_code bioname        birthday  
##       <dbl> <date>     <chr>   <chr>        <chr>      <chr>          <date>    
##  1       82 1951-01-03 House   ND           Republican AANDAHL, Fred… 1897-04-09
##  2       80 1947-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
##  3       81 1949-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
##  4       82 1951-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
##  5       83 1953-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
##  6       84 1955-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
##  7       85 1957-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
##  8       86 1959-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
##  9       87 1961-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
## 10       88 1963-01-03 House   VA           Democrat   ABBITT, Watki… 1908-05-21
## # ℹ 29,110 more rows
## # ℹ 3 more variables: age_days <dbl>, age_years <dbl>, generation <chr>

The new table presents each member’s party directly, without the need to look it up elsewhere. Now that the column no longer holds numeric codes, we can also rename it to match its contents.

congress <- congress |>
  rename(party = party_code)
congress

## # A tibble: 29,120 × 10
##    congress start_date chamber state_abbrev party    bioname birthday   age_days
##       <dbl> <date>     <chr>   <chr>        <chr>    <chr>   <date>        <dbl>
##  1       82 1951-01-03 House   ND           Republi… AANDAH… 1897-04-09    19626
##  2       80 1947-01-03 House   VA           Democrat ABBITT… 1908-05-21    14106
##  3       81 1949-01-03 House   VA           Democrat ABBITT… 1908-05-21    14837
##  4       82 1951-01-03 House   VA           Democrat ABBITT… 1908-05-21    15567
##  5       83 1953-01-03 House   VA           Democrat ABBITT… 1908-05-21    16298
##  6       84 1955-01-03 House   VA           Democrat ABBITT… 1908-05-21    17028
##  7       85 1957-01-03 House   VA           Democrat ABBITT… 1908-05-21    17759
##  8       86 1959-01-03 House   VA           Democrat ABBITT… 1908-05-21    18489
##  9       87 1961-01-03 House   VA           Democrat ABBITT… 1908-05-21    19220
## 10       88 1963-01-03 House   VA           Democrat ABBITT… 1908-05-21    19950
## # ℹ 29,110 more rows
## # ℹ 2 more variables: age_years <dbl>, generation <chr>

Now we have a much more concise set of data that is easy to navigate.
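With the table in this shape, a natural next step is to summarise age by Congress. The following is a minimal sketch under stated assumptions: it uses a toy tibble built from the first five rows printed above rather than the full data, converts days to years by dividing by 365.25 (an approximation), and the names toy_congress and age_by_congress are illustrative.

```r
library(dplyr)
library(tibble)

# Toy stand-in: congress number and age in days from the first five printed rows.
toy_congress <- tibble(
  congress = c(82, 80, 81, 82, 83),
  age_days = c(19626, 14106, 14837, 15567, 16298)
)

# Median age per Congress, converted from days to (approximate) years.
age_by_congress <- toy_congress |>
  group_by(congress) |>
  summarise(
    members          = n(),
    median_age_years = median(age_days) / 365.25
  ) |>
  arrange(congress)

age_by_congress
```

Run on the full table, the same summary would give one row per Congress, which could then be plotted to check whether the median age rises over time.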

Conclusions

The article argues that Congress is currently older than it has ever been, and I believe we can expand our view of the age of our public servants by looking at presidential ages. Joe Biden and Donald Trump are our two oldest presidents and also our most recent; perhaps their staffs reflect a similar phenomenon of being, on average, older. It could also be interesting to look at local politicians to see whether they skew older or younger on average.

Samuel Crummett

2025-02-03