Overview

For this weekly assignment I will look at historical data on the ages of members of Congress. This is particularly relevant because Congress today is older than it has ever been.

https://fivethirtyeight.com/features/aging-congress-boomers/

data_aging_congress.csv contains information about the age of every member of the U.S. Senate and House from the 66th Congress (1919-1921) to the 118th Congress (2023-2025). Data is as of March 29, 2023, and is based on all voting members who served in either the Senate or House in each Congress. The data excludes delegates or resident commissioners from non-states. Any member who served in both chambers in the same Congress was assigned to the chamber in which they cast more votes. We began with the 66th Congress because it was the first Congress in which all senators had been directly elected, rather than elected by state legislatures, following the ratification of the 17th Amendment in 1913.

Our goal with this data set is to see whether there are any meaningful trends in the historical age of the US Congress. Time permitting, I will also break down which traits are correlated with older members of Congress.

The code below loads the tidyverse library and reads the raw CSV data, which is posted on 538’s GitHub page:

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
urlfile="https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-demographics/data_aging_congress.csv"

CongressionalAge<-read_csv(url(urlfile))
## Rows: 29120 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): chamber, state_abbrev, bioname, bioguide_id, generation
## dbl  (6): congress, party_code, cmltv_cong, cmltv_chamber, age_days, age_years
## date (2): start_date, birthday
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df <- tibble(CongressionalAge)


summary(df)
##     congress        start_date           chamber          state_abbrev      
##  Min.   : 66.00   Min.   :1919-03-04   Length:29120       Length:29120      
##  1st Qu.: 79.00   1st Qu.:1945-01-03   Class :character   Class :character  
##  Median : 92.00   Median :1971-01-03   Mode  :character   Mode  :character  
##  Mean   : 91.88   Mean   :1970-10-18                                        
##  3rd Qu.:105.00   3rd Qu.:1997-01-03                                        
##  Max.   :118.00   Max.   :2023-01-03                                        
##    party_code      bioname          bioguide_id           birthday         
##  Min.   :100.0   Length:29120       Length:29120       Min.   :1835-06-10  
##  1st Qu.:100.0   Class :character   Class :character   1st Qu.:1891-12-21  
##  Median :100.0   Mode  :character   Mode  :character   Median :1918-11-22  
##  Mean   :146.7                                         Mean   :1917-01-24  
##  3rd Qu.:200.0                                         3rd Qu.:1943-05-16  
##  Max.   :537.0                                         Max.   :1997-01-17  
##    cmltv_cong     cmltv_chamber       age_days       age_years    
##  Min.   : 1.000   Min.   : 1.000   Min.   : 8644   Min.   :23.67  
##  1st Qu.: 2.000   1st Qu.: 2.000   1st Qu.:16732   1st Qu.:45.81  
##  Median : 4.000   Median : 4.000   Median :19523   Median :53.45  
##  Mean   : 5.414   Mean   : 5.112   Mean   :19626   Mean   :53.73  
##  3rd Qu.: 8.000   3rd Qu.: 7.000   3rd Qu.:22359   3rd Qu.:61.22  
##  Max.   :30.000   Max.   :30.000   Max.   :35824   Max.   :98.08  
##   generation       
##  Length:29120      
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Modifying the columns

# changing the name of the column 'party_code' to just 'party', changing the party codes to what party they actually correspond to based on the README file. 

names(df)[names(df) == "party_code"] <- "party"

df$party <- ifelse(df$party == 100, "Democrat",
                   ifelse(df$party == 200, "Republican",
                          ifelse(df$party == 328, "Independent", NA)))

glimpse(df)
## Rows: 29,120
## Columns: 13
## $ congress      <dbl> 82, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, …
## $ start_date    <date> 1951-01-03, 1947-01-03, 1949-01-03, 1951-01-03, 1953-01…
## $ chamber       <chr> "House", "House", "House", "House", "House", "House", "H…
## $ state_abbrev  <chr> "ND", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "V…
## $ party         <chr> "Republican", "Democrat", "Democrat", "Democrat", "Democ…
## $ bioname       <chr> "AANDAHL, Fred George", "ABBITT, Watkins Moorman", "ABBI…
## $ bioguide_id   <chr> "A000001", "A000002", "A000002", "A000002", "A000002", "…
## $ birthday      <date> 1897-04-09, 1908-05-21, 1908-05-21, 1908-05-21, 1908-05…
## $ cmltv_cong    <dbl> 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 2, 3, 4…
## $ cmltv_chamber <dbl> 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 2, 3, 4…
## $ age_days      <dbl> 19626, 14106, 14837, 15567, 16298, 17028, 17759, 18489, …
## $ age_years     <dbl> 53.73306, 38.62012, 40.62149, 42.62012, 44.62149, 46.620…
## $ generation    <chr> "Lost", "Greatest", "Greatest", "Greatest", "Greatest", …
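
The nested ifelse() above works, but it silently turns every other party code into NA (the summary above shows codes as high as 537). A minimal sketch of an alternative, using a named lookup vector, makes it easy to list which codes were left unmapped. It runs on a few made-up codes rather than the real column, and the names party_code, party_lookup and unmapped are my own placeholders:

```r
# Toy party codes for illustration only; the real values are in the
# original party_code column of the data set.
party_code <- c(100, 200, 328, 537, 100)

# Named lookup vector: names are the codes, values are the labels.
party_lookup <- c("100" = "Democrat",
                  "200" = "Republican",
                  "328" = "Independent")

# Codes missing from the lookup come back as NA, as with the nested ifelse().
party <- unname(party_lookup[as.character(party_code)])

# List the codes that were not mapped, so the NAs are a deliberate choice.
unmapped <- sort(unique(party_code[is.na(party)]))

print(party)
print(unmapped)
```

Checking unmapped against the README before plotting by party would show how many members are being dropped into NA.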

Attempting to Plot

My goal here is to see whether we can establish a trend in the age of members of Congress over time. Using ggplot, I plot every member's age, grouped by Congress (that is, by the set of members who served together).

ggplot(data = df, aes(x = congress, y = age_years)) +
  geom_jitter(width = 0.1, height = 0, size = 2, alpha = 0.6, color = "darkblue") +
  xlab("Congress") +
  ylab("Age") +
  theme_minimal()

The above plot is not the most practical, so I am going to try a box and whisker plot instead:

# Create a single graph with multiple box and whisker plots
ggplot(data = df, aes(x = factor(congress), y = age_years, group = congress)) +
  geom_boxplot(width = 0.5, fill = "lightblue", color = "darkblue", alpha = 0.6) +
  xlab("Congress") +
  ylab("Age") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 6))

# Group by congress and calculate mean and median age
sortedCongress <- df %>%
  group_by(congress) %>%
  summarize(avg_age = mean(age_years, na.rm = TRUE),
            median_age = median(age_years, na.rm = TRUE))

# Plotting the average and median age
ggplot(sortedCongress, aes(x = congress)) +
  geom_point(aes(y = avg_age, color = "Average Age")) +
  geom_point(aes(y = median_age, color = "Median Age")) +
  xlab("Congress") +
  ylab("Age") +
  scale_color_manual(values = c("blue", "red"), guide = guide_legend(title = NULL)) +
  theme_minimal()

# Create a line graph for average and median age
ggplot(sortedCongress, aes(x = congress)) +
  geom_line(aes(y = avg_age, color = "Average Age")) +
  geom_line(aes(y = median_age, color = "Median Age")) +
  xlab("Congress") +
  ylab("Age") +
  scale_color_manual(values = c("blue", "red"), guide = guide_legend(title = NULL)) +
  theme_minimal()

# Display the summary of sortedCongress
glimpse(sortedCongress)
## Rows: 53
## Columns: 3
## $ congress   <dbl> 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,…
## $ avg_age    <dbl> 51.73002, 52.60360, 52.56625, 53.22577, 53.95525, 54.61430,…
## $ median_age <dbl> 51.02533, 51.76044, 52.22450, 53.25530, 54.29158, 54.99521,…

The plots reveal a clear pattern. The current Congress has the highest average age of any Congress in the data set, which covers the era in which all senators have been directly elected. This appears to continue a trend that began around the 97th Congress, which has the lowest average age in the data set. The average age of members has risen steadily since then to its current value.
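
To check the youngest and oldest Congress directly rather than reading them off the plot, something like the following could be run against sortedCongress. This sketch uses a small made-up table (sorted_toy, my own placeholder) so that it runs on its own; the numbers are not from the data set.

```r
# Toy stand-in for sortedCongress; the ages are invented for illustration.
sorted_toy <- data.frame(
  congress = c(95, 96, 97, 98, 99),
  avg_age  = c(50.1, 49.8, 49.5, 49.9, 50.6)
)

# Rows with the lowest and highest average age
youngest <- sorted_toy[which.min(sorted_toy$avg_age), ]
oldest   <- sorted_toy[which.max(sorted_toy$avg_age), ]

# Change in average age from each Congress to the next (NA for the first row)
sorted_toy$change <- c(NA, diff(sorted_toy$avg_age))

print(youngest)
print(oldest)
print(sorted_toy)
```

The change column gives a quick way to see whether the rise is steady or concentrated in a few Congresses.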

Findings and Recommendations

With the code in this document we have looked at the trend in Congressional ages over time and found that the current Congress is in fact the oldest in this data set. Below I look at how individual states and generations have contributed to this trend over time. In the future, I would like to create separate trendlines for each of the two major parties in order to evaluate whether they have historically differed in their preference for older candidates.

To do that, I would split the data set by the party affiliation, state, and generation columns available in the CSV.

I was able to get some meaningful results by state and generation. However, the code below needs further modification before it can effectively break the data down by historical party affiliation.

# Filter data for Democrat and Republican parties
democrat_df <- df %>%
  filter(party == "Democrat")

republican_df <- df %>%
  filter(party == "Republican")

# Summarize average and median age per Congress for each party
DemocratsortedCongress <- democrat_df %>%
  group_by(congress) %>%
  summarize(avg_age = mean(age_years, na.rm = TRUE),
            median_age = median(age_years, na.rm = TRUE))

RepublicansortedCongress <- republican_df %>%
  group_by(congress) %>%
  summarize(avg_age = mean(age_years, na.rm = TRUE),
            median_age = median(age_years, na.rm = TRUE))


# Create a line graph for average and median age (Democrats only)
ggplot(DemocratsortedCongress, aes(x = congress)) +
  geom_line(aes(y = avg_age, color = "Average Age")) +
  geom_line(aes(y = median_age, color = "Median Age")) +
  xlab("Congress") +
  ylab("Age") +
  scale_color_manual(values = c("blue", "red"), guide = guide_legend(title = NULL)) +
  theme_minimal()
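
One possible way to put both parties on a single chart is to group by Congress and party together and map party to the line color. This is only a sketch on invented rows (the toy tibble below stands in for df, and party_trend and party_plot are placeholder names), not something I have run on the full data:

```r
library(dplyr)
library(ggplot2)

# Toy rows standing in for df; the ages are invented for illustration.
toy <- tibble(
  congress  = c(117, 117, 117, 117, 118, 118, 118, 118),
  party     = c("Democrat", "Democrat", "Republican", "Republican",
                "Democrat", "Democrat", "Republican", "Republican"),
  age_years = c(60, 62, 55, 59, 61, 65, 56, 58)
)

# One row per Congress and party instead of two separate data frames
party_trend <- toy %>%
  filter(party %in% c("Democrat", "Republican")) %>%
  group_by(congress, party) %>%
  summarize(avg_age = mean(age_years, na.rm = TRUE),
            median_age = median(age_years, na.rm = TRUE),
            .groups = "drop")

# One line per party, colored by party
party_plot <- ggplot(party_trend, aes(x = congress, y = avg_age, color = party)) +
  geom_line() +
  xlab("Congress") +
  ylab("Average age") +
  scale_color_manual(values = c(Democrat = "blue", Republican = "red"),
                     guide = guide_legend(title = NULL)) +
  theme_minimal()

print(party_trend)
```

Printing party_plot would draw the chart. Grouping on two columns avoids repeating the summarize step for each party.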

Now looking at the data by state, the table below ranks the states by the average age of the members of Congress elected from each one over the whole period.

##    state_abbrev  avg_age
## 46           VT 60.62154
## 11           HI 59.18275
## 1            AK 59.04962
## 50           WY 58.32722
## 27           NC 56.56834
## 28           ND 56.29585
## 45           VA 56.25488
## 49           WV 55.99294
## 29           NE 55.90023
## 8            DE 55.68293
## 44           UT 55.63519
## 33           NV 55.17901
## 12           IA 55.06993
## 4            AZ 54.87047
## 17           KY 54.78432
## 14           IL 54.77631
## 26           MT 54.77056
## 20           MD 54.49438
## 6            CO 54.40649
## 5            CA 54.40593
## 37           OR 54.39561
## 16           KS 54.35446
## 40           SC 54.31028
## 31           NJ 54.12764
## 19           MA 53.96023
## 38           PA 53.92870
## 10           GA 53.86750
## 13           ID 53.69074
## 43           TX 53.67426
## 9            FL 53.55584
## 39           RI 53.49647
## 2            AL 53.26870
## 32           NM 53.19441
## 24           MO 53.17416
## 35           OH 53.16196
## 30           NH 52.86027
## 34           NY 52.79494
## 21           ME 52.72298
## 42           TN 52.71196
## 22           MI 52.69646
## 25           MS 52.68117
## 3            AR 52.41892
## 47           WA 52.40123
## 7            CT 51.66845
## 48           WI 51.59190
## 41           SD 51.45681
## 18           LA 50.94785
## 23           MN 50.90118
## 15           IN 50.62130
## 36           OK 50.12730
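
The code that produced the state table is not shown. The row numbers in the output suggest base R's aggregate() followed by order(), the same pattern used for the generation tables, but that is a guess. A minimal sketch of that approach on invented rows (toy and state_avg_age are placeholder names):

```r
# Toy rows standing in for df; the ages are invented for illustration.
toy <- data.frame(
  state_abbrev = c("VT", "VT", "OK", "OK", "NY"),
  age_years    = c(62, 60, 49, 51, 53)
)

# Average age per state
state_avg_age <- aggregate(age_years ~ state_abbrev, data = toy, FUN = mean)
colnames(state_avg_age) <- c("state_abbrev", "avg_age")

# Sort by average age in descending order
state_avg_age <- state_avg_age[order(-state_avg_age$avg_age), ]

print(state_avg_age)
```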

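I also wanted to look at the data by generation. The general pattern is that older generations have a higher average age. This is at least partly an artifact of the time window: because the data starts in 1919, the oldest generations appear only through their late-career members, while the youngest generations have only just begun entering Congress.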

# Calculate average age for each generation
generation_avg_age <- aggregate(age_years ~ generation, data = df, FUN = mean, na.rm = TRUE)

# Count rows for each generation (one row per member per Congress, so a member who served several terms is counted once per term)
generation_count <- aggregate(age_years ~ generation, data = df, FUN = length)

# Merge the two aggregated data frames
generation_summary <- merge(generation_avg_age, generation_count, by = "generation")

# Rename the columns in the result
colnames(generation_summary) <- c("generation", "avg_age", "num_congressmen")

# Sort the data frame by average age in descending order
generation_summary <- generation_summary[order(-generation_summary$avg_age), ]

# Print the sorted data frame
generation_summary
##     generation  avg_age num_congressmen
## 4       Gilded 82.60151              15
## 9  Progressive 68.11945             485
## 8   Missionary 57.88836            4768
## 10      Silent 54.02800            5601
## 6         Lost 53.36611            4732
## 1      Boomers 53.11984            5108
## 5     Greatest 52.10814            7147
## 2        Gen X 44.84130            1130
## 7   Millennial 36.18125             133
## 3        Gen Z 25.96030               1
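
Because num_congressmen counts rows rather than people, it may be worth reporting both. A minimal dplyr sketch, assuming bioguide_id uniquely identifies a member; it runs on invented rows, and toy, num_terms and num_members are placeholder names:

```r
library(dplyr)

# Toy rows standing in for df; IDs and ages are invented for illustration.
toy <- tibble(
  generation  = c("Silent", "Silent", "Silent", "Boomers", "Boomers"),
  bioguide_id = c("S000001", "S000001", "S000002", "B000001", "B000002"),
  age_years   = c(50, 52, 60, 45, 47)
)

generation_summary_toy <- toy %>%
  group_by(generation) %>%
  summarize(avg_age = mean(age_years, na.rm = TRUE),
            num_terms = n(),                        # rows: member-Congress pairs
            num_members = n_distinct(bioguide_id),  # distinct people
            .groups = "drop") %>%
  arrange(desc(avg_age))

print(generation_summary_toy)
```

This also replaces the aggregate, merge, rename and order steps with a single pipeline.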
# Doing the same thing, but looking at the cumulative number of Congresses served for each generation, as opposed to the age of the person.


# Calculate the average cumulative number of Congresses served for each generation
generation_cmltv_cong <- aggregate(cmltv_cong ~ generation, data = df, FUN = mean, na.rm = TRUE)

# Count rows for each generation (one row per member per Congress)
generation_count <- aggregate(cmltv_cong ~ generation, data = df, FUN = length)

# Merge the two aggregated data frames
generation_summary_cmltv_cong <- merge(generation_cmltv_cong, generation_count, by = "generation")

# Rename the columns in the result
colnames(generation_summary_cmltv_cong) <- c("generation", "cmltv_cong", "num_congressmen")

# Sort the data frame by average cumulative Congresses served in descending order
generation_summary_cmltv_cong <- generation_summary_cmltv_cong[order(-generation_summary_cmltv_cong$cmltv_cong), ]

# Print the sorted data frame
generation_summary_cmltv_cong
##     generation cmltv_cong num_congressmen
## 4       Gilded  10.933333              15
## 9  Progressive   7.340206             485
## 10      Silent   6.105517            5601
## 5     Greatest   5.726459            7147
## 8   Missionary   5.379195            4768
## 6         Lost   5.114117            4732
## 1      Boomers   4.925020            5108
## 2        Gen X   3.154867            1130
## 7   Millennial   1.721805             133
## 3        Gen Z   1.000000               1