Overview

I’m a politics junkie, and I’ve been following Nate Silver’s political writing and forecasts since the spring of 2008. While I am a big fan of 538’s analysis and commentary, I often find myself wishing they would take their analysis further. This article by Geoffrey Skelley is a great example. It breaks down the age of each member of each Congress and then shows the change in the median age of the House and Senate over time. Reading it, I was left wondering whether one party’s representatives are meaningfully older than the other’s, and how those numbers have looked historically. Fortunately, I have the tools to find out!

Initial Data

In this code block, I import the CSV from FiveThirtyEight’s GitHub repository and read it into a data frame.

datasource <-
  "https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-demographics/data_aging_congress.csv"

age_data <- read.csv(url(datasource))

Transforming the Data

In this code block, I take only the subset of the data I’m interested in and clean it up. I limit the columns to the Congress the member served in, the state they represented, the party they belonged to, and their age, and I rename the columns to make the names simpler. Since the numeric party codes are not informative on their own and I’m really only interested in Democrats and Republicans, I overwrite the data in the “party” column with “D” for Democrats (codes 100 and 329), “R” for Republicans (codes 200 and 331), and “I” for everyone else.

useful_age_data <- subset(age_data, select = c("congress","state_abbrev","party_code","age_years"))

colnames(useful_age_data) <- c("congress","state","party","age")

# Recode the numeric party codes: 100 and 329 become "D", 200 and 331 become "R"
useful_age_data["party"][useful_age_data["party"] == 100 | useful_age_data["party"] == 329] <- "D"
useful_age_data["party"][useful_age_data["party"] == 200 | useful_age_data["party"] == 331] <- "R"
# Every remaining code is grouped together as "I"
useful_age_data["party"][useful_age_data["party"] != "D" & useful_age_data["party"] != "R"] <- "I"
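
The same recoding can also be written as a single vectorized expression. This is only a sketch on a toy vector, not the code used for the analysis: `party_code` here is a made-up stand-in for the real column, and 328 is included simply as an example of a code that falls into the “everyone else” group.

```r
# Toy stand-in for the party_code column (values chosen for illustration only)
party_code <- c(100, 200, 329, 331, 328)

# Nested ifelse(): codes 100/329 map to "D", 200/331 to "R", anything else to "I"
party <- ifelse(party_code %in% c(100, 329), "D",
         ifelse(party_code %in% c(200, 331), "R", "I"))

print(party)
```

Because the result is built in one pass, the column never goes through an intermediate state where numbers and letters are mixed.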

Creating a New Data Frame

In this code block, I create the data frame that will be used to generate the graph of median party age over time. This requires one row for each Congress and, because I would prefer the x-axis to be the year (meaningful to most readers) instead of the Congress (meaningful to a small subset of readers), I need a column for the year as well. The 66th Congress convened in 1919 and each Congress lasts two years, so the starting year follows directly from the Congress number. Then I rename the columns and put the rows in chronological order.

library(dplyr)  # provides arrange()

unique_congress <- unique(useful_age_data$congress)
# The 66th Congress began in 1919; each later Congress starts two years after the previous one
congress_year <- 1919 + 2 * (unique_congress - 66)
party_age_over_time <- data.frame(unique_congress, congress_year)
colnames(party_age_over_time) <- c("congress", "year")
party_age_over_time <- arrange(party_age_over_time, congress)
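
As a quick sanity check on the year arithmetic, here is a minimal sketch that wraps the formula in a function and evaluates it for three Congresses. The name `congress_to_year` is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper (not part of the original code): first year of a given Congress
congress_to_year <- function(congress) {
  1919 + 2 * (congress - 66)
}

# The 66th, 100th, and 118th Congresses
first_years <- congress_to_year(c(66, 100, 118))
print(first_years)  # 1919 1987 2023
```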

Adding Median Age Data for Each Party

I’m sure there’s a better way to do this, but in this code block I calculate the median age for each party in each Congress. For each Congress, I subset the data to the members of one party, take the median of the “age” column, and store it in the row for that Congress; the temporary subset is overwritten on the next iteration of the for loop. I run the loop once per party to create the two columns. I’m continuing to research alternative methods and am eager to streamline this process.

# Pre-allocate the two result columns
party_age_over_time$republicans <- NA_real_
party_age_over_time$democrats <- NA_real_

for (i in seq_len(nrow(party_age_over_time))) {
  # Republicans who served in the i-th Congress
  party_members <- subset(useful_age_data, congress == party_age_over_time$congress[i] & party == "R")
  party_age_over_time$republicans[i] <- median(party_members$age)
}

for (i in seq_len(nrow(party_age_over_time))) {
  # Democrats who served in the i-th Congress
  party_members <- subset(useful_age_data, congress == party_age_over_time$congress[i] & party == "D")
  party_age_over_time$democrats[i] <- median(party_members$age)
}
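
One possible way to streamline this step is base R’s `tapply()`, which computes a summary for every combination of grouping values in a single call. The sketch below is not the approach used in this write-up; it runs on a small made-up data frame (`toy`) with the same column names as `useful_age_data`, so the ages shown are purely illustrative.

```r
# Toy stand-in for useful_age_data (ages are made up for illustration)
toy <- data.frame(
  congress = c(66, 66, 66, 66, 67, 67, 67, 67),
  party    = c("D", "D", "R", "R", "D", "D", "R", "R"),
  age      = c(40, 50, 60, 70, 42, 52, 62, 72)
)

# One median per (congress, party) pair, returned as a congress-by-party matrix
medians <- with(toy, tapply(age, list(congress, party), median))

# Convert the matrix into a data frame with one row per Congress
result <- data.frame(
  congress    = as.numeric(rownames(medians)),
  democrats   = medians[, "D"],
  republicans = medians[, "R"],
  row.names   = NULL
)

print(result)
```

The grouping replaces both for loops, and adding another party would only mean pulling one more column out of `medians`.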

Tabular Output

In this code block, I use the kable function from the knitr package to display the table that has been created.

library(knitr)
kable(party_age_over_time, col.names = c("Congress", "Year", "Republicans", "Democrats"), format = "pipe", digits = 2, caption = "Median Delegation Age by Party", align = "c")
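
Median Delegation Age by Party

| Congress | Year | Republicans | Democrats |
|:--------:|:----:|:-----------:|:---------:|
|    66    | 1919 |    51.43    |   49.90   |
|    67    | 1921 |    52.05    |   50.78   |
|    68    | 1923 |    53.05    |   50.63   |
|    69    | 1925 |    54.46    |   52.06   |
|    70    | 1927 |    55.40    |   52.74   |
|    71    | 1929 |    55.95    |   53.88   |
|    72    | 1931 |    56.31    |   54.53   |
|    73    | 1933 |    55.43    |   53.40   |
|    74    | 1935 |    54.78    |   52.38   |
|    75    | 1937 |    54.95    |   51.86   |
|    76    | 1939 |    51.36    |   51.24   |
|    77    | 1941 |    52.33    |   50.80   |
|    78    | 1943 |    52.53    |   51.71   |
|    79    | 1945 |    53.31    |   51.97   |
|    80    | 1947 |    52.43    |   51.62   |
|    81    | 1949 |    54.39    |   50.60   |
|    82    | 1951 |    54.09    |   51.03   |
|    83    | 1953 |    54.69    |   51.47   |
|    84    | 1955 |    55.43    |   51.15   |
|    85    | 1957 |    56.36    |   52.87   |
|    86    | 1959 |    55.53    |   51.32   |
|    87    | 1961 |    54.95    |   52.64   |
|    88    | 1963 |    53.26    |   52.71   |
|    89    | 1965 |    50.98    |   51.38   |
|    90    | 1967 |    49.95    |   52.46   |
|    91    | 1969 |    51.45    |   53.21   |
|    92    | 1971 |    51.18    |   52.99   |
|    93    | 1973 |    50.48    |   52.83   |
|    94    | 1975 |    50.57    |   50.87   |
|    95    | 1977 |    50.08    |   50.43   |
|    96    | 1979 |    49.65    |   50.00   |
|    97    | 1981 |    48.95    |   49.46   |
|    98    | 1983 |    50.43    |   49.37   |
|    99    | 1985 |    51.61    |   50.16   |
|   100    | 1987 |    51.85    |   49.66   |
|   101    | 1989 |    52.87    |   50.63   |
|   102    | 1991 |    54.56    |   52.07   |
|   103    | 1993 |    52.58    |   51.56   |
|   104    | 1995 |    51.56    |   52.79   |
|   105    | 1997 |    51.54    |   53.25   |
|   106    | 1999 |    53.08    |   53.66   |
|   107    | 2001 |    53.68    |   55.26   |
|   108    | 2003 |    54.26    |   56.19   |
|   109    | 2005 |    55.29    |   57.88   |
|   110    | 2007 |    56.34    |   58.45   |
|   111    | 2009 |    56.55    |   59.10   |
|   112    | 2011 |    55.32    |   60.76   |
|   113    | 2013 |    55.96    |   60.34   |
|   114    | 2015 |    55.92    |   61.58   |
|   115    | 2017 |    56.98    |   62.12   |
|   116    | 2019 |    58.01    |   60.22   |
|   117    | 2021 |    58.34    |   61.12   |
|   118    | 2023 |    58.69    |   60.06   |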

Data Visualization!

In this code block I create a line graph of the median ages of Republicans and Democrats in Congress over time. Obviously, the Democrats should be a blue line and the Republicans should be a red line.

library(ggplot2)

ggplot() +
  geom_line(data = party_age_over_time, aes(x = year, y = democrats), color = "blue") +
  geom_line(data = party_age_over_time, aes(x = year, y = republicans), color = "red") +
  xlab("Year") +
  ylab("Median Age") +
  scale_x_continuous(n.breaks = 13) +
  ggtitle("Median Age of Democrats and Republicans in Congress")
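
The plot above hard-codes each line’s color, so it has no legend. A variation that might be worth trying is to reshape the data to long format and map party to the color aesthetic, which lets ggplot2 build the legend itself. The sketch below is an illustration only: to stay self-contained it uses just three rows copied from the table above, and the data frames `wide` and `long` are names made up for the example.

```r
library(ggplot2)

# Three rows copied from the table above, so the sketch runs on its own
wide <- data.frame(
  year        = c(1919, 1971, 2023),
  republicans = c(51.43, 51.18, 58.69),
  democrats   = c(49.90, 52.99, 60.06)
)

# Stack the two party columns into long format: one row per year and party
long <- data.frame(
  year  = rep(wide$year, times = 2),
  party = rep(c("Republicans", "Democrats"), each = nrow(wide)),
  age   = c(wide$republicans, wide$democrats)
)

# Mapping party to colour makes ggplot2 draw the legend automatically
p <- ggplot(long, aes(x = year, y = age, colour = party)) +
  geom_line() +
  scale_colour_manual(values = c(Democrats = "blue", Republicans = "red")) +
  labs(x = "Year", y = "Median Age", colour = "Party",
       title = "Median Age of Democrats and Republicans in Congress")

print(p)
```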

Findings and Recommendations

It appears that the median age in both parties fluctuated between roughly 49 and 56 from the time this data begins (after the midterm elections of 1918) until the presidential election in 1996, with the Republican numbers varying a little more than the Democratic numbers and being a little higher overall. Since 1996, both parties have trended sharply upward, eventually rising beyond their historical norms in the mid-aughts. The Democrats’ trend started about 10 years earlier than the Republicans’, and because of that head start the Democrats have had the higher median age for about 30 years, although the Republicans have narrowed the gap in the last few cycles.

Another analysis I’d be interested in seeing would plot, for each state, the current median age of its Senators and Representatives against its historical median age. Seeing which states lie above and below the line y = x on that graph would show which state delegations are unusually young or old.
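
A rough sketch of how the numbers behind that graph could be computed is below. It makes several assumptions: the data frame `toy` is a made-up stand-in with the same column names as `useful_age_data`, “current” is taken to mean the most recent Congress in the data, and the historical median is placed on the x-axis with the current median on the y-axis.

```r
# Toy stand-in for useful_age_data; the states and ages are made up
toy <- data.frame(
  congress = c(117, 117, 117, 117, 118, 118, 118, 118),
  state    = c("AK", "AK", "WY", "WY", "AK", "AK", "WY", "WY"),
  age      = c(50, 60, 70, 80, 64, 66, 58, 62)
)

# Historical median: every member-term recorded for the state
historical <- tapply(toy$age, toy$state, median)

# Current median: members of the most recent Congress only
latest  <- toy[toy$congress == max(toy$congress), ]
current <- tapply(latest$age, latest$state, median)

comparison <- data.frame(
  state      = names(historical),
  historical = as.numeric(historical),
  current    = as.numeric(current[names(historical)])
)

# Positive values sit above the line y = x (delegation older than its history)
comparison$difference <- comparison$current - comparison$historical

print(comparison)
```

In this toy data, AK’s current delegation is older than its historical median while WY’s is younger, so the two states would land on opposite sides of the line.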