I’m a politics junkie, and I’ve been following Nate Silver’s political writing and forecasts since the spring of 2008. While I am a big fan of 538’s analysis and commentary, I often find myself wishing they would take their analysis further. This article by Geoffrey Skelley is a great example. It breaks down the age of each member of each Congress and then shows the change in the median age of the House and Senate over time. Reading it, I was left wondering whether one party’s representatives are meaningfully older than the other’s, and how those numbers have looked historically. Fortunately, I have the tools to find out!
In this code block, I read the CSV file from FiveThirtyEight’s GitHub repository into a data frame.
datasource <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-demographics/data_aging_congress.csv"
age_data <- read.csv(url(datasource))
In this code block, I take only the subset of data I’m interested in and clean it up. I limit the columns to the Congress the member served in, the state they represented, the party they belonged to, and their age, and I rename the columns to simpler names. Since the numeric party codes aren’t self-explanatory and I’m only really interested in Democrats and Republicans, I overwrite the “party” column with “D” for Democrats (codes 100 and 329), “R” for Republicans (codes 200 and 331), and “I” for everyone else.
useful_age_data <- subset(age_data, select = c("congress","state_abbrev","party_code","age_years"))
colnames(useful_age_data) <- c("congress","state","party","age")
useful_age_data["party"][useful_age_data["party"] == 100 | useful_age_data["party"] == 329] <- "D"
useful_age_data["party"][useful_age_data["party"] == 200 | useful_age_data["party"] == 331] <- "R"
useful_age_data["party"][useful_age_data["party"] != "D" & useful_age_data["party"] != "R"] <- "I"
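The same recoding can also be written as a single vectorized expression. This is a minimal sketch on a small hypothetical vector of codes (not real rows from the dataset). It uses `%in%` and nested `ifelse()` so that it works directly on the numeric codes, instead of relying on the column having been converted to character by the first assignment.

```r
# Hypothetical sample of party codes (not real rows from the dataset);
# 328 stands in for any code that is neither Democratic nor Republican
party_code <- c(100, 200, 329, 331, 328)

# %in% checks membership in a set of codes, so each party needs one test;
# anything that matches neither set falls through to "I"
party <- ifelse(party_code %in% c(100, 329), "D",
         ifelse(party_code %in% c(200, 331), "R", "I"))

print(party)
```

On the real data, the equivalent would be to run this on `useful_age_data$party` (while it still holds the numeric codes) in place of the three assignment lines.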
In this code block, I create the data frame that will be used to generate the graph of median party age over time. This requires one row for each Congress. Because I would prefer the x-axis to show the year (meaningful to most readers) instead of the Congress number (meaningful to a small subset of readers), I need a year column as well. Then I rename the columns and sort the rows chronologically with dplyr’s `arrange()`.
library(dplyr)  # provides arrange()
unique_congress <- unique(useful_age_data$congress)
# The 66th Congress began in 1919 and each Congress spans two years
congress_year <- 1919 + 2 * (unique_congress - 66)
party_age_over_time <- data.frame(unique_congress, congress_year)
colnames(party_age_over_time) <- c("congress", "year")
party_age_over_time <- arrange(party_age_over_time, congress)
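The year column rests on a simple linear mapping: the 66th Congress convened in 1919, and each Congress lasts two years. The sketch below wraps that formula in a hypothetical helper, `congress_to_year()`, and shows a base-R alternative to `arrange()` using `order()`, in case dplyr isn’t loaded. The sample data frame is illustrative only.

```r
# Hypothetical helper: start year of a given Congress, assuming the
# 66th Congress began in 1919 and each Congress lasts two years
congress_to_year <- function(congress) {
  1919 + 2 * (congress - 66)
}

print(congress_to_year(c(66, 100, 118)))  # spot-check against the table

# Base-R sort by Congress number, equivalent to dplyr::arrange()
df <- data.frame(congress = c(118, 66, 100))
df$year <- congress_to_year(df$congress)
df <- df[order(df$congress), ]
print(df)
```

The spot-check values (1919, 1987, 2023) match the 66th, 100th, and 118th Congress rows in the table.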
I’m sure there’s a better way to do this, but in this code block I calculate the median age for each party in each Congress. For each Congress, I take the subset of members from one party, calculate the median of their “age” column, and store it in that Congress’s row; the temporary subset is overwritten on every pass through the for loop. I run the loop once per party to fill in the two columns. I’m still researching alternative methods and am eager to streamline this process.
# Pre-allocate one column per party (Republicans first, to match the table below)
party_age_over_time$republicans <- NA_real_
party_age_over_time$democrats <- NA_real_
for (i in seq_len(nrow(party_age_over_time))) {
  # Republicans serving in the i-th Congress
  members <- subset(useful_age_data, congress == party_age_over_time$congress[i] & party == "R")
  party_age_over_time$republicans[i] <- median(members$age)
}
for (i in seq_len(nrow(party_age_over_time))) {
  # Democrats serving in the i-th Congress
  members <- subset(useful_age_data, congress == party_age_over_time$congress[i] & party == "D")
  party_age_over_time$democrats[i] <- median(members$age)
}
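One way to streamline the loops is base R’s `tapply()`, which applies a function to every combination of grouping variables in one call. This is a sketch on a small hypothetical data frame standing in for `useful_age_data`; the ages are made up. Congress and party combinations with no members come back as `NA`.

```r
# Hypothetical toy stand-in for useful_age_data (made-up ages, not real members)
toy <- data.frame(
  congress = c(66, 66, 66, 66, 67, 67, 67),
  party    = c("R", "R", "D", "I", "R", "D", "D"),
  age      = c(50, 54, 48, 60, 52, 47, 51)
)

# Median age for every Congress x party combination, as a matrix
# (rows = Congress numbers, columns = party labels)
median_matrix <- tapply(toy$age, list(toy$congress, toy$party), median)

# Keep only the two parties of interest and convert to a data frame
party_medians <- data.frame(
  congress    = as.numeric(rownames(median_matrix)),
  republicans = unname(median_matrix[, "R"]),
  democrats   = unname(median_matrix[, "D"])
)
print(party_medians)
```

Run on the real `useful_age_data`, the `year` column could then be added with the same `1919 + 2 * (congress - 66)` formula.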
In this code block, I use the `kable()` function from the knitr package to display the table of data I’ve created.
library(knitr)
kable(party_age_over_time, col.names = c("Congress", "Year", "Republicans", "Democrats"), format = "pipe", digits = 2, caption = "Median Delegation Age by Party", align = "c")
Table: Median Delegation Age by Party
Congress | Year | Republicans | Democrats |
:---:|:---:|:---:|:---:|
66 | 1919 | 51.43 | 49.90 |
67 | 1921 | 52.05 | 50.78 |
68 | 1923 | 53.05 | 50.63 |
69 | 1925 | 54.46 | 52.06 |
70 | 1927 | 55.40 | 52.74 |
71 | 1929 | 55.95 | 53.88 |
72 | 1931 | 56.31 | 54.53 |
73 | 1933 | 55.43 | 53.40 |
74 | 1935 | 54.78 | 52.38 |
75 | 1937 | 54.95 | 51.86 |
76 | 1939 | 51.36 | 51.24 |
77 | 1941 | 52.33 | 50.80 |
78 | 1943 | 52.53 | 51.71 |
79 | 1945 | 53.31 | 51.97 |
80 | 1947 | 52.43 | 51.62 |
81 | 1949 | 54.39 | 50.60 |
82 | 1951 | 54.09 | 51.03 |
83 | 1953 | 54.69 | 51.47 |
84 | 1955 | 55.43 | 51.15 |
85 | 1957 | 56.36 | 52.87 |
86 | 1959 | 55.53 | 51.32 |
87 | 1961 | 54.95 | 52.64 |
88 | 1963 | 53.26 | 52.71 |
89 | 1965 | 50.98 | 51.38 |
90 | 1967 | 49.95 | 52.46 |
91 | 1969 | 51.45 | 53.21 |
92 | 1971 | 51.18 | 52.99 |
93 | 1973 | 50.48 | 52.83 |
94 | 1975 | 50.57 | 50.87 |
95 | 1977 | 50.08 | 50.43 |
96 | 1979 | 49.65 | 50.00 |
97 | 1981 | 48.95 | 49.46 |
98 | 1983 | 50.43 | 49.37 |
99 | 1985 | 51.61 | 50.16 |
100 | 1987 | 51.85 | 49.66 |
101 | 1989 | 52.87 | 50.63 |
102 | 1991 | 54.56 | 52.07 |
103 | 1993 | 52.58 | 51.56 |
104 | 1995 | 51.56 | 52.79 |
105 | 1997 | 51.54 | 53.25 |
106 | 1999 | 53.08 | 53.66 |
107 | 2001 | 53.68 | 55.26 |
108 | 2003 | 54.26 | 56.19 |
109 | 2005 | 55.29 | 57.88 |
110 | 2007 | 56.34 | 58.45 |
111 | 2009 | 56.55 | 59.10 |
112 | 2011 | 55.32 | 60.76 |
113 | 2013 | 55.96 | 60.34 |
114 | 2015 | 55.92 | 61.58 |
115 | 2017 | 56.98 | 62.12 |
116 | 2019 | 58.01 | 60.22 |
117 | 2021 | 58.34 | 61.12 |
118 | 2023 | 58.69 | 60.06 |
In this code block I create a line graph of the median ages of Republicans and Democrats in Congress over time. Obviously, the Democrats should be a blue line and the Republicans should be a red line.
library(ggplot2)
ggplot(party_age_over_time, aes(x = year)) +
  geom_line(aes(y = democrats), color = "blue") + geom_line(aes(y = republicans), color = "red") +
  xlab("Year") + ylab("Median Age") + scale_x_continuous(n.breaks = 13) + ggtitle("Median Age of Democrats and Republicans in Congress")
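Because the colors above are set outside `aes()`, ggplot2 draws no legend. One alternative is to stack the two party columns into a single long data frame and map color to party, with `scale_color_manual()` keeping the blue/red convention. This sketch uses only the last three rows of the table as sample data; with the real data, `wide` would be `party_age_over_time`.

```r
library(ggplot2)

# Sample rows copied from the last three Congresses in the table
wide <- data.frame(
  year        = c(2019, 2021, 2023),
  republicans = c(58.01, 58.34, 58.69),
  democrats   = c(60.22, 61.12, 60.06)
)

# Stack the two party columns into one long data frame
long <- rbind(
  data.frame(year = wide$year, party = "Democrats",   median_age = wide$democrats),
  data.frame(year = wide$year, party = "Republicans", median_age = wide$republicans)
)

# Mapping color to party inside aes() makes ggplot2 draw a legend
p <- ggplot(long, aes(x = year, y = median_age, color = party)) +
  geom_line() +
  scale_color_manual(values = c(Democrats = "blue", Republicans = "red")) +
  labs(x = "Year", y = "Median Age", color = "Party",
       title = "Median Age of Democrats and Republicans in Congress")
# Display with print(p), or by evaluating p in an interactive session
```

The long layout also generalizes: adding a third party would only need another `rbind()` entry and color, not another `geom_line()` call.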
It appears that the median age in both parties fluctuated between roughly 49 and 56 from the start of the data (the Congress seated after the 1918 midterm elections) until the mid-1990s, with the Republican numbers varying a bit more than the Democratic numbers and running a little higher in most Congresses. Since then, both parties have trended sharply upward, eventually rising beyond their historical norms in the 2000s. The Democrats’ climb began about 10 years earlier than the Republicans’, and thanks to that head start, Democrats have had the higher median age in every Congress since 1995, nearly 30 years. The Republicans have narrowed the gap in the last few cycles, from about 5.1 years in 2017 to about 1.4 years in 2023.
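The “head start” claim can be checked directly by computing the Democratic-minus-Republican gap and finding the earliest year from which Democrats have been older in every Congress through the most recent one. The helper `dem_older_since()` below is hypothetical, and the sample rows are copied from the 1989–1999 portion of the table. Reading the full table, Democrats are older in every Congress from 1995 through 2023, so on `party_age_over_time` it should return 1995 as well.

```r
# Hypothetical helper: earliest year from which Democrats' median age has
# exceeded Republicans' in every Congress through the most recent one.
# Returns NA if Democrats are not older in the most recent Congress.
dem_older_since <- function(df) {
  df <- df[order(df$year), ]
  dem_older <- df$democrats > df$republicans
  # Index of the last Congress where Democrats were NOT older (0 if none)
  last_not_older <- max(c(0, which(!dem_older)))
  if (last_not_older == nrow(df)) return(NA_real_)
  df$year[last_not_older + 1]
}

# Sample rows copied from the table (1989-1999)
sample_rows <- data.frame(
  year        = c(1989, 1991, 1993, 1995, 1997, 1999),
  republicans = c(52.87, 54.56, 52.58, 51.56, 51.54, 53.08),
  democrats   = c(50.63, 52.07, 51.56, 52.79, 53.25, 53.66)
)

# Positive gap = Democrats older than Republicans
sample_rows$gap <- sample_rows$democrats - sample_rows$republicans
print(dem_older_since(sample_rows))
```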
Another analysis I’d be interested in seeing would plot, for each State, the current median age of its Senators and Representatives against its historical median age. States above the line y = x on that graph would have delegations that are unusually old relative to their own history, and States below it unusually young ones.
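A minimal sketch of that State-level plot follows. It assumes “current” means the most recent Congress in the data and “historical” means every earlier Congress, which is only one possible definition. The toy data frame uses hypothetical State labels and made-up ages in place of `useful_age_data`.

```r
library(ggplot2)

# Hypothetical toy stand-in for useful_age_data (made-up States and ages)
toy <- data.frame(
  congress = c(117, 117, 118, 118, 117, 118, 118),
  state    = c("StateA", "StateA", "StateA", "StateA", "StateB", "StateB", "StateB"),
  age      = c(50, 60, 70, 72, 40, 45, 55)
)

current_congress <- max(toy$congress)

# Assumption: "historical" = every Congress before the current one
current    <- aggregate(age ~ state, data = subset(toy, congress == current_congress), FUN = median)
historical <- aggregate(age ~ state, data = subset(toy, congress <  current_congress), FUN = median)
names(current)[2]    <- "current_median"
names(historical)[2] <- "historical_median"

# One row per State with both medians
state_ages <- merge(historical, current, by = "state")
state_ages$older_than_usual <- state_ages$current_median > state_ages$historical_median

# Points above the dashed y = x line are older than their own history
p <- ggplot(state_ages, aes(x = historical_median, y = current_median, label = state)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  geom_text() +
  labs(x = "Historical median age", y = "Current median age")
```

Other definitions of “historical” (for example, a fixed window of recent decades) would only change the `subset()` condition on the `historical` line.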