Data 607 Week 1 Assignment

Introduction:

Congress is older than it has ever been. This can be explained by an aging general population and high voter turnout among seniors. Baby boomers make up the largest share of Congress, accounting for 48% of its members. In 2001, the median age of boomers in office was 49; today it is 66. This is concerning because older politicians may have less experience with technology and with issues that matter to younger Americans, such as climate change.

The article can be accessed here: https://fivethirtyeight.com/features/aging-congress-boomers/

# tidyverse attaches readr, dplyr, and ggplot2, so a separate library(readr) call is not needed
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Load Data

First, the raw data is read directly from the GitHub repository.

data <- read_csv("https://raw.githubusercontent.com/sphill12/DATA607/main/Congress_Age_Data.csv")

## Rows: 29120 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (5): chamber, state_abbrev, bioname, bioguide_id, generation
## dbl  (6): congress, party_code, cmltv_cong, cmltv_chamber, age_days, age_years
## date (2): start_date, birthday
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Subsetting the Data

Several columns were not pertinent to this analysis. I removed the politicians' names (bioname), their bioguide ID (an identifier attached to each politician), cmltv_chamber (which distinguishes time served in the House from time served in the Senate), and the politicians' age in days. The selection below also leaves out party_code and birthday.

Most of the column names were self-explanatory, but I renamed “cmltv_cong” to the slightly more readable “congresses_served” and “state_abbrev” to “state_abbreviation”.

# Keep only the columns used in this analysis, then give two of them clearer names
final_data <- data %>%
  select(congress, start_date, chamber, state_abbrev, cmltv_cong, age_years, generation) %>%
  rename(congresses_served = cmltv_cong, state_abbreviation = state_abbrev)

final_data

## # A tibble: 29,120 × 7
##    congress start_date chamber state_abbreviation congresses_served age_years
##       <dbl> <date>     <chr>   <chr>                          <dbl>     <dbl>
##  1       82 1951-01-03 House   ND                                 1      53.7
##  2       80 1947-01-03 House   VA                                 1      38.6
##  3       81 1949-01-03 House   VA                                 2      40.6
##  4       82 1951-01-03 House   VA                                 3      42.6
##  5       83 1953-01-03 House   VA                                 4      44.6
##  6       84 1955-01-03 House   VA                                 5      46.6
##  7       85 1957-01-03 House   VA                                 6      48.6
##  8       86 1959-01-03 House   VA                                 7      50.6
##  9       87 1961-01-03 House   VA                                 8      52.6
## 10       88 1963-01-03 House   VA                                 9      54.6
## # ℹ 29,110 more rows
## # ℹ 1 more variable: generation <chr>

glimpse(final_data)

## Rows: 29,120
## Columns: 7
## $ congress           <dbl> 82, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,…
## $ start_date         <date> 1951-01-03, 1947-01-03, 1949-01-03, 1951-01-03, 19…
## $ chamber            <chr> "House", "House", "House", "House", "House", "House…
## $ state_abbreviation <chr> "ND", "VA", "VA", "VA", "VA", "VA", "VA", "VA", "VA…
## $ congresses_served  <dbl> 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1, 2,…
## $ age_years          <dbl> 53.73306, 38.62012, 40.62149, 42.62012, 44.62149, 4…
## $ generation         <chr> "Lost", "Greatest", "Greatest", "Greatest", "Greate…
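The boomer figures quoted in the introduction (share of Congress and median age) could in principle be recomputed from the congress, generation, and age_years columns. The following is a minimal sketch on invented rows, not the real data; the label "Boomers" is an assumption about how the generation column spells that group, and the numbers it produces say nothing about the actual Congress.

```r
library(dplyr)

# Invented rows for illustration only; the real table has 29,120 rows.
# "Boomers" is an assumed spelling of the generation label.
toy <- data.frame(
  congress   = c(107, 107, 107, 118, 118, 118, 118),
  generation = c("Boomers", "Boomers", "Silent", "Boomers", "Boomers", "Boomers", "Gen X"),
  age_years  = c(45, 55, 63, 60, 65, 72, 50)
)

# For each congress: the fraction of rows that are boomers,
# and the median age among those boomer rows
boomer_summary <- toy %>%
  group_by(congress) %>%
  summarise(
    boomer_share      = mean(generation == "Boomers"),
    boomer_median_age = median(age_years[generation == "Boomers"])
  )

boomer_summary
```

Running the same pipeline on final_data instead of toy would give one row per congress, which could then be compared against the article's numbers.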

Exploratory Analysis Example

To show what exploratory analysis of this dataset might look like, I first subset the data to congresses that started in 2000 or later, since I think this period is more interesting and relevant than the entire history. Next, I grouped the rows by state and calculated the mean age of politicians in each group.

I put the results into a bar plot to make trends easy to spot. If I were doing a more in-depth analysis, I would determine the most recent or most popular political party of each state and color the bars accordingly.

# Keep terms starting in 2000 or later, then average age within each state
state_data <- final_data %>%
  filter(start_date >= as.Date("2000-01-01")) %>%
  group_by(state_abbreviation) %>%
  summarise(age = mean(age_years))

# fill is set outside aes() so the bars are drawn in literal red
ggplot(data = state_data, aes(x = state_abbreviation, y = age)) +
  geom_col(fill = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.3)) +
  ggtitle("Average Politician Age per State Since 2000") +
  xlab("State") +
  ylab("Age")
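The party-colored version of this plot mentioned above could be approached roughly as follows. This is only a sketch on invented rows: it starts from the raw column names because party_code is not kept in final_data, and it assumes party_code follows the Voteview convention (100 for Democrats, 200 for Republicans), which should be checked against the data documentation. "Most common party" here means the party with the most rows for that state.

```r
library(dplyr)
library(ggplot2)

# Invented rows for illustration only.
# Assumption: party_code 100 = Democrat, 200 = Republican.
toy <- data.frame(
  state_abbrev = c("VA", "VA", "VA", "ND", "ND", "NY", "NY", "NY"),
  party_code   = c(100, 100, 200, 200, 200, 100, 200, 100),
  age_years    = c(55.2, 61.0, 48.3, 66.1, 59.4, 52.7, 70.5, 44.9)
)

# Label the two major parties; any other code is grouped as "Other"
labelled <- toy %>%
  mutate(party = case_when(
    party_code == 100 ~ "Democrat",
    party_code == 200 ~ "Republican",
    TRUE ~ "Other"
  ))

# Most common party per state (with_ties = FALSE keeps one row if counts tie)
state_party <- labelled %>%
  count(state_abbrev, party) %>%
  group_by(state_abbrev) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(state_abbrev, party)

# Mean age per state, joined to that state's most common party
state_summary <- labelled %>%
  group_by(state_abbrev) %>%
  summarise(age = mean(age_years)) %>%
  left_join(state_party, by = "state_abbrev")

# Bars colored by party; fill is mapped inside aes() because it varies by row
party_plot <- ggplot(state_summary, aes(x = state_abbrev, y = age, fill = party)) +
  geom_col() +
  scale_fill_manual(values = c(Democrat = "blue", Republican = "red", Other = "grey50")) +
  labs(x = "State", y = "Age", fill = "Most common party")

party_plot
```

Using the most recent party instead would mean keeping only each state's latest congress before counting, which is a different and equally defensible choice.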

Conclusion:

The article provided a good summary of general age trends in Congress over time, but there are other details that could be explored. It did not mention age differences across party lines or whether certain states have older politicians on average. It would be interesting to see whether the Democratic Party, which generally favors younger people, has a younger average age among its politicians. As an example of exploratory analysis that could be done, I plotted the average age of politicians in each state since 2000.
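The party comparison suggested here could be sketched along the following lines. The rows are invented and the result is not a finding; the party_code meanings (100 for Democrats, 200 for Republicans) are an assumption based on the Voteview convention, and the sketch works from the raw column names because party_code was not kept in final_data.

```r
library(dplyr)

# Invented rows for illustration only.
# Assumption: party_code 100 = Democrat, 200 = Republican.
toy <- data.frame(
  congress   = c(117, 117, 117, 117, 118, 118, 118, 118),
  party_code = c(100, 100, 200, 200, 100, 100, 200, 200),
  age_years  = c(60, 50, 58, 62, 61, 47, 59, 65)
)

# Mean and median age for each major party within each congress
party_age <- toy %>%
  filter(party_code %in% c(100, 200)) %>%
  mutate(party = if_else(party_code == 100, "Democrat", "Republican")) %>%
  group_by(congress, party) %>%
  summarise(
    mean_age   = mean(age_years),
    median_age = median(age_years),
    members    = n(),
    .groups    = "drop"
  )

party_age
```

Grouping by congress as well as party keeps the comparison within the same period, so a change in the overall age of Congress is not mistaken for a difference between parties.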

Links:

GitHub repository: https://github.com/sphill12/DATA607