We are often tasked with taking data in one form and transforming it for easier downstream analysis. We will spend several weeks in this course on tidying and transformation operations. Some of this work could be done in SQL or R (or Python or…). Here, you are asked to use R; you may use any base functions or packages you like.
Your task is to first choose one of the provided datasets on fivethirtyeight.com that you find interesting: https://data.fivethirtyeight.com/
You should first study the data and any other information on the GitHub site, and read the associated fivethirtyeight.com article.
To receive full credit, you should:
1. Take the data, and create one or more code blocks. You should finish with a data frame that contains a subset of the columns in your selected dataset. If there is an obvious target (aka response or dependent) variable, you should include this in your set of columns. You should include (or add if necessary) meaningful column names and replace (if necessary) any non-intuitive abbreviations used in the data that you selected. For example, if you had instead been tasked with working with the UCI mushroom dataset, you would include the target column for edible or poisonous, and transform “e” values to “edible.” Your deliverable is the R code to perform these transformation tasks.
2. Make sure that the original data file is accessible through your code, for example stored in a GitHub repository or AWS S3 bucket and referenced in your code. If the code references data on your local machine, then your work is not reproducible!
3. Start your R Markdown document with a two to three sentence “Overview” or “Introduction” description of what the article that you chose is about, and include a link to the article.
4. Finish with a “Conclusions” or “Findings and Recommendations” text block that includes what you might do to extend, verify, or update the work from the selected article.
5. Each of your text blocks should include at least one header and additional non-header text.
6. You’re of course welcome, but not required, to include additional information, such as exploratory data analysis graphics (which we will cover later in the course).
7. Place your solution into a single R Markdown (.Rmd) file and publish your solution out to rpubs.com.
8. Post the .Rmd file in your GitHub repository, and provide the appropriate URLs to your GitHub repository and your rpubs.com file in your assignment link.
The article “Congress Today Is Older Than It’s Ever Been” by Geoffrey Skelley examines the aging of Congress: the 118th Congress is the oldest on record, with a median age of 59. The median age is 65 for senators and 58 for representatives. An older Congress can shape which issues get attention, potentially sidelining concerns that matter more to younger Americans, such as climate change and affordable housing. Historical data show the median age rising over the decades, driven by longer life expectancies and political factors. As baby boomers retire, however, the age composition of Congress may gradually shift toward younger generations such as Gen X, millennials, and Gen Z, potentially leading to a more balanced legislative focus.
Article: https://fivethirtyeight.com/features/aging-congress-boomers/
Datasource: https://data.fivethirtyeight.com/
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
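# dplyr is already attached by library(tidyverse), so a separate library(dplyr) call is not needed
# Read the data from the GitHub repository so that the analysis is reproducible
congress_age_data <- read.csv('https://raw.githubusercontent.com/ErickH1/DATA607Assignment1/main/congress-demographics/data_aging_congress.csv')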
Subsetting the data frame to the 118th Congress, keeping only the columns of interest, and rounding age_years to the nearest whole year
congress_age_subset <- subset(congress_age_data, congress == "118", select = c('chamber','party_code','age_years','generation'))
congress_age_subset$age_years <- round(congress_age_subset$age_years, digits = 0)
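The same filter, select, and round step could also be written as a single dplyr pipeline. The following is only a minimal sketch under stated assumptions: `toy` is a small made-up stand-in for `congress_age_data`, and the `notes` column is a hypothetical extra column included just to show it being dropped.

```r
library(dplyr)

# Toy stand-in for congress_age_data; all values are made up for illustration.
toy <- data.frame(
  congress   = c(117L, 118L, 118L, 118L),
  chamber    = c("House", "House", "Senate", "House"),
  party_code = c(100L, 200L, 100L, 200L),
  age_years  = c(61.2, 56.7, 64.6, 35.4),
  generation = c("Boomers", "Gen X", "Boomers", "Millennial"),
  notes      = c("a", "b", "c", "d")  # hypothetical extra column to be dropped
)

toy_118 <- toy %>%
  filter(congress == 118) %>%                            # keep one Congress
  select(chamber, party_code, age_years, generation) %>% # keep four columns
  mutate(age_years = round(age_years))                   # nearest whole year

print(toy_118)
```

Keeping the three operations in one pipeline makes the order of the steps explicit and avoids modifying the subset in place afterwards.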
Subsetting based on chamber
# congress_age_subset already holds only the four selected columns with rounded ages,
# so no select argument or repeated rounding is needed here
congress_age_house <- subset(congress_age_subset, chamber == "House")
congress_age_senate <- subset(congress_age_subset, chamber == "Senate")
Subsetting based on party
# party_code is an integer column: 100 = Democrat, 200 = Republican, 328 = Independent
congress_age_dem <- subset(congress_age_subset, party_code == 100)
congress_age_rep <- subset(congress_age_subset, party_code == 200)
congress_age_ind <- subset(congress_age_subset, party_code == 328)
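Because the numeric party codes are not self-explanatory, one option is to add a readable label column. This is a minimal base R sketch, assuming the code-to-party mapping implied by the subset names above (100 = Democrat, 200 = Republican, 328 = Independent); `toy`, `party_labels`, and the `party` column are illustrative names, and the rows are made up.

```r
# Toy stand-in for congress_age_subset; the rows are made up for illustration.
toy <- data.frame(
  chamber    = c("House", "Senate", "House", "Senate"),
  party_code = c(100L, 200L, 200L, 328L),
  age_years  = c(35, 65, 57, 81)
)

# Lookup from numeric party code to a readable label (assumed mapping).
party_labels <- c("100" = "Democrat", "200" = "Republican", "328" = "Independent")

# Index the lookup by the code converted to character; unname() drops the
# lookup names so the new column is a plain character vector.
toy$party <- unname(party_labels[as.character(toy$party_code)])

print(toy)
```

Any code missing from the lookup would come back as NA, which makes unexpected party codes easy to spot.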
Structure of the dataset
glimpse(congress_age_subset)
## Rows: 536
## Columns: 4
## $ chamber    <chr> "House", "House", "House", "House", "House", "House", "Hous…
## $ party_code <int> 200, 100, 200, 100, 100, 200, 200, 100, 200, 200, 100, 100,…
## $ age_years  <dbl> 57, 35, 65, 77, 44, 71, 51, 40, 46, 59, 76, 74, 71, 49, 36,…
## $ generation <chr> "Gen X", "Millennial", "Boomers", "Boomers", "Gen X", "Boom…
Summary of the dataset: the median age is 59 years (mean 58.56)
summary(congress_age_subset)
##    chamber            party_code    age_years      generation
##  Length:536         Min.   :100   Min.   :26.00   Length:536
##  Class :character   1st Qu.:100   1st Qu.:49.00   Class :character
##  Mode  :character   Median :200   Median :59.00   Mode  :character
##                     Mean   :152   Mean   :58.56
##                     3rd Qu.:200   3rd Qu.:68.00
##                     Max.   :328   Max.   :90.00
Summaries of the subsets by party: Democrat (100), Republican (200), and Independent (328)
summary(congress_age_dem)
##    chamber            party_code    age_years     generation
##  Length:261         Min.   :100   Min.   :26.0   Length:261
##  Class :character   1st Qu.:100   1st Qu.:49.0   Class :character
##  Mode  :character   Median :100   Median :60.0   Mode  :character
##                     Mean   :100   Mean   :59.4
##                     3rd Qu.:100   3rd Qu.:70.0
##                     Max.   :100   Max.   :90.0
summary(congress_age_rep)
##    chamber            party_code    age_years      generation
##  Length:272         Min.   :200   Min.   :34.00   Length:272
##  Class :character   1st Qu.:200   1st Qu.:49.00   Class :character
##  Mode  :character   Median :200   Median :59.00   Mode  :character
##                     Mean   :200   Mean   :57.64
##                     3rd Qu.:200   3rd Qu.:66.00
##                     Max.   :200   Max.   :89.00
summary(congress_age_ind)
##    chamber            party_code    age_years      generation
##  Length:3           Min.   :328   Min.   :46.00   Length:3
##  Class :character   1st Qu.:328   1st Qu.:62.50   Class :character
##  Mode  :character   Median :328   Median :79.00   Mode  :character
##                     Mean   :328   Mean   :68.67
##                     3rd Qu.:328   3rd Qu.:80.00
##                     Max.   :328   Max.   :81.00
Summaries of the subsets by chamber (House and Senate)
summary(congress_age_house)
##    chamber            party_code    age_years      generation
##  Length:435         Min.   :100   Min.   :26.00   Length:435
##  Class :character   1st Qu.:100   1st Qu.:47.00   Class :character
##  Mode  :character   Median :200   Median :58.00   Mode  :character
##                     Mean   :151   Mean   :57.31
##                     3rd Qu.:200   3rd Qu.:67.00
##                     Max.   :200   Max.   :86.00
summary(congress_age_senate)
##    chamber            party_code      age_years      generation
##  Length:101         Min.   :100.0   Min.   :36.00   Length:101
##  Class :character   1st Qu.:100.0   1st Qu.:56.00   Class :character
##  Mode  :character   Median :200.0   Median :65.00   Mode  :character
##                     Mean   :156.3   Mean   :63.91
##                     3rd Qu.:200.0   3rd Qu.:71.00
##                     Max.   :328.0   Max.   :90.00
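The per-group summaries above could also be collected into one small table rather than read off separate summary() calls. This is a minimal base R sketch, assuming a made-up `toy` data frame in place of `congress_age_subset`; `median_by_chamber` and `median_age` are illustrative names.

```r
# Toy stand-in for congress_age_subset; ages are made up for illustration.
toy <- data.frame(
  chamber   = c("House", "House", "House", "Senate", "Senate"),
  age_years = c(35, 57, 44, 65, 81)
)

# One row per chamber with the median of age_years in that chamber.
median_by_chamber <- aggregate(age_years ~ chamber, data = toy, FUN = median)
names(median_by_chamber)[2] <- "median_age"

print(median_by_chamber)
```

Swapping `chamber` for `party_code` in the formula would give the same kind of table by party, and `FUN = mean` would give means instead of medians.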
In this section, graphs visualize the distribution of ages across the subsets.
Graph of the number of members in each generation
ggplot(data = congress_age_subset, aes(x = generation)) +
  geom_bar()
Graph of the age distribution for all members
ggplot(data = congress_age_subset, aes(x = age_years)) +
  geom_bar()
Graph of the age distribution in the House
ggplot(data = congress_age_house, aes(x = age_years)) +
  geom_bar()
Graph of the age distribution in the Senate
ggplot(data = congress_age_senate, aes(x = age_years)) +
  geom_bar()
Graph of the age distribution among Democrats
ggplot(data = congress_age_dem, aes(x = age_years)) +
  geom_bar()
Graph of the age distribution among Republicans
ggplot(data = congress_age_rep, aes(x = age_years)) +
  geom_bar()
Graph of the age distribution among Independents
ggplot(data = congress_age_ind, aes(x = age_years)) +
  geom_bar()
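The per-group graphs could also be drawn as panels of a single figure, which puts the groups on a shared age axis. This is a minimal sketch using ggplot2's facet_wrap(), assuming a made-up `toy` data frame in place of `congress_age_subset`; the axis labels are illustrative.

```r
library(ggplot2)

# Toy stand-in for congress_age_subset; ages are made up for illustration.
toy <- data.frame(
  chamber   = c("House", "House", "House", "House", "Senate", "Senate", "Senate"),
  age_years = c(35, 44, 57, 57, 65, 65, 81)
)

# One bar chart of whole-year ages per chamber, stacked in a single column
# so that the two panels share the same x axis.
p <- ggplot(data = toy, aes(x = age_years)) +
  geom_bar() +
  facet_wrap(~ chamber, ncol = 1) +
  labs(x = "Age (years)", y = "Members")

p
```

Faceting by `party_code` (or a readable party label) instead of `chamber` would give the party comparison in the same way.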
This analysis computed the median and mean ages of members of the 118th Congress by party and by chamber. The overall median age is 59 (mean 58.56). By chamber, the median is 58 in the House and 65 in the Senate; by party, it is 60 for Democrats and 59 for Republicans. The three Independents have a median age of 79, but that group is too small to generalize from. A natural extension would be to compare the average age of members across Congresses over the years and see whether it has been increasing, decreasing, or remaining the same.
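The proposed extension could start from the `congress` column of the full dataset, computing one median per Congress and the change between consecutive Congresses. This is a minimal base R sketch on made-up numbers, so the values below say nothing about the real trend; `toy`, `median_by_congress`, and `change` are illustrative names.

```r
# Toy stand-in for the full dataset; ages are made up for illustration only.
toy <- data.frame(
  congress  = rep(c(116L, 117L, 118L), each = 3),
  age_years = c(50, 58, 60,  52, 59, 63,  55, 60, 66)
)

# Median age per Congress, one row per Congress in ascending order.
median_by_congress <- aggregate(age_years ~ congress, data = toy, FUN = median)

# Change in median age from each Congress to the next (NA for the first row).
median_by_congress$change <- c(NA, diff(median_by_congress$age_years))

print(median_by_congress)
```

A positive `change` means the median rose relative to the previous Congress; running the same steps on `congress_age_data` would show the actual pattern.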