HW Assignment 1

R Markdown Introduction

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

In this R Markdown document I load data on the ages of members of Congress, from modern times back to 1919, the earliest year that appears in the data. The data comes from the article “Congress Today Is Older Than It’s Ever Been” by Geoffrey Skelley.

Find the original article here: https://fivethirtyeight.com/features/aging-congress-boomers/

Raw data (CSV): https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-demographics/data_aging_congress.csv

In this article, Skelley argues that today’s Congress, with so many baby boomers in office, is the oldest it has been in at least the past century. I picked this article because I believe it ties into global issues we face today: an aging population, declining birth rates, and a rising cost of living. I have been following this topic closely, including what has been happening in the different countries that face it. There is also the argument that members of Congress can make a lot of money through insider trading and lobbying, which gives them a reason to stay in office. A final point I would like to mention is that life expectancy has increased thanks to vaccines, nursing, hygiene, and antibiotics. So it’s no wonder the baby boomer generation accounts for half the members of Congress.

In the article, Skelley talks about how having boomers make up half of Congress can affect policy. He also mentions that the median age of House members is 59, while the median age of senators is 65. I would say the reason for that age difference is that it is a lot harder to stay in office as a House member than as a senator. A senator’s term is 6 years, while a House term is 2 years, so a member of the House of Representatives is constantly campaigning throughout their term. Beyond my own conclusion, Skelley argues that members of Congress tend to pass policy that reflects the priorities of their own age group, since older people tend to vote at higher rates than younger people.
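The House and Senate medians mentioned above could be checked with a grouped median. This is only a minimal sketch: the rows are invented to show the mechanics, and the column name `chamber` is an assumption about the CSV that is not confirmed anywhere in this document. It uses base R's `aggregate()` so it runs without extra packages.

```r
# Invented rows for illustration only; they are NOT taken from the real dataset.
# The column name `chamber` is an assumption about the CSV.
members <- data.frame(
  chamber   = c("House", "House", "House", "Senate", "Senate", "Senate"),
  age_years = c(40, 50, 60, 55, 60, 75)
)

# Median age within each chamber
median_by_chamber <- aggregate(age_years ~ chamber, data = members, FUN = median)
print(median_by_chamber)
```

With the real data, the same call would use `congress_age_Data` in place of `members`, provided a chamber column exists under that name.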

The author mentioned that

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.3.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

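# Raw CSV published by FiveThirtyEight
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/congress-demographics/data_aging_congress.csv"

# Read the CSV into a data frame
congress_age_Data <- read.csv(url)

# Uncomment to print the whole data frame
#congress_age_Data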

Conclusion

Here I omitted all the columns except generation and age_years. The other columns, such as state_abbrev, party_code, bioname, and bioguide_id, are redundant when looking only at the ages of members of Congress. What I wanted to do here was count the members of Congress by generation. I want to omit some generations from before the United States had 50 states, because the country was much smaller then and had a lower population, so there were fewer members in the House. The only generations I would keep in my data are Boomers, Gen X, Gen Z, Millennial, Missionary, and Silent. These give a better comparison of what happened over the last century.

I also filtered the data to the years 2000 through 2024. Compared with the unfiltered data, the filtered data showed almost no change in average age for Gen X, Millennials, and Gen Z. The averages for the older generations that remain are higher (Boomers 57.3 vs. 53.1, Silent 68.1 vs. 54.0, Greatest 80.6 vs. 52.1), because only their later years in office fall inside that window. One thing I could do better is group by the name of each member of Congress and take the median age during their time in Congress.

However, I don’t think this data truly represents the declining birth rate and the aging population. The author of the article mentioned that members of Congress have always been older by default. Another problem is that Millennials and Gen Z have not reached their 50s or 60s yet. A final problem is that an incumbent has more experience and better name recognition among voters, which makes reelection more likely. We can see this with the presidency: since Washington’s time in office, it has been uncommon for a sitting president to lose a bid for a second term. There is also the effect of higher life expectancy. I would like to do more analysis of the data, as the author did, once I learn more about how to work with data like this.
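The idea of grouping by member and taking a median age could look roughly like the sketch below. The member names and ages are hypothetical placeholders, not rows from the dataset. Only the column names `bioname` and `age_years` come from this document. Base R's `aggregate()` is used here in place of the dplyr pipeline so the example runs without extra packages.

```r
# Hypothetical rows for illustration only: one row per member per Congress served.
terms <- data.frame(
  bioname   = c("Member A", "Member A", "Member A", "Member B", "Member B"),
  age_years = c(50.2, 52.2, 54.2, 61.0, 63.0)
)

# One row per member: the median of their ages across all their rows
median_age_by_member <- aggregate(age_years ~ bioname, data = terms, FUN = median)
names(median_age_by_member)[2] <- "median_age"
print(median_age_by_member)
```

Each member then counts once, however many terms they served, so long-serving members no longer weigh more heavily in an overall average. If two members share a name, grouping on `bioguide_id` instead may be safer, assuming that column identifies each member uniquely.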

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.3.3

group_by_generation <- congress_age_Data %>%
  select(generation, age_years) %>%
  group_by(generation) %>%
  summarize(
    average_age = mean(age_years, na.rm = TRUE),
    count = n()  
  )

print(group_by_generation)

## # A tibble: 10 × 3
##    generation  average_age count
##    <chr>             <dbl> <int>
##  1 Boomers            53.1  5108
##  2 Gen X              44.8  1130
##  3 Gen Z              26.0     1
##  4 Gilded             82.6    15
##  5 Greatest           52.1  7147
##  6 Lost               53.4  4732
##  7 Millennial         36.2   133
##  8 Missionary         57.9  4768
##  9 Progressive        68.1   485
## 10 Silent             54.0  5601

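# Keep only rows with a start_date between 2000-01-01 and 2024-12-31.
# start_date is still a character column here, so convert it explicitly.
filtered_data <- congress_age_Data %>%
  filter(as.Date(start_date) >= as.Date("2000-01-01"),
         as.Date(start_date) <= as.Date("2024-12-31"))

# Average age and row count for each generation in the filtered data
group_by_generation1 <- filtered_data %>%
  select(generation, age_years) %>%
  group_by(generation) %>%
  summarize(
    average_age = mean(age_years, na.rm = TRUE),
    count = n()
  )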

print(group_by_generation1)

## # A tibble: 6 × 3
##   generation average_age count
##   <chr>            <dbl> <int>
## 1 Boomers           57.3  3762
## 2 Gen X             45.0  1114
## 3 Gen Z             26.0     1
## 4 Greatest          80.6    73
## 5 Millennial        36.2   133
## 6 Silent            68.1  1466
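To compare the two summaries directly, one option is to join them on generation and subtract. The sketch below re-enters the rounded averages from the two printed tables so that it runs on its own. With the real objects, `group_by_generation` and `group_by_generation1` would take the place of the two hand-typed data frames. It uses base R's `merge()`.

```r
# Rounded averages copied from the two printed summaries above
all_years <- data.frame(
  generation  = c("Boomers", "Gen X", "Gen Z", "Gilded", "Greatest", "Lost",
                  "Millennial", "Missionary", "Progressive", "Silent"),
  average_age = c(53.1, 44.8, 26.0, 82.6, 52.1, 53.4, 36.2, 57.9, 68.1, 54.0)
)
since_2000 <- data.frame(
  generation  = c("Boomers", "Gen X", "Gen Z", "Greatest", "Millennial", "Silent"),
  average_age = c(57.3, 45.0, 26.0, 80.6, 36.2, 68.1)
)

# Inner join on generation; the suffixes mark which summary each column came from
comparison <- merge(all_years, since_2000, by = "generation",
                    suffixes = c("_all", "_since_2000"))

# Positive values mean the generation is older on average in the filtered data
comparison$difference <- comparison$average_age_since_2000 - comparison$average_age_all
print(comparison)
```

Only the six generations present in both tables survive the inner join, which is why Gilded, Lost, Missionary, and Progressive drop out.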

congress_age_Data$start_date <- as.Date(congress_age_Data$start_date, format = "%Y-%m-%d")
congress_age_Data$year <- format(congress_age_Data$start_date, "%Y")

congress_avg_age <- congress_age_Data %>%
  group_by(year) %>%
  summarize(avg_age = mean(age_years, na.rm = TRUE))

congress_avg_age

## # A tibble: 53 × 2
##    year  avg_age
##    <chr>   <dbl>
##  1 1919     51.7
##  2 1921     52.6
##  3 1923     52.6
##  4 1925     53.2
##  5 1927     54.0
##  6 1929     54.6
##  7 1931     54.4
##  8 1933     53.5
##  9 1935     52.6
## 10 1937     52.5
## # ℹ 43 more rows

#ggplot(data = congress_avg_age, aes(x = year, y = avg_age)) +
#  geom_line()
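If the commented-out plot draws no line, a likely cause is that `year` is a character column. ggplot2 then treats each year as its own group, which leaves `geom_line()` with nothing to connect. The sketch below converts `year` to an integer first. It re-enters the ten printed rows so it runs on its own, and it only builds the plot if ggplot2 is installed.

```r
# First ten rows of congress_avg_age as printed above; year is character there
congress_avg_age <- data.frame(
  year    = c("1919", "1921", "1923", "1925", "1927",
              "1929", "1931", "1933", "1935", "1937"),
  avg_age = c(51.7, 52.6, 52.6, 53.2, 54.0, 54.6, 54.4, 53.5, 52.6, 52.5)
)

# Convert the character year to an integer so the x axis is continuous
congress_avg_age$year <- as.integer(congress_avg_age$year)

# Build the plot object only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  avg_age_plot <- ggplot(congress_avg_age, aes(x = year, y = avg_age)) +
    geom_line() +
    labs(x = "Year", y = "Average age (years)")
}
```

Printing `avg_age_plot` draws the chart. An alternative that keeps `year` as character is to add `group = 1` inside `aes()`, which tells ggplot2 to connect all points as a single series.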

I would also like to add a dataset that shows total births in the US. In the future, I am interested in comparing it with the ages of members of Congress.

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.3.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ✔ readr     2.1.5     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(openintro)

## Warning: package 'openintro' was built under R version 4.3.3

## Loading required package: airports

## Warning: package 'airports' was built under R version 4.3.3

## Loading required package: cherryblossom

## Warning: package 'cherryblossom' was built under R version 4.3.3

## Loading required package: usdata

## Warning: package 'usdata' was built under R version 4.3.3

data('present', package='openintro')

present

## # A tibble: 63 × 3
##     year    boys   girls
##    <dbl>   <dbl>   <dbl>
##  1  1940 1211684 1148715
##  2  1941 1289734 1223693
##  3  1942 1444365 1364631
##  4  1943 1508959 1427901
##  5  1944 1435301 1359499
##  6  1945 1404587 1330869
##  7  1946 1691220 1597452
##  8  1947 1899876 1800064
##  9  1948 1813852 1721216
## 10  1949 1826352 1733177
## # ℹ 53 more rows

present <- present %>%
  mutate(total_births = boys + girls)

ggplot(data = present, aes(x = year, y = total_births)) + 
  geom_line()
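For the future comparison with congressional ages, the yearly births would need a key that matches the two-year `year` values in the congressional averages. One possible approach, sketched below, maps each year to the odd year in which its Congress began and sums births within each window. It assumes every Congress starts in an odd year, which is consistent with the years printed earlier. The ten rows are copied from the `present` output above, and the name `congress_year` is introduced here for illustration.

```r
# Yearly births for 1940-1949, copied from the printed rows of `present`
births <- data.frame(
  year  = 1940:1949,
  boys  = c(1211684, 1289734, 1444365, 1508959, 1435301,
            1404587, 1691220, 1899876, 1813852, 1826352),
  girls = c(1148715, 1223693, 1364631, 1427901, 1359499,
            1330869, 1597452, 1800064, 1721216, 1733177)
)
births$total_births <- births$boys + births$girls

# Map each year to the odd year that opens its two-year window
# (for example, 1941 and 1942 both map to 1941)
births$congress_year <- ifelse(births$year %% 2 == 1, births$year, births$year - 1)

# Total births within each two-year window
births_by_congress <- aggregate(total_births ~ congress_year, data = births, FUN = sum)
print(births_by_congress)
```

The first and last windows are partial because only ten years are entered here. The result could then be joined to `congress_avg_age` on the year, after converting that table's character `year` to a number.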


Ahmed Hassan

2024-08-31
