The article discusses the increasing age of the U.S. Congress, noting that it is older now than ever before, with a median age of 59 years. It highlights the challenges this aging body faces, particularly in understanding modern technology, and the impact of having older members on legislative priorities, which tend to focus more on issues concerning older Americans. The article also suggests that the age trend in Congress might plateau or decline as younger generations like Gen X and Millennials gradually replace the aging baby boomers.
The link to the article: https://fivethirtyeight.com/features/aging-congress-boomers/
Reading the whole .csv file. The file is read directly from a GitHub repository.
The data has 29,120 rows. The output shows only the first 10 rows.
file_path<-"https://raw.githubusercontent.com/Natacode819/Data-607-Assignment1/main/data_aging_congress.csv"
datacongress<-read.csv(file_path)
head(datacongress, 10)
## congress start_date chamber state_abbrev party_code bioname
## 1 82 1951-01-03 House ND 200 AANDAHL, Fred George
## 2 80 1947-01-03 House VA 100 ABBITT, Watkins Moorman
## 3 81 1949-01-03 House VA 100 ABBITT, Watkins Moorman
## 4 82 1951-01-03 House VA 100 ABBITT, Watkins Moorman
## 5 83 1953-01-03 House VA 100 ABBITT, Watkins Moorman
## 6 84 1955-01-03 House VA 100 ABBITT, Watkins Moorman
## 7 85 1957-01-03 House VA 100 ABBITT, Watkins Moorman
## 8 86 1959-01-03 House VA 100 ABBITT, Watkins Moorman
## 9 87 1961-01-03 House VA 100 ABBITT, Watkins Moorman
## 10 88 1963-01-03 House VA 100 ABBITT, Watkins Moorman
## bioguide_id birthday cmltv_cong cmltv_chamber age_days age_years
## 1 A000001 1897-04-09 1 1 19626 53.73306
## 2 A000002 1908-05-21 1 1 14106 38.62012
## 3 A000002 1908-05-21 2 2 14837 40.62149
## 4 A000002 1908-05-21 3 3 15567 42.62012
## 5 A000002 1908-05-21 4 4 16298 44.62149
## 6 A000002 1908-05-21 5 5 17028 46.62012
## 7 A000002 1908-05-21 6 6 17759 48.62149
## 8 A000002 1908-05-21 7 7 18489 50.62012
## 9 A000002 1908-05-21 8 8 19220 52.62149
## 10 A000002 1908-05-21 9 9 19950 54.62012
## generation
## 1 Lost
## 2 Greatest
## 3 Greatest
## 4 Greatest
## 5 Greatest
## 6 Greatest
## 7 Greatest
## 8 Greatest
## 9 Greatest
## 10 Greatest
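read.csv() leaves start_date and birthday as character columns, as the summary output later confirms. The sketch below shows one way to convert them to Date values and recompute the age in days. It uses a small inline stand-in for the file (three rows copied from the output in this report), and the names csv_text, toy and age_days_check are hypothetical.

```r
# Toy stand-in for a few rows of the file; the real data would come from read.csv(file_path).
csv_text <- "congress,start_date,chamber,birthday
82,1951-01-03,House,1897-04-09
80,1947-01-03,House,1908-05-21
97,1981-01-03,Senate,1923-02-13"

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The date columns arrive as character; convert them explicitly.
toy$start_date <- as.Date(toy$start_date, format = "%Y-%m-%d")
toy$birthday   <- as.Date(toy$birthday,   format = "%Y-%m-%d")

# With Date columns, age in days at the start of a term is a plain subtraction.
toy$age_days_check <- as.numeric(toy$start_date - toy$birthday)
toy
```

The recomputed values can be compared with the age_days column of the same rows as a quick sanity check.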
Checking missing values
Statistical programs represent missing data in different ways; R uses NA. I checked for missing values with the is.na() function. The output shows that no column in the file has missing data.
na_counts<-colSums(is.na(datacongress))
print(na_counts)
## congress start_date chamber state_abbrev party_code
## 0 0 0 0 0
## bioname bioguide_id birthday cmltv_cong cmltv_chamber
## 0 0 0 0 0
## age_days age_years generation
## 0 0 0
Another way to check for missing values is to apply anyNA() to each column with the sapply() function.
na_columns<-sapply(datacongress,anyNA)
print(na_columns)
## congress start_date chamber state_abbrev party_code
## FALSE FALSE FALSE FALSE FALSE
## bioname bioguide_id birthday cmltv_cong cmltv_chamber
## FALSE FALSE FALSE FALSE FALSE
## age_days age_years generation
## FALSE FALSE FALSE
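Both checks above look only for NA. In character columns, read.csv() reads an empty field as an empty string rather than NA, so a blank-string count can be a useful complement. This is a minimal sketch on made-up data; the data frame toy and its values are assumptions for illustration only.

```r
# Hypothetical toy data with one NA, one empty string and one whitespace-only string.
toy <- data.frame(
  bioname    = c("AANDAHL, Fred George", "", "ABDNOR, James"),
  generation = c("Lost", "Greatest", " "),
  age_years  = c(53.7, 38.6, NA),
  stringsAsFactors = FALSE
)

# Count NA values per column (the same check as above).
na_counts <- colSums(is.na(toy))

# Count empty or whitespace-only strings, in character columns only.
blank_counts <- sapply(toy, function(col) {
  if (is.character(col)) sum(trimws(col) == "", na.rm = TRUE) else 0L
})

print(na_counts)
print(blank_counts)
```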
Summary function:
Sometimes it is useful to get a basic summary (min, max, median, mean, …) for each variable in the data.
summary(datacongress)
## congress start_date chamber state_abbrev
## Min. : 66.00 Length:29120 Length:29120 Length:29120
## 1st Qu.: 79.00 Class :character Class :character Class :character
## Median : 92.00 Mode :character Mode :character Mode :character
## Mean : 91.88
## 3rd Qu.:105.00
## Max. :118.00
## party_code bioname bioguide_id birthday
## Min. :100.0 Length:29120 Length:29120 Length:29120
## 1st Qu.:100.0 Class :character Class :character Class :character
## Median :100.0 Mode :character Mode :character Mode :character
## Mean :146.7
## 3rd Qu.:200.0
## Max. :537.0
## cmltv_cong cmltv_chamber age_days age_years
## Min. : 1.000 Min. : 1.000 Min. : 8644 Min. :23.67
## 1st Qu.: 2.000 1st Qu.: 2.000 1st Qu.:16732 1st Qu.:45.81
## Median : 4.000 Median : 4.000 Median :19523 Median :53.45
## Mean : 5.414 Mean : 5.112 Mean :19626 Mean :53.73
## 3rd Qu.: 8.000 3rd Qu.: 7.000 3rd Qu.:22359 3rd Qu.:61.22
## Max. :30.000 Max. :30.000 Max. :35824 Max. :98.08
## generation
## Length:29120
## Class :character
## Mode :character
##
##
##
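summary() treats party_code as a number, so it reports a mean and quartiles even though the column is a code rather than a measurement. A frequency table is usually easier to read for such a column. The sketch below uses a small made-up vector of codes; toy, party_counts and party_share are hypothetical names.

```r
# Hypothetical toy data; the codes stand in for values of the party_code column.
toy <- data.frame(party_code = c(100, 100, 200, 100, 200, 537))

# Count how often each code appears instead of averaging the codes.
party_counts <- table(toy$party_code)

# Share of each code, rounded to two decimals.
party_share <- round(prop.table(party_counts), 2)

print(party_counts)
print(party_share)
```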
Checking the data type
class(datacongress)
## [1] "data.frame"
List of all variables in the data
colnames(datacongress)
## [1] "congress" "start_date" "chamber" "state_abbrev"
## [5] "party_code" "bioname" "bioguide_id" "birthday"
## [9] "cmltv_cong" "cmltv_chamber" "age_days" "age_years"
## [13] "generation"
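class() on the whole object and colnames() can be complemented by a per-column type check. A minimal sketch, assuming a two-row toy data frame with three of the columns listed above (toy and col_types are hypothetical names):

```r
# Hypothetical toy data with three of the columns from the file.
toy <- data.frame(
  congress   = c(82L, 80L),
  start_date = c("1951-01-03", "1947-01-03"),
  age_years  = c(53.73306, 38.62012),
  stringsAsFactors = FALSE
)

# Class of every column, returned as a named character vector.
col_types <- sapply(toy, class)
print(col_types)

# str() prints the type and the first values of each column in one call.
str(toy)
```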
Subset of data:
Rows 1 through 20, all columns
subset_data1<-datacongress[1:20, ]
subset_data1
## congress start_date chamber state_abbrev party_code bioname
## 1 82 1951-01-03 House ND 200 AANDAHL, Fred George
## 2 80 1947-01-03 House VA 100 ABBITT, Watkins Moorman
## 3 81 1949-01-03 House VA 100 ABBITT, Watkins Moorman
## 4 82 1951-01-03 House VA 100 ABBITT, Watkins Moorman
## 5 83 1953-01-03 House VA 100 ABBITT, Watkins Moorman
## 6 84 1955-01-03 House VA 100 ABBITT, Watkins Moorman
## 7 85 1957-01-03 House VA 100 ABBITT, Watkins Moorman
## 8 86 1959-01-03 House VA 100 ABBITT, Watkins Moorman
## 9 87 1961-01-03 House VA 100 ABBITT, Watkins Moorman
## 10 88 1963-01-03 House VA 100 ABBITT, Watkins Moorman
## 11 89 1965-01-03 House VA 100 ABBITT, Watkins Moorman
## 12 90 1967-01-03 House VA 100 ABBITT, Watkins Moorman
## 13 91 1969-01-03 House VA 100 ABBITT, Watkins Moorman
## 14 92 1971-01-03 House VA 100 ABBITT, Watkins Moorman
## 15 93 1973-01-03 House SD 200 ABDNOR, James
## 16 94 1975-01-03 House SD 200 ABDNOR, James
## 17 95 1977-01-03 House SD 200 ABDNOR, James
## 18 96 1979-01-03 House SD 200 ABDNOR, James
## 19 97 1981-01-03 Senate SD 200 ABDNOR, James
## 20 98 1983-01-03 Senate SD 200 ABDNOR, James
## bioguide_id birthday cmltv_cong cmltv_chamber age_days age_years
## 1 A000001 1897-04-09 1 1 19626 53.73306
## 2 A000002 1908-05-21 1 1 14106 38.62012
## 3 A000002 1908-05-21 2 2 14837 40.62149
## 4 A000002 1908-05-21 3 3 15567 42.62012
## 5 A000002 1908-05-21 4 4 16298 44.62149
## 6 A000002 1908-05-21 5 5 17028 46.62012
## 7 A000002 1908-05-21 6 6 17759 48.62149
## 8 A000002 1908-05-21 7 7 18489 50.62012
## 9 A000002 1908-05-21 8 8 19220 52.62149
## 10 A000002 1908-05-21 9 9 19950 54.62012
## 11 A000002 1908-05-21 10 10 20681 56.62149
## 12 A000002 1908-05-21 11 11 21411 58.62012
## 13 A000002 1908-05-21 12 12 22142 60.62149
## 14 A000002 1908-05-21 13 13 22872 62.62012
## 15 A000009 1923-02-13 1 1 18222 49.88912
## 16 A000009 1923-02-13 2 2 18952 51.88775
## 17 A000009 1923-02-13 3 3 19683 53.88912
## 18 A000009 1923-02-13 4 4 20413 55.88775
## 19 A000009 1923-02-13 5 1 21144 57.88912
## 20 A000009 1923-02-13 6 2 21874 59.88775
## generation
## 1 Lost
## 2 Greatest
## 3 Greatest
## 4 Greatest
## 5 Greatest
## 6 Greatest
## 7 Greatest
## 8 Greatest
## 9 Greatest
## 10 Greatest
## 11 Greatest
## 12 Greatest
## 13 Greatest
## 14 Greatest
## 15 Greatest
## 16 Greatest
## 17 Greatest
## 18 Greatest
## 19 Greatest
## 20 Greatest
Subset of data:
Only the columns “congress”, “start_date”, “chamber” and “age_years” are selected. The output shows the first 20 rows.
subset_data2<-datacongress[, c("congress","start_date","chamber", "age_years" ) ]
head(subset_data2,20)
## congress start_date chamber age_years
## 1 82 1951-01-03 House 53.73306
## 2 80 1947-01-03 House 38.62012
## 3 81 1949-01-03 House 40.62149
## 4 82 1951-01-03 House 42.62012
## 5 83 1953-01-03 House 44.62149
## 6 84 1955-01-03 House 46.62012
## 7 85 1957-01-03 House 48.62149
## 8 86 1959-01-03 House 50.62012
## 9 87 1961-01-03 House 52.62149
## 10 88 1963-01-03 House 54.62012
## 11 89 1965-01-03 House 56.62149
## 12 90 1967-01-03 House 58.62012
## 13 91 1969-01-03 House 60.62149
## 14 92 1971-01-03 House 62.62012
## 15 93 1973-01-03 House 49.88912
## 16 94 1975-01-03 House 51.88775
## 17 95 1977-01-03 House 53.88912
## 18 96 1979-01-03 House 55.88775
## 19 97 1981-01-03 Senate 57.88912
## 20 98 1983-01-03 Senate 59.88775
Subset of data:
All records where age_years is greater than 65
The output shows only the first 20 rows
subset_data3<-datacongress[datacongress$age_years>65,]
head(subset_data3,20)
## congress start_date chamber state_abbrev party_code
## 32 109 2005-01-03 House HI 100
## 33 110 2007-01-03 House HI 100
## 34 111 2009-01-03 House HI 100
## 55 91 1969-01-03 House MS 100
## 56 92 1971-01-03 House MS 100
## 69 71 1929-03-04 House NJ 200
## 70 72 1931-03-04 House NJ 200
## 84 111 2009-01-03 House NY 100
## 85 112 2011-01-03 House NY 100
## 103 77 1941-01-03 Senate CO 100
## 152 71 1929-03-04 House IL 200
## 153 72 1931-03-04 House IL 200
## 163 86 1959-01-03 Senate VT 200
## 164 87 1961-01-03 Senate VT 200
## 165 88 1963-01-03 Senate VT 200
## 166 89 1965-01-03 Senate VT 200
## 167 90 1967-01-03 Senate VT 200
## 168 91 1969-01-03 Senate VT 200
## 169 92 1971-01-03 Senate VT 200
## 170 93 1973-01-03 Senate VT 200
## bioname bioguide_id birthday cmltv_cong cmltv_chamber
## 32 ABERCROMBIE, Neil A000014 1938-06-26 9 9
## 33 ABERCROMBIE, Neil A000014 1938-06-26 10 10
## 34 ABERCROMBIE, Neil A000014 1938-06-26 11 11
## 55 ABERNETHY, Thomas Gerstle A000016 1903-05-16 14 14
## 56 ABERNETHY, Thomas Gerstle A000016 1903-05-16 15 15
## 69 ACKERMAN, Ernest Robinson A000021 1863-06-17 6 6
## 70 ACKERMAN, Ernest Robinson A000021 1863-06-17 7 7
## 84 ACKERMAN, Gary Leonard A000022 1942-11-19 14 14
## 85 ACKERMAN, Gary Leonard A000022 1942-11-19 15 15
## 103 ADAMS, Alva Blanchard A000028 1875-10-29 6 6
## 152 ADKINS, Charles A000057 1863-02-07 3 3
## 153 ADKINS, Charles A000057 1863-02-07 4 4
## 163 AIKEN, George David A000062 1892-08-20 10 10
## 164 AIKEN, George David A000062 1892-08-20 11 11
## 165 AIKEN, George David A000062 1892-08-20 12 12
## 166 AIKEN, George David A000062 1892-08-20 13 13
## 167 AIKEN, George David A000062 1892-08-20 14 14
## 168 AIKEN, George David A000062 1892-08-20 15 15
## 169 AIKEN, George David A000062 1892-08-20 16 16
## 170 AIKEN, George David A000062 1892-08-20 17 17
## age_days age_years generation
## 32 24298 66.52430 Silent
## 33 25028 68.52293 Silent
## 34 25759 70.52430 Silent
## 55 23974 65.63723 Greatest
## 56 24704 67.63587 Greatest
## 69 24001 65.71116 Missionary
## 70 24731 67.70979 Missionary
## 84 24152 66.12457 Silent
## 85 24882 68.12320 Silent
## 103 23807 65.18001 Missionary
## 152 24131 66.06708 Missionary
## 153 24861 68.06571 Missionary
## 163 24241 66.36824 Lost
## 164 24972 68.36961 Lost
## 165 25702 70.36824 Lost
## 166 26433 72.36961 Lost
## 167 27163 74.36824 Lost
## 168 27894 76.36961 Lost
## 169 28624 78.36824 Lost
## 170 29355 80.36961 Lost
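Bracket indexing with a logical condition keeps a row of NAs for every NA in the condition, while subset() drops such rows. The file used here has no missing values, so both give the same result on it, but the difference matters for other data. A small sketch on made-up data (toy, by_bracket and by_subset are hypothetical names):

```r
# Hypothetical toy data with one missing age.
toy <- data.frame(
  bioname   = c("A", "B", "C", "D"),
  chamber   = c("House", "House", "Senate", "Senate"),
  age_years = c(66.5, 40.6, NA, 70.4),
  stringsAsFactors = FALSE
)

# Bracket indexing: the NA comparison yields an extra all-NA row.
by_bracket <- toy[toy$age_years > 65, ]

# subset(): rows where the condition is NA are dropped; select picks columns.
by_subset <- subset(toy, age_years > 65, select = c(bioname, age_years))

print(by_bracket)
print(by_subset)
```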
Grouping of records based on the “bioname” column
Since the same name is listed multiple times, I grouped the records by name and kept, for each name, only the records with the latest date in the “start_date” column
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
df_latest<-datacongress%>%group_by(bioname)%>%
filter(start_date==max(start_date))%>%ungroup()
head(df_latest,20)
## # A tibble: 20 × 13
## congress start_date chamber state_abbrev party_code bioname bioguide_id
## <int> <chr> <chr> <chr> <int> <chr> <chr>
## 1 82 1951-01-03 House ND 200 AANDAHL, Fre… A000001
## 2 92 1971-01-03 House VA 100 ABBITT, Watk… A000002
## 3 99 1985-01-03 Senate SD 200 ABDNOR, James A000009
## 4 83 1953-01-03 Senate NE 200 ABEL, Hazel … A000010
## 5 88 1963-01-03 House OH 200 ABELE, Homer… A000011
## 6 111 2009-01-03 House HI 100 ABERCROMBIE,… A000014
## 7 73 1933-03-04 House NC 100 ABERNETHY, C… A000015
## 8 92 1971-01-03 House MS 100 ABERNETHY, T… A000016
## 9 95 1977-01-03 Senate SD 100 ABOUREZK, Ja… A000017
## 10 94 1975-01-03 House NY 100 ABZUG, Bella… A000018
## 11 72 1931-03-04 House NJ 200 ACKERMAN, Er… A000021
## 12 112 2011-01-03 House NY 100 ACKERMAN, Ga… A000022
## 13 91 1969-01-03 House IN 200 ADAIR, Edwin… A000024
## 14 74 1935-01-03 House IL 100 ADAIR, Jacks… A000025
## 15 77 1941-01-03 Senate CO 100 ADAMS, Alva … A000028
## 16 102 1991-01-03 Senate WA 100 ADAMS, Brock… A000031
## 17 79 1945-01-03 House NH 200 ADAMS, Sherm… A000046
## 18 73 1933-03-04 House DE 100 ADAMS, Wilbu… A000050
## 19 99 1985-01-03 House NY 100 ADDABBO, Jos… A000052
## 20 87 1961-01-03 House NJ 100 ADDONIZIO, H… A000054
## # ℹ 6 more variables: birthday <chr>, cmltv_cong <int>, cmltv_chamber <int>,
## # age_days <int>, age_years <dbl>, generation <chr>
Filtering records
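For the graph, I selected only the records in the “House” chamber.
I then applied the summarise() function to get the median age for each start date. Because df_latest keeps only the latest record for each name, each median is taken over the members whose last recorded term began on that date, not over every member serving at that time.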
df_house<-df_latest%>%filter(chamber=="House")
df_house_year<-df_house%>%group_by(start_date)%>%summarise(age=median(age_years))
df_house_result<-data.frame(df_house_year, type="House")
head(df_house_result,20)
## start_date age type
## 1 1919-03-04 50.67488 House
## 2 1921-03-04 51.18138 House
## 3 1923-03-04 52.51745 House
## 4 1925-03-04 54.89528 House
## 5 1927-03-04 56.43806 House
## 6 1929-03-04 56.68720 House
## 7 1931-03-04 56.78850 House
## 8 1933-03-04 56.60507 House
## 9 1935-01-03 55.28268 House
## 10 1937-01-03 50.95414 House
## 11 1939-01-03 54.27789 House
## 12 1941-01-03 50.55715 House
## 13 1943-01-03 54.08077 House
## 14 1945-01-03 52.87337 House
## 15 1947-01-03 53.25941 House
## 16 1949-01-03 54.63655 House
## 17 1951-01-03 55.76044 House
## 18 1953-01-03 55.65503 House
## 19 1955-01-03 56.16975 House
## 20 1957-01-03 60.40246 House
Filtering records
Selected only the records in the “Senate” chamber.
The summarise() function was also applied to get the median age for each start date.
df_Senate<-df_latest%>%filter(chamber=="Senate")
df_Senate_year<-df_Senate%>%group_by(start_date)%>%summarise(age=median(age_years))
df_Senate_result<-data.frame(df_Senate_year, type="Senate")
head(df_Senate_result,20)
## start_date age type
## 1 1919-03-04 61.16359 Senate
## 2 1921-03-04 59.26626 Senate
## 3 1923-03-04 64.02464 Senate
## 4 1925-03-04 62.82546 Senate
## 5 1927-03-04 64.79671 Senate
## 6 1929-03-04 62.46954 Senate
## 7 1931-03-04 61.67693 Senate
## 8 1933-03-04 61.12936 Senate
## 9 1935-01-03 60.50650 Senate
## 10 1937-01-03 57.80835 Senate
## 11 1939-01-03 64.27105 Senate
## 12 1941-01-03 60.50787 Senate
## 13 1943-01-03 54.42300 Senate
## 14 1945-01-03 62.84736 Senate
## 15 1947-01-03 60.07666 Senate
## 16 1949-01-03 60.85421 Senate
## 17 1951-01-03 54.12457 Senate
## 18 1953-01-03 63.00616 Senate
## 19 1955-01-03 61.67146 Senate
## 20 1957-01-03 60.82683 Senate
Combined filtered records
The two filtered results must be combined so that they can be shown together in the graph.
df_combined<-rbind(df_house_result, df_Senate_result)
head(df_combined,20)
## start_date age type
## 1 1919-03-04 50.67488 House
## 2 1921-03-04 51.18138 House
## 3 1923-03-04 52.51745 House
## 4 1925-03-04 54.89528 House
## 5 1927-03-04 56.43806 House
## 6 1929-03-04 56.68720 House
## 7 1931-03-04 56.78850 House
## 8 1933-03-04 56.60507 House
## 9 1935-01-03 55.28268 House
## 10 1937-01-03 50.95414 House
## 11 1939-01-03 54.27789 House
## 12 1941-01-03 50.55715 House
## 13 1943-01-03 54.08077 House
## 14 1945-01-03 52.87337 House
## 15 1947-01-03 53.25941 House
## 16 1949-01-03 54.63655 House
## 17 1951-01-03 55.76044 House
## 18 1953-01-03 55.65503 House
## 19 1955-01-03 56.16975 House
## 20 1957-01-03 60.40246 House
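The filter, summarise and rbind steps can also be written as a single grouped summary over both chambers. The sketch below is an alternative, not the method used above: it works on a made-up data frame with one row per member and term, so each median covers every member in that chamber for that start date. The names toy and medians are hypothetical, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: one row per member and term, two start dates, two chambers.
toy <- data.frame(
  start_date = rep(c("1951-01-03", "1953-01-03"), each = 4),
  chamber    = rep(c("House", "House", "Senate", "Senate"), times = 2),
  age_years  = c(40, 50, 60, 62, 42, 52, 58, 66),
  stringsAsFactors = FALSE
)

# One median per start date and chamber, already in long format for plotting.
medians <- toy %>%
  group_by(start_date, chamber) %>%
  summarise(age = median(age_years), .groups = "drop")

print(medians)
```

The result has the same three pieces of information as df_combined (date, median age, chamber), so it could be passed to the plotting step in the same way.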
Graph
A line graph was selected to show the median age of House and Senate members over the years.
library(ggplot2)
g<-ggplot(df_combined,aes(x=start_date, y=age, color=type, group=type))+
geom_line(linewidth=1)+
geom_point(size=3)+
labs(title="Median age of the U.S. Senate and U.S. House by Congress, 1919 to 2023", x="Years", y="Age")
g
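In the plot above start_date is a character column, so ggplot2 treats the x axis as discrete and prints every date as its own tick. Converting the column to Date gives a continuous time axis with readable year labels. A minimal sketch on made-up medians, assuming ggplot2 3.4.0 or later for the linewidth argument; toy and p are hypothetical names.

```r
library(ggplot2)

# Hypothetical toy data shaped like df_combined, with start_date stored as Date.
toy <- data.frame(
  start_date = as.Date(c("1951-01-03", "1953-01-03", "1951-01-03", "1953-01-03")),
  age        = c(45, 47, 61, 62),
  type       = c("House", "House", "Senate", "Senate")
)

# A Date on the x axis lets scale_x_date() control tick spacing and label format.
p <- ggplot(toy, aes(x = start_date, y = age, color = type)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  scale_x_date(date_labels = "%Y") +
  labs(title = "Median age by chamber (toy data)", x = "Year", y = "Age")

p
```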
The provided graph shows that “Congress today is older than it’s ever been”.