# Enter your name here: Patrick Smith
# 1. I did this homework by myself, with help from the book and the professor.
Reminders of things to practice from previous weeks:
Descriptive statistics: mean( ) max( ) min( )
Coerce to numeric: as.numeric( )
Below, I have provided a starter file to help you.
Each of these lines of code must be commented (the comment must explain what is going on, so that I know you understand the code and results).
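library(jsonlite)  # load the jsonlite package for parsing JSON
dataset <- url("https://intro-datascience.s3.us-east-2.amazonaws.com/role.json")  # open a connection to the role JSON file
readlines <- jsonlite::fromJSON(dataset)  # read the JSON and parse it into a nested R list
df <- readlines$objects$person  # pull out the per-person records as a data frame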
head(df)  # preview the first six rows
bioguideid birthday cspanid firstname gender gender_label lastname link middlename
1 C000880 1951-05-20 26440 Michael male Male Crapo https://www.govtrack.us/congress/members/michael_crapo/300030 D.
2 G000386 1933-09-17 1167 Charles male Male Grassley https://www.govtrack.us/congress/members/charles_grassley/300048 E.
3 L000174 1940-03-31 1552 Patrick male Male Leahy https://www.govtrack.us/congress/members/patrick_leahy/300065 J.
4 M001153 1957-05-22 1004138 Lisa female Female Murkowski https://www.govtrack.us/congress/members/lisa_murkowski/300075 A.
5 M001111 1950-10-11 25277 Patty female Female Murray https://www.govtrack.us/congress/members/patty_murray/300076
6 S000148 1950-11-23 5929 Charles male Male Schumer https://www.govtrack.us/congress/members/charles_schumer/300087 E.
name namemod nickname osid pvsid sortname twitterid youtubeid
1 Sen. Michael “Mike” Crapo [R-ID] Mike N00006267 26830 Crapo, Michael “Mike” (Sen.) [R-ID] MikeCrapo senatorcrapo
2 Sen. Charles “Chuck” Grassley [R-IA] Chuck N00001758 53293 Grassley, Charles “Chuck” (Sen.) [R-IA] ChuckGrassley senchuckgrassley
3 Sen. Patrick Leahy [D-VT] N00009918 53353 Leahy, Patrick (Sen.) [D-VT] SenatorLeahy SenatorPatrickLeahy
4 Sen. Lisa Murkowski [R-AK] N00026050 15841 Murkowski, Lisa (Sen.) [R-AK] LisaMurkowski senatormurkowski
5 Sen. Patty Murray [D-WA] N00007876 53358 Murray, Patty (Sen.) [D-WA] PattyMurray SenatorPattyMurray
6 Sen. Charles “Chuck” Schumer [D-NY] Chuck N00001093 26976 Schumer, Charles “Chuck” (Sen.) [D-NY] SenSchumer SenatorSchumer
summary(df)  # summarize each column (type, length, and quartiles for numeric columns)
bioguideid birthday cspanid firstname gender gender_label lastname link
Length:100 Length:100 Min. : 260 Length:100 Length:100 Length:100 Length:100 Length:100
Class :character Class :character 1st Qu.: 25277 Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Median : 68489 Mode :character Mode :character Mode :character Mode :character Mode :character
Mean : 584001
3rd Qu.:1004138
Max. :9269028
NA's :11
middlename name namemod nickname osid pvsid sortname twitterid
Length:100 Length:100 Length:100 Length:100 Length:100 Length:100 Length:100 Length:100
Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character
youtubeid
Length:100
Class :character
Mode :character
nrow(df)  # number of rows: one per senator
[1] 100
ncol(df)  # number of columns (attributes)
[1] 17
colnames(df)  # names of the 17 columns
[1] "bioguideid" "birthday" "cspanid" "firstname" "gender" "gender_label" "lastname" "link" "middlename"
[10] "name" "namemod" "nickname" "osid" "pvsid" "sortname" "twitterid" "youtubeid"
C. What does running this line of code do? Explain in a comment:
vals <- substr(df$birthday,1,4)
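`substr()` extracts part of a string. Here `substr(df$birthday, 1, 4)` keeps characters 1 through 4 of each birthday, which is the four-digit birth year: for Michael Crapo, "1951-05-20" becomes "1951". The result is still a character vector, not a number.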
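As a small illustration of the explanation above, the sketch below applies `substr()` to two of the birthdays shown in the data. The vector names `birthdays` and `years` are made up for this example.

```{r}
birthdays <- c("1951-05-20", "1933-09-17")  # two birthdays from the rows shown earlier
years <- substr(birthdays, 1, 4)            # characters 1 to 4 of each string
years                                       # "1951" "1933"
class(years)                                # "character", so convert before doing arithmetic
as.numeric(years) + 1                       # 1952 1934
```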
D. Create a new attribute ‘age’ - how old the person is. Hint: You may need to convert it to numeric first.
```{r}
old <- as.numeric(vals)  # convert the year strings to numbers
age <- 2021 - old        # approximate age: 2021 minus the birth year
age
```
[1] 70 88 81 64 71 71 87 72 71 71 66 67 66 60 62 60 57 49 53 56 75 50 58 64 50 66 49 53 70 63 57 63 74 88 71 59 69 69 78 67 80 70 48 74 55 61 65 61
[49] 66 69 50 74 45 72 77 60 70 51 63 64 69 67 42 74 57 69 69 77 66 87 79 65 72 78 74 67 58 68 64 49 75 63 44 59 52 57 51 61 67 49 61 63 62 67 67 69
[97] 62 48 34 52
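Subtracting the birth year from 2021 ignores whether the birthday has already passed in that year. The sketch below is one possible date-aware version, assuming birthdays in the `YYYY-MM-DD` form used by `df$birthday`. The function name `ageOn` and the parameter `refDate` are hypothetical and not part of the assignment.

```{r}
# Hypothetical helper: age in whole years on a given reference date
ageOn <- function(birthday, refDate) {
  b <- as.Date(birthday)
  years <- as.numeric(format(refDate, "%Y")) - as.numeric(format(b, "%Y"))
  # Compare month-day strings to see whether the birthday has occurred yet
  hadBirthday <- format(refDate, "%m%d") >= format(b, "%m%d")
  years - ifelse(hadBirthday, 0, 1)
}

ageOn(c("1951-05-20", "1933-09-17"), as.Date("2021-06-01"))  # 70 87
```

On 2021-06-01 the second senator had not yet reached the September birthday, so the result is 87 rather than the 88 that year subtraction gives.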
E. Create a function that reads in the role json dataset, and adds the age attribute to the dataframe, and returns that dataframe
```{r}
df <- data.frame(df, age)  # append age to the data frame as a new column
head(df)                   # confirm the new column is present
```
bioguideid birthday cspanid firstname gender gender_label lastname link middlename
1 C000880 1951-05-20 26440 Michael male Male Crapo https://www.govtrack.us/congress/members/michael_crapo/300030 D.
2 G000386 1933-09-17 1167 Charles male Male Grassley https://www.govtrack.us/congress/members/charles_grassley/300048 E.
3 L000174 1940-03-31 1552 Patrick male Male Leahy https://www.govtrack.us/congress/members/patrick_leahy/300065 J.
4 M001153 1957-05-22 1004138 Lisa female Female Murkowski https://www.govtrack.us/congress/members/lisa_murkowski/300075 A.
5 M001111 1950-10-11 25277 Patty female Female Murray https://www.govtrack.us/congress/members/patty_murray/300076
6 S000148 1950-11-23 5929 Charles male Male Schumer https://www.govtrack.us/congress/members/charles_schumer/300087 E.
name namemod nickname osid pvsid sortname twitterid youtubeid age
1 Sen. Michael “Mike” Crapo [R-ID] Mike N00006267 26830 Crapo, Michael “Mike” (Sen.) [R-ID] MikeCrapo senatorcrapo 70
2 Sen. Charles “Chuck” Grassley [R-IA] Chuck N00001758 53293 Grassley, Charles “Chuck” (Sen.) [R-IA] ChuckGrassley senchuckgrassley 88
3 Sen. Patrick Leahy [D-VT] N00009918 53353 Leahy, Patrick (Sen.) [D-VT] SenatorLeahy SenatorPatrickLeahy 81
4 Sen. Lisa Murkowski [R-AK] N00026050 15841 Murkowski, Lisa (Sen.) [R-AK] LisaMurkowski senatormurkowski 64
5 Sen. Patty Murray [D-WA] N00007876 53358 Murray, Patty (Sen.) [D-WA] PattyMurray SenatorPattyMurray 71
6 Sen. Charles “Chuck” Schumer [D-NY] Chuck N00001093 26976 Schumer, Charles “Chuck” (Sen.) [D-NY] SenSchumer SenatorSchumer 71
F. Use (call, invoke) the function, and store the results in df
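```{r}
# Define the function under its own name (addAge is an editorial choice);
# assigning the function to df would overwrite the data frame.
addAge <- function(df, age) {
  df$age <- age  # add the age column, or refresh it if it already exists
  df
}
df <- addAge(df, age)  # call the function and store the result in df
```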
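Part E asks for a function that reads the dataset itself, adds `age`, and returns the data frame. The following is a minimal sketch of that shape, not a definitive answer. The names `readRoles`, `refYear`, `sampleJson`, and `sampleDf` are hypothetical, and the two-record JSON string is a stand-in built from birthdays shown earlier so that the example runs without a network connection.

```{r}
library(jsonlite)

# Hypothetical helper: src can be anything fromJSON() accepts,
# such as a JSON string or a url() connection.
readRoles <- function(src, refYear = 2021) {
  roles <- jsonlite::fromJSON(src)
  people <- roles$objects$person                          # per-person records
  birthYear <- as.numeric(substr(people$birthday, 1, 4))  # year part of the birthday
  people$age <- refYear - birthYear                       # add the age attribute
  people                                                  # return the data frame
}

# Two-record sample shaped like the role data (an assumption for illustration)
sampleJson <- '{"objects":[
  {"person":{"lastname":"Crapo","birthday":"1951-05-20"}},
  {"person":{"lastname":"Grassley","birthday":"1933-09-17"}}]}'
sampleDf <- readRoles(sampleJson)
sampleDf$age  # 70 88
```

With the real data, part F would then be a call such as `df <- readRoles(url("https://intro-datascience.s3.us-east-2.amazonaws.com/role.json"))`.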
## Part 2: Investigate the resulting dataframe 'df'
A. How many senators are women?
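```{r}
# Count the senators whose gender is 'female'
sum(df$gender == 'female')
# [1] 24
nrow(df[df$gender == 'female', ])  # same count, by subsetting the rows
# [1] 24

# Senators with a YouTube account: count the non-missing youtubeid values.
# Comparing youtubeid with the literal string 'youtubeid' only reached 73
# because the rows with a missing (NA) youtubeid were counted and subtracted from 100.
sum(!is.na(df$youtubeid))
# [1] 73 Senators have a YouTube Account

# Women senators with a YouTube account (result as recorded in the original run)
sum(df$gender == 'female' & !is.na(df$youtubeid))
# [1] 16
```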
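```{r}
hist(age)  # histogram of senator ages
```
Most senators are in their 60s or early 70s, and the histogram has a roughly triangular (pyramid) shape. Only one of the 100 ages listed above is under 40 (age 34), so the female senators cannot mostly be under 40. Any link between age and YouTube use would need a direct comparison of ages for senators with and without an account.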
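One way to check a claim about age, gender, and YouTube use is to compare groups directly. The sketch below uses a made-up five-row data frame (`toy`) with the same column names as `df`. The values are invented for illustration and say nothing about the real senators.

```{r}
# Hypothetical toy data with the same column names as df
toy <- data.frame(
  gender = c("female", "female", "male", "male", "male"),
  youtubeid = c("a", NA, "b", "c", NA),
  age = c(50, 64, 70, 81, 58),
  stringsAsFactors = FALSE
)

toy$hasYoutube <- !is.na(toy$youtubeid)                    # TRUE when an account is listed
shareByGender <- tapply(toy$hasYoutube, toy$gender, mean)  # share with an account, by gender
medianAge <- tapply(toy$age, toy$hasYoutube, median)       # median age, by account status

shareByGender
medianAge
```

Replacing `toy` with `df` would give the corresponding shares and median ages for the senators.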