# Enter your name here: Merissah Gilbert
# 1. I did this homework by myself, with help from the book and the professor.
Reminders of things to practice from previous weeks:
Descriptive statistics: mean( ) max( ) min( )
Coerce to numeric: as.numeric( )
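As a quick refresher on the reminders above, here is a minimal sketch of coercing character values to numeric and then summarizing them. The vector `years_chr` and its values are made up for illustration and are not part of the starter file.

```r
# Hypothetical values for illustration only; they are not read from the homework dataset.
years_chr <- c("1951", "1933", "1940")

# Coerce the character vector to numeric before doing arithmetic on it.
years_num <- as.numeric(years_chr)

mean(years_num)  # average of the three values
max(years_num)   # largest value
min(years_num)   # smallest value
```

Calling mean( ) directly on `years_chr` would return NA with a warning, which is why the coercion step comes first.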
Below, I have provided a starter file to help you.
Each of these lines of code must be commented (the comment must explain what is going on, so that I know you understand the code and results).
library(jsonlite)
#this line of code tells R to load the jsonlite package (already installed) into the current R session so its functions can be used
dataset <- url("https://intro-datascience.s3.us-east-2.amazonaws.com/role.json")
#this line of code creates a connection to the JSON file at the url and assigns it to 'dataset' (the name we are calling the connection to the data)
readlines <- jsonlite::fromJSON(dataset)
#this line of code uses fromJSON from jsonlite to read the JSON data from 'dataset', convert it into R objects, and assign the result to the name 'readlines'
df <- readlines$objects$person
#this line of code assigns the name 'df' to the 'person' data frame, which is nested inside 'objects' in the 'readlines' result
str(df)
## 'data.frame': 100 obs. of 17 variables:
## $ bioguideid : chr "C000880" "G000386" "L000174" "M001153" ...
## $ birthday : chr "1951-05-20" "1933-09-17" "1940-03-31" "1957-05-22" ...
## $ cspanid : int 26440 1167 1552 1004138 25277 5929 1859 1962 45465 92069 ...
## $ firstname : chr "Michael" "Charles" "Patrick" "Lisa" ...
## $ gender : chr "male" "male" "male" "female" ...
## $ gender_label: chr "Male" "Male" "Male" "Female" ...
## $ lastname : chr "Crapo" "Grassley" "Leahy" "Murkowski" ...
## $ link : chr "https://www.govtrack.us/congress/members/michael_crapo/300030" "https://www.govtrack.us/congress/members/charles_grassley/300048" "https://www.govtrack.us/congress/members/patrick_leahy/300065" "https://www.govtrack.us/congress/members/lisa_murkowski/300075" ...
## $ middlename : chr "D." "E." "J." "A." ...
## $ name : chr "Sen. Michael “Mike” Crapo [R-ID]" "Sen. Charles “Chuck” Grassley [R-IA]" "Sen. Patrick Leahy [D-VT]" "Sen. Lisa Murkowski [R-AK]" ...
## $ namemod : chr "" "" "" "" ...
## $ nickname : chr "Mike" "Chuck" "" "" ...
## $ osid : chr "N00006267" "N00001758" "N00009918" "N00026050" ...
## $ pvsid : chr "26830" "53293" "53353" "15841" ...
## $ sortname : chr "Crapo, Michael “Mike” (Sen.) [R-ID]" "Grassley, Charles “Chuck” (Sen.) [R-IA]" "Leahy, Patrick (Sen.) [D-VT]" "Murkowski, Lisa (Sen.) [R-AK]" ...
## $ twitterid : chr "MikeCrapo" "ChuckGrassley" "SenatorLeahy" "LisaMurkowski" ...
## $ youtubeid : chr "senatorcrapo" "senchuckgrassley" "SenatorPatrickLeahy" "senatormurkowski" ...
head(df)
## bioguideid birthday cspanid firstname gender gender_label lastname
## 1 C000880 1951-05-20 26440 Michael male Male Crapo
## 2 G000386 1933-09-17 1167 Charles male Male Grassley
## 3 L000174 1940-03-31 1552 Patrick male Male Leahy
## 4 M001153 1957-05-22 1004138 Lisa female Female Murkowski
## 5 M001111 1950-10-11 25277 Patty female Female Murray
## 6 S000148 1950-11-23 5929 Charles male Male Schumer
## link middlename
## 1 https://www.govtrack.us/congress/members/michael_crapo/300030 D.
## 2 https://www.govtrack.us/congress/members/charles_grassley/300048 E.
## 3 https://www.govtrack.us/congress/members/patrick_leahy/300065 J.
## 4 https://www.govtrack.us/congress/members/lisa_murkowski/300075 A.
## 5 https://www.govtrack.us/congress/members/patty_murray/300076
## 6 https://www.govtrack.us/congress/members/charles_schumer/300087 E.
## name namemod nickname osid pvsid
## 1 Sen. Michael “Mike” Crapo [R-ID] Mike N00006267 26830
## 2 Sen. Charles “Chuck” Grassley [R-IA] Chuck N00001758 53293
## 3 Sen. Patrick Leahy [D-VT] N00009918 53353
## 4 Sen. Lisa Murkowski [R-AK] N00026050 15841
## 5 Sen. Patty Murray [D-WA] N00007876 53358
## 6 Sen. Charles “Chuck” Schumer [D-NY] Chuck N00001093 26976
## sortname twitterid youtubeid
## 1 Crapo, Michael “Mike” (Sen.) [R-ID] MikeCrapo senatorcrapo
## 2 Grassley, Charles “Chuck” (Sen.) [R-IA] ChuckGrassley senchuckgrassley
## 3 Leahy, Patrick (Sen.) [D-VT] SenatorLeahy SenatorPatrickLeahy
## 4 Murkowski, Lisa (Sen.) [R-AK] LisaMurkowski senatormurkowski
## 5 Murray, Patty (Sen.) [D-WA] PattyMurray SenatorPattyMurray
## 6 Schumer, Charles “Chuck” (Sen.) [D-NY] SenSchumer SenatorSchumer
#there are 100 rows in the dataset and each row is a US Senator.
#There are 17 columns representing identifiable information on the senators such as first name, last name, birthday, gender, link to senator page, youtube and twitter names, etc.
C. What does running this line of code do? Explain in a comment:
vals <- substr(df$birthday,1,4)
#running this line of code takes the first 4 characters (the birth year) from each value in the birthday column of the data frame 'df' and assigns them to the name 'vals'
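To make the substr( ) step concrete, here is a small sketch on a single birthday string in the same "YYYY-MM-DD" format that str(df) shows for df$birthday. The variable names `birthday` and `year_chr` are only for illustration.

```r
# A single birthday string in the same "YYYY-MM-DD" format as df$birthday.
birthday <- "1951-05-20"

# Keep characters 1 through 4, which hold the year.
year_chr <- substr(birthday, 1, 4)

year_chr         # "1951"
class(year_chr)  # "character"
```

The result is still text, which is why the next step converts it with as.numeric( ).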
D. Create a new attribute ‘age’ - how old the person is. Hint: You may need to convert it to numeric first.
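vals <- as.numeric(vals)
#this line of code converts the birth years in 'vals' from character to numeric so they can be used in arithmetic
age <- 2024 - vals
#this line of code subtracts each birth year from 2024 to get an approximate age and assigns it to 'age'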
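Subtracting the birth year from 2024 ignores whether the birthday has already happened in that year. As an optional sketch (not required by the assignment), the age could be computed against a full reference date instead. The function name `exact_age` and its `on` argument are hypothetical names chosen for this example.

```r
# Hypothetical helper: age in whole years on a given reference date.
exact_age <- function(birthday, on = Sys.Date()) {
  b <- as.POSIXlt(as.Date(birthday))  # birthday broken into year/month/day parts
  r <- as.POSIXlt(as.Date(on))        # reference date broken into the same parts
  years <- r$year - b$year            # difference in calendar years
  # TRUE when the birthday has already occurred in the reference year.
  had_birthday <- (r$mon > b$mon) | (r$mon == b$mon & r$mday >= b$mday)
  years - ifelse(had_birthday, 0, 1)
}

exact_age("1951-05-20", on = "2024-01-15")  # 72, birthday not yet reached
exact_age("1951-05-20", on = "2024-05-20")  # 73, birthday reached
```

For this homework the year-only subtraction is a reasonable approximation; the sketch just shows where the two approaches can differ by one year.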
E. Create a function that reads in the role json dataset, and adds the age attribute to the dataframe, and returns that dataframe
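newfun <- function(df){
  #take the birth year from each birthday, convert it to numeric, and subtract it from 2024 to get the age
  df$age <- 2024 - as.numeric(substr(df$birthday, 1, 4))
  #return the data frame with the new age column added
  return(df)
}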
F. Use (call, invoke) the function, and store the results in df
df <- newfun(df)
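Part E asks for a function that also reads in the role json dataset. One possible sketch of that version is shown here, under a few assumptions: the name `read_roles_with_age` and the arguments `json_source` and `current_year` are hypothetical, and the small inline JSON string only imitates the objects/person nesting of the real file so the example can run without a network connection.

```r
library(jsonlite)

# Hypothetical helper: 'json_source' may be a URL, a file path, or a JSON string,
# all of which jsonlite::fromJSON() accepts.
read_roles_with_age <- function(json_source, current_year = 2024) {
  roles <- jsonlite::fromJSON(json_source)  # parse the JSON into R objects
  people <- roles$objects$person            # pull out the nested 'person' data frame
  # Birth year is the first 4 characters of the birthday string.
  people$age <- current_year - as.numeric(substr(people$birthday, 1, 4))
  people
}

# Tiny inline JSON with the same nesting as the role file (two rows only).
sample_json <- '{"objects": [
  {"person": {"firstname": "Michael", "birthday": "1951-05-20"}},
  {"person": {"firstname": "Charles", "birthday": "1933-09-17"}}
]}'

sample_df <- read_roles_with_age(sample_json)
sample_df$age
```

Passing the assignment's url in place of `sample_json` should work the same way, assuming the file is reachable and keeps the structure shown by str(df).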
library(dbplyr)
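sum(df$gender_label=="Female")
## [1] 24
#this line of code counts the senators whose gender_label is "Female"; there are 24
length(na.omit(df$youtubeid))
## [1] 73
#this line of code drops the missing (NA) values from youtubeid and counts what is left; 73 senators have a YouTube ID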
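One thing to keep in mind with this count: na.omit( ) removes NA values only, so an empty string "" would still be counted. Whether youtubeid contains any empty strings is not shown in the output above, so the sketch below uses a made-up vector `ids` just to illustrate the difference.

```r
# Hypothetical vector mixing real-looking IDs, a missing value, and an empty string.
ids <- c("senatorcrapo", NA, "", "senatormurkowski")

# Counts every non-NA entry, including the empty string.
length(na.omit(ids))          # 3

# Counts only entries that are neither NA nor empty.
sum(!is.na(ids) & ids != "")  # 2
```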
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:dbplyr':
##
## ident, sql
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
womenyout <- function(df){
df %>%
filter(!is.na(youtubeid), gender_label=="Female")
}
nrow(womenyout(df))
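## [1] 16
#the function womenyout keeps only the rows of df where youtubeid is not missing and gender_label is "Female"; nrow counts those rows, so 16 women senators have a YouTube ID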
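youtubeWomen <- womenyout(df)
#this line of code stores the women senators who have a YouTube ID in 'youtubeWomen'
hist(youtubeWomen$age)
#this line of code draws a histogram of the ages of the women senators who have a YouTube ID
hist(df$age)
#this line of code draws a histogram of the ages of all 100 senators
#the histograms' shapes roughly resemble a normal distribution, with a few outliers or anomalies.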