Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.
Note: Load the required packages up front so they are available throughout.
library(knitr)
library(XML)
library(jsonlite)
library(plyr)
Load the data from the New York Times API. An API key is required to access the API as a developer.
url.times <- ("http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107-113/nominees/updated.xml?api-key=685874ff71bb286631a8ea2c3f9989bb:16:74820193")
XML_doc <- htmlParse(url.times)
str(XML_doc)
## Classes 'HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>
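The request URL above is written out in full, including the API key. As a minimal sketch, the same URL could be assembled from its parts so that the congress range, the response format, and the key are easy to change. The helper `build_nominee_url` and the environment variable name `NYT_API_KEY` are assumptions made for illustration; neither is part of the original code or of any package.

```r
# Hypothetical helper (not from the original): assembles the request URL.
# NYT_API_KEY is an assumed environment variable name for storing the key.
build_nominee_url <- function(congress, format = c("xml", "json"),
                              api_key = Sys.getenv("NYT_API_KEY")) {
  format <- match.arg(format)  # restrict to the two formats used in this report
  sprintf(
    "http://api.nytimes.com/svc/politics/v3/us/legislative/congress/%s/nominees/updated.%s?api-key=%s",
    congress, format, api_key
  )
}

# "YOUR-KEY" is a placeholder, not a working key.
url.demo <- build_nominee_url("107-113", "xml", api_key = "YOUR-KEY")
```

Keeping the key in an environment variable avoids pasting it into a report that may be published.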
The parsed document is not a data frame yet, so we convert it to a list and then to a data frame.
nytimes_all <- ldply(xmlToList(XML_doc), data.frame)
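The list-to-data-frame conversion above can be tried offline on a small document. The sketch below assumes a made-up two-nomination XML string (`toy_xml` is hypothetical and only mimics element names seen in this feed). It uses `xmlParse` rather than `htmlParse`, so no `html`/`body` wrapper is added around the content.

```r
library(XML)
library(plyr)

# Hypothetical stand-in for the API response; the element names mimic the feed.
toy_xml <- paste0(
  "<nominations>",
  "<nomination><id>PN965</id><nominee_state>MI</nominee_state></nomination>",
  "<nomination><id>PN851</id><nominee_state>MI</nominee_state></nomination>",
  "</nominations>"
)

toy_doc  <- xmlParse(toy_xml, asText = TRUE)  # parse from a string, not a URL
toy_list <- xmlToList(toy_doc)                # one list element per <nomination>

# ldply() applies data.frame() to each element and row-binds the results.
toy_df <- ldply(toy_list, data.frame, stringsAsFactors = FALSE)
```

Because each `<nomination>` becomes its own list element here, `ldply` produces one row per nomination, with an extra `.id` column holding the element name.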
As you may have noticed, the nomination fields repeat across many columns of the flattened data frame. We therefore keep only the first 10 columns and rename them.
nytimes <- nytimes_all[, 1:10]
names(nytimes)[names(nytimes)==".id"] <- "id"
names(nytimes)[names(nytimes)=="result_set.status"] <- "status"
names(nytimes)[names(nytimes)=="result_set.copyright"] <- "copyright"
names(nytimes)[names(nytimes)=="result_set.results.congress"] <- "congress_results"
names(nytimes)[names(nytimes)=="result_set.results.num_results"] <- "num_results"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.id"] <- "nomination_id"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.uri"] <- "nomination_uri"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.date_received"] <- "date_received"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.description"] <- "description"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.nominee_state"] <- "state"
xmlSize(nytimes) # nytimes is a data frame here, so this counts its 10 columns
## [1] 10
nytimes[[1]]
## [1] "body"
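The rename statements above repeat the same pattern once per column. As a sketch of an alternative, the old-to-new mapping can be kept in one named vector and applied in a single step. The data frame `demo` is a toy stand-in for `nytimes` with only three of its columns, and `new_names` is a name introduced here for illustration.

```r
# Toy stand-in for nytimes (assumption: only three of the ten columns shown).
demo <- data.frame(.id = "body",
                   result_set.status = "OK",
                   result_set.results.congress = "107",
                   stringsAsFactors = FALSE)

# Old column name on the left, new column name on the right.
new_names <- c(
  ".id"                         = "id",
  "result_set.status"           = "status",
  "result_set.results.congress" = "congress_results"
)

# Rename only the columns that appear in the lookup; others stay untouched.
hits <- names(demo) %in% names(new_names)
names(demo)[hits] <- unname(new_names[names(demo)[hits]])
```

Columns missing from the lookup keep their original names, so the mapping can be extended one entry at a time.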
Here are the column names and the first rows of the extracted dataset.
names(nytimes)
## [1] "id" "status" "copyright"
## [4] "congress_results" "num_results" "nomination_id"
## [7] "nomination_uri" "date_received" "description"
## [10] "state"
kable(head(nytimes))
| id | status | copyright | congress_results | num_results | nomination_id | nomination_uri | date_received | description | state |
|---|---|---|---|---|---|---|---|---|---|
| body | OK | Copyright (c) 2016 The New York Times Company. All Rights Reserved. | 107 | 20 | PN965 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN965.xml | 2001-09-04 | John P. Walters, of Michigan, to be Director of National Drug Control Policy, vice Barry R. McCaffrey, resigned. | MI |
Let's now access the same dataset in a different format, JSON.
url.json <- ("http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107-113/nominees/updated.json?api-key=685874ff71bb286631a8ea2c3f9989bb:16:74820193")
# Read the JSON response from the URL and parse it with fromJSON
nytimes.json <- fromJSON(url.json)
# Setting it to dataframe for analysis
nytimes.json <- ldply(nytimes.json[4], data.frame)
kable(head(nytimes.json))
| .id | id | uri | date_received | description | nominee_state | committee_uri | latest_action_date | status |
|---|---|---|---|---|---|---|---|---|
| results | PN965 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN965.json | 2001-09-04 | John P. Walters, of Michigan, to be Director of National Drug Control Policy, vice Barry R. McCaffrey, resigned. | MI | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSJU.json | 2005-07-28 | Confirmed |
| results | PN851 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN851.json | 2001-09-04 | Dennis L. Schornack, of Michigan, to be Commissioner on the part of the United States on the International Joint Commission, United States and Canada, vice Thomas L. Baldini. | MI | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSFR.json | 2002-11-20 | Nomination Expired |
| results | PN817 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN817.json | 2001-09-04 | Thomas C. Dorr, of Iowa, to be a Member of the Board of Directors of the Commodity Credit Corporation, vice Jill L. Long, resigned. | IA | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSAF.json | 2002-11-20 | Nomination Expired |
| results | PN814 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN814.json | 2001-09-04 | Thomas C. Dorr, of Iowa, to be Under Secretary of Agriculture for Rural Development, vice Jill L. Long, resigned. | IA | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSAF.json | 2002-11-20 | Nomination Expired |
| results | PN923 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN923.json | 2001-09-04 | Marian Blank Horn, of Maryland, to be a Judge of the United States Court of Federal Claims for a term of fifteen years. (Reappointment) | MD | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSJU.json | 2002-11-20 | Nomination Expired |
| results | PN922 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN922.json | 2001-09-04 | Charles F. Lettow, of Virginia, to be a Judge of the United States Court of Federal Claims for a term of fifteen years, vice John Paul Wiese, term expiring. | VA | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSJU.json | 2002-11-20 | Nomination Expired |
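The JSON step above picks the results by position (`[4]`). The `.id` column in the table shows that this element is named `results`, so it can also be selected by name, which does not depend on the order of fields in the response. The sketch below runs offline on a made-up JSON string; `json_text` and its shape are assumptions modeled on the columns shown in the table, not a copy of the real response.

```r
library(jsonlite)

# Hypothetical miniature response; the field names mimic the table above.
json_text <- '{
  "status": "OK",
  "results": [
    {"id": "PN965", "nominee_state": "MI", "status": "Confirmed"},
    {"id": "PN851", "nominee_state": "MI", "status": "Nomination Expired"}
  ]
}'

parsed <- fromJSON(json_text)  # fromJSON() also accepts a JSON string directly

# An array of flat objects is simplified to a data frame by default.
nominees <- parsed$results
```

With this flat shape no further conversion is needed, because `fromJSON` already returns the array of records as a data frame.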