Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.

Note: Load the required packages first so that their functions are available.

library(knitr)
library(XML)
library(jsonlite)
library(plyr)

Load the data from the New York Times API. An API key, issued to registered developers, is required to access it.

url.times <- ("http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107-113/nominees/updated.xml?api-key=685874ff71bb286631a8ea2c3f9989bb:16:74820193")
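Hard-coding the key in the URL works, but it also publishes the key with the report. As a minimal sketch of one alternative, the helper below builds the same request URL from a key stored in an environment variable. The function name `build_nyt_url` and the variable name `NYT_API_KEY` are assumptions made for this illustration, not part of the NYT API.

```r
# Hypothetical helper: builds the nominees URL used in this report.
# NYT_API_KEY is an assumed environment variable name.
build_nyt_url <- function(format = c("json", "xml"),
                          api_key = Sys.getenv("NYT_API_KEY")) {
  format <- match.arg(format)
  if (!nzchar(api_key)) {
    stop("No API key found; set the NYT_API_KEY environment variable.")
  }
  paste0(
    "http://api.nytimes.com/svc/politics/v3/us/legislative/congress/",
    "107-113/nominees/updated.", format,
    "?api-key=", api_key
  )
}

# Usage with a placeholder key passed explicitly
example_url <- build_nyt_url("xml", api_key = "YOUR-KEY")
```

The key can then be set once per session with `Sys.setenv(NYT_API_KEY = "...")` or in `.Renviron`, so it never appears in the knitted output.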

XML_doc   <- htmlParse(url.times)

str(XML_doc)
## Classes 'HTMLInternalDocument', 'HTMLInternalDocument', 'XMLInternalDocument', 'XMLAbstractDocument' <externalptr>
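Note that `htmlParse()` treats its input as HTML, so the parsed tree is wrapped in `html` and `body` nodes; this is likely why "body" appears as the first value later on. The sketch below compares the two parsers on a small inline string. The XML snippet is made up for illustration and is not real API output.

```r
library(XML)

# Toy XML string standing in for the API response (an assumption)
xml_text <- "<result_set><status>OK</status></result_set>"

# Parse the same text as HTML and as XML
doc_html <- htmlParse(xml_text, asText = TRUE)
doc_xml  <- xmlParse(xml_text, asText = TRUE)

# Compare the root node names of the two trees
root_html <- xmlName(xmlRoot(doc_html))
root_xml  <- xmlName(xmlRoot(doc_xml))
print(c(html = root_html, xml = root_xml))
```

With `xmlParse()` the root is the document's own top-level element, which would give shorter column names after conversion.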

The parsed result is an XML document object, not a data frame. We therefore convert it to a list with xmlToList() and then to a data frame with ldply().

nytimes_all <- ldply(xmlToList(XML_doc), data.frame)
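To make the list-to-data-frame step easier to follow, here is a small sketch of the same `xmlToList()` plus `ldply()` pattern on an inline XML string. The element names mirror the nominee fields shown in this report, but the snippet itself is a made-up stand-in for the API response.

```r
library(XML)
library(plyr)

# Toy XML with two records (written without whitespace between tags)
xml_text <- paste0(
  "<nominations>",
  "<nomination><id>PN965</id><nominee_state>MI</nominee_state></nomination>",
  "<nomination><id>PN817</id><nominee_state>IA</nominee_state></nomination>",
  "</nominations>"
)

doc <- xmlParse(xml_text, asText = TRUE)

# One list element per <nomination>, each a list of its child values
records <- xmlToList(doc)

# Turn each record into a one-row data frame and stack the rows
toy_df <- ldply(records, data.frame, stringsAsFactors = FALSE)
print(toy_df)
```

Each record becomes one row, and `ldply()` adds an `.id` column holding the name of the list element the row came from.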

The resulting data frame contains many repeating columns, so we keep only the first 10 and give them shorter names.

nytimes <- nytimes_all[, 1:10]

names(nytimes)[names(nytimes)==".id"] <- "id"
names(nytimes)[names(nytimes)=="result_set.status"] <- "status"
names(nytimes)[names(nytimes)=="result_set.copyright"] <- "copyright"
names(nytimes)[names(nytimes)=="result_set.results.congress"] <- "congress_results"
names(nytimes)[names(nytimes)=="result_set.results.num_results"] <- "num_results"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.id"] <- "nomination_id"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.uri"] <- "nomination_uri"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.date_received"] <- "date_received"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.description"] <- "description"
names(nytimes)[names(nytimes)=="result_set.results.nominations.nomination.nominee_state"] <- "state"
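The one-line-per-column renaming above can also be written with a single lookup vector. The following is a sketch in base R on a toy data frame; `lookup` and `toy` are names chosen for this example, and only two of the ten columns are shown.

```r
# Toy data frame with two of the long column names from this report
toy <- data.frame(
  .id = "body",
  result_set.status = "OK",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Named vector: names are the old column names, values are the new ones
lookup <- c(
  ".id" = "id",
  "result_set.status" = "status"
)

# Rename only the columns that appear in the lookup
hits <- names(toy) %in% names(lookup)
names(toy)[hits] <- unname(lookup[names(toy)[hits]])

print(names(toy))
```

Columns that are not listed in `lookup` keep their original names, so the same vector can be extended to all ten columns.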

xmlSize(nytimes) # number of elements; for this data frame, its 10 columns
## [1] 10
nytimes[[1]]
## [1] "body"

Here are the column names and the first rows of the extracted dataset:

names(nytimes)
##  [1] "id"               "status"           "copyright"       
##  [4] "congress_results" "num_results"      "nomination_id"   
##  [7] "nomination_uri"   "date_received"    "description"     
## [10] "state"
kable(head(nytimes))
| id | status | copyright | congress_results | num_results | nomination_id | nomination_uri | date_received | description | state |
|---|---|---|---|---|---|---|---|---|---|
| body | OK | Copyright (c) 2016 The New York Times Company. All Rights Reserved. | 107 | 20 | PN965 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN965.xml | 2001-09-04 | John P. Walters, of Michigan, to be Director of National Drug Control Policy, vice Barry R. McCaffrey, resigned. | MI |

Let's now access the same dataset in a different format, JSON.

url.json <- ("http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107-113/nominees/updated.json?api-key=685874ff71bb286631a8ea2c3f9989bb:16:74820193")

# Read the JSON from the URL and parse it into an R list using fromJSON

nytimes.json <- fromJSON(url.json)

# Convert the fourth element (results) to a data frame for analysis

nytimes.json <- ldply(nytimes.json[4], data.frame)
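Selecting the results by position (`[4]`) depends on the order of fields in the response. As a hedged sketch, the example below parses a small inline JSON string and picks the `results` element by name instead. The JSON text is a made-up stand-in whose field names follow the table in this report; the real response has more fields.

```r
library(jsonlite)

# Toy JSON standing in for the API response (an assumption)
json_text <- paste0(
  '{"status":"OK","results":[',
  '{"id":"PN965","nominee_state":"MI"},',
  '{"id":"PN817","nominee_state":"IA"}',
  ']}'
)

parsed <- fromJSON(json_text)

# fromJSON simplifies an array of records into a data frame by default
nominations <- parsed$results
print(nominations)
```

Because `fromJSON()` already returns the array of records as a data frame, accessing it by name avoids both the positional index and the extra `.id` column.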

kable(head(nytimes.json))
| .id | id | uri | date_received | description | nominee_state | committee_uri | latest_action_date | status |
|---|---|---|---|---|---|---|---|---|
| results | PN965 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN965.json | 2001-09-04 | John P. Walters, of Michigan, to be Director of National Drug Control Policy, vice Barry R. McCaffrey, resigned. | MI | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSJU.json | 2005-07-28 | Confirmed |
| results | PN851 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN851.json | 2001-09-04 | Dennis L. Schornack, of Michigan, to be Commissioner on the part of the United States on the International Joint Commission, United States and Canada, vice Thomas L. Baldini. | MI | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSFR.json | 2002-11-20 | Nomination Expired |
| results | PN817 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN817.json | 2001-09-04 | Thomas C. Dorr, of Iowa, to be a Member of the Board of Directors of the Commodity Credit Corporation, vice Jill L. Long, resigned. | IA | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSAF.json | 2002-11-20 | Nomination Expired |
| results | PN814 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN814.json | 2001-09-04 | Thomas C. Dorr, of Iowa, to be Under Secretary of Agriculture for Rural Development, vice Jill L. Long, resigned. | IA | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSAF.json | 2002-11-20 | Nomination Expired |
| results | PN923 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN923.json | 2001-09-04 | Marian Blank Horn, of Maryland, to be a Judge of the United States Court of Federal Claims for a term of fifteen years. (Reappointment) | MD | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSJU.json | 2002-11-20 | Nomination Expired |
| results | PN922 | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/nominees/PN922.json | 2001-09-04 | Charles F. Lettow, of Virginia, to be a Judge of the United States Court of Federal Claims for a term of fifteen years, vice John Paul Wiese, term expiring. | VA | http://api.nytimes.com/svc/politics/v3/us/legislative/congress/107/senate/committees/SSJU.json | 2002-11-20 | Nomination Expired |