The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs. You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.

Loading libraries

# To parse JSON data into a data frame
require(jsonlite)
## Loading required package: jsonlite
# To display R data objects as tables on HTML pages
library(DT)

URI Structure

We use the Article Search API from The New York Times to retrieve articles matching our search terms. The request URI is built by joining the endpoint URL and the API key acquired from The New York Times with the paste() command.

baseUrl <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
apiKey <- "?api-key=1b5b89d9860f42cba5d3db52032f1ef8"
uri <- paste(baseUrl, apiKey, sep="") 
uri
## [1] "https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=1b5b89d9860f42cba5d3db52032f1ef8"
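
As a variation on the step above, the key can be read from an environment variable rather than typed into the script. This is only a minimal sketch: the variable name NYT_API_KEY and the placeholder fallback are assumptions for illustration, not something the API or this write-up prescribes.

```r
# Read the key from an environment variable.
# NYT_API_KEY is a hypothetical name; set it yourself, e.g. in ~/.Renviron.
# "YOUR_API_KEY" is a placeholder used only when the variable is not set.
apiKey <- Sys.getenv("NYT_API_KEY", unset = "YOUR_API_KEY")

baseUrl <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# paste0() is shorthand for paste(..., sep = "")
uri <- paste0(baseUrl, "?api-key=", apiKey)
```

The resulting uri has the same shape as the one built above, so the later steps work unchanged.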

Adding a query

The q parameter holds the search term. The search is performed on the article body, headline, byline, and other fields. Like the API key, the query is pasted onto the URI. We use the URLencode function to percent-encode characters that are not allowed in a URL, such as the space in the search term.

query <- "&q=Data Science"
uri <- paste(uri, URLencode(query), sep="")
uri
## [1] "https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=1b5b89d9860f42cba5d3db52032f1ef8&q=Data%20Science"
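
By default URLencode leaves reserved characters such as & and = untouched, which is why the "&q=" prefix above survives encoding. If a search term itself could contain such characters, one option is to encode only the term with reserved = TRUE. The sketch below assumes a hypothetical helper name, addQuery, and a made-up example URI.

```r
# Hypothetical helper: append a search term to an existing URI,
# percent-encoding only the term so that "&q=" itself stays intact.
addQuery <- function(uri, term) {
  paste0(uri, "&q=", URLencode(term, reserved = TRUE))
}

# Default encoding: only characters that are not allowed in a URL are escaped
plain <- URLencode("Data Science")

# reserved = TRUE also escapes characters with a special meaning, such as "&"
example <- addQuery("https://example.org/search.json?api-key=KEY", "R & D")
```

Here plain is "Data%20Science", matching the URI printed above, while the ampersand inside "R & D" is escaped as %26 instead of being read as a parameter separator.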

Convert to R dataframe

Now we use the fromJSON function to parse the JSON data into R objects. Once it is parsed, we take the docs portion of the response, which holds the article records as a data frame.

jsonContent <- fromJSON(uri)
df <- jsonContent$response$docs
colnames(df)
##  [1] "web_url"          "snippet"          "print_page"      
##  [4] "blog"             "source"           "multimedia"      
##  [7] "headline"         "keywords"         "pub_date"        
## [10] "document_type"    "new_desk"         "byline"          
## [13] "type_of_material" "_id"              "word_count"      
## [16] "score"            "uri"              "section_name"
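
To see how fromJSON arrives at this structure without calling the API, the same steps can be tried on a small inline string. The JSON below is made up for illustration and only mimics the shape of the response; it is not real API output.

```r
library(jsonlite)

# Made-up payload that imitates the response layout (not real API data)
sampleJson <- '{
  "status": "OK",
  "response": {
    "docs": [
      {"web_url": "https://example.org/a", "headline": {"main": "First"},  "word_count": 100},
      {"web_url": "https://example.org/b", "headline": {"main": "Second"}, "word_count": 250}
    ]
  }
}'

parsed <- fromJSON(sampleJson)

# An array of JSON objects is simplified to a data frame, one row per object
docs <- parsed$response$docs
colnames(docs)

# A nested JSON object becomes a data frame stored inside a single column
class(docs$headline)
```

This is why columns such as headline and byline above still need a further step before they can be used like ordinary columns.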

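We use the flatten function, which automatically flattens nested data frames into a single non-nested data frame. Nested columns such as headline and byline become prefixed columns (headline.main, byline.original, and so on).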

flatdf <- flatten(df)
colnames(flatdf)
##  [1] "web_url"                 "snippet"                
##  [3] "print_page"              "source"                 
##  [5] "multimedia"              "keywords"               
##  [7] "pub_date"                "document_type"          
##  [9] "new_desk"                "type_of_material"       
## [11] "_id"                     "word_count"             
## [13] "score"                   "uri"                    
## [15] "section_name"            "headline.main"          
## [17] "headline.kicker"         "headline.content_kicker"
## [19] "headline.print_headline" "headline.name"          
## [21] "headline.seo"            "headline.sub"           
## [23] "byline.original"         "byline.person"          
## [25] "byline.organization"
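
The effect of flatten can be reproduced on a tiny hand-built data frame. The values below are invented for illustration; only the column layout matters.

```r
library(jsonlite)

# A data frame with an ordinary column...
nested <- data.frame(
  web_url = c("https://example.org/a", "https://example.org/b"),
  stringsAsFactors = FALSE
)

# ...and a column that is itself a data frame, as fromJSON produces
nested$headline <- data.frame(
  main   = c("First", "Second"),
  kicker = c(NA, "Opinion"),
  stringsAsFactors = FALSE
)

# flatten() lifts the inner columns to the top level, prefixed with the
# name of the column they came from
flat <- flatten(nested)
colnames(flat)
```

The flattened columns are placed after the ordinary ones, which matches the ordering in the output above, where the headline.* and byline.* columns come last.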

Subsetting the flat data to the columns of interest and then displaying it as a table on an HTML page.

subsetData <- flatdf[ , c("pub_date", "web_url","headline.main", "byline.original", "snippet")]
options(DT.options = list(pageLength = 5))
datatable(subsetData)