Assignment Instructions

The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs

You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.
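As a minimal sketch of the JSON-to-dataframe step, the snippet below parses a small hand-written JSON string with jsonlite. The string is made up for illustration and only imitates the general shape of an Article Search response (a `response` object holding a `docs` array); a real response carries many more fields.

```r
library(jsonlite)

# Made-up JSON for illustration only: field names follow the sample output
# shown later in this document, values are placeholders.
txt <- '{
  "status": "OK",
  "response": {
    "docs": [
      {"web_url": "http://example.com/a", "headline": {"main": "First"},  "word_count": 120},
      {"web_url": "http://example.com/b", "headline": {"main": "Second"}, "word_count": 340}
    ]
  }
}'

# fromJSON() simplifies the array of records into a data frame; the nested
# headline object becomes a data frame column inside it.
parsed <- fromJSON(txt)

# flatten() expands nested data frame columns into ordinary columns,
# here producing a column named "headline.main".
docs <- flatten(parsed$response$docs)

str(docs)
```

Selecting columns by name after flattening (for example `docs[, c("web_url", "headline.main")]`) is less brittle than selecting by position, since it does not depend on the order in which fields arrive.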

Article Search API

With the New York Times Article Search API, you can search New York Times articles from September 18, 1851 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia, and other article metadata. Note: In URI examples and field names, italics indicate placeholders for variables or values. Brackets [ ] indicate optional items. Parentheses ( ) are not a convention; when URIs include parentheses, interpret them literally.
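To make the request format concrete, here is a hedged sketch of assembling a search URL. The endpoint is the one used in the code later in this document; the function name `build_search_url` and the `NYT_API_KEY` environment variable are hypothetical names chosen for illustration.

```r
# Build an Article Search request URL for one page of results.
# api_key defaults to an environment variable (assumed name) so the key
# does not have to appear in the script itself.
build_search_url <- function(query, page = 0, api_key = Sys.getenv("NYT_API_KEY")) {
  base <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
  paste0(base,
         "?q=", utils::URLencode(query, reserved = TRUE),  # percent-encode spaces etc.
         "&page=", page,
         "&api-key=", api_key)
}

url <- build_search_url("CUNY School of Professional Studies",
                        page = 0, api_key = "YOUR_KEY")
print(url)
```

Percent-encoding the whole query handles characters other than spaces (such as `&` or `#`) that would otherwise break the query string.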

Note: The value of page corresponds to a set of \(10\) results. For example, page \(=0\) corresponds to records \(0\) to \(9\), and page \(=1\) corresponds to records \(10\) to \(19\). Also, each API is limited to \(1,000\) calls per day and \(5\) calls per second.
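The paging arithmetic in the note can be sketched with two small helpers. Both function names are hypothetical and exist only to illustrate the mapping between page numbers and record numbers.

```r
# Records covered by a zero-based page number, assuming 10 results per page.
page_records <- function(page, per_page = 10) {
  first <- page * per_page
  c(first = first, last = first + per_page - 1)
}

# Zero-based page numbers needed to cover a requested number of results.
pages_needed <- function(results, per_page = 10) {
  seq_len(ceiling(results / per_page)) - 1
}

page_records(0)            # records 0 to 9
page_records(1)            # records 10 to 19
range(pages_needed(500))   # pages 0 to 49
```

At 10 results per page, 500 results take 50 calls, which is well inside the daily limit of 1,000; the per-second limit is the one a loop has to pace itself against.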

library(httr)
library(jsonlite)

query <- "CUNY School of Professional Studies"
results <- 500

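NYT_Articles <- function(query, results) {
  # Hidden_API_Key must already hold your NYT API key.
  NYT <- NULL
  # Each page holds 10 results, and pages are numbered from 0.
  pages <- seq_len(ceiling(results / 10)) - 1
  for (i in pages) {
    url <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",
                  gsub(" ", "+", query, fixed = TRUE), "&page=", i, "&api-key=", Hidden_API_Key)
    json <- GET(url)
    json <- rawToChar(json$content)
    json <- fromJSON(json, simplifyVector = TRUE)
    # Stop once a page comes back empty.
    if (length(json$response[[2]]) == 0) { break }
    data <- data.frame(json)
    # Column 12 (the headline) is nested; keep only its first element.
    data[, 12] <- data[, 12][1]
    # Keep the columns of interest by position; this depends on the response layout.
    data <- data[, c(4:8, 12, 14, 16, 17, 20, 22)]
    NYT <- rbind(NYT, data)
    # Pause after every fifth call to stay within the rate limit.
    if ((i + 1) %% 5 == 0) { Sys.sleep(2) }
  }
  # Strip ".1" suffixes and dotted prefixes from the column names.
  colnames(NYT) <- gsub("\\.1", "", colnames(NYT))
  colnames(NYT) <- gsub("\\w+\\.", "", colnames(NYT))
  return(NYT)
}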
NYT <- NYT_Articles(query, results)
t(head(NYT, 1))
##                  1
## web_url          "http://query.nytimes.com/gst/abstract.html?res=9D00E3DD1631E632A25750C1A9659C946990D6CF"
## snippet          "Prof William L Tung holds Tinley N Akar's Feb 18 article on Tibet being integral part of China was incorrect in stating that Tibet was independent state in '40s; illus"
## lead_paragraph   NA
## abstract         "Prof William L Tung holds Tinley N Akar's Feb 18 article on Tibet being integral part of China was incorrect in stating that Tibet was independent state in '40s; illus"
## print_page       "A20"
## headline         "Letters; Blueprint for a Reformed School System Tibet: 'An Integral Part of China' Of Racism in Britain and Mrs. Thatcher's Words Gov. Brown's Symbols The Refugee Problems Of the Middle East Central Park Zoo: The Animals' Friends Letter: On Elections in India Indira Gandhi's 'Extraordinary Performance' Professor of International Law and East Asian Studies Queens College, CUNY Flushing, N.Y., March 6, 1978 COLIN D. TWEEDY Linden, N.J., March 4, 1978 PHILIP A. SCHAEFER Belvedere, Calif., March 6, 1978 (Ambassador) C. HERZOG Permanent Representative of Israel to the United Nations New York, March 6, 1978 ROBIN BLANCO Menagerie Keeper New York, March 2, 1978 RALPH BUULTJENS New York, March 6, 1978"
## pub_date         "1978-03-13T00:00:00Z"
## news_desk        NA
## section_name     NA
## type_of_material "Letter"
## word_count       "2314"
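The transposed record above shows every field as a string, which is a side effect of `t()` coercing the row to a character matrix. Depending on how the JSON was parsed, `pub_date` and `word_count` may also be stored as character in the data frame itself. A small sketch of converting them follows, using a stand-in data frame built from the values shown above (`articles` is a hypothetical name, not an object from the code in this document).

```r
# Stand-in for one row of the result, using values from the output above.
articles <- data.frame(
  pub_date   = "1978-03-13T00:00:00Z",
  word_count = "2314",
  stringsAsFactors = FALSE
)

# Parse the ISO 8601 timestamp as UTC and make the word count an integer.
articles$pub_date   <- as.POSIXct(articles$pub_date,
                                  format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
articles$word_count <- as.integer(articles$word_count)

str(articles)
```

With proper types in place, the rows can be sorted by date or summarised by word count without further string handling.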

References

http://developer.nytimes.com/article_search_v2.json#/README

http://web.stanford.edu/~cengel/cgi-bin/anthrospace/scraping-new-york-times-articles-with-r

http://web.stanford.edu/~cengel/cgi-bin/anthrospace/wp-content/uploads/2009/09/scrapeNYT_API2.txt