Our goal is to use the New York Times developer API service to search for all articles since January 2019 that contain the word “sabermetrics.” We are big baseball analytics fans and would like to read all of the returned articles.


Load libraries

library(httr)
library(dplyr)
library(tidyr)
library(rvest)
library(jsonlite)
library(knitr)


Get API key

We start by signing up for an API key on the New York Times developer website. We then paste the key into a text file called “nyt_api.txt” and place the file in our current working directory. (This keeps the key out of the source code, for security purposes.) From there we can read it into R.

api_key <- readLines("nyt_api.txt")
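
A stray blank line or trailing space in the key file would end up inside the request URL, so it can help to clean the key as it is read. The following is a minimal sketch of one way to do that. `read_api_key` is a hypothetical helper, not part of any package, and the `NYT_API_KEY` environment variable name is an assumption chosen for illustration.

```r
# Hypothetical helper: read an API key from an environment variable,
# falling back to a text file. All names here are illustrative assumptions.
read_api_key <- function(path = "nyt_api.txt", env_var = "NYT_API_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key) && file.exists(path)) {
    # warn = FALSE silences the "incomplete final line" warning
    lines <- trimws(readLines(path, warn = FALSE))
    lines <- lines[nzchar(lines)]          # drop blank lines
    if (length(lines) > 0) key <- lines[1]
  }
  if (!nzchar(key)) stop("No API key found in ", env_var, " or ", path)
  key
}
```

With this in place, `api_key <- read_api_key()` would behave like the `readLines` call above when only the text file is present.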


Construct Search URL

The New York Times website gives instructions on how to form a URL string for a specific search query. We use that information, particularly the query text, start date, and end date, to paste together the request URL. The API returns only 10 results at a time, so we will start with the first page of results and then see if we need more.

query_text <- "sabermetrics"
start_date <- "20190101"                      # the API documents dates as YYYYMMDD
finish_date <- format(Sys.Date(), "%Y%m%d")
return_page <- "0"                            # the first page of results is page 0

base_url <- "https://api.nytimes.com/svc/search/v2/"
api_slug <- "articlesearch.json"

full_url <- paste0(
  base_url, api_slug,
  "?q=", query_text,
  "&begin_date=", start_date,
  "&end_date=", finish_date,
  "&page=", return_page,
  "&api-key=", api_key
)
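
Pasting raw strings works for a single word such as "sabermetrics", but a query containing spaces or symbols would need to be URL-encoded first. The sketch below shows one way to do that using only base R. `build_search_url` is a hypothetical helper, and its parameter names simply mirror the query parameters used above.

```r
# Hypothetical helper: assemble an Article Search URL with every
# parameter value percent-encoded.
build_search_url <- function(query, begin_date, end_date, page, api_key,
                             endpoint = "https://api.nytimes.com/svc/search/v2/articlesearch.json") {
  params <- list(q = query, begin_date = begin_date, end_date = end_date,
                 page = page, "api-key" = api_key)
  # as.character() lets Date and numeric inputs pass through safely
  encoded <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  paste0(endpoint, "?", paste0(names(encoded), "=", encoded, collapse = "&"))
}
```

Calling `build_search_url(query_text, start_date, finish_date, return_page, api_key)` should give a URL equivalent to `full_url` for the one-word query used here.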


Get results

Now that we have constructed the search URL, we can use the httr and jsonlite packages to retrieve and parse the content of the query. We traverse the resulting JSON object and extract the links of the sabermetrics articles we are interested in.

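httr_obj <- GET(full_url)
stop_for_status(httr_obj)  # fail early on HTTP errors such as an invalid key
content_obj <- content(httr_obj, as = "text", encoding = "UTF-8")
jl_obj <- fromJSON(content_obj)
# single brackets keep a one-column data frame, which kable() can render
article_links <- jl_obj[["response"]][["docs"]]["web_url"]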

article_links %>% kable()
web_url
https://www.nytimes.com/2019/01/30/sports/sandy-alderson-mets-oakland-athletics.html
https://www.nytimes.com/2019/01/04/sports/baseball/pitching-counts.html
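
To make the traversal concrete, the sketch below parses a small hand-written JSON string shaped like the `response` and `docs` portion used above. The sample content is invented for illustration and omits most of the fields a real response would carry.

```r
library(jsonlite)

# Invented sample that mimics only the fields used in this walkthrough.
sample_json <- '{
  "response": {
    "docs": [
      {"web_url": "https://example.com/a", "pub_date": "2019-01-04"},
      {"web_url": "https://example.com/b", "pub_date": "2019-01-30"}
    ]
  }
}'

parsed <- fromJSON(sample_json)

# fromJSON simplifies an array of uniform objects into a data frame,
# so `docs` can be indexed by column name.
docs <- parsed[["response"]][["docs"]]
links_df  <- docs["web_url"]     # single bracket: one-column data frame
links_vec <- docs[["web_url"]]   # double bracket: character vector
```

The single-bracket form is what the code above passes to `kable()`, while the double-bracket form is more convenient when the URLs are needed as a plain vector.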


The search returns only two articles, fewer than the 10 results that fit on one page, so no further pages are needed and we can use the links to view the articles online directly.
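
Had the search matched more than ten articles, the remaining pages would need to be requested one at a time. The following is a sketch of such a loop, not the method used above. `collect_links` and `fetch_page` are hypothetical names, and the code assumes the parsed response reports the total match count under `response$meta$hits`. The pause between requests is a placeholder, since a suitable value depends on the API's current rate limits.

```r
# Hypothetical helper: gather web_url values across result pages.
# `fetch_page` is any function that takes a zero-based page number and
# returns the parsed response (for example, by wrapping GET and fromJSON).
collect_links <- function(fetch_page, per_page = 10, pause = 0) {
  first <- fetch_page(0)
  hits  <- first[["response"]][["meta"]][["hits"]]  # assumed field
  links <- first[["response"]][["docs"]][["web_url"]]
  n_pages <- ceiling(hits / per_page)
  if (n_pages > 1) {
    for (page in seq_len(n_pages - 1)) {
      Sys.sleep(pause)  # wait between requests to respect rate limits
      res   <- fetch_page(page)
      links <- c(links, res[["response"]][["docs"]][["web_url"]])
    }
  }
  links
}
```

Passing the page fetcher in as an argument keeps the paging logic separate from the network call, so the loop can be checked against a stand-in function before it is pointed at the real API.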