library(httr)
library(jsonlite)
library(stringr)
library(pander)
In Week 9’s assignment, the task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.
I first checked the NYTimes public API specs on GitHub to find the API URL format.
https://github.com/NYTimes/public_api_specs/commit/fca6f6c9def8eede59726f3b06a2734f07e689ee
My initial interest was in the Top Stories category.
I built the URL for the API call by concatenating the host, path, query and API key.
Here the API key is concatenated directly into the query string, though I would rather keep it in an environment variable (set with Sys.setenv and read back with Sys.getenv) so the secret stays out of the code.
host <- "http://api.nytimes.com"
path <- "/svc/topstories/v2.json"
query <- "?order=by-date"
api_key <- "&api-key=af67b77762da45ccbdcded839b22ce8a"
nyt.url <- str_c(host,path, query, api_key, sep = "", collapse = NULL)
nyt.url
## [1] "http://api.nytimes.com/svc/topstories/v2.json?order=by-date&api-key=af67b77762da45ccbdcded839b22ce8a"
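The environment-variable approach mentioned above could look something like the following. This is a minimal sketch under stated assumptions: the variable name NYT_API_KEY and the object names key and demo.url are made up for illustration, and the placeholder value is set only so the snippet runs on its own. In practice the key would more likely be set outside the script, for example in ~/.Renviron.

```r
# Assumption: the key lives in an environment variable named NYT_API_KEY
# (a hypothetical name). The placeholder is set here only so the example
# is self-contained; normally this line would not appear in the script.
Sys.setenv(NYT_API_KEY = "your-key-here")

# Read the key back and stop early if it is missing
key <- Sys.getenv("NYT_API_KEY")
if (!nzchar(key)) stop("NYT_API_KEY is not set")

# Same concatenation as above, but the key never appears in the source
demo.url <- paste0("http://api.nytimes.com",
                   "/svc/topstories/v2.json",
                   "?order=by-date",
                   "&api-key=", key)
```

The key still ends up in the URL string, so printing demo.url would reveal it; the gain is only that the script itself can be shared without the secret.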
The GET function in Wickham’s httr package sends the request to NYT’s server, only to be rejected.
r <- GET(nyt.url, verbose())
## <- HTTP/1.1 403 Forbidden
## <- Access-Control-Allow-Credentials: false
http_status(r)
## $category
## [1] "Client error"
##
## $reason
## [1] "Forbidden"
##
## $message
## [1] "Client error: (403) Forbidden"
Top Stories seems to be off limits. The New York Times website is blocked in China by the Great Firewall, and I suppose a news feed like this one may be suppressed too.
When we set the path to movie reviews, API access is suddenly permitted.
path <- "/svc/movies/v2/reviews/dvd-picks.json"
nyt.url <- str_c(host, path, query, api_key, sep = "", collapse = NULL)
r <- GET(nyt.url, verbose())
Booklists are apparently also not subject to the Great Firewall.
path <- "/svc/books/v2/lists/overview.json"
query <- "?published_date=2018-01-01"
nyt.url <- str_c(host, path, query, api_key, sep = "", collapse = NULL)
raw.result2 <- GET(nyt.url, verbose())
http_status(raw.result2)
## $category
## [1] "Success"
##
## $reason
## [1] "OK"
##
## $message
## [1] "Success: (200) OK"
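The request-then-check pattern used in the calls above could be wrapped in a small helper. This is only a sketch: nyt_get is a hypothetical name, not an httr function, and the NYT_API_KEY environment variable in the usage comment is likewise an assumption. It relies on httr's GET (with its path and query arguments), http_error and http_status.

```r
# Hypothetical helper (not part of httr): request an NYT endpoint and
# stop with the server's status message on any error response,
# instead of carrying on with an error body.
nyt_get <- function(path, query = list(), host = "http://api.nytimes.com") {
  resp <- httr::GET(host, path = path, query = query)
  if (httr::http_error(resp)) {
    stop(sprintf("Request to %s failed: %s",
                 path, httr::http_status(resp)$message),
         call. = FALSE)
  }
  resp
}

# Usage (needs network access and a valid key; NYT_API_KEY is an assumed name):
# resp <- nyt_get("/svc/books/v2/lists/overview.json",
#                 query = list(published_date = "2018-01-01",
#                              `api-key` = Sys.getenv("NYT_API_KEY")))
```

Passing the query as a named list lets httr assemble and escape the query string, which avoids hand-building the "?" and "&" separators.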
Now to create a dataframe of the bestselling books as of Jan 1, 2018.
First, I extract the content from the response body as JSON text, then parse it into R before further munging.
this.raw.content2 <- content(raw.result2, "text", encoding='UTF-8')
this.json2 <- fromJSON(this.raw.content2, flatten = TRUE)
bestsellers <- this.json2$results$list
Here we are only concerned with the data in the books column.
names(bestsellers[[1]])
## NULL
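I then follow this procedure: flatten the books data for the first list into a vector, reshape it into a matrix, and pick out the rows that hold the fields of interest.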
books <- unlist(bestsellers[[1]]["books"])
books <- matrix(t(books), nrow = 29, ncol = 5)
books_df <- data.frame("Title" = books[21,], "Author" = books[4,], "Description" = books[12,], "Wks_Bestseller" = books[23,])
books_df
## Title Author Description Wks_Bestseller
## 1 NA NA NA NA
## 2 NA NA NA NA
## 3 NA NA NA NA
## 4 NA NA NA NA
## 5 NA NA NA NA
Note: the next lines of code, taken from the jsonlite vignette (see Sources below), did not work for me when run from the RMarkdown document in RStudio, failing with: Error in bestsellers[[1, "books"]] : incorrect number of subscripts. After knitting, however, the same code mysteriously works.
category1 <- bestsellers[[1, "books"]]
df <- subset(category1, select = c("title", "author", "description", "weeks_on_list"))
pander(df, style = 'rmarkdown')
title | author | description | weeks_on_list |
---|---|---|---|
ORIGIN | Dan Brown | A symbology professor goes on a perilous quest with a beautiful museum director. | 12 |
THE ROOSTER BAR | John Grisham | Three students at a sleazy for-profit law school hope to expose the student-loan banker who runs it. | 9 |
THE SUN AND HER FLOWERS | Rupi Kaur | A new collection of poetry from the author of “Milk and Honey.” | 12 |
THE PEOPLE VS. ALEX CROSS | James Patterson | Detective Cross takes on a case even though he has been suspended from the department and taken to federal court to stand trial on murder charges. | 5 |
MILK AND HONEY | Rupi Kaur | Poetic approaches to surviving adversity and loss. | 45 |
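To see how the [[1, "books"]] indexing behaves, here is a small sketch on hand-made JSON. The sample data and the object names (sample_json, parsed, lists, first_books) are made up for illustration and only mimic the nesting assumed for the real response: a results object holding an array of lists, each with its own books array.

```r
library(jsonlite)

# Made-up sample mimicking the assumed nesting: results -> lists -> books
sample_json <- '{
  "results": {
    "lists": [
      {"list_id": 1, "list_name": "Fiction",
       "books": [
         {"title": "BOOK A", "author": "Author One", "weeks_on_list": 12},
         {"title": "BOOK B", "author": "Author Two", "weeks_on_list": 9}
       ]},
      {"list_id": 2, "list_name": "Nonfiction",
       "books": [
         {"title": "BOOK C", "author": "Author Three", "weeks_on_list": 5}
       ]}
    ]
  }
}'

parsed <- fromJSON(sample_json)
lists <- parsed$results$lists   # a data frame with one row per list

# [[1]] on a data frame picks the first COLUMN (list_id), an unnamed
# vector, so names() of it is NULL
names(lists[[1]])

# [[row, column]] picks a single cell; the "books" cell is itself a data frame
first_books <- lists[[1, "books"]]
subset(first_books, select = c("title", "author", "weeks_on_list"))
```

The two-index form only works when the object is a data frame. On a plain list it fails with an "incorrect number of subscripts" error, which may be what happened in the interactive session.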
The following links were helpful to this assignment.
Sources:
https://www.r-bloggers.com/accessing-apis-from-r-and-a-little-r-programming/
https://rstudio-pubs-static.s3.amazonaws.com/223073_37195ccc31b846fc8c963e5d10416887.html
https://cran.r-project.org/web/packages/jsonlite/vignettes/json-apis.html