library(tidyverse)
library(httr2)
library(jsonlite)
library(keyring)
For this assignment, we’ll test our ability to access APIs and pull JSON data from them into data frames. Specifically, we’ll look at data from the New York Times Books API.
First, we registered for an API key that has access to the Books API. Since we don’t want the key to be publicly accessible, we load it into a keyring with the following code:
# Prompt for the key only if no "NYtimes" entry exists under the "APIKeys" service
if (!("NYtimes" %in% key_list("APIKeys")$username)) {
  key_set("APIKeys", "NYtimes")
}
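Because `key_set()` asks for the secret interactively, it can help to have a fallback when the document is rendered non-interactively. The following is a minimal sketch, not part of the original workflow: the helper `get_nyt_key()` and the environment variable name `NYT_API_KEY` are assumptions made for illustration.

```r
# Hypothetical helper: look for the key in an environment variable first,
# then fall back to the keyring entry created above.
get_nyt_key <- function(env_var = "NYT_API_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (nzchar(key)) {
    return(key)
  }
  if (!requireNamespace("keyring", quietly = TRUE)) {
    stop("No key found in ", env_var, " and the keyring package is not installed.")
  }
  keyring::key_get("APIKeys", "NYtimes")
}

# Demonstration with a placeholder value, so no real key is needed
Sys.setenv(NYT_API_KEY = "placeholder-key")
demo_key <- get_nyt_key()
Sys.unsetenv("NYT_API_KEY")
```

With a helper like this, the request code could call `get_nyt_key()` wherever it currently calls `key_get("APIKeys","NYtimes")`.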
The next step once we have our API key is to use httr2 to send a request to the books API. Let’s say we want a dataframe that consists of the latest nonfiction bestsellers list.
# Initialize a request object pointing at the base API path
api <- request(r"(https://api.nytimes.com/svc/books/v3)")
req <- api %>%
  # Endpoint that returns bestseller lists
  req_url_path_append("lists.json") %>%
  # Select the list we want and authenticate via the query string
  req_url_query(
    `list` = "Combined Print and E-Book Nonfiction",
    `api-key` = key_get("APIKeys", "NYtimes")
  )
resp <- req %>%
  req_perform()
resp_status(resp)
## [1] 200
With a status code of 200, the API request succeeded. We now need to parse the raw JSON body of the response into something R can work with.
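By default, `req_perform()` turns HTTP 4xx and 5xx responses into R errors, so reaching the status check already means the request did not fail in that way. If we wanted to handle a failing status ourselves, something like the sketch below could work. It uses httr2's `response()` constructor to build mock responses locally, so it needs neither a network connection nor an API key; the helper name `check_books_response()` and the 429 status are illustrative assumptions, not something observed from the API.

```r
library(httr2)

# Hypothetical helper: return TRUE for a usable response, otherwise warn
check_books_response <- function(resp) {
  if (resp_is_error(resp)) {
    warning(
      "Request failed with status ", resp_status(resp),
      " (", resp_status_desc(resp), ")"
    )
    return(FALSE)
  }
  TRUE
}

# Mock responses built locally; the status codes are illustrative only
ok_resp <- response(status_code = 200)
bad_resp <- response(status_code = 429)

ok <- check_books_response(ok_resp)
failed <- suppressWarnings(check_books_response(bad_resp))

# With a real request, error statuses can be kept from aborting the script:
# req <- req |> req_error(is_error = \(resp) FALSE)
```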
blist <- resp %>%
  resp_body_json(flatten = TRUE)
summary(blist)
## Length Class Mode
## status 1 -none- character
## copyright 1 -none- character
## num_results 1 -none- numeric
## last_modified 1 -none- character
## results 15 -none- list
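To get a feel for the parsed structure without calling the API, we can parse a small mock response. This is only a sketch: the JSON below is invented and far smaller than the real payload, and it assumes the same top-level `results` field and nested `book_details` entries that the rest of this document works with.

```r
library(httr2)

# Invented JSON that mimics a few of the fields in the real response
mock_json <- '{
  "status": "OK",
  "num_results": 1,
  "results": [
    {"rank": 1,
     "book_details": [{"title": "EXAMPLE TITLE", "author": "Example Author"}]}
  ]
}'

# Build a response object locally instead of performing a request
mock_resp <- response(
  status_code = 200,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw(mock_json)
)

parsed <- resp_body_json(mock_resp)

# Metadata and results sit side by side at the top level
top_level <- names(parsed)

# Each result is a nested list, so values are reached by chained indexing
first_title <- parsed$results[[1]]$book_details[[1]]$title
```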
We now have the response JSON parsed and stored as a list. However, the summary shows that it also contains metadata that we do not want in our data frame; we only want the data contained within results. Unfortunately, that data is deeply nested and needs to be unnested before it yields a good data frame. Here we row-bind the entries and unnest them, keeping some of the more informative columns.
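blist2 <- blist$results
# Label each entry; length() avoids hard-coding the number of results
names(blist2) <- rep("book", length(blist2))
# Stack the entries into a matrix of list cells, one row per book
blist2 <- do.call(rbind, blist2)
bframe <- unnest(as_tibble(blist2), cols = colnames(blist2)) |>
  unnest_wider(book_details) |>
  select(title, author, rank, rank_last_week, weeks_on_list, amazon_product_url) |>
  distinct()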
knitr::kable(head(bframe))
| title | author | rank | rank_last_week | weeks_on_list | amazon_product_url |
|---|---|---|---|---|---|
| SPARE | Prince Harry | 1 | 2 | 9 | https://www.amazon.com/dp/0593593804?tag=NYTBSREV-20 |
| THE COURAGE TO BE FREE | Ron DeSantis | 2 | 1 | 2 | https://www.amazon.com/dp/0063276003?tag=NYTBSREV-20 |
| THE BODY KEEPS THE SCORE | Bessel van der Kolk | 3 | 3 | 133 | http://www.amazon.com/The-Body-Keeps-Score-Healing/dp/0670785938?tag=NYTBSREV-20 |
| I’M GLAD MY MOM DIED | Jennette McCurdy | 4 | 4 | 31 | https://www.amazon.com/dp/1982185821?tag=NYTBSREV-20 |
| IT’S OK TO BE ANGRY ABOUT CAPITALISM | Bernie Sanders with John Nichols | 5 | 7 | 3 | https://www.amazon.com/dp/0593238710?tag=NYTBSREV-20 |
| WALK THE BLUE LINE | James Patterson and Matt Eversmann with Chris Mooney | 6 | 10 | 5 | https://www.amazon.com/dp/0316406600?tag=NYTBSREV-20 |
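As an alternative to row-binding and unnesting, tidyr's `hoist()` can pull the fields we care about straight out of the nested list. The sketch below is a swapped-in technique rather than the approach used above, and it runs on invented stand-in data: the titles and authors are placeholders, and the structure (ranking fields plus a one-element `book_details` list) is assumed from the column names in our code.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Invented stand-in for blist$results
results <- list(
  list(rank = 1L, rank_last_week = 2L, weeks_on_list = 9L,
       book_details = list(list(title = "BOOK A", author = "Author One"))),
  list(rank = 2L, rank_last_week = 1L, weeks_on_list = 2L,
       book_details = list(list(title = "BOOK B", author = "Author Two")))
)

books <- tibble(book = results) |>
  hoist(
    book,
    # A list of names and positions walks down the nesting, like purrr::pluck()
    title = list("book_details", 1L, "title"),
    author = list("book_details", 1L, "author"),
    rank = "rank",
    rank_last_week = "rank_last_week",
    weeks_on_list = "weeks_on_list"
  ) |>
  select(title, author, rank, rank_last_week, weeks_on_list)
```

The appeal of this form is that each output column names the exact path it comes from, which makes it easier to see what breaks if the API changes its structure.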
We now have a nice table of the top six bestsellers from last week’s list, along with an Amazon purchase link in case one of them strikes our fancy.
We’ve learned how to access data through APIs directly using httr2. Although calling an API is easy, JSON isn’t always the cleanest format to convert into a usable data frame, so it’s important to learn how to handle deeply nested JSON. To extend this assignment, we could track the ranking of books over time and determine which books stay on the list the longest.
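A rough sketch of that extension might look like the following. The weekly snapshots here are invented placeholder data; in practice each one would come from a separate API call, and the column names `week`, `weeks_seen`, and `best_rank` are our own assumptions.

```r
library(dplyr)
library(tibble)

# Invented weekly snapshots of a bestseller list
snapshots <- bind_rows(
  tibble(week = as.Date("2023-01-01"), title = c("BOOK A", "BOOK B"), rank = c(1L, 2L)),
  tibble(week = as.Date("2023-01-08"), title = c("BOOK A", "BOOK C"), rank = c(2L, 1L)),
  tibble(week = as.Date("2023-01-15"), title = c("BOOK A", "BOOK C"), rank = c(1L, 2L))
)

# Count how many distinct weeks each title appears and record its best rank
longevity <- snapshots |>
  group_by(title) |>
  summarise(
    weeks_seen = n_distinct(week),
    best_rank = min(rank),
    .groups = "drop"
  ) |>
  arrange(desc(weeks_seen), best_rank)
```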