The Books API

The New York Times’ Books API provides a mechanism for retrieving bestsellers lists and book reviews. For this exercise, the bestsellers lists are investigated. A key for this API has been obtained, but it is not displayed here for security reasons.

Investigating the API

A bestsellers list request takes the following form: http://api.nytimes.com/svc/books/{version}/lists[.response_format]?{search-param1=value1}&[...]&[optional-param1=value1]&[...]&api-key={your-API-key}

Querying the API

The current API version is v3. Since the standard response format for this API is JSON, the response format can be left out of the request, which simplifies the URL. This simplified form is used to create a function to query the API. The jsonlite package is used to read in the JSON response to the query:

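library(jsonlite)

# Query the Books API for `list_name`; `params` holds optional "name=value" strings.
# Assumes `books_api_key` is already defined in the calling environment.
books_request <- function(list_name, params = character(0)){
  books_query <- paste0('http://api.nytimes.com/svc/books/v3/lists/', list_name, '?api-key=', books_api_key)
  params <- params[!is.na(params)]  # drop missing entries so "NA" is never appended
  if (length(params) > 0){
    books_query <- paste0(books_query, '&', paste(params, collapse = '&'))
  }
  fromJSON(books_query)  # parse the JSON response into R lists and data frames
}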
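
To see how optional parameters end up in the query string, the URL-building step can be separated from the request itself. The following is a minimal sketch under stated assumptions: build_books_url is a hypothetical helper that is not part of the function above, and 'MY_KEY' and 'offset=20' are placeholder values chosen only for illustration.

```r
# Hypothetical helper: assemble the request URL without sending the request.
build_books_url <- function(list_name, api_key, params = character(0)) {
  base_url <- 'http://api.nytimes.com/svc/books/v3/lists/'
  url <- paste0(base_url, list_name, '?api-key=', api_key)
  params <- params[!is.na(params)]  # ignore missing entries
  if (length(params) > 0) {
    url <- paste0(url, '&', paste(params, collapse = '&'))
  }
  url
}

# 'MY_KEY' and 'offset=20' are placeholders, not real values.
example_url <- build_books_url('hardcover-fiction', 'MY_KEY', params = 'offset=20')
print(example_url)
```

Inspecting the URL before sending it makes it easier to spot a malformed parameter string without spending a request.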

API Responses

Requests to the API return a list, which contains the following:

  • The status, with “OK” indicating an HTTP response code of 200 (status)
  • A copyright message (copyright)
  • The number of results in the response (num_results)
  • The results (results)
  • Other information (e.g. corrections)

For this exercise, only the results are of interest.
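
Since only the results are needed, it can be useful to confirm the status before extracting them. The sketch below uses a mock response built by hand from the fields listed above, so it runs without an API key. The helper get_results and all values in mock_response are assumptions made for illustration.

```r
# Mock response mirroring the fields described above (values are illustrative).
mock_response <- list(
  status = "OK",
  copyright = "placeholder copyright text",
  num_results = 2L,
  results = data.frame(
    list_name = c("Hardcover Fiction", "E-Book Fiction"),
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper: return the results only when the status is "OK".
get_results <- function(response) {
  if (!identical(response$status, "OK")) {
    stop("Books API request failed with status: ", response$status)
  }
  response$results
}

results <- get_results(mock_response)
print(nrow(results))
```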

Retrieving Data

With a request function in place and the response structure understood, data can now be requested and stored in R.

Bestsellers List Names

The set of bestsellers list names can be obtained by querying the API for the list names.

list_names <- books_request('names')$results
list_name                             display_name                        list_name_encoded                     oldest_published_date  newest_published_date  updated
Combined Print and E-Book Fiction     Combined Print & E-Book Fiction     combined-print-and-e-book-fiction     2011-02-13             2016-04-10             WEEKLY
Combined Print and E-Book Nonfiction  Combined Print & E-Book Nonfiction  combined-print-and-e-book-nonfiction  2011-02-13             2016-04-10             WEEKLY
Hardcover Fiction                     Hardcover Fiction                   hardcover-fiction                     2008-06-08             2016-04-10             WEEKLY
Hardcover Nonfiction                  Hardcover Nonfiction                hardcover-nonfiction                  2008-06-08             2016-04-10             WEEKLY
Trade Fiction Paperback               Paperback Trade Fiction             trade-fiction-paperback               2008-06-08             2016-04-10             WEEKLY
Mass Market Paperback                 Paperback Mass-Market Fiction       mass-market-paperback                 2008-06-08             2016-04-10             WEEKLY
Paperback Nonfiction                  Paperback Nonfiction                paperback-nonfiction                  2008-06-08             2016-04-10             WEEKLY
E-Book Fiction                        E-Book Fiction                      e-book-fiction                        2011-02-13             2016-04-10             WEEKLY
E-Book Nonfiction                     E-Book Nonfiction                   e-book-nonfiction                     2011-02-13             2016-04-10             WEEKLY

The Combined Print & E-Book Fiction list will be used, since it considers all fiction, regardless of format or sub-genre.
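
The encoded name needed for later requests can be looked up from the display name rather than typed by hand. This is a minimal sketch: list_names_demo is a hand-built stand-in for list_names, holding three rows copied from the table above, so the example runs without calling the API.

```r
# Stand-in for list_names, using three rows copied from the table above.
list_names_demo <- data.frame(
  display_name = c("Combined Print & E-Book Fiction",
                   "Hardcover Fiction",
                   "E-Book Fiction"),
  list_name_encoded = c("combined-print-and-e-book-fiction",
                        "hardcover-fiction",
                        "e-book-fiction"),
  stringsAsFactors = FALSE
)

# Select the encoded name whose display name matches the chosen list.
chosen <- list_names_demo$display_name == "Combined Print & E-Book Fiction"
encoded_name <- list_names_demo$list_name_encoded[chosen]
print(encoded_name)
```

The same subsetting should work on the real list_names data frame, assuming it keeps the column names shown in the table.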

Current Bestsellers List

The current Combined Print and E-Book Fiction bestsellers list is obtained from the API using the encoded name:

books_response <- books_request('combined-print-and-e-book-fiction')$results
str(books_response, max.level = 1)
List of 9
 $ list_name          : chr "Combined Print and E-Book Fiction"
 $ list_name_encoded  : NULL
 $ bestsellers_date   : chr "2016-03-26"
 $ published_date     : chr "2016-04-10"
 $ display_name       : chr "Combined Print & E-Book Fiction"
 $ normal_list_ends_at: int 15
 $ updated            : chr "WEEKLY"
 $ books              :'data.frame':    20 obs. of  22 variables:
 $ corrections        : list()

In this response, results is a list that contains metadata about the bestsellers list alongside the books themselves. Accordingly, the books item within the results list is extracted:

books_df <- books_response$books
class(books_df)
[1] "data.frame"

The resulting data frame contains a large amount of information outside of the scope of this investigation. The desired columns are selected with the dplyr package:

library(dplyr)
books_df <- books_df %>% select(rank, rank_last_week, weeks_on_list, title, author, publisher, primary_isbn10)
names(books_df) <- c("Rank", "Last", "Weeks", "Title", "Author", "Publisher", "ISBN10")
Rank  Last  Weeks  Title                        Author                             Publisher                ISBN10
1     0     1      FOOL ME ONCE                 Harlan Coben                       Dutton                   0698404173
2     1     2      PRIVATE PARIS                James Patterson and Mark Sullivan  Little, Brown            0316408999
3     0     1      THE NEST                     Cynthia D’Aprix Sweeney            Ecco/HarperCollins       0062414232
4     4     8      ME BEFORE YOU                Jojo Moyes                         Penguin                  0143124544
5     2     2      PROPERTY OF A NOBLEWOMAN     Danielle Steel                     Delacorte                034553106X
6     5     48     THE NIGHTINGALE              Kristin Hannah                     St. Martin’s             1466850604
7     0     1      THE SUMMER BEFORE THE WAR    Helen Simonson                     Random House             0679644644
8     0     9      THE GUILTY                   David Baldacci                     Grand Central            1455586439
9     7     5      THE WEDDING DRESS            Rachel Hauck                       Thomas Nelson            1401686311
10    0     1      THE WEDDING                  Nicholas Sparks                    Grand Central            0759507910
11    8     60     THE GIRL ON THE TRAIN        Paula Hawkins                      Riverhead                1594633665
12    0     2      THE THIRD GATE               Lincoln Child                      Anchor                   0385531397
13    11    78     ALL THE LIGHT WE CANNOT SEE  Anthony Doerr                      Scribner                 1476746583
14    12    10     THE LIAR                     Nora Roberts                       Putnam                   1101989750
15    9     3      OFF THE GRID                 C J Box                            Putnam                   None
16    0     0      HEARTBREAKER                 Linda Howard                       Avon Impulse             006242226X
17    0     0      ROOM                         Emma Donoghue                      Little, Brown            0316098329
18    0     0      MEMORY MAN                   David Baldacci                     Grand Central            1455559806
19    0     0      A MAN CALLED OVE             Fredrik Backman                    Washington Square Press  1476738025
20    0     0      THE MARTIAN                  Andy Weir                          Crown                    0553418025
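
One entry in the table has the value None in place of an ISBN. If the ISBN column is later used for lookups, it may help to convert such values to NA. The sketch below assumes that the literal string "None" marks a missing ISBN. books_demo is a hand-built stand-in containing three rows from the table, not the full data frame.

```r
# Stand-in for books_df, using three rows copied from the table above.
books_demo <- data.frame(
  Rank = c(14L, 15L, 16L),
  Title = c("THE LIAR", "OFF THE GRID", "HEARTBREAKER"),
  ISBN10 = c("1101989750", "None", "006242226X"),
  stringsAsFactors = FALSE
)

# Assumption: the literal string "None" marks a missing ISBN.
books_demo$ISBN10[books_demo$ISBN10 == "None"] <- NA_character_

# Titles that have no usable ISBN after the conversion.
missing_isbn_titles <- books_demo$Title[is.na(books_demo$ISBN10)]
print(missing_isbn_titles)
```

Keeping the column as character preserves leading zeros and the trailing X check digit that some ISBN-10 values carry.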