The New York Times’ Books API provides access to the bestsellers lists and to book reviews. This exercise looks at the bestsellers lists. Using the API requires a key. One was obtained for this exercise but is not shown here for security.
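A common way to keep a key out of the code itself is to store it in an environment variable and read it at run time. In this minimal sketch the variable name NYT_BOOKS_API_KEY is a hypothetical choice. books_api_key is the name that the request function defined later in this exercise expects.

```r
# Read the API key from an environment variable instead of hard-coding it.
# NYT_BOOKS_API_KEY is a hypothetical variable name; it could be set in
# ~/.Renviron or in the shell before R starts.
books_api_key <- Sys.getenv("NYT_BOOKS_API_KEY")

# Sys.getenv() returns "" when the variable is unset, so say so early
if (!nzchar(books_api_key)) {
  message("NYT_BOOKS_API_KEY is not set; API requests will fail.")
}
```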
A bestsellers list request has the following general form: http://api.nytimes.com/svc/books/{version}/lists[.response_format]?{search-param1=value1}&[...]&[optional-param1=value1]&[...]&api-key={your-API-key}
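To make the template concrete, the hypothetical helper nyt_books_url() fills in each placeholder in order: the version, the optional response format, any name=value parameters (values URL-encoded with base R's utils::URLencode()), and finally the key. This is a sketch of the template only. The parameter names in the usage line are placeholders, not documented API parameters.

```r
# Hypothetical helper: build a URL following the template
# http://api.nytimes.com/svc/books/{version}/lists[.response_format]?{params}&api-key={key}
nyt_books_url <- function(api_key, version = "v3", response_format = NULL,
                          params = list()) {
  url <- paste0("http://api.nytimes.com/svc/books/", version, "/lists")
  # The response format suffix is optional
  if (!is.null(response_format)) {
    url <- paste0(url, ".", response_format)
  }
  # Encode each value so spaces and reserved characters are safe in a URL
  pairs <- character(0)
  if (length(params) > 0) {
    values <- vapply(params,
                     function(v) utils::URLencode(as.character(v), reserved = TRUE),
                     character(1))
    pairs <- paste0(names(params), "=", values)
  }
  # The API key goes last, as in the template
  query <- paste(c(pairs, paste0("api-key=", api_key)), collapse = "&")
  paste0(url, "?", query)
}

nyt_books_url("MY-KEY", response_format = "json",
              params = list(param1 = "value 1", param2 = "value2"))
# returns "http://api.nytimes.com/svc/books/v3/lists.json?param1=value%201&param2=value2&api-key=MY-KEY"
```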
The current API version is v3. Because JSON is the API's default response format, the template can be simplified and wrapped in a function that queries the API. The jsonlite package is used to parse the JSON response:
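```r
library(jsonlite)
# Request a bestsellers resource from the NYT Books API (v3).
# books_api_key must already be defined in the session (not shown here).
books_request <- function(list_name, params = character(0)) {
  books_query <- paste0('http://api.nytimes.com/svc/books/v3/lists/',
                        list_name, '?api-key=', books_api_key)
  # Append optional "name=value" parameters, ignoring missing entries
  params <- params[!is.na(params)]
  if (length(params) > 0) {
    books_query <- paste0(books_query, '&', paste(params, collapse = '&'))
  }
  # Download the response and parse the JSON into R lists and data frames
  fromJSON(books_query)
}
```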
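books_request() returns whatever fromJSON() parses, without checking whether the call succeeded. A minimal sketch of a hypothetical wrapper, checked_results(), keeps only the results item and returns NULL when the request raises an error or the response's status field is not "OK". The fetch argument is an assumption of this sketch: any zero-argument function returning a parsed response, such as function() books_request('names'), which also lets the logic be tried offline with a stub.

```r
# Hypothetical wrapper: `fetch` is a zero-argument function that returns a
# parsed response, e.g. function() books_request('names').
checked_results <- function(fetch) {
  resp <- tryCatch(fetch(), error = function(e) {
    warning("Request failed: ", conditionMessage(e), call. = FALSE)
    NULL
  })
  # Treat anything other than a list with status "OK" as a failed request
  if (!is.list(resp) || !identical(resp$status, "OK")) {
    return(NULL)
  }
  resp$results
}

# Offline check with a stub standing in for the API
stub <- function() list(status = "OK", num_results = 1, results = "payload")
checked_results(stub)  # returns "payload"
```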
Requests to the API return a list containing four items: status, copyright, num_results and results. For this exercise, only results is of interest.
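To see how fromJSON() maps such a response onto R objects, a small hand-written JSON string with the same four top-level fields can be parsed offline. The values are made up for illustration and are not real API output. The point is that JSON objects become named lists and an array of objects becomes a data frame, which is why the books appear as a data frame later in this exercise.

```r
library(jsonlite)

# Made-up JSON mimicking the top-level structure of a response
sample_json <- '{
  "status": "OK",
  "copyright": "sample",
  "num_results": 2,
  "results": {
    "list_name": "Sample List",
    "books": [
      {"rank": 1, "title": "FIRST TITLE"},
      {"rank": 2, "title": "SECOND TITLE"}
    ]
  }
}'

parsed <- fromJSON(sample_json)
names(parsed)                # "status" "copyright" "num_results" "results"
class(parsed$results)        # "list"
class(parsed$results$books)  # "data.frame"
```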
With the request function in place and the response structure understood, data can now be requested and stored in R.
The names of the available bestsellers lists are obtained by requesting the names resource:
```r
# Request the available list names and keep only the results item
list_names <- books_request('names')$results
```
| list_name | display_name | list_name_encoded | oldest_published_date | newest_published_date | updated |
|---|---|---|---|---|---|
| Combined Print and E-Book Fiction | Combined Print & E-Book Fiction | combined-print-and-e-book-fiction | 2011-02-13 | 2016-04-10 | WEEKLY |
| Combined Print and E-Book Nonfiction | Combined Print & E-Book Nonfiction | combined-print-and-e-book-nonfiction | 2011-02-13 | 2016-04-10 | WEEKLY |
| Hardcover Fiction | Hardcover Fiction | hardcover-fiction | 2008-06-08 | 2016-04-10 | WEEKLY |
| Hardcover Nonfiction | Hardcover Nonfiction | hardcover-nonfiction | 2008-06-08 | 2016-04-10 | WEEKLY |
| Trade Fiction Paperback | Paperback Trade Fiction | trade-fiction-paperback | 2008-06-08 | 2016-04-10 | WEEKLY |
| Mass Market Paperback | Paperback Mass-Market Fiction | mass-market-paperback | 2008-06-08 | 2016-04-10 | WEEKLY |
| Paperback Nonfiction | Paperback Nonfiction | paperback-nonfiction | 2008-06-08 | 2016-04-10 | WEEKLY |
| E-Book Fiction | E-Book Fiction | e-book-fiction | 2011-02-13 | 2016-04-10 | WEEKLY |
| E-Book Nonfiction | E-Book Nonfiction | e-book-nonfiction | 2011-02-13 | 2016-04-10 | WEEKLY |
The first list in this table, Combined Print & E-Book Fiction, is used here because it covers all fiction, regardless of format or sub-genre.
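Rather than typing the encoded name by hand, it can be looked up from the display name. So that the sketch runs without an API call, names_demo is a hypothetical stand-in built from two rows of the table above. With the real list_names data frame, the lookup line is the same apart from the object name.

```r
# Stand-in for list_names, copied from two rows of the table above
names_demo <- data.frame(
  display_name = c("Combined Print & E-Book Fiction", "Hardcover Fiction"),
  list_name_encoded = c("combined-print-and-e-book-fiction", "hardcover-fiction"),
  stringsAsFactors = FALSE
)

# Look up the encoded name that the API expects for a given display name
encoded <- names_demo$list_name_encoded[
  names_demo$display_name == "Combined Print & E-Book Fiction"
]
encoded  # "combined-print-and-e-book-fiction"
```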
The current Combined Print and E-Book Fiction bestsellers list is obtained from the API using the encoded name:
```r
books_response <- books_request('combined-print-and-e-book-fiction')$results
str(books_response, max.level = 1)
```
```
List of 9
 $ list_name          : chr "Combined Print and E-Book Fiction"
 $ list_name_encoded  : NULL
 $ bestsellers_date   : chr "2016-03-26"
 $ published_date     : chr "2016-04-10"
 $ display_name       : chr "Combined Print & E-Book Fiction"
 $ normal_list_ends_at: int 15
 $ updated            : chr "WEEKLY"
 $ books              :'data.frame': 20 obs. of 22 variables:
 $ corrections        : list()
```
Here, results combines metadata about the bestsellers list (its name, dates and update frequency) with the ranked books themselves, which are stored in the books item as a data frame of 20 rows and 22 columns. The books item is therefore extracted:
```r
books_df <- books_response$books
class(books_df)
```
```
[1] "data.frame"
```
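The remaining items of results describe the list rather than the books. As a sketch, the scalar fields can be gathered into a one-row summary, with the date strings converted by as.Date() so that they can be compared and sorted. The stand-in values are copied from the str() output above, and the object names response_demo and list_summary are hypothetical.

```r
# Stand-in for books_response, using values from the str() output above
response_demo <- list(
  list_name = "Combined Print and E-Book Fiction",
  bestsellers_date = "2016-03-26",
  published_date = "2016-04-10",
  normal_list_ends_at = 15L,
  updated = "WEEKLY"
)

# One-row summary of the list-level metadata, with proper Date columns
list_summary <- data.frame(
  list = response_demo$list_name,
  bestsellers_date = as.Date(response_demo$bestsellers_date),
  published_date = as.Date(response_demo$published_date),
  normal_list_ends_at = response_demo$normal_list_ends_at,
  updated = response_demo$updated,
  stringsAsFactors = FALSE
)

# Date arithmetic now works: days between the two dates
as.numeric(list_summary$published_date - list_summary$bestsellers_date)  # 15
```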
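Most of the 22 columns in books_df fall outside the scope of this investigation, so the columns of interest are selected with the dplyr package and given shorter names: Last is the book's rank in the previous week and Weeks is the number of weeks it has spent on the list.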
```r
library(dplyr)
# Keep only the ranking, bibliographic and ISBN columns
books_df <- books_df %>%
  select(rank, rank_last_week, weeks_on_list, title, author, publisher, primary_isbn10)
# Shorter column names for display
names(books_df) <- c("Rank", "Last", "Weeks", "Title", "Author", "Publisher", "ISBN10")
```
| Rank | Last | Weeks | Title | Author | Publisher | ISBN10 |
|---|---|---|---|---|---|---|
| 1 | 0 | 1 | FOOL ME ONCE | Harlan Coben | Dutton | 0698404173 |
| 2 | 1 | 2 | PRIVATE PARIS | James Patterson and Mark Sullivan | Little, Brown | 0316408999 |
| 3 | 0 | 1 | THE NEST | Cynthia D’Aprix Sweeney | Ecco/HarperCollins | 0062414232 |
| 4 | 4 | 8 | ME BEFORE YOU | Jojo Moyes | Penguin | 0143124544 |
| 5 | 2 | 2 | PROPERTY OF A NOBLEWOMAN | Danielle Steel | Delacorte | 034553106X |
| 6 | 5 | 48 | THE NIGHTINGALE | Kristin Hannah | St. Martin’s | 1466850604 |
| 7 | 0 | 1 | THE SUMMER BEFORE THE WAR | Helen Simonson | Random House | 0679644644 |
| 8 | 0 | 9 | THE GUILTY | David Baldacci | Grand Central | 1455586439 |
| 9 | 7 | 5 | THE WEDDING DRESS | Rachel Hauck | Thomas Nelson | 1401686311 |
| 10 | 0 | 1 | THE WEDDING | Nicholas Sparks | Grand Central | 0759507910 |
| 11 | 8 | 60 | THE GIRL ON THE TRAIN | Paula Hawkins | Riverhead | 1594633665 |
| 12 | 0 | 2 | THE THIRD GATE | Lincoln Child | Anchor | 0385531397 |
| 13 | 11 | 78 | ALL THE LIGHT WE CANNOT SEE | Anthony Doerr | Scribner | 1476746583 |
| 14 | 12 | 10 | THE LIAR | Nora Roberts | Putnam | 1101989750 |
| 15 | 9 | 3 | OFF THE GRID | C J Box | Putnam | None |
| 16 | 0 | 0 | HEARTBREAKER | Linda Howard | Avon Impulse | 006242226X |
| 17 | 0 | 0 | ROOM | Emma Donoghue | Little, Brown | 0316098329 |
| 18 | 0 | 0 | MEMORY MAN | David Baldacci | Grand Central | 1455559806 |
| 19 | 0 | 0 | A MAN CALLED OVE | Fredrik Backman | Washington Square Press | 1476738025 |
| 20 | 0 | 0 | THE MARTIAN | Andy Weir | Crown | 0553418025 |
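The table has 20 rows although normal_list_ends_at is 15, and ranks 16 to 20 all show 0 weeks on the list, which suggests that they extend beyond the regular list. This reading is based on the field name rather than on the API documentation. If only the regular list is wanted, the rows can be cut at that value. The sketch runs on a small hypothetical stand-in, books_demo, copied from four rows of the table above.

```r
# Stand-in for books_df, copied from ranks 14 to 17 of the table above
books_demo <- data.frame(
  Rank = 14:17,
  Weeks = c(10L, 3L, 0L, 0L),
  Title = c("THE LIAR", "OFF THE GRID", "HEARTBREAKER", "ROOM"),
  stringsAsFactors = FALSE
)
normal_list_ends_at <- 15L  # value reported in books_response

# Keep only the rows ranked within the regular list
regular_list <- books_demo[books_demo$Rank <= normal_list_ends_at, ]
regular_list$Title  # "THE LIAR" "OFF THE GRID"
```

With the real objects, the same filter would read books_df[books_df$Rank <= books_response$normal_list_ends_at, ].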