Abstract

Using the NY Times Books API, we can retrieve exactly the data we need. Instead of scraping web pages by hand or emailing requests for data, we can access the bestseller lists programmatically. Each query is an HTTP request that returns a JSON data structure. Working with APIs like this is an efficient, everyday practice because it makes open data accessible.

Data Sources

The NY Times Books API is documented here: https://developer.nytimes.com/docs/books-product/1/overview

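library(magrittr)  # provides the %>% pipe used throughout

# Build "url?param1&param2&...&api-key=..." from pre-formatted "name=value"
# strings. Assumes `apiKey` already holds your NY Times developer key.
ConstructApiCall <- function(url, ..., key = paste0("api-key=", apiKey)) {
  paste0(url, "?", paste(..., key, sep = "&"))
}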
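ConstructApiCall pastes its "name=value" strings into the URL verbatim. That works for the simple values used in this article, but a value containing a space or an "&" would corrupt the query string. As a hedged alternative, the hypothetical helper BuildQueryUrl() below percent-encodes each value with base R's utils::URLencode(); the helper's name and its named-list interface are our own invention, not part of the NY Times API.

```r
# Hypothetical helper (not part of the NY Times API): build "url?name=value&..."
# from a named list, percent-encoding each value so that spaces or "&"
# cannot break the query string.
BuildQueryUrl <- function(url, params) {
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste0(url, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

url <- BuildQueryUrl(
  "https://api.nytimes.com/svc/books/v3/lists.json",
  list(list = "hardcover-nonfiction", `published-date` = "2014-04-20",
       `api-key` = "YOUR_KEY")  # placeholder, not a real key
)
url
# "https://api.nytimes.com/svc/books/v3/lists.json?list=hardcover-nonfiction&published-date=2014-04-20&api-key=YOUR_KEY"

BuildQueryUrl("https://example.com/search", list(q = "a b&c"))
# "https://example.com/search?q=a%20b%26c"
```

For the hyphenated list names and ISO dates used here, both approaches produce the same URL; the encoding only matters for free-text values.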

# Send a GET request and return the response body as text. A non-2xx
# status raises a warning, but the body is still returned because the
# API describes the error in it.
ApiCall <- function(url) {
  results <- httr::GET(url)
  if (results$status_code < 200 || results$status_code > 299) {
    warning("HTTP status ", results$status_code)
  }
  rawToChar(results$content)
}

# Truncate every column to its first 20 characters so wide results print compactly.
LessWords <- function(df) {
  apply(df, 2, substring, 1L, 20L)
}
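The results printed below appear as quoted strings in a matrix rather than as a data frame. That is a side effect of apply(), which coerces a data frame to a character matrix before applying substring() to each column. A small illustration on a made-up data frame (toy values, not API data):

```r
# Made-up example data, not from the API
df <- data.frame(
  title = c("SHORT TITLE", "A VERY LONG TITLE THAT WILL BE CUT OFF"),
  rank = c(1L, 2L),
  stringsAsFactors = FALSE
)

# Same call as LessWords(): truncate each column to 20 characters
short <- apply(df, 2, substring, 1L, 20L)

is.matrix(short)     # TRUE: apply() returned a matrix, not a data frame
is.character(short)  # TRUE: the integer rank column became text as well
short[2, "title"]    # "A VERY LONG TITLE TH"
```

If a data frame is needed afterwards, as.data.frame(short) converts the result back, with every column stored as character.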

Available Datasets

According to the NY Times Books API documentation, bestsellers are grouped into named lists (for example, hardcover or paperback nonfiction), and the available lists can themselves be retrieved through the API. The GET /lists/names.json endpoint returns these list names, which lets us narrow our search to nonfiction.

timesBaseUrl = "https://api.nytimes.com/svc/books/v3/"

# timesBaseUrl already ends in "/", so no leading slash is needed here
paste0(timesBaseUrl, "lists/names.json") %>%
  ConstructApiCall(.) %>%
  ApiCall(.) %>%
  jsonlite::fromJSON(.) %>%
  .$results %>%
  .[grepl("nonfiction", .$list_name_encoded, ignore.case = TRUE), ]
##                               list_name
## 2  Combined Print and E-Book Nonfiction
## 4                  Hardcover Nonfiction
## 7                  Paperback Nonfiction
## 9                     E-Book Nonfiction
## 17            Combined Print Nonfiction
## 32                     Audio Nonfiction
##                                 display_name
## 2         Combined Print & E-Book Nonfiction
## 4                       Hardcover Nonfiction
## 7                       Paperback Nonfiction
## 9                          E-Book Nonfiction
## 17 Combined Hardcover & Paperback Nonfiction
## 32                          Audio Nonfiction
##                       list_name_encoded oldest_published_date
## 2  combined-print-and-e-book-nonfiction            2011-02-13
## 4                  hardcover-nonfiction            2008-06-08
## 7                  paperback-nonfiction            2008-06-08
## 9                     e-book-nonfiction            2011-02-13
## 17            combined-print-nonfiction            2011-02-13
## 32                     audio-nonfiction            2018-03-11
##    newest_published_date updated
## 2             2022-12-11  WEEKLY
## 4             2022-12-11  WEEKLY
## 7             2022-12-11  WEEKLY
## 9             2017-01-29  WEEKLY
## 17            2013-05-12  WEEKLY
## 32            2022-11-13 MONTHLY
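The date columns matter too: each list only has entries between its oldest and newest published dates, and in the output above the E-Book and Combined Print nonfiction lists had no entries after 2017 and 2013 respectively. A minimal sketch that checks which of these lists cover the 2014-04-20 date queried below, using values copied from the printed output; treating dates outside that range as unusable is our assumption.

```r
# Values copied from the /lists/names.json output printed above
lists <- data.frame(
  list_name_encoded = c("combined-print-and-e-book-nonfiction", "hardcover-nonfiction",
                        "paperback-nonfiction", "e-book-nonfiction",
                        "combined-print-nonfiction", "audio-nonfiction"),
  oldest_published_date = as.Date(c("2011-02-13", "2008-06-08", "2008-06-08",
                                    "2011-02-13", "2011-02-13", "2018-03-11")),
  newest_published_date = as.Date(c("2022-12-11", "2022-12-11", "2022-12-11",
                                    "2017-01-29", "2013-05-12", "2022-11-13")),
  stringsAsFactors = FALSE
)

target <- as.Date("2014-04-20")
# Assumption: a query is only meaningful for dates inside the list's published range
covers <- lists$oldest_published_date <= target & target <= lists$newest_published_date
lists$list_name_encoded[covers]
# "combined-print-and-e-book-nonfiction" "hardcover-nonfiction"
# "paperback-nonfiction"                 "e-book-nonfiction"
```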

Non-Fiction Hardcover Books

Now that we know the encoded list name, we can call the GET /lists.json endpoint. The list name and date are passed as query parameters, which are appended to the end of the URL after a "?" and separated by "&". The response nests each book's details in its own small data frame, so we stack them into a single table with do.call(rbind, ...).

paste0(timesBaseUrl, "lists.json") %>%
  ConstructApiCall(., "list=hardcover-nonfiction", "published-date=2014-04-20") %>%
  ApiCall(.) %>%
  jsonlite::fromJSON(.) %>%
  .$results %>%
  .$book_details %>%
  do.call(rbind, .) %>%
  LessWords(.) %>%
  head(.)
##      title                  description             contributor           
## [1,] "FLASH BOYS"           "The world of high-fr"  "by Michael Lewis"    
## [2,] "DON'T HURT PEOPLE AN" "A libertarian manife"  "by Matt Kibbe"       
## [3,] "THRIVE"               "Personal well-being "  "by Arianna Huffingto"
## [4,] "10% HAPPIER"          "A co-anchor of \"Nigh" "by Dan Harris"       
## [5,] "CALL TO ACTION"       "The former president"  "by Jimmy Carter"     
## [6,] "THE WOMEN OF DUCK CO" "Kay, Korie, Missy, J"  "by Kay Robertson and"
##      author                 contributor_note price  age_group
## [1,] "Michael Lewis"        ""               "0.00" ""       
## [2,] "Matt Kibbe"           ""               "0.00" ""       
## [3,] "Arianna Huffington"   ""               "0.00" ""       
## [4,] "Dan Harris"           ""               "0.00" ""       
## [5,] "Jimmy Carter"         ""               "0.00" ""       
## [6,] "Kay Robertson and ot" ""               "0.00" ""       
##      publisher              primary_isbn13  primary_isbn10
## [1,] "Norton"               "9780393244663" "0393244660"  
## [2,] "Morrow/HarperCollins" "9780062308252" "0062308254"  
## [3,] "Harmony"              "9780804140843" "0804140847"  
## [4,] "It Books"             "9780062265425" "0062265423"  
## [5,] "Simon & Schuster"     "9781476773957" "1476773955"  
## [6,] "Howard Books"         "9781476763309" "1476763305"
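The .$book_details step works because jsonlite::fromJSON() simplifies nested JSON arrays of objects into a list column of small data frames, one per book. The sketch below reproduces that shape offline with a tiny hand-written JSON string (made-up titles, not API data) and stacks the pieces with do.call(rbind, ...).

```r
library(jsonlite)

# Hand-written JSON shaped like the /lists.json response: each element of
# "results" carries its own one-element "book_details" array.
json <- '{"results": [
  {"rank": 1, "book_details": [{"title": "BOOK A", "author": "Author A"}]},
  {"rank": 2, "book_details": [{"title": "BOOK B", "author": "Author B"}]}
]}'

parsed <- fromJSON(json)

# parsed$results is a data frame whose book_details column is a list of
# one-row data frames; rbind() stacks them into a single data frame.
details <- do.call(rbind, parsed$results$book_details)
details
```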

Ranking the Non-Fiction Hardcover Books

Alternatively, the API provides another endpoint, GET /lists/{date}/{list}.json, where the book details are not nested. This suits our use case better because the books arrive as a data frame that we don't have to assemble ourselves. Here it returns the hardcover nonfiction list for the week of 2014-04-20. The date and list name are path parameters: they are inserted directly into the URL path rather than appended after the "?". If a list has more than 20 entries, the optional offset query parameter pages through them.

paste0(timesBaseUrl, "lists/2014-04-20/hardcover-nonfiction.json") %>%
  ConstructApiCall(., "offset=0") %>%
  ApiCall(.) %>%
  jsonlite::fromJSON(.) %>%
  .$results %>%
  .$books %>%
  LessWords(.) %>%
  head(.)
##      rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
## [1,] "1"  "0"            "1"           "0"      "0"    "0393244660"  
## [2,] "2"  "0"            "1"           "0"      "0"    "0062308254"  
## [3,] "3"  "1"            "2"           "0"      "0"    "0804140847"  
## [4,] "4"  "4"            "4"           "0"      "0"    "0062265423"  
## [5,] "5"  "2"            "2"           "0"      "0"    "1476773955"  
## [6,] "6"  "0"            "1"           "0"      "0"    "1476763305"  
##      primary_isbn13  publisher              description             price 
## [1,] "9780393244663" "Norton"               "The world of high-fr"  "0.00"
## [2,] "9780062308252" "Morrow/HarperCollins" "A libertarian manife"  "0.00"
## [3,] "9780804140843" "Harmony"              "Personal well-being "  "0.00"
## [4,] "9780062265425" "It Books"             "A co-anchor of \"Nigh" "0.00"
## [5,] "9781476773957" "Simon & Schuster"     "The former president"  "0.00"
## [6,] "9781476763309" "Howard Books"         "Kay, Korie, Missy, J"  "0.00"
##      title                  author                 contributor           
## [1,] "FLASH BOYS"           "Michael Lewis"        "by Michael Lewis"    
## [2,] "DON'T HURT PEOPLE AN" "Matt Kibbe"           "by Matt Kibbe"       
## [3,] "THRIVE"               "Arianna Huffington"   "by Arianna Huffingto"
## [4,] "10% HAPPIER"          "Dan Harris"           "by Dan Harris"       
## [5,] "CALL TO ACTION"       "Jimmy Carter"         "by Jimmy Carter"     
## [6,] "THE WOMEN OF DUCK CO" "Kay Robertson and ot" "by Kay Robertson and"
##      contributor_note book_image             book_image_width book_image_height
## [1,] ""               "https://storage.goog" "128"            "194"            
## [2,] ""               "https://storage.goog" "128"            "194"            
## [3,] ""               "https://storage.goog" "128"            "192"            
## [4,] ""               "https://storage.goog" "128"            "193"            
## [5,] ""               "https://storage.goog" "128"            "193"            
## [6,] ""               "https://storage.goog" "128"            "194"            
##      amazon_product_url     age_group book_review_link first_chapter_link
## [1,] "http://www.amazon.co" ""        ""               ""                
## [2,] "http://www.amazon.co" ""        ""               ""                
## [3,] "http://www.amazon.co" ""        ""               ""                
## [4,] "http://www.amazon.co" ""        ""               ""                
## [5,] "http://www.amazon.co" ""        ""               ""                
## [6,] "http://www.amazon.co" ""        ""               ""                
##      sunday_review_link     article_chapter_link isbns                  
## [1,] "https://www.nytimes." ""                   "list(isbn10 = c(\"039"
## [2,] ""                     ""                   "list(isbn10 = c(\"006"
## [3,] ""                     ""                   "list(isbn10 = c(\"080"
## [4,] ""                     ""                   "list(isbn10 = c(\"006"
## [5,] "https://www.nytimes." ""                   "list(isbn10 = c(\"147"
## [6,] ""                     ""                   "list(isbn10 = c(\"147"
##      buy_links               book_uri              
## [1,] "list(name = c(\"Amazo" "nyt://book/61841759-"
## [2,] "list(name = c(\"Amazo" "nyt://book/ec06256f-"
## [3,] "list(name = c(\"Amazo" "nyt://book/b7f48aa7-"
## [4,] "list(name = c(\"Amazo" "nyt://book/987d55e8-"
## [5,] "list(name = c(\"Amazo" "nyt://book/62436613-"
## [6,] "list(name = c(\"Amazo" "nyt://book/612d44cc-"
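Path parameters and the offset query parameter can be combined to request successive pages of a list. A minimal sketch, assuming pages of 20 books as described above; ListPageUrls() is a hypothetical helper, and the API key would still need to be appended as another query parameter ("&api-key=...") before the URLs are sent.

```r
# Hypothetical helper: the date and list name are path parameters embedded in
# the URL path, while offset is a query parameter. Offsets step by 20,
# assuming the page size of 20 described above. No API key is added here.
ListPageUrls <- function(base, date, list, pages = 1) {
  path <- sprintf("%slists/%s/%s.json", base, date, list)
  offsets <- seq(0, by = 20, length.out = pages)
  paste0(path, "?offset=", offsets)
}

urls <- ListPageUrls("https://api.nytimes.com/svc/books/v3/", "2014-04-20",
                     "hardcover-nonfiction", pages = 2)
urls
# "https://api.nytimes.com/svc/books/v3/lists/2014-04-20/hardcover-nonfiction.json?offset=0"
# "https://api.nytimes.com/svc/books/v3/lists/2014-04-20/hardcover-nonfiction.json?offset=20"
```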