NY Times Books API


The NY Times offers quite a few APIs to work with. The one I will be using is the Books API, which focuses on books and their rankings within the best seller lists for each category. As I explore the different endpoints, I hope to build a comprehensive dataframe that combines all of the categories, e.g. hardcover fiction, so that we have a single dataframe for viewing most of the information. I may leave out certain things, such as book image width and height or other uninteresting info. In effect, I am converting their JSON files back into a table that could easily be processed by SQL software or similar.


To start, let’s see all the different lists that are available to us. The code chunk below reveals there are 59 different list titles, each of which we can query for its own list of books and then consolidate into one massive dataframe. Let’s get started with a loop that pulls each list and saves it as a variable, dropping unnecessary columns along the way. After that I will start combining them.

To create our loop we are going to need to pass a series of arguments to our preferred function fromJSON() letting it know which URLs to query. Luckily NY Times has a call to get a list of all the different options that can be called elsewhere. First, we collect our calls.

Finding All API Calls to Loop

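library(jsonlite)  # fromJSON()
library(stringr)   # str_replace_all()
library(dplyr)     # %>%, select(), mutate(), bind_rows()
library(rmarkdown) # paged_table()

l <- fromJSON("https://api.nytimes.com/svc/books/v3/lists/names.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy")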

genres <- l$results$list_name
genres <- str_replace_all(genres," ","-")
genres[17] <- "Combined-Print-Nonfiction"
#genres <- data.frame(genres)
# I print with head() rather than paged_table() because paged_table() requires genres to be a data frame,
# and if I change it to that our genreCalls line will create incorrect output
head(genres, 10)
##  [1] "Combined-Print-and-E-Book-Fiction"   
##  [2] "Combined-Print-and-E-Book-Nonfiction"
##  [3] "Hardcover-Fiction"                   
##  [4] "Hardcover-Nonfiction"                
##  [5] "Trade-Fiction-Paperback"             
##  [6] "Mass-Market-Paperback"               
##  [7] "Paperback-Nonfiction"                
##  [8] "E-Book-Fiction"                      
##  [9] "E-Book-Nonfiction"                   
## [10] "Hardcover-Advice"


There are 59 categories in total; above I only showed 10 for readability. I saved the list names and changed the 17th one because I was originally getting a URL-not-found error while troubleshooting, and I took a wild stab that maybe "nonfiction" needed to be spelled as one word. I was right!
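Patching by position (genres[17]) is brittle if the API ever reorders its lists. Below is a minimal sketch of a name-based alternative. The helper `to_slug()`, the `overrides` lookup, and the example names are all hypothetical and made up for illustration; none of them come from the API response.

```r
# Hypothetical helper: turn list names into URL slugs, then apply
# manual corrections keyed by slug instead of by position.
to_slug <- function(list_names, overrides = character()) {
  slugs <- gsub(" ", "-", list_names, fixed = TRUE)
  hit <- slugs %in% names(overrides)
  slugs[hit] <- unname(overrides[slugs[hit]])
  slugs
}

# Illustrative input only; the real names come from l$results$list_name.
example_names <- c("Hardcover Fiction", "Example Non Fiction")
example_overrides <- c("Example-Non-Fiction" = "Example-Nonfiction")

slugs <- to_slug(example_names, example_overrides)
print(slugs)
```

The names response may also carry a pre-encoded slug column (I believe it is called list_name_encoded, but treat that as an assumption and check names(l$results) first), which would remove the need for any manual fix.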

From here, we create our genre calls by pasting each genre into the URL text, and can view the finished values.

genreCalls <- paste0("https://api.nytimes.com/svc/books/v3/lists/current/",genres,".json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy") #Create the text for the fromJSON function

#t <- fromJSON("https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy")
#t$results
head(genreCalls, 5)
## [1] "https://api.nytimes.com/svc/books/v3/lists/current/Combined-Print-and-E-Book-Fiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"   
## [2] "https://api.nytimes.com/svc/books/v3/lists/current/Combined-Print-and-E-Book-Nonfiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"
## [3] "https://api.nytimes.com/svc/books/v3/lists/current/Hardcover-Fiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"                   
## [4] "https://api.nytimes.com/svc/books/v3/lists/current/Hardcover-Nonfiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"                
## [5] "https://api.nytimes.com/svc/books/v3/lists/current/Trade-Fiction-Paperback.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"
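Since the key ends up in every URL (and in every printed output), it may be worth keeping it out of the script. Here is a rough sketch of the same URL construction with the key read from an environment variable. The function name `build_list_url()` and the variable name NYT_API_KEY are my own assumptions, not anything the API requires.

```r
# Hypothetical helper: build a "current list" URL for one or more category slugs.
# NYT_API_KEY is an assumed environment variable name chosen for this sketch.
build_list_url <- function(slug, api_key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(api_key)) stop("No API key found; set NYT_API_KEY first")
  paste0(
    "https://api.nytimes.com/svc/books/v3/lists/current/",
    slug,
    ".json?api-key=",
    api_key
  )
}

# paste0() is vectorised, so a whole vector of slugs works in one call.
# "DEMO_KEY" is a placeholder, not a working key.
urls <- build_list_url(c("Hardcover-Fiction", "E-Book-Fiction"), api_key = "DEMO_KEY")
print(urls)
```

With the key stored in ~/.Renviron, genreCalls could then be built as build_list_url(genres) without the key ever appearing in the source.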

Looping API Calls

The loop below includes a 10-second sleep timer. While troubleshooting, I found that requests tended to fail when I made many calls in quick succession, and I figured NY Times might have some kind of rate limit. In the loop, we save the results dataframe for each genre/category.

# I would like to re-do this loop with lapply() since it should be faster ignoring the 10 second wait timer

list <- list() #Create empty list that we can dump all of our dataframes into

for (i in seq_along(genreCalls)) {    # iterate over every URL in genreCalls

  test <- fromJSON(genreCalls[i])    # get the info

  # save the dataframe we are interested in to a variable that will be recognized as a df
  books <- data.frame(test$results$books)

  # select only columns of interest and add a column referencing what category the rows are from
  books <- books %>%
    select(rank, rank_last_week, weeks_on_list, primary_isbn10, primary_isbn13,
           publisher, title, author, contributor, book_image, amazon_product_url,
           age_group) %>%
    mutate(category = genres[i])

  # assign() takes a variable name, which you can create dynamically, and a value to be
  # assigned to that name; this allows us to create variables dynamically
  assign(paste0("results", genres[i]), books)

  Sys.sleep(10) # wait timer so NY Times does not flag me
}
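For comparison, here is one way the lapply() version mentioned above might look. This is only a sketch: `fetch_all()` and its arguments are hypothetical names, and the demonstration uses a stand-in fetcher so that it runs without touching the network. It also wraps each call in tryCatch() so that one failed request does not abort the whole run.

```r
# Hypothetical helper: fetch every URL, pausing between calls and
# recording failures as NULL instead of stopping the whole run.
# `fetch` defaults to jsonlite::fromJSON but can be swapped for testing.
fetch_all <- function(urls, labels, fetch = jsonlite::fromJSON, pause = 10) {
  out <- lapply(seq_along(urls), function(i) {
    if (i > 1) Sys.sleep(pause)            # no need to wait before the first call
    tryCatch(
      fetch(urls[i])$results$books,
      error = function(e) {
        message("Skipping ", labels[i], ": ", conditionMessage(e))
        NULL
      }
    )
  })
  names(out) <- labels
  Filter(Negate(is.null), out)             # keep only the successful calls
}

# Offline demonstration with a stand-in fetcher (no network involved).
fake_fetch <- function(url) {
  if (grepl("bad", url)) stop("simulated request failure")
  list(results = list(books = data.frame(rank = 1:2, title = c("A", "B"))))
}
demo <- fetch_all(c("u/good", "u/bad"), c("Good-List", "Bad-List"),
                  fetch = fake_fetch, pause = 0)
print(names(demo))
```

With the real data this would be called as fetch_all(genreCalls, genres), returning a named list of dataframes rather than 59 separate variables.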



From here, we make our mega dataframe with all the results from each API call.

list <- mget(x = ls(pattern = '^results')) #retrieves every variable in the environment starting with 'results'; this overwrites the empty list created before the for loop, so no element needs to be dropped

megaDF <- bind_rows(list) #I love this function like no other

paged_table(megaDF)
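If the per-category dataframes are kept in a named list instead of separate variables, the combining step does not need ls() or mget() at all. Here is a small base R sketch with toy data (the two dataframes and their contents are made up for illustration):

```r
# Toy stand-ins for two per-category results; the rows are illustrative only.
dfs <- list(
  "Hardcover-Fiction" = data.frame(rank = 1:2, title = c("A", "B")),
  "E-Book-Fiction"    = data.frame(rank = 1L,  title = "A")
)

# Tag each data frame with its list name, then stack them (base R).
tagged <- Map(function(df, nm) cbind(df, category = nm), dfs, names(dfs))
combined <- do.call(rbind, unname(tagged))
rownames(combined) <- NULL
print(combined)
```

The dplyr equivalent is bind_rows(dfs, .id = "category"), which adds the category column from the list names in one step.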

The Full Dataframe

Now we can filter books by category instead of having to make the calls again. Because the calls above hit /lists/current/category.json, the table is a snapshot of the current lists, and re-running the process picks up the weekly/monthly updates. If one wanted a clean historical report of all best sellers, the same process could be repeated against a different endpoint, such as the /lists/best-sellers/history.json call commented out earlier.

It’s interesting to note that sometimes a book ends up as a best seller in multiple categories, e.g. It Ends With Us by Colleen Hoover or The Body Keeps the Score by Bessel van der Kolk. Below is the entire table ordered by descending weeks on the best seller list, to see the champions over the past years.

unique(megaDF$category)
##  [1] "Animals"                             
##  [2] "Audio-Fiction"                       
##  [3] "Audio-Nonfiction"                    
##  [4] "Business-Books"                      
##  [5] "Celebrities"                         
##  [6] "Chapter-Books"                       
##  [7] "Childrens-Middle-Grade"              
##  [8] "Childrens-Middle-Grade-E-Book"       
##  [9] "Childrens-Middle-Grade-Hardcover"    
## [10] "Childrens-Middle-Grade-Paperback"    
## [11] "Combined-Print-and-E-Book-Fiction"   
## [12] "Combined-Print-and-E-Book-Nonfiction"
## [13] "Combined-Print-Fiction"              
## [14] "Combined-Print-Nonfiction"           
## [15] "Crime-and-Punishment"                
## [16] "Culture"                             
## [17] "E-Book-Fiction"                      
## [18] "E-Book-Nonfiction"                   
## [19] "Education"                           
## [20] "Espionage"                           
## [21] "Expeditions-Disasters-and-Adventures"
## [22] "Family"                              
## [23] "Fashion-Manners-and-Customs"         
## [24] "Food-and-Fitness"                    
## [25] "Games-and-Activities"                
## [26] "Graphic-Books-and-Manga"             
## [27] "Hardcover-Advice"                    
## [28] "Hardcover-Business-Books"            
## [29] "Hardcover-Fiction"                   
## [30] "Hardcover-Graphic-Books"             
## [31] "Hardcover-Nonfiction"                
## [32] "Hardcover-Political-Books"           
## [33] "Health"                              
## [34] "Humor"                               
## [35] "Indigenous-Americans"                
## [36] "Manga"                               
## [37] "Mass-Market-Monthly"                 
## [38] "Mass-Market-Paperback"               
## [39] "Middle-Grade-Paperback-Monthly"      
## [40] "Paperback-Advice"                    
## [41] "Paperback-Books"                     
## [42] "Paperback-Business-Books"            
## [43] "Paperback-Graphic-Books"             
## [44] "Paperback-Nonfiction"                
## [45] "Picture-Books"                       
## [46] "Race-and-Civil-Rights"               
## [47] "Relationships"                       
## [48] "Religion-Spirituality-and-Faith"     
## [49] "Science"                             
## [50] "Series-Books"                        
## [51] "Sports"                              
## [52] "Trade-Fiction-Paperback"             
## [53] "Travel"                              
## [54] "Young-Adult"                         
## [55] "Young-Adult-E-Book"                  
## [56] "Young-Adult-Hardcover"               
## [57] "Young-Adult-Paperback"               
## [58] "Young-Adult-Paperback-Monthly"
paged_table(megaDF %>% arrange(desc(weeks_on_list)))
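The multi-category observation above can be checked directly by counting how many distinct categories each title appears in. This is a sketch on toy data shaped like megaDF; the titles and rows are invented, and only the title and category column names are taken from the real table.

```r
# Toy data shaped like megaDF; the rows are made up for illustration.
toy <- data.frame(
  title    = c("BOOK ONE", "BOOK ONE", "BOOK TWO", "BOOK THREE", "BOOK ONE"),
  category = c("Trade-Fiction-Paperback", "Combined-Print-and-E-Book-Fiction",
               "Hardcover-Fiction", "Science", "Paperback-Books"),
  stringsAsFactors = FALSE
)

# Count distinct categories per title and keep titles on more than one list.
n_lists <- vapply(split(toy$category, toy$title),
                  function(x) length(unique(x)), integer(1))
multi <- sort(n_lists[n_lists > 1], decreasing = TRUE)
print(multi)
```

Swapping toy for megaDF should give the real counts. Grouping on primary_isbn13 instead of title might be more robust if two different books share a title, though I have not tested that here.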