The NY Times offers quite a few different APIs to work with.
The one I will be using is the Books API, which focuses on specific
books and their rankings within the best seller lists for each category.
As I explore the different endpoints, I hope to build a comprehensive
dataframe that combines all of the categories, e.g. hardcover fiction, so
that we have a single dataframe holding most of the information. I may
leave out certain things, such as book image width and height or other
uninteresting info. In effect, I am converting their JSON files back
into a table that could easily be processed by SQL software or
similar.
To start, let’s see all the different lists that are available
to us. The code chunk below reveals there are 59 different list titles,
each of which we can query for its own list of books and consolidate into
one massive dataframe. Let’s get started with a loop pulling each list and
saving it as a variable. Along the way I will drop unnecessary columns, and
then start combining the results.
To create our loop we need to pass a series of URLs to our
preferred function, fromJSON(), letting it know what to query.
Luckily the NY Times has a call that returns the names of all the
lists, which can then be used in the other calls. First, we collect our
calls.
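library(jsonlite) #fromJSON()
library(stringr) #str_replace_all()
library(dplyr) #select(), mutate(), bind_rows(), arrange() and the %>% pipe
library(rmarkdown) #paged_table()
l <- fromJSON("https://api.nytimes.com/svc/books/v3/lists/names.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy")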
genres <- l$results$list_name
genres <- str_replace_all(genres," ","-")
genres[17] <- "Combined-Print-Nonfiction"
#genres <- data.frame(genres)
head(genres,10) #I chose print instead of paged_tables, because it requires genres to be a data frame and if I change it to that our genreCalls line will create incorrect output
## [1] "Combined-Print-and-E-Book-Fiction"
## [2] "Combined-Print-and-E-Book-Nonfiction"
## [3] "Hardcover-Fiction"
## [4] "Hardcover-Nonfiction"
## [5] "Trade-Fiction-Paperback"
## [6] "Mass-Market-Paperback"
## [7] "Paperback-Nonfiction"
## [8] "E-Book-Fiction"
## [9] "E-Book-Nonfiction"
## [10] "Hardcover-Advice"
There are 59 categories in total; above I only show 10 for
readability. I saved the list names and changed the 17th one because I was
originally getting a URL-not-found error during troubleshooting, and I took
a wild stab that maybe nonfiction needed to be spelled as one word. I was
right!
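The fix above relies on the problem name sitting at position 17, which would break if the NY Times reordered its lists. A minimal sketch of matching on the value instead is below. The input names and the "bad" spelling are made up for illustration, since the original 17th name is not shown above, and base R's gsub() stands in for str_replace_all().

```r
# Hypothetical raw names, standing in for l$results$list_name
list_names <- c("Hardcover Fiction", "Combined Print Non-Fiction", "E-Book Fiction")

# Turn display names into URL slugs (spaces become hyphens)
slugs <- gsub(" ", "-", list_names, fixed = TRUE)

# Known corrections: names are the failing slugs, values are the working ones
# (the failing spelling here is an assumption for this sketch)
overrides <- c("Combined-Print-Non-Fiction" = "Combined-Print-Nonfiction")

# Replace only the slugs that have an override, wherever they sit in the vector
needs_fix <- slugs %in% names(overrides)
slugs[needs_fix] <- unname(overrides[slugs[needs_fix]])
slugs
```

New corrections can then be added to overrides without counting positions.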
From here, we create our genre calls by pasting each genre into the URL text,
and can view the finished values.
genreCalls <- paste0("https://api.nytimes.com/svc/books/v3/lists/current/",genres,".json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy") #Create the text for the fromJSON function
#t <- fromJSON("https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy")
#t$results
head(genreCalls, 5)
## [1] "https://api.nytimes.com/svc/books/v3/lists/current/Combined-Print-and-E-Book-Fiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"
## [2] "https://api.nytimes.com/svc/books/v3/lists/current/Combined-Print-and-E-Book-Nonfiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"
## [3] "https://api.nytimes.com/svc/books/v3/lists/current/Hardcover-Fiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"
## [4] "https://api.nytimes.com/svc/books/v3/lists/current/Hardcover-Nonfiction.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"
## [5] "https://api.nytimes.com/svc/books/v3/lists/current/Trade-Fiction-Paperback.json?api-key=US3wVrAhtL0siGxqkfY2ufhKr5gweMYy"
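Since the key ends up inside every URL, one option is to read it from an environment variable rather than typing it into the script. The sketch below assumes the key was stored under the name NYT_API_KEY (for example in ~/.Renviron); that variable name and the build_list_url() helper are made up for this example.

```r
# Hypothetical helper: builds /lists/current/ URLs for one or more genres.
# Assumption: the key lives in an environment variable called NYT_API_KEY.
build_list_url <- function(genre, key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(key)) stop("NYT_API_KEY is not set")
  paste0("https://api.nytimes.com/svc/books/v3/lists/current/",
         genre, ".json?api-key=", key)
}

# paste0() is vectorised, so a whole vector of genres works in one call.
# A placeholder key is passed explicitly so the example runs anywhere.
urls <- build_list_url(c("Hardcover-Fiction", "Hardcover-Nonfiction"), key = "DEMO_KEY")
urls
```

With the variable set, build_list_url(genres) would produce the same vector as genreCalls.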
The loop below has a 10 second sleep timer in it. While
troubleshooting, I found that calls tended to fail once I made too many
of them in quick succession, and figured the NY Times might have some kind
of rate limit. In the loop, we save the results dataframe for each
genre/category.
# I would like to re-do this loop with lapply() since it should be faster ignoring the 10 second wait timer
list <- list() #Create empty list that we can dump all of our dataframes into
for (i in seq_along(genreCalls)) { # iterate over every URL in genreCalls
  test <- fromJSON(genreCalls[i]) #get the info
  books <- data.frame(test$results$books) #save the dataframe we are interested in to a variable that will be recognized as a df
  # select only columns of interest and add a column referencing what category the rows are from
  books <- books %>%
    select(rank, rank_last_week, weeks_on_list, primary_isbn10, primary_isbn13, publisher,
           title, author, contributor, book_image, amazon_product_url, age_group) %>%
    mutate(category = genres[i])
  #assign takes in a variable name, which you can create dynamically, and values to be assigned to that name, this allows us to create variables dynamically
  assign(paste0("results", genres[i]), books)
  Sys.sleep(10) #wait timer so NY Times does not flag me
}
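The lapply() version mentioned in the comment at the top of the loop could look roughly like the sketch below. It collects the data frames in a named list instead of creating variables with assign(), and a failed call returns NULL rather than stopping the run. fetch_all() and fake_fetch() are hypothetical helpers written for this example; in real use the fetch argument would be fromJSON.

```r
# Hypothetical helper, not part of jsonlite or the NY Times API.
# `fetch` is any function that turns a URL into a parsed response.
fetch_all <- function(urls, genres, fetch, pause = 10) {
  results <- lapply(seq_along(urls), function(i) {
    # A failed call yields NULL instead of aborting the whole run
    books <- tryCatch(
      as.data.frame(fetch(urls[i])$results$books),
      error = function(e) NULL
    )
    Sys.sleep(pause) # same courtesy pause as the for loop
    if (is.null(books) || nrow(books) == 0) return(NULL)
    books$category <- genres[i] # tag rows with their source list
    books
  })
  names(results) <- genres
  Filter(Negate(is.null), results) # drop the failed or empty calls
}

# Stub standing in for fromJSON() so the sketch runs offline
fake_fetch <- function(url) {
  if (grepl("Broken", url)) stop("HTTP 404")
  list(results = list(books = data.frame(rank = 1:2, title = c("A", "B"))))
}

out <- fetch_all(
  urls   = c("https://example.invalid/Hardcover-Fiction.json",
             "https://example.invalid/Broken.json"),
  genres = c("Hardcover-Fiction", "Broken"),
  fetch  = fake_fetch,
  pause  = 0
)
names(out)
```

A list built this way can go straight into bind_rows(), with no need to search the environment for variables.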
From here, we make our mega dataframe with all the results
from each API call.
list <- mget(x = ls(pattern = '^results')) #retrieves every variable in the environment starting with 'results'
#list <- list[-1] #not needed: mget() returns only the variables matching '^results', so nothing in it is NULL and this line would drop a real category
megaDF <- bind_rows(list) #I love this function like no other
paged_table(megaDF)
Now we can filter books by category instead of having to make the
calls again. This could be useful if one wanted a clean historical
report of all best sellers. If we wanted one for the weekly/monthly updated
lists, we could repeat this process on a schedule, calling the same endpoint used above:
/lists/current/category.json
It’s interesting to note that
sometimes a book ends up as a best seller in multiple categories, e.g. It
Ends With Us by Colleen Hoover or The Body Keeps the Score by Bessel van
der Kolk. Below are the categories present in the table, followed by the
entire table ordered by descending weeks on the best seller list to see
the champions over the past years.
unique(megaDF$category)
## [1] "Animals"
## [2] "Audio-Fiction"
## [3] "Audio-Nonfiction"
## [4] "Business-Books"
## [5] "Celebrities"
## [6] "Chapter-Books"
## [7] "Childrens-Middle-Grade"
## [8] "Childrens-Middle-Grade-E-Book"
## [9] "Childrens-Middle-Grade-Hardcover"
## [10] "Childrens-Middle-Grade-Paperback"
## [11] "Combined-Print-and-E-Book-Fiction"
## [12] "Combined-Print-and-E-Book-Nonfiction"
## [13] "Combined-Print-Fiction"
## [14] "Combined-Print-Nonfiction"
## [15] "Crime-and-Punishment"
## [16] "Culture"
## [17] "E-Book-Fiction"
## [18] "E-Book-Nonfiction"
## [19] "Education"
## [20] "Espionage"
## [21] "Expeditions-Disasters-and-Adventures"
## [22] "Family"
## [23] "Fashion-Manners-and-Customs"
## [24] "Food-and-Fitness"
## [25] "Games-and-Activities"
## [26] "Graphic-Books-and-Manga"
## [27] "Hardcover-Advice"
## [28] "Hardcover-Business-Books"
## [29] "Hardcover-Fiction"
## [30] "Hardcover-Graphic-Books"
## [31] "Hardcover-Nonfiction"
## [32] "Hardcover-Political-Books"
## [33] "Health"
## [34] "Humor"
## [35] "Indigenous-Americans"
## [36] "Manga"
## [37] "Mass-Market-Monthly"
## [38] "Mass-Market-Paperback"
## [39] "Middle-Grade-Paperback-Monthly"
## [40] "Paperback-Advice"
## [41] "Paperback-Books"
## [42] "Paperback-Business-Books"
## [43] "Paperback-Graphic-Books"
## [44] "Paperback-Nonfiction"
## [45] "Picture-Books"
## [46] "Race-and-Civil-Rights"
## [47] "Relationships"
## [48] "Religion-Spirituality-and-Faith"
## [49] "Science"
## [50] "Series-Books"
## [51] "Sports"
## [52] "Trade-Fiction-Paperback"
## [53] "Travel"
## [54] "Young-Adult"
## [55] "Young-Adult-E-Book"
## [56] "Young-Adult-Hardcover"
## [57] "Young-Adult-Paperback"
## [58] "Young-Adult-Paperback-Monthly"
paged_table(megaDF %>% arrange(desc(weeks_on_list)))
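The multi-category observation above can also be checked directly by counting distinct categories per book. The sketch below runs on a small invented data frame with the same title, author and category columns as megaDF; the rows are placeholders, not real results.

```r
library(dplyr)

# Toy stand-in for megaDF; every row here is invented for illustration
toy <- data.frame(
  title    = c("BOOK A", "BOOK A", "BOOK B", "BOOK C", "BOOK C", "BOOK C"),
  author   = c("Author One", "Author One", "Author Two",
               "Author Three", "Author Three", "Author Three"),
  category = c("Trade-Fiction-Paperback", "Combined-Print-and-E-Book-Fiction",
               "Hardcover-Fiction", "Paperback-Nonfiction", "Health", "Science")
)

# Count the distinct lists each book appears on and keep those on more than one
multi <- toy %>%
  group_by(title, author) %>%
  summarise(n_categories = n_distinct(category), .groups = "drop") %>%
  filter(n_categories > 1) %>%
  arrange(desc(n_categories))
multi
```

Swapping toy for megaDF should give the same kind of summary for the real table.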