For this assignment, I am using the NYT Books API, specifically the overview.json endpoint. I chose the overview rather than a single category list because it provides a snapshot of all the Best Sellers lists for a given week. My goal is to find out which of the current #1 books has been on its list the longest across all categories. Using the overview service, I can compare the #1 books across every list in the response.
Approach
I am using the modern httr2 package to handle the API request. This allows for a piped workflow where I define the request, add my API key as a query parameter, and perform the request. Because the overview endpoint returns a nested list structure, I use resp_body_json() and functions from purrr to extract the top-ranked book from each bestseller list and combine the results into a tidy data frame.
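The request step described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code used in this report: the v3 URL path, the helper names build_overview_request() and fetch_overview(), and the "DEMO_KEY" placeholder are assumptions for illustration, and the native pipe assumes R 4.1 or later.

```r
library(httr2)

# Assumption: the overview endpoint lives under the v3 Books API path.
overview_url <- "https://api.nytimes.com/svc/books/v3/lists/overview.json"

# Hypothetical helper: build (but do not send) the request.
# The key is attached as the `api-key` query parameter.
build_overview_request <- function(api_key) {
  request(overview_url) |>
    req_url_query(`api-key` = api_key)
}

# Hypothetical helper: send the request and parse the JSON body
# into a nested R list.
fetch_overview <- function(api_key) {
  build_overview_request(api_key) |>
    req_perform() |>
    resp_body_json()
}

# Building the request needs no network access, so it can be inspected first.
req <- build_overview_request("DEMO_KEY")  # placeholder, not a real key
req$url
```

Keeping the request-building step separate from req_perform() makes it possible to check the URL before spending an API call.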
Challenges
One challenge I anticipate is working with the nested JSON structure returned by the overview.json endpoint. The response includes multiple bestseller lists inside the results object, and each list contains its own set of books, so I will need to carefully flatten that structure into a tidy data frame. Another challenge is deciding the right unit of analysis for my question. Since I want to compare the current #1 books across all categories, I will need to isolate only the top-ranked book from each list and then compare those books based on how many weeks they have remained on the list. I also expect that some fields may be blank or inconsistent across categories, which may require light cleaning before analysis.
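To make the flattening step concrete, here is a small sketch that runs on a hand-made list instead of a live response. The mock data is invented, and the field names (results, lists, list_name, books, rank, title, author, weeks_on_list) are assumed to match the shape of the overview response; they should be checked against the real output.

```r
library(purrr)
library(dplyr)
library(tibble)

# Invented data that imitates the assumed nesting: results -> lists -> books.
mock_overview <- list(
  results = list(
    lists = list(
      list(
        list_name = "List A",
        books = list(
          list(rank = 1, title = "BOOK A", author = "Author A", weeks_on_list = 12),
          list(rank = 2, title = "BOOK B", author = "Author B", weeks_on_list = 3)
        )
      ),
      list(
        list_name = "List B",
        books = list(
          # The author field is deliberately missing here.
          list(rank = 1, title = "BOOK C", weeks_on_list = 40),
          list(rank = 2, title = "BOOK D", author = "Author D", weeks_on_list = 1)
        )
      )
    )
  )
)

top_books <- mock_overview$results$lists |>
  map(function(lst) {
    # Take the first book with rank 1; detect() returns NULL if there is none.
    top <- detect(lst$books, \(b) b$rank == 1)
    if (is.null(top)) return(NULL)
    tibble(
      list_name     = lst$list_name,
      title         = pluck(top, "title", .default = NA_character_),
      author        = pluck(top, "author", .default = NA_character_),
      rank          = top$rank,
      weeks_on_list = pluck(top, "weeks_on_list", .default = NA_real_)
    )
  }) |>
  compact() |>   # drop lists that had no rank-1 book
  bind_rows()    # one row per bestseller list

top_books
```

Using pluck() with a default means a blank or absent field becomes NA instead of breaking the row, which covers the inconsistent-field concern mentioned above.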
Load Packages
library(httr2)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(purrr)
library(tibble)
nyt_key <- Sys.getenv("NYT_API_KEY")
if (nyt_key == "") {
  stop("NYT_API_KEY was not found.")
}
The top_answer table shows the current #1 book (or books, if there is a tie) that has remained on its bestseller list the longest across all NYT categories in the current overview response.
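One way a table like top_answer could be derived while keeping ties is sketched below. The input table top_books and its values are invented for illustration; only the column name weeks_on_list is assumed to match the API field.

```r
library(dplyr)
library(tibble)

# Invented table of #1 books, one row per list (two books tie at 40 weeks).
top_books <- tibble(
  list_name     = c("List A", "List B", "List C"),
  title         = c("BOOK A", "BOOK B", "BOOK C"),
  weeks_on_list = c(12L, 40L, 40L)
)

# Keep the row(s) with the largest weeks_on_list; with_ties = TRUE
# returns every book that shares the maximum.
top_answer <- top_books |>
  slice_max(weeks_on_list, n = 1, with_ties = TRUE)

top_answer
```

Because with_ties = TRUE is used, the result can have more than one row, which matches the "or books" wording above.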
Data-Cleaning Notes
In order to create a tidy data frame, I extracted the nested lists object from the API response and then isolated the book with rank == 1 from each list. I kept one row per bestseller category and selected only the fields needed for the analysis, such as title, author, publisher, rank, and weeks on list. I also converted date fields to Date and numeric ranking fields to integers so they could be sorted and compared correctly.
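The type conversions mentioned here might look like the following sketch. The table raw_top and its single row are invented, and published_date is only an assumed example of a date field; the real column names depend on which fields were kept.

```r
library(dplyr)
library(tibble)

# Invented row: the date arrives as text and the counts as doubles.
raw_top <- tibble(
  list_name      = "List A",
  published_date = "2024-01-07",
  rank           = 1,
  weeks_on_list  = 12
)

clean_top <- raw_top |>
  mutate(
    # ISO-formatted strings (YYYY-MM-DD) convert directly with as.Date().
    published_date = as.Date(published_date),
    # Ranking fields become integers so they sort and compare as numbers.
    across(c(rank, weeks_on_list), as.integer)
  )

clean_top
```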
Conclusions
This analysis compares the current #1 book in each NYT bestseller category and identifies which one has been on its list the longest. A useful next step would be to collect overview data across multiple weeks to study how category leaders change over time.
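As a rough sketch of that next step, the requests for several weeks could be prepared in advance. This assumes the overview endpoint accepts a published_date query parameter in YYYY-MM-DD form, which should be confirmed in the NYT documentation; the helper build_week_request(), the start date, and "DEMO_KEY" are hypothetical.

```r
library(httr2)
library(purrr)

# Assumption: same v3 overview URL as the single-week request.
overview_url <- "https://api.nytimes.com/svc/books/v3/lists/overview.json"

# Four consecutive weekly dates, formatted as text before mapping
# so the date class is not lost while iterating.
week_strings <- format(seq(as.Date("2024-01-07"), by = "week", length.out = 4))

# Hypothetical helper: one request per week (built, not sent).
build_week_request <- function(week, api_key) {
  request(overview_url) |>
    req_url_query(published_date = week, `api-key` = api_key)
}

requests <- map(week_strings, \(d) build_week_request(d, "DEMO_KEY"))

# Each request could then be sent with req_perform() and parsed with
# resp_body_json(), ideally with a pause between calls.
length(requests)
```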
AI Transcript
I used ChatGPT as a learning tool while working on this assignment. Instead of using it to complete the assignment for me, I used it to help me understand the NYT developer site, the difference between the Books API endpoints, and how to safely store and use an API key in R.
It was especially helpful as a teaching resource when I was trying to understand the structure of the overview.json response. Because the response is nested, I needed to learn how the lists and books were organized before I could create a tidy data frame. ChatGPT helped me think through that structure step by step and helped me understand how to isolate the top-ranked book from each category.
I also used ChatGPT to better understand how to organize my Quarto document so that it met the assignment requirements. Overall, it was most useful as a learning and troubleshooting tool that supported my understanding of APIs, JSON parsing, and data cleaning in R.