Week 9 Approach

Author

Theresa Benny

Approach Deliverable

Overview

For this assignment, I plan to use the New York Times Books API to retrieve data about current best sellers. I chose the Books API because it provides structured book data that is relatively easy to work with in JSON format, while still giving enough detail to practice API authentication, parsing, and tidying in R. The Books API includes services for New York Times Best Sellers lists and book reviews. One available best-seller endpoint returns details for a specific list, such as hardcover-fiction, using the Books API v3 list service. (GitHub)

My guiding question will be:

What are the top 5 current hardcover fiction best sellers, and what do their ranks, weeks on the list, and publishers suggest about the books currently performing best?

This question is narrow enough to complete clearly for the assignment, but still allows for basic analysis beyond simply pulling the data.

Planned Approach

I will use the Best Sellers list details portion of the NYT Books API and request the current hardcover-fiction list. The Books API documentation describes a list-details endpoint for a specific list name and date, including requests such as current/hardcover-fiction.json. (GitHub)

In R, I plan to:

  1. Store my API key securely
    I will not hard code the API key in my Quarto document. Instead, I will store it in an environment variable and access it with Sys.getenv("NYT_API_KEY"). This keeps the key out of the shared file and follows the assignment requirement.

  2. Make the API request
    I will send a GET request to the Books API endpoint for the current hardcover fiction list, passing my API key as the api-key query parameter in the request URL.

  3. Parse the JSON response
    The response will likely contain top-level metadata plus a nested results section. I will inspect the structure carefully to identify which part contains the actual list of books.

  4. Convert the data into a tidy data frame
    From the response, I will extract only the fields that are most useful for analysis. For example, I may keep:

    • title

    • author

    • rank

    • weeks_on_list

    • publisher

    • description

    I will then place these into a tibble so the data is clean and easy to analyze.

  5. Do a small amount of analysis
    Since my question focuses on the top 5 books, I will examine the first five ranked entries and describe any patterns I notice. For example, I may look at whether the highest-ranked books also have the longest time on the list, or whether certain publishers appear more than once.
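For step 1, here is a minimal sketch of reading the key defensively. The helper name get_nyt_key() is hypothetical. It assumes the key was saved as a line like NYT_API_KEY=your-key in a ~/.Renviron file (for example, opened with usethis::edit_r_environ()), which R reads at startup. Sys.getenv() returns an empty string for an unset variable, so the helper stops early rather than sending a request with an empty key.

```r
# Hypothetical helper: read the NYT key from an environment variable and
# stop early with a clear message if it is missing, rather than sending a
# request with an empty api-key parameter.
get_nyt_key <- function(var = "NYT_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop(var, " is not set; add it to ~/.Renviron and restart R", call. = FALSE)
  }
  key
}

# Usage in the main document (assumes the variable is already set):
# api_key <- get_nyt_key()
```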

Expected Data Cleaning / Tidying Decisions

I expect the JSON response to contain nested fields, so one of the main cleaning tasks will be identifying where the actual book entries live and flattening that structure into columns.

Some likely data-cleaning decisions include:

  • Selecting only relevant columns
    The API may return more information than I need, so I will keep only the variables that help answer my question.

  • Handling nested content
    Some fields may be stored inside a nested list structure rather than in a flat table. I may need to unnest or map over part of the response before converting it into a tibble.

  • Checking for missing values
    Some descriptive fields may be blank or missing for certain books. If that happens, I will note it and either keep the missing values as NA or exclude those fields from interpretation if they are not central to the question.

  • Making column names readable
    If needed, I will rename columns so they are easier to understand in the final data frame.
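The missing-value and renaming steps can be sketched in base R. One detail matters here. In the glimpse() output in the Codebase, blank fields such as age_group and primary_isbn10 come back as empty strings ("") rather than NA, so is.na() will not catch them until they are converted. The helper blank_to_na(), the small hand-made table, and the shorter column name weeks are illustrative choices, not part of the API.

```r
# Hypothetical helper: turn empty or whitespace-only strings into NA in every
# character column, leaving other columns (integers, list columns) untouched.
blank_to_na <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    col[!is.na(col) & trimws(col) == ""] <- NA
    col
  })
  df
}

# Small hand-made example (NOT real API data) with the same kinds of values.
books <- data.frame(
  rank = 1:3,
  title = c("BOOK ONE", "BOOK TWO", "BOOK THREE"),
  age_group = c("", "", ""),
  description = c("Description one.", "", "Description three."),
  weeks_on_list = c(2L, 21L, 1L)
)

clean <- blank_to_na(books)

# Keep only the columns the question needs, then give one a shorter name.
clean <- clean[, c("rank", "title", "description", "weeks_on_list")]
names(clean)[names(clean) == "weeks_on_list"] <- "weeks"

colSums(is.na(clean))  # count of missing values per kept column
```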

Anticipated Challenges

One challenge may be understanding the structure of the JSON response, especially if the returned object has multiple layers. I will likely need to inspect the response with tools like str() or names() before deciding exactly how to extract the books table.
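Here is a minimal sketch of that inspection step. It runs on a small hand-written JSON string shaped like the list-details response: a status, num_results, and a results object holding a books array. The string is illustrative, not real API output. With jsonlite's default simplification, an array of book objects becomes a data frame, so the books can be reached with a single $ path.

```r
library(jsonlite)

# Hand-written stand-in for the API response; NOT real NYT data.
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": {
    "list_name": "Hardcover Fiction",
    "books": [
      {"rank": 1, "title": "BOOK ONE", "weeks_on_list": 3},
      {"rank": 2, "title": "BOOK TWO", "weeks_on_list": 1}
    ]
  }
}'

parsed <- fromJSON(sample_json)

names(parsed)               # top-level fields: status, num_results, results
names(parsed$results)       # list metadata plus the nested books array
str(parsed, max.level = 2)  # shows books simplified to a data.frame

books <- parsed$results$books
```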

Another challenge is that the API returns live data: the hardcover fiction list is updated weekly, so the exact books and rankings change over time. My results may not match someone else's if they run the code in a different week.
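One way to make a rerun comparable is to request a specific list date instead of current. This sketch assumes the list-details path accepts a YYYY-MM-DD date in the same position as current (as in lists/current/hardcover-fiction.json). build_list_url() is a hypothetical helper, and the exact date semantics should be checked against the Books API documentation.

```r
# Hypothetical helper: build a list-details URL for the current list or for a
# specific date (YYYY-MM-DD), assuming the date replaces "current" in the path.
build_list_url <- function(list_name = "hardcover-fiction", date = "current") {
  if (!identical(date, "current") &&
      is.na(as.Date(date, format = "%Y-%m-%d"))) {
    stop("date must be \"current\" or a YYYY-MM-DD string", call. = FALSE)
  }
  paste0(
    "https://api.nytimes.com/svc/books/v3/lists/",
    date, "/", list_name, ".json"
  )
}

build_list_url()                     # same URL as the Codebase request
build_list_url(date = "2026-04-05")  # a pinned, repeatable request
```

Saving parsed_data$results$published_date alongside the results also records which week the analysis describes.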

A smaller challenge may be deciding how much of the returned data is useful. The API can include more detail than is needed for a simple assignment, so I will need to stay focused on the fields that help answer my question clearly.

Deliverable Plan

My final Quarto document will include:

  • the selected API and a short description of what it provides

  • the endpoint I used and the request parameters

  • code showing authentication through an environment variable

  • code to request and parse the JSON response

  • code that creates a tidy data frame

  • a short explanation of data-cleaning choices

  • a brief interpretation of the top 5 hardcover fiction best sellers

Codebase

#Load packages

library(httr2)
library(jsonlite)
library(dplyr)

library(tibble)

api_key <- Sys.getenv("NYT_API_KEY")


## Make API Request

# Base URL for NYT Books API (hardcover fiction list)
url <- "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json"

# Add API key as a query parameter (named req so it does not mask httr2::request())
req <- request(url) %>%
  req_url_query(`api-key` = api_key)

# Perform the request
response <- req %>%
  req_perform()

# Check response
response
<httr2_response>
GET https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=REDACTED
Status: 200 OK
Content-Type: application/json
Body: In memory (23996 bytes)
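Printing the response object echoes the full request URL into the rendered document, including the api-key query parameter. Here is a minimal sketch of masking the key before printing. redact_api_key() is a hypothetical helper, and response$url assumes the httr2 response object stores the request URL in its url field.

```r
# Hypothetical helper: replace the value of the api-key query parameter
# with a placeholder, keeping any other parameters intact.
redact_api_key <- function(url) {
  sub("api-key=[^&]*", "api-key=REDACTED", url)
}

# In the Codebase this could replace printing `response` directly:
# httr2::resp_status(response)
# redact_api_key(response$url)

redact_api_key(
  "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=abc123"
)
```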
# Parse JSON
# Convert response to text
json_data <- response %>%
  resp_body_string()

# Convert JSON text to R list
parsed_data <- jsonlite::fromJSON(json_data)

# Look at structure
str(parsed_data, max.level = 2)
List of 5
 $ status       : chr "OK"
 $ copyright    : chr "Copyright (c) 2026 The New York Times Company. All Rights Reserved."
 $ num_results  : int 15
 $ last_modified: chr "2026-03-25T22:37:48Z"
 $ results      :List of 12
  ..$ display_name           : chr "Hardcover Fiction"
  ..$ list_name              : chr "Hardcover Fiction"
  ..$ list_name_encoded      : chr "hardcover-fiction"
  ..$ previous_published_date: chr "2026-03-29"
  ..$ published_date         : chr "2026-04-05"
  ..$ bestsellers_date       : chr "2026-03-21"
  ..$ normal_list_ends_at    : int 15
  ..$ updated                : chr "WEEKLY"
  ..$ list_id                : int 1
  ..$ uri                    : chr "nyt://bestsellerslist/aa7d0d6a-4071-5737-ad6e-079ba050ef04"
  ..$ books                  :'data.frame': 15 obs. of  28 variables:
  ..$ corrections            : list()
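Based on that structure, the books table sits at results$books. Here is a minimal sketch of extracting it with two sanity checks: that the status is "OK", and that the row count matches num_results, as it does here (15 and 15). extract_books() is a hypothetical helper. Treating a count mismatch as a warning rather than an error is a judgment call.

```r
# Hypothetical helper: pull the books data frame out of a parsed
# list-details response, checking fields seen in the str() output.
extract_books <- function(parsed) {
  if (!identical(parsed$status, "OK")) {
    stop("API status was not OK: ", parsed$status, call. = FALSE)
  }
  books <- parsed$results$books
  if (!is.data.frame(books)) {
    stop("results$books is not a data frame", call. = FALSE)
  }
  if (!is.null(parsed$num_results) && nrow(books) != parsed$num_results) {
    warning("num_results (", parsed$num_results, ") does not match ",
            nrow(books), " rows in results$books", call. = FALSE)
  }
  books
}

# Usage with the object from the Codebase:
# books_df <- extract_books(parsed_data)

# Small stand-in list with the same shape; NOT real API output.
fake <- list(
  status = "OK",
  num_results = 2L,
  results = list(books = data.frame(rank = 1:2, title = c("A", "B")))
)
extract_books(fake)
```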
# Extract books data

books_df <- parsed_data$results$books

# Preview the data
glimpse(books_df)
Rows: 15
Columns: 28
$ age_group            <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
$ amazon_product_url   <chr> "https://www.amazon.com/dp/0316579831?tag=thenewy…
$ article_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
$ asterisk             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ author               <chr> "Viola Davis and James Patterson", "Virginia Evan…
$ book_image           <chr> "https://static01.nyt.com/bestsellers/images/9780…
$ book_image_height    <int> 500, 0, 500, 0, 500, 0, 500, 500, 400, 400, 0, 0,…
$ book_image_width     <int> 322, 0, 331, 0, 313, 0, 327, 331, 312, 312, 0, 0,…
$ book_review_link     <chr> "", "", "", "", "", "", "", "https://www.nytimes.…
$ book_uri             <chr> "nyt://book/6f72e477-90a5-5cbc-9f2d-d96c1590adeb"…
$ contributor          <chr> "by Viola Davis and James Patterson", "by Virgini…
$ contributor_note     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
$ created_date         <chr> "2026-03-18T22:38:16.974Z", "2026-03-04T23:40:21.…
$ dagger               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
$ description          <chr> "Judge Mary Stone oversees an ethically complex c…
$ first_chapter_link   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
$ price                <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "…
$ primary_isbn10       <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
$ primary_isbn13       <chr> "9780316579834", "9780593798430", "9781538743027"…
$ publisher            <chr> "Little, Brown and JVL", "Crown", "Grand Central"…
$ rank                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
$ rank_last_week       <int> 1, 3, 0, 8, 0, 5, 0, 4, 7, 12, 10, 11, 6, 0, 0
$ sunday_review_link   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
$ title                <chr> "JUDGE STONE", "THE CORRESPONDENT", "BLOODLUST", …
$ updated_date         <chr> "2026-03-18T22:38:16.974Z", "2026-03-04T23:40:21.…
$ weeks_on_list        <int> 2, 21, 1, 8, 1, 10, 1, 4, 3, 5, 22, 26, 4, 2, 12
$ isbns                <list> [<data.frame[1 x 2]>], [<data.frame[1 x 2]>], [<d…
$ buy_links            <list> [<data.frame[5 x 2]>], [<data.frame[5 x 2]>], [<d…
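The glimpse() output shows two nested list columns, isbns and buy_links, where each cell holds its own small data frame. The tidy table below drops them. If one were needed, this base-R sketch flattens a list column into one row per inner entry. The inner column names name and url are assumptions about what buy_links contains, and the two-row input is hand-made.

```r
# Hand-made stand-in for books_df with a nested list column (NOT real data).
books <- data.frame(rank = 1:2, title = c("BOOK ONE", "BOOK TWO"))
books$buy_links <- list(
  data.frame(name = c("Amazon", "Bookshop"),
             url = c("https://a.example", "https://b.example")),
  data.frame(name = "Amazon", url = "https://c.example")
)

# Hypothetical helper: repeat each book's rank and title once per inner row,
# then stack the pieces into a single flat data frame.
flatten_buy_links <- function(books) {
  pieces <- lapply(seq_len(nrow(books)), function(i) {
    links <- books$buy_links[[i]]
    data.frame(
      rank = rep(books$rank[i], nrow(links)),
      title = rep(books$title[i], nrow(links)),
      store = links$name,
      url = links$url
    )
  })
  do.call(rbind, pieces)
}

flat_links <- flatten_buy_links(books)
flat_links
```

In a tidyverse workflow, tidyr::unnest(books_df, cols = buy_links) does the same kind of flattening.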
# Create a tidy books table from the API response

tidy_books <- books_df %>%
  select(
    rank,
    title,
    author,
    publisher,
    weeks_on_list,
    description
  ) %>%
  as_tibble()

tidy_books
# A tibble: 15 × 6
    rank title                        author publisher weeks_on_list description
   <int> <chr>                        <chr>  <chr>             <int> <chr>      
 1     1 JUDGE STONE                  Viola… Little, …             2 "Judge Mar…
 2     2 THE CORRESPONDENT            Virgi… Crown                21 "Letters f…
 3     3 BLOODLUST                    Sandr… Grand Ce…             1 "The mutua…
 4     4 MY HUSBAND'S WIFE            Alice… Pine & C…             8 "In an old…
 5     5 INNAMORATA                   Ava R… Del Rey               1 "Passion a…
 6     6 CARL'S DOOMSDAY SCENARIO     Matt … Ace                  10 "The secon…
 7     7 MOTHER OF DEATH AND DAWN     Caris… Bramble               1 "The third…
 8     8 KIN                          Tayar… Knopf                 4 "Vernice a…
 9     9 BETWEEN TWO FIRES            Chris… Tor Nigh…             3 "In 1348, …
10    10 THE DUNGEON ANARCHIST'S COO… Matt … Ace                   5 "The third…
11    11 THE WIDOW                    John … Doubleday            22 "When Simo…
12    12 ALCHEMISED                   SenLi… Del Rey              26 "After the…
13    13 THE CROSSROADS               C.J. … Putnam                4 "The 26th …
14    14 THE GATE OF THE FERAL GODS   Matt … Ace                   2 "The fourt…
15    15 DUNGEON CRAWLER CARL         Matt … Ace                  12 "A Coast G…
# Pull just the top 5

top_5_books <- tidy_books %>%
  arrange(rank) %>%
  slice_head(n = 5)

top_5_books
# A tibble: 5 × 6
   rank title             author             publisher weeks_on_list description
  <int> <chr>             <chr>              <chr>             <int> <chr>      
1     1 JUDGE STONE       Viola Davis and J… Little, …             2 Judge Mary…
2     2 THE CORRESPONDENT Virginia Evans     Crown                21 Letters fr…
3     3 BLOODLUST         Sandra Brown       Grand Ce…             1 The mutual…
4     4 MY HUSBAND'S WIFE Alice Feeney       Pine & C…             8 In an old …
5     5 INNAMORATA        Ava Reid           Del Rey               1 Passion an…
# Summarise weeks on the list for the top 5

top_5_books %>%
  summarise(
    avg_weeks_on_list = mean(weeks_on_list),
    max_weeks_on_list = max(weeks_on_list),
    min_weeks_on_list = min(weeks_on_list)
  )
# A tibble: 1 × 3
  avg_weeks_on_list max_weeks_on_list min_weeks_on_list
              <dbl>             <int>             <int>
1               6.6                21                 1
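# The top 5 average 6.6 weeks on the list, but that mean is pulled up by
# THE CORRESPONDENT (21 weeks). Three of the five have been on the list for
# two weeks or fewer (median = 2), and the No. 1 book, JUDGE STONE, is only in
# its second week, so the top spot is not held by the longest-running title.
# No publisher appears more than once in the top 5.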
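The plan also asked whether the highest-ranked books have the longest runs and whether any publisher repeats. This base-R sketch adds those checks alongside the mean. summarise_top_books() is a hypothetical helper, and in the document it would be called as summarise_top_books(top_5_books). The ranks and weeks below are copied from the top_5_books output. The publisher labels are placeholders because the printed names are truncated.

```r
# Hypothetical helper: mean vs. median weeks, which rank has the longest
# run, and any publisher that appears more than once.
summarise_top_books <- function(df) {
  publisher_counts <- table(df$publisher)
  list(
    mean_weeks = mean(df$weeks_on_list),
    median_weeks = median(df$weeks_on_list),
    rank_of_longest_run = df$rank[which.max(df$weeks_on_list)],
    repeat_publishers = names(publisher_counts)[publisher_counts > 1]
  )
}

# Ranks and weeks from the top_5_books output; publishers are placeholders.
top5 <- data.frame(
  rank = 1:5,
  weeks_on_list = c(2L, 21L, 1L, 8L, 1L),
  publisher = c("Publisher A", "Publisher B", "Publisher C",
                "Publisher D", "Publisher E")
)

summary_top5 <- summarise_top_books(top5)
summary_top5
```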