Assignment 9 Code Base Submission

Author

Long Lin

Overview

For this assignment, I used the public New York Times Books API to find the most recent best sellers in paperback nonfiction. I chose this list because I don’t read often, but when I have free time, I enjoy an occasional nonfiction book.

To start, I made a request to the https://api.nytimes.com/svc/books/v3/lists/2026-03-29/paperback-nonfiction.json endpoint. To authenticate the request, I appended my API key to the URL as the api-key query parameter. I used the request() and req_perform() functions from the httr2 package to do this. I stored my API key in an environment variable because hard-coding a key in a code chunk is bad practice. The endpoint returns the best seller list for paperback nonfiction books for March 29, 2026.

library(httr2)
library(jsonlite)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
my_key <- Sys.getenv("NYT_API_KEY")

req <- request(paste0("https://api.nytimes.com/svc/books/v3/lists/2026-03-29/paperback-nonfiction.json?api-key=", my_key))

resp <- req_perform(req)
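
As an alternative to pasting the key into the URL string, the request could be assembled from its parts. The sketch below is only an illustration: build_nyt_request() is a hypothetical helper that is not part of the original analysis, and "demo-key" is a placeholder rather than a real key. It uses req_url_path_append() and req_url_query() from httr2 and does not contact the API.

```r
library(httr2)

# Hypothetical helper (an assumption for illustration, not the original code):
# builds the Books API request from a list date and a list name.
build_nyt_request <- function(list_date, list_name,
                              api_key = Sys.getenv("NYT_API_KEY")) {
  # Fail early with a clear message if the key is missing
  if (!nzchar(api_key)) {
    stop("NYT_API_KEY is not set", call. = FALSE)
  }

  request("https://api.nytimes.com/svc/books/v3/lists") |>
    req_url_path_append(list_date, paste0(list_name, ".json")) |>
    req_url_query(`api-key` = api_key)
}

# Build (but do not send) a request with a placeholder key
demo_req <- build_nyt_request("2026-03-29", "paperback-nonfiction",
                              api_key = "demo-key")
demo_req$url
```

Checking for an empty key before building the request avoids sending an unauthenticated call when the environment variable has not been set.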

With the response from the API, I used the resp_body_json() function to parse the JSON body into a standard R list. From there, I extracted the results element and then the books element nested inside it to get a list of the books.

resp_list <- resp |> resp_body_json()

results <- resp_list$results

books <- results$books
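
The same two-step extraction can also be written as a single call to purrr::pluck(). The sketch below runs on a small mock list whose contents are invented placeholders. Only the results and books nesting is taken from the steps above.

```r
library(purrr)

# Mock list standing in for the parsed response (placeholder values only)
mock_resp_list <- list(
  results = list(
    books = list(
      list(rank = 1L, title = "BOOK A"),
      list(rank = 2L, title = "BOOK B")
    )
  )
)

# Walk down results -> books in one step
mock_books <- pluck(mock_resp_list, "results", "books")
length(mock_books)

# A missing element returns the supplied default instead of NULL
missing_books <- pluck(mock_resp_list, "results", "bookz", .default = list())
```

Supplying .default can make later steps safer if the response is ever missing the expected element.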

Next, I created a books_df data frame from the list of books by using the tibble() and unnest_wider() functions. Because I set names_sep = "_", each new column name began with data_, so I removed that prefix with the gsub() function.

books_df <- tibble(data = books) |>
  unnest_wider(data, names_sep = "_")

# Anchor the pattern with ^ so only a leading "data_" is removed
colnames(books_df) <- gsub("^data_", "", colnames(books_df))

head(books_df)
# A tibble: 6 × 28
  age_group amazon_product_url   article_chapter_link asterisk author book_image
  <chr>     <chr>                <chr>                   <int> <chr>  <chr>     
1 ""        http://www.amazon.c… ""                          0 Besse… https://s…
2 ""        https://www.amazon.… ""                          0 Chloe… https://s…
3 ""        https://www.amazon.… ""                          0 Erik … https://s…
4 ""        https://www.amazon.… ""                          0 Jenne… https://s…
5 ""        https://www.amazon.… ""                          0 Eliza… https://s…
6 ""        https://www.amazon.… ""                          0 Wendy… https://s…
# ℹ 22 more variables: book_image_height <int>, book_image_width <int>,
#   book_review_link <chr>, book_uri <chr>, contributor <chr>,
#   contributor_note <chr>, created_date <chr>, dagger <int>,
#   description <chr>, first_chapter_link <chr>, price <chr>,
#   primary_isbn10 <chr>, primary_isbn13 <chr>, publisher <chr>, rank <int>,
#   rank_last_week <int>, sunday_review_link <chr>, title <chr>,
#   updated_date <chr>, weeks_on_list <int>, isbns <list>, buy_links <list>
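
The data_ prefix comes from the names_sep argument. As a sketch on mock data (the two books below are invented placeholders), calling unnest_wider() without names_sep keeps the inner element names as they are, so no renaming step is needed:

```r
library(tibble)
library(tidyr)

# Mock list of books (placeholder values, same shape as one level of nesting)
books_mock <- list(
  list(rank = 1L, title = "BOOK A", author = "Author One"),
  list(rank = 2L, title = "BOOK B", author = "Author Two")
)

# Without names_sep, columns are named after the inner elements directly
books_mock_df <- tibble(data = books_mock) |>
  unnest_wider(data)

names(books_mock_df)
```

Setting names_sep is mainly useful when the inner names could clash with existing columns, which is not the case in this mock example.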

Finally, I created a new data frame with only the columns I was interested in by using the select() function from the dplyr package.

library(dplyr)

paperback_nonfiction_bestsellers <- books_df |>
  select(rank, title, author, publisher)

head(paperback_nonfiction_bestsellers, 50)
# A tibble: 15 × 4
    rank title                                author              publisher     
   <int> <chr>                                <chr>               <chr>         
 1     1 THE BODY KEEPS THE SCORE             Bessel van der Kolk Penguin       
 2     2 RAISING HARE                         Chloe Dalton        Vintage       
 3     3 THE DEMON OF UNREST                  Erik Larson         Crown         
 4     4 I'M GLAD MY MOM DIED                 Jennette McCurdy    Simon & Schus…
 5     5 ONCE UPON A TIME                     Elizabeth Beller    Gallery       
 6     6 BORN SURVIVORS                       Wendy Holden        Harper Perenn…
 7     7 HOW TO HIDE AN EMPIRE                Daniel Immerwahr    Picador       
 8     8 THINKING, FAST AND SLOW              Daniel Kahneman     Farrar, Strau…
 9     9 SIGNS                                Laura Lynne Jackson Dial          
10    10 THE BEGINNING COMES AFTER THE END    Rebecca Solnit      Haymarket     
11    11 THE NAZI AND THE PSYCHIATRIST        Jack El-Hai         PublicAffairs 
12    12 WHAT REMAINS                         Carole Radziwill    Scribner      
13    13 BRAIDING SWEETGRASS                  Robin Wall Kimmerer Milkweed Edit…
14    14 THE GREATEST TEMPLAR TALE NEVER TOLD Scott F. Wolter     North Star    
15    15 BREATH                               James Nestor        Riverhead     

Conclusion

In this assignment, the main challenge was handling the nested data that the API returned. I used the unnest_wider() function to build a data frame from the books element nested inside the results element of the response. This produced many columns with missing or empty values, but I was not interested in those columns. To tidy the result, I created another data frame with only the columns I wanted. The final data frame shows the rank, title, author, and publisher of each book.
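
Instead of picking columns by hand, the empty columns could be dropped programmatically. This is only a sketch on mock data: is_blank_column() is a hypothetical helper, and the values are invented placeholders modeled on the empty strings that age_group and article_chapter_link show in the first rows of books_df.

```r
library(dplyr)
library(tibble)

# Mock data frame with two all-empty columns (placeholder values)
mock_books_df <- tibble(
  age_group = c("", ""),
  article_chapter_link = c("", ""),
  rank = 1:2,
  title = c("BOOK A", "BOOK B")
)

# Hypothetical helper: TRUE when every value in an atomic column is NA or ""
is_blank_column <- function(x) {
  is.atomic(x) && all(is.na(x) | x == "")
}

# Keep only the columns that carry at least one non-empty value
trimmed_df <- mock_books_df |>
  select(!where(is_blank_column))

names(trimmed_df)
```

List columns such as isbns and buy_links are left alone because the helper only tests atomic vectors.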