Week 9 Assignment

Overview

This assignment explores how to call a web API, import the JSON response into R, and turn it into a data frame.

The API that I chose is the NYT Books API.

Note: I didn’t realize when I did this assignment that this is a deprecated API; otherwise I probably wouldn’t have chosen it.

Load the Libraries

library(httr2)
library(purrr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(jsonlite)

## 
## Attaching package: 'jsonlite'

## The following object is masked from 'package:purrr':
## 
##     flatten

library(stringr)
library(tidyjson)

## 
## Attaching package: 'tidyjson'

## The following object is masked from 'package:jsonlite':
## 
##     read_json

## The following object is masked from 'package:stats':
## 
##     filter

library(tidytext)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyjson::filter()    masks dplyr::filter(), stats::filter()
## ✖ jsonlite::flatten()   masks purrr::flatten()
## ✖ dplyr::lag()          masks stats::lag()
## ✖ tidyjson::read_json() masks jsonlite::read_json()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Read the Data

Read the current best sellers list for hardcover fiction:

# API Call
nyt_url <- "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json"

# Get the API Key
api <- read_csv("https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-607/refs/heads/main/NYT_API.csv")

## Rows: 1 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): api_key
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

api_key <- api$api_key

# Must include API key in the URL
full_url <- paste(nyt_url, api_key, sep = "?api-key=")

# API Request
req <- fromJSON(full_url, flatten=TRUE)

# The actual data frame lives under results -> books
df <- req$results$books
glimpse(df)

## Rows: 15
## Columns: 26
## $ rank                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
## $ rank_last_week       <int> 1, 0, 2, 0, 5, 3, 8, 11, 6, 10, 13, 0, 12, 7, 15
## $ weeks_on_list        <int> 9, 1, 72, 1, 38, 3, 59, 21, 89, 36, 87, 1, 4, 3, 3
## $ asterisk             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ dagger               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ primary_isbn10       <chr> "1649377150", "0316570001", "1649374178", "166807…
## $ primary_isbn13       <chr> "9781649377159", "9780316570008", "9781649374172"…
## $ publisher            <chr> "Red Tower", "Little, Brown", "Red Tower", "Saga"…
## $ description          <chr> "The third book in the Empyrean series. As enemie…
## $ price                <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "…
## $ title                <chr> "ONYX STORM", "THE WRITER", "IRON FLAME", "THE BU…
## $ author               <chr> "Rebecca Yarros", "James Patterson and J.D. Barke…
## $ contributor          <chr> "by Rebecca Yarros", "by James Patterson and J.D.…
## $ contributor_note     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ book_image           <chr> "https://storage.googleapis.com/du-prd/books/imag…
## $ book_image_width     <int> 310, 331, 309, 329, 333, 329, 333, 329, 309, 331,…
## $ book_image_height    <int> 500, 500, 500, 500, 500, 500, 500, 500, 500, 500,…
## $ amazon_product_url   <chr> "https://www.amazon.com/dp/1649374186?tag=thenewy…
## $ age_group            <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ book_review_link     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ first_chapter_link   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ sunday_review_link   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ article_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ isbns                <list> [<data.frame[3 x 2]>], [<data.frame[2 x 2]>], [<d…
## $ buy_links            <list> [<data.frame[5 x 2]>], [<data.frame[5 x 2]>], [<d…
## $ book_uri             <chr> "nyt://book/8aa689f7-187e-5292-bfd4-e7fc7c2c535f"…

head(df)

##   rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
## 1    1              1             9        0      0     1649377150
## 2    2              0             1        0      0     0316570001
## 3    3              2            72        0      0     1649374178
## 4    4              0             1        0      0     1668075083
## 5    5              5            38        0      0     0385550367
## 6    6              3             3        0      0     166807818X
##   primary_isbn13        publisher
## 1  9781649377159        Red Tower
## 2  9780316570008    Little, Brown
## 3  9781649374172        Red Tower
## 4  9781668075081             Saga
## 5  9780385550369        Doubleday
## 6  9781668078181 Simon & Schuster
##                                                                                                                                                       description
## 1                           The third book in the Empyrean series. As enemies gain traction, Violet Sorrengail goes beyond the Aretian wards in search of allies.
## 2                                    An N.Y.P.D. detective is called to a crime scene in an apartment where the shelves are full of books by a true-crime writer.
## 3 The second book in the Empyrean series. Violet Sorrengail’s next round of training under the new vice commandant might require her to betray the man she loves.
## 4                                A Lutheran pastor’s diary from 1912, which details a massacre in which 217 Blackfeet died in the snow, is found a century later.
## 5               A reimagining of “Adventures of Huckleberry Finn” shines a different light on Mark Twain's classic, revealing new facets of the character of Jim.
## 6                                                      Beth must confront her past when the man she once loved as a teenager returns to the village with his son.
##   price                     title                          author
## 1  0.00                ONYX STORM                  Rebecca Yarros
## 2  0.00                THE WRITER James Patterson and J.D. Barker
## 3  0.00                IRON FLAME                  Rebecca Yarros
## 4  0.00 THE BUFFALO HUNTER HUNTER            Stephen Graham Jones
## 5  0.00                     JAMES                Percival Everett
## 6  0.00            BROKEN COUNTRY               Clare Leslie Hall
##                          contributor contributor_note
## 1                  by Rebecca Yarros                 
## 2 by James Patterson and J.D. Barker                 
## 3                  by Rebecca Yarros                 
## 4            by Stephen Graham Jones                 
## 5                by Percival Everett                 
## 6               by Clare Leslie Hall                 
##                                                             book_image
## 1 https://storage.googleapis.com/du-prd/books/images/9781649374189.jpg
## 2 https://storage.googleapis.com/du-prd/books/images/9780316570008.jpg
## 3 https://storage.googleapis.com/du-prd/books/images/9781649374172.jpg
## 4 https://storage.googleapis.com/du-prd/books/images/9781668075081.jpg
## 5 https://storage.googleapis.com/du-prd/books/images/9780385550369.jpg
## 6 https://storage.googleapis.com/du-prd/books/images/9781668078181.jpg
##   book_image_width book_image_height
## 1              310               500
## 2              331               500
## 3              309               500
## 4              329               500
## 5              333               500
## 6              329               500
##                                          amazon_product_url age_group
## 1 https://www.amazon.com/dp/1649374186?tag=thenewyorktim-20          
## 2 https://www.amazon.com/dp/0316570001?tag=thenewyorktim-20          
## 3 https://www.amazon.com/dp/1649374178?tag=thenewyorktim-20          
## 4 https://www.amazon.com/dp/1668075083?tag=thenewyorktim-20          
## 5 https://www.amazon.com/dp/0385550367?tag=thenewyorktim-20          
## 6 https://www.amazon.com/dp/166807818X?tag=thenewyorktim-20          
##   book_review_link first_chapter_link sunday_review_link article_chapter_link
## 1                                                                            
## 2                                                                            
## 3                                                                            
## 4                                                                            
## 5                                                                            
## 6                                                                            
##                                                                             isbns
## 1 1649374186, 1649377150, 1705085113, 9781649374189, 9781649377159, 9781705085110
## 2                            0316570001, 0316569992, 9780316570008, 9780316569996
## 3 1649374178, 1705085083, 034943705X, 9781649374172, 9781705085080, 9780349437057
## 4                            1668075083, 1668075105, 9781668075081, 9781668075104
## 5 0385550367, 0385550375, 0593821254, 9780385550369, 9780385550376, 9780593821251
## 6                            166807818X, 1668078201, 9781668078181, 9781668078204
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           buy_links
## 1                            Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/1649374186?tag=thenewyorktim-20, https://goto.applebooks.apple/9781649377159?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781649377159, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FONYX%2BSTORM%2FRebecca%2BYarros%2F9781649377159, https://bookshop.org/a/3546/9781649377159
## 2     Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/0316570001?tag=thenewyorktim-20, https://goto.applebooks.apple/9780316570008?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780316570008, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BWRITER%2FJames%2BPatterson%2Band%2BJ.D.%2BBarker%2F9780316570008, https://bookshop.org/a/3546/9780316570008
## 3                            Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/1649374178?tag=thenewyorktim-20, https://goto.applebooks.apple/9781649374172?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781649374172, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FIRON%2BFLAME%2FRebecca%2BYarros%2F9781649374172, https://bookshop.org/a/3546/9781649374172
## 4 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/1668075083?tag=thenewyorktim-20, https://goto.applebooks.apple/9781668075081?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781668075081, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BBUFFALO%2BHUNTER%2BHUNTER%2FStephen%2BGraham%2BJones%2F9781668075081, https://bookshop.org/a/3546/9781668075081
## 5                                 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/0385550367?tag=thenewyorktim-20, https://goto.applebooks.apple/9780385550369?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780385550369, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FJAMES%2FPercival%2BEverett%2F9780385550369, https://bookshop.org/a/3546/9780385550369
## 6                   Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/166807818X?tag=thenewyorktim-20, https://goto.applebooks.apple/9781668078181?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781668078181, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FBROKEN%2BCOUNTRY%2FClare%2BLeslie%2BHall%2F9781668078181, https://bookshop.org/a/3546/9781668078181
##                                          book_uri
## 1 nyt://book/8aa689f7-187e-5292-bfd4-e7fc7c2c535f
## 2 nyt://book/7fbb96c4-2d88-57a5-af3f-95665f9db3a1
## 3 nyt://book/d3c570c9-3c3a-5c8b-a740-85ea5e92bfc9
## 4 nyt://book/b161bd81-1ad5-5683-b928-837a3a90644e
## 5 nyt://book/5788b098-426a-5f2c-a318-475692df69ee
## 6 nyt://book/bd78d61b-82fb-58ec-8990-46ca43146f2b
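
Since httr2 is already loaded, the same request could also be built with it, reading the key from an environment variable instead of a CSV file. This is only a sketch under assumptions: the variable name NYT_API_KEY is hypothetical, and I have not run this version against the live API.

```r
library(httr2)

# Assumption: the key was stored beforehand in an environment variable
# named NYT_API_KEY (hypothetical name), for example in ~/.Renviron
api_key <- Sys.getenv("NYT_API_KEY")

nyt_url <- "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json"

# Build the request; req_url_query() appends the api-key query parameter
nyt_req <- request(nyt_url) |>
  req_url_query(`api-key` = api_key)

# Only contact the API when a key is actually available
if (nzchar(api_key)) {
  resp <- req_perform(nyt_req)
  # simplifyVector = TRUE asks jsonlite to build data frames where it can
  books <- resp_body_json(resp, simplifyVector = TRUE)$results$books
}
```

Keeping the key out of a public file means it never ends up in the repository, and req_url_query() takes care of escaping the parameter.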

As the glimpse() output above shows, isbns and buy_links are list columns: each cell holds a small data frame rather than a single value, so the data is hard to work with directly after fromJSON. We need to unnest these variables to expose the values:

# Unnest the variables that are lists or objects
final_df <- df |> 
  unnest_wider(buy_links) |>
  unnest_wider(isbns)
head(final_df)

## # A tibble: 6 × 28
##    rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
##   <int>          <int>         <int>    <int>  <int> <chr>         
## 1     1              1             9        0      0 1649377150    
## 2     2              0             1        0      0 0316570001    
## 3     3              2            72        0      0 1649374178    
## 4     4              0             1        0      0 1668075083    
## 5     5              5            38        0      0 0385550367    
## 6     6              3             3        0      0 166807818X    
## # ℹ 22 more variables: primary_isbn13 <chr>, publisher <chr>,
## #   description <chr>, price <chr>, title <chr>, author <chr>,
## #   contributor <chr>, contributor_note <chr>, book_image <chr>,
## #   book_image_width <int>, book_image_height <int>, amazon_product_url <chr>,
## #   age_group <chr>, book_review_link <chr>, first_chapter_link <chr>,
## #   sunday_review_link <chr>, article_chapter_link <chr>, isbn10 <list<chr>>,
## #   isbn13 <list<chr>>, name <list<chr>>, url <list<chr>>, book_uri <chr>

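This data frame now mirrors the structure of the original JSON: the nested fields appear as their own columns (isbn10, isbn13, name and url), although each of them is still a list column.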

However, storing a list inside a data frame cell is not a good structure. For example, if we wanted to add another buy link for a book, we would have to modify the lists in both name and url, which is awkward and error-prone.
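
To make the list-column structure concrete, here is a minimal sketch on made-up data (the titles and URLs are placeholders, not API output). It assumes buy_links arrives as one small data frame per book, as the glimpse() output suggests:

```r
library(tibble)
library(tidyr)

# Made-up stand-in for df: one small data frame of links per book
toy <- tibble(
  title = c("BOOK A", "BOOK B"),
  buy_links = list(
    data.frame(name = c("Amazon", "Bookshop.org"),
               url  = c("https://example.com/a1", "https://example.com/a2")),
    data.frame(name = "Amazon", url = "https://example.com/b1")
  )
)

# Which columns are list columns before unnesting?
names(toy)[sapply(toy, is.list)]

# Each column of the nested data frames becomes its own column
toy_wide <- unnest_wider(toy, buy_links)
names(toy_wide)

# Number of values stored per book in the new name column
lengths(toy_wide$name)
```

The number of rows does not change here; only the nested columns are pulled up one level.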

The other list variables are isbn10 and isbn13. An ISBN is the International Standard Book Number that identifies a book. An ISBN-13 is the newer 13-digit form (an ISBN-10 can be converted to it by adding a 978 prefix and recomputing the check digit), but the API reports isbn10 and isbn13 as separate values, and each one is a valid identifier on its own. So for this particular data frame, it makes more sense to have one isbn column that includes both ISBN10 and ISBN13 values. We could also create separate data frames to separate this data, but I will hold off on doing that for this assignment.
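
As a side check on how the two ISBN formats relate, the sketch below converts a 10-digit ISBN to its 13-digit form using the standard rule (978 prefix, then a check digit computed with alternating weights 1 and 3). The helper isbn10_to_isbn13 is hypothetical and not part of the assignment code:

```r
# Hypothetical helper: convert one ISBN-10 string to its ISBN-13 form
isbn10_to_isbn13 <- function(isbn10) {
  # Drop the ISBN-10 check character and add the 978 prefix
  body <- paste0("978", substr(isbn10, 1, 9))
  digits <- as.integer(strsplit(body, "")[[1]])
  # ISBN-13 check digit: weights alternate 1, 3 across the 12 digits
  weights <- rep(c(1, 3), times = 6)
  check <- (10 - sum(digits * weights) %% 10) %% 10
  paste0(body, check)
}

# primary_isbn10 of the first and sixth books in the output above
isbn10_to_isbn13("1649377150")
isbn10_to_isbn13("166807818X")
```

For these two books the results are 9781649377159 and 9781668078181, the same as the primary_isbn13 values shown earlier.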

Tidy the Data Frame

Let’s first separate the buy links. The name and url columns are connected to each other, so let’s unnest them at the same time to turn each value into a separate row/observation:

# Unnest connected columns at the same time to create accurate observations
final_df_wider <- final_df |>
  unnest(c("name", "url"))
head(final_df_wider)

## # A tibble: 6 × 28
##    rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
##   <int>          <int>         <int>    <int>  <int> <chr>         
## 1     1              1             9        0      0 1649377150    
## 2     1              1             9        0      0 1649377150    
## 3     1              1             9        0      0 1649377150    
## 4     1              1             9        0      0 1649377150    
## 5     1              1             9        0      0 1649377150    
## 6     2              0             1        0      0 0316570001    
## # ℹ 22 more variables: primary_isbn13 <chr>, publisher <chr>,
## #   description <chr>, price <chr>, title <chr>, author <chr>,
## #   contributor <chr>, contributor_note <chr>, book_image <chr>,
## #   book_image_width <int>, book_image_height <int>, amazon_product_url <chr>,
## #   age_group <chr>, book_review_link <chr>, first_chapter_link <chr>,
## #   sunday_review_link <chr>, article_chapter_link <chr>, isbn10 <list<chr>>,
## #   isbn13 <list<chr>>, name <chr>, url <chr>, book_uri <chr>

Each row now represents a unique book and store combination.
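
A small sketch on made-up data shows why the two columns are unnested in one call. The store names and URLs are placeholders:

```r
library(tibble)
library(tidyr)

# One made-up book with two stores; name and url are paired by position
toy <- tibble(
  title = "BOOK A",
  name  = list(c("Amazon", "Bookshop.org")),
  url   = list(c("https://example.com/amazon", "https://example.com/bookshop"))
)

# Unnesting both columns together keeps each name next to its own url
paired <- unnest(toy, c(name, url))
nrow(paired)

# Unnesting one after the other pairs every name with every url
crossed <- toy |>
  unnest(name) |>
  unnest(url)
nrow(crossed)
```

The paired version has two rows (one per store), while the sequential version has four, including mismatched name and url pairs.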

Now let’s combine the ISBN values into one column, then separate into rows:

# Create one ISBN variable so we can have ISBN related observations
final_df_wider2 <- final_df_wider |> 
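  # map2() always returns a list; mapply() could simplify the result to a
  # matrix if every book had the same number of ISBNs
  mutate(isbn = map2(isbn10, isbn13, c)) |>
  unnest(isbn) # unnest the list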

# Remove the old variables
drop <- c("isbn10","isbn13")
final_df_wider2 <- final_df_wider2[ , !(names(final_df_wider2) %in% drop)]

# Final tidy data frame
head(final_df_wider2)

## # A tibble: 6 × 27
##    rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
##   <int>          <int>         <int>    <int>  <int> <chr>         
## 1     1              1             9        0      0 1649377150    
## 2     1              1             9        0      0 1649377150    
## 3     1              1             9        0      0 1649377150    
## 4     1              1             9        0      0 1649377150    
## 5     1              1             9        0      0 1649377150    
## 6     1              1             9        0      0 1649377150    
## # ℹ 21 more variables: primary_isbn13 <chr>, publisher <chr>,
## #   description <chr>, price <chr>, title <chr>, author <chr>,
## #   contributor <chr>, contributor_note <chr>, book_image <chr>,
## #   book_image_width <int>, book_image_height <int>, amazon_product_url <chr>,
## #   age_group <chr>, book_review_link <chr>, first_chapter_link <chr>,
## #   sunday_review_link <chr>, article_chapter_link <chr>, name <chr>,
## #   url <chr>, book_uri <chr>, isbn <chr>

We now have a final tidy data frame where each row represents one combination of an ISBN (book identifier) and a store where the book is sold. So if a best seller gets a new ISBN, we can just add rows instead of modifying a value stored inside a list.
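
The separate-data-frames alternative mentioned earlier could look roughly like the sketch below. It uses made-up values and assumes book_uri can serve as the key that links the tables; the table names books, buy_links and isbns are my own choice:

```r
library(tibble)
library(dplyr)
library(tidyr)

# Made-up stand-in: one book, two stores, three ISBNs
toy <- tibble(
  book_uri = "nyt://book/example",
  title    = "BOOK A",
  name     = list(c("Amazon", "Bookshop.org")),
  url      = list(c("https://example.com/a", "https://example.com/b")),
  isbn     = list(c("1111111111", "2222222222", "9781111111111"))
)

# One row per book
books <- toy |> select(book_uri, title)

# One row per book and store
buy_links <- toy |>
  select(book_uri, name, url) |>
  unnest(c(name, url))

# One row per book and ISBN
isbns <- toy |>
  select(book_uri, isbn) |>
  unnest(isbn)

c(nrow(books), nrow(buy_links), nrow(isbns))
```

With this layout the book-level columns are stored once, and stores and ISBNs no longer multiply each other's rows; the tables can be joined back on book_uri when needed.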

Conclusions

Takeaways from this assignment:

fromJSON (with flatten = TRUE) does a good job importing the JSON response directly into a data frame
unnest_wider and unnest make it very easy to keep the data frame tidy when turning list columns into columns and rows