This assignment explores how to use an API to import JSON into R and turn it into a tidy data frame.
The API that I chose is the NYT Books API.
Note: I didn’t realize when I did this assignment that this API is deprecated; otherwise I probably wouldn’t have chosen it.
library(httr2)
library(purrr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
library(stringr)
library(tidyjson)
##
## Attaching package: 'tidyjson'
## The following object is masked from 'package:jsonlite':
##
## read_json
## The following object is masked from 'package:stats':
##
## filter
library(tidytext)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyjson::filter() masks dplyr::filter(), stats::filter()
## ✖ jsonlite::flatten() masks purrr::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tidyjson::read_json() masks jsonlite::read_json()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Read the current best sellers list for hardcover fiction:
# API Call
nyt_url <- "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json"
# Get the API Key
api <- read_csv("https://raw.githubusercontent.com/gillianmcgovern0/cuny-data-607/refs/heads/main/NYT_API.csv")
## Rows: 1 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): api_key
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
api_key <- api$api_key
# Must include API key in the URL
full_url <- paste(nyt_url, api_key, sep = "?api-key=")
# API Request
req <- fromJSON(full_url, flatten=TRUE)
# The data frame we want lives under results -> books
df <- req$results$books
glimpse(df)
## Rows: 15
## Columns: 26
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
## $ rank_last_week <int> 1, 0, 2, 0, 5, 3, 8, 11, 6, 10, 13, 0, 12, 7, 15
## $ weeks_on_list <int> 9, 1, 72, 1, 38, 3, 59, 21, 89, 36, 87, 1, 4, 3, 3
## $ asterisk <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ dagger <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ primary_isbn10 <chr> "1649377150", "0316570001", "1649374178", "166807…
## $ primary_isbn13 <chr> "9781649377159", "9780316570008", "9781649374172"…
## $ publisher <chr> "Red Tower", "Little, Brown", "Red Tower", "Saga"…
## $ description <chr> "The third book in the Empyrean series. As enemie…
## $ price <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "…
## $ title <chr> "ONYX STORM", "THE WRITER", "IRON FLAME", "THE BU…
## $ author <chr> "Rebecca Yarros", "James Patterson and J.D. Barke…
## $ contributor <chr> "by Rebecca Yarros", "by James Patterson and J.D.…
## $ contributor_note <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ book_image <chr> "https://storage.googleapis.com/du-prd/books/imag…
## $ book_image_width <int> 310, 331, 309, 329, 333, 329, 333, 329, 309, 331,…
## $ book_image_height <int> 500, 500, 500, 500, 500, 500, 500, 500, 500, 500,…
## $ amazon_product_url <chr> "https://www.amazon.com/dp/1649374186?tag=thenewy…
## $ age_group <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ book_review_link <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ first_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ sunday_review_link <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ article_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", "", "…
## $ isbns <list> [<data.frame[3 x 2]>], [<data.frame[2 x 2]>], [<d…
## $ buy_links <list> [<data.frame[5 x 2]>], [<data.frame[5 x 2]>], [<d…
## $ book_uri <chr> "nyt://book/8aa689f7-187e-5292-bfd4-e7fc7c2c535f"…
head(df)
## rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
## 1 1 1 9 0 0 1649377150
## 2 2 0 1 0 0 0316570001
## 3 3 2 72 0 0 1649374178
## 4 4 0 1 0 0 1668075083
## 5 5 5 38 0 0 0385550367
## 6 6 3 3 0 0 166807818X
## primary_isbn13 publisher
## 1 9781649377159 Red Tower
## 2 9780316570008 Little, Brown
## 3 9781649374172 Red Tower
## 4 9781668075081 Saga
## 5 9780385550369 Doubleday
## 6 9781668078181 Simon & Schuster
## description
## 1 The third book in the Empyrean series. As enemies gain traction, Violet Sorrengail goes beyond the Aretian wards in search of allies.
## 2 An N.Y.P.D. detective is called to a crime scene in an apartment where the shelves are full of books by a true-crime writer.
## 3 The second book in the Empyrean series. Violet Sorrengail’s next round of training under the new vice commandant might require her to betray the man she loves.
## 4 A Lutheran pastor’s diary from 1912, which details a massacre in which 217 Blackfeet died in the snow, is found a century later.
## 5 A reimagining of “Adventures of Huckleberry Finn” shines a different light on Mark Twain's classic, revealing new facets of the character of Jim.
## 6 Beth must confront her past when the man she once loved as a teenager returns to the village with his son.
## price title author
## 1 0.00 ONYX STORM Rebecca Yarros
## 2 0.00 THE WRITER James Patterson and J.D. Barker
## 3 0.00 IRON FLAME Rebecca Yarros
## 4 0.00 THE BUFFALO HUNTER HUNTER Stephen Graham Jones
## 5 0.00 JAMES Percival Everett
## 6 0.00 BROKEN COUNTRY Clare Leslie Hall
## contributor contributor_note
## 1 by Rebecca Yarros
## 2 by James Patterson and J.D. Barker
## 3 by Rebecca Yarros
## 4 by Stephen Graham Jones
## 5 by Percival Everett
## 6 by Clare Leslie Hall
## book_image
## 1 https://storage.googleapis.com/du-prd/books/images/9781649374189.jpg
## 2 https://storage.googleapis.com/du-prd/books/images/9780316570008.jpg
## 3 https://storage.googleapis.com/du-prd/books/images/9781649374172.jpg
## 4 https://storage.googleapis.com/du-prd/books/images/9781668075081.jpg
## 5 https://storage.googleapis.com/du-prd/books/images/9780385550369.jpg
## 6 https://storage.googleapis.com/du-prd/books/images/9781668078181.jpg
## book_image_width book_image_height
## 1 310 500
## 2 331 500
## 3 309 500
## 4 329 500
## 5 333 500
## 6 329 500
## amazon_product_url age_group
## 1 https://www.amazon.com/dp/1649374186?tag=thenewyorktim-20
## 2 https://www.amazon.com/dp/0316570001?tag=thenewyorktim-20
## 3 https://www.amazon.com/dp/1649374178?tag=thenewyorktim-20
## 4 https://www.amazon.com/dp/1668075083?tag=thenewyorktim-20
## 5 https://www.amazon.com/dp/0385550367?tag=thenewyorktim-20
## 6 https://www.amazon.com/dp/166807818X?tag=thenewyorktim-20
## book_review_link first_chapter_link sunday_review_link article_chapter_link
## 1
## 2
## 3
## 4
## 5
## 6
## isbns
## 1 1649374186, 1649377150, 1705085113, 9781649374189, 9781649377159, 9781705085110
## 2 0316570001, 0316569992, 9780316570008, 9780316569996
## 3 1649374178, 1705085083, 034943705X, 9781649374172, 9781705085080, 9780349437057
## 4 1668075083, 1668075105, 9781668075081, 9781668075104
## 5 0385550367, 0385550375, 0593821254, 9780385550369, 9780385550376, 9780593821251
## 6 166807818X, 1668078201, 9781668078181, 9781668078204
## buy_links
## 1 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/1649374186?tag=thenewyorktim-20, https://goto.applebooks.apple/9781649377159?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781649377159, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FONYX%2BSTORM%2FRebecca%2BYarros%2F9781649377159, https://bookshop.org/a/3546/9781649377159
## 2 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/0316570001?tag=thenewyorktim-20, https://goto.applebooks.apple/9780316570008?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780316570008, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BWRITER%2FJames%2BPatterson%2Band%2BJ.D.%2BBarker%2F9780316570008, https://bookshop.org/a/3546/9780316570008
## 3 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/1649374178?tag=thenewyorktim-20, https://goto.applebooks.apple/9781649374172?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781649374172, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FIRON%2BFLAME%2FRebecca%2BYarros%2F9781649374172, https://bookshop.org/a/3546/9781649374172
## 4 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/1668075083?tag=thenewyorktim-20, https://goto.applebooks.apple/9781668075081?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781668075081, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BBUFFALO%2BHUNTER%2BHUNTER%2FStephen%2BGraham%2BJones%2F9781668075081, https://bookshop.org/a/3546/9781668075081
## 5 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/0385550367?tag=thenewyorktim-20, https://goto.applebooks.apple/9780385550369?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780385550369, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FJAMES%2FPercival%2BEverett%2F9780385550369, https://bookshop.org/a/3546/9780385550369
## 6 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Bookshop.org, https://www.amazon.com/dp/166807818X?tag=thenewyorktim-20, https://goto.applebooks.apple/9781668078181?at=10lIEQ, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781668078181, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FBROKEN%2BCOUNTRY%2FClare%2BLeslie%2BHall%2F9781668078181, https://bookshop.org/a/3546/9781668078181
## book_uri
## 1 nyt://book/8aa689f7-187e-5292-bfd4-e7fc7c2c535f
## 2 nyt://book/7fbb96c4-2d88-57a5-af3f-95665f9db3a1
## 3 nyt://book/d3c570c9-3c3a-5c8b-a740-85ea5e92bfc9
## 4 nyt://book/b161bd81-1ad5-5683-b928-837a3a90644e
## 5 nyt://book/5788b098-426a-5f2c-a318-475692df69ee
## 6 nyt://book/bd78d61b-82fb-58ec-8990-46ca43146f2b
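The same request could also be built with httr2, which is already loaded above. The sketch below is one possible variant, not the method used in this assignment: it assumes the key is stored in an environment variable named NYT_API_KEY (an assumed name) instead of a CSV file, and build_nyt_request() is a hypothetical helper introduced only for illustration.

```r
library(httr2)
library(jsonlite)

# Hypothetical helper: builds the request object without sending it
build_nyt_request <- function(api_key) {
  request("https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json") |>
    # req_url_query() appends ?api-key=... to the URL and handles encoding
    req_url_query(`api-key` = api_key)
}

# NYT_API_KEY is an assumed environment variable name
api_key <- Sys.getenv("NYT_API_KEY")

# Only contact the API when a key is actually available
if (nzchar(api_key)) {
  resp <- req_perform(build_nyt_request(api_key))
  # Parse the body with jsonlite so the result has the same shape as the fromJSON() call above
  books <- fromJSON(resp_body_string(resp), flatten = TRUE)$results$books
}
```

Keeping the key in an environment variable means it never has to be committed to a public repository.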
As the glimpse() output above shows, isbns and buy_links are both list-columns (each cell holds a nested data frame), so their values are not available as ordinary columns after fromJSON(). We therefore need to unnest these variables to expose the values:
# Unnest the variables that are lists or objects
final_df <- df |>
  unnest_wider(buy_links) |>
  unnest_wider(isbns)
head(final_df)
## # A tibble: 6 × 28
## rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
## <int> <int> <int> <int> <int> <chr>
## 1 1 1 9 0 0 1649377150
## 2 2 0 1 0 0 0316570001
## 3 3 2 72 0 0 1649374178
## 4 4 0 1 0 0 1668075083
## 5 5 5 38 0 0 0385550367
## 6 6 3 3 0 0 166807818X
## # ℹ 22 more variables: primary_isbn13 <chr>, publisher <chr>,
## # description <chr>, price <chr>, title <chr>, author <chr>,
## # contributor <chr>, contributor_note <chr>, book_image <chr>,
## # book_image_width <int>, book_image_height <int>, amazon_product_url <chr>,
## # age_group <chr>, book_review_link <chr>, first_chapter_link <chr>,
## # sunday_review_link <chr>, article_chapter_link <chr>, isbn10 <list<chr>>,
## # isbn13 <list<chr>>, name <list<chr>>, url <list<chr>>, book_uri <chr>
This data frame now mirrors the structure of the original JSON: every nested field has its own column.
However, storing a list inside a data frame cell is not a tidy structure.
For example, to add another buy link for a book, we would have to modify the lists in both the name and url columns and keep them aligned, which is cumbersome and error-prone.
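To make the list-column issue concrete, here is a minimal sketch on made-up JSON shaped like one NYT book entry (the titles and URLs are invented for illustration, and toy_json, toy and toy_wide are hypothetical names). It shows how fromJSON() produces a list-column of data frames and what unnest_wider() does to it.

```r
library(jsonlite)
library(tidyr)

# Made-up JSON: each book carries a nested array of buy links
toy_json <- '[
  {"title": "BOOK A",
   "buy_links": [{"name": "Amazon",       "url": "https://example.com/a1"},
                 {"name": "Bookshop.org", "url": "https://example.com/a2"}]},
  {"title": "BOOK B",
   "buy_links": [{"name": "Amazon",       "url": "https://example.com/b1"}]}
]'

toy <- fromJSON(toy_json, flatten = TRUE)

# buy_links is a list-column: one small data frame per book
str(toy$buy_links)

# unnest_wider() splits it into name and url, which are still list-columns
toy_wide <- unnest_wider(toy, buy_links)
toy_wide
```

Each cell of name and url still holds a vector of varying length, which is why a further unnest step is needed to reach one value per cell.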
The other list variables are isbn10 and isbn13. An ISBN is the International Standard Book Number that identifies a book. The isbn10 (10-digit) and isbn13 (13-digit) values are two formats of the same identifier: a 978-prefixed ISBN-13 carries the same core digits as its ISBN-10 with a different check digit (for example, 1649374186 and 9781649374189 in the head(df) output above). Either value identifies an edition on its own, so for this particular data frame it makes more sense to have one isbn column that includes both the ISBN10 and ISBN13 values. We could also create separate data frames to hold this data, but I will hold off on doing that for this assignment.
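A quick way to test how the isbn10 and isbn13 values relate is to apply the standard ISBN-10 to ISBN-13 conversion (prefix 978, drop the old check digit, recompute the check digit with alternating weights 1 and 3) to values from the output above. The function name isbn10_to_isbn13() is a hypothetical helper written for this sketch.

```r
# Hypothetical helper: convert an ISBN-10 string to its 978-prefixed ISBN-13 form
isbn10_to_isbn13 <- function(isbn10) {
  # Keep the first nine digits (the old check digit is dropped) and add the 978 prefix
  core <- paste0("978", substr(isbn10, 1, 9))
  digits <- as.integer(strsplit(core, "")[[1]])
  # ISBN-13 check digit: weights alternate 1, 3 across the first twelve digits
  weights <- rep(c(1, 3), length.out = 12)
  check <- (10 - sum(digits * weights) %% 10) %% 10
  paste0(core, check)
}

isbn10_to_isbn13("1649374186")  # "9781649374189"
isbn10_to_isbn13("034943705X")  # "9780349437057"
```

Both results appear in the isbns column shown earlier, next to the ISBN-10 values they were computed from.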
Let’s first separate the buy links. The name and url columns are connected to each other, so let’s unnest() them at the same time to turn each value into a separate row/observation:
# Unnest connected columns at the same time to create accurate observations
final_df_wider <- final_df |>
  unnest(c("name", "url"))
head(final_df_wider)
## # A tibble: 6 × 28
## rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
## <int> <int> <int> <int> <int> <chr>
## 1 1 1 9 0 0 1649377150
## 2 1 1 9 0 0 1649377150
## 3 1 1 9 0 0 1649377150
## 4 1 1 9 0 0 1649377150
## 5 1 1 9 0 0 1649377150
## 6 2 0 1 0 0 0316570001
## # ℹ 22 more variables: primary_isbn13 <chr>, publisher <chr>,
## # description <chr>, price <chr>, title <chr>, author <chr>,
## # contributor <chr>, contributor_note <chr>, book_image <chr>,
## # book_image_width <int>, book_image_height <int>, amazon_product_url <chr>,
## # age_group <chr>, book_review_link <chr>, first_chapter_link <chr>,
## # sunday_review_link <chr>, article_chapter_link <chr>, isbn10 <list<chr>>,
## # isbn13 <list<chr>>, name <chr>, url <chr>, book_uri <chr>
Each row now represents a unique combination of book and store.
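Why the two connected columns are unnested in a single call can be seen on a small made-up tibble (toy, together and separately are hypothetical names, and the URLs are invented). Unnesting them together keeps each name paired with its url, while unnesting them one after the other produces every name and url combination.

```r
library(tidyr)
library(tibble)

# Made-up data: BOOK A has two stores, BOOK B has one
toy <- tibble(
  title = c("BOOK A", "BOOK B"),
  name  = list(c("Amazon", "Bookshop.org"), "Amazon"),
  url   = list(c("https://example.com/a1", "https://example.com/a2"),
               "https://example.com/b1")
)

# Unnested together: one row per (name, url) pair
together <- unnest(toy, c(name, url))
nrow(together)    # 3

# Unnested separately: every name is crossed with every url of the same book
separately <- toy |> unnest(name) |> unnest(url)
nrow(separately)  # 5
```

The extra rows in the second result pair a store with another store's link, which is exactly the kind of inaccurate observation the single call avoids.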
Now let’s combine the ISBN values into one column, then separate into rows:
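# Create one ISBN variable so we can have ISBN-related observations
final_df_wider2 <- final_df_wider |>
  # SIMPLIFY = FALSE keeps the result a list even if every book has the same number of ISBNs
  mutate(isbn = mapply(c, isbn10, isbn13, SIMPLIFY = FALSE)) |>
  unnest(isbn) # unnest the list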
# Remove the old variables
drop <- c("isbn10","isbn13")
final_df_wider2 <- final_df_wider2[ , !(names(final_df_wider2) %in% drop)]
# Final tidy data frame
head(final_df_wider2)
## # A tibble: 6 × 27
## rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
## <int> <int> <int> <int> <int> <chr>
## 1 1 1 9 0 0 1649377150
## 2 1 1 9 0 0 1649377150
## 3 1 1 9 0 0 1649377150
## 4 1 1 9 0 0 1649377150
## 5 1 1 9 0 0 1649377150
## 6 1 1 9 0 0 1649377150
## # ℹ 21 more variables: primary_isbn13 <chr>, publisher <chr>,
## # description <chr>, price <chr>, title <chr>, author <chr>,
## # contributor <chr>, contributor_note <chr>, book_image <chr>,
## # book_image_width <int>, book_image_height <int>, amazon_product_url <chr>,
## # age_group <chr>, book_review_link <chr>, first_chapter_link <chr>,
## # sunday_review_link <chr>, article_chapter_link <chr>, name <chr>,
## # url <chr>, book_uri <chr>, isbn <chr>
We now have a final tidy data frame where each row represents an ISBN (book identifier) and a store where the book is sold. So if a new ISBN is added for a best seller, we can simply add a row instead of modifying a list stored inside an existing cell.
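The ISBN-combining step relies on mapply() returning a list. A small sketch with made-up values shows that, with its default settings, mapply() simplifies the result to a matrix when every element has the same length, whereas purrr::map2() always returns a list. The object names and values below are invented for illustration.

```r
library(purrr)

# Made-up ISBN-like values where every book has the same number of entries
ten      <- list(c("a1", "a2"), c("b1", "b2"))
thirteen <- list(c("A1", "A2"), c("B1", "B2"))

# Default mapply(): equal lengths are simplified into a 4 x 2 character matrix
simplified <- mapply(c, ten, thirteen)
is.matrix(simplified)  # TRUE

# map2() returns a list regardless of element lengths
combined <- map2(ten, thirteen, c)
is.list(combined)      # TRUE
length(combined)       # 2
```

With the best sellers data the ISBN counts differ between books, so the default happens to produce a list there; an approach that always returns a list does not depend on that.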
Takeaways from this assignment:
- fromJSON() does a good job importing JSON
- unnest_wider() and unnest() make it very easy to keep the data frame tidy when separating columns into rows