Problem
The New York Times website provides a rich set of APIs, described at https://developer.nytimes.com/apis. I need to start by signing up for an API key. My task is then to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.
Approach
The New York Times API that interested me most was the Books API. After requesting an API key, I was able to access it from R. I extracted the data for hardcover fiction books from the Books API; the response was in JSON format. I then used fromJSON() from the jsonlite package to convert the JSON data to R objects.
Since fromJSON() returns the whole response as a nested list, I retrieved only the element I need: the data frame containing the book information.
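The request step itself is not shown in this write-up. The following is a minimal sketch of how it might look with httr, not the code actually used here. The helper names `nyt_books_url` and `fetch_books_json`, the `NYT_API_KEY` environment variable, and the `lists/current/hardcover-fiction.json` path are assumptions for illustration; check the Books API documentation for the exact endpoint.

```r
# Hypothetical helper: build the request URL for one best-seller list.
# The path layout is an assumption based on the Books API documentation.
nyt_books_url <- function(list_name, api_key, date = "current") {
  paste0("https://api.nytimes.com/svc/books/v3/lists/", date, "/",
         list_name, ".json?api-key=", api_key)
}

# Hypothetical helper: fetch the list and return the raw JSON text.
# Reads the key from an environment variable so it never appears in the script.
fetch_books_json <- function(list_name, api_key = Sys.getenv("NYT_API_KEY")) {
  stopifnot(nzchar(api_key))                 # fail early if no key is set
  resp <- httr::GET(nyt_books_url(list_name, api_key))
  httr::stop_for_status(resp)                # turn HTTP errors into R errors
  httr::content(resp, as = "text", encoding = "UTF-8")
}

# Usage (needs a valid key and network access):
# books <- fetch_books_json("hardcover-fiction")
```

Keeping the key in an environment variable rather than in the script means the report can be shared without exposing it.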
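library(httr)
## Warning: package 'httr' was built under R version 4.0.3
library(jsonlite)
## Warning: package 'jsonlite' was built under R version 4.0.3
# The code below also uses dplyr (%>%, glimpse, group_by, summarise),
# kableExtra (kbl, kable_material, row_spec), and ggplot2
library(dplyr)
library(kableExtra)
library(ggplot2)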
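# Transform JSON data into an R data frame
# `books` holds the JSON response from the Books API (the request itself is not shown here)
# Retrieve the data frame of books data: element 5 of the parsed response,
# then element 11 within it
df <- fromJSON(books)[[5]][[11]]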
dim(df)
## [1] 15 26
glimpse(df)
## Rows: 15
## Columns: 26
## $ rank <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
## $ rank_last_week <int> 0, 1, 5, 7, 4, 8, 3, 2, 10, 0, 12, 0, 11, 9, 13
## $ weeks_on_list <int> 1, 3, 2, 5, 2, 4, 2, 2, 20, 1, 111, 1, 4, 6, 19
## $ asterisk <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ dagger <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ primary_isbn10 <chr> "0385545967", "1538728575", "073522465X", "052...
## $ primary_isbn13 <chr> "9780385545969", "9781538728574", "97807352246...
## $ publisher <chr> "Doubleday", "Grand Central", "Viking", "Vikin...
## $ description <chr> "The third book in the Jake Brigance series. A...
## $ price <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ title <chr> "A TIME FOR MERCY", "THE RETURN", "THE SEARCHE...
## $ author <chr> "John Grisham", "Nicholas Sparks", "Tana Frenc...
## $ contributor <chr> "by John Grisham", "by Nicholas Sparks", "by T...
## $ contributor_note <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ book_image <chr> "https://s1.nyt.com/du/books/images/9780385545...
## $ book_image_width <int> 329, 329, 331, 329, 331, 329, 329, 322, 331, 3...
## $ book_image_height <int> 500, 500, 500, 500, 500, 500, 500, 500, 500, 5...
## $ amazon_product_url <chr> "https://www.amazon.com/dp/0385545967?tag=NYTB...
## $ age_group <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ book_review_link <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ first_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ sunday_review_link <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ article_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ isbns <list> [<data.frame[2 x 2]>, <data.frame[5 x 2]>, <d...
## $ buy_links <list> [<data.frame[6 x 2]>, <data.frame[6 x 2]>, <d...
## $ book_uri <chr> "nyt://book/33a48cf6-d7f3-5113-aa1e-6adcbb3853...
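Positional indexing such as `[[5]][[11]]` depends on the order of fields in the response, so it can silently pick the wrong element if the API adds or reorders a field. A sketch of the alternative, accessing elements by name, is shown below on a toy JSON string. The toy string only imitates the nesting of a response, and the field names `results` and `books` are assumptions about the real payload; the two titles come from the data in this write-up.

```r
library(jsonlite)

# Toy JSON imitating the nesting of a Books API response (assumed field names).
toy_json <- '{
  "status": "OK",
  "results": {
    "list_name": "Hardcover Fiction",
    "books": [
      {"rank": 1, "title": "A TIME FOR MERCY", "author": "John Grisham"},
      {"rank": 2, "title": "THE RETURN", "author": "Nicholas Sparks"}
    ]
  }
}'

# fromJSON() simplifies an array of JSON objects into a data frame.
parsed <- fromJSON(toy_json)

# Named access: keeps working if fields are added or reordered.
books_by_name <- parsed$results$books

# Positional access: in this toy string, "results" is element 2 of the
# response and "books" is element 2 within it.
books_by_pos <- parsed[[2]][[2]]

# Both routes reach the same data frame.
identical(books_by_name, books_by_pos)
```

If the real response uses these names, `fromJSON(books)$results$books` would be a more readable equivalent of the positional call.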
# New DataFrame with some necessary columns from the original df
# (for analysis purpose)
df1 <- df[c("rank", "publisher", "title", "author", "primary_isbn13")]
df1 %>%
  kbl(caption = "Hardcover fiction books") %>%
  kable_material(c("striped", "hover")) %>%
  row_spec(0, color = "indigo")

| rank | publisher | title | author | primary_isbn13 |
|---|---|---|---|---|
| 1 | Doubleday | A TIME FOR MERCY | John Grisham | 9780385545969 |
| 2 | Grand Central | THE RETURN | Nicholas Sparks | 9781538728574 |
| 3 | Viking | THE SEARCHER | Tana French | 9780735224650 |
| 4 | Viking | THE EVENING AND THE MORNING | Ken Follett | 9780525954989 |
| 5 | Tor/Forge | THE INVISIBLE LIFE OF ADDIE LARUE | VE Schwab | 9780765387561 |
| 6 | Ballantine | THE BOOK OF TWO WAYS | Jodi Picoult | 9781984818355 |
| 7 | Ecco | LEAVE THE WORLD BEHIND | Rumaan Alam | 9780062667632 |
| 8 | Little, Brown | TROUBLES IN PARADISE | Elin Hilderbrand | 9780316435581 |
| 9 | Riverhead | THE VANISHING HALF | Brit Bennett | 9780525536291 |
| 10 | Ballantine | JINGLE ALL THE WAY | Debbie Macomber | 9781984818751 |
| 11 | Putnam | WHERE THE CRAWDADS SING | Delia Owens | 9780735219090 |
| 12 | Atria | INVISIBLE GIRL | Lisa Jewell | 9781982137335 |
| 13 | Little, Brown | THE COAST-TO-COAST MURDERS | James Patterson and JD Barker | 9780316457422 |
| 14 | Atria | ANXIOUS PEOPLE | Fredrik Backman | 9781501160837 |
| 15 | Morrow | THE GUEST LIST | Lucy Foley | 9780062868930 |
# Publisher df: count the ranked books per publisher
df_pub <- df1 %>%
  group_by(publisher) %>%
  summarise(books_published = n())
## `summarise()` ungrouping output (override with `.groups` argument)
# Sort publishers by number of ranked books, largest first
df_pub <- df_pub[order(-df_pub$books_published), ]
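As a cross-check on the grouped count, the same tally can be sketched in base R with `table()`. The `publisher` vector below is copied from the publisher column of the table above; the name `pub_check` is introduced here only for illustration.

```r
# Publisher column copied from the ranking table (15 books).
publisher <- c("Doubleday", "Grand Central", "Viking", "Viking", "Tor/Forge",
               "Ballantine", "Ecco", "Little, Brown", "Riverhead", "Ballantine",
               "Putnam", "Atria", "Little, Brown", "Atria", "Morrow")

# table() counts occurrences; sort() puts the largest counts first.
counts <- sort(table(publisher), decreasing = TRUE)

# Convert to a two-column data frame shaped like df_pub.
pub_check <- as.data.frame(counts, stringsAsFactors = FALSE)
names(pub_check) <- c("publisher", "books_published")

pub_check
```

Four publishers (Atria, Ballantine, Little, Brown, and Viking) have two books each in this list, and the remaining seven have one.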
# Plot the ranked books by publisher
# (Visualize which publisher has more books in the ranking)
df_pub %>%
  ggplot(aes(reorder(publisher, books_published), books_published)) +
  geom_col(aes(fill = books_published)) +
  scale_fill_gradient2(low = "yellow",
                       high = "purple",
                       midpoint = median(df_pub$books_published)) +
  coord_polar() +
  labs(title = "Ranked hardcover fiction books by publisher", x = NULL, y = NULL)

If you have access to one, an API makes it easier to get exactly the data you want…