Overview

Problem

The New York Times web site provides a rich set of APIs, described at https://developer.nytimes.com/apis. I need to start by signing up for an API key. My task is then to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.

Approach

The New York Times API that interested me the most was the Books API. After requesting an API key, I was able to access it from R. I extracted the "hardcover fiction" best-seller list from the Books API, which returns its data in JSON format. I then used fromJSON() from the jsonlite package to convert the JSON data to R objects.

Since fromJSON() converts the whole JSON response into a nested list that also carries the metadata around the book list, I retrieved only the data frame containing the book information that I need.
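To make the nesting concrete, here is a minimal sketch that runs fromJSON() on a small inline JSON string instead of the live API. The sample payload is hypothetical and heavily trimmed; the element names `results` and `books` are assumed to mirror the real response, which holds many more fields.

```r
library(jsonlite)

# Hypothetical, trimmed-down response (an assumption, not the real payload)
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": {
    "list_name": "Hardcover Fiction",
    "books": [
      {"rank": 1, "title": "A TIME FOR MERCY", "author": "John Grisham"},
      {"rank": 2, "title": "THE RETURN", "author": "Nicholas Sparks"}
    ]
  }
}'

parsed <- fromJSON(sample_json)

# The top level is a plain list: metadata plus the nested results
names(parsed)

# A JSON array of objects sharing the same keys is simplified to a data frame
books_df <- parsed$results$books
str(books_df)
```

Selecting elements by name rather than by position keeps the code readable and should keep working if the API adds a field ahead of the one being extracted.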

Packages

library(httr)
library(jsonlite)
library(tidyverse)
library(kableExtra)

Read the API

# Build the request URL for the current hardcover fiction list
# (the API key is passed as the api-key query parameter)

books <-"https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=s6WdQYAxrWRmhzTGoytPJplwgWujefbc"
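One possible variation, sketched here under assumptions, is to keep the key out of the script and read it from an environment variable. The helper name `build_books_url` and the variable name `NYT_API_KEY` are hypothetical; only the endpoint pattern comes from the URL above.

```r
# Hypothetical helper: assemble the request URL for a named best-seller list
build_books_url <- function(list_name, api_key) {
  if (!nzchar(api_key)) {
    stop("No API key supplied; set the NYT_API_KEY environment variable.")
  }
  base <- "https://api.nytimes.com/svc/books/v3/lists/current/"
  # URLencode() guards against reserved characters in the key
  paste0(base, list_name, ".json?api-key=", URLencode(api_key, reserved = TRUE))
}

# NYT_API_KEY is an assumed variable name, e.g. defined in ~/.Renviron
api_key <- Sys.getenv("NYT_API_KEY", unset = "")

# Only build the URL when a key is actually available
if (nzchar(api_key)) {
  books <- build_books_url("hardcover-fiction", api_key)
}
```

With this layout the report can be shared or committed without exposing the key, and the same helper can point at another list by changing `list_name`.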

Convert JSON to R objects

# Parse the JSON response into R objects and keep only the data frame of books,
# stored under results -> books (positions [[5]][[11]] in the parsed list)

df <- fromJSON(books)$results$books
dim(df)
## [1] 15 26
# Inspect the structure of the data frame

glimpse(df)
## Rows: 15
## Columns: 26
## $ rank                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
## $ rank_last_week       <int> 0, 1, 5, 7, 4, 8, 3, 2, 10, 0, 12, 0, 11, 9, 13
## $ weeks_on_list        <int> 1, 3, 2, 5, 2, 4, 2, 2, 20, 1, 111, 1, 4, 6, 19
## $ asterisk             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ dagger               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ primary_isbn10       <chr> "0385545967", "1538728575", "073522465X", "052...
## $ primary_isbn13       <chr> "9780385545969", "9781538728574", "97807352246...
## $ publisher            <chr> "Doubleday", "Grand Central", "Viking", "Vikin...
## $ description          <chr> "The third book in the Jake Brigance series. A...
## $ price                <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ title                <chr> "A TIME FOR MERCY", "THE RETURN", "THE SEARCHE...
## $ author               <chr> "John Grisham", "Nicholas Sparks", "Tana Frenc...
## $ contributor          <chr> "by John Grisham", "by Nicholas Sparks", "by T...
## $ contributor_note     <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ book_image           <chr> "https://s1.nyt.com/du/books/images/9780385545...
## $ book_image_width     <int> 329, 329, 331, 329, 331, 329, 329, 322, 331, 3...
## $ book_image_height    <int> 500, 500, 500, 500, 500, 500, 500, 500, 500, 5...
## $ amazon_product_url   <chr> "https://www.amazon.com/dp/0385545967?tag=NYTB...
## $ age_group            <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ book_review_link     <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ first_chapter_link   <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ sunday_review_link   <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ article_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", ""...
## $ isbns                <list> [<data.frame[2 x 2]>, <data.frame[5 x 2]>, <d...
## $ buy_links            <list> [<data.frame[6 x 2]>, <data.frame[6 x 2]>, <d...
## $ book_uri             <chr> "nyt://book/33a48cf6-d7f3-5113-aa1e-6adcbb3853...
# New data frame with only the columns needed for the analysis

df1 <- df[c("rank", "publisher", "title", "author", "primary_isbn13")]

df1 %>%
  kbl(caption = "Hardcover fiction books") %>%
  kable_material(c("striped", "hover")) %>%
  row_spec(0, color = "indigo")
Hardcover fiction books

| rank | publisher     | title                             | author                        | primary_isbn13 |
|------|---------------|-----------------------------------|-------------------------------|----------------|
| 1    | Doubleday     | A TIME FOR MERCY                  | John Grisham                  | 9780385545969  |
| 2    | Grand Central | THE RETURN                        | Nicholas Sparks               | 9781538728574  |
| 3    | Viking        | THE SEARCHER                      | Tana French                   | 9780735224650  |
| 4    | Viking        | THE EVENING AND THE MORNING       | Ken Follett                   | 9780525954989  |
| 5    | Tor/Forge     | THE INVISIBLE LIFE OF ADDIE LARUE | VE Schwab                     | 9780765387561  |
| 6    | Ballantine    | THE BOOK OF TWO WAYS              | Jodi Picoult                  | 9781984818355  |
| 7    | Ecco          | LEAVE THE WORLD BEHIND            | Rumaan Alam                   | 9780062667632  |
| 8    | Little, Brown | TROUBLES IN PARADISE              | Elin Hilderbrand              | 9780316435581  |
| 9    | Riverhead     | THE VANISHING HALF                | Brit Bennett                  | 9780525536291  |
| 10   | Ballantine    | JINGLE ALL THE WAY                | Debbie Macomber               | 9781984818751  |
| 11   | Putnam        | WHERE THE CRAWDADS SING           | Delia Owens                   | 9780735219090  |
| 12   | Atria         | INVISIBLE GIRL                    | Lisa Jewell                   | 9781982137335  |
| 13   | Little, Brown | THE COAST-TO-COAST MURDERS        | James Patterson and JD Barker | 9780316457422  |
| 14   | Atria         | ANXIOUS PEOPLE                    | Fredrik Backman               | 9781501160837  |
| 15   | Morrow        | THE GUEST LIST                    | Lucy Foley                    | 9780062868930  |

Extra

# Count the books per publisher, sorted from most to fewest

df_pub <- df1 %>%
  count(publisher, name = "books_published", sort = TRUE)
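As a quick cross-check of the per-publisher counts, the same tally can be sketched in base R alone. The publisher values below are copied from the table of the 15 ranked books above; the object names `publishers`, `counts`, and `check_pub` are introduced only for this illustration.

```r
# Publisher of each ranked book, in rank order (copied from the table above)
publishers <- c("Doubleday", "Grand Central", "Viking", "Viking", "Tor/Forge",
                "Ballantine", "Ecco", "Little, Brown", "Riverhead", "Ballantine",
                "Putnam", "Atria", "Little, Brown", "Atria", "Morrow")

# table() tallies each distinct value; sort() puts the largest counts first
counts <- sort(table(publishers), decreasing = TRUE)

# Same column names as df_pub, so the two results can be compared directly
check_pub <- data.frame(publisher = names(counts),
                        books_published = as.integer(counts))

head(check_pub, 4)
```

Four publishers (Atria, Ballantine, Little, Brown, and Viking) have two books each on this list, and the remaining seven have one.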


# Plot the number of ranked books per publisher
# (visualize which publishers have the most books on the list)

df_pub %>%
  ggplot(aes(reorder(publisher, books_published), books_published)) +
  geom_col(aes(fill = books_published)) +
  scale_fill_gradient2(low = "yellow",
                       high = "purple",
                       midpoint = median(df_pub$books_published)) +
  coord_polar() +
  labs(title = "Ranked hardcover fiction books by publisher", x = NULL, y = NULL)

Findings

If you have access to one, an API makes it easier to get exactly the data you want, directly from the source.
