Introduction

In this report, the Books API from The New York Times is used to import a best sellers list along with the information attached to each book on it. The response is read as UTF-8 text, and the raw JSON is then converted into a dataframe for future analysis.

Importing Raw Data Through API

First, the path is defined. This is the base URL of the endpoint that the HTTP request is sent to in order to pull data from the API.

path <- "https://api.nytimes.com/svc/books/v3/lists.json"

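The GET request below passes several parameters in its query. list is set to hardcover-fiction, which selects the hardcover fiction list for the week-ending date given in bestsellers-date. published-date is the date the best sellers list was published on NYTimes.com. offset sets the starting point of the result set and defaults to 0. api-key is the API key generated by setting up a developer.nytimes.com account.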

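library(httr)       # GET, content
library(jsonlite)   # fromJSON
library(tidyverse)  # %>%, unnest, distinct
library(DT)         # datatable
# Assumption: the key is read from an environment variable named NYT_API_KEY
api_key <- Sys.getenv("NYT_API_KEY")
res <- GET(path,
          query = list(list = 'hardcover-fiction',
                       'bestsellers-date' = "2016-03-05",
                       'published-date' = "2016-03-20",
                       offset = 0,
                       'api-key' = api_key))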
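
To see how the query list maps onto the request URL without contacting the API, the URL can be assembled locally. The following is a minimal sketch using modify_url from httr; the names demo_key and preview_url and the value "YOUR_API_KEY" are placeholders introduced here for illustration and are not part of the original analysis.

```r
library(httr)

path <- "https://api.nytimes.com/svc/books/v3/lists.json"

# Placeholder value (assumption); substitute a real key from developer.nytimes.com
demo_key <- "YOUR_API_KEY"

# Same query parameters as the GET request, assembled into a URL without sending it
preview_url <- modify_url(path,
                          query = list(list = "hardcover-fiction",
                                       "bestsellers-date" = "2016-03-05",
                                       "published-date" = "2016-03-20",
                                       offset = 0,
                                       "api-key" = demo_key))

print(preview_url)
```

Printing the URL this way is a quick check that the parameter names, including the hyphenated ones, are spelled as the API expects before a request is spent on them.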

Transforming Raw Imported Data into Dataframe

The content function is used to read the body of the response stored in res as a character vector with UTF-8 encoding. The resulting character vector is stored in the response variable. The fromJSON function is then applied to response to convert the JSON data to an R object, with flatten = TRUE asking jsonlite to flatten nested dataframes into regular columns where it can. Finally, the data.frame function converts this R object into a dataframe.
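
The same conversion can be tried offline on a small hand-written JSON string before it is applied to the live response. This is only a sketch: the JSON below is made up for illustration and is far smaller than what the API returns, and the names sample_json, parsed, and sample_df are hypothetical.

```r
library(jsonlite)

# Made-up JSON that loosely imitates the shape of a best sellers response
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"list_name": "Hardcover Fiction", "rank": 1,
     "isbns": [{"isbn10": "1111111111"}, {"isbn10": "2222222222"}]},
    {"list_name": "Hardcover Fiction", "rank": 2,
     "isbns": [{"isbn10": "3333333333"}]}
  ]
}'

# Parse the JSON text into an R list; the "results" array becomes a dataframe
parsed <- fromJSON(sample_json, flatten = TRUE)

# Combine the top-level fields and the results dataframe into one dataframe
sample_df <- data.frame(parsed)

# Columns that came from "results" are prefixed with "results."
print(names(sample_df))
```

In this sketch the isbns field remains a list-column holding one small dataframe per row, which is the kind of nesting the real response also contains. The code that follows applies the same calls to the actual API response.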

response <- content(res, as = "text", encoding = "UTF-8")

book_df <- fromJSON(response, flatten = TRUE) %>%
  data.frame()

datatable(
  book_df[1:5, ],
  extensions = 'FixedColumns',
  options = list(
    dom = 't',
    scrollX = TRUE,
    scrollCollapse = TRUE
  )
)

The output above shows that the results.isbns, results.book_details, and results.reviews columns each contain nested dataframes. To expand these nested dataframes into regular columns, the unnest function is used below.


unnested_book_df <- unnest(book_df, cols = c(results.isbns, results.book_details, results.reviews))

datatable(
  unnested_book_df[1:5, ],
  extensions = 'FixedColumns',
  options = list(
    dom = 't',
    scrollX = TRUE,
    scrollCollapse = TRUE
  )
)
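
To see what unnest does to a list-column in isolation, here is a minimal sketch on a toy tibble. The data and the names toy and unnested_toy are made up for illustration; only the unnest call mirrors the one used on book_df.

```r
library(tibble)
library(tidyr)

# Toy data (assumption): each row holds a nested dataframe of ISBNs
toy <- tibble(
  title = c("BOOK A", "BOOK B"),
  isbns = list(
    data.frame(isbn10 = c("1111111111", "2222222222")),
    data.frame(isbn10 = "3333333333")
  )
)

# Each nested row becomes its own output row; the title is repeated as needed
unnested_toy <- unnest(toy, cols = c(isbns))

print(unnested_toy)
```

The two-row input becomes three rows because BOOK A carries two ISBNs, so its title is repeated once per nested row.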

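The unnested_book_df dataframe shows multiple entries for the same novel in the title column. This is most likely because unnest emits one row per row of a nested dataframe, so a book with several ISBNs appears several times. These duplicate entries are removed with the distinct function below, where .keep_all = TRUE keeps the first row for each title along with all of its other columns.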


removed_duplicates_df <- unnested_book_df %>%
  distinct(title, .keep_all = TRUE)

datatable(
  removed_duplicates_df,
  extensions = 'FixedColumns',
  options = list(
    dom = 't',
    scrollX = TRUE,
    scrollCollapse = TRUE
  )
)
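
The effect of distinct with .keep_all = TRUE can be checked on a small made-up dataframe. This is a sketch only; the data and the names toy_books and deduped_toy are hypothetical.

```r
library(dplyr)

# Toy data (assumption): BOOK A appears twice, once per ISBN
toy_books <- data.frame(
  title  = c("BOOK A", "BOOK A", "BOOK B"),
  isbn10 = c("1111111111", "2222222222", "3333333333"),
  stringsAsFactors = FALSE
)

# Keep the first row seen for each title, retaining every other column
deduped_toy <- toy_books %>%
  distinct(title, .keep_all = TRUE)

print(deduped_toy)
```

Only the first ISBN for BOOK A survives, which is worth keeping in mind if the dropped rows hold values that matter for later analysis.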

Conclusion

This report shows how to import data from a New York Times API. The raw data arrives as deeply nested JSON and must be transformed before it can be analyzed and conclusions drawn from it. In the future, this data could be analyzed to determine the average number of weeks a book stays on the best seller list by genre.
