Source: GitHub RPubs

Load packages

library("tidyverse")
library("httr")
library("reactable")
library("jsonlite")
library("tibble")

Make a request to the API Server

In this assignment, JSON data was pulled from The New York Times Books API, which serves the Best Sellers lists and is documented in the developer portal at https://developer.nytimes.com/apis. A request was made with the GET() function, which returned a list containing all of the information sent back by the API server.

nybk <- GET("https://api.nytimes.com/svc/books/v3/lists/names.json?api-key=mRnATzHBDOaywjgsvH9402TBMw8RhxEG")
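The request above embeds the API key directly in the URL. A common alternative, sketched here, reads the key from an environment variable and passes it as a query parameter. The variable name NYT_API_KEY is an assumption made for illustration; it is not part of the original assignment.

```r
library(httr)

# Assumption: the key was stored beforehand, e.g. in ~/.Renviron as NYT_API_KEY=...
api_key <- Sys.getenv("NYT_API_KEY")

# Build the request URL; modify_url() appends the query parameters.
url <- modify_url(
  "https://api.nytimes.com/svc/books/v3/lists/names.json",
  query = list(`api-key` = api_key)
)

# Only contact the server when a key is actually available.
if (nzchar(api_key)) {
  nybk <- GET(url)
}
```

Keeping the key out of the source file means it does not appear in a published report or in version control.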

Printing the response gives a summary that includes a “Status” code, a number indicating whether the API request succeeded or failed. A status of 200 corresponds to a successful request. The Content-Type indicates what form the data takes; this response carries its data in JSON format.

nybk      # summary look at the resulting response
## Response [https://api.nytimes.com/svc/books/v3/lists/names.json?api-key=mRnATzHBDOaywjgsvH9402TBMw8RhxEG]
##   Date: 2021-10-21 19:57
##   Status: 200
##   Content-Type: application/json; charset=UTF-8
##   Size: 12.5 kB
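The same check can be done in code rather than by reading the printed summary. The sketch below uses a hypothetical helper, is_json_success(), which is not part of httr; it relies on httr's http_status() to classify the status code.

```r
library(httr)

# Hypothetical helper: TRUE when the status code signals success and the
# Content-Type header announces JSON.
is_json_success <- function(status, content_type) {
  http_status(status)$category == "Success" &&
    grepl("^application/json", content_type)
}

# Values copied from the response summary above.
is_json_success(200L, "application/json; charset=UTF-8")

# With a real response object such as nybk, the inputs could be taken from
# status_code(nybk) and headers(nybk)[["content-type"]].
```

For scripts that should halt on a failed request, httr also provides stop_for_status(), which raises an error for 4xx and 5xx responses.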

Parsing Data

The response body is stored as a raw vector. The rawToChar() function converts it to a character string, and the fromJSON() function parses that JSON string into an R list. The names() function then shows the elements of the parsed list.

rawToChar(nybk$content)  # convert the raw response body to a character string
nybook <- fromJSON(rawToChar(nybk$content))  # parse the JSON string into an R list

names(nybook)   # returns list names
## [1] "status"      "copyright"   "num_results" "results"
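To see the two parsing steps in isolation, here is a minimal sketch that runs without contacting the API. The JSON string is a made-up, shortened stand-in for the real response body.

```r
library(jsonlite)

# Made-up miniature of the API response, for illustration only.
json_txt <- '{"status":"OK","num_results":2,"results":[
  {"list_name":"Hardcover Fiction","updated":"WEEKLY"},
  {"list_name":"Hardcover Nonfiction","updated":"WEEKLY"}]}'

raw_body <- charToRaw(json_txt)          # mimic the raw vector in nybk$content
parsed   <- fromJSON(rawToChar(raw_body))

names(parsed)            # "status" "num_results" "results"
class(parsed$results)    # "data.frame": an array of objects becomes a data frame
```

With httr, content(nybk, as = "text", encoding = "UTF-8") is another way to obtain the character string from a response.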

We will analyze the results element, a data frame with 59 observations (one per best-seller list) and 6 variables. The flatten() function from the jsonlite package turns any nested data frame columns into regular columns, so that each variable has its own column.

nybks_flat <- jsonlite::flatten(nybook$results)  # namespace prefix avoids a clash with purrr::flatten()

str(nybks_flat)
## 'data.frame':    59 obs. of  6 variables:
##  $ list_name            : chr  "Combined Print and E-Book Fiction" "Combined Print and E-Book Nonfiction" "Hardcover Fiction" "Hardcover Nonfiction" ...
##  $ display_name         : chr  "Combined Print & E-Book Fiction" "Combined Print & E-Book Nonfiction" "Hardcover Fiction" "Hardcover Nonfiction" ...
##  $ list_name_encoded    : chr  "combined-print-and-e-book-fiction" "combined-print-and-e-book-nonfiction" "hardcover-fiction" "hardcover-nonfiction" ...
##  $ oldest_published_date: chr  "2011-02-13" "2011-02-13" "2008-06-08" "2008-06-08" ...
##  $ newest_published_date: chr  "2021-10-31" "2021-10-31" "2021-10-31" "2021-10-31" ...
##  $ updated              : chr  "WEEKLY" "WEEKLY" "WEEKLY" "WEEKLY" ...
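The str() output above shows six plain character columns, so flatten() has little to change in this particular data set. Its effect is easier to see on nested input; the JSON below is invented for illustration.

```r
library(jsonlite)

# Invented records in which "isbn" is a nested object.
nested <- fromJSON('[
  {"title":"A","isbn":{"isbn10":"1","isbn13":"2"}},
  {"title":"B","isbn":{"isbn10":"3","isbn13":"4"}}]')

ncol(nested)                       # 2: "isbn" is itself a data frame column
flat <- jsonlite::flatten(nested)
names(flat)                        # "title" "isbn.isbn10" "isbn.isbn13"
```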

Convert the data frame to a tibble, change the column names to uppercase, and display the result as an interactive table. By default, reactable shows 10 rows per page.

nybooks_tbl <- as_tibble(nybks_flat)  # convert the data frame to a tibble
names(nybooks_tbl) <- toupper(names(nybooks_tbl))  # convert the column names to uppercase

reactable(nybooks_tbl)
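The table can be adjusted through arguments to reactable(). A small sketch follows, using toy data in place of nybooks_tbl; the option values are illustrative choices, not settings used in the original report.

```r
library(reactable)

# Toy stand-in for nybooks_tbl (values copied from the output above).
toy <- data.frame(
  LIST_NAME = c("Hardcover Fiction", "Hardcover Nonfiction"),
  UPDATED   = c("WEEKLY", "WEEKLY"),
  stringsAsFactors = FALSE
)

tbl <- reactable(
  toy,
  defaultPageSize = 10,  # rows shown per page
  searchable = TRUE,     # adds a global search box
  sortable = TRUE        # allows sorting by clicking a column header
)
tbl
```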

Analysis

Load packages for analysis and visualization (these are already attached by tidyverse and are listed here for clarity).

library("dplyr")
library("ggplot2")
library("stringr")

Extract the year, month, and day substrings from NEWEST_PUBLISHED_DATE to analyze the best-seller lists by their newest published date.

Year <- substr(nybooks_tbl$NEWEST_PUBLISHED_DATE, 1, 4)   # characters 1-4: year
Month <- substr(nybooks_tbl$NEWEST_PUBLISHED_DATE, 6, 7)  # characters 6-7: month
Date <- substr(nybooks_tbl$NEWEST_PUBLISHED_DATE, 9, 10)  # characters 9-10: day of the month

Combine the substrings into a data frame.

df <- cbind.data.frame(Year, Month, Date)
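Fixed character positions work because every value follows the YYYY-MM-DD layout. An alternative sketch parses the strings as dates and formats out the parts. The first date below is copied from the output above; the other two are made up for illustration.

```r
# Parse ISO-formatted strings into Date objects.
newest <- as.Date(c("2021-10-31", "2017-01-29", "2012-03-04"))

# format() returns character vectors, matching the substr() approach.
df_dates <- data.frame(
  Year  = format(newest, "%Y"),
  Month = format(newest, "%m"),
  Day   = format(newest, "%d"),
  stringsAsFactors = FALSE
)
df_dates
```

Parsing with as.Date() has the side benefit of failing loudly on malformed input instead of silently returning the wrong characters.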

plot1 <- df %>% 
  group_by(Year) %>%
  summarise(Count = n()) %>%
  arrange(desc(Year))

plot1
## # A tibble: 7 x 2
##   Year  Count
##   <chr> <int>
## 1 2021     18
## 2 2019      2
## 3 2017     28
## 4 2016      1
## 5 2015      2
## 6 2013      6
## 7 2012      2
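The group_by() and summarise() pair above can also be written with dplyr's count(). A minimal sketch on made-up years:

```r
library(dplyr)

# Made-up years, for illustration only.
toy <- data.frame(
  Year = c("2021", "2017", "2021", "2012"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  count(Year, name = "Count") %>%  # one row per year with its frequency
  arrange(desc(Year))              # most recent year first
counts
```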

Plot the counts by newest published year, with the bars filled by month.

ggplot(df) +
  geom_bar(aes(Year, fill = Month)) +
  labs(x = "Published Year", y = "Count",
       title = "NYTimes Best Selling Books by Newest Published Year and Month")

Conclusion

In this assignment, the GET() function successfully connected to the NYTimes Developer APIs and retrieved the NYT Best Sellers lists. An analysis was performed on the question: in what year were the best-seller lists most recently published? The newest published dates ranged from 2012 to 2021, and only three years in that range (2014, 2018, and 2020) had no entries. These results will change over time, however, because the lists are updated weekly or monthly.