Web API (NYTimes books)

Introduction

In this document we call the NYTimes Books API with a personal API key, retrieve the current Combined Print & E-Book Fiction best-seller list, summarize how long its books have been on the list, and plot how often each publisher appears.

Call API

library(jsonlite)
library(dplyr)
library(tidyr)
library(ggplot2)

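# Read the API key from an environment variable rather than hard-coding it in the document.
# NYT_API_KEY is a name chosen for this example; set it with Sys.setenv() or in ~/.Renviron.
nyt_key <- Sys.getenv("NYT_API_KEY")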

# paste0() joins the pieces without separators; paste() would insert a space before the key
nyt_lists <- fromJSON(paste0("https://api.nytimes.com/svc/books/v3/lists/names.json?api-key=", nyt_key)) %>% data.frame()
head(nyt_lists$results.display_name)

## [1] "Combined Print & E-Book Fiction"   
## [2] "Combined Print & E-Book Nonfiction"
## [3] "Hardcover Fiction"                 
## [4] "Hardcover Nonfiction"              
## [5] "Paperback Trade Fiction"           
## [6] "Paperback Mass-Market Fiction"
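Both API calls in this document pass the parsed JSON straight into further processing. A minimal guard could stop the script early when a request fails (for example, because of an invalid key). This sketch assumes each parsed response carries a top-level `status` element equal to "OK" on success; the helper name `check_nyt_status` is hypothetical.

```r
# Hypothetical helper: stop early if a parsed NYT API response does not report success.
# Assumes the list returned by jsonlite::fromJSON() has a `status` element that
# equals "OK" when the request succeeded.
check_nyt_status <- function(resp) {
  if (!is.list(resp) || is.null(resp$status) || !identical(resp$status, "OK")) {
    stop("NYT API request did not return status \"OK\"")
  }
  # Return the response unchanged so the check can sit inside a pipeline
  invisible(resp)
}
```

It would be applied to the raw list returned by `fromJSON()`, before the `data.frame()` conversion, e.g. `fromJSON(url) %>% check_nyt_status() %>% data.frame()`.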

# A broad and interesting choice is the Combined Print & E-Book Fiction list (the first row above). API calls refer to a list by its list_name_encoded value.

list_name <- nyt_lists$results.list_name_encoded[1]
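Taking row `[1]` assumes the API keeps returning the lists in the same order. A sketch of a lookup by display name instead, assuming the `results.display_name` and `results.list_name_encoded` columns produced by the `data.frame()` call above; `encoded_name_for` is a hypothetical helper.

```r
# Hypothetical helper: look up a list's encoded name by its display name,
# so the choice does not depend on the row order the API returns.
# Assumes `lists` has the columns results.display_name and results.list_name_encoded.
encoded_name_for <- function(lists, display_name) {
  # which() drops NA comparisons instead of returning NA rows
  hit <- lists$results.list_name_encoded[which(lists$results.display_name == display_name)]
  if (length(hit) != 1) {
    stop("Expected exactly one list named: ", display_name)
  }
  hit
}

# Usage with the lists retrieved above:
# list_name <- encoded_name_for(nyt_lists, "Combined Print & E-Book Fiction")
```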

# Make a second API call for the current edition of the chosen list. Building the URL from variables makes it easy to swap in another list or key later.

nyt_current_fiction <- fromJSON(paste0("https://api.nytimes.com/svc/books/v3/lists/current/", list_name, ".json?api-key=", nyt_key))

best_books <- as.data.frame(nyt_current_fiction$results$books)
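The URL-building idea can be wrapped in a small function so that only the path and key change between calls. This is a sketch: `nyt_books_url` and its `base` argument are hypothetical, and the default base URL is an assumption that can be overridden.

```r
# Hypothetical helper: assemble a Books API v3 URL from its parts.
# paste0() joins without separators, so no stray spaces end up in the URL.
nyt_books_url <- function(path, key, base = "https://api.nytimes.com/svc/books/v3/") {
  # URLencode() with reserved = TRUE escapes any characters that are unsafe in a query value
  paste0(base, path, ".json?api-key=", utils::URLencode(key, reserved = TRUE))
}

# Usage:
# fromJSON(nyt_books_url("lists/names", nyt_key))
# fromJSON(nyt_books_url(paste0("lists/current/", list_name), nyt_key))
```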

best_books Exploratory Data Analysis

summary(best_books$weeks_on_list)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   7.467   5.000  58.000

summary(best_books$price)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0       0       0       0

The median book on this list has been a best seller for three weeks, while the mean is about seven and a half weeks (7.467). The gap comes from an outlier with 58 weeks on the list, which pulls the mean upward. The price field is zero for every book, so the API provides no usable price data for this list.
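To check how strongly the 58-week outlier affects the average, and to confirm programmatically that a column such as `price` carries no information, we could use two small helpers. Both are hypothetical sketches: `center_summary` compares the mean with the median, a trimmed mean (base R's `mean(x, trim = )`), and the mean with the single largest value dropped; `is_constant` reports whether a column holds at most one distinct non-missing value.

```r
# Hypothetical helper: compare measures of center to gauge how much a few
# large values (such as a 58-week run on the list) pull the mean upward.
center_summary <- function(x, trim = 0.1) {
  x <- x[!is.na(x)]
  c(mean = mean(x),
    median = stats::median(x),
    trimmed_mean = mean(x, trim = trim),     # drops the trim fraction from each end
    mean_without_max = mean(x[-which.max(x)]))
}

# Hypothetical helper: TRUE when a column holds a single value (e.g. a price of 0
# for every book), meaning it cannot distinguish one row from another.
is_constant <- function(x) {
  length(unique(x[!is.na(x)])) <= 1
}
```

Applied here as `center_summary(best_books$weeks_on_list)` and `is_constant(best_books$price)`; given the all-zero price summary above, the latter should return `TRUE`.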

Publisher Plot

We will create a simple plot that shows how often each publisher is represented in this list.

book_plot <- as.data.frame(table(best_books$publisher))

ggplot(book_plot, aes(x = Var1, y = Freq)) +
   geom_col(width = 0.75) +
   ggtitle("Count of Books by Publisher on NYTimes Best Selling Fiction Books List") +
   labs(x = "Publisher", y = "Frequency") +
   theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 90, hjust = 1)) +
   scale_y_continuous(breaks = seq(0, 4, by = 1))

We can see that Putnam, Scribner, and Little, Brown have multiple entries on the list, with Putnam at the top with three.
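The tally above uses base R's `table()`. Since the document already loads dplyr, the same count can be sketched with `dplyr::count()`, sorted so that the most frequent publisher comes first. `publisher_counts` is a hypothetical helper, and the commented plotting lines are a sketch that orders the bars by frequency and derives the y-axis breaks from the data instead of a fixed `seq(0, 4)`.

```r
library(dplyr)

# Hypothetical helper: count books per publisher, most frequent first.
# Assumes `books` has a `publisher` column, as best_books does above.
publisher_counts <- function(books) {
  books %>%
    count(publisher, name = "Freq", sort = TRUE)
}

# Usage (assumes best_books and ggplot2 from above):
# counts <- publisher_counts(best_books)
# ggplot(counts, aes(x = reorder(publisher, -Freq), y = Freq)) +
#   geom_col(width = 0.75) +
#   labs(x = "Publisher", y = "Frequency") +
#   scale_y_continuous(breaks = seq(0, max(counts$Freq), by = 1))
```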


Harris Dupre

10/27/2019
