In this document we call the NYTimes Books API with a generated API key, retrieve a list of best-selling books, and run a few simple analyses on the data.
library(jsonlite)
library(dplyr)
library(tidyr)
library(ggplot2)
nyt_key <- "cznmVxP8aoZ2RYnd6UHhW1HYuf0IV5bo"
# paste0() joins without a separator; paste() would insert a space before the key
nyt_lists <- fromJSON(paste0("http://api.nytimes.com/svc/books/v3/lists/names.json?api-key=", nyt_key)) %>% data.frame()
head(nyt_lists$results.display_name)
## [1] "Combined Print & E-Book Fiction"
## [2] "Combined Print & E-Book Nonfiction"
## [3] "Hardcover Fiction"
## [4] "Hardcover Nonfiction"
## [5] "Paperback Trade Fiction"
## [6] "Paperback Mass-Market Fiction"
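Both API calls in this document build their URL by string concatenation, so the separator matters: `paste()` joins its arguments with a space unless `sep = ""` is given, while `paste0()` never adds one. The sketch below illustrates the difference with a small helper. The names `build_nyt_url`, `nyt_base`, and `demo_key` are hypothetical and not part of the original analysis, and no request is actually sent.

```r
# Hypothetical helper: assembles a Books API list URL from an endpoint and a key.
nyt_base <- "http://api.nytimes.com/svc/books/v3/lists/"

build_nyt_url <- function(endpoint, key) {
  # paste0() concatenates with no separator, so the URL contains no stray spaces
  paste0(nyt_base, endpoint, ".json?api-key=", key)
}

demo_key <- "DEMO_KEY"  # placeholder, not a real API key

names_url <- build_nyt_url("names", demo_key)

# For comparison: paste() with its default sep = " " puts spaces between the parts
spaced_url <- paste(nyt_base, "names.json?api-key=", demo_key)

grepl(" ", names_url, fixed = TRUE)   # does the paste0() URL contain a space?
grepl(" ", spaced_url, fixed = TRUE)  # does the paste() URL contain a space?
```

A helper like this could also take the key from an environment variable, for example `Sys.getenv("NYT_API_KEY")` (the variable name is an assumption), so that the key does not need to appear in the script.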
# An interesting and broad list is Combined Print & E-Book Fiction. The API identifies each list by its list_name_encoded value.
list_name <- nyt_lists$results.list_name_encoded[1]
# We will now perform a new API call. We concatenate the call with variable names so it is easier to change later.
nyt_current_fiction <- fromJSON(paste("http://api.nytimes.com/svc/books/v3/lists/current/", list_name, ".json?api-key=", nyt_key, sep = ""))
best_books <- as.data.frame(nyt_current_fiction$results$books)
summary(best_books$weeks_on_list)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   7.467   5.000  58.000
summary(best_books$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##       0       0       0       0       0       0
The books currently on this list have spent a mean of about seven and a half weeks on it, while the median is only three weeks. The gap comes from an outlier, with 58 weeks on the list, that pulls the mean upward. The price field is zero for every book, so the API appears not to supply usable price data for this list.
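The effect of one long-running title on the mean can be checked on a small example. The vector below is illustrative only and is not the data returned by the API; it simply contains one large value among several small ones.

```r
# Illustrative values only (not the API data): eight small values and one outlier
weeks <- c(1, 1, 2, 2, 3, 3, 4, 5, 58)

mean_all    <- mean(weeks)               # pulled upward by the single value of 58
median_all  <- median(weeks)             # middle value, unaffected by how large 58 is
mean_no_max <- mean(weeks[weeks < 58])   # mean after dropping the outlier

c(mean = mean_all, median = median_all, mean_without_outlier = mean_no_max)
```

On the real data, the same comparison could be made with `best_books$weeks_on_list` in place of `weeks`.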
We will create a simple plot that shows how often each publisher is represented in this list.
book_plot <- as.data.frame(table(best_books$publisher))
ggplot(book_plot, aes(x = Var1, y = Freq)) +
  geom_bar(width = .75, stat = "identity", position = "dodge") +
  ggtitle("Count of Books by Publisher on NYTimes Best Selling Fiction Books List") +
  labs(x = "Publisher", y = "Frequency") +
  theme(plot.title = element_text(hjust = 0.5), axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_y_continuous(breaks = seq(0, 4, by = 1))
The plot shows that Putnam, Scribner, and Little, Brown each have multiple entries on the list, and Putnam leads with three.
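The same ranking can be read from a sorted frequency table instead of the plot. The sketch below uses a small made-up data frame, `toy_books`, whose publisher counts are assumptions chosen to resemble the result above, not the actual API response.

```r
# Made-up publisher column for illustration (counts are assumed, not API data)
toy_books <- data.frame(
  publisher = c("Putnam", "Putnam", "Putnam",
                "Scribner", "Scribner",
                "Little, Brown", "Little, Brown",
                "Doubleday"),
  stringsAsFactors = FALSE
)

# table() counts each publisher; as.data.frame() yields columns Var1 and Freq
publisher_counts <- as.data.frame(table(toy_books$publisher), stringsAsFactors = FALSE)

# Sort so the most frequent publisher comes first
publisher_counts <- publisher_counts[order(-publisher_counts$Freq), ]
publisher_counts
```

To order the bars in the plot the same way, one option is to map `x = reorder(Var1, -Freq)` inside `aes()`.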