In this assignment, we will load the New York Times Best Sellers list using an API key. Then, we will use the data to answer the following question: Which book category is ranked highest on the NYT Best Sellers list on average?
First, we load the necessary packages:
library(jsonlite)   # fromJSON() for fetching and parsing the API response
library(tidyverse)  # attaches dplyr, tidyr, and ggplot2, among others
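Next, we retrieve the JSON data from the API URL. The request requires a personal API key, which the code expects in a variable named api_key.
# read the API key from an environment variable
# (the name NYT_API_KEY is an assumption; store your own key under any name you like)
api_key <- Sys.getenv("NYT_API_KEY")
# specify url
api_url <- paste0("https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?api-key=", api_key)
# use the fromJSON function from the jsonlite package to fetch and parse the JSON data
bookReviews <- fromJSON(api_url)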
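A single request returns only one page of results. The sketch below shows one way the request URL could be built for several pages. The helper name `build_history_url` is hypothetical, and the `offset` query parameter (in steps of 20) is an assumption about the endpoint that should be checked against the NYT Books API documentation.

```r
# Hypothetical helper: builds the request URL for one page of results.
# Assumes the endpoint accepts an `offset` query parameter in steps of 20.
build_history_url <- function(api_key, offset = 0) {
  stopifnot(nzchar(api_key), offset >= 0, offset %% 20 == 0)
  paste0(
    "https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json",
    "?api-key=", api_key,
    "&offset=", as.integer(offset)
  )
}

# URLs for the first three pages; "YOUR_KEY" is a placeholder, not a real key
page_urls <- vapply(
  c(0, 20, 40),
  function(o) build_history_url("YOUR_KEY", o),
  character(1)
)
page_urls
```

Each URL could then be passed to fromJSON() in turn and the resulting data frames combined, which would give the later averages a larger sample to work with.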
Next, we unnest the nested columns (isbns, ranks_history, and reviews) to create a flat data frame with one row per title.
bookReviews_df <- as.data.frame(bookReviews$results)
bookRankings <- bookReviews_df %>%
  # spread the nested list-columns into their component columns
  unnest_wider(isbns) %>%
  unnest_wider(ranks_history) %>%
  unnest_wider(reviews) %>%
  # expand the category names and ranks into one row per ranking entry
  unnest(display_name) %>%
  unnest(rank) %>%
  # select only title, author, book ranking, and display name (category)
  select(title, author, rank, display_name) %>%
  # keep one row per title (the first ranking entry encountered)
  distinct(title, .keep_all = TRUE) %>%
  as.data.frame()
bookRankings
##                               title                               author rank
##  1         "I GIVE YOU MY BODY ..."                       Diana Gabaldon    8
##  2 "MOST BLESSED OF THE PATRIARCHS" Annette Gordon-Reed and Peter S Onuf   16
##  3   "YOU JUST NEED TO LOSE WEIGHT"                        Aubrey Gordon    2
##  4                      #ASKGARYVEE                      Gary Vaynerchuk    5
##  5                        #GIRLBOSS                       Sophia Amoruso    8
##  6                      #IMOMSOHARD      Kristin Hensley and Jen Smedley   10
##  7                      #NEVERAGAIN           David Hogg and Lauren Hogg    9
##  8            'TIL DEATH DO US PART                         Amanda Quick   15
##  9                  'TIS THE SEASON                             Ron Carr   18
## 10              (RE)BORN IN THE USA                        Roger Bennett    3
## 11         ------, THAT'S DELICIOUS   Action Bronson with Rachel Wharton    9
##                          display_name
##  1     Advice, How-To & Miscellaneous
##  2               Hardcover Nonfiction
##  3               Paperback Nonfiction
##  4                           Business
##  5                           Business
##  6     Advice, How-To & Miscellaneous
##  7               Paperback Nonfiction
##  8    Combined Print & E-Book Fiction
##  9      Paperback Mass-Market Fiction
## 10 Combined Print & E-Book Nonfiction
## 11     Advice, How-To & Miscellaneous
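To make the unnesting and deduplication steps more concrete, here is a minimal sketch on a toy data frame. The titles, ranks, and the exact shape of `ranks_history` are made up for illustration and may not match the real API response; the point is only to show how unnest() expands a list-column and how distinct() then keeps the first row per title.

```r
library(dplyr)
library(tidyr)

# Toy data (invented): each title carries a nested table of ranking entries
toy <- tibble(
  title = c("BOOK A", "BOOK B"),
  ranks_history = list(
    tibble(rank = c(3L, 7L), display_name = c("Business", "Business")),
    tibble(rank = 12L, display_name = "Hardcover Nonfiction")
  )
)

# One row per ranking entry: BOOK A appears twice, BOOK B once
toy_long <- toy %>% unnest(ranks_history)

# One row per title: only the first entry for each title survives
toy_first <- toy_long %>% distinct(title, .keep_all = TRUE)

toy_long
toy_first
```

Because distinct() keeps only the first entry, a title with several ranking entries contributes a single rank to the later averages. Averaging over all entries instead would mean skipping the distinct() step.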
Now, we will answer the question: Which book category is ranked highest on the NYT Best Sellers list on average?
We will first calculate the average ranking for each category:
# calculate average ranking for each category
avgRankings <- bookRankings %>%
  group_by(display_name) %>%
  summarise(AverageRanking = mean(rank)) %>%
  as.data.frame()
colnames(avgRankings) <- c("Category", "AverageRanking")
avgRankings
##                             Category AverageRanking
## 1     Advice, How-To & Miscellaneous            9.0
## 2                           Business            6.5
## 3    Combined Print & E-Book Fiction           15.0
## 4 Combined Print & E-Book Nonfiction            3.0
## 5               Hardcover Nonfiction           16.0
## 6      Paperback Mass-Market Fiction           18.0
## 7               Paperback Nonfiction            5.5
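An average is easier to interpret alongside the number of titles behind it. The sketch below recomputes the averages from the ranks and categories printed in the bookRankings output above and adds a count per category. The names `sample_rankings`, `summary_tbl`, and `Titles` are introduced here for illustration only.

```r
library(dplyr)

# Ranks and categories copied from the printed bookRankings output above
sample_rankings <- data.frame(
  rank = c(8, 16, 2, 5, 8, 10, 9, 15, 18, 3, 9),
  display_name = c(
    "Advice, How-To & Miscellaneous", "Hardcover Nonfiction",
    "Paperback Nonfiction", "Business", "Business",
    "Advice, How-To & Miscellaneous", "Paperback Nonfiction",
    "Combined Print & E-Book Fiction", "Paperback Mass-Market Fiction",
    "Combined Print & E-Book Nonfiction", "Advice, How-To & Miscellaneous"
  )
)

# Average rank and number of titles per category, best average first
summary_tbl <- sample_rankings %>%
  group_by(display_name) %>%
  summarise(AverageRanking = mean(rank), Titles = n()) %>%
  arrange(AverageRanking) %>%
  as.data.frame()

summary_tbl
```

In this sample, four of the seven categories contain a single title, so their "average" is simply that one book's rank.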
Now, we will visualize the results using a bar plot:
# plot average rankings for each category
ggplot(avgRankings) +
  geom_bar(aes(x = Category, y = AverageRanking, fill = Category), stat = "identity") +
  coord_flip() +
  ggtitle("Average Rankings for Each Book Category",
          "From New York Times Best Sellers List") +
  ylab("Average Ranking") +
  xlab("Book Category") +
  theme(axis.text.y = element_text(angle = 20, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "none")
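The bars are drawn in alphabetical order of category, which makes the best-ranked category harder to spot. As an optional variation, the sketch below sorts the bars by average ranking with reorder(). It uses the averages printed in the avgRankings output above; the object names `avg` and `p` are introduced here for illustration.

```r
library(ggplot2)

# Averages copied from the printed avgRankings output above
avg <- data.frame(
  Category = c(
    "Advice, How-To & Miscellaneous", "Business",
    "Combined Print & E-Book Fiction", "Combined Print & E-Book Nonfiction",
    "Hardcover Nonfiction", "Paperback Mass-Market Fiction",
    "Paperback Nonfiction"
  ),
  AverageRanking = c(9.0, 6.5, 15.0, 3.0, 16.0, 18.0, 5.5)
)

# reorder() sorts the categories by average ranking; the minus sign places
# the lowest (best) average at the top once the axes are flipped
p <- ggplot(avg, aes(x = reorder(Category, -AverageRanking), y = AverageRanking)) +
  geom_col() +
  coord_flip() +
  labs(x = "Book Category", y = "Average Ranking")

p
```

geom_col() is used here as the shorthand for geom_bar(stat = "identity"); the title and theme settings from the original plot can be added unchanged.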
On the New York Times Best Sellers list, a lower rank number means a higher position. The category with the lowest average ranking is therefore the one that ranks highest on average. According to our graph, this category is “Combined Print & E-Book Nonfiction”, with an average ranking of 3.0. This result rests on a small sample: the data frame contains only 11 titles, and this category is represented by a single title.
“New York Times: Best Sellers” The New York Times, https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json