DATA 607 Assignment 9

Introduction

In this assignment, we retrieve New York Times Best Sellers data from the NYT Books API, which requires an API key. We then use the data to answer the following question: Which book category ranks highest on the NYT Best Sellers list on average?

Load Packages

First, we load the necessary packages:

library(jsonlite)   # fromJSON() to fetch and parse the API response
library(tidyverse)  # attaches dplyr, tidyr, and ggplot2, among others

Retrieve JSON from API

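Next, we retrieve the JSON data from the API URL, which carries the API key as a query parameter.

# read the API key from an environment variable rather than hard-coding it
# (the variable name NYT_API_KEY is an assumption; use whatever name you set)
api_key <- Sys.getenv("NYT_API_KEY")

# specify URL
api_url <- paste0("https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?api-key=", api_key)

# use the fromJSON function from the jsonlite package to parse the JSON data
bookReviews <- fromJSON(api_url)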
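The request above appears to return a single page of results. As a rough sketch of how further pages might be requested, the helper below builds one URL per page. It assumes the endpoint accepts an `offset` query parameter in steps of 20, which should be checked against the NYT Books API documentation. The names `build_history_url` and `"YOUR_KEY"` are hypothetical, and no request is sent.

```r
# Hypothetical helper: builds the request URL for one page of results.
# Assumption: the endpoint accepts an `offset` query parameter (0, 20, 40, ...).
build_history_url <- function(api_key, offset = 0) {
  base_url <- "https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json"
  paste0(
    base_url,
    "?api-key=", utils::URLencode(api_key, reserved = TRUE),
    "&offset=", as.integer(offset)
  )
}

# URLs for the first three pages; this only builds strings, it does not call the API
page_urls <- vapply(
  c(0, 20, 40),
  function(o) build_history_url("YOUR_KEY", o),
  character(1)
)
```

Each URL could then be passed to `fromJSON()` in the same way as `api_url`, and the resulting `results` data frames combined before unnesting.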

Unnest Data

Next, we unnest the nested list-columns in the results to create a flat data frame with one row per title.

bookReviews_df <- as.data.frame(bookReviews$results)

bookRankings <- bookReviews_df %>%
  # spread each nested list-column into one column per field
  unnest_wider(isbns) %>%
  unnest_wider(ranks_history) %>%
  unnest_wider(reviews) %>%
  # expand category and rank together so each rank stays paired with its list
  unnest(c(display_name, rank)) %>%
  # select only title, author, book ranking, and display name
  select(title, author, rank, display_name) %>%
  # keep one row per title (the first ranks_history entry), no duplicates
  distinct(title, .keep_all = TRUE) %>%
  as.data.frame()

bookRankings

##                               title                               author rank                       display_name
## 1          "I GIVE YOU MY BODY ..."                       Diana Gabaldon    8     Advice, How-To & Miscellaneous
## 2  "MOST BLESSED OF THE PATRIARCHS" Annette Gordon-Reed and Peter S Onuf   16               Hardcover Nonfiction
## 3    "YOU JUST NEED TO LOSE WEIGHT"                        Aubrey Gordon    2               Paperback Nonfiction
## 4                       #ASKGARYVEE                      Gary Vaynerchuk    5                           Business
## 5                         #GIRLBOSS                       Sophia Amoruso    8                           Business
## 6                       #IMOMSOHARD      Kristin Hensley and Jen Smedley   10     Advice, How-To & Miscellaneous
## 7                       #NEVERAGAIN           David Hogg and Lauren Hogg    9               Paperback Nonfiction
## 8             'TIL DEATH DO US PART                         Amanda Quick   15    Combined Print & E-Book Fiction
## 9                   'TIS THE SEASON                             Ron Carr   18      Paperback Mass-Market Fiction
## 10              (RE)BORN IN THE USA                        Roger Bennett    3 Combined Print & E-Book Nonfiction
## 11         ------, THAT'S DELICIOUS   Action Bronson with Rachel Wharton    9     Advice, How-To & Miscellaneous
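To illustrate what the unnesting step does, here is a minimal sketch on a made-up table. The titles, ranks, and the `toy` object are invented for illustration and are not API output. The sketch also calls `unnest()` directly on a list-column of data frames, which is a simplification of the pipeline above and not the exact method used there.

```r
library(dplyr)
library(tidyr)

# Made-up data: one row per title, with the rank history nested in a list-column
toy <- tibble(
  title = c("BOOK A", "BOOK B"),
  ranks_history = list(
    tibble(rank = c(3L, 7L),
           display_name = c("Business", "Hardcover Nonfiction")),
    tibble(rank = 12L, display_name = "Business")
  )
)

# unnest() expands each nested row, keeping rank and display_name paired;
# BOOK A therefore becomes two rows
toy_long <- toy %>%
  unnest(ranks_history)

# distinct() then keeps only the first row for each title
toy_flat <- toy_long %>%
  distinct(title, .keep_all = TRUE)
```

Here `toy_long` has three rows and `toy_flat` has two, which shows that keeping one row per title discards any later entries in a title's rank history.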

Analysis

Now, we will answer the question: Which book category is ranked highest on the NYT Best Sellers list on average?

We will first calculate the average ranking for each category:

# calculate average ranking for each category
avgRankings <- bookRankings %>%
  group_by(display_name) %>%
  summarise(AverageRanking = mean(rank)) %>%
  as.data.frame()

colnames(avgRankings) <- c("Category", "AverageRanking")

avgRankings

##                             Category AverageRanking
## 1     Advice, How-To & Miscellaneous            9.0
## 2                           Business            6.5
## 3    Combined Print & E-Book Fiction           15.0
## 4 Combined Print & E-Book Nonfiction            3.0
## 5               Hardcover Nonfiction           16.0
## 6      Paperback Mass-Market Fiction           18.0
## 7               Paperback Nonfiction            5.5

Now, we will visualize the results using a bar plot:

# plot average rankings for each category
ggplot(avgRankings) +
  geom_bar(aes(x = Category, y = AverageRanking, fill = Category), stat = "identity") +
  coord_flip() +
  ggtitle("Average Rankings for Each Book Category",
          "From New York Times Best Sellers List") +
  ylab("Average Ranking") +
  xlab("Book Category") +
  theme(axis.text.y = element_text(angle = 20, hjust = 1),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "none")
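The plot above lists categories in alphabetical order. As an optional variation, the sketch below orders the bars by average ranking so the best-ranked category is easier to spot. The averages are copied from the printed avgRankings output, and `avgRankings_printed` and `p` are names introduced here for illustration.

```r
library(ggplot2)

# Averages copied from the printed avgRankings output
avgRankings_printed <- data.frame(
  Category = c(
    "Advice, How-To & Miscellaneous", "Business",
    "Combined Print & E-Book Fiction", "Combined Print & E-Book Nonfiction",
    "Hardcover Nonfiction", "Paperback Mass-Market Fiction",
    "Paperback Nonfiction"
  ),
  AverageRanking = c(9.0, 6.5, 15.0, 3.0, 16.0, 18.0, 5.5)
)

# reorder() sorts the category levels by negated average ranking, so after
# coord_flip() the category with the lowest average is drawn at the top
p <- ggplot(avgRankings_printed) +
  geom_col(aes(x = reorder(Category, -AverageRanking), y = AverageRanking)) +
  coord_flip() +
  labs(x = "Book Category", y = "Average Ranking")
```

Printing `p` draws the chart. `geom_col()` is used here as the shorthand for `geom_bar(stat = "identity")`.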

Conclusion

A lower rank number means a higher position on the New York Times Best Sellers list, so the category with the lowest average ranking is, on average, ranked highest. According to our graph, this category is “Combined Print & E-Book Nonfiction”, with an average ranking of 3.0. This result is based on the 11 titles retrieved, and that category contains only one of them, so its average reflects a single book.

Sources

“New York Times: Best Sellers” The New York Times, https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json