NY Times Best Seller List

The goal of this project is to retrieve the weekly NY Times Best-Seller lists for ebooks and hardcover books through the NY Times web API, in a form that is useful for downstream analysis. For this assignment, we look only at the best-seller lists for the requested date 2015-10-01, which the API resolves to the lists published on 2015-10-04 (see the published_date field in the response below).

Set Up API Request URL

We build the URL in several parts, keeping the base URL separate from the request and the key. This makes the code more robust, since the request portion of the string can be swapped and reformatted to handle multiple API requests. It also helps with debugging.

url_base = "http://api.nytimes.com/svc/books/v3/lists/"
request_url = "overview.json?published_date=2015-10-01"
key = "&api-key=364724d5a4e5edc752b95518f02ca848:5:73313805"

request = paste0(url_base, request_url, key)
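
To illustrate how keeping the pieces separate lets the request portion be swapped, here is a minimal sketch of a URL-building helper. The function name `build_request`, its arguments, and the placeholder key `YOUR_API_KEY` are assumptions made for illustration and are not part of the original script; only the base URL and the `published_date` parameter come from the code above.

```r
# Hypothetical helper (not in the original script): assemble a request URL
# from a base URL, an endpoint, a named list of query parameters, and a key.
build_request <- function(endpoint, params, api_key,
                          base = "http://api.nytimes.com/svc/books/v3/lists/") {
  # Add the key as one more query parameter
  params[["api-key"]] <- api_key
  # Percent-encode each value so special characters are safe in a URL
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  # Turn list(a = "1", b = "2") into "a=1&b=2"
  query <- paste(names(params), values, sep = "=", collapse = "&")
  paste0(base, endpoint, "?", query)
}

# Same request as above, with a placeholder instead of a real key
overview_url <- build_request(
  "overview.json",
  list(published_date = "2015-10-01"),
  api_key = "YOUR_API_KEY"
)
print(overview_url)
```

With a helper like this, requesting a different date or endpoint only changes the arguments, and the printed URL can be pasted into a browser when debugging.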

JSON request

Since the endpoint returns JSON, we use jsonlite::fromJSON, which makes the GET request to the NY Times API and parses the response into R lists and data frames.

package = jsonlite::fromJSON(request)

str(package, max.level = 2)

## List of 4
##  $ status     : chr "OK"
##  $ copyright  : chr "Copyright (c) 2015 The New York Times Company.  All Rights Reserved."
##  $ num_results: int 215
##  $ results    :List of 3
##   ..$ bestsellers_date: chr "2015-09-19"
##   ..$ published_date  : chr "2015-10-04"
##   ..$ lists           :'data.frame': 43 obs. of  6 variables:
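
Before drilling into the response, it can help to confirm that the request actually succeeded. The sketch below is one possible check, assuming the parsed object has the `status` and `results$lists` fields shown in the `str()` output above. The helper name `check_response` and the mock object are hypothetical and only mimic the shape of the real response, so the example runs without a network call.

```r
# Hypothetical helper: stop early if the parsed response does not look usable.
check_response <- function(pkg) {
  if (!identical(pkg$status, "OK")) {
    stop("API returned status: ", pkg$status)
  }
  if (is.null(pkg$results$lists) || NROW(pkg$results$lists) == 0) {
    stop("Response contains no lists")
  }
  invisible(pkg)
}

# Mock object shaped like the str() output above (the lists table is reduced
# to two illustrative rows)
mock_package <- list(
  status = "OK",
  num_results = 215L,
  results = list(
    bestsellers_date = "2015-09-19",
    published_date = "2015-10-04",
    lists = data.frame(list_id = c(1L, 2L), updated = c("WEEKLY", "WEEKLY"))
  )
)

check_response(mock_package)  # passes silently

# A bad response can be caught instead of halting the whole script
result <- tryCatch(
  check_response(list(status = "ERROR")),
  error = function(e) conditionMessage(e)
)
print(result)
```

The same `tryCatch` pattern could also wrap the `jsonlite::fromJSON(request)` call itself, so that a network failure or an invalid key produces a readable message.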

Build dataframes

The API returns a nested object that must be drilled into to reach the result data. The tables are large, so we wrap them with dplyr::tbl_df, which prints only the rows and columns that fit in the console.

# Build dataframe with all book lists -- dplyr table for convenience
book_lists = dplyr::tbl_df(package$results$lists)

# Keep only the lists that are updated weekly
weekly_lists = book_lists[book_lists$updated == "WEEKLY",]

# Make one dataframe per book list category
hardcover_fiction = dplyr::tbl_df( 
    as.data.frame(subset(
        weekly_lists$books, weekly_lists$list_id==1)))
hardcover_non_fiction = dplyr::tbl_df( 
    as.data.frame(subset(
        weekly_lists$books, weekly_lists$list_id==2))) 
ebook_fiction = dplyr::tbl_df( 
    as.data.frame(subset(
        weekly_lists$books, weekly_lists$list_id==201)))
ebook_non_fiction = dplyr::tbl_df( 
    as.data.frame(subset(
        weekly_lists$books, weekly_lists$list_id==202)))

# Final book list dataframes
hardcover_fiction

## Source: local data frame [5 x 13]
## 
##   age_group            author          contributor contributor_note
## 1                   Lee Child         by Lee Child                 
## 2           David Lagercrantz by David Lagercrantz                 
## 3                    J D Robb        by J. D. Robb                 
## 4                  Harper Lee        by Harper Lee                 
## 5               Anthony Doerr     by Anthony Doerr                 
## Variables not shown: created_date (chr), description (chr), price (int),
##   primary_isbn13 (chr), primary_isbn10 (chr), publisher (chr), rank (int),
##   title (chr), updated_date (chr)

hardcover_non_fiction

## Source: local data frame [5 x 13]
## 
##   age_group                                   author
## 1                                       Mindy Kaling
## 2                                   Ta-Nehisi Coates
## 3                                          Mary Karr
## 4           Suzy Favor Hamilton with Sarah Tomlinson
## 5                                        David Brock
## Variables not shown: contributor (chr), contributor_note (chr),
##   created_date (chr), description (chr), price (int), primary_isbn13
##   (chr), primary_isbn10 (chr), publisher (chr), rank (int), title (chr),
##   updated_date (chr)

ebook_fiction

## Source: local data frame [5 x 13]
## 
##   age_group                           author
## 1                                   J D Robb
## 2                              Meredith Wild
## 3                                  Lee Child
## 4                                  Andy Weir
## 5           Catherine Coulter and JT Ellison
## Variables not shown: contributor (chr), contributor_note (chr),
##   created_date (chr), description (chr), price (int), primary_isbn13
##   (chr), primary_isbn10 (chr), publisher (chr), rank (int), title (chr),
##   updated_date (chr)

ebook_non_fiction

## Source: local data frame [5 x 13]
## 
##   age_group                                   author
## 1                                       Mindy Kaling
## 2                                 Caroline Moorehead
## 3           Suzy Favor Hamilton with Sarah Tomlinson
## 4               James MacGregor Burns and Susan Dunn
## 5                                    Harry Bernstein
## Variables not shown: contributor (chr), contributor_note (chr),
##   created_date (chr), description (chr), price (int), primary_isbn13
##   (chr), primary_isbn10 (chr), publisher (chr), rank (int), title (chr),
##   updated_date (chr)
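
The four extraction steps above follow the same pattern: pick the row with a given `list_id` and take its nested `books` table. As a sketch of how that repetition could be factored out, the helper below looks up one list by id. The name `get_books` and the mock data are hypothetical. The sketch assumes, as in the response above, that the lists table has a `list_id` column and a `books` list-column holding one data frame per list.

```r
# Hypothetical helper: return the books table for a single list_id.
get_books <- function(lists, id) {
  idx <- which(lists$list_id == id)
  if (length(idx) != 1) {
    stop("Expected exactly one list with list_id ", id, ", found ", length(idx))
  }
  lists$books[[idx]]
}

# Small mock with the same shape as weekly_lists (titles and ranks are made up)
mock_lists <- data.frame(list_id = c(1L, 2L), updated = c("WEEKLY", "WEEKLY"))
mock_lists$books <- list(
  data.frame(rank = 1:2, title = c("FICTION A", "FICTION B")),
  data.frame(rank = 1:2, title = c("NONFICTION A", "NONFICTION B"))
)

# Named vector of ids, using two of the ids from the code above
list_ids <- c(hardcover_fiction = 1, hardcover_non_fiction = 2)

# One books table per list, named after the list
books_by_list <- lapply(list_ids, function(id) get_books(mock_lists, id))
print(names(books_by_list))
```

Applied to the real data, `get_books(weekly_lists, 201)` would play the role of the `subset()` call used for `ebook_fiction`. The explicit length check also makes a missing or duplicated id fail loudly.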

jdeblase

October 30, 2015
