The goal of this project is to retrieve the weekly NY Times Best-Seller lists for ebooks and hardcover books through the NY Times web API, in a form that is useful for downstream analysis. For this assignment, we look only at the lists returned when requesting the published date 2015-10-01.
We build the URL in several parts, keeping the base URL separate from the request and the key. This makes the code more flexible, since the request portion of the string can be swapped out or reformatted to handle other API requests. It also makes debugging easier.
url_base = "http://api.nytimes.com/svc/books/v3/lists/"
request_url = "overview.json?published_date=2015-10-01"
key = "&api-key=364724d5a4e5edc752b95518f02ca848:5:73313805"
request = paste0(url_base, request_url, key)
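The same idea can be wrapped in a small function so that the endpoint and date are easy to swap. This is only a sketch: build_nyt_url is a hypothetical helper that is not part of the original script, and "YOUR_API_KEY" is a placeholder rather than a working key.

```r
# Hypothetical helper (not in the original script): assemble a request URL
# from the base URL, an endpoint, a published date, and an API key.
build_nyt_url <- function(endpoint, published_date, api_key,
                          url_base = "http://api.nytimes.com/svc/books/v3/lists/") {
  paste0(url_base, endpoint,
         "?published_date=", published_date,
         "&api-key=", api_key)
}

# "YOUR_API_KEY" is a placeholder; substitute a real key before sending a request.
example_url <- build_nyt_url("overview.json", "2015-10-01", "YOUR_API_KEY")
print(example_url)
```

Keeping the key as an argument also means it does not have to be hard-coded next to the rest of the request.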
Since the API returns JSON, we use jsonlite::fromJSON, which sends the GET request to the NY Times API and parses the response.
package = jsonlite::fromJSON(request)
str(package, max.level = 2)
## List of 4
## $ status : chr "OK"
## $ copyright : chr "Copyright (c) 2015 The New York Times Company. All Rights Reserved."
## $ num_results: int 215
## $ results :List of 3
## ..$ bestsellers_date: chr "2015-09-19"
## ..$ published_date : chr "2015-10-04"
## ..$ lists :'data.frame': 43 obs. of 6 variables:
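Before drilling into the result, it can help to confirm that the response has the expected shape. The sketch below assumes the top-level structure shown in the str() output above. check_response is a hypothetical helper, and mock_package is a simplified stand-in for the real response so that the example runs without a network call.

```r
# Hypothetical helper: stop early if the parsed response does not look usable.
check_response <- function(pkg) {
  if (!identical(pkg$status, "OK")) {
    stop("API returned status: ", pkg$status)
  }
  if (is.null(pkg$results$lists)) {
    stop("Response contains no 'lists' element")
  }
  invisible(pkg)
}

# Simplified stand-in for the parsed response. The top-level values are copied
# from the str() output; the real 'lists' data frame has 43 rows and 6 columns.
mock_package <- list(
  status = "OK",
  num_results = 215L,
  results = list(
    bestsellers_date = "2015-09-19",
    published_date = "2015-10-04",
    lists = data.frame(list_id = c(1L, 2L))
  )
)

checked <- check_response(mock_package)
```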
The API returns a nested object that must be drilled into to reach the result data. The tables are large, so we wrap them with dplyr::tbl_df to keep the console output readable.
# Build a data frame of all book lists (a dplyr table, for convenience)
book_lists = dplyr::tbl_df(package$results$lists)
# Subset to include only the lists that are updated weekly
weekly_lists = book_lists[book_lists$updated == "WEEKLY",]
# Make one data frame per book list category
hardcover_fiction = dplyr::tbl_df(
  as.data.frame(subset(
    weekly_lists$books, weekly_lists$list_id == 1)))
hardcover_non_fiction = dplyr::tbl_df(
  as.data.frame(subset(
    weekly_lists$books, weekly_lists$list_id == 2)))
ebook_fiction = dplyr::tbl_df(
  as.data.frame(subset(
    weekly_lists$books, weekly_lists$list_id == 201)))
ebook_non_fiction = dplyr::tbl_df(
  as.data.frame(subset(
    weekly_lists$books, weekly_lists$list_id == 202)))
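The four extractions above differ only in the list_id, so the lookup can be factored into one function. This is a minimal sketch rather than the original method: get_books is a hypothetical helper, and mock_lists is a small stand-in for the nested structure (one row per list, with a list-column of data frames), assumed to match the shape of weekly_lists.

```r
# Stand-in for the nested structure: one row per list, with a 'books'
# list-column holding one data frame per list. Titles are made up.
mock_lists <- data.frame(list_id = c(1L, 2L, 201L))
mock_lists$books <- list(
  data.frame(rank = 1:2, title = c("A", "B"), stringsAsFactors = FALSE),
  data.frame(rank = 1L, title = "C", stringsAsFactors = FALSE),
  data.frame(rank = 1L, title = "D", stringsAsFactors = FALSE)
)

# Hypothetical helper: return the books data frame for a single list_id,
# failing loudly if the id is missing or duplicated.
get_books <- function(lists, id) {
  idx <- which(lists$list_id == id)
  if (length(idx) != 1) {
    stop("Expected exactly one list with list_id ", id)
  }
  lists$books[[idx]]
}

mock_hardcover_fiction <- get_books(mock_lists, 1)
mock_ebook_fiction <- get_books(mock_lists, 201)
```

Indexing with [[ ]] returns the inner data frame directly, which avoids the subset and as.data.frame round trip.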
# Final book list data frames
hardcover_fiction
## Source: local data frame [5 x 13]
##
##   age_group            author          contributor contributor_note
## 1                   Lee Child         by Lee Child
## 2           David Lagercrantz by David Lagercrantz
## 3                    J D Robb        by J. D. Robb
## 4                  Harper Lee        by Harper Lee
## 5               Anthony Doerr     by Anthony Doerr
## Variables not shown: created_date (chr), description (chr), price (int),
## primary_isbn13 (chr), primary_isbn10 (chr), publisher (chr), rank (int),
## title (chr), updated_date (chr)
hardcover_non_fiction
## Source: local data frame [5 x 13]
##
##   age_group                                   author
## 1                                       Mindy Kaling
## 2                                   Ta-Nehisi Coates
## 3                                          Mary Karr
## 4           Suzy Favor Hamilton with Sarah Tomlinson
## 5                                        David Brock
## Variables not shown: contributor (chr), contributor_note (chr),
## created_date (chr), description (chr), price (int), primary_isbn13
## (chr), primary_isbn10 (chr), publisher (chr), rank (int), title (chr),
## updated_date (chr)
ebook_fiction
## Source: local data frame [5 x 13]
##
##   age_group                           author
## 1                                   J D Robb
## 2                              Meredith Wild
## 3                                  Lee Child
## 4                                  Andy Weir
## 5           Catherine Coulter and JT Ellison
## Variables not shown: contributor (chr), contributor_note (chr),
## created_date (chr), description (chr), price (int), primary_isbn13
## (chr), primary_isbn10 (chr), publisher (chr), rank (int), title (chr),
## updated_date (chr)
ebook_non_fiction
## Source: local data frame [5 x 13]
##
##   age_group                                   author
## 1                                       Mindy Kaling
## 2                                 Caroline Moorehead
## 3           Suzy Favor Hamilton with Sarah Tomlinson
## 4               James MacGregor Burns and Susan Dunn
## 5                                    Harry Bernstein
## Variables not shown: contributor (chr), contributor_note (chr),
## created_date (chr), description (chr), price (int), primary_isbn13
## (chr), primary_isbn10 (chr), publisher (chr), rank (int), title (chr),
## updated_date (chr)
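For downstream analysis it may be convenient to stack the four tables into one data frame with a column recording which list each book came from. The sketch below is one possible approach under stated assumptions: label_and_stack is a hypothetical helper, and the small mock tables stand in for the real ones, which are assumed to share the same columns.

```r
# Hypothetical helper: add a 'list_name' column to each data frame in a named
# list, then stack them row-wise. Assumes all data frames share the same columns.
label_and_stack <- function(named_tables) {
  labelled <- Map(function(df, name) {
    df$list_name <- name
    df
  }, named_tables, names(named_tables))
  out <- do.call(rbind, unname(labelled))
  rownames(out) <- NULL
  out
}

# Small stand-ins for two of the book list tables; titles are made up.
mock_tables <- list(
  hardcover_fiction = data.frame(rank = 1:2, title = c("A", "B"),
                                 stringsAsFactors = FALSE),
  ebook_fiction = data.frame(rank = 1L, title = "C",
                             stringsAsFactors = FALSE)
)

all_books <- label_and_stack(mock_tables)
print(all_books)
```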