Being able to interact with and extract data from APIs is a critical skill for a data scientist. For this project, I will work with The New York Times website API. In looking at the documentation available to developers, there are several different APIs to choose from. As a father with two young daughters who love to read, the Books API caught my attention. The goal of this project will be to get the current list of bestselling children’s books.
In order to access The New York Times API, you need to request an API key, which was a simple and painless process. With key in hand, we’re ready to get started. Let’s load the libraries we will need for this project.
library(httr)
library(tidyverse)
## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------------ tidyverse 1.3.0 --
## v ggplot2 3.2.1 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts --------------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(jsonlite)
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:purrr':
##
## flatten
library(xml2)
According to the NY Times API documentation for the Books API, all URIs are relative to the following path: https://api.nytimes.com/svc/books/v3. Any API calls we make will start with this path, and then we will add additional arguments as we navigate to different sections of data. Also worth noting: based on the documentation, responses will be in JSON format.
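As a small sketch of how such request URLs might be assembled, the helper below pastes a service path onto the base path and appends the key as an api-key query parameter. The function name nyt_books_url and the NYT_API_KEY environment variable are assumptions made for illustration, not something the NYT documentation prescribes; keeping the key in an environment variable simply avoids pasting it into the script.

```r
# Base path quoted from the Books API documentation.
base_url <- "https://api.nytimes.com/svc/books/v3"

# Hypothetical helper: join the base path, a service path, and the API key.
# NYT_API_KEY is an assumed environment variable name.
nyt_books_url <- function(path, api_key = Sys.getenv("NYT_API_KEY")) {
  paste0(base_url, path, "?api-key=", api_key)
}

# Build a URL with a placeholder key (no request is sent here).
example_url <- nyt_books_url("/lists/names.json", api_key = "YOUR_KEY")
example_url
```

Nothing is sent over the network here; the resulting string is what would be handed to a function such as fromJSON().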
In order to get access to the NY Times Bestseller list for Children’s Books, we first need to specify the exact list we want to look at. To do this we can make a request to the List Names service within the API. This service returns a list of all the NYT Best Sellers lists. It also includes other helpful information such as how often the list is updated and when it was last updated.
To get started, I’ll make a call to the List Names service so we can see what lists are available for us to use in our next step. The call will look something like this: Base URL/lists/names?api-key. As the response will be in JSON, we will use the fromJSON() function from the jsonlite library to make this call. The call returns a list, with the second item of the list being the data we requested. I’ll index the list for the second item, and then use the as.data.frame() function to convert it to a data frame.
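# Read the key from an environment variable (NYT_API_KEY is an assumed name) instead of hard-coding it
api_key <- Sys.getenv("NYT_API_KEY")
lists <- jsonlite::fromJSON(paste0("https://api.nytimes.com/svc/books/lists/names.json?api-key=", api_key))
lists <- as.data.frame(lists[2])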
lists
## body.results.list_name
## 1 Combined Print and E-Book Fiction
## 2 Combined Print and E-Book Nonfiction
## 3 Hardcover Fiction
## 4 Hardcover Nonfiction
## 5 Trade Fiction Paperback
## 6 Mass Market Paperback
## 7 Paperback Nonfiction
## 8 E-Book Fiction
## 9 E-Book Nonfiction
## 10 Hardcover Advice
## 11 Paperback Advice
## 12 Advice How-To and Miscellaneous
## 13 Hardcover Graphic Books
## 14 Paperback Graphic Books
## 15 Manga
## 16 Combined Print Fiction
## 17 Combined Print Nonfiction
## 18 Chapter Books
## 19 Childrens Middle Grade
## 20 Childrens Middle Grade E-Book
## 21 Childrens Middle Grade Hardcover
## 22 Childrens Middle Grade Paperback
## 23 Paperback Books
## 24 Picture Books
## 25 Series Books
## 26 Young Adult
## 27 Young Adult E-Book
## 28 Young Adult Hardcover
## 29 Young Adult Paperback
## 30 Animals
## 31 Audio Fiction
## 32 Audio Nonfiction
## 33 Business Books
## 34 Celebrities
## 35 Crime and Punishment
## 36 Culture
## 37 Education
## 38 Espionage
## 39 Expeditions Disasters and Adventures
## 40 Fashion Manners and Customs
## 41 Food and Fitness
## 42 Games and Activities
## 43 Graphic Books and Manga
## 44 Hardcover Business Books
## 45 Health
## 46 Humor
## 47 Indigenous Americans
## 48 Relationships
## 49 Mass Market Monthly
## 50 Middle Grade Paperback Monthly
## 51 Paperback Business Books
## 52 Family
## 53 Hardcover Political Books
## 54 Race and Civil Rights
## 55 Religion Spirituality and Faith
## 56 Science
## 57 Sports
## 58 Travel
## 59 Young Adult Paperback Monthly
## body.results.display_name
## 1 Combined Print & E-Book Fiction
## 2 Combined Print & E-Book Nonfiction
## 3 Hardcover Fiction
## 4 Hardcover Nonfiction
## 5 Paperback Trade Fiction
## 6 Paperback Mass-Market Fiction
## 7 Paperback Nonfiction
## 8 E-Book Fiction
## 9 E-Book Nonfiction
## 10 Hardcover Advice & Misc.
## 11 Paperback Advice & Misc.
## 12 Advice, How-To & Miscellaneous
## 13 Hardcover Graphic Books
## 14 Paperback Graphic Books
## 15 Manga
## 16 Combined Hardcover & Paperback Fiction
## 17 Combined Hardcover & Paperback Nonfiction
## 18 Children’s Chapter Books
## 19 Children’s Middle Grade
## 20 Children’s Middle Grade E-Book
## 21 Children’s Middle Grade Hardcover
## 22 Children’s Middle Grade Paperback
## 23 Children’s Paperback Books
## 24 Children’s Picture Books
## 25 Children’s Series
## 26 Young Adult
## 27 Young Adult E-Book
## 28 Young Adult Hardcover
## 29 Young Adult Paperback
## 30 Animals
## 31 Audio Fiction
## 32 Audio Nonfiction
## 33 Business
## 34 Celebrities
## 35 Crime and Punishment
## 36 Culture
## 37 Education
## 38 Espionage
## 39 Expeditions
## 40 Fashion, Manners and Customs
## 41 Food and Diet
## 42 Games and Activities
## 43 Graphic Books and Manga
## 44 Hardcover Business Books
## 45 Health
## 46 Humor
## 47 Indigenous Americans
## 48 Love and Relationships
## 49 Mass Market
## 50 Middle Grade Paperback
## 51 Paperback Business Books
## 52 Parenthood and Family
## 53 Politics and American History
## 54 Race and Civil Rights
## 55 Religion, Spirituality and Faith
## 56 Science
## 57 Sports and Fitness
## 58 Travel
## 59 Young Adult Paperback
## body.results.list_name_encoded body.results.oldest_published_date
## 1 combined-print-and-e-book-fiction 2011-02-13
## 2 combined-print-and-e-book-nonfiction 2011-02-13
## 3 hardcover-fiction 2008-06-08
## 4 hardcover-nonfiction 2008-06-08
## 5 trade-fiction-paperback 2008-06-08
## 6 mass-market-paperback 2008-06-08
## 7 paperback-nonfiction 2008-06-08
## 8 e-book-fiction 2011-02-13
## 9 e-book-nonfiction 2011-02-13
## 10 hardcover-advice 2008-06-08
## 11 paperback-advice 2008-06-08
## 12 advice-how-to-and-miscellaneous 2013-04-28
## 13 hardcover-graphic-books 2009-03-15
## 14 paperback-graphic-books 2009-03-15
## 15 manga 2009-03-15
## 16 combined-print-fiction 2011-02-13
## 17 combined-print-nonfiction 2011-02-13
## 18 chapter-books 2008-06-08
## 19 childrens-middle-grade 2012-12-16
## 20 childrens-middle-grade-e-book 2015-08-30
## 21 childrens-middle-grade-hardcover 2015-08-30
## 22 childrens-middle-grade-paperback 2015-08-30
## 23 paperback-books 2008-06-08
## 24 picture-books 2008-06-08
## 25 series-books 2008-06-08
## 26 young-adult 2012-12-16
## 27 young-adult-e-book 2015-08-30
## 28 young-adult-hardcover 2015-08-30
## 29 young-adult-paperback 2015-08-30
## 30 animals 2014-09-07
## 31 audio-fiction 2018-03-11
## 32 audio-nonfiction 2018-03-11
## 33 business-books 2013-11-03
## 34 celebrities 2014-09-07
## 35 crime-and-punishment 2014-10-12
## 36 culture 2014-10-12
## 37 education 2014-10-12
## 38 espionage 2014-12-14
## 39 expeditions-disasters-and-adventures 2014-12-14
## 40 fashion-manners-and-customs 2014-10-12
## 41 food-and-fitness 2013-09-01
## 42 games-and-activities 2014-10-12
## 43 graphic-books-and-manga 2019-10-13
## 44 hardcover-business-books 2011-07-03
## 45 health 2014-10-12
## 46 humor 2014-09-07
## 47 indigenous-americans 2014-12-14
## 48 relationships 2014-09-07
## 49 mass-market-monthly 2019-10-13
## 50 middle-grade-paperback-monthly 2019-10-13
## 51 paperback-business-books 2011-07-03
## 52 family 2014-09-07
## 53 hardcover-political-books 2011-07-03
## 54 race-and-civil-rights 2014-12-14
## 55 religion-spirituality-and-faith 2014-09-07
## 56 science 2013-04-14
## 57 sports 2014-03-02
## 58 travel 2014-09-07
## 59 young-adult-paperback-monthly 2019-10-13
## body.results.newest_published_date body.results.updated body.num_results
## 1 2020-04-05 WEEKLY 59
## 2 2020-04-05 WEEKLY 59
## 3 2020-04-05 WEEKLY 59
## 4 2020-04-05 WEEKLY 59
## 5 2020-04-05 WEEKLY 59
## 6 2017-01-29 WEEKLY 59
## 7 2020-04-05 WEEKLY 59
## 8 2017-01-29 WEEKLY 59
## 9 2017-01-29 WEEKLY 59
## 10 2013-04-21 WEEKLY 59
## 11 2013-04-21 WEEKLY 59
## 12 2020-04-05 WEEKLY 59
## 13 2017-01-29 WEEKLY 59
## 14 2017-01-29 WEEKLY 59
## 15 2017-01-29 WEEKLY 59
## 16 2013-05-12 WEEKLY 59
## 17 2013-05-12 WEEKLY 59
## 18 2012-12-09 WEEKLY 59
## 19 2015-08-23 WEEKLY 59
## 20 2017-01-29 WEEKLY 59
## 21 2020-04-05 WEEKLY 59
## 22 2017-01-29 WEEKLY 59
## 23 2012-12-09 WEEKLY 59
## 24 2020-04-05 WEEKLY 59
## 25 2020-04-05 WEEKLY 59
## 26 2015-08-23 WEEKLY 59
## 27 2017-01-29 WEEKLY 59
## 28 2020-04-05 WEEKLY 59
## 29 2017-01-29 WEEKLY 59
## 30 2017-01-15 MONTHLY 59
## 31 2020-03-15 MONTHLY 59
## 32 2020-03-15 MONTHLY 59
## 33 2020-03-15 MONTHLY 59
## 34 2017-01-15 MONTHLY 59
## 35 2017-01-15 MONTHLY 59
## 36 2017-01-15 MONTHLY 59
## 37 2017-01-15 MONTHLY 59
## 38 2017-01-15 MONTHLY 59
## 39 2017-01-15 MONTHLY 59
## 40 2017-01-15 MONTHLY 59
## 41 2017-01-15 MONTHLY 59
## 42 2017-01-15 MONTHLY 59
## 43 2020-02-16 MONTHLY 59
## 44 2013-10-13 MONTHLY 59
## 45 2017-01-15 MONTHLY 59
## 46 2017-01-15 MONTHLY 59
## 47 2016-01-10 MONTHLY 59
## 48 2017-01-15 MONTHLY 59
## 49 2020-03-15 MONTHLY 59
## 50 2020-03-15 MONTHLY 59
## 51 2013-10-13 MONTHLY 59
## 52 2017-01-15 MONTHLY 59
## 53 2017-01-15 MONTHLY 59
## 54 2017-01-15 MONTHLY 59
## 55 2017-01-15 MONTHLY 59
## 56 2019-09-15 MONTHLY 59
## 57 2019-09-15 MONTHLY 59
## 58 2017-01-15 MONTHLY 59
## 59 2020-03-15 MONTHLY 59
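As an aside, the indexing and conversion steps can be reproduced offline. The sketch below is a minimal illustration of what fromJSON() does with a response of this general shape; the JSON string is hand-written (two made-up rows, not real API output), and the names toy_json, toy, and toy_df are assumptions.

```r
library(jsonlite)

# Hand-written stand-in for a response: a status field plus a body
# that holds an array of records.
toy_json <- '{
  "status": "OK",
  "body": {
    "results": [
      {"list_name": "Picture Books", "updated": "WEEKLY"},
      {"list_name": "Science", "updated": "MONTHLY"}
    ]
  }
}'

# fromJSON() returns a named list; the array of records is simplified
# into a data frame automatically.
toy <- fromJSON(toy_json)
names(toy)
class(toy$body$results)

# Converting the second list element flattens the nesting into
# prefixed column names.
toy_df <- as.data.frame(toy[2])
names(toy_df)
```

This is why the columns in the real output carry the body.results. prefix: the prefix records the path through the nested list.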
The lists data frame has a total of 59 rows, so let’s look at the unique values from the body.results.display_name column to see which lists involve children’s books.
unique(lists$body.results.display_name)
## [1] "Combined Print & E-Book Fiction"
## [2] "Combined Print & E-Book Nonfiction"
## [3] "Hardcover Fiction"
## [4] "Hardcover Nonfiction"
## [5] "Paperback Trade Fiction"
## [6] "Paperback Mass-Market Fiction"
## [7] "Paperback Nonfiction"
## [8] "E-Book Fiction"
## [9] "E-Book Nonfiction"
## [10] "Hardcover Advice & Misc."
## [11] "Paperback Advice & Misc."
## [12] "Advice, How-To & Miscellaneous"
## [13] "Hardcover Graphic Books"
## [14] "Paperback Graphic Books"
## [15] "Manga"
## [16] "Combined Hardcover & Paperback Fiction"
## [17] "Combined Hardcover & Paperback Nonfiction"
## [18] "Children’s Chapter Books"
## [19] "Children’s Middle Grade"
## [20] "Children’s Middle Grade E-Book"
## [21] "Children’s Middle Grade Hardcover"
## [22] "Children’s Middle Grade Paperback"
## [23] "Children’s Paperback Books"
## [24] "Children’s Picture Books"
## [25] "Children’s Series"
## [26] "Young Adult"
## [27] "Young Adult E-Book"
## [28] "Young Adult Hardcover"
## [29] "Young Adult Paperback"
## [30] "Animals"
## [31] "Audio Fiction"
## [32] "Audio Nonfiction"
## [33] "Business"
## [34] "Celebrities"
## [35] "Crime and Punishment"
## [36] "Culture"
## [37] "Education"
## [38] "Espionage"
## [39] "Expeditions"
## [40] "Fashion, Manners and Customs"
## [41] "Food and Diet"
## [42] "Games and Activities"
## [43] "Graphic Books and Manga"
## [44] "Hardcover Business Books"
## [45] "Health"
## [46] "Humor"
## [47] "Indigenous Americans"
## [48] "Love and Relationships"
## [49] "Mass Market"
## [50] "Middle Grade Paperback"
## [51] "Paperback Business Books"
## [52] "Parenthood and Family"
## [53] "Politics and American History"
## [54] "Race and Civil Rights"
## [55] "Religion, Spirituality and Faith"
## [56] "Science"
## [57] "Sports and Fitness"
## [58] "Travel"
In looking at the above list, “Children’s Picture Books” looks like just what I am looking for. Let’s filter the lists data frame down to that row so we can see how often the list is updated, along with the other information we will need to make the API call for the “Children’s Picture Books” bestseller list.
lists %>% filter(body.results.display_name == "Children’s Picture Books")
## body.results.list_name body.results.display_name
## 1 Picture Books Children’s Picture Books
## body.results.list_name_encoded body.results.oldest_published_date
## 1 picture-books 2008-06-08
## body.results.newest_published_date body.results.updated body.num_results
## 1 2020-04-05 WEEKLY 59
In the output above, the “body.results.updated” column shows that this list is updated weekly, which means that our results will be very fresh.
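Rather than reading values off the printed output, the encoded list name can also be pulled out programmatically. The sketch below uses a two-row stand-in for the lists data frame so that it runs without an API key; the toy_lists and encoded_name objects are assumptions, with column names copied from the output above.

```r
library(dplyr)

# Small stand-in with the same column names as the real lists data frame.
# "\u2019" is the curly apostrophe used in the display names.
toy_lists <- data.frame(
  body.results.display_name = c("Children\u2019s Picture Books", "Science"),
  body.results.list_name_encoded = c("picture-books", "science"),
  body.results.updated = c("WEEKLY", "MONTHLY"),
  stringsAsFactors = FALSE
)

# Keep the matching row, then pull out the single value of interest.
encoded_name <- toy_lists %>%
  filter(body.results.display_name == "Children\u2019s Picture Books") %>%
  pull(body.results.list_name_encoded)

encoded_name
```

Storing the value in a variable means the list can be switched later without editing a URL by hand.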
To see the books on the “Children’s Picture Books” best seller list, we’ll have to make a call to the List Data service using the value from the “body.results.list_name_encoded” column, which is “picture-books”. The call is similar to the one we made above, but this time we also specify the date of the list we are after. According to the API documentation, we can use “current” to get the latest list, which is what we want. The List Data service also requires the name of the list we are interested in. The call will look something like this: Base URL/lists/{date}/{encoded list name}.json?api-key
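# The key is read from an environment variable (NYT_API_KEY is an assumed name) instead of being hard-coded
cb <- fromJSON(paste0("https://api.nytimes.com/svc/books/v3/lists/current/picture-books.json?api-key=", Sys.getenv("NYT_API_KEY")))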
children_books <- as.data.frame(cb$results$books)
children_books
## rank rank_last_week weeks_on_list asterisk dagger primary_isbn10
## 1 1 4 19 0 0 006219867X
## 2 2 3 10 0 0 None
## 3 3 5 2 0 0 0062975676
## 4 4 0 65 0 0 142319957X
## 5 5 1 4 0 0 0062404504
## 6 6 2 18 0 0 1492632910
## 7 7 8 227 0 0 0385376715
## 8 8 7 299 0 0 0803736800
## 9 9 9 304 0 0 0399255370
## 10 10 10 170 0 0 0803741715
## primary_isbn13 publisher
## 1 9780062198679 HarperFestival
## 2 9781728221656 Sourcebooks Wonderland
## 3 9780062975676 HarperCollins
## 4 9781423199571 Hyperion
## 5 9780062404503 HarperCollins
## 6 9781492632917 Sourcebooks Jabberwocky
## 7 9780385376716 Random House
## 8 9780803736801 Dial
## 9 9780399255373 Philomel
## 10 9780803741713 Dial
## description price
## 1 A certain rabbit needs Pete's help. 0
## 2 Children attempt to capture the mythical creature. 0
## 3 Good Egg and his pals escape their carton! 0
## 4 Impatient Gerald has to wait for Piggie’s promised surprise. 0
## 5 Pete opens a leprechaun catching business. 0
## 6 This is the year you'll finally catch a leprechaun. 0
## 7 A celebration of future possibilities. 0
## 8 What to serve your dragon-guests. 0
## 9 Problems arise when Duncan’s crayons revolt. 0
## 10 Silly songs and sound effects. 0
## title author
## 1 PETE THE CAT: BIG EASTER ADVENTURE James Dean and Kimberly Dean
## 2 HOW TO CATCH A UNICORN Adam Wallace
## 3 THE GOOD EGG PRESENTS: THE GREAT EGGSCAPE! Jory John
## 4 WAITING IS NOT EASY! Mo Willems
## 5 PETE THE CAT: THE GREAT LEPRECHAUN CHASE James Dean
## 6 HOW TO CATCH A LEPRECHAUN Adam Wallace
## 7 THE WONDERFUL THINGS YOU WILL BE Emily Winfield Martin
## 8 DRAGONS LOVE TACOS Adam Rubin
## 9 THE DAY THE CRAYONS QUIT Drew Daywalt
## 10 THE BOOK WITH NO PICTURES B J Novak
## contributor
## 1 by James Dean and Kimberly Dean
## 2 by Adam Wallace. Illustrated by Andy Elkerton
## 3 by Jory John. Illustrated by Pete Oswald
## 4 by Mo Willems
## 5 by James Dean
## 6 by Adam Wallace. Illustrated by Andy Elkerton
## 7 by Emily Winfield Martin
## 8 by Adam Rubin. Illustrated by Daniel Salmieri
## 9 by Drew Daywalt. Illustrated by Oliver Jeffers
## 10 by B. J. Novak
## contributor_note
## 1
## 2 Illustrated by Andy Elkerton
## 3 Illustrated by Pete Oswald
## 4
## 5
## 6 Illustrated by Andy Elkerton
## 7
## 8 Illustrated by Daniel Salmieri
## 9 Illustrated by Oliver Jeffers
## 10
## book_image book_image_width
## 1 https://s1.nyt.com/du/books/images/9780062198679.jpg 330
## 2 https://s1.nyt.com/du/books/images/9781492669739.jpg 330
## 3 https://s1.nyt.com/du/books/images/9780062975676.jpg 500
## 4 https://s1.nyt.com/du/books/images/9781423199571.jpg 128
## 5 https://s1.nyt.com/du/books/images/9780062404503.jpg 330
## 6 https://s1.nyt.com/du/books/images/9781492632917.jpg 330
## 7 https://s1.nyt.com/du/books/images/9780385376716.jpg 330
## 8 https://s1.nyt.com/du/books/images/9780803736801.jpg 330
## 9 https://s1.nyt.com/du/books/images/9780399255373.jpg 330
## 10 https://s1.nyt.com/du/books/images/9780803741713.jpg 128
## book_image_height
## 1 345
## 2 330
## 3 500
## 4 175
## 5 330
## 6 348
## 7 347
## 8 330
## 9 332
## 10 159
## amazon_product_url
## 1 https://www.amazon.com/Pete-Cat-Big-Easter-Adventure/dp/006219867X?tag=NYTBS-20
## 2 https://www.amazon.com/How-Catch-Unicorn-Adam-Wallace/dp/1492669733?tag=NYTBS-20
## 3 https://www.amazon.com/dp/0062975676?tag=NYTBSREV-20&tag=NYTBS-20
## 4 http://www.amazon.com/Waiting-Easy-Elephant-Piggie-Book/dp/142319957X?tag=NYTBS-20
## 5 https://www.amazon.com/Pete-Cat-Leprechaun-Patricks-Fold-Out/dp/0062404504?tag=NYTBS-20
## 6 http://www.amazon.com/How-Catch-Leprechaun-Adam-Wallace/dp/1492632910?tag=NYTBS-20
## 7 http://www.amazon.com/The-Wonderful-Things-You-Will/dp/0385376715?tag=NYTBS-20
## 8 http://www.amazon.com/Dragons-Love-Tacos-Adam-Rubin/dp/0803736800?tag=NYTBS-20
## 9 http://www.amazon.com/The-Crayons-Quit-Drew-Daywalt/dp/0399255370?tag=NYTBS-20
## 10 http://www.amazon.com/The-Book-Pictures-B-J-Novak/dp/0803741715?tag=NYTBS-20
## age_group book_review_link first_chapter_link
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
## sunday_review_link
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9 https://www.nytimes.com/2013/08/25/books/review/henris-scissors-by-jeanette-winter-and-more.html
## 10 https://www.nytimes.com/2014/11/09/books/review/b-j-novaks-book-with-no-pictures-and-more.html
## article_chapter_link isbns
## 1 006219867X, 9780062198679
## 2 1492669733, 9781492669739
## 3 0062975676, 9780062975676
## 4 142319957X, 9781423199571
## 5 0062404504, 9780062404503
## 6 1492632910, 9781492632917
## 7 0385376715, 198484881X, 9780385376716, 9781984848819
## 8 0803736800, 9780803736801
## 9 0399255370, 9780399255373
## 10 0803741715, 9780803741713
## buy_links
## 1 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, https://www.amazon.com/Pete-Cat-Big-Easter-Adventure/dp/006219867X?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=PETE+THE+CAT%3A+BIG+EASTER+ADVENTURE&author=James+Dean+and+Kimberly+Dean, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780062198679, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FPETE%2BTHE%2BCAT%253A%2BBIG%2BEASTER%2BADVENTURE%2FJames%2BDean%2Band%2BKimberly%2BDean%2F9780062198679, https://www.indiebound.org/book/9780062198679?aff=NYT
## 2 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, https://www.amazon.com/How-Catch-Unicorn-Adam-Wallace/dp/1492669733?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=HOW+TO+CATCH+A+UNICORN&author=Adam+Wallace, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781728221656, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FHOW%2BTO%2BCATCH%2BA%2BUNICORN%2FAdam%2BWallace%2F9781728221656, https://www.indiebound.org/book/9781728221656?aff=NYT
## 3 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, https://www.amazon.com/dp/0062975676?tag=NYTBSREV-20&tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=THE+GOOD+EGG+PRESENTS%3A+THE+GREAT+EGGSCAPE%21&author=Jory+John, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780062975676, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BGOOD%2BEGG%2BPRESENTS%253A%2BTHE%2BGREAT%2BEGGSCAPE%2521%2FJory%2BJohn%2F9780062975676, https://www.indiebound.org/book/9780062975676?aff=NYT
## 4 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, http://www.amazon.com/Waiting-Easy-Elephant-Piggie-Book/dp/142319957X?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=WAITING+IS+NOT+EASY%21&author=Mo+Willems, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781423199571, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FWAITING%2BIS%2BNOT%2BEASY%2521%2FMo%2BWillems%2F9781423199571, https://www.indiebound.org/book/9781423199571?aff=NYT
## 5 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, https://www.amazon.com/Pete-Cat-Leprechaun-Patricks-Fold-Out/dp/0062404504?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=PETE+THE+CAT%3A+THE+GREAT+LEPRECHAUN+CHASE&author=James+Dean, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780062404503, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FPETE%2BTHE%2BCAT%253A%2BTHE%2BGREAT%2BLEPRECHAUN%2BCHASE%2FJames%2BDean%2F9780062404503, https://www.indiebound.org/book/9780062404503?aff=NYT
## 6 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, http://www.amazon.com/How-Catch-Leprechaun-Adam-Wallace/dp/1492632910?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=HOW+TO+CATCH+A+LEPRECHAUN&author=Adam+Wallace, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9781492632917, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FHOW%2BTO%2BCATCH%2BA%2BLEPRECHAUN%2FAdam%2BWallace%2F9781492632917, https://www.indiebound.org/book/9781492632917?aff=NYT
## 7 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, http://www.amazon.com/The-Wonderful-Things-You-Will/dp/0385376715?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=THE+WONDERFUL+THINGS+YOU+WILL+BE&author=Emily+Winfield+Martin, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780385376716, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BWONDERFUL%2BTHINGS%2BYOU%2BWILL%2BBE%2FEmily%2BWinfield%2BMartin%2F9780385376716, https://www.indiebound.org/book/9780385376716?aff=NYT
## 8 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, http://www.amazon.com/Dragons-Love-Tacos-Adam-Rubin/dp/0803736800?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=DRAGONS+LOVE+TACOS&author=Adam+Rubin, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780803736801, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FDRAGONS%2BLOVE%2BTACOS%2FAdam%2BRubin%2F9780803736801, https://www.indiebound.org/book/9780803736801?aff=NYT
## 9 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, http://www.amazon.com/The-Crayons-Quit-Drew-Daywalt/dp/0399255370?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=THE+DAY+THE+CRAYONS+QUIT&author=Drew+Daywalt, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780399255373, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BDAY%2BTHE%2BCRAYONS%2BQUIT%2FDrew%2BDaywalt%2F9780399255373, https://www.indiebound.org/book/9780399255373?aff=NYT
## 10 Amazon, Apple Books, Barnes and Noble, Books-A-Million, Local Booksellers, http://www.amazon.com/The-Book-Pictures-B-J-Novak/dp/0803741715?tag=NYTBS-20, https://du-gae-books-dot-nyt-du-prd.appspot.com/buy?title=THE+BOOK+WITH+NO+PICTURES&author=B+J+Novak, https://www.anrdoezrs.net/click-7990613-11819508?url=https%3A%2F%2Fwww.barnesandnoble.com%2Fw%2F%3Fean%3D9780803741713, https://www.anrdoezrs.net/click-7990613-35140?url=https%3A%2F%2Fwww.booksamillion.com%2Fp%2FTHE%2BBOOK%2BWITH%2BNO%2BPICTURES%2FB%2BJ%2BNovak%2F9780803741713, https://www.indiebound.org/book/9780803741713?aff=NYT
## book_uri
## 1 nyt://book/334ccdea-bbc9-5f6b-9671-64a30fb9279e
## 2 nyt://book/e83910a7-e5e4-595b-80ec-c36c5ba4c535
## 3 nyt://book/8ec65f4b-c50b-5abc-ba62-bcd3cbbff1bc
## 4 nyt://book/870f97fc-efee-5767-a329-6ee56f0492e6
## 5 nyt://book/f8b7eb06-17ef-52ae-9ba9-46918fc90ecd
## 6 nyt://book/3743a4b3-e142-5aa1-ae0c-8bd18e728488
## 7 nyt://book/36cac861-60d3-511f-ba6d-edc88c6e938e
## 8 nyt://book/25d4f970-1f30-515b-a88c-691b4854bc63
## 9 nyt://book/e42bd6ff-8143-53b3-b574-c80553973559
## 10 nyt://book/eeb7a16f-011b-536f-9911-f3b4170139f0
The response above is rich with data. For each of the top 10 sellers we can see the title, description, author, and ISBN, as well as the current rank and the previous week’s rank. There is also a “price” column; however, it contains 0 for every entry, which we know can’t be correct. Fortunately, the “amazon_product_url” column contains the web address of each book on Amazon. We can use this URL to scrape the page, extract the price for each book, and add it to our data frame. I will use the xml2 package to read each page and the rvest package to select the price element. As we are only grabbing one item off of each page, this should be fairly simple. I will first create a function to scrape the price from a page, then apply that function to each value in the “amazon_product_url” column with the purrr::map_chr() function.
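# Scrape the "buy new" price from a single Amazon product page
scraper_func <- function(url) {
  book_page <- xml2::read_html(url)
  book_price <- book_page %>%
    rvest::html_nodes("#buyNewSection .a-text-normal") %>%
    rvest::html_text()
  # map_chr() needs exactly one string per page, so return NA when the selector finds nothing
  if (length(book_price) == 0) NA_character_ else trimws(book_price[1])
}
# Apply the scraper to every product URL, collecting a character vector of prices
childrens_book_prices <- purrr::map_chr(children_books$amazon_product_url, scraper_func)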
childrens_book_prices
## [1] "$5.99" "$5.49" "$8.29" "$5.99" "$5.99" "$7.69" "$10.29" "$9.50"
## [9] "$9.19" "$9.39"
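Because a live page can change its layout or refuse automated requests, it may help to try the selector logic on a small hand-written HTML fragment first. The markup below is invented for illustration and only mimics the #buyNewSection structure that the selector expects; it is not Amazon's actual page source, and the toy_ names are assumptions.

```r
library(xml2)
library(rvest)

# Invented fragment that mimics the structure targeted by the selector.
toy_html <- '<html><body>
  <div id="buyNewSection">
    <span class="a-text-normal">$5.99</span>
  </div>
  <div id="otherSection">
    <span class="a-text-normal">$99.00</span>
  </div>
</body></html>'

# read_html() accepts a literal HTML string as well as a URL.
toy_page <- read_html(toy_html)

# Same CSS selector as in scraper_func(): elements with class
# a-text-normal nested inside the element with id buyNewSection.
toy_price <- toy_page %>%
  html_nodes("#buyNewSection .a-text-normal") %>%
  html_text()

toy_price
```

Only the span inside #buyNewSection is matched; the span with the same class in the other section is ignored, which is the point of scoping the selector by id.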
Now that we have the prices, let’s go ahead and add them to our data frame and remove the original price column. To keep the final output readable, we’ll show only a handful of columns.
children_books <- children_books %>%
  mutate(amazon_price = childrens_book_prices) %>%
  select(-price)
children_books %>% select(title, author, amazon_price)
## title author
## 1 PETE THE CAT: BIG EASTER ADVENTURE James Dean and Kimberly Dean
## 2 HOW TO CATCH A UNICORN Adam Wallace
## 3 THE GOOD EGG PRESENTS: THE GREAT EGGSCAPE! Jory John
## 4 WAITING IS NOT EASY! Mo Willems
## 5 PETE THE CAT: THE GREAT LEPRECHAUN CHASE James Dean
## 6 HOW TO CATCH A LEPRECHAUN Adam Wallace
## 7 THE WONDERFUL THINGS YOU WILL BE Emily Winfield Martin
## 8 DRAGONS LOVE TACOS Adam Rubin
## 9 THE DAY THE CRAYONS QUIT Drew Daywalt
## 10 THE BOOK WITH NO PICTURES B J Novak
## amazon_price
## 1 $5.99
## 2 $5.49
## 3 $8.29
## 4 $5.99
## 5 $5.99
## 6 $7.69
## 7 $10.29
## 8 $9.50
## 9 $9.19
## 10 $9.39
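The scraped prices are character strings, so they cannot yet be sorted or averaged as numbers. Here is a minimal sketch of the conversion, using the ten values printed above and base R only; the variable names are assumptions.

```r
# Prices as returned by the scraper (copied from the output above).
price_strings <- c("$5.99", "$5.49", "$8.29", "$5.99", "$5.99",
                   "$7.69", "$10.29", "$9.50", "$9.19", "$9.39")

# Strip the leading dollar sign, then convert to numeric.
price_numeric <- as.numeric(sub("^\\$", "", price_strings))

# Numeric values support summaries such as the cheapest and priciest book.
range(price_numeric)
```

The same expression could be applied to the amazon_price column inside mutate() if a numeric column is preferred.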
As mentioned above, working with APIs is a critical skill for data scientists. In addition, understanding the response data is paramount. It is important to understand data formats such as XML and JSON in order to work with the response data appropriately. In our case, our responses came in JSON format, but we could have just as easily worked with the data if it had come back as XML.
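To illustrate that last point, here is a small sketch of reading the same kind of record from XML with the xml2 package. The XML snippet is hand-written for illustration (the responses in this post were JSON), and the element names books, book, title, and rank are assumptions rather than anything the API returns.

```r
library(xml2)

# Hand-written XML with two records, reusing titles and ranks from the list above.
toy_xml <- '<books>
  <book><title>DRAGONS LOVE TACOS</title><rank>8</rank></book>
  <book><title>THE DAY THE CRAYONS QUIT</title><rank>9</rank></book>
</books>'

doc <- read_xml(toy_xml)

# Select every <book> element, then read one child value per book.
book_nodes <- xml_find_all(doc, ".//book")

xml_books <- data.frame(
  title = xml_text(xml_find_first(book_nodes, "./title")),
  rank = as.integer(xml_text(xml_find_first(book_nodes, "./rank"))),
  stringsAsFactors = FALSE
)

xml_books
```

The main difference from the JSON workflow is that the data frame has to be assembled by hand from XPath queries, whereas fromJSON() simplifies arrays of records automatically.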