The New York Times offers a variety of APIs that provide access to its data. To begin, sign up for an API key at the New York Times Developer Network (https://developer.nytimes.com/apis). The task is to choose one of these APIs, build an interface in R that reads its JSON data, and convert the result into an R data frame.
I use the Books API to retrieve the history of the Best Sellers lists.
library(jsonlite)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
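# Build the request URL; the API key is read from the TIMES_API_KEY environment variable.
url <- paste0('https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?api-key=', Sys.getenv("TIMES_API_KEY"))
# Fetch the JSON response and keep its `results` element as a data frame.
best_seller_data <- fromJSON(url)$results %>%
  as.data.frame()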
print(names(best_seller_data))
## [1] "title" "description" "contributor" "author"
## [5] "contributor_note" "price" "age_group" "publisher"
## [9] "isbns" "ranks_history" "reviews"
head(best_seller_data)
## title
## 1 "I GIVE YOU MY BODY ..."
## 2 "MOST BLESSED OF THE PATRIARCHS"
## 3 "YOU JUST NEED TO LOSE WEIGHT"
## 4 #ASKGARYVEE
## 5 #GIRLBOSS
## 6 #IMOMSOHARD
## description
## 1 The author of the Outlander novels gives tips on writing sex scenes, drawing on examples from the books.
## 2 A character study that attempts to make sense of Jefferson’s contradictions.
## 3 The co-host of the podcast “Maintenance Phase” examines myths about gaining and losing weight to dismantle anti-fat bias.
## 4 The entrepreneur expands on subjects addressed on his Internet show, like marketing, management and social media.
## 5 An online fashion retailer traces her path to success.
## 6
## contributor author
## 1 by Diana Gabaldon Diana Gabaldon
## 2 by Annette Gordon-Reed and Peter S. Onuf Annette Gordon-Reed and Peter S Onuf
## 3 by Aubrey Gordon Aubrey Gordon
## 4 by Gary Vaynerchuk Gary Vaynerchuk
## 5 by Sophia Amoruso Sophia Amoruso
## 6 by Kristin Hensley and Jen Smedley Kristin Hensley and Jen Smedley
## contributor_note price age_group publisher
## 1 0.00 Dell
## 2 0.00 Liveright
## 3 0.00 Beacon
## 4 0.00 HarperCollins
## 5 0.00 Portfolio/Penguin/Putnam
## 6 0.00 HarperOne
## isbns
## 1 0399178570, 9780399178573
## 2 0871404427, 9780871404428
## 3 0807006475, 0807006483, 9780807006474, 9780807006481
## 4 0062273124, 0062273132, 9780062273123, 9780062273130
## 5 039916927X, 1591847931, 9780399169274, 9781591847939
## 6 006285769X, 9780062857699
## ranks_history
## 1 0399178570, 9780399178573, 8, Advice How-To and Miscellaneous, Advice, How-To & Miscellaneous, 2016-09-04, 2016-08-20, 1, 0, 0, 0
## 2 0871404427, 9780871404428, 16, Hardcover Nonfiction, Hardcover Nonfiction, 2016-05-01, 2016-04-16, 1, 0, 1, 0
## 3 0807006475, 0807006475, 9780807006474, 9780807006474, 2, 6, Paperback Nonfiction, Combined Print and E-Book Nonfiction, Paperback Nonfiction, Combined Print & E-Book Nonfiction, 2023-01-29, 2023-01-29, 2023-01-14, 2023-01-14, 1, 1, 0, 0, 0, 0, 0, 0
## 4 0062273124, 0062273124, 9780062273123, 9780062273123, 5, 6, Business Books, Advice How-To and Miscellaneous, Business, Advice, How-To & Miscellaneous, 2016-04-10, 2016-03-27, 2016-03-26, 2016-03-12, 0, 1, 0, 0, 0, 0, 1, 1
## 5 1591847931, 1591847931, 1591847931, 1591847931, 039916927X, 9781591847939, 9781591847939, 9781591847939, 9781591847939, 9780399169274, 8, 9, 9, 8, 10, Business Books, Business Books, Business Books, Business Books, Business Books, Business, Business, Business, Business, Business, 2016-03-13, 2016-01-17, 2015-12-13, 2015-11-15, 2014-11-09, 2016-02-27, 2016-01-02, 2015-11-28, 2015-10-31, 2014-10-25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## 6 006285769X, 9780062857699, 10, Advice How-To and Miscellaneous, Advice, How-To & Miscellaneous, 2019-04-21, 2019-04-06, 1, 0, 0, 1
## reviews
## 1 , , ,
## 2 , , ,
## 3 , , ,
## 4 , , ,
## 5 , , ,
## 6 , , ,
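If TIMES_API_KEY is unset, Sys.getenv() returns an empty string, so the URL built above ends in an empty api-key= parameter and the request would presumably be rejected. A minimal sketch of a guard follows. It assumes a hypothetical helper, build_history_url(), which is not part of the original code or of any package:

```r
# Hypothetical helper: builds the Best Sellers history URL and fails early
# when no API key is available.
build_history_url <- function(api_key = Sys.getenv("TIMES_API_KEY")) {
  if (!nzchar(api_key)) {
    stop("TIMES_API_KEY is not set; add it to .Renviron or call Sys.setenv().")
  }
  paste0(
    "https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json",
    "?api-key=", api_key
  )
}

# Example with a placeholder key (not a real credential).
build_history_url("demo-key")
```

The returned string can be passed to fromJSON() exactly as url is above.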
Now, let’s tidy up our data a bit.
JSON fields can hold nested structures. Here, isbns can hold several ISBNs for a single book, and ranks_history holds a book's historical rankings, so fromJSON returns both as list-columns. The unnest_wider function from tidyr flattens a list-column into separate columns, one per nested field. This makes the data easier to work with because each piece of information sits in its own column instead of inside a list.
For example:
Before flattening, the isbns column might look like this: [{isbn10: “1234567890”, isbn13: “123-4567890123”}]. After flattening, it would be split into two columns: isbns_isbn10 and isbns_isbn13.
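The flattening described above can be tried on a small toy table. This is a sketch under assumptions: the titles and ISBN values are made up, and each isbns entry is a simple named list. In the real API response each entry may be a data frame with several rows, so the widened columns there can look different.

```r
library(tibble)
library(tidyr)

# Made-up data for illustration only; not taken from the API response.
toy <- tibble(
  title = c("BOOK A", "BOOK B"),
  isbns = list(
    list(isbn10 = "1234567890", isbn13 = "9781234567897"),
    list(isbn10 = "0987654321", isbn13 = "9780987654329")
  )
)

# Each named element of the list-column becomes its own column,
# prefixed with the parent column name and the separator.
flat <- unnest_wider(toy, isbns, names_sep = "_")
names(flat)
# "title" "isbns_isbn10" "isbns_isbn13"
```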
library(tidyr)
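# Widen the nested list-columns; names_sep prefixes each new column with its parent name.
tidy_best_seller_data <- best_seller_data %>%
  unnest_wider(isbns, names_sep = "_") %>%
  unnest_wider(ranks_history, names_sep = "_") %>%
  unnest_wider(reviews, names_sep = "_") %>%
  # Keep only the descriptive columns; the widened columns are dropped here.
  select(title, author, publisher, description, price, age_group, contributor_note)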
head(tidy_best_seller_data)
## # A tibble: 6 × 7
## title author publisher description price age_group contributor_note
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 "\"I GIVE YOU M… Diana… Dell "The autho… 0.00 "" ""
## 2 "\"MOST BLESSED… Annet… Liveright "A charact… 0.00 "" ""
## 3 "\"YOU JUST NEE… Aubre… Beacon "The co-ho… 0.00 "" ""
## 4 "#ASKGARYVEE" Gary … HarperCo… "The entre… 0.00 "" ""
## 5 "#GIRLBOSS" Sophi… Portfoli… "An online… 0.00 "" ""
## 6 "#IMOMSOHARD" Krist… HarperOne "" 0.00 "" ""
Next, I find the five authors with the most best sellers. I sort first by best_seller_count (the number of rows per author) and then by book_count (the number of distinct titles).
library(ggplot2)
# Count rows and distinct titles per author, then keep the top five.
author_counts <- tidy_best_seller_data %>%
  group_by(author) %>%
  summarise(best_seller_count = n(),
            book_count = n_distinct(title)) %>%
  arrange(desc(best_seller_count), desc(book_count)) %>%
  slice_head(n = 5)
print(author_counts)
## # A tibble: 5 × 3
## author best_seller_count book_count
## <chr> <int> <int>
## 1 Action Bronson with Rachel Wharton 1 1
## 2 Amanda Quick 1 1
## 3 Annette Gordon-Reed and Peter S Onuf 1 1
## 4 Aubrey Gordon 1 1
## 5 Chris Guillebeau 1 1
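In the output above every author has a count of 1, so the two sort keys never break a tie and the rows stay in alphabetical group order. The sketch below uses a made-up table (hypothetical authors A to D, not API data) to show how the same pipeline orders authors when the counts differ:

```r
library(dplyr)

# Made-up rows for illustration only; not taken from the API response.
toy <- data.frame(
  author = c("A", "A", "B", "C", "C", "C", "D", "D"),
  title  = c("T1", "T2", "T3", "T4", "T4", "T5", "T6", "T6")
)

toy_counts <- toy %>%
  group_by(author) %>%
  summarise(best_seller_count = n(),              # rows per author
            book_count = n_distinct(title)) %>%   # distinct titles per author
  arrange(desc(best_seller_count), desc(book_count))

toy_counts$author
# "C" "A" "D" "B"
```

Authors A and D both have two rows, but A has two distinct titles and D only one, so the secondary key places A first.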
The goal of this assignment was to read JSON data from the New York Times API and transform it into an R data frame. I used unnest_wider to tidy the NYT Best Sellers data.