Title: CUNY SPS MDS DATA607_WK9Assignmt

Author: Charles Ugiagbe

Date: 10/24/2021

Assignment Question

The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis

You’ll need to start by signing up for an API key.

Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame.

Approach to Solution

The New York Times API that interested me the most was the Books API. After requesting an API key, I was able to access it from R. I pulled the current hardcover fiction best-seller list from the Books API, which returns its data as JSON. I then used fromJSON() from the jsonlite package to convert the JSON data into R objects.

Load the Required Packages

library(httr)
library(jsonlite)
library(tidyverse)
library(kableExtra)

Read the API

bookAPI <- "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=yrHCGK1wjdTA8abjGAyni4Q1A4RiWACs"
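Hard-coding the key in the URL makes it easy to leak when the report is shared. Below is a minimal sketch of one alternative, assuming the key is stored in an environment variable. The variable name `NYT_API_KEY` and the helper `nyt_books_url()` are hypothetical names chosen for illustration and are not part of the original script.

```r
# Hypothetical helper: build the Books API URL for a given best-seller list.
# The endpoint pattern is copied from the bookAPI string used in this report.
nyt_books_url <- function(list_name, api_key) {
  if (!nzchar(api_key)) {
    stop("API key is empty; set NYT_API_KEY first (assumed variable name).")
  }
  paste0(
    "https://api.nytimes.com/svc/books/v3/lists/current/",
    list_name, ".json?api-key=", api_key
  )
}

# Sys.getenv() returns "" when the variable is unset, so the URL is only
# built when a key is actually available.
api_key <- Sys.getenv("NYT_API_KEY")
if (nzchar(api_key)) {
  bookAPI <- nyt_books_url("hardcover-fiction", api_key)
}
```

With this approach the key could live in a user-level `.Renviron` file, so it never appears in the knitted document.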

Transform the JSON Data into an R Data Frame and Glimpse Its Structure

fiction_books <- fromJSON(bookAPI)[[5]][[11]]
glimpse(fiction_books)
## Rows: 15
## Columns: 26
## $ rank                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
## $ rank_last_week       <int> 0, 2, 1, 3, 5, 0, 8, 0, 6, 7, 9, 4, 0, 13, 10
## $ weeks_on_list        <int> 1, 3, 2, 3, 5, 1, 24, 1, 5, 2, 11, 2, 45, 6, 4
## $ asterisk             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ dagger               <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ primary_isbn10       <chr> "198217367X", "1538728621", "0735222355", "198216~
## $ primary_isbn13       <chr> "9781982173678", "9781538728628", "9780735222359"~
## $ publisher            <chr> "Simon & Schuster, St. Martin's", "Grand Central"~
## $ description          <chr> "In the wake of the previous administration’s mis~
## $ price                <chr> "0.00", "0.00", "0.00", "0.00", "0.00", "0.00", "~
## $ title                <chr> "STATE OF TERROR", "THE WISH", "THE LINCOLN HIGHW~
## $ author               <chr> "Hillary Rodham Clinton and Louise Penny", "Nicho~
## $ contributor          <chr> "by Hillary Rodham Clinton and Louise Penny", "by~
## $ contributor_note     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "~
## $ book_image           <chr> "https://storage.googleapis.com/du-prd/books/imag~
## $ book_image_width     <int> 331, 330, 331, 331, 329, 331, 331, 336, 329, 328,~
## $ book_image_height    <int> 500, 500, 500, 500, 500, 500, 500, 500, 500, 500,~
## $ amazon_product_url   <chr> "https://www.amazon.com/dp/198217367X?tag=NYTBSRE~
## $ age_group            <chr> "", "", "", "", "", "", "", "", "", "", "", "", "~
## $ book_review_link     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "~
## $ first_chapter_link   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "~
## $ sunday_review_link   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "~
## $ article_chapter_link <chr> "", "", "", "", "", "", "", "", "", "", "", "", "~
## $ isbns                <list> [<data.frame[2 x 2]>], [<data.frame[1 x 2]>], [<d~
## $ buy_links            <list> [<data.frame[6 x 2]>], [<data.frame[6 x 2]>], [<d~
## $ book_uri             <chr> "nyt://book/ee38f9b9-6787-5e00-b068-7d2f712aa3fd"~
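Positional indexing such as `[[5]][[11]]` depends on the order of the fields in the response. Below is a small sketch of selecting the same kind of element by name, using a made-up JSON string in place of a live request. It assumes the book table sits under `results` and then `books`. That nesting is an assumption here and is worth confirming with `names()` on the real response.

```r
library(jsonlite)

# Made-up miniature response; the field names are assumptions for illustration.
mock_json <- '{
  "status": "OK",
  "results": {
    "list_name": "Hardcover Fiction",
    "books": [
      {"rank": 2, "title": "THE WISH", "publisher": "Grand Central"},
      {"rank": 3, "title": "THE LINCOLN HIGHWAY", "publisher": "Viking"}
    ]
  }
}'

parsed <- fromJSON(mock_json)

# Inspect the field names before indexing into the parsed list.
names(parsed)
names(parsed$results)

# Name-based access keeps working even if the API reorders its fields.
books_by_name <- parsed$results$books

# Position-based access to the same element in this mock response.
books_by_pos <- parsed[[2]][[2]]

identical(books_by_name, books_by_pos)
```

fromJSON() simplifies a JSON array of objects into a data frame by default, which is why the selected element can be passed straight to glimpse().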

Create a New Data Frame from the Original One for Analysis

fiction_books2 <- fiction_books[c("rank", "publisher", "title", "author", "contributor", "primary_isbn13")]

fiction_books2 %>%
  kbl(caption = "Hardcover fiction books") %>%
  kable_material(c("striped", "hover")) %>%
  row_spec(0, color = "Red")
Hardcover fiction books

| rank | publisher | title | author | contributor | primary_isbn13 |
|------|-----------|-------|--------|-------------|----------------|
| 1 | Simon & Schuster, St. Martin’s | STATE OF TERROR | Hillary Rodham Clinton and Louise Penny | by Hillary Rodham Clinton and Louise Penny | 9781982173678 |
| 2 | Grand Central | THE WISH | Nicholas Sparks | by Nicholas Sparks | 9781538728628 |
| 3 | Viking | THE LINCOLN HIGHWAY | Amor Towles | by Amor Towles | 9780735222359 |
| 4 | Scribner | CLOUD CUCKOO LAND | Anthony Doerr | by Anthony Doerr | 9781982168438 |
| 5 | Holt | APPLES NEVER FALL | Liane Moriarty | by Liane Moriarty | 9781250220257 |
| 6 | Viking | SILVERVIEW | John Le Carré | by John Le Carré | 9780593490594 |
| 7 | Simon & Schuster | THE LAST THING HE TOLD ME | Laura Dave | by Laura Dave | 9781501171345 |
| 8 | Simon & Schuster | THE BOOK OF MAGIC | Alice Hoffman | by Alice Hoffman | 9781982151485 |
| 9 | Doubleday | HARLEM SHUFFLE | Colson Whitehead | by Colson Whitehead | 9780385545136 |
| 10 | Delacorte | THE BUTLER | Danielle Steel | by Danielle Steel | 9781984821522 |
| 11 | Scribner | BILLY SUMMERS | Stephen King | by Stephen King | 9781982173616 |
| 12 | Farrar, Straus & Giroux | CROSSROADS | Jonathan Franzen | by Jonathan Franzen | 9780374181178 |
| 13 | Viking | THE MIDNIGHT LIBRARY | Matt Haig | by Matt Haig | 9780525559474 |
| 14 | Farrar, Straus & Giroux | BEAUTIFUL WORLD, WHERE ARE YOU | Sally Rooney | by Sally Rooney | 9780374602604 |
| 15 | Little, Brown | THE JAILHOUSE LAWYER | James Patterson and Nancy Allen | by James Patterson and Nancy Allen | 9780316276627 |

Extra Analysis

# Publisher data frame: count the ranked books per publisher

df_pub <- fiction_books2 %>%
  group_by(publisher) %>%
  summarise(books_published = n())
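The group_by() plus summarise(n()) pair can also be written as a single dplyr::count() call. Below is a sketch on a small hand-typed data frame. The rows are copied from the ranking table in this report rather than fetched from the API, and `toy_books` and `toy_pub` are illustrative names.

```r
library(dplyr)

# Hand-typed subset of the ranking, used only to illustrate the tally.
toy_books <- data.frame(
  title = c("THE LINCOLN HIGHWAY", "SILVERVIEW", "THE MIDNIGHT LIBRARY",
            "CLOUD CUCKOO LAND", "BILLY SUMMERS", "THE WISH"),
  publisher = c("Viking", "Viking", "Viking",
                "Scribner", "Scribner", "Grand Central"),
  stringsAsFactors = FALSE
)

# count() groups and tallies in one step; sort = TRUE orders the result
# by descending count, and name = sets the name of the count column.
toy_pub <- toy_books %>%
  count(publisher, name = "books_published", sort = TRUE)

toy_pub
```

Because sort = TRUE already orders the rows, a separate order() step would not be needed when the tally is built this way.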

Plot the ranked books by publisher to see which publisher has the most books in the ranking.

df_pub <- df_pub[order(-df_pub$books_published), ]

df_pub %>%
  ggplot(aes(reorder(publisher, books_published), books_published)) +
  geom_col(aes(fill = books_published)) +
  scale_fill_gradient2(low = "Green", high = "Blue") +
  labs(title = "Ranked hardcover fiction books by publisher") +
  ylab("books_published") +
  theme(
    legend.position = "none",
    axis.title.x = element_blank(),
    # Angle the publisher names and add a top margin so they clear the bars
    axis.text.x = element_text(angle = 45, margin = margin(t = 25, r = 20, b = 0, l = 0)),
    # Center the plot title
    plot.title = element_text(hjust = 0.5)
  )