Assignment NYT

Author

Michael Mayne

NYT API Assignment

Code Approach

Background: For this assignment we are using our web scraping skills to gather and organize information from the New York Times developer page. The assignment is based on the NYT API for best-selling books. For this assignment, I wanted to know: “Has a graphic novel ever been selected as an NYT best-selling book?”

API Name: Books and Records

API Key: [Hidden for Privacy]

Analysis: I plan on using the httr2 library to request and prepare the information listed above. The API key will not appear in the code as a raw string; instead it will be transferred to a variable, raw_nyt_book. I will also use jsonlite to accept the response in JSON format, and then convert that format to a functional table.

The data will be organized and then filtered to the most recent 15 years in order to see the best-selling books. Next, an analysis will be performed to find the most recent graphic novels, along with the names and years in which they were listed as best-selling books. I expect the output to be a tibble and will present it as such.

Sharing & Publish: I intend to publish using GitHub, although the repository may be published privately because the API key is available in my code at this time.

library(httr2)
library(jsonlite)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code Base - Retrieve Data from the API

API Key Hidden

my_nyt_key <- Sys.getenv("nyt_key")
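When an environment variable is missing, Sys.getenv() returns an empty string, and the problem only shows up later as a failed request. Below is a minimal sketch of an early check. The helper name get_api_key and the variable demo_nyt_key are hypothetical and used only for illustration.

```r
# Hypothetical helper: read an API key from an environment variable
# and stop early with a clear message if it is not set.
get_api_key <- function(var = "nyt_key") {
  key <- Sys.getenv(var)  # returns "" when the variable is unset
  if (!nzchar(key)) {
    stop("Environment variable '", var, "' is not set.")
  }
  key
}

# Demo with a placeholder value (not a real key), stored under a
# separate name so that a real nyt_key in the session is left alone.
Sys.setenv(demo_nyt_key = "PLACEHOLDER")
demo_key <- get_api_key("demo_nyt_key")
```

In the real workflow the call would simply be get_api_key("nyt_key"), assuming the key has been saved under that name (for example in .Renviron).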

To receive data from the API, I have to use a specific endpoint that connects directly to the books section of NYT. This requires a URL made up of the default host api.nytimes.com, then the service path, and finally the call for the current list.

graphic_req <- request("https://api.nytimes.com/svc/books/v3/lists/current/graphic-books-and-manga.json") |> 
  req_url_query(`api-key` = my_nyt_key)
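The three parts of the URL (host, service path, list name) can also be assembled step by step, which makes each piece easier to change. This is only a sketch: "DEMO_KEY" is a placeholder rather than a real key, and demo_req is an illustrative name. Nothing is sent to the server here.

```r
library(httr2)

# Build the endpoint from its parts: host, then service path, then list file.
demo_req <- request("https://api.nytimes.com") |>
  req_url_path_append("svc", "books", "v3", "lists", "current",
                      "graphic-books-and-manga.json") |>
  req_url_query(`api-key` = "DEMO_KEY")  # placeholder, not a real key

# Inspect the assembled URL without performing the request
demo_req$url
```

Looking at the URL before calling req_perform() is a cheap way to catch path typos; req_dry_run() offers a similar preview of the full request.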

Because of how the data returned by the NYT developer app is formatted, I will instead create a list in order to work on a new problem.

inquiry <- req_perform(graphic_req)

resp_status(inquiry)

[1] 200

data_raw <- inquiry %>% resp_body_json()
book_list <- data_raw$results$books
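To see why the books are reached through data_raw$results$books, it can help to parse a tiny JSON string by hand. The sketch below uses jsonlite on a made-up, heavily trimmed stand-in for the response body; the real response contains many more fields, and the mock_ names are illustrative only.

```r
library(jsonlite)

# Hypothetical, heavily trimmed stand-in for the API response body
mock_json <- '{
  "status": "OK",
  "results": {
    "books": [
      {"rank": 1, "title": "BIG JIM BELIEVES", "author": "Dav Pilkey"},
      {"rank": 2, "title": "JUJUTSU KAISEN, VOL. 29", "author": "Gege Akutami"}
    ]
  }
}'

# simplifyVector = FALSE keeps the nested-list shape: one list element per book
mock_raw  <- fromJSON(mock_json, simplifyVector = FALSE)
mock_list <- mock_raw$results$books

length(mock_list)      # number of books in the mock
mock_list[[2]]$title   # fields of a single book are reached by name
```

resp_body_json() also returns nested lists by default, so the same $ navigation applies to the real response.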

It turns out that NYT does have a best-seller list for graphic novels, so I decided to pivot the nature of my problem. Overseas manga has been gaining popularity in mainstream culture. For that reason, I plan on figuring out how many of the New York Times bestsellers for Graphic Novels and Comics could be considered manga.

graphic_books <- book_list %>%
  bind_rows() %>%
  select(rank, title, author, description)

glimpse(graphic_books)

Rows: 75
Columns: 4
$ rank        <int> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4…
$ title       <chr> "BIG JIM BELIEVES", "BIG JIM BELIEVES", "BIG JIM BELIEVES"…
$ author      <chr> "Dav Pilkey", "Dav Pilkey", "Dav Pilkey", "Dav Pilkey", "D…
$ description <chr> "The 14th book in the Dog Man series. The Space Cuties Fro…

The data is currently repeating, so some changes are needed to get a better glimpse of the data set. distinct() is used to remove any repeated book titles.

graphic_books_clean <- graphic_books %>% distinct(title, .keep_all = TRUE)

glimpse(graphic_books_clean)

Rows: 15
Columns: 4
$ rank        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
$ title       <chr> "BIG JIM BELIEVES", "JUJUTSU KAISEN, VOL. 29", "TALONS OF …
$ author      <chr> "Dav Pilkey", "Gege Akutami", "Tui T. Sutherland", "Scott …
$ description <chr> "The 14th book in the Dog Man series. The Space Cuties Fro…
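The repeats probably come from nested fields inside each book (for example a list of purchase links), which expand into one row per nested entry when the books are bound together. This is an assumption about the response, not something confirmed above. As an alternative to distinct(), the sketch below picks only the scalar fields before binding. It uses a made-up two-book list; the buy_links field and all placeholder values are hypothetical.

```r
library(purrr)
library(tibble)

# Made-up books, each carrying a nested list alongside its scalar fields
mock_books <- list(
  list(rank = 1L, title = "BIG JIM BELIEVES", author = "Dav Pilkey",
       description = "placeholder",
       buy_links = list(list(name = "store A"), list(name = "store B"))),
  list(rank = 2L, title = "JUJUTSU KAISEN, VOL. 29", author = "Gege Akutami",
       description = "placeholder",
       buy_links = list(list(name = "store A"), list(name = "store B")))
)

# Build a one-row tibble per book from scalar fields only, then stack them
books_tbl <- mock_books |>
  map(\(b) tibble(rank = b$rank, title = b$title,
                  author = b$author, description = b$description)) |>
  list_rbind()

nrow(books_tbl)  # one row per book
```

Either approach gives one row per title. Selecting fields first just avoids creating the duplicates at all.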

NYT does not separate manga from non-manga. I therefore decided to manually add a column that distinguishes between manga and non-manga, based on my personal knowledge and what I could research about each series.

graphic_books_clean$Manga <- c("N","Y","N","N","N","N","N","N","N","Y","N","Y","Y","N","N")

glimpse(graphic_books_clean)

Rows: 15
Columns: 5
$ rank        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
$ title       <chr> "BIG JIM BELIEVES", "JUJUTSU KAISEN, VOL. 29", "TALONS OF …
$ author      <chr> "Dav Pilkey", "Gege Akutami", "Tui T. Sutherland", "Scott …
$ description <chr> "The 14th book in the Dog Man series. The Space Cuties Fro…
$ Manga       <chr> "N", "Y", "N", "N", "N", "N", "N", "N", "N", "Y", "N", "Y"…
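A positional vector of labels only stays correct while the rows keep the same order, and the list changes every week. One possible alternative, sketched below on a toy two-row tibble, is to keep a vector of titles judged to be manga and label rows by lookup. The names toy_books and manga_titles are illustrative, and the classification itself remains a manual judgment.

```r
library(dplyr)
library(tibble)

# Toy subset using two titles from the list above
toy_books <- tibble(
  rank  = c(1L, 2L),
  title = c("BIG JIM BELIEVES", "JUJUTSU KAISEN, VOL. 29")
)

# Titles judged to be manga (a manual, subjective classification)
manga_titles <- c("JUJUTSU KAISEN, VOL. 29")

# Label each row by looking its title up in the manga list
toy_books <- toy_books |>
  mutate(Manga = if_else(title %in% manga_titles, "Y", "N"))

toy_books
```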

The graph below shows that the majority of this week's best-selling titles are still non-manga.

ggplot(data = graphic_books_clean, aes(x = Manga, fill = Manga)) +
  geom_bar(aes(color = Manga)) +
  labs(title = "How many Bestsellers are Manga?",
       y = "# of Titles",
       x = "Is it a Manga?")
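The bar heights can also be checked numerically. The sketch below rebuilds the manual labels from above in a standalone tibble and tabulates them; manga_flags and manga_counts are illustrative names.

```r
library(dplyr)
library(tibble)

# The same manual labels as assigned to the 15 titles above
manga_flags <- tibble(
  Manga = c("N","Y","N","N","N","N","N","N","N","Y","N","Y","Y","N","N")
)

# Count titles per label and compute each label's share of the list
manga_counts <- manga_flags |>
  count(Manga) |>
  mutate(share = n / sum(n))

manga_counts
```

With these labels, 4 of the 15 titles are flagged as manga and 11 are not.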

Conclusions

This was an interesting introduction to calling data from an API. To be honest, calling the data was a struggle because I had a large misunderstanding of how to do so. Mainly, I didn't know which adjustments I needed to make in order to find the data on the website. The sources and books I tried to use were a bit hard to understand.

As for the assessment, we can conclude that the majority of this week's NYT graphic books and manga bestsellers (11 of 15) are in fact non-manga. This may be because of how NYT ranks titles. Another way to look at this data is to consider the mediums through which most manga is consumed overseas. Since most manga is not in print in the United States, most readers would consume it online, unlike U.S.-created comics.