DATA607_Assignment

Assignment - Web APIs

Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame.

We will need to use libraries httr, jsonlite, and dplyr to extract the data from API from JSON format to R Dataframe

library(httr)
library(jsonlite)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   3.5.1     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()  masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag()     masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Sign up for the NY Times developer apis and activate an API key for the API accessing. I choose the books API and from the API documentation get the base url and use it to build out API call.

# Define API key and endpoint
api_key <- "PR6ullqlzC8T72wCHIUWBbqkTQWAgRRe"
base_url <- "https://api.nytimes.com/svc/books/v3/lists/full-overview.json"

Get the response with the url constructed and check the response code

# Construct the full URL with query parameters
url <- paste0(base_url, "?", "&api-key=", api_key) %>% print()

## [1] "https://api.nytimes.com/svc/books/v3/lists/full-overview.json?&api-key=PR6ullqlzC8T72wCHIUWBbqkTQWAgRRe"

# Make the GET request
response <- GET(url)

# Check if the request was successful
if (status_code(response) != 200) {
  stop("API request failed!")
  }

Parse the response data into json format then convert it to dataframe.

# Parse the JSON response into a list
json_data <- fromJSON(content(response, "text", encoding = "UTF-8"))

# Extract JSON to dataframe
book_lists <- json_data$results$lists %>% as.data.frame()
books <- json_data$results$lists$books
# Empty dataframe for books
books_df <- data.frame(counter = integer(),
                       title = character(), 
                       author = character(), 
                       rank = integer(), 
                       last_rank = integer(),
                       publisher = character(),
                       stringsAsFactors = FALSE)
# create an identifier for join later
book_lists <- book_lists %>% mutate(counter = 1:18)

# parsing the list of books
item <- 1
for (book in books) {
  dataf <- data.frame(
    counter = item,
    title = book$title,
    author = book$author,
    rank = book$rank,
    last_rank = book$rank_last_week,
    publisher = book$publisher,
    stringsAsFactors = FALSE
  )
  books_df <- rbind(books_df, dataf)
  item <- item+1
  }

# Create data frame for best sellers
best_sellers <- left_join(books_df, book_lists %>% select(counter, list_id,list_name,updated)) %>% select(-counter)

## Joining with `by = join_by(counter)`

Analysis

# Which author was on the best seller list the most times
best_author <- best_sellers %>% 
  count(author) %>%
  arrange(desc(n)) %>%
  head(6)

best_author

##             author n
## 1  Freida McFadden 6
## 2   Kristin Hannah 6
## 3   Rebecca Yarros 5
## 4    Sarah J. Maas 5
## 5       Dav Pilkey 4
## 6 Malcolm Gladwell 4

best_sellers %>% 
  filter(author %in% best_author$author) %>% 
  ggplot(aes(x = author)) +
  geom_bar()

# Which publisher produced the most best sellers
best_publisher <- best_sellers %>% 
  count(publisher) %>% 
  arrange(desc(n)) %>% 
  head(3)

best_publisher

##            publisher  n
## 1         Scholastic 15
## 2              Crown  9
## 3 Random House Audio  9

best_sellers %>% 
  filter(publisher %in% best_publisher$publisher) %>% 
  ggplot(aes(x = publisher)) +
  geom_bar()

Conslusion

Freida McFadden and Kristin Hannah authored the most best sellers for different categories as of the list from December. Scholastic is the publisher that published the most best sellers.

DATA607_Assignment_9

Ying Fang Lee

2024-10-27

Assignment - Web APIs

Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame.

Analysis

Conslusion