R Markdown

The New York Times website provides a rich set of APIs, described at https://developer.nytimes.com/apis. You'll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.

For this assignment, I used the Books API.

In the first step, we install the necessary packages and load the libraries:

# Install necessary packages if not already installed
if (!require("httr")) install.packages("httr")
## Loading required package: httr
if (!require("jsonlite")) install.packages("jsonlite")
## Loading required package: jsonlite
if (!require("dplyr")) install.packages("dplyr")
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
if (!require("ggplot2")) install.packages("ggplot2")
# Load packages
library(httr)
library(jsonlite)
library(dplyr)
library(ggplot2)

We fetch the current hardcover fiction bestsellers from the New York Times Books API. A custom function constructs the API request, checks the response status, and extracts the relevant fields for each book: title, author, publisher, description, rank, and weeks on the list. The result is returned as a data frame for analysis and display.

# Define your API key. Avoid hardcoding a real key in a shared document;
# here it is read from an environment variable (the name NYT_API_KEY is an assumption)
api_key <- Sys.getenv("NYT_API_KEY")

# Function to fetch data from NYT Books API for current hardcover fiction bestsellers
fetch_bestsellers <- function() {
  url <- paste0("https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=", api_key)
  
  # Make the GET request
  response <- GET(url)
  
  # Check if the request was successful
  if (status_code(response) == 200) {
    # Parse JSON content
    # (named json_text so the variable does not shadow httr's content() function)
    json_text <- content(response, as = "text", encoding = "UTF-8")
    json_data <- fromJSON(json_text)
    
    # Extract relevant information into a DataFrame
    books_df <- json_data$results$books %>%
      select(title, author, publisher, description, rank, weeks_on_list)
    
    return(books_df)
  } else {
    stop("Failed to fetch data. Status code: ", status_code(response))
  }
}

# Call the function and display the DataFrame
bestsellers_df <- fetch_bestsellers()
print(bestsellers_df)
##                         title                     author
## 1                 IN TOO DEEP Lee Child and Andrew Child
## 2                  BLOODGUARD                Cecy Robson
## 3                 THE WAITING           Michael Connelly
## 4           COUNTING MIRACLES            Nicholas Sparks
## 5                   THE WOMEN             Kristin Hannah
## 6                  IRON FLAME             Rebecca Yarros
## 7                  ABSOLUTION            Jeff VanderMeer
## 8                  INTERMEZZO               Sally Rooney
## 9             HERE ONE MOMENT             Liane Moriarty
## 10                  MEMORIALS            Richard Chizmar
## 11       THE GOD OF THE WOODS                  Liz Moore
## 12        THE STARS ARE DYING         Chloe C. Peñaranda
## 13                      JAMES           Percival Everett
## 14                FOURTH WING             Rebecca Yarros
## 15 ALL THE COLORS OF THE DARK             Chris Whitaker
##                    publisher
## 1                  Delacorte
## 2                  Red Tower
## 3              Little, Brown
## 4               Random House
## 5               St. Martin's
## 6                  Red Tower
## 7                        MCD
## 8  Farrar, Straus and Giroux
## 9                      Crown
## 10                   Gallery
## 11                 Riverhead
## 12                   Bramble
## 13                 Doubleday
## 14                 Red Tower
## 15                     Crown
##                                                                                                                                          description
## 1                            The 29th book in the Jack Reacher series. Reacher wakes up in a precarious position with no memory of how he got there.
## 2                                  An elven royal named Maeve offers the battle-scarred Leith of Grey an opportunity to win the title of Bloodguard.
## 3                           The sixth book in the Ballard and Bosch series. Bosch’s daughter, Maddie, becomes a new volunteer on the cold case unit.
## 4                 A man in search of the father he never knew encounters a single mom and rumors circulate of the nearby appearance of a white deer.
## 5                                   In 1965, a nursing student follows her brother to serve during the Vietnam War and returns to a divided America.
## 6                  The second book in the Empyrean series. Violet Sorrengail’s next round of training might require her to betray the man she loves.
## 7               The fourth book in the Southern Reach series. The story of the first mission into the Forgotten Coast before it was known as Area X.
## 8                                     After the passing of their father, seemingly different brothers engage in relationships and seek ways to cope.
## 9                                                  Passengers on a short and seemingly unremarkable flight learn how and when they are going to die.
## 10                              In 1983, three college students embark on a road trip for a class assignment and come across disturbing occurrences.
## 11                            When a 13-year-old girl disappears from an Adirondack summer camp in 1975, secrets kept by the Van Laar family emerge.
## 12                              Astraea must decide whether to sneak in as a substitute in the trials that will determine the safety of her kingdom.
## 13 A reimagining of “Adventures of Huckleberry Finn” shines a different light on Mark Twain's classic, revealing new facets of the character of Jim.
## 14                  Violet Sorrengail is urged by the commanding general, who also is her mother, to become a candidate for the elite dragon riders.
## 15                     Questions arise when a boy saves the daughter of a wealthy family amid a string of disappearances in a Missouri town in 1975.
##    rank weeks_on_list
## 1     1             1
## 2     2             1
## 3     3             2
## 4     4             5
## 5     5            38
## 6     6            51
## 7     7             1
## 8     8             5
## 9     9             7
## 10   10             1
## 11   11            17
## 12   12             3
## 13   13            17
## 14   14            74
## 15   15            15
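
The request-and-parse steps in fetch_bestsellers() can be tried offline. The sketch below is a minimal illustration, not the code that produced the output above. It builds the request URL with httr's modify_url() instead of paste0(), reads the key from an environment variable whose name (NYT_API_KEY) is an assumption, and parses a small made-up JSON sample that only mimics the results$books nesting of the real response.

```r
library(httr)
library(jsonlite)
library(dplyr)

# Assumption: the key lives in an environment variable named NYT_API_KEY.
# A placeholder is used when it is unset so the sketch still runs offline.
api_key <- Sys.getenv("NYT_API_KEY", unset = "DEMO_KEY")

# Build the URL and let httr encode the query string
request_url <- modify_url(
  "https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json",
  query = list(`api-key` = api_key)
)

# Hypothetical sample with the same nesting as the API response (not real data)
sample_json <- '{
  "status": "OK",
  "results": {
    "books": [
      {"title": "BOOK A", "author": "Author One", "publisher": "Pub X",
       "rank": 1, "weeks_on_list": 3},
      {"title": "BOOK B", "author": "Author Two", "publisher": "Pub Y",
       "rank": 2, "weeks_on_list": 1}
    ]
  }
}'

# fromJSON() simplifies an array of JSON objects into a data frame,
# which is why select() can be applied to results$books directly
sample_books <- fromJSON(sample_json)$results$books %>%
  select(title, author, rank, weeks_on_list)

print(sample_books)
```

Passing the key through a query list keeps it out of string concatenation, and the sample makes it easy to check the column selection without spending API calls.
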
# Count how many books share each weeks_on_list value
bestsellers_summary <- bestsellers_df %>%
  group_by(weeks_on_list) %>%
  summarise(count = n())

# Print the summary for inspection
print(bestsellers_summary)
## # A tibble: 10 × 2
##    weeks_on_list count
##            <int> <int>
##  1             1     4
##  2             2     1
##  3             3     1
##  4             5     2
##  5             7     1
##  6            15     1
##  7            17     2
##  8            38     1
##  9            51     1
## 10            74     1
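
The same tally can be written more compactly. As a sketch on a small made-up data frame (the demo_books values are illustrative, not the API output), dplyr's count() acts as shorthand for group_by() followed by summarise(n()):

```r
library(dplyr)

# Hypothetical stand-in for bestsellers_df (values are illustrative only)
demo_books <- data.frame(
  title = c("A", "B", "C", "D"),
  weeks_on_list = c(1L, 1L, 5L, 17L)
)

# Long form, matching the group_by() + summarise() pattern
long_form <- demo_books %>%
  group_by(weeks_on_list) %>%
  summarise(count = n())

# Short form: count() groups and tallies in one step
short_form <- demo_books %>%
  count(weeks_on_list, name = "count")

print(short_form)
```

Both forms give one row per distinct weeks_on_list value; the short form simply saves a step.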

Plots

The plots below show how long the current bestsellers have been on the list and which publishers appear most often.

# Plot a bar chart showing the number of books per duration on the list
ggplot(bestsellers_summary, aes(x = weeks_on_list, y = count)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  theme_minimal() +
  labs(title = "Number of Books by Weeks on Bestsellers List",
       x = "Weeks on List",
       y = "Number of Books")

# Count how many times each publisher appears
publisher_count <- bestsellers_df %>%
  group_by(publisher) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

# Plot the most common publishers
ggplot(publisher_count, aes(x = reorder(publisher, -count), y = count)) +
  geom_bar(stat = "identity", fill = "lightgreen") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Top Publishers on NYT Bestsellers List",
       x = "Publisher",
       y = "Number of Books")
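
One possible refinement of the publisher plot, offered as a sketch rather than as the method used above: geom_col() draws bars from precomputed counts without needing stat = "identity", and ordering by count (instead of -count) places the most frequent publisher at the top once coord_flip() is applied, because the first factor level is drawn at the bottom of a flipped axis. The demo_counts data frame is made up for illustration.

```r
library(ggplot2)

# Hypothetical publisher counts (illustrative values, not the API output)
demo_counts <- data.frame(
  publisher = c("Pub X", "Pub Y", "Pub Z"),
  count = c(3L, 1L, 2L)
)

# Ascending order by count: after coord_flip() the largest bar ends up on top
p <- ggplot(demo_counts, aes(x = reorder(publisher, count), y = count)) +
  geom_col(fill = "lightgreen") +
  coord_flip() +
  theme_minimal() +
  labs(title = "Publishers by Number of Books (demo data)",
       x = "Publisher",
       y = "Number of Books")

print(p)
```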
