Get book data from the NYT Books API for the last 4 months. Does a high rank translate into a longer stay on the best-seller list?

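# Load the packages used below (harmless if they are already attached)
library(httr)       # GET(), status_code(), content()
library(jsonlite)   # fromJSON()
library(dplyr)      # %>%, bind_rows(), distinct(), arrange(), select()
library(lubridate)  # year()
library(ggplot2)    # plots
library(knitr)      # kable()

# Read API key from GitHub file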
api_key_url <- "https://raw.githubusercontent.com/prnakyazze94/Data_607/refs/heads/main/NYT_API"
api_key <- readLines(api_key_url, warn = FALSE)
api_key <- trimws(api_key[1])  # ensure clean string, remove any spaces/newlines

#  NYT Books API setup
base_url <- "https://api.nytimes.com/svc/books/v3/lists"
list_name <- "hardcover-fiction"  # You can change this to another NYT list name

#  Function to fetch books for a given date 
get_books_for_date <- function(date) {
  url <- paste0(base_url, "/", date, "/", list_name, ".json")
  res <- GET(url, query = list("api-key" = api_key))
  
  if (status_code(res) != 200) {
    warning(paste("Failed for date:", date))
    return(NULL)
  }
  
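  # Parse the JSON body; setting the encoding explicitly avoids httr's encoding guess
  data <- fromJSON(content(res, as = "text", encoding = "UTF-8"), flatten = TRUE)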
  books <- data$results$books
  if (is.null(books)) return(NULL)
  
  df <- data.frame(
    title = books$title,
    author = books$author,
    publisher = books$publisher,
    rank = books$rank,
    weeks_on_list = books$weeks_on_list,
    stringsAsFactors = FALSE
  )
  
  return(df)
}

# --- Get top books for weekly dates over the last 4 months
current_year <- year(Sys.Date())
dates <- seq.Date(from = Sys.Date(), to = Sys.Date() - 120, by = "-7 days")  # last 4 months
dates <- format(dates, "%Y-%m-%d")

all_books <- lapply(dates, get_books_for_date)
## Warning in FUN(X[[i]], ...): Failed for date: 2025-09-11
## Warning in FUN(X[[i]], ...): Failed for date: 2025-09-04
## Warning in FUN(X[[i]], ...): Failed for date: 2025-08-28
## Warning in FUN(X[[i]], ...): Failed for date: 2025-08-21
## Warning in FUN(X[[i]], ...): Failed for date: 2025-08-14
## Warning in FUN(X[[i]], ...): Failed for date: 2025-08-07
## Warning in FUN(X[[i]], ...): Failed for date: 2025-07-31
## Warning in FUN(X[[i]], ...): Failed for date: 2025-07-24
## Warning in FUN(X[[i]], ...): Failed for date: 2025-07-17
## Warning in FUN(X[[i]], ...): Failed for date: 2025-07-10
## Warning in FUN(X[[i]], ...): Failed for date: 2025-07-03
## Warning in FUN(X[[i]], ...): Failed for date: 2025-06-26
## Warning in FUN(X[[i]], ...): Failed for date: 2025-06-19
all_books <- bind_rows(all_books)
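
The warnings above show 13 of the 18 weekly requests failing after the first five succeeded. One plausible cause is the API's per-minute request limit. This is an assumption, because get_books_for_date does not report the HTTP status. A minimal sketch of spacing the calls out and retrying once follows. The names fetch_with_delay, delay_sec, and fake_fetch are hypothetical, and the stand-in fetcher is used so the example runs without a network call.

```r
# Hypothetical helper: call fetch_fn once per date, pausing between calls and
# retrying once when a call returns NULL (as get_books_for_date does on failure).
fetch_with_delay <- function(dates, fetch_fn, delay_sec = 12) {
  results <- vector("list", length(dates))
  for (i in seq_along(dates)) {
    res <- fetch_fn(dates[i])
    if (is.null(res)) {
      Sys.sleep(delay_sec)                 # wait, then try the same date once more
      res <- fetch_fn(dates[i])
    }
    if (!is.null(res)) results[[i]] <- res
    if (i < length(dates)) Sys.sleep(delay_sec)  # pause before the next date
  }
  results
}

# Stand-in fetcher (made up for illustration): fails the first time it sees
# a date and succeeds on the retry.
seen <- character(0)
fake_fetch <- function(date) {
  if (!(date %in% seen)) {
    seen <<- c(seen, date)
    return(NULL)
  }
  data.frame(date = date, title = "EXAMPLE TITLE", stringsAsFactors = FALSE)
}

dates_demo <- c("2025-09-11", "2025-09-04")
fetched <- fetch_with_delay(dates_demo, fake_fetch, delay_sec = 0)

# Drop any dates that still failed, then stack the rest into one data frame
all_demo <- do.call(rbind, Filter(Negate(is.null), fetched))
print(all_demo)
```

With the real fetcher this would be called as fetch_with_delay(dates, get_books_for_date). The 12-second default is an assumed value and should be checked against the API's documented limits.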

# --- Keep top 40 unique books
top_books <- all_books %>%
  distinct(title, .keep_all = TRUE) %>%
  arrange(rank) %>%
  head(40)

# --- Print results 
print(top_books)
##                                title                                 author
## 1                         ALCHEMISED                               SenLinYu
## 2              THE SECRET OF SECRETS                              Dan Brown
## 3                 THE HALLMARKED MAN                       Robert Galbraith
## 4                     TOURIST SEASON                          Brynne Weaver
## 5                    THE LAST LETTER                         Rebecca Yarros
## 6       THE PRIMAL OF BLOOD AND BONE                 Jennifer L. Armentrout
## 7          AMONG THE BURNING FLOWERS                       Samantha Shannon
## 8                         CLOWN TOWN                            Mick Herron
## 9                    ONE DARK WINDOW                          Rachel Gillig
## 10                   LOVER FORBIDDEN                              J.R. Ward
## 11                TWO TWISTED CROWNS                          Rachel Gillig
## 12          ASSISTANT TO THE VILLAIN                  Hannah Nicole Maehrer
## 13              THIS INEVITABLE RUIN                          Matt Dinniman
## 14     CLIVE CUSSLER: THE IRON STORM                           Jack Du Brul
## 15                    APOSTLE'S COVE                   William Kent Krueger
## 16                    CIRCLE OF DAYS                            Ken Follett
## 17                        ONYX STORM                         Rebecca Yarros
## 18                       THE ACADEMY Elin Hilderbrand and Shelby Cunningham
## 19                        ATMOSPHERE                    Taylor Jenkins Reid
## 20                 ON WINGS OF BLOOD                           Briar Boleyn
## 21                          ANATHEMA                              Keri Lake
## 22                        MY FRIENDS                        Fredrik Backman
## 23                    WILD REVERENCE                           Rebecca Ross
## 24                      NEVER FLINCH                           Stephen King
## 25                  WHAT WE CAN KNOW                             Ian McEwan
## 26                    BROKEN COUNTRY                      Clare Leslie Hall
## 27             BILLION-DOLLAR RANSOM James Patterson and Duane Swierczynski
## 28     TOM CLANCY: TERMINAL VELOCITY                          M.P. Woodward
## 29          GREAT BIG BEAUTIFUL LIFE                            Emily Henry
## 30 THE LONELINESS OF SONIA AND SUNNY                            Kiran Desai
## 31                 THE CORRESPONDENT                         Virginia Evans
## 32                         PLAY NICE                        Rachel Harrison
## 33                     FORGET ME NOT                       Stacy Willingham
## 34                         KATABASIS                             R.F. Kuang
## 35                THE COLOR OF DEATH    Trey Gowdy with Christopher Greyson
## 36            THE PUMPKIN SPICE CAFÉ                         Laurie Gilmore
## 37                THE WEDDING PEOPLE                          Alison Espach
## 38                           BUCKEYE                           Patrick Ryan
## 39                   FRAMED IN DEATH                              J.D. Robb
## 40              DUNGEON CRAWLER CARL                          Matt Dinniman
##           publisher rank weeks_on_list
## 1           Del Rey    1             1
## 2         Doubleday    2             3
## 3        Mulholland    2             1
## 4          Slowburn    3             1
## 5             Amara    3             1
## 6          Blue Box    4             1
## 7        Bloomsbury    4             1
## 8        Soho Crime    4             1
## 9             Orbit    5             1
## 10          Gallery    5             1
## 11            Orbit    6             1
## 12        Red Tower    6             1
## 13              Ace    7             1
## 14           Putnam    7             1
## 15            Atria    7             1
## 16    Grand Central    8             1
## 17        Red Tower    8            35
## 18    Little, Brown    9             2
## 19       Ballantine    9            16
## 20             MIRA    9             2
## 21            Bloom   10             1
## 22            Atria   10            20
## 23         Saturday   10             2
## 24         Scribner   10            13
## 25            Knopf   11             1
## 26 Simon & Schuster   11            26
## 27    Little, Brown   11             2
## 28           Putnam   11             1
## 29          Berkley   11            17
## 30          Hogarth   12             1
## 31            Crown   12             1
## 32          Berkley   12             1
## 33         Minotaur   12             1
## 34   Harper Voyager   13             5
## 35         Fox News   13             3
## 36    HarperCollins   13             4
## 37             Holt   13            39
## 38     Random House   14             4
## 39     St. Martin's   14             3
## 40              Ace   14             4
# Save to CSV 
write.csv(top_books, "nyt_top40_books.csv", row.names = FALSE)
df_selected <- top_books %>%
  select(title, author, publisher, rank, weeks_on_list)
# Display in a clean table
kable(df_selected, caption = "Top NYT Books - Selected Columns")
Top NYT Books - Selected Columns

| title | author | publisher | rank | weeks_on_list |
|:---|:---|:---|---:|---:|
| ALCHEMISED | SenLinYu | Del Rey | 1 | 1 |
| THE SECRET OF SECRETS | Dan Brown | Doubleday | 2 | 3 |
| THE HALLMARKED MAN | Robert Galbraith | Mulholland | 2 | 1 |
| TOURIST SEASON | Brynne Weaver | Slowburn | 3 | 1 |
| THE LAST LETTER | Rebecca Yarros | Amara | 3 | 1 |
| THE PRIMAL OF BLOOD AND BONE | Jennifer L. Armentrout | Blue Box | 4 | 1 |
| AMONG THE BURNING FLOWERS | Samantha Shannon | Bloomsbury | 4 | 1 |
| CLOWN TOWN | Mick Herron | Soho Crime | 4 | 1 |
| ONE DARK WINDOW | Rachel Gillig | Orbit | 5 | 1 |
| LOVER FORBIDDEN | J.R. Ward | Gallery | 5 | 1 |
| TWO TWISTED CROWNS | Rachel Gillig | Orbit | 6 | 1 |
| ASSISTANT TO THE VILLAIN | Hannah Nicole Maehrer | Red Tower | 6 | 1 |
| THIS INEVITABLE RUIN | Matt Dinniman | Ace | 7 | 1 |
| CLIVE CUSSLER: THE IRON STORM | Jack Du Brul | Putnam | 7 | 1 |
| APOSTLE’S COVE | William Kent Krueger | Atria | 7 | 1 |
| CIRCLE OF DAYS | Ken Follett | Grand Central | 8 | 1 |
| ONYX STORM | Rebecca Yarros | Red Tower | 8 | 35 |
| THE ACADEMY | Elin Hilderbrand and Shelby Cunningham | Little, Brown | 9 | 2 |
| ATMOSPHERE | Taylor Jenkins Reid | Ballantine | 9 | 16 |
| ON WINGS OF BLOOD | Briar Boleyn | MIRA | 9 | 2 |
| ANATHEMA | Keri Lake | Bloom | 10 | 1 |
| MY FRIENDS | Fredrik Backman | Atria | 10 | 20 |
| WILD REVERENCE | Rebecca Ross | Saturday | 10 | 2 |
| NEVER FLINCH | Stephen King | Scribner | 10 | 13 |
| WHAT WE CAN KNOW | Ian McEwan | Knopf | 11 | 1 |
| BROKEN COUNTRY | Clare Leslie Hall | Simon & Schuster | 11 | 26 |
| BILLION-DOLLAR RANSOM | James Patterson and Duane Swierczynski | Little, Brown | 11 | 2 |
| TOM CLANCY: TERMINAL VELOCITY | M.P. Woodward | Putnam | 11 | 1 |
| GREAT BIG BEAUTIFUL LIFE | Emily Henry | Berkley | 11 | 17 |
| THE LONELINESS OF SONIA AND SUNNY | Kiran Desai | Hogarth | 12 | 1 |
| THE CORRESPONDENT | Virginia Evans | Crown | 12 | 1 |
| PLAY NICE | Rachel Harrison | Berkley | 12 | 1 |
| FORGET ME NOT | Stacy Willingham | Minotaur | 12 | 1 |
| KATABASIS | R.F. Kuang | Harper Voyager | 13 | 5 |
| THE COLOR OF DEATH | Trey Gowdy with Christopher Greyson | Fox News | 13 | 3 |
| THE PUMPKIN SPICE CAFÉ | Laurie Gilmore | HarperCollins | 13 | 4 |
| THE WEDDING PEOPLE | Alison Espach | Holt | 13 | 39 |
| BUCKEYE | Patrick Ryan | Random House | 14 | 4 |
| FRAMED IN DEATH | J.D. Robb | St. Martin’s | 14 | 3 |
| DUNGEON CRAWLER CARL | Matt Dinniman | Ace | 14 | 4 |

Plot NYT Top Books by Weeks on List

# Arrange books by descending weeks_on_list
plot_data <- df_selected %>%
  arrange(desc(weeks_on_list))

# Create horizontal bar plot
ggplot(plot_data, aes(x = reorder(title, weeks_on_list), y = weeks_on_list)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "NYT Top Books by Weeks on List",
    x = "Book Title",
    y = "Weeks on NYT List"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.y = element_text(size = 8)
  )

The Wedding People has spent 39 weeks on the NYT best-seller list, the longest of any title here, followed by Onyx Storm at 35 weeks. By contrast, 29 of the 40 titles have been on the list for fewer than 4 weeks.

# Get the top 15 books by rank from the top_books data frame
top_15_books <- top_books %>%
  arrange(rank) %>%              # sort by rank ascending
  select(rank, title) %>%        # keep only rank and title
  head(15)                       # get top 15

# View results
print(top_15_books)
##    rank                         title
## 1     1                    ALCHEMISED
## 2     2         THE SECRET OF SECRETS
## 3     2            THE HALLMARKED MAN
## 4     3                TOURIST SEASON
## 5     3               THE LAST LETTER
## 6     4  THE PRIMAL OF BLOOD AND BONE
## 7     4     AMONG THE BURNING FLOWERS
## 8     4                    CLOWN TOWN
## 9     5               ONE DARK WINDOW
## 10    5               LOVER FORBIDDEN
## 11    6            TWO TWISTED CROWNS
## 12    6      ASSISTANT TO THE VILLAIN
## 13    7          THIS INEVITABLE RUIN
## 14    7 CLIVE CUSSLER: THE IRON STORM
## 15    7                APOSTLE'S COVE

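Top 15 books by rank. Because 13 of the 18 weekly requests failed (see the warnings above), these ranks come from the five most recent weekly lists rather than the full 4 months.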
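
Because distinct(title, .keep_all = TRUE) keeps only the first row it sees for each title, the rank shown for a book is its rank in the most recent week fetched, not its best rank. One alternative is to summarise every weekly row per title. The sketch below does that in base R on made-up rows; the weekly data frame and its values are hypothetical and stand in for all_books.

```r
# Made-up weekly rows for illustration (not real NYT data).
weekly <- data.frame(
  title = c("BOOK A", "BOOK B", "BOOK A", "BOOK B", "BOOK C"),
  rank = c(3, 1, 1, 2, 5),
  weeks_on_list = c(2, 4, 1, 3, 10),
  stringsAsFactors = FALSE
)

# Best (lowest) rank and largest weeks_on_list seen for each title.
best_rank <- tapply(weekly$rank, weekly$title, min)
max_weeks <- tapply(weekly$weeks_on_list, weekly$title, max)

per_title <- data.frame(
  title = names(best_rank),
  best_rank = as.vector(best_rank),
  weeks_on_list = as.vector(max_weeks[names(best_rank)]),
  stringsAsFactors = FALSE
)

# Sort by best rank, breaking ties alphabetically by title.
per_title <- per_title[order(per_title$best_rank, per_title$title), ]
print(per_title)
```

Whether the latest rank or the best rank is the better summary depends on the question being asked, so this is an option rather than a correction.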

# Prepare top 15 books
top_15_books <- top_books %>%
  arrange(rank) %>%
  select(rank, title) %>%
  head(15)

# Horizontal bar plot; reorder(title, -rank) puts rank 1 at the top after coord_flip()
ggplot(top_15_books, aes(x = reorder(title, -rank), y = rank)) +
  geom_col(fill = "steelblue") +
  coord_flip() +  # horizontal bars
  scale_y_reverse(breaks = 1:15) + # reverse the rank axis, which is horizontal after coord_flip()
  labs(
    title = "Top 15 NYT Books by Rank",
    x = "Book Title",
    y = "Rank"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.y = element_text(size = 10)
  )

Relationship Between Rank and Weeks on List

Do books ranked closer to #1 stay on the list longer?

ggplot(top_books, aes(x = rank, y = weeks_on_list)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Rank vs Weeks on List")
## `geom_smooth()` using formula = 'y ~ x'

Each point represents one book: its rank and the number of weeks it has spent on the list.

The points are widely scattered, so any relationship between the two variables is weak.

A few points sit far from the main cluster, all with high weeks-on-list values. For example, The Wedding People sits at rank 13 with 39 weeks on the list.

The blue line is the linear regression fit drawn by geom_smooth(method = "lm"), and the shaded band around it is the confidence interval for that fit.

The line has a positive slope, indicating a positive association between rank and weeks on list.

Because rank 1 is the best position, a positive slope means that as rank gets worse (increasing from 1 to 14), weeks on list tends to increase.

This may look surprising, but it follows from how the data were collected. weeks_on_list is a running total at the time of the snapshot, not a book's final tenure, and 14 of the 15 books ranked 1 through 7 were in their first week on the list. The long-running titles (13 to 39 weeks) all sit at ranks 8 through 13. In this sample, new releases enter near the top while established titles linger further down. The few long-running titles also pull the straight-line fit upward, so the slope should be read with caution.

In summary, the plot suggests a weak positive trend between rank and weeks on list with substantial variability, so a high rank in a given week does not by itself indicate a long stay on the list.
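
To put a number on the trend, the rank and weeks_on_list columns from the printed top_books table above can be re-entered by hand and summarised. This is a sketch: the data frame name books is hypothetical, and reporting Spearman's rank correlation alongside the least-squares slope is an addition here, not part of the original analysis.

```r
# rank and weeks_on_list copied from the printed top_books table (40 rows).
books <- data.frame(
  rank = c(1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9, 9, 9,
           10, 10, 10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13,
           14, 14, 14),
  weeks_on_list = c(1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 35, 2, 16, 2,
                    1, 20, 2, 13, 1, 26, 2, 1, 17, 1, 1, 1, 1, 5, 3, 4, 39,
                    4, 3, 4)
)

# Same straight-line fit that geom_smooth(method = "lm") draws.
fit <- lm(weeks_on_list ~ rank, data = books)
slope <- unname(coef(fit)["rank"])

# Spearman's rho works on ranks of the values, so it is less sensitive
# to the few very long-running titles than the straight-line fit.
rho <- cor(books$rank, books$weeks_on_list, method = "spearman")

cat("Least-squares slope (weeks per rank position):", round(slope, 2), "\n")
cat("Spearman correlation:", round(rho, 2), "\n")
```

A positive slope here matches the direction of the blue line in the plot. Neither number says anything about how long a book will eventually stay on the list, because weeks_on_list is only a snapshot.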

Count how many titles have spent each number of weeks on the best-seller list.

# Count how many titles per weeks_on_list
cluster_counts <- top_books %>%
  group_by(weeks_on_list) %>%
  summarise(num_titles = n())

# View results
print(cluster_counts)
## # A tibble: 12 × 2
##    weeks_on_list num_titles
##            <int>      <int>
##  1             1         22
##  2             2          4
##  3             3          3
##  4             4          3
##  5             5          1
##  6            13          1
##  7            16          1
##  8            17          1
##  9            20          1
## 10            26          1
## 11            35          1
## 12            39          1

Most of the titles (29 of 40) were on the best-seller list for 1 to 3 weeks.

Average time on the best-seller list.

# Ensure weeks_on_list is numeric
top_books <- top_books %>%
  mutate(weeks_on_list = as.numeric(as.character(weeks_on_list)))

# Calculate average weeks on list
average_weeks <- mean(top_books$weeks_on_list, na.rm = TRUE)

# Print result
print(average_weeks)
## [1] 5.55

Summary of Weeks on List

The analysis of the weeks_on_list data shows that most books are relatively new to the bestseller list. Of the 40 titles, 22 have appeared on the list for only one week, indicating frequent turnover and strong competition among recent releases. A smaller group shows moderate staying power: 4 titles at 2 weeks, 3 titles each at 3 and 4 weeks, and 1 title at 5 weeks. Only seven books show long-term popularity, with 13, 16, 17, 20, 26, 35, and 39 weeks on the list. Overall, this suggests that while the bestseller list changes often, a few standout titles maintain consistent reader interest over time. The average time on the list is 5.55 weeks, but that mean is pulled up by the few long-running titles, since more than half of the books were in their first week.
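
The count table printed earlier is enough to check these summary figures. The sketch below rebuilds the 40 weeks_on_list values from those counts and compares the mean with the median. The variable names are hypothetical, and reporting the median is a suggestion added here, not part of the original analysis.

```r
# Values and counts copied from the printed cluster_counts table.
weeks_values <- c(1, 2, 3, 4, 5, 13, 16, 17, 20, 26, 35, 39)
num_titles   <- c(22, 4, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1)

# Expand the counts back into one value per title.
weeks <- rep(weeks_values, times = num_titles)

mean_weeks   <- mean(weeks)      # sensitive to the few long-running titles
median_weeks <- median(weeks)    # the typical title
short_stay   <- sum(weeks <= 3)  # titles with 1 to 3 weeks on the list

cat("Titles:", length(weeks), "\n")
cat("Mean weeks on list:", mean_weeks, "\n")
cat("Median weeks on list:", median_weeks, "\n")
cat("Titles with 1 to 3 weeks:", short_stay, "\n")
```

The mean reproduces the 5.55 weeks reported above, while the median is 1 week, which is why the mean on its own overstates how long a typical title in this sample has been on the list.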