DATA 607 Assignment Week 9

Intro

The New York Times website provides a rich set of APIs, described at https://developer.nytimes.com/apis. In this assignment, I use an API key to call one of those APIs, build an interface in R that reads the JSON response, and transform the result into an R data frame.

The “Most Popular” API interests me the most: https://developer.nytimes.com/docs/most-popular-product/1/overview.

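library(httr)        # GET(), content()
library(jsonlite)    # fromJSON()
library(tidyverse)   # dplyr verbs, %>%, ggplot2
library(kableExtra)  # kbl(), kable_styling()
library(tidytext)    # unnest_tokens(), stop_words
library(wordcloud)   # wordcloud(); also attaches RColorBrewer for brewer.pal()

api_key <- Sys.getenv("NYT_API_KEY")
# API endpoint ({period} is 1, 7, or 30 days): https://api.nytimes.com/svc/mostpopular/v2/viewed/{period}.json?api-key=yourkey
url <- paste0("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=", api_key)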
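The URL above hard-codes the 30-day period. As a minimal sketch, the same construction could be wrapped in a small helper that accepts the period as an argument. The function name `build_most_viewed_url` is hypothetical and not part of the original analysis, and the key in the example call is a placeholder.

```r
# Hypothetical helper (not part of the original analysis): build the
# "viewed" endpoint URL for a given look-back period.
build_most_viewed_url <- function(period = 30, api_key = Sys.getenv("NYT_API_KEY")) {
  # The endpoint is documented for periods of 1, 7, or 30 days
  stopifnot(period %in% c(1, 7, 30))
  if (!nzchar(api_key)) {
    stop("NYT_API_KEY is not set; add it to .Renviron or call Sys.setenv().")
  }
  paste0(
    "https://api.nytimes.com/svc/mostpopular/v2/viewed/",
    period, ".json?api-key=", api_key
  )
}

# Example call with a placeholder key
example_url <- build_most_viewed_url(7, api_key = "yourkey")
print(example_url)
```

Failing early on an empty key avoids sending a request that the API would reject anyway.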

Getting Started

response <- GET(url)

# Check for successful response
# 200 = ok; 404 = not found
if (response$status_code == 200) {
  data_raw <- content(response, as = "text", encoding = "UTF-8")
  data_json <- fromJSON(data_raw, flatten = TRUE)
  
  # Extract article results into a data frame
  nyt_df <- as.data.frame(data_json$results)
  
} else {
  stop("API request failed with status code: ", response$status_code)
}
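To see what the parsing step does without calling the API, here is a small offline sketch. The JSON string is a made-up, heavily trimmed stand-in for a real response body; only the `results` element and a few field names mirror the columns listed in this report.

```r
library(jsonlite)

# Made-up, heavily trimmed stand-in for the API response body
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"id": 1, "section": "U.S.", "title": "First example", "des_facet": ["A", "B"]},
    {"id": 2, "section": "Arts", "title": "Second example", "des_facet": []}
  ]
}'

# Same two steps as above: parse the text, then pull out the results element
parsed <- fromJSON(sample_json, flatten = TRUE)
sample_df <- as.data.frame(parsed$results)

dim(sample_df)            # one row per article, one column per field
sapply(sample_df, class)  # array-valued fields such as des_facet become list columns
```

List columns like `des_facet` cannot be printed or written to CSV as plain values, which is one reason the table later in this report selects only scalar columns.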

The JSON returned by the API call has been parsed, and its results element is now stored in an R data frame.

dim(nyt_df)

## [1] 20 22

colnames(nyt_df)

##  [1] "uri"            "url"            "id"             "asset_id"      
##  [5] "source"         "published_date" "updated"        "section"       
##  [9] "subsection"     "nytdsection"    "adx_keywords"   "column"        
## [13] "byline"         "type"           "title"          "abstract"      
## [17] "des_facet"      "org_facet"      "per_facet"      "geo_facet"     
## [21] "media"          "eta_id"

The data frame holds 20 New York Times articles, the “most popular” by views over the last 30 days, with 22 fields each.

nyt_df %>%
  select(title, byline, section, published_date, url) %>%
  head(10) %>%
  kbl(caption = "Sample of Top 10 Most Popular NYT Articles (Last 30 Days)",
      col.names = c("Title", "Byline", "Section", "Date Published", "URL")) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

Sample of Top 10 Most Popular NYT Articles (Last 30 Days)
Title	Byline	Section	Date Published	URL
Coast Guard Buys Two Private Jets for Noem, Costing $172 Million	By Catie Edmondson	U.S.	2025-10-18	https://www.nytimes.com/2025/10/18/us/politics/kristi-noem-dhs-gulfstream.html
Daniel Naroditsky, Chess Grandmaster, Dies at 29	By Alexandra E. Petri	U.S.	2025-10-20	https://www.nytimes.com/2025/10/20/us/daniel-naroditsky-chess-grandmaster-dead.html
After Declining to Give Trump a Sword for King Charles, a Museum Leader Is Out	By Jennifer Schuessler and Minho Kim	Arts	2025-10-02	https://www.nytimes.com/2025/10/02/arts/design/trump-eisenhower-king-charles-sword.html
In Just 7 Brazen Minutes, Thieves Grab ‘Priceless’ Jewels From Louvre	By Catherine Porter and Aurelien Breeden	World	2025-10-19	https://www.nytimes.com/2025/10/19/world/europe/louvre-paris-robbery.html
He’s Young, Talented and Openly Religious. Is He the Savior Democrats Have Been Waiting For?	By Michelle Goldberg	Opinion	2025-10-01	https://www.nytimes.com/2025/10/01/opinion/james-talarico-religious-left.html
The Superintendent’s Bio Seemed Too Good to Be True. It Was.	By Mitch Smith, Ernesto Londoño and Dana Goldstein	U.S.	2025-10-05	https://www.nytimes.com/2025/10/05/us/des-moines-iowa-superintendent-ian-roberts-immigration-ice.html
Santos Is Released After Trump Commutes His Sentence	By Michael Gold and Grace Ashford	U.S.	2025-10-17	https://www.nytimes.com/2025/10/17/us/politics/trump-george-santos-sentence-commute.html
Diane Keaton, a Star of ‘Annie Hall’ and ‘First Wives Club,’ Dies at 79	By Anita Gates	Movies	2025-10-11	https://www.nytimes.com/2025/10/11/movies/diane-keaton-dead.html
Head of the U.S. Military’s Southern Command Is Stepping Down, Officials Say	By Eric Schmitt and Tyler Pager	U.S.	2025-10-16	https://www.nytimes.com/2025/10/16/us/politics/southern-command-head-stepping-down.html
After Days of Silence, Joe Rogan Weighs In on Kimmel’s Suspension	By Julia Jacobs	Arts	2025-09-23	https://www.nytimes.com/2025/09/23/arts/joe-rogan-jimmy-kimmel.html

Visual Analysis for Fun

article_count_df <- nyt_df %>%
  filter(section != "") %>%
  count(section) %>%
  arrange(desc(n)) %>%
  slice_head(n = 10)

ggplot(data = article_count_df,
       aes(x = reorder(section, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "NYT Sections by Popular Article Count", x = "Section", y = "Number of Articles") +
  theme_minimal(base_size = 14) +
  theme(
    panel.grid.major.y = element_blank(),  # remove horizontal gridlines
    panel.grid.minor = element_blank()     # remove minor gridlines
  ) + 
  scale_y_continuous(breaks = seq(0, max(article_count_df$n), 1))

“U.S.” is the most common section among the popular articles. It is followed by a three-way tie among “World”, “Opinion”, and “Arts”.
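A bar chart can make ties hard to confirm by eye. As a sketch under stated assumptions, the counts can be ranked explicitly so that tied sections share a rank. The `toy_df` data below is made up and only stands in for `nyt_df`; the counts are not the real results.

```r
library(dplyr)

# Made-up section labels standing in for nyt_df$section
toy_df <- data.frame(
  section = c("U.S.", "U.S.", "U.S.", "World", "Arts", "World", "Arts", "")
)

ranked <- toy_df %>%
  filter(section != "") %>%            # drop articles with no section label
  count(section, name = "n") %>%
  mutate(rank = min_rank(desc(n))) %>% # tied counts receive the same rank
  arrange(rank, section)

print(ranked)
```

With `min_rank()`, sections with equal counts get the same rank, so a tie shows up directly in the table.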

# Count articles per byline (a shared byline is counted as one unit, not per author)
byline_count_df <- nyt_df %>%
  filter(byline != "") %>%
  count(byline) %>%
  arrange(desc(n)) %>%
  slice_head(n = 10)

ggplot(byline_count_df, aes(x = reorder(byline, n), y = n)) +
  geom_col(fill = "darkorange") +
  coord_flip() +
  labs(title = "Authors Appearing for the Most Popular Articles", x = "Author", y = "Number of Articles") +
  theme_minimal(base_size = 14) +
  # Integer axis breaks taken from the byline counts, not the section counts
  scale_y_continuous(breaks = seq(0, max(byline_count_df$n), 1)) +
  theme(
    plot.title = element_text(hjust = 1, face = "bold"),
    plot.margin = margin(20, 10, 10, 10)
  )
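The chart above treats each full byline string as one author, so a co-written article is credited to the combined byline. A minimal base R sketch of one way to count individual authors instead is shown below. The bylines are made up, and the split pattern assumes the "By A, B and C" shape seen in the sample table; other byline formats would need a different pattern.

```r
# Made-up bylines in the same "By A, B and C" shape as the byline field
bylines <- c(
  "By Jane Doe",
  "By Jane Doe and John Roe",
  "By John Roe, Ana Poe and Jane Doe"
)

# Drop the leading "By ", then split on commas and on the standalone word "and"
authors <- unlist(strsplit(sub("^By ", "", bylines), ",\\s*|\\s+and\\s+"))
authors <- trimws(authors)

# Tally how often each individual author appears
author_counts <- sort(table(authors), decreasing = TRUE)
print(author_counts)
```

Because the pattern requires whitespace on both sides of "and", names that merely contain those letters (for example "Alexandra") are left intact.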

# Make sure stop_words dataset is available
data("stop_words")

# Words from title and abstract
cloud_words <- nyt_df %>%
  mutate(
    text = paste(title, coalesce(abstract, ""), sep = " ") %>% tolower()
  ) %>%
  select(text) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word) %>%
  filter(!word %in% c("new", "york", "times")) %>%
  filter(nchar(word) > 2) %>%
  count(word, sort = TRUE)

# Generate word cloud
wordcloud(
  words = cloud_words$word,
  freq = cloud_words$n,
  max.words = 100,
  random.order = FALSE,
  colors = brewer.pal(8, "Dark2")
)

title(main = "Most Common Words in Popular NYT Titles and Abstracts", cex.main = 1.5)

Conclusion

Retrieving data through an API call and using it in R was easier than I initially expected. Among the most popular articles, “U.S.” stories outnumber every other section by more than I had realized. President Trump is also referenced very frequently in these articles.

DATA 607 Assignment Week 9 - Web APIs

Catherine Dube
