The New York Times website provides a rich set of APIs, described at https://developer.nytimes.com/apis. In this assignment, I use an API key with one of the New York Times APIs, build an interface in R to read the JSON data, and transform it into an R data frame.
The “Most Popular” API interests me the most: https://developer.nytimes.com/docs/most-popular-product/1/overview.
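library(httr)         # GET(), content()
library(jsonlite)     # fromJSON()
library(dplyr)        # %>%, select(), filter(), count(), arrange(), slice_head()
library(ggplot2)      # plotting
library(kableExtra)   # kbl(), kable_styling()
library(tidytext)     # unnest_tokens(), stop_words
library(wordcloud)    # wordcloud()
library(RColorBrewer) # brewer.pal()
# Read the API key from an environment variable so it is not hard-coded
api_key <- Sys.getenv("NYT_API_KEY")
# API endpoint: https://api.nytimes.com/svc/mostpopular/v2/viewed/{period}.json?api-key=yourkey
# {period} is the number of days to look back; 30 is used here
url <- paste0("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=", api_key)
response <- GET(url)
# Check for successful response
# 200 = ok; 404 = not found
if (response$status_code == 200) {
  data_raw <- content(response, as = "text", encoding = "UTF-8")
  data_json <- fromJSON(data_raw, flatten = TRUE)
  # Extract article results into a data frame
  nyt_df <- as.data.frame(data_json$results)
} else {
  stop("API request failed with status code: ", response$status_code)
}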
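The request above hard-codes the 30-day period. As a minimal sketch, the URL construction could be wrapped in a small helper that validates its inputs before any network call is made. The function name `build_viewed_url` is hypothetical (it is not part of the original analysis), and the set of accepted periods (1, 7, or 30 days) is taken from my reading of the Most Popular API documentation.

```r
# Hypothetical helper: builds the request URL for the "viewed" endpoint.
# It only assembles a string; it does not contact the API.
build_viewed_url <- function(period, api_key) {
  # Assumption: the API accepts look-back periods of 1, 7, or 30 days.
  if (!period %in% c(1, 7, 30)) {
    stop("period must be 1, 7, or 30, not: ", period)
  }
  # Sys.getenv() returns "" when the variable is unset, so catch that early.
  if (!nzchar(api_key)) {
    stop("API key is empty; set the NYT_API_KEY environment variable.")
  }
  sprintf(
    "https://api.nytimes.com/svc/mostpopular/v2/viewed/%d.json?api-key=%s",
    as.integer(period),
    api_key
  )
}

# Example with a placeholder key (not a real credential)
url_7 <- build_viewed_url(7, "demo-key")
```

With a helper like this, switching to the 1-day or 7-day list would only require changing the first argument.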
The JSON body returned by the API call has been parsed, and the article results are now stored in the R data frame nyt_df.
dim(nyt_df)
## [1] 20 22
colnames(nyt_df)
## [1] "uri" "url" "id" "asset_id"
## [5] "source" "published_date" "updated" "section"
## [9] "subsection" "nytdsection" "adx_keywords" "column"
## [13] "byline" "type" "title" "abstract"
## [17] "des_facet" "org_facet" "per_facet" "geo_facet"
## [21] "media" "eta_id"
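Some of these columns (for example `des_facet` or `media`) likely hold arrays in the JSON, which `fromJSON()` keeps as list columns rather than plain vectors. The sketch below uses a small made-up JSON string, not real NYT data, to show what that looks like and one way to collapse such a column into text. The sample field values and the `des_facet_text` column name are assumptions for illustration.

```r
library(jsonlite)

# Made-up sample that mimics the shape of the API response; not real NYT data.
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"id": 1, "title": "First",  "des_facet": ["Politics", "Elections"]},
    {"id": 2, "title": "Second", "des_facet": []}
  ]
}'

sample_df <- fromJSON(sample_json, flatten = TRUE)$results

# Array-valued fields of differing lengths come back as a list column,
# with one list element per row.
is_list_column <- is.list(sample_df$des_facet)

# Collapse each element into a single string so the column becomes atomic.
sample_df$des_facet_text <- vapply(
  sample_df$des_facet,
  function(x) paste(unlist(x), collapse = "; "),
  character(1)
)
```

Flattening list columns this way can be useful before exporting the data frame to a flat file such as CSV.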
The response contains the 20 New York Times articles that were most viewed over the last 30 days, one per row, with 22 fields each.
nyt_df %>%
  select(title, byline, section, published_date, url) %>%
  head(10) %>%
  kbl(caption = "Sample of Top 10 Most Popular NYT Articles (Last 30 Days)",
      col.names = c("Title", "Byline", "Section", "Date Published", "URL")) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
| Title | Byline | Section | Date Published | URL |
|---|---|---|---|---|
| Coast Guard Buys Two Private Jets for Noem, Costing $172 Million | By Catie Edmondson | U.S. | 2025-10-18 | https://www.nytimes.com/2025/10/18/us/politics/kristi-noem-dhs-gulfstream.html |
| Daniel Naroditsky, Chess Grandmaster, Dies at 29 | By Alexandra E. Petri | U.S. | 2025-10-20 | https://www.nytimes.com/2025/10/20/us/daniel-naroditsky-chess-grandmaster-dead.html |
| After Declining to Give Trump a Sword for King Charles, a Museum Leader Is Out | By Jennifer Schuessler and Minho Kim | Arts | 2025-10-02 | https://www.nytimes.com/2025/10/02/arts/design/trump-eisenhower-king-charles-sword.html |
| In Just 7 Brazen Minutes, Thieves Grab ‘Priceless’ Jewels From Louvre | By Catherine Porter and Aurelien Breeden | World | 2025-10-19 | https://www.nytimes.com/2025/10/19/world/europe/louvre-paris-robbery.html |
| He’s Young, Talented and Openly Religious. Is He the Savior Democrats Have Been Waiting For? | By Michelle Goldberg | Opinion | 2025-10-01 | https://www.nytimes.com/2025/10/01/opinion/james-talarico-religious-left.html |
| The Superintendent’s Bio Seemed Too Good to Be True. It Was. | By Mitch Smith, Ernesto Londoño and Dana Goldstein | U.S. | 2025-10-05 | https://www.nytimes.com/2025/10/05/us/des-moines-iowa-superintendent-ian-roberts-immigration-ice.html |
| Santos Is Released After Trump Commutes His Sentence | By Michael Gold and Grace Ashford | U.S. | 2025-10-17 | https://www.nytimes.com/2025/10/17/us/politics/trump-george-santos-sentence-commute.html |
| Diane Keaton, a Star of ‘Annie Hall’ and ‘First Wives Club,’ Dies at 79 | By Anita Gates | Movies | 2025-10-11 | https://www.nytimes.com/2025/10/11/movies/diane-keaton-dead.html |
| Head of the U.S. Military’s Southern Command Is Stepping Down, Officials Say | By Eric Schmitt and Tyler Pager | U.S. | 2025-10-16 | https://www.nytimes.com/2025/10/16/us/politics/southern-command-head-stepping-down.html |
| After Days of Silence, Joe Rogan Weighs In on Kimmel’s Suspension | By Julia Jacobs | Arts | 2025-09-23 | https://www.nytimes.com/2025/09/23/arts/joe-rogan-jimmy-kimmel.html |
article_count_df <- nyt_df %>%
  filter(section != "") %>%
  count(section) %>%
  arrange(desc(n)) %>%
  slice_head(n = 10)
ggplot(data = article_count_df,
       aes(x = reorder(section, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "NYT Sections by Popular Article Count", x = "Section", y = "Number of Articles") +
  theme_minimal(base_size = 14) +
  theme(
    panel.grid.major.y = element_blank(), # remove horizontal gridlines
    panel.grid.minor = element_blank()    # remove minor gridlines
  ) +
  scale_y_continuous(breaks = seq(0, max(article_count_df$n), 1))
The section with the most popular articles is “U.S.”, followed by a three-way tie among “World”, “Opinion”, and “Arts”.
# Count articles per byline; a co-authored byline is counted as one string
byline_count_df <- nyt_df %>%
  filter(byline != "") %>%
  count(byline) %>%
  arrange(desc(n)) %>%
  slice_head(n = 10)
ggplot(data = byline_count_df,
       aes(x = reorder(byline, n), y = n)) +
  geom_col(fill = "darkorange") +
  coord_flip() +
  labs(title = "Authors with the Most Popular Articles", x = "Author", y = "Number of Articles") +
  theme_minimal(base_size = 14) +
  # Integer axis breaks based on the byline counts
  scale_y_continuous(breaks = seq(0, max(byline_count_df$n), 1)) +
  theme(
    plot.title = element_text(hjust = 1, face = "bold"),
    plot.margin = margin(20, 10, 10, 10)
  )
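The chart above counts each byline string as a whole, so a co-authored piece such as “By Jennifer Schuessler and Minho Kim” is treated as a single author. A minimal sketch of splitting bylines into individual names with base R follows. The helper `split_byline` is hypothetical, and it assumes bylines follow the “By A, B and C” pattern seen in the sample table; other formats may need additional handling.

```r
# Hypothetical helper: split one byline string into individual author names.
split_byline <- function(byline) {
  # Drop the leading "By ", then split on commas and on the word "and".
  cleaned <- sub("^By\\s+", "", byline)
  parts <- strsplit(cleaned, "\\s*,\\s*(and\\s+)?|\\s+and\\s+")[[1]]
  # Remove any empty pieces left over from the split.
  parts[nzchar(parts)]
}

# Bylines copied from the sample table above
bylines <- c(
  "By Catie Edmondson",
  "By Jennifer Schuessler and Minho Kim",
  "By Eric Schmitt and Tyler Pager"
)

# One entry per individual author, then a frequency table of names
authors <- unlist(lapply(bylines, split_byline))
author_counts <- sort(table(authors), decreasing = TRUE)
```

Applied to `nyt_df$byline`, the same approach would give per-author counts instead of per-byline counts.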
# Make sure stop_words dataset is available
data("stop_words")
# Words from title and abstract
cloud_words <- nyt_df %>%
  mutate(
    text = paste(title, coalesce(abstract, ""), sep = " ") %>% tolower()
  ) %>%
  select(text) %>%
  unnest_tokens(word, text) %>%
  filter(!word %in% stop_words$word) %>%
  filter(!word %in% c("new", "york", "times")) %>%
  filter(nchar(word) > 2) %>%
  count(word, sort = TRUE)
# Generate word cloud
wordcloud(
  words = cloud_words$word,
  freq = cloud_words$n,
  max.words = 100,
  random.order = FALSE,
  colors = brewer.pal(8, "Dark2")
)
title(main = "Most Common Words in Popular NYT Titles and Abstracts", cex.main = 1.5)
Retrieving data through an API call and working with it in R was easier than I initially expected. Among the most popular articles, the “U.S.” section accounts for more stories than any other section, a larger share than I had anticipated. President Trump is also referenced very frequently in the titles and abstracts of these articles.