Introduction
We will use The New York Times Most Popular API to get the
most popular articles on NYTimes.com, based on views and shares over the
last 7 days.
We will use the following libraries:
- The httr library
- The jsonlite library
- The tidyverse library
- The knitr library
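A minimal sketch of loading these libraries, assuming they are already installed from CRAN. The check-then-load pattern is only an illustration; plain library() calls for each package work just as well.

```r
# Packages listed above; tidyverse supplies dplyr, tidyr, stringr and ggplot2.
pkgs <- c("httr", "jsonlite", "tidyverse", "knitr")

# requireNamespace() returns FALSE (instead of erroring) when a package is missing.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  # Attach each package, equivalent to library(httr), library(jsonlite), ...
  for (p in pkgs) library(p, character.only = TRUE)
} else {
  message("Missing packages: ", paste(pkgs[!installed], collapse = ", "))
}
```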
Accessing New York Times APIs requires an API key. For security
reasons, I will not write out my API key explicitly; instead, I read it
from an environment variable.
#saving my API Key
nyt_key <- Sys.getenv("NYT_KEY")
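Sys.getenv() returns an empty string when the variable is not set, so a request could go out with no key at all. A small guard can catch that early. This is only a sketch: the helper name get_api_key and the demo variable NYT_KEY_DEMO are hypothetical, and it assumes the real key is stored in NYT_KEY (for example in ~/.Renviron).

```r
# Hypothetical helper: read an API key from an environment variable and
# fail with a clear message if it is missing.
get_api_key <- function(var = "NYT_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# Demo with a placeholder variable so the sketch runs without a real key.
Sys.setenv(NYT_KEY_DEMO = "not-a-real-key")
demo_key <- get_api_key("NYT_KEY_DEMO")
```

In the analysis itself, the call would simply be nyt_key <- get_api_key().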
The NYT Most Popular API
We first need to call NYT’s Most Popular API.
The end of the request path, viewed/7.json, specifies that we are
interested in the most viewed articles and that the period we’re
interested in is the last 7 days.
b_url <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json"
url <- paste0(b_url, "?api-key=", nyt_key)
#sending the request via httr's GET() function
response <- GET(url)
#extracting our response and saving the raw JSON text
raw_data <- content(response, as = "text", encoding = "UTF-8")
#using jsonlite to parse our raw JSON data
json_data <- fromJSON(raw_data, flatten = TRUE)
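The request above assumes the call succeeds. Below is a hedged sketch of the same fetch-and-parse steps with a status check added. The function names most_popular_url and fetch_most_popular are hypothetical, and passing the key through the query argument of GET() is an alternative to pasting it onto the URL, not the approach used in this post.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: build a Most Popular endpoint such as
# ".../viewed/7.json" or ".../shared/7/facebook.json".
most_popular_url <- function(kind, period = 7, share_type = NULL) {
  base <- "https://api.nytimes.com/svc/mostpopular/v2"
  path <- paste(c(kind, period, share_type), collapse = "/")
  paste0(base, "/", path, ".json")
}

# Hypothetical helper: send the request, stop on an HTTP error,
# and return the parsed `results` data frame.
fetch_most_popular <- function(url, key) {
  response <- GET(url, query = list(`api-key` = key))
  stop_for_status(response)  # raises an R error for 4xx/5xx responses
  raw <- content(response, as = "text", encoding = "UTF-8")
  fromJSON(raw, flatten = TRUE)$results
}

viewed_url <- most_popular_url("viewed", 7)
shared_url <- most_popular_url("shared", 7, "facebook")
# most_popular <- fetch_most_popular(viewed_url, nyt_key)  # needs a valid key
```

Stopping on a failed status means an invalid key or a rate-limit response surfaces as an error at the request, rather than as a confusing failure later when parsing.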
Let’s save our results as a data frame and take a look at it.
most_popular <- json_data$results
glimpse(most_popular)
## Rows: 20
## Columns: 22
## $ uri <chr> "nyt://article/156d7a30-6af3-5314-9aab-9975079794e8", "…
## $ url <chr> "https://www.nytimes.com/2025/10/26/world/europe/louvre…
## $ id <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ asset_id <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ source <chr> "New York Times", "New York Times", "New York Times", "…
## $ published_date <chr> "2025-10-26", "2025-10-24", "2025-10-20", "2025-10-23",…
## $ updated <chr> "2025-10-26 15:08:29", "2025-10-25 23:01:00", "2025-10-…
## $ section <chr> "World", "Arts", "U.S.", "Style", "World", "Magazine", …
## $ subsection <chr> "Europe", "", "Politics", "", "Europe", "", "Europe", "…
## $ nytdsection <chr> "world", "arts", "u.s.", "style", "world", "magazine", …
## $ adx_keywords <chr> "Art;Museums;Robberies and Thefts;Jewels and Jewelry;in…
## $ column <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ byline <chr> "By Aurelien Breeden", "By Zachary Small", "By Alan Bli…
## $ type <chr> "Article", "Article", "Article", "Article", "Article", …
## $ title <chr> "Police Make Arrests in Louvre Robbery, Authorities Say…
## $ abstract <chr> "Thieves stole over $100 million in jewelry from the Pa…
## $ des_facet <list> <"Art", "Museums", "Robberies and Thefts", "Jewels and…
## $ org_facet <list> "Louvre Museum", <"Microsoft Corp", "Sony Corporation"…
## $ per_facet <list> <>, <>, "Trump, Donald J", <>, <>, <>, <>, <"Davidson,…
## $ geo_facet <list> "France", <>, <>, "Paris (France)", <"France", "Paris …
## $ media <list> [<data.frame[1 x 6]>], [<data.frame[1 x 6]>], [<data.f…
## $ eta_id <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
While we now have our data in a data frame, it’s a bit messy and
includes information we may not need for the analysis we wish to
perform.
In this case, I want to look more closely at the title, published and
updated dates, section, subsection, byline and abstract of each
article.
most_pop_nyt <- most_popular %>%
  select(title, byline, published_date, updated, section, subsection, abstract)
glimpse(most_pop_nyt)
## Rows: 20
## Columns: 7
## $ title <chr> "Police Make Arrests in Louvre Robbery, Authorities Say…
## $ byline <chr> "By Aurelien Breeden", "By Zachary Small", "By Alan Bli…
## $ published_date <chr> "2025-10-26", "2025-10-24", "2025-10-20", "2025-10-23",…
## $ updated <chr> "2025-10-26 15:08:29", "2025-10-25 23:01:00", "2025-10-…
## $ section <chr> "World", "Arts", "U.S.", "Style", "World", "Magazine", …
## $ subsection <chr> "Europe", "", "Politics", "", "Europe", "", "Europe", "…
## $ abstract <chr> "Thieves stole over $100 million in jewelry from the Pa…
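Both date columns come back as character strings. If the dates are to be sorted or compared, they can be converted first. Here is a minimal base R sketch using the first two rows shown above. The data frame demo, the new column days_to_update, and the choice of UTC are assumptions, since the output shown here does not state a time zone.

```r
# First two rows of the date columns from the glimpse() output above.
demo <- data.frame(
  published_date = c("2025-10-26", "2025-10-24"),
  updated = c("2025-10-26 15:08:29", "2025-10-25 23:01:00"),
  stringsAsFactors = FALSE
)

# Convert character columns to Date / POSIXct (the time zone is an assumption).
demo$published_date <- as.Date(demo$published_date)
demo$updated <- as.POSIXct(demo$updated, tz = "UTC",
                           format = "%Y-%m-%d %H:%M:%S")

# Whole days between publication and the last update.
demo$days_to_update <- as.numeric(as.Date(demo$updated, tz = "UTC") -
                                    demo$published_date)
```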
Now that our data frame contains only the information we’re
interested in, we can visualize the most popular articles
of the last 7 days by section.
ggplot(most_pop_nyt, aes(x = section, fill = section)) +
  #geom_bar() counts the rows in each section, so stat = "count" is not needed
  geom_bar() +
  labs(
    title = "Most Viewed New York Times Articles",
    subtitle = "By Section (Top 20 Articles from the last 7 Days)",
    x = "NYT Article Section",
    y = "Count"
  ) +
  theme_minimal()

Over the last week, more of the most viewed NYTimes.com articles
came from the U.S. section than from any other, followed by the World
section.
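The counts behind the plot can also be read off directly. Here is a small base R sketch using only the first six section values visible in the glimpse() output above, so the tallies are illustrative and not the full top 20.

```r
# First six `section` values from the glimpse() output; the real column has 20.
section <- c("World", "Arts", "U.S.", "Style", "World", "Magazine")

# table() tallies each section; sort() puts the most frequent first.
section_counts <- sort(table(section), decreasing = TRUE)
print(section_counts)
```

On the full data frame, the equivalent would be sort(table(most_pop_nyt$section), decreasing = TRUE) or, with dplyr, count(most_pop_nyt, section, sort = TRUE).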
NYT Articles Most Shared on Facebook
It was interesting looking through the most viewed articles of the
last seven days, so let’s also take a look at the most shared articles
on Facebook over the same period.
We will follow the same steps as before to get the data. However, the
URL ends differently, as we need to specify that we want the articles
most shared on Facebook:
b_url2 <- "https://api.nytimes.com/svc/mostpopular/v2/shared/7/facebook.json"
url2 <- paste0(b_url2, "?api-key=", nyt_key)
response2 <- GET(url2)
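#extracting the Facebook response (response2, not the earlier response)
raw_data_fb <- content(response2, as = "text", encoding = "UTF-8")
#parsing the Facebook JSON text (raw_data_fb, not the earlier raw_data)
fb_json_data <- fromJSON(raw_data_fb, flatten = TRUE)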
colnames(fb_json_data$results)
## [1] "uri" "url" "id" "asset_id"
## [5] "source" "published_date" "updated" "section"
## [9] "subsection" "nytdsection" "adx_keywords" "column"
## [13] "byline" "type" "title" "abstract"
## [17] "des_facet" "org_facet" "per_facet" "geo_facet"
## [21] "media" "eta_id"
fb_shared <- fb_json_data$results
glimpse(fb_shared)
## Rows: 20
## Columns: 22
## $ uri <chr> "nyt://article/156d7a30-6af3-5314-9aab-9975079794e8", "…
## $ url <chr> "https://www.nytimes.com/2025/10/26/world/europe/louvre…
## $ id <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ asset_id <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ source <chr> "New York Times", "New York Times", "New York Times", "…
## $ published_date <chr> "2025-10-26", "2025-10-24", "2025-10-20", "2025-10-23",…
## $ updated <chr> "2025-10-26 15:08:29", "2025-10-25 23:01:00", "2025-10-…
## $ section <chr> "World", "Arts", "U.S.", "Style", "World", "Magazine", …
## $ subsection <chr> "Europe", "", "Politics", "", "Europe", "", "Europe", "…
## $ nytdsection <chr> "world", "arts", "u.s.", "style", "world", "magazine", …
## $ adx_keywords <chr> "Art;Museums;Robberies and Thefts;Jewels and Jewelry;in…
## $ column <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ byline <chr> "By Aurelien Breeden", "By Zachary Small", "By Alan Bli…
## $ type <chr> "Article", "Article", "Article", "Article", "Article", …
## $ title <chr> "Police Make Arrests in Louvre Robbery, Authorities Say…
## $ abstract <chr> "Thieves stole over $100 million in jewelry from the Pa…
## $ des_facet <list> <"Art", "Museums", "Robberies and Thefts", "Jewels and…
## $ org_facet <list> "Louvre Museum", <"Microsoft Corp", "Sony Corporation"…
## $ per_facet <list> <>, <>, "Trump, Donald J", <>, <>, <>, <>, <"Davidson,…
## $ geo_facet <list> "France", <>, <>, "Paris (France)", <"France", "Paris …
## $ media <list> [<data.frame[1 x 6]>], [<data.frame[1 x 6]>], [<data.f…
## $ eta_id <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
Most Shared FB Articles by Section
Again, let’s visualize the articles by section, but this time for the
most shared articles on Facebook over the last 7 days.
ggplot(fb_shared, aes(x = section, fill = section)) +
  #geom_bar() counts the rows in each section, so stat = "count" is not needed
  geom_bar() +
  labs(
    title = "Most Shared New York Times Articles on Facebook",
    subtitle = "By Section (Top 20 Articles from the last 7 Days)",
    x = "NYT Article Section",
    y = "Count"
  ) +
  theme_minimal()

Over the last week, more of the most shared articles on Facebook
came from the U.S. section than from any other, followed by the World
section. This is similar to the results we saw for the most viewed
articles over the last 7 days.
The Top Keywords Used in the Most Shared FB Articles
Apart from the number of most popular articles that come from each
NYT section, I am also interested in the top keywords used in the most
shared articles over Facebook. Let’s take a look at the five most used
keywords in each section.
To do this, I need a new data frame in which each keyword has its own
row for each article it appears in, and I want the keywords
grouped by section. Our original data holds the keywords for
each article as a single semicolon-separated string, so we have to
split that string.
most_fb_shared <- fb_shared %>%
  select(title, byline, published_date, updated, section, subsection,
         adx_keywords, abstract) %>%
  rename(keywords = adx_keywords)
fb_keywords2 <- most_fb_shared %>%
  filter(!is.na(keywords)) %>%
  group_by(section) %>%
  mutate(keywords = str_split(keywords, ";")) %>%
  unnest(keywords)
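The split-and-unnest step can be hard to picture, so here is a base R sketch of the same reshaping on two illustrative rows. The first keyword string is only the visible start of the first adx_keywords value shown earlier (the rest is truncated in the output), and the second uses Arts keywords from the results table further down, in an assumed order. The names articles, pieces and long are hypothetical.

```r
# Two illustrative rows: a truncated World keyword string and an Arts string
# whose keyword order is an assumption.
articles <- data.frame(
  section = c("World", "Arts"),
  keywords = c("Art;Museums;Robberies and Thefts;Jewels and Jewelry",
               "Computer and Video Games;Microsoft Corp;Sony Corporation"),
  stringsAsFactors = FALSE
)

# strsplit() turns each string into a character vector of keywords.
pieces <- strsplit(articles$keywords, ";", fixed = TRUE)

# Repeat each section once per keyword, then flatten to one row per keyword.
long <- data.frame(
  section = rep(articles$section, lengths(pieces)),
  keywords = unlist(pieces),
  stringsAsFactors = FALSE
)
```

Here long has seven rows, four for World and three for Arts, which is the shape that str_split() followed by unnest() produces in the pipeline above.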
Now that our keywords are organized by the section they appear in,
let’s count how often each appears per section and print the 5 most used
keywords per section in the articles most shared on Facebook over the
last 7 days.
fb_keyword_count3 <- fb_keywords2 %>%
  count(keywords, sort = TRUE) %>%
  slice_head(n = 5) %>%
  arrange(section, desc(n)) %>%
  kable(
    caption = "Top 5 Keywords from NYT Articles by Section (Over The Last 7 Days)",
    col.names = c("Section", "Keyword", "Count"),
    align = c("l", "l", "r")
  )
fb_keyword_count3
Top 5 Keywords from NYT Articles by Section (Over The Last 7 Days)
| Section | Keyword | Count |
|:---|:---|---:|
| Arts | Computer and Video Games | 1 |
| Arts | Microsoft Corp | 1 |
| Arts | PlayStation (Video Game System) | 1 |
| Arts | Sony Corporation | 1 |
| Arts | Xbox (Video Game System) | 1 |
| Magazine | Doctors | 1 |
| Magazine | Drugs (Pharmaceuticals) | 1 |
| Magazine | Health Insurance and Managed Care | 1 |
| Magazine | Hormones | 1 |
| Magazine | Menopause | 1 |
| New York | 432 Park Avenue (Manhattan, NY, Apartments) | 1 |
| New York | Accidents and Safety | 1 |
| New York | Billionaires’ Row (Manhattan, NY) | 1 |
| New York | Buildings (Structures) | 1 |
| New York | Buildings Department (NYC) | 1 |
| Opinion | Delaware County (Pa) | 1 |
| Opinion | Demonstrations, Protests and Riots | 1 |
| Opinion | Feces | 1 |
| Opinion | Federal-State Relations (US) | 1 |
| Opinion | Harjo, Sterlin | 1 |
| Style | Artificial Intelligence | 1 |
| Style | Associated Press | 1 |
| Style | Davidson, Pete (1993- ) | 1 |
| Style | Fashion and Apparel | 1 |
| Style | Ferries | 1 |
| U.S. | Trump, Donald J | 7 |
| U.S. | United States Politics and Government | 7 |
| U.S. | internal-open-access-from-nl | 4 |
| U.S. | Politics and Government | 2 |
| U.S. | Republican Party | 2 |
| World | internal-open-access-from-nl | 4 |
| World | Jewels and Jewelry | 3 |
| World | Louvre Museum | 3 |
| World | Museums | 3 |
| World | Robberies and Thefts | 3 |
As we can see, no keyword appears more than once among the Arts,
Magazine, New York, Opinion and Style articles, whereas several
keywords recur in the U.S. and World sections.
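One detail in the table: internal-open-access-from-nl looks like an internal tag rather than a subject keyword. That reading is an assumption, since the API output shown here does not say what the tag means. If such tags are unwanted, they could be dropped before counting. Here is a base R sketch on the World rows from the table above; the names world and subject_only are hypothetical.

```r
# World keywords and counts copied from the table above.
world <- data.frame(
  keywords = c("internal-open-access-from-nl", "Jewels and Jewelry",
               "Louvre Museum", "Museums", "Robberies and Thefts"),
  n = c(4, 3, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Keep only rows whose keyword does not start with "internal-".
subject_only <- world[!grepl("^internal-", world$keywords), ]
```

In the dplyr pipeline, the same idea would be filter(!str_detect(keywords, "^internal-")) placed after unnest(keywords).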
New York Times Most Popular API Limitations
One limitation of The New York Times Most Popular API that I found
frustrating is that each call returns only the top 20 articles,
whether most viewed, most emailed, or most shared on
Facebook. In the future, I would like to explore the Top Stories API and
the Article Search API, as those return more results, and compare
them to the results from the Most Popular API.