Data 607 Assignment 9

Introduction

We will be using The New York Times Most Popular API to get the most popular articles on NYTimes.com based on views and shares over the last 7 days.

We will use the following libraries:

The httr library
The jsonlite library
The tidyverse library
The knitr library

To access New York Times APIs I needed an API key. For security reasons, I will not write out my API key explicitly; instead it is read from an environment variable.

#saving my API Key

nyt_key <- Sys.getenv("NYT_KEY")
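Sys.getenv() returns an empty string when the variable is not set, so a missing key would only show up later as a failed request. A minimal sketch of an early check follows; get_required_env is a hypothetical helper name used here for illustration, not a function from any package.

```r
# Hypothetical helper: read a required environment variable or fail early.
get_required_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    # Stop with a readable message instead of sending an empty key to the API
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Intended use in this report (left commented so the sketch runs without a key):
# nyt_key <- get_required_env("NYT_KEY")
```

One common way to set the variable is a line such as NYT_KEY=your-key in the user's .Renviron file, which R reads at startup.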

The NYT Most Popular API

We first need to call NYT’s Most Popular API.

The end of our call, viewed/7.json, specifies that we are interested in the most viewed articles and that the period we’re interested in is the last 7 days.

b_url <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json"
url <- paste0(b_url, "?api-key=", nyt_key)

  
#sending the request via httr's GET() function
response <- GET(url)

#extracting our response and saving the raw JSON text
raw_data <- content(response, as = "text")

#using jsonlite to parse our raw JSON data
json_data <- fromJSON(raw_data, flatten = TRUE)
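The three steps above (request, extract, parse) assume the request succeeded. As a hedged sketch, they could be wrapped in one function that stops on an HTTP error before parsing; fetch_json is a hypothetical name, and the function is only defined here, not called, so no request is sent.

```r
# Hypothetical wrapper around the request/extract/parse steps.
# Requires the httr and jsonlite packages when it is actually called.
fetch_json <- function(url) {
  response <- httr::GET(url)

  # Raise an R error for 4xx/5xx responses (e.g. an invalid API key)
  httr::stop_for_status(response)

  # Extract the body as text; stating the encoding avoids httr guessing it
  raw_text <- httr::content(response, as = "text", encoding = "UTF-8")

  # Parse the JSON text, flattening nested objects into columns
  jsonlite::fromJSON(raw_text, flatten = TRUE)
}

# Intended use (commented out because it needs a valid key and network access):
# json_data <- fetch_json(url)
```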

Let’s save our results as a data frame and take a look at it.

most_popular <- json_data$results
  
glimpse(most_popular)

## Rows: 20
## Columns: 22
## $ uri            <chr> "nyt://article/156d7a30-6af3-5314-9aab-9975079794e8", "…
## $ url            <chr> "https://www.nytimes.com/2025/10/26/world/europe/louvre…
## $ id             <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ asset_id       <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ source         <chr> "New York Times", "New York Times", "New York Times", "…
## $ published_date <chr> "2025-10-26", "2025-10-24", "2025-10-20", "2025-10-23",…
## $ updated        <chr> "2025-10-26 15:08:29", "2025-10-25 23:01:00", "2025-10-…
## $ section        <chr> "World", "Arts", "U.S.", "Style", "World", "Magazine", …
## $ subsection     <chr> "Europe", "", "Politics", "", "Europe", "", "Europe", "…
## $ nytdsection    <chr> "world", "arts", "u.s.", "style", "world", "magazine", …
## $ adx_keywords   <chr> "Art;Museums;Robberies and Thefts;Jewels and Jewelry;in…
## $ column         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ byline         <chr> "By Aurelien Breeden", "By Zachary Small", "By Alan Bli…
## $ type           <chr> "Article", "Article", "Article", "Article", "Article", …
## $ title          <chr> "Police Make Arrests in Louvre Robbery, Authorities Say…
## $ abstract       <chr> "Thieves stole over $100 million in jewelry from the Pa…
## $ des_facet      <list> <"Art", "Museums", "Robberies and Thefts", "Jewels and…
## $ org_facet      <list> "Louvre Museum", <"Microsoft Corp", "Sony Corporation"…
## $ per_facet      <list> <>, <>, "Trump, Donald J", <>, <>, <>, <>, <"Davidson,…
## $ geo_facet      <list> "France", <>, <>, "Paris (France)", <"France", "Paris …
## $ media          <list> [<data.frame[1 x 6]>], [<data.frame[1 x 6]>], [<data.f…
## $ eta_id         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

While we now have our data in a data frame, it’s a bit messy and includes information we do not need for the analysis we wish to perform.

In this case, I want to look more closely at the title, published and updated dates, section, subsection, byline and abstract of each article.

most_pop_nyt <- most_popular %>%
  select(title, byline, published_date, updated, section, subsection, abstract)

glimpse(most_pop_nyt)

## Rows: 20
## Columns: 7
## $ title          <chr> "Police Make Arrests in Louvre Robbery, Authorities Say…
## $ byline         <chr> "By Aurelien Breeden", "By Zachary Small", "By Alan Bli…
## $ published_date <chr> "2025-10-26", "2025-10-24", "2025-10-20", "2025-10-23",…
## $ updated        <chr> "2025-10-26 15:08:29", "2025-10-25 23:01:00", "2025-10-…
## $ section        <chr> "World", "Arts", "U.S.", "Style", "World", "Magazine", …
## $ subsection     <chr> "Europe", "", "Politics", "", "Europe", "", "Europe", "…
## $ abstract       <chr> "Thieves stole over $100 million in jewelry from the Pa…
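Both date columns come back as character vectors (<chr> in the output above). If they were needed for date arithmetic, they could be converted with base R, as in this small sketch on two values copied from the output. The time zone of the updated column is not stated in the output, so "UTC" below is an assumption.

```r
# Two values copied from the glimpse() output above
published_date <- c("2025-10-26", "2025-10-24")
updated        <- c("2025-10-26 15:08:29", "2025-10-25 23:01:00")

# Convert the character vectors to Date and date-time classes
published <- as.Date(published_date)
updated_at <- as.POSIXct(updated, tz = "UTC")  # tz is an assumption

# Whole days between publication and the last update
days_between <- as.numeric(as.Date(updated_at, tz = "UTC") - published)
```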

Now that our data frame contains only the information we’re interested in, we can visualize the most popular articles of the last 7 days by section.

#section is categorical, so geom_bar() counts the articles per section
ggplot(most_pop_nyt, aes(x = section, fill = section)) +
  geom_bar() +
  labs(
    title = "Most Viewed New York Times Articles",
    subtitle = "By Section (Top 20 Articles from the last 7 Days)",
    x = "NYT Article Section",
    y = "Count"
  ) +
  theme_minimal()

The U.S. section contributed the largest number of the most viewed NYTimes.com articles over the last week, followed by the World section.

NYT Articles Most Shared on Facebook

It was interesting looking through the most viewed articles of the last seven days, so let’s also take a look at the most shared articles on Facebook over the same period.

We will follow the same steps as before to get the data; however, the URL ends differently because we need to specify that we want the most shared articles on Facebook.

b_url2 <- "https://api.nytimes.com/svc/mostpopular/v2/shared/7/facebook.json"
url2 <- paste0(b_url2, "?api-key=", nyt_key)


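#sending the request for the most shared articles on Facebook
response2 <- GET(url2)

#extracting the raw JSON text from response2 (not from the earlier response)
raw_data_fb <- content(response2, as = "text")

#parsing raw_data_fb (not raw_data, which holds the most viewed articles)
fb_json_data <- fromJSON(raw_data_fb, flatten = TRUE)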
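The two requests in this report differ only in the path after /v2/, so the URL could be assembled by one small helper instead of being written out twice. The sketch below is base R only; most_popular_url and its argument names are hypothetical, and the allowed periods (1, 7 or 30 days) are my understanding of the Most Popular API rather than something shown in this report.

```r
# Hypothetical URL builder for the Most Popular API (v2).
most_popular_url <- function(kind = c("viewed", "shared", "emailed"),
                             period = 7,
                             share_type = NULL,
                             api_key) {
  kind <- match.arg(kind)
  # Assumption: the API accepts periods of 1, 7 or 30 days
  stopifnot(period %in% c(1, 7, 30))

  path <- paste0(kind, "/", period)
  if (!is.null(share_type)) {
    # A share type such as "facebook" only makes sense for the shared endpoint
    stopifnot(kind == "shared")
    path <- paste0(path, "/", share_type)
  }

  paste0("https://api.nytimes.com/svc/mostpopular/v2/", path,
         ".json?api-key=", api_key)
}

# Same URLs as in the report, with a placeholder key
viewed_url <- most_popular_url("viewed", 7, api_key = "DEMO")
fb_url     <- most_popular_url("shared", 7, share_type = "facebook",
                               api_key = "DEMO")
```

Building each URL through one function keeps the two requests from drifting apart when the code is copied and edited.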

colnames(fb_json_data$results)

##  [1] "uri"            "url"            "id"             "asset_id"      
##  [5] "source"         "published_date" "updated"        "section"       
##  [9] "subsection"     "nytdsection"    "adx_keywords"   "column"        
## [13] "byline"         "type"           "title"          "abstract"      
## [17] "des_facet"      "org_facet"      "per_facet"      "geo_facet"     
## [21] "media"          "eta_id"

fb_shared <- fb_json_data$results

glimpse(fb_shared)

## Rows: 20
## Columns: 22
## $ uri            <chr> "nyt://article/156d7a30-6af3-5314-9aab-9975079794e8", "…
## $ url            <chr> "https://www.nytimes.com/2025/10/26/world/europe/louvre…
## $ id             <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ asset_id       <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
## $ source         <chr> "New York Times", "New York Times", "New York Times", "…
## $ published_date <chr> "2025-10-26", "2025-10-24", "2025-10-20", "2025-10-23",…
## $ updated        <chr> "2025-10-26 15:08:29", "2025-10-25 23:01:00", "2025-10-…
## $ section        <chr> "World", "Arts", "U.S.", "Style", "World", "Magazine", …
## $ subsection     <chr> "Europe", "", "Politics", "", "Europe", "", "Europe", "…
## $ nytdsection    <chr> "world", "arts", "u.s.", "style", "world", "magazine", …
## $ adx_keywords   <chr> "Art;Museums;Robberies and Thefts;Jewels and Jewelry;in…
## $ column         <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ byline         <chr> "By Aurelien Breeden", "By Zachary Small", "By Alan Bli…
## $ type           <chr> "Article", "Article", "Article", "Article", "Article", …
## $ title          <chr> "Police Make Arrests in Louvre Robbery, Authorities Say…
## $ abstract       <chr> "Thieves stole over $100 million in jewelry from the Pa…
## $ des_facet      <list> <"Art", "Museums", "Robberies and Thefts", "Jewels and…
## $ org_facet      <list> "Louvre Museum", <"Microsoft Corp", "Sony Corporation"…
## $ per_facet      <list> <>, <>, "Trump, Donald J", <>, <>, <>, <>, <"Davidson,…
## $ geo_facet      <list> "France", <>, <>, "Paris (France)", <"France", "Paris …
## $ media          <list> [<data.frame[1 x 6]>], [<data.frame[1 x 6]>], [<data.f…
## $ eta_id         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

Most Shared FB Articles by Section

Again, let’s visualize the articles by section, but this time for the most shared articles on Facebook over the last 7 days.

#section is categorical, so geom_bar() counts the articles per section
ggplot(fb_shared, aes(x = section, fill = section)) +
  geom_bar() +
  labs(
    title = "Most Shared New York Times Articles on Facebook",
    subtitle = "By Section (Top 20 Articles from the last 7 Days)",
    x = "NYT Article Section",
    y = "Count"
  ) +
  theme_minimal()

The U.S. section contributed the largest number of the most shared articles on Facebook over the last week, followed by the World section. This is similar to the results we saw for the most viewed articles over the last 7 days.

The Top Keywords Used in the Most Shared FB Articles

Apart from the number of most popular articles that come from each NYT section, I am also interested in the top keywords used in the most shared articles over Facebook. Let’s take a look at the five most used keywords in each section.

To do this I need a new data frame in which each keyword has its own row for each article it appears in, and I want the keywords grouped by section. Our original data holds each article’s keywords as a single semicolon-separated string, so we have to split that string.

most_fb_shared <- fb_shared %>%
  select(title, byline, published_date, updated, section, subsection, 
         adx_keywords, abstract) %>%
  rename(keywords = adx_keywords)


fb_keywords2 <- most_fb_shared %>%
  filter(!is.na(keywords)) %>%
  group_by(section)%>%
  mutate(keywords = str_split(keywords,";" )) %>%
  unnest(keywords)
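To make the reshaping concrete, here is the same idea on a two-row toy data frame. This sketch uses base R (strsplit() and rep()) rather than the str_split() and unnest() calls above, purely so it runs without extra packages; the toy values are shortened keyword strings in the style of the data.

```r
# Toy input: one row per article, keywords packed into one string
toy <- data.frame(
  section  = c("World", "Arts"),
  keywords = c("Museums;Louvre Museum", "Sony Corporation"),
  stringsAsFactors = FALSE
)

# Split each string on ";" to get a list with one character vector per article
parts <- strsplit(toy$keywords, ";", fixed = TRUE)

# Repeat each section once per keyword and flatten the list: one row per keyword
toy_long <- data.frame(
  section = rep(toy$section, lengths(parts)),
  keyword = trimws(unlist(parts)),
  stringsAsFactors = FALSE
)
```

The two articles become three rows, one for each keyword, with the section repeated alongside.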

Now that our keywords are organized by the section they appear in, let’s count how often each one appears per section and print the 5 most used keywords per section in the articles shared on Facebook over the last 7 days.

fb_keyword_count3 <- fb_keywords2 %>%
  count(keywords, sort = TRUE) %>%
  slice_head(n = 5) %>%
  arrange(section, desc(n)) %>%
  kable(
    caption = "Top 5 Keywords from NYT Articles by Section (Over The Last 7 Days) ",
    col.names = c("Section", "Keyword", "Count"),
    align = c("l", "l", "r")
  )


fb_keyword_count3

Top 5 Keywords from NYT Articles by Section (Over The Last 7 Days)
Section	Keyword	Count
Arts	Computer and Video Games	1
Arts	Microsoft Corp	1
Arts	PlayStation (Video Game System)	1
Arts	Sony Corporation	1
Arts	Xbox (Video Game System)	1
Magazine	Doctors	1
Magazine	Drugs (Pharmaceuticals)	1
Magazine	Health Insurance and Managed Care	1
Magazine	Hormones	1
Magazine	Menopause	1
New York	432 Park Avenue (Manhattan, NY, Apartments)	1
New York	Accidents and Safety	1
New York	Billionaires’ Row (Manhattan, NY)	1
New York	Buildings (Structures)	1
New York	Buildings Department (NYC)	1
Opinion	Delaware County (Pa)	1
Opinion	Demonstrations, Protests and Riots	1
Opinion	Feces	1
Opinion	Federal-State Relations (US)	1
Opinion	Harjo, Sterlin	1
Style	Artificial Intelligence	1
Style	Associated Press	1
Style	Davidson, Pete (1993- )	1
Style	Fashion and Apparel	1
Style	Ferries	1
U.S.	Trump, Donald J	7
U.S.	United States Politics and Government	7
U.S.	internal-open-access-from-nl	4
U.S.	Politics and Government	2
U.S.	Republican Party	2
World	internal-open-access-from-nl	4
World	Jewels and Jewelry	3
World	Louvre Museum	3
World	Museums	3
World	Robberies and Thefts	3

As we can see, the Arts, Magazine, New York, Opinion and Style sections had no keyword used more than once in the last 7 days, while the U.S. and World sections did.
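The table also lists internal-open-access-from-nl for the U.S. and World sections, which reads like a bookkeeping tag rather than a subject keyword. Assuming that tags beginning with "internal-" are not real keywords (a guess based only on this table), they could be dropped before counting. drop_internal_tags is a hypothetical helper name.

```r
# Assumption: values starting with "internal-" are tags, not subject keywords.
drop_internal_tags <- function(keywords) {
  keywords[!startsWith(keywords, "internal-")]
}

# Example with values taken from the table above
example_keywords <- c("Louvre Museum", "internal-open-access-from-nl", "Museums")
kept_keywords <- drop_internal_tags(example_keywords)
```

In the pipeline above, the equivalent step would be a filter() on the keywords column placed after unnest() and before count().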

New York Times Most Popular API Limitations

A limitation of The New York Times Most Popular API that I found frustrating is that it returns only the top 20 articles in each category (most viewed, most emailed, or most shared on Facebook). In the future I would like to explore the Top Stories API and the Article Search API, as those return more results, and compare their results with those from the Most Popular API.