Week 9 Assignment: NYT API

Author

Emily El Mouaquite

Approach

Do multimedia elements (e.g. images & videos) increase the probability of an article being highly viewed?

To answer this question, I use the New York Times Most Popular API to collect data on the most viewed articles of the last 30 days. The information returned by the API is organized into a data frame, which is then analyzed through a summary of how many of the most viewed articles include media. A comparison of the view counts of articles with media to those without would be a natural next step, but the API returns a list of the most viewed articles rather than per-article view counts, so that comparison is not part of the code below.

Code Base

#load libraries
library(httr)
library(jsonlite)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

First, the NYT Most Popular API needs to be authenticated and accessed.

#read the API key from an environment variable (set NYT_API_KEY to your own key locally)
key <- Sys.getenv("NYT_API_KEY")
#build the request URL for the most viewed articles from the last 30 days by combining endpoint & key
url <- paste0("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=", key)
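
As a variation on the URL construction above, the sketch below wraps it in a small helper that fails early when the key is missing and lets `httr::modify_url()` handle the query string. The helper name `build_most_viewed_url` and the placeholder key `"DEMO_KEY"` are assumptions made for illustration; they are not part of the original analysis.

```r
library(httr)

# Hypothetical helper: builds the Most Popular "viewed" URL for a given period
build_most_viewed_url <- function(key, period = 30) {
  # Sys.getenv() returns "" when the variable is unset, so check for that
  if (!nzchar(key)) stop("NYT_API_KEY is not set")
  # The endpoint accepts periods of 1, 7, or 30 days
  stopifnot(period %in% c(1, 7, 30))
  modify_url(
    sprintf("https://api.nytimes.com/svc/mostpopular/v2/viewed/%d.json",
            as.integer(period)),
    query = list(`api-key` = key)
  )
}

# Placeholder key so the sketch runs without credentials
example_url <- build_most_viewed_url("DEMO_KEY")
```

Letting `modify_url()` append the query means any characters in the key that need escaping are encoded automatically.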

Then, the most viewed articles from the last 30 days can be requested as JSON.

#send the request and parse the JSON response
request <- GET(url)
#stop with an informative error if the API returns an HTTP error (e.g. an invalid key)
stop_for_status(request)
articles <- content(request, as = "text", encoding = "UTF-8")
articles_parsed <- fromJSON(articles)
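
To illustrate what the parsing step produces, here is a minimal sketch that runs `fromJSON()` on a small inline string instead of a live response. The JSON below is a hypothetical, heavily trimmed stand-in for the API output (real responses carry many more fields), so it only demonstrates the general shape.

```r
library(jsonlite)

# Hypothetical sample; not actual API output
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"title": "Article A", "media": [{"type": "image", "caption": "example"}]},
    {"title": "Article B", "media": []}
  ]
}'

parsed <- fromJSON(sample_json)

# `results` is simplified to a data frame with one row per article
str(parsed$results, max.level = 1)

# `media` remains a list column: a data frame where media exists,
# and an empty list where the JSON array was empty
media_lengths <- lengths(parsed$results$media)
```

This nested structure is why `media` appears as a list column once the results are converted to a tibble.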

Next, the parsed JSON is converted into a data frame:

#tibble creation
articles_df <- as_tibble(articles_parsed$results) %>%
  select(title, byline, published_date, section, url, media)
head(articles_df)
# A tibble: 6 × 6
  title                                byline published_date section url   media
  <chr>                                <chr>  <chr>          <chr>   <chr> <lis>
1 Cesar Chavez, a Civil Rights Icon, … "By M… 2026-03-18     U.S.    http… <df> 
2 How Trump and His Advisers Miscalcu… "By M… 2026-03-10     U.S.    http… <df> 
3 Georgia Teacher Is Killed After Tee… "By A… 2026-03-08     U.S.    http… <df> 
4 Kristi Noem Survived Many Crises. T… "By Z… 2026-03-06     U.S.    http… <df> 
5 Kash Patel’s Girlfriend Seeks Fame … "By E… 2026-02-28     U.S.    http… <df> 
6 Crossplay Game Review                ""     2025-10-07     The Up… http… <df> 

Media Analysis

#add a column indicating whether or not an article has any form of multimedia
articles_df <- articles_df %>%
  mutate(has_media = sapply(media, function(x) length(x) > 0))
#summarize the count of articles with media
has_media <- articles_df %>%
  group_by(has_media) %>%
  summarise(count = n())
print("Count of articles with vs without media:")
[1] "Count of articles with vs without media:"
print(has_media)
# A tibble: 2 × 2
  has_media count
  <lgl>     <int>
1 FALSE         1
2 TRUE         19
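
The `has_media` flag relies on how `length()` treats each element of the list column. The sketch below, using a toy list in place of the real `media` column (the values are hypothetical), shows that behaviour and a row-based alternative with `NROW()` that gives the same flags here.

```r
# Toy list column standing in for `media` (hypothetical values)
media <- list(
  data.frame(type = "image", caption = "example"),  # article with one media item
  list()                                            # article with no media
)

# length() of a data frame is its number of columns; an empty list has length 0
flag_by_length <- vapply(media, function(x) length(x) > 0, logical(1))

# NROW() counts media items (rows) and returns 0 for the empty list
has_media <- vapply(media, function(x) NROW(x) > 0, logical(1))

# Share of articles with media in the toy data
share_with_media <- mean(has_media)
```

`vapply()` with `logical(1)` guarantees a logical vector, whereas `sapply()` can change its return type depending on the input.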

Conclusion

Of the 20 most viewed articles from the last 30 days, 19 (95%) include some sort of multimedia element, while only one does not. This is consistent with a relationship between media and the views that an article gets, and it suggests that including videos or images may be related to attracting more readers. However, because the sample contains only highly viewed articles, it cannot establish that relationship on its own: if nearly all NYT articles include media, the same split would appear regardless of popularity. To extend or verify this work, one could create an accompanying visualization of the split. One might also build upon it by looking at a wider variety of articles, including ones that are not among the most viewed. Given view counts or another popularity measure for each article, linear regression could also be used to formally test the relationship between multimedia elements and article views.
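
As one possible starting point for the visualization mentioned above, the following sketch draws a bar chart of the split. The counts are copied by hand from the summary table in the Media Analysis section; the object names `media_counts` and `media_plot` are assumptions made for illustration.

```r
library(ggplot2)

# Counts copied from the summary table (1 article without media, 19 with media)
media_counts <- data.frame(
  has_media = c(FALSE, TRUE),
  count = c(1L, 19L)
)

# Bar chart of the number of most viewed articles with and without media
media_plot <- ggplot(media_counts, aes(x = has_media, y = count)) +
  geom_col() +
  labs(
    x = "Article includes media",
    y = "Number of articles",
    title = "Most viewed NYT articles (last 30 days), by media presence"
  )
```

Printing `media_plot` renders the chart. In the full analysis, the `has_media` summary tibble could be passed to `ggplot()` directly instead of the hand-entered counts.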