Do multimedia elements (e.g., images and videos) increase the probability of an article being highly viewed?
To answer this question, I will use the New York Times’ Most Popular API to collect data on the most viewed articles of the last 30 days. This involves organizing the information from the API into a data frame, which can then be analyzed through a summary of how many of the most viewed articles include media, and a comparison of the view counts of articles with media to those without media.
First, the NYT Most Popular API needs to be authenticated and accessed.
# store key in environment variable (replace with your own key locally)
key <- Sys.getenv("NYT_API_KEY")

# request API URL for the most viewed articles from the last 30 days by combining endpoint & key
url <- paste0("https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=", key)
Then, a request for a JSON with the most viewed articles from the last 30 days can be initiated.
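The request code itself is not shown here, so the following is only a minimal sketch of how this step might look. It assumes the articles sit in a top-level `results` array of the response and that the six columns in the output below are selected from it. The helper name `parse_most_popular` and the sample payload are hypothetical and used for illustration only.

```r
library(jsonlite)
library(dplyr)

# Hypothetical helper: read a Most Popular response and keep the six
# columns used in this analysis. jsonlite::fromJSON() accepts a JSON
# string, a file path, or a URL, so the same function covers all three.
# Assumption: the articles are stored in a top-level `results` array.
parse_most_popular <- function(json_source) {
  parsed <- fromJSON(json_source)
  parsed$results %>%
    tibble::as_tibble() %>%
    select(title, byline, published_date, section, url, media)
}

# Made-up payload (not real API data) so the sketch runs without a key
sample_json <- '{
  "results": [
    {"title": "Example A", "byline": "By A", "published_date": "2026-01-01",
     "section": "U.S.", "url": "https://example.com/a",
     "media": [{"type": "image"}]},
    {"title": "Example B", "byline": "", "published_date": "2026-01-02",
     "section": "World", "url": "https://example.com/b",
     "media": []}
  ]
}'
sample_df <- parse_most_popular(sample_json)
print(sample_df)

# With a valid key, the live request would be along the lines of:
# articles_df <- parse_most_popular(url)
# head(articles_df)
```

Because `media` is a nested array, it comes back as a list column: one data frame per article that has media, and an empty list for an article that has none.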
# A tibble: 6 × 6
title byline published_date section url media
<chr> <chr> <chr> <chr> <chr> <lis>
1 Cesar Chavez, a Civil Rights Icon, … "By M… 2026-03-18 U.S. http… <df>
2 How Trump and His Advisers Miscalcu… "By M… 2026-03-10 U.S. http… <df>
3 Georgia Teacher Is Killed After Tee… "By A… 2026-03-08 U.S. http… <df>
4 Kristi Noem Survived Many Crises. T… "By Z… 2026-03-06 U.S. http… <df>
5 Kash Patel’s Girlfriend Seeks Fame … "By E… 2026-02-28 U.S. http… <df>
6 Crossplay Game Review "" 2025-10-07 The Up… http… <df>
Media Analysis
# add a column representative of whether or not an article has any form of multimedia
articles_df <- articles_df %>%
  mutate(has_media = sapply(media, function(x) length(x) > 0))

# summarize the count of articles with media
has_media <- articles_df %>%
  group_by(has_media) %>%
  summarise(count = n())

print("Count of articles with vs without media:")
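To make the `length(x) > 0` check concrete, here is a small sketch on a toy list column. The data is made up, and it assumes that an article without media is represented by an empty list, which may not match every API response. `vapply()` stands in for `sapply()` here only because it guarantees a logical result.

```r
# Toy list column: one data frame of media items and one empty list
media_col <- list(
  data.frame(type = "image", stringsAsFactors = FALSE),
  list()
)

# length() of a data frame is its number of columns; an empty list has length 0
has_media_flag <- vapply(media_col, function(x) length(x) > 0, logical(1))
print(has_media_flag)
# [1]  TRUE FALSE
```

One caveat of this check: a data frame with columns but zero rows would still count as having media. Testing `NROW(x) > 0` instead would be a stricter alternative.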
Of the top 20 articles from the last 30 days, 19 include some sort of multimedia element, and only one does not. This is consistent with a relationship between multimedia and the views that an article gets, and suggests that including videos or images may help attract more readers. However, because the sample contains only the most viewed articles, it cannot show whether less-viewed articles include media at a lower rate. To extend or verify this work, one could create an accompanying visualization of these counts. One might also build upon it by looking at a wider variety of articles, including ones that are not among the most viewed. Linear regression could also be done to formally test the relationship between multimedia elements and article views.
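The visualization suggested above could be as simple as a bar chart of the two counts. The sketch below rebuilds the counts by hand from the figures reported in the text (19 with media, 1 without) rather than from the live data. The names `media_counts` and `media_plot` and the axis labels are illustrative choices.

```r
library(ggplot2)

# Counts reported in the text: 19 of the top 20 articles include media
media_counts <- data.frame(
  has_media = c("With media", "Without media"),
  count = c(19, 1)
)

# Bar chart comparing the number of articles with and without media
media_plot <- ggplot(media_counts, aes(x = has_media, y = count)) +
  geom_col() +
  labs(
    title = "Most viewed NYT articles (last 30 days) by media presence",
    x = NULL,
    y = "Number of articles"
  )

print(media_plot)
```

In the full analysis, the summarized `has_media` data frame could be passed to `ggplot()` in place of the hand-built `media_counts`.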