The objective of this assignment is to use a New York Times API to retrieve JSON data, import it into R, and transform it into a tidy data frame for analysis. The goal is to explore patterns in article engagement by examining whether articles with images appear more frequently in the New York Times Most Popular list compared to those without images.
Approach
For this assignment, I used the New York Times Most Popular API, specifically the endpoint that returns the most viewed articles over the past 7 days.
The research question for this analysis is:
Do articles with images receive more engagement than those without in the New York Times Most Popular list?
The dataset will be obtained dynamically through an API request and will include information about the most viewed articles published by The New York Times. To answer the research question, the media field will be used to determine whether an article contains images. A new variable will be created:
Has Image – A binary variable indicating whether the article includes at least one image (Yes/No)
The data retrieved from the API is in JSON format, which represents hierarchical and nested data commonly used in web services. This requires parsing and transformation before it can be analyzed in R.
Data Analysis Steps
The analysis will follow these steps:
API Authentication: The API key will be securely accessed using an environment variable (Sys.getenv) to avoid hardcoding sensitive information.
Data Retrieval: A request will be made to the NYT Most Popular API to retrieve the most viewed articles over a 7-day period.
JSON Parsing: The JSON response will be parsed into an R object using the jsonlite package.
Data Transformation: The relevant fields (title, section, published date, byline, and media) will be extracted and converted into a tidy data frame.
Feature Engineering: A new variable (Has Image) will be created by checking whether the nested media field contains data.
Data Cleaning: Nested structures will be simplified, missing media values will be treated as “No” for Has Image, and dates will be converted into a proper date format.
Exploratory Data Analysis: The number of articles with and without images will be counted and compared, the distribution across sections will be analyzed, and differences will be visualized using bar charts.
Interpretation: The results will be used to evaluate whether the presence of images is associated with higher representation in the Most Popular list, serving as a proxy for engagement.
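The parsing, transformation, and date-conversion steps above can be sketched on a small inline sample, so the logic can be checked without calling the API. The two-article payload below is hypothetical and only mimics the general shape of the response (a results array whose entries carry a nested media array); the real response has many more fields.

```r
library(jsonlite)
library(dplyr)

# Hypothetical two-article payload mimicking the shape of the API response
sample_json <- '{
  "results": [
    {"title": "Article A", "section": "U.S.", "published_date": "2026-03-26",
     "byline": "By Author One",
     "media": [{"type": "image", "caption": "A caption"}]},
    {"title": "Article B", "section": "World", "published_date": "2026-03-24",
     "byline": "By Author Two", "media": []}
  ]
}'

# Parse the JSON text into R objects; results becomes a data frame
parsed <- fromJSON(sample_json, flatten = TRUE)

# Keep the fields of interest and convert the date string to a Date
articles <- parsed$results %>%
  select(title, section, published_date, byline, media) %>%
  mutate(published_date = as.Date(published_date))

# Distribution of articles across sections
section_counts <- count(articles, section)
```

Here media stays a list-column because its entries differ in shape from article to article, which is why it needs separate handling before analysis.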
Anticipated Challenges
One of the main challenges in this assignment is working with nested JSON data returned by the API. The media field, which contains image information, is stored as a list of nested objects and requires additional processing to determine whether an article includes images.
Another challenge is that the API does not provide direct engagement metrics such as view counts. Instead, inclusion in the “Most Popular” list must be used as a proxy for engagement, which limits the ability to draw causal conclusions.
Additionally, handling missing or empty fields and converting them into meaningful variables (such as the binary image indicator) requires careful data cleaning and transformation.
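One way to make the empty-field handling explicit is a small helper that treats every "no media" shape the same way. This is a sketch only: has_media is a hypothetical name, and the shapes listed (a data frame, an empty list, NULL, NA) are assumptions about what a parsed response might contain, not confirmed behavior of the API.

```r
# Hypothetical helper: TRUE when a media entry holds at least one item
has_media <- function(m) {
  if (is.null(m)) return(FALSE)               # field absent
  if (is.data.frame(m)) return(nrow(m) > 0)   # parsed array of media objects
  length(m) > 0 && !all(is.na(m))             # empty list or explicit NA
}

# Assumed shapes a media entry might take after parsing
media_examples <- list(
  data.frame(type = "image"),  # one media item
  list(),                      # empty JSON array
  NULL,                        # field absent
  NA                           # explicit missing value
)

flags <- vapply(media_examples, has_media, logical(1))
labels <- ifelse(flags, "Yes", "No")
```

Only the first entry is labeled "Yes"; the other three collapse to "No", so no missing values reach the binary indicator.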
Implementation of Data Import
The following code demonstrates how the JSON data is retrieved from the New York Times API and prepared for analysis in R. First, all the required libraries are imported.
library(httr)
library(jsonlite)
library(dplyr)
library(ggplot2)
library(gt)
The API request retrieves the most viewed articles over the past 7 days. The API key is stored securely as an environment variable.
API Selection (Endpoint and brief description)
API Used: New York Times Most Popular API
Endpoint: /mostpopular/v2/viewed/7.json
Purpose: Retrieves articles that were most viewed over the past 7 days.
Description: This endpoint returns a list of articles with metadata such as title, section, published_date, byline, and media. It provides a snapshot of the articles currently most popular among readers, which can be used to analyze trends, engagement, and the presence of images in widely read content.
API Request (Parameters Used)
Request Method: GET
Parameters:
- 7.json – Path segment specifying retrieval of articles viewed over the last 7 days
- api-key – Query parameter that authorizes the request using a personal NYT API key stored in an environment variable
Description: The GET request to the Most Popular API retrieves a JSON response containing the top-viewed articles for the past week. The API key authorizes the request, and reading it from an environment variable keeps it out of the source code. The response includes nested fields for each article, with metadata such as title, section, published_date, byline, and media, which are later extracted and transformed into a tidy R data frame for analysis.
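A slightly more defensive version of the request might check that the key is set and that the server returned a success status before parsing. This is a sketch, not the code used in this report: build_nyt_url and fetch_most_viewed are hypothetical helper names, and the allowed period values (1, 7, 30) reflect my understanding of the API documentation rather than anything verified here.

```r
library(httr)

# Hypothetical helper: builds the request URL for a given period in days
build_nyt_url <- function(api_key, period = 7) {
  stopifnot(period %in% c(1, 7, 30))  # assumed set of valid periods
  modify_url(
    "https://api.nytimes.com",
    path = sprintf("svc/mostpopular/v2/viewed/%d.json", period),
    query = list(`api-key` = api_key)
  )
}

# Hypothetical wrapper: stops early on a missing key or an HTTP error
fetch_most_viewed <- function(period = 7) {
  api_key <- Sys.getenv("NYT_API_KEY")
  if (!nzchar(api_key)) {
    stop("NYT_API_KEY is not set; add it to .Renviron or the session environment.")
  }
  response <- GET(build_nyt_url(api_key, period))
  stop_for_status(response)  # raise an R error on 4xx/5xx responses
  content(response, as = "text", encoding = "UTF-8")
}
```

Calling fetch_most_viewed() would return the raw JSON text, which can then be passed to fromJSON as in the rest of the report.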
# Retrieve API key
api_key <- Sys.getenv("NYT_API_KEY")

# API request
url <- paste0("https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=", api_key)
response <- GET(url)

# Convert response to text
data_json <- content(response, as = "text", encoding = "UTF-8")

# Parse JSON
data <- fromJSON(data_json, flatten = TRUE)

# Extract results
articles <- data$results

# Convert to data frame
df <- articles %>%
  select(title, section, published_date, byline, media)

df |>
  head(n = 3) |>
  gt()
| title | section | published_date | byline | media |
|---|---|---|---|---|
| Trump Administration Begins Inquiries Into 3 Medical Schools in Show of Power | U.S. | 2026-03-26 | By Michael C. Bender and Alan Blinder | image, photo, Enrollment in medical programs is generally a fraction of that of undergraduate programs. At Stanford University, pictured, the incoming class this academic year had 119 students., David Madison, via Getty Images, 1, list(list(url = c("https://static01.nyt.com/images/2026/03/26/us/politics/dc-admissions-1/dc-admissions-1-thumbStandard.jpg", "https://static01.nyt.com/images/2026/03/26/us/politics/dc-admissions-1/dc-admissions-1-mediumThreeByTwo210.jpg", "https://static01.nyt.com/images/2026/03/26/us/politics/dc-admissions-1/dc-admissions-1-mediumThreeByTwo440.jpg"), format = c("Standard Thumbnail", "mediumThreeByTwo210", "mediumThreeByTwo440"), height = c(75, 140, 293), width = c(75, 210, 440))) |
| T.S.A. Tipped Off ICE Agents Before Arrests at San Francisco Airport | U.S. | 2026-03-24 | By Hamed Aleaziz and Heather Knight | image, photo, Angelina Lopez-Jimenez and her 9-year-old daughter were detained at the San Francisco International Airport and later deported to Guatemala., Jeff Chiu/Associated Press, 1, list(list(url = c("https://static01.nyt.com/images/2026/03/24/multimedia/24nat-sf-ice-placeholder-jkbp/24nat-sf-ice-placeholder-jkbp-thumbStandard.jpg", "https://static01.nyt.com/images/2026/03/24/multimedia/24nat-sf-ice-placeholder-jkbp/24nat-sf-ice-placeholder-jkbp-mediumThreeByTwo210.jpg", "https://static01.nyt.com/images/2026/03/24/multimedia/24nat-sf-ice-placeholder-jkbp/24nat-sf-ice-placeholder-jkbp-mediumThreeByTwo440.jpg"), format = c("Standard Thumbnail", "mediumThreeByTwo210", "mediumThreeByTwo440" ), height = c(75, 140, 293), width = c(75, 210, 440))) |
| Savannah Guthrie Says 2 Ransom Notes About Her Mother Were Likely Genuine | U.S. | 2026-03-26 | By Nicholas Bogel-Burroughs | |
A new data frame, df_clean, adds a Has_Image column to df. The check uses the length of each value in the ‘media’ column: a length of 0 means the article has no image, so Has_Image is ‘No’; any non-zero length means at least one image is present, so Has_Image is ‘Yes’.
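The code for this step does not appear in the rendered output, so the following is only a minimal sketch of how such a check could be written, not necessarily the code that produced the table. The two-row df here is a hypothetical stand-in for the real data, with media as a list-column that is empty when an article has no image.

```r
library(dplyr)

# Hypothetical stand-in for df: media is a list-column, empty when no image
df <- tibble(
  title = c("Article A", "Article B"),
  media = list(data.frame(type = "image", subtype = "photo"), list())
)

# lengths() returns the length of each list element: the number of columns
# for a data frame of media items, and 0 for an empty list
df_clean <- df %>%
  mutate(Has_Image = ifelse(lengths(media) > 0, "Yes", "No"))
```

Note that for a data frame entry the length counts columns rather than rows, so this check only distinguishes "some media present" from "none", which is all the binary indicator needs.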
| title | section | published_date | byline | media | Has_Image |
|---|---|---|---|---|---|
| Trump Administration Begins Inquiries Into 3 Medical Schools in Show of Power | U.S. | 2026-03-26 | By Michael C. Bender and Alan Blinder | image, photo, Enrollment in medical programs is generally a fraction of that of undergraduate programs. At Stanford University, pictured, the incoming class this academic year had 119 students., David Madison, via Getty Images, 1, list(list(url = c("https://static01.nyt.com/images/2026/03/26/us/politics/dc-admissions-1/dc-admissions-1-thumbStandard.jpg", "https://static01.nyt.com/images/2026/03/26/us/politics/dc-admissions-1/dc-admissions-1-mediumThreeByTwo210.jpg", "https://static01.nyt.com/images/2026/03/26/us/politics/dc-admissions-1/dc-admissions-1-mediumThreeByTwo440.jpg"), format = c("Standard Thumbnail", "mediumThreeByTwo210", "mediumThreeByTwo440"), height = c(75, 140, 293), width = c(75, 210, 440))) | Yes |
| T.S.A. Tipped Off ICE Agents Before Arrests at San Francisco Airport | U.S. | 2026-03-24 | By Hamed Aleaziz and Heather Knight | image, photo, Angelina Lopez-Jimenez and her 9-year-old daughter were detained at the San Francisco International Airport and later deported to Guatemala., Jeff Chiu/Associated Press, 1, list(list(url = c("https://static01.nyt.com/images/2026/03/24/multimedia/24nat-sf-ice-placeholder-jkbp/24nat-sf-ice-placeholder-jkbp-thumbStandard.jpg", "https://static01.nyt.com/images/2026/03/24/multimedia/24nat-sf-ice-placeholder-jkbp/24nat-sf-ice-placeholder-jkbp-mediumThreeByTwo210.jpg", "https://static01.nyt.com/images/2026/03/24/multimedia/24nat-sf-ice-placeholder-jkbp/24nat-sf-ice-placeholder-jkbp-mediumThreeByTwo440.jpg"), format = c("Standard Thumbnail", "mediumThreeByTwo210", "mediumThreeByTwo440" ), height = c(75, 140, 293), width = c(75, 210, 440))) | Yes |
| Savannah Guthrie Says 2 Ransom Notes About Her Mother Were Likely Genuine | U.S. | 2026-03-26 | By Nicholas Bogel-Burroughs | | No |
Finally, a bar chart shows that 18 of the 20 most viewed articles have an image. Only 2 of the 20 have no image yet still made the most viewed list.
# Create summary data for annotation
summary_counts <- df_clean %>%
  count(Has_Image)

ggplot(df_clean, aes(x = Has_Image, fill = Has_Image)) +
  geom_bar(width = 0.6, show.legend = FALSE) +
  # Add count labels on top of bars
  geom_text(
    data = summary_counts,
    aes(x = Has_Image, y = n, label = n),
    vjust = -.1,
    size = 3.5,
    fontface = "bold"
  ) +
  # Custom colors
  scale_fill_manual(values = c("Yes" = "#2C7FB8", "No" = "#F03B20")) +
  labs(
    title = "Do Articles with Images Dominate the Most Popular List?",
    subtitle = "Comparison of NYT Most Viewed Articles (7-Day Period)",
    x = "Article Contains Image",
    y = "Number of Articles",
    caption = "Source: New York Times Most Popular API"
  )
Conclusion
The analysis shows that 18 out of 20 articles (90%) in the New York Times Most Popular list include images, while only 2 articles (10%) do not. This strong imbalance suggests that articles with images are far more prevalent among the most viewed content.
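The percentages quoted above follow directly from the counts. As a quick check, the arithmetic can be reproduced from the two reported numbers; the vector below simply restates them and is not read from the API.

```r
# Counts reported above: 18 articles with an image, 2 without
image_counts <- c(Yes = 18, No = 2)

# Share of each group as a percentage of all 20 articles
image_share <- round(100 * image_counts / sum(image_counts), 1)
```

With the real data, the same shares could be obtained from df_clean by counting Has_Image and dividing by the number of rows.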
This does not prove causation. At most, it indicates that the presence of images is associated with representation in the list. Images may enhance reader interest, improve click-through rates, and make articles more appealing in digital formats, but the data cannot confirm this.
It is important to note that this dataset only includes articles that are already “most popular”, so it reflects representation rather than direct impact. Without a comparison group of less-viewed articles, the share of all New York Times articles that carry images is unknown, and the 90% share in the list may simply mirror that baseline. Other factors, such as topic, headline, or timing, may also play significant roles in driving engagement.
Overall, the findings suggest that including images is a common characteristic of highly popular articles, and may be an important factor in capturing audience attention.
References
OpenAI. (2026, March 28). ChatGPT conversation with K. M. Qaiduzzaman on New York Times API data analysis in R. Retrieved March 28, 2026, from https://chat.openai.com/