The objective of this assignment is to use one of the public New York Times APIs in R, retrieve data through an authenticated API request, parse the JSON response, and transform the results into a clean tidy data frame for analysis.
For this assignment, I selected the New York Times Most Popular API. This API provides metadata for articles that are most viewed, most shared, or most emailed over a selected period. I chose this API because it returns article-level information in a relatively clean and structured format, making it appropriate for transformation into a tidy data frame in R.
The main question I plan to explore is:
Which New York Times sections produced the most-viewed articles in the last 7 days?
Selected API
The API selected for this assignment is the New York Times Most Popular API.
Specifically, I use its most-viewed endpoint with a 7-day period, which returns the most viewed New York Times articles over the last 7 days.
The API documentation is available through the New York Times Developer portal. To access the data, an API key is required. In this assignment, I will authenticate securely by storing the key in an environment variable and retrieving it in R with Sys.getenv("NYT_API_KEY") rather than hard-coding it directly in the script.
Planned Workflow
The workflow for this assignment will be:
1. Load required libraries such as httr, jsonlite, and tidyverse
2. Retrieve the API key using Sys.getenv()
3. Make a GET request to the API endpoint
4. Parse the JSON response
5. Extract the results section containing article data
6. Convert the data into a tibble
7. Select relevant columns such as title, section, subsection, published_date, and source
8. Perform a simple analysis to count articles by section
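The request and parsing steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code used in this report: the helper names build_viewed_url() and parse_results() are hypothetical, and the sample payload is made up so that the parsing logic can run without a network call or a real key.

```r
library(jsonlite)
library(tibble)

# Hypothetical helper: builds the URL for the most-viewed endpoint.
# The API accepts periods of 1, 7, or 30 days.
build_viewed_url <- function(period = 7, api_key = Sys.getenv("NYT_API_KEY")) {
  stopifnot(period %in% c(1, 7, 30))
  sprintf(
    "https://api.nytimes.com/svc/mostpopular/v2/viewed/%d.json?api-key=%s",
    as.integer(period), api_key
  )
}

# Hypothetical helper: parses the JSON text of a response and returns
# the "results" element (one row per article) as a tibble.
parse_results <- function(json_text) {
  parsed <- fromJSON(json_text, flatten = TRUE)
  as_tibble(parsed$results)
}

# Made-up sample payload with the same top-level shape as a real response.
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"title": "Example A", "section": "U.S.", "subsection": "Politics",
     "published_date": "2026-03-24", "source": "New York Times"},
    {"title": "Example B", "section": "Opinion", "subsection": "",
     "published_date": "2026-03-22", "source": "New York Times"}
  ]
}'

articles <- parse_results(sample_json)

# With a real key, the live request would look roughly like this:
# response <- httr::GET(build_viewed_url(7))
# httr::stop_for_status(response)
# articles <- parse_results(
#   httr::content(response, as = "text", encoding = "UTF-8")
# )
```

Separating URL construction from parsing keeps the parsing step testable offline, which helps when the API rate limit is a concern.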
Anticipated Data Cleaning Decisions
Although the Most Popular API is cleaner than some other API options, there are still a few data-cleaning decisions that may be necessary.
First, some fields may contain missing values, especially subsection or other optional metadata. These will either be kept as NA values or replaced only if a specific transformation is needed for analysis.
Second, the JSON response may include nested structures, particularly for multimedia information. Since the primary question focuses on article sections rather than media content, I may exclude deeply nested multimedia fields from the main tidy data frame unless they are needed.
Third, date fields such as published_date and updated may need to be converted into appropriate date or date-time formats.
Finally, I will retain only the columns that are relevant for identifying, grouping, and summarizing the articles so that the final data frame remains tidy and easy to interpret.
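The cleaning decisions described above might look like the sketch below. The input rows are made up for illustration, the media column is a stand-in for the nested multimedia data, and the UTC time zone for updated is an assumption, since the time zone of that field is not stated in this report.

```r
library(dplyr)
library(tibble)

# Made-up rows that mimic the raw fields; "media" is a placeholder
# for a nested list column.
raw_articles <- tibble(
  title = c("Example A", "Example B"),
  section = c("U.S.", "Opinion"),
  subsection = c("Politics", ""),
  published_date = c("2026-03-24", "2026-03-22"),
  updated = c("2026-03-25 09:30:00", "2026-03-23 18:05:12"),
  source = c("New York Times", "New York Times"),
  media = list(list(), list())
)

clean_articles <- raw_articles %>%
  # Drop the nested column that is not needed for the section question.
  select(-media) %>%
  mutate(
    # Treat empty strings as missing values.
    subsection = na_if(subsection, ""),
    # Convert text dates into proper Date and date-time types.
    published_date = as.Date(published_date),
    # Assumption: timestamps are interpreted as UTC.
    updated = as.POSIXct(updated, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  )
```

Converting empty strings to NA first makes later summaries (for example, counting articles without a subsection) behave consistently.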
Expected Outcome
The expected outcome is a reproducible R workflow that retrieves New York Times article metadata from the Most Popular API and transforms it into a clean tidy data frame.
Using that cleaned data, I expect to identify which news sections are most represented among the most viewed New York Times articles over the past 7 days. This will demonstrate both technical API handling in R and a basic exploratory analysis of the returned data.
Load Libraries
library(httr)
library(jsonlite)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.1 ✔ tibble 3.3.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.2
✔ purrr 1.2.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Access API Key
api_key <- Sys.getenv("NYT_API_KEY")
if (api_key == "") {
  stop("NYT_API_KEY not found. Please store your API key in an environment variable.")
}
# A tibble: 7 × 2
section n
<chr> <int>
1 U.S. 12
2 Opinion 3
3 Business 1
4 Magazine 1
5 New York 1
6 Technology 1
7 Well 1
Tidy Data Frame
nyt_df
# A tibble: 20 × 13
url section subsection byline type title abstract published_date source
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <date> <chr>
1 https:/… U.S. <NA> By Ha… Arti… T.S.… Transpo… 2026-03-24 New Y…
2 https:/… U.S. Politics By Da… Arti… Flor… Emily G… 2026-03-24 New Y…
3 https:/… Opinion <NA> By Ph… Arti… Trum… America… 2026-03-22 New Y…
4 https:/… U.S. Politics By Er… Arti… ICE … Tom Hom… 2026-03-22 New Y…
5 https:/… U.S. <NA> By El… Arti… ‘I’m… He mast… 2026-03-24 New Y…
6 https:/… U.S. Politics By Mi… Arti… Trum… The Jus… 2026-03-26 New Y…
7 https:/… U.S. Politics By Ju… Arti… Saud… Crown P… 2026-03-24 New Y…
8 https:/… Busine… <NA> By Ma… Arti… How … The Wal… 2026-03-23 New Y…
9 https:/… Magazi… <NA> By Je… Inte… ‘A M… Forty-t… 2026-03-23 New Y…
10 https:/… U.S. <NA> By Em… Inte… Trac… Travele… 2026-03-23 New Y…
11 https:/… U.S. Politics By Ti… Arti… Andy… Mr. Bes… 2026-03-21 New Y…
12 https:/… New Yo… <NA> By Th… Arti… Dead… Two pil… 2026-03-23 New Y…
13 https:/… U.S. Politics By Er… Arti… Wild… Preside… 2026-03-28 New Y…
14 https:/… U.S. <NA> By Gr… Arti… Hegs… Defense… 2026-03-27 New Y…
15 https:/… Well Move By Hi… Arti… This… Buildin… 2026-03-25 New Y…
16 https:/… U.S. <NA> By Ka… Arti… Greg… He was … 2026-03-24 New Y…
17 https:/… Opinion <NA> By Br… Arti… The … How the… 2026-03-24 New Y…
18 https:/… Opinion <NA> By Ly… Arti… It’s… The cou… 2026-03-26 New Y…
19 https:/… U.S. <NA> By Ni… Arti… Sava… Her int… 2026-03-26 New Y…
20 https:/… Techno… <NA> By Ce… Arti… Meta… A jury … 2026-03-25 New Y…
# ℹ 4 more variables: asset_id <dbl>, updated <dttm>, nytdsection <chr>,
# adx_keywords <chr>
# A tibble: 5 × 2
section n
<chr> <int>
1 U.S. 12
2 Opinion 3
3 Business 1
4 Magazine 1
5 New York 1
Visualization
section_counts %>%
  ggplot(aes(x = reorder(section, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Most Viewed NYT Articles by Section",
    x = "Section",
    y = "Number of Articles"
  )
Brief Interpretation
The results show that the U.S. section had the highest number of articles in the Most Popular API response, with 12 of the 20 articles (60%). The Opinion section followed with 3 articles, and each of the other five sections appeared only once. This suggests that U.S. news was the most heavily represented section among the most viewed New York Times articles during the last 7 days.
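To put these counts on a common scale, each section's share of the 20 returned articles can be computed. The sketch below re-enters the counts from the section table in this report by hand; the names reported_counts and section_shares are introduced here only for illustration.

```r
library(dplyr)
library(tibble)

# Counts copied from the section table shown earlier in this report.
reported_counts <- tibble(
  section = c("U.S.", "Opinion", "Business", "Magazine",
              "New York", "Technology", "Well"),
  n = c(12L, 3L, 1L, 1L, 1L, 1L, 1L)
)

# Add each section's share of all returned articles, largest first.
section_shares <- reported_counts %>%
  mutate(share = n / sum(n)) %>%
  arrange(desc(n))

section_shares
```

Because the response holds only 20 articles from a single 7-day window, these shares describe this snapshot rather than a long-run pattern.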
Conclusion
This assignment used the New York Times Most Popular API in R to retrieve article metadata for the most viewed articles over the last 7 days. The JSON response was parsed and converted into a tidy data frame, and the results were analyzed by section.
The final tidy data frame provides a clean article-level dataset that can be used for simple exploratory analysis of article popularity across New York Times sections.