Week 9 Assignment – Codebase

Author

Muhammad Suffyan Khan

Objective

The objective of this assignment is to use one of the public New York Times APIs in R, retrieve data through an authenticated API request, parse the JSON response, and transform the results into a clean tidy data frame for analysis.

For this assignment, I selected the New York Times Most Popular API. This API provides metadata for articles that are most viewed, most shared, or most emailed over a selected period. I chose this API because it returns article-level information in a relatively clean and structured format, making it appropriate for transformation into a tidy data frame in R.

The main question I plan to explore is:

Which New York Times sections are most represented among the most viewed articles of the last 7 days?

Selected API

The API selected for this assignment is the New York Times Most Popular API.

Endpoint chosen:
/svc/mostpopular/v2/viewed/7.json

This endpoint returns the most viewed New York Times articles over the last 7 days.
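
As a rough illustration of how the endpoint path is put together, the sketch below builds the URL from a list type and a period. The helper name build_mostpopular_url is hypothetical and not part of the assignment code. The allowed values (viewed, shared, emailed; periods of 1, 7, or 30 days) are an assumption that should be confirmed in the developer portal.

```r
# Hypothetical helper (not part of the original script): assemble a
# Most Popular endpoint URL from a list type and a period in days.
build_mostpopular_url <- function(type = "viewed", period = 7) {
  # Assumed allowed values; confirm them in the NYT developer documentation.
  type <- match.arg(type, c("viewed", "shared", "emailed"))
  stopifnot(period %in% c(1, 7, 30))
  sprintf(
    "https://api.nytimes.com/svc/mostpopular/v2/%s/%d.json",
    type, as.integer(period)
  )
}

url_viewed_7 <- build_mostpopular_url("viewed", 7)
```

With the defaults, the helper reproduces the endpoint chosen for this assignment.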

The API documentation is available through the New York Times Developer portal. To access the data, an API key is required. In this assignment, I will authenticate securely by storing the key in an environment variable and retrieving it in R with Sys.getenv("NYT_API_KEY") rather than hard-coding it directly in the script.
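
One possible way to wrap the key lookup is sketched below. The function name get_api_key and the demo variable NYT_API_KEY_DEMO are hypothetical. In practice the real key would sit in a file such as ~/.Renviron rather than being set in the script.

```r
# Hypothetical helper: read an API key from an environment variable and
# fail early with a readable message if it is missing.
get_api_key <- function(var = "NYT_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop(var, " not found. Add it to ~/.Renviron and restart R.")
  }
  key
}

# Demonstration only: a placeholder value stands in for a real key.
Sys.setenv(NYT_API_KEY_DEMO = "placeholder-key")
demo_key <- get_api_key("NYT_API_KEY_DEMO")
```

Keeping the check inside one function means the script stops before any request is sent without credentials.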

Planned Workflow

The workflow for this assignment will be:

Load required libraries such as httr, jsonlite, and tidyverse
Retrieve the API key using Sys.getenv()
Make a GET request to the API endpoint
Parse the JSON response
Extract the results section containing article data
Convert the data into a tibble
Select relevant columns such as title, section, subsection, published_date, and source
Perform a simple analysis to count articles by section

Anticipated Data Cleaning Decisions

Although the Most Popular API is cleaner than some other API options, there are still a few data-cleaning decisions that may be necessary.

First, some fields may contain missing values, especially subsection and other optional metadata. These will be kept as NA and replaced only if a specific transformation is needed for the analysis.

Second, the JSON response may include nested structures, particularly for multimedia information. Since the primary question focuses on article sections rather than media content, I may exclude deeply nested multimedia fields from the main tidy data frame unless they are needed.
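
The sketch below shows one way nested fields could be detected and dropped after parsing. The two-article payload is made up for illustration, and the field name media is an assumption about how the nested multimedia data is labelled.

```r
library(jsonlite)

# Hypothetical two-article payload; the values are invented for illustration.
sample_json <- '{
  "results": [
    {"title": "Article A", "section": "U.S.",
     "media": [{"type": "image", "caption": "A photo"}]},
    {"title": "Article B", "section": "Opinion", "media": []}
  ]
}'

parsed <- fromJSON(sample_json, flatten = TRUE)$results

# Nested structures arrive as list-columns; flag them and keep the flat ones.
is_nested <- vapply(parsed, is.list, logical(1))
flat_df <- parsed[, !is_nested, drop = FALSE]
```

Selecting columns by name has the same effect when the wanted columns are known in advance. The list-column test is useful mainly for a first look at an unfamiliar response.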

Third, date fields such as published_date and updated may need to be converted into appropriate date or date-time formats.
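
A minimal sketch of the two conversions follows, using example strings in the format seen in this document's output. The choice of tz = "UTC" is an assumption made for reproducibility. Without a tz argument, as.POSIXct() uses the session's local time zone, and the zone the API reports in is not established here.

```r
# Example strings in the layout returned for published_date and updated.
published_raw <- "2026-03-24"
updated_raw   <- "2026-03-25 23:08:53"

# ISO-formatted dates parse without an explicit format string.
published <- as.Date(published_raw)

# Assumed time zone; replace "UTC" once the API's zone has been confirmed.
updated <- as.POSIXct(updated_raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```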

Finally, I will retain only the columns that are relevant for identifying, grouping, and summarizing the articles so that the final data frame remains tidy and easy to interpret.

Expected Outcome

The expected outcome is a reproducible R workflow that retrieves New York Times article metadata from the Most Popular API and transforms it into a clean tidy data frame.

Using that cleaned data, I expect to identify which news sections are most represented among the most viewed New York Times articles over the past 7 days. This will demonstrate both technical API handling in R and a basic exploratory analysis of the returned data.

Load Libraries

library(httr)
library(jsonlite)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ purrr::flatten() masks jsonlite::flatten()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Access API Key

api_key <- Sys.getenv("NYT_API_KEY")

if (api_key == "") {
  stop("NYT_API_KEY not found. Please store your API key in an environment variable.")
}

Make API Request

base_url <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json"

response <- GET(
  url = base_url,
  query = list(`api-key` = api_key)
)

stop_for_status(response)

Parse JSON Response

nyt_text <- content(response, as = "text", encoding = "UTF-8")
nyt_json <- fromJSON(nyt_text, flatten = TRUE)

Inspect Response Structure

names(nyt_json)

[1] "status"      "copyright"   "num_results" "results"

nyt_json$status

[1] "OK"

nyt_json$num_results

[1] 20
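
The checks above can also be expressed as a small validation step that runs before any further processing. The sketch below uses a shortened, made-up response with the same four top-level fields. The function name check_envelope is hypothetical.

```r
library(jsonlite)

# Hypothetical, shortened response with the same top-level fields.
sample_json <- '{
  "status": "OK",
  "copyright": "Example copyright line",
  "num_results": 2,
  "results": [
    {"title": "Article A", "section": "U.S."},
    {"title": "Article B", "section": "Opinion"}
  ]
}'

parsed <- fromJSON(sample_json, flatten = TRUE)

# Hypothetical check: stop if the status is not "OK" or if the number of
# rows in results disagrees with num_results.
check_envelope <- function(x) {
  stopifnot(
    identical(x$status, "OK"),
    is.data.frame(x$results),
    nrow(x$results) == x$num_results
  )
  invisible(TRUE)
}

envelope_ok <- check_envelope(parsed)
```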

Convert Results to a Tidy Data Frame

nyt_df <- as_tibble(nyt_json$results) %>%
  select(
    url,
    section,
    subsection,
    byline,
    type,
    title,
    abstract,
    published_date,
    source,
    asset_id,
    updated,
    nytdsection,
    adx_keywords
  ) %>%
  mutate(
    published_date = as.Date(published_date),
    updated = as.POSIXct(updated, format = "%Y-%m-%d %H:%M:%S"),
    subsection = na_if(subsection, "")
  )

Inspect Data

glimpse(nyt_df)

Rows: 20
Columns: 13
$ url            <chr> "https://www.nytimes.com/2026/03/24/us/tsa-data-ice-dep…
$ section        <chr> "U.S.", "U.S.", "Opinion", "U.S.", "U.S.", "U.S.", "U.S…
$ subsection     <chr> NA, "Politics", NA, "Politics", NA, "Politics", "Politi…
$ byline         <chr> "By Hamed Aleaziz and Heather Knight", "By David W. Che…
$ type           <chr> "Article", "Article", "Article", "Article", "Article", …
$ title          <chr> "T.S.A. Tipped Off ICE Agents Before Arrests at San Fra…
$ abstract       <chr> "Transportation Security Administration officials told …
$ published_date <date> 2026-03-24, 2026-03-24, 2026-03-22, 2026-03-22, 2026-0…
$ source         <chr> "New York Times", "New York Times", "New York Times", "…
$ asset_id       <dbl> 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14, 1e+14,…
$ updated        <dttm> 2026-03-25 23:08:53, 2026-03-25 23:16:17, 2026-03-23 1…
$ nytdsection    <chr> "u.s.", "u.s.", "opinion", "u.s.", "u.s.", "u.s.", "u.s…
$ adx_keywords   <chr> "Immigration and Emigration;Illegal Immigration;interna…
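
In the output above, asset_id is printed as 1e+14 because it is stored as a double and shown in abbreviated scientific notation. Whole numbers of this size are still represented exactly in a double, so only the display is affected. If the identifiers need to be readable, one option is to store them as text. The sketch below uses hypothetical id values, since the real ones are not visible in the output.

```r
# Hypothetical identifiers of the same magnitude as asset_id.
asset_id <- c(100000010000001, 100000010000002)

# Format as whole numbers without scientific notation.
asset_id_chr <- sprintf("%.0f", asset_id)
```

Storing identifiers as character also guards against accidental arithmetic on them.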

Tidy Data Frame

nyt_df

# A tibble: 20 × 13
   url      section subsection byline type  title abstract published_date source
   <chr>    <chr>   <chr>      <chr>  <chr> <chr> <chr>    <date>         <chr> 
 1 https:/… U.S.    <NA>       By Ha… Arti… T.S.… Transpo… 2026-03-24     New Y…
 2 https:/… U.S.    Politics   By Da… Arti… Flor… Emily G… 2026-03-24     New Y…
 3 https:/… Opinion <NA>       By Ph… Arti… Trum… America… 2026-03-22     New Y…
 4 https:/… U.S.    Politics   By Er… Arti… ICE … Tom Hom… 2026-03-22     New Y…
 5 https:/… U.S.    <NA>       By El… Arti… ‘I’m… He mast… 2026-03-24     New Y…
 6 https:/… U.S.    Politics   By Mi… Arti… Trum… The Jus… 2026-03-26     New Y…
 7 https:/… U.S.    Politics   By Ju… Arti… Saud… Crown P… 2026-03-24     New Y…
 8 https:/… Busine… <NA>       By Ma… Arti… How … The Wal… 2026-03-23     New Y…
 9 https:/… Magazi… <NA>       By Je… Inte… ‘A M… Forty-t… 2026-03-23     New Y…
10 https:/… U.S.    <NA>       By Em… Inte… Trac… Travele… 2026-03-23     New Y…
11 https:/… U.S.    Politics   By Ti… Arti… Andy… Mr. Bes… 2026-03-21     New Y…
12 https:/… New Yo… <NA>       By Th… Arti… Dead… Two pil… 2026-03-23     New Y…
13 https:/… U.S.    Politics   By Er… Arti… Wild… Preside… 2026-03-28     New Y…
14 https:/… U.S.    <NA>       By Gr… Arti… Hegs… Defense… 2026-03-27     New Y…
15 https:/… Well    Move       By Hi… Arti… This… Buildin… 2026-03-25     New Y…
16 https:/… U.S.    <NA>       By Ka… Arti… Greg… He was … 2026-03-24     New Y…
17 https:/… Opinion <NA>       By Br… Arti… The … How the… 2026-03-24     New Y…
18 https:/… Opinion <NA>       By Ly… Arti… It’s… The cou… 2026-03-26     New Y…
19 https:/… U.S.    <NA>       By Ni… Arti… Sava… Her int… 2026-03-26     New Y…
20 https:/… Techno… <NA>       By Ce… Arti… Meta… A jury … 2026-03-25     New Y…
# ℹ 4 more variables: asset_id <dbl>, updated <dttm>, nytdsection <chr>,
#   adx_keywords <chr>

Check Missing Values

nyt_df %>%
  summarise(
    missing_subsection = sum(is.na(subsection)),
    missing_byline = sum(is.na(byline)),
    missing_abstract = sum(is.na(abstract))
  )

# A tibble: 1 × 3
  missing_subsection missing_byline missing_abstract
               <int>          <int>            <int>
1                 13              0                0
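
The same check can be extended to every column at once. The sketch below is one way to do that with across(), using a small made-up data frame in place of nyt_df so that it runs on its own.

```r
library(dplyr)

# Made-up stand-in for nyt_df with blank subsections, as the API returns them.
sample_df <- data.frame(
  section    = c("U.S.", "Opinion", "Well"),
  subsection = c("", "", "Move"),
  byline     = c("By A", "By B", "By C")
)

missing_counts <- sample_df %>%
  # Blank strings become NA so they are counted as missing.
  mutate(subsection = na_if(subsection, "")) %>%
  # Count NA values in every column.
  summarise(across(everything(), ~ sum(is.na(.x))))
```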

Data Cleaning Notes

The main data-cleaning decisions were:

The API key was stored in an environment variable and retrieved with Sys.getenv("NYT_API_KEY") instead of being hard-coded.
The JSON response was parsed and the results section was extracted because it contains the article-level records.
Only relevant columns were selected for the tidy data frame.
The published_date column was converted to Date format, and the updated column was parsed as a date-time (POSIXct).
Blank values in subsection were converted to NA.
Deeply nested fields such as multimedia metadata were not included because they were not needed for this analysis.

Analysis: Articles by Section

section_counts <- nyt_df %>%
  count(section, sort = TRUE)

section_counts

# A tibble: 7 × 2
  section        n
  <chr>      <int>
1 U.S.          12
2 Opinion        3
3 Business       1
4 Magazine       1
5 New York       1
6 Technology     1
7 Well           1

Top 5 Sections

top_5_sections <- section_counts %>%
  slice_head(n = 5)

top_5_sections

# A tibble: 5 × 2
  section      n
  <chr>    <int>
1 U.S.        12
2 Opinion      3
3 Business     1
4 Magazine     1
5 New York     1
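
Five sections are tied at one article each, so which three of them land in a five-row cut depends only on row order. A possible alternative, sketched below with the counts reported above, is slice_max() with with_ties = TRUE, which keeps every row tied with the last selected rank.

```r
library(dplyr)

# Counts copied from the section_counts output in this document.
section_counts <- data.frame(
  section = c("U.S.", "Opinion", "Business", "Magazine",
              "New York", "Technology", "Well"),
  n = c(12L, 3L, 1L, 1L, 1L, 1L, 1L)
)

# Keep the top 5 by count, including all rows tied with the fifth place.
top_with_ties <- section_counts %>%
  slice_max(order_by = n, n = 5, with_ties = TRUE)
```

With these counts the result contains all seven sections, which makes the tie explicit instead of hiding two sections arbitrarily.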

Visualization

section_counts %>%
  ggplot(aes(x = reorder(section, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Most Viewed NYT Articles by Section",
    x = "Section",
    y = "Number of Articles"
  )

Brief Interpretation

The results show that the U.S. section had the most articles in the Most Popular API response, with 12 of 20 (60%). The Opinion section followed with 3 articles, and each of the remaining five sections (Business, Magazine, New York, Technology, and Well) appeared once. This suggests that U.S. news was the most heavily represented section among the most viewed New York Times articles during the last 7 days.
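
To put the counts on a common scale, each section's share of the 20 articles can be computed directly. The sketch below rebuilds the counts from the output above so that it runs on its own.

```r
# Counts copied from the section_counts output in this document.
section_counts <- data.frame(
  section = c("U.S.", "Opinion", "Business", "Magazine",
              "New York", "Technology", "Well"),
  n = c(12L, 3L, 1L, 1L, 1L, 1L, 1L)
)

# Share of all returned articles that fall in each section.
section_counts$share <- section_counts$n / sum(section_counts$n)
```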

Conclusion

This assignment used the New York Times Most Popular API in R to retrieve article metadata for the most viewed articles over the last 7 days. The JSON response was parsed and converted into a tidy data frame, and the results were analyzed by section.

The final tidy data frame provides a clean article-level dataset that can be used for simple exploratory analysis of article popularity across New York Times sections.