New York Times API Assignment

Authors

Desiree Thomas, Kiera Griffiths, Denise Atherley

Approach

For this assignment, we created a New York Times Developer account to access the NYT APIs. We chose the Article Search API because it allows us to filter queries. Our main data analysis question is: “How have the frequency and prominence of articles regarding ‘Artificial Intelligence’ shifted compared to ‘Data Science’ over the last five years?” We aim to assess whether the term “Data Science” is declining in usage relative to “Artificial Intelligence,” particularly as models, LLMs, and similar technologies are increasingly labeled under the broader AI umbrella. In addition, we will examine which NYT sections (Technology, Business, or Science) most frequently publish articles on these topics.

To answer these questions, our first priority is to avoid exposing our API key. The key is stored in a .Renviron file, and that file is listed in .gitignore so it is never committed. Next, we follow the workflow recommended by httr2 (httr has been superseded by httr2): build a base request, then append the query parameters. The API documentation notes that each request returns only 10 articles, so retrieving more requires paging through the results, most likely with purrr::map. We must also watch the rate at which we make API requests to avoid being blocked. Rather than flattening the JSON with jsonlite, we decided to rectangle the parsed response with tidyr and purrr, using hoist() to pull out the fields we need. A tibble is sufficient for this assignment, so that is the end result we worked towards.
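The request-building and paging steps described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used to produce the results in this report: `build_nyt_request()` is a hypothetical helper name, and it assumes the Article Search API accepts a zero-based `page` query parameter that returns 10 results per page.

```r
library(httr2)

# Hypothetical helper: build (but do not send) the request for one page.
# The same req_throttle() / req_retry() steps used in fetch_nyt_data()
# would be appended before performing the request.
build_nyt_request <- function(term, year, page = 0) {
  key <- Sys.getenv("NYT_API_KEY")
  if (!nzchar(key)) {
    stop("NYT_API_KEY is not set; add it to .Renviron and restart R.")
  }

  request("https://api.nytimes.com/svc/search/v2/articlesearch.json") |>
    req_url_query(
      q = term,
      begin_date = paste0(year, "0101"),
      end_date = paste0(year, "1231"),
      page = page,        # assumed zero-based page index
      `api-key` = key     # read from the environment, never hard-coded
    )
}

# Example (not run): requests for the first three pages of one term and year
# reqs <- purrr::map(0:2, \(p) build_nyt_request("Data Science", 2024, p))
```

Separating request construction from `req_perform()` makes it possible to inspect the URL before spending any of the rate-limited quota.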

Potential challenges: The API caps each query at 1,000 results, so if a search for ‘Artificial Intelligence’ returns more than 1,000 hits we will need to reconsider our approach. Navigating the rate limit was a second challenge.
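One possible way to work around the cap, sketched under assumptions: estimate how many pages a query would need, and split a year into monthly date windows so that each window is more likely to stay below 1,000 hits. `pages_needed()` and `month_windows()` are hypothetical helpers that are not part of our submitted code, and whether monthly windows are narrow enough depends on the actual hit counts.

```r
# Hypothetical helper: pages required to cover `hits` results,
# assuming 10 results per page and the 1,000-result cap.
pages_needed <- function(hits, per_page = 10, max_results = 1000) {
  ceiling(min(hits, max_results) / per_page)
}

# Hypothetical helper: monthly begin/end dates for one year, formatted
# as YYYYMMDD to match the begin_date / end_date query parameters.
month_windows <- function(year) {
  starts <- seq(as.Date(paste0(year, "-01-01")), by = "month", length.out = 12)
  next_starts <- seq(as.Date(paste0(year, "-02-01")), by = "month", length.out = 12)
  ends <- next_starts - 1  # last day of each month

  data.frame(
    begin_date = format(starts, "%Y%m%d"),
    end_date = format(ends, "%Y%m%d")
  )
}

pages_needed(634)    # a query under the cap
pages_needed(3621)   # a query over the cap, limited to 1,000 results
head(month_windows(2024), 3)
```

Each monthly window is a separate request, so this approach trades the result cap for a longer run time under the rate limit.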

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr2)
library(knitr)
search_terms <- c("Artificial Intelligence", "Data Science")
years <- 2021:2025

fetch_nyt_data <- function(term, year) { 
  #pauses for 12 seconds 
  Sys.sleep(12)
  
  base_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

  req <- request(base_url) %>%
    req_url_query(
      q = term,                          
      begin_date = paste0(year, "0101"), # January 1st 
      end_date = paste0(year, "1231"),   # December 31st 
      `api-key` = Sys.getenv("NYT_API_KEY") # read the key from .Renviron; never hard-code it here
    ) %>%
    req_throttle(rate = 4 / 60) %>%      # Ensures we stay under 5 requests per minute 
    req_retry(max_tries = 5)             # Increased to 5 to handle temporary 429s better 

  # performs the request and returns the JSON body 
  resp <- req |> req_perform()
  resp |> resp_body_json()
}


search_grid <- expand_grid(term = search_terms, year = years)

# remain patient, takes about 2-3 mins
all_results <- search_grid %>% 
  mutate(api_data = pmap(list(term, year), fetch_nyt_data)) 


# extracts the 'hits' so you can verify the shift in frequency
frequency_check <- all_results %>% 
  mutate(total_hits = map_dbl(api_data, \(x) x$response$meta$hits)) %>% 
  select(term, year, total_hits)

print(frequency_check)
# A tibble: 10 × 3
   term                     year total_hits
   <chr>                   <int>      <dbl>
 1 Artificial Intelligence  2021        634
 2 Artificial Intelligence  2022        657
 3 Artificial Intelligence  2023       2271
 4 Artificial Intelligence  2024       2589
 5 Artificial Intelligence  2025       3621
 6 Data Science             2021       1411
 7 Data Science             2022       1064
 8 Data Science             2023        967
 9 Data Science             2024        942
10 Data Science             2025       1173

Data Transformation

# Tidying the JSON
# This fulfills the 'Transform into a tidy R data frame' requirement

tidy_nyt_data <- all_results %>%
  # Extract the 'docs' (articles) from the nested API list
  mutate(docs = map(api_data, ~ .x$response$docs)) %>% 
  
  # Flatten the list so each article is its own row
  unnest(docs) %>% 
  
  # 'Hoist' specific fields out of the nested document structure
  hoist(docs, 
        headline = list("headline", "main"),
        section = "section_name",
        date = "pub_date") %>%
  
  # Final Clean-up: Format the date and select key columns
  mutate(date = as.Date(date)) %>%
  select(term, year, headline, section, date)

# Display the final Tidy Tibble
print(tidy_nyt_data)
# A tibble: 100 × 5
   term                     year headline                     section date      
   <chr>                   <int> <chr>                        <chr>   <date>    
 1 Artificial Intelligence  2021 Pamela McCorduck, Historian… Techno… 2021-11-04
 2 Artificial Intelligence  2021 Artificial intelligence is … Busine… 2021-09-16
 3 Artificial Intelligence  2021 A Robot Wrote This Book Rev… Books   2021-11-21
 4 Artificial Intelligence  2021 Group Backed by Top Compani… Techno… 2021-12-08
 5 Artificial Intelligence  2021 Google executives tell empl… Techno… 2021-11-15
 6 Artificial Intelligence  2021 Google Wants to Work With t… Techno… 2021-11-03
 7 Artificial Intelligence  2021 If You Don’t Trust A.I. Yet… Opinion 2021-07-30
 8 Artificial Intelligence  2021 Can a Machine Learn Moralit… Techno… 2021-11-19
 9 Artificial Intelligence  2021 Europe Proposes Strict Rule… Busine… 2021-04-21
10 Artificial Intelligence  2021 A.I. Can Now Write Its Own … Techno… 2021-09-09
# ℹ 90 more rows
tidy_nyt_data %>%
  head(20) %>% # Just show the first 20 so the page isn't too long
  kable(
    col.names = c("Search Term", "Year", "Headline", "Section", "Publication Date"),
    caption = "New York Times Articles: AI vs Data Science (2021-2025)"
  )
New York Times Articles: AI vs Data Science (2021-2025)

| Search Term | Year | Headline | Section | Publication Date |
|:--|--:|:--|:--|:--|
| Artificial Intelligence | 2021 | Pamela McCorduck, Historian of Artificial Intelligence, Dies at 80 | Technology | 2021-11-04 |
| Artificial Intelligence | 2021 | Artificial intelligence is not going to replace human programmers just yet. | Business | 2021-09-16 |
| Artificial Intelligence | 2021 | A Robot Wrote This Book Review | Books | 2021-11-21 |
| Artificial Intelligence | 2021 | Group Backed by Top Companies Moves to Combat A.I. Bias in Hiring | Technology | 2021-12-08 |
| Artificial Intelligence | 2021 | Google executives tell employees it can compete for Pentagon contracts without violating its principles. | Technology | 2021-11-15 |
| Artificial Intelligence | 2021 | Google Wants to Work With the Pentagon Again, Despite Employee Concerns | Technology | 2021-11-03 |
| Artificial Intelligence | 2021 | If You Don’t Trust A.I. Yet, You’re Not Wrong | Opinion | 2021-07-30 |
| Artificial Intelligence | 2021 | Can a Machine Learn Morality? | Technology | 2021-11-19 |
| Artificial Intelligence | 2021 | Europe Proposes Strict Rules for Artificial Intelligence | Business | 2021-04-21 |
| Artificial Intelligence | 2021 | A.I. Can Now Write Its Own Computer Code. That’s Good News for Humans. | Technology | 2021-09-09 |
| Artificial Intelligence | 2022 | How the Collapse of Sam Bankman-Fried’s Crypto Empire Has Disrupted A.I. | Technology | 2022-12-01 |
| Artificial Intelligence | 2022 | Did Artificial Intelligence Just Get Too Smart? | Podcasts | 2022-12-16 |
| Artificial Intelligence | 2022 | Is A.I. the Future of Test Prep? | Business | 2022-12-27 |
| Artificial Intelligence | 2022 | A.I.-Generated Art Is Already Transforming Creative Work | Technology | 2022-10-21 |
| Artificial Intelligence | 2022 | Can A.I. Write Recipes Better Than Humans? We Put It to the Ultimate Test. | Food | 2022-11-04 |
| Artificial Intelligence | 2022 | In the Battle With Robots, Human Workers Are Winning | Opinion | 2022-10-07 |
| Artificial Intelligence | 2022 | An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy. | Technology | 2022-09-02 |
| Artificial Intelligence | 2022 | How Is Everyone Making Those A.I. Selfies? | Style | 2022-12-08 |
| Artificial Intelligence | 2022 | One Man’s Dream of Fusing A.I. With Common Sense | Business | 2022-08-28 |
| Artificial Intelligence | 2022 | Lawsuit Takes Aim at the Way A.I. Is Built | Technology | 2022-11-23 |

Final Analysis and Conclusion

From a technical standpoint, this project relied on a small “Data Engineering” workflow. We kept our API credentials out of the source code by storing the key in .Renviron. We addressed the API’s strict rate limits with the httr2 package’s throttling and retry functions, combined with a Sys.sleep(12) pause before each request to avoid being blocked. Before we added the retries and the pause, we frequently received rate-limit errors (HTTP 429 Too Many Requests, the status our retry logic is written to handle). Finally, we transformed the raw, nested JSON responses into a normalized, tidy data frame using purrr for iteration and tidyr for flattening the document structure. Because we requested only the first page of 10 results for each term and year, the tidy data frame holds 100 articles. The 1,000-result API cap would also have prevented us from retrieving every article for any term and year with more than 1,000 hits (Artificial Intelligence in 2023 through 2025, and Data Science in 2021, 2022, and 2025). The 100-row sample is therefore an illustrative window into the trends we aimed to analyze, not a statistically representative one.

The data acquired from the New York Times Article Search API points to a dramatic shift in the media landscape regarding emerging technologies. Our primary research question sought to determine if “Artificial Intelligence” was eclipsing “Data Science” in prominence. Based on the hit counts, there is strong evidence of this shift. In 2021, “Data Science” actually held a higher volume of mentions with 1,411 hits compared to just 634 for “Artificial Intelligence”. However, by 2023, a massive pivot occurred: AI mentions surged to 2,271, while Data Science dropped to 967. By 2025, AI reached 3,621 hits, the highest count in our five-year window, whereas Data Science stood at 1,173, still below its 2021 level. This suggests that while Data Science remains a stable professional field, “Artificial Intelligence” has captured the broader public and editorial imagination, likely acting as a “catch-all” term for modern computational advancements.
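The shift can be expressed as a simple ratio of hits per year. The sketch below re-enters the hit counts reported in the frequency table above and divides one column by the other; the column names `ai`, `ds`, and `ai_to_ds` are our own labels for illustration.

```r
# Hit counts copied from the frequency_check output in this report
hits <- data.frame(
  year = 2021:2025,
  ai = c(634, 657, 2271, 2589, 3621),   # "Artificial Intelligence"
  ds = c(1411, 1064, 967, 942, 1173)    # "Data Science"
)

# Ratio of AI hits to Data Science hits; values above 1 mean AI leads
hits$ai_to_ds <- round(hits$ai / hits$ds, 2)

print(hits)
```

The ratio rises from about 0.45 in 2021 to about 3.09 in 2025 and first exceeds 1 in 2023.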

Beyond simple frequency, our “Tidy Tibble” allowed us to examine the “prominence” and “location” of these discussions. While we initially expected these topics to be siloed within the Technology section, the 100-article sample showed a diffusion into Business, Opinion, and Science. This dispersion suggests that AI is no longer treated as a niche technical subject but as a cross-disciplinary force impacting global economics and ethics. The ability to “hoist” specific headlines and publication dates was crucial here; it allowed us to see that, in our sample, AI is frequently discussed in the context of policy, labor, and creative industries, whereas Data Science appears more closely tied to technical and business analytics.
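A section breakdown of this kind can be produced with dplyr's `count()`. The sketch below is illustrative only: it re-enters the sections of the 20 “Artificial Intelligence” rows displayed in the table above, so it does not cover the full 100-row sample or any “Data Science” rows. On the full data, the equivalent call would be `tidy_nyt_data %>% count(term, section, sort = TRUE)`.

```r
library(dplyr)

# Sections of the 20 rows displayed above (10 from 2021, 10 from 2022)
displayed <- tibble(
  year = rep(c(2021L, 2022L), each = 10),
  section = c(
    "Technology", "Business", "Books", "Technology", "Technology",
    "Technology", "Opinion", "Technology", "Business", "Technology",
    "Technology", "Podcasts", "Business", "Technology", "Food",
    "Opinion", "Technology", "Style", "Business", "Technology"
  )
)

# Number of displayed articles per section, most frequent first
section_counts <- displayed |>
  count(section, sort = TRUE)

print(section_counts)
```

In these 20 rows, Technology accounts for half of the articles, with the rest spread across six other sections.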

Citation

Google DeepMind. (2026). Gemini 3.1 Thinking [Large language model]. https://gemini.google.com. Accessed March 28 - 29, 2026