For this assignment, we created a New York Times Developer account in order to access their APIs. We chose the New York Times Article Search API because it allows us to filter queries by search term and date range. One of our data analysis questions for this dataset is: “How have the frequency and prominence of articles regarding ‘Artificial Intelligence’ shifted compared to ‘Data Science’ over the last five years?” We aim to assess whether the term “Data Science” is declining in usage relative to “Artificial Intelligence,” particularly as models, LLMs, and similar technologies are increasingly labeled under the broader AI umbrella. In addition, we will examine which NYT sections (Technology, Business, or Science) most frequently publish articles on these topics.
To answer these questions, our first priority is to avoid exposing our API key. The key is stored in a .Renviron file, which is listed in .gitignore so that it is never committed. Next, we follow the workflow recommended by httr2 (since httr has been superseded): build a base request, then append the query parameters. The API documentation notes that each request returns only 10 articles, so collecting more than one page of results means iterating over requests, most likely with purrr::map. We must also pace our API requests to avoid being blocked. Rather than flattening the JSON with jsonlite, we decided to use tidyr along with purrr, including hoist() to pull specific fields out of the nested structure. A tibble is sufficient for this assignment, so that is the end result we worked towards.
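The workflow described above can be sketched as a pair of small helpers. This is a minimal sketch, not the code used for the results in this report: `get_nyt_key()` and `build_page_request()` are hypothetical names, and the zero-indexed `page` query parameter is an assumption based on the Article Search API documentation. The functions only build request objects; nothing is sent until `req_perform()` is called.

```r
library(httr2)

# Hypothetical helper: fail early if the key is missing from the environment
get_nyt_key <- function() {
  key <- Sys.getenv("NYT_API_KEY")
  if (!nzchar(key)) {
    stop("NYT_API_KEY is not set; add it to .Renviron and restart R.", call. = FALSE)
  }
  key
}

# Hypothetical helper: build (but do not send) the request for one results page.
# Assumption: the API accepts a zero-indexed `page` query parameter.
build_page_request <- function(term, year, page) {
  request("https://api.nytimes.com/svc/search/v2/articlesearch.json") |>
    req_url_query(
      q = term,
      begin_date = paste0(year, "0101"),
      end_date = paste0(year, "1231"),
      page = page,
      `api-key` = get_nyt_key()
    ) |>
    req_throttle(rate = 4 / 60) |> # at most 4 requests per minute
    req_retry(max_tries = 5)       # retry transient failures up to 5 times
}

# Example usage (requires NYT_API_KEY to be set):
# page_requests <- purrr::map(0:2, \(p) build_page_request("Data Science", 2021, p))
```

Keeping request construction separate from `req_perform()` makes it possible to inspect the generated URLs before spending any of the rate-limited quota.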
Potential challenges: There is a 1,000-result cap, which means that if the search for ‘Artificial Intelligence’ returns more than 1,000 hits, we will need to reconsider our approach. Another challenge was navigating the rate limit.
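One way to reason about the cap is to work out, for a given hit count, how many pages are reachable and roughly how long fetching them would take. The sketch below uses only base R and takes its numbers from this report (10 results per page, a 1,000-result cap, and a 12-second pause per request); `plan_pages()` and `month_windows()` are hypothetical helpers, and splitting a year into monthly date windows is just one possible fallback when a query is truncated.

```r
# Assumptions taken from the text: 10 results per page, 1,000-result cap,
# and a 12-second pause before every request.
plan_pages <- function(total_hits, per_page = 10, max_results = 1000,
                       seconds_per_request = 12) {
  reachable <- min(total_hits, max_results)
  n_pages <- ceiling(reachable / per_page)
  list(
    pages = if (n_pages > 0) seq_len(n_pages) - 1 else integer(0), # zero-indexed
    truncated = total_hits > max_results,
    est_minutes = n_pages * seconds_per_request / 60
  )
}

# Possible fallback when a year is truncated: query one month at a time.
# Returns begin/end dates in the YYYYMMDD format used by the date filters.
month_windows <- function(year) {
  starts <- seq(as.Date(paste0(year, "-01-01")), by = "month", length.out = 12)
  ends <- seq(starts[2], by = "month", length.out = 12) - 1
  data.frame(
    begin_date = format(starts, "%Y%m%d"),
    end_date = format(ends, "%Y%m%d")
  )
}

plan_2021 <- plan_pages(634)  # 64 pages, not truncated
plan_2025 <- plan_pages(3621) # capped at 100 pages, truncated
```

Whether monthly windows stay under the cap depends on how the hits are distributed across the year, so each window's hit count would still need to be checked.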
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.6
✔ forcats 1.0.1 ✔ stringr 1.6.0
✔ ggplot2 4.0.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.2.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr2)
library(knitr)
search_terms <- c("Artificial Intelligence", "Data Science")
years <- 2021:2025

fetch_nyt_data <- function(term, year) {
  # Pause for 12 seconds before each request
  Sys.sleep(12)

  base_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

  req <- request(base_url) %>%
    req_url_query(
      q = term,
      begin_date = paste0(year, "0101"), # January 1st
      end_date = paste0(year, "1231"), # December 31st
      `api-key` = Sys.getenv("NYT_API_KEY") # DO NOT expose your API key here
    ) %>%
    req_throttle(rate = 4 / 60) %>% # Ensures we stay under 5 requests per minute
    req_retry(max_tries = 5) # Increased to 5 to handle temporary 429s better

  # Perform the request and return the JSON body
  resp <- req |> req_perform()
  resp |> resp_body_json()
}

search_grid <- expand_grid(term = search_terms, year = years)

# Remain patient; this takes about 2-3 minutes
all_results <- search_grid %>%
  mutate(api_data = pmap(list(term, year), fetch_nyt_data))

# Extract the 'hits' so you can verify the shift in frequency
frequency_check <- all_results %>%
  mutate(total_hits = map_dbl(api_data, \(x) x$response$meta$hits)) %>%
  select(term, year, total_hits)

print(frequency_check)
# A tibble: 10 × 3
   term                     year total_hits
   <chr>                   <int>      <dbl>
 1 Artificial Intelligence  2021        634
 2 Artificial Intelligence  2022        657
 3 Artificial Intelligence  2023       2271
 4 Artificial Intelligence  2024       2589
 5 Artificial Intelligence  2025       3621
 6 Data Science             2021       1411
 7 Data Science             2022       1064
 8 Data Science             2023        967
 9 Data Science             2024        942
10 Data Science             2025       1173
Data Transformation
# Tidying the JSON
# This fulfills the 'Transform into a tidy R data frame' requirement
tidy_nyt_data <- all_results %>%
  # Extract the 'docs' (articles) from the nested API list
  mutate(docs = map(api_data, ~ .x$response$docs)) %>%
  # Flatten the list so each article is its own row
  unnest(docs) %>%
  # 'Hoist' specific fields out of the nested document structure
  hoist(docs,
    headline = list("headline", "main"),
    section = "section_name",
    date = "pub_date"
  ) %>%
  # Final clean-up: format the date and select key columns
  mutate(date = as.Date(date)) %>%
  select(term, year, headline, section, date)

# Display the final tidy tibble
print(tidy_nyt_data)
# A tibble: 100 × 5
   term                     year headline                     section date
   <chr>                   <int> <chr>                        <chr>   <date>
 1 Artificial Intelligence  2021 Pamela McCorduck, Historian… Techno… 2021-11-04
 2 Artificial Intelligence  2021 Artificial intelligence is … Busine… 2021-09-16
 3 Artificial Intelligence  2021 A Robot Wrote This Book Rev… Books   2021-11-21
 4 Artificial Intelligence  2021 Group Backed by Top Compani… Techno… 2021-12-08
 5 Artificial Intelligence  2021 Google executives tell empl… Techno… 2021-11-15
 6 Artificial Intelligence  2021 Google Wants to Work With t… Techno… 2021-11-03
 7 Artificial Intelligence  2021 If You Don’t Trust A.I. Yet… Opinion 2021-07-30
 8 Artificial Intelligence  2021 Can a Machine Learn Moralit… Techno… 2021-11-19
 9 Artificial Intelligence  2021 Europe Proposes Strict Rule… Busine… 2021-04-21
10 Artificial Intelligence  2021 A.I. Can Now Write Its Own … Techno… 2021-09-09
# ℹ 90 more rows
tidy_nyt_data %>%
  head(20) %>% # Just show the first 20 so the page isn't too long
  kable(
    col.names = c("Search Term", "Year", "Headline", "Section", "Publication Date"),
    caption = "New York Times Articles: AI vs Data Science (2021-2025)"
  )
New York Times Articles: AI vs Data Science (2021-2025)

| Search Term | Year | Headline | Section | Publication Date |
|---|---|---|---|---|
| Artificial Intelligence | 2021 | Pamela McCorduck, Historian of Artificial Intelligence, Dies at 80 | Technology | 2021-11-04 |
| Artificial Intelligence | 2021 | Artificial intelligence is not going to replace human programmers just yet. | Business | 2021-09-16 |
| Artificial Intelligence | 2021 | A Robot Wrote This Book Review | Books | 2021-11-21 |
| Artificial Intelligence | 2021 | Group Backed by Top Companies Moves to Combat A.I. Bias in Hiring | Technology | 2021-12-08 |
| Artificial Intelligence | 2021 | Google executives tell employees it can compete for Pentagon contracts without violating its principles. | Technology | 2021-11-15 |
| Artificial Intelligence | 2021 | Google Wants to Work With the Pentagon Again, Despite Employee Concerns | Technology | 2021-11-03 |
| Artificial Intelligence | 2021 | If You Don’t Trust A.I. Yet, You’re Not Wrong | Opinion | 2021-07-30 |
| Artificial Intelligence | 2021 | Can a Machine Learn Morality? | Technology | 2021-11-19 |
| Artificial Intelligence | 2021 | Europe Proposes Strict Rules for Artificial Intelligence | Business | 2021-04-21 |
| Artificial Intelligence | 2021 | A.I. Can Now Write Its Own Computer Code. That’s Good News for Humans. | Technology | 2021-09-09 |
| Artificial Intelligence | 2022 | How the Collapse of Sam Bankman-Fried’s Crypto Empire Has Disrupted A.I. | Technology | 2022-12-01 |
| Artificial Intelligence | 2022 | Did Artificial Intelligence Just Get Too Smart? | Podcasts | 2022-12-16 |
| Artificial Intelligence | 2022 | Is A.I. the Future of Test Prep? | Business | 2022-12-27 |
| Artificial Intelligence | 2022 | A.I.-Generated Art Is Already Transforming Creative Work | Technology | 2022-10-21 |
| Artificial Intelligence | 2022 | Can A.I. Write Recipes Better Than Humans? We Put It to the Ultimate Test. | Food | 2022-11-04 |
| Artificial Intelligence | 2022 | In the Battle With Robots, Human Workers Are Winning | Opinion | 2022-10-07 |
| Artificial Intelligence | 2022 | An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy. | Technology | 2022-09-02 |
| Artificial Intelligence | 2022 | How Is Everyone Making Those A.I. Selfies? | Style | 2022-12-08 |
| Artificial Intelligence | 2022 | One Man’s Dream of Fusing A.I. With Common Sense | Business | 2022-08-28 |
| Artificial Intelligence | 2022 | Lawsuit Takes Aim at the Way A.I. Is Built | Technology | 2022-11-23 |
Final Analysis and Conclusion
From a technical standpoint, the successful completion of this project relied on a robust “Data Engineering” workflow. We navigated the complexities of the NYT API by implementing a secure authentication strategy, using .Renviron to protect our API credentials. We addressed the API’s strict rate limits by using the httr2 package’s throttling and retry functions, combined with mandatory Sys.sleep(12) pauses to avoid being blocked. Before we added the retries and the sleep interval, we frequently received HTTP 429 (Too Many Requests) errors. Finally, we transformed the raw, nested JSON responses into a normalized, tidy data frame using purrr for iteration and tidyr for flattening the document structure. The 1,000-result API cap means we could not have retrieved every article for the term-year combinations with more than 1,000 hits (for example, “Artificial Intelligence” from 2023 onward). The resulting 100-row sample, which holds the first page of 10 results for each of the 10 term-year combinations, therefore offers an illustrative rather than statistically representative window into the trends we aimed to analyze.
The data acquired from the New York Times Article Search API reveals a dramatic shift in the media landscape regarding emerging technologies. Our primary research question sought to determine whether “Artificial Intelligence” was eclipsing “Data Science” in prominence. The hit counts strongly support this shift. In 2021, “Data Science” actually had the higher volume, with 1,411 hits compared to just 634 for “Artificial Intelligence”. By 2023, however, a massive pivot had occurred: AI hits surged to 2,271, while Data Science dropped to 967. By 2025, AI reached 3,621 hits, its highest count in the five-year window, whereas Data Science stood at 1,173, still below its 2021 level. This suggests that while Data Science remains a stable professional field, “Artificial Intelligence” has captured the broader public and editorial imagination, likely acting as a “catch-all” term for modern computational advancements.
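The comparison in this paragraph can be made explicit by computing the ratio of AI hits to Data Science hits for each year. The sketch below re-enters the hit counts from the frequency table printed earlier, so it runs without calling the API; the column names `ai`, `ds`, `ai_to_ds`, and `ai_share` are illustrative choices, not part of the original analysis.

```r
library(dplyr)
library(tidyr)

# Hit counts copied from the frequency_check output shown earlier in this report
frequency_check <- tibble(
  term = rep(c("Artificial Intelligence", "Data Science"), each = 5),
  year = rep(2021:2025, times = 2),
  total_hits = c(634, 657, 2271, 2589, 3621, 1411, 1064, 967, 942, 1173)
)

# One row per year, with the AI-to-Data-Science ratio and AI's share of all hits
hit_ratio <- frequency_check |>
  pivot_wider(names_from = term, values_from = total_hits) |>
  rename(ai = `Artificial Intelligence`, ds = `Data Science`) |>
  mutate(
    ai_to_ds = ai / ds,
    ai_share = ai / (ai + ds)
  )

print(hit_ratio)
```

A ratio below 1 means Data Science had more hits that year; a ratio above 1 means AI did. On these counts the ratio is below 1 in 2021 and 2022 and above 2 from 2023 onward.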
Beyond simple frequency, our “Tidy Tibble” allowed us to examine the “prominence” and “location” of these discussions. While we initially expected these topics to be siloed within the Technology section, the data transformation process showed a significant diffusion into Business, Opinion, and Science. This dispersion indicates that AI is no longer treated as a niche technical subject but as a cross-disciplinary force impacting global economics and ethics. The ability to “hoist” specific headlines and publication dates was crucial here; it allowed us to see that AI is frequently discussed in the context of policy, labor, and creative industries, whereas Data Science remains more closely tied to technical and business analytics.
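The section breakdown discussed here can be computed by counting articles per term and section. The sketch below is self-contained: `sample_2021` re-enters only the ten 2021 “Artificial Intelligence” rows displayed in the table above, and `section_share()` is a hypothetical helper name. Applied to the full `tidy_nyt_data` tibble, the same function would summarize all 100 rows.

```r
library(dplyr)

# The ten 2021 "Artificial Intelligence" rows shown in the table above
sample_2021 <- tibble(
  term = "Artificial Intelligence",
  section = c(
    "Technology", "Business", "Books", "Technology", "Technology",
    "Technology", "Opinion", "Technology", "Business", "Technology"
  )
)

# Hypothetical helper: article counts and within-term shares per section
section_share <- function(articles) {
  articles |>
    count(term, section, name = "n_articles") |>
    group_by(term) |>
    mutate(share = n_articles / sum(n_articles)) |>
    ungroup() |>
    arrange(term, desc(n_articles))
}

section_counts <- section_share(sample_2021)
print(section_counts)
```

Because each term-year combination contributes only its first page of results, these shares describe the sampled articles rather than all NYT coverage.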
Citation
Google DeepMind. (2026). Gemini 3.1 Thinking [Large language model]. https://gemini.google.com. Accessed March 28 - 29, 2026