SPS_Data607_Week9_DC

Author

David Chen

Assignment: Using a New York Times API in R

The New York Times provides multiple public APIs via its developer portal (NYT Developer Network). To use any NYT API, you must first create an account and obtain an API key.

Your task:

  1. Choose one New York Times API endpoint (e.g., Article Search, Most Popular, Books). Use your data analysis to ask and attempt to answer an interesting question of your choosing. Examples: “What are the top five best-selling hardcover books?”, “Which newspaper sections have produced the most popular articles, and how has this changed compared to X years ago?”

  2. In R, write code to:

-    Authenticate using your API key.
Never hard-code an API key.
There are a number of good ways to handle the key.
For example, store it in an environment variable and read it with
`Sys.getenv("NYT_API_KEY")`.

-    Make a request to the endpoint

-    Parse the **JSON** response

-    Transform the result into a clean **R data frame** (tibble is fine)

Deliverable

-    The API you selected (endpoint + brief description)

-    The request you made (parameters used)

-    Code that returns a tidy data frame

-    A short note describing any data-cleaning decisions (e.g., nested fields, missing values)

Approach

  • Talk to the New York Times API

  • Handle JSON data (API responses)

  • Clean and analyze data

  • Make charts

Setup

httr → sends a request to NYT (like opening a webpage automatically)

jsonlite → converts API response into table format

dplyr → lets you filter/select columns easily

ggplot2 → creates graphs

#install.packages(c("httr", "jsonlite", "dplyr", "ggplot2"))
knitr::opts_chunk$set(cache = TRUE)
library(httr)
library(jsonlite)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)

Set API key:

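# Keep the key out of the script: store it in ~/.Renviron as a line
# NYT_API_KEY=<your key>, restart R, then confirm that the key is visible.
stopifnot(nzchar(Sys.getenv("NYT_API_KEY")))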

API Function

Create a reusable function that:

  1. asks the New York Times Article Search API for data

    • query = what you search (e.g., “China”)

    • begin_date / end_date = time range

    • page = which batch of results

  2. returns articles as a table

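nyt_search <- function(query, begin_date, end_date, page = 0) {
  # Read the key from the environment so it never appears in the code
  api_key <- Sys.getenv("NYT_API_KEY")
  
  url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
  
  # Send the GET request; dates use the YYYYMMDD format
  res <- GET(url, query = list(
    q = query,
    begin_date = begin_date,
    end_date = end_date,
    page = page,
    `api-key` = api_key
  ))
  
  # Parse the JSON body; flatten = TRUE turns nested fields such as
  # headline$main into columns such as headline.main
  nyt_content <- content(res, as = "text", encoding = "UTF-8")
  json <- fromJSON(nyt_content, flatten = TRUE)
  
  # One row per article for the requested page
  return(json$response$docs)
}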
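
The effect of `flatten = TRUE` can be checked offline on a small hand-written response. The JSON below is a made-up sample that only mimics the shape of `response$docs` (two fields per article); it is an assumption for illustration, not real API output.

```r
library(jsonlite)

# Hypothetical sample shaped like an Article Search response (not real API output)
sample_json <- '{
  "response": {
    "docs": [
      {"headline": {"main": "Example headline A"}, "pub_date": "1929-12-01T05:00:00+0000"},
      {"headline": {"main": "Example headline B"}, "pub_date": "1929-10-16T05:00:00+0000"}
    ]
  }
}'

# flatten = TRUE expands the nested "headline" object into a headline.main column
docs <- fromJSON(sample_json, flatten = TRUE)$response$docs

names(docs)  # column names after flattening
nrow(docs)   # one row per article
```

This is why the cleaning step can select `headline.main` directly instead of digging into a nested column.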

Data Collection

The API does not return all results at once.

Each request returns 10 articles, so a loop is needed to get more.

Create a data container with a loop to collect and store all results from API calls. Implement a delay (timer) between requests to prevent the NYT API from rate-limiting or blocking the API key.

all_articles <- list()

for (p in 0:50) {
  #cat("Fetching page:", p, "\n")
  tmp <- nyt_search("China", 19290101, 19291231, p)
  
  if (length(tmp) == 0) break
  all_articles[[p + 1]] <- tmp
  
  Sys.sleep(15) # the rate limit is 5 req per 60s
}
Error in `curl::curl_fetch_memory()`:
! Could not connect to server [api.nytimes.com]:
Recv failure: Connection was reset
df <- bind_rows(all_articles)
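
The connection reset above stops the loop at the first failed request, so only the pages fetched before the error are kept. One possible safeguard, sketched here under the assumption that a failed request is worth retrying after a pause, is a small retry wrapper. `with_retry` is a hypothetical helper written for this example; it is not part of httr.

```r
# Hypothetical helper: call a zero-argument function, retrying on error.
# max_tries and wait (seconds between attempts) are illustrative defaults.
with_retry <- function(fn, max_tries = 3, wait = 0) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of letting it stop the script
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) Sys.sleep(wait)
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Possible use inside the loop (requires network access and a valid key):
# tmp <- with_retry(function() nyt_search("China", 19290101, 19291231, p), wait = 15)
```

Each retry still counts against the rate limit, so the wait between attempts should be at least as long as the delay used in the loop.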

Data Cleaning

Raw API data is untidy

  • Keep only useful columns

  • Rename them

  • Fix date format

df_clean <- df %>%
  select(
    headline = headline.main,
    pub_date,
    snippet,
    section_name,
    web_url
  ) %>%
  mutate(pub_date = as.Date(pub_date))

head(df_clean)
                                                                          headline
1                                                               DOG MEAT IN CHINA.
2                                                           NEW TROUBLES IN CHINA.
3             CHARM OF OLD CHINA TRADE SHOWN IN BUSINESS LETTERS; CHINESE MERCHANT
4                                                 Extends Letter Service in China.
5 EDUCATING CHINESE WOMEN; Support Is Urged for Lingnan University in South China.
6                         China Says She Will Proclaim Full Freedom by End of 1929
    pub_date
1 1929-12-01
2 1929-10-16
3 1929-12-15
4 1929-09-15
5 1929-11-26
6 1929-09-17
                                                                                                snippet
1            Chow dog flesh still obtainable in Canton and Peiping (Peking) although sale is prohibited
2                                                                                    Ed on new troubles
3                                       Gets old lrs of Russell & Co recalling charm of old China trade
4                                                Extension of lr service to interior of China announced
5                                           Appeals for funds for Lingnan Univ for women of South China
6 China plans to proclaim full freedom by end of 1929 regardless of outcome of negotiations now pending
  section_name
1     Archives
2     Archives
3     Archives
4     Archives
5     Archives
6     Archives
                                                                                                               web_url
1                                                   https://www.nytimes.com/1929/12/01/archives/dog-meat-in-china.html
2                                               https://www.nytimes.com/1929/10/16/archives/new-troubles-in-china.html
3 https://www.nytimes.com/1929/12/15/archives/charm-of-old-china-trade-shown-in-business-letters-chinese-merchant.html
4                                     https://www.nytimes.com/1929/09/15/archives/extends-letter-service-in-china.html
5  https://www.nytimes.com/1929/11/26/archives/educating-chinese-women-support-is-urged-for-lingnan-university-in.html
6            https://www.nytimes.com/1929/09/17/archives/china-says-she-will-proclaim-full-freedom-by-end-of-1929.html
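
Two further cleaning decisions that may be worth considering are dropping repeated articles and rows with no publication date. The sketch below is not part of the pipeline above; it runs on made-up rows, and the choice to deduplicate on `web_url` is an assumption.

```r
library(dplyr)

# Made-up rows standing in for the flattened API result (not real data)
raw <- tibble(
  headline.main = c("Example A", "Example A", "Example B"),
  pub_date      = c("1929-12-01T05:00:00+0000", "1929-12-01T05:00:00+0000", NA),
  web_url       = c("https://example.com/a", "https://example.com/a", "https://example.com/b")
)

cleaned <- raw %>%
  distinct(web_url, .keep_all = TRUE) %>%  # keep one row per article URL
  filter(!is.na(pub_date)) %>%             # drop rows with no publication date
  mutate(pub_date = as.Date(pub_date))     # keep the calendar date, drop the time part

cleaned
```

`as.Date()` reads the leading `YYYY-MM-DD` part of the timestamp and ignores the rest, which is also what the `mutate()` call in the cleaning code relies on.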

Analysis

Articles Over Time

df_clean %>%
  mutate(month = format(pub_date, "%Y-%m")) %>%
  count(month) %>%
  ggplot(aes(x = month, y = n)) +
  geom_col() +
  theme_minimal() +
  labs(title = "NYT Articles About China (1929)",
       x = "Month",
       y = "Number of Articles")
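
Because `count(month)` only returns months that have at least one article, a month with no collected data simply disappears from the chart. A minimal sketch of filling such gaps with zeros is shown below; the dates are made up and the September to December range is an assumption chosen for illustration.

```r
library(dplyr)

# Made-up publication dates (not real data); October and December are missing
dates <- as.Date(c("1929-09-15", "1929-09-17", "1929-11-26"))

monthly <- tibble(pub_date = dates) %>%
  mutate(month = format(pub_date, "%Y-%m")) %>%
  count(month)

# Every month in the assumed range, whether or not it has articles
all_months <- tibble(
  month = format(seq(as.Date("1929-09-01"), as.Date("1929-12-01"), by = "month"), "%Y-%m")
)

# Months with no articles get n = 0 instead of being dropped
monthly_full <- all_months %>%
  left_join(monthly, by = "month") %>%
  mutate(n = coalesce(n, 0L))

monthly_full
```

Passing `monthly_full` to the same `ggplot()` call would then show empty months as zero-height bars.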

Keyword Exploration

# Keep articles whose snippet mentions "Peking" (case-insensitive)
peking_articles <- df_clean %>%
  filter(grepl("Peking", snippet, ignore.case = TRUE))

head(peking_articles)
                                                                                                                   headline
1                                                                                                        DOG MEAT IN CHINA.
2 CHINA IS DEVELOPING MOTOR AND AIR TRAVEL; Commercial Attache Tells Hoover American Trade Is Gaining There Satisfactorily.
3                                     Kidnapped Furrier Sends Plea for Ransom; Stimson Cables to China to Aid Aaron Brenner
4                                         SEIZE CHINESE TEMPLE.; Workers Turn Monks Out and Smash Idols in Peking Outbreak.
5     CHINESE ART GIVEN PHILADELPHIA MUSEUM; 300 Paintings and Ducal Hall From Peking Are Pesented by Edward B. Robinettes.
    pub_date
1 1929-12-01
2 1929-10-30
3 1929-11-10
4 1929-11-24
5 1929-10-20
                                                                                                                              snippet
1                                          Chow dog flesh still obtainable in Canton and Peiping (Peking) although sale is prohibited
2                                                           Conf of J Arnold, Commercial Attache at Peiping (Peking) with Pres Hoover
3                                                  Brenner asks that random be paid; Amer Minister at Peking instructed to render aid
4 Union of street car workers at Peking seizes temple Tiehshanssu on Kuomintang party's ruling that "superstition" must be eradicated
5                                         Mr and Mrs E B Robinette present 300 Chinese paintings and hall from ducal palace in Peking
  section_name
1     Archives
2     Archives
3     Archives
4     Archives
5     Archives
                                                                                                              web_url
1                                                  https://www.nytimes.com/1929/12/01/archives/dog-meat-in-china.html
2  https://www.nytimes.com/1929/10/30/archives/china-is-developing-motor-and-air-travel-commercial-attache-tells.html
3 https://www.nytimes.com/1929/11/10/archives/kidnapped-furrier-sends-plea-for-ransom-stimson-cables-to-china-to.html
4     https://www.nytimes.com/1929/11/24/archives/seize-chinese-temple-workers-turn-monks-out-and-smash-idols-in.html
5 https://www.nytimes.com/1929/10/20/archives/chinese-art-given-philadelphia-museum-300-paintings-and-ducal-hall.html
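
The same `grepl()` test can be applied to several keywords at once to compare how often each appears. The sketch below uses the six snippets printed in the Data Cleaning output; the keyword list is an arbitrary choice for illustration.

```r
# The six snippets shown by head(df_clean) in the Data Cleaning section
snippets <- c(
  "Chow dog flesh still obtainable in Canton and Peiping (Peking) although sale is prohibited",
  "Ed on new troubles",
  "Gets old lrs of Russell & Co recalling charm of old China trade",
  "Extension of lr service to interior of China announced",
  "Appeals for funds for Lingnan Univ for women of South China",
  "China plans to proclaim full freedom by end of 1929 regardless of outcome of negotiations now pending"
)

# Illustrative keywords (an assumption, not taken from the analysis above)
keywords <- c("Peking", "China", "trade")

# For each keyword, count the snippets that contain it (case-insensitive)
hits <- vapply(
  keywords,
  function(k) sum(grepl(k, snippets, ignore.case = TRUE)),
  integer(1)
)

sort(hits, decreasing = TRUE)
```

With the full data, `snippets` would be replaced by `df_clean$snippet`.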

Conclusion

In conclusion, the New York Times Article Search API provides a practical way to retrieve and analyze historical news data in R. In this project, we collected 1929 articles matching the query "China", counted them by month, and filtered them by keyword (for example, "Peking"). The same workflow can be repeated for other years to compare coverage over time, or extended to identify which keywords appear most frequently and to fit statistical or predictive models of media attention. However, the API rate limit (5 requests per 60 seconds) and dropped connections, such as the reset seen during data collection, mean that large-scale collection may not finish within the given timeframe, which remains a key limitation of this task.

LLMs used:

• OpenAI. (2025). ChatGPT (Version 5.2) [Large language model]. https://chat.openai.com. Accessed Mar 28, 2026.