The New York Times provides multiple public APIs via its developer portal (NYT Developer Network). To use any NYT API, you must first create an account and obtain an API key.
Your task:
Choose one New York Times API endpoint (e.g., Article Search, Most Popular, Books). Use your data analysis to ask and attempt to answer an interesting question of your choosing. Examples: “What are the top five best-selling hardcover books?”, “Which newspaper sections have produced the most popular articles, and how has this changed compared to X years ago?”
In R, write code to:
- Authenticate using your API key. Never hard-code an API key. There are several good ways to avoid this; for example, store the key in an environment variable and read it with `Sys.getenv("NYT_API_KEY")`.
- Make a request to the endpoint
- Parse the **JSON** response
- Transform the result into a clean **R data frame** (tibble is fine)
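The authentication step can be sketched as follows. This is a minimal example, assuming the key lives in an environment variable named `NYT_API_KEY` (for instance, set in `~/.Renviron`); the helper name `get_nyt_key()` is hypothetical and not part of any package.

```r
# Hypothetical helper: read the API key from the environment and
# fail early with a clear message if it has not been set.
get_nyt_key <- function(var = "NYT_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set; ",
         "add it to ~/.Renviron and restart R.", call. = FALSE)
  }
  key
}
```

Calling `get_nyt_key()` inside the request code keeps the key out of the .qmd file and out of version control.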
Deliverable
A single Quarto (.qmd) document that includes:
- The API you selected (endpoint + brief description)
- The request you made (parameters used)
- Code that returns a tidy data frame
- A short note describing any data-cleaning decisions (e.g., nested fields, missing values)
Approach
Talk to the New York Times API
Handle JSON data (API responses)
Clean and analyze data
Make charts
Setup
httr → sends HTTP requests to the NYT API (like opening a webpage automatically)
jsonlite → parses the JSON response into R lists and data frames
Each request returns 10 articles, so a loop is needed to get more.
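The loop in this section calls a helper named `nyt_search()` whose definition does not appear in the document. The following is one possible sketch of it, assuming the Article Search endpoint and its `q`, `begin_date`, `end_date`, `page`, and `api-key` query parameters; the helper `nyt_params()` is a hypothetical name introduced here, and the original helper may have looked different. Package functions are called with `::` so the block does not depend on `library()` calls elsewhere.

```r
# Assumed endpoint: NYT Article Search, version 2.
nyt_url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Hypothetical helper: build the query parameters for one page of results.
# Dates are passed as YYYYMMDD; the key is read from the environment.
nyt_params <- function(query, begin_date, end_date, page = 0,
                       api_key = Sys.getenv("NYT_API_KEY")) {
  list(
    q          = query,
    begin_date = as.character(begin_date),
    end_date   = as.character(end_date),
    page       = page,
    `api-key`  = api_key
  )
}

# Sketch of nyt_search(): request one page and return its articles.
nyt_search <- function(query, begin_date, end_date, page = 0) {
  resp <- httr::GET(nyt_url,
                    query = nyt_params(query, begin_date, end_date, page))
  httr::stop_for_status(resp)  # turn HTTP errors into R errors
  txt <- httr::content(resp, as = "text", encoding = "UTF-8")
  # flatten = TRUE expands nested objects into dot-named columns
  # (for example, headline.main).
  parsed <- jsonlite::fromJSON(txt, flatten = TRUE)
  parsed$response$docs
}
```

Under these assumptions, `nyt_search("China", 19200101, 19291231, 0)` would return the first page of matching articles as a data frame, or an empty list when there are no more results.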
Create a data container with a loop to collect and store all results from API calls. Implement a delay (timer) between requests to prevent the NYT API from rate-limiting or blocking the API key.
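
    # Collect up to 21 pages (10 articles each) into a list.
    all_articles <- list()
    for (p in 0:20) {
      cat("Fetching page:", p, "\n")
      tmp <- nyt_search("China OR Chinese OR Beijing OR Shanghai", 19200101, 19291231, p)
      if (length(tmp) == 0) break   # stop when a page comes back empty
      all_articles[[p + 1]] <- tmp
      Sys.sleep(6)                  # pause between requests to respect the rate limit
    }
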
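Once the loop finishes, `all_articles` is a list with one element per page, and it needs to be stacked into a single data frame before cleaning. A minimal sketch, assuming each element is a data frame and using made-up toy pages in place of real API results (the name `df_raw` is hypothetical):

```r
# Toy pages standing in for the data frames stored in `all_articles`
# (values are made up for illustration).
all_articles <- list(
  data.frame(headline.main = "First headline",
             pub_date = "1925-03-01T00:00:00+0000"),
  data.frame(headline.main = "Second headline",
             pub_date = "1927-11-15T00:00:00+0000",
             web_url = "https://example.com/b")
)

# Stack the pages row-wise; a column missing from a page is filled with NA.
df_raw <- dplyr::bind_rows(all_articles)
```

`dplyr::bind_rows()` is used here instead of `rbind()` because pages do not always share exactly the same columns.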
Error in `select()`:
! Can't select columns that don't exist.
✖ Column `headline.main` doesn't exist.
head(df_clean)
Error:
! object 'df_clean' not found
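The first error above says that `headline.main` is not a column, and the second follows from it: because `select()` failed, `df_clean` was never created. One plausible cause, offered here as an assumption since the failing code is not shown, is that the JSON was parsed without flattening, so `headline` stayed a nested data frame. The sketch below reproduces that shape with a small made-up sample and shows how `jsonlite::flatten()` produces the dot-named column.

```r
# Made-up sample in the nested shape: `headline` is an object, not a string.
sample_json <- '{"response":{"docs":[
  {"headline":{"main":"First headline"},"pub_date":"1925-03-01T00:00:00+0000"},
  {"headline":{"main":"Second headline"},"pub_date":"1927-11-15T00:00:00+0000"}
]}}'

docs <- jsonlite::fromJSON(sample_json)$response$docs
names(docs)   # includes "headline" (a nested data frame), not "headline.main"

# Flatten nested data frame columns into dot-separated names.
flat <- jsonlite::flatten(docs)
names(flat)   # now includes "headline.main"

# With the column present, the selection succeeds; pub_date is converted
# from an ISO 8601 string to a Date by keeping the first 10 characters.
df_clean <- dplyr::select(flat, headline = headline.main, pub_date)
df_clean$pub_date <- as.Date(substr(df_clean$pub_date, 1, 10))
```

Checking `names()` on the combined data frame before calling `select()` is a quick way to confirm which of the two shapes the real data has.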
Analysis
Articles Over Time

    # Count articles per publication year and plot the trend.
    df_clean %>%
      mutate(year = format(pub_date, "%Y")) %>%
      count(year) %>%
      ggplot(aes(x = year, y = n, group = 1)) +
      geom_line() +
      geom_point() +
      theme_minimal() +
      labs(
        title = "NYT Articles About China (1920s)",
        x = "Year",
        y = "Number of Articles"
      )