This write-up tackles the core tasks in the Web APIs assignment for DATA 607:
- Choose a New York Times API: I went with the Article Search API, which allows for searching articles by keyword across time.
- Sign up for an API key: I created my key through NYT’s developer portal and stored it safely using .Renviron.
- Construct an interface in R to read in the JSON data: Using httr, I pulled articles related to “homelessness” between 2020 and 2023, requesting multiple pages of data to get a broader slice of NYT coverage.
- Transform JSON into a usable R DataFrame: The response was flattened with jsonlite and wrangled using dplyr into a clean, structured dataset with the relevant fields for analysis.
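The key-storage step can be sketched roughly as follows. This is a minimal illustration, not the exact code behind this write-up: the environment variable name `NYT_API_KEY` and the helper `get_nyt_key()` are assumptions. The idea is that `.Renviron` holds a line such as `NYT_API_KEY=your-key-here`, R loads it at startup, and the script reads it with `Sys.getenv()` so the key never appears in the source.

```r
# Hypothetical helper: read the API key from an environment variable and
# fail early if it is missing. The name "NYT_API_KEY" is an assumption;
# use whatever name appears in your own .Renviron file.
get_nyt_key <- function(var = "NYT_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var,
         " is not set; add it to .Renviron and restart R.")
  }
  key
}
```

A call such as `nyt_key <- get_nyt_key()` would then stop with a readable message, rather than sending a request with an empty key.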
This wasn’t just about making the API work. I wanted the topic to matter—so I chose homelessness because it ties into my public health background and gives this assignment some real-world weight.
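library(httr)      # GET(), status_code(), content()
library(jsonlite)  # fromJSON()
library(dplyr)     # bind_rows(), select(), distinct(), glimpse()
# Read the API key stored in .Renviron (the variable name NYT_API_KEY is an assumption)
nyt_key <- Sys.getenv("NYT_API_KEY")
# Define the search criteria
query <- "homelessness"
start_date <- "20200101" # Format: YYYYMMDD
end_date <- "20231231"
# NYT Article Search endpoint
url <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
# Prepare to store paginated article results
all_articles <- list()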
# NYT returns 10 results per page. We'll request the first 3 pages for demo purposes.
for (i in 0:2) {
  response <- GET(
    url = url,
    query = list(
      q = query,
      begin_date = start_date,
      end_date = end_date,
      page = i,
      `api-key` = nyt_key
    )
  )
  # Parse and store if response is successful
  if (status_code(response) == 200) {
    content_json <- fromJSON(content(response, as = "text"), flatten = TRUE)
    articles <- content_json$response$docs
    all_articles[[i + 1]] <- articles
  } else {
    warning(paste("Failed request on page", i))
  }
}
## No encoding supplied: defaulting to UTF-8.
## No encoding supplied: defaulting to UTF-8.
## Warning: Failed request on page 2
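The output above does not say why page 2 failed. One plausible cause, offered here as an assumption rather than a diagnosis, is the API's per-minute rate limit, since the loop sends its requests back to back. The sketch below pauses after every call and retries a failed page once. The helpers `fetch_nyt_page()` and `fetch_pages()` and the 12-second default pause are illustrative choices, so check NYT's current limits before relying on them. Passing `encoding = "UTF-8"` to `content()` also avoids the "No encoding supplied" message.

```r
# Hypothetical single-page fetcher: returns the docs data frame,
# or NULL when the status code is not 200.
fetch_nyt_page <- function(page, query, begin_date, end_date, api_key,
                           url = "https://api.nytimes.com/svc/search/v2/articlesearch.json") {
  response <- httr::GET(url, query = list(
    q = query, begin_date = begin_date, end_date = end_date,
    page = page, `api-key` = api_key
  ))
  if (httr::status_code(response) != 200) return(NULL)
  txt <- httr::content(response, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt, flatten = TRUE)$response$docs
}

# Hypothetical pagination loop: `fetch_page` is any function that takes a
# page number and returns a result or NULL. Pauses after every call.
fetch_pages <- function(pages, fetch_page, pause_sec = 12, max_tries = 2) {
  results <- list()
  for (p in pages) {
    for (attempt in seq_len(max_tries)) {
      res <- fetch_page(p)
      Sys.sleep(pause_sec)  # wait before the next request or retry
      if (!is.null(res)) {
        results[[length(results) + 1]] <- res
        break
      }
    }
    if (is.null(res)) warning("Giving up on page ", p)
  }
  results
}

# Example call (not run here because it needs a valid key and network access):
# all_articles <- fetch_pages(0:2, function(p) {
#   fetch_nyt_page(p, query, start_date, end_date, nyt_key)
# })
```

Taking the fetcher as an argument keeps the pause-and-retry logic separate from the HTTP call, so the loop can be exercised with a stand-in function before spending real API quota.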
# Combine all results into a single dataframe and select key fields
combined_df <- bind_rows(all_articles) %>%
  select(
    headline = headline.main,
    pub_date,
    section = section_name,
    author = byline.original,
    url = web_url,
    snippet
  )
# Deduplicate the dataset in case of overlap across pages
combined_df <- distinct(combined_df)
# Save to CSV for portability and future analysis
write.csv(combined_df, "nyt_homelessness_articles_2020_2023.csv", row.names = FALSE)
# Quick look at structure and preview
glimpse(combined_df)
## Rows: 20
## Columns: 6
## $ headline <chr> "Jurors Find San Francisco Homeless Man Not Guilty in Pipe Be…
## $ pub_date <chr> "2023-12-23T05:48:00+0000", "2023-12-19T08:01:52+0000", "2023…
## $ section <chr> "U.S.", "New York", "U.S.", "New York", "U.S.", "New York", "…
## $ author <chr> "By Jesse Barron", "By Jan Ransom and Amy Julia Harris", "By …
## $ url <chr> "https://www.nytimes.com/2023/12/23/us/san-francisco-homeless…
## $ snippet <chr> "The case was initially seen as an illustration of the city’s…
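The dotted names used in `select()`, such as `headline.main` and `byline.original`, come from `flatten = TRUE` in `fromJSON()`. The small example below shows the difference on a hand-written JSON string. The values are made up for illustration, and the structure is only a simplified stand-in for a real response.

```r
library(jsonlite)

# Hand-written stand-in for an API response; the values are invented.
sample_json <- '{
  "response": {
    "docs": [
      {"headline": {"main": "First headline"},
       "byline": {"original": "By A"},
       "web_url": "https://example.com/1"},
      {"headline": {"main": "Second headline"},
       "byline": {"original": "By B"},
       "web_url": "https://example.com/2"}
    ]
  }
}'

# Without flattening, `headline` and `byline` are data frames nested
# inside the docs data frame.
nested <- fromJSON(sample_json)$response$docs

# With flatten = TRUE, nested fields become ordinary columns with
# dotted names that select() can refer to directly.
flat <- fromJSON(sample_json, flatten = TRUE)$response$docs

names(nested)
names(flat)
```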
head(combined_df, 5)
## headline
## 1 Jurors Find San Francisco Homeless Man Not Guilty in Pipe Beating
## 2 A New Push to Improve Mental Health Care for Homeless New Yorkers
## 3 Homelessness Rose to Record Level This Year, Government Says
## 4 As Winter Approaches, Fears Grow for Homeless Migrants
## 5 A Once Despairing Sandwich Shop Owner Sees ‘a Miracle’
## pub_date section author
## 1 2023-12-23T05:48:00+0000 U.S. By Jesse Barron
## 2 2023-12-19T08:01:52+0000 New York By Jan Ransom and Amy Julia Harris
## 3 2023-12-15T18:05:07+0000 U.S. By Jason DeParle
## 4 2023-12-04T08:00:21+0000 New York By Luis Ferré-Sadurní
## 5 2023-12-26T10:02:42+0000 U.S. By Eli Saslow
## url
## 1 https://www.nytimes.com/2023/12/23/us/san-francisco-homeless-man-not-guilty.html
## 2 https://www.nytimes.com/2023/12/19/nyregion/nyc-homeless-mental-ill-violence.html
## 3 https://www.nytimes.com/2023/12/15/us/politics/homelessness-record-level.html
## 4 https://www.nytimes.com/2023/12/04/nyregion/nyc-migrant-crisis-cold.html
## 5 https://www.nytimes.com/2023/12/26/us/phoenix-homeless-encampment-zone.html
## snippet
## 1 The case was initially seen as an illustration of the city’s crime and homelessness woes. Further evidence challenged that narrative.
## 2 Mark Levine, the Manhattan borough president, is calling for more treatment teams and psychiatric beds to address a mental health crisis.
## 3 The rise of 12 percent was the biggest one-year jump since the federal government began an annual count in 2007.
## 4 Migrants slept on New York City sidewalks last week. Some advocates worry about what will happen when families need to reapply for shelter after Christmas.
## 5 Six months ago, Joe Faillace wasn’t sure his business could survive. He believes Phoenix’s clearing of a nearby homeless encampment saved his shop.
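The preview shows `pub_date` as character text, which limits sorting or grouping by time. A possible follow-up step, sketched here with a hypothetical helper `parse_pub_date()` and assuming every timestamp follows the pattern seen above, is to convert the column to a date-time type with base R.

```r
# Hypothetical helper: parse timestamps such as "2023-12-23T05:48:00+0000".
# Assumes all values share this exact pattern; anything else becomes NA.
parse_pub_date <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
}

# Two timestamps copied from the preview above
dates <- parse_pub_date(c("2023-12-23T05:48:00+0000",
                          "2023-12-19T08:01:52+0000"))
format(dates, "%Y-%m-%d")
```

Applied to the dataset, this could look like `combined_df$pub_date <- parse_pub_date(combined_df$pub_date)`.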
This end-to-end workflow shows how to:
- Connect to a real-world API using R
- Handle authentication securely
- Parse and transform nested JSON data
- Export a clean, usable dataset for future analysis
Instead of just going through the motions, I built this project around a subject that matters. Homelessness is a persistent issue that intersects with public health, economics, and policy—and this kind of data work is how we start telling better stories with real information.
This assignment reminded me why I love working with real-world data. It’s not just about cleaning or querying—it’s about connecting the technical with the human. Pulling New York Times articles on homelessness gave me the chance to see how media reflects and shapes our understanding of social issues.
As a data analyst, I care deeply about impact, and this kind of work bridges the gap between numbers and narratives. This project wasn’t just a coding task—it was a reminder that data is a lens we can use to bring empathy, awareness, and solutions to the surface.