The NY Times provides a set of APIs that enable web developers and others to access data (1851 to present) associated with the publication. There are currently 10 APIs available to the public, for non-commercial use only. They include:
Archive - past NYT articles for a given month
Article Search - articles by keyword
Books - book reviews and The New York Times Best Sellers lists
Community - comments from registered users on New York Times articles
Most Popular - the most emailed, shared, and viewed articles on NYTimes.com
Movie Reviews - movie reviews by keyword and opening date
RSS Feeds - articles ranked on the section fronts
Semantic - list of people, places, organizations and other locations, entities and descriptors that make up the controlled vocabulary used as metadata by The New York Times
Times Newswire - links and metadata for Times’ articles
Top Stories - array of articles currently on the specified sections
In this assignment, I applied the “Article Search” API to access news articles from 2000 highlighting the Adirondack region of New York State. I also used the API's filter query to limit my search to items whose type of material is “News”.
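A request of this kind can be sketched as a small URL builder. The helper name `build_nyt_url` and the placeholder key are assumptions for illustration; the endpoint and the parameter names (`q`, `fq`, `begin_date`, `end_date`, `api-key`) are the ones used in the code later in this document. Percent-encoding the values is an addition, so that search terms containing spaces still form a valid URL.

```r
# Hypothetical helper: assemble an Article Search request URL
build_nyt_url <- function(term, material, begin_date, end_date, key) {
  base <- "https://api.nytimes.com/svc/search/v2/articlesearch.json"
  params <- c(
    q = term,
    fq = paste0("type_of_material:", material),
    begin_date = begin_date,
    end_date = end_date,
    `api-key` = key
  )
  # Percent-encode each value so spaces and punctuation survive the request
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

# "YOUR_KEY" is a placeholder, not a working key
build_nyt_url("Adirondacks", "News", "20000101", "20001231", "YOUR_KEY")
```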
The API returns query results in JSON format. Further steps are required to retrieve and store this data as a dataframe. For example, JSON’s hierarchical data structure must be “flattened” prior to conversion. I used the jsonlite library to accomplish this task and relied on dplyr for the remaining data cleaning and transformation. In particular, I limited the final data set to entries where Adirondack(s) appeared in either the article headline or the lead paragraph.
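To illustrate what flattening does, here is a minimal sketch on made-up JSON (not real API output) whose shape loosely imitates a search result with a nested `headline` object. Without `flatten = TRUE`, jsonlite keeps `headline` as a data frame nested inside a column; with it, the nested fields become ordinary columns named with a dot.

```r
library(jsonlite)

# Made-up JSON that imitates the nested shape of a search result
json <- '[
  {"web_url": "https://example.com/a", "headline": {"main": "First",  "kicker": "News"}},
  {"web_url": "https://example.com/b", "headline": {"main": "Second", "kicker": "Op-Ed"}}
]'

nested <- fromJSON(json)                  # headline is a nested data frame column
flat   <- fromJSON(json, flatten = TRUE)  # headline.main and headline.kicker columns

names(nested)
names(flat)
```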
The following source material was valuable in providing example code and steps to complete this assignment:
The NY Times Developers Network: https://developer.nytimes.com
Storybench: https://www.storybench.org/working-with-the-new-york-times-api-in-r/
Daisung Jang: https://daisungjang.com/tutorial/Nytimes_tutorial.html
#load libraries
library(tidyverse)
library(jsonlite)
library(magrittr)
library(kableExtra)
library(janitor)
#set query parameters to generalize requests
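# Read the API key from an environment variable instead of hard-coding it
# (NYT_API_KEY is an assumed variable name, e.g. set in .Renviron)
key <- Sys.getenv("NYT_API_KEY")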
search_terms <- c("Adirondacks")
material<-c("News")
# Query API and identify data volume (number of pages)
# With a single search term, the loop simply sets `term` to that term
for(i in seq_along(search_terms)){
  term <- search_terms[i]}
begin_date <- "20000101"
end_date <- "20001231"
# paste0() has no `sep` argument, so none is passed
url <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",
              term,
              "&fq=type_of_material:",
              material,
              "&begin_date=",
              begin_date,
              "&end_date=",
              end_date,
              "&facet_filter=true&api-key=",
              key)
#Results come back 10 per page and pages are numbered from 0,
#so the index of the last page is ceiling(hits / 10) - 1
query <- fromJSON(url)
tot_pages <- max(ceiling(query$response$meta$hits[1] / 10) - 1, 0)
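The page arithmetic can be checked in isolation. The sketch below assumes 10 results per page and zero-indexed pages, as in the request loop that follows; `last_page` is a hypothetical helper name. It uses `ceiling()` because rounding can drop a final, partially filled page.

```r
# Hypothetical helper: index of the last zero-indexed page for a hit count,
# assuming 10 results per page
last_page <- function(hits, per_page = 10) {
  max(ceiling(hits / per_page) - 1, 0)
}

last_page(23)      # 23 hits span pages 0, 1 and 2
round(23 / 10 - 1) # rounding stops at page 1 and misses the last 3 hits
last_page(20)      # exactly two full pages: 0 and 1
```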
#Request each page by appending its page number to the URL and store it as a df
pages <- list()
for(i in 0:tot_pages){
  message("Retrieving page ", i)
  nytSearch <- fromJSON(paste0(url, "&page=", i), flatten = TRUE) %>% data.frame()
  pages[[i+1]] <- nytSearch
  Sys.sleep(6)
}
#combine dataframes
temp <- rbind_pages(pages)
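As a small illustration of why `rbind_pages()` is convenient here: pages returned by an API do not always share the same columns, and `rbind_pages()` fills the gaps with `NA` rather than failing. The two data frames below are made up for the example.

```r
library(jsonlite)

# Made-up pages whose columns only partly overlap
page1 <- data.frame(headline = "A", word_count = 100)
page2 <- data.frame(headline = "B", section = "Travel")

combined <- rbind_pages(list(page1, page2))
combined  # two rows; missing values are filled with NA
```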
# Save as CSV after subsetting and renaming columns for clarity
temp%<>%select(!c(status, copyright, response.docs.abstract, response.docs.snippet,
  response.docs._id, response.docs.multimedia, response.docs.keywords,
  response.docs.byline.person, response.docs.uri, response.docs.print_section,
  response.docs.print_page, response.docs.subsection_name,
  response.docs.headline.kicker, response.docs.headline.content_kicker,
  response.docs.headline.print_headline, response.docs.headline.name,
  response.docs.headline.seo, response.docs.headline.sub,
  response.docs.byline.organization, response.meta.hits, response.meta.offset,
  response.meta.time))
temp%<>%rename(url = response.docs.web_url, lead = response.docs.lead_paragraph,
  source = response.docs.source, pub_date = response.docs.pub_date,
  type = response.docs.document_type, news_desk = response.docs.news_desk,
  section = response.docs.section_name, material = response.docs.type_of_material,
  word_count = response.docs.word_count, headline = response.docs.headline.main,
  author = response.docs.byline.original)
write_csv(temp, "Adks2000.csv")
#Filter dataset to remove irrelevant articles
Adks<-read_csv("Adks2000.csv")%>%as.data.frame()
View(Adks)
Adks%<>%filter(material != "Paid Death Notice")
Adks%<>%filter(material != "Obituary; Biography")
Adks%<>%filter(news_desk != "classified")
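#Keep rows that mention the Adirondacks in the headline or in the lead paragraph
#('Adirondacks?' matches both the singular and the plural)
adk1<-Adks%>%filter(str_detect(headline,'Adirondacks?'))
adk2<-Adks%>%filter(str_detect(lead,'Adirondacks?'))
#Note: rbind() keeps a row twice when both its headline and its lead match
Adks2000_final<-rbind(adk1, adk2)%>%separate(pub_date, c("date","temp"), sep =" ")%>%select(!temp)%>%clean_names()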
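One possible alternative, shown here only as a sketch on made-up rows, is to test both columns in a single `filter()` call. Because each row is evaluated once, an article that matches in both the headline and the lead is kept once rather than twice. The `toy` data and the `pattern` variable are assumptions for the example.

```r
library(dplyr)
library(stringr)

# Made-up rows standing in for the cleaned search results
toy <- tibble(
  headline = c("Acid Rain Law Fails in Adirondacks", "A Voice From the Met", "City Budget Vote"),
  lead     = c("Lakes in the Adirondacks are acidifying.",
               "A shrine in the foothills of the Adirondacks.",
               "The council met on Tuesday.")
)

pattern <- "Adirondacks?"

# A row is kept if either column matches; no row can appear twice
matched <- toy %>%
  filter(str_detect(headline, pattern) | str_detect(lead, pattern))

nrow(matched)
```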
#print final df and save to csv
Adks2000_final%>%kbl%>%kable_material(c("striped"))
url | lead | source | date | type | news_desk | section | material | word_count | headline | author |
---|---|---|---|---|---|---|---|---|---|---|
https://www.nytimes.com/2000/08/20/us/2000-campaign-president-clinton-maintains-low-profile-his-adirondacks-vacation.html | CNN has a correspondent and crew in place, just in case news breaks out in the woods. The other networks are following along as well, betting on the proposition that President Clinton, even while on vacation, will not be able to resist a camera and boom mike in his face. | The New York Times | 2000-08-20 | article | National Desk | U.S. | News | 501 | Clinton Maintains a Low Profile On His Adirondacks’ Vacation | By Marc Lacey |
https://www.nytimes.com/2000/03/27/nyregion/acid-rain-law-found-to-fail-in-adirondacks.html | A landmark air pollution law enacted a decade ago to reduce acid rain has failed to slow the acidification of lakes and streams in the Adirondacks, many of which are rapidly losing the ability to sustain life, according to a new federal report. | The New York Times | 2000-03-27 | article | Metropolitan Desk | New York | News | 1218 | Acid Rain Law Found to Fail in Adirondacks | By James Dao |
https://www.nytimes.com/2000/02/21/nyregion/rare-avalanche-kills-one-on-an-adirondack-slope.html | Six back-country skiers who ventured into a dangerous snowslide area were caught in a roaring avalanche and swept hundreds of yards down a mountainside in the Adirondack High Peaks of northern New York State on Saturday. Rescuers found one dead under the snow and his companions all injured, one critically. | The New York Times | 2000-02-21 | article | Metropolitan Desk | New York | News | 906 | Rare Avalanche Kills One On an Adirondack Slope | By Robert D. McFadden |
https://www.nytimes.com/2000/11/03/arts/weekend-warrior-alone-in-the-wilds-where-nature-makes-waves.html | There are many iconic American landscapes, but perhaps none is more evocative than the mountains and lakes of the Adirondacks. The northern forests, a mix of conifers and hardwoods, are everything we think a forest should be. The peaks are ancient, bare granite; the streams are magical, tumbling past banks blanketed in green moss and ferns. James Fenimore Cooper’s Hawkeye could still be lurking in the morning mists, or at least the movie image of Daniel Day Lewis in ‘’The Last of the Mohicans.’’ | The New York Times | 2000-11-03 | article | Leisure/Weekend Desk | Arts | News | 1985 | Alone in the Wilds, Where Nature Makes Waves | By Jerry Beilinson |
https://www.nytimes.com/2000/03/27/nyregion/acid-rain-law-found-to-fail-in-adirondacks.html | A landmark air pollution law enacted a decade ago to reduce acid rain has failed to slow the acidification of lakes and streams in the Adirondacks, many of which are rapidly losing the ability to sustain life, according to a new federal report. | The New York Times | 2000-03-27 | article | Metropolitan Desk | New York | News | 1218 | Acid Rain Law Found to Fail in Adirondacks | By James Dao |
https://www.nytimes.com/2000/02/21/nyregion/rare-avalanche-kills-one-on-an-adirondack-slope.html | Six back-country skiers who ventured into a dangerous snowslide area were caught in a roaring avalanche and swept hundreds of yards down a mountainside in the Adirondack High Peaks of northern New York State on Saturday. Rescuers found one dead under the snow and his companions all injured, one critically. | The New York Times | 2000-02-21 | article | Metropolitan Desk | New York | News | 906 | Rare Avalanche Kills One On an Adirondack Slope | By Robert D. McFadden |
https://www.nytimes.com/2000/04/02/weekinreview/march-26-april-1-failure-is-reported-for-clean-air-act.html | When Congress enacted amendments to the Clean Air Act in 1990, environmentalists hailed them as powerful weapons in the fight to reduce acid rain. But a new federal report showed that many lakes, ponds and streams in the Northeast, particularly in the Adirondack Mountains, are continuing to turn acidic and are in growing danger of becoming lifeless. | The New York Times | 2000-04-02 | article | Week in Review Desk | Week in Review | News | 78 | March 26-April 1; Failure Is Reported For Clean Air Act | By James Dao |
https://www.nytimes.com/2000/08/19/nyregion/mrs-clinton-takes-a-busman-s-holiday.html | Some Adirondacks getaway! Hillary Rodham Clinton barely had enough time to unpack her bags at the lakeside home where the first family is spending the weekend before she was off to a political rally to talk up her Senate bid. | The New York Times | 2000-08-19 | article | Metropolitan Desk | New York | News | 432 | Mrs. Clinton Takes a Busman’s Holiday | By Marc Lacey |
https://www.nytimes.com/2000/08/20/travel/a-voice-from-the-met-s-first-season.html | HIDDEN in a grove of pines on a finger of land overlooking Lake George in the foothills of the Adirondacks is a shrine to a remarkable woman. Her lovely musical name was Marcella Sembrich, and though virtually forgotten today, this Polish-born soprano was once one of the glories of the operatic stage. | The New York Times | 2000-08-20 | article | Travel Desk | Travel | News | 2264 | A Voice From the Met’s First Season | By Constance Rosenblum |
https://www.nytimes.com/2000/04/10/nyregion/invasive-mussels-turn-up-in-lake-thought-to-be-immune.html | A year and a half ago, it looked as if Lake George, a blue jewel in the green Adirondacks, had dodged a biological bullet – the zebra mussel, an invasive European mollusk that is clogging pipes, crowding local aquatic life and turning beaches into toe-slicing shell heaps from Michigan to the Hudson River. | The New York Times | 2000-04-10 | article | Metropolitan Desk | New York | News | 1049 | Invasive Mussels Turn Up in Lake Thought to Be Immune | By Andrew C. Revkin |
https://www.nytimes.com/2000/06/09/arts/weekend-warrior-stalking-the-bear-bears-optional.html | Camped on the banks of the Oswegatchie River deep in the Adirondacks several years ago, Paul Rezendes, a master tracker and wilderness photographer, was awakened by a black bear bashing around his camp. Naked and armed with only a canoe paddle, he chased the bear into the night. As he drifted back to sleep, the snorting bear returned, bent on sampling the contents of the well-stocked cooler that Mr. Rezendes had stashed at the far end of his camp. | The New York Times | 2000-06-09 | article | Leisure/Weekend Desk | Arts | News | 1742 | Stalking the Bear (Bears Optional) | By Joe Glickman |
https://www.nytimes.com/2000/07/16/arts/television-radio-saving-the-world-one-sexy-teen-at-a-time.html | IT seems to me that having created the Internet, the wireless electronic pocket organizer and Oreos that turn your milk bright blue when you dunk ‘em, humankind has reached its technological apogee. In so doing, we’ve solved our most fundamental problems:’‘How can I deliver, to someone far away, an annoying online sales pitch for deeply discounted off-season Adirondack time shares?’’ and ‘’I’m a wealthy day trader and an idiot – what day is it?’’ and ‘’Why is my cookie so tasty yet not at all artistically ambitious or challenging to the eye?’’ | The New York Times | 2000-07-16 | article | Arts and Leisure Desk | Arts | News | 2614 | Saving the World, One Sexy Teen at a Time | By Jeff Macgregor |
write_csv(Adks2000_final, "Adks2000_final.csv")
The NY Times API provides a reasonably user-friendly interface for developers to access publication-related data. Users should be aware that queries will be limited to 10 pages unless requests are spaced at least 6 seconds apart (per API instructions), and that filter options are limited to those provided by the API.
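The spacing requirement can be wrapped in a small helper so that the pause is not forgotten. This is a sketch only: `fetch_pages` and `fake_fetch` are hypothetical names, and the stand-in fetcher replaces the real `fromJSON()` call so the example runs offline. The 6-second default mirrors the `Sys.sleep(6)` used above.

```r
# Hypothetical wrapper: call `fetch` once per page, pausing between calls
fetch_pages <- function(page_numbers, fetch, delay = 6) {
  results <- vector("list", length(page_numbers))
  for (k in seq_along(page_numbers)) {
    if (k > 1) Sys.sleep(delay)  # no pause is needed before the first request
    results[[k]] <- fetch(page_numbers[k])
  }
  results
}

# Stand-in fetcher so the sketch runs without network access
fake_fetch <- function(page) data.frame(page = page)

# delay = 0 keeps the demonstration fast; keep the default for real requests
out <- fetch_pages(0:2, fake_fetch, delay = 0)
length(out)
```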
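The query that I constructed produced 12 entries (after data cleaning) for the year 2000. Two of these are repeats, because articles that matched on both headline and lead paragraph appear twice, leaving 10 distinct articles. Most of them fall into two thematic categories: (1) environmental and (2) recreational.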