1. Overview

The NY Times provides an API that enables web developers and others to access data (1851 to present) associated with the publication. There are currently 10 APIs available to the public, for non-commercial use only. They include:

In this assignment, I applied the “Article Search” API to access news articles from 2000 highlighting the Adirondack region of New York State. I also used the API's filter option to limit my search to material of type News.

The API returns query results in JSON form. Further steps are required to retrieve and store this data as a data frame. For example, JSON’s hierarchical data structure must be “flattened” prior to conversion. I used the jsonlite library to accomplish this task and relied on dplyr for other aspects of data cleaning and transformation. In particular, I limited the final data set to entries where “Adirondack(s)” appeared in either the article headline or the lead paragraph.
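
To make the flattening step concrete, here is a minimal sketch that uses a small hand-written JSON string instead of a live API response. The field names imitate the shape of an Article Search result but are illustrative assumptions, not the full schema.

```r
library(jsonlite)

# Hand-written sample that imitates a nested API response (illustrative only)
sample_json <- '{
  "response": {
    "docs": [
      {"web_url": "https://example.com/a", "headline": {"main": "First headline"}},
      {"web_url": "https://example.com/b", "headline": {"main": "Second headline"}}
    ]
  }
}'

# Without flattening, "headline" is a data frame nested inside a single column
nested <- fromJSON(sample_json)$response$docs

# With flatten = TRUE, nested fields become ordinary columns such as headline.main
flat <- fromJSON(sample_json, flatten = TRUE)$response$docs

names(nested)
names(flat)
```

Comparing the two sets of column names shows why flattening matters: a nested column cannot be selected or renamed like an ordinary one.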

The following source material was valuable in providing example code and steps to complete this assignment:

#load libraries

library(tidyverse)
library(jsonlite)
library(magrittr)
library(kableExtra)
library(janitor)

2. NY Times API Query

#set query parameters to generalize requests

key <- "jY4q18VWUjMQPNMBjhrdquQ7AVPxYGZe"

search_terms <- c("Adirondacks")

material<-c("News")

# Query API and identify data volume (page number)

#search_terms holds a single term, so select it directly
term <- search_terms[1]

begin_date <- "20000101"
end_date <- "20001231"

url <- paste0("http://api.nytimes.com/svc/search/v2/articlesearch.json?q=",
              term,
              "&fq=type_of_material:",
              material,
              "&begin_date=",
              begin_date,
              "&end_date=",
              end_date,
              "&facet_filter=true&api-key=",
              key)

#Query hit limited to 10 pages

query <- fromJSON(url)

#Pages are zero-indexed with 10 results each, so round up to include a partial last page
tot_pages <- ceiling(query$response$meta$hits[1] / 10) - 1
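
The page arithmetic is easy to get wrong by one, so here is a small standalone sketch. It assumes 10 results per page and zero-indexed pages, as the division by 10 and the loop starting at 0 in this script imply. The helper name `last_page` is hypothetical.

```r
# Hypothetical helper: index of the last zero-indexed page for a given hit count,
# assuming 10 results per page
last_page <- function(hits, per_page = 10) {
  ceiling(hits / per_page) - 1
}

last_page(10)       # one full page: only page 0 is needed
last_page(23)       # two full pages plus a partial one: pages 0 to 2
round(23 / 10 - 1)  # rounding instead gives 1, which would skip the partial page
```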

#Create a data frame for each page, appending the page number to the URL

pages <- list()

for(i in 0:tot_pages){
  nytSearch <- fromJSON(paste0(url, "&page=", i), flatten = TRUE) %>% data.frame() 
  message("Retrieving page ", i)
  pages[[i+1]] <- nytSearch 
  Sys.sleep(6) 
}
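
The pacing used in the loop above could also be wrapped in a reusable helper. The sketch below is one possible shape, not part of the original workflow: `fetch_paced` is a hypothetical name, and the fetch function is passed in as an argument so the pacing logic can be tried without network access.

```r
# Hypothetical helper: call `fetch` on each URL, pausing `delay` seconds between calls.
# In a real query `fetch` would be something like jsonlite::fromJSON.
fetch_paced <- function(urls, fetch, delay = 6) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    if (i > 1) Sys.sleep(delay)  # no need to wait before the first request
    results[[i]] <- fetch(urls[i])
  }
  results
}

# Dry run with a stand-in fetch function and no delay
demo <- fetch_paced(c("page=0", "page=1"), fetch = toupper, delay = 0)
```

Using `seq_along()` also means an empty vector of URLs results in no requests at all.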

#combine dataframes

temp <- rbind_pages(pages)  

# Save as CSV after subsetting and renaming columns for clarity

temp%<>%select(!c(status, copyright,
                  response.docs.abstract, response.docs.snippet, response.docs._id,
                  response.docs.multimedia, response.docs.keywords, response.docs.byline.person,
                  response.docs.uri, response.docs.print_section, response.docs.print_page,
                  response.docs.subsection_name, response.docs.headline.kicker,
                  response.docs.headline.content_kicker, response.docs.headline.print_headline,
                  response.docs.headline.name, response.docs.headline.seo, response.docs.headline.sub,
                  response.docs.byline.organization,
                  response.meta.hits, response.meta.offset, response.meta.time))

temp%<>%rename(url = response.docs.web_url,
               lead = response.docs.lead_paragraph,
               source = response.docs.source,
               pub_date = response.docs.pub_date,
               type = response.docs.document_type,
               news_desk = response.docs.news_desk,
               section = response.docs.section_name,
               material = response.docs.type_of_material,
               word_count = response.docs.word_count,
               headline = response.docs.headline.main,
               author = response.docs.byline.original)

write_csv(temp, "Adks2000.csv")

3. Clean and Tidy data

#Filter dataset to remove irrelevant articles

Adks<-read_csv("Adks2000.csv")%>%as.data.frame()
View(Adks)

Adks%<>%filter(material != "Paid Death Notice")
Adks%<>%filter(material != "Obituary; Biography")
Adks%<>%filter(news_desk != "classified")

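#Keep articles that mention Adirondack(s) in the headline or the lead paragraph
#Note: rbind() stacks both subsets, so an article matching in both places appears twice
adk1<-Adks%>%filter(str_detect(headline,'Adirondack'))
adk2<-Adks%>%filter(str_detect(lead,'Adirondack'))

Adks2000_final<-rbind(adk1, adk2)%>%separate(pub_date, c("date","temp"), sep =" ")%>%select(!temp)%>%clean_names()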
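
Because the rbind() step stacks two separately filtered subsets, any article matching in both the headline and the lead is repeated. The sketch below uses toy data (illustrative values only) to compare that approach with a single filter that combines both conditions. It is offered as an alternative to consider, not as the method used for the output shown in this report.

```r
library(dplyr)
library(stringr)

# Toy data standing in for the article table (illustrative values only)
toy <- tibble(
  headline = c("Acid Rain in Adirondacks", "City Budget Vote", "A Lake Getaway"),
  lead     = c("Lakes in the Adirondacks are acidifying.",
               "The council met.",
               "Deep in the Adirondack woods.")
)

# Stacking two filtered subsets repeats the first row, which matches in both columns
stacked <- rbind(
  toy %>% filter(str_detect(headline, "Adirondack")),
  toy %>% filter(str_detect(lead, "Adirondack"))
)

# A single filter with | keeps each matching row once
combined <- toy %>%
  filter(str_detect(headline, "Adirondack") | str_detect(lead, "Adirondack"))

nrow(stacked)
nrow(combined)
```

Calling `distinct()` on the stacked result would be another way to drop the repeated rows.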

#print final df and save to csv

Adks2000_final%>%kbl%>%kable_material(c("striped"))
url | lead | source | date | type | news_desk | section | material | word_count | headline | author
https://www.nytimes.com/2000/08/20/us/2000-campaign-president-clinton-maintains-low-profile-his-adirondacks-vacation.html | CNN has a correspondent and crew in place, just in case news breaks out in the woods. The other networks are following along as well, betting on the proposition that President Clinton, even while on vacation, will not be able to resist a camera and boom mike in his face. | The New York Times | 2000-08-20 | article | National Desk | U.S. | News | 501 | Clinton Maintains a Low Profile On His Adirondacks’ Vacation | By Marc Lacey
https://www.nytimes.com/2000/03/27/nyregion/acid-rain-law-found-to-fail-in-adirondacks.html | A landmark air pollution law enacted a decade ago to reduce acid rain has failed to slow the acidification of lakes and streams in the Adirondacks, many of which are rapidly losing the ability to sustain life, according to a new federal report. | The New York Times | 2000-03-27 | article | Metropolitan Desk | New York | News | 1218 | Acid Rain Law Found to Fail in Adirondacks | By James Dao
https://www.nytimes.com/2000/02/21/nyregion/rare-avalanche-kills-one-on-an-adirondack-slope.html | Six back-country skiers who ventured into a dangerous snowslide area were caught in a roaring avalanche and swept hundreds of yards down a mountainside in the Adirondack High Peaks of northern New York State on Saturday. Rescuers found one dead under the snow and his companions all injured, one critically. | The New York Times | 2000-02-21 | article | Metropolitan Desk | New York | News | 906 | Rare Avalanche Kills One On an Adirondack Slope | By Robert D. McFadden
https://www.nytimes.com/2000/11/03/arts/weekend-warrior-alone-in-the-wilds-where-nature-makes-waves.html | There are many iconic American landscapes, but perhaps none is more evocative than the mountains and lakes of the Adirondacks. The northern forests, a mix of conifers and hardwoods, are everything we think a forest should be. The peaks are ancient, bare granite; the streams are magical, tumbling past banks blanketed in green moss and ferns. James Fenimore Cooper’s Hawkeye could still be lurking in the morning mists, or at least the movie image of Daniel Day Lewis in ‘’The Last of the Mohicans.’’ | The New York Times | 2000-11-03 | article | Leisure/Weekend Desk | Arts | News | 1985 | Alone in the Wilds, Where Nature Makes Waves | By Jerry Beilinson
https://www.nytimes.com/2000/03/27/nyregion/acid-rain-law-found-to-fail-in-adirondacks.html | A landmark air pollution law enacted a decade ago to reduce acid rain has failed to slow the acidification of lakes and streams in the Adirondacks, many of which are rapidly losing the ability to sustain life, according to a new federal report. | The New York Times | 2000-03-27 | article | Metropolitan Desk | New York | News | 1218 | Acid Rain Law Found to Fail in Adirondacks | By James Dao
https://www.nytimes.com/2000/02/21/nyregion/rare-avalanche-kills-one-on-an-adirondack-slope.html | Six back-country skiers who ventured into a dangerous snowslide area were caught in a roaring avalanche and swept hundreds of yards down a mountainside in the Adirondack High Peaks of northern New York State on Saturday. Rescuers found one dead under the snow and his companions all injured, one critically. | The New York Times | 2000-02-21 | article | Metropolitan Desk | New York | News | 906 | Rare Avalanche Kills One On an Adirondack Slope | By Robert D. McFadden
https://www.nytimes.com/2000/04/02/weekinreview/march-26-april-1-failure-is-reported-for-clean-air-act.html | When Congress enacted amendments to the Clean Air Act in 1990, environmentalists hailed them as powerful weapons in the fight to reduce acid rain. But a new federal report showed that many lakes, ponds and streams in the Northeast, particularly in the Adirondack Mountains, are continuing to turn acidic and are in growing danger of becoming lifeless. | The New York Times | 2000-04-02 | article | Week in Review Desk | Week in Review | News | 78 | March 26-April 1; Failure Is Reported For Clean Air Act | By James Dao
https://www.nytimes.com/2000/08/19/nyregion/mrs-clinton-takes-a-busman-s-holiday.html | Some Adirondacks getaway! Hillary Rodham Clinton barely had enough time to unpack her bags at the lakeside home where the first family is spending the weekend before she was off to a political rally to talk up her Senate bid. | The New York Times | 2000-08-19 | article | Metropolitan Desk | New York | News | 432 | Mrs. Clinton Takes a Busman’s Holiday | By Marc Lacey
https://www.nytimes.com/2000/08/20/travel/a-voice-from-the-met-s-first-season.html | HIDDEN in a grove of pines on a finger of land overlooking Lake George in the foothills of the Adirondacks is a shrine to a remarkable woman. Her lovely musical name was Marcella Sembrich, and though virtually forgotten today, this Polish-born soprano was once one of the glories of the operatic stage. | The New York Times | 2000-08-20 | article | Travel Desk | Travel | News | 2264 | A Voice From the Met’s First Season | By Constance Rosenblum
https://www.nytimes.com/2000/04/10/nyregion/invasive-mussels-turn-up-in-lake-thought-to-be-immune.html | A year and a half ago, it looked as if Lake George, a blue jewel in the green Adirondacks, had dodged a biological bullet – the zebra mussel, an invasive European mollusk that is clogging pipes, crowding local aquatic life and turning beaches into toe-slicing shell heaps from Michigan to the Hudson River. | The New York Times | 2000-04-10 | article | Metropolitan Desk | New York | News | 1049 | Invasive Mussels Turn Up in Lake Thought to Be Immune | By Andrew C. Revkin
https://www.nytimes.com/2000/06/09/arts/weekend-warrior-stalking-the-bear-bears-optional.html | Camped on the banks of the Oswegatchie River deep in the Adirondacks several years ago, Paul Rezendes, a master tracker and wilderness photographer, was awakened by a black bear bashing around his camp. Naked and armed with only a canoe paddle, he chased the bear into the night. As he drifted back to sleep, the snorting bear returned, bent on sampling the contents of the well-stocked cooler that Mr. Rezendes had stashed at the far end of his camp. | The New York Times | 2000-06-09 | article | Leisure/Weekend Desk | Arts | News | 1742 | Stalking the Bear (Bears Optional) | By Joe Glickman
https://www.nytimes.com/2000/07/16/arts/television-radio-saving-the-world-one-sexy-teen-at-a-time.html | IT seems to me that having created the Internet, the wireless electronic pocket organizer and Oreos that turn your milk bright blue when you dunk ‘em, humankind has reached its technological apogee. In so doing, we’ve solved our most fundamental problems:’‘How can I deliver, to someone far away, an annoying online sales pitch for deeply discounted off-season Adirondack time shares?’’ and ‘’I’m a wealthy day trader and an idiot – what day is it?’’ and ‘’Why is my cookie so tasty yet not at all artistically ambitious or challenging to the eye?’’ | The New York Times | 2000-07-16 | article | Arts and Leisure Desk | Arts | News | 2614 | Saving the World, One Sexy Teen at a Time | By Jeff Macgregor
write_csv(Adks2000_final, "Adks2000_final.csv")

4. Conclusion

The NY Times API provides a reasonably user-friendly interface for developers to access publication-related data. Users should be aware that query requests will be limited to 10 pages unless requests are spaced at least 6 seconds apart (per the API instructions), and that filter options are limited to those provided by the API.

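The query that I constructed produced 12 entries (after data cleaning) for the year 2000. Two articles appear twice in that count because they mention the Adirondacks in both the headline and the lead paragraph, leaving 10 distinct articles. The majority of these entries can be grouped thematically into two categories: (1) environmental and (2) recreational.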