The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs
You’ll need to start by signing up for an API key.
Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.
library(jsonlite)
library(tidyverse)
library(DT)
I set my NY Times API key as an environment variable to keep it out of my final report. To replicate my data by running the code yourself, set your key by running the following in the console, replacing the placeholder text with your own API key:
Sys.setenv(nytkey="Your_key_goes_here")
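A small guard can stop early with a clear message when the key has not been set. This is a minimal sketch, assuming the environment variable is named nytkey as above; the helper name get_nyt_key is hypothetical.

```r
# Hypothetical helper: read the API key from the environment and
# fail early with a readable message if it is missing or empty.
get_nyt_key <- function(var = "nytkey") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable '", var, "' is not set; set it with Sys.setenv() first.")
  }
  key
}
```

Calling get_nyt_key() in place of Sys.getenv("nytkey") would make a missing key fail before any request is sent.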
I thought it would be fun to look at what was happening 100 years before I was born, so I selected only articles published in April 1872. I will filter that down to the exact date later, since the NY Times Archive API returns one full month at a time.
Again, if you want to replicate my data, you will have to remove the eval=FALSE argument from the code chunk below. I didn’t want to keep requesting the data repeatedly every time I knit the document, so after it had successfully run I added the eval=FALSE argument to keep it from running again and saved the data to a .RData file so it can be loaded from there going forward. Since I am looking at archived data from April 1872, I don’t expect it to change.
url <- "https://api.nytimes.com/svc/archive/v1/1872/4.json"
NYTKey <- Sys.getenv("nytkey")
query <- paste0(url,"?api-key=",NYTKey)
content <- fromJSON(query)
save(content, file = "NYTArchive-1872-04.RData")
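As an alternative to toggling eval=FALSE by hand, the download can be wrapped in a check for a cached file. This is a minimal sketch, not the approach used in this report: the helper name load_or_fetch is hypothetical, and it uses saveRDS()/readRDS() with a .rds file in place of the save()/load() pair.

```r
# Hypothetical helper: download only when no cached copy exists.
# `fetch` defaults to jsonlite::fromJSON, but any function that takes
# a URL and returns an R object can be passed instead.
load_or_fetch <- function(query, cache_file, fetch = jsonlite::fromJSON) {
  if (file.exists(cache_file)) {
    # Cached copy found: read it back without touching the API.
    return(readRDS(cache_file))
  }
  content <- fetch(query)
  saveRDS(content, cache_file)
  content
}

# Possible usage (the file name is an assumption):
# content <- load_or_fetch(query, "NYTArchive-1872-04.rds")
```

With this pattern the chunk can stay enabled, because the request is only sent the first time the document is knit.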
First I tried running str() on the whole object to see what was there, but the raw data comes as nested lists containing lists of data frames, and there are so many levels that it was hard to make sense of. I could see that most of the interesting data is in content$response$docs, so I ran str() on head(content$response$docs) to get a list of variables to start selecting from and narrowing down.
load("NYTArchive-1872-04.RData")
str(head(content$response$docs))
## 'data.frame': 6 obs. of 20 variables:
## $ web_url : chr "https://query.nytimes.com/gst/abstract.html?res=9D02E3D91739EF34BC4953DFB2668389669FDE" "https://query.nytimes.com/gst/abstract.html?res=9903E3D91739EF34BC4953DFB2668389669FDE" "https://query.nytimes.com/gst/abstract.html?res=9E00E4D91739EF34BC4953DFB2668389669FDE" "https://query.nytimes.com/gst/abstract.html?res=9C03E3D91739EF34BC4953DFB2668389669FDE" ...
## $ snippet : chr "The Presbytery of Long Island will meet at Southampton Tuesday, April 9...." NA "Escape of..." NA ...
## $ lead_paragraph : chr "The Presbytery of Long Island will meet at Southampton Tuesday, April 9." NA NA NA ...
## $ abstract : chr NA NA "Escape of" NA ...
## $ print_page : chr "8" "8" "1" "8" ...
## $ blog :List of 6
## ..$ : list()
## ..$ : list()
## ..$ : list()
## ..$ : list()
## ..$ : list()
## ..$ : list()
## $ source : chr "The New York Times" "The New York Times" "The New York Times" "The New York Times" ...
## $ multimedia :List of 6
## ..$ : list()
## ..$ : list()
## ..$ : list()
## ..$ : list()
## ..$ : list()
## ..$ : list()
## $ headline :'data.frame': 6 obs. of 2 variables:
## ..$ main : chr "LONG ISLAND." "German Observances No Out-Door Celebrations-Concerts and Bails." "CUBA.; Watching a Blockade-Runner Rumored Escape of the Vessel from the Spanish Man-of-War." "Church of the Ascension Confirmation by Bishop Potter." ...
## ..$ kicker: chr "1" "1" NA "1" ...
## $ keywords :List of 6
## ..$ :'data.frame': 0 obs. of 0 variables
## ..$ :'data.frame': 0 obs. of 0 variables
## ..$ :'data.frame': 2 obs. of 2 variables:
## .. ..$ name : chr "organizations" "subject"
## .. ..$ value: chr "VIRGINIUS, BLOCKADE-RUNNER" "MARINE"
## ..$ :'data.frame': 0 obs. of 0 variables
## ..$ :'data.frame': 2 obs. of 2 variables:
## .. ..$ name : chr "subject" "subject"
## .. ..$ value: chr "ACCIDENTS" "WESTFIELD FERRY-BOAT DISASTER"
## ..$ :'data.frame': 1 obs. of 2 variables:
## .. ..$ name : chr "persons"
## .. ..$ value: chr "SHERMAN, GEN."
## $ pub_date : chr "1872-04-01T00:03:58Z" "1872-04-01T00:03:58Z" "1872-04-01T00:03:58Z" "1872-04-01T00:03:58Z" ...
## $ document_type : chr "article" "article" "article" "article" ...
## $ news_desk : logi NA NA NA NA NA NA
## $ section_name : logi NA NA NA NA NA NA
## $ subsection_name : logi NA NA NA NA NA NA
## $ byline :'data.frame': 6 obs. of 2 variables:
## ..$ person :List of 6
## .. ..$ : NULL
## .. ..$ : NULL
## .. ..$ : NULL
## .. ..$ : NULL
## .. ..$ : NULL
## .. ..$ : NULL
## ..$ original: chr NA NA NA NA ...
## $ type_of_material : chr "Article" "Article" "Front Page" "Article" ...
## $ _id : chr "4fbfe79545c1498b0d06266c" "4fbfe7e445c1498b0d063ff7" "4fbfe7e945c1498b0d064850" "4fbfe7e445c1498b0d063ff6" ...
## $ word_count : int 200 185 73 110 152 73
## $ slideshow_credits: logi NA NA NA NA NA NA
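When the full str() output is this long, a one-line-per-column overview can be easier to scan. The sketch below uses a small hand-built stand-in for content$response$docs (the toy values are illustrative, not API output) to show two base R options: the first class of each column, and str() limited to the top level.

```r
# Toy stand-in for content$response$docs with a nested data frame
# column (headline) and a list column (keywords).
docs <- data.frame(word_count = c(200L, 185L), stringsAsFactors = FALSE)
docs$headline <- data.frame(main = c("LONG ISLAND.", "CUBA."),
                            kicker = c("1", NA),
                            stringsAsFactors = FALSE)
docs$keywords <- list(data.frame(),
                      data.frame(name = "subject", value = "MARINE"))

# One class per column shows at a glance where the nesting is.
col_classes <- vapply(docs, function(col) class(col)[1], character(1))
print(col_classes)

# str() with max.level = 1 prints only the top level of the structure.
str(docs, max.level = 1)
```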
Many of the variables within content$response$docs had little or no data, so I selected only the following columns to work with: headline, snippet, lead_paragraph, abstract, print_page, keywords, pub_date, type_of_material, word_count.
df <- content$response$docs %>%
  select(headline, snippet, lead_paragraph, abstract, print_page,
         keywords, pub_date, type_of_material, word_count)
# 'headline' is a nested data frame with two columns, 'main' and 'kicker',
# so keep only the main headline
df$headline <- df$headline$main
# Keep only the date part (YYYY-MM-DD) of the timestamp
df$pub_date <- substr(df$pub_date, 1, 10)
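The keywords column is still a list column after this step, with one small data frame (or an empty one) per article. One possible way to turn it into plain text is sketched below on toy data shaped like the str() output above. The helper name collapse_keywords is hypothetical, and the "; " separator is an arbitrary choice.

```r
# Toy list column shaped like `keywords`: each element is a data frame
# with `name` and `value` columns, or an empty data frame.
keywords <- list(
  data.frame(),
  data.frame(name = c("organizations", "subject"),
             value = c("VIRGINIUS, BLOCKADE-RUNNER", "MARINE"),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: collapse each element's `value` column into one
# string; NULL or empty elements become NA.
collapse_keywords <- function(kw) {
  vapply(kw, function(k) {
    if (is.null(k) || NROW(k) == 0) {
      NA_character_
    } else {
      paste(k$value, collapse = "; ")
    }
  }, character(1))
}

collapse_keywords(keywords)
```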
I then narrowed the data down further to include only the articles published on April 19, 1872, exactly 100 years before my birth.
birthday <- df %>%
  filter(grepl("1872-04-19", pub_date))
datatable(birthday)
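The same kind of filter can also be written as an exact comparison on Date values rather than a pattern match. A minimal sketch on a few timestamps in the same format as pub_date (the second and third values are made up for illustration):

```r
# Timestamps in the same format as the API's pub_date field.
pub_date <- c("1872-04-01T00:03:58Z", "1872-04-19T00:03:58Z", "1872-04-20T00:03:58Z")

# Take the first 10 characters (YYYY-MM-DD) and parse them as Date values.
dates <- as.Date(substr(pub_date, 1, 10))

# Exact comparison against the target day; gives one logical per timestamp.
on_target_day <- dates == as.Date("1872-04-19")
on_target_day
```

Working with Date values also makes range filters (for example, a whole week) straightforward.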
table(birthday$type_of_material)
##
## Article Editorial Front Page
## 38 6 25
## Letter Marriage Announcement Obituary
## 1 1 1
frontpage <- filter(birthday, grepl('Front Page', type_of_material))
frontpage$headline
## [1] "The Union League of America."
## [2] "Republican Legislative Caucus Measures to be Pressed Vanderbilt and His Disbursing Agents."
## [3] "GREAT BRITAIN.; The Queen to Visit Napoleon at Chiselhurst Edwin James Pays His Compliments to America."
## [4] "BY MAIL AND TELEGRAPH."
## [5] "AUSTRALIA.; Destruction of a Melbourne Theatre A Cyclone."
## [6] "THE STATE CAPITAL.; Vanderbilt's Scheme Passes the Senate The Impeachment Testimony Submitting the Testimony as Printed. THE CENTRAL UNDER-GROUND. THE ELECTION LAW FOR NEW-YORK CITY. THE IMPEACHMENT TESTIMONY."
## [7] "NEWS BY TELEGRAPH.; Prospects of the Civil Service Bill in Washington. Utter Annihilation of Catacazy by the Russian Government. The London Telegraph's France-German Story Denied Flatly. Japanese Exclusiveness Yielding Before Advanced Ideas. Burning of a Theatre in Melbourne, Australia. WASHINGTON. The Congressional Dispatches to the New-York Grant Demonstration--Mr.Schurz and His Accounts--Prospects of the Civil Service Bill. MR. SCHURZ AND HIS ACCOUNTS. A SCENE IN THE HOUSE. THE CIVIL SERVICE BILL. FOR CINCINNATI. CATACAZY'S CASE."
## [8] "Escape of Prisoners on Their Way to Blackwell's Island."
## [9] "JAPAN.; Relaxing Exclusiveness Gratification with the Reception of the Embassy Earthquake Fears for the Safety of a Steamer."
## [10] "THE BAR ASSOCIATION.; Continuation of the Investigation-Revelations of Judge Curtis' Partnership Business."
## [11] "FRANCE AND GERMANY.; Alarming State of Affairs as Reported by the London Telegraph."
## [12] "THE INDIRECT CLAIMS.; Synopsis of the British Counter Case at Geneva. Allegations of Insincere Neutrality Not Discussed. Weight to be Attached to Statements of American Consuls. Cuban and Fenian Raids Cited Against us as Precedents. Comments of the London Journals on the Counter Case. Discussion in the British Parliament on the Counter-Case."
## [13] "THE NAVAL INVESTIGATION.; Testimony of Chief-Engineer Isherwood Proceedings Yesterday Before the Special Committee."
## [14] "NEW-JERSEY."
## [15] "The Watson Murder The Fourth Day of the Trial."
## [16] "LONG ISLAND."
## [17] "MEXICO.; Saltillo Occupied by the Government Forces Movements of the Revolutionists."
## [18] "BROOKLYN."
## [19] "Serious Stabbing Affray."
## [20] "The Sherman Poisoning Trial."
## [21] "A Flat Denial by the Constitutionnel of Paris."
## [22] "The Official Vote for Governor in Connecticut."
## [23] "NEW-YORK AND SUBURBAN NEWS.; NEW-YORK."
## [24] "NORTH CAROLINA.; The Republican State Convention Grant Indorsed."
## [25] "NOVA SCOTIA.; Sale of the Cargo of the Steamer Dacian Proroguing of the Legislature."
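Note that grepl() matches substrings, so a pattern such as 'Article' would also match any longer type that happens to contain that word. With the six types in the table above this makes no difference, but an exact comparison is the safer habit. A minimal sketch, where the third type value is made up purely to show the difference:

```r
# The third value is invented for illustration; it is not a real NYT type.
types <- c("Article", "Front Page", "Newspaper Article (made up)")

# Substring match: also catches the longer, made-up type.
substring_match <- grepl("Article", types)

# Exact match: only the type that is exactly "Article".
exact_match <- types == "Article"

substring_match
exact_match
```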
articles <- filter(birthday, grepl('Article', type_of_material))
articles$headline
## [1] "The Effect of the Charter's Passage on Office-Holders."
## [2] "POLITICAL FEELING IN ILLINOIS.; Weakness of the \"Liberal\" Sentiment--Gov. Palmer Indulging in Great Expectations from Cincinnati--His Biography and Photograph Already Sent East."
## [3] "The Tariff A Decision Demanded."
## [4] "SHOT IN A BRAWL.; Death of David Barry He Perished in a Drunken Row."
## [5] "The Tenth Assembly District Republicans Resolve to Support President Grant and the New Charter."
## [6] "COMMON PLEAS TRIAL TERM PART II. APRIL 18.; Before Judge Joseph F. Daly and a Jury, SUIT ON A \"RAISED\" DRAFT."
## [7] "SUPERIOR COURT SPECIAL TERM APRIL 18.; Before Chief-Justice Barbour, BREACH OF PROMISE."
## [8] "The New Oil Exchange Completing the Organization."
## [9] "Article 2 -- No Title"
## [10] "The Massachusetts Liberal Republicans."
## [11] "THE FIRE UNDERWRITERS.; Reorganization of the Old Board The Losses by the Chicago Fire."
## [12] "Amusements This Evening."
## [13] "Article 3 -- No Title"
## [14] "UNITED STATES SUPREME COURT."
## [15] "SUPREME COURT CHAMBERS APRIL 18.; Before Judge Brady THE GOULD-GORDON SUIT."
## [16] "COMMERCIAL AFFAIRS."
## [17] "Exchange Sales Thursday, April 18,"
## [18] "MARINE INTELLIGENCE.; Cleared. Arrived. Sailed. By Telegraph. Spoken. Foreign Ports. Marine Disaster. European Marine News."
## [19] "Hahnemann Charity Operatic and Dramatic Soiree."
## [20] "Desperate Encounter With Burglars--An Officer Shot."
## [21] "Article 7 -- No Title; Sailed. By Telegraph. Spoken. Foreign Ports. Marine Disaster. European Marine News."
## [22] "SUPERIOR COURT TRIAL TERM PART II. APRIL 18.; Before Judge Curtis and a Jury. LIABILITY OF COMMON CARRIERS."
## [23] "Article 5 -- No Title"
## [24] "The Bondholders of the St. Louis and St. Joseph Railroad."
## [25] "Article 1 -- No Title"
## [26] "Article 6 -- No Title"
## [27] "COURT CALENDARS THIS DAY."
## [28] "Article 4 -- No Title"
## [29] "THE JACK GLASS HOMICIDE.; Trial of Costello Yesterday Testimony for the Defense What Judge Dowling Thought of Glass."
## [30] "Doek Commission Financial Report Yesterday's Proceedings."
## [31] "The People's Municipal Association on the Passage of the Seventy's Charter."
## [32] "SALES OF ARMS.; Examination of the Chief Clerk of the Treasury Department Singular Obtuseness of Senators."
## [33] "The Great Grant Meeting at Cooper Institute--Letter from Mr. Dwight--The Supper at the Union League Club."
## [34] "Semi-Weekly Meeting of the Board of Aldermen."
## [35] "VIRGINIA REPUBLICANS.; The Platform Adopted by the State Convention--President Grant Indorsed and His Renomination Urged."
## [36] "DECISIONS.; SUPREME COURT SPECIAL TERM APRIL 18."
## [37] "THE CHARTER.; Passage of the Measure in the Senate and Assembly. Who Voted for and Against it, and Who were Absent. Probable Action of Gov. Hoffman on the Bill. Bets and Speculations as to the Legal Consequences. Full Text of the Charter as Passed by the Legislature. THE NEW CHARTER. THE CHARTER."
## [38] "CONGRESSIONAL PROCEEDINGS.; CONSIDERATION OF BILLS. BILLS INTRODUCED. BILLS PASSED. POSTPONED. HOUSE OF REPRESENTATIVES. BILLS PASSED. THE CIVIL SERVICE."
I was curious to see whether there were any more details about the Cooper Institute article, since Cooper Union is my alma mater, but unfortunately there is no snippet or lead paragraph text.
filter(articles, grepl("Cooper", headline))
## headline
## 1 The Great Grant Meeting at Cooper Institute--Letter from Mr. Dwight--The Supper at the Union League Club.
## snippet lead_paragraph abstract print_page keywords pub_date
## 1 <NA> <NA> <NA> 2 NULL 1872-04-19
## type_of_material word_count
## 1 Article 290
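To check more generally how sparse the text fields are, the share of missing values per column can be computed in one line. A minimal sketch on a toy data frame (the values are placeholders, not the real data):

```r
# Toy data frame with placeholder values and some missing entries.
toy <- data.frame(
  snippet    = c("some text", NA, NA),
  abstract   = c(NA, NA, NA_character_),
  word_count = c(200L, 185L, 73L),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colMeans() turns it into the
# proportion of missing values in each column.
na_share <- colMeans(is.na(toy))
round(na_share, 2)
```

Applied to the text columns of the birthday data frame, this would show which fields are worth reading for this date.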