The task for week 9 was to get an NYT API key, build an interface to the API from R, pull some data, and return it as a data frame. I chose to work with the “Archive_api” because I have some future projects in mind that involve old newspapers.
This file is available on RPubs and on my GitHub.
library(jsonlite)
library(httr)
library(knitr)
I created a function that queries the NYT Archive API and returns the retrieved data as a data frame. During development I noticed that requests failed frequently, so the function retries them using httr’s RETRY() function.
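The retry idea is easier to see in isolation. httr’s RETRY() already handles this for us; the sketch below is only a hand-rolled illustration of retrying with capped exponential backoff and random jitter, not httr’s own implementation. The names retry_with_backoff() and flaky_request() are hypothetical, and the sleep argument exists only so the example can skip real waiting.

```r
# Hypothetical hand-rolled retry helper (illustration only, not httr's code).
# `attempt` is any function that either returns a value or throws an error.
retry_with_backoff <- function(attempt, times = 5, pause_base = 1,
                               pause_cap = 60, sleep = Sys.sleep) {
  stopifnot(times >= 1)
  for (i in seq_len(times)) {
    result <- tryCatch(attempt(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: hand back the value
    }
    if (i < times) {
      # Wait a random time up to pause_base * 2^i seconds, never above pause_cap
      wait <- runif(1, min = 0, max = min(pause_cap, pause_base * 2^i))
      sleep(wait)
    }
  }
  stop("All ", times, " attempts failed: ", conditionMessage(result))
}

# Hypothetical stand-in for a request that fails twice, then succeeds
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated network error")
  "ok"
}

# sleep is replaced by a no-op so the example runs instantly
value <- retry_with_backoff(flaky_request, times = 5, sleep = function(s) NULL)
```

The random jitter spreads retries out so that many failing clients do not all hit the server again at the same moment; the cap keeps a long run of failures from producing ever-longer waits.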
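#params
# assumes the key is stored in the NYT_API_KEY environment variable instead of being pasted into the script
api.key <- Sys.getenv("NYT_API_KEY")
yyyy <- "1929"
mm <- "09"
#return one month of the NYT archive as a data frame
get.NytArchives <- function(api.key, yyyy, mm){
  # build the URL from the requested year and month; as.integer() turns "09" into 9
  base.url <- paste0("https://api.nytimes.com/svc/archive/v1/",
                     as.integer(yyyy), "/", as.integer(mm), ".json")
  print(paste("Collecting NYT archives data for: ", toString(yyyy), "-", toString(mm)))
  #GET seems to fail sometimes, so keep on tryin' (RETRY waits longer between attempts)
  resp <- RETRY("GET", base.url,
                query = list(api_key = api.key),
                times = 100,
                pause_base = 2)
  # stop with an informative error if the final attempt still failed
  stop_for_status(resp)
  json <- content(resp, as = "text", encoding = "UTF-8")
  # coerce the parsed JSON (copyright plus the nested response) into one data frame
  df <- as.data.frame(fromJSON(json))
  #clean up the column names: keep only the part after the last dot
  colnames(df) <- gsub("^.*\\.", "", colnames(df))
  return(df)
}
We’ll call it for a single month and see what we get back: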
result <- get.NytArchives(api.key, "1929", "9")
## [1] "Collecting NYT archives data for: 1929 - 9"
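Each request covers exactly one month, and the year and month travel in the URL path (the call above asks for .../archive/v1/1929/9.json). A minimal sketch of that path construction as a standalone helper; archive_url() is hypothetical and not part of httr or the NYT API.

```r
# Hypothetical helper: build the Archive API URL for one month.
# as.integer() drops any leading zero, so "09" and "9" give the same path.
archive_url <- function(yyyy, mm) {
  sprintf("https://api.nytimes.com/svc/archive/v1/%d/%d.json",
          as.integer(yyyy), as.integer(mm))
}

archive_url("1929", "09")
#> [1] "https://api.nytimes.com/svc/archive/v1/1929/9.json"
```

Normalising the month this way means callers can pass either a zero-padded string or a plain number, as the loop further down does with 1:5.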
Now let’s see what we got. First, I’ll print the column names:
kable(colnames(result), col.names = "Column Names")
| Column Names |
|---|
| copyright |
| hits |
| web_url |
| snippet |
| lead_paragraph |
| abstract |
| print_page |
| blog |
| source |
| multimedia |
| headline |
| keywords |
| pub_date |
| document_type |
| news_desk |
| section_name |
| subsection_name |
| byline |
| type_of_material |
| _id |
| word_count |
| slideshow_credits |
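These short names come from the gsub() step in the function, which strips everything up to the last dot of the dotted names that as.data.frame() builds from the nested JSON. The sketch below runs the same parse, coerce, and rename steps on a tiny hand-written response; toy_json is a made-up, heavily trimmed stand-in for a real Archive API reply, kept only to mirror its nesting.

```r
library(jsonlite)

# Made-up, trimmed response with the same nesting as an Archive API reply
toy_json <- '{
  "copyright": "Copyright (c) The New York Times",
  "response": {
    "meta": {"hits": 2},
    "docs": [
      {"web_url": "https://example.com/a", "snippet": "First snippet"},
      {"web_url": "https://example.com/b", "snippet": "Second snippet"}
    ]
  }
}'

# Same steps as get.NytArchives(): parse, coerce to a data frame, rename
toy_df <- as.data.frame(fromJSON(toy_json))
print(colnames(toy_df))  # dotted names derived from the nesting
colnames(toy_df) <- gsub("^.*\\.", "", colnames(toy_df))
print(colnames(toy_df))  # copyright, hits, web_url, snippet
```

Because the single-valued fields (copyright, hits) are recycled to the number of articles, every row carries them, which is why they show up as ordinary columns in the list above.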
To keep the output legible, I’ll show only two of these columns, web_url and snippet, for the first five articles:
kable(head(result[c("web_url", "snippet")], 5))
| web_url | snippet |
|---|---|
| https://query.nytimes.com/gst/abstract.html?res=9D04E0D6163FE731A25752C0A96F9C946895D6CF | With the ending of the vacation season business and industry swing into the Autumn season under exceptionally favorable circumstances. The Summer has been characterized generally by exceptional vigor in most lines, with high ratios of ac tivity, … |
| https://query.nytimes.com/gst/abstract.html?res=9503E3D6163FE731A25752C0A96F9C946895D6CF | WASHINGTON, Aug. 31.–The next Congress will not consent to change the postal rates unless the postoffice bookkeeping system is changed to show accurately the receipts from different classifications, Senator George H. Moses of New Hampshire, forme… |
| https://query.nytimes.com/gst/abstract.html?res=9B02E1D6163FE731A25752C0A96F9C946895D6CF | Mrs. Lillian Bloch Schwarz of Long Beach, L.I., prominent Zionist, died suddenly in Berlin, according to word received here by Hadassah, the Women’s Zionist Organization of America. Her age was 39. The body will arrive here on the Leviathan tomorr… |
| https://query.nytimes.com/gst/abstract.html?res=9906E3D6163FE731A25752C0A96F9C946895D6CF | MINNEAPOLIS, Aug. 31. (AP).–Dedication of Minneapolis’s tallest building, the Foshay Tower, as a Washington memorial, today drew a notable gathering of Federal and State Government officials from most States of the Union…. |
| https://query.nytimes.com/gst/abstract.html?res=9403E4D6163FE731A25752C0A96F9C946895D6CF | LONDON, Aug. 31.–London is anxiously awaiting the result of the impending clash between the large force of Arabs which, it is officially announced, crossed the Syrian frontier into Palestine early yesterday morning, and the strong detachment of a… |
The data looks good!
Now we’ll grab several months at once, January through May 1929, keeping the same two columns:
# empty two-column data frame to append each month to
df <- data.frame(matrix(ncol = 2, nrow = 0))
colnames(df) <- c("web_url", "snippet")
# fetch months 1-5 of 1929 and append their web_url and snippet columns
for (i in 1:5){
  data <- get.NytArchives(api.key, 1929, i)
  df <- rbind(df, data[c("web_url", "snippet")])
}
## [1] "Collecting NYT archives data for: 1929 - 1"
## [1] "Collecting NYT archives data for: 1929 - 2"
## [1] "Collecting NYT archives data for: 1929 - 3"
## [1] "Collecting NYT archives data for: 1929 - 4"
## [1] "Collecting NYT archives data for: 1929 - 5"
We’ve just collected 70620 articles from the NYT archives.
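Growing df with rbind() inside the loop works for five months, but it copies the whole data frame on every pass. A common R alternative is to collect each month in a list and bind once with do.call(rbind, ...). The sketch below uses a hypothetical fake_fetch() in place of get.NytArchives() so it runs offline, and adds two sanity checks: nrow() for the article count, and duplicated() on web_url, which would flag the same month being fetched more than once.

```r
# Hypothetical offline stand-in for get.NytArchives(): returns three fake
# articles for the requested month so the example runs without the API.
fake_fetch <- function(api.key, yyyy, mm) {
  data.frame(
    web_url = sprintf("https://example.com/%s/%02d/%d", yyyy, as.integer(mm), 1:3),
    snippet = paste("Snippet", 1:3),
    stringsAsFactors = FALSE
  )
}

# Fetch each month into a list, keep two columns, then bind everything once
monthly <- lapply(1:5, function(m) {
  fake_fetch("dummy-key", 1929, m)[c("web_url", "snippet")]
})
combined <- do.call(rbind, monthly)

# Sanity checks: total rows, and rows whose web_url repeats an earlier row
n_articles <- nrow(combined)
n_duplicates <- sum(duplicated(combined$web_url))
```

With the real function, swapping fake_fetch for get.NytArchives and "dummy-key" for api.key should be the only change; a nonzero n_duplicates would be a sign that two iterations returned the same month.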