The New York Times website provides a rich set of APIs, including the TimesTags API. For this assignment, I have chosen to work with this API, which allows you to mine the New York Times tag set. The response to a query is a ranked list of terms.
I will read in the JSON data from this API for a couple of different queries and store the data in R data frames.
Query structure: ?query={search-string}&[optional-param1=value1]&[...]&api-key={your-api-key}
The searchable tag dictionaries, selected with [&filter={dictionary}], include: (Des) descriptive terms; (Geo) geographical units; (Org) organizations; (Per) personal names.
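The query structure above can also be assembled programmatically. The following is a minimal sketch in base R, not part of the original analysis: the helper name `build_timestags_url()` and the environment variable `NYT_API_KEY` are assumptions made for illustration, while the endpoint and the `filter` and `max` parameters are the ones used in the requests in this document.

```r
# Hypothetical helper: assemble a TimesTags request URL from its parts.
# Assumption: the API key is stored in the NYT_API_KEY environment variable.
build_timestags_url <- function(query, filter = NULL, max = NULL,
                                api_key = Sys.getenv("NYT_API_KEY")) {
  base <- "http://api.nytimes.com/svc/suggest/v1/timestags"
  # NULL arguments are dropped by c(), so only supplied parameters remain
  params <- c(query = query, filter = filter, max = max, `api-key` = api_key)
  # Percent-encode each value so characters such as spaces are URL-safe
  encoded <- vapply(as.character(params), URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

# Example calls (the key value here is a placeholder, not a real key)
pal_url <- build_timestags_url("pal", api_key = "YOUR-KEY")
org_url <- build_timestags_url("llc", filter = "(Org)", max = 200,
                               api_key = "YOUR-KEY")
```

Reading the key from an environment variable keeps it out of the script itself, which is convenient when the code is shared.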
Using the httr library, I was able to test one of the examples from the TimesTags documentation. This example is a quick query across all dictionaries for the letters ‘pal’.
I requested an API key for this API, and it is used throughout the code.
#load required packages
library(httr)
library(knitr)
library(kableExtra)
exampleurl='http://api.nytimes.com/svc/suggest/v1/timestags?query=pal&api-key=7178bcfcb8b24ba3bdf1d837327dfd79'
pal <- GET(exampleurl)
#check the status to be sure the call worked
pal$status_code
## [1] 200
#status is 200 which means it worked
#view the content
kable(content(pal, "parsed")) %>% kable_styling("striped", full_width = FALSE)
| x |
|---|
| ["pal",["Palestinians (Des)","Palestine Liberation Organization (Org)","Paleontology (Des)","Palestinian Authority (Org)","Palin, Sarah (Per)","Cerebral Palsy (Des)","Palm Beach (Fla) (Geo)","Palaces and Castles (Des)","Palmer, Arnold (Per)","Paltrow, Gwyneth (Per)"]] |
The raw response from the ‘pal’ example is awkward to work with, so I used the jsonlite package to parse the remaining queries.
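To illustrate why a parsed structure is easier to handle, here is a small sketch that parses a shortened copy of the ‘pal’ response shown above with `jsonlite::fromJSON()`. The JSON string is hard-coded and trimmed to three tags so that the example runs without an API key; the variable names are illustrative only.

```r
library(jsonlite)

# Shortened copy of the 'pal' response, hard-coded for illustration
raw_json <- '["pal",["Palestinians (Des)","Paleontology (Des)","Palin, Sarah (Per)"]]'

# simplifyVector = FALSE keeps the nested JSON arrays as nested R lists
parsed <- fromJSON(raw_json, simplifyVector = FALSE)

search_term <- parsed[[1]]    # first element: the query string
tags <- unlist(parsed[[2]])   # second element: the ranked tags

# One row per tag, with its position in the ranking
tag_df <- data.frame(rank = seq_along(tags), tag = tags,
                     stringsAsFactors = FALSE)
```

Keeping the rank as its own column preserves the ordering of the response even if the data frame is later sorted or filtered.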
#load required package
library(jsonlite)
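#fromJSON turns the JSON text into an R list. The search string is the first element of the list and the results are the second element.
#data_all is assumed to hold the request for the query 'data' (a URL or JSON text); its definition is not shown in this section.
kable(fromJSON(data_all)) %>% kable_styling("striped", full_width = FALSE)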
#create a dataframe from the second element in the list
data <- data.frame(fromJSON(data_all)[[2]])
names(data) <- "Tags_incl_'data'"
kable(head(data))%>%kable_styling("striped", full_width = F)
| Tags_incl_'data' |
|---|
| Data-Mining and Database Marketing (Des) |
| Falsification of Data (Des) |
| Data Storage (Des) |
| Data Centers (Des) |
| Dataclysm: Who We Are When We Think No One’s Looking (Book) (Ttl) |
| Big Data (Movie) (Ttl) |
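Each tag ends with a dictionary code in parentheses, so the label and its dictionary can be separated into two columns. The following is a minimal base R sketch, assuming the code is always the final parenthesized group in the string; the column names `name` and `dictionary` are my own choice, and the sample vector copies tags from the results shown in this document.

```r
# Sample tags copied from the results shown earlier
tags <- c("Data Centers (Des)",
          "Big Data (Movie) (Ttl)",
          "Palm Beach (Fla) (Geo)")

# Assumption: the dictionary code is the last "(...)" group in the string
pattern <- "^(.*) \\(([^()]+)\\)$"

split_tags <- data.frame(
  name = sub(pattern, "\\1", tags),        # everything before the final group
  dictionary = sub(pattern, "\\2", tags),  # code inside the final parentheses
  stringsAsFactors = FALSE
)
```

Because the leading `.*` is greedy, an earlier parenthesized part such as "(Movie)" stays in the name. With the columns separated, `table(split_tags$dictionary)` gives a quick count of results per dictionary.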
Here I nest the function calls for more compact code. I create a data frame of up to 200 organizations returned for the query ‘llc’, ranked by how frequently they are used in the New York Times.
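#baseurl is assumed to be a sprintf() template containing one %s placeholder for the query string; its definition is not shown in this section.
top200 <- data.frame(fromJSON(sprintf(baseurl, "?query=llc&filter=(Org)&max=200"))[[2]])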
names(top200) <- "Top200 LLC's"
kable(head(top200))%>%kable_styling("striped", full_width = F)
| Top200 LLC's |
|---|
| United States Steel Corporation (Org) |
| KPMG (Org) |
| GMAC LLC (Org) |
| Fortress Investment Group L.L.C (Org) |
| Breitbart News Network LLC (Org) |
| One Grand LLC (Org) |
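The first rows include tags such as "KPMG (Org)" that do not contain ‘llc’ in the displayed name, so a follow-up filter may be useful when only literal matches are wanted. The following is a sketch in base R, assuming a match should cover both "LLC" and the dotted "L.L.C" spelling; the sample vector copies the six rows above, and the pattern is my own assumption rather than anything defined by the API.

```r
# The six rows shown above, copied into a character vector
orgs <- c("United States Steel Corporation (Org)", "KPMG (Org)",
          "GMAC LLC (Org)", "Fortress Investment Group L.L.C (Org)",
          "Breitbart News Network LLC (Org)", "One Grand LLC (Org)")

# Match "LLC" or "L.L.C" as a whole word, ignoring case
llc_pattern <- "\\bL\\.?L\\.?C\\b"

# Keep only the tags whose displayed name literally contains the abbreviation
literal_llc <- orgs[grepl(llc_pattern, orgs, ignore.case = TRUE, perl = TRUE)]
```

The same filter could be applied to the single column of `top200` to keep only the literal matches from the full result set.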