The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs.
The goal of the Week 10 assignment is to use one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.
The code for this assignment requires the following R packages:
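The package list itself does not appear in this copy. Judging only from the functions called further down (GET, enter_object, select, ggplot, rbindlist), the packages below appear to be the ones needed. This is an inference, not the author's original setup chunk, and the vector name required_pkgs is made up for this sketch.

```r
# Packages inferred from the function calls in this write-up (an assumption,
# not the author's original list).
required_pkgs <- c("httr", "tidyjson", "dplyr", "ggplot2", "data.table")

# Check which packages are installed before trying to attach them.
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach each package so GET(), %>%, ggplot(), etc. are available.
  invisible(lapply(required_pkgs, library, character.only = TRUE))
}
```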
This assignment will use the New York Times Most Popular API. This API is described as providing the following:
“Get links and metadata for the blog posts and articles that are most frequently e-mailed, shared and viewed by NYTimes.com readers.”
Anyone wishing to call a NYTimes API must register and request an API key: http://developer.nytimes.com/page
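The code in this write-up reads the key from a variable named nyt_most_popular_api, whose definition is not shown. One way it might be supplied without hard-coding it is sketched below; the environment variable name NYT_API_KEY is an assumption made for this example.

```r
# Read the key from an environment variable instead of hard-coding it.
# "NYT_API_KEY" is a hypothetical variable name chosen for this sketch.
nyt_most_popular_api <- Sys.getenv("NYT_API_KEY", unset = NA)

if (is.na(nyt_most_popular_api)) {
  message("NYT_API_KEY is not set; add it to ~/.Renviron or call Sys.setenv().")
}
```

Keeping the key out of the source file also keeps it out of any rendered report that echoes the code.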
This assignment issues a GET request for the Most Viewed articles across all sections.
Parameters included in the Most Popular API call:
Parameter | Value | Description |
---|---|---|
format | json | Response format |
section | all-sections | Limits the results by one or more sections |
time-period | 1 | Number of days of content: 1 (a day), 7 (a week) or 30 (a month) |
offset | 20 | Number of results returned. To page through the results, set offset to the appropriate value. |
api-key | Registered API Key | API Key provided by Times Developer Network |
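In the request URL used in this write-up, section, time-period and format are path segments, while offset and api-key are query parameters. A minimal sketch of that mapping follows; build_most_viewed_url is a hypothetical helper, and the checks on time_period and offset (multiples of 20) are assumptions based on the values used here.

```r
# Hypothetical helper: assemble a Most Viewed URL from the table's parameters.
build_most_viewed_url <- function(section = "all-sections", time_period = 1,
                                  offset = 20, api_key = "YOUR_API_KEY") {
  # Assumed constraints, taken from the values used in this write-up.
  stopifnot(time_period %in% c(1, 7, 30), offset %% 20 == 0)
  paste0("http://api.nytimes.com/svc/mostpopular/v2/mostviewed/",
         section, "/", time_period, ".json",      # path segments
         "?offset=", offset, "&api-key=", api_key) # query parameters
}

example_url <- build_most_viewed_url(time_period = 7, offset = 40, api_key = "demo-key")
print(example_url)
```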
# nyt_most_popular_api must already hold the registered API key
offset <- 20
# construct the URI
url_base <- paste0("http://api.nytimes.com/svc/mostpopular/v2/mostviewed/all-sections/1.json?offset=", offset)
url <- paste0(url_base, "&api-key=", nyt_most_popular_api)
# Get a batch of 20 results using the API (GET() comes from the httr package)
raw_contents <- GET(url = url)
The response from the API Call looks like this:
## Response [http://api.nytimes.com/svc/mostpopular/v2/mostviewed/all-sections/1.json?offset=20&api-key=REDACTED]
## Date: 2016-04-03 21:38
## Status: 200
## Content-Type: application/json; charset=UTF-8
## Size: 54.3 kB
A status value of 200 indicates a successful response, according to the Times Developer Network Standard Error Codes.
raw_contents$status_code
## [1] 200
HTTP Response Code | Description |
---|---|
200 OK | Requests successfully understood and processed |
400 Bad Request | A required parameter was not specified or your request was otherwise improperly formed. See the body of the error response for more details. (For additional information on required parameters, see the documentation for each API.) |
404 Not Found | The resource you requested does not exist. |
500 Server Error | The request was successfully understood, but it could not be processed due to a server error. Please try your request again later, and contact us if the problem continues. |
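The codes in the table can be checked in code before any parsing is attempted. The sketch below uses only base R; describe_status and check_status are hypothetical helper names, and the short labels are taken from the table above.

```r
# Hypothetical helper: map a status code to the label used in the table.
describe_status <- function(status_code) {
  switch(as.character(status_code),
         "200" = "OK",
         "400" = "Bad Request",
         "404" = "Not Found",
         "500" = "Server Error",
         "Unexpected status")  # fallback for codes not listed in the table
}

# Hypothetical helper: stop with a readable message unless the call succeeded.
check_status <- function(status_code) {
  if (status_code != 200) {
    stop(sprintf("NYT API request failed: %s %s",
                 status_code, describe_status(status_code)), call. = FALSE)
  }
  invisible(TRUE)
}

check_status(200)  # passes silently
```

With an httr response object, the value to pass would be raw_contents$status_code, as shown above.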
# store the json
json_raw <- httr::content(raw_contents, type = "text", encoding = "UTF-8")
## get status
status <-
  json_raw %>%
  enter_object("status") %>%
  append_values_string("status") %>%
  select(status)
## get the number of results
num_results <-
  json_raw %>%
  enter_object("num_results") %>%
  append_values_string("num_results") %>%
  select(num_results)
Using the Response Body, we can determine the status and the number of results returned from the API call.
Below is an excerpt of the Response Body to illustrate the JSON format.
A description of the tidyjson package from CRAN:
“The tidyjson package takes an alternate approach to structuring JSON data into tidy data.frames. Similar to tidyr, tidyjson builds a grammar for manipulating JSON into a tidy table structure.” The description goes on to list the principles on which tidyjson is based.
Using tidyjson, we can extract key information from the JSON structure with the pipeline operator %>%. Because the JSON structure here is complex, we use the enter_object() function to move into a specific object key, in this case the “results” object.
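To make the enter_object() step concrete, here is a minimal sketch that assumes tidyjson and dplyr are installed. The sample_json document is hand-written for illustration and only mimics a few of the field names used in this write-up; it is not actual API output.

```r
library(tidyjson)
library(dplyr)

# Hypothetical two-record document shaped like the fields used in this write-up.
sample_json <- '{"status": "OK", "num_results": 2, "results": [
  {"id": 1, "section": "Science", "title": "First sample title", "views": 1},
  {"id": 2, "section": "Health", "title": "Second sample title", "views": 2}
]}'

sample_tbl <- sample_json %>%
  as.tbl_json() %>%
  enter_object("results") %>%  # step into the "results" key
  gather_array() %>%           # one row per array element
  spread_values(               # pull named fields into columns
    id = jnumber("id"),
    section = jstring("section"),
    title = jstring("title"),
    views = jnumber("views")
  )

print(sample_tbl)
```

The same verbs are applied to the real response next.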
nyt_most_popular_json <- json_raw %>% as.tbl_json
results <-
  nyt_most_popular_json %>%
  enter_object("results") %>%
  gather_array %>%
  spread_values(
    id = jnumber("id"),
    type = jstring("type"),
    section = jstring("section"),
    title = jstring("title"),
    by = jstring("byline"),
    url = jstring("url"),
    keywords = jstring("adx_keywords"),
    abstract = jstring("abstract"),
    published_date = jstring("published_date"),
    source = jstring("source"),
    views = jnumber("views")
  )
(Note: some of the extracted values, such as abstract and keywords, are not displayed below due to the length of the text.)
id | type | section | title | by | url | published_date | source | views |
---|---|---|---|---|---|---|---|---|
100000004293613 | Article | Magazine | What Happened When Venture Capitalists Took Over the Golden State Warriors | By BRUCE SCHOENFELD | http://www.nytimes.com/2016/04/03/magazine/what-happened-when-venture-capitalists-took-over-the-golden-state-warriors.html | 2016-04-03 | The New York Times | 1 |
100000004301080 | Article | Science | View From Space Hints at a New Viking Site in North America | By RALPH BLUMENTHAL | http://www.nytimes.com/2016/04/01/science/vikings-archaeology-north-america-newfoundland.html | 2016-04-01 | The New York Times | 2 |
100000004306746 | Article | U.S. | Amtrak Collision With Backhoe Leaves 2 Dead, Officials Say | By NATE SCHWEBER and MIKE McPHATE | http://www.nytimes.com/2016/04/04/us/amtrak-train-derails-outside-of-philadelphia.html | 2016-04-04 | The New York Times | 3 |
100000004301105 | Article | Opinion | When Whites Just Don’t Get It, Part 6 | By NICHOLAS KRISTOF | http://www.nytimes.com/2016/04/03/opinion/sunday/when-whites-just-dont-get-it-part-6.html | 2016-04-03 | The New York Times | 4 |
100000004304318 | Article | Arts | And the Awards for Best Audio Fiction Go to … | By JOSHUA BARONE | http://www.nytimes.com/2016/04/02/books/sarah-lawrence-international-audio-fiction-awards-2016.html | 2016-04-02 | The New York Times | 5 |
100000004272340 | Article | U.S. | Obama Gets Scant Credit in Indiana Region Where Recovery Was Robust | By JACKIE CALMES | http://www.nytimes.com/2016/04/03/us/politics/obama-donald-trump-economy-indiana.html | 2016-04-03 | The New York Times | 6 |
100000004304286 | Article | Opinion | Abortion and Punishment | By KATHA POLLITT | http://www.nytimes.com/2016/04/02/opinion/campaign-stops/abortion-and-punishment.html | 2016-04-02 | The New York Times | 7 |
100000004256392 | Article | World | E.U. Suspects Russian Agenda in Migrants’ Shifting Arctic Route | By ANDREW HIGGINS | http://www.nytimes.com/2016/04/03/world/europe/for-migrants-into-europe-a-road-less-traveled.html | 2016-04-03 | The New York Times | 8 |
100000004306346 | Article | Business Day | Alaska Airlines Said to Be Near $2 Billion Deal for Virgin America | By MICHAEL J. de la MERCED and LESLIE PICKER | http://www.nytimes.com/2016/04/03/business/dealbook/alaska-airlines-said-to-be-near-2-billion-deal-for-virgin-america.html | 2016-04-03 | The New York Times | 9 |
100000004303113 | Interactive | U.S. | How Votes For Trump Could Become Delegates for Someone Else | By LARRY BUCHANAN and ALICIA PARLAPIANO | http://www.nytimes.com/interactive/2016/04/01/us/politics/how-votes-for-trump-could-become-delegates-for-someone-else.html | 2016-04-01 | The New York Times | 10 |
100000004306002 | Article | World | Third Man Is Charged in Belgium Over Foiled Plot in France | By AURELIEN BREEDEN and SEWELL CHAN | http://www.nytimes.com/2016/04/03/world/europe/belgium-terrorist-plot-france.html | 2016-04-03 | The New York Times | 11 |
100000004298310 | Article | Travel | Is Europe Safe for Travelers? Yes, Experts Say, but Here Are Some Tips | By KAREN WORKMAN | http://www.nytimes.com/2016/03/31/travel/is-europe-safe-for-travelers-yes-but-here-are-some-tips.html | 2016-03-31 | The New York Times | 12 |
100000004302516 | Article | Opinion | Time for South Africa’s Jacob Zuma to Step Down | By THE EDITORIAL BOARD | http://www.nytimes.com/2016/04/02/opinion/time-for-south-africasjacob-zuma-to-step-down.html | 2016-04-02 | The New York Times | 13 |
100000004293262 | Article | Fashion & Style | A Love Story That Had to Wait | By JANE GORDON JULIEN | http://www.nytimes.com/2016/04/03/fashion/weddings/a-love-story-that-had-to-wait.html | 2016-04-03 | The New York Times | 14 |
100000004304092 | Article | Your Money | To Buy or Rent a Home? Weighing Which Is Better | By TARA SIEGEL BERNARD | http://www.nytimes.com/2016/04/02/your-money/to-buy-or-rent-a-home-weighing-which-is-better.html | 2016-04-02 | The New York Times | 15 |
100000004299975 | Interactive | The Upshot | How the Rest of the Delegate Race Could Unfold | By GREGOR AISCH, JOSH KATZ and K.K. REBECCA LAI | http://www.nytimes.com/interactive/2016/03/30/upshot/trump-clinton-delegate-calculator.html | 2016-03-30 | The New York Times | 16 |
343 | Blog | U.S. | Donald Trump Steps Awkwardly Into Abortion Debate for Second Time This Week | By MAGGIE HABERMAN | http://www.nytimes.com/politics/first-draft/2016/04/01/donald-trump-steps-awkwardly-into-abortion-debate-for-second-time-this-week/ | 2016-04-01 | The New York Times | 17 |
100000004306154 | Article | Business Day | After WikiLeaks Revelation, Greece Asks I.M.F. to Clarify Bailout Plan | By LIZ ALDERMAN | http://www.nytimes.com/2016/04/03/business/after-wikileaks-revelation-greece-asks-imf-to-clarify-bailout-plan.html | 2016-04-03 | The New York Times | 18 |
100000004302159 | Article | Arts | What to Watch Saturday | By KATHRYN SHATTUCK | http://www.nytimes.com/2016/04/02/arts/television/what-to-watch-saturday.html | 2016-04-02 | The New York Times | 19 |
24 | Blog | Health | Ask Well: Does Taking Fewer Than 5,000 Steps a Day Make You Sedentary? | By GRETCHEN REYNOLDS | http://well.blogs.nytimes.com/2016/04/01/ask-well-does-less-than-5000-steps-a-day-make-you-sedentary/ | 2016-04-01 | The New York Times | 20 |
In this example, we use the NY Times Most Popular API to retrieve the 100 most viewed articles for a single day, a week, and a month. This requires stepping the offset parameter through 20, 40, …, 100 and setting the time-period parameter to 1, 7, or 30.
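The combinations of offset and time period described above can be enumerated up front. This is a small base R sketch; the names offsets, time_periods and requests are chosen for illustration only.

```r
# Offsets used to page through the results, 20 at a time.
offsets <- seq(from = 20, to = 100, by = 20)

# Time periods in days: a single day, a week, and a month.
time_periods <- c(day = 1, week = 7, month = 30)

# Every (offset, time_period) pair that would be requested.
requests <- expand.grid(offset = offsets, time_period = time_periods)
print(requests)
```

Five offsets for each of three time periods gives 15 API calls in total.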
# ================================================================
# Function: get_most_viewed
# ================================================================
# Parameters:
# 1. section: section to query, default value "all-sections"
# 2. time_period: number of days, either 1, 7, or 30
# 3. iterations: number of pages to request; iteration i uses offset i * 20
# 4. debug: if TRUE, print the URI, status, and result count for each request
# Return: the combined results (a tbl_json for one page, a data.table once pages are bound)
# Requires: httr, tidyjson, dplyr, data.table, and the API key in nyt_most_popular_api
# ================================================================
get_most_viewed <- function(section = "all-sections", time_period = 1, iterations = 1, debug = FALSE) {
  for (i in seq_len(iterations)) {
    offset <- i * 20
    # construct the URI (the section argument is now used instead of a hard-coded value)
    uri_base <- paste0("http://api.nytimes.com/svc/mostpopular/v2/mostviewed/", section, "/", time_period)
    uri_base <- paste0(uri_base, ".json?offset=", offset)
    uri <- paste0(uri_base, "&api-key=", nyt_most_popular_api)
    if (debug) {print(uri)}
    # Get the next batch of 20 using the API
    raw_contents <- GET(url = uri)
    # store the json
    json_raw <- httr::content(raw_contents, type = "text", encoding = "UTF-8")
    ## get status (assigned so that the debug print below has something to show)
    status <-
      json_raw %>%
      enter_object("status") %>%
      append_values_string("status") %>%
      select(status)
    ## get the number of results
    num_results <-
      json_raw %>%
      enter_object("num_results") %>%
      append_values_string("num_results") %>%
      select(num_results)
    if (debug) {print(status)}
    if (debug) {print(num_results)}
    nyt_most_popular_json <- json_raw %>% as.tbl_json
    results <-
      nyt_most_popular_json %>%
      enter_object("results") %>%
      gather_array %>%
      spread_values(
        id = jnumber("id"),
        type = jstring("type"),
        section = jstring("section"),
        title = jstring("title"),
        by = jstring("byline"),
        url = jstring("url"),
        keywords = jstring("adx_keywords"),
        abstract = jstring("abstract"),
        published_date = jstring("published_date"),
        source = jstring("source"),
        views = jnumber("views")
      )
    # row-bind the results to create one table containing the 100 Most Viewed articles
    # rbindlist requires the data.table package and returns a data.table
    if (i == 1) {
      results_json <- results
    } else {
      results_json <- rbindlist(list(results_json, results))
    }
  }
  return(results_json)
}
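The function grows its result one page at a time inside the loop. An alternative pattern is to collect the pages in a list and bind them once at the end. The sketch below shows that pattern in base R with stand-in data; fake_page is a hypothetical placeholder for a real API call, not part of the assignment code.

```r
# Hypothetical stand-in for one page of 20 results; no API call is made.
fake_page <- function(i) {
  data.frame(page = i, rank = (i - 1) * 20 + 1:20)
}

# Collect all pages in a list, then bind them in a single step.
pages <- lapply(1:5, fake_page)
all_pages <- do.call(rbind, pages)

nrow(all_pages)  # 5 pages of 20 rows each
```

With data.table loaded, rbindlist(pages) could replace do.call(rbind, pages) on the same list.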
top_100_day_json <- get_most_viewed("all-sections", 1, 5)
# count the articles in each section and plot the counts
top_100_day_json %>%
  group_by(section) %>%
  tally %>%
  ggplot(aes(section, n, fill = section)) +
  geom_bar(stat = "identity", position = "stack") +
  coord_flip() + theme(legend.position = "none") +
  ggtitle("100 Most Viewed NY Times Articles by Section (Single Day)") +
  xlab("Section") + ylab("Number of Articles") +
  geom_text(aes(label = n), vjust = 0.5, hjust = 1.1, color = "black")
top_100_wk_json <- get_most_viewed("all-sections", 7, 5)
# count the articles in each section and plot the counts
top_100_wk_json %>%
  group_by(section) %>%
  tally %>%
  ggplot(aes(section, n, fill = section)) +
  geom_bar(stat = "identity", position = "stack") +
  coord_flip() + theme(legend.position = "none") +
  ggtitle("100 Most Viewed NY Times Articles by Section (Week)") +
  xlab("Section") + ylab("Number of Articles") +
  geom_text(aes(label = n), vjust = 0.5, hjust = 1.1, color = "black")
top_100_mth_json <- get_most_viewed("all-sections", 30, 5)
# count the articles in each section and plot the counts
top_100_mth_json %>%
  group_by(section) %>%
  tally %>%
  ggplot(aes(section, n, fill = section)) +
  geom_bar(stat = "identity", position = "stack") +
  coord_flip() + theme(legend.position = "none") +
  ggtitle("100 Most Viewed NY Times Articles by Section (Month)") +
  xlab("Section") + ylab("Number of Articles") +
  geom_text(aes(label = n), vjust = 0.5, hjust = 1.1, color = "black")
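The three plotting chunks differ only in their input data and title, so they could be folded into a single helper. The following is a minimal sketch, assuming dplyr and ggplot2 are installed; plot_by_section and the demo data frame are hypothetical and stand in for the real API results. Note that n is the number of articles in each section, so the axis is labelled accordingly.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: bar chart of article counts per section.
plot_by_section <- function(articles, period_label) {
  articles %>%
    count(section) %>%  # n = number of articles in each section
    ggplot(aes(section, n, fill = section)) +
    geom_col() +
    coord_flip() +
    theme(legend.position = "none") +
    labs(title = paste0("100 Most Viewed NY Times Articles by Section (",
                        period_label, ")"),
         x = "Section", y = "Number of Articles") +
    geom_text(aes(label = n), vjust = 0.5, hjust = 1.1, color = "black")
}

# Stand-in data; with real results this would be e.g. top_100_day_json.
demo <- data.frame(section = c("U.S.", "U.S.", "Opinion", "World"))
p <- plot_by_section(demo, "Demo")
print(p)
```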