Using Web APIs - New York Times

The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs.

The goal of the Week 10 assignment is to use one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it to an R dataframe.

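The code for this assignment requires the following R packages: httr (HTTP requests), tidyjson (JSON to tidy data.frames), dplyr (data manipulation and the %>% pipe), ggplot2 (plots), and data.table (rbindlist).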

Using the tidyjson package to create tidy data.frames in R

A description of the tidyjson package from CRAN:

“The tidyjson package takes an alternate approach to structuring JSON data into tidy data.frames. Similar to tidyr, tidyjson builds a grammar for manipulating JSON into a tidy table structure.”

Using tidyjson, we can extract key information from the JSON structure with the pipe operator %>%. Because the JSON returned by the API is nested, we use the enter_object() function to move into a specific object key, in this case the “results” object. gather_array() then stacks each element of the results array into its own row, and spread_values() pulls the named fields out into columns.

nyt_most_popular_json <- json_raw %>% as.tbl_json

results <-
       nyt_most_popular_json %>%
       enter_object("results") %>%
       gather_array %>%
       spread_values(
            id = jnumber("id"),
            type = jstring("type"),
            section = jstring("section"),
            title = jstring("title"),
            by = jstring("byline"),
            url = jstring("url"),
            keywords = jstring("adx_keywords"),
            abstract = jstring("abstract"),
            published_date = jstring("published_date"),
            source = jstring("source"),
            views = jnumber("views")
  )

The Results

(Note - some of the extracted values such as abstract and keywords are not displayed below due to the length of the text.)

id | type | section | title | by | url | published_date | source | views
100000004293613 | Article | Magazine | What Happened When Venture Capitalists Took Over the Golden State Warriors | By BRUCE SCHOENFELD | http://www.nytimes.com/2016/04/03/magazine/what-happened-when-venture-capitalists-took-over-the-golden-state-warriors.html | 2016-04-03 | The New York Times | 1
100000004301080 | Article | Science | View From Space Hints at a New Viking Site in North America | By RALPH BLUMENTHAL | http://www.nytimes.com/2016/04/01/science/vikings-archaeology-north-america-newfoundland.html | 2016-04-01 | The New York Times | 2
100000004306746 | Article | U.S. | Amtrak Collision With Backhoe Leaves 2 Dead, Officials Say | By NATE SCHWEBER and MIKE McPHATE | http://www.nytimes.com/2016/04/04/us/amtrak-train-derails-outside-of-philadelphia.html | 2016-04-04 | The New York Times | 3
100000004301105 | Article | Opinion | When Whites Just Don’t Get It, Part 6 | By NICHOLAS KRISTOF | http://www.nytimes.com/2016/04/03/opinion/sunday/when-whites-just-dont-get-it-part-6.html | 2016-04-03 | The New York Times | 4
100000004304318 | Article | Arts | And the Awards for Best Audio Fiction Go to … | By JOSHUA BARONE | http://www.nytimes.com/2016/04/02/books/sarah-lawrence-international-audio-fiction-awards-2016.html | 2016-04-02 | The New York Times | 5
100000004272340 | Article | U.S. | Obama Gets Scant Credit in Indiana Region Where Recovery Was Robust | By JACKIE CALMES | http://www.nytimes.com/2016/04/03/us/politics/obama-donald-trump-economy-indiana.html | 2016-04-03 | The New York Times | 6
100000004304286 | Article | Opinion | Abortion and Punishment | By KATHA POLLITT | http://www.nytimes.com/2016/04/02/opinion/campaign-stops/abortion-and-punishment.html | 2016-04-02 | The New York Times | 7
100000004256392 | Article | World | E.U. Suspects Russian Agenda in Migrants’ Shifting Arctic Route | By ANDREW HIGGINS | http://www.nytimes.com/2016/04/03/world/europe/for-migrants-into-europe-a-road-less-traveled.html | 2016-04-03 | The New York Times | 8
100000004306346 | Article | Business Day | Alaska Airlines Said to Be Near $2 Billion Deal for Virgin America | By MICHAEL J. de la MERCED and LESLIE PICKER | http://www.nytimes.com/2016/04/03/business/dealbook/alaska-airlines-said-to-be-near-2-billion-deal-for-virgin-america.html | 2016-04-03 | The New York Times | 9
100000004303113 | Interactive | U.S. | How Votes For Trump Could Become Delegates for Someone Else | By LARRY BUCHANAN and ALICIA PARLAPIANO | http://www.nytimes.com/interactive/2016/04/01/us/politics/how-votes-for-trump-could-become-delegates-for-someone-else.html | 2016-04-01 | The New York Times | 10
100000004306002 | Article | World | Third Man Is Charged in Belgium Over Foiled Plot in France | By AURELIEN BREEDEN and SEWELL CHAN | http://www.nytimes.com/2016/04/03/world/europe/belgium-terrorist-plot-france.html | 2016-04-03 | The New York Times | 11
100000004298310 | Article | Travel | Is Europe Safe for Travelers? Yes, Experts Say, but Here Are Some Tips | By KAREN WORKMAN | http://www.nytimes.com/2016/03/31/travel/is-europe-safe-for-travelers-yes-but-here-are-some-tips.html | 2016-03-31 | The New York Times | 12
100000004302516 | Article | Opinion | Time for South Africa’s Jacob Zuma to Step Down | By THE EDITORIAL BOARD | http://www.nytimes.com/2016/04/02/opinion/time-for-south-africasjacob-zuma-to-step-down.html | 2016-04-02 | The New York Times | 13
100000004293262 | Article | Fashion & Style | A Love Story That Had to Wait | By JANE GORDON JULIEN | http://www.nytimes.com/2016/04/03/fashion/weddings/a-love-story-that-had-to-wait.html | 2016-04-03 | The New York Times | 14
100000004304092 | Article | Your Money | To Buy or Rent a Home? Weighing Which Is Better | By TARA SIEGEL BERNARD | http://www.nytimes.com/2016/04/02/your-money/to-buy-or-rent-a-home-weighing-which-is-better.html | 2016-04-02 | The New York Times | 15
100000004299975 | Interactive | The Upshot | How the Rest of the Delegate Race Could Unfold | By GREGOR AISCH, JOSH KATZ and K.K. REBECCA LAI | http://www.nytimes.com/interactive/2016/03/30/upshot/trump-clinton-delegate-calculator.html | 2016-03-30 | The New York Times | 16
343 | Blog | U.S. | Donald Trump Steps Awkwardly Into Abortion Debate for Second Time This Week | By MAGGIE HABERMAN | http://www.nytimes.com/politics/first-draft/2016/04/01/donald-trump-steps-awkwardly-into-abortion-debate-for-second-time-this-week/ | 2016-04-01 | The New York Times | 17
100000004306154 | Article | Business Day | After WikiLeaks Revelation, Greece Asks I.M.F. to Clarify Bailout Plan | By LIZ ALDERMAN | http://www.nytimes.com/2016/04/03/business/after-wikileaks-revelation-greece-asks-imf-to-clarify-bailout-plan.html | 2016-04-03 | The New York Times | 18
100000004302159 | Article | Arts | What to Watch Saturday | By KATHRYN SHATTUCK | http://www.nytimes.com/2016/04/02/arts/television/what-to-watch-saturday.html | 2016-04-02 | The New York Times | 19
24 | Blog | Health | Ask Well: Does Taking Fewer Than 5,000 Steps a Day Make You Sedentary? | By GRETCHEN REYNOLDS | http://well.blogs.nytimes.com/2016/04/01/ask-well-does-less-than-5000-steps-a-day-make-you-sedentary/ | 2016-04-01 | The New York Times | 20
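
To experiment with the enter_object() / gather_array / spread_values() pipeline without an API key, the same steps can be run on a small hand-written JSON string. This is only a sketch: the toy_json records below are invented for illustration and are not real API output, and only three of the fields are extracted.

```r
library(tidyjson)
library(dplyr)

# Invented sample that mimics the shape of the API response (not real data)
toy_json <- '{"status": "OK", "num_results": 2, "results": [
  {"id": 1, "section": "Science", "title": "First sample title"},
  {"id": 2, "section": "Arts", "title": "Second sample title"}]}'

toy_results <-
    toy_json %>%
    as.tbl_json %>%
    enter_object("results") %>%   # move into the "results" key
    gather_array %>%              # one row per element of the array
    spread_values(                # one column per extracted field
        id = jnumber("id"),
        section = jstring("section"),
        title = jstring("title")
    )

toy_results
```

Each element of the results array becomes one row, so the two sample records yield a two-row table.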

Use Case - Graph the 100 Most Viewed New York Times Articles by Section by Time Period (1, 7, 30 Days)

For this example, we use the NY Times Most Popular API to retrieve the 100 most viewed articles for a single day, a week, and a month. The API returns 20 results per request, so this requires paging through the results with the offset parameter in steps of 20, as well as setting the time period parameter.
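
The paging arithmetic can be sketched in base R before any request is made. The sketch assumes that the first page of 20 results sits at offset 0; page_offsets is a hypothetical helper and "YOUR_API_KEY" is a placeholder, neither of which is part of the assignment code.

```r
# Hypothetical helper: zero-based offsets needed to cover n_wanted results
# (assumes the API serves page_size results per request, starting at offset 0)
page_offsets <- function(n_wanted, page_size = 20) {
    n_pages <- ceiling(n_wanted / page_size)
    (seq_len(n_pages) - 1) * page_size
}

offsets <- page_offsets(100)   # 0 20 40 60 80

# One request URI per offset for the 7-day period;
# "YOUR_API_KEY" is a placeholder, not a real key
uris <- paste0(
    "http://api.nytimes.com/svc/mostpopular/v2/mostviewed/all-sections/", 7,
    ".json?offset=", offsets,
    "&api-key=", "YOUR_API_KEY"
)
```

Five requests of 20 results each cover the 100 articles for one time period.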

# ================================================================
# Function: get_most_viewed
# ================================================================
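# Parameters: 
#            1. section: default value "all-sections"
#            2. time_period: day value of either 1, 7, or 30
#            3. iterations: number of pages of 20 results to retrieve; determines the offset for paging through results
#            4. debug: if TRUE, print the URI, status, and result count
# Return: one table of results, one row per article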
# ================================================================
get_most_viewed <- function(section = "all-sections", time_period = 1, iterations = 1, debug = FALSE) {

    for (i in seq_len(iterations)) {
    
        # offsets are zero-based multiples of 20, so the first page is offset 0
        offset <- (i - 1) * 20
        
        # construct the URI for the requested section and time period
        uri_base <- paste0("http://api.nytimes.com/svc/mostpopular/v2/mostviewed/", section, "/", time_period)
        uri_base <- paste0(uri_base, ".json?offset=", offset)
        uri      <- paste0(uri_base,"&api-key=", nyt_most_popular_api)
        
        if (debug) {print(uri)}
        
        # Get one batch of 20 using the API
        raw_contents <- httr::GET(url = uri)
        
        # store the json
        json_raw <- httr::content(raw_contents, type = "text", encoding = "UTF-8")
        
        
        ## get status
        status <-
            json_raw %>%
            enter_object("status") %>%
            append_values_string("status") %>%
            select(status)
            
        ## get the number of results
        num_results <- 
            json_raw %>% 
            enter_object("num_results") %>%
            append_values_string("num_results") %>% 
            select(num_results)

        if (debug) {print(status)}
        if (debug) {print(num_results)}
    
        nyt_most_popular_json <- json_raw %>% as.tbl_json
    
        results <-
                nyt_most_popular_json %>%
                enter_object("results") %>%
                gather_array %>%
                spread_values(
                    id = jnumber("id"),
                    type = jstring("type"),
                    section = jstring("section"),
                    title = jstring("title"),
                    by = jstring("byline"),
                    url = jstring("url"),
                    keywords = jstring("adx_keywords"),
                    abstract = jstring("abstract"),
                    published_date = jstring("published_date"),
                    source = jstring("source"),
                    views = jnumber("views")
                  )
    
         # row-bind the results to create one table containing the 100 Most Viewed articles
         # rbindlist requires the data.table package
        
         if (i == 1) { 
              results_json <- results
         } else {
              results_json <- rbindlist(list(results_json, results))
         }
    
    
    }

    return (results_json)
}
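
An alternative to growing the result inside the loop is to collect every page in a list and row-bind once at the end. The sketch below shows that pattern in base R; fetch_page is a hypothetical stand-in that fabricates rows so the example runs without network access or an API key, and it is not the function above.

```r
# Hypothetical stand-in for one API request: returns page_size made-up rows
fetch_page <- function(offset, page_size = 20) {
    data.frame(
        rank    = offset + seq_len(page_size),
        section = rep(c("U.S.", "World", "Opinion", "Arts"), length.out = page_size),
        stringsAsFactors = FALSE
    )
}

offsets <- c(0, 20, 40, 60, 80)

# Fetch every page into a list, then row-bind once at the end
pages   <- lapply(offsets, fetch_page)
top_100 <- do.call(rbind, pages)

nrow(top_100)   # 100
```

With data.table loaded, rbindlist(pages) performs the same binding step; binding once avoids the special case for the first iteration.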

Graph the Results

Most Viewed Articles by Section - One Day

top_100_day_json <- get_most_viewed("all-sections", 1, 5)

top_100_day_json %>%
   group_by(section) %>% 
   tally %>% 
   ggplot(aes(section, n, fill = section)) +
          geom_bar(stat = "identity", position = "stack") +
          coord_flip()   + theme(legend.position = "none") + 
    ggtitle("100 Most Viewed NY Times Articles by Section (Single Day)") +
    xlab("Section") + ylab("Number of Articles") +
    geom_text(aes(label=n), vjust=0.5, hjust=1.1,color="black")

Most Viewed Articles by Section - One Week

top_100_wk_json <- get_most_viewed("all-sections", 7, 5)

top_100_wk_json %>%
   group_by(section) %>% 
   tally %>% 
   ggplot(aes(section, n, fill = section)) +
          geom_bar(stat = "identity", position = "stack") +
          coord_flip()   + theme(legend.position = "none") + 
    ggtitle("100 Most Viewed NY Times Articles by Section (Week)") +
    xlab("Section") + ylab("Number of Articles") +
    geom_text(aes(label=n), vjust=0.5, hjust=1.1,color="black")

Most Viewed Articles by Section - One Month

top_100_mth_json <- get_most_viewed("all-sections", 30, 5)

top_100_mth_json %>%
   group_by(section) %>% 
   tally %>% 
   ggplot(aes(section, n, fill = section)) +
          geom_bar(stat = "identity", position = "stack") +
          coord_flip()   + theme(legend.position = "none") + 
    ggtitle("100 Most Viewed NY Times Articles by Section (Month)") +
    xlab("Section") + ylab("Number of Articles") +
    geom_text(aes(label=n), vjust=0.5, hjust=1.1,color="black")
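
The three plots differ only in their input table and the period shown in the title, so the plotting code could be wrapped in one function. The following is a minimal sketch, not part of the original assignment: plot_by_section is a hypothetical helper, count() and geom_col() are used as shorthand for group_by/tally and geom_bar(stat = "identity"), and toy_articles is invented data standing in for the output of get_most_viewed().

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: bar chart of the number of articles per section
plot_by_section <- function(articles, period_label) {
    articles %>%
        count(section) %>%
        ggplot(aes(section, n, fill = section)) +
        geom_col() +
        coord_flip() +
        theme(legend.position = "none") +
        ggtitle(paste0("100 Most Viewed NY Times Articles by Section (", period_label, ")")) +
        # n counts articles per section, so the axis is labelled accordingly
        xlab("Section") + ylab("Number of Articles") +
        geom_text(aes(label = n), vjust = 0.5, hjust = 1.1, color = "black")
}

# Invented input standing in for the output of get_most_viewed()
toy_articles <- data.frame(
    section = c("U.S.", "U.S.", "World", "Opinion"),
    stringsAsFactors = FALSE
)

p <- plot_by_section(toy_articles, "Single Day")
```

With the real data, the calls would look like plot_by_section(top_100_day_json, "Single Day"), plot_by_section(top_100_wk_json, "Week"), and plot_by_section(top_100_mth_json, "Month").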