CUNY Data 607 - Web APIs
Assignment
The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame.
Setup
library(tidyverse)
library(rvest)
library(tidyjson)
library(configr)
First we’ll use the configr package to load our secret App Key from a configuration file ‘config.yml’, and construct a basic API call to the Times Newswire API for the twenty most recent articles:
cfg <- read.config(file = 'config.yml')
api_endpoint <- 'https://api.nytimes.com/svc/news/v3/content/all/all.json'
api_url <- str_c(api_endpoint, '?api-key=', cfg$app_key)
Call API
Next, we’ll call this API endpoint and use the rvest package to read the response and extract its text (the raw JSON) from the HTML body tag:
html <- read_html(api_url)
json <- html %>% html_elements("body") %>% html_text()
Parse JSON
With the tidyjson package, we use the gather_object function to parse the json into a table, and use the json_types function to examine the resulting structure:
json %>% gather_object %>% json_types
# A tbl_json: 4 x 4 tibble with a "JSON" attribute
..JSON document.id name type
<chr> <int> <chr> <fct>
1 "\"OK\"" 1 status string
2 "\"Copyright (c) ..." 1 copyright string
3 "500" 1 num_results number
4 "[{\"slug_name\":\"..." 1 results array
(Note that the table type is a special tbl_json object which looks like a dataframe or tibble, but always includes an additional attribute with the original json string. We’ll wait until the final step to drop this additional data.)
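To make the shape of this step easier to see without calling the API, here is a minimal sketch on a small hand-written JSON string. The string `sample_json` and its contents are hypothetical and only mimic the top-level keys of the real response; `as_tibble()` is used here to get a plain table to inspect.

```r
library(tidyjson)
library(dplyr)

# Hypothetical stand-in for the API response (not real Times data)
sample_json <- '{"status": "OK", "num_results": 2, "results": [{"title": "A"}, {"title": "B"}]}'

# One row per top-level key, with the JSON type of each value
top_level <- sample_json %>%
  gather_object %>%
  json_types %>%
  as_tibble()

top_level_names <- top_level$name
top_level_types <- as.character(top_level$type)
print(top_level_names)
print(top_level_types)
```

The `type` column is what tells us which function to use next: `results` is an array, so it needs `gather_array` rather than `gather_object`.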
The results object contains the article information we want, so we select it using enter_object. Since it is an array type, we need to use gather_array (instead of gather_object) to parse the components into a table.
results <- json %>%
enter_object('results') %>%
gather_array
Each article (each row in our results table) is a JSON object with fields such as slug_name, section, title, byline, abstract, created_date and des_facet.
For each row/article, we want to extract certain fields from the ..JSON field into their own columns.
Here we use the spread_values function to define the new columns individually instead of using spread_all to unpack them all at once, because the fields are not consistent across articles: some list fields (such as des_facet) are null, which results in errors.
results_dist <- results %>%
spread_values(
section = jstring(section),
title = jstring(title),
byline = jstring(byline),
abstract = jstring(abstract),
created_date = jstring(created_date)
) %>%
select(!c(document.id, array.index))
For convenience, we also de-select some of the index data we no longer need, such as document.id and array.index.
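As a rough illustration of why naming the columns explicitly is convenient, the sketch below runs spread_values on a hypothetical two-article array in which the second article has a null byline and no abstract at all. The data in `articles_json` is invented for the example; the expectation is that the missing values simply come through as NA.

```r
library(tidyjson)
library(dplyr)

# Hypothetical array of two articles (not real Times data);
# the second one has a null byline and is missing the abstract key
articles_json <- '[
  {"section": "World", "title": "First", "byline": "BY A", "abstract": "Text"},
  {"section": "Opinion", "title": "Second", "byline": null}
]'

articles <- articles_json %>%
  gather_array %>%
  spread_values(
    section = jstring(section),
    title = jstring(title),
    byline = jstring(byline),
    abstract = jstring(abstract)
  ) %>%
  as_tibble()

print(articles)
```

Every column we ask for exists in the result regardless of what each article contains, so later steps can rely on a fixed set of columns.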
Convert to DataFrame
Notice we still have all the original json in the ..JSON column. We’ll drop that by converting our tbl_json object into a normal dataframe:
results_df <- results_dist %>% as_data_frame.tbl_json()
The resulting dataframe is tidy and ready for further handling:
| section | title | byline | abstract | created_date |
|---|---|---|---|---|
| World | Regressive behavior, acting out: U.S. teachers weigh in on the struggles they see children facing. | BY JESSICA GROSE | | 2021-10-23T16:13:34-04:00 |
| World | Here are the lessons experts have taken from the five waves of the coronavirus in the U.S. | BY LAUREN LEATHERBY | | 2021-10-23T16:06:58-04:00 |
| Technology | In India, Facebook Grapples With an Amplified Version of Its Problems | BY SHEERA FRENKEL AND DAVEY ALBA | Internal documents show a struggle with misinformation, hate speech and celebrations of violence in the country, the company’s biggest market. | 2021-10-23T15:46:49-04:00 |
| U.S. | Nevada Man Is Charged With Voting Using His Dead Wife’s Ballot | BY EDUARDO MEDINA | Donald Kirk Hartle, a Republican, had claimed that someone voted in the 2020 election by using the mail-in ballot of his wife, who died in 2017. He now faces two counts of voter fraud. | 2021-10-23T15:31:28-04:00 |
| Opinion | How I Became a Sick Person | BY ROSS DOUTHAT | A sudden descent into a chronic illness. | 2021-10-23T15:00:07-04:00 |
| U.S. | Inadvertent Gun Discharges Occurred on Alec Baldwin Film Before Fatal Shooting, Crew Members Say | BY SIMON ROMERO AND JULIA JACOBS | They expressed concerns over gun mishaps and working conditions just days before the shooting that killed the cinematographer Halyna Hutchins. | 2021-10-23T14:59:03-04:00 |
| Business | Chuck Bundrant, Pacific Fisheries’ ‘Henry Ford,’ Dies at 79 | BY CLAY RISEN | In 1961 he arrived in Seattle with no job, no skills and $80. Over the next 60 years, he built a seafood empire and transformed the industry. | 2021-10-23T14:27:54-04:00 |
| World | Taliban Honor Suicide Bombers’ ‘Sacrifices’ in Bid to Rewrite History | BY THOMAS GIBBONS-NEFF, SHARIF HASSAN AND RUHULLAH KHAPALWAK | The new government brought together the bombers’ families at a publicized event, praising their actions but alienating those who have suffered at their hands. | 2021-10-23T14:11:46-04:00 |
| World | Erdogan Threatens to Expel 10 Western Ambassadors | BY CARLOTTA GALL | The move follows a statement from the envoys demanding the release of a prominent philanthropist jailed since 2017. | 2021-10-23T13:57:20-04:00 |
| U.S. | Biden’s popularity rating falls, but the pandemic is a bright spot. | BY NATE COHN | | 2021-10-23T13:35:51-04:00 |
| New York | Letitia James Isn’t Saying Whether She’s Running for Governor. But She Is Hiring. | BY KATIE GLUECK | Ms. James, the New York attorney general, has recently recruited several advisers and fund-raisers ahead of a possible run for the state’s top office. | 2021-10-23T13:18:43-04:00 |
| World | Singapore will require vaccination or daily tests for workplace access next year. | BY JOHN YOON | | 2021-10-23T13:01:12-04:00 |
| World | ‘Completely Lost’: For Some Afghans, Returning Home Is as Difficult as Fleeing | BY MUJIB MASHAL | Thousands of Afghans who were in India for medical treatment when the country collapsed are now desperate to return, but have no money and no clear route home. | 2021-10-23T12:37:21-04:00 |
| World | France recommends flu and Covid booster shots in the same visit. | BY JOHN YOON | | 2021-10-23T12:11:13-04:00 |
| Sports | His N.B.A. Dream Was Right There. Then He Couldn’t Move His Legs. | BY DAVID GARDNER | A mysterious illness on the eve of the 2019 N.B.A. draft derailed Kris Wilkes’s hopes of going pro. As he heals, he’s not giving up hope. | 2021-10-23T12:01:10-04:00 |
| Opinion | ‘Dune’ Owes Its Climate Change Prophecies to Indigenous Tribes | BY DANIEL IMMERWAHR | Native Americans’ warnings of environmental catastrophe inspired the landscape of “Dune.” Now their tribal lands are flooding. | 2021-10-23T11:41:15-04:00 |
| U.S. | U.S. Struggles With Afghan Evacuees Weeded Out, and Now in Limbo | BY CHARLIE SAVAGE | No final decisions have been made, but dozens red-flagged for apparent criminal pasts or links to militants have been sent to a base in Kosovo, where their fate is uncertain. | 2021-10-23T11:30:39-04:00 |
| Opinion | Is Truth the Best Medicine for Dying Patients? | | Readers react to a doctor’s essay about regretting her honesty with a dying man. | 2021-10-23T11:30:05-04:00 |
| Opinion | The N.F.L.’s Problems Are Bigger Than Gruden | BY JANE COASTON | Gruden’s scandal revealed the gross underbelly of the N.F.L. | 2021-10-23T11:15:04-04:00 |
| Sports | Friends Row On After Teammate’s Death, Leaving One Seat Empty | BY MARIA CRAMER | Charlie Hamlin, a former Olympic rower who dominated races well into his 70s, died in May. On Sunday, his longtime teammates plan to row the Head of the Charles Regatta with an empty seat in his memory. | 2021-10-23T11:07:32-04:00 |
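As one possible example of further handling, the sketch below parses created_date into a proper date-time and counts articles per section. It assumes the lubridate package is available, and it uses a small `sample_df` built from three rows of the table above instead of the full results_df.

```r
library(dplyr)
library(lubridate)

# Three rows copied from the table above; in practice this would be results_df
sample_df <- tibble(
  section = c("Technology", "U.S.", "Opinion"),
  created_date = c(
    "2021-10-23T15:46:49-04:00",
    "2021-10-23T15:31:28-04:00",
    "2021-10-23T15:00:07-04:00"
  )
)

# Parse the ISO 8601 strings; ymd_hms returns POSIXct values in UTC
sample_df <- sample_df %>%
  mutate(created_at = ymd_hms(created_date))

# Count articles per section, most frequent first
section_counts <- sample_df %>%
  count(section, sort = TRUE)

print(sample_df)
print(section_counts)
```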