This assignment explores using The New York Times API directly from inside R.
library(jsonlite)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.5 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.4 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x purrr::flatten() masks jsonlite::flatten()
## x dplyr::lag() masks stats::lag()
library(keyring)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
first_time <- FALSE
# If this is the very first time you are running this script, set first_time to TRUE to save your API key with keyring
if (first_time) {
  key_set_with_value(service = "NYT api", password = "YOUR_API_KEY_GOES_HERE")
}
api_key <- key_get("NYT api")
I used the paste0 function to concatenate my API key into my query to the NYT API.
The fromJSON and data.frame functions do the heavy lifting of converting the JSON response into a data frame.
I queried for articles published since the beginning of 2021 that relate to ‘molecular fossils’: organic compounds in the fossil record that are derived from once-living organisms.
I used the NYT Article Search API.
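The query string can also be assembled with the search term percent-encoded, so that the space in the term is not sent raw. This is a minimal sketch using only base R; the helper name `build_search_url` and the placeholder key `DEMO_KEY` are made up for illustration and are not part of the original script.

```r
# Hypothetical helper (not in the original script): build an Article Search
# URL with paste0, percent-encoding the search term via base R's URLencode().
build_search_url <- function(query, begin_date, api_key) {
  paste0(
    "https://api.nytimes.com/svc/search/v2/articlesearch.json",
    "?q=", URLencode(query, reserved = TRUE),
    "&begin_date=", begin_date,
    "&api-key=", api_key
  )
}

# "DEMO_KEY" is a placeholder; a real key would come from key_get("NYT api").
example_url <- build_search_url("molecular fossil", "20210101", "DEMO_KEY")
print(example_url)
```

The resulting string can be passed to fromJSON in the same way as the hand-built URL.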
# Connect and search the Article Search API for articles related to molecular fossils
results <- fromJSON(paste0("http://api.nytimes.com/svc/search/v2/articlesearch.json?q=molecular fossil&begin_date=20210101&api-key=",api_key), flatten = TRUE) %>% data.frame()
glimpse(results)
## Rows: 6
## Columns: 32
## $ status <chr> "OK", "OK", "OK", "OK", "OK", "O…
## $ copyright <chr> "Copyright (c) 2021 The New York…
## $ response.docs.abstract <chr> "A laborer discovered the fossil…
## $ response.docs.web_url <chr> "https://www.nytimes.com/2021/06…
## $ response.docs.snippet <chr> "A laborer discovered the fossil…
## $ response.docs.lead_paragraph <chr> "Scientists on Friday announced …
## $ response.docs.print_section <chr> "A", "A", NA, "MM", "D", NA
## $ response.docs.print_page <chr> "1", "27", NA, "49", "2", NA
## $ response.docs.source <chr> "The New York Times", "The New Y…
## $ response.docs.multimedia <list> [<data.frame[73 x 19]>], [<data.…
## $ response.docs.keywords <list> [<data.frame[10 x 4]>], [<data.f…
## $ response.docs.pub_date <chr> "2021-06-25T15:00:12+0000", "20…
## $ response.docs.document_type <chr> "article", "article", "article"…
## $ response.docs.news_desk <chr> "Science", "OpEd", "NYTNow", "Ma…
## $ response.docs.section_name <chr> "Science", "Opinion", "Briefing"…
## $ response.docs.type_of_material <chr> "News", "Op-Ed", "briefing", "In…
## $ response.docs._id <chr> "nyt://article/56530668-e4d8-5ca…
## $ response.docs.word_count <int> 1520, 1081, 1062, 0, 6520, 13540
## $ response.docs.uri <chr> "nyt://article/56530668-e4d8-5ca…
## $ response.docs.headline.main <chr> "Discovery of ‘Dragon Man’ Skull…
## $ response.docs.headline.kicker <chr> "Matter", NA, NA, "The Health Is…
## $ response.docs.headline.content_kicker <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.headline.print_headline <chr> "Skull May Point to New Kind of …
## $ response.docs.headline.name <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.headline.seo <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.headline.sub <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.byline.original <chr> "By Carl Zimmer", "By Sarah Stew…
## $ response.docs.byline.person <list> [<data.frame[1 x 8]>], [<data.fr…
## $ response.docs.byline.organization <lgl> NA, NA, NA, NA, NA, NA
## $ response.meta.hits <int> 6, 6, 6, 6, 6, 6
## $ response.meta.offset <int> 0, 0, 0, 0, 0, 0
## $ response.meta.time <int> 24, 24, 24, 24, 24, 24
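The dotted column names in the output above (for example `response.docs.headline.main`) come from combining `flatten = TRUE` with `data.frame()`. The sketch below reproduces this on a tiny hand-written JSON string. The string is a toy shaped loosely like the real response, not actual API output, and the names `toy_json` and `toy_df` are made up for illustration.

```r
library(jsonlite)

# Toy JSON shaped loosely like an Article Search response (not real API output).
toy_json <- '{
  "status": "OK",
  "response": {
    "docs": [
      {"headline": {"main": "First headline", "kicker": null}, "word_count": 120},
      {"headline": {"main": "Second headline", "kicker": "Matter"}, "word_count": 340}
    ],
    "meta": {"hits": 2, "offset": 0}
  }
}'

# flatten = TRUE spreads the nested "headline" object into headline.* columns;
# data.frame() then prefixes each column with the list names it was nested under.
toy_df <- data.frame(fromJSON(toy_json, flatten = TRUE))

print(names(toy_df))
print(toy_df$response.docs.headline.main)
```

Scalar fields such as `status` are recycled across rows, which is why every row of the real result repeats "OK".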
reduced_results <- results %>% select(headline = response.docs.headline.main, abstract = response.docs.abstract, section = response.docs.section_name)
reduced_results %>% kbl() %>% kable_styling()
headline | abstract | section |
---|---|---|
Discovery of ‘Dragon Man’ Skull in China May Add Species to Human Family Tree | A laborer discovered the fossil and hid it in a well for 85 years. Scientists say it could help sort out the human family tree and how our species emerged. | Science |
Why Frigid Mars Is the Perfect Place to Look for Ancient Life | Our early days on Earth have almost entirely disappeared, but on Mars, the past is entombed. | Opinion |
Infrastructure, Surfside, Giuliani: Your Thursday Evening Briefing | Here’s what you need to know at the end of the day. | Briefing |
Can We Live to 200? Here’s a Roadmap | 43 advances that could radically extend life spans over the next 100 years. | Magazine |
The Science of Climate Change Explained: Facts, Evidence and Proof | Definitive answers to the big questions. | Climate |
Transcript: Ezra Klein Interviews Adam Tooze | Every Tuesday and Friday, Ezra Klein invites you into a conversation about something that matters, like today’s episode with Adam Tooze. Listen wherever you get your podcasts. | Podcasts |
baseurl <- paste0("http://api.nytimes.com/svc/search/v2/articlesearch.json?q=molecular fossil&begin_date=20180101&api-key=",api_key)
initialQuery <- fromJSON(baseurl)
# Ten results per page, pages indexed from 0: round up so a partial last page is not dropped
maxPages <- ceiling(initialQuery$response$meta$hits[1] / 10) - 1
pages <- list()
for (i in 0:maxPages) {
  nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame()
  message("Retrieving page ", i)
  pages[[i + 1]] <- nytSearch
  Sys.sleep(2)
}
## Retrieving page 0
## Retrieving page 1
## Retrieving page 2
all_results <- rbind_pages(pages)
all_reduced_results <- all_results %>% select(headline = response.docs.headline.main, abstract = response.docs.abstract, section = response.docs.section_name)
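The index of the last page to request depends on how the hit count is rounded. The sketch below checks that arithmetic in isolation, assuming ten results per page and zero-based page numbers, as the loop above does. The helper name `last_page_index` is hypothetical.

```r
# Hypothetical helper: zero-based index of the last page needed to cover `hits`
# results, assuming `per_page` results per page.
last_page_index <- function(hits, per_page = 10) {
  max(ceiling(hits / per_page) - 1, 0)
}

# Compare rounding up with rounding to nearest for a few hit counts.
hit_counts <- c(6, 10, 21, 28)
comparison <- data.frame(
  hits = hit_counts,
  round_up = sapply(hit_counts, last_page_index),
  round_nearest = round(hit_counts / 10 - 1)
)
print(comparison)
```

For 21 hits, rounding up requests pages 0 through 2, while rounding to nearest stops at page 1 and would miss the final article.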
I grouped the results by section and counted the number of articles about ‘molecular fossils’ that came from each section since 2018.
all_reduced_results %>% group_by(section) %>% count() %>% kbl() %>% kable_styling()
section | n |
---|---|
Briefing | 2 |
Climate | 2 |
Crosswords & Games | 1 |
Magazine | 3 |
Opinion | 3 |
Podcasts | 1 |
Science | 11 |
Style | 1 |
T Brand | 3 |
The Learning Network | 1 |
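The grouping and counting step can also be written with dplyr's `count()`, which groups and tallies in one call and can sort by frequency. The sketch below runs on a small made-up data frame (`toy_results`), not on the actual query results.

```r
library(dplyr)

# Toy stand-in for all_reduced_results; the section values are invented.
toy_results <- tibble(
  section = c("Science", "Opinion", "Science", "Climate", "Science")
)

# count(section) is shorthand for group_by(section) followed by a tally;
# sort = TRUE puts the most frequent section first.
section_counts <- toy_results %>% count(section, sort = TRUE)
print(section_counts)
```

Sorting by `n` makes the dominant section easier to spot than the alphabetical order in the table above.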
It’s easy to get great out-of-the-box performance with the jsonlite package when querying APIs. I also tried a couple of things that I haven’t documented here: I tried the R nytimes package and found it limiting, and I explored querying subjects with more results and found that this for-loop approach runs into trouble when a query returns more than 150 articles.
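The write-up does not say what goes wrong with larger result sets. One thing worth ruling out is a single failed request aborting the whole loop. The sketch below is an assumption about how that could be handled rather than a diagnosis. The helper `fetch_with_retry` and the stand-in `flaky_fetch` are hypothetical, and no real API call is made.

```r
# Hypothetical helper: call fetch_page(i), retrying a few times if it errors.
fetch_with_retry <- function(fetch_page, i, max_tries = 3, wait_seconds = 0) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fetch_page(i), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Page ", i, " failed on attempt ", attempt, ": ", conditionMessage(result))
    Sys.sleep(wait_seconds * attempt)  # wait longer after each failure
  }
  stop("Giving up on page ", i, " after ", max_tries, " attempts")
}

# Stand-in for fromJSON(paste0(baseurl, "&page=", i)): fails once, then succeeds.
calls <- 0
flaky_fetch <- function(i) {
  calls <<- calls + 1
  if (calls == 1) stop("simulated request failure")
  data.frame(page = i)
}

page_two <- fetch_with_retry(flaky_fetch, 2)
print(page_two)
```

In the real loop, `fetch_page` would wrap the fromJSON call and `wait_seconds` would be set to a non-zero value.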