Data607 HW9 WilliamAiken

Method

1. Created an account with The New York Times and requested an API key

2. Loaded libraries and used the keyring package to save my API key into my environment

library(jsonlite)
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()  masks stats::filter()
## x purrr::flatten() masks jsonlite::flatten()
## x dplyr::lag()     masks stats::lag()

library(keyring)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

first_time <- FALSE


# If this is the very first time you are running this script, save your API key with keyring
if (first_time) {
  key_set_with_value(service = "NYT api", password = "YOUR_API_KEY_GOES_HERE")
}
api_key <- key_get("NYT api")
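
As a hedged variation on the step above, the key lookup could be wrapped in a small helper that checks an environment variable first and falls back to keyring only if it is unset. The helper name `get_nyt_key` and the environment variable `NYT_API_KEY` are assumptions for illustration; they are not part of the original script.

```r
# Hypothetical helper: look up the API key without hard-coding it.
# `NYT_API_KEY` is an assumed environment variable name.
get_nyt_key <- function(service = "NYT api", env_var = "NYT_API_KEY") {
  key <- Sys.getenv(env_var, unset = "")
  if (nzchar(key)) {
    return(key)
  }
  if (!requireNamespace("keyring", quietly = TRUE)) {
    stop("Set ", env_var, " or install the keyring package.")
  }
  # Fall back to the secret stored under `service` in the system keyring
  keyring::key_get(service)
}
```

With this sketch, `api_key <- get_nyt_key()` would behave like the `key_get()` call above on a machine where the key was already saved with keyring.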

3. I used an article by Jonathan D Fitzgerald on storybench.org to get started with the jsonlite package

link

I used the paste0 function to concatenate my API key into my query to the NYT API
The fromJSON and data.frame functions do the heavy lifting of converting the JSON response into a data frame
I queried articles related to ‘molecular fossils’ (organic compounds in the fossil record that are derived from once-living organisms) published since the beginning of 2021
I used the NYT Article Search API (the articlesearch.json endpoint)

# Let's connect and search the Article Search API for articles related to molecular fossils
results <- fromJSON(paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=molecular%20fossil&begin_date=20210101&api-key=", api_key), flatten = TRUE) %>% data.frame()
glimpse(results)

## Rows: 6
## Columns: 32
## $ status                                <chr> "OK", "OK", "OK", "OK", "OK", "O…
## $ copyright                             <chr> "Copyright (c) 2021 The New York…
## $ response.docs.abstract                <chr> "A laborer discovered the fossil…
## $ response.docs.web_url                 <chr> "https://www.nytimes.com/2021/06…
## $ response.docs.snippet                 <chr> "A laborer discovered the fossil…
## $ response.docs.lead_paragraph          <chr> "Scientists on Friday announced …
## $ response.docs.print_section           <chr> "A", "A", NA, "MM", "D", NA
## $ response.docs.print_page              <chr> "1", "27", NA, "49", "2", NA
## $ response.docs.source                  <chr> "The New York Times", "The New Y…
## $ response.docs.multimedia              <list> [<data.frame[73 x 19]>], [<data.…
## $ response.docs.keywords                <list> [<data.frame[10 x 4]>], [<data.f…
## $ response.docs.pub_date                <chr> "2021-06-25T15:00:12+0000", "20…
## $ response.docs.document_type           <chr> "article", "article", "article"…
## $ response.docs.news_desk               <chr> "Science", "OpEd", "NYTNow", "Ma…
## $ response.docs.section_name            <chr> "Science", "Opinion", "Briefing"…
## $ response.docs.type_of_material        <chr> "News", "Op-Ed", "briefing", "In…
## $ response.docs._id                     <chr> "nyt://article/56530668-e4d8-5ca…
## $ response.docs.word_count              <int> 1520, 1081, 1062, 0, 6520, 13540
## $ response.docs.uri                     <chr> "nyt://article/56530668-e4d8-5ca…
## $ response.docs.headline.main           <chr> "Discovery of ‘Dragon Man’ Skull…
## $ response.docs.headline.kicker         <chr> "Matter", NA, NA, "The Health Is…
## $ response.docs.headline.content_kicker <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.headline.print_headline <chr> "Skull May Point to New Kind of …
## $ response.docs.headline.name           <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.headline.seo            <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.headline.sub            <lgl> NA, NA, NA, NA, NA, NA
## $ response.docs.byline.original         <chr> "By Carl Zimmer", "By Sarah Stew…
## $ response.docs.byline.person           <list> [<data.frame[1 x 8]>], [<data.fr…
## $ response.docs.byline.organization     <lgl> NA, NA, NA, NA, NA, NA
## $ response.meta.hits                    <int> 6, 6, 6, 6, 6, 6
## $ response.meta.offset                  <int> 0, 0, 0, 0, 0, 0
## $ response.meta.time                    <int> 24, 24, 24, 24, 24, 24
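
The query string used above can also be assembled piece by piece, which makes it easier to encode the space in the search term. This is a minimal sketch; the helper name `build_search_url` is hypothetical, and the `https` scheme is an assumption based on how the NYT developer site presents the endpoint.

```r
# Hypothetical helper: assemble an Article Search URL with an encoded query.
build_search_url <- function(query, begin_date, api_key) {
  paste0(
    "https://api.nytimes.com/svc/search/v2/articlesearch.json",
    "?q=", utils::URLencode(query, reserved = TRUE),  # encodes the space as %20
    "&begin_date=", begin_date,
    "&api-key=", api_key
  )
}

url <- build_search_url("molecular fossil", "20210101", "YOUR_API_KEY_GOES_HERE")
```

The resulting `url` can then be passed to `fromJSON(url, flatten = TRUE)` in the same way as the pasted string.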

4. I selected the columns I was interested in after inspecting the data frame with ‘glimpse’ (I also really like ‘names()’)

I kept the headline, abstract and section columns in the data frame

reduced_results <- results %>% select(headline = response.docs.headline.main, abstract = response.docs.abstract, section = response.docs.section_name)

reduced_results %>% kbl() %>% kable_styling()

headline	abstract	section
Discovery of ‘Dragon Man’ Skull in China May Add Species to Human Family Tree	A laborer discovered the fossil and hid it in a well for 85 years. Scientists say it could help sort out the human family tree and how our species emerged.	Science
Why Frigid Mars Is the Perfect Place to Look for Ancient Life	Our early days on Earth have almost entirely disappeared, but on Mars, the past is entombed.	Opinion
Infrastructure, Surfside, Giuliani: Your Thursday Evening Briefing	Here’s what you need to know at the end of the day.	Briefing
Can We Live to 200? Here’s a Roadmap	43 advances that could radically extend life spans over the next 100 years.	Magazine
The Science of Climate Change Explained: Facts, Evidence and Proof	Definitive answers to the big questions.	Climate
Transcript: Ezra Klein Interviews Adam Tooze	Every Tuesday and Friday, Ezra Klein invites you into a conversation about something that matters, like today’s episode with Adam Tooze. Listen wherever you get your podcasts.	Podcasts
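
To show where dotted column names such as `headline.main` come from, here is a small offline sketch that flattens made-up JSON and then selects and renames columns as in step 4. The JSON below is invented for illustration and only loosely mimics the shape of a real response.

```r
library(jsonlite)
library(dplyr)

# Made-up JSON in the same general shape as an Article Search response
toy_json <- '{
  "response": {
    "docs": [
      {"abstract": "First abstract", "section_name": "Science",
       "headline": {"main": "First headline"}},
      {"abstract": "Second abstract", "section_name": "Opinion",
       "headline": {"main": "Second headline"}}
    ]
  }
}'

# flatten = TRUE turns the nested `headline` object into a `headline.main` column
docs <- fromJSON(toy_json, flatten = TRUE)$response$docs

toy_reduced <- docs %>%
  select(headline = headline.main, abstract, section = section_name)
```

Here `select()` both keeps and renames the columns in one step, which is the same pattern used for `reduced_results`.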

5. The NYT API returns only 10 results per request. With the jsonlite package it is possible to iteratively pull all results, 10 at a time

Saved the query as a string (this query widens the search by setting begin_date to 20180101)
Calculated how many pages to request by dividing the number of hits by 10 (this is Jonathan D Fitzgerald’s method)

baseurl <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=molecular%20fossil&begin_date=20180101&api-key=", api_key)


initialQuery <- fromJSON(baseurl)
# Pages are zero-indexed; ceiling() keeps a partial final page that round() could drop
maxPages <- ceiling(initialQuery$response$meta$hits[1] / 10) - 1
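
The page arithmetic can be checked on its own. The sketch below computes the zero-based index of the last page for a given hit count, using `ceiling()` so that a partial final page is still counted. The helper name `last_page_index` is hypothetical.

```r
# Hypothetical helper: zero-based index of the last page for a given hit count
last_page_index <- function(hits, per_page = 10) {
  ceiling(hits / per_page) - 1
}

last_page_index(6)   # one page: index 0
last_page_index(28)  # three pages: indices 0, 1, 2
last_page_index(30)  # exactly three full pages: last index is still 2
```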

6. Next, save all the ‘pages’ in a list built by iterating over a for loop (this is Jonathan D Fitzgerald’s method)

Used the rbind_pages function to combine the pages into one data frame
Reduced the data frame down to the columns of interest

pages <- list()
for(i in 0:maxPages){
  nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame() 
  message("Retrieving page ", i)
  pages[[i+1]] <- nytSearch 
  Sys.sleep(2) 
}

## Retrieving page 0

## Retrieving page 1

## Retrieving page 2

all_results <- rbind_pages(pages)

all_reduced_results <- all_results %>% select(headline = response.docs.headline.main, abstract = response.docs.abstract, section = response.docs.section_name)
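
The loop-and-combine pattern above can be tried without calling the API by passing in the fetching step as a function. This is only a sketch: `collect_pages` and `fake_fetch` are hypothetical names, and the rows are fabricated toy data rather than NYT results.

```r
library(jsonlite)

# Hypothetical helper: `fetch_page` is any function that takes a page index
# and returns a data frame, so the loop can be tested offline.
collect_pages <- function(fetch_page, max_page, pause = 0) {
  pages <- vector("list", max_page + 1)
  for (i in 0:max_page) {
    pages[[i + 1]] <- fetch_page(i)
    Sys.sleep(pause)  # pause between requests, as in the loop above
  }
  rbind_pages(pages)
}

# Stand-in fetcher that fabricates two rows per page (toy data)
fake_fetch <- function(i) {
  data.frame(page = i, headline = paste("Headline", i, 1:2))
}

toy_all <- collect_pages(fake_fetch, max_page = 2)
nrow(toy_all)
```

For the real query, `fetch_page` could be a function wrapping `fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame()` with `pause = 2`.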

William Aiken

10/24/2021

Introduction

Results

Conclusion

section	n
Briefing	2
Climate	2
Crosswords & Games	1
Magazine	3
Opinion	3
Podcasts	1
Science	11
Style	1
T Brand	3
The Learning Network	1
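
The code that produced the section counts is not shown. One plausible way to get a table of this shape, assuming dplyr's `count()` was applied to the `section` column, is sketched below on made-up rows (`toy_results` is a stand-in, not the real `all_reduced_results`).

```r
library(dplyr)

# Toy stand-in for `all_reduced_results` (made-up rows, not the real data)
toy_results <- data.frame(
  headline = c("A", "B", "C", "D"),
  section  = c("Science", "Opinion", "Science", "Climate")
)

# One row per section with the number of articles in column `n`
section_counts <- toy_results %>% count(section)
```

Applied to the real data, this would be `all_reduced_results %>% count(section)`, optionally followed by `kbl() %>% kable_styling()` for display.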