This assignment is to choose one of the NY Times APIs, construct an interface in R to read in the JSON data, and transform the data into an R data frame.
The NY Times Newswire API provides an up-to-the-minute stream of published articles. Usage requires an authorized API access key, which is obtained by completing a registration form.
The registered API key is stored in a .csv file and loaded at script run time.
An HTTP GET query for all articles is then sent to the Newswire API.
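# query NYT for all articles
library(httr)      # GET(), http_status()
library(stringr)   # str_c()
library(jsonlite)  # fromJSON(), flatten()
library(dplyr)     # select(), arrange(), %>%
# the API key is kept out of the script in a .csv file; take its first cell as a string
key_file <- "D607_A09_API.csv"
nyt_API_key <- as.character(read.csv(file=key_file, header=TRUE, sep=",", stringsAsFactors=FALSE)[[1, 1]])
nyt_URL_all <- "https://api.nytimes.com/svc/news/v3/content/all/all.json?api-key="
nyt_URL_wAPIkey <- str_c(nyt_URL_all, nyt_API_key)
nyt_r <- GET(nyt_URL_wAPIkey)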
# check response code
http_status(nyt_r)
## $category
## [1] "Success"
##
## $reason
## [1] "OK"
##
## $message
## [1] "Success: (200) OK"
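The request, status check, and JSON parsing can also be wrapped in one function so that a failed request stops the script instead of passing an error body downstream. The following is a minimal sketch, not the script's own method: the helper name `fetch_nyt_json` is hypothetical, and it assumes the response body is UTF-8 encoded JSON. Decoding the body explicitly as UTF-8 is one way to keep punctuation such as curly apostrophes intact.

```r
library(httr)
library(jsonlite)

# Hypothetical helper (not part of the original script): request a URL,
# stop on any HTTP error status, and parse the JSON body.
fetch_nyt_json <- function(url) {
  resp <- GET(url)
  # stop_for_status() raises an R error for any 4xx or 5xx response.
  stop_for_status(resp)
  # Assumption: the body is UTF-8 text; decode it explicitly before parsing.
  body <- content(resp, as = "text", encoding = "UTF-8")
  fromJSON(body)
}
```

With the URL built above, a call would look like `nyt_json <- fetch_nyt_json(nyt_URL_wAPIkey)`.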
After a successful API query, the raw JSON data is transformed into an R data frame containing 20 observations of 26 variables.
The data frame is then subset to section name, article title, byline, and creation date, arranged by section and title, and reported.
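The transformation code for this first query is not shown in the report. The sketch below assumes it mirrors the jsonlite steps used later for the section list query. To stay runnable without an API key, it parses a small inline JSON string; that string is a hypothetical, heavily trimmed stand-in for a real Newswire response.

```r
library(jsonlite)

# Hypothetical, trimmed stand-in for a Newswire response body;
# a real response carries many more fields per article.
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"section": "Arts",   "title": "First title",  "byline": "BY A"},
    {"section": "Sports", "title": "Second title", "byline": "BY B"}
  ]
}'

# Parse the JSON, then coerce the parsed list to a data frame.
# Top-level scalars (status, num_results) are repeated on every row, and
# the columns of the nested results table get a "results." prefix.
nyt_json <- fromJSON(sample_json)
nyt_df <- jsonlite::flatten(as.data.frame(nyt_json))

dim(nyt_df)    # rows x columns
names(nyt_df)  # column names, e.g. results.section
```

This prefixing is why the reported column names all begin with `results.` apart from the top-level fields.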
## [1] 20 26
## [1] "status" "copyright"
## [3] "num_results" "results.slug_name"
## [5] "results.section" "results.subsection"
## [7] "results.title" "results.abstract"
## [9] "results.url" "results.byline"
## [11] "results.thumbnail_standard" "results.item_type"
## [13] "results.source" "results.updated_date"
## [15] "results.created_date" "results.published_date"
## [17] "results.first_published_date" "results.material_type_facet"
## [19] "results.kicker" "results.subheadline"
## [21] "results.des_facet" "results.org_facet"
## [23] "results.per_facet" "results.geo_facet"
## [25] "results.related_urls" "results.multimedia"
nyt_subset_df <- nyt_df %>%
  select(results.section, results.title, results.byline, results.created_date) %>%
  arrange(results.section, results.title)
tibble(nyt_subset_df)
## # A tibble: 20 x 4
## results.section results.title results.byline results.created_~
## <chr> <chr> <chr> <chr>
## 1 Arts Jerry Jeff Walker, Who ~ "BY BILL FRISKICS~ 2020-10-24T16:39~
## 2 Briefing pmvote "" 2020-10-24T16:44~
## 3 Opinion A Photographer’s Amer~ "BY AN-MY LÊ" 2020-10-24T17:04~
## 4 Sports At heavyweight, bigger ~ "BY MORGAN CAMPBE~ 2020-10-24T15:47~
## 5 Sports Early Virus Scares Kept~ "BY TYLER KEPNER" 2020-10-24T17:59~
## 6 Sports Nurmagomedov announced ~ "BY MORGAN CAMPBE~ 2020-10-24T17:18~
## 7 Sports Nurmagomedov beat Gaeth~ "BY MORGAN CAMPBE~ 2020-10-24T16:55~
## 8 Sports Round 1: Nurmagomedov p~ "BY MORGAN CAMPBE~ 2020-10-24T16:50~
## 9 Sports Volkov dropped Harris w~ "BY MORGAN CAMPBE~ 2020-10-24T15:54~
## 10 Sports Whittaker took a unanim~ "BY MORGAN CAMPBE~ 2020-10-24T16:31~
## 11 Theater Edith O’Hara, a Fixtu~ "BY NEIL GENZLING~ 2020-10-24T16:48~
## 12 U.S. ‘Florida Man wouldn’~ "BY GLENN THRUSH ~ 2020-10-24T17:52~
## 13 U.S. Enthusiastic voters are~ "BY MICHAEL LEVEN~ 2020-10-24T16:12~
## 14 U.S. Iowa: This presidential~ "BY LUKE BROADWAT~ 2020-10-24T17:00~
## 15 U.S. Murkowski, in a turnabo~ "BY NICHOLAS FAND~ 2020-10-24T16:00~
## 16 U.S. One of the Parkland sho~ "BY EVAN NICOLE B~ 2020-10-24T17:14~
## 17 U.S. Virus Surge Shadows Tru~ "BY SHANE GOLDMAC~ 2020-10-24T15:31~
## 18 World Poland’s president te~ "BY MONIKA PRONCZ~ 2020-10-24T16:04~
## 19 World The Czech Republic’s ~ "BY HANA DE GOEIJ" 2020-10-24T15:56~
## 20 World Venezuela Opposition Fi~ "BY ANATOLY KURMA~ 2020-10-24T17:12~
Next, the list of NY Times section names is queried. This list can be used in future section-specific queries.
# Query NYT section names then tidy
nyt_URL_section <- c("https://api.nytimes.com/svc/news/v3/content/section-list.json?api-key=")
nyt_URL_wAPIkey <- str_c(nyt_URL_section,nyt_API_key)
nyt_r <- GET(nyt_URL_wAPIkey)
# check response code
http_status(nyt_r)
## $category
## [1] "Success"
##
## $reason
## [1] "OK"
##
## $message
## [1] "Success: (200) OK"
# transform data
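# read the raw response body, parse the JSON, and flatten to a data frame
nyt_query <- httr::content(nyt_r, as="raw")
nyt_json <- jsonlite::fromJSON(rawToChar(nyt_query))
nyt_sec_df <- jsonlite::flatten(as.data.frame(nyt_json))
# keep the section key and its display name, sorted by section
nyt_subset_df <- nyt_sec_df %>%
  select(results.section, results.display_name) %>%
  arrange(results.section)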
tibble(nyt_subset_df)
## # A tibble: 50 x 2
## results.section results.display_name
## <chr> <chr>
## 1 admin Admin
## 2 arts Arts
## 3 automobiles Automobiles
## 4 books Books
## 5 briefing Briefing
## 6 business Business
## 7 climate Climate
## 8 corrections Corrections
## 9 crosswords & games Crosswords & Games
## 10 education Education
## # ... with 40 more rows
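The section values listed above could feed a section-specific query. The sketch below is illustrative only. The helper `build_section_url` is hypothetical, and the path layout `.../content/{source}/{section}.json` is an assumption inferred from the `all/all.json` URL used earlier, so it should be checked against the API documentation. Section names are percent-encoded because some, such as "crosswords & games", contain spaces and an ampersand.

```r
# Hypothetical helper: build a Newswire URL for a single section.
# Assumption: the path layout is .../content/{source}/{section}.json.
build_section_url <- function(section, api_key, source = "all") {
  base <- "https://api.nytimes.com/svc/news/v3/content"
  # Percent-encode the section so spaces and "&" do not break the URL.
  section_enc <- utils::URLencode(section, reserved = TRUE)
  paste0(base, "/", source, "/", section_enc, ".json?api-key=", api_key)
}

# "YOUR_API_KEY" is a placeholder, not a real key.
build_section_url("sports", "YOUR_API_KEY")
build_section_url("crosswords & games", "YOUR_API_KEY")
```

The resulting string can be passed to `GET()` in the same way as the earlier query URLs.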
In this assignment, the following tasks were performed:
- an online registration form was completed to obtain an authorized NY Times API key for query access,
- the API key was stored in a .csv file and read at script run time,
- an HTTP API query was sent to the NY Times Newswire, limited to 20 results (the default),
- the results were received in JSON format and transformed into a data frame,
- several columns were selected and arranged by article section and title for reporting, and
- an additional query for all section names was run, and the results were arranged by section for reporting.
In conclusion, this working API script can be used to create more specific queries. For example, articles whose titles or abstracts contain COVID or flu could be extracted for further data analysis and text mining.
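As a rough illustration of such a keyword filter, the sketch below runs on a small made-up data frame rather than live API results. The rows are hypothetical and only the column names follow the flattened Newswire data frame. It flags rows whose title or abstract mentions "covid" or the whole word "flu", ignoring case.

```r
# Hypothetical stand-in for the flattened Newswire data frame.
articles <- data.frame(
  results.title = c("Virus Surge Shadows Campaign",
                    "Flu shots in demand",
                    "Heavyweight bout"),
  results.abstract = c("Covid cases rise in several states.",
                       "Pharmacies report shortages.",
                       "A unanimous decision."),
  stringsAsFactors = FALSE
)

# Case-insensitive match on "covid" or the whole word "flu";
# the word boundaries keep words such as "influence" from matching.
pattern <- "covid|\\bflu\\b"
hits <- grepl(pattern, articles$results.title, ignore.case = TRUE, perl = TRUE) |
  grepl(pattern, articles$results.abstract, ignore.case = TRUE, perl = TRUE)

# Keep only the matching articles.
articles[hits, ]
```

Matching on both columns catches articles where the keyword appears only in the abstract, as in the first made-up row.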