The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis
You’ll need to start by signing up for an API key.
Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.
library(httr)
library(jsonlite)
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
I connected to the NYT Books API using the GET function.
res = GET("https://api.nytimes.com/svc/books/v3/lists/full-overview.json?api-key=wAWL9GvlAEQXqPqHPdP6g5NKswYDCt26")
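Hard-coding the key in the URL means it ends up in the script and in the printed response. A minimal sketch of one alternative is shown here, assuming the key is stored in an environment variable; the variable name NYT_API_KEY and the helper nyt_books_url() are hypothetical, not part of the NYT API or of httr.

```r
library(httr)

# Hypothetical helper: builds the request URL from a key kept outside the script.
# NYT_API_KEY is an assumed environment variable name (e.g. set in ~/.Renviron).
nyt_books_url <- function(api_key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(api_key)) {
    stop("Set the NYT_API_KEY environment variable first.")
  }
  # modify_url() appends the key as the `api-key` query parameter
  modify_url(
    "https://api.nytimes.com/svc/books/v3/lists/full-overview.json",
    query = list(`api-key` = api_key)
  )
}

# Usage (needs network access and a valid key):
# res <- GET(nyt_books_url())
```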
The 200 status in the response below confirms that the request succeeded.
res
## Response [https://api.nytimes.com/svc/books/v3/lists/full-overview.json?api-key=wAWL9GvlAEQXqPqHPdP6g5NKswYDCt26]
## Date: 2022-10-30 23:54
## Status: 200
## Content-Type: application/json; charset=UTF-8
## Size: 516 kB
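Reading the status from the printed response works interactively, but a script can also check it in code. The sketch below uses httr's http_error() and status_code(); the wrapper require_success() is a hypothetical name introduced only for illustration.

```r
library(httr)

# Hypothetical helper: stop early unless the request succeeded.
# `res` can be a response from GET() or, for a quick check, an integer status code.
require_success <- function(res) {
  if (http_error(res)) {  # TRUE for 4xx and 5xx status codes
    stop("Request failed with status ", status_code(res))
  }
  invisible(res)
}

require_success(200L)  # passes silently
```

With a real response this would be called as require_success(res) before parsing the body.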
In this step, I turn the data into a data frame.
# convert the raw response body to a JSON string, parse it, and coerce the result to a data frame
data = fromJSON(rawToChar(res$content)) %>%
  data.frame()
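To see how fromJSON() treats nested arrays, here is a small sketch on a hand-written JSON string. The structure loosely mirrors the results.lists.books nesting, but the field values are made up and are not taken from the API response.

```r
library(jsonlite)

# Made-up JSON with the same kind of nesting: lists that each contain books
json <- '{"status": "OK", "results": {"lists": [
  {"list_id": 1, "books": [{"title": "Book A", "rank": 1},
                           {"title": "Book B", "rank": 2}]},
  {"list_id": 2, "books": [{"title": "Book C", "rank": 1}]}
]}}'

parsed <- fromJSON(json)

# An array of objects is simplified to a data frame; the inner "books"
# arrays become a list-column holding one data frame per list
lists <- parsed$results$lists
str(lists, max.level = 1)
```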
Looking at the data frame, I noticed that the last column, results.lists.books, holds a nested data frame for each list. Some of the more interesting information, such as book title and author, is in there.
data %>%
  select(14:17) %>%
  as_tibble()
## # A tibble: 18 x 4
## results.lists.list_image results.lists.lis~ results.lists.l~ results.lists.b~
## <lgl> <lgl> <lgl> <list>
## 1 NA NA NA <df [15 x 25]>
## 2 NA NA NA <df [15 x 25]>
## 3 NA NA NA <df [15 x 25]>
## 4 NA NA NA <df [15 x 25]>
## 5 NA NA NA <df [15 x 25]>
## 6 NA NA NA <df [15 x 25]>
## 7 NA NA NA <df [10 x 25]>
## 8 NA NA NA <df [10 x 25]>
## 9 NA NA NA <df [10 x 25]>
## 10 NA NA NA <df [10 x 25]>
## 11 NA NA NA <df [10 x 25]>
## 12 NA NA NA <df [15 x 25]>
## 13 NA NA NA <df [15 x 25]>
## 14 NA NA NA <df [10 x 25]>
## 15 NA NA NA <df [15 x 25]>
## 16 NA NA NA <df [15 x 25]>
## 17 NA NA NA <df [10 x 25]>
## 18 NA NA NA <df [10 x 25]>
To access the information in the results.lists.books column, I use the unnest function from tidyr, which expands each nested data frame into one row per book (230 rows across the 18 lists).
data <- unnest(data, results.lists.books)
head(data)
## # A tibble: 6 x 41
## status copyright num_results results.bestsel~ results.publish~
## <chr> <chr> <int> <chr> <chr>
## 1 OK Copyright (c) 2022 The N~ 230 2022-10-22 2022-11-06
## 2 OK Copyright (c) 2022 The N~ 230 2022-10-22 2022-11-06
## 3 OK Copyright (c) 2022 The N~ 230 2022-10-22 2022-11-06
## 4 OK Copyright (c) 2022 The N~ 230 2022-10-22 2022-11-06
## 5 OK Copyright (c) 2022 The N~ 230 2022-10-22 2022-11-06
## 6 OK Copyright (c) 2022 The N~ 230 2022-10-22 2022-11-06
## # ... with 36 more variables: results.published_date_description <chr>,
## # results.previous_published_date <chr>, results.next_published_date <chr>,
## # results.lists.list_id <int>, results.lists.list_name <chr>,
## # results.lists.list_name_encoded <chr>, results.lists.display_name <chr>,
## # results.lists.updated <chr>, results.lists.list_image <lgl>,
## # results.lists.list_image_width <lgl>,
## # results.lists.list_image_height <lgl>, age_group <chr>, ...
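The effect of unnest() is easier to see on a toy example. The sketch below builds a two-row tibble with a nested books column and flattens it; all names and values are placeholders, not data from the API.

```r
library(dplyr)
library(tidyr)

# Toy data: one row per list, with the books nested as a list-column
lists <- tibble(
  list_name = c("List 1", "List 2"),
  books = list(
    tibble(title = c("Book A", "Book B"), rank = 1:2),
    tibble(title = "Book C", rank = 1L)
  )
)

# unnest() repeats the outer columns once per nested row
flat <- unnest(lists, books)
flat
```

The result has one row per book, with the list-level columns repeated alongside the book-level ones.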
Lastly, I selected the columns of interest, such as title and author. If I were to do an analysis, I would work with the columns in this new data frame.
data <- data %>%
  select(c(35, 36, 40, 38, 20, 26, 34, 12, 5, 7, 4, 13)) %>%
  as_tibble()
data
## # A tibble: 230 x 12
## rank rank_last_week weeks_on_list title author contributor publisher
## <int> <int> <int> <chr> <chr> <chr> <chr>
## 1 1 0 1 IT STARTS WI~ Colle~ by Colleen~ Atria
## 2 2 0 1 THE BOYS FRO~ John ~ by John Gr~ Doubleday
## 3 3 4 71 IT ENDS WITH~ Colle~ by Colleen~ Atria
## 4 4 3 46 VERITY Colle~ by Colleen~ Grand Ce~
## 5 5 0 1 DEMON COPPER~ Barba~ by Barbara~ Harper
## 6 6 1 2 LONG SHADOWS David~ by David B~ Grand Ce~
## 7 7 7 41 UGLY LOVE Colle~ by Colleen~ Atria
## 8 8 5 7 FAIRY TALE Steph~ by Stephen~ Scribner
## 9 9 6 183 WHERE THE CR~ Delia~ by Delia O~ Putnam
## 10 10 8 3 MAD HONEY Jodi ~ by Jodi Pi~ Ballanti~
## # ... with 220 more rows, and 5 more variables:
## # results.lists.display_name <chr>, results.published_date <chr>,
## # results.previous_published_date <chr>, results.bestsellers_date <chr>,
## # results.lists.updated <chr>
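Selecting by position depends on the column order the API happens to return. As a hedged alternative, the same kind of selection can be written with column names; the sketch below uses a toy tibble with placeholder values and a few of the column names shown in the output above.

```r
library(dplyr)

# Toy stand-in for the unnested data; the values are placeholders
books <- tibble(
  status = "OK",
  rank = 1:2,
  title = c("Book A", "Book B"),
  author = c("Author X", "Author Y"),
  results.lists.display_name = "Example List"
)

# all_of() raises an error if any listed column is missing
wanted <- c("rank", "title", "author", "results.lists.display_name")
selected <- books %>% select(all_of(wanted))
selected
```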