Assignment

The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis

You’ll need to start by signing up for an API key.

Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.

Load libraries

library(httr)
library(jsonlite)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Connect to NYT API

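I connect to the NYT Books API with the GET function from httr, requesting the full-overview endpoint, which returns every current best-seller list in a single response.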

res <- GET("https://api.nytimes.com/svc/books/v3/lists/full-overview.json?api-key=wAWL9GvlAEQXqPqHPdP6g5NKswYDCt26")

Printing the response shows a status of 200, which confirms that the request succeeded.

res
## Response [https://api.nytimes.com/svc/books/v3/lists/full-overview.json?api-key=wAWL9GvlAEQXqPqHPdP6g5NKswYDCt26]
##   Date: 2022-10-30 23:54
##   Status: 200
##   Content-Type: application/json; charset=UTF-8
##   Size: 516 kB
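
The request above embeds the API key in the URL and relies on reading the printed status by eye. The following is a minimal sketch of one alternative, not the method used in this report: it assumes the key is stored in an environment variable named NYT_API_KEY (a name chosen here for illustration), and the helper names nyt_overview_url and fetch_overview are hypothetical.

```r
library(httr)

# Build the request URL. The key comes from an environment variable
# (NYT_API_KEY is an assumed name) instead of being written in the script.
nyt_overview_url <- function(api_key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(api_key)) {
    stop("No API key found; set the NYT_API_KEY environment variable.")
  }
  modify_url(
    "https://api.nytimes.com/svc/books/v3/lists/full-overview.json",
    query = list(`api-key` = api_key)
  )
}

# Send the request and fail loudly on any 4xx/5xx status.
fetch_overview <- function(api_key = Sys.getenv("NYT_API_KEY")) {
  res <- GET(nyt_overview_url(api_key))
  stop_for_status(res)  # turns an HTTP error into an R error
  res
}
```

Calling fetch_overview() would then either return the response object or stop with a message describing the HTTP error, so later steps never run on a failed request.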

Transform Data to Dataframe

In this step, I parse the JSON in the response body and convert the result into a data frame.

# Convert the raw response bytes to a JSON string, parse it with fromJSON,
# and coerce the resulting nested list to a data frame
data <- fromJSON(rawToChar(res$content)) %>%
  data.frame()
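
To make the shape of the parsed result easier to see, here is a small sketch that runs fromJSON on a hand-written JSON string. The string is a hypothetical, heavily trimmed stand-in for the real response (the titles and list names are placeholders), and it pulls out the lists element directly, which is a different route from the data.frame() call above.

```r
library(jsonlite)
library(tibble)

# Hypothetical miniature of the API response body
sample_json <- '{
  "status": "OK",
  "num_results": 3,
  "results": {
    "bestsellers_date": "2022-10-22",
    "lists": [
      {"list_id": 1, "list_name": "Fiction",
       "books": [{"rank": 1, "title": "TITLE A"},
                 {"rank": 2, "title": "TITLE B"}]},
      {"list_id": 2, "list_name": "Nonfiction",
       "books": [{"rank": 1, "title": "TITLE C"}]}
    ]
  }
}'

# fromJSON simplifies an array of objects into a data frame, so
# results$lists becomes a data frame with one row per list
parsed <- fromJSON(sample_json)
lists <- as_tibble(parsed$results$lists)

# The books column is a list column: one nested data frame per list
lists
```

Each row of lists describes one best-seller list, and its books entry holds a separate data frame of the books on that list.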

Further Cleaning

Looking at the data frame, I noticed that the last column, results.lists.books, is a list column: each of its 18 rows holds a nested data frame of 10 or 15 books. Some of the more interesting information, such as book title and author, is stored there.

data %>%
  select(14:17) %>%
  as_tibble()
## # A tibble: 18 x 4
##    results.lists.list_image results.lists.lis~ results.lists.l~ results.lists.b~
##    <lgl>                    <lgl>              <lgl>            <list>          
##  1 NA                       NA                 NA               <df [15 x 25]>  
##  2 NA                       NA                 NA               <df [15 x 25]>  
##  3 NA                       NA                 NA               <df [15 x 25]>  
##  4 NA                       NA                 NA               <df [15 x 25]>  
##  5 NA                       NA                 NA               <df [15 x 25]>  
##  6 NA                       NA                 NA               <df [15 x 25]>  
##  7 NA                       NA                 NA               <df [10 x 25]>  
##  8 NA                       NA                 NA               <df [10 x 25]>  
##  9 NA                       NA                 NA               <df [10 x 25]>  
## 10 NA                       NA                 NA               <df [10 x 25]>  
## 11 NA                       NA                 NA               <df [10 x 25]>  
## 12 NA                       NA                 NA               <df [15 x 25]>  
## 13 NA                       NA                 NA               <df [15 x 25]>  
## 14 NA                       NA                 NA               <df [10 x 25]>  
## 15 NA                       NA                 NA               <df [15 x 25]>  
## 16 NA                       NA                 NA               <df [15 x 25]>  
## 17 NA                       NA                 NA               <df [10 x 25]>  
## 18 NA                       NA                 NA               <df [10 x 25]>
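
Nested columns can also be found programmatically instead of by scanning the printed output. The sketch below uses a hypothetical two-row tibble (demo, with placeholder titles) in place of the real data.

```r
library(tibble)

# Hypothetical miniature: one ordinary column and one list column
# whose elements are data frames
demo <- tibble(
  list_name = c("Fiction", "Nonfiction"),
  books = list(
    tibble(rank = 1:2, title = c("TITLE A", "TITLE B")),
    tibble(rank = 1L, title = "TITLE C")
  )
)

# Names of every list column in the data frame
list_cols <- names(demo)[vapply(demo, is.list, logical(1))]
list_cols

# Number of rows in each nested data frame, one value per parent row
nested_rows <- vapply(demo$books, nrow, integer(1))
nested_rows
```

Summing nested_rows gives the number of rows to expect once the column is unnested, which is a quick sanity check for the next step.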

Unnest Data

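To access the information in the results.lists.books column, I use the unnest function from tidyr, which expands each nested data frame into regular rows and columns and repeats the list-level values alongside them.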

data <- unnest(data, results.lists.books)
head(data)
## # A tibble: 6 x 41
##   status copyright                 num_results results.bestsel~ results.publish~
##   <chr>  <chr>                           <int> <chr>            <chr>           
## 1 OK     Copyright (c) 2022 The N~         230 2022-10-22       2022-11-06      
## 2 OK     Copyright (c) 2022 The N~         230 2022-10-22       2022-11-06      
## 3 OK     Copyright (c) 2022 The N~         230 2022-10-22       2022-11-06      
## 4 OK     Copyright (c) 2022 The N~         230 2022-10-22       2022-11-06      
## 5 OK     Copyright (c) 2022 The N~         230 2022-10-22       2022-11-06      
## 6 OK     Copyright (c) 2022 The N~         230 2022-10-22       2022-11-06      
## # ... with 36 more variables: results.published_date_description <chr>,
## #   results.previous_published_date <chr>, results.next_published_date <chr>,
## #   results.lists.list_id <int>, results.lists.list_name <chr>,
## #   results.lists.list_name_encoded <chr>, results.lists.display_name <chr>,
## #   results.lists.updated <chr>, results.lists.list_image <lgl>,
## #   results.lists.list_image_width <lgl>,
## #   results.lists.list_image_height <lgl>, age_group <chr>, ...
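
The effect of unnest is easiest to follow on a tiny input. This sketch uses a hypothetical two-row tibble with placeholder titles, and passes the column through the cols argument.

```r
library(tidyr)
library(tibble)

# Hypothetical miniature: each list carries its own data frame of books
lists <- tibble(
  list_name = c("Fiction", "Nonfiction"),
  books = list(
    tibble(rank = 1:2, title = c("TITLE A", "TITLE B")),
    tibble(rank = 1L, title = "TITLE C")
  )
)

# One output row per book; list_name is repeated for each book in a list
books <- unnest(lists, cols = books)
books
```

The two parent rows become three book rows, and the nested columns rank and title sit next to list_name as ordinary columns. The same mechanism turns the 18 lists in the real data into 230 book rows.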

Select data of interest

Lastly, I select the columns of interest, such as title and author. If I were to do an analysis, I would work with the columns in this reduced data set.

# Keep 12 of the 41 columns, chosen by position, in display order
data <- data %>%
  select(c(35,36,40,38,20,26,34,12,5,7,4,13)) %>%
  as_tibble()
data
## # A tibble: 230 x 12
##     rank rank_last_week weeks_on_list title         author contributor publisher
##    <int>          <int>         <int> <chr>         <chr>  <chr>       <chr>    
##  1     1              0             1 IT STARTS WI~ Colle~ by Colleen~ Atria    
##  2     2              0             1 THE BOYS FRO~ John ~ by John Gr~ Doubleday
##  3     3              4            71 IT ENDS WITH~ Colle~ by Colleen~ Atria    
##  4     4              3            46 VERITY        Colle~ by Colleen~ Grand Ce~
##  5     5              0             1 DEMON COPPER~ Barba~ by Barbara~ Harper   
##  6     6              1             2 LONG SHADOWS  David~ by David B~ Grand Ce~
##  7     7              7            41 UGLY LOVE     Colle~ by Colleen~ Atria    
##  8     8              5             7 FAIRY TALE    Steph~ by Stephen~ Scribner 
##  9     9              6           183 WHERE THE CR~ Delia~ by Delia O~ Putnam   
## 10    10              8             3 MAD HONEY     Jodi ~ by Jodi Pi~ Ballanti~
## # ... with 220 more rows, and 5 more variables:
## #   results.lists.display_name <chr>, results.published_date <chr>,
## #   results.previous_published_date <chr>, results.bestsellers_date <chr>,
## #   results.lists.updated <chr>
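
Selecting by position works for this snapshot but would break silently if the API added or reordered fields. As a hedged alternative, the sketch below selects by name on a hypothetical miniature of the unnested data. The column names are taken from the output above, while the values are placeholders.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of the unnested data (placeholder values)
books <- tibble(
  results.bestsellers_date = "2022-10-22",
  title = c("TITLE A", "TITLE B"),
  author = c("Author A", "Author B"),
  rank = 1:2,
  weeks_on_list = c(1L, 71L)
)

# Columns to keep, by name; all_of() raises an error if one is missing
wanted <- c("rank", "weeks_on_list", "title", "author")

selected <- books %>%
  select(all_of(wanted), bestsellers_date = results.bestsellers_date)

selected
```

Because all_of() fails when a requested column is absent, a change in the API response surfaces as an error here instead of as a wrong column further down the analysis. The rename inside select also drops the long results. prefix in the same step.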