In this assignment, we choose one of the New York Times APIs, build an interface in R that reads its JSON data, and transform the result into an R data frame. The New York Times website provides a set of APIs, documented at http://developer.nytimes.com/docs.
library(kableExtra) # manipulate table styles
library(dplyr)
library(httr)
library(rtimes)
library(ggplot2)
We use the function as_search from the rtimes package to query the Article Search API. As the output below shows, it returns a list with the elements copyright, meta, data and facets, where data is a tibble with one row per article found.
The API key can be stored in your .Renviron file, which R reads on startup, so we don’t have to enter the key for each function call. Here, the key is instead read from a local text file and set as the environment variable NYTIMES_AS_KEY for the current session with Sys.setenv.
info <- read.table("NYT_API_Key.txt", header = TRUE, stringsAsFactors = FALSE)
names(info)
## [1] "Key"
dim(info)
## [1] 1 1
key <- info$Key
Sys.setenv(NYTIMES_AS_KEY = key)
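As a minimal sketch of the environment-variable approach described above, the key can be read back with Sys.getenv and checked before any request is made. The helper name get_nyt_key and the variable NYTIMES_DEMO_KEY are hypothetical and used only for illustration, so that the real key set above is not overwritten.

```r
# Hypothetical helper: fetch a key from the environment and fail early
# with a clear message if it has not been set.
get_nyt_key <- function(var = "NYTIMES_AS_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var,
         " is not set; add it to .Renviron or call Sys.setenv().")
  }
  key
}

# Illustration only: a placeholder value under a separate variable name.
Sys.setenv(NYTIMES_DEMO_KEY = "placeholder-key")
key_check <- get_nyt_key("NYTIMES_DEMO_KEY")
```

Failing early in this way tends to give a clearer error than an authentication failure returned by the API itself.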
In this example, the Article Search API is queried with q = "Brazil" for articles published between October 10 and October 26, 2018.
resp <- as_search(q="Brazil", begin_date = "20181010", end_date = '20181026')
resp
## $copyright
## [1] "Copyright (c) 2015 The New York Times Company. All Rights Reserved."
##
## $meta
## # A tibble: 1 x 3
## hits offset time
## <int> <int> <int>
## 1 93 0 27
##
## $data
## # A tibble: 10 x 25
## web_url snippet print_page source multimedia keywords pub_date
## * <chr> <chr> <chr> <chr> <list> <list> <chr>
## 1 https:… Jair B… 22 The N… <data.fra… <data.f… 2018-10…
## 2 https:… With m… 1 The N… <data.fra… <data.f… 2018-10…
## 3 https:… "Some … <NA> The N… <data.fra… <data.f… 2018-10…
## 4 https:… “He so… 9 The N… <data.fra… <data.f… 2018-10…
## 5 https:… The co… 4 The N… <data.fra… <data.f… 2018-10…
## 6 https:… Neithe… <NA> The N… <data.fra… <data.f… 2018-10…
## 7 https:… Jair B… <NA> The N… <data.fra… <data.f… 2018-10…
## 8 https:… Facebo… 1 The N… <data.fra… <data.f… 2018-10…
## 9 https:… The mu… 8 The N… <data.fra… <data.f… 2018-10…
## 10 https:… Brazil… <NA> AP <data.fra… <data.f… 2018-10…
## # ... with 18 more variables: document_type <chr>, news_desk <chr>,
## # type_of_material <chr>, `_id` <chr>, word_count <int>, score <dbl>,
## # uri <chr>, section_name <chr>, headline.main <chr>,
## # headline.kicker <chr>, headline.content_kicker <chr>,
## # headline.print_headline <chr>, headline.name <lgl>,
## # headline.seo <lgl>, headline.sub <lgl>, byline.original <chr>,
## # byline.person <list>, byline.organization <chr>
##
## $facets
## NULL
class(resp$data)
## [1] "tbl_df" "tbl" "data.frame"
glimpse(resp$data)
## Observations: 10
## Variables: 25
## $ web_url <chr> "https://www.nytimes.com/2018/10/21/op...
## $ snippet <chr> "Jair Bolsonaro, the blustery hard-rig...
## $ print_page <chr> "22", "1", NA, "9", "4", NA, NA, "1", ...
## $ source <chr> "The New York Times", "The New York Ti...
## $ multimedia <list> [<c("0", "0", "0", "0", "0", "0", "0"...
## $ keywords <list> [<c("glocations", "persons", "subject...
## $ pub_date <chr> "2018-10-22T00:24:53+0000", "2018-10-1...
## $ document_type <chr> "article", "article", "article", "arti...
## $ news_desk <chr> "Editorial", "Culture", "OpEd", "Forei...
## $ type_of_material <chr> "Editorial", "Review", "Op-Ed", "News"...
## $ `_id` <chr> "5bcd18db00a1bc2872e8fc81", "5bc0f1420...
## $ word_count <int> 516, 1421, 1106, 555, 1079, 921, 332, ...
## $ score <dbl> 89.40305, 83.65459, 83.21866, 80.23711...
## $ uri <chr> "nyt://article/6df6e741-5fc0-57f5-a5e7...
## $ section_name <chr> NA, "Art & Design", NA, "Americas", "A...
## $ headline.main <chr> "Brazil’s Sad Choice", "Brazil Enthral...
## $ headline.kicker <chr> NA, "Critic’s Pick", NA, NA, NA, NA, N...
## $ headline.content_kicker <chr> NA, "Critic’s Pick", NA, NA, NA, NA, N...
## $ headline.print_headline <chr> "Brazil’s Sad Choice", "Enthralling, F...
## $ headline.name <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ headline.seo <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ headline.sub <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ byline.original <chr> "By THE EDITORIAL BOARD", "By HOLLAND ...
## $ byline.person <list> [<>, <Holland, NA, COTTER, NA, NA, re...
## $ byline.organization <chr> "THE EDITORIAL BOARD", NA, NA, NA, NA,...
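The search above passes its date range as YYYYMMDD strings, and the pub_date column comes back as ISO-style text. The following base R sketch shows one way to convert in both directions; the helper name to_nyt_date is hypothetical, and the sample timestamp is the first pub_date value shown in the output above.

```r
# Hypothetical helper: build the YYYYMMDD strings used for
# begin_date / end_date from ordinary dates.
to_nyt_date <- function(d) format(as.Date(d), "%Y%m%d")

begin <- to_nyt_date("2018-10-10")
end   <- to_nyt_date("2018-10-26")

# Parse a pub_date string into a date-time (UTC) for later filtering or plotting.
pub <- as.POSIXct("2018-10-22T00:24:53+0000",
                  format = "%Y-%m-%dT%H:%M:%S%z", tz = "UTC")
pub_day <- format(pub, "%Y-%m-%d")
```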
We visualize the percentage of articles per news desk by grouping the table resp$data on the news_desk variable. The code below was retrieved from http://www.storybench.org/working-with-the-new-york-times-api-in-r/.
data <- resp$data
data %>%
  group_by(news_desk) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  ggplot() +
  geom_bar(aes(y = percent, x = news_desk, fill = news_desk), stat = "identity") +
  coord_flip()
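To make the summary step behind the plot concrete, here is a small base R sketch of the same count-and-percent calculation. The news_desk values are made up for illustration and are not taken from resp$data.

```r
# Toy stand-in for resp$data$news_desk (made-up values).
news_desk <- c("Foreign", "Foreign", "OpEd", "Culture")

counts  <- table(news_desk)          # articles per desk, sorted by name
percent <- 100 * prop.table(counts)  # share of each desk, in percent

share <- data.frame(
  news_desk = names(counts),
  count     = as.integer(counts),
  percent   = as.numeric(percent)
)
share
```

The percent column sums to 100 by construction, which is a quick sanity check on the grouped summary before plotting.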
In this example, we use the httr package to query the NYT best-seller history through the Books API.
url <- "http://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json"
resp <- GET(url, query=list("api-key"=key))
resp
## Response [http://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?api-key=d1975df9f67a4d3ab45507eb1387fb63]
## Date: 2018-11-03 18:09
## Status: 200
## Content-Type: application/json; charset=UTF-8
## Size: 14.9 kB
http_status(resp)
## $category
## [1] "Success"
##
## $reason
## [1] "OK"
##
## $message
## [1] "Success: (200) OK"
http_type(resp)
## [1] "application/json"
parsed <- jsonlite::fromJSON(content(resp, "text", encoding = "UTF-8"), simplifyVector = FALSE)
names(parsed)
## [1] "status" "copyright" "num_results" "results"
class(parsed)
## [1] "list"
parsed$results[1:2]
## [[1]]
## [[1]]$title
## [1] "\"I GIVE YOU MY BODY ...\""
##
## [[1]]$description
## [1] "The author of the Outlander novels gives tips on writing sex scenes, drawing on examples from the books."
##
## [[1]]$contributor
## [1] "by Diana Gabaldon"
##
## [[1]]$author
## [1] "Diana Gabaldon"
##
## [[1]]$contributor_note
## [1] ""
##
## [[1]]$price
## [1] 0
##
## [[1]]$age_group
## [1] ""
##
## [[1]]$publisher
## [1] "Dell"
##
## [[1]]$isbns
## [[1]]$isbns[[1]]
## [[1]]$isbns[[1]]$isbn10
## [1] "0399178570"
##
## [[1]]$isbns[[1]]$isbn13
## [1] "9780399178573"
##
##
##
## [[1]]$ranks_history
## [[1]]$ranks_history[[1]]
## [[1]]$ranks_history[[1]]$primary_isbn10
## [1] "0399178570"
##
## [[1]]$ranks_history[[1]]$primary_isbn13
## [1] "9780399178573"
##
## [[1]]$ranks_history[[1]]$rank
## [1] 8
##
## [[1]]$ranks_history[[1]]$list_name
## [1] "Advice How-To and Miscellaneous"
##
## [[1]]$ranks_history[[1]]$display_name
## [1] "Advice, How-To & Miscellaneous"
##
## [[1]]$ranks_history[[1]]$published_date
## [1] "2016-09-04"
##
## [[1]]$ranks_history[[1]]$bestsellers_date
## [1] "2016-08-20"
##
## [[1]]$ranks_history[[1]]$weeks_on_list
## [1] 1
##
## [[1]]$ranks_history[[1]]$ranks_last_week
## NULL
##
## [[1]]$ranks_history[[1]]$asterisk
## [1] 0
##
## [[1]]$ranks_history[[1]]$dagger
## [1] 0
##
##
##
## [[1]]$reviews
## [[1]]$reviews[[1]]
## [[1]]$reviews[[1]]$book_review_link
## [1] ""
##
## [[1]]$reviews[[1]]$first_chapter_link
## [1] ""
##
## [[1]]$reviews[[1]]$sunday_review_link
## [1] ""
##
## [[1]]$reviews[[1]]$article_chapter_link
## [1] ""
##
##
##
##
## [[2]]
## [[2]]$title
## [1] "\"MOST BLESSED OF THE PATRIARCHS\""
##
## [[2]]$description
## [1] "A character study that attempts to make sense of Jefferson’s contradictions."
##
## [[2]]$contributor
## [1] "by Annette Gordon-Reed and Peter S. Onuf"
##
## [[2]]$author
## [1] "Annette Gordon-Reed and Peter S Onuf"
##
## [[2]]$contributor_note
## [1] ""
##
## [[2]]$price
## [1] 0
##
## [[2]]$age_group
## [1] ""
##
## [[2]]$publisher
## [1] "Liveright"
##
## [[2]]$isbns
## [[2]]$isbns[[1]]
## [[2]]$isbns[[1]]$isbn10
## [1] "0871404427"
##
## [[2]]$isbns[[1]]$isbn13
## [1] "9780871404428"
##
##
##
## [[2]]$ranks_history
## [[2]]$ranks_history[[1]]
## [[2]]$ranks_history[[1]]$primary_isbn10
## [1] "0871404427"
##
## [[2]]$ranks_history[[1]]$primary_isbn13
## [1] "9780871404428"
##
## [[2]]$ranks_history[[1]]$rank
## [1] 16
##
## [[2]]$ranks_history[[1]]$list_name
## [1] "Hardcover Nonfiction"
##
## [[2]]$ranks_history[[1]]$display_name
## [1] "Hardcover Nonfiction"
##
## [[2]]$ranks_history[[1]]$published_date
## [1] "2016-05-01"
##
## [[2]]$ranks_history[[1]]$bestsellers_date
## [1] "2016-04-16"
##
## [[2]]$ranks_history[[1]]$weeks_on_list
## [1] 1
##
## [[2]]$ranks_history[[1]]$ranks_last_week
## NULL
##
## [[2]]$ranks_history[[1]]$asterisk
## [1] 1
##
## [[2]]$ranks_history[[1]]$dagger
## [1] 0
##
##
##
## [[2]]$reviews
## [[2]]$reviews[[1]]
## [[2]]$reviews[[1]]$book_review_link
## [1] ""
##
## [[2]]$reviews[[1]]$first_chapter_link
## [1] ""
##
## [[2]]$reviews[[1]]$sunday_review_link
## [1] ""
##
## [[2]]$reviews[[1]]$article_chapter_link
## [1] ""
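The deeply nested printout above is a consequence of simplifyVector = FALSE, which keeps every JSON array as a plain R list. The sketch below contrasts the two settings on a made-up JSON string with a similar shape; it assumes the jsonlite package is installed, and the sample data is not from the API.

```r
library(jsonlite)

# Made-up JSON: a results array whose entries carry a nested isbns array
# of varying length.
txt <- '{"results":[{"title":"A","isbns":[{"isbn10":"1"},{"isbn10":"2"}]},
                    {"title":"B","isbns":[]}]}'

nested <- fromJSON(txt, simplifyVector = FALSE)  # plain nested lists
flat   <- fromJSON(txt, simplifyVector = TRUE)   # records become a data frame

class(nested$results)
class(flat$results)
```

With simplification turned on, the top-level records arrive as a data frame directly, although nested arrays such as isbns still end up in list columns.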
The results of the query are parsed as a list whose elements may differ in length. To convert it to a data frame, we first pad every element to the length of the longest one, so that each record contributes the same number of columns. The following code was retrieved from https://stackoverflow.com/questions/27153979/converting-nested-list-unequal-length-to-data-frame.
jsonData <- parsed$results
class(jsonData)
## [1] "list"
indx <- sapply(jsonData, length)
res <- as.data.frame(do.call(rbind,lapply(jsonData, `length<-`,
max(indx))))
colnames(res) <- names(jsonData[[which.max(indx)]])
class(res)
## [1] "data.frame"
names(res)
## [1] "title" "description" "contributor"
## [4] "author" "contributor_note" "price"
## [7] "age_group" "publisher" "isbns"
## [10] "ranks_history" "reviews"
dim(res)
## [1] 20 11
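The padding step can be seen more easily on a tiny example. The sketch below applies the same technique to two made-up records of unequal length. Note that the padding aligns fields by position rather than by name, so it assumes missing fields are always the trailing ones.

```r
# Two made-up records with different numbers of fields.
records <- list(
  list(title = "A", price = 0, publisher = "X"),
  list(title = "B", price = 5)
)

n_fields <- sapply(records, length)                     # 3 and 2
padded   <- lapply(records, `length<-`, max(n_fields))  # pad short records with NULL
mat      <- do.call(rbind, padded)                      # 2 x 3 list-matrix
df       <- as.data.frame(mat)
colnames(df) <- names(records[[which.max(n_fields)]])

dim(df)
```

Each column of the result is a list column, which is why nested fields later print as list(...) when the table is displayed.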
kable(head(res))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| title | description | contributor | author | contributor_note | price | age_group | publisher | isbns | ranks_history | reviews |
|---|---|---|---|---|---|---|---|---|---|---|
| “I GIVE YOU MY BODY …” | The author of the Outlander novels gives tips on writing sex scenes, drawing on examples from the books. | by Diana Gabaldon | Diana Gabaldon | | 0 | | Dell | list(list(isbn10 = “0399178570”, isbn13 = “9780399178573”)) | list(list(primary_isbn10 = “0399178570”, primary_isbn13 = “9780399178573”, rank = 8, list_name = “Advice How-To and Miscellaneous”, display_name = “Advice, How-To & Miscellaneous”, published_date = “2016-09-04”, bestsellers_date = “2016-08-20”, weeks_on_list = 1, ranks_last_week = NULL, asterisk = 0, dagger = 0)) | list(list(book_review_link = “”, first_chapter_link = “”, sunday_review_link = “”, article_chapter_link = “”)) |
| “MOST BLESSED OF THE PATRIARCHS” | A character study that attempts to make sense of Jefferson’s contradictions. | by Annette Gordon-Reed and Peter S. Onuf | Annette Gordon-Reed and Peter S Onuf | | 0 | | Liveright | list(list(isbn10 = “0871404427”, isbn13 = “9780871404428”)) | list(list(primary_isbn10 = “0871404427”, primary_isbn13 = “9780871404428”, rank = 16, list_name = “Hardcover Nonfiction”, display_name = “Hardcover Nonfiction”, published_date = “2016-05-01”, bestsellers_date = “2016-04-16”, weeks_on_list = 1, ranks_last_week = NULL, asterisk = 1, dagger = 0)) | list(list(book_review_link = “”, first_chapter_link = “”, sunday_review_link = “”, article_chapter_link = “”)) |
| #ASKGARYVEE | The entrepreneur expands on subjects addressed on his Internet show, like marketing, management and social media. | by Gary Vaynerchuk | Gary Vaynerchuk | | 0 | | HarperCollins | list(list(isbn10 = “0062273124”, isbn13 = “9780062273123”), list(isbn10 = “0062273132”, isbn13 = “9780062273130”)) | list(list(primary_isbn10 = “0062273124”, primary_isbn13 = “9780062273123”, rank = 5, list_name = “Business Books”, display_name = “Business”, published_date = “2016-04-10”, bestsellers_date = “2016-03-26”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 1), list(primary_isbn10 = “0062273124”, primary_isbn13 = “9780062273123”, rank = 6, list_name = “Advice How-To and Miscellaneous”, display_name = “Advice, How-To & Miscellaneous”, published_date = “2016-03-27”, bestsellers_date = “2016-03-12”, weeks_on_list = 1, ranks_last_week = NULL, asterisk = 0, dagger = 1)) | list(list(book_review_link = “”, first_chapter_link = “”, sunday_review_link = “”, article_chapter_link = “”)) |
| #GIRLBOSS | An online fashion retailer traces her path to success. | by Sophia Amoruso | Sophia Amoruso | | 0 | | Portfolio/Penguin/Putnam | list(list(isbn10 = “039916927X”, isbn13 = “9780399169274”), list(isbn10 = “1591847931”, isbn13 = “9781591847939”)) | list(list(primary_isbn10 = “1591847931”, primary_isbn13 = “9781591847939”, rank = 8, list_name = “Business Books”, display_name = “Business”, published_date = “2016-03-13”, bestsellers_date = “2016-02-27”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 0), list(primary_isbn10 = “1591847931”, primary_isbn13 = “9781591847939”, rank = 9, list_name = “Business Books”, display_name = “Business”, published_date = “2016-01-17”, bestsellers_date = “2016-01-02”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 0), list(primary_isbn10 = “1591847931”, primary_isbn13 = “9781591847939”, rank = 9, list_name = “Business Books”, display_name = “Business”, published_date = “2015-12-13”, bestsellers_date = “2015-11-28”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 0), list(primary_isbn10 = “1591847931”, primary_isbn13 = “9781591847939”, rank = 8, list_name = “Business Books”, display_name = “Business”, published_date = “2015-11-15”, bestsellers_date = “2015-10-31”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 0), list(primary_isbn10 = “039916927X”, primary_isbn13 = “9780399169274”, rank = 10, list_name = “Business Books”, display_name = “Business”, published_date = “2014-11-09”, bestsellers_date = “2014-10-25”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 0), list(primary_isbn10 = “039916927X”, primary_isbn13 = “9780399169274”, rank = 8, list_name = “Business Books”, display_name = “Business”, published_date = “2014-10-12”, bestsellers_date = “2014-09-27”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 0), list(primary_isbn10 = “039916927X”, primary_isbn13 = “9780399169274”, rank = 14, list_name = “Advice How-To and Miscellaneous”, display_name = “Advice, How-To & Miscellaneous”, published_date = “2014-09-21”, bestsellers_date = “2014-09-06”, weeks_on_list = 0, ranks_last_week = NULL, asterisk = 0, dagger = 0)) | list(list(book_review_link = “”, first_chapter_link = “”, sunday_review_link = “”, article_chapter_link = “”)) |
| #NEVERAGAIN | Students from Marjory Stoneman Douglas High School describe the Valentine’s Day mass shooting and outline ways to prevent similar incidents. | by David Hogg and Lauren Hogg | David Hogg and Lauren Hogg | | 0 | | Random House | list(list(isbn10 = “198480183X”, isbn13 = “9781984801838”)) | list(list(primary_isbn10 = “198480183X”, primary_isbn13 = “9781984801838”, rank = 9, list_name = “Paperback Nonfiction”, display_name = “Paperback Nonfiction”, published_date = “2018-07-08”, bestsellers_date = “2018-06-23”, weeks_on_list = 1, ranks_last_week = NULL, asterisk = 0, dagger = 0)) | list(list(book_review_link = “”, first_chapter_link = “”, sunday_review_link = “”, article_chapter_link = “”)) |
| $100 STARTUP | How to build a profitable start up for $100 or less and be your own boss. | by Chris Guillebeau | Chris Guillebeau | | 23 | | Crown Business | list(list(isbn10 = “0307951529”, isbn13 = “9780307951526”)) | list() | list(list(book_review_link = “”, first_chapter_link = “”, sunday_review_link = “”, article_chapter_link = “”)) |
We add a rank column holding the row number of each book in the returned results. mutate appends it as the last column of the table, so we then move it to the first position.
# Add the row number as a named rank column in a single step
res <- mutate(res, rank = row_number())
names(res)
## [1] "title" "description" "contributor"
## [4] "author" "contributor_note" "price"
## [7] "age_group" "publisher" "isbns"
## [10] "ranks_history" "reviews" "rank"
new_res <- res %>% select(rank, everything())
names(new_res)
## [1] "rank" "title" "description"
## [4] "contributor" "author" "contributor_note"
## [7] "price" "age_group" "publisher"
## [10] "isbns" "ranks_history" "reviews"
We note that the columns isbns, ranks_history and reviews contain nested lists and are not essential for displaying the data, so we drop them from the displayed table.
new_res <- new_res %>% select(rank:publisher)
kable(head(new_res))%>% kable_styling(bootstrap_options = c("striped", "condensed"))
| rank | title | description | contributor | author | contributor_note | price | age_group | publisher |
|---|---|---|---|---|---|---|---|---|
| 1 | “I GIVE YOU MY BODY …” | The author of the Outlander novels gives tips on writing sex scenes, drawing on examples from the books. | by Diana Gabaldon | Diana Gabaldon | | 0 | | Dell |
| 2 | “MOST BLESSED OF THE PATRIARCHS” | A character study that attempts to make sense of Jefferson’s contradictions. | by Annette Gordon-Reed and Peter S. Onuf | Annette Gordon-Reed and Peter S Onuf | | 0 | | Liveright |
| 3 | #ASKGARYVEE | The entrepreneur expands on subjects addressed on his Internet show, like marketing, management and social media. | by Gary Vaynerchuk | Gary Vaynerchuk | | 0 | | HarperCollins |
| 4 | #GIRLBOSS | An online fashion retailer traces her path to success. | by Sophia Amoruso | Sophia Amoruso | | 0 | | Portfolio/Penguin/Putnam |
| 5 | #NEVERAGAIN | Students from Marjory Stoneman Douglas High School describe the Valentine’s Day mass shooting and outline ways to prevent similar incidents. | by David Hogg and Lauren Hogg | David Hogg and Lauren Hogg | | 0 | | Random House |
| 6 | $100 STARTUP | How to build a profitable start up for $100 or less and be your own boss. | by Chris Guillebeau | Chris Guillebeau | | 23 | | Crown Business |
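Instead of dropping a list column entirely, one scalar per row can be pulled out of it. The base R sketch below keeps the first isbn13 of each book. The isbns object is a hand-built stand-in that reuses two ISBN entries shown earlier plus a made-up empty entry, and the name first_isbn13 is hypothetical.

```r
# Stand-in for the isbns list column: one entry per book, each a list of
# ISBN records (possibly empty).
isbns <- list(
  list(list(isbn10 = "0399178570", isbn13 = "9780399178573")),
  list(list(isbn10 = "0062273124", isbn13 = "9780062273123"),
       list(isbn10 = "0062273132", isbn13 = "9780062273130")),
  list()
)

# Keep the first isbn13 per book, or NA when no ISBN is listed.
first_isbn13 <- vapply(
  isbns,
  function(x) if (length(x) > 0) x[[1]]$isbn13 else NA_character_,
  character(1)
)
first_isbn13
```

Using vapply with character(1) ensures the result is a plain character vector, which can be added to the data frame as an ordinary column.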