Access with NYT APIs
Problem Statement
The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis
You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame.
Load Packages
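The package-loading chunk does not appear in the rendered output. A minimal sketch, assuming the packages are the ones that provide the functions called later in this document (jsonlite for fromJSON and rbind_pages, dplyr for the pipe and select, stringr for str_replace_all, knitr for kable, ggplot2 for the plots):

```r
# Assumed package list, inferred from the functions called in this document.
pkgs <- c("jsonlite", "dplyr", "stringr", "knitr", "ggplot2")

# Report anything that is not installed instead of failing on the first library() call.
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# Attach the packages that are available.
for (p in setdiff(pkgs, not_installed)) {
  library(p, character.only = TRUE)
}
```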
Approach
I selected two APIs from the NYT site: the Books API and the Article Search API. The former is simpler; the latter is relatively complex. After each team member created an account, I obtained the API key. In the R Markdown file, I stored the key in a variable (NYTKEY) and hid that chunk with echo=FALSE.
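One way to keep the key out of the document source altogether is to read it from an environment variable. This is only a sketch of an alternative, not the approach used above: the helper get_nyt_key and the variable name NYT_API_KEY are hypothetical.

```r
# Hypothetical helper: read the API key from an environment variable
# (for example one set in ~/.Renviron) so it never appears in the source file.
get_nyt_key <- function(var = "NYT_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.")
  }
  key
}

# Usage (commented out so the sketch runs without a key):
# NYTKEY <- get_nyt_key()
```

Failing early with a clear message avoids sending requests that carry an empty api-key parameter.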
I Books API
The Books API provides information about book reviews and the NYT best-seller lists.
Extraction of books by Yuval Noah Harari
From the Books API's best-seller history, I filtered for books by Yuval Noah Harari and converted the result into an R data frame.
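# Query the best-seller history for one author; NYTKEY holds the API key
url <- paste0("https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?author=Yuval+Noah+Harari&api-key=", NYTKEY)
harari <- fromJSON(url) %>%
  as.data.frame() %>%
  select(4, 5, 7, 11) # title, description, author, publisher
# Strip the "results." prefix; the dot is escaped so that it matches literally
names(harari) <- str_replace_all(names(harari), "^results\\.", "")
kable(harari)

| title | description | author | publisher |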
|---|---|---|---|
| 21 LESSONS FOR THE 21ST CENTURY | Technological, political and social issues in the modern era, and the choices individuals might consider in facing them. | Yuval Noah Harari | Spiegel & Grau |
| HOMO DEUS | A look into the future by the author of “Sapiens.” Read by Derek Perkins. 14 hours, 53 minutes unabridged. | Yuval Noah Harari | HarperAudio |
| SAPIENS | How Homo sapiens became Earth’s dominant species. | Yuval Noah Harari | Harper Perennial |
| SAPIENS: A GRAPHIC HISTORY | An illustrated adaptation of the book that looks at humankind’s creation and evolution. | Yuval Noah Harari and David Vandermeulen | Harper Perennial |
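Selecting columns by position breaks silently if the API adds or reorders fields. A minimal sketch of selecting by name instead, using a small mock data frame with made-up placeholder values in place of the live response (the mock, and the names flat, wanted and books, are assumptions for illustration):

```r
# Mock of the flattened response: same column naming, placeholder values only.
flat <- data.frame(
  status = "OK",
  num_results = 2L,
  results.title = c("TITLE A", "TITLE B"),
  results.description = c("First description.", "Second description."),
  results.author = c("Author A", "Author B"),
  results.publisher = c("Publisher A", "Publisher B"),
  stringsAsFactors = FALSE
)

# Pick the columns by name rather than by position.
wanted <- c("results.title", "results.description", "results.author", "results.publisher")
books <- flat[, wanted]

# Drop the "results." prefix; the dot is escaped so it matches literally.
names(books) <- sub("^results\\.", "", names(books))
names(books)
```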
Extraction of current best sellers
From the Books API, I retrieved the current NYT hardcover fiction best sellers.
url <- paste0("https://api.nytimes.com/svc/books/v3/lists/current/hardcover-fiction.json?api-key=", NYTKEY)
current <- fromJSON(url, flatten = TRUE)
x <- current$results # Inspecting the parsed response shows that the usable data is nested inside current$results.
y <- x$books # The data frame itself resides in x$books.
kable(y)

| rank | publisher | description | title | author | contributor | amazon_product_url |
|---|---|---|---|---|---|---|
| 1 | Doubleday | The third book in the Jake Brigance series. A 16-year-old is accused of killing a deputy in Clanton, Miss., in 1990. | A TIME FOR MERCY | John Grisham | by John Grisham | https://www.amazon.com/dp/0385545967?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 2 | Ballantine | In a sequel to “Ready Player One,” Wade Watts discovers a technological advancement and goes on a new quest. | READY PLAYER TWO | Ernest Cline | by Ernest Cline | https://www.amazon.com/dp/1524761338?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 3 | Little, Brown | The 28th book in the Alex Cross series. An investigation of a double homicide sends Alex Cross to Alabama. | DEADLY CROSS | James Patterson | by James Patterson | https://www.amazon.com/dp/0316420255?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 4 | Grand Central | A doctor serving in the Navy in Afghanistan goes back to North Carolina where two women change his life. | THE RETURN | Nicholas Sparks | by Nicholas Sparks | https://www.amazon.com/dp/1538728575?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 5 | Riverhead | The lives of twin sisters who run away from a Southern Black community at age 16 diverge as one returns and the other takes on a different racial identity but their fates intertwine. | THE VANISHING HALF | Brit Bennett | by Brit Bennett | https://www.amazon.com/dp/0525536299?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 6 | Putnam | In a quiet town on the North Carolina coast in 1969, a young woman who survived alone in the marsh becomes a murder suspect. | WHERE THE CRAWDADS SING | Delia Owens | by Delia Owens | https://www.amazon.com/Where-Crawdads-Sing-Delia-Owens/dp/0735219095?tag=NYTBSREV-20 |
| 7 | Grand Central | The F.B.I. agent Atlee Pine’s search for her twin sister overlaps with a military investigator’s hunt for someone involved in a global conspiracy. | DAYLIGHT | David Baldacci | by David Baldacci | https://www.amazon.com/dp/1538761696?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 8 | Delacorte | Jack Reacher intervenes on an ambush in Tennessee and uncovers a conspiracy. | THE SENTINEL | Lee Child and Andrew Child | by Lee Child and Andrew Child | https://www.amazon.com/dp/1984818465?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 9 | St. Martin’s | The first book in the Dragon Heart Legacy series. Breen Kelly travels through a portal in Ireland to a land of faeries and mermaids. | THE AWAKENING | Nora Roberts | by Nora Roberts | https://www.amazon.com/dp/1250272610?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 10 | Little, Brown | The sixth book in the Mickey Haller series. Haller defends himself when police find the body of a former client in his car’s trunk. | THE LAW OF INNOCENCE | Michael Connelly | by Michael Connelly | https://www.amazon.com/dp/0316485624?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 11 | Scribner | Four novellas: “Mr. Harrigan’s Phone,” “The Life of Chuck,” “Rat” and “If It Bleeds.” | IF IT BLEEDS | Stephen King | by Stephen King | https://www.amazon.com/dp/1982137975?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 12 | Atria | The 27th book in the Stephanie Plum series. Stephanie deals with a soldier of fortune from Little Havana. | FORTUNE AND GLORY | Janet Evanovich | by Janet Evanovich | https://www.amazon.com/dp/1982154837?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 13 | Viking | Nora Seed finds a library beyond the edge of the universe that contains books with multiple possibilities of the lives one could have lived. | THE MIDNIGHT LIBRARY | Matt Haig | by Matt Haig | https://www.amazon.com/dp/0525559477?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 14 | Tor/Forge | A Faustian bargain comes with a curse that affects the adventure Addie LaRue has across centuries. | THE INVISIBLE LIFE OF ADDIE LARUE | VE Schwab | by V.E. Schwab | https://www.amazon.com/dp/0765387565?tag=NYTBSREV-20&tag=NYTBSREV-20 |
| 15 | Viking | In a prequel to “The Pillars of the Earth,” a boatbuilder, a Norman noblewoman and a monk live in England under attack by the Welsh and the Vikings. | THE EVENING AND THE MORNING | Ken Follett | by Ken Follett | https://www.amazon.com/dp/0525954988?tag=NYTBSREV-20&tag=NYTBSREV-20 |
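Finding the data frame inside a nested response is easier with str(). The sketch below assumes a mock list with the same nesting (results, then books) and placeholder values; current_mock and its fields are illustrative, not real API output.

```r
# Mock with the same nesting as the parsed response (placeholder values only).
current_mock <- list(
  status = "OK",
  results = list(
    list_name = "Hardcover Fiction",
    books = data.frame(
      rank = 1:2,
      title = c("TITLE A", "TITLE B"),
      author = c("Author A", "Author B"),
      stringsAsFactors = FALSE
    )
  )
)

# Print the top two levels of the structure to see where the table lives.
str(current_mock, max.level = 2)

# Drill down in one step instead of going through intermediate variables.
books <- current_mock$results$books
is.data.frame(books)
```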
II Article Search API
The Article Search API lets you look up articles by keyword and refine the results with filters and facets.
Setting up parameters
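The parameter chunk is not shown in the rendered output. A minimal sketch of what it might contain, assuming a search on the Iran nuclear deal (the topic analysed below) and a hypothetical date range; the Article Search API expects dates as YYYYMMDD:

```r
# Assumed values for illustration only; the original chunk is not shown.
# URL-encode the query so that spaces are safe inside the request URL.
search_item <- URLencode("iran nuclear deal", reserved = TRUE)  # "iran%20nuclear%20deal"

# Hypothetical date range, formatted as YYYYMMDD.
begin_date <- format(as.Date("2020-01-01"), "%Y%m%d")
end_date   <- format(as.Date("2020-12-01"), "%Y%m%d")
```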
Extraction of chosen articles
The query returned 36 hits, and the API serves 10 hits per page. max_pages holds the zero-based index of the last page, which works out to 3.
# Base query; NYTKEY holds the API key
article_url <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=", search_item, "&begin_date=", begin_date, "&end_date=", end_date, "&facet_filter=true&api-key=", NYTKEY)
query <- fromJSON(article_url)
# Zero-based index of the last page; ceiling() keeps a partially filled final page
max_pages <- ceiling(query$response$meta$hits[1] / 10) - 1
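A worked check of the page arithmetic, assuming 10 hits per page and zero-based page numbers. The helper last_page_index is hypothetical. It uses ceiling() because rounding can drop a partially filled final page.

```r
# Hypothetical helper: zero-based index of the last page of results.
last_page_index <- function(hits, per_page = 10) {
  max(ceiling(hits / per_page) - 1, 0)
}

last_page_index(36)  # 3, i.e. pages 0, 1, 2 and 3
last_page_index(30)  # 2, the last page is exactly full
last_page_index(34)  # 3, whereas round(34 / 10 - 1) gives 2 and drops the last 4 hits
last_page_index(0)   # 0, never negative
```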
Next, I create a list called pages and store each page of results in it.
pages <- list()
for (i in 0:max_pages) {
  nyt_search <- fromJSON(paste0(article_url, "&page=", i), flatten = TRUE) %>% data.frame()
  message("Retrieving page ", i + 1)
  pages[[i + 1]] <- nyt_search
  Sys.sleep(2) # Pause between requests
}

## Retrieving page 1
## Retrieving page 2
## Retrieving page 3
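A single failed request stops the loop above. The sketch below shows one way to skip a failing page and keep the rest. It assumes a hypothetical fetch_all_pages helper that takes the fetching function as an argument, and it uses a stand-in fetcher with made-up rows so that it runs without network access.

```r
# Hypothetical helper: fetch pages 0..last_page, skipping any page that errors.
fetch_all_pages <- function(fetch_page, last_page, pause = 0) {
  pages <- lapply(0:last_page, function(i) {
    result <- tryCatch(
      fetch_page(i),
      error = function(e) {
        message("Page ", i + 1, " failed: ", conditionMessage(e))
        NULL
      }
    )
    Sys.sleep(pause)  # pause between requests
    result
  })
  # Keep only the pages that were retrieved successfully.
  Filter(Negate(is.null), pages)
}

# Stand-in fetcher with made-up rows; page index 1 simulates a network error.
fake_fetch <- function(i) {
  if (i == 1) stop("simulated network error")
  data.frame(page = i, headline = paste("Article", 1:2), stringsAsFactors = FALSE)
}

pages_demo <- fetch_all_pages(fake_fetch, last_page = 2)
length(pages_demo)                      # 2, because page index 1 was skipped
all_rows <- do.call(rbind, pages_demo)  # base-R row-bind of the pages that survived
```

With the live API, the fetcher would wrap the fromJSON call from the loop above, and pause would be set to 2 to match the Sys.sleep(2) there.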
Finally, I row-bind all pages into one large data frame and check its class. At this point, I have achieved my aim of extracting the articles into a data frame.
all_pages <- rbind_pages(pages) # Combine the individual pages into one data frame
class(all_pages) # Confirm that the result is a data frame

## [1] "data.frame"
Data Visualization
Next, I explore the data with two bar charts.
all_pages %>%
  group_by(response.docs.type_of_material) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  ggplot() +
  geom_bar(aes(y = percent, x = reorder(response.docs.type_of_material, percent), fill = response.docs.type_of_material), stat = "identity") +
  coord_flip()

## `summarise()` ungrouping output (override with `.groups` argument)
The majority of the articles are of type News, followed by Op-Ed.
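The percentages behind the bars can be checked without plotting. A minimal base-R sketch, assuming a made-up vector of article types rather than the retrieved data:

```r
# Made-up article types for illustration; not the retrieved data.
type_of_material <- c("News", "News", "News", "Op-Ed", "Letter")

# Count each type, convert to percentages, and sort from largest to smallest.
counts  <- table(type_of_material)
percent <- sort(100 * prop.table(counts), decreasing = TRUE)
round(percent, 1)
```

In the dplyr pipeline itself, passing .groups = "drop" to summarize() should silence the ungrouping message printed under the chart code.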
Next, I break down the articles about the Iran nuclear deal by section.
all_pages %>%
  group_by(response.docs.section_name) %>%
  summarize(count = n()) %>%
  mutate(percent = (count / sum(count)) * 100) %>%
  ggplot() +
  geom_bar(aes(y = percent, x = reorder(response.docs.section_name, percent), fill = response.docs.section_name), stat = "identity") +
  coord_flip()

## `summarise()` ungrouping output (override with `.groups` argument)
The World section accounts for the largest share of the articles.