title: “Dat 607 HW 9 - Web APIs”
author: “Sufian”
date: “10/22/2019”
output: html_document
Rpub links:
http://rpubs.com/ssufian/543492
The New York Times web site provides a rich set of APIs, as described here: http://developer.nytimes.com/docs
You’ll need to start by signing up for an API key.
Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and
transform it to an R dataframe.
#Load libraries
library(httr)
library(jsonlite)
library(tidyr)
library(lubridate)
library(rvest)
library(dplyr)
library(ggplot2)
library(tidyverse)
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
URL <- 'https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key='
name <- "MfbAm3jsAPQ0UZ3kAxRGZ8SdbxcDVWQD"
myurl <- paste0(URL, name) # paste0() joins without a separator, so no sep argument is needed
r <- GET(myurl)
r
## Response [https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=MfbAm3jsAPQ0UZ3kAxRGZ8SdbxcDVWQD]
## Date: 2019-10-27 00:41
## Status: 200
## Content-Type: application/json; charset=UTF-8
## Size: 39.4 kB
# Check for an HTTP error (TRUE if the status code is 400 or above)
http_error(r)
## [1] FALSE
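The request above pastes the key into the URL by hand. As a minimal sketch, and not the approach used in this assignment, the same URL could be assembled with `httr::modify_url()` while the key is read from an environment variable. The variable name `NYT_API_KEY` and the helper `build_mostpopular_url()` are assumptions made for illustration only.

```r
library(httr)

# Hypothetical environment variable; it could be set in ~/.Renviron.
# The fallback string is a placeholder, not a working key.
api_key <- Sys.getenv("NYT_API_KEY", unset = "YOUR_KEY_HERE")

# Hypothetical helper: builds the "most viewed" URL for a given period in days.
build_mostpopular_url <- function(key, period = 30) {
  # The Most Popular API accepts periods of 1, 7, or 30 days
  stopifnot(period %in% c(1, 7, 30))
  modify_url(
    "https://api.nytimes.com",
    path  = paste0("svc/mostpopular/v2/viewed/", period, ".json"),
    query = list(`api-key` = key)
  )
}

request_url <- build_mostpopular_url(api_key)

# The request itself would then look like this (not run here):
# r <- GET(request_url)
# stop_for_status(r)  # raises an R error for a 4xx or 5xx response
```

Keeping the key out of the script means the knitted report can be published without exposing it.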
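# Parse the body of the response already retrieved above instead of calling the API a second time
r2 <- fromJSON(content(r, as = "text", encoding = "UTF-8"), flatten=TRUE) %>% data.frame()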
head(r2)
## status
## 1 OK
## 2 OK
## 3 OK
## 4 OK
## 5 OK
## 6 OK
## copyright
## 1 Copyright (c) 2019 The New York Times Company. All Rights Reserved.
## 2 Copyright (c) 2019 The New York Times Company. All Rights Reserved.
## 3 Copyright (c) 2019 The New York Times Company. All Rights Reserved.
## 4 Copyright (c) 2019 The New York Times Company. All Rights Reserved.
## 5 Copyright (c) 2019 The New York Times Company. All Rights Reserved.
## 6 Copyright (c) 2019 The New York Times Company. All Rights Reserved.
## num_results
## 1 1881
## 2 1881
## 3 1881
## 4 1881
## 5 1881
## 6 1881
## results.url
## 1 https://www.nytimes.com/2019/10/13/us/politics/trump-video.html
## 2 https://www.nytimes.com/interactive/2019/arts/television/best-movies-on-netflix.html
## 3 https://www.nytimes.com/2019/10/01/opinion/trump-impeachment-2020.html
## 4 https://www.nytimes.com/2019/09/26/us/politics/who-is-whistleblower.html
## 5 https://www.nytimes.com/interactive/2019/09/26/us/politics/whistle-blower-complaint.html
## 6 https://www.nytimes.com/interactive/2019/10/16/us/politics/trump-letter-turkey.html
## results.adx_keywords
## 1 Trump, Donald J;United States Politics and Government;Presidential Election of 2020;Video Recordings, Downloads and Streaming;Violence (Media and Entertainment);Trump, Donald J Jr;Social Media;News and News Media;Kingsman: The Secret Service (Movie)
## 2 Netflix Inc;Movies;Video Recordings, Downloads and Streaming
## 3 Trump, Donald J;Presidential Election of 2020;Trump-Ukraine Whistle-Blower Complaint and Impeachment Inquiry;Corruption (Institutional);Senate;House of Representatives;Republican Party;Democratic Party
## 4 United States Politics and Government;Central Intelligence Agency;Trump, Donald J;Trump-Ukraine Whistle-Blower Complaint and Impeachment Inquiry;United States International Relations;Presidential Election of 2020;Espionage and Intelligence Services;Ukraine;Impeachment;Zelensky, Volodymyr
## 5 Trump-Ukraine Whistle-Blower Complaint and Impeachment Inquiry;Trump, Donald J;Zelensky, Volodymyr;Ukraine
## 6 Trump, Donald J;Turkey;Erdogan, Recep Tayyip
## results.column results.section
## 1 <NA> U.S.
## 2 Arts
## 3 <NA> Opinion
## 4 <NA> U.S.
## 5 U.S.
## 6 U.S.
## results.byline
## 1 By MICHAEL S. SCHMIDT and MAGGIE HABERMAN
## 2 By JASON BAILEY
## 3 By WILL WILKINSON
## 4 By JULIAN E. BARNES, MICHAEL S. SCHMIDT, ADAM GOLDMAN and KATIE BENNER
## 5 By THE NEW YORK TIMES
## 6 By THE NEW YORK TIMES
## results.type
## 1 Article
## 2 Interactive
## 3 Article
## 4 Article
## 5 Interactive
## 6 Interactive
## results.title
## 1 Macabre Video of Fake Trump Shooting Media and Critics Is Shown at His Resort
## 2 The 50 Best Movies on Netflix Right Now
## 3 Trump Has Disqualified Himself From Running in 2020
## 4 White House Knew of Whistle-Blower’s Allegations Soon After Trump’s Call With Ukraine Leader
## 5 Document: Read the Whistle-Blower Complaint
## 6 Read Trump’s Letter to President Erdogan of Turkey
## results.abstract
## 1 The video was shown at a conference attended by Donald Trump Jr. and Sarah Huckabee Sanders. They said they did not see it.
## 2 We’ve plucked out the 50 best films currently streaming on Netflix in the United States. Take a look.
## 3 The president’s brazen attempt at cheating has taken “decide it at the ballot box” off the menu. Impeachment is imperative.
## 4 The whistle-blower, a C.I.A. officer detailed to the White House at one point, first expressed his concerns anonymously to the agency’s top lawyer.
## 5 The complaint filed by an intelligence officer about President Trump’s interactions with the leader of Ukraine.
## 6 Trump said he’d written the “very powerful” letter to warn the Turkish leader.
## results.published_date results.source results.id results.asset_id
## 1 2019-10-13 The New York Times 1e+14 1e+14
## 2 2019-03-06 The New York Times 1e+14 1e+14
## 3 2019-10-01 The New York Times 1e+14 1e+14
## 4 2019-09-26 The New York Times 1e+14 1e+14
## 5 2019-09-26 The New York Times 1e+14 1e+14
## 6 2019-10-16 The New York Times 1e+14 1e+14
## results.views
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## results.des_facet
## 1 UNITED STATES POLITICS AND GOVERNMENT, PRESIDENTIAL ELECTION OF 2020, SOCIAL MEDIA
## 2 MOVIES, VIDEO RECORDINGS, DOWNLOADS AND STREAMING
## 3 PRESIDENTIAL ELECTION OF 2020, TRUMP-UKRAINE WHISTLE-BLOWER COMPLAINT AND IMPEACHMENT INQUIRY
## 4 UNITED STATES POLITICS AND GOVERNMENT, TRUMP-UKRAINE WHISTLE-BLOWER COMPLAINT AND IMPEACHMENT INQUIRY, UNITED STATES INTERNATIONAL RELATIONS, IMPEACHMENT
## 5 TRUMP-UKRAINE WHISTLE-BLOWER COMPLAINT AND IMPEACHMENT INQUIRY
## 6
## results.org_facet
## 1 VIDEO RECORDINGS, DOWNLOADS AND STREAMING, VIOLENCE (MEDIA AND ENTERTAINMENT), NEWS AND NEWS MEDIA
## 2 NETFLIX INC
## 3 CORRUPTION (INSTITUTIONAL), SENATE, HOUSE OF REPRESENTATIVES, REPUBLICAN PARTY, DEMOCRATIC PARTY
## 4 CENTRAL INTELLIGENCE AGENCY, PRESIDENTIAL ELECTION OF 2020, ESPIONAGE AND INTELLIGENCE SERVICES
## 5
## 6
## results.per_facet results.geo_facet
## 1 TRUMP, DONALD J, TRUMP, DONALD J JR
## 2
## 3 TRUMP, DONALD J
## 4 TRUMP, DONALD J, ZELENSKY, VOLODYMYR UKRAINE
## 5 TRUMP, DONALD J, ZELENSKY, VOLODYMYR UKRAINE
## 6 TRUMP, DONALD J, ERDOGAN, RECEP TAYYIP TURKEY
## results.media
## 1 image, photo, A video depicting a fake President Trump massacring the news media and his critics was shown at a conference for his supporters at Trump National Doral Miami last week., Ilana Panich-Linsman for The New York Times, 1, https://static01.nyt.com/images/2019/10/13/us/politics/13dc-video1-copy/13dc-video1-copy-thumbStandard-v2.jpg, https://static01.nyt.com/images/2019/10/13/us/politics/13dc-video1-copy/13dc-video1-copy-mediumThreeByTwo210-v3.jpg, https://static01.nyt.com/images/2019/10/13/us/politics/13dc-video1-copy/13dc-video1-copy-mediumThreeByTwo440-v3.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 2 image, photo, Uma Thurman in “Pulp Fiction.”, Linda R. Chen/Miramax Films, 1, https://static01.nyt.com/images/2016/06/19/watching/pulp-fiction-watching-recommendation/pulp-fiction-watching-recommendation-thumbStandard.jpg, https://static01.nyt.com/images/2016/06/19/watching/pulp-fiction-watching-recommendation/pulp-fiction-watching-recommendation-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2016/06/19/watching/pulp-fiction-watching-recommendation/pulp-fiction-watching-recommendation-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 3 image, photo, Mitch McConnell and the other Republicans in the Senate would be the decisive votes on President Trump’s fate if he is formally impeached., Anna Moneymaker/The New York Times, 1, https://static01.nyt.com/images/2019/10/01/opinion/01wilkinson/01wilkinson-thumbStandard.jpg, https://static01.nyt.com/images/2019/10/01/opinion/01wilkinson/01wilkinson-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/10/01/opinion/01wilkinson/01wilkinson-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 4 image, photo, The C.I.A. headquarters in Langley, Va. The whistle-blower is a C.I.A. officer, people familiar with the matter said., Doug Mills The New York Times, 1, https://static01.nyt.com/images/2019/09/26/us/politics/26dc-whistleblower-promo/26dc-whistleblower-promo-thumbStandard-v2.jpg, https://static01.nyt.com/images/2019/09/26/us/politics/26dc-whistleblower-promo/26dc-whistleblower-promo-mediumThreeByTwo210-v2.jpg, https://static01.nyt.com/images/2019/09/26/us/politics/26dc-whistleblower-promo/26dc-whistleblower-promo-mediumThreeByTwo440-v2.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 5 image, photo, , , 0, https://static01.nyt.com/images/2019/09/26/us/whistleblower-complaint-promo-1569502500532/whistleblower-complaint-promo-1569502500532-thumbStandard-v5.jpg, https://static01.nyt.com/images/2019/09/26/us/whistleblower-complaint-promo-1569502500532/whistleblower-complaint-promo-1569502500532-mediumThreeByTwo210-v5.jpg, https://static01.nyt.com/images/2019/09/26/us/whistleblower-complaint-promo-1569502500532/whistleblower-complaint-promo-1569502500532-mediumThreeByTwo440-v5.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 6 image, photo, , , 0, https://static01.nyt.com/images/2019/10/16/us/white-house-trump-letter-promo-1571261887115/white-house-trump-letter-promo-1571261887115-thumbStandard.jpg, https://static01.nyt.com/images/2019/10/16/us/white-house-trump-letter-promo-1571261887115/white-house-trump-letter-promo-1571261887115-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/10/16/us/white-house-trump-letter-promo-1571261887115/white-house-trump-letter-promo-1571261887115-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## results.uri
## 1 nyt://article/a1f124ab-6902-5b2c-a91f-e1c6dcd471a0
## 2 nyt://interactive/3769fe44-d294-5a71-8281-5220a39673bf
## 3 nyt://article/c4bee0c3-1135-53d2-871a-1d2233d166b7
## 4 nyt://article/a2721152-8ddd-5d14-8dfa-14dd898c746d
## 5 nyt://interactive/8e70e017-a1db-5fd7-b199-3747e7afa8b0
## 6 nyt://interactive/f51c9656-2936-5ee5-b010-30ff48792578
# Take a look at the columns
colnames(r2)
## [1] "status" "copyright"
## [3] "num_results" "results.url"
## [5] "results.adx_keywords" "results.column"
## [7] "results.section" "results.byline"
## [9] "results.type" "results.title"
## [11] "results.abstract" "results.published_date"
## [13] "results.source" "results.id"
## [15] "results.asset_id" "results.views"
## [17] "results.des_facet" "results.org_facet"
## [19] "results.per_facet" "results.geo_facet"
## [21] "results.media" "results.uri"
# The request returned 20 articles (rows) with 22 columns
dim(r2)
## [1] 20 22
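The `status`, `copyright` and `num_results` columns repeat on every row because `data.frame()` recycles the top-level scalars across the rows of the nested `results` table and prefixes its columns with `results.`. The sketch below reproduces that behaviour on a made-up miniature payload (the JSON string is invented for illustration and is not real API output) and shows that the article table can also be taken directly from the `results` element.

```r
library(jsonlite)

# Made-up miniature payload with the same overall shape as the API response
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"title": "First story",  "section": "U.S.", "type": "Article"},
    {"title": "Second story", "section": "Arts", "type": "Interactive"}
  ]
}'

parsed <- fromJSON(sample_json, flatten = TRUE)

# Option 1: keep only the article records; column names carry no prefix
articles <- parsed$results

# Option 2: the approach used above; scalars are recycled on every row
# and the article columns are prefixed with "results."
wide <- data.frame(parsed)

names(articles)
names(wide)
```

Taking `parsed$results` directly avoids carrying the repeated metadata columns through the rest of the analysis.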
# Keep only the columns that are relevant for the analysis
final <- tibble("News_Source"=r2$results.source, "Title"=r2$results.title, "Authors"=r2$results.byline,
"News_type"=r2$results.type, "News_url"=r2$results.url, "News_abstract"=r2$results.abstract,
"News_section"=r2$results.section)
head(final)
## # A tibble: 6 x 7
## News_Source Title Authors News_type News_url News_abstract News_section
## <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 The New Yor~ Macab~ By MIC~ Article https:/~ The video wa~ U.S.
## 2 The New Yor~ The 5~ By JAS~ Interact~ https:/~ We’ve plucke~ Arts
## 3 The New Yor~ Trump~ By WIL~ Article https:/~ The presiden~ Opinion
## 4 The New Yor~ White~ By JUL~ Article https:/~ The whistle-~ U.S.
## 5 The New Yor~ Docum~ By THE~ Interact~ https:/~ The complain~ U.S.
## 6 The New Yor~ Read ~ By THE~ Interact~ https:/~ Trump said h~ U.S.
# Visualize coverage of articles by news type
final %>%
group_by(News_type) %>%
summarize(count=n()) %>%
mutate(percent = (count / sum(count))*100) %>%
ggplot() +
geom_bar(aes(y=percent, x=News_type, fill=News_type), stat = "identity") + coord_flip()
# Visualize coverage of articles covered by sections
final %>%
group_by(News_section) %>%
summarize(count=n()) %>%
mutate(percent = (count / sum(count))*100) %>%
ggplot() +
geom_bar(aes(y=percent, x=News_section, fill=News_section), stat = "identity") + coord_flip()
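The two charts above share one pattern: count the rows per group, convert the counts to percentages, and draw horizontal bars. As a compact sketch of that pattern, the example below uses `count()` and `geom_col()` (equivalent to `geom_bar(stat = "identity")`) on the six rows printed by `head(final)` earlier. The object names `sample_articles`, `section_share` and `section_plot` are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

# Section and type of the six articles shown by head(final)
sample_articles <- tibble(
  News_section = c("U.S.", "Arts", "Opinion", "U.S.", "U.S.", "U.S."),
  News_type    = c("Article", "Interactive", "Article",
                   "Article", "Interactive", "Interactive")
)

# Count rows per section, then express each count as a share of the total
section_share <- sample_articles %>%
  count(News_section, name = "count") %>%
  mutate(percent = count / sum(count) * 100)

# Horizontal bar chart of the shares; print(section_plot) draws it
section_plot <- ggplot(section_share,
                       aes(x = News_section, y = percent, fill = News_section)) +
  geom_col() +
  coord_flip()
```

Because the percentages are computed from the counts in the same table, they always sum to 100 regardless of how many groups are present.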
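# Query the Article Search API for "impeachment" and parse the single response
url_impeachment <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=impeachment&api-key=", name)
impeachment_response <- GET(url_impeachment, accept_json())
impeachment1 <- fromJSON(content(impeachment_response, as = "text", encoding = "UTF-8"), flatten=TRUE) %>% data.frame()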
# Take a look at the columns
colnames(impeachment1)
## [1] "status"
## [2] "copyright"
## [3] "response.docs.web_url"
## [4] "response.docs.snippet"
## [5] "response.docs.lead_paragraph"
## [6] "response.docs.abstract"
## [7] "response.docs.print_page"
## [8] "response.docs.source"
## [9] "response.docs.multimedia"
## [10] "response.docs.keywords"
## [11] "response.docs.pub_date"
## [12] "response.docs.document_type"
## [13] "response.docs.news_desk"
## [14] "response.docs.section_name"
## [15] "response.docs.type_of_material"
## [16] "response.docs._id"
## [17] "response.docs.word_count"
## [18] "response.docs.uri"
## [19] "response.docs.subsection_name"
## [20] "response.docs.headline.main"
## [21] "response.docs.headline.kicker"
## [22] "response.docs.headline.content_kicker"
## [23] "response.docs.headline.print_headline"
## [24] "response.docs.headline.name"
## [25] "response.docs.headline.seo"
## [26] "response.docs.headline.sub"
## [27] "response.docs.byline.original"
## [28] "response.docs.byline.person"
## [29] "response.docs.byline.organization"
## [30] "response.meta.hits"
## [31] "response.meta.offset"
## [32] "response.meta.time"
dim(impeachment1)
## [1] 10 32
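The column names above come from two steps: `flatten = TRUE` collapses nested objects such as `headline` into dotted names like `headline.main`, and `data.frame()` then adds the `response.docs.` and `response.meta.` prefixes. The sketch below shows both pieces on a made-up miniature payload (invented for illustration, not real API output).

```r
library(jsonlite)

# Made-up payload shaped like an Article Search response
sample_search <- '{
  "status": "OK",
  "response": {
    "docs": [
      {"web_url": "https://example.com/a",
       "headline": {"main": "Headline A", "kicker": "News"}},
      {"web_url": "https://example.com/b",
       "headline": {"main": "Headline B", "kicker": "Opinion"}}
    ],
    "meta": {"hits": 2, "offset": 0}
  }
}'

parsed <- fromJSON(sample_search, flatten = TRUE)

# The article records: nested headline fields become headline.main, headline.kicker
docs <- parsed$response$docs
names(docs)

# The total number of matches lives in the metadata, not in the records
total_hits <- parsed$response$meta$hits
```

Each request returns one page of records, which is why `dim(impeachment1)` reports 10 rows even when `response.meta.hits` is much larger.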
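# Search parameters. term matches the query used above. The dates used for the
# original run are not shown in this write-up, so the two values below are
# placeholders in the YYYYMMDD format the API expects.
term <- "impeachment"
begin_date <- "20191001" # placeholder, not the original value
end_date <- "20191027" # placeholder, not the original value
# Concatenate pieces of the url for the api call
baseurl <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",term,
"&begin_date=",begin_date,"&end_date=",end_date,
"&facet_filter=true&api-key=",name)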
# Identify the # of hits to calculate the max pages
initialQuery <- fromJSON(baseurl)
print(initialQuery$response$meta$hits[1]) # returns the total # of hits
## [1] 761
maxPages <- ceiling((initialQuery$response$meta$hits[1] / 10) -1) # reduce by 1 because loop starts with page 0
print(maxPages) # index of the last page (pages are numbered from 0)
## [1] 76
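The page arithmetic can be checked by hand: 761 hits at 10 per page need 77 pages, and because numbering starts at 0 the last page index is 76. A small sketch of the same calculation follows. The helper name `page_indices()` is an assumption introduced for illustration, and the page size of 10 is taken from the 10-row result shown earlier.

```r
# Hypothetical helper: zero-based page indices needed to cover `hits` results
page_indices <- function(hits, per_page = 10) {
  n_pages <- ceiling(hits / per_page)
  seq_len(n_pages) - 1 # empty when there are no hits
}

indices <- page_indices(761)
length(indices) # number of requests needed
max(indices)    # index of the last page
```

Using `seq_len()` also covers the case of zero hits, where `0:maxPages` would otherwise count down from 0 to -1.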
# Loop through the first 11 pages (0 to 10); use 0:maxPages to retrieve every hit
pages <- list()
for(i in 0:10){
  nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame()
  message("Retrieving page ", i)
  pages[[i+1]] <- nytSearch
  Sys.sleep(9) # pause between calls: 3 earlier calls were made within the last minute and the API limit is 10 calls per minute
}
## Retrieving page 0
## Retrieving page 1
## Retrieving page 2
## Retrieving page 3
## Retrieving page 4
## Retrieving page 5
## Retrieving page 6
## Retrieving page 7
## Retrieving page 8
## Retrieving page 9
## Retrieving page 10
# Row-bind the page results into one big data frame
impeachment_search <- rbind_pages(pages)
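A single failed request inside the loop above would stop the whole download. As a hedged sketch of one way to make the loop more tolerant, the example below wraps each fetch in `tryCatch()` and skips pages that fail before combining the rest with `rbind_pages()`. The functions `collect_pages()` and `fake_fetch()` are hypothetical and exist only for this illustration; `fake_fetch()` stands in for the real API call so that the example runs offline.

```r
library(jsonlite)

# Hypothetical helper: fetch each page, skip failures, and combine the rest.
# fetch_page is any function that takes a page index and returns a data frame.
collect_pages <- function(fetch_page, page_indices, pause = 0) {
  pages <- vector("list", length(page_indices))
  for (k in seq_along(page_indices)) {
    i <- page_indices[k]
    result <- tryCatch(
      fetch_page(i),
      error = function(e) {
        message("Skipping page ", i, ": ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) pages[[k]] <- result
    Sys.sleep(pause) # with the real API, pause to respect the rate limit
  }
  # Drop the skipped pages and row-bind what remains
  rbind_pages(Filter(Negate(is.null), pages))
}

# Stand-in for the API call: page 1 fails, the other pages return two rows each
fake_fetch <- function(i) {
  if (i == 1) stop("simulated failure")
  data.frame(page = i, doc = paste0("doc-", i, "-", 1:2),
             stringsAsFactors = FALSE)
}

result <- collect_pages(fake_fetch, 0:2)
```

With the real API, `fetch_page` could be a function that calls `fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE)` and `pause` could be set to the 9 seconds used above.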
# Take a peek at 2 informative columns.
head(impeachment_search , n=10)[c('response.docs.web_url', 'response.docs.snippet')]
## response.docs.web_url
## 1 https://www.nytimes.com/2019/10/18/opinion/adam-schiff-impeachment.html
## 2 https://www.nytimes.com/video/us/politics/100000006770809/warren-and-sanders-back-impeachment-of-trump.html
## 3 https://www.nytimes.com/2019/10/06/opinion/trump-impeachment.html
## 4 https://www.nytimes.com/2019/10/11/opinion/letters/trump-impeachment.html
## 5 https://www.nytimes.com/2019/10/03/opinion/letters/trump-impeachment-democrats.html
## 6 https://www.nytimes.com/2019/10/13/opinion/impeachment-clinton.html
## 7 https://www.nytimes.com/2019/10/12/us/politics/fact-check-impeachment.html
## 8 https://www.nytimes.com/2019/10/10/podcasts/the-daily/impeachment-inquiry-democrats-republicans-kavanaugh.html
## 9 https://www.nytimes.com/2019/10/01/podcasts/the-daily/impeachment-republicans-trump-nixon-clinton.html
## 10 https://www.nytimes.com/2019/10/02/us/politics/trump-impeachment-inquiry.html
## response.docs.snippet
## 1 Public hearings are coming.
## 2 Several of the Democratic candidates offered strong support for the impeachment inquiry of President Trump.
## 3 A president should not be able to stonewall and run out the clock.
## 4 Readers discuss how the House and the Supreme Court should deal with the president’s refusal to cooperate with the inquiry.
## 5 Readers say that we are in a state of anarchy that must be remedied, and that Democrats should broaden their focus beyond Ukraine.
## 6 The 1998 trial damaged Democrats more than Republicans.
## 7 President Trump and his defenders have inaccurately attacked the impeachment inquiry for what they say are procedural and constitutional violations, a faulty premise and a lack of support.
## 8 After the successful confirmation of Brett Kavanaugh, one political operative sees a clear path through the impeachment inquiry. But at what cost?
## 9 As the party’s lawmakers wrestle with how to react to the investigation, we look at what the past may tell us about what’s to come.
## 10 President Trump attacked two House leaders after they threatened to subpoena the White House, and Mike Pompeo, the secretary of state, confirmed he listened in on the call with Ukraine’s president.
# Visualize coverage of impeachment by type of material
impeachment_search %>%
group_by(response.docs.type_of_material) %>%
summarize(count=n()) %>%
mutate(percent = (count / sum(count))*100) %>%
ggplot() +
geom_bar(aes(y=percent, x=response.docs.type_of_material, fill=response.docs.type_of_material), stat = "identity") + coord_flip()
# Visualize coverage of impeachment by section
impeachment_search %>%
group_by(response.docs.section_name) %>%
summarize(count=n()) %>%
mutate(percent = (count / sum(count))*100) %>%
ggplot() +
geom_bar(aes(y=percent, x=response.docs.section_name, fill=response.docs.section_name), stat = "identity") + coord_flip()
about the impeachment stories but the US