title: “Dat 607 HW 9 - Web APIs”

author: “Sufian”

date: “10/22/2019”

output: html_document

Rpub links:

http://rpubs.com/ssufian/543492

Problem Statement

Read JSON data from a New York Times web API into R and transform it to an R dataframe.

#Load libraries
library(httr)
library(jsonlite)
library(tidyr)
library(lubridate)
library(rvest)
library(dplyr)
library(ggplot2)
library(tidyverse)
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")

Testing the GET function to see if it works

URL <- 'https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key='

name <- "MfbAm3jsAPQ0UZ3kAxRGZ8SdbxcDVWQD"

# paste0() has no sep argument; it always joins with an empty string
myurl <- paste0(URL, name)

r <- GET(myurl)


r
## Response [https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key=MfbAm3jsAPQ0UZ3kAxRGZ8SdbxcDVWQD]
##   Date: 2019-10-27 00:41
##   Status: 200
##   Content-Type: application/json; charset=UTF-8
##   Size: 39.4 kB
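
The key above is written directly into the script, so it also shows up in the printed response. One common alternative is to read it from an environment variable. The sketch below assumes a variable named NYT_API_KEY and a helper called build_nyt_url; both names are hypothetical and only serve the illustration.

```r
# Hypothetical helper: builds the request URL from a key stored in an
# environment variable (NYT_API_KEY is an assumed name, set for example
# in ~/.Renviron) so the key never appears in the source file.
build_nyt_url <- function(base_url, key = Sys.getenv("NYT_API_KEY")) {
  if (!nzchar(key)) {
    stop("NYT_API_KEY is not set")
  }
  paste0(base_url, key)
}

base_url <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/30.json?api-key="

# Placeholder value for illustration only; a real key would be set outside the script
Sys.setenv(NYT_API_KEY = "demo-key")
demo_url <- build_nyt_url(base_url)
```

The resulting demo_url can be passed to GET() in the same way as myurl.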

Error Check

# check for error (TRUE if the status code is 400 or above)
http_error(r)
## [1] FALSE
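
As a rough illustration of the rule behind this check, the test comes down to comparing the status code with 400. The helper is_error_status below is hypothetical and not part of httr; it only sketches the comparison.

```r
# Hypothetical helper, not part of httr: mirrors the rule that
# status codes of 400 and above count as errors
is_error_status <- function(status) {
  status >= 400
}

ok_check <- is_error_status(200)        # the status returned above
missing_check <- is_error_status(404)   # a typical client error code
```

With a real response object, httr::stop_for_status(r) could be used to halt the script on such a status instead of returning TRUE.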

Using jsonlite to get articles from the NY Times and transforming them into a dataframe

r2 <- fromJSON(myurl, flatten=TRUE) %>% data.frame()

head(r2)
##   status
## 1     OK
## 2     OK
## 3     OK
## 4     OK
## 5     OK
## 6     OK
##                                                              copyright
## 1 Copyright (c) 2019 The New York Times Company.  All Rights Reserved.
## 2 Copyright (c) 2019 The New York Times Company.  All Rights Reserved.
## 3 Copyright (c) 2019 The New York Times Company.  All Rights Reserved.
## 4 Copyright (c) 2019 The New York Times Company.  All Rights Reserved.
## 5 Copyright (c) 2019 The New York Times Company.  All Rights Reserved.
## 6 Copyright (c) 2019 The New York Times Company.  All Rights Reserved.
##   num_results
## 1        1881
## 2        1881
## 3        1881
## 4        1881
## 5        1881
## 6        1881
##                                                                                results.url
## 1                          https://www.nytimes.com/2019/10/13/us/politics/trump-video.html
## 2     https://www.nytimes.com/interactive/2019/arts/television/best-movies-on-netflix.html
## 3                   https://www.nytimes.com/2019/10/01/opinion/trump-impeachment-2020.html
## 4                 https://www.nytimes.com/2019/09/26/us/politics/who-is-whistleblower.html
## 5 https://www.nytimes.com/interactive/2019/09/26/us/politics/whistle-blower-complaint.html
## 6      https://www.nytimes.com/interactive/2019/10/16/us/politics/trump-letter-turkey.html
##                                                                                                                                                                                                                                                                               results.adx_keywords
## 1                                        Trump, Donald J;United States Politics and Government;Presidential Election of 2020;Video Recordings, Downloads and Streaming;Violence (Media and Entertainment);Trump, Donald J Jr;Social Media;News and News Media;Kingsman: The Secret Service (Movie)
## 2                                                                                                                                                                                                                                     Netflix Inc;Movies;Video Recordings, Downloads and Streaming
## 3                                                                                        Trump, Donald J;Presidential Election of 2020;Trump-Ukraine Whistle-Blower Complaint and Impeachment Inquiry;Corruption (Institutional);Senate;House of Representatives;Republican Party;Democratic Party
## 4 United States Politics and Government;Central Intelligence Agency;Trump, Donald J;Trump-Ukraine Whistle-Blower Complaint and Impeachment Inquiry;United States International Relations;Presidential Election of 2020;Espionage and Intelligence Services;Ukraine;Impeachment;Zelensky, Volodymyr
## 5                                                                                                                                                                                       Trump-Ukraine Whistle-Blower Complaint and Impeachment Inquiry;Trump, Donald J;Zelensky, Volodymyr;Ukraine
## 6                                                                                                                                                                                                                                                     Trump, Donald J;Turkey;Erdogan, Recep Tayyip
##   results.column results.section
## 1           <NA>            U.S.
## 2                           Arts
## 3           <NA>         Opinion
## 4           <NA>            U.S.
## 5                           U.S.
## 6                           U.S.
##                                                           results.byline
## 1                              By MICHAEL S. SCHMIDT and MAGGIE HABERMAN
## 2                                                        By JASON BAILEY
## 3                                                      By WILL WILKINSON
## 4 By JULIAN E. BARNES, MICHAEL S. SCHMIDT, ADAM GOLDMAN and KATIE BENNER
## 5                                                  By THE NEW YORK TIMES
## 6                                                  By THE NEW YORK TIMES
##   results.type
## 1      Article
## 2  Interactive
## 3      Article
## 4      Article
## 5  Interactive
## 6  Interactive
##                                                                                  results.title
## 1                Macabre Video of Fake Trump Shooting Media and Critics Is Shown at His Resort
## 2                                                      The 50 Best Movies on Netflix Right Now
## 3                                          Trump Has Disqualified Himself From Running in 2020
## 4 White House Knew of Whistle-Blower’s Allegations Soon After Trump’s Call With Ukraine Leader
## 5                                                  Document: Read the Whistle-Blower Complaint
## 6                                           Read Trump’s Letter to President Erdogan of Turkey
##                                                                                                                                      results.abstract
## 1                         The video was shown at a conference attended by Donald Trump Jr. and Sarah Huckabee Sanders. They said they did not see it.
## 2                                               We’ve plucked out the 50 best films currently streaming on Netflix in the United States. Take a look.
## 3                         The president’s brazen attempt at cheating has taken “decide it at the ballot box” off the menu. Impeachment is imperative.
## 4 The whistle-blower, a C.I.A. officer detailed to the White House at one point, first expressed his concerns anonymously to the agency’s top lawyer.
## 5                                     The complaint filed by an intelligence officer about President Trump’s interactions with the leader of Ukraine.
## 6                                                                      Trump said he’d written the “very powerful” letter to warn the Turkish leader.
##   results.published_date     results.source results.id results.asset_id
## 1             2019-10-13 The New York Times      1e+14            1e+14
## 2             2019-03-06 The New York Times      1e+14            1e+14
## 3             2019-10-01 The New York Times      1e+14            1e+14
## 4             2019-09-26 The New York Times      1e+14            1e+14
## 5             2019-09-26 The New York Times      1e+14            1e+14
## 6             2019-10-16 The New York Times      1e+14            1e+14
##   results.views
## 1             1
## 2             2
## 3             3
## 4             4
## 5             5
## 6             6
##                                                                                                                                           results.des_facet
## 1                                                                        UNITED STATES POLITICS AND GOVERNMENT, PRESIDENTIAL ELECTION OF 2020, SOCIAL MEDIA
## 2                                                                                                         MOVIES, VIDEO RECORDINGS, DOWNLOADS AND STREAMING
## 3                                                             PRESIDENTIAL ELECTION OF 2020, TRUMP-UKRAINE WHISTLE-BLOWER COMPLAINT AND IMPEACHMENT INQUIRY
## 4 UNITED STATES POLITICS AND GOVERNMENT, TRUMP-UKRAINE WHISTLE-BLOWER COMPLAINT AND IMPEACHMENT INQUIRY, UNITED STATES INTERNATIONAL RELATIONS, IMPEACHMENT
## 5                                                                                            TRUMP-UKRAINE WHISTLE-BLOWER COMPLAINT AND IMPEACHMENT INQUIRY
## 6                                                                                                                                                          
##                                                                                    results.org_facet
## 1 VIDEO RECORDINGS, DOWNLOADS AND STREAMING, VIOLENCE (MEDIA AND ENTERTAINMENT), NEWS AND NEWS MEDIA
## 2                                                                                        NETFLIX INC
## 3   CORRUPTION (INSTITUTIONAL), SENATE, HOUSE OF REPRESENTATIVES, REPUBLICAN PARTY, DEMOCRATIC PARTY
## 4    CENTRAL INTELLIGENCE AGENCY, PRESIDENTIAL ELECTION OF 2020, ESPIONAGE AND INTELLIGENCE SERVICES
## 5                                                                                                   
## 6                                                                                                   
##                        results.per_facet results.geo_facet
## 1    TRUMP, DONALD J, TRUMP, DONALD J JR                  
## 2                                                         
## 3                        TRUMP, DONALD J                  
## 4   TRUMP, DONALD J, ZELENSKY, VOLODYMYR           UKRAINE
## 5   TRUMP, DONALD J, ZELENSKY, VOLODYMYR           UKRAINE
## 6 TRUMP, DONALD J, ERDOGAN, RECEP TAYYIP            TURKEY
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               results.media
## 1 image, photo, A video depicting a fake President Trump massacring the news media and his critics was shown at a conference for his supporters at Trump National Doral Miami last week., Ilana Panich-Linsman for The New York Times, 1, https://static01.nyt.com/images/2019/10/13/us/politics/13dc-video1-copy/13dc-video1-copy-thumbStandard-v2.jpg, https://static01.nyt.com/images/2019/10/13/us/politics/13dc-video1-copy/13dc-video1-copy-mediumThreeByTwo210-v3.jpg, https://static01.nyt.com/images/2019/10/13/us/politics/13dc-video1-copy/13dc-video1-copy-mediumThreeByTwo440-v3.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 2                                                     image, photo, Uma Thurman in “Pulp Fiction.”, Linda R. Chen/Miramax Films, 1, https://static01.nyt.com/images/2016/06/19/watching/pulp-fiction-watching-recommendation/pulp-fiction-watching-recommendation-thumbStandard.jpg, https://static01.nyt.com/images/2016/06/19/watching/pulp-fiction-watching-recommendation/pulp-fiction-watching-recommendation-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2016/06/19/watching/pulp-fiction-watching-recommendation/pulp-fiction-watching-recommendation-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 3                                                                                           image, photo, Mitch McConnell and the other Republicans in the Senate would be the decisive votes on President Trump’s fate if he is formally impeached., Anna Moneymaker/The New York Times, 1, https://static01.nyt.com/images/2019/10/01/opinion/01wilkinson/01wilkinson-thumbStandard.jpg, https://static01.nyt.com/images/2019/10/01/opinion/01wilkinson/01wilkinson-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/10/01/opinion/01wilkinson/01wilkinson-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 4                  image, photo, The C.I.A. headquarters in Langley, Va. The whistle-blower is a C.I.A. officer, people familiar with the matter said., Doug Mills The New York Times, 1, https://static01.nyt.com/images/2019/09/26/us/politics/26dc-whistleblower-promo/26dc-whistleblower-promo-thumbStandard-v2.jpg, https://static01.nyt.com/images/2019/09/26/us/politics/26dc-whistleblower-promo/26dc-whistleblower-promo-mediumThreeByTwo210-v2.jpg, https://static01.nyt.com/images/2019/09/26/us/politics/26dc-whistleblower-promo/26dc-whistleblower-promo-mediumThreeByTwo440-v2.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 5                                                                         image, photo,   ,   , 0, https://static01.nyt.com/images/2019/09/26/us/whistleblower-complaint-promo-1569502500532/whistleblower-complaint-promo-1569502500532-thumbStandard-v5.jpg, https://static01.nyt.com/images/2019/09/26/us/whistleblower-complaint-promo-1569502500532/whistleblower-complaint-promo-1569502500532-mediumThreeByTwo210-v5.jpg, https://static01.nyt.com/images/2019/09/26/us/whistleblower-complaint-promo-1569502500532/whistleblower-complaint-promo-1569502500532-mediumThreeByTwo440-v5.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
## 6                                                                              image, photo,  ,  , 0, https://static01.nyt.com/images/2019/10/16/us/white-house-trump-letter-promo-1571261887115/white-house-trump-letter-promo-1571261887115-thumbStandard.jpg, https://static01.nyt.com/images/2019/10/16/us/white-house-trump-letter-promo-1571261887115/white-house-trump-letter-promo-1571261887115-mediumThreeByTwo210.jpg, https://static01.nyt.com/images/2019/10/16/us/white-house-trump-letter-promo-1571261887115/white-house-trump-letter-promo-1571261887115-mediumThreeByTwo440.jpg, Standard Thumbnail, mediumThreeByTwo210, mediumThreeByTwo440, 75, 140, 293, 75, 210, 440
##                                              results.uri
## 1     nyt://article/a1f124ab-6902-5b2c-a91f-e1c6dcd471a0
## 2 nyt://interactive/3769fe44-d294-5a71-8281-5220a39673bf
## 3     nyt://article/c4bee0c3-1135-53d2-871a-1d2233d166b7
## 4     nyt://article/a2721152-8ddd-5d14-8dfa-14dd898c746d
## 5 nyt://interactive/8e70e017-a1db-5fd7-b199-3747e7afa8b0
## 6 nyt://interactive/f51c9656-2936-5ee5-b010-30ff48792578
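
The output above repeats status, copyright and num_results on every row because data.frame() recycles the top-level scalar fields across the article records. The sketch below shows this on a small, made-up JSON string shaped like the response; the payload and the names wide and articles are assumptions for illustration.

```r
library(jsonlite)

# Hypothetical miniature payload with the same shape as the API response
sample_json <- '{
  "status": "OK",
  "num_results": 2,
  "results": [
    {"title": "First story",  "section": "U.S."},
    {"title": "Second story", "section": "Arts"}
  ]
}'

parsed <- fromJSON(sample_json, flatten = TRUE)

# Same approach as above: scalar fields are recycled onto every row
wide <- data.frame(parsed)

# Alternative: keep only the article records
articles <- parsed$results
```

Working from the results element alone avoids the repeated columns and the results. prefix on the column names.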

Peeking into the columns

# Take a look at the columns
colnames(r2)
##  [1] "status"                 "copyright"             
##  [3] "num_results"            "results.url"           
##  [5] "results.adx_keywords"   "results.column"        
##  [7] "results.section"        "results.byline"        
##  [9] "results.type"           "results.title"         
## [11] "results.abstract"       "results.published_date"
## [13] "results.source"         "results.id"            
## [15] "results.asset_id"       "results.views"         
## [17] "results.des_facet"      "results.org_facet"     
## [19] "results.per_facet"      "results.geo_facet"     
## [21] "results.media"          "results.uri"

Check how many articles

# The request returned 20 articles (rows) with 22 columns, although num_results reports 1881 in total
dim(r2) 
## [1] 20 22

Build a dataframe without the irrelevant columns

# Keep only the relevant columns from the NY Times data
final <- tibble("News_Source"=r2$results.source, "Title"=r2$results.title,
                "Authors"=r2$results.byline, "News_type"=r2$results.type,
                "News_url"=r2$results.url, "News_abstract"=r2$results.abstract,
                "News_section"=r2$results.section)

head(final)
## # A tibble: 6 x 7
##   News_Source  Title  Authors News_type News_url News_abstract News_section
##   <chr>        <chr>  <chr>   <chr>     <chr>    <chr>         <chr>       
## 1 The New Yor~ Macab~ By MIC~ Article   https:/~ The video wa~ U.S.        
## 2 The New Yor~ The 5~ By JAS~ Interact~ https:/~ We’ve plucke~ Arts        
## 3 The New Yor~ Trump~ By WIL~ Article   https:/~ The presiden~ Opinion     
## 4 The New Yor~ White~ By JUL~ Article   https:/~ The whistle-~ U.S.        
## 5 The New Yor~ Docum~ By THE~ Interact~ https:/~ The complain~ U.S.        
## 6 The New Yor~ Read ~ By THE~ Interact~ https:/~ Trump said h~ U.S.
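
The same column selection can also be written with dplyr's select(), which renames and subsets in one step. This is a sketch on a made-up two-row stand-in for r2 (demo_r2 is hypothetical), not the code used above.

```r
library(dplyr)

# Hypothetical two-row stand-in for r2
demo_r2 <- data.frame(
  status = "OK",
  results.source = "The New York Times",
  results.title = c("First story", "Second story"),
  results.section = c("U.S.", "Arts")
)

# select() keeps only the named columns and renames them (new = old)
demo_final <- demo_r2 %>%
  as_tibble() %>%
  select(
    News_Source  = results.source,
    Title        = results.title,
    News_section = results.section
  )
```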

Check news types (which ones)

# Visualize coverage of articles by news type 
final %>% 
  group_by(News_type) %>%
  summarize(count=n()) %>%
  mutate(percent = (count / sum(count))*100) %>%
  ggplot() +
  geom_bar(aes(y=percent, x=News_type, fill=News_type), stat = "identity") + coord_flip()

Which sections have the most articles

# Visualize coverage of articles covered by sections
final %>% 
  group_by(News_section) %>%
  summarize(count=n()) %>%
  mutate(percent = (count / sum(count))*100) %>%
  ggplot() +
  geom_bar(aes(y=percent, x=News_section, fill=News_section), stat = "identity") + coord_flip()
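
The numbers behind these bar charts can be inspected without plotting. The sketch below computes the same percentages on a small made-up tibble (demo is a hypothetical stand-in for final), using count() as shorthand for group_by() followed by summarize(n()).

```r
library(dplyr)

# Hypothetical stand-in for the `final` tibble
demo <- tibble(News_section = c("U.S.", "U.S.", "Arts", "Opinion"))

section_share <- demo %>%
  count(News_section, name = "count") %>%      # one row per section
  mutate(percent = count / sum(count) * 100) %>%
  arrange(desc(percent))                       # largest share first
```

Printing section_share before piping it into ggplot() makes it easy to confirm that the percentages add up to 100.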

Let's see whether the “impeachment story” is a big deal

url_impeachment <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=impeachment&api-key=", name)

# Test the request first; this response object is replaced by the parsed data below
impeachment1 <- GET(url_impeachment, accept_json())

impeachment1 <- fromJSON(url_impeachment, flatten=TRUE) %>% data.frame()



# Take a look at the columns
colnames(impeachment1)
##  [1] "status"                               
##  [2] "copyright"                            
##  [3] "response.docs.web_url"                
##  [4] "response.docs.snippet"                
##  [5] "response.docs.lead_paragraph"         
##  [6] "response.docs.abstract"               
##  [7] "response.docs.print_page"             
##  [8] "response.docs.source"                 
##  [9] "response.docs.multimedia"             
## [10] "response.docs.keywords"               
## [11] "response.docs.pub_date"               
## [12] "response.docs.document_type"          
## [13] "response.docs.news_desk"              
## [14] "response.docs.section_name"           
## [15] "response.docs.type_of_material"       
## [16] "response.docs._id"                    
## [17] "response.docs.word_count"             
## [18] "response.docs.uri"                    
## [19] "response.docs.subsection_name"        
## [20] "response.docs.headline.main"          
## [21] "response.docs.headline.kicker"        
## [22] "response.docs.headline.content_kicker"
## [23] "response.docs.headline.print_headline"
## [24] "response.docs.headline.name"          
## [25] "response.docs.headline.seo"           
## [26] "response.docs.headline.sub"           
## [27] "response.docs.byline.original"        
## [28] "response.docs.byline.person"          
## [29] "response.docs.byline.organization"    
## [30] "response.meta.hits"                   
## [31] "response.meta.offset"                 
## [32] "response.meta.time"

Repeat the same process again to see the dimensions

dim(impeachment1) 
## [1] 10 32

Taking October 1 to October 23, 2019 to see how many articles were related to the impeachment story

# Set some parameters to grab all the hits by identifying a date range and max page # to loop through
term <- "impeachment" 
begin_date <- "20191001" # YYYYMMDD
end_date <- "20191023"
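
The YYYYMMDD strings above can also be derived from Date objects, which makes a rolling window such as the last 7 days easy to express. A minimal sketch in base R; the names end_day, begin_day and week_start are assumptions for illustration.

```r
end_day <- as.Date("2019-10-23")
begin_day <- as.Date("2019-10-01")

# format() turns a Date into a YYYYMMDD string like the ones above
begin_date <- format(begin_day, "%Y%m%d")
end_date <- format(end_day, "%Y%m%d")

# A 7-day window ending on end_day would start here instead
week_start <- format(end_day - 7, "%Y%m%d")
```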

Piecing the url together for the API call

# Concatenate pieces of the url for the api call
baseurl <- paste0("https://api.nytimes.com/svc/search/v2/articlesearch.json?q=",term,
                  "&begin_date=",begin_date,"&end_date=",end_date,
                  "&facet_filter=true&api-key=",name)

Determine the max no. of pages

# Identify the # of hits to calculate the max pages 
initialQuery <- fromJSON(baseurl)
print(initialQuery$response$meta$hits[1]) # returns the total # of hits
## [1] 761

Looking at max pages found

maxPages <- ceiling((initialQuery$response$meta$hits[1] / 10) -1) # reduce by 1 because page numbering starts at 0
print(maxPages) # 76 is the last page index
## [1] 76
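
The arithmetic can be checked by hand with the 761 hits reported above and the 10 results per page seen in the earlier search response. A small worked sketch; the names per_page, n_pages and max_page are assumptions for illustration.

```r
hits <- 761       # total hits reported above
per_page <- 10    # rows returned per page in the earlier search response

n_pages <- ceiling(hits / per_page)   # pages needed to cover every hit
max_page <- n_pages - 1               # last page index, since pages start at 0

# Same result as the one-line formula used above
same_as_formula <- max_page == ceiling((hits / per_page) - 1)
```

Pages 0 through 75 hold 760 hits, and page 76 holds the final one.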

For illustrative purposes, I limited the loop to the first 11 pages (0 through 10) so the connection would not time out

# Loop through pages 0 to 10 to collect the hits
pages <- list()
for(i in 0:10){
  nytSearch <- fromJSON(paste0(baseurl, "&page=", i), flatten = TRUE) %>% data.frame() 
  message("Retrieving page ", i)
  pages[[i+1]] <- nytSearch 
  Sys.sleep(9) # because there are 3 previous calls in under a min and the api call limit is 10/min 
}
## Retrieving page 0
## Retrieving page 1
## Retrieving page 2
## Retrieving page 3
## Retrieving page 4
## Retrieving page 5
## Retrieving page 6
## Retrieving page 7
## Retrieving page 8
## Retrieving page 9
## Retrieving page 10
# Row bind the page results into one big dataframe
impeachment_search <- rbind_pages(pages)
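
A single failed request inside the loop would stop the script and lose the pages already collected. The sketch below shows one way to guard against that with tryCatch() before calling rbind_pages(). It runs offline: fetch_page is a hypothetical stand-in for the fromJSON() call, and its made-up output only mimics the shape of a page.

```r
library(jsonlite)

# Hypothetical stand-in for the API call; returns a small data frame per page
fetch_page <- function(i) {
  data.frame(page = i, id = i * 10 + 1:2)
}

demo_pages <- list()
for (i in 0:2) {
  # On error, return NULL instead of stopping the whole script
  result <- tryCatch(fetch_page(i), error = function(e) NULL)
  if (is.null(result)) break   # keep the pages collected so far
  demo_pages[[i + 1]] <- result
}

# Combine the per-page data frames into one
combined <- rbind_pages(demo_pages)
```

In the real loop, the Sys.sleep() pause would stay in place to respect the rate limit.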

# Take a peek at 2 informative columns. 
head(impeachment_search , n=10)[c('response.docs.web_url', 'response.docs.snippet')]  
##                                                                                             response.docs.web_url
## 1                                         https://www.nytimes.com/2019/10/18/opinion/adam-schiff-impeachment.html
## 2     https://www.nytimes.com/video/us/politics/100000006770809/warren-and-sanders-back-impeachment-of-trump.html
## 3                                               https://www.nytimes.com/2019/10/06/opinion/trump-impeachment.html
## 4                                       https://www.nytimes.com/2019/10/11/opinion/letters/trump-impeachment.html
## 5                             https://www.nytimes.com/2019/10/03/opinion/letters/trump-impeachment-democrats.html
## 6                                             https://www.nytimes.com/2019/10/13/opinion/impeachment-clinton.html
## 7                                      https://www.nytimes.com/2019/10/12/us/politics/fact-check-impeachment.html
## 8  https://www.nytimes.com/2019/10/10/podcasts/the-daily/impeachment-inquiry-democrats-republicans-kavanaugh.html
## 9          https://www.nytimes.com/2019/10/01/podcasts/the-daily/impeachment-republicans-trump-nixon-clinton.html
## 10                                  https://www.nytimes.com/2019/10/02/us/politics/trump-impeachment-inquiry.html
##                                                                                                                                                                                    response.docs.snippet
## 1                                                                                                                                                                            Public hearings are coming.
## 2                                                                                            Several of the Democratic candidates offered strong support for the impeachment inquiry of President Trump.
## 3                                                                                                                                     A president should not be able to stonewall and run out the clock.
## 4                                                                            Readers discuss how the House and the Supreme Court should deal with the president’s refusal to cooperate with the inquiry.
## 5                                                                     Readers say that we are in a state of anarchy that must be remedied, and that Democrats should broaden their focus beyond Ukraine.
## 6                                                                                                                                                The 1998 trial damaged Democrats more than Republicans.
## 7           President Trump and his defenders have inaccurately attacked the impeachment inquiry for what they say are procedural and constitutional violations, a faulty premise and a lack of support.
## 8                                                     After the successful confirmation of Brett Kavanaugh, one political operative sees a clear path through the impeachment inquiry. But at what cost?
## 9                                                                    As the party’s lawmakers wrestle with how to react to the investigation, we look at what the past may tell us about what’s to come.
## 10 President Trump attacked two House leaders after they threatened to subpoena the White House, and Mike Pompeo, the secretary of state, confirmed he listened in on the call with Ukraine’s president.

The impeachment articles are mostly of the News type; the impeachment coverage is almost entirely News

impeachment_search  %>% 
  group_by(response.docs.type_of_material) %>%
  summarize(count=n()) %>%
  mutate(percent = (count / sum(count))*100) %>%
  ggplot() +
  geom_bar(aes(y=percent, x=response.docs.type_of_material, fill=response.docs.type_of_material), stat = "identity") + coord_flip()

# Visualize coverage of impeachment by section
impeachment_search  %>% 
  group_by(response.docs.section_name) %>%
  summarize(count=n()) %>%
  mutate(percent = (count / sum(count))*100) %>%
  ggplot() +
  geom_bar(aes(y=percent, x=response.docs.section_name, fill=response.docs.section_name), stat = "identity") + coord_flip()

The impeachment articles are mostly in the U.S. news section. Judging by the sections, the impeachment story looks like a mainly domestic concern, with little coverage filed under world news.