Problem Statement

Extracting content from diverse web resources, cleaning the raw data, preparing it for statistical analysis, and then performing the analysis is far from a simple task. The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/api . The task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.

Acceptance Criteria

Preparing Data

Choose one of the New York Times APIs and request an API key
Construct an interface in R to read in the JSON data
Transform the data into an R data frame

Reproducibility

Using R Markdown text and headers

Workflow

Included a brief description of the assigned problem.
Included an overview of your approach.
Explained your reasoning.
Provided a conclusion (including any findings and recommendations).

Approach

The New York Times provides several Web APIs for querying its content.
These APIs return data in JSON format.
Access requires signing up, which provides an API key.
The Top Stories API is selected for this assignment.
Data is fetched from the Top Stories API into R with the jsonlite package and used for further analysis.
The wordcloud package is used to draw a word cloud of the words that occur most frequently in the story abstracts.
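The JSON-to-data-frame step in this approach can be sketched offline as follows. The payload here is made up and only mimics the general shape assumed for the response (a top-level `results` array of story objects); it is not real New York Times data, and the field names are taken from the ones used later in this document.

```r
library(jsonlite)

# Made-up payload: an assumption about the response shape, not real NYT data.
sample_json <- '{
  "status": "OK",
  "results": [
    {"title": "First story",  "abstract": "An example abstract.", "created_date": "2021-04-10T05:00:11-04:00"},
    {"title": "Second story", "abstract": "Another example.",     "created_date": "2021-04-09T12:30:00-04:00"}
  ]
}'

# fromJSON() simplifies an array of similar objects into a data frame
parsed <- fromJSON(sample_json)
stories <- parsed$results

# Keep a few fields and convert the timestamp's date part to a Date
stories_df <- data.frame(
  Title = stories$title,
  Abstract = stories$abstract,
  Created = as.Date(substr(stories$created_date, 1, 10)),
  stringsAsFactors = FALSE
)
str(stories_df)
```

Each element of `results` becomes one row, so the rest of the analysis can work with ordinary data frame columns.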

Implementation

Load required libraries

library(DT)
library(jsonlite)
library(tidyjson)
library(dplyr)
library(tidyr)
library(ggplot2)
library(wordcloud)
library(RColorBrewer)
library(wordcloud2)
library(tm)
library(textclean)
library(lares)

Load NY Times API credentials from secret management

creds <- get_creds()$`nyt.api`
baseurl <- creds$baseurl
apikey <- creds$apikey
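If a secrets file is not available, one possible alternative is to read the same two values from environment variables. This is only a sketch: the variable names `NYT_BASE_URL` and `NYT_API_KEY` and the helper `read_nyt_credentials()` are hypothetical and are not part of the original workflow.

```r
# Hypothetical helper: reads the base URL and API key from environment
# variables. NYT_BASE_URL and NYT_API_KEY are placeholder names for this sketch.
read_nyt_credentials <- function() {
  baseurl <- Sys.getenv("NYT_BASE_URL", unset = "")
  apikey <- Sys.getenv("NYT_API_KEY", unset = "")
  if (!nzchar(baseurl) || !nzchar(apikey)) {
    stop("Set NYT_BASE_URL and NYT_API_KEY before requesting data.")
  }
  list(baseurl = baseurl, apikey = apikey)
}

# Example with dummy values (not real credentials or a real endpoint)
Sys.setenv(NYT_BASE_URL = "https://example.invalid/topstories/",
           NYT_API_KEY = "dummy-key")
example_creds <- read_nyt_credentials()
```

Either way, the key stays out of the R Markdown source, which matters when the report is published.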

Generic function to fetch the data from Web API

get_data <- function(section) {
  # Build the request URL: <baseurl><section>.json?api-key=<apikey>
  url <- paste0(baseurl, section, ".json?api-key=", apikey)
  response <- fromJSON(URLencode(url))
  stories <- response$results
  # Keep only the fields used in the analysis.
  # data.frame() renames the 'Short URL' column to Short.URL.
  newdata <- data.frame(Subsection = stories$subsection,
                        Title = stories$title,
                        Abstract = stories$abstract,
                        Byline = stories$byline,
                        Created = as.Date(stories$created_date),
                        'Short URL' = stories$short_url,
                        stringsAsFactors = FALSE)
  return(newdata)
}
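Because the analysis covers several sections, the fetch function can be applied to each section and the results stacked into one data frame. The sketch below runs offline by using a stand-in fetcher; `fake_get_data()` and `fetch_sections()` are hypothetical names introduced here, and the section names are taken from the conclusion of this document.

```r
# Stand-in for get_data(): returns a tiny made-up data frame so that the
# sketch runs without network access or an API key.
fake_get_data <- function(section) {
  data.frame(Title = paste("Example", section, "story"),
             Abstract = "Placeholder abstract.",
             stringsAsFactors = FALSE)
}

# Fetch every section with the supplied function and stack the results,
# tagging each row with the section it came from.
fetch_sections <- function(sections, fetch = fake_get_data) {
  per_section <- lapply(sections, function(s) {
    df <- fetch(s)
    df$Section <- s
    df
  })
  do.call(rbind, per_section)
}

all_stories <- fetch_sections(c("automobiles", "books", "sports", "health"))
print(all_stories)
```

With valid credentials loaded, passing `fetch = get_data` would issue one request per section instead of using the stand-in.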

Generic function to plot a word cloud for given text

get_wordcloud <- function(dataframe) {
  abstract <- dataframe$Abstract
  words <- Corpus(VectorSource(abstract))
  # Normalize quotes and expand contractions first, while apostrophes
  # are still present, then strip numbers, punctuation and extra spaces
  words <- words %>%
    tm_map(content_transformer(replace_curly_quote)) %>%
    tm_map(content_transformer(replace_contraction)) %>%
    tm_map(removeNumbers) %>%
    tm_map(removePunctuation) %>%
    tm_map(stripWhitespace)
  words <- tm_map(words, content_transformer(tolower))
  words <- tm_map(words, removeWords, stopwords("english"))
  # Count how often each term occurs across all abstracts
  dtm <- TermDocumentMatrix(words)
  term_matrix <- as.matrix(dtm)
  term_freq <- sort(rowSums(term_matrix), decreasing = TRUE)
  df <- data.frame(word = names(term_freq), freq = term_freq)
  df <- df[-1, ]  # drop the single most frequent term
  set.seed(1234) # for reproducibility
  wordcloud(words = df$word, freq = df$freq, min.freq = 1,
            max.words = 150, random.order = FALSE, rot.per = 0.35,
            colors = brewer.pal(8, "Dark2"))
}
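The frequency table that feeds the word cloud can be illustrated without the tm package. The following base R sketch is a simplified stand-in for the term-document-matrix step, not the function above: the two abstracts are made up, and the stop-word list is a tiny illustrative one rather than `stopwords("english")`.

```r
# Made-up abstracts and a tiny illustrative stop-word list (assumptions)
abstracts <- c("The new electric car is fast.",
               "Electric trucks are the new trend.")
stop_words <- c("the", "is", "are")

# Lowercase, drop punctuation and digits, then split on whitespace
clean <- tolower(abstracts)
clean <- gsub("[[:punct:]]|[[:digit:]]", "", clean)
tokens <- unlist(strsplit(clean, "\\s+"))
tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]

# Count each remaining word and sort from most to least frequent
counts <- sort(table(tokens), decreasing = TRUE)
freq_df <- data.frame(word = names(counts),
                      freq = as.integer(counts),
                      stringsAsFactors = FALSE)
print(freq_df)
```

A data frame of this shape (one `word` column, one `freq` column) is what the `wordcloud()` call receives through its `words` and `freq` arguments.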

Data Analysis

Conclusion

The New York Times Top Stories API provides the top stories for various sections.
Top stories from the Automobiles, Books, Sports, and Health sections are analyzed in this assignment.
Integration through APIs gives a solid foundation: the resulting dataset can be analyzed further.
Additional Web API features, such as query parameters and JSON-formatted responses, are very helpful.
The developer version of the APIs limits how much of the full dataset can be fetched across sections, so only selected sections (Automobiles, Books, Sports, Health) are used in this analysis.

Week 9 Assignment: Web APIs

Ramnivas Singh

04/10/2021
