Web APIs

The New York Times web site provides a rich set of APIs, as described at https://developer.nytimes.com/apis. You'll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.

library(httr)
library(jsonlite)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
library("stringr")
library("ggplot2")

Connecting to the API

Web APIs provide a reliable way to collect data from websites programmatically. To use the NYT APIs, I created a developer account, registered an application, and activated the “Most Popular API”. This produced the API key that is used to interact with the API I selected.
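
As a side note, the key can also be kept out of the source file entirely. The sketch below reads it from an environment variable instead of hardcoding it as done in the next chunk; the variable name NYT_API_KEY is an assumption (e.g., set in ~/.Renviron).

# Hedged alternative: read the API key from an environment variable
# ("NYT_API_KEY" is an assumed name, not part of the original workflow)
apikey <- Sys.getenv("NYT_API_KEY")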

Fetch the most popular articles shared on Facebook from NYTimes.com using the Most Popular API endpoint:

# API Key
apikey <- "EE8AqkYM74AA8jrTpENJXlUPbWzKO1Pd"

# Build the request URL and fetch the most shared (Facebook) articles
most_popular <- GET(paste0("https://api.nytimes.com/svc/mostpopular/v2/shared/1/facebook.json?api-key=", apikey))
# Check the HTTP status code (200 = success)
most_popular$status_code
## [1] 200
summary(most_popular)
##             Length Class       Mode       
## url             1  -none-      character  
## status_code     1  -none-      numeric    
## headers        24  insensitive list       
## all_headers     1  -none-      list       
## cookies         7  data.frame  list       
## content     35968  -none-      raw        
## date            1  POSIXct     numeric    
## times           6  -none-      numeric    
## request         7  request     list       
## handle          1  curl_handle externalptr
# Extract the response body as a JSON string
most_popular_json <- content(most_popular, as = "text")
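
As a defensive variant (a sketch using httr helpers, not part of the original code), the response can be validated before parsing:

# Stop with an informative error on any non-2xx status
stop_for_status(most_popular)
# Confirm the payload is JSON before attempting to parse it
if (http_type(most_popular) != "application/json") {
  stop("The API did not return JSON")
}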

Converting the JSON data to an R data frame

The response body is JSON, so the jsonlite package is used to parse it into an R object. Specifying flatten = TRUE converts the nested fields into regular columns, which simplifies the data structure for easier analysis in R.

# Parse the JSON string and keep the results element as a data frame
df <- fromJSON(most_popular_json, flatten = TRUE)
df <- data.frame(df$results, stringsAsFactors = FALSE)

# Get column names
colnames(df)
##  [1] "uri"            "url"            "id"             "asset_id"      
##  [5] "source"         "published_date" "updated"        "section"       
##  [9] "subsection"     "nytdsection"    "adx_keywords"   "column"        
## [13] "byline"         "type"           "title"          "abstract"      
## [17] "des_facet"      "org_facet"      "per_facet"      "geo_facet"     
## [21] "media"          "eta_id"
# Rename the first ten columns (the original assigned 10 names to all 22
# columns, which errors in R; restrict the assignment to columns 1-10)
colnames(df)[1:10] <- c("URI", "URL", "ID", "asset_ID", "Source", "Published_Date", "Updated", "Section", "Subsection", "NYTDsection")

# Drop columns not needed
popular_df <- df[, -c(11:22)]
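
Since dplyr and ggplot2 are already loaded, the cleaned data can be summarized quickly. The sketch below is an assumption about one possible next step, relying on the renamed Section column above:

# Count the shared articles by section (assumes the Section column above)
section_counts <- popular_df %>%
  count(Section, sort = TRUE)

# Bar chart of sections, most-shared first
ggplot(section_counts, aes(x = reorder(Section, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Section", y = "Number of shared articles",
       title = "Most-shared NYT articles by section")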

Conclusions

Facebook, one of the largest social media platforms in the world with roughly 2.4 billion users in 2019, has changed the world. The New York Times has effectively leveraged the rapid, widespread adoption of such platforms to share its most popular articles with the general public, who can easily access the latest information.