The New York Times website provides a rich set of APIs, as described here: https://developer.nytimes.com/apis. You'll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R data frame.
library(httr)
library(jsonlite)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library("stringr")
library("ggplot2")
Web APIs are a valuable way to collect data from websites automatically. To use the New York Times APIs, I created a developer account, registered an application, and activated the "Most Popular API". This produced the API key used below to interact with the selected API.
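For this write-up the key is pasted directly into the script so the results are reproducible. A safer alternative, sketched below, is to keep the key out of the source entirely; the sketch assumes the key has been saved in an environment variable named NYT_API_KEY (a name chosen here, not something the API requires), for example in ~/.Renviron.
# Alternative sketch: read the key from an environment variable instead of
# hardcoding it. NYT_API_KEY is an assumed name set in ~/.Renviron.
apikey <- Sys.getenv("NYT_API_KEY")
if (apikey == "") stop("Set NYT_API_KEY before running this script.")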
# API Key
apikey <- "EE8AqkYM74AA8jrTpENJXlUPbWzKO1Pd"
# Send the GET request to the Most Popular API (articles shared on Facebook over the last day)
most_popular <- GET(paste0("https://api.nytimes.com/svc/mostpopular/v2/shared/1/facebook.json?api-key=", apikey))
# Get status code
most_popular$status_code
## [1] 200
summary(most_popular)
## Length Class Mode
## url 1 -none- character
## status_code 1 -none- numeric
## headers 24 insensitive list
## all_headers 1 -none- list
## cookies 7 data.frame list
## content 35968 -none- raw
## date 1 POSIXct numeric
## times 6 -none- numeric
## request 7 request list
## handle 1 curl_handle externalptr
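Before extracting and parsing the body, it is worth confirming that the request actually succeeded and returned JSON. httr's stop_for_status() and http_type() helpers cover this; a minimal sketch:
# Fail early with an informative error if the request was not successful,
# and confirm the response body is JSON before parsing it.
stop_for_status(most_popular)
if (http_type(most_popular) != "application/json") {
  stop("The Most Popular API did not return JSON.")
}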
most_popular_json <- content(most_popular, as = "text", encoding = "UTF-8")
The response body is JSON, so jsonlite is used to parse it: fromJSON() converts the JSON text into an R object. Setting flatten = TRUE collapses the nested fields into ordinary columns, which simplifies the structure for analysis in R.
df <- fromJSON(most_popular_json, flatten = TRUE)
df <- data.frame(df$results, stringsAsFactors = FALSE)
#Get column names
colnames(df)
## [1] "uri" "url" "id" "asset_id"
## [5] "source" "published_date" "updated" "section"
## [9] "subsection" "nytdsection" "adx_keywords" "column"
## [13] "byline" "type" "title" "abstract"
## [17] "des_facet" "org_facet" "per_facet" "geo_facet"
## [21] "media" "eta_id"
# Rename the first ten columns; the remaining columns are dropped below
colnames(df)[1:10] <- c("URI", "URL", "ID", "asset_ID", "Source", "Published_Date", "Updated", "Section", "Subsection", "NYTDsection")
#Drop columns not needed
popular_df <- df[, -c(11:22)]
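The same selection and renaming can also be done in a single dplyr step, which refers to the API's column names directly instead of relying on column positions. A sketch assuming the column names printed above:
# Equivalent cleanup with dplyr: keep only the first ten fields and rename them.
popular_df <- df %>%
  select(URI = uri, URL = url, ID = id, asset_ID = asset_id,
         Source = source, Published_Date = published_date, Updated = updated,
         Section = section, Subsection = subsection, NYTDsection = nytdsection)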
Next, the popular articles are analyzed by 'Section' and 'Published Date', using dplyr for summarising and ggplot2 for visualization.
# Count of articles by published date
NYT_Published_Date <- popular_df %>%
  group_by(Published_Date) %>%
  summarise(num = n()) %>%
  arrange(desc(num))
print(NYT_Published_Date)
## # A tibble: 3 × 2
## Published_Date num
## <chr> <int>
## 1 2023-10-27 10
## 2 2023-10-26 7
## 3 2023-10-25 3
ggplot(data = NYT_Published_Date, aes(x = Published_Date, y = num, fill = Published_Date)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = num)) +
  ggtitle("Number of Popular Articles Shared on Facebook, by Published Date") +
  ylab("Frequency")
# Count of articles by section
NYTsection <- popular_df %>%
  group_by(Section) %>%
  summarise(num = n()) %>%
  arrange(desc(num))
print(NYTsection)
## # A tibble: 9 × 2
## Section num
## <chr> <int>
## 1 Opinion 7
## 2 World 3
## 3 Arts 2
## 4 Travel 2
## 5 U.S. 2
## 6 New York 1
## 7 Real Estate 1
## 8 Technology 1
## 9 Well 1
ggplot(data = NYTsection, aes(x = Section, y = num, fill = Section)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = num)) +
  ggtitle("Most Popular Articles Shared on Facebook, by Section") +
  ylab("Frequency")
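By default ggplot2 orders the section bars alphabetically. Reordering them by frequency makes the ranking easier to read; a small variation on the same plot:
# Optional variation: order the bars by count so the most popular section comes first.
ggplot(data = NYTsection, aes(x = reorder(Section, -num), y = num, fill = Section)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = num)) +
  ggtitle("Most Popular Articles Shared on Facebook, by Section") +
  xlab("Section") +
  ylab("Frequency")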
Facebook is one of the largest social media platforms in the world, with roughly 2.4 billion users reported in 2019, and it has changed how information spreads. The New York Times has made effective use of the rapid, widespread adoption of these platforms to share its most popular articles with the general public, who can easily access the latest information.