Overview

Given the list of APIs found here, I’ve chosen the Most Popular API, and specifically the endpoint that returns the most viewed articles over the last seven days.

library(httr)

api_link <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key="

# Append the stored key to the endpoint and make the request
data <- GET(paste0(api_link, Sys.getenv("key")))
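Before parsing the response, it’s worth a quick check that the request actually succeeded. A minimal sketch using httr’s status helpers (the error message wording here is my own):

if (http_error(data)) {
  stop(paste("Request failed with HTTP status", status_code(data)))
}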

Note: I’ve stored my API key in a system environment variable called key. To replicate this .rmd, you would need to set your own key in that variable before running the code chunks.
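For reference, one way to set that variable (the placeholder value below is hypothetical): either per session with Sys.setenv(), or persistently via your ~/.Renviron file.

Sys.setenv(key = "your-nyt-api-key-here") # current session only
# or add a line like this to ~/.Renviron, then restart R:
# key=your-nyt-api-key-here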

JSON -> Data Frame

Using jsonlite, let’s try to turn this into a data frame.

library(jsonlite)

d <- fromJSON(rawToChar(data$content)) # convert the raw bytes to a character string, then parse the JSON
json_df <- as.data.frame(d$results)    # pull the results element into a data frame
head(json_df, 3)

Looks like we got it into a data frame!
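As a quick sanity check on the shape of what came back (the exact columns can vary as the API evolves):

dim(json_df)   # rows x columns
names(json_df) # available fields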

Just as a quick peek, what sections are most popular according to this dataset?

library(dplyr)
sections <- json_df |>
  group_by(section) |>                        # tally articles by section
  summarise(count = n(), .groups = "drop") |>
  arrange(desc(count))                        # most-covered sections first

head(sections, 10)

Looks like U.S. news is the most popular section, with 6 of the top articles!
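As an aside, dplyr’s count() wraps this group_by()/summarise()/arrange() pattern into a single call that produces the same table:

count(json_df, section, sort = TRUE, name = "count")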

Conclusion & Next Steps

The API response includes a bit more metadata than the request calls for, but once we drilled into the results element of the JSON, it was simple enough to get everything into a data frame. The next steps would definitely be to clean this table up and maybe create a relational database structure to connect fields like media (currently just a list of images nested in each article) as well as the different geo_facet values an article can have.
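As a sketch of what that normalization could look like, here is one possible first pass using tidyr. It assumes the results include an id column to key on and that fromJSON returned media as a list-column of data frames; both held in my run, but verify against your own response.

library(tidyr)

# Hypothetical relational layout: one table per nested field, keyed by article id
media_tbl <- json_df |>
  select(id, media) |>
  unnest(media)             # one row per (article, media item) pair

geo_tbl <- json_df |>
  select(id, geo_facet) |>
  unnest_longer(geo_facet)  # one row per (article, geo_facet value) pair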

One thing I would have liked to know is how popular one article is relative to another. The API doesn’t expose per-article view counts, but maybe the NYT purposefully leaves that information out because it’s something they’d like to analyze themselves.