From the list of APIs found here, I’ve chosen the most popular one, and specifically picked the endpoint that returns the most viewed articles over the last seven days.
library(httr)

api_link <- "https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key="
data <- GET(paste0(api_link, Sys.getenv("key"))) # append the key and send the request
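One optional sanity check, not part of the original walkthrough: httr can confirm the request actually succeeded before we try to parse anything.

stop_for_status(data) # errors out on any HTTP 4xx/5xx response
status_code(data)     # should print 200 on success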
Note: I’ve stored my API key in a system environment variable called key. To replicate this .rmd, you would need to set your own key in that variable as well before running the code chunks.
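If you’re following along, here is one way to do that (a sketch; the variable name key is simply what this post happens to use):

Sys.setenv(key = "your-nyt-api-key") # session-only; replace with your actual key
# Or add the line below to ~/.Renviron so it persists across sessions:
# key=your-nyt-api-key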
Using jsonlite, let’s try to turn this into a
dataframe.
library(jsonlite)

d <- fromJSON(rawToChar(data$content)) # raw bytes -> character string -> parsed JSON
json_df <- as.data.frame(d$results) # pull the results into a data frame
head(json_df, 3)
Looks like we got it in a dataframe!
Just as a quick peek, what sections are most popular according to this dataset?
library(dplyr)
sections <- json_df |>
  group_by(section) |>
  summarise(count = n(), .groups = "drop") |>
  arrange(desc(count))
head(sections, 10)
Looks like U.S. news is the most popular with 6 of the top articles!
The API call returns a bit more than the request strictly asks for, but once inside the results JSON, it was simple enough to get everything into a dataframe. The next steps would definitely be to clean this table up and maybe build a relational structure for the nested fields, such as media (currently just a list of pictures attached to each article) and the multiple geo_facet values an article can have; a sketch of that unnesting follows below.
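As a rough sketch of that normalization, tidyr can split those list-columns into their own tables. This assumes media is a list-column of data frames (one row per image), geo_facet is a list of character vectors, and id uniquely identifies each article; none of this is verified against the full response here.

library(tidyr)

# One row per (article, image): unnest the media list-column,
# prefixing its column names to avoid collisions.
media_tbl <- json_df |>
  select(id, media) |>
  unnest(media, names_sep = "_")

# One row per (article, location) from the geo_facet list-column.
geo_tbl <- json_df |>
  select(id, geo_facet) |>
  unnest_longer(geo_facet)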
One thing I would’ve liked to know is how popular each article is relative to the others: the API returns a ranking but no view counts. Maybe the Times purposefully leaves that information out, as it’s something they’d rather analyze themselves.