This week’s assignment is to access the New York Times (NYT) web API, specifically its movie reviews endpoint. We’ll request data in JSON format and convert it into a data frame.
Let’s load the packages we’ll use:
library(httr)
library(jsonlite)
library(dplyr)
Next, we set up the pieces of our API call: the API key, the endpoint path for movie reviews, and the query parameters:
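# read the API key from an environment variable instead of hard-coding it
# (NYT_API_KEY is a variable name of our choosing)
key <- Sys.getenv('NYT_API_KEY')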
path <- '/svc/movies/v2/reviews/all.json'
query <- list(`api-key` = key,
              offset = 0)
The NYT API allows 10 calls per minute and 4,000 requests per day. If we wanted to gather the entire dataset, we would need to pace the loop, for example by pausing for a minute whenever a request returns a 429 (Too Many Requests) status code.
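A minimal sketch of that pause-and-retry idea, assuming a one-minute wait is enough to clear the per-minute limit. `get_with_retry()` is a hypothetical helper, not part of httr; it takes any function that performs the request, which also makes it testable without calling the API.

```r
# Hypothetical helper (not part of httr): run request_fn() and, when it
# returns HTTP 429 (Too Many Requests), wait and try again.
get_with_retry <- function(request_fn, max_tries = 3, wait_seconds = 60) {
  stopifnot(max_tries >= 1)
  for (attempt in seq_len(max_tries)) {
    response <- request_fn()
    # httr response objects keep the HTTP status in $status_code
    if (response$status_code != 429) {
      return(response)
    }
    # rate limited: pause before retrying (no pause after the final attempt)
    if (attempt < max_tries) {
      Sys.sleep(wait_seconds)
    }
  }
  # every attempt was rate limited; hand back the last 429 response
  response
}

# usage with httr (needs network access and a valid key):
# response <- get_with_retry(function() GET(url))
```

Passing a function rather than a URL keeps the retry logic separate from how the request is built.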
For this exercise, we’ll only make 10 requests of 20 reviews each, which should give us a 200-row dataset of the movies most recently reviewed by the NYT.
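The paging arithmetic behind that estimate, as a quick check: request i starts at offset i × 20, so ten requests cover offsets 0 through 180. The page size of 20 is taken from the text above.

```r
# Assumption from the text: 20 reviews per request, 10 requests in total
page_size <- 20
n_requests <- 10

# offset for the i-th request (i = 0, 1, ..., 9) is i * page_size
offsets <- (seq_len(n_requests) - 1) * page_size
print(offsets)      # 0 20 40 ... 180

# rows expected if every request succeeds
total_rows <- n_requests * page_size
print(total_rows)   # 200
```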
# collect one data frame per page, then combine them once at the end
pages <- list()
for (i in 0:9) {
  # set the offset parameter based on the loop number (20 results per page)
  query$offset <- i * 20
  # build the url to be called
  url <- modify_url('https://api.nytimes.com',
                    path = path,
                    query = query)
  # call the API
  response <- GET(url)
  # exit the loop if the response is not a success (e.g. 429 when rate limited)
  if (status_code(response) != 200) {
    message('Request ', i, ' failed with status ', status_code(response))
    break
  }
  # decode the response body as UTF-8 text and parse the JSON
  json <- fromJSON(content(response, as = 'text', encoding = 'UTF-8'))
  # flatten the nested columns (link, multimedia) and store this page
  pages[[i + 1]] <- flatten(json$results)
}
# combine all pages into a single data frame
df <- bind_rows(pages)
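To see where columns such as `link.url` in the preview come from, here is a sketch using a tiny hand-written JSON string. The payload is made up: it only mimics the shape of the `results` array, with a few fields and placeholder values.

```r
library(jsonlite)

# made-up payload shaped like one page of results (placeholder values)
sample_json <- '{
  "status": "OK",
  "results": [
    {"display_title": "Movie A", "critics_pick": 1,
     "link": {"type": "article", "url": "https://example.com/a"}},
    {"display_title": "Movie B", "critics_pick": 0,
     "link": {"type": "article", "url": "https://example.com/b"}}
  ]
}'

parsed <- fromJSON(sample_json)
# parsed$results is a data frame whose `link` column is itself a data frame
str(parsed$results)

# flatten() turns each nested field into its own column, e.g. link.url
flat <- flatten(parsed$results)
names(flat)
```

`fromJSON()` also accepts `flatten = TRUE`, which does the same in a single call.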
# preview the dataset
str(df)
## 'data.frame': 200 obs. of 16 variables:
## $ display_title : chr "Dosed" "Blow the Man Down" "Justine" "Human Capital" ...
## $ mpaa_rating : chr "" "R" "" "" ...
## $ critics_pick : int 0 0 0 0 0 0 1 1 0 0 ...
## $ byline : chr "Ben Kenigsberg" "Helen T. Verongos" "Natalia Winkelman" "Wesley Morris" ...
## $ headline : chr "‘Dosed’ Review: The Case for Plant-Based Recovery" "‘Blow the Man Down’ Review: Women, They Get the Job Done" "‘Justine’ Review: A Bittersweet Intersection of Lonely Lives" "‘Human Capital’ Review: The Waiter’s in a Coma. Tennis Anyone?" ...
## $ summary_short : chr "A documentarian follows a friend of his as she experiments with psychoactive vegetation as a treatment for drug addiction." "A fishing village in Maine with a rowdy history is the raw setting for a smart fable about who’s really in charge." "A single mother befriends a young girl with spina bifida in this earnest character study." "This movie seems set to deliver riffs on class, but offers up only indifference and an amusing turn by Marisa Tomei." ...
## $ publication_date : chr "2020-03-19" "2020-03-19" "2020-03-19" "2020-03-19" ...
## $ opening_date : chr NA "2020-03-20" "2020-03-13" "2020-03-20" ...
## $ date_updated : chr "2020-03-19 12:04:03" "2020-03-19 12:04:02" "2020-03-19 11:04:03" "2020-03-19 11:04:02" ...
## $ link.type : chr "article" "article" "article" "article" ...
## $ link.url : chr "http://www.nytimes.com/2020/03/19/movies/dosed-review.html" "http://www.nytimes.com/2020/03/19/movies/blow-the-man-down-review.html" "http://www.nytimes.com/2020/03/19/movies/justine-review.html" "http://www.nytimes.com/2020/03/19/movies/human-capital-review.html" ...
## $ link.suggested_link_text: chr "Read the New York Times Review of Dosed" "Read the New York Times Review of Blow the Man Down" "Read the New York Times Review of Justine" "Read the New York Times Review of Human Capital" ...
## $ multimedia.type : chr "mediumThreeByTwo210" "mediumThreeByTwo210" "mediumThreeByTwo210" "mediumThreeByTwo210" ...
## $ multimedia.src : chr "https://static01.nyt.com/images/2020/03/17/arts/dosed1/dosed1-mediumThreeByTwo210.jpg" "https://static01.nyt.com/images/2020/03/20/arts/19blowthemanpix/19blowthemanpix-mediumThreeByTwo210.jpg" "https://static01.nyt.com/images/2020/03/20/arts/justine1/justine1-mediumThreeByTwo210.jpg" "https://static01.nyt.com/images/2020/03/20/arts/00humancapital/00humancapital-mediumThreeByTwo210.jpg" ...
## $ multimedia.width : int 210 210 210 210 210 210 210 210 210 210 ...
## $ multimedia.height : int 140 140 140 140 140 140 140 140 140 140 ...
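The preview shows the dates stored as character strings and `critics_pick` as a 0/1 integer. A minimal sketch of converting them, using a hypothetical `tidy_reviews()` helper and a two-row sample copied from the output above (other columns omitted):

```r
# Hypothetical helper: give the date and flag columns from the str() preview
# more useful types. opening_date can be NA, which as.Date() keeps as NA.
tidy_reviews <- function(df) {
  df$publication_date <- as.Date(df$publication_date)
  df$opening_date <- as.Date(df$opening_date)
  df$critics_pick <- df$critics_pick == 1
  df
}

# two rows copied from the preview above
reviews_sample <- data.frame(
  display_title = c("Dosed", "Blow the Man Down"),
  critics_pick = c(0L, 0L),
  publication_date = c("2020-03-19", "2020-03-19"),
  opening_date = c(NA, "2020-03-20"),
  stringsAsFactors = FALSE
)

str(tidy_reviews(reviews_sample))
```

Applied to the full dataset, this would be `df <- tidy_reviews(df)`.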