Assignment 6: Open Movie Database API Tutorial
#Load packages
library(tidyverse)
library(jsonlite)
library(magrittr)
library(httr)
About the Open Movie Database
The Open Movie Database (OMDb), https://www.omdbapi.com, is an API web service that allows users to collect information on various movies. For example, you might want to compare ratings (IMDb, Metacritic, or Rotten Tomatoes) of some of your favorite movies (I’ll demonstrate this later), or look through a given actor’s or director’s filmography.
Walk-through for Setting up the OMDb API
To use the OMDb API, you must first obtain an API key. A tab at the top of the home page will direct you to where you set this up, or you can go to: https://www.omdbapi.com/apikey.aspx. You can access this API for free (daily limit of 1,000 requests) by entering your email to receive your own unique key. After acquiring this, you can begin setting up the API in R.
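The loop later in this tutorial passes the key around as an object called API_KEY. One way to create that object without pasting the key into your script is to read it from an environment variable. This is only a sketch: the variable name OMDB_API_KEY is an assumption for illustration, not something OMDb requires.

```r
# Assumption: the key was saved beforehand in an environment variable named
# OMDB_API_KEY (for example with a line in ~/.Renviron). The name is
# hypothetical; any name works as long as it matches here.
API_KEY <- Sys.getenv("OMDB_API_KEY")

# Sys.getenv() returns an empty string when the variable is not set,
# so flag that early instead of sending requests without a key
if (!nzchar(API_KEY)) {
  message("OMDB_API_KEY is not set; requests to OMDb will likely fail.")
}
```

Keeping the key out of the script also means you can share the code without sharing the key.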
Briefly, before going to R, the OMDb website allows you to see an example of an API request. By entering a movie title (and the release year if needed), you can see what a JSON response looks like and which fields it contains. Here is an example for the movie The Sting.
Request:
http://www.omdbapi.com/?t=the+sting
Response:
{"Title":"The Sting","Year":"1973","Rated":"PG","Released":"25 Dec 1973","Runtime":"129 min","Genre":"Comedy, Crime, Drama","Director":"George Roy Hill","Writer":"David S. Ward","Actors":"Paul Newman, Robert Redford, Robert Shaw","Plot":"Two grifters team up to pull off the ultimate con.","Language":"English","Country":"United States","Awards":"Won 7 Oscars. 18 wins & 6 nominations total","Poster":"https://m.media-amazon.com/images/M/MV5BZGI4OTk4MDMtYmQ1Ni00YTUzLTkyYTktZGUwMjMyN2M4NjQ5XkEyXkFqcGc@._V1_SX300.jpg","Ratings":[{"Source":"Internet Movie Database","Value":"8.2/10"},{"Source":"Rotten Tomatoes","Value":"93%"},{"Source":"Metacritic","Value":"83/100"}],"Metascore":"83","imdbRating":"8.2","imdbVotes":"298,899","imdbID":"tt0070735","Type":"movie","DVD":"N/A","BoxOffice":"$156,000,000","Production":"N/A","Website":"N/A","Response":"True"}
(This is what the response looks like when it is neatly formatted, but you can’t do this through the example feature on the website. You need to include your API key in the request.)
{
  "Title": "The Sting",
  "Year": "1973",
  "Rated": "PG",
  "Released": "25 Dec 1973",
  "Runtime": "129 min",
  "Genre": "Comedy, Crime, Drama",
  "Director": "George Roy Hill",
  "Writer": "David S. Ward",
  "Actors": "Paul Newman, Robert Redford, Robert Shaw",
  "Plot": "Two grifters team up to pull off the ultimate con.",
  "Language": "English",
  "Country": "United States",
  "Awards": "Won 7 Oscars. 18 wins & 6 nominations total",
  "Poster": "https://m.media-amazon.com/images/M/MV5BZGI4OTk4MDMtYmQ1Ni00YTUzLTkyYTktZGUwMjMyN2M4NjQ5XkEyXkFqcGc@._V1_SX300.jpg",
  "Ratings": [
    {
      "Source": "Internet Movie Database",
      "Value": "8.2/10"
    },
    {
      "Source": "Rotten Tomatoes",
      "Value": "93%"
    },
    {
      "Source": "Metacritic",
      "Value": "83/100"
    }
  ],
  "Metascore": "83",
  "imdbRating": "8.2",
  "imdbVotes": "298,899",
  "imdbID": "tt0070735",
  "Type": "movie",
  "DVD": "N/A",
  "BoxOffice": "$156,000,000",
  "Production": "N/A",
  "Website": "N/A",
  "Response": "True"
}
As you can see in the request, the format is pretty simple. Every request you send yourself must include your API key (apikey) and either the title (t) or the IMDb ID (i). All other parameters are optional. These include the type of media (movie, series, or episode), the release year (y), the plot length (short or full), etc.
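To make the optional parameters concrete, here is a small sketch in R that assembles a request URL from named parameters. The helper name build_omdb_url and the placeholder key "YOUR_KEY" are assumptions for illustration; the parameter names (t, y, type, plot, apikey) are the ones OMDb documents.

```r
# Hypothetical helper: turns named parameters into an OMDb query string
build_omdb_url <- function(api_key, ..., endpoint = "http://www.omdbapi.com/") {
  # Collect the optional parameters and append the required API key
  params <- list(..., apikey = api_key)
  # Percent-encode each value so spaces and symbols are URL-safe
  encoded <- vapply(
    params,
    function(x) utils::URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  # Join the name=value pairs with "&" and attach them to the endpoint
  paste0(endpoint, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

# "YOUR_KEY" is a placeholder, not a working key
sting_url <- build_omdb_url("YOUR_KEY", t = "the sting", y = 1973,
                            type = "movie", plot = "short")
sting_url
```

Adding the year is useful when several movies share a title, since the title search returns only one match.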
With an understanding of how this API works, we can now move into R. Bring out the code.
The first step is to build a function that makes a GET request to the API. This is done by putting together a URL that includes the website endpoint, movie title parameter (t), and API key.
#Create a function to make an OMDb API GET request
omdb_api_GET_url <- function(t, api_key) {
  #URL endpoint
  omdb_endpoint <- "http://www.omdbapi.com/"
  #Percent-encode the title so it is safe to use in a URL
  #(reserved = TRUE also encodes characters such as "&")
  t <- URLencode(t, reserved = TRUE)
  #Full URL
  full_url <- paste0(omdb_endpoint, "?t=", t, "&apikey=", api_key)
  return(full_url)
}
The next step is to use this URL function that we just created to build a data frame with our desired fields. You could just end the function after the “fromJSON()” line if you wanted all 25 fields, but since I’m only interested in some of them, I specify those fields with the “select()” function.
#Create a function that turns the URL argument into a data frame
omdb_api_GET_df <- function(t, api_key) {
  #Use previous function
  omdb_api_url <- omdb_api_GET_url(t, api_key)
  #Build df with desired fields
  omdb_df <-
    omdb_api_url %>%
    GET() %>%
    content(as = "text",
            encoding = "UTF-8") %>%
    fromJSON(flatten = TRUE) %>%
    as_tibble() %>%
    select(
      Title,
      Year,
      Rated,
      Genre,
      Director,
      Actors,
      imdbRating,
      Ratings,
      Metascore)
  return(omdb_df)
}
Next, before putting this second function to use, I have to build a vector of the movies I want to pull information on. I have put together a list of some of my favorite movies below.
#Desired movies to pull info on
titles <- c("pulp fiction", "caddyshack", "point break", "the departed", "fight club",
            "dumb and dumber", "the princess bride", "back to the future",
            "the blues brothers", "the big lebowski")
Lastly, I am going to build an empty data frame and then loop my desired movies through the GET_df function to populate it.
#Build empty data frame to populate data in
omdb_films <- data.frame()
#Loop to collect info for desired movie
#(API_KEY must already hold your personal OMDb key)
for(t in titles){
  omdb_films <- bind_rows(omdb_films, omdb_api_GET_df(t, api_key = API_KEY))
  #Rest for 2 seconds between each request
  Sys.sleep(2)
}
#Data pulled from the API
omdb_films
Title Year Rated Genre
1 Pulp Fiction 1994 R Crime, Drama
2 Pulp Fiction 1994 R Crime, Drama
3 Pulp Fiction 1994 R Crime, Drama
4 Caddyshack 1980 R Comedy, Sport
5 Caddyshack 1980 R Comedy, Sport
6 Caddyshack 1980 R Comedy, Sport
7 Point Break 1991 R Action, Crime, Thriller
8 Point Break 1991 R Action, Crime, Thriller
9 Point Break 1991 R Action, Crime, Thriller
10 The Departed 2006 R Crime, Drama, Thriller
11 The Departed 2006 R Crime, Drama, Thriller
12 The Departed 2006 R Crime, Drama, Thriller
13 Fight Club 1999 R Crime, Drama, Thriller
14 Fight Club 1999 R Crime, Drama, Thriller
15 Fight Club 1999 R Crime, Drama, Thriller
16 Dumb and Dumber 1994 PG-13 Comedy
17 Dumb and Dumber 1994 PG-13 Comedy
18 Dumb and Dumber 1994 PG-13 Comedy
19 The Princess Bride 1987 PG Adventure, Comedy, Family
20 The Princess Bride 1987 PG Adventure, Comedy, Family
21 The Princess Bride 1987 PG Adventure, Comedy, Family
22 Back to the Future 1985 PG Adventure, Comedy, Sci-Fi
23 Back to the Future 1985 PG Adventure, Comedy, Sci-Fi
24 Back to the Future 1985 PG Adventure, Comedy, Sci-Fi
25 The Blues Brothers 1980 R Adventure, Comedy, Crime
26 The Blues Brothers 1980 R Adventure, Comedy, Crime
27 The Blues Brothers 1980 R Adventure, Comedy, Crime
28 The Big Lebowski 1998 R Comedy, Crime
29 The Big Lebowski 1998 R Comedy, Crime
30 The Big Lebowski 1998 R Comedy, Crime
Director
1 Quentin Tarantino
2 Quentin Tarantino
3 Quentin Tarantino
4 Harold Ramis
5 Harold Ramis
6 Harold Ramis
7 Kathryn Bigelow
8 Kathryn Bigelow
9 Kathryn Bigelow
10 Martin Scorsese
11 Martin Scorsese
12 Martin Scorsese
13 David Fincher
14 David Fincher
15 David Fincher
16 Peter Farrelly, Bobby Farrelly
17 Peter Farrelly, Bobby Farrelly
18 Peter Farrelly, Bobby Farrelly
19 Rob Reiner
20 Rob Reiner
21 Rob Reiner
22 Robert Zemeckis
23 Robert Zemeckis
24 Robert Zemeckis
25 John Landis
26 John Landis
27 John Landis
28 Joel Coen, Ethan Coen
29 Joel Coen, Ethan Coen
30 Joel Coen, Ethan Coen
Actors imdbRating
1 John Travolta, Uma Thurman, Samuel L. Jackson 8.8
2 John Travolta, Uma Thurman, Samuel L. Jackson 8.8
3 John Travolta, Uma Thurman, Samuel L. Jackson 8.8
4 Chevy Chase, Rodney Dangerfield, Bill Murray 7.2
5 Chevy Chase, Rodney Dangerfield, Bill Murray 7.2
6 Chevy Chase, Rodney Dangerfield, Bill Murray 7.2
7 Patrick Swayze, Keanu Reeves, Gary Busey 7.3
8 Patrick Swayze, Keanu Reeves, Gary Busey 7.3
9 Patrick Swayze, Keanu Reeves, Gary Busey 7.3
10 Leonardo DiCaprio, Matt Damon, Jack Nicholson 8.5
11 Leonardo DiCaprio, Matt Damon, Jack Nicholson 8.5
12 Leonardo DiCaprio, Matt Damon, Jack Nicholson 8.5
13 Brad Pitt, Edward Norton, Meat Loaf 8.8
14 Brad Pitt, Edward Norton, Meat Loaf 8.8
15 Brad Pitt, Edward Norton, Meat Loaf 8.8
16 Jim Carrey, Jeff Daniels, Lauren Holly 7.3
17 Jim Carrey, Jeff Daniels, Lauren Holly 7.3
18 Jim Carrey, Jeff Daniels, Lauren Holly 7.3
19 Cary Elwes, Mandy Patinkin, Robin Wright 8.0
20 Cary Elwes, Mandy Patinkin, Robin Wright 8.0
21 Cary Elwes, Mandy Patinkin, Robin Wright 8.0
22 Michael J. Fox, Christopher Lloyd, Lea Thompson 8.5
23 Michael J. Fox, Christopher Lloyd, Lea Thompson 8.5
24 Michael J. Fox, Christopher Lloyd, Lea Thompson 8.5
25 John Belushi, Dan Aykroyd, Cab Calloway 7.9
26 John Belushi, Dan Aykroyd, Cab Calloway 7.9
27 John Belushi, Dan Aykroyd, Cab Calloway 7.9
28 Jeff Bridges, John Goodman, Julianne Moore 8.1
29 Jeff Bridges, John Goodman, Julianne Moore 8.1
30 Jeff Bridges, John Goodman, Julianne Moore 8.1
Ratings.Source Ratings.Value Metascore
1 Internet Movie Database 8.8/10 95
2 Rotten Tomatoes 92% 95
3 Metacritic 95/100 95
4 Internet Movie Database 7.2/10 48
5 Rotten Tomatoes 73% 48
6 Metacritic 48/100 48
7 Internet Movie Database 7.3/10 60
8 Rotten Tomatoes 68% 60
9 Metacritic 60/100 60
10 Internet Movie Database 8.5/10 85
11 Rotten Tomatoes 91% 85
12 Metacritic 85/100 85
13 Internet Movie Database 8.8/10 67
14 Rotten Tomatoes 81% 67
15 Metacritic 67/100 67
16 Internet Movie Database 7.3/10 41
17 Rotten Tomatoes 69% 41
18 Metacritic 41/100 41
19 Internet Movie Database 8.0/10 78
20 Rotten Tomatoes 96% 78
21 Metacritic 78/100 78
22 Internet Movie Database 8.5/10 88
23 Rotten Tomatoes 93% 88
24 Metacritic 88/100 88
25 Internet Movie Database 7.9/10 60
26 Rotten Tomatoes 71% 60
27 Metacritic 60/100 60
28 Internet Movie Database 8.1/10 71
29 Rotten Tomatoes 79% 71
30 Metacritic 71/100 71
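The loop above assumes every title is found. When OMDb cannot match a title, it returns a short JSON object with "Response" set to "False" and an "Error" message, so the select() step would fail on the missing columns. Below is a minimal sketch of a check you could run on the parsed response before selecting fields. The function name check_omdb_response is hypothetical, and the not-found payload is typed in by hand rather than fetched, so the example runs offline.

```r
library(jsonlite)

# Hypothetical helper: stop with OMDb's own error message when a lookup fails
check_omdb_response <- function(parsed) {
  if (!identical(parsed$Response, "True")) {
    msg <- if (is.null(parsed$Error)) "unknown error" else parsed$Error
    stop("OMDb request failed: ", msg, call. = FALSE)
  }
  # Return the parsed list unchanged so the check can sit inside a pipeline
  invisible(parsed)
}

# Hand-written copy of the payload OMDb sends for an unknown title
not_found <- fromJSON('{"Response":"False","Error":"Movie not found!"}')

# Capture the error message instead of halting the script
result <- tryCatch(
  check_omdb_response(not_found),
  error = function(e) conditionMessage(e)
)
result
```

Inside omdb_api_GET_df(), a check like this would sit between the fromJSON() and as_tibble() steps.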
OMDb API Feature
One thing you’ll notice about the setup of this API is that there are three different ratings sources: Internet Movie Database (IMDb), Rotten Tomatoes, and Metacritic. When the movie data is stored in a data frame, a row is created for each of these rating sources because the JSON response nests the Ratings as an array with one entry per source. So for one movie, there will be three rows. All the other fields remain the same across these three rows; only the source and its rating differ.
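The row tripling can be reproduced without calling the API. The sketch below parses a trimmed, hand-copied version of the response for The Sting shown earlier and converts it to a tibble, the same two steps the GET_df function performs.

```r
library(jsonlite)
library(tibble)

# Trimmed copy of the response for The Sting shown earlier
sting_json <- '{
  "Title": "The Sting",
  "Year": "1973",
  "Ratings": [
    {"Source": "Internet Movie Database", "Value": "8.2/10"},
    {"Source": "Rotten Tomatoes", "Value": "93%"},
    {"Source": "Metacritic", "Value": "83/100"}
  ],
  "Metascore": "83"
}'

parsed <- fromJSON(sting_json, flatten = TRUE)

# The Ratings array is parsed into a data frame with one row per source
ratings_rows <- nrow(parsed$Ratings)

# Converting the list to a tibble recycles the single-value fields
# (Title, Year, Metascore) to match the three Ratings rows
sting_tbl <- as_tibble(parsed)
nrow(sting_tbl)
```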
Now, let’s say I want to see the Metacritic rating of each of the movies I’ve collected to see which ones are ranked the highest. I can unnest the Ratings and then just filter based on the source.
#Metacritic score rankings
omdb_films %>%
  unnest(Ratings) %>%
  filter(Source == "Metacritic") %>%
  select(Title, Value) %>%
  arrange(desc(Value))
# A tibble: 10 × 2
Title Value
<chr> <chr>
1 Pulp Fiction 95/100
2 Back to the Future 88/100
3 The Departed 85/100
4 The Princess Bride 78/100
5 The Big Lebowski 71/100
6 Fight Club 67/100
7 Point Break 60/100
8 The Blues Brothers 60/100
9 Caddyshack 48/100
10 Dumb and Dumber 41/100
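Note that Value is a character column, so the ranking above is a text sort. It happens to match the numeric order here because every score has two digits, but a score such as "100/100" or "9/100" would land in the wrong place. A possible fix, sketched with made-up titles and scores, is to convert the part before the slash to a number first.

```r
library(dplyr)

# Made-up values, chosen only to show where a text sort can go wrong
scores <- tibble::tibble(
  Title = c("Film A", "Film B", "Film C"),
  Value = c("95/100", "100/100", "9/100")
)

ranked <- scores %>%
  # Keep the part before the slash and convert it to a number
  mutate(Score = as.numeric(sub("/.*$", "", Value))) %>%
  # Sort on the numeric column instead of the text column
  arrange(desc(Score))
ranked
```

The same idea applies to the Metascore and imdbRating fields, which also arrive as text.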
Or, maybe I just prefer the IMDb rating, which has its own field.
#IMDb rating rankings
omdb_films %>%
  distinct(Title, imdbRating) %>%
  arrange(desc(imdbRating))
Title imdbRating
1 Pulp Fiction 8.8
2 Fight Club 8.8
3 The Departed 8.5
4 Back to the Future 8.5
5 The Big Lebowski 8.1
6 The Princess Bride 8.0
7 The Blues Brothers 7.9
8 Point Break 7.3
9 Dumb and Dumber 7.3
10 Caddyshack 7.2
Between these two rating sources, there appears to be a consensus on the best and worst rated films: Caddyshack, Dumb and Dumber, Point Break, and The Blues Brothers make up the bottom four (in varying order) in each source, while Pulp Fiction ranks at the top of both. But, well, you know, that’s just like uh, their opinion, man.
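The comparison can also be checked in code rather than by eye. The sketch below retypes the imdbRating and Metascore values from the output above into a small table (so it runs without the API), ranks the films under each source, and compares the bottom four.

```r
library(dplyr)

# Values copied from the output above; both fields arrive as text
ratings <- tibble::tribble(
  ~Title,               ~imdbRating, ~Metascore,
  "Pulp Fiction",       "8.8",       "95",
  "Caddyshack",         "7.2",       "48",
  "Point Break",        "7.3",       "60",
  "The Departed",       "8.5",       "85",
  "Fight Club",         "8.8",       "67",
  "Dumb and Dumber",    "7.3",       "41",
  "The Princess Bride", "8.0",       "78",
  "Back to the Future", "8.5",       "88",
  "The Blues Brothers", "7.9",       "60",
  "The Big Lebowski",   "8.1",       "71"
)

ranked <- ratings %>%
  mutate(
    # Convert text to numbers, then rank so that 1 is the best score
    imdb_rank = min_rank(desc(as.numeric(imdbRating))),
    meta_rank = min_rank(desc(as.numeric(Metascore)))
  )

# With ten films, ranks 7 and above form the bottom four
bottom_imdb <- ranked %>% filter(imdb_rank >= 7) %>% pull(Title)
bottom_meta <- ranked %>% filter(meta_rank >= 7) %>% pull(Title)
setequal(bottom_imdb, bottom_meta)
```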
Maybe you want to look at the movies that contain a specific actor and/or director. There is no overlap of actors or directors in the movies I have selected, but there are a few “Johns.”
#Actors with the name John
omdb_films %>%
  filter(str_detect(Actors, "John")) %>%
  distinct(Title, Actors)
Title Actors
1 Pulp Fiction John Travolta, Uma Thurman, Samuel L. Jackson
2 The Blues Brothers John Belushi, Dan Aykroyd, Cab Calloway
3 The Big Lebowski Jeff Bridges, John Goodman, Julianne Moore
Similarly, you could filter for one or more specific genres.
#Crime or action movies
omdb_films %>%
  filter(str_detect(Genre, "Crime|Action")) %>%
  distinct(Title, Genre)
Title Genre
1 Pulp Fiction Crime, Drama
2 Point Break Action, Crime, Thriller
3 The Departed Crime, Drama, Thriller
4 Fight Club Crime, Drama, Thriller
5 The Blues Brothers Adventure, Comedy, Crime
6 The Big Lebowski Comedy, Crime
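Because Genre holds several values in one comma-separated string, another option is to split it into one row per genre and count. The sketch below retypes the Title and Genre values from the earlier output so it runs without the API; separate_rows() from tidyr does the splitting.

```r
library(dplyr)
library(tidyr)

# Title and Genre values copied from the output earlier in the tutorial
genres <- tibble::tribble(
  ~Title,               ~Genre,
  "Pulp Fiction",       "Crime, Drama",
  "Caddyshack",         "Comedy, Sport",
  "Point Break",        "Action, Crime, Thriller",
  "The Departed",       "Crime, Drama, Thriller",
  "Fight Club",         "Crime, Drama, Thriller",
  "Dumb and Dumber",    "Comedy",
  "The Princess Bride", "Adventure, Comedy, Family",
  "Back to the Future", "Adventure, Comedy, Sci-Fi",
  "The Blues Brothers", "Adventure, Comedy, Crime",
  "The Big Lebowski",   "Comedy, Crime"
)

genre_counts <- genres %>%
  # One row per (Title, Genre) pair
  separate_rows(Genre, sep = ", ") %>%
  # Count how many films carry each genre label, most common first
  count(Genre, sort = TRUE)
genre_counts
```

The same approach works for the Actors and Director fields, which are also comma-separated.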
These are only a few of the analyses you can do with this API, and they use only 9 of the 25 available fields. You can replicate this code with your own movie and variable choices to conduct your own analysis.
Gunga galunga… gunga, gunga-lagunga