Assignment 6: Open Movie Database API Tutorial

Author

Sawyer Hurley

About the Open Movie Database

The Open Movie Database (OMDb), https://www.omdbapi.com, is a web API that lets users retrieve information about movies. For example, you might want to compare the ratings (IMDb, Metacritic, or Rotten Tomatoes) of some of your favorite movies (I’ll demonstrate this later), or look up a given actor’s or director’s filmography.

Walk-through for Setting up the OMDb API

To use the OMDb API, you must first obtain an API key. A tab at the top of the home page will direct you to where you set this up, or you can go to: https://www.omdbapi.com/apikey.aspx. You can access this API for free (daily limit of 1,000 requests) by entering your email to receive your own unique key. After acquiring this, you can begin setting up the API in R.

Briefly, before going to R, the OMDb website allows you to see an example of an API request. By entering a movie title (and the release year if needed), you can see what a JSON response looks like, and the fields it contains. Here is an example for the movie, The Sting.

Request:

http://www.omdbapi.com/?t=the+sting

Response:

{"Title":"The Sting","Year":"1973","Rated":"PG","Released":"25 Dec 1973","Runtime":"129 min","Genre":"Comedy, Crime, Drama","Director":"George Roy Hill","Writer":"David S. Ward","Actors":"Paul Newman, Robert Redford, Robert Shaw","Plot":"Two grifters team up to pull off the ultimate con.","Language":"English","Country":"United States","Awards":"Won 7 Oscars. 18 wins & 6 nominations total","Poster":"https://m.media-amazon.com/images/M/MV5BZGI4OTk4MDMtYmQ1Ni00YTUzLTkyYTktZGUwMjMyN2M4NjQ5XkEyXkFqcGc@._V1_SX300.jpg","Ratings":[{"Source":"Internet Movie Database","Value":"8.2/10"},{"Source":"Rotten Tomatoes","Value":"93%"},{"Source":"Metacritic","Value":"83/100"}],"Metascore":"83","imdbRating":"8.2","imdbVotes":"298,899","imdbID":"tt0070735","Type":"movie","DVD":"N/A","BoxOffice":"$156,000,000","Production":"N/A","Website":"N/A","Response":"True"}

(This is what the response looks like when it is neatly formatted. You can’t get this view through the example feature on the website; you need to send the request yourself, with your API key included.)

{
  "Title": "The Sting",
  "Year": "1973",
  "Rated": "PG",
  "Released": "25 Dec 1973",
  "Runtime": "129 min",
  "Genre": "Comedy, Crime, Drama",
  "Director": "George Roy Hill",
  "Writer": "David S. Ward",
  "Actors": "Paul Newman, Robert Redford, Robert Shaw",
  "Plot": "Two grifters team up to pull off the ultimate con.",
  "Language": "English",
  "Country": "United States",
  "Awards": "Won 7 Oscars. 18 wins & 6 nominations total",
  "Poster": "https://m.media-amazon.com/images/M/MV5BZGI4OTk4MDMtYmQ1Ni00YTUzLTkyYTktZGUwMjMyN2M4NjQ5XkEyXkFqcGc@._V1_SX300.jpg",
  "Ratings": [
    {
      "Source": "Internet Movie Database",
      "Value": "8.2/10"
    },
    {
      "Source": "Rotten Tomatoes",
      "Value": "93%"
    },
    {
      "Source": "Metacritic",
      "Value": "83/100"
    }
  ],
  "Metascore": "83",
  "imdbRating": "8.2",
  "imdbVotes": "298,899",
  "imdbID": "tt0070735",
  "Type": "movie",
  "DVD": "N/A",
  "BoxOffice": "$156,000,000",
  "Production": "N/A",
  "Website": "N/A",
  "Response": "True"
}

As you can see in the request, the format is pretty simple. You must provide your API key (apikey), and then either the title (t) or the IMDb ID (i). All other parameters are optional. These include the type of media (movie, series, or episode), the release year, whether to return the short or full plot, etc.

With an understanding of how this API works, we can now move into R. Bring out the code.

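#Load packages
library(tidyverse) #dplyr, tidyr, stringr, tibble
library(jsonlite)  #fromJSON()
library(magrittr)  #%>% pipe (also made available by tidyverse)
library(httr)      #GET() and content()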
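
The loop further down refers to an API_KEY object that holds your key. One way to define it without pasting the key into your script is to read it from an environment variable. This is only a minimal sketch: the variable name OMDB_API_KEY and the helper get_omdb_key() are my own assumptions, not something OMDb requires.

```r
#Read the OMDb key from an environment variable
#(the name OMDB_API_KEY and this helper are assumptions for illustration)
get_omdb_key <- function(var = "OMDB_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set.", call. = FALSE)
  }
  key
}

#Usage, once the variable is set (for example in your .Renviron file):
#API_KEY <- get_omdb_key()
```

Failing early with a clear message is a design choice here; it avoids sending requests with an empty key and then puzzling over the error that comes back.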

The first step is to write a function that builds the URL for a GET request to the API. The URL combines the website endpoint, the movie title parameter (t), and the API key.

#Create a function to build the URL for an OMDb API GET request
omdb_api_GET_url <- function(t, api_key) {
  
  #URL endpoint
  omdb_endpoint <- "https://www.omdbapi.com/"
  
  #Percent-encode the title so it is safe to place in a URL
  #(reserved = TRUE also encodes characters such as "&" that would break the query)
  t <- URLencode(t, reserved = TRUE)
  
  #Full URL
  full_url <- paste0(omdb_endpoint, "?t=", t, "&apikey=", api_key)
  
  return(full_url)
}
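
The function above sends only the title and the key. If you also want some of the optional parameters, a more general URL builder might look like the sketch below. The helper name build_omdb_query() is hypothetical, and the parameter names y, type, and plot are taken from my reading of the OMDb documentation, so check them against the site before relying on them.

```r
#Build an OMDb query URL from any set of named parameters
#(hypothetical helper; parameter names like y and type are assumed from the OMDb docs)
build_omdb_query <- function(api_key, ..., endpoint = "https://www.omdbapi.com/") {
  params <- c(list(apikey = api_key), list(...))
  
  #Drop any parameters passed as NULL
  params <- params[!vapply(params, is.null, logical(1))]
  
  #Percent-encode every value, including reserved characters
  encoded <- vapply(
    params,
    function(x) URLencode(as.character(x), reserved = TRUE),
    character(1)
  )
  
  paste0(endpoint, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

build_omdb_query("YOUR_KEY", t = "the sting", y = 1973, type = "movie")
#Returns: "https://www.omdbapi.com/?apikey=YOUR_KEY&t=the%20sting&y=1973&type=movie"
```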

The next step is to use the URL function we just created to build a data frame with our desired fields. You could drop the “select()” step if you wanted all 25 fields, but since I’m only interested in some of them, I specify those fields with the “select()” function.

#Create a function that turns the URL argument into a data frame
omdb_api_GET_df <- function(t, api_key) {
  
  #Use previous function
  omdb_api_url <- omdb_api_GET_url(t, api_key)
  
  #Build df with desired fields
  omdb_df <- 
    omdb_api_url %>% 
    GET() %>% 
    content(as = "text", 
            encoding = "UTF-8") %>% 
    fromJSON(flatten = TRUE) %>% 
    as_tibble() %>% 
    select(
      Title,
      Year,
      Rated,
      Genre,
      Director,
      Actors,
      imdbRating,
      Ratings,
      Metascore) 
  
  return(omdb_df)
}
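
One thing the function above does not do is check whether the lookup succeeded. The sample response earlier ends with "Response":"True"; as far as I know, a failed lookup comes back with "Response":"False" plus an "Error" message, but treat that field name as an assumption. A small check like this sketch (the helper parse_omdb_response() is hypothetical) could be run on the response text before it is turned into a tibble.

```r
library(jsonlite)

#Parse OMDb JSON text and stop if the API reports a failure
#(assumes failures carry Response = "False" and an Error field)
parse_omdb_response <- function(json_text) {
  parsed <- fromJSON(json_text)
  if (!identical(parsed$Response, "True")) {
    msg <- if (is.null(parsed$Error)) "unknown error" else parsed$Error
    stop("OMDb request failed: ", msg, call. = FALSE)
  }
  parsed
}

#A trimmed-down version of the sample response from earlier
ok_json <- '{"Title":"The Sting","Year":"1973","Response":"True"}'
parse_omdb_response(ok_json)$Title
#Returns: "The Sting"
```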

Next, before putting this second function to use, I have to build a vector of the movies I want to pull information on. I have put together a list of some of my favorite movies below.

#Desired movies to pull info on
titles <- c("pulp fiction", "caddyshack", "point break", "the departed", "fight club", 
            "dumb and dumber", "the princess bride", "back to the future", 
            "the blues brothers", "the big lebowski")

Lastly, I am going to build an empty data frame and then loop my desired movies through the GET_df function to populate it.

#Build empty data frame to populate data in
omdb_films <- data.frame()

#Loop to collect info for each desired movie
#(API_KEY must already hold your OMDb key as a string)
for(t in titles){
  
  omdb_films <- bind_rows(omdb_films, omdb_api_GET_df(t, api_key = API_KEY))
  
  #Rest for 2 seconds between each request
  Sys.sleep(2)
}
#Data pulled from the API
omdb_films
                Title Year Rated                     Genre
1        Pulp Fiction 1994     R              Crime, Drama
2        Pulp Fiction 1994     R              Crime, Drama
3        Pulp Fiction 1994     R              Crime, Drama
4          Caddyshack 1980     R             Comedy, Sport
5          Caddyshack 1980     R             Comedy, Sport
6          Caddyshack 1980     R             Comedy, Sport
7         Point Break 1991     R   Action, Crime, Thriller
8         Point Break 1991     R   Action, Crime, Thriller
9         Point Break 1991     R   Action, Crime, Thriller
10       The Departed 2006     R    Crime, Drama, Thriller
11       The Departed 2006     R    Crime, Drama, Thriller
12       The Departed 2006     R    Crime, Drama, Thriller
13         Fight Club 1999     R    Crime, Drama, Thriller
14         Fight Club 1999     R    Crime, Drama, Thriller
15         Fight Club 1999     R    Crime, Drama, Thriller
16    Dumb and Dumber 1994 PG-13                    Comedy
17    Dumb and Dumber 1994 PG-13                    Comedy
18    Dumb and Dumber 1994 PG-13                    Comedy
19 The Princess Bride 1987    PG Adventure, Comedy, Family
20 The Princess Bride 1987    PG Adventure, Comedy, Family
21 The Princess Bride 1987    PG Adventure, Comedy, Family
22 Back to the Future 1985    PG Adventure, Comedy, Sci-Fi
23 Back to the Future 1985    PG Adventure, Comedy, Sci-Fi
24 Back to the Future 1985    PG Adventure, Comedy, Sci-Fi
25 The Blues Brothers 1980     R  Adventure, Comedy, Crime
26 The Blues Brothers 1980     R  Adventure, Comedy, Crime
27 The Blues Brothers 1980     R  Adventure, Comedy, Crime
28   The Big Lebowski 1998     R             Comedy, Crime
29   The Big Lebowski 1998     R             Comedy, Crime
30   The Big Lebowski 1998     R             Comedy, Crime
                         Director
1               Quentin Tarantino
2               Quentin Tarantino
3               Quentin Tarantino
4                    Harold Ramis
5                    Harold Ramis
6                    Harold Ramis
7                 Kathryn Bigelow
8                 Kathryn Bigelow
9                 Kathryn Bigelow
10                Martin Scorsese
11                Martin Scorsese
12                Martin Scorsese
13                  David Fincher
14                  David Fincher
15                  David Fincher
16 Peter Farrelly, Bobby Farrelly
17 Peter Farrelly, Bobby Farrelly
18 Peter Farrelly, Bobby Farrelly
19                     Rob Reiner
20                     Rob Reiner
21                     Rob Reiner
22                Robert Zemeckis
23                Robert Zemeckis
24                Robert Zemeckis
25                    John Landis
26                    John Landis
27                    John Landis
28          Joel Coen, Ethan Coen
29          Joel Coen, Ethan Coen
30          Joel Coen, Ethan Coen
                                            Actors imdbRating
1    John Travolta, Uma Thurman, Samuel L. Jackson        8.8
2    John Travolta, Uma Thurman, Samuel L. Jackson        8.8
3    John Travolta, Uma Thurman, Samuel L. Jackson        8.8
4     Chevy Chase, Rodney Dangerfield, Bill Murray        7.2
5     Chevy Chase, Rodney Dangerfield, Bill Murray        7.2
6     Chevy Chase, Rodney Dangerfield, Bill Murray        7.2
7         Patrick Swayze, Keanu Reeves, Gary Busey        7.3
8         Patrick Swayze, Keanu Reeves, Gary Busey        7.3
9         Patrick Swayze, Keanu Reeves, Gary Busey        7.3
10   Leonardo DiCaprio, Matt Damon, Jack Nicholson        8.5
11   Leonardo DiCaprio, Matt Damon, Jack Nicholson        8.5
12   Leonardo DiCaprio, Matt Damon, Jack Nicholson        8.5
13             Brad Pitt, Edward Norton, Meat Loaf        8.8
14             Brad Pitt, Edward Norton, Meat Loaf        8.8
15             Brad Pitt, Edward Norton, Meat Loaf        8.8
16          Jim Carrey, Jeff Daniels, Lauren Holly        7.3
17          Jim Carrey, Jeff Daniels, Lauren Holly        7.3
18          Jim Carrey, Jeff Daniels, Lauren Holly        7.3
19        Cary Elwes, Mandy Patinkin, Robin Wright        8.0
20        Cary Elwes, Mandy Patinkin, Robin Wright        8.0
21        Cary Elwes, Mandy Patinkin, Robin Wright        8.0
22 Michael J. Fox, Christopher Lloyd, Lea Thompson        8.5
23 Michael J. Fox, Christopher Lloyd, Lea Thompson        8.5
24 Michael J. Fox, Christopher Lloyd, Lea Thompson        8.5
25         John Belushi, Dan Aykroyd, Cab Calloway        7.9
26         John Belushi, Dan Aykroyd, Cab Calloway        7.9
27         John Belushi, Dan Aykroyd, Cab Calloway        7.9
28      Jeff Bridges, John Goodman, Julianne Moore        8.1
29      Jeff Bridges, John Goodman, Julianne Moore        8.1
30      Jeff Bridges, John Goodman, Julianne Moore        8.1
            Ratings.Source Ratings.Value Metascore
1  Internet Movie Database        8.8/10        95
2          Rotten Tomatoes           92%        95
3               Metacritic        95/100        95
4  Internet Movie Database        7.2/10        48
5          Rotten Tomatoes           73%        48
6               Metacritic        48/100        48
7  Internet Movie Database        7.3/10        60
8          Rotten Tomatoes           68%        60
9               Metacritic        60/100        60
10 Internet Movie Database        8.5/10        85
11         Rotten Tomatoes           91%        85
12              Metacritic        85/100        85
13 Internet Movie Database        8.8/10        67
14         Rotten Tomatoes           81%        67
15              Metacritic        67/100        67
16 Internet Movie Database        7.3/10        41
17         Rotten Tomatoes           69%        41
18              Metacritic        41/100        41
19 Internet Movie Database        8.0/10        78
20         Rotten Tomatoes           96%        78
21              Metacritic        78/100        78
22 Internet Movie Database        8.5/10        88
23         Rotten Tomatoes           93%        88
24              Metacritic        88/100        88
25 Internet Movie Database        7.9/10        60
26         Rotten Tomatoes           71%        60
27              Metacritic        60/100        60
28 Internet Movie Database        8.1/10        71
29         Rotten Tomatoes           79%        71
30              Metacritic        71/100        71
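
If one of the titles fails, the loop above stops at that point. A more forgiving variant could skip the failure and carry on. The sketch below shows the idea with a stand-in fetch function so it runs without a key; fake_fetch() and collect_films() are hypothetical names, and with the real function you would likely keep bind_rows() for the final stacking step.

```r
#Stand-in for omdb_api_GET_df(): returns a one-row data frame or fails
fake_fetch <- function(title) {
  if (title == "not a real movie") stop("Movie not found!")
  data.frame(Title = title, stringsAsFactors = FALSE)
}

#Fetch every title, skipping the ones that raise an error
collect_films <- function(titles, fetch, pause = 0) {
  results <- lapply(titles, function(title) {
    out <- tryCatch(fetch(title), error = function(e) {
      message("Skipping '", title, "': ", conditionMessage(e))
      NULL
    })
    Sys.sleep(pause)  #rest between requests
    out
  })
  #Drop the failed lookups and stack the rest
  do.call(rbind, Filter(Negate(is.null), results))
}

films <- collect_films(c("caddyshack", "not a real movie", "fight club"), fake_fetch)
films$Title
#Returns: "caddyshack" "fight club"
```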

OMDb API Feature

One thing you’ll notice about this API is that it returns three different rating sources: Internet Movie Database (IMDb), Rotten Tomatoes, and Metacritic. When the movie data is stored in a data frame, a row is created for each of these rating sources because the JSON response nests them in a “Ratings” array. So for one movie, there will be three rows. All the other fields are identical across these three rows; only the source and its rating differ.
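
If one row per film is more convenient, the three rating rows can be spread into columns. Here is a minimal sketch on a toy data frame shaped like the output above (values copied from it); the object names are my own, and I am assuming the ratings have already been pulled out into plain Source and Value columns.

```r
library(dplyr)
library(tidyr)

#Toy long-format data: one row per film and rating source
ratings_long <- tibble(
  Title  = rep(c("Pulp Fiction", "Caddyshack"), each = 3),
  Source = rep(c("Internet Movie Database", "Rotten Tomatoes", "Metacritic"), times = 2),
  Value  = c("8.8/10", "92%", "95/100", "7.2/10", "73%", "48/100")
)

#One row per film, one column per rating source
ratings_wide <- ratings_long %>%
  pivot_wider(names_from = Source, values_from = Value)

ratings_wide
```

The result has two rows and four columns (Title plus one column per source), which makes side-by-side comparisons easier than filtering on Source each time.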

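Now, let’s say I want to see the Metacritic rating of each of the movies I’ve collected to see which ones are ranked the highest. I can unnest the Ratings and then filter on the source. (Value is stored as text, so “arrange()” sorts it alphabetically; that gives the right order here only because every score has two digits.)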

#Metacritic score rankings
omdb_films %>%
  unnest(Ratings) %>%
  filter(Source == "Metacritic") %>%
  select(Title, Value) %>%
  arrange(desc(Value))
# A tibble: 10 × 2
   Title              Value 
   <chr>              <chr> 
 1 Pulp Fiction       95/100
 2 Back to the Future 88/100
 3 The Departed       85/100
 4 The Princess Bride 78/100
 5 The Big Lebowski   71/100
 6 Fight Club         67/100
 7 Point Break        60/100
 8 The Blues Brothers 60/100
 9 Caddyshack         48/100
10 Dumb and Dumber    41/100
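
Because Value is text, the ranking above is an alphabetical sort that happens to work for two-digit scores. A safer approach is to convert the scores to numbers first. This is a small sketch using base R; the helper name score_to_number() is hypothetical.

```r
#Convert Metacritic-style strings such as "95/100" to numbers
#("N/A" and other non-numeric values become NA)
score_to_number <- function(x) {
  suppressWarnings(as.numeric(sub("/.*$", "", x)))
}

scores <- c("95/100", "9/100", "100/100", "N/A")
score_to_number(scores)
#Returns: 95 9 100 NA

#Order the original strings by their numeric value, highest first
scores[order(score_to_number(scores), decreasing = TRUE)]
#Returns: "100/100" "95/100" "9/100" "N/A"
```

In the pipeline, the same helper could sit in a “mutate()” step, for example mutate(Score = score_to_number(Value)) followed by arrange(desc(Score)).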

Or, maybe I just prefer the IMDb rating, which has its own field.

#IMDb rating rankings
omdb_films %>%
  distinct(Title, imdbRating) %>%
  arrange(desc(imdbRating))
                Title imdbRating
1        Pulp Fiction        8.8
2          Fight Club        8.8
3        The Departed        8.5
4  Back to the Future        8.5
5    The Big Lebowski        8.1
6  The Princess Bride        8.0
7  The Blues Brothers        7.9
8         Point Break        7.3
9     Dumb and Dumber        7.3
10         Caddyshack        7.2

Between these two rating sources, there appears to be a consensus on the best- and worst-rated films: Caddyshack, Dumb and Dumber, Point Break, and The Blues Brothers make up the bottom four (in varying order) in each source, while Pulp Fiction ranks at the top of both (tied with Fight Club on IMDb). But, well, you know, that’s just like uh, their opinion, man.
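
To put a number on how closely the two sources agree, one option is a rank correlation. The sketch below uses Spearman’s rho on the IMDb and Metascore values copied from the output above; this is my own choice of measure for illustration, not something the API provides.

```r
#Ratings copied from the output above (both fields arrive as text)
ratings <- data.frame(
  Title = c("Pulp Fiction", "Caddyshack", "Point Break", "The Departed",
            "Fight Club", "Dumb and Dumber", "The Princess Bride",
            "Back to the Future", "The Blues Brothers", "The Big Lebowski"),
  imdbRating = c("8.8", "7.2", "7.3", "8.5", "8.8", "7.3", "8.0", "8.5", "7.9", "8.1"),
  Metascore  = c("95", "48", "60", "85", "67", "41", "78", "88", "60", "71"),
  stringsAsFactors = FALSE
)

#Convert to numbers before computing anything
imdb <- as.numeric(ratings$imdbRating)
meta <- as.numeric(ratings$Metascore)

#Spearman's rho compares rank orders; values near 1 mean close agreement
rho <- cor(imdb, meta, method = "spearman")
rho
```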

Maybe you want to look at the movies that feature a specific actor and/or director. There is no overlap of actors or directors in the movies I have selected, but there are a few “Johns.”

#Actors with the name John
omdb_films %>%
  filter(str_detect(Actors, "John")) %>% 
  distinct(Title, Actors) 
               Title                                        Actors
1       Pulp Fiction John Travolta, Uma Thurman, Samuel L. Jackson
2 The Blues Brothers       John Belushi, Dan Aykroyd, Cab Calloway
3   The Big Lebowski    Jeff Bridges, John Goodman, Julianne Moore
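
A substring match on “John” would also catch names like “Johnson,” so for an exact actor lookup it can help to split the comma-separated Actors field into one row per actor. Here is a rough sketch on a toy data frame with values copied from the output above; the object names are my own.

```r
library(dplyr)
library(tidyr)

#Toy data with the same comma-separated Actors format as the API
films <- tibble(
  Title  = c("Pulp Fiction", "The Big Lebowski"),
  Actors = c("John Travolta, Uma Thurman, Samuel L. Jackson",
             "Jeff Bridges, John Goodman, Julianne Moore")
)

#One row per film and actor
actors_long <- films %>%
  separate_rows(Actors, sep = ", ") %>%
  rename(Actor = Actors)

#Exact match on a full name
goodman <- actors_long %>%
  filter(Actor == "John Goodman")

goodman$Title
#Returns: "The Big Lebowski"
```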

Similarly, you could filter for one or more genres.

#Crime or action movies
omdb_films %>% 
  filter(str_detect(Genre, "Crime|Action")) %>% 
  distinct(Title, Genre)
               Title                    Genre
1       Pulp Fiction             Crime, Drama
2        Point Break  Action, Crime, Thriller
3       The Departed   Crime, Drama, Thriller
4         Fight Club   Crime, Drama, Thriller
5 The Blues Brothers Adventure, Comedy, Crime
6   The Big Lebowski            Comedy, Crime

Now, these are only a few of the analyses you can do with this API, using only 9 of the 25 available fields. You can replicate this code with your own movie and variable choices and conduct your own analysis.

Gunga galunga… gunga, gunga-lagunga