The Open Movie Database (OMDb) API Tutorial

Author

Aaryan Bhatta

Introduction

The Open Movie Database (OMDb) API is an online web service that we can query to extract data about movies, such as the title, the plot, the director and more. What makes the OMDb API valuable is how easily it lets users extract movie data according to their needs. It is also accessible: users can sign up for their own API key for free.

This data can be useful for researchers, movie enthusiasts and students. It can support research on movies, power applications, or serve as a way to learn about APIs in general, since this API is fairly easy to use.

Key Fields

The key fields that we will be extracting in this tutorial are:

Title = The name of the movie
Year = When the movie was released
Director = The director(s) of the movie
Genre = The genre(s) of the movie
imdbID = The unique IMDb identifier
Type = Specifies whether it is a movie, series or episode

Note: There are more details that we can extract from this API, but for now, we are going to focus on these key fields.

Fetching API Key

In order to use the OMDb API, we first need to get our own API key.

Go to the OMDb website: https://www.omdbapi.com
Click the API Key section on the top right of the website
Select either a Patreon account or the free option, which provides you with your own API key for free.
For this tutorial, the free version is enough, so register with your email address and your API key will be sent to that address. (Note: The free version only allows 1,000 requests to the API per day)

Setup

In order to utilize the OMDb API to extract the movie details that we need, we first need to load the necessary libraries.
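
The library-loading chunk is not shown here, so the following is a minimal sketch of what it could look like. The package list is an assumption inferred from the functions used later in this tutorial: GET() and content() come from httr, fromJSON() from jsonlite, the %>% pipe and use_series() from magrittr, bind_rows() and select() from dplyr, and kable() from knitr.

```r
# Assumed package list, inferred from the functions used in this tutorial.
# Install any missing package first with install.packages("<name>").
library(httr)      # GET(), content()
library(jsonlite)  # fromJSON()
library(magrittr)  # %>% and use_series()
library(dplyr)     # bind_rows(), select()
library(knitr)     # kable()
```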

Next, you will use the API key that you received by email. The API key lets us request the movie data, which the API returns in JSON format.

Below is the base URL format that we will need to use in order to extract the necessary data. Where it says “[yourkey]”, you will replace that part with your own API key.

(Note: each parameter in this URL will be separated by “&”)

Base URL: http://www.omdbapi.com/?apikey=[yourkey]&

One way to extract data from the API is to specify the movie we want by its title or by its IMDb ID. We will be using the title parameter to extract the data we need.

Main parameter: “&t=” - searches for a specific movie by its exact title
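
To make the “&” separation concrete, here is a minimal sketch of a URL builder. The helper name build_omdb_url is hypothetical and not part of any package, and "demo" stands in for a real key. It joins any number of parameters with “&” and escapes each value, so it works for the title parameter (t) as well as the IMDb ID parameter (i).

```r
# Hypothetical helper: builds an OMDb query URL from named parameters.
build_omdb_url <- function(api_key, ...) {
  params <- c(list(apikey = api_key), list(...))
  # Escape every value so spaces and characters like "&" cannot break the query
  encoded <- vapply(params, function(value) {
    utils::URLencode(as.character(value), reserved = TRUE)
  }, character(1))
  # Join name=value pairs with "&"
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0("http://www.omdbapi.com/?", query)
}

title_url <- build_omdb_url("demo", t = "Batman & Robin")
id_url <- build_omdb_url("demo", i = "tt1877830")
print(title_url)
print(id_url)
```

Escaping the values matters for titles such as “Batman & Robin”, where an unescaped “&” would otherwise be read as the start of a new parameter.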

Here is the base setup of how we are going to extract the data of a movie.

your_api_key <- "" # Paste your API key inside the quotes
movie_name <- "" # Replace with the title of the movie that you are looking for
# For the title parameter, we need to use "t="; URLencode() escapes spaces and special characters
url <- paste0("http://www.omdbapi.com/?apikey=", your_api_key, "&t=", URLencode(movie_name, reserved = TRUE))
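
If you would rather not type the key directly into a script, one option is to read it from an environment variable. This is only a sketch: the variable name OMDB_API_KEY is an assumption made for this example, and the value set below is a placeholder rather than a real key.

```r
# OMDB_API_KEY is a hypothetical variable name chosen for this sketch.
# In practice it could be set once in ~/.Renviron instead of in the script.
Sys.setenv(OMDB_API_KEY = "your-key-here")  # placeholder value, not a real key

# Read the key back and fail early if it is missing
your_api_key <- Sys.getenv("OMDB_API_KEY")
if (!nzchar(your_api_key)) {
  stop("OMDB_API_KEY is not set")
}
```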

Extracting the Data of One Movie Tutorial

Here is an example of using the setup above:

We want to gather data about the movie “The Batman” and extract its title, release year, director and genre. We will accomplish this as follows:

# Create a function that will extract the data of one movie.
# The two parameters: movie_name = "Title of the movie we want",
#                     your_api_key = "Your own API key"
extract_movie_info <- function(movie_name, your_api_key) {
  # URL to query the API.
  # (URLencode with reserved = TRUE escapes spaces and special characters such as "&")
  url <- paste0("http://www.omdbapi.com/?apikey=", your_api_key, "&t=", URLencode(movie_name, reserved = TRUE))

  # Request the movie data and parse the JSON response into a list
  movie_data <-
    url %>%
    GET() %>%
    content(as = "text",
            encoding = "UTF-8") %>%
    fromJSON()

  return(movie_data)
}
# Remember to add your API key in the code chunk in "Setup"!
movie_name <- "The Batman"
batman_data <- extract_movie_info(movie_name, your_api_key)

# Print the details of the movie
print(paste("Title:",batman_data$Title))

[1] "Title: The Batman"

print(paste("Year:",batman_data$Year))

[1] "Year: 2022"

print(paste("Director:",batman_data$Director))

[1] "Director: Matt Reeves"

print(paste("Genre:",batman_data$Genre))

[1] "Genre: Action, Crime, Drama"

In this tutorial, we extracted very basic details about the movie “The Batman”: its title, the year it was released, its director and its genres.

The OMDb API is not limited to these details. There are plenty more that you can extract according to your needs, such as Actors, Plot, Awards and many others.
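
Before reading any field, it can help to confirm that the lookup succeeded. The sketch below assumes that a failed lookup comes back with a Response field set to "False" and a message in an Error field. The helper name check_omdb_response is hypothetical, and the two JSON strings are hand-written samples that mimic the shape of a response rather than live API output.

```r
library(jsonlite)

# Hypothetical helper: stop with the API's message on failure, else return the data.
check_omdb_response <- function(movie_data) {
  if (identical(movie_data$Response, "False")) {
    stop("OMDb request failed: ", movie_data$Error)
  }
  movie_data
}

# Hand-written samples (not live API output)
found <- fromJSON('{"Title":"The Batman","Year":"2022","Response":"True"}')
missing <- fromJSON('{"Response":"False","Error":"Movie not found!"}')

# A successful response passes through unchanged
found_title <- check_omdb_response(found)$Title

# A failed response raises an error, captured here as a message
failure_message <- tryCatch(
  check_omdb_response(missing),
  error = function(e) conditionMessage(e)
)
print(found_title)
print(failure_message)
```

Once a response passes this check, any other field can be read the same way as the ones above, for example movie_data$Plot.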

Extracting the Data of Multiple Movies Tutorial

Let’s say I would like to extract data about multiple Batman movies. In order to do this, we need to write a function that will extract the data of multiple movies, and we need to define a vector of all the Batman movies we would like to look at.

# Create a function that will extract and print the data of multiple movies.
# The two parameters: movie_names = "Vector of the titles of movies",
#                     your_api_key = "Your own API key"
extract_multiple_movies_info <- function(movie_names, your_api_key) {
  for (i in movie_names) {
    # URL to query the API.
    # (URLencode with reserved = TRUE escapes spaces and special characters such as "&")
    url <- paste0("http://www.omdbapi.com/?apikey=", your_api_key, "&t=", URLencode(i, reserved = TRUE))

    # Request the movie data and parse the JSON response into a list
    movie_data <-
      url %>%
      GET() %>%
      content(as = "text",
              encoding = "UTF-8") %>%
      fromJSON()
    # Print out the details
    print(paste("Title:", movie_data$Title))
    print(paste("Year:", movie_data$Year))
    print(paste("Director:", movie_data$Director))
    print(paste("Genre:", movie_data$Genre))
    print(paste("~~~~~~~~~~~"))

    # Pause between requests to avoid hitting rate limits
    Sys.sleep(5)
  }
}
# Vector that contains the movies that we want
movie_names <- c("The Batman", "Batman Begins", "The Dark Knight", "The Dark Knight Rises")
# Remember to add your API key in the code chunk in "Setup"!
# The function only prints its results, so there is nothing to assign
extract_multiple_movies_info(movie_names, your_api_key)

[1] "Title: The Batman"
[1] "Year: 2022"
[1] "Director: Matt Reeves"
[1] "Genre: Action, Crime, Drama"
[1] "~~~~~~~~~~~"
[1] "Title: Batman Begins"
[1] "Year: 2005"
[1] "Director: Christopher Nolan"
[1] "Genre: Action, Drama"
[1] "~~~~~~~~~~~"
[1] "Title: The Dark Knight"
[1] "Year: 2008"
[1] "Director: Christopher Nolan"
[1] "Genre: Action, Crime, Drama"
[1] "~~~~~~~~~~~"
[1] "Title: The Dark Knight Rises"
[1] "Year: 2012"
[1] "Director: Christopher Nolan"
[1] "Genre: Action, Drama, Thriller"
[1] "~~~~~~~~~~~"

This tutorial used a loop to extract the details of each movie in turn while making sure we avoid reaching the rate limit. This shows that we are not limited to extracting only one movie at a time, which lets interested users analyze more movies and visualize any insights they want to research.
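
For analysis or plotting, it is often more convenient to keep the details in a data frame instead of printing them. The following is a minimal sketch of that idea. The helper name movie_to_row is hypothetical, and the two lists stand in for parsed API responses, using values from the output above.

```r
# Hypothetical helper: turn one parsed response (a named list) into a one-row data frame.
movie_to_row <- function(movie_data) {
  data.frame(
    Title = movie_data$Title,
    Year = movie_data$Year,
    Director = movie_data$Director,
    Genre = movie_data$Genre,
    stringsAsFactors = FALSE
  )
}

# Stand-ins for parsed responses, using values from the output above
responses <- list(
  list(Title = "The Batman", Year = "2022",
       Director = "Matt Reeves", Genre = "Action, Crime, Drama"),
  list(Title = "Batman Begins", Year = "2005",
       Director = "Christopher Nolan", Genre = "Action, Drama")
)

# Stack the one-row data frames into a single table
movies_df <- do.call(rbind, lapply(responses, movie_to_row))
print(movies_df)
```

In a real run, the list of responses would be filled inside the loop, one parsed response per title.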

Searching for Movies Based on a Keyword

In the previous sections, we queried the API by inputting the specific title(s) of the movie(s) that we wanted. We can also query the API with a keyword from the titles of the movies that we are looking for. For this query, the API returns the movies whose titles contain the keyword.

Let’s try an example with “Batman”, where we will create a data frame of movies that have “Batman” in their titles. In order to query the API, we need to replace the parameter that we used previously, “&t=”, with “&s=”. The results may span multiple pages, so we will need to request each page in turn.

(Note: Each page can only display a max of 10 movies)
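
The page size in the note above determines how many requests a search needs. The sketch below works through that arithmetic, assuming the search response reports its total count as a string in a field named totalResults. The helper name pages_needed and the counts passed to it are made up for illustration.

```r
# Hypothetical helper: number of pages needed for a result count, optionally capped.
pages_needed <- function(total_results, per_page = 10, max_pages = Inf) {
  total <- as.numeric(total_results)  # the count arrives as a string
  min(ceiling(total / per_page), max_pages)
}

# Illustrative counts, not real API output
pages_small <- pages_needed("7")                     # fits on one page
pages_exact <- pages_needed("42")                    # 42 results need 5 pages
pages_capped <- pages_needed("573", max_pages = 5)   # capped at 5 pages
print(c(pages_small, pages_exact, pages_capped))
```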

Here is the new URL: http://www.omdbapi.com/?apikey=[yourkey]&s=[keyword]

# Create a function that will extract the data of multiple movies
# based on a keyword and make a data frame of the results
# The two parameters: key_word = "Keyword used to search movies",
#                     your_api_key = "Your own API key"
search_movies <- function(key_word, your_api_key) {
  # URL to query the API.
  # (URLencode with reserved = TRUE escapes spaces and special characters such as "&")
  search_url <- paste0("http://www.omdbapi.com/?apikey=", your_api_key, "&s=", URLencode(key_word, reserved = TRUE))

  # Request the first page of results and parse the JSON response
  movie_data <-
    search_url %>%
    GET() %>%
    content(as = "text",
            encoding = "UTF-8") %>%
    fromJSON()
  # Calculate the total number of pages.
  # The count is returned as a string in the "totalResults" field.
  total_pages <- ceiling(as.numeric(movie_data$totalResults) / 10)

  searchedMovies <- data.frame()

  # Loop to gather data from each page and append it to the data frame
  # Normally we would loop up to total_pages, but because there
  # are too many pages to go through, we limit to only 5
  pages_to_fetch <- min(total_pages, 5)
  for (i in seq_len(pages_to_fetch)) {
    page_url <- paste0("http://www.omdbapi.com/?apikey=", your_api_key, "&s=", URLencode(key_word, reserved = TRUE), "&page=", i)
    page_data <-
      page_url %>%
      GET() %>%
      content(as = "text",
              encoding = "UTF-8") %>%
      fromJSON() %>%
      use_series(Search)

    print(paste("Page", i, "of", pages_to_fetch, "collected"))
    # Append the search results to the data frame
    searchedMovies <- bind_rows(searchedMovies, page_data)
    # Pause between requests to avoid hitting rate limits
    Sys.sleep(3)
  }
  return(searchedMovies)
}
key_word <- "Batman"
# Accessing the data through the function call
batman_movies <- search_movies(key_word, your_api_key)

[1] "Page 1 of 5 collected"
[1] "Page 2 of 5 collected"
[1] "Page 3 of 5 collected"
[1] "Page 4 of 5 collected"
[1] "Page 5 of 5 collected"

head(batman_movies) %>% 
  select(-Poster) %>% 
  kable()

Title	Year	imdbID	Type
Batman Begins	2005	tt0372784	movie
The Batman	2022	tt1877830	movie
Batman v Superman: Dawn of Justice	2016	tt2975590	movie
Batman	1989	tt0096895	movie
Batman Returns	1992	tt0103776	movie
Batman & Robin	1997	tt0118688	movie

In the tutorial above, we limited the loop to only 5 pages, as the total number of pages would be too large to process in this example. Under normal circumstances, we would loop over every page, using the total_pages variable that we calculated.

As seen here, by searching with a keyword from the title we were able to extract multiple movies and store the results in a data frame. The format of our results is a little different this time: for each entry we only have the title, the release year, the unique IMDb identifier and whether the entry is a movie, series or episode.
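
Because each search result carries an imdbID, it can serve as a bridge back to the full details of a movie. The sketch below filters a few rows copied from the table above and builds one lookup URL per entry using the IMDb ID parameter (i). The variable names are chosen for this example, and "demo" is a placeholder for a real key.

```r
# A few rows copied from the search results table above
batman_sample <- data.frame(
  Title = c("Batman Begins", "The Batman", "Batman Returns"),
  Year = c("2005", "2022", "1992"),
  imdbID = c("tt0372784", "tt1877830", "tt0103776"),
  Type = c("movie", "movie", "movie"),
  stringsAsFactors = FALSE
)
your_api_key <- "demo"  # placeholder, replace with your own key

# Keep only entries of type "movie"
movies_only <- batman_sample[batman_sample$Type == "movie", ]

# Build one detail URL per IMDb ID using the "i=" parameter
detail_urls <- paste0("http://www.omdbapi.com/?apikey=", your_api_key,
                      "&i=", movies_only$imdbID)
print(detail_urls)
```

Each of these URLs could then be requested and parsed in the same way as the title lookups earlier in this tutorial.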

Conclusion

In conclusion, the OMDb API is really useful for extracting a wide array of movie data. Through this tutorial, we have learned how to query the API to extract data about a specific movie, compare the data of multiple movies and create a data frame by searching for movies based on a keyword in their titles.

Although this tutorial only focused on extracting essential fields from the API, the API offers additional data points of interest, like actors, ratings, run time and much more. These additional fields allow users to carry out a deeper analysis for their own needs.

The OMDb API is a powerful tool that lets users explore a vast array of movie data to fulfill their goals and interests.