OMDb API

Author

Stevie Wolf

Introduction

OMDb (the Open Movie Database) is a web API that returns information about movies and TV shows, which makes it useful for comparing titles.

Parameters

Each record returned by the API includes fields such as Title, Year, Rated, Released, Runtime (mins), Genre, Director, Writer, Actors, and many more.

Load the following packages before calling the API:

library(tidyverse)
library(jsonlite) 
library(dplyr)
library(knitr)
library(lubridate)
library(magrittr)
library(httr)

Set up Endpoint and API Key

To start, go to http://www.omdbapi.com and click API Key, where you can sign up and receive your key by email. The code below stores the endpoint and the key as values in the environment so they can be combined into a URL. Next, set the query fields you want to search on. In my example, I am looking for movies with the words “Harry Potter” in the title. Running the code produces a URL that returns the first page of movies with “Harry Potter” in the title.

Use i= to look up a title by IMDb ID or t= to look up a single title by name, and use s= to search for every title that matches a term.

type = movie, series, or episode

y = year of release

plot = short or full

page = 1-100 (the page of search results to return)

omdb_endpoint <- "http://www.omdbapi.com/?"

api_key <- "apikey=bfba18bc" # replace with your own API key

# Query fields: search term, result type, and results page
movie_title <- paste("s=Harry%20Potter",
                     "type=movie",
                     "page=1",
                     sep = "&")

# The endpoint already ends in "?", so append the "&"-joined parameters with paste0()
omdb_api_get <- paste0(omdb_endpoint, paste(api_key, movie_title, sep = "&"))
omdb_api_get
[1] "http://www.omdbapi.com/?apikey=bfba18bc&s=Harry%20Potter&type=movie&page=1"
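As an alternative to typing the query string by hand, the sketch below builds the URL from named arguments so that spaces and other special characters are percent-encoded automatically. The helper name build_omdb_url and the placeholder key "yourkey" are my own assumptions for illustration; they are not part of OMDb or of any package.

```r
# Hypothetical helper (not part of OMDb or any package): builds a request URL
# from an API key plus named query parameters such as s, type, y, and page.
build_omdb_url <- function(api_key, ..., endpoint = "http://www.omdbapi.com/") {
  params <- c(list(apikey = api_key), list(...))
  # Percent-encode each value, e.g. "Harry Potter" becomes "Harry%20Potter"
  encoded <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  paste0(endpoint, "?", paste(names(encoded), encoded, sep = "=", collapse = "&"))
}

# "yourkey" is a placeholder; the key could also be read with Sys.getenv()
search_url <- build_omdb_url("yourkey", s = "Harry Potter", type = "movie", page = 1)
search_url

# Because page is just another parameter, URLs for several pages are easy to generate
page_urls <- vapply(
  1:3,
  function(p) build_omdb_url("yourkey", s = "Harry Potter", type = "movie", page = p),
  character(1)
)
```

Keeping the parameters as named arguments makes it harder to forget an "&" or to leave a space unencoded, at the cost of one extra function.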

Load Data

Here we turn the data pulled from the API into a data frame that we can use for visualizations. The first block of code (#1) reads the movies with “Harry Potter” in the title into RStudio. The second block (#2) reorders the columns into a more logical order, with ‘imdbID’ on the far left and ‘Poster’ on the far right.

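# 1: request the URL and parse the JSON response into a list
movie_data <- 
  omdb_api_get %>% 
  fromJSON()
# 2: take the first element (the Search data frame) and reorder its columns
movie_df <-
  movie_data %>% 
    extract2(1) %>% 
    relocate(imdbID, Title, Type, Year, Poster)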
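When a request fails (for example, because nothing matches the search), OMDb sends back a Response field of "False" together with an Error message instead of a Search element. A minimal sketch of a check that could run before building the data frame is shown here. The helper name get_search_results is hypothetical, and the two lists are hand-built stand-ins for parsed responses, not real API output.

```r
# Hypothetical helper: returns the Search data frame from a parsed response,
# or stops with the error message reported by the API.
get_search_results <- function(parsed) {
  if (!identical(parsed$Response, "True")) {
    msg <- if (is.null(parsed$Error)) "unknown error" else parsed$Error
    stop("OMDb request failed: ", msg)
  }
  parsed$Search
}

# Hand-built stand-ins for what fromJSON() might return (illustrative values only)
ok <- list(
  Search = data.frame(Title = "Example Title", Year = "2001"),
  totalResults = "1",
  Response = "True"
)
bad <- list(Response = "False", Error = "Movie not found!")

good_result <- get_search_results(ok)
bad_message <- tryCatch(get_search_results(bad), error = function(e) conditionMessage(e))
bad_message
```

With a real request, the same function would be called on the list returned by fromJSON().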

Visualization

Below, I plot the distribution of Harry Potter movie release years. To do this, I first convert ‘Year’ to a numeric variable.

release_years <- movie_data$Search$Year
release_years <- as.numeric(release_years)
years_data <- data.frame(Release_years = release_years)

ggplot(years_data, aes(x = Release_years)) +
    geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
    labs(title = "Distribution of Movie Release Years", x = "Year", y = "Count") +
    theme_minimal()
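The numeric conversion works here because the search is limited to movies, whose Year values are single four-digit years. If the search also included series, Year may come back as a range (such as "2011–2019"), which as.numeric() would turn into NA. The sketch below, using made-up Year strings rather than real API output, keeps only the starting year and then counts titles per year as a quick check on the histogram.

```r
# Hypothetical Year strings (not real API output); "\u2013" is an en dash
raw_years <- c("2001", "2002", "2011\u20132019", "2011")

# Keep the first four characters (the starting year) before converting
start_years <- as.numeric(substr(raw_years, 1, 4))

# Count titles per year to read exact values off instead of estimating from bars
year_counts <- table(start_years)
year_counts
```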

Summary

The visualization above shows a fairly even distribution of Harry Potter movie releases over the years. The years around 2010 were big release years for the franchise: two movies were released in the same year, with another released the year before and another the year after. The one outlier is a movie released at least a decade after the others, which was the 20th anniversary movie honoring the entire franchise.