# Setup packages
library(tidyverse)
library(jsonlite)
library(magrittr)
omdb_endpoint <- "http://www.omdbapi.com/?"
api_key <- "apikey=ENTER_API_KEY" # replace ENTER_API_KEY with your own API key
movie_title_search <- paste("s=Iron%20Man", #broad search parameter
"type=movie",
"page=1",
sep = "&")
omdb_api_get <- paste(omdb_endpoint,api_key,movie_title_search,sep = "&")
omdb_api_get
Movie Search API
OMDb API Intro
For this project, I used the OMDb API, which gets movie information and images from IMDb.com.
You might use this API to pull data on different movies and compare their ratings and scores, box office earnings, and plenty of other interesting things. Most of this page shows how to pull every movie with “Iron Man” in the title and build a simple timeline of when those movies were released.
Variables
Key Parameters
The API has three ways to look up movies. You can pull a single movie by its IMDb ID (i=) or its full title (t=), or you can pull a list of movies from a broad search on the title (s=). For example, a broad search for “Batman” returns every movie with “Batman” somewhere in its title. This is different from setting the title parameter, which would return all the information for the single movie titled “Batman” from 1989. Every request must use one of these three options.
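As a small sketch of the difference, the three URLs below share the same placeholder key and change only the key parameter. The IMDb ID is just an example value, and ENTER_API_KEY has to be replaced with a real key before any of these would return data.

```r
# Sketch: the three key parameters side by side (no request is sent here).
omdb_endpoint <- "http://www.omdbapi.com/?"
api_key <- "apikey=ENTER_API_KEY"  # placeholder, replace with your own key

# URLencode() turns the space in the title into %20
title_query <- URLencode("Iron Man", reserved = TRUE)

by_id     <- paste0(omdb_endpoint, api_key, "&i=tt0371746")      # one movie, by IMDb ID (example ID)
by_title  <- paste0(omdb_endpoint, api_key, "&t=", title_query)  # one movie, by exact title
by_search <- paste0(omdb_endpoint, api_key, "&s=", title_query)  # list of every matching title

print(c(by_id, by_title, by_search))
```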
Other Parameters
If you search by ID or full title, you get back a complete set of information about that specific movie. If you use the broad title search, each result includes only the title, year, IMDb ID, type, and a link to the poster image if one is available. The other parameters available to narrow your search are:
- Type (type=): movie, series, or episode
- Year of release (y=)
- Plot (plot=, only for ID or full-title lookups): full or short
- Page (page=, only for broad title search): pages 1 to 100
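For illustration, here is the broad Iron Man search narrowed with the optional parameters, plus a full-title lookup that asks for the full plot. The parameter names follow the list above; the year 2008 and page 2 are arbitrary example values, and the key is still a placeholder.

```r
# Sketch: narrowing requests with the optional parameters (no request is sent).
omdb_endpoint <- "http://www.omdbapi.com/?"
api_key <- "apikey=ENTER_API_KEY"  # placeholder key

# Broad search limited to movies from one year, asking for the second page
narrowed_search <- paste(paste0(omdb_endpoint, api_key),
                         "s=Iron%20Man", "type=movie", "y=2008", "page=2",
                         sep = "&")

# plot= only applies to ID or full-title lookups
full_plot_lookup <- paste(paste0(omdb_endpoint, api_key),
                          "t=Iron%20Man", "plot=full",
                          sep = "&")

print(c(narrowed_search, full_plot_lookup))
```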
Getting the URL
Pasting the parts of the URL together in R, as in the setup code at the top of this page, gives the following URL for a broad search of movies with “Iron Man” in the title:
http://www.omdbapi.com/?&apikey=ENTER_API_KEY&s=Iron%20Man&type=movie&page=1
Note: You will need to enter your own API key in the R code.
- To get your own API key, go to the OMDb API website and sign up for a free key.
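Rather than typing the key directly into the script, one option is to keep it in an environment variable (for example in ~/.Renviron) and read it with Sys.getenv(). The variable name OMDB_API_KEY and the helper get_api_key_param() are hypothetical choices for this sketch.

```r
# Sketch: read the OMDb key from an environment variable instead of hard-coding it.
# OMDB_API_KEY is a hypothetical variable name chosen for this example.
get_api_key_param <- function(var = "OMDB_API_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(key)) {
    stop("Set ", var, " (for example in ~/.Renviron) before calling the API.")
  }
  paste0("apikey=", key)  # same "apikey=..." format as the setup code
}
```

With the variable set, api_key <- get_api_key_param() could stand in for the hard-coded line in the setup code, and the key never ends up in a shared script.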
Getting The Data In RStudio
After getting the URL created, you can use it to pull the data into RStudio and create a data frame.
- The first part of the code below brings the data into RStudio as lists. Each request returns a list with 3 elements: the search results, the total number of results (90 here), and the response status (True). Since the API returns the results one page at a time, I pulled each page separately, so each page comes back as its own 3-element list.
- The only element we really need is the one holding the search results for each page. The second part of the code combines the pages, extracts the search results, and turns them into a single data frame.
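To see the shape of one page, the sketch below parses a small, made-up response with the same three elements. The titles and IDs are placeholders, not real OMDb data, and the figure of 10 results per page is an assumption about how the search pages its results (it is consistent with 90 results spread over 9 pages).

```r
library(jsonlite)

# Made-up response with the same shape as one page of a broad search;
# the values are placeholders, not real OMDb data.
sample_json <- '{
  "Search": [
    {"Title": "Example Movie A", "Year": "2008", "imdbID": "tt-example-1",
     "Type": "movie", "Poster": "N/A"},
    {"Title": "Example Movie B", "Year": "2010", "imdbID": "tt-example-2",
     "Type": "movie", "Poster": "N/A"}
  ],
  "totalResults": "90",
  "Response": "True"
}'

parsed <- fromJSON(sample_json)
names(parsed)         # "Search" "totalResults" "Response"
class(parsed$Search)  # "data.frame": the array of results is simplified to a table

# totalResults arrives as a string; assuming 10 results per page,
# this is how many pages have to be requested
n_pages <- ceiling(as.integer(parsed$totalResults) / 10)
n_pages               # 9
```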
#### PART 1: Pulling each page of data
# Build one URL per page (pages 1 to 9). Reusing omdb_api_get for every page
# would request page=1 nine times, since that URL hard-codes the page number.
page_urls <- paste(omdb_endpoint, api_key, "s=Iron%20Man", "type=movie",
                   paste0("page=", 1:9), sep = "&")
# fromJSON() reads each URL directly; each page comes back as a 3-element list
movie_pages <- map(page_urls, fromJSON)
#### PART 2: Combining all the data together, then putting into one data frame
# Keep only the Search results from each page and stack them into one table
full_movie_data <- list(Search = bind_rows(map(movie_pages, "Search")))
# Put data into a data frame
movie_get_df <-
  full_movie_data %>%
  extract2(1) %>%
  relocate(imdbID, Title, Year, Type, Poster)
Visualizing the Data in Some Way
To create a simple timeline showing when each movie was released and how many of these movies came out each year, we could write something like this:
# Extract the release years of all Iron Man movies and convert them to numeric
# (gsub() strips any non-digit characters before the conversion)
release_years <- as.numeric(gsub("[^0-9]", "", full_movie_data$Search$Year))
# Create a data frame for the timeline
timeline_data <- data.frame(Release_Year = release_years)
# Count the number of movies released each year
year_counts <- table(timeline_data$Release_Year)
# Create a data frame for plotting
plot_data <- data.frame(Year = as.numeric(names(year_counts)),
Count = as.numeric(year_counts))
# Plot the timeline showing count of movies per year
ggplot(plot_data, aes(x = Year, y = Count)) +
geom_point(color = "red", size = 3) +
geom_line(color = "blue", linewidth = 0.7) + # ggplot2 3.4.0+ uses linewidth for lines
labs(title = "Timeline of Movies with Iron Man in the Title",
subtitle = "Height shows number of those movies released that year",
x = "Release Year",
y = "Number of Movies Released")
Here is what the plot looks like for our data:
At first glance, we see a large spike around what appears to be 2012 and another spike a few years before that. These correspond to Iron Man from the Marvel movies. Many of the other movies have nothing to do with Marvel's Iron Man; they show up only because the broad title search matches any title containing “Iron Man”. You might wonder why there is a spike at all, since Marvel released only one Iron Man movie at a time. The answer is that the same movie can appear more than once: some movies have duplicate pages, some have dubbed versions for other countries, and some entries are odd mini films based on Iron Man that I had never heard of.
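One way to check how much of a spike comes from repeated entries is to compare, for each year, the total number of rows with the number of distinct IMDb IDs and distinct titles. The sketch below uses a small made-up data frame shaped like the Search results; with the real data, full_movie_data$Search would take its place.

```r
library(dplyr)

# Made-up rows shaped like the Search results (placeholder values only)
search_results <- data.frame(
  Title  = c("Example A", "Example A", "Example A (Dubbed)", "Example B"),
  Year   = c("2010", "2010", "2010", "2012"),
  imdbID = c("id1", "id1", "id2", "id3")
)

# Per year: all rows, distinct IMDb IDs, and distinct titles.
# A gap between rows and distinct_ids points to exact duplicates;
# extra distinct titles point to variants such as dubbed versions.
dup_check <- search_results %>%
  group_by(Year) %>%
  summarise(rows            = n(),
            distinct_ids    = n_distinct(imdbID),
            distinct_titles = n_distinct(Title))
dup_check
```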
Concluding Ideas
In the future, I would like to turn this code into a function where the user can either look up a specific movie by title or ID, or run their own broad search. Still, what we have here gives a good glimpse of what the API can do and how it can be used in R.
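A rough sketch of that idea, using a hypothetical function omdb_lookup(): the user passes a query and says whether it is an IMDb ID, a full title, or a broad search, and the function picks the matching parameter (i=, t=, or s=). Setting fetch = FALSE returns only the URL, which makes it easy to check the request before spending an API call; the key is still a placeholder.

```r
library(jsonlite)

# Hypothetical helper: build (and optionally send) an OMDb request.
# by = "id" uses i=, "title" uses t=, "search" uses s=.
omdb_lookup <- function(query, by = c("search", "title", "id"),
                        key = "ENTER_API_KEY", page = 1, fetch = TRUE) {
  by <- match.arg(by)  # defaults to "search"
  param <- switch(by, id = "i", title = "t", search = "s")
  url <- paste0("http://www.omdbapi.com/?apikey=", key,
                "&", param, "=", URLencode(query, reserved = TRUE))
  # page= only applies to broad searches
  if (by == "search") url <- paste0(url, "&page=", page)
  if (!fetch) return(url)
  fromJSON(url)  # sends the request and parses the JSON response
}

omdb_lookup("Iron Man", by = "title", fetch = FALSE)
```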