OMDb API
Introduction
OMDb is an API that returns information about movies and TV shows, which makes it useful for comparing titles.
Parameters
The API returns fields such as Title, Year, Rated, Released, Runtime (mins), Genre, Director, Writer, Actors, and many more.
Load the following packages before calling the API:
library(tidyverse)
library(jsonlite)
library(dplyr)
library(knitr)
library(lubridate)
library(magrittr)
library(httr)
Set up Endpoint and API Key
To start, go to http://www.omdbapi.com and click on API Key, where you can sign up and receive your key by email. The code below stores the endpoint and the key as values in the environment and then combines them with the search fields to build the request URL. In this example, the search looks for movies with the words Harry Potter in the title. Running the code produces a URL that returns the first page of movies with “Harry Potter” in the title.
Use i= or t= to look up a single title by IMDb ID or by title, and s= to run a search.
type = movie, series, or episode.
y = year of release.
plot = short or full.
page = 1-100 (the page of search results to return).
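The parameters above can also be assembled programmatically instead of being pasted together by hand. The following is a minimal sketch: `build_omdb_query()` is a hypothetical helper (it is not part of OMDb or any package), and it assumes every value should be percent-encoded with `utils::URLencode()`.

```r
# Hypothetical helper (not part of any package): turn a named list of
# OMDb parameters into a query string of name=value pairs joined by "&".
build_omdb_query <- function(params) {
  # Percent-encode each value so spaces and reserved characters are URL-safe
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  # Join the name=value pairs with "&"
  paste(names(params), encoded, sep = "=", collapse = "&")
}

query <- build_omdb_query(list(s = "Harry Potter", type = "movie", page = 1))
query
```

Keeping the parameters in a named list makes it easier to change one field, such as `page` or `y`, without retyping the rest of the string.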
omdb_endpoint <- "http://www.omdbapi.com/?"
api_key <- "apikey=bfba18bc" # replace with your own API key
# Search fields, joined with "&"
movie_title <- paste("s=Harry%20Potter",
                     "type=movie",
                     "page=1",
                     sep = "&")
# Append the key and the fields to the endpoint; paste0 avoids a stray "&" after "?"
omdb_api_get <- paste0(omdb_endpoint, paste(api_key, movie_title, sep = "&"))
omdb_api_get
[1] "http://www.omdbapi.com/?apikey=bfba18bc&s=Harry%20Potter&type=movie&page=1"
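Because a search request returns one page at a time, collecting more results means requesting several pages. Below is a minimal sketch of building one URL per page. It assumes 10 results per page, `omdb_page_urls()` is a hypothetical helper, and `YOUR_KEY` is a placeholder rather than a working key.

```r
# Hypothetical helper: build one search URL per results page.
# Assumes 10 results per page; change per_page if the API behaves differently.
omdb_page_urls <- function(base_url, total_results, per_page = 10) {
  n_pages <- ceiling(total_results / per_page)
  # Nothing to request when there are no results
  if (n_pages == 0) {
    return(character(0))
  }
  paste0(base_url, "&page=", seq_len(n_pages))
}

# base_url carries everything except the page parameter; YOUR_KEY is a placeholder
base_url <- "http://www.omdbapi.com/?apikey=YOUR_KEY&s=Harry%20Potter&type=movie"
page_urls <- omdb_page_urls(base_url, total_results = 25)
length(page_urls)
```

In practice the total would come from the search response itself, converted to a number first if it arrives as text.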
Load Data
Here the data pulled from the API is turned into a data frame that can be used for visualizations. The first step reads the search results for “Harry Potter” into R. The second step reorders the columns into a more logical order, with ‘imdbID’ on the far left and ‘Poster’ on the far right.
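When loading the data, it can help to confirm that the request succeeded, since OMDb reports failures inside the JSON body (a `Response` field of "False" together with an `Error` message). The sketch below uses a hand-written JSON string in place of a live call; the payload is illustrative only, and `check_omdb_response()` is a hypothetical helper.

```r
library(jsonlite)

# Hypothetical helper: stop with the API's own message when Response is not "True"
check_omdb_response <- function(parsed) {
  if (!identical(parsed$Response, "True")) {
    stop("OMDb request failed: ", parsed$Error)
  }
  invisible(parsed)
}

# Hand-written stand-in for a failed response (not real API output)
failed <- fromJSON('{"Response":"False","Error":"Movie not found!"}')

# Capture the error message instead of halting the script
result <- tryCatch(
  check_omdb_response(failed),
  error = function(e) conditionMessage(e)
)
result
```

Calling the helper on the parsed list before any reshaping turns a failed request into a readable error instead of a confusing one later in the pipeline.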
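# 1: request the URL and parse the JSON response into a list
movie_data <-
  omdb_api_get %>%
  fromJSON()
# 2: take the first element (the Search results) and reorder its columns
movie_df <-
  movie_data %>%
  extract2(1) %>%
  relocate(imdbID, Title, Type, Year, Poster)
Visualization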
The code below plots the distribution of Harry Potter movie release years. ‘Year’ comes back from the API as text, so it is converted to a numeric variable first.
# Year is returned as text, so convert it to numeric before plotting
release_years <- movie_data$Search$Year
release_years <- as.numeric(release_years)
years_data <- data.frame(Release_years = release_years)
# Histogram with one bin per year
ggplot(years_data, aes(x = Release_years)) +
  geom_histogram(binwidth = 1, fill = "skyblue", color = "black") +
  labs(title = "Distribution of Movie Release Years", x = "Year", y = "Count") +
  theme_minimal()
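A histogram with a bin width of 1 is effectively a count per year, so the same information can be checked as a table. The sketch below uses a made-up vector of years, not the API results, purely to show the conversion and the count.

```r
# Made-up example values standing in for movie_data$Search$Year
# (stored as text, like the values used above)
example_years <- c("2001", "2002", "2010", "2010", "2011")

# Convert to numeric and count how many titles fall in each year
year_counts <- table(as.numeric(example_years))
year_counts
```

Swapping `example_years` for the real `Year` column gives the exact counts behind each bar.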
Summary
The visualization above shows that Harry Potter movies were released fairly evenly over the years. The years around 2010 were especially busy for the franchise, with two movies released in the same year, one the year before, and one the year after. The one outlier is a movie released at least a decade after the others, which was the 20th anniversary movie honoring the entire franchise.