Open Movie Database API (OMDb)

Using GNU R

Thanks to the Internet, access to data has become commonplace. But this applies only to data that is readable by humans. Machine-readable data is rare and usually not openly accessible. It is possible to parse data from HTML pages, but this is cumbersome and slow. In most cases it is also not permitted, and the site's terms of use expressly forbid it.

Fortunately, there are exceptions:

OMDbAPI provides access to a machine-readable film and media database.

OMDbAPI uses JSON for data exchange.

Here is an example of how to access this data with GNU R.

Getting Started

This tutorial will help you write some R code to access the OMDbAPI database. The first thing you might want to implement is a title search function.

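library("jsonlite")

# Search OMDbAPI by title and return the hits.
searchMovie <- function(exp, url_mirror="http://www.omdbapi.com") {

    # URL-encode the search expression so spaces and special characters are safe
    jsonURL <- url(paste0(url_mirror, "/?s=", URLencode(exp, reserved=TRUE)))
    # Let jsonlite simplify the hits into a data frame (one row per hit)
    movie <- fromJSON(readLines(jsonURL, warn=FALSE))
    close(jsonURL)
    # The first element of the response holds the hits
    return(movie[[1]])
}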

The expression /?s=peace is appended to the URL, giving http://www.omdbapi.com/?s=peace. The server returns the result as a JSON object, so a JSON parser is needed to read it. This example uses »jsonlite«.
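Search expressions that contain spaces or special characters should be URL-encoded before they are appended. The following is a minimal sketch using base R's URLencode. The helper name buildSearchURL is hypothetical and not part of the original code; it only builds the URL and does not contact the server.

```r
# Hypothetical helper: build the search URL without sending a request.
buildSearchURL <- function(exp, url_mirror = "http://www.omdbapi.com") {
    # reserved = TRUE also escapes reserved characters such as "&" and "?"
    paste0(url_mirror, "/?s=", URLencode(exp, reserved = TRUE))
}

oneWord  <- buildSearchURL("peace")
twoWords <- buildSearchURL("war and peace")

print(oneWord)
print(twoWords)
```

A one-word expression passes through unchanged, while the spaces in a multi-word expression are replaced by %20.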

Search a movie title:

# one word search
searchResult <- searchMovie("peace")

	Title	Year	imdbID	Type
1	Superman IV: The Quest for Peace	1987	tt0094074	movie
2	War and Peace	1956	tt0049934	movie
3	War and Peace	1966	tt0063794	movie
4	Peace, Love, & Misunderstanding	2011	tt1649780	movie
5	War and Peace	2007	tt0495055	movie
6	Peace on Earth	1939	tt0031790	movie
7	Rest in Peace, Mrs. Columbo	1990	tt0097088	episode
8	Metal Gear Solid: Peace Walker	2010	tt1531061	game
9	A Separate Peace	2004	tt0328400	movie
10	Peace, Propaganda & the Promised Land	2004	tt0428959	movie

The function searchMovie returns all hits. Each hit contains information such as:

Title
Year
imdbID
Type

The imdbID is a unique identifier that is needed to get detailed movie information.
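To see how the parsed hits can be used without contacting the server, the sketch below parses a hand-written JSON string with jsonlite. The shape of the sample (a single element named "Search" holding the hits) is an assumption about the response format; the two hits are taken from the table above.

```r
library("jsonlite")

# Hand-written sample shaped like a search response (assumed format)
sampleJSON <- '{"Search":[
  {"Title":"War and Peace","Year":"1956","imdbID":"tt0049934","Type":"movie"},
  {"Title":"War and Peace","Year":"1966","imdbID":"tt0063794","Type":"movie"}]}'

# With jsonlite's default simplification, an array of objects with the
# same fields becomes a data frame with one row per hit
hits <- fromJSON(sampleJSON)[[1]]

# The imdbID column can then be passed on to look up movie details
ids <- hits$imdbID
print(ids)
```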

Movie detail:

The next function, movieDetail, retrieves detailed information such as the rating or the runtime in minutes.

# Fetch the detail record for each imdbID; returns a list with one entry per ID.
movieDetail <- function(imdbId, url_mirror="http://www.omdbapi.com") {

    # Fetch and parse the detail record for a single imdbID
    getMovieDetail <- function(ID, mirror=url_mirror)
    {
        jsonURL <- url(paste0(mirror, "/?i=", ID))
        details <- fromJSON(readLines(jsonURL, warn=FALSE))
        close(jsonURL)
        return(details)
    }
    return(lapply(imdbId, getMovieDetail))
}

This time the expression /?i=tt.... is appended to the URL. To get detailed information about a movie, its imdbID (identifier) is needed. This example uses imdbIDs taken from searchResult, but any imdbID can be used. The function could, for example, fetch the top 250 rated movies simply by passing a vector of the corresponding imdbIDs.

# Get details for hits 2 and 3, using their imdbID (column 3)
movies <- movieDetail(searchResult[2:3,3])

	Title	Year	imdbID	imdbRating	imdbVotes	Runtime
1	War and Peace	1956	tt0049934	6.80	5314	208
2	War and Peace	1966	tt0063794	7.70	3934	427

The function movieDetail returns a data structure with details such as:

imdbRating = user rating,
imdbVotes = number of votes,
Runtime = runtime in minutes,
and many more.

Not all attributes are listed in this example. Some coercion from a list to a data frame has to be done to get the table shown above.
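One possible way to do this coercion with base R is sketched below. The list movies is a hand-written stand-in for the result of movieDetail, filled with the values from the table above; storing the values as plain character strings is an assumption, and real responses may need extra cleaning before the numeric conversion.

```r
# Hand-written stand-in for the result of movieDetail() (assumed shape)
movies <- list(
    list(Title = "War and Peace", Year = "1956", imdbID = "tt0049934",
         imdbRating = "6.8", imdbVotes = "5314", Runtime = "208"),
    list(Title = "War and Peace", Year = "1966", imdbID = "tt0063794",
         imdbRating = "7.7", imdbVotes = "3934", Runtime = "427")
)

# Fields to keep; all other attributes of a record are dropped
keep <- c("Title", "Year", "imdbID", "imdbRating", "imdbVotes", "Runtime")

# Turn each record into a one-row data frame, then stack the rows
rows <- lapply(movies, function(m) as.data.frame(m[keep], stringsAsFactors = FALSE))
moviesDF <- do.call(rbind, rows)

# Convert the numeric fields from character to numeric
moviesDF$imdbRating <- as.numeric(moviesDF$imdbRating)
moviesDF$imdbVotes  <- as.numeric(moviesDF$imdbVotes)
moviesDF$Runtime    <- as.numeric(moviesDF$Runtime)

print(moviesDF)
```

Selecting the fields by name before building the data frame keeps the rows uniform even if individual records carry additional attributes.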