In the past, working in R has meant importing data into the application from numerous sources, a process I have found very manual and not very reproducible when developing applications. One weakness I have identified, and would like to address, is not knowing how to access online data resources using the appropriate R packages. The scope of this exercise is not to master one or several packages, but rather to understand what the overall process is and how certain R packages fit into it.
This document provides an introductory overview of what APIs are and how to use R to access data and metadata available from remote servers over the internet. The specifics of this access are defined by APIs (Application Programming Interfaces), which specify, among many other things, what resources can be accessed, where they can be accessed, who is authorised to access them, and the rules and syntax for doing so.
Representational State Transfer (REST) is an architectural style: a set of conventions and constraints used in the design and operation of web services. RESTful APIs send and receive information through their ‘endpoints’, URLs that expose the service to external clients, using HTTP methods, the request instructions native to the web’s main communication protocol. The most common methods are GET (retrieve a resource), POST (create one), PUT (create or replace one) and DELETE (remove one). In REST, the abstraction of information is called a ‘resource’, and REST APIs use Uniform Resource Identifiers (URIs) to identify and address resources.
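As a small illustration of how a URI addresses a resource, httr's parse_url() can split a request URL into its components. This sketch assumes httr is installed and reuses the OMDb request URL that appears later in this document; parsing it sends no request.

```r
library(httr)

# Split a request URI into its components; nothing is sent over the network.
uri <- "http://www.omdbapi.com/?apikey=e380c2d6&type=movie&t=12+Angry+Men&y=1957&plot=full&r=json"
parts <- parse_url(uri)

parts$scheme        # protocol used to reach the endpoint ("http")
parts$hostname      # server hosting the resource
names(parts$query)  # query parameters that select the resource and its format
parts$query$y       # value of the 'y' (year) parameter
```

The scheme and host locate the endpoint, while the query string narrows the request down to one particular resource and response format.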
An API allows an application to access resources or functionality on a remote server, outside that application’s own architectural and security domain. Access to APIs is often restricted to authenticated and authorised users. This kind of interaction is a cornerstone of today’s web applications, which exchange information through authorised links and methods. For building solutions in R, API access is particularly useful because both the connection to the remote service and the specific data and metadata being requested can be defined in code, and therefore reproduced. For any serious, automated application, the ability to connect to remotely hosted resources is vital.
RESTful API responses are usually formatted as XML or JSON, both of which are machine-readable and allow data to be organised hierarchically. JSON, short for JavaScript Object Notation, is a text-based format that not only contains and organises data but also describes the structure of that organisation. This is useful because data is usually structured both at the source (e.g. the people hosting the data) and at the target (e.g. the people accessing the data).
Below is an example of JSON data that includes various data types nested in a hierarchy, together with a diagram illustrating this data and its hierarchy.
{
  "array": [1, 2, [3, 4], 5],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": { "a": "b", "c": "d", "e": "f" },
  "string": "Hello World"
}
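To see how these JSON types arrive in R, the main sample object can be parsed with jsonlite's fromJSON(). This is a minimal sketch assuming jsonlite is installed; the JSON string is the ASCII-only sample object from the example above.

```r
library(jsonlite)

# The main sample object from the example above, as a JSON string.
sample_json <- '{
  "array": [1, 2, [3, 4], 5],
  "boolean": true,
  "null": null,
  "number": 123,
  "object": { "a": "b", "c": "d", "e": "f" },
  "string": "Hello World"
}'

x <- fromJSON(sample_json)
str(x)

# Resulting R types: boolean -> logical, number -> numeric, string -> character,
# null -> NULL, object -> named list, and an array mixing scalars with a
# nested array -> list (it cannot be simplified to a single vector).
```

The nesting in the JSON is preserved as nesting in R: x$object$c reaches the value "d" two levels down.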
Several R packages ‘wrap’ API calls. These are often written to access specific services and may support only a limited number of endpoints. Alternatively, data can be accessed by making HTTP requests directly with packages like ‘httr’, and the JSON responses can then be parsed for use in R with packages like ‘jsonlite’. This document uses httr and jsonlite to access data.
The OMDb API is an open web service that hosts movie information contributed and maintained by users. To make a call to this service, the endpoint must be specified, along with an ‘API key’, which can be requested from the host. Such services are usually protected by several layers of security; issuing API keys is mainly a way to identify users and to limit traffic.
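Because the key identifies the user, it is usually better not to paste it into scripts that may be shared. One common approach, sketched here, is to store it in an environment variable (for example in ~/.Renviron, which R reads at start-up) and read it with Sys.getenv(). The variable name OMDB_API_KEY and the helper get_api_key() are hypothetical choices for illustration, not something OMDb prescribes.

```r
# Hypothetical helper: fetch an API key from an environment variable instead of
# hard-coding it. OMDB_API_KEY is an assumed name chosen for this sketch.
get_api_key <- function(var = "OMDB_API_KEY") {
  key <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(key)) {
    stop("API key not found: set the ", var, " environment variable", call. = FALSE)
  }
  key
}
```

With a line such as OMDB_API_KEY=your-key in ~/.Renviron, calling get_api_key() after restarting R returns the key, and a script fails with a clear message if the key is missing.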
Other parameters must also be specified in the request call. These are listed below.
Using these, together with the ‘apikey’ supplied by the host, I will request details for the 1957 release of “12 Angry Men” using the HTTP ‘GET’ method:
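library(httr)
# OMDb endpoint plus query parameters: API key, result type, title, year, plot length and response format
path <- "http://www.omdbapi.com/?apikey=e380c2d6&type=movie&t=12+Angry+Men&y=1957&plot=full&r=json"
# Send the HTTP GET request; r holds the response object
r <- GET(url = path)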
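Hand-writing the query string works, but httr can also assemble it from a named list, which percent-encodes special characters (such as the spaces in the title) and makes each parameter easy to change. The sketch below only builds the URL with modify_url() and sends nothing; "YOUR_API_KEY" is a placeholder rather than a working key.

```r
library(httr)

# The same OMDb parameters as a named list. "YOUR_API_KEY" is a placeholder.
omdb_params <- list(
  apikey = "YOUR_API_KEY",
  type   = "movie",
  t      = "12 Angry Men",  # spaces are percent-encoded by httr
  y      = "1957",
  plot   = "full",
  r      = "json"
)

# Combine the endpoint and the parameters into a full request URL.
request_url <- modify_url("http://www.omdbapi.com/", query = omdb_params)
request_url
```

Passing the same list as GET("http://www.omdbapi.com/", query = omdb_params) would send the equivalent request.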
To check whether an HTTP request succeeded, inspect the status code of the reply. For a list of status codes, refer to https://www.restapitutorial.com/httpstatuscodes.html. Status code ‘200’ (OK) indicates that the request was successful.
Using httr, this can be checked as follows:
status_code(r)
## [1] 200
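Beyond the raw number, httr's http_status() turns a status code into a readable description. The helper below, describe_status(), is a hypothetical sketch that also treats only the 2xx range as success; it works on plain numbers, so it runs without a network connection.

```r
library(httr)

# Hypothetical helper: describe a status code and flag whether it is a 2xx success.
describe_status <- function(code) {
  info <- http_status(code)  # list with category, reason and message
  list(ok = code >= 200 && code < 300, message = info$message)
}

describe_status(200)$message
describe_status(404)$message
```

For a live response, stop_for_status(r) from httr raises an R error when the request has failed, which stops an automated script before it tries to parse an error reply.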
To view the content of the reply:
str(content(r))
## List of 25
## $ Title : chr "12 Angry Men"
## $ Year : chr "1957"
## $ Rated : chr "Not Rated"
## $ Released : chr "10 Apr 1957"
## $ Runtime : chr "96 min"
## $ Genre : chr "Crime, Drama"
## $ Director : chr "Sidney Lumet"
## $ Writer : chr "Reginald Rose (story), Reginald Rose (screenplay)"
## $ Actors : chr "Martin Balsam, John Fiedler, Lee J. Cobb, E.G. Marshall"
## $ Plot : chr "The defense and the prosecution have rested and the jury is filing into the jury room to decide if a young man "| __truncated__
## $ Language : chr "English"
## $ Country : chr "USA"
## $ Awards : chr "Nominated for 3 Oscars. Another 16 wins & 8 nominations."
## $ Poster : chr "https://m.media-amazon.com/images/M/MV5BMWU4N2FjNzYtNTVkNC00NzQ0LTg0MjAtYTJlMjFhNGUxZDFmXkEyXkFqcGdeQXVyNjc1NTY"| __truncated__
## $ Ratings :List of 3
## ..$ :List of 2
## .. ..$ Source: chr "Internet Movie Database"
## .. ..$ Value : chr "8.9/10"
## ..$ :List of 2
## .. ..$ Source: chr "Rotten Tomatoes"
## .. ..$ Value : chr "100%"
## ..$ :List of 2
## .. ..$ Source: chr "Metacritic"
## .. ..$ Value : chr "96/100"
## $ Metascore : chr "96"
## $ imdbRating: chr "8.9"
## $ imdbVotes : chr "583,651"
## $ imdbID : chr "tt0050083"
## $ Type : chr "movie"
## $ DVD : chr "06 Mar 2001"
## $ BoxOffice : chr "N/A"
## $ Production: chr "Criterion Collection"
## $ Website : chr "http://www.criterion.com/films/27871-12-angry-men"
## $ Response : chr "True"
As can be seen, a single record is retrieved, including a nested ‘Ratings’ list with three entries: Internet Movie Database, Rotten Tomatoes and Metacritic. The JSON reply is shown diagrammatically below.
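To convert the response body to text and parse it into R objects, the following instructions may be used. fromJSON() returns a named list in which the nested ‘Ratings’ array becomes a data frame.
library(jsonlite)
# Extract the response body as a UTF-8 string, keeping the response object r intact
r_text <- content(r, as = "text", encoding = "UTF-8")
# Parse the JSON; flatten = TRUE flattens any nested data frames into columns
df <- fromJSON(r_text, flatten = TRUE)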
df
## $Title
## [1] "12 Angry Men"
##
## $Year
## [1] "1957"
##
## $Rated
## [1] "Not Rated"
##
## $Released
## [1] "10 Apr 1957"
##
## $Runtime
## [1] "96 min"
##
## $Genre
## [1] "Crime, Drama"
##
## $Director
## [1] "Sidney Lumet"
##
## $Writer
## [1] "Reginald Rose (story), Reginald Rose (screenplay)"
##
## $Actors
## [1] "Martin Balsam, John Fiedler, Lee J. Cobb, E.G. Marshall"
##
## $Plot
## [1] "The defense and the prosecution have rested and the jury is filing into the jury room to decide if a young man is guilty or innocent of murdering his father. What begins as an open-and-shut case of murder soon becomes a detective story that presents a succession of clues creating doubt, and a mini-drama of each of the jurors' prejudices and preconceptions about the trial, the accused, and each other. Based on the play, all of the action takes place on the stage of the jury room."
##
## $Language
## [1] "English"
##
## $Country
## [1] "USA"
##
## $Awards
## [1] "Nominated for 3 Oscars. Another 16 wins & 8 nominations."
##
## $Poster
## [1] "https://m.media-amazon.com/images/M/MV5BMWU4N2FjNzYtNTVkNC00NzQ0LTg0MjAtYTJlMjFhNGUxZDFmXkEyXkFqcGdeQXVyNjc1NTYyMjg@._V1_SX300.jpg"
##
## $Ratings
## Source Value
## 1 Internet Movie Database 8.9/10
## 2 Rotten Tomatoes 100%
## 3 Metacritic 96/100
##
## $Metascore
## [1] "96"
##
## $imdbRating
## [1] "8.9"
##
## $imdbVotes
## [1] "583,651"
##
## $imdbID
## [1] "tt0050083"
##
## $Type
## [1] "movie"
##
## $DVD
## [1] "06 Mar 2001"
##
## $BoxOffice
## [1] "N/A"
##
## $Production
## [1] "Criterion Collection"
##
## $Website
## [1] "http://www.criterion.com/films/27871-12-angry-men"
##
## $Response
## [1] "True"
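The two views of the reply differ because of simplification: str(content(r)) showed ‘Ratings’ as nested lists, while fromJSON() returned it as a table. The sketch below reproduces this on a trimmed copy of the reply (values taken from the output above), assuming jsonlite is installed; simplifyVector = FALSE switches simplification off.

```r
library(jsonlite)

# Trimmed copy of the OMDb reply; values are taken from the output above.
reply <- '{
  "Title": "12 Angry Men",
  "Ratings": [
    {"Source": "Internet Movie Database", "Value": "8.9/10"},
    {"Source": "Rotten Tomatoes", "Value": "100%"},
    {"Source": "Metacritic", "Value": "96/100"}
  ]
}'

# Default simplification: an array of objects sharing the same keys
# becomes a data frame with one row per object.
simplified <- fromJSON(reply)
simplified$Ratings

# With simplification switched off, the array stays a list of named lists.
nested <- fromJSON(reply, simplifyVector = FALSE)
nested$Ratings[[3]]$Value
```

The data frame form is convenient for analysis, while the nested list form mirrors the JSON structure exactly.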
The user may also be interested in only a few fields, e.g. title, year, director, actors and IMDb rating. In this case, the list may be subset by name, which is more robust than selecting elements by position:
df_filtered <- df[c("Title", "Year", "Director", "Actors", "imdbRating")]
df_filtered
## $Title
## [1] "12 Angry Men"
##
## $Year
## [1] "1957"
##
## $Director
## [1] "Sidney Lumet"
##
## $Actors
## [1] "Martin Balsam, John Fiedler, Lee J. Cobb, E.G. Marshall"
##
## $imdbRating
## [1] "8.9"
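OMDb returns every field as a character string, so values such as "8.9" or "583,651" need converting before any arithmetic. The hypothetical helper to_number() below is one way to do this; it assumes that "N/A" (as in the BoxOffice field above) marks a missing value and that commas are thousands separators.

```r
# Hypothetical helper: convert OMDb's character fields to numbers.
to_number <- function(x) {
  x <- gsub(",", "", x, fixed = TRUE)  # strip thousands separators, e.g. "583,651"
  x[x == "N/A"] <- NA                  # treat "N/A" as a missing value
  out <- as.numeric(x)                 # as.numeric() drops names...
  names(out) <- names(x)               # ...so restore them
  out
}

# Values copied from the output above.
fields <- c(imdbRating = "8.9", imdbVotes = "583,651", Metascore = "96", BoxOffice = "N/A")
to_number(fields)
```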
The range of data structures that can be accessed this way is very broad. From multi-dimensional data structures (such as JSON) to source code and database schemas, APIs allow any application to become a node within a large and growing pool of data and functionality.