Getting Started with APIs in R

Interesting data sets are the fuel of a good data science project. And while websites like Kaggle offer free data sets to interested data scientists, APIs are another very common way to access and acquire interesting data.

Instead of having to download a data set, APIs allow programmers to request data directly from certain websites through what’s called an Application Programming Interface (hence, “API”). Many large websites like Reddit, Twitter and Facebook offer APIs so that data analysts and data scientists can access interesting data.

In this article, we’re going to cover the basics of accessing an API using the R programming language.

Introduction to APIs with R

“API” is a general term for the place where one computer program interacts with another, or with itself. In this tutorial, we’ll specifically be working with web APIs, where two different computers — a client and server — will interact with each other to request and provide data, respectively.

Why is this valuable? Contrast the API approach with pure web scraping. When a programmer scrapes a web page, they receive the data in a messy chunk of HTML. Libraries exist that make parsing HTML easier, but parsing is still a cleaning step that has to happen before we even get our hands on the data we want!

Often, we can immediately use the data we get from an API, which saves us time and frustration.

Making Our First API Request

The first step in getting data from an API is making the actual request in R. This request will be sent to the computer server that has the API, and assuming everything goes smoothly, it will send back a response.

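# Install the packages once if they are not already available:
# install.packages(c("httr", "jsonlite"))

# Spell out TRUE/FALSE: the shorthands T and F are ordinary variables that can be overwritten
library(httr, warn.conflicts = FALSE, quietly = TRUE)
library(jsonlite, warn.conflicts = FALSE, quietly = TRUE)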

There are several types of requests that one can make to an API server, and each type corresponds to a different action that you want the server to perform. To retrieve data, we use a GET request. For our example, we’ll be working with the Open Notify API, which opens up data on various NASA projects. Using the Open Notify API, we can learn about the location of the International Space Station and how many people are currently in space.

We’ll be working with the latter API first. We’ll start by making our request using the GET() function and specifying the API’s URL:

res = GET("http://api.open-notify.org/astros.json")
res
## Response [http://api.open-notify.org/astros.json]
##   Date: 2022-10-17 12:36
##   Status: 200
##   Content-Type: application/json
##   Size: 477 B

The output of the GET() function is a response object, which behaves like a list and contains all of the information returned by the API server. In other words, the res variable contains the response of the API server to our request.

Investigating the res variable gives us a summary look at the resulting response. The first thing to notice is that it contains the URL that the GET request was sent to. We can also see the date and time that the request was made, as well as the size of the response.

The content type gives us an idea of what form the data takes. This particular response says that the data is in JSON format, which hints at why we need the jsonlite library.
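In httr, the body of a response is stored as a raw vector of bytes, so it has to be turned into text before jsonlite can parse it. The following is a minimal offline sketch of that conversion. The `raw_body` variable and its payload are made up for illustration and simply stand in for `res$content`.

```r
library(jsonlite)

# Stand-in for res$content: a raw vector holding JSON text.
# The payload is an illustrative assumption, not real API output.
raw_body <- charToRaw('{"message": "success", "number": 10}')

class(raw_body)                    # "raw"

body_text <- rawToChar(raw_body)   # bytes -> character string
parsed <- fromJSON(body_text)      # JSON text -> named R list

names(parsed)
parsed$number
```

The same two steps, `rawToChar()` followed by `fromJSON()`, apply to the body of any JSON response.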

The status deserves some special attention. “Status” refers to the success or failure of the API request, and it comes in the form of a number. The number returned tells you whether or not the request was a success and can also detail some reasons why it might have failed.

Since we have a successful 200 status response, we know that we have the data on hand and we can start working with it.
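Status codes fall into broad ranges: codes from 200 to 299 signal success, codes from 400 to 499 point to a problem with the request, and codes from 500 to 599 point to a problem on the server. As a rough sketch, a small helper can translate a code into one of these categories. The function name `describe_status` is hypothetical and not part of httr.

```r
# Hypothetical helper: map an HTTP status code to a broad category.
describe_status <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 400 && code < 500) {
    "client error"
  } else if (code >= 500 && code < 600) {
    "server error"
  } else {
    "other"
  }
}

describe_status(200)  # "success"
describe_status(404)  # "client error"
```

With a real response, the numeric code can be read with httr's status_code(res), and stop_for_status(res) raises an R error when the request failed.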

# Convert the raw response body to text, then parse the JSON
data = fromJSON(rawToChar(res$content))
names(data)
## [1] "message" "people"  "number"
# The data set we are interested in is the people data frame inside data.
# We can use the $ operator to look at this data frame directly:
data$people
##                name    craft
## 1         Cai Xuzhe Tiangong
## 2         Chen Dong Tiangong
## 3          Liu Yang Tiangong
## 4  Sergey Prokopyev      ISS
## 5    Dmitry Petelin      ISS
## 6       Frank Rubio      ISS
## 7       Nicole Mann      ISS
## 8      Josh Cassada      ISS
## 9     Koichi Wakata      ISS
## 10      Anna Kikina      ISS
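Because fromJSON() turns a JSON array of records into a data frame, the usual data frame tools apply right away. The sketch below works offline on a hand-written sample that mirrors the shape of the response, using three of the names shown above. The `json_text` and `sample_data` names, and the exact sample values, are assumptions for illustration.

```r
library(jsonlite)

# Hand-written sample mirroring the structure of the astros.json response
json_text <- '{
  "message": "success",
  "people": [
    {"name": "Cai Xuzhe",   "craft": "Tiangong"},
    {"name": "Frank Rubio", "craft": "ISS"},
    {"name": "Nicole Mann", "craft": "ISS"}
  ],
  "number": 3
}'

sample_data <- fromJSON(json_text)

# The array of objects becomes a data frame with one row per person
is.data.frame(sample_data$people)

# Count how many people are aboard each craft
craft_counts <- table(sample_data$people$craft)
craft_counts
```

Applied to the real response, table(data$people$craft) would give the head count per spacecraft.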

APIs and Query Parameters

What if we wanted to know when the ISS was going to pass over a given location on Earth? Unlike the People in Space API, Open Notify’s ISS Pass Times API requires us to provide additional parameters before it can return the data we want.

Specifically, we’ll need to specify the latitude and longitude of the location we’re asking about as part of our GET() request. Once a latitude and longitude are specified, they are combined with the original URL as query parameters.
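To see how query parameters end up in the URL, here is a small sketch that builds the request URL without sending it. The endpoint path `iss-pass.json`, the parameter names `lat` and `lon`, and the coordinates are all assumptions for illustration; check the Open Notify documentation for what is currently supported.

```r
library(httr)

# Assumed endpoint path and parameter names (not given in the text above)
base_url <- "http://api.open-notify.org/iss-pass.json"

# Illustrative coordinates only
coords <- list(lat = 40.7, lon = -74)

# modify_url() appends the list as a query string; nothing is sent
request_url <- modify_url(base_url, query = coords)
print(request_url)

# Sending the request would look like this (left commented out here):
# res <- GET(base_url, query = coords)
```

Passing the parameters as a list through the query argument lets httr handle the URL encoding, which is less error-prone than pasting the string together by hand.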

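The example below uses a simpler endpoint, Open Notify’s iss-now.json, which reports the station’s current position and needs no query parameters. Let’s find out where the ISS is right now and extract the data from the response: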

res = GET("http://api.open-notify.org/iss-now.json")
data = fromJSON(rawToChar(res$content))
data$iss_position
## $longitude
## [1] "80.2103"
## 
## $latitude
## [1] "-27.8738"

You’ve Got the Basics of APIs in R!

In this tutorial, we learned what an API is and how APIs can be useful to data analysts and data scientists.

Using our R programming skills and the httr and jsonlite libraries, we took data from an API and converted it into a familiar format for analysis.

We’ve just scratched the surface of working with APIs here, but hopefully this introduction has given you the confidence to look into some more complex and powerful APIs, and helped unlock a whole new world of data out there for you to explore!
