Introduction

A data analyst has to rely on numerous sources to bring data into a programming environment for analysis. We can obtain data by downloading files from websites, by web scraping, or by reading from local repositories and database systems. APIs are another very common way to access and acquire interesting data for analysis.

API stands for Application Programming Interface: a defined interface through which one computer program (the client) can interact with another program (the server).

Social media websites like Facebook, Twitter, and Instagram build APIs, which are served by computers waiting for data requests. Once these computers receive a request through the API, they check that it is valid, process it, and send a response back to the computer that made the request.

In this short project, we will learn how to make requests to APIs, receive their responses, and retrieve our data from the content of those responses. APIs provide us with nicely formatted, curated data that can be extracted with a few libraries in R.

Creating an API request

The two most important libraries for this API project are httr and jsonlite. The httr library handles sending requests to the API and recording its responses. The jsonlite library parses the JSON content of the recorded responses, in a couple of steps, into R objects that we can analyze further.

library("httr")
library("jsonlite")

We will use the GET() function from the httr library to request a response from the Open Notify API, which serves data from several NASA projects. Through the Open Notify API we can learn the location of the International Space Station (ISS) and the number of people currently in space. Let’s first request a response from the URL provided in the Open Notify API documentation.

If the request succeeds, the status of the response will be 200. Any other status generally means that something went wrong: codes in the 400s point to a problem with our request, and codes in the 500s point to a problem on the server. When that happens, we should read through the documentation of the API carefully to check whether we have missed any vital information in the URL.

response<-GET("http://api.open-notify.org/astros.json") #don't use 'https' in the URL for the API
response
## Response [http://api.open-notify.org/astros.json]
##   Date: 2022-07-08 08:55
##   Status: 200
##   Content-Type: application/json
##   Size: 490 B

The status is 200, so the request succeeded.
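To make the meaning of status codes a little more concrete, here is a minimal sketch in base R that classifies a code by its first digit. The helper name classify_status is hypothetical and is not part of httr; it only illustrates the standard HTTP status classes.

```r
# Classify an HTTP status code by its first digit (base R only).
# `classify_status` is a hypothetical helper, not an httr function.
classify_status <- function(code) {
  code <- as.integer(code)
  if (is.na(code) || code < 100L || code > 599L) {
    return("unknown")
  }
  switch(
    as.character(code %/% 100L),
    "1" = "informational",
    "2" = "success",
    "3" = "redirection",
    "4" = "client error",  # e.g. a wrong URL or a missing parameter
    "5" = "server error"   # the problem is on the API's side
  )
}

classify_status(200)  # "success"
classify_status(404)  # "client error"
classify_status(503)  # "server error"
```

With a real httr response, the numeric code can be read with status_code(response) and passed to a helper like this one.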

Manipulating the JSON data

The actual data is stored as raw bytes in the content of the response, which is simply not readable. We therefore need to convert this raw data into readable JSON text. JSON is formatted as a series of key-value pairs, where each key is paired with a certain value.
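The sketch below shows what raw bytes and key-value pairs look like on a small scale. The JSON text is made up for illustration and is not real API output; it is turned into raw bytes with charToRaw() to imitate a response body, then converted back with rawToChar() and parsed with fromJSON().

```r
library(jsonlite)

# A hand-written JSON document (made up for illustration, not real API output).
json_text <- '{"number": 2, "message": "success", "people": [{"name": "A", "craft": "X"}, {"name": "B", "craft": "Y"}]}'

# Imitate the body of an HTTP response: a vector of raw bytes.
raw_body <- charToRaw(json_text)
head(raw_body)  # bytes are printed in hexadecimal, e.g. 7b for "{"

# Step 1: raw bytes -> a single character string holding the JSON text.
as_text <- rawToChar(raw_body)

# Step 2: JSON text -> R objects. The result is a named list, and an
# array of objects such as "people" is simplified to a data frame.
parsed <- fromJSON(as_text)
names(parsed)
parsed$people
```

Each key in the JSON text becomes a name in the resulting list, so a value can be looked up with the $ operator.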

First we apply the rawToChar() function to the content of the response to convert the raw bytes into a character string of JSON text. Then we apply the fromJSON() function to that string to parse the actual data into R objects.

ISS.data<-fromJSON(rawToChar(response$content))
names(ISS.data)
## [1] "number"  "people"  "message"

So, ISS.data is a list of three objects. We simply want to know the number of people currently in space and their names, on the ISS and other spacecraft, so we only need to extract the people object from this list.

ISS.data$people
##                      name    craft
## 1           Oleg Artemyev      ISS
## 2           Denis Matveev      ISS
## 3         Sergey Korsakov      ISS
## 4          Kjell Lindgren      ISS
## 5               Bob Hines      ISS
## 6  Samantha Cristoforetti      ISS
## 7         Jessica Watkins      ISS
## 8               Cai Xuzhe Tiangong
## 9               Chen Dong Tiangong
## 10               Liu Yang Tiangong

We have successfully extracted the names of the people on the ISS and other spacecraft as of the time of the request.
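Since the people object is an ordinary data frame, the head count per craft can be obtained with base R. The sketch below rebuilds the data frame from the rows printed above so that it runs without a network connection; in practice we would use ISS.data$people directly.

```r
# Rows copied from the output above; with a live response,
# use `people <- ISS.data$people` instead.
people <- data.frame(
  name = c("Oleg Artemyev", "Denis Matveev", "Sergey Korsakov",
           "Kjell Lindgren", "Bob Hines", "Samantha Cristoforetti",
           "Jessica Watkins", "Cai Xuzhe", "Chen Dong", "Liu Yang"),
  craft = c(rep("ISS", 7), rep("Tiangong", 3)),
  stringsAsFactors = FALSE
)

# Total number of people in space at the time of the request.
nrow(people)

# Number of people on each craft.
craft_counts <- table(people$craft)
craft_counts

# Names of the people on the ISS only.
people$name[people$craft == "ISS"]
```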

Query Parameters for API

So far, we didn’t have to explicitly provide any query parameters in the API URL. Suppose we want to know the time when the ISS will fly over our location. For that, we need to provide extra information as query parameters, i.e. the latitude and longitude of our current location.
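Query parameters travel in the URL itself, after a question mark, as key=value pairs joined by ampersands. The following base R sketch builds such a URL by hand to show roughly what the query argument of GET() produces. The helper name build_query_url and the coordinate values are hypothetical, chosen only for illustration.

```r
# Build a URL with a query string from a named list of parameters.
# `build_query_url` is a hypothetical helper written for illustration.
build_query_url <- function(base_url, params) {
  # Percent-encode each value so that spaces and reserved characters are safe.
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  pairs <- paste0(names(params), "=", values)
  paste0(base_url, "?", paste(pairs, collapse = "&"))
}

# Illustrative coordinates, not a real location of interest.
build_query_url("http://api.open-notify.org/iss-pass.json",
                list(lat = 10.5, lon = 20.25))
# "http://api.open-notify.org/iss-pass.json?lat=10.5&lon=20.25"
```

Passing the parameters as a list, rather than pasting them into the URL, keeps the request readable and leaves the encoding to the library.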

Right now, I am living in South Paikpara, Dhaka, Bangladesh, so I am providing the approximate latitude and longitude of my current location.

new.response<-GET("http://api.open-notify.org/iss-pass.json", 
                  query = list(lat = 23.783300, lon = 90.364879))
new.response
## Response [http://api.open-notify.org/iss-pass.json?lat=23.7833&lon=90.364879]
##   Date: 2022-07-08 08:55
##   Status: 200
##   Content-Type: application/json
##   Size: 525 B
## {
##   "message": "success", 
##   "request": {
##     "altitude": 100, 
##     "datetime": 1657270549, 
##     "latitude": 23.7833, 
##     "longitude": 90.364879, 
##     "passes": 5
##   }, 
##   "response": [
## ...

We have retrieved the response successfully as the status is 200. Now we will extract the data from our newly collected response.

time.data<-fromJSON(rawToChar(new.response$content))
time.data$response
##   duration   risetime
## 1      464 1657289560
## 2      642 1657295254
## 3      342 1657301244
## 4      162 1657319186
## 5      624 1657324811

Finally, we get the times when the ISS will fly over my location. But we are not finished yet. The risetime is given in UNIX time, which is the number of seconds elapsed since 00:00:00 UTC on January 1st, 1970, and is not very intuitive. We can convert it to a readable UTC date-time by using the lubridate library.

library(lubridate)
date.time<-time.data$response
UTC.time<-as_datetime(date.time$risetime)
date.time$UTC_time<-UTC.time
date.time
##   duration   risetime            UTC_time
## 1      464 1657289560 2022-07-08 14:12:40
## 2      642 1657295254 2022-07-08 15:47:34
## 3      342 1657301244 2022-07-08 17:27:24
## 4      162 1657319186 2022-07-08 22:26:26
## 5      624 1657324811 2022-07-09 00:00:11
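The same conversion can also be sketched in base R without lubridate, using as.POSIXct() with the UNIX epoch as the origin. The example uses the first two risetime values from the output above and additionally formats them in Dhaka local time, assuming the "Asia/Dhaka" time zone is available on the system.

```r
# First two risetime values from the output above (seconds since the epoch).
risetime <- c(1657289560, 1657295254)

# Interpret the numbers as seconds since 1970-01-01 00:00:00 UTC.
utc_time <- as.POSIXct(risetime, origin = "1970-01-01", tz = "UTC")
format(utc_time, "%Y-%m-%d %H:%M:%S")
# "2022-07-08 14:12:40" "2022-07-08 15:47:34"

# The same instants shown in local time (Bangladesh Standard Time is UTC+6).
format(utc_time, "%Y-%m-%d %H:%M:%S", tz = "Asia/Dhaka")
```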

API with Key/Token requirement

Now we are going to request time-series data on COVID for a single county in the United States. First, we will take a look at the following URL and dissect its anatomy.
https://api.covidactnow.org/v2/county/06037.timeseries.json?apiKey=abcdefgh

The first part, https://api.covidactnow.org/v2/, is the base URL. After that, the county part tells the API that the data is being requested for a single county. The number 06037 is the unique identifier of the county whose data we want to retrieve. The .timeseries.json part tells the API to return the time-series data in JSON format.

Finally, the apiKey part requires us to supply an authorization code or token with the request. How the key is obtained varies from one API to another. Here, we have collected an API key from the documentation section of this particular API.
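The anatomy described above can be sketched as a small function that assembles the URL from its parts. The helper name build_county_url and the environment variable name COVID_ACT_NOW_KEY are hypothetical, and "abcdefgh" is the placeholder key from the example URL, not a working key.

```r
# Assemble the request URL from its parts.
# `build_county_url` is a hypothetical helper written for illustration.
build_county_url <- function(county_id, api_key) {
  base_url <- "https://api.covidactnow.org/v2/"
  # The county id is kept as a string so that the leading zero is preserved.
  paste0(base_url, "county/", county_id, ".timeseries.json?apiKey=", api_key)
}

# Read the key from an environment variable (hypothetical name) and fall
# back to the placeholder from the example URL when it is not set.
api_key <- Sys.getenv("COVID_ACT_NOW_KEY", unset = "abcdefgh")

build_county_url("06037", "abcdefgh")
# "https://api.covidactnow.org/v2/county/06037.timeseries.json?apiKey=abcdefgh"
```

Keeping the key in an environment variable is one way to avoid writing it directly into a script that may be shared.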

Let’s get a response from the API and retrieve the data step by step, following the previous discussion.

response<-GET("https://api.covidactnow.org/v2/county/06037.timeseries.json?apiKey=c4a203f3888a47f2a186efcd0c4e59ed")
response
## Response [https://api.covidactnow.org/v2/county/06037.timeseries.json?apiKey=c4a203f3888a47f2a186efcd0c4e59ed]
##   Date: 2022-07-08 08:33
##   Status: 200
##   Content-Type: application/json
##   Size: 1.19 MB

We have received quite a large response with status 200. That is good! Now we will extract the content of this response and parse it the same way as before.

covid.data.list<-fromJSON(rawToChar(response$content))

This covid.data.list contains 25 elements or objects. But we don’t need them all. All we need is the time series data.

covid.timeseries.data<-covid.data.list$actualsTimeseries
dim(covid.timeseries.data)
## [1] 894  19

Let’s take a look at a few rows and columns of this data set.

head(covid.timeseries.data[ , c("cases", "deaths", "newCases", "newDeaths", "date")])
##   cases deaths newCases newDeaths       date
## 1    NA     NA       NA        NA 2020-01-05
## 2     1      0       NA        NA 2020-01-26
## 3     1      0        0         0 2020-01-27
## 4     1      0        0         0 2020-01-28
## 5     1      0        0         0 2020-01-29
## 6     1      0        0         0 2020-01-30

We have finally retrieved our desired COVID time-series data from the API.
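As a possible next step before analysis, the date column can be converted to the Date class and rows without case counts dropped. This is only a sketch: it rebuilds a few columns from the six rows printed above and assumes that the date column arrives as character strings, which is how fromJSON() typically returns dates.

```r
# Rows copied from the output above (a subset of columns); with a live
# response, start from covid.timeseries.data instead.
ts_data <- data.frame(
  cases = c(NA, 1, 1, 1, 1, 1),
  deaths = c(NA, 0, 0, 0, 0, 0),
  date = c("2020-01-05", "2020-01-26", "2020-01-27",
           "2020-01-28", "2020-01-29", "2020-01-30"),
  stringsAsFactors = FALSE
)

# Convert the character dates to Date objects so they sort and plot correctly.
ts_data$date <- as.Date(ts_data$date)

# Gaps between consecutive observations, in days.
diff(ts_data$date)

# Keep only the rows that actually report a case count.
ts_complete <- ts_data[!is.na(ts_data$cases), ]
range(ts_complete$date)
```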