In many data science projects, the data are not sitting quietly in CSV files but come in much more fluid and dynamic forms. You cannot even ‘see’ them until you ‘ask’ for them, because they are hiding behind Application Programming Interfaces (APIs). You need to know how to knock on the API doors in the correct manner, and sometimes you need to go through an initiation process before you knock. This vignette tries to capture some of the common ways of calling such APIs. Here are the different flavours:

  1. A lollipop or look at my body

  2. One key? Not enough!

Let’s load in 2 packages, httr and jsonlite.

library(httr)
library(jsonlite)

Use Case 1 - API calls with simple query string

In this case, the API door is always unlocked, like a swing door. If you need a lollipop, just put it at the end of the URL, as in http://some.api/lollipop.
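To see how a request can carry the lollipop either in the URL path or in a query string, httr's modify_url() can build both URLs locally without sending anything. This is a minimal sketch: some.api is the placeholder host from the analogy above, and the colour parameter is made up for illustration.

```r
library(httr)

# some.api is a placeholder host; modify_url() only builds strings, nothing is sent
base <- "http://some.api/"

# The lollipop as part of the URL path
path_url <- modify_url(base, path = "lollipop")

# The same resource plus a query string; 'colour' is a hypothetical parameter
query_url <- modify_url(base, path = "lollipop", query = list(colour = "red"))

path_url
query_url
```

Passing a named list to query is the same mechanism GET() uses in the Bureau of Meteorology example further down, so each list element becomes one name=value pair after the ?.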

Below is a call that finds the current location of the International Space Station (ISS) using the Open Notify API:

result <- GET("http://api.open-notify.org/", path = "/iss-now.json")

# The full URL used to call the API
result$url
## [1] "http://api.open-notify.org/iss-now.json"
# The current location of ISS can be revealed with 
httr::content(result)$iss_position
## $longitude
## [1] "88.9208"
## 
## $latitude
## [1] "50.7347"
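httr::content() parses the JSON for you based on the response's Content-Type. If you would rather parse it yourself with jsonlite (the second package loaded above), a sketch could look like this. The JSON string is a hard-coded stand-in whose coordinates are copied from the output above; its shape is assumed to match what content() returned.

```r
library(jsonlite)

# Stand-in for the response body (values copied from the output above)
body <- '{"iss_position": {"longitude": "88.9208", "latitude": "50.7347"}}'

# With a live response you would instead use:
#   body <- httr::content(result, as = "text", encoding = "UTF-8")
iss <- fromJSON(body)

# The coordinates arrive as strings, so convert them before doing any maths
lon <- as.numeric(iss$iss_position$longitude)
lat <- as.numeric(iss$iss_position$latitude)
c(longitude = lon, latitude = lat)
```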

Use Case 1.1 - Multiple parameters and saving to disk

What if the API offers lollipops of different colours and flavours? You can use parameters to specify exactly what you like. Here is a way to ask the Bureau of Meteorology (BOM) API for a compressed file of daily solar exposure. Specifically, we only want the solar data:

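  • as a compressed daily data file (p_display_type="dailyZippedDataFile")
  • of daily solar exposure (p_nccObsCode="193")
  • as recorded at Peakhurst Golf Club (p_stn_num="066148")
  • for all years (p_startYear="2020")
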
library(httr)
library(jsonlite)

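result <- GET("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av"
              # each list element becomes a name=value pair in the query string
              , query = list(p_display_type="dailyZippedDataFile"
                             ,p_stn_num="066148"
                             ,p_c="-875217264"
                             ,p_nccObsCode="193"
                             ,p_startYear="2020"
                             )
              # stream the response body into a temporary .zip file; tf holds its path
              , write_disk(tf <- tempfile(fileext = ".zip"))
)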

result
## Response [http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=066148&p_c=-875217264&p_nccObsCode=193&p_startYear=2020]
##   Date: 2020-08-16 06:45
##   Status: 200
##   Content-Type: application/zip
##   Size: 52.7 kB
## <ON DISK>  C:\Users\RATOLI~1\AppData\Local\Temp\RtmpszXLwK\file331c5117572c.zip
# Note that the exact file location is shown in the last line

You can unzip the file saved in the temp directory to get the CSV file.
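Unzipping and reading can be wrapped in a small helper. This is a sketch that assumes the archive holds one or more .csv files; read_zipped_csvs() is a hypothetical name, and any non-CSV files in the archive (for example a notes file) are simply skipped.

```r
# Hypothetical helper: extract a zip archive and read every CSV inside it
read_zipped_csvs <- function(zip_path, exdir = tempfile("unzipped_")) {
  # unzip() creates exdir if needed and returns the paths of the extracted files
  extracted <- utils::unzip(zip_path, exdir = exdir)
  # Keep only CSV files; anything else in the archive is ignored
  csv_paths <- extracted[grepl("\\.csv$", extracted, ignore.case = TRUE)]
  # One data frame per CSV, named after its file
  stats::setNames(lapply(csv_paths, utils::read.csv), basename(csv_paths))
}

# Usage with the BOM download above, where tf holds the path to the .zip:
# solar <- read_zipped_csvs(tf)
# str(solar[[1]])
```

Returning a named list rather than a single data frame keeps the helper usable when an archive contains several CSV files.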

Use Case 2 - API calls with parameters in body

This kind of call is similar to Use Case #1, except that the parameters are sent in the body of the HTTP message instead of in the URL. Technically, the underlying HTTP request is constructed differently for the GET and POST methods (Difference Between Get and Post Method in Http 2020). Follow the API's instructions to use the correct method. Use Case #3 below, by contrast, is a GET call that passes its parameters in the query string.
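With httr, parameters go into the body through the body argument of POST(). The sketch below is illustrative only: the endpoint and the colour/flavour parameters are hypothetical, so the function is defined but not called. To my understanding, httr serialises a list body with jsonlite::toJSON(..., auto_unbox = TRUE) when encode = "json", so the last lines preview what that body should look like.

```r
# Hypothetical endpoint and parameters, used only to show the shape of a POST call
post_flavour <- function(url = "http://some.api/lollipop",
                         params = list(colour = "red", flavour = "strawberry")) {
  # encode = "json" sends params as a JSON request body;
  # encode = "form" would send them URL-encoded in the body instead
  httr::POST(url, body = params, encode = "json")
}

# Preview the JSON body locally, without sending anything
body_json <- jsonlite::toJSON(list(colour = "red", flavour = "strawberry"),
                              auto_unbox = TRUE)
body_json
```

Unlike the query-string calls above, nothing in the URL changes here; the parameters only appear in the request body.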

Use Case 3 - API Calls with API Key

For proprietary or paid data, you will normally need to acquire an API key before you can use the API, just as you need the right key to open a door fitted with a lock. To demonstrate, I use a GeoDataSource API key to look up the city closest to the coordinates I specify with the lat and lng parameters.

# Read the API key from an environment variable instead of hard-coding it
geoKey <- Sys.getenv('GEODATASOURCE_KEY')
result <- GET("https://api.geodatasource.com/city"
              , query = list(key=geoKey
                             ,lat="17.733676800000069"
                             ,lng="-64.751575799999955"
                             ,format="json"
                             )
              )
httr::content(result)
## $country
## [1] "VI"
## 
## $region
## [1] "Saint Croix Island"
## 
## $city
## [1] "Ruby"
## 
## $latitude
## [1] "17.7369"
## 
## $longitude
## [1] "-64.7549"
## 
## $currency_code
## [1] "USD"
## 
## $currency_name
## [1] "United States Dollar"
## 
## $currency_symbol
## [1] "$"
## 
## $sunrise
## [1] "06:01"
## 
## $sunset
## [1] "18:44"
## 
## $time_zone
## [1] "-04:00"
## 
## $distance_km
## [1] "0.5023"
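The call above reads the key from the GEODATASOURCE_KEY environment variable, which keeps it out of the script (one common place to set it is a line such as GEODATASOURCE_KEY=... in ~/.Renviron, read when R starts). If the variable is not set, Sys.getenv() silently returns an empty string and the request goes out without a key. A small hypothetical helper can fail early instead. Note also that because the key travels in the query string, it shows up in result$url, so take care when printing or logging that.

```r
# Hypothetical helper: fetch an API key from an environment variable or stop early
get_api_key <- function(var) {
  key <- Sys.getenv(var)  # returns "" when the variable is not set
  if (!nzchar(key)) {
    stop("Environment variable ", var,
         " is not set; add it to ~/.Renviron and restart R")
  }
  key
}

# Usage with the example above:
# geoKey <- get_api_key("GEODATASOURCE_KEY")
```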

Use Case 4 - API Calls to Google Analytics - saved by googleAnalyticsR

Data is gold! Its hidden value is evident in the lengths some providers go to in order to protect it. For example, to extract your website data from Google Analytics, you need a bunch of keys! Their API is like a modern security door: to open it, not only do you need to swipe your access card, you also need to scan your retina AND fingerprint.

Initiation
The initiation process before using any Google API (Developers 2020) looks like this:

  1. Get an API key
  2. For each combination of application type and target View, get a pair of Client ID and Client Secret

Calling the API

In your R code, before calling the Reporting API, use the Client ID and Client Secret to obtain an OAuth token. Then call the API with that token, specifying the View ID of the data you want in the request.
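For the curious, the token-then-request flow can be sketched with plain httr. This is a rough sketch under assumptions, not what googleAnalyticsR does internally: the endpoint URL, the read-only scope, and the reportRequests body layout reflect my understanding of the Reporting API v4 documentation, and "ga-demo" is a hypothetical app name. Only the body builder runs offline; query_ga() needs real credentials and an interactive browser login, so it is not called here.

```r
# Build a Reporting API v4 request body (field names assumed from the v4 docs)
build_ga_body <- function(view_id, start_date, end_date, metrics, dimensions) {
  list(reportRequests = list(list(
    viewId = view_id,
    dateRanges = list(list(startDate = start_date, endDate = end_date)),
    metrics = lapply(metrics, function(m) list(expression = m)),
    dimensions = lapply(dimensions, function(d) list(name = d))
  )))
}

# Get an OAuth token from the Client ID/Secret, then POST the body with it
query_ga <- function(body, client_id, client_secret) {
  app <- httr::oauth_app("ga-demo", key = client_id, secret = client_secret)
  token <- httr::oauth2.0_token(
    httr::oauth_endpoints("google"), app,
    scope = "https://www.googleapis.com/auth/analytics.readonly"
  )
  resp <- httr::POST(
    "https://analyticsreporting.googleapis.com/v4/reports:batchGet",
    httr::config(token = token),
    body = body, encode = "json"
  )
  httr::stop_for_status(resp)
  httr::content(resp)
}

body <- build_ga_body("12345", "2020-07-30", "2020-08-10",
                      c("ga:sessions", "ga:bounces"), c("ga:source", "ga:medium"))
```

Note that this is also a Use Case 2 call: the View ID and the rest of the query travel in the POST body, not in the URL.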

Thanks to the googleAnalyticsR package, the whole process is simplified. You don’t even see the actual API endpoint!

library(googleAnalyticsR)
## authenticate via OAuth
ga_auth()
## get your accounts
account_list <- ga_account_list()
#View(account_list)
## pick a profile with data to query (first row here)
ga_id <- account_list$viewId[1]
## create filters on metrics
mf <- met_filter("bounces", "GREATER_THAN", 0)
mf2 <- met_filter("sessions", "GREATER_THAN", 2)
## create filters on dimensions
df <- dim_filter("source","BEGINS_WITH","1",not = TRUE)
df2 <- dim_filter("source","BEGINS_WITH","a",not = TRUE)
## construct filter objects
fc2 <- filter_clause_ga4(list(df, df2), operator = "AND")
fc <- filter_clause_ga4(list(mf, mf2), operator = "AND")
## make API request
ga_data1 <- google_analytics(ga_id
                             ,date_range = c("2020-07-30","2020-08-10")
                             ,dimensions=c('source','medium')
                             ,metrics = c('sessions','bounces')
                             ,met_filters = fc
                             ,dim_filters = fc2
                             ,filtersExpression = "ga:source!=(direct)"
                             )
# view response
ga_data1

Conclusion

Many valuable datasets are available from APIs. Understanding the various ways of accessing these APIs can be crucial to a data scientist’s job. At the very least, fetching the data directly into your code is more efficient than extracting and storing it manually before you can start the analysis.

To Explore Further

  1. Difference between GET and POST method in HTTP
  2. Getting Started with APIs in R
  3. Accessing Web Data (JSON) in R using httr
  4. Routing & Input

References

Developers, Google. 2020. Using Google Analytics with R. Google LLC. https://developers.google.com/analytics/solutions/r-google-analytics.

Difference Between Get and Post Method in Http. 2020. Tutorials Point. https://www.tutorialspoint.com/listtutorial/Difference-between-GET-and-POST-method-in-HTTP/3916.