Let’s load two packages, httr and jsonlite.
library(httr)
library(jsonlite)
In this case, the API door is always unlocked like a swing door. If you need a lollipop, just put it at the end of the URL path like http://some.api/lollipop.
Below is a call to find out the current location of the International Space Station (ISS) using the Open Notify API:
result <- GET("http://api.open-notify.org/", path = "/iss-now.json")
# The full url to call API
result$url
## [1] "http://api.open-notify.org/iss-now.json"
# The current location of ISS can be revealed with
httr::content(result)$iss_position
## $longitude
## [1] "88.9208"
##
## $latitude
## [1] "50.7347"
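Note that the coordinates come back as character strings. Here is a minimal sketch of converting them to numbers with jsonlite, assuming the response body has the iss_position shape printed above. The names sample_body and parse_iss_position() are made up for illustration, and the literal stands in for a live call:

```r
library(jsonlite)

# Hypothetical literal that mimics the iss_position part of the response
# shown above (not fetched from the API).
sample_body <- '{"iss_position": {"longitude": "88.9208", "latitude": "50.7347"}}'

# Parse the JSON text and return the position as a named numeric vector.
parse_iss_position <- function(json_text) {
  pos <- fromJSON(json_text)$iss_position
  c(longitude = as.numeric(pos$longitude),
    latitude  = as.numeric(pos$latitude))
}

position <- parse_iss_position(sample_body)
position
```

With a live response, the same helper could be fed with content(result, as = "text", encoding = "UTF-8").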
What if the API offers lollipops of different colours and flavours? You can use parameters to specify exactly what you like. Here is a way to ask the Bureau of Meteorology API for a compressed file of daily solar exposure data. The query parameters narrow the request down to the solar data only:
library(httr)
library(jsonlite)
result <- GET("http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av"
, query = list(p_display_type="dailyZippedDataFile"
,p_stn_num="066148"
,p_c="-875217264"
,p_nccObsCode="193"
,p_startYear="2020"
)
, write_disk(tf <- tempfile(fileext = ".zip"))
)
result
## Response [http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=066148&p_c=-875217264&p_nccObsCode=193&p_startYear=2020]
## Date: 2020-08-16 06:45
## Status: 200
## Content-Type: application/zip
## Size: 52.7 kB
## <ON DISK> C:\Users\RATOLI~1\AppData\Local\Temp\RtmpszXLwK\file331c5117572c.zip
# Note that the exact file location is shown in the last line
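To see how the query list turns into the URL printed in the response above, the URL can be composed without sending anything. This is a small sketch using modify_url() and parse_url() from httr; the variable names are only for illustration:

```r
library(httr)

base_url <- "http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av"

# Same parameters as in the GET() call above
params <- list(
  p_display_type = "dailyZippedDataFile",
  p_stn_num      = "066148",
  p_c            = "-875217264",
  p_nccObsCode   = "193",
  p_startYear    = "2020"
)

# modify_url() appends the parameters as name=value pairs after a "?"
full_url <- modify_url(base_url, query = params)
full_url

# parse_url() goes the other way and recovers the named parameters
parse_url(full_url)$query$p_nccObsCode
```

Building the URL this way is handy for checking a request before it is actually sent.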
You can unzip the file saved in the temp directory to get the CSV file.
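Here is a minimal sketch of that step, assuming the archive contains at least one CSV file. read_csv_from_zip() is a hypothetical helper, and nothing is assumed about the column layout of the BOM file:

```r
# Extract the first CSV found in a zip archive and read it into a data frame.
read_csv_from_zip <- function(zip_path, exdir = tempfile("unzipped_")) {
  # List the archive contents without extracting anything
  files <- unzip(zip_path, list = TRUE)$Name
  csv_files <- files[grepl("\\.csv$", files, ignore.case = TRUE)]
  if (length(csv_files) == 0) {
    stop("No CSV file found in ", zip_path)
  }
  # Extract only the first CSV; unzip() returns the extracted file path
  extracted <- unzip(zip_path, files = csv_files[1], exdir = exdir)
  read.csv(extracted, stringsAsFactors = FALSE)
}
```

With the download above, the call would be solar <- read_csv_from_zip(tf).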
This kind of call is similar to Use Case #1, except that the parameters are stored in the body of the HTTP message rather than in the URL. Technically, the underlying HTTP requests are constructed differently for the GET and POST methods (Difference Between Get and Post Method in Http 2020). You need to follow the API's instructions to use the correct method. Refer to Use Case #3 for an example.
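For comparison, here is a hedged sketch of where the same parameters end up under each method. The endpoint http://some.api/lollipop and the parameter names are made up, so the actual request is wrapped in a function and never sent:

```r
library(httr)
library(jsonlite)

params <- list(colour = "red", flavour = "cherry")

# GET: the parameters are appended to the URL as a query string
get_url <- modify_url("http://some.api/lollipop", query = params)
get_url

# POST: the URL stays clean and the parameters travel in the message body,
# here serialised as JSON
post_body <- toJSON(params, auto_unbox = TRUE)
post_body

# Hypothetical request; it only works against an API that accepts JSON via POST
order_lollipop <- function(url = "http://some.api/lollipop") {
  POST(url, body = params, encode = "json")
}
```

Whether an API expects JSON, form-encoded fields, or something else in the body depends on the provider, so the encode argument has to follow its documentation.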
For proprietary or paid data, you will normally need to acquire an API key before you can use the API, just like you need the right key to open a door fitted with a lock. To demonstrate, I use the GeoDataSource API Key to unlock the information about the city which I specify with the lat and lng parameters.
geoKey <- Sys.getenv('GEODATASOURCE_KEY')
result <- GET("https://api.geodatasource.com/city"
, query = list(key=geoKey
,lat="17.733676800000069"
,lng="-64.751575799999955"
,format="json"
)
)
httr::content(result)
## $country
## [1] "VI"
##
## $region
## [1] "Saint Croix Island"
##
## $city
## [1] "Ruby"
##
## $latitude
## [1] "17.7369"
##
## $longitude
## [1] "-64.7549"
##
## $currency_code
## [1] "USD"
##
## $currency_name
## [1] "United States Dollar"
##
## $currency_symbol
## [1] "$"
##
## $sunrise
## [1] "06:01"
##
## $sunset
## [1] "18:44"
##
## $time_zone
## [1] "-04:00"
##
## $distance_km
## [1] "0.5023"
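One caveat: Sys.getenv() returns an empty string when the variable is not set, so the request above would go out with an empty key. Here is a small sketch of a guard against that. get_api_key() is a hypothetical helper, not part of httr:

```r
# Read an API key from an environment variable and fail early if it is missing.
get_api_key <- function(var_name) {
  key <- Sys.getenv(var_name, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var_name, " is not set. ",
         "Add it to your .Renviron file or call Sys.setenv() first.")
  }
  key
}
```

It can replace the first line of the example: geoKey <- get_api_key("GEODATASOURCE_KEY"). Keeping the key in an environment variable also keeps it out of the script and out of version control.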
Data is gold! Its hidden value is evident from the lengths some providers go to in order to protect it. For example, to extract your website data from Google Analytics, you need a bunch of keys! Their API is like a modern security door. To open it, not only do you need to swipe your access card, you also need to scan your retina AND your fingerprint.
In your R code, before calling the Reporting API, use the Client ID, Client Secret, and the View ID to generate a Token. Then use the Token to call the API.
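Done by hand with httr, that flow might look roughly like the sketch below. The endpoint URL, the scope, and the request body follow my understanding of the Reporting API v4 and should be checked against Google's documentation. The environment variable names and both function names are hypothetical:

```r
library(httr)

# Build the body of a report request for one view and one date range.
build_report_request <- function(view_id, start_date, end_date) {
  list(reportRequests = list(list(
    viewId     = view_id,
    dateRanges = list(list(startDate = start_date, endDate = end_date)),
    metrics    = list(list(expression = "ga:sessions")),
    dimensions = list(list(name = "ga:source"))
  )))
}

# Exchange the Client ID and Client Secret for a token, then call the API.
# Not run here: oauth2.0_token() normally opens a browser to ask for consent.
fetch_sessions <- function(view_id, start_date, end_date) {
  app <- oauth_app(
    "google",
    key    = Sys.getenv("GA_CLIENT_ID"),     # hypothetical variable name
    secret = Sys.getenv("GA_CLIENT_SECRET")  # hypothetical variable name
  )
  token <- oauth2.0_token(
    oauth_endpoints("google"), app,
    scope = "https://www.googleapis.com/auth/analytics.readonly"
  )
  response <- POST(
    "https://analyticsreporting.googleapis.com/v4/reports:batchGet",
    body   = build_report_request(view_id, start_date, end_date),
    encode = "json",
    config(token = token)
  )
  stop_for_status(response)
  content(response)
}
```

The View ID goes into the request body, while the Client ID and Client Secret are used only to obtain the token.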
Thanks to the googleAnalyticsR package, the whole process is simplified. You don’t even see the actual API endpoint!
library(googleAnalyticsR)
## authenticate,
ga_auth()
## get your accounts
account_list <- ga_account_list()
#View(account_list)
## pick a profile with data to query
ga_id <- account_list[1,'viewId']
## create filters on metrics
mf <- met_filter("bounces", "GREATER_THAN", 0)
mf2 <- met_filter("sessions", "GREATER_THAN", 2)
## create filters on dimensions
df <- dim_filter("source","BEGINS_WITH","1",not = TRUE)
df2 <- dim_filter("source","BEGINS_WITH","a",not = TRUE)
## construct filter objects
fc2 <- filter_clause_ga4(list(df, df2), operator = "AND")
fc <- filter_clause_ga4(list(mf, mf2), operator = "AND")
## make API request
ga_data1 <- google_analytics(ga_id
,date_range = c("2020-07-30","2020-08-10")
,dimensions=c('source','medium')
,metrics = c('sessions','bounces')
,met_filters = fc
,dim_filters = fc2
,filtersExpression = "ga:source!=(direct)"
)
# view response
ga_data1
Many valuable datasets are available from APIs. Understanding the various ways of accessing these APIs can be crucial to a data scientist’s job. At the very least, it is more efficient to fetch the data directly into your code than to extract and store it manually before you can start the analysis.
Google Developers. 2020. Using Google Analytics with R. Mountain View, CA, USA: Google LLC. https://developers.google.com/analytics/solutions/r-google-analytics.
Tutorials Point. 2020. Difference Between Get and Post Method in Http. Hyderabad, Telangana, India: Tutorials Point. https://www.tutorialspoint.com/listtutorial/Difference-between-GET-and-POST-method-in-HTTP/3916#:~:text=method%20in%20HTTP.-,Both%20GET%20and%20POST%20method%20is%20used%20to%20transfer%20data,transferring%20data%20from%20client%20to.