Overview

This R vignette demonstrates how to extract data from a web repository's API using the rsdmx package.

Data

The data we will extract is the GDP of Australia for the period 2007 to 2017.

Data Source

The data will be sourced from the UNESCO website.

About the rsdmx Package

The rsdmx package “provides a set of classes and methods to read data and metadata documents exchanged through the Statistical Data and Metadata Exchange (SDMX) framework.”

First things first, load the rsdmx package in R:

library(rsdmx)

If you would like to read more about the rsdmx package, including how to read different dataset documents, I recommend reviewing the rsdmx quickstart guide:

vignette("quickstart", package = "rsdmx")

Finding the right data & the Statistical Data and Metadata Exchange (SDMX) Standard

The SDMX framework standardises the exchange of statistical data through common specifications covering both the data format and the web service. Browsing through UNESCO’s data pages, data appears as indicators, indicator groups, databases and publications.

For our purposes, in order to use the readSDMX function, we need data in the SDMX format, so we want to search the catalogue of databases. UNESCO has application programming interfaces (APIs) that provide access to datasets within the catalogue of databases in SDMX format.

You can access the API documentation (SDMX-JSON) here.

Find the correct dataset (in this case, the Demographic and Socioeconomic Indicators database) and customise the variables until you have the desired outputs.

Export the SDMX data URL.

Create an object (UNESCOurl) in R from the URL generated by the Export to SDMX (XML) function on your filtered dataset.

UNESCOurl <- "https://api.uis.unesco.org/sdmx/data/UNESCO,DEM_ECO,1.0/GDP....AU?format=sdmx-generic-2.1&startPeriod=2007&endPeriod=2017&locale=en&subscription-key=f3943df8d3814d06b6cf6d51d50741b3"
UNESCOurl
## [1] "https://api.uis.unesco.org/sdmx/data/UNESCO,DEM_ECO,1.0/GDP....AU?format=sdmx-generic-2.1&startPeriod=2007&endPeriod=2017&locale=en&subscription-key=f3943df8d3814d06b6cf6d51d50741b3"
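
The exported URL can also be assembled from its parts, which makes it easier to change the country or the period later. The following is a minimal base-R sketch, assuming the URL layout shown above. The variable names are illustrative and YOUR_SUBSCRIPTION_KEY is a placeholder, not a working key.

```r
# Components of the data request, taken from the exported URL above
base_url   <- "https://api.uis.unesco.org/sdmx/data"
flow_ref   <- "UNESCO,DEM_ECO,1.0"     # agency, dataflow, version
series_key <- "GDP....AU"              # empty positions between dots match any value
api_key    <- "YOUR_SUBSCRIPTION_KEY"  # placeholder, replace with your own key

# Query parameters as a named character vector
query <- c(
  format             = "sdmx-generic-2.1",
  startPeriod        = "2007",
  endPeriod          = "2017",
  locale             = "en",
  `subscription-key` = api_key
)

# Join "name=value" pairs with "&" and attach them to the path
query_string <- paste(names(query), query, sep = "=", collapse = "&")
built_url <- paste0(base_url, "/", flow_ref, "/", series_key, "?", query_string)
built_url
```

Changing startPeriod, endPeriod or the country code at the end of series_key then produces a new request URL without going back to the export dialog.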

Next we will use the readSDMX function, which is the main function for reading SDMX data such as the dataset we have just found on the UNESCO website. Let’s first create an object:

UNESCOdata <- readSDMX(UNESCOurl)

You can print the object at this stage if you like, but be warned that it will return a lot of data in the console.

Querying the dataset with DSD

The structure of a dataset is described in a separate SDMX-XML document called the Data Structure Definition (DSD). Fetching it is useful because it allows you to add all labels to your dataset.

To get the DSD, you need to add dsd = TRUE to your readSDMX call.

Useful information on the query structure, and how you add this to your arguments in rsdmx can be found in the UNESCO API Portal User Guide.

UNESCOdata_dsd <- readSDMX(providerId = "UIS2", resource = "data", flowRef = "DEM_ECO", key = "GDP....AU", key.mode = "SDMX", start = 2007, end = 2017, dsd = TRUE)
## Error in requestHandler$data(requestParams): Requests to this service endpoint requires an API key

Note the error that was returned above. UNESCO requires user authentication in the form of a subscription key, which needs to be supplied via the providerKey argument as follows:

UNESCOdata_dsd <- readSDMX(providerId = "UIS2", resource = "data", flowRef = "DEM_ECO", key = "GDP....AU", key.mode = "SDMX", start = 2007, end = 2017, providerKey = "f3943df8d3814d06b6cf6d51d50741b3", dsd = TRUE)
## -> Fetching 'http://api.uis.unesco.org/sdmx/data/DEM_ECO/GDP....AU/?startPeriod=2007&endPeriod=2017&subscription-key=f3943df8d3814d06b6cf6d51d50741b3'
## -> Attempt to fetch DSD ref from dataflow description
## -> Fetching 'http://api.uis.unesco.org/sdmx/dataflow/all/DEM_ECO/latest/?&subscription-key=f3943df8d3814d06b6cf6d51d50741b3'
## -> Fetching 'http://api.uis.unesco.org/sdmx/datastructure/all/DEM_ECO/latest/?references=children&subscription-key=f3943df8d3814d06b6cf6d51d50741b3'
## -> DSD fetched and associated to dataset!
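
Since the subscription key is a credential, it may be preferable to keep it out of the script. The following is a minimal sketch of one way to do that with an environment variable. The helper get_uis_key and the variable name UIS_API_KEY are assumptions made for illustration; neither is part of rsdmx or the UNESCO API.

```r
# Read the subscription key from an environment variable instead of
# hard-coding it. "UIS_API_KEY" is an assumed name: set it yourself,
# for example in your ~/.Renviron file.
get_uis_key <- function(var = "UIS_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  key
}

# Usage with the same arguments as above (needs rsdmx and network access):
# UNESCOdata_dsd <- readSDMX(providerId = "UIS2", resource = "data",
#                            flowRef = "DEM_ECO", key = "GDP....AU",
#                            key.mode = "SDMX", start = 2007, end = 2017,
#                            providerKey = get_uis_key(), dsd = TRUE)
```

The helper stops with a clear message when the variable is missing, so a forgotten key is reported before any request is sent.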

That’s better!

Now that the DSD has been fetched and associated with our dataset, we can convert the result to a data frame and assign it to an object.

ausgdp <- as.data.frame(UNESCOdata_dsd)
ausgdp

To get more information about the data, you can use str to display the structure of your object (ausgdp).

str(ausgdp)
## 'data.frame':    40 obs. of  11 variables:
##  $ STAT_UNIT   : Factor w/ 1 level "GDP": 1 1 1 1 1 1 1 1 1 1 ...
##  $ UNIT_MEASURE: Factor w/ 4 levels "USD_CUR","LCU_CONST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ SEX         : Factor w/ 1 level "_Z": 1 1 1 1 1 1 1 1 1 1 ...
##  $ AGE         : Factor w/ 1 level "_Z": 1 1 1 1 1 1 1 1 1 1 ...
##  $ REF_AREA    : Factor w/ 1 level "AU": 1 1 1 1 1 1 1 1 1 1 ...
##  $ TIME_PERIOD : chr  "2007" "2008" "2009" "2010" ...
##  $ OBS_VALUE   : chr  "853764622752.60999" "1055334825425.25000" "927168310999.85303" "1142876772659.21020" ...
##  $ OBS_STATUS  : chr  "A" "A" "A" "A" ...
##  $ UNIT_MULT   : chr  "0" "0" "0" "0" ...
##  $ DECIMALS    : chr  "5" "5" "5" "5" ...
##  $ FREQ        : chr  "A" "A" "A" "A" ...
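
The structure above shows that TIME_PERIOD and OBS_VALUE are stored as character vectors, so a likely first cleaning step is to convert them to numbers. The following is a minimal sketch using a small stand-in data frame (gdp_sample, a hypothetical name) built from the first four values printed above, so that it runs without a network request. The same two conversions should apply to ausgdp. The GDP_BN column is an illustrative addition.

```r
# Small stand-in for ausgdp, using the first four rows shown by str() above
gdp_sample <- data.frame(
  UNIT_MEASURE = "USD_CUR",
  TIME_PERIOD  = c("2007", "2008", "2009", "2010"),
  OBS_VALUE    = c("853764622752.60999", "1055334825425.25000",
                   "927168310999.85303", "1142876772659.21020"),
  stringsAsFactors = FALSE
)

# Convert the character columns to numeric types
gdp_sample$TIME_PERIOD <- as.integer(gdp_sample$TIME_PERIOD)
gdp_sample$OBS_VALUE   <- as.numeric(gdp_sample$OBS_VALUE)

# GDP in billions of current US dollars, rounded for display
gdp_sample$GDP_BN <- round(gdp_sample$OBS_VALUE / 1e9, 1)
str(gdp_sample)
```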

From here you can start cleaning your data, conducting exploratory analysis, graphing results and comparing to other datasets.