Document created by Michael Rahija, Research Officer

Global Strategy to Improve Agricultural and Rural Statistics (www.gsars.org)

Before getting started

If you haven’t already read the support instructions, please do so before proceeding. They’re available here:

https://surveysolutions.desk.com/customer/en/portal/articles/2574862-api-for-data-export?b_id=12728

The first thing you’ll need to do is create an R API user in Administrator. Do this by going to Teams and Roles tab in the Administrator, and choosing API Users.

Next you will define a user name and password for an API user.

Creating the query to request the server to build

To build the query, you’ll need 4 things:

  1. Headquarters Address
  2. Export type (tabular, spss, stata, binary, paradata)
  3. Questionnaire identity and version number
  4. Action (start, detail, cancel, “”) Note that “” means nothing and is used to download

The only tricky thing is finding the questionnaire identity. This can be done by going to the Survey Setup tab, and create a new case for the questionnaire corresponding to the desired data export. In the figure below, you can find an image that shows the questionnaire ID in the URL underlined in red.

You can also review completed cases in the Interviews tab and play around with the URLs to find the questionnaire ID.

Define the components of the query and save as objects. Then use the sprintf() function to put the query together. Notice that before we can download the dataset, we must request it to be built on the server. In this regard, the action is “start”.

headquarters_address <- "https://globalstrategy.mysurvey.solutions"
export_type <- "tabular"
questionnaire_identity <- "9ef1f7c9-d8e6-464f-950d-e35da96aa7d0$1"
action = "start"

query <- sprintf("%s/api/v1/export/%s/%s/%s", 
                       headquarters_address, export_type, questionnaire_identity, action)

query
## [1] "https://globalstrategy.mysurvey.solutions/api/v1/export/tabular/9ef1f7c9-d8e6-464f-950d-e35da96aa7d0$1/start"

The next thing we need to do is request that the server builds the file. We can do that using the POST function which is available in the httr package.

library(httr)
data <- POST(query, authenticate("michael_rahija", "WorldBank2016")) #Note to reader, pw changed

Now we can look at the structure of the object which will provide general information about the operation we requested on the server.

Notably, the status code 200 indicates that the operation was successful. To view the meaning of other status codes see: https://en.wikipedia.org/wiki/List_of_HTTP_status_codes

str(data)
## List of 10
##  $ url        : chr "https://globalstrategy.mysurvey.solutions/api/v1/export/tabular/9ef1f7c9-d8e6-464f-950d-e35da96aa7d0$1/start"
##  $ status_code: int 200
##  $ headers    :List of 5
##   ..$ cache-control : chr "no-cache"
##   ..$ pragma        : chr "no-cache"
##   ..$ expires       : chr "-1"
##   ..$ date          : chr "Thu, 05 Jan 2017 10:44:49 GMT"
##   ..$ content-length: chr "0"
##   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ all_headers:List of 1
##   ..$ :List of 3
##   .. ..$ status : int 200
##   .. ..$ version: chr "HTTP/1.1"
##   .. ..$ headers:List of 5
##   .. .. ..$ cache-control : chr "no-cache"
##   .. .. ..$ pragma        : chr "no-cache"
##   .. .. ..$ expires       : chr "-1"
##   .. .. ..$ date          : chr "Thu, 05 Jan 2017 10:44:49 GMT"
##   .. .. ..$ content-length: chr "0"
##   .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ cookies    :'data.frame': 0 obs. of  7 variables:
##   ..$ domain    : logi(0) 
##   ..$ flag      : logi(0) 
##   ..$ path      : logi(0) 
##   ..$ secure    : logi(0) 
##   ..$ expiration:Classes 'POSIXct', 'POSIXt'  num(0) 
##   ..$ name      : logi(0) 
##   ..$ value     : logi(0) 
##  $ content    : raw(0) 
##  $ date       : POSIXct[1:1], format: "2017-01-05 10:44:49"
##  $ times      : Named num [1:6] 0 0.031 0.14 0.608 0.718 0.718
##   ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##  $ request    :List of 7
##   ..$ method    : chr "POST"
##   ..$ url       : chr "https://globalstrategy.mysurvey.solutions/api/v1/export/tabular/9ef1f7c9-d8e6-464f-950d-e35da96aa7d0$1/start"
##   ..$ headers   : Named chr [1:2] "application/json, text/xml, application/xml, */*" ""
##   .. ..- attr(*, "names")= chr [1:2] "Accept" "Content-Type"
##   ..$ fields    : NULL
##   ..$ options   :List of 7
##   .. ..$ useragent    : chr "libcurl/7.50.3 r-curl/2.1 httr/1.2.1"
##   .. ..$ cainfo       : chr "C:/Users/Rahija/Documents/R/win-library/3.3/openssl/cacert.pem"
##   .. ..$ post         : logi TRUE
##   .. ..$ postfieldsize: int 0
##   .. ..$ postfields   : raw(0) 
##   .. ..$ httpauth     : num 1
##   .. ..$ userpwd      : chr "michael_rahija:WorldBank2016"
##   ..$ auth_token: NULL
##   ..$ output    : list()
##   .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
##   ..- attr(*, "class")= chr "request"
##  $ handle     :Class 'curl_handle' <externalptr> 
##  - attr(*, "class")= chr "response"

Creating the query to download the dataset

Now we can request to download the export file. Note that we need to change the action in the query to nothing, or a blank string, and use the GET function.

action = ""
query <- sprintf("%s/api/v1/export/%s/%s/%s", 
                       headquarters_address, export_type, questionnaire_identity, action)

data <- GET(query, authenticate("michael_rahija", "WorldBank2016"))

str(data)
## List of 10
##  $ url        : chr "https://globalstrategy.mysurvey.solutions/api/v1/export/tabular/9ef1f7c9-d8e6-464f-950d-e35da96aa7d0$1/"
##  $ status_code: int 200
##  $ headers    :List of 9
##   ..$ cache-control      : chr "no-cache"
##   ..$ pragma             : chr "no-cache"
##   ..$ content-length     : chr "12869"
##   ..$ content-type       : chr "application/zip"
##   ..$ expires            : chr "-1"
##   ..$ accept-ranges      : chr "bytes"
##   ..$ set-cookie         : chr "ASP.NET_SessionId=vxp2d3eh11yjzrhaeha4gvli; path=/; secure; HttpOnly"
##   ..$ content-disposition: chr "attachment; filename*=utf-8''AGRIS_Exper_Group1_FINAL_1_Tabular_All.zip"
##   ..$ date               : chr "Thu, 05 Jan 2017 10:44:49 GMT"
##   ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ all_headers:List of 1
##   ..$ :List of 3
##   .. ..$ status : int 200
##   .. ..$ version: chr "HTTP/1.1"
##   .. ..$ headers:List of 9
##   .. .. ..$ cache-control      : chr "no-cache"
##   .. .. ..$ pragma             : chr "no-cache"
##   .. .. ..$ content-length     : chr "12869"
##   .. .. ..$ content-type       : chr "application/zip"
##   .. .. ..$ expires            : chr "-1"
##   .. .. ..$ accept-ranges      : chr "bytes"
##   .. .. ..$ set-cookie         : chr "ASP.NET_SessionId=vxp2d3eh11yjzrhaeha4gvli; path=/; secure; HttpOnly"
##   .. .. ..$ content-disposition: chr "attachment; filename*=utf-8''AGRIS_Exper_Group1_FINAL_1_Tabular_All.zip"
##   .. .. ..$ date               : chr "Thu, 05 Jan 2017 10:44:49 GMT"
##   .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
##  $ cookies    :'data.frame': 1 obs. of  7 variables:
##   ..$ domain    : chr "#HttpOnly_globalstrategy.mysurvey.solutions"
##   ..$ flag      : logi FALSE
##   ..$ path      : chr "/"
##   ..$ secure    : logi TRUE
##   ..$ expiration: POSIXct[1:1], format: NA
##   ..$ name      : chr "ASP.NET_SessionId"
##   ..$ value     : chr "vxp2d3eh11yjzrhaeha4gvli"
##  $ content    : raw [1:12869] 50 4b 03 04 ...
##  $ date       : POSIXct[1:1], format: "2017-01-05 10:44:49"
##  $ times      : Named num [1:6] 0 0 0 0 0.109 0.218
##   ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
##  $ request    :List of 7
##   ..$ method    : chr "GET"
##   ..$ url       : chr "https://globalstrategy.mysurvey.solutions/api/v1/export/tabular/9ef1f7c9-d8e6-464f-950d-e35da96aa7d0$1/"
##   ..$ headers   : Named chr "application/json, text/xml, application/xml, */*"
##   .. ..- attr(*, "names")= chr "Accept"
##   ..$ fields    : NULL
##   ..$ options   :List of 5
##   .. ..$ useragent: chr "libcurl/7.50.3 r-curl/2.1 httr/1.2.1"
##   .. ..$ cainfo   : chr "C:/Users/Rahija/Documents/R/win-library/3.3/openssl/cacert.pem"
##   .. ..$ httpauth : num 1
##   .. ..$ userpwd  : chr "michael_rahija:WorldBank2016"
##   .. ..$ httpget  : logi TRUE
##   ..$ auth_token: NULL
##   ..$ output    : list()
##   .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
##   ..- attr(*, "class")= chr "request"
##  $ handle     :Class 'curl_handle' <externalptr> 
##  - attr(*, "class")= chr "response"

We see from the description that the download was successful and that the content is a zip file.

Write the zip file, unzip, and read in files to a list

One way to extract the raw data and read it back into R would be to write the object data to file, unzip it, and read it back into R. See the steps in the code chunk below.

#create temporary directory for storing and unzipping file
td <- tempdir()

#open connection to write contents
filecon <- file(file.path(td, "SuSo.zip"), "wb") 

#write data contents to the temporary file
writeBin(data$content, filecon) 
#close the connection
close(filecon)


#unzip zip file 
zipF <- paste0(td, "\\SuSo.zip")
unzip(zipF, exdir = td)

#read in data files and store as elements in a list
data.files <- list.files(td, 
                         pattern = ".tab")

data.files <- paste0(td,
                     "\\",
                     data.files)



x <- vector("list", length(data.files))
names(x) <- data.files

x <- lapply(data.files, function(x) read.delim(x, sep = "\t"))

#show objects in list
sapply(x, class)
## [1] "data.frame" "data.frame" "data.frame" "data.frame" "data.frame"
## [6] "data.frame" "data.frame" "data.frame" "data.frame"

As you can see, now all of the datasets in the zip file are saved as dataframes to a list.