The resourcer package is an R implementation of the concept of Resources (see also the source repository).
Install the package if not already available in the R environment.
if (!require(resourcer)) {
  install.packages(c("resourcer"))
}
When the resourcer package is loaded, its startup output lists the available resource resolvers. These are responsible for building the appropriate connector to a resource object. The set of resolvers can be extended to support new types of resources.
library(resourcer)
Note that the resourcer package makes extensive use of the R6 class system.
The resource gives access to some data.
The file is publicly available on GitHub: preview of CAPostalCodes.csv.
CAPostalCodes.res <- resourcer::newResource(
  url = "https://github.com/obiba/obiba-home/raw/master/opal/seed/fs/home/administrator/geo/CAPostalCodes.csv",
  format = "csv"
)
CAPostalCodes.res
$name
[1] ""
$url
[1] "https://github.com/obiba/obiba-home/raw/master/opal/seed/fs/home/administrator/geo/CAPostalCodes.csv"
$identity
NULL
$secret
NULL
$format
[1] "csv"
attr(,"class")
[1] "resource"
Note that at this point neither a connection nor any data extraction has been performed. The object is a simple definition.
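To illustrate what "a simple definition" means, the printed structure above can be reproduced with base R alone. This is a sketch based only on the printed output: the object name res_like is hypothetical, and the code is not how resourcer::newResource() is implemented. Real code should keep calling newResource().

```r
# Hand-built stand-in mirroring the printed fields of a resource object.
# Assumption: this copies the printed structure only; it is not the
# package's constructor.
res_like <- structure(
  list(
    name = "",
    url = "https://github.com/obiba/obiba-home/raw/master/opal/seed/fs/home/administrator/geo/CAPostalCodes.csv",
    identity = NULL,
    secret = NULL,
    format = "csv"
  ),
  class = "resource"
)

# The object is plain data: fields are read like any list element,
# and nothing has been downloaded or connected to.
res_like$format
inherits(res_like, "resource")
```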
The following code will resolve the resource and build the corresponding client connector.
CAPostalCodes.client <- resourcer::newResourceClient(CAPostalCodes.res)
class(CAPostalCodes.client)
[1] "TidyFileResourceClient" "FileResourceClient" "ResourceClient"
[4] "R6"
The resource was identified as a “tidy” data file, i.e. data that can be read using one of the readers developed by the tidyverse project. In the case of the csv data format, the readr package is used. This CSV reader tries to guess the data type of the columns.
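The column type guessing can be tried directly with readr on a small local file. The sketch below uses made-up rows and hypothetical variable names, and it calls readr directly rather than going through a resource client. It shows the default guessing next to an explicit override.

```r
library(readr)

# Small CSV written to a temporary file (made-up rows for illustration).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,city", "1,43.2,Jasper", "2,44.1,Tofield"), csv_path)

# cols() with no arguments leaves every column to be guessed.
guessed <- read_csv(csv_path, col_types = cols())

# Override the guess: read every column as character.
as_text <- read_csv(csv_path, col_types = cols(.default = col_character()))

class(guessed$age)
class(as_text$age)
```

Overriding the guess can be useful when a column of identifiers looks numeric but should stay as text.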
At this point again, no connection with the remote server has been formally established.
This client function call gives access to the data as a data.frame:
CAPostalCodes.data <- CAPostalCodes.client$asDataFrame()
head(CAPostalCodes.data)
# A tibble: 6 x 5
entity_id Place Province ProvinceCode Coordinate
<chr> <chr> <chr> <chr> <chr>
1 T0E Western Alberta (Jasper) Alberta AB [-117.2308,53.4021]
2 T0A Eastern Alberta (St. Paul) Alberta AB [-111.7174,54.766]
3 T0B Wainwright Region (Tofield) Alberta AB [-111.5816,53.0727]
4 T0C Central Alberta (Stettler) Alberta AB [-112.8113,52.4922]
5 T0H Northwestern Alberta (High Level) Alberta AB [-116.9153,57.5403]
6 T0G North Central Alberta (Slave Lake) Alberta AB [-114.4529,55.6993]
It is also possible to coerce a resource object directly to a data.frame, without explicitly building a resource client object. It is as simple as:
CAPostalCodes.data <- as.data.frame(CAPostalCodes.res)
head(CAPostalCodes.data)
# A tibble: 6 x 5
entity_id Place Province ProvinceCode Coordinate
<chr> <chr> <chr> <chr> <chr>
1 T0E Western Alberta (Jasper) Alberta AB [-117.2308,53.4021]
2 T0A Eastern Alberta (St. Paul) Alberta AB [-111.7174,54.766]
3 T0B Wainwright Region (Tofield) Alberta AB [-111.5816,53.0727]
4 T0C Central Alberta (Stettler) Alberta AB [-112.8113,52.4922]
5 T0H Northwestern Alberta (High Level) Alberta AB [-116.9153,57.5403]
6 T0G North Central Alberta (Slave Lake) Alberta AB [-114.4529,55.6993]
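This kind of coercion relies on R's S3 dispatch: as.data.frame() is a generic function, and a method can be attached to a class. The sketch below shows the mechanism on a hypothetical class toy_resource that points at a local CSV file. It is an illustration only and does not describe how resourcer implements its own method.

```r
# Hypothetical class "toy_resource": a list holding the path of a local CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)), csv_path, row.names = FALSE)
toy <- structure(list(url = csv_path, format = "csv"), class = "toy_resource")

# S3 method: as.data.frame() dispatches here for objects of class "toy_resource".
as.data.frame.toy_resource <- function(x, ...) {
  utils::read.csv(x$url, ...)
}

toy_data <- as.data.frame(toy)
head(toy_data)
```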
The file is stored in an Opal server file system. Authentication and authorization apply and are performed with a Personal Access Token.
gps_participant.res <- resourcer::newResource(
  url = "opal+https://opal-demo.obiba.org/ws/files/projects/RSRC/gps_participant.RData",
  format = "data.frame",
  secret = "EeTtQGIob6haio5bx6FUfVvIGkeZJfGq"
)
gps_participant.res
$name
[1] ""
$url
[1] "opal+https://opal-demo.obiba.org/ws/files/projects/RSRC/gps_participant.RData"
$identity
NULL
$secret
[1] "EeTtQGIob6haio5bx6FUfVvIGkeZJfGq"
$format
[1] "data.frame"
attr(,"class")
[1] "resource"
Make a resource client object.
gps_participant.client <- resourcer::newResourceClient(gps_participant.res)
class(gps_participant.client)
[1] "RDataFileResourceClient" "FileResourceClient" "ResourceClient"
[4] "R6"
The resource was identified as an R data file, containing a data.frame object.
When extracting the inner R object from the remote R data file, the resource client object establishes the connection with the Opal server and authenticates with the provided Personal Access Token; it then downloads the file and reads its content. The function getValue() returns the raw object (in this case a data.frame).
gps_participant.data <- gps_participant.client$getValue()
head(gps_participant.data)
id age sex inc fsmoke fedu BMI
1 1 43.24393 0 45730.56 former Higher 28.95163
2 2 44.17817 1 50169.00 current Secondary 23.22624
3 3 53.89189 0 55981.00 never Higher 26.60151
4 4 43.36741 1 67524.85 never Secondary 24.60853
5 5 46.82916 0 47397.88 never Higher 24.67301
6 6 57.70891 0 34686.13 former Advanced 24.91010
There are no limitations regarding the class of the object contained in the R data file. The only requirements are the ones of the base::load() function, i.e. the library in which the class of the object is defined must be available in the R environment.
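The base::load() behaviour mentioned here can be tried locally without any server. The following sketch saves a made-up data.frame to a temporary R data file and reads it back into a dedicated environment. The variable names are hypothetical, and this is not necessarily what the resource client does internally.

```r
# Save a data.frame into a temporary R data file (made-up values).
rdata_path <- tempfile(fileext = ".RData")
participants <- data.frame(id = 1:3, age = c(43.2, 44.2, 53.9))
save(participants, file = rdata_path)

# Load into a dedicated environment so the calling workspace is not overwritten.
env <- new.env()
loaded_names <- load(rdata_path, envir = env)

# load() returns the names of the restored objects; fetch the first one.
restored <- get(loaded_names[1], envir = env)
class(restored)
```

If the saved object had a class defined in another package, that package would need to be installed for the restored object to behave as expected.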
The resource gives access to some remote computation services.
A server is accessible through a secure shell. The path part of the URL is the remote working directory. The available commands are defined by the exec query parameter.
ssh.res <- resourcer::newResource(
  url = "ssh://plink-demo.obiba.org:2222/home/master/brge?exec=ls,pwd",
  identity = "master",
  secret = "master"
)
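To make the URL convention concrete, the sketch below splits the same URL into its parts with base R only and extracts the working directory and the allowed commands. The regular expression and the variable names are assumptions made for illustration. The package may parse the URL differently.

```r
url <- "ssh://plink-demo.obiba.org:2222/home/master/brge?exec=ls,pwd"

# Split into scheme, host, port, path and query string with one regular expression.
pattern <- "^([a-z+]+)://([^:/?]+)(?::([0-9]+))?(/[^?]*)?(?:\\?(.*))?$"
parts <- regmatches(url, regexec(pattern, url, perl = TRUE))[[1]]
names(parts) <- c("full", "scheme", "host", "port", "path", "query")

# The path is the remote working directory.
workdir <- parts[["path"]]

# The exec parameter holds the comma-separated list of allowed commands.
params <- strsplit(parts[["query"]], "&", fixed = TRUE)[[1]]
exec_value <- sub("^exec=", "", grep("^exec=", params, value = TRUE))
allowed <- strsplit(exec_value, ",", fixed = TRUE)[[1]]

workdir
allowed
```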
The resource connection client is resolved as follows:
ssh.client <- resourcer::newResourceClient(ssh.res)
class(ssh.client)
[1] "SshResourceClient" "CommandResourceClient" "ResourceClient" "R6"
This type of client allows issuing shell commands through an SSH connection.
ssh.client$getAllowedCommands()
[1] "ls" "pwd"
Trying to coerce to a data.frame raises an error, because there is no tabular data representation of such a resource:
tryCatch(ssh.client$asDataFrame(), error = function(e) e)
<simpleError in ssh.client$asDataFrame(): Operation not applicable>
To execute a remote shell command:
rval <- ssh.client$exec("ls", "-la")
Server fingerprint: b0:0c:f0:67:fd:a1:50:f3:93:7e:f2:40:74:e6:71:e7:70:23:e0:80
rval
$status
[1] 0
$output
[1] "total 92992"
[2] "dr-xr-xr-x 2 master master 4096 Apr 29 2020 ."
[3] "drwxr-xr-x 7 master master 4096 Jan 4 18:41 .."
[4] "-r--r--r-- 1 master master 57800003 Apr 29 2020 brge.bed"
[5] "-r--r--r-- 1 master master 2781294 Apr 29 2020 brge.bim"
[6] "-r--r--r-- 1 master master 45771 Apr 29 2020 brge.fam"
[7] "-r--r--r-- 1 master master 34442346 Apr 29 2020 brge.gds"
[8] "-r--r--r-- 1 master master 59802 Apr 29 2020 brge.phe"
[9] "-r--r--r-- 1 master master 72106 Apr 29 2020 brge.txt"
$error
character(0)
$command
[1] "cd /home/master/brge && ls -la"
attr(,"class")
[1] "resource.exec"
The resulting value contains different information:
status: the exit status of the command (failure if not 0),
output: the character vector of the command output,
error: the error message if the command failed,
command: the actual shell command that was executed.
For example, bad shell command arguments return a value with an error:
rval <- ssh.client$exec("ls", "-xyz")
rval
$status
[1] 2
$output
character(0)
$error
[1] "ls: invalid option -- 'y'" "Try 'ls --help' for more information."
$command
[1] "cd /home/master/brge && ls -xyz"
attr(,"class")
[1] "resource.exec"
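Because a failed command is reported through the status field rather than as an R error, calling code may want to check it explicitly. The sketch below defines a hypothetical helper, check_exec(), and applies it to a hand-built list with the same fields as the failed call above, so it runs without an SSH connection. The helper is not part of resourcer.

```r
# Hypothetical helper: stop on a non-zero status, otherwise return the output.
check_exec <- function(rval) {
  if (rval$status != 0) {
    stop("Command failed (", rval$command, "): ",
         paste(rval$error, collapse = " "), call. = FALSE)
  }
  rval$output
}

# Hand-built stand-in with the same fields as the failed call above.
failed <- structure(
  list(
    status = 2,
    output = character(0),
    error = c("ls: invalid option -- 'y'", "Try 'ls --help' for more information."),
    command = "cd /home/master/brge && ls -xyz"
  ),
  class = "resource.exec"
)

# Capture the error message instead of aborting the script.
msg <- tryCatch(check_exec(failed), error = function(e) conditionMessage(e))
msg
```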
Calling a shell command that is not allowed raises an error.
tryCatch(ssh.client$exec("plink"), error = function(e) e)
<simpleError in private$makeCommand(command, params): Shell command not allowed: plink>
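The behaviour shown in the outputs above (an allowed-command check, then a command prefixed with a cd into the working directory) can be mimicked in a few lines of base R. The function make_command() below is a hypothetical stand-in written for illustration. It is not the package's private makeCommand method, whose implementation is not shown here.

```r
# Hypothetical helper mimicking the observed behaviour: refuse commands
# outside the allowed list, otherwise prefix the command with a cd into
# the working directory.
make_command <- function(command, params = character(0),
                         allowed = c("ls", "pwd"),
                         workdir = "/home/master/brge") {
  if (!(command %in% allowed)) {
    stop("Shell command not allowed: ", command, call. = FALSE)
  }
  paste("cd", workdir, "&&", paste(c(command, params), collapse = " "))
}

make_command("ls", "-la")
tryCatch(make_command("plink"), error = function(e) conditionMessage(e))
```

This sketch does no quoting or escaping of the parameters. Anything used against a real server would also need to guard against shell metacharacters.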
The resourcer package comes with some built-in resource types. These can be extended by programming your own resource resolver and client. For more information, read the sections about Resources in the book Orchestrating privacy-protected non-disclosive big data analyses of data from different resources with R and DataSHIELD.
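The idea of a resolver can be sketched as a small registry: each resolver says whether it applies to a resource and, if so, builds a client for it. The code below uses plain base R closures and hypothetical names (register_resolver, resolve, ToyClient, the "toy" format and the example.org URL). resourcer itself relies on R6 classes, so this shows the concept only, not the package's extension API.

```r
# Hypothetical registry holding the known resolvers.
registry <- new.env()
assign("resolvers", list(), envir = registry)

# A resolver is a pair of functions: is_for() tests a resource,
# new_client() builds the matching client.
register_resolver <- function(is_for, new_client) {
  current <- get("resolvers", envir = registry)
  entry <- list(is_for = is_for, new_client = new_client)
  assign("resolvers", c(current, list(entry)), envir = registry)
  invisible(NULL)
}

# Return the client built by the first resolver that accepts the resource.
resolve <- function(res) {
  for (r in get("resolvers", envir = registry)) {
    if (r$is_for(res)) return(r$new_client(res))
  }
  stop("No resolver found for: ", res$url, call. = FALSE)
}

# A resolver for a made-up "toy" format served over https.
register_resolver(
  is_for = function(res) startsWith(res$url, "https://") && identical(res$format, "toy"),
  new_client = function(res) structure(list(resource = res), class = "ToyClient")
)

client <- resolve(list(url = "https://example.org/data.toy", format = "toy"))
class(client)
```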
The Opal data management server facilitates the usage of resources by:
See also Tutorial: Using the resources in Opal and DataSHIELD.