The DataBiosphere project includes a vision schema of which AnVIL/Terra forms a part.
The terra-notebook-utils Python module is described as a “Python API and CLI providing utilities for working with DRS objects, VCF files, and the Terra notebook environment.”
This R package aims to provide a regulated interface between R and terra-notebook-utils for use in AnVIL.
By “regulated” we mean that the entire Python ecosystem used
to work with terra-notebook-utils is defined in a virtual environment.
We make some exceptions for the sake of demonstration, but the drs_access command,
for example, reaches Python through a specific R-to-Python interface
provided by the Bioconductor basilisk package.
As of 10/2022, BiocTNU exists in a GitHub repository. To install and use it properly with R in AnVIL:
PIP_USER=false
BiocManager::install("vjcitn/BiocTNU")
library(BiocTNU); example(drs_access) produces a signed URL
Once installation has succeeded, we use basilisk-mediated commands defined in the BiocTNU package to probe or use terra-notebook-utils. We can get the names available at the top level after importing terra-notebook-utils.
library(BiocTNU)
## Loading required package: tools
## Loading required package: reticulate
## Loading required package: basilisk
tnu_top()
## + '/home/rstudio/.cache/R/basilisk/1.8.1/0/bin/conda' 'create' '--yes' '--prefix' '/home/rstudio/.cache/R/basilisk/1.8.1/BiocTNU/0.0.7/bsklenv' 'python=3.7.7' '--quiet' '-c' 'conda-forge'
## + '/home/rstudio/.cache/R/basilisk/1.8.1/0/bin/conda' 'install' '--yes' '--prefix' '/home/rstudio/.cache/R/basilisk/1.8.1/BiocTNU/0.0.7/bsklenv' 'python=3.7.7'
## + '/home/rstudio/.cache/R/basilisk/1.8.1/0/bin/conda' 'install' '--yes' '--prefix' '/home/rstudio/.cache/R/basilisk/1.8.1/BiocTNU/0.0.7/bsklenv' '-c' 'conda-forge' 'python=3.7.7' 'pandas=1.3.5'
## [1] "IO_CONCURRENCY" "MARTHA_URL"
## [3] "MARTHA_URL_VERSION" "os"
## [5] "TERRA_DEPLOYMENT_ENV" "WORKSPACE_BUCKET"
## [7] "WORKSPACE_GOOGLE_PROJECT" "WORKSPACE_NAME"
## [9] "WORKSPACE_NAMESPACE"
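As a rough illustration of what such a listing looks like on the Python side, the sketch below imports a module by name and returns its public top-level names. How tnu_top is implemented is not shown in this document; the helper `top_level_names` is hypothetical, and the standard-library `json` package stands in for terra_notebook_utils, which is only usable inside a Terra environment.

```python
import importlib


def top_level_names(module_name):
    """Return the sorted public top-level names of an importable module."""
    module = importlib.import_module(module_name)
    # dir() lists constants, functions and any submodules already imported;
    # names with a leading underscore are treated as private and skipped.
    return sorted(name for name in dir(module) if not name.startswith("_"))


# json is a stand-in here; in Terra one could pass "terra_notebook_utils".
names = top_level_names("json")
print(names)
```

The result is an ordinary list of strings, which converts naturally to an R character vector like the one printed above.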
We can also retrieve the help content for the Python modules subordinate to terra-notebook-utils.
cat(tnu_help())
## Help on package terra_notebook_utils:
##
## NAME
## terra_notebook_utils
##
## PACKAGE CONTENTS
## blobstore (package)
## cli (package)
## costs
## drs
## gs
## http
## logger
## profile
## table
## tar_gz
## utils
## vcf
## version
## workflows
## workspace
## xprofile
##
## DATA
## IO_CONCURRENCY = 3
## MARTHA_URL = 'https://us-central1-broad-dsde-prod.cloudfunctions.net/m...
## MARTHA_URL_VERSION = 'martha_v3'
## TERRA_DEPLOYMENT_ENV = 'prod'
## WORKSPACE_BUCKET = 'fc-48f42333-d659-4762-845e-5dbe7e00ef1b'
## WORKSPACE_GOOGLE_PROJECT = 'terra-91e8a8e4'
## WORKSPACE_NAME = 'Bioconductor-Package-BiocTNU'
## WORKSPACE_NAMESPACE = 'landmarkanvil2'
##
## FILE
## /home/rstudio/.cache/R/basilisk/1.8.1/BiocTNU/0.0.7/bsklenv/lib/python3.7/site-packages/terra_notebook_utils/__init__.py
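The text above has the shape of Python's built-in help output. A minimal sketch of capturing such text as a single string, so that it could be handed back to R and printed with cat(), uses the standard pydoc module. Whether tnu_help does exactly this is an assumption; the function `help_text` is hypothetical, and `json` again stands in for terra_notebook_utils.

```python
import importlib
import pydoc


def help_text(module_name):
    """Return the plain-text help page for a module as a single string."""
    module = importlib.import_module(module_name)
    # pydoc.plaintext avoids the overstrike "bold" used for terminal paging,
    # so the string is safe to print from another language.
    return pydoc.render_doc(module, title="Help on %s:", renderer=pydoc.plaintext)


text = help_text("json")
print(text[:200])
```

Returning a string rather than printing inside Python keeps the display decision on the R side.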
The default argument to drs_access is the Google Storage location of a CRAI file. The call returns a signed URL; we show its first 80 characters.
substr(drs_access(), 1, 80)
## [1] "https://nih-nhlbi-biodata-catalyst-1000-genomes.storage.googleapis.com/CCDG_1360"
More features related to DRS will become available as basilisk interfaces are added to this package. The help text for the underlying drs module lists what terra-notebook-utils provides.
cat(tnu_drs_help())
## Help on module terra_notebook_utils.drs in terra_notebook_utils:
##
## NAME
## terra_notebook_utils.drs - Utilities for working with DRS objects.
##
## CLASSES
## builtins.Exception(builtins.BaseException)
## DRSResolutionError
## builtins.tuple(builtins.object)
## DRSInfo
## terra_notebook_utils.blobstore.copy_client.CopyClient(builtins.object)
## DRSCopyClient
##
## class DRSCopyClient(terra_notebook_utils.blobstore.copy_client.CopyClient)
## | DRSCopyClient(concurrency: int = 4, raise_on_error: bool = False, indicator_type: Union[terra_notebook_utils.blobstore.progress.Indicator, NoneType] = None)
## |
## | Method resolution order:
## | DRSCopyClient
## | terra_notebook_utils.blobstore.copy_client.CopyClient
## | builtins.object
## |
## | Methods defined here:
## |
## | copy(self, drs_uri: str, dst: str)
## |
## | ----------------------------------------------------------------------
## | Data and other attributes defined here:
## |
## | __annotations__ = {'workspace': typing.Union[str, NoneType], 'workspac...
## |
## | workspace = None
## |
## | workspace_namespace = None
## |
## | ----------------------------------------------------------------------
## | Methods inherited from terra_notebook_utils.blobstore.copy_client.CopyClient:
## |
## | __enter__(self)
## |
## | __exit__(self, *args, **kwargs)
## |
## | __init__(self, concurrency: int = 4, raise_on_error: bool = False, indicator_type: Union[terra_notebook_utils.blobstore.progress.Indicator, NoneType] = None)
## | If 'raise_on_error' is False, all copy operations will be attempted even if one or more operations error. If
## | 'raise_on_error' is True, the first error encountered will be raise and all scheduled operations will be
## | canceled.
## |
## | ----------------------------------------------------------------------
## | Data descriptors inherited from terra_notebook_utils.blobstore.copy_client.CopyClient:
## |
## | __dict__
## | dictionary for instance variables (if defined)
## |
## | __weakref__
## | list of weak references to the object (if defined)
## |
## | ----------------------------------------------------------------------
## | Data and other attributes inherited from terra_notebook_utils.blobstore.copy_client.CopyClient:
## |
## | multipart_threshold = 134217728
##
## class DRSInfo(builtins.tuple)
## | DRSInfo(credentials, access_url, bucket_name, key, name, size, updated, checksums)
## |
## | DRSInfo(credentials, access_url, bucket_name, key, name, size, updated, checksums)
## |
## | Method resolution order:
## | DRSInfo
## | builtins.tuple
## | builtins.object
## |
## | Methods defined here:
## |
## | __getnewargs__(self)
## | Return self as a plain tuple. Used by copy and pickle.
## |
## | __repr__(self)
## | Return a nicely formatted representation string
## |
## | _asdict(self)
## | Return a new OrderedDict which maps field names to their values.
## |
## | _replace(_self, **kwds)
## | Return a new DRSInfo object replacing specified fields with new values
## |
## | ----------------------------------------------------------------------
## | Class methods defined here:
## |
## | _make(iterable) from builtins.type
## | Make a new DRSInfo object from a sequence or iterable
## |
## | ----------------------------------------------------------------------
## | Static methods defined here:
## |
## | __new__(_cls, credentials, access_url, bucket_name, key, name, size, updated, checksums)
## | Create new instance of DRSInfo(credentials, access_url, bucket_name, key, name, size, updated, checksums)
## |
## | ----------------------------------------------------------------------
## | Data descriptors defined here:
## |
## | credentials
## | Alias for field number 0
## |
## | access_url
## | Alias for field number 1
## |
## | bucket_name
## | Alias for field number 2
## |
## | key
## | Alias for field number 3
## |
## | name
## | Alias for field number 4
## |
## | size
## | Alias for field number 5
## |
## | updated
## | Alias for field number 6
## |
## | checksums
## | Alias for field number 7
## |
## | ----------------------------------------------------------------------
## | Data and other attributes defined here:
## |
## | _field_defaults = {}
## |
## | _fields = ('credentials', 'access_url', 'bucket_name', 'key', 'name', ...
## |
## | _fields_defaults = {}
## |
## | ----------------------------------------------------------------------
## | Methods inherited from builtins.tuple:
## |
## | __add__(self, value, /)
## | Return self+value.
## |
## | __contains__(self, key, /)
## | Return key in self.
## |
## | __eq__(self, value, /)
## | Return self==value.
## |
## | __ge__(self, value, /)
## | Return self>=value.
## |
## | __getattribute__(self, name, /)
## | Return getattr(self, name).
## |
## | __getitem__(self, key, /)
## | Return self[key].
## |
## | __gt__(self, value, /)
## | Return self>value.
## |
## | __hash__(self, /)
## | Return hash(self).
## |
## | __iter__(self, /)
## | Implement iter(self).
## |
## | __le__(self, value, /)
## | Return self<=value.
## |
## | __len__(self, /)
## | Return len(self).
## |
## | __lt__(self, value, /)
## | Return self<value.
## |
## | __mul__(self, value, /)
## | Return self*value.
## |
## | __ne__(self, value, /)
## | Return self!=value.
## |
## | __rmul__(self, value, /)
## | Return value*self.
## |
## | count(self, value, /)
## | Return number of occurrences of value.
## |
## | index(self, value, start=0, stop=9223372036854775807, /)
## | Return first index of value.
## |
## | Raises ValueError if the value is not present.
##
## class DRSResolutionError(builtins.Exception)
## | Common base class for all non-exit exceptions.
## |
## | Method resolution order:
## | DRSResolutionError
## | builtins.Exception
## | builtins.BaseException
## | builtins.object
## |
## | Data descriptors defined here:
## |
## | __weakref__
## | list of weak references to the object (if defined)
## |
## | ----------------------------------------------------------------------
## | Methods inherited from builtins.Exception:
## |
## | __init__(self, /, *args, **kwargs)
## | Initialize self. See help(type(self)) for accurate signature.
## |
## | ----------------------------------------------------------------------
## | Static methods inherited from builtins.Exception:
## |
## | __new__(*args, **kwargs) from builtins.type
## | Create and return a new object. See help(type) for accurate signature.
## |
## | ----------------------------------------------------------------------
## | Methods inherited from builtins.BaseException:
## |
## | __delattr__(self, name, /)
## | Implement delattr(self, name).
## |
## | __getattribute__(self, name, /)
## | Return getattr(self, name).
## |
## | __reduce__(...)
## | Helper for pickle.
## |
## | __repr__(self, /)
## | Return repr(self).
## |
## | __setattr__(self, name, value, /)
## | Implement setattr(self, name, value).
## |
## | __setstate__(...)
## |
## | __str__(self, /)
## | Return str(self).
## |
## | with_traceback(...)
## | Exception.with_traceback(tb) --
## | set self.__traceback__ to tb and return self.
## |
## | ----------------------------------------------------------------------
## | Data descriptors inherited from builtins.BaseException:
## |
## | __cause__
## | exception cause
## |
## | __context__
## | exception context
## |
## | __dict__
## |
## | __suppress_context__
## |
## | __traceback__
## |
## | args
##
## FUNCTIONS
## access(drs_url: str, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2', billing_project: Union[str, NoneType] = 'terra-91e8a8e4') -> str
## Return a signed url for a drs:// URI, if available.
##
## blob_for_url(url: str, billing_project: Union[str, NoneType] = 'terra-91e8a8e4') -> terra_notebook_utils.blobstore.Blob
##
## copy(drs_uri: str, dst: str, indicator_type: terra_notebook_utils.blobstore.progress.Indicator = <Indicator.bar: <class 'getm.progress.ProgressBar'>>, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2')
## Copy a DRS object to either the local filesystem, or to a Google Storage location if `dst` starts with
## "gs://".
##
## copy_batch(drs_urls: Union[Iterable[str], NoneType] = None, dst_pfx: Union[str, NoneType] = None, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2', indicator_type: terra_notebook_utils.blobstore.progress.Indicator = <Indicator.log: <class 'getm.progress.ProgressLogger'>>, manifest: Union[List[Dict[str, str]], NoneType] = None)
##
## copy_batch_manifest(manifest: List[Dict[str, str]], indicator_type: terra_notebook_utils.blobstore.progress.Indicator = <Indicator.log: <class 'getm.progress.ProgressLogger'>>, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2')
##
## copy_batch_urls(drs_urls: Iterable[str], dst_pfx: str, indicator_type: terra_notebook_utils.blobstore.progress.Indicator = <Indicator.log: <class 'getm.progress.ProgressLogger'>>, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2')
##
## copy_to_bucket(drs_uri: str, dst_key: str = '', dst_bucket_name: Union[str, NoneType] = None, indicator_type: terra_notebook_utils.blobstore.progress.Indicator = <Indicator.bar: <class 'getm.progress.ProgressBar'>>, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2')
## Resolve `drs_url` and copy into user-specified bucket `dst_bucket`. If `dst_bucket` is None, copy into
## workspace bucket.
##
## enable_requester_pays(workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2')
##
## extract_tar_gz(drs_url: str, dst: Union[str, NoneType] = None, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2', billing_project: Union[str, NoneType] = 'terra-91e8a8e4')
## Extract a `.tar.gz` archive resolved by a DRS url. 'dst' may be either a local filepath or a 'gs://' url.
## Default extraction is to the bucket for 'workspace'.
##
## get_drs(drs_url: str, fields: List[str]) -> requests.models.Response
## Request DRS information from martha.
##
## get_drs_blob(drs_url_or_info: Union[str, terra_notebook_utils.drs.DRSInfo], billing_project: Union[str, NoneType] = 'terra-91e8a8e4') -> Union[terra_notebook_utils.blobstore.gs.GSBlob, terra_notebook_utils.blobstore.url.URLBlob]
##
## get_drs_info(drs_url: str, access_url: bool = False) -> terra_notebook_utils.drs.DRSInfo
## Attempt to resolve gs:// url and credentials for a DRS object.
##
## head(drs_url: str, num_bytes: int = 1, workspace_name: Union[str, NoneType] = 'Bioconductor-Package-BiocTNU', workspace_namespace: Union[str, NoneType] = 'landmarkanvil2', billing_project: Union[str, NoneType] = 'terra-91e8a8e4')
## Head a DRS object by byte.
##
## info(drs_url: str) -> dict
## Return a curated subset of data from `get_drs`.
##
## DATA
## Dict = typing.Dict
## Iterable = typing.Iterable
## List = typing.List
## MARTHA_URL = 'https://us-central1-broad-dsde-prod.cloudfunctions.net/m...
## Optional = typing.Optional
## TERRA_DEPLOYMENT_ENV = 'prod'
## Tuple = typing.Tuple
## Union = typing.Union
## WORKSPACE_BUCKET = 'fc-48f42333-d659-4762-845e-5dbe7e00ef1b'
## WORKSPACE_GOOGLE_PROJECT = 'terra-91e8a8e4'
## WORKSPACE_NAME = 'Bioconductor-Package-BiocTNU'
## WORKSPACE_NAMESPACE = 'landmarkanvil2'
## http = <requests.sessions.Session object>
## logger = <Logger terra_notebook_utils.logger (INFO)>
## manifest_schema = {'items': {'properties': {'drs_uri': {'type': 'strin...
##
## FILE
## /home/rstudio/.cache/R/basilisk/1.8.1/BiocTNU/0.0.7/bsklenv/lib/python3.7/site-packages/terra_notebook_utils/drs.py
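The help text shows DRSInfo to be a tuple subclass with eight named fields, which is the shape produced by collections.namedtuple. The sketch below rebuilds an equivalent type locally to show how such a record can be read by name or by position. It does not import terra_notebook_utils, and every field value is a made-up placeholder rather than data from a real DRS record.

```python
from collections import namedtuple

# Field names and order copied from the help output above.
DRSInfo = namedtuple(
    "DRSInfo",
    ["credentials", "access_url", "bucket_name", "key",
     "name", "size", "updated", "checksums"],
)

# Placeholder values for illustration only.
info = DRSInfo(
    credentials=None,
    access_url=None,
    bucket_name="example-bucket",
    key="path/to/example.crai",
    name="example.crai",
    size=1024,
    updated="2022-10-01T00:00:00Z",
    checksums={"md5": "0" * 32},
)

print(info.bucket_name)        # access by field name
print(info[2])                 # same field by position (field number 2)
print(info._asdict()["size"])  # dictionary view of the record

# Composing a gs:// location from bucket and key is an illustrative assumption.
gs_url = f"gs://{info.bucket_name}/{info.key}"
print(gs_url)
```

Because the record is a plain tuple underneath, it can be unpacked or converted with _asdict() before being passed across the R/Python boundary.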