• 1 Using OSN on TMA Data
  • 2 Data Location
  • 3 Data Representation
    • 3.1 ZarrRemote class
    • 3.2 endpoint and bucket inputs
    • 3.3 URL input
  • 4 Data Retrieval
    • 4.1 ZarrRemote import method
    • 4.2 Image from channels 4, 5, 10, 17
  • 5 Session Info

1 Using OSN on TMA Data

The Open Storage Network has allowed the Bioconductor project expanded offerings to cloud data. We demonstrate the usage of the RemoteZarr class to access Tissue MicroArray (TMA) data, one ‘core’ at a time.

library(ZarrExperiment)

2 Data Location

https://mghp.osn.xsede.org/bir190004-bucket01/index.html#TMA11/zarr/

There are about 123 cores within the OSN storage location above. Cores are numerically labeled e.g., 1.zarr corresponds to core number one.

3 Data Representation

3.1 ZarrRemote class

showClass("ZarrRemote")
## Class "ZarrRemote" [package "ZarrExperiment"]
## 
## Slots:
##                                                                               
## Name:                 endpoint                  bucket                resource
## Class:               character               character character_OR_connection
## 
## Extends: 
## Class "ZarrArchive", directly
## Class "BiocFile", by class "ZarrArchive", distance 2

3.2 endpoint and bucket inputs

The endpoint corresponds to the OSN website and the bucket input corresponds to the bucket name and subfolders where the .zarr archives can be located.

zr <- ZarrRemote(
    endpoint = "https://mghp.osn.xsede.org/",
    bucket = "bir190004-bucket01/TMA11/zarr/"
)
zr
## class: ZarrRemote
## endpoint: https://mghp.osn.xsede.org/
## bucket: bir190004-bucket01/TMA11/zarr/
##  ├── 1.zarr 
##  ├── 10.zarr 
##  ├── ... 
##  ├── 98.zarr 
##  └── 99.zarr

3.3 URL input

A URL may also be used as the resource input. Both the endpoint and the bucket location will be deduced when possible.

zr <- ZarrRemote(
    resource = "https://mghp.osn.xsede.org/bir190004-bucket01/TMA11/zarr/"
)
zr
## class: ZarrRemote
## endpoint: https://mghp.osn.xsede.org/
## bucket: bir190004-bucket01/TMA11/zarr/
##  ├── 1.zarr 
##  ├── 10.zarr 
##  ├── ... 
##  ├── 98.zarr 
##  └── 99.zarr

4 Data Retrieval

Currently, data retrieval is possible on a single directory via the import method on the ZarrRemote class.

4.1 ZarrRemote import method

The import method makes internal use of the s3fs python module:

getMethod(import, "ZarrRemote")
## Method Definition:
## 
## function (con, format, text, ...) 
## {
##     dots <- list(...)
##     file <- dots[["filename"]]
##     if (is.null(file) || !isScalarCharacter(file)) 
##         stop("Provide a 'filename' input found in the bucket")
##     fs <- .s3fs()$S3FileSystem(anon = TRUE, key = "dummy", secret = "dummy", 
##         client_kwargs = reticulate::dict(endpoint_url = con@endpoint))
##     files <- fs$ls(con@bucket)
##     if (!file %in% basename(files)) 
##         stop("'filename': ", file, " not found")
##     mapper <- fs$get_mapper(paste0(con@bucket, file))
##     .zarr()$load(mapper)
## }
## <bytecode: 0x559bae3b3f98>
## <environment: namespace:ZarrExperiment>
## 
## Signatures:
##         con          format text 
## target  "ZarrRemote" "ANY"  "ANY"
## defined "ZarrRemote" "ANY"  "ANY"

This operation takes a few minutes to complete and produces approximately a 2GB array.

c99 <- import(zr, filename = "99.zarr")

4.2 Image from channels 4, 5, 10, 17

par(mfrow = c(2, 2))
for (channel in c(4, 5, 10, 17))
    image(
        c99[channel, , ],
        main = paste0("Channel No. ", as.character(channel)),
        useRaster = TRUE
    )

5 Session Info

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ZarrExperiment_0.0.11 BiocStyle_2.25.0     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.9                  highr_0.9                  
##  [3] XVector_0.37.0              bslib_0.4.0                
##  [5] compiler_4.2.1              BiocManager_1.30.18        
##  [7] jquerylib_0.1.4             GenomeInfoDb_1.33.5        
##  [9] zlibbioc_1.43.0             MatrixGenerics_1.9.1       
## [11] bitops_1.0-7                tools_4.2.1                
## [13] SingleCellExperiment_1.19.0 digest_0.6.29              
## [15] lattice_0.20-45             jsonlite_1.8.0             
## [17] evaluate_0.16               png_0.1-7                  
## [19] rlang_1.0.4                 Matrix_1.4-1               
## [21] DelayedArray_0.23.1         cli_3.3.0                  
## [23] rstudioapi_0.13             yaml_2.3.5                 
## [25] xfun_0.32                   fastmap_1.1.0              
## [27] GenomeInfoDbData_1.2.8      stringr_1.4.0              
## [29] knitr_1.39                  sass_0.4.2                 
## [31] S4Vectors_0.35.1            BiocBaseUtils_0.99.9       
## [33] IRanges_2.31.0              triebeard_0.3.0            
## [35] grid_4.2.1                  stats4_4.2.1               
## [37] reticulate_1.25             Biobase_2.57.1             
## [39] R6_2.5.1                    rmarkdown_2.14             
## [41] bookdown_0.28               magrittr_2.0.3             
## [43] urltools_1.7.3              GenomicRanges_1.49.0       
## [45] htmltools_0.5.3             matrixStats_0.62.0         
## [47] BiocGenerics_0.43.1         SummarizedExperiment_1.27.1
## [49] stringi_1.7.8               RCurl_1.98-1.8             
## [51] cachem_1.0.6                BiocIO_1.7.1