This is a test of how R handles mwl data in HDF5 format using the rhdf5 library. The data models suggested by Christian and by Tommy are compared.

File structure

This is the structure of Christian’s HDF5 file:

library(rhdf5)
str.chr <- h5ls(file="cml_test.h5")
print(str.chr)
##              group      name       otype   dclass  dim
## 0                /     cml_1   H5I_GROUP              
## 1           /cml_1 channel_1   H5I_GROUP              
## 2 /cml_1/channel_1      data H5I_DATASET COMPOUND 1000
## 3 /cml_1/channel_1  metadata H5I_DATASET COMPOUND    1
## 4           /cml_1 channel_2   H5I_GROUP              
## 5 /cml_1/channel_2      data H5I_DATASET COMPOUND 1000
## 6 /cml_1/channel_2  metadata H5I_DATASET COMPOUND    1
## 7           /cml_1  metadata H5I_DATASET COMPOUND    1

This is the structure of Tommy’s HDF5 file:

str.tom <- h5ls(file="example.h5")
print(str.tom)
##                 group        name       otype  dclass  dim
## 0                   /       cml_1   H5I_GROUP             
## 1              /cml_1   channel_1   H5I_GROUP             
## 2    /cml_1/channel_1          Rx H5I_DATASET   FLOAT 1000
## 3    /cml_1/channel_1          Tx H5I_DATASET   FLOAT 1000
## 4    /cml_1/channel_1        time H5I_DATASET INTEGER 1000
## 5              /cml_1   channel_2   H5I_GROUP             
## 6    /cml_1/channel_2          Rx H5I_DATASET   FLOAT 1000
## 7    /cml_1/channel_2          Tx H5I_DATASET   FLOAT 1000
## 8    /cml_1/channel_2        time H5I_DATASET INTEGER 1000
## 9              /cml_1 geolocation   H5I_GROUP             
## 10 /cml_1/geolocation    altitude H5I_DATASET INTEGER    2
## 11 /cml_1/geolocation    latitude H5I_DATASET   FLOAT    2
## 12 /cml_1/geolocation   longitude H5I_DATASET   FLOAT    2
## 13 /cml_1/geolocation      siteID H5I_DATASET  STRING    2

Reading HDF5

This is how to read all data (the main group). R reads an HDF5 file into a list object (an ordered collection of objects), in which each subgroup is represented as another list nested inside the main list.

This is what the object from Christian’s HDF5 file looks like:

h5.chr <- h5read(file="cml_test.h5", name="cml_1")
str(h5.chr)
## List of 3
##  $ channel_1:List of 2
##   ..$ data    :'data.frame': 1000 obs. of  3 variables:
##   .. ..$ time_UTC: num [1:1000(1d)] 1.43e+09 1.43e+09 1.43e+09 1.43e+09 1.43e+09 ...
##   .. ..$ RX      : num [1:1000(1d)] -45.6 -45.9 -44.7 -45.9 -45.3 ...
##   .. ..$ TX      : num [1:1000(1d)] 20 20 20 20 20 20 20 20 20 20 ...
##   ..$ metadata:'data.frame': 1 obs. of  4 variables:
##   .. ..$ RX_site: chr [1(1d)] "Site A"
##   .. ..$ TX_site: chr [1(1d)] "Site B"
##   .. ..$ name   : chr [1(1d)] "far_near"
##   .. ..$ short  : chr [1(1d)] "fn"
##  $ channel_2:List of 2
##   ..$ data    :'data.frame': 1000 obs. of  3 variables:
##   .. ..$ time_UTC: num [1:1000(1d)] 1.43e+09 1.43e+09 1.43e+09 1.43e+09 1.43e+09 ...
##   .. ..$ RX      : num [1:1000(1d)] -45.6 -45 -45.3 -45.9 -45.6 ...
##   .. ..$ TX      : num [1:1000(1d)] 20 20 20 20 20 20 20 20 20 20 ...
##   ..$ metadata:'data.frame': 1 obs. of  4 variables:
##   .. ..$ RX_site: chr [1(1d)] "Site B"
##   .. ..$ TX_site: chr [1(1d)] "Site A"
##   .. ..$ name   : chr [1(1d)] "near_far"
##   .. ..$ short  : chr [1(1d)] "nf"
##  $ metadata :'data.frame':   1 obs. of  1 variable:
##   ..$ ID: chr [1(1d)] "MY2345_MY4567"

This is what the object from Tommy’s HDF5 file looks like:

h5.tom <- h5read(file="example.h5", name="cml_1")
str(h5.tom)
## List of 3
##  $ channel_1  :List of 3
##   ..$ Rx  : num [1:1000(1d)] 0.1391 0.6608 -0.0294 -0.7938 -1.4106 ...
##   ..$ Tx  : num [1:1000(1d)] 0.515 -0.415 -1.062 1.461 -1.248 ...
##   ..$ time: int [1:1000(1d)] 0 1 2 3 4 5 6 7 8 9 ...
##  $ channel_2  :List of 3
##   ..$ Rx  : num [1:1000(1d)] 0.1391 0.6608 -0.0294 -0.7938 -1.4106 ...
##   ..$ Tx  : num [1:1000(1d)] 0.515 -0.415 -1.062 1.461 -1.248 ...
##   ..$ time: int [1:1000(1d)] 0 1 2 3 4 5 6 7 8 9 ...
##  $ geolocation:List of 4
##   ..$ altitude : int [1:2(1d)] 30 20
##   ..$ latitude : num [1:2(1d)] 52.5 52.5
##   ..$ longitude: num [1:2(1d)] 5.66 5.67
##   ..$ siteID   : chr [1:2(1d)] "siteA" "siteB"

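The data model suggested by Tommy leads to less deeply nested lists than the one from Christian: a variable sits directly under its channel instead of inside a data.frame under the channel. Accessing datasets once the full HDF5 file is loaded is therefore a bit more comfortable.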

Get Rx from Tommy’s HDF5 file:

print(h5.tom$channel_1$Rx[1:5])
## [1]  0.13905687  0.66080112 -0.02937565 -0.79383851 -1.41060434
#or
print(h5.tom[[1]][[1]][1:5])
## [1]  0.13905687  0.66080112 -0.02937565 -0.79383851 -1.41060434

Get Rx from Christian’s HDF5 file:

print(h5.chr$channel_1$data$RX[1:5])
## [1] -45.59375 -45.90625 -44.68750 -45.90625 -45.31250
#or
print(h5.chr[[1]][[1]][[2]][1:5])
## [1] -45.59375 -45.90625 -44.68750 -45.90625 -45.31250
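
The same access pattern can be applied to all channels at once. The following is a minimal sketch in base R only. The objects `h5.tom.demo` and `h5.chr.demo` are small stand-ins that mirror the `str()` output above with made-up values, and `get.channels` is a hypothetical helper, not part of rhdf5.

```r
# Stand-in objects mirroring the str() output above; all values are made up.
h5.tom.demo <- list(
  channel_1   = list(Rx = c(0.14, 0.66, -0.03), Tx = c(0.52, -0.42, -1.06), time = 0:2),
  channel_2   = list(Rx = c(0.14, 0.66, -0.03), Tx = c(0.52, -0.42, -1.06), time = 0:2),
  geolocation = list(siteID = c("siteA", "siteB"))
)
h5.chr.demo <- list(
  channel_1 = list(data = data.frame(time_UTC = c(1.43e9, 1.43e9 + 60, 1.43e9 + 120),
                                     RX = c(-45.6, -45.9, -44.7),
                                     TX = c(20, 20, 20))),
  channel_2 = list(data = data.frame(time_UTC = c(1.43e9, 1.43e9 + 60, 1.43e9 + 120),
                                     RX = c(-45.6, -45.0, -45.3),
                                     TX = c(20, 20, 20))),
  metadata  = data.frame(ID = "MY2345_MY4567")
)

# Hypothetical helper: keep only the list elements named channel_*.
get.channels <- function(x) x[grepl("^channel_", names(x))]

# Tommy's layout: Rx is a direct child of each channel.
rx.tom <- lapply(get.channels(h5.tom.demo), function(ch) ch$Rx)

# Christian's layout: Rx is a column of the data.frame stored under each channel.
rx.chr <- lapply(get.channels(h5.chr.demo), function(ch) ch$data$RX)

str(rx.tom)
str(rx.chr)
```

Both calls return a named list with one numeric vector per channel; the only difference is the extra `$data` step needed for Christian’s layout.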

Reading subsets from HDF5

I do not know whether it is possible to extract a subset of RX directly from Christian’s HDF5 file. I am only able to extract the whole data.frame.

dat.chr <- h5read("cml_test.h5", "cml_1/channel_1/data/")
str(dat.chr)
## 'data.frame':    1000 obs. of  3 variables:
##  $ time_UTC: num [1:1000(1d)] 1.43e+09 1.43e+09 1.43e+09 1.43e+09 1.43e+09 ...
##  $ RX      : num [1:1000(1d)] -45.6 -45.9 -44.7 -45.9 -45.3 ...
##  $ TX      : num [1:1000(1d)] 20 20 20 20 20 20 20 20 20 20 ...
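
As long as the whole data.frame has to be read, the subset can only be taken in memory afterwards. A minimal sketch of that workaround, using a small synthetic compound dataset in a temporary file rather than `cml_test.h5`: the file, the dataset name `data` and all values are made up for illustration, and the sketch relies on `h5write` storing a data.frame as a compound dataset.

```r
library(rhdf5)

# Synthetic stand-in for one channel's compound "data" dataset (made-up values).
df <- data.frame(time_UTC = 1.43e9 + 60 * (0:9),
                 RX       = seq(-46, -44.2, length.out = 10),
                 TX       = rep(20, 10))

path <- tempfile(fileext = ".h5")
h5createFile(path)
h5write(df, path, "data")   # a data.frame is written as a compound dataset

# Read the complete compound dataset, then subset in memory.
dat <- h5read(path, "data")
rx.head <- dat$RX[1:5]
print(rx.head)
```

The drawback is that the full dataset is loaded before any subsetting happens, which may matter for long time series.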

Tommy’s structure enables easy subsetting, because each variable is stored as its own dataset:

dat.tom <- h5read("example.h5", "cml_1/channel_1/Rx/", index=list(1:5))
print(dat.tom)
## [1]  0.13905687  0.66080112 -0.02937565 -0.79383851 -1.41060434

It might be possible to save Rx and Tx in one matrix, which could make subsetting even more powerful. We should test how subsetting performs on big data sets.
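
The matrix idea can be tried out on synthetic data. This is only a sketch under assumptions: the temporary file, the dataset name `rxtx`, the column order (column 1 = Rx, column 2 = Tx) and the random values are all made up, and neither data model currently stores the data this way. In `h5read`, a `NULL` entry in `index` selects everything along that dimension.

```r
library(rhdf5)

set.seed(1)
n  <- 1000
rx <- rnorm(n)               # synthetic Rx values
tx <- rnorm(n)               # synthetic Tx values
rxtx <- cbind(rx, tx)        # assumed layout: column 1 = Rx, column 2 = Tx
dimnames(rxtx) <- NULL       # column meaning is kept by convention only

path <- tempfile(fileext = ".h5")
h5createFile(path)
h5write(rxtx, path, "rxtx")

# First five time steps of both variables (NULL = all columns).
first5 <- h5read(path, "rxtx", index = list(1:5, NULL))

# First five time steps of Rx only.
rx5 <- h5read(path, "rxtx", index = list(1:5, 1))

print(first5)
print(rx5)
```

Rows and columns can then be selected in a single call, but the variable names are no longer visible in the file structure and would have to be documented elsewhere, for example in metadata.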