1 Background

An increasingly common use case involves a set of samples or patients who provide measurements on multiple data types, such as gene expression, genotype, and miRNA abundance. Frequently, not all samples contribute to all assays, so some sparsity in the set of samples \(\times\) assays is expected.
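The sparsity described above can be made concrete with a small sample-by-assay presence table. The following is a minimal sketch in base R; the sample and assay names are invented for illustration and are not taken from any data set.

```r
# Hypothetical sample identifiers available for each assay (illustrative only)
assay_samples <- list(
  expression = c("s1", "s2", "s3", "s4"),
  genotype   = c("s1", "s3", "s4", "s5"),
  mirna      = c("s2", "s3", "s4")
)

# Union of all samples seen in any assay
all_samples <- sort(unique(unlist(assay_samples)))

# Logical samples x assays matrix: TRUE where a sample contributed to an assay
presence <- sapply(assay_samples, function(ids) all_samples %in% ids)
rownames(presence) <- all_samples
presence

# Samples measured on every assay
complete <- rownames(presence)[rowSums(presence) == ncol(presence)]
complete
```

Each FALSE cell in `presence` marks a sample that is missing from an assay; the rows without any FALSE are the samples usable in a fully integrated analysis.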

2 Basic demonstrative resources

The examples below perform some very simple manipulations of TCGA ovarian cancer data. The data are small enough that the loadHub function can deserialize all of them into memory.

suppressPackageStartupMessages(library(biocMultiAssay))
#
# crude way of enumerating RDA files planted in extdata
#
ov = dir(system.file("extdata/tcga_ov",
   package="biocMultiAssay"), full.names=TRUE, pattern="\\.rda$")
drop = grep("pheno", ov)
if (length(drop)>0) {
  pdpath=ov[drop]
  ov=ov[-drop]
  }
#
# informal labels for constituents
#
tags = c("ov RNA-seq", "ov agilent", "ov mirna", "ov affy", "ov CNV gistic",
  "ov methy 450k")
#
# construct expt instances from ExpressionSets
#
exptlist = lapply(seq_along(ov), function(x) new("expt",
     serType="RData", assayPath=ov[x], tag=tags[x]))
#
# populate an eHub, with a master phenotype data frame
#
ovhub = new("eHub", hub=exptlist, masterSampleData = get(load(pdpath)))
ovhub
## eHub with 6 experiments.  User-defined tags:
##   ov RNA-seq 
##   ov agilent 
##   ov mirna 
##   ov affy 
##   ov CNV gistic 
##   ov methy 450k 
## Sample level data is 580 x 29.

The eHub is a lightweight representation: it records which data sets belong to the hub without loading them. There is also a class that holds materializations of all the experimental data. Constructing it is currently slow.

lovhub = loadHub(ovhub)
lovhub
## loadedHub instance.
##               Features Samples                        feats.
## ov RNA-seq       24174     545       ACAP3, ACTRT2, AGRN ...
## ov agilent       14269     556          A1CF, A2BP1, A2M ...
## ov mirna         12989     578         A1CF, A2M, A4GALT ...
## ov affy          17814     574       15E1.2, 2'-PDE, 7A5 ...
## ov CNV gistic      799     554 ebv-miR-BART1-3p, ebv-miR ...
## ov methy 450k    20502     261             ?, A1BG, A1CF ...
object.size(lovhub)
## 19759320 bytes

This is a heavy representation (about 20 MB here), but it is manageable at this level of data reduction.
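The difference between the light and heavy representations can be illustrated without the package. The sketch below uses plain lists rather than the eHub and loadedHub classes, and the toy matrix, its dimensions, and the list fields are assumptions made for illustration only.

```r
# Toy "assay": a numeric matrix saved to a temporary .rda file (illustrative only)
set.seed(1)
toy <- matrix(rnorm(1000 * 50), nrow = 1000,
              dimnames = list(paste0("g", 1:1000), paste0("s", 1:50)))
path <- tempfile(fileext = ".rda")
save(toy, file = path)

# Light representation: only a tag and a file path are held in memory
light <- list(tag = "toy assay", assayPath = path)

# Heavy representation: the data are materialized with the get(load()) idiom
# (load() returns the names of the restored objects; get() fetches by name)
heavy <- list(tag = "toy assay", data = get(load(light$assayPath)))

# Compare the memory footprint of the two representations
object.size(light)
object.size(heavy)

# Remove the temporary file
unlink(path)
```

The light list stays small however large the file on disk is, while the heavy list grows with the data it holds.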

We can determine the set of common identifiers.

allid = lapply(lovhub@elist, sampleNames)
commids = allid[[1]]
for (i in 2:length(allid))
 commids = intersect(commids, allid[[i]])
length(commids)
## [1] 248
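The same intersection can be written as a fold over the list of identifier vectors. This is a minimal sketch on invented identifiers, not on the TCGA data; `allid_toy` and `common_toy` are hypothetical names.

```r
# Hypothetical per-assay sample identifiers (illustrative only)
allid_toy <- list(
  a = c("s1", "s2", "s3", "s4"),
  b = c("s4", "s3", "s1"),
  c = c("s3", "s5", "s1")
)

# Fold intersect() over the list, pairwise from left to right
common_toy <- Reduce(intersect, allid_toy)
common_toy
length(common_toy)
```

One reason to prefer `Reduce` is that it also handles a list of length one, whereas a loop over `2:length(x)` would then iterate over `c(2, 1)` and fail on the missing second element.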

We can now generate a loadedHub instance restricted to the common samples.

locomm = lovhub
locomm@elist = lapply(locomm@elist, function(x) x[,commids])
locomm
## loadedHub instance.
##               Features Samples                        feats.
## ov RNA-seq       24174     248       ACAP3, ACTRT2, AGRN ...
## ov agilent       14269     248          A1CF, A2BP1, A2M ...
## ov mirna         12989     248         A1CF, A2M, A4GALT ...
## ov affy          17814     248       15E1.2, 2'-PDE, 7A5 ...
## ov CNV gistic      799     248 ebv-miR-BART1-3p, ebv-miR ...
## ov methy 450k    20502     248             ?, A1BG, A1CF ...

Where operations such as these should live, for both the light and the heavy representations, is still a point of discussion.

3 createHub shortcut function for creating a loadedHub

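The createHub function builds a loadedHub directly from a list of already loaded objects. Here the phenotype data of the second object ("ov agilent") serves as the master phenotype table, and with drop=TRUE the samples listed in the output are reported and dropped from the other assays.

ovlist <- lapply(ov, function(x) get(load(x)))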
names(ovlist) <- tags

lovhub2 <- createHub(masterpheno=pData(ovlist[[2]]), objlist=ovlist, drop=TRUE)
## Dropping the following samples:
## ov RNA-seq :
## TCGA.36.2530
## 
##  
## ov mirna :
## TCGA.04.1341 TCGA.01.0630 TCGA.01.0631 TCGA.01.0633 TCGA.01.0636 TCGA.01.0637 TCGA.13.0730 TCGA.01.0628 TCGA.01.0639 TCGA.01.0642 TCGA.04.1357 TCGA.04.1360 TCGA.59.2352 TCGA.30.1861 TCGA.04.1519 TCGA.04.1353 TCGA.42.2593 TCGA.36.2539 TCGA.36.2530 TCGA.29.2429 TCGA.36.2533 TCGA.29.1699
## 
##  
## ov affy :
## TCGA.13.0730 TCGA.13.0760 TCGA.04.1341 TCGA.04.1353 TCGA.04.1357 TCGA.04.1360 TCGA.04.1519 TCGA.25.2390 TCGA.29.1699 TCGA.29.2429 TCGA.30.1861 TCGA.36.2530 TCGA.36.2533 TCGA.36.2539 TCGA.42.2593 TCGA.59.2352 TCGA.61.2610 TCGA.61.2611
## 
##  
## ov CNV gistic :
## TCGA.04.1341 TCGA.04.1357 TCGA.04.1360 TCGA.04.1519 TCGA.13.0730 TCGA.13.0760 TCGA.29.1699 TCGA.29.2429 TCGA.30.1861 TCGA.30.1869 TCGA.36.2530 TCGA.36.2533 TCGA.59.2352
## 
##  
## ov methy 450k :
## TCGA.04.1519 TCGA.29.1699 TCGA.13.0730 TCGA.59.2352 TCGA.04.1357
## 
## 
lovhub2
## loadedHub instance.
##               Features Samples                        feats.
## ov RNA-seq       24174     544       ACAP3, ACTRT2, AGRN ...
## ov agilent       14269     556          A1CF, A2BP1, A2M ...
## ov mirna         12989     556         A1CF, A2M, A4GALT ...
## ov affy          17814     556       15E1.2, 2'-PDE, 7A5 ...
## ov CNV gistic      799     541 ebv-miR-BART1-3p, ebv-miR ...
## ov methy 450k    20502     256             ?, A1BG, A1CF ...
object.size(lovhub2)
## 19602448 bytes
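The output above suggests that samples without a counterpart in the master phenotype data are the ones removed. The following base R sketch mimics that behaviour on toy matrices. It is not the createHub implementation; the helper `drop_unmatched` and all data in it are hypothetical.

```r
# Hypothetical master phenotype table keyed by sample identifier (illustrative only)
masterpheno <- data.frame(age = c(50, 61, 47), row.names = c("s1", "s2", "s3"))

# Hypothetical assays: features in rows, samples in columns
assays <- list(
  exprs = matrix(1:8, nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2", "s3", "s9"))),
  mirna = matrix(1:4, nrow = 2,
                 dimnames = list(c("m1", "m2"), c("s2", "s8")))
)

# Report, then drop, samples that have no row in the master phenotype table
drop_unmatched <- function(x, tag, pheno) {
  extra <- setdiff(colnames(x), rownames(pheno))
  if (length(extra) > 0) {
    message(tag, " : dropping ", paste(extra, collapse = " "))
  }
  x[, colnames(x) %in% rownames(pheno), drop = FALSE]
}

# Apply the helper to every assay, passing each assay's name as its tag
kept <- Map(drop_unmatched, assays, names(assays),
            MoreArgs = list(pheno = masterpheno))
sapply(kept, ncol)
```

Using `drop = FALSE` keeps each result a matrix even when a single sample survives, so downstream code can rely on `colnames()` and `ncol()`.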