MultiAssayExperiment
Application Programming Interface (API)MultiAssayExperiment
classMultiAssayExperiment
object: a rich exampleRangedRaggedAssay
class[
GenomicRanges
A built html version of this vignette is available.
MultiAssayExperiment
Application Programming Interface (API)To see the API
wiki document on GitHub, type:
API()
A Shiny app that browses the package API
is also available via:
API(shiny=TRUE)
MultiAssayExperiment
classHere is an overview of the class and its constructors and extractors:
empty <- MultiAssayExperiment()
empty
## A MultiAssayExperiment object of 0 listed
## experiments with no user-defined names and respective classes.
## Containing an ExperimentList class object of length 0:
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
slotNames(empty)
## [1] "ExperimentList" "pData" "sampleMap" "metadata"
## [5] "drops"
The ExperimentList
slot and class is the container workhorse for the MultiAssayExperiment
class. It contains all the experimental data. It inherits from class S4Vectors::SimpleList
with one element/component per data type.
class(experiments(empty)) # ExperimentList
## [1] "ExperimentList"
## attr(,"package")
## [1] "MultiAssayExperiment"
The elements of the ExperimentList
can contain ID-based and range-based data. Requirements for all classes in the ExperimentList
are listed in the API.
The following base and Bioconductor classes are known to work as elements of the ExperimentList:
base::matrix
: the base class, can be used for ID-based datasets such as gene expression summarized per-gene, microRNA, metabolomics, or microbiome data.
Biobase::ExpressionSet
: A richer representation of ID-based datasets capable of storing additional assay-level metadata.
SummarizedExperiment::SummarizedExperiment
: Also provides a rich representation of ID-based matrix-like datasets.
SummarizedExperiment::RangedSummarizedExperiment
: For rectangular range-based datasets, one set of genomic ranges are assayed for multiple samples. It can be used for gene expression, methylation, or other data types that refer to genomic positions.
MultiAssayExperiment::RangedRaggedAssay
: inherits from GRangesList
, for ranged-based ragged arrays, meaning that a potentially different set of genomic ranges are assayed for each sample. A typical example would be segmented copy number, where segmentation of copy number alterations occurs and different genomic locations in each sample.
See API()
for details on requirements for using other data classes. In general, data classes meeting minimum requirements, including support for square bracket [
subsetting and dimnames()
will work by default.
The datasets contained in elements of the ExperimentList
must have:
The column names correspond to samples, and are used to match assay data to specimen metadata stored in pData
.
The row names can correspond to a variety of features in the data including but not limited to gene names, probe IDs, proteins, and named ranges. Note that the existence of “row” names does not mean the data must be rectangular or matrix-like. The RangedRaggedAssay class, for example, supports ragged arrays where the number and identifiers of measurements can be different for each sample.
Classes contained in the ExperimentList
must support the following list of methods:
[
: single square bracket subsetting, with a single comma. It is assumed that values before the comma subset rows, and values after the comma subset columns.colnames()
: corresponding to experimental samplesrownames()
: corresponding to features such as genes, proteins, etc.dim()
: returns a vector of the number of rows and number of columnsThe MultiAssayExperiment
keeps one set of “primary” metadata that describes the ‘biological unit’ which can refer to specimens, experimental subjects, patients, etc. In this vignette, we will refer to each experimental subject as a patient.
The pData
dataset should be of class DataFrame
but can accept a data.frame
class object that will be coerced.
In order to relate metadata of the biological unit, the row names of the pData
dataset must contain patient identifiers.
patient.data <- data.frame(sex=c("M", "F", "M", "F"),
age=38:41,
row.names=c("Jack", "Jill", "Bob", "Barbara"))
patient.data
## sex age
## Jack M 38
## Jill F 39
## Bob M 40
## Barbara F 41
DataFrame
For many typical purposes the DataFrame
and data.frame
behave equivalently; but the Dataframe
is more flexible as it allows any vector-like data type to be stored in its columns. The flexibility of the DataFrame
permits, for example, storing multiple dose-response values for a single cell line, even if the number of doses and responses is not consistent across all cell lines. Doses could be stored in one column of pData
as a SimpleList
, and responses in another column, also as a SimpleList
. Or, dose-response values could be stored in a single column of pData
as a two-column matrix for each cell line.
The sampleMap
is a DataFrame
that relates the “primary” data (pData
) to the experimental assays:
class(sampleMap(empty)) # DataFrame
## [1] "DataFrame"
## attr(,"package")
## [1] "S4Vectors"
The sampleMap
provides an unambiguous map from every experimental observation to one and only one row in pData
. It is, however, permissible for a row of pData
to be associated with multiple experimental observations or no observations at all. In other words, there is a “many-to-one” mapping from experimental observations to rows of pData
, and a “one-to-any-number” mapping from rows of pData
to experimental observations.
The sampleMap
has three columns, with the following column names:
assay provides the names of the different experiments / assays performed. These are user-defined, with the only requirement that the names of the ExperimentList
, where the experimental assays are stored, must be contained in this column.
primary provides the “primary” sample names. All values in this column must also be present in the rownames of pData(MultiAssayExperiment)
. In this example, allowable values in this column are “Jack”, “Jill”, “Barbara”, and “Bob”.
colname provides the sample names used by experimental datasets, which in practice are often different than the primary sample names. For each assay, all column names must be found in this column. Otherwise, those assays would be orphaned: it would be impossible to match them up to samples in the overall experiment. As mentioned above, duplicated values are allowed, to represent replicates with the same overall experiment-level annotation.
This design is motivated by the following situations:
sampleMap
isn’t providedIf each assay uses the same colnames (i.e., if the same sample identifiers are used for each experiment), a simple list of these datasets is sufficient for the MultiAssayExperiment
constructor function. It is not necessary for them to have the same rownames or colnames:
exprss1 <- matrix(rnorm(16), ncol = 4,
dimnames = list(sprintf("ENST00000%i", sample(288754:290000, 4)),
c("Jack", "Jill", "Bob", "Bobby")))
exprss2 <- matrix(rnorm(12), ncol = 3,
dimnames = list(sprintf("ENST00000%i", sample(288754:290000, 4)),
c("Jack", "Jane", "Bob")))
doubleExp <- list("methyl 2k" = exprss1, "methyl 3k" = exprss2)
simpleMultiAssay <- MultiAssayExperiment(experiments=doubleExp)
simpleMultiAssay
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] methyl 2k: matrix with 4 rows and 4 columns
## [2] methyl 3k: matrix with 4 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
In the above example, the user did not provide the pData
argument so the constructor function filled it with an empty DataFrame
:
pData(simpleMultiAssay)
## DataFrame with 5 rows and 0 columns
But the pData
can be provided. Here, note that any assay sample (column) that cannot be mapped to a corresponding row in the provided pData
gets dropped. This is part of ensuring internal validity of the MultiAssayExperiment
.
simpleMultiAssay2 <- MultiAssayExperiment(experiments=doubleExp,
pData=patient.data)
## Warning in .generateMap(pData, experiments): Data from rows:
## NA - Bobby
## NA - Jane
## dropped due to missing phenotype data
## harmonizing input:
## removing 1 pData rownames not in sampleMap 'primary'
simpleMultiAssay2
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] methyl 2k: matrix with 4 rows and 3 columns
## [2] methyl 3k: matrix with 4 rows and 2 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
pData(simpleMultiAssay2)
## DataFrame with 3 rows and 2 columns
## sex age
## <factor> <integer>
## Jack M 38
## Jill F 39
## Bob M 40
Can be of ANY class, for storing study-wide metadata, such as citation information. For an empty MultiAssayExperiment
object, it is NULL.
class(metadata(empty)) # NULL (class "ANY")
## [1] "NULL"
MultiAssayExperiment
object: a rich exampleIn this section we demonstrate all core supported data classes, using different sample ID conventions for each assay, with primary pData. The some supported classes such as, matrix
, ExpressionSet
, SummarizedExperiment
, RangedSummarizedExperiment
, and RangedRaggedAssay
.
We have three matrix-like datasets. First, let’s represent expression data as an ExpressionSet
:
library(Biobase)
(arraydat <- matrix(seq(101, 108), ncol=4,
dimnames=list(c("ENST00000294241", "ENST00000355076"),
c("array1", "array2", "array3", "array4"))))
## array1 array2 array3 array4
## ENST00000294241 101 103 105 107
## ENST00000355076 102 104 106 108
arraypdat <- as(data.frame(slope53=rnorm(4),
row.names=c("array1", "array2", "array3",
"array4")), "AnnotatedDataFrame")
exprdat <- ExpressionSet(assayData=arraydat, phenoData=arraypdat)
exprdat
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 2 features, 4 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: array1 array2 array3 array4
## varLabels: slope53
## varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation:
The following map matches pData
sample names to exprdata
sample names. Note that row orders aren’t initially matched up, and this is OK.
(exprmap <- data.frame(primary=rownames(patient.data)[c(1, 2, 4, 3)],
assay=c("array1", "array2", "array3", "array4"),
stringsAsFactors = FALSE))
## primary assay
## 1 Jack array1
## 2 Jill array2
## 3 Barbara array3
## 4 Bob array4
Now methylation data, which we will represent as a matrix
. It uses gene identifiers also, but measures a partially overlapping set of genes. Now, let’s store this as a simple matrix which can contains a replicate for one of the patients.
(methyldat <-
matrix(1:10, ncol=5,
dimnames=list(c("ENST00000355076", "ENST00000383706"),
c("methyl1", "methyl2", "methyl3",
"methyl4", "methyl5"))))
## methyl1 methyl2 methyl3 methyl4 methyl5
## ENST00000355076 1 3 5 7 9
## ENST00000383706 2 4 6 8 10
The following map matches pData
sample names to methyldat
sample names.
(methylmap <- data.frame(primary = c("Jack", "Jack", "Jill", "Barbara", "Bob"),
assay = c("methyl1", "methyl2", "methyl3", "methyl4", "methyl5"),
stringsAsFactors = FALSE))
## primary assay
## 1 Jack methyl1
## 2 Jack methyl2
## 3 Jill methyl3
## 4 Barbara methyl4
## 5 Bob methyl5
Now we have a microRNA platform, which has no common identifiers with the other datasets, and which we also represent as a matrix
. It is also missing data for “Jill”. We will use the same sample naming convention as we did for arrays.
(microdat <- matrix(201:212, ncol=3,
dimnames=list(c("hsa-miR-21", "hsa-miR-191",
"hsa-miR-148a", "hsa-miR148b"),
c("micro1", "micro2", "micro3"))))
## micro1 micro2 micro3
## hsa-miR-21 201 205 209
## hsa-miR-191 202 206 210
## hsa-miR-148a 203 207 211
## hsa-miR148b 204 208 212
And the following map matches pData sample names to microdat sample names.
(micromap <- data.frame(primary = c("Jack", "Barbara", "Bob"),
assay = c("micro1", "micro2", "micro3"),
stringsAsFactors = FALSE))
## primary assay
## 1 Jack micro1
## 2 Barbara micro2
## 3 Bob micro3
Let’s include a RangedRaggedAssay
, which is defined in this package and extends GRangesList
. This is intended for data such as segmented copy number, which provides genomic ranges that may be different for each sample. We start with a GRangesList
, which will later be converted automatically by the MultiAssayExperiment
constructor function.
suppressPackageStartupMessages(library(GenomicRanges))
## completely encompasses ENST00000355076
gr1 <-
GRanges(seqnames = "chr3", ranges = IRanges(58000000, 59502360),
strand = "+", score = 5L, GC = 0.45)
## first is within ENST0000035076
gr2 <-
GRanges(seqnames = c("chr3", "chr3"),
ranges = IRanges(c(58493000, 3), width=9000),
strand = c("+", "-"), score = 3:4, GC = c(0.3, 0.5))
gr3 <-
GRanges(seqnames = c("chr1", "chr2"),
ranges = IRanges(c(1, 4), c(3, 9)),
strand = c("-", "-"), score = c(6L, 2L), GC = c(0.4, 0.1))
grl <- GRangesList("gr1" = gr1, "gr2" = gr2, "gr3" = gr3)
names(grl) <- c("snparray1", "snparray2", "snparray3")
grl
## GRangesList object of length 3:
## $snparray1
## GRanges object with 1 range and 2 metadata columns:
## seqnames ranges strand | score GC
## <Rle> <IRanges> <Rle> | <integer> <numeric>
## [1] chr3 [58000000, 59502360] + | 5 0.45
##
## $snparray2
## GRanges object with 2 ranges and 2 metadata columns:
## seqnames ranges strand | score GC
## [1] chr3 [58493000, 58501999] + | 3 0.3
## [2] chr3 [ 3, 9002] - | 4 0.5
##
## $snparray3
## GRanges object with 2 ranges and 2 metadata columns:
## seqnames ranges strand | score GC
## [1] chr1 [1, 3] - | 6 0.4
## [2] chr2 [4, 9] - | 2 0.1
##
## -------
## seqinfo: 3 sequences from an unspecified genome; no seqlengths
The following data.frame
matches pData
samples to the GRangesList
:
(rangemap <- data.frame(primary = c("Jack", "Jill", "Jill"),
assay = c("snparray1", "snparray2", "snparray3"),
stringsAsFactors = FALSE))
## primary assay
## 1 Jack snparray1
## 2 Jill snparray2
## 3 Jill snparray3
Finally, we create a dataset of class RangedSummarizedExperiment
:
library(SummarizedExperiment)
## Loading required package: DelayedArray
##
## Attaching package: 'DelayedArray'
## The following object is masked from 'package:base':
##
## apply
nrows <- 5; ncols <- 4
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(2, nrows - 2)),
IRanges(floor(runif(nrows, 1e5, 1e6)), width=100),
strand=sample(c("+", "-"), nrows, TRUE),
feature_id=sprintf("ID\\%03d", 1:nrows))
names(rowRanges) <- letters[1:5]
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 2),
row.names= c("mysnparray1", "mysnparray2",
"mysnparray3", "mysnparray4"))
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
rowRanges=rowRanges, colData=colData)
And we map the pData
samples to the RangedSummarizedExperiment
:
(rangemap2 <-
data.frame(primary = c("Jack", "Jill", "Bob", "Barbara"),
assay = c("mysnparray1", "mysnparray2", "mysnparray3",
"mysnparray4"), stringsAsFactors = FALSE))
## primary assay
## 1 Jack mysnparray1
## 2 Jill mysnparray2
## 3 Bob mysnparray3
## 4 Barbara mysnparray4
sampleMap
creationThe MultiAssayExperiment
constructor function can create the sampleMap
automatically if a single naming convention is used, but in this example it cannot because we used platform-specific sample identifiers (e.g. mysnparray1, etc). So we must provide an ID map that matches the samples of each experiment back to the pData
, as a three-column data.frame
or DataFrame
with three columns named “assay”, primary“, and”colname“. Here we start with a list:
listmap <- list(exprmap, methylmap, micromap, rangemap, rangemap2)
names(listmap) <- c("Affy", "Methyl 450k", "Mirna", "CNV gistic", "CNV gistic2")
listmap
## $Affy
## primary assay
## 1 Jack array1
## 2 Jill array2
## 3 Barbara array3
## 4 Bob array4
##
## $`Methyl 450k`
## primary assay
## 1 Jack methyl1
## 2 Jack methyl2
## 3 Jill methyl3
## 4 Barbara methyl4
## 5 Bob methyl5
##
## $Mirna
## primary assay
## 1 Jack micro1
## 2 Barbara micro2
## 3 Bob micro3
##
## $`CNV gistic`
## primary assay
## 1 Jack snparray1
## 2 Jill snparray2
## 3 Jill snparray3
##
## $`CNV gistic2`
## primary assay
## 1 Jack mysnparray1
## 2 Jill mysnparray2
## 3 Bob mysnparray3
## 4 Barbara mysnparray4
and use the convenience function listToMap
to convert the list of data.frame
objects to a valid object for the sampleMap
:
dfmap <- listToMap(listmap)
dfmap
## DataFrame with 19 rows and 3 columns
## assay primary colname
## <factor> <character> <character>
## 1 Affy Jack array1
## 2 Affy Jill array2
## 3 Affy Barbara array3
## 4 Affy Bob array4
## 5 Methyl 450k Jack methyl1
## ... ... ... ...
## 15 CNV gistic Jill snparray3
## 16 CNV gistic2 Jack mysnparray1
## 17 CNV gistic2 Jill mysnparray2
## 18 CNV gistic2 Bob mysnparray3
## 19 CNV gistic2 Barbara mysnparray4
Note, dfmap
can be reverted to a list with another provided function:
mapToList(dfmap, "assay")
list()
Create an named list of experiments for the MultiAssayExperiment
function. All of these names must be found within in the third column of dfmap
:
objlist <- list("Affy" = exprdat, "Methyl 450k" = methyldat,
"Mirna" = microdat, "CNV gistic" = grl, "CNV gistic2" = rse)
MultiAssayExperiment
class objectWe recommend using the MultiAssayExperiment
constructor function:
myMultiAssay <- MultiAssayExperiment(objlist, patient.data, dfmap)
myMultiAssay
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 4 columns
## [2] Methyl 450k: matrix with 2 rows and 5 columns
## [3] Mirna: matrix with 4 rows and 3 columns
## [4] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
The following extractor functions can be used to get extract data from the object:
experiments(myMultiAssay)
## ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 4 columns
## [2] Methyl 450k: matrix with 2 rows and 5 columns
## [3] Mirna: matrix with 4 rows and 3 columns
## [4] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
pData(myMultiAssay)
## DataFrame with 4 rows and 2 columns
## sex age
## <factor> <integer>
## Jack M 38
## Jill F 39
## Bob M 40
## Barbara F 41
sampleMap(myMultiAssay)
## DataFrame with 19 rows and 3 columns
## assay primary colname
## <factor> <character> <character>
## 1 Affy Jack array1
## 2 Affy Jill array2
## 3 Affy Barbara array3
## 4 Affy Bob array4
## 5 Methyl 450k Jack methyl1
## ... ... ... ...
## 15 CNV gistic Jill snparray3
## 16 CNV gistic2 Jack mysnparray1
## 17 CNV gistic2 Jill mysnparray2
## 18 CNV gistic2 Bob mysnparray3
## 19 CNV gistic2 Barbara mysnparray4
metadata(myMultiAssay)
## NULL
Note that the ExperimentList
class extends the SimpleList
class to add some validity checks specific to MultiAssayExperiment
. It can be used like a list.
MultiAssayExperiment
objectThe PrepMultiAssay
function helps diagnose common problems when creating a MultiAssayExperiment
object. It provides error messages and/or warnings in instances where names (either colnames
or ExperimentList
element names) are inconsistent with those found in the sampleMap. Input arguments are the same as those in the MultiAssayExperiment
(i.e., ExperimentList
, pData
, sampleMap
). The resulting output of the PrepMultiAssay
function is a list of inputs including a “drops” element for names that were not able to be matched.
Instances where ExperimentList
is created without names will prompt an error from PrepMultiAssay
. Named ExperimentList
elements are essential for checks in MultiAssayExperiment
.
objlist3 <- objlist
(names(objlist3) <- NULL)
## NULL
try(PrepMultiAssay(objlist3, patient.data, dfmap)$ExperimentList)
Non-matching names may also be present in the ExperimentList
elements and the “assay” column of the sampleMap
. If names only differ by case and are identical and unique, names will be standardized to lower case and replaced.
names(objlist3) <- toupper(names(objlist))
names(objlist3)
## [1] "AFFY" "METHYL 450K" "MIRNA" "CNV GISTIC" "CNV GISTIC2"
unique(dfmap[, "assay"])
## [1] Affy Methyl 450k Mirna CNV gistic CNV gistic2
## Levels: Affy Methyl 450k Mirna CNV gistic CNV gistic2
PrepMultiAssay(objlist3, patient.data, dfmap)$ExperimentList
## Names in the ExperimentList do not match sampleMap assaynames
## standardizing will be attempted...
## - names set to lowercase
## ExperimentList class object of length 5:
## [1] affy: ExpressionSet with 2 rows and 4 columns
## [2] methyl 450k: matrix with 2 rows and 5 columns
## [3] mirna: matrix with 4 rows and 3 columns
## [4] cnv gistic: RangedRaggedAssay with 5 rows and 3 columns
## [5] cnv gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
When colnames
in the ExperimentList
cannot be matched back to the primary data (pData
), these will be dropped and added to the drops element.
exampleMap <- sampleMap(simpleMultiAssay2)
sapply(doubleExp, colnames)
## $`methyl 2k`
## [1] "Jack" "Jill" "Bob" "Bobby"
##
## $`methyl 3k`
## [1] "Jack" "Jane" "Bob"
exampleMap
## DataFrame with 5 rows and 3 columns
## assay primary colname
## <factor> <character> <character>
## 1 methyl 2k Jack Jack
## 2 methyl 2k Jill Jill
## 3 methyl 2k Bob Bob
## 4 methyl 3k Jack Jack
## 5 methyl 3k Bob Bob
PrepMultiAssay(doubleExp, patient.data, exampleMap)$drops
## Not all colnames in the ExperimentList are found in the
## sampleMap, dropping samples from ExperimentList...
## $`methyl 2k`
## [1] "Bobby"
##
## $`methyl 3k`
## [1] "Jane"
## $`columns.methyl 2k`
## [1] "Bobby"
##
## $`columns.methyl 3k`
## [1] "Jane"
A similar operation is performed for checking “primary” sampleMap
names and pData
rownames. In this example, we add a row corresponding to “Joe” that does not have a match in the experimental data.
exMap <- rbind(dfmap,
DataFrame(assay = "New methyl",
primary = "Joe",
colname = "Joe"))
PrepMultiAssay(objlist, patient.data, exMap)$drops
## Warning in PrepMultiAssay(objlist, patient.data, exMap): Lengths of names
## in the ExperimentList and sampleMap are not equal
## Not all names in the primary column of the sampleMap
## could be matched to the pData rownames; see $drops
## DataFrame with 1 row and 3 columns
## assay primary colname
## <character> <character> <character>
## 1 New methyl Joe Joe
## $sampleMap_rows
## DataFrame with 1 row and 3 columns
## assay primary colname
## <character> <character> <character>
## 1 New methyl Joe Joe
To create a MultiAssayExperiment
from the results of the PrepMultiAssay
function, take each corresponding element from the resulting list and enter them as arguments to the MultiAssayExperiment
constructor function.
prepped <- PrepMultiAssay(objlist, patient.data, exMap)
## Warning in PrepMultiAssay(objlist, patient.data, exMap): Lengths of names
## in the ExperimentList and sampleMap are not equal
## Not all names in the primary column of the sampleMap
## could be matched to the pData rownames; see $drops
## DataFrame with 1 row and 3 columns
## assay primary colname
## <character> <character> <character>
## 1 New methyl Joe Joe
preppedMulti <- MultiAssayExperiment(prepped$ExperimentList, prepped$pData,
prepped$sampleMap)
preppedMulti
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 4 columns
## [2] Methyl 450k: matrix with 2 rows and 5 columns
## [3] Mirna: matrix with 4 rows and 3 columns
## [4] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
Bioconductor
classes from raw dataRecent updates to the GenomicRanges
and SummarizedExperiment
packages allow the user to create standard Bioconductor classes from raw data. Raw data read in as either data.frame
or DataFrame
can be converted to GRangesList
or SummarizedExperiment
classes depending on the type of data.
The function to create a GRangesList
from a data.frame
, called makeGRangesListFromDataFrame
can be found in the GenomicRanges
package. makeSummarizedExperimentFromDataFrame
is available in the SummarizedExperiment
package. It is also possible to create a RangedSummarizedExperiment
class object from raw data when ranged data is available.
A simple example can be obtained from the function documentation in GenomicRanges
:
library(GenomicRanges)
grldf <- structure(
list(chr = structure(c(1L, 1L, 1L, 1L, 1L),
class = "factor", .Label = "chr1"),
start = 11:15, end = 12:16,
strand = structure(c(4L, 1L, 4L, 3L, 2L),
.Label = c("-", ".", "*", "+"),
class = "factor"),
score = 1:5, specimen =
structure(c(1L, 1L, 2L, 2L, 3L),
.Label = c("a", "b", "c"),
class = "factor"),
gene_symbols = structure(1:5, .Label = c("GENEa", "GENEb",
"GENEc", "GENEd",
"GENEe"),
class = "factor")),
.Names = c("chr", "start", "end", "strand", "score", "specimen",
"gene_symbols"),
row.names = c(NA, -5L), class = "data.frame")
makeGRangesListFromDataFrame(grldf, split.field = "specimen",
names.field = "gene_symbols")
## GRangesList object of length 3:
## $a
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## <Rle> <IRanges> <Rle>
## GENEa chr1 [11, 12] +
## GENEb chr1 [12, 13] -
##
## $b
## GRanges object with 2 ranges and 0 metadata columns:
## seqnames ranges strand
## GENEc chr1 [13, 14] +
## GENEd chr1 [14, 15] *
##
## $c
## GRanges object with 1 range and 0 metadata columns:
## seqnames ranges strand
## GENEe chr1 [15, 16] *
##
## -------
## seqinfo: 1 sequence from an unspecified genome; no seqlengths
In the SummarizedExperiment
package:
library(SummarizedExperiment)
sedf <- structure(list(chr = structure(c(1L, 1L, 1L, 1L, 1L),
class = "factor", .Label = "chr2"),
start = 11:15, end = 12:16,
strand = structure(c(4L, 1L, 4L, 3L, 2L),
.Label = c("-", ".", "*", "+"),
class = "factor"),
expr0 = 3:7, expr1 = 8:12, expr2 = 12:16),
.Names = c("chr", "start", "end", "strand",
"expr0", "expr1", "expr2"),
row.names = c("GENEe", "GENEd", "GENEc", "GENEb", "GENEa"),
class = "data.frame")
sedf
## chr start end strand expr0 expr1 expr2
## GENEe chr2 11 12 + 3 8 12
## GENEd chr2 12 13 - 4 9 13
## GENEc chr2 13 14 + 5 10 14
## GENEb chr2 14 15 * 6 11 15
## GENEa chr2 15 16 . 7 12 16
makeSummarizedExperimentFromDataFrame(sedf)
## class: RangedSummarizedExperiment
## dim: 5 3
## metadata(0):
## assays(1): ''
## rownames(5): GENEe GENEd GENEc GENEb GENEa
## rowData names(0):
## colnames(3): expr0 expr1 expr2
## colData names(0):
RangedRaggedAssay
classNote that the GRangesList
got converted to a RangedRaggedAssay
, a class intended for data such as segmented copy number that is provides different genomic ranges for each sample. RangedRaggedAssay
is defined by this package and inherits from GRangesList
:
methods(class="RangedRaggedAssay")
## [1] ! != [
## [4] [[ [[<- [<-
## [7] %in% < <=
## [10] == > >=
## [13] $ $<- aggregate
## [16] anyNA append as.character
## [19] as.complex as.data.frame as.env
## [22] as.integer as.list as.logical
## [25] as.matrix as.numeric as.raw
## [28] assay by c
## [31] cbind classNameForDisplay coerce
## [34] coerce<- countOverlaps coverage
## [37] dim dimnames dimnames<-
## [40] disjoin do.call droplevels
## [43] duplicated elementMetadata elementMetadata<-
## [46] elementNROWS elementType end
## [49] end<- endoapply eval
## [52] expand expand.grid extractROWS
## [55] Filter findOverlaps flank
## [58] getHits getListElement head
## [61] ifelse intersect is.na
## [64] is.unsorted isDisjoint isEmpty
## [67] lapply length lengths
## [70] match mcols mcols<-
## [73] mendoapply merge metadata
## [76] metadata<- mstack names
## [79] names<- narrow NROW
## [82] order overlapsAny parallelSlotNames
## [85] parallelVectorNames pcompare pcompareRecursively
## [88] pintersect promoters psetdiff
## [91] punion range ranges
## [94] ranges<- rank rbind
## [97] rearrange Reduce reduce
## [100] relist rename rep
## [103] rep.int replaceROWS resize
## [106] restrict rev revElements
## [109] ROWNAMES rowRanges<- sapply
## [112] score score<- seqinfo
## [115] seqinfo<- seqlevelsInUse seqnames
## [118] seqnames<- setdiff setequal
## [121] shift shiftApply show
## [124] sort split split<-
## [127] stack start start<-
## [130] strand strand<- subset
## [133] subsetByColumn subsetByOverlaps subsetByRow
## [136] table tail tapply
## [139] union unique unlist
## [142] unsplit updateObject values
## [145] values<- width width<-
## [148] window window<- with
## [151] within xtabs zipdown
## see '?methods' for accessing help and source code
getMethod("dimnames", "RangedRaggedAssay")
## Method Definition:
##
## function (x)
## {
## dgr <- names(unlist(x, use.names = FALSE))
## list(dgr, names(x))
## }
## <environment: namespace:MultiAssayExperiment>
##
## Signatures:
## x
## target "RangedRaggedAssay"
## defined "RangedRaggedAssay"
It has some additional methods that are required for any data class contained in a MultiAssayExperiment
:
class(experiments(myMultiAssay)[[4]])
## [1] "RangedRaggedAssay"
## attr(,"package")
## [1] "MultiAssayExperiment"
rownames(experiments(myMultiAssay)[[4]])
## [1] "chr3:58000000-59502360:+" "chr3:58493000-58501999:+"
## [3] "chr3:3-9002:-" "chr1:1-3:-"
## [5] "chr2:4-9:-"
colnames(experiments(myMultiAssay)[[4]])
## [1] "snparray1" "snparray2" "snparray3"
One of the requirements for the assay
method (specifically for this RangedRaggedAssay
ExperimentList
element) is that the metadata have a score
column from which to obtain values for the resulting assay matrix. Here we have added ficticious values to such column contained within list elements. See assay,RangedRaggedAssay,ANY-method
documentation.
assay(experiments(myMultiAssay)[[4]], background = 2)
## snparray1 snparray2 snparray3
## chr3:58000000-59502360:+ 5 3 2
## chr3:58493000-58501999:+ 5 4 2
## chr3:3-9002:- 2 3 2
## chr1:1-3:- 2 2 2
## chr2:4-9:- 2 2 6
assay
methodThe assay
method uses the “inner” metadata columns to obtain a score value from which to create the assay matrix.
rra <- experiments(myMultiAssay)[[4]]
mcols(rra[[1]])
## DataFrame with 1 row and 2 columns
## score GC
## <integer> <numeric>
## 1 5 0.45
assay(rra, background = 2)
## snparray1 snparray2 snparray3
## chr3:58000000-59502360:+ 5 3 2
## chr3:58493000-58501999:+ 5 4 2
## chr3:3-9002:- 2 3 2
## chr1:1-3:- 2 2 2
## chr2:4-9:- 2 2 6
MultiAssayExperiment
allows subsetting by “rows”, columns, and assays, rownames, and colnames, across all experiments simultaneously while guaranteeing continued matching of samples.
Subsetting can be done most compactly by the square bracket method, or more verbosely and potentially more flexibly by the subsetBy*()
methods.
[
The three positions within the bracket operator indicate rows, columns, and assays, respectively (pseudocode):
myMultiAssay[rows, columns, assays]
For example, to select the gene “ENST00000355076”:
myMultiAssay["ENST00000355076", , ]
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] Affy: ExpressionSet with 1 rows and 4 columns
## [2] Methyl 450k: matrix with 1 rows and 5 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
The above operation works across all types of assays, whether ID-based (e.g. matrix, ExpressionSet, SummarizedExperiment) or range-based (e.g. RangedSummarizedExperiment, RangedRaggedAssay). Note that when using the bracket method [
, the drop argument is TRUE by default.
You can subset by rows, columns, and assays in a single bracket operation, and they will be performed in that order (rows, then columns, then assays). The following selects the ENST00000355076 gene across all samples, then the first two samples of each assay, and finally the Affy and Methyl 450k assays:
myMultiAssay["ENST00000355076", 1:2, c("Affy", "Methyl 450k")]
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] Affy: ExpressionSet with 1 rows and 2 columns
## [2] Methyl 450k: matrix with 1 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
By columns - character, integer, and logical are all allowed, for example:
myMultiAssay[, "Jack", ]
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 1 columns
## [2] Methyl 450k: matrix with 2 rows and 2 columns
## [3] Mirna: matrix with 4 rows and 1 columns
## [4] CNV gistic: RangedRaggedAssay with 1 rows and 1 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 1 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
myMultiAssay[, 1, ]
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 1 columns
## [2] Methyl 450k: matrix with 2 rows and 2 columns
## [3] Mirna: matrix with 4 rows and 1 columns
## [4] CNV gistic: RangedRaggedAssay with 1 rows and 1 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 1 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
myMultiAssay[, c(TRUE, FALSE, FALSE, FALSE), ]
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 1 columns
## [2] Methyl 450k: matrix with 2 rows and 2 columns
## [3] Mirna: matrix with 4 rows and 1 columns
## [4] CNV gistic: RangedRaggedAssay with 1 rows and 1 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 1 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
By assay - character, integer, and logical are allowed:
myMultiAssay[, , "Mirna"]
## A MultiAssayExperiment object of 1 listed
## experiment with a user-defined name and respective class.
## Containing an ExperimentList class object of length 1:
## [1] Mirna: matrix with 4 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
myMultiAssay[, , 3]
## A MultiAssayExperiment object of 1 listed
## experiment with a user-defined name and respective class.
## Containing an ExperimentList class object of length 1:
## [1] Mirna: matrix with 4 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
myMultiAssay[, , c(FALSE, FALSE, TRUE, FALSE, FALSE)]
## A MultiAssayExperiment object of 1 listed
## experiment with a user-defined name and respective class.
## Containing an ExperimentList class object of length 1:
## [1] Mirna: matrix with 4 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
Specify drop=FALSE
to keep assays with zero rows or zero columns, e.g.:
myMultiAssay["ENST00000355076", , , drop=FALSE]
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 1 rows and 4 columns
## [2] Methyl 450k: matrix with 1 rows and 5 columns
## [3] Mirna: matrix with 0 rows and 3 columns
## [4] CNV gistic: RangedRaggedAssay with 0 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 0 rows and 4 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
Using the default drop=TRUE
, assays with no rows or no columns are removed:
myMultiAssay["ENST00000355076", , , drop=TRUE]
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] Affy: ExpressionSet with 1 rows and 4 columns
## [2] Methyl 450k: matrix with 1 rows and 5 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
Experimental samples are stored in the rows of pData
but the columns of elements of ExperimentList
, so when we refer to subsetting by columns, we are referring to columns of the experimental assays. Subsetting by samples / columns will be more obvious after recalling the pData
:
pData(myMultiAssay)
## DataFrame with 4 rows and 2 columns
## sex age
## <factor> <integer>
## Jack M 38
## Jill F 39
## Bob M 40
## Barbara F 41
Subsetting by samples identifies the selected samples in rows of the pData DataFrame, then selects all columns of the ExperimentList
corresponding to these rows. Here we use an integer to keep the first two rows of pData, and all experimental assays associated to those two primary samples:
myMultiAssay[, 1:2]
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 2 columns
## [2] Methyl 450k: matrix with 2 rows and 3 columns
## [3] Mirna: matrix with 4 rows and 1 columns
## [4] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 2 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
Note that the above operation keeps different numbers of columns / samples from each assay, reflecting the reality that some samples may not have been assayed in all experiments, and may have replicates in some.
Columns can be subset using a logical vector. Here the dollar sign operator ($
) accesses one of the columns in pData
.
malesMultiAssay <- myMultiAssay[, myMultiAssay$sex == "M"]
pData(malesMultiAssay)
## DataFrame with 2 rows and 2 columns
## sex age
## <factor> <integer>
## Jack M 38
## Bob M 40
Finally, for special use cases you can exert detailed control of row or column subsetting, by using a list
or CharacterList
to subset. The following creates a CharacterList
of the column names of each assay:
allsamples <- colnames(myMultiAssay)
allsamples
## CharacterList of length 5
## [["Affy"]] array1 array2 array3 array4
## [["Methyl 450k"]] methyl1 methyl2 methyl3 methyl4 methyl5
## [["Mirna"]] micro1 micro2 micro3
## [["CNV gistic"]] snparray1 snparray2 snparray3
## [["CNV gistic2"]] mysnparray1 mysnparray2 mysnparray3 mysnparray4
Now let’s get rid of three Methyl 450k arrays, those in positions 3, 4, and 5:
allsamples[["Methyl 450k"]] <- allsamples[["Methyl 450k"]][-3:-5]
myMultiAssay[, as.list(allsamples), ]
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 4 columns
## [2] Methyl 450k: matrix with 2 rows and 2 columns
## [3] Mirna: matrix with 4 rows and 3 columns
## [4] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
subsetByColumn(myMultiAssay, as.list(allsamples)) #equivalent
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 4 columns
## [2] Methyl 450k: matrix with 2 rows and 2 columns
## [3] Mirna: matrix with 4 rows and 3 columns
## [4] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
You can select certain assays / experiments using subset, by providing a character, logical, or integer vector. An example using character:
myMultiAssay[, , c("Affy", "CNV gistic")]
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] Affy: ExpressionSet with 2 rows and 4 columns
## [2] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
You can subset assays also using logical or integer vectors:
is.cnv = grepl("CNV", names(experiments(myMultiAssay)))
is.cnv
## [1] FALSE FALSE FALSE TRUE TRUE
myMultiAssay[, , is.cnv] #logical subsetting
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [2] CNV gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
myMultiAssay[, , which(is.cnv)] #integer subsetting
## A MultiAssayExperiment object of 2 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 2:
## [1] CNV gistic: RangedRaggedAssay with 5 rows and 3 columns
## [2] CNV gistic2: RangedSummarizedExperiment with 5 rows and 4 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
Rows of the assays correspond to assay features or measurements, such as genes. Regardless of whether the assay is ID-based (e.g., matrix, ExpressionSet) or range-based (e.g., RangedSummarizedExperiment, RangedRaggedAssay), they can be subset using any of the following:
a character vector of IDs that will be matched to rownames in each assay
an integer vector that will select rows of this position from each assay. This probably doesn’t make sense unless every ExperimentList
element represents the same measurements in the same order and will generate an error if any of the integer elements exceeds the number of rows in any ExperimentList
element. The most likely use of integer subsetting would be as a head
function, for example to look at the first 6 rows of each assay.
a logical vector that will be passed directly to the row subsetting operation for each assay.
a list or CharacterList of the same length as ExperimentList. Each element of the subsetting list will be passed on exactly to subset rows of the corresponding element of the ExperimentList
.
Again, this operation always returns a MultiAssayExperiment
class, unless “drop=TRUE” is passed to the [
backet subset, with any ExperimentList
element not containing the feature having zero rows.
For example, return a MultiAssayExperiment where Affy
and Methyl 450k
contain only “ENST0000035076”" row, and “Mirna” and “CNV gistic” have zero rows (drop
argument is set to FALSE
by default in subsetBy*
):
featSubsetted0 <- subsetByRow(myMultiAssay, "ENST00000355076")
featSubsetted0 <- myMultiAssay["ENST00000355076", , ] #equivalent
class(featSubsetted0)
## [1] "MultiAssayExperiment"
## attr(,"package")
## [1] "MultiAssayExperiment"
class(experiments(featSubsetted0))
## [1] "ExperimentList"
## attr(,"package")
## [1] "MultiAssayExperiment"
experiments(featSubsetted0)
## ExperimentList class object of length 2:
## [1] Affy: ExpressionSet with 1 rows and 4 columns
## [2] Methyl 450k: matrix with 1 rows and 5 columns
In the following, Affy
ExpressionSet keeps both rows but with their order reversed, and Methyl 450k
keeps only its second row.
featSubsetted <-
subsetByRow(myMultiAssay, c("ENST00000355076", "ENST00000294241"))
exprs(experiments(myMultiAssay)[[1]])
## array1 array2 array3 array4
## ENST00000294241 101 103 105 107
## ENST00000355076 102 104 106 108
exprs(experiments(featSubsetted)[[1]])
## array1 array2 array3 array4
## ENST00000355076 102 104 106 108
## ENST00000294241 101 103 105 107
GenomicRanges
For MultiAssayExperiment
objects containing range-based objects (currently RangedSummarizedExperiment
and RangedRaggedAssay
), these can be subset using a GRanges
object, for example:
gr <- GRanges(seqnames = c("chr1"), strand = c("-", "+", "-"),
ranges = IRanges(start = c(1, 4, 6), width = 3))
Now do the subsetting. The function doing the work here is IRanges::subsetByOverlaps
- see its arguments for flexible types of subsetting by range. The first three arguments here are for subset
, the rest passed on to IRanges::subsetByOverlaps
through “…”:
subsetted <- subsetByRow(myMultiAssay, gr, maxgap = 2L, type = "within")
experiments(subsetted)
## ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 0 rows and 4 columns
## [2] Methyl 450k: matrix with 0 rows and 5 columns
## [3] Mirna: matrix with 0 rows and 3 columns
## [4] CNV gistic: RangedRaggedAssay with 1 rows and 3 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 0 rows and 4 columns
Square bracket subsetting can still be used here, but passing on arguments to IRanges::subsetByOverlaps
through “…” is simpler using subsetByRow()
.
subsetByRow
, subsetByColumn
, subsetByAssay
, and square bracket subsetting are all “endogenous” operations, in that they always returns another MultiAssayExperiment
object.
A double-bracket subset operation refers to an experiment, and will return the object contained within an ExperimentList
element. It is not endogenous. For example, the first ExperimentList
element is called “Affy” and contains an ExpressionSet
:
names(myMultiAssay)
## [1] "Affy" "Methyl 450k" "Mirna" "CNV gistic" "CNV gistic2"
myMultiAssay[[1]]
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 2 features, 4 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: array1 array2 array3 array4
## varLabels: slope53
## varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation:
myMultiAssay[["Affy"]]
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 2 features, 4 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: array1 array2 array3 array4
## varLabels: slope53
## varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation:
complete.cases()
returns a logical identifying which primary units have data for all experiments. Recall that myMultiAssay
provides data for four individuals:
pData(myMultiAssay)
## DataFrame with 4 rows and 2 columns
## sex age
## <factor> <integer>
## Jack M 38
## Jill F 39
## Bob M 40
## Barbara F 41
Of these, only Jack has data for all 5 experiments:
complete.cases(myMultiAssay)
## [1] TRUE FALSE FALSE FALSE
But all four have coplete cases for Affy and Methyl 450k:
complete.cases(myMultiAssay[, , 1:2])
## [1] TRUE TRUE TRUE TRUE
This output can be used to select individuals with complete data:
myMultiAssay[, complete.cases(myMultiAssay), ]
## A MultiAssayExperiment object of 5 listed
## experiments with user-defined names and respective classes.
## Containing an ExperimentList class object of length 5:
## [1] Affy: ExpressionSet with 2 rows and 1 columns
## [2] Methyl 450k: matrix with 2 rows and 2 columns
## [3] Mirna: matrix with 4 rows and 1 columns
## [4] CNV gistic: RangedRaggedAssay with 1 rows and 1 columns
## [5] CNV gistic2: RangedSummarizedExperiment with 5 rows and 1 columns
## Features:
## experiments() - obtain the ExperimentList instance
## pData() - the primary/phenotype DataFrame
## sampleMap() - the sample availability DataFrame
## `$`, `[`, `[[` - extract pData columns, subset, or experiment
## reduce() - select complete cases, order columns, disjoin ranges
## rearrange() - convert ExperimentList into a long or wide DataFrame
## assay() - convert ExperimentList to a list of rectangular matrices
The duplicated()
function returns a list of lists, identifying replicate samples for each individual within each assay:
duplicated(myMultiAssay)
## $Affy
## LogicalList of length 0
##
## $`Methyl 450k`
## LogicalList of length 1
## [["Jack"]] TRUE TRUE FALSE FALSE FALSE
##
## $Mirna
## LogicalList of length 0
##
## $`CNV gistic`
## LogicalList of length 1
## [["Jill"]] FALSE TRUE TRUE
##
## $`CNV gistic2`
## LogicalList of length 0
A call to reduce()
by default returns another MultiAssayExperiment where the columns of each element of ExperimentList
correspond exactly to the rows of pData
.
It performs a series of clean-up operations, including merging of replicates with a user-defined function (default: mean), disjoining non-disjoint ranges, select individuals with complete cases and re-order columns of each assay to correspond to the same individual. For example, the first three experiments of myMultiAssay
have three complete cases and some replicates:
colnames(myMultiAssay[, , 1:3])
## CharacterList of length 3
## [["Affy"]] array1 array2 array3 array4
## [["Methyl 450k"]] methyl1 methyl2 methyl3 methyl4 methyl5
## [["Mirna"]] micro1 micro2 micro3
After a reduce()
call, replicates are merged and only complete cases remain:
maeredux <- reduce(myMultiAssay[, , 1:3])
## harmonizing input:
## removing 1 sampleMap rows with 'colname' not in colnames of experiments
It is not obvious from the column names that the ordering of columns is identical:
colnames(maeredux)
## CharacterList of length 3
## [["Affy"]] array1 array4 array3
## [["Methyl 450k"]] methyl1 methyl5 methyl4
## [["Mirna"]] micro1 micro3 micro2
but it is clear from the sample map:
sampleMap(maeredux)
## DataFrame with 9 rows and 3 columns
## assay primary colname
## <factor> <character> <character>
## 1 Affy Jack array1
## 2 Affy Bob array4
## 3 Affy Barbara array3
## 4 Methyl 450k Jack methyl1
## 5 Methyl 450k Bob methyl5
## 6 Methyl 450k Barbara methyl4
## 7 Mirna Jack micro1
## 8 Mirna Bob micro3
## 9 Mirna Barbara micro2
See ?\MultiAssayExperiment-class\\
:
reduce(x, drop.empty.ranges = FALSE,
replicates = NULL, combine = rowMeans, vectorized = TRUE, ...)
Extractor functions convert a MultiAssayExperiment
into other forms that are convenient for analyzing. These would normally be called after any desired subsetting has been performed.
Produces long (default) or wide DataFrame
objects. The following produces a long DataFrame (the default) for the first two assays:
rearrange(myMultiAssay[, , 1:2])
## DataFrame with 18 rows and 5 columns
## assay primary rowname colname value
## <Rle> <Rle> <character> <Rle> <integer>
## 1 Affy Jack ENST00000294241 array1 101
## 2 Affy Jack ENST00000355076 array1 102
## 3 Affy Jill ENST00000294241 array2 103
## 4 Affy Jill ENST00000355076 array2 104
## 5 Affy Barbara ENST00000294241 array3 105
## ... ... ... ... ... ...
## 14 Methyl 450k Jill ENST00000383706 methyl3 6
## 15 Methyl 450k Barbara ENST00000355076 methyl4 7
## 16 Methyl 450k Barbara ENST00000383706 methyl4 8
## 17 Methyl 450k Bob ENST00000355076 methyl5 9
## 18 Methyl 450k Bob ENST00000383706 methyl5 10
This is especially useful for performing regression against patient or sample data from pData
using the pDataCols
argument:
rearrange(myMultiAssay[, , 1:2], pDataCols="age")
## DataFrame with 18 rows and 6 columns
## assay primary rowname colname value age
## <Rle> <Rle> <character> <Rle> <integer> <integer>
## 1 Affy Jack ENST00000294241 array1 101 38
## 2 Affy Jack ENST00000355076 array1 102 38
## 3 Affy Jill ENST00000294241 array2 103 39
## 4 Affy Jill ENST00000355076 array2 104 39
## 5 Affy Barbara ENST00000294241 array3 105 41
## ... ... ... ... ... ... ...
## 14 Methyl 450k Jill ENST00000383706 methyl3 6 39
## 15 Methyl 450k Barbara ENST00000355076 methyl4 7 41
## 16 Methyl 450k Barbara ENST00000383706 methyl4 8 41
## 17 Methyl 450k Bob ENST00000355076 methyl5 9 40
## 18 Methyl 450k Bob ENST00000383706 methyl5 10 40
The “wide” format is useful for calculating correlations or performing regression against different genomic features. Wide format is in general not possible with replicate measurements, so we demonstrate on the reduced MultiAssayExperiment:
rearrange(maeredux, shape="wide", pDataCols="sex")
## DataFrame with 21 rows and 6 columns
## primary rowname sex Affy Methyl.450k Mirna
## <factor> <character> <factor> <numeric> <numeric> <numeric>
## 1 Barbara ENST00000294241 F 105 NA NA
## 2 Barbara ENST00000355076 F 106 7 NA
## 3 Barbara ENST00000383706 F NA 8 NA
## 4 Barbara hsa-miR-148a F NA NA 207
## 5 Barbara hsa-miR-191 F NA NA 206
## ... ... ... ... ... ... ...
## 17 Jack ENST00000383706 M NA 3 NA
## 18 Jack hsa-miR-148a M NA NA 203
## 19 Jack hsa-miR-191 M NA NA 202
## 20 Jack hsa-miR-21 M NA NA 201
## 21 Jack hsa-miR148b M NA NA 204
The assay()
function returns a list of data matrices from the ExperimentList
:
assay(myMultiAssay)
## $Affy
## array1 array2 array3 array4
## ENST00000294241 101 103 105 107
## ENST00000355076 102 104 106 108
##
## $`Methyl 450k`
## methyl1 methyl2 methyl3 methyl4 methyl5
## ENST00000355076 1 3 5 7 9
## ENST00000383706 2 4 6 8 10
##
## $Mirna
## micro1 micro2 micro3
## hsa-miR-21 201 205 209
## hsa-miR-191 202 206 210
## hsa-miR-148a 203 207 211
## hsa-miR148b 204 208 212
##
## $`CNV gistic`
## snparray1 snparray2 snparray3
## chr3:58000000-59502360:+ 5 3 NA
## chr3:58493000-58501999:+ 5 4 NA
## chr3:3-9002:- NA 3 NA
## chr1:1-3:- NA NA 2
## chr2:4-9:- NA NA 6
##
## $`CNV gistic2`
## mysnparray1 mysnparray2 mysnparray3 mysnparray4
## a 6104.882 2789.1225 6267.4714 8373.7987
## b 4999.030 2914.0944 5105.8910 9608.7513
## c 9107.909 3258.4186 790.9714 5606.0981
## d 9484.951 6875.5835 7056.0430 3375.1630
## e 3256.058 737.6586 9739.1178 168.3109
rownames and colnames return a CharacterList
of rownames and colnames across all the assays. A CharacterList
is just an alternative to list
when each element contains a character vector, that provides a nice show method:
rownames(myMultiAssay)
## CharacterList of length 5
## [["Affy"]] ENST00000294241 ENST00000355076
## [["Methyl 450k"]] ENST00000355076 ENST00000383706
## [["Mirna"]] hsa-miR-21 hsa-miR-191 hsa-miR-148a hsa-miR148b
## [["CNV gistic"]] chr3:58000000-59502360:+ ... chr2:4-9:-
## [["CNV gistic2"]] a b c d e
colnames(myMultiAssay)
## CharacterList of length 5
## [["Affy"]] array1 array2 array3 array4
## [["Methyl 450k"]] methyl1 methyl2 methyl3 methyl4 methyl5
## [["Mirna"]] micro1 micro2 micro3
## [["CNV gistic"]] snparray1 snparray2 snparray3
## [["CNV gistic2"]] mysnparray1 mysnparray2 mysnparray3 mysnparray4
Any data classes in the ExperimentList
object must support the following methods:
colnames()
rownames()
[
dim()
Here is what happens if one of the methods doesn’t:
objlist2 <- objlist
objlist2[[2]] <- as.vector(objlist2[[2]])
invalid.obj <- try(MultiAssayExperiment(objlist2, patient.data, dfmap))
invalid.obj
## [1] "Error in if (dim(object)[1] > 0 && is.null(rownames(object))) { : \n missing value where TRUE/FALSE needed\n"
## attr(,"class")
## [1] "try-error"
## attr(,"condition")
## <simpleError in if (dim(object)[1] > 0 && is.null(rownames(object))) { paste(" rownames in", class(object), "are NULL")} else { NULL}: missing value where TRUE/FALSE needed>
The following methods are defined for MultiAssayExperiment
:
methods(class="MultiAssayExperiment")
## [1] [ [[ [[<- $
## [5] $<- assay complete.cases dimnames
## [9] duplicated experiments experiments<- getHits
## [13] isEmpty length metadata metadata<-
## [17] names pData pData<- rearrange
## [21] reduce sampleMap sampleMap<- show
## [25] subsetByAssay subsetByColumn subsetByRow updateObject
## see '?methods' for accessing help and source code
c()
function for adding new assays to existing MultiAssayExperiment
sessionInfo()
## R Under development (unstable) (2017-02-10 r72157)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.3
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] SummarizedExperiment_1.5.6 DelayedArray_0.1.6
## [3] GenomicRanges_1.27.22 GenomeInfoDb_1.11.9
## [5] IRanges_2.9.18 S4Vectors_0.13.14
## [7] Biobase_2.35.0 BiocGenerics_0.21.3
## [9] MultiAssayExperiment_1.1.22 BiocStyle_2.3.30
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.9 plyr_1.8.4
## [3] compiler_3.4.0 XVector_0.15.2
## [5] bitops_1.0-6 tools_3.4.0
## [7] zlibbioc_1.21.0 digest_0.6.12
## [9] tibble_1.2 evaluate_0.10
## [11] gtable_0.2.0 lattice_0.20-34
## [13] Matrix_1.2-8 shiny_1.0.0
## [15] yaml_2.1.14 gridExtra_2.2.1
## [17] GenomeInfoDbData_0.99.0 UpSetR_1.3.2
## [19] stringr_1.1.0 knitr_1.15.1
## [21] rprojroot_1.2 grid_3.4.0
## [23] shinydashboard_0.5.3 R6_2.2.0
## [25] rmarkdown_1.3 tidyr_0.6.1
## [27] reshape2_1.4.2 ggplot2_2.2.1
## [29] magrittr_1.5 backports_1.0.5
## [31] scales_0.4.1 htmltools_0.3.5
## [33] assertthat_0.1 colorspace_1.3-2
## [35] mime_0.5 xtable_1.8-2
## [37] httpuv_1.3.3 stringi_1.1.2
## [39] lazyeval_0.2.0 RCurl_1.95-4.8
## [41] munsell_0.4.3