Package-specific methods

Intro: The Bioconductor project contains analysis packages, which often depend on a number of core packages that provide core classes (S4 classes like eSet, SummarizedExperiment, etc.). When contributing a new package to the project, it is recommended not to duplicate the core classes but instead to extend them. This means that many existing methods can be applied to a new class defined in a new analysis package. (This is good! We don’t have to re-implement [ subsetting every time.)
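A minimal sketch of why extending pays off, using two toy classes and only base R's methods package. CoreData and MyData are hypothetical stand-ins for a core class such as SummarizedExperiment and a new package's own class. The [ and length methods are written once for the parent and work unchanged on the child.

```r
library(methods)

## Hypothetical stand-in for a core class (think SummarizedExperiment)
setClass("CoreData", slots = c(values = "numeric"))

## Subsetting and length are implemented once, for the core class
setMethod("[", "CoreData", function(x, i, j, ..., drop = TRUE) {
  x@values <- x@values[i]   # only the slot changes; x keeps its own class
  x
})
setMethod("length", "CoreData", function(x) length(x@values))

## Hypothetical class from a "new analysis package": it only adds a slot
setClass("MyData", contains = "CoreData", slots = c(label = "character"))

obj <- new("MyData", values = c(10, 20, 30), label = "example")
part <- obj[2:3]      # dispatches to the method defined for CoreData
is(part, "MyData")    # TRUE: the result is still a MyData object
length(part)          # 2
extends("MyData")     # "MyData" "CoreData"
```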

Problem: As far as I can tell, there is no quick and easy way for users to find out which methods are defined for a specific class, restricted to the package that defines the class. By quick and easy, I mean that it should take a few keystrokes, be easy to remember, and output a simple character vector. In Bioconductor, these methods are really the main ones a user would be interested in: the important custom accessors and the typical analysis steps.

Example: The DESeq2 package has a main class DESeqDataSet, which contains the data, metadata, and the results that are added through a typical analysis.

suppressPackageStartupMessages(library(DESeq2))
dds <- makeExampleDESeqDataSet()
class(dds)

## [1] "DESeqDataSet"
## attr(,"package")
## [1] "DESeq2"

The DESeqDataSet class extends the following classes:

extends("DESeqDataSet")

## [1] "DESeqDataSet"               "RangedSummarizedExperiment"
## [3] "SummarizedExperiment0"      "Vector"                    
## [5] "Annotated"

So if I want to know what I can do with this dds object, I can ask:

methods(class="DESeqDataSet")

##   [1] aggregate              anyNA                  <=                    
##   [4] <                      ==                     >=                    
##   [7] >                      !=                     append                
##  [10] as.character           as.complex             as.data.frame         
##  [13] as.env                 as.integer             as.list               
##  [16] as.logical             as.numeric             as.raw                
##  [19] assayNames<-           assayNames             assays<-              
##  [22] assays                 assay<-                assay                 
##  [25] cbind                  coef                   coerce                
##  [28] coerce<-               colData<-              colData               
##  [31] compare                Compare                countOverlaps         
##  [34] counts<-               counts                 coverage              
##  [37] design<-               design                 dimnames<-            
##  [40] dimnames               dim                    disjointBins          
##  [43] dispersionFunction<-   dispersionFunction     dispersions           
##  [46] dispersions<-          distance               distanceToNearest     
##  [49] duplicated             elementMetadata<-      elementMetadata       
##  [52] end<-                  end                    estimateDispersions   
##  [55] estimateSizeFactors    eval                   expand                
##  [58] exptData<-             exptData               extractROWS           
##  [61] findOverlaps           flank                  follow                
##  [64] granges                head                   high2low              
##  [67] %in%                   isDisjoint             is.unsorted           
##  [70] length                 lengths                match                 
##  [73] mcols<-                mcols                  metadata<-            
##  [76] metadata               mstack                 names<-               
##  [79] names                  narrow                 nearest               
##  [82] normalizationFactors<- normalizationFactors   NROW                  
##  [85] order                  overlapsAny            parallelSlotNames     
##  [88] plotDispEsts           plotMA                 precede               
##  [91] promoters              ranges<-               ranges                
##  [94] rank                   rbind                  relist                
##  [97] rename                 rep.int                replaceROWS           
## [100] rep                    resize                 restrict              
## [103] rev                    ROWNAMES               rowRanges             
## [106] rowRanges<-            seqinfo<-              seqinfo               
## [109] seqlevelsInUse         seqnames               shiftApply            
## [112] shift                  showAsCell             show                  
## [115] sizeFactors            sizeFactors<-          sort                  
## [118] split                  split<-                start<-               
## [121] start                  strand<-               strand                
## [124] subsetByOverlaps       subset                 [<-                   
## [127] [                      [[<-                   [[                    
## [130] $<-                    $                      table                 
## [133] tail                   tapply                 trim                  
## [136] unique                 updateObject           values<-              
## [139] values                 width<-                width                 
## [142] window<-               window                 with                  
## [145] xtfrm                 
## see '?methods' for accessing help and source code

But this is a giant list of all the methods that can be applied to the object, not restricted to the methods the package author wrote. Of course, both kinds of information are valuable, but users typically want to know the smaller set.

Martin Morgan pointed me to:

showMethods(classes="DESeqDataSet", where=getNamespace("DESeq2"))

## Function: counts<- (package BiocGenerics)
## object="DESeqDataSet", value="matrix"
## 
## Function: counts (package BiocGenerics)
## object="DESeqDataSet"
## 
## Function: design<- (package BiocGenerics)
## object="DESeqDataSet", value="formula"
## 
## Function: design (package BiocGenerics)
## object="DESeqDataSet"
## 
## Function: dispersionFunction<- (package DESeq2)
## object="DESeqDataSet", value="function"
## 
## Function: dispersionFunction (package DESeq2)
## object="DESeqDataSet"
## 
## Function: dispersions<- (package DESeq2)
## object="DESeqDataSet", value="numeric"
## 
## Function: dispersions (package DESeq2)
## object="DESeqDataSet"
## 
## Function: estimateDispersions (package BiocGenerics)
## object="DESeqDataSet"
## 
## Function: estimateSizeFactors (package BiocGenerics)
## object="DESeqDataSet"
## 
## Function: normalizationFactors<- (package DESeq2)
## object="DESeqDataSet", value="matrix"
## 
## Function: normalizationFactors (package DESeq2)
## object="DESeqDataSet"
## 
## Function: plotDispEsts (package BiocGenerics)
## object="DESeqDataSet"
## 
## Function: plotMA (package BiocGenerics)
## object="DESeqDataSet"
## 
## Function: sizeFactors<- (package BiocGenerics)
## object="DESeqDataSet", value="numeric"
## 
## Function: sizeFactors (package BiocGenerics)
## object="DESeqDataSet"

This is the right information, but it is a lot to type and, in my opinion, too verbose in its output.

I’ve come up with two messy lines of code, which take a class name as input and return a simple character vector of the methods:

intersect(sapply(strsplit(as.character(methods(class="DESeqDataSet")), ","), `[`, 1), ls(attr(findClass("DESeqDataSet")[[1]],"name")))

##  [1] "counts<-"               "counts"                
##  [3] "design<-"               "design"                
##  [5] "dispersionFunction<-"   "dispersionFunction"    
##  [7] "dispersions"            "dispersions<-"         
##  [9] "estimateDispersions"    "estimateSizeFactors"   
## [11] "normalizationFactors<-" "normalizationFactors"  
## [13] "plotDispEsts"           "plotMA"                
## [15] "show"                   "sizeFactors"           
## [17] "sizeFactors<-"
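The first one-liner can be unpacked into a small commented function. This is a sketch under a few assumptions: the name exportedClassMethods is hypothetical, it reads generic names from the "info" attribute that methods() returns (documented in ?methods) instead of splitting strings such as "counts,DESeqDataSet-method", and it intersects with getNamespaceExports() rather than calling ls() on the attached package environment, which should yield the same exported names. The stats4 package, which ships with R, provides an S4 class (mle) to try it on without installing anything.

```r
library(methods)
library(stats4)   # attached only so the example has an S4 class to inspect

## Hypothetical helper: a commented version of the first one-liner.
## Assumes the package defining `cls` is attached, so methods() can see it.
exportedClassMethods <- function(cls, pkg) {
  ## All methods applicable to `cls`, including those from parent classes
  m <- methods(class = cls)
  ## Generic names live in the "info" attribute documented in ?methods,
  ## so there is no need to split the "generic,class-method" strings
  generics <- as.character(attr(m, "info")$generic)
  ## Keep only the generics that the package itself exports
  sort(intersect(generics, getNamespaceExports(pkg)))
}

## stats4 defines the S4 class "mle"
exportedClassMethods("mle", "stats4")
## With DESeq2 attached: exportedClassMethods("DESeqDataSet", "DESeq2")
```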

And another approach:

sub("Function: (.*) \\(package .*\\)","\\1",grep("Function",showMethods(classes="DESeqDataSet", where=findClass("DESeqDataSet")[[1]], printTo=FALSE), value=TRUE))

##  [1] "counts<-"               "counts"                
##  [3] "design<-"               "design"                
##  [5] "dimnames"               "dispersionFunction<-"  
##  [7] "dispersionFunction"     "dispersions<-"         
##  [9] "dispersions"            "estimateDispersions"   
## [11] "estimateSizeFactors"    "names"                 
## [13] "normalizationFactors<-" "normalizationFactors"  
## [15] "plotDispEsts"           "plotMA"                
## [17] "sizeFactors<-"          "sizeFactors"

Maybe there is an even better way? I’m considering defining this as a function and putting it into rafalib, where we have stashed some other convenience functions.
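One possible shape for such a function, wrapping the second approach. This is a hypothetical sketch: pkgMethods is not an existing rafalib function, it assumes the package defining the class is attached so that getClass() can report that package, and it passes inherited = FALSE to showMethods(), which per ?showMethods leaves out methods that were only found by inheritance earlier in the session. That may be where the extra dimnames and names entries in the second output come from, though this is a guess.

```r
library(methods)

## Hypothetical helper (not part of rafalib or any Bioconductor package):
## names of S4 generics that have a method defined directly for `cls`,
## considering only the generics visible from the namespace of `pkg`.
pkgMethods <- function(cls, pkg = getClass(cls)@package) {
  txt <- showMethods(classes = cls,
                     where = getNamespace(pkg),
                     ## skip methods merely inherited from a parent class
                     inherited = FALSE,
                     ## return the text as a character vector instead of printing
                     printTo = FALSE)
  ## keep the "Function: name (package ...)" header lines and extract the name
  hdr <- grep("^Function: ", txt, value = TRUE)
  sort(unique(sub("^Function: (.*) \\(package .*\\)$", "\\1", hdr)))
}

## Try it on the S4 class "mle" from stats4, which ships with R
library(stats4)
pkgMethods("mle")
## With DESeq2 attached: pkgMethods("DESeqDataSet")
```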

If you have feedback, you can reply to me @mikelove

Mike Love

April 18, 2016