Reference classes in R are very useful for some situations, but they don’t come for free. In this document, I’ll explore the costs in memory and speed of standard R reference classes vs. other reference objects which are created in different ways.

I won’t cover the other major cost of reference classes, which is the complexity and fickleness of using S4 (since reference classes are built on S4).

# Some setup stuff
library(microbenchmark)
library(pryr)
library(testclasses)  # For Winston's simple reference classes 

Class definitions

Here are a number of ways of creating reference objects in R, starting with the most complicated (standard R reference class) and ending with the simplest (an environment created by a closure).

Standard R reference class

A_rc <- setRefClass("A_rc", 
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 1) .self$x <<- x,
    inc = function(n = 1) x <<- x + n
  )
)

Winston’s simple reference class

This is a simpler implementation of reference objects, from the testclasses package.

B_wrc <- createRefClass("B_wrc",
  members = list(
    x = NULL,
    initialize = function(x = 1) self$x <<- x,
    inc = function(n = 1) x <<- x + n
  )
)

Objects of this type also have an automatically created self member:

print(B_wrc$new())
## B_wrc object
##   inc: function
##   initialize: function
##   self: environment
##   x: 1

Winston’s simple reference class 2, with public and private members

This is a variant on the previous type of reference class, but this version has public and private members.

C_wrc_priv <- createRefClass2("C_wrc_priv",
  private = list(x = NULL),
  public = list(
    initialize = function(x = 1) private$x <<- x,
    inc = function(n = 1) x <<- x + n
  )
)

Instead of a single self object that refers to every item in the object, these objects have separate public and private environments.

print(C_wrc_priv$new())
## <C_wrc_priv>
##   Public:
##     inc: function
##     initialize: function
##     private: environment
##     public: environment
##   Private:
##     private: environment
##     public: environment
##     x: 1

Environment created by a closure, with class attribute

This is simply an environment with a class attached to it.

D_closure_class <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  structure(environment(), class = "D_closure")
}

This doesn’t have a self member, although that could be added.

Even though x isn’t declared in the function body, it gets captured because it’s an argument to the function.

# Roundabout way to print the contents of a D object
str(as.list.environment(D_closure_class()))
## List of 2
##  $ inc:function (n = 1)  
##   ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 2 10 2 36 10 36 2 2
##   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fb79b1d91f0> 
##  $ x  : num 1
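
As a minimal sketch of how a self member could be added to this kind of object: the constructor name D_closure_self is hypothetical and not part of the original benchmarks. The function stores its own execution environment under the name self, so methods can refer to the object explicitly.

```r
# Hypothetical variant of D_closure_class that also exposes a `self` member.
D_closure_self <- function(x = 1) {
  self <- environment()        # the execution environment *is* the object
  inc <- function(n = 1) {
    self$x <- self$x + n       # modify the field through `self`
    invisible(self)            # return the object so calls can be chained
  }
  structure(self, class = "D_closure")
}

obj <- D_closure_self()
obj$inc()$inc(5)               # chained calls: 1 + 1 + 5
obj$x
identical(obj$self, obj)       # `self` points back at the object itself
```

Returning self invisibly is a design choice made here for chaining; the inc() methods in the benchmarked classes return the new value of x instead.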

Environment created by a closure, without class attribute

This is the simplest type of reference object:

E_closure_noclass <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  environment()
}
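
A short usage sketch may help show why this counts as a reference object. Assigning the environment to a second name does not copy it, so a change made through one name is visible through the other. The names e1 and e2 are only for illustration.

```r
E_closure_noclass <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  environment()
}

e1 <- E_closure_noclass()
e2 <- e1            # no copy is made: both names refer to the same environment
e2$inc(10)          # modify the object through the second name
e1$x                # the change is visible through the first name
ls(e1)              # the object's members: the method and the captured argument
```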

Tests

For all the timings using microbenchmark(), the results are reported in microseconds, and the most useful value is probably the median column.

Memory footprint

How much memory does a single instance of each object take, and how much memory does each additional object take?

# Utility functions for calculating sizes
obj_size <- function(newfun) {
  data.frame(
    one = as.numeric(object_size(newfun())),
    incremental = as.numeric(object_size(newfun(), newfun()) - object_size(newfun()))
  )
}

obj_sizes <- function(...) {
  dots <- list(...)
  sizes <- Map(obj_size, dots)
  do.call(rbind, sizes)
}

Sizes of each type of object, in bytes:

obj_sizes(
  A_rc = A_rc$new,
  B_wrc = B_wrc$new,
  C_wrc_priv = C_wrc_priv$new,
  D_closure_class = D_closure_class,
  E_closure_noclass = E_closure_noclass
)
##                      one incremental
## A_rc              470664        1368
## B_wrc              11080         616
## C_wrc_priv         11744         840
## D_closure_class     9304         400
## E_closure_noclass   8232         344

It looks like the standard reference class object takes up a huge amount of memory, but much of that is shared between reference classes. Adding another object from a different reference class doesn’t require much more memory, about 37 KB:

A_rc2 <- setRefClass("A_rc2",
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 2) .self$x <<- x,
    inc = function(n = 2) x <<- x * n
  )
)

# Size of a new A_rc2 object, over and above an A_rc object
as.numeric(object_size(A_rc$new(), A_rc2$new()) - object_size(A_rc$new()))
## [1] 37688

Object instantiation speed

How much time does it take to create one of these objects?

# Garbage collect now so that we (probably) won't do it in the middle of a run
invisible(gc())
speed <- microbenchmark(
  A_rc = A_rc$new(),
  B_wrc = B_wrc$new(),
  C_wrc_priv = C_wrc_priv$new(),
  D_closure_class = D_closure_class(),
  E_closure_noclass = E_closure_noclass(),
  unit = "us"
)
speed
## Unit: microseconds
##               expr     min      lq  median      uq      max neval
##               A_rc 321.118 337.682 362.034 470.760 1662.109   100
##              B_wrc  44.832  47.737  53.237  62.462  114.351   100
##         C_wrc_priv  61.043  67.573  74.205  87.382  249.420   100
##    D_closure_class  10.976  13.198  16.269  22.483   31.282   100
##  E_closure_noclass   0.717   1.329   1.621   2.185    4.073   100

Standard R reference classes are much slower to instantiate than the other types of classes, with a median of 0.362 milliseconds per instantiation.

D is slower than E, even though the only difference between them is that the D object has a class attribute. Most of the extra time appears to come from the call to structure() that is made when a D object is created:

microbenchmark(
  structure = structure(1, class = "foo"),
  unit = "us"
)
## Unit: microseconds
##       expr   min    lq median    uq   max neval
##  structure 9.148 9.614  9.953 10.34 59.95   100
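
If the structure() call is the bottleneck, one option is to set the class with the class<- replacement function instead. The sketch below is an assumption, not something measured in this document: the name D_closure_class2 is hypothetical, and whether it is actually faster on a given R version would need to be checked with microbenchmark(). Note that it also leaves a self member in the object.

```r
# Hypothetical variant: attach the class without calling structure().
D_closure_class2 <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  self <- environment()
  class(self) <- "D_closure"   # sets the attribute on the environment itself
  self
}

obj2 <- D_closure_class2()
obj2$inc()
class(obj2)
obj2$x
```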

Method call speed

How much overhead is there when calling a method from one of these objects?

A <- A_rc$new()
B <- B_wrc$new()
C <- C_wrc_priv$new()
D <- D_closure_class()
E <- E_closure_noclass()

invisible(gc())
microbenchmark(
  A_rc = A$inc(),
  B_wrc = B$inc(),
  C_wrc_priv = C$inc(),
  D_closure_class = D$inc(),
  E_closure_noclass = E$inc(),
  unit = "us"
)
## Unit: microseconds
##               expr    min     lq median     uq    max neval
##               A_rc 37.144 68.303 70.445 74.675 391.70   100
##              B_wrc  2.690  5.319  6.369  7.215  13.05   100
##         C_wrc_priv  2.537  3.410  6.071  7.313  21.81   100
##    D_closure_class  2.093  4.235  5.184  5.813  14.77   100
##  E_closure_noclass  0.639  1.358  1.723  2.034  10.38   100

As expected, standard reference classes are the slowest by a large margin. The difference between D and E is probably due to S3 method lookup for the $ function: because D has a class attribute, R has to check whether a $ method exists for that class (for example, a $.myclass method for myclass objects) before falling back to the default behavior.
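
To make the dispatch visible, here is a small sketch with a made-up class. The class name myclass and the method body are purely illustrative; the point is only that R calls the method, when one exists, for objects that carry the class attribute.

```r
obj <- structure(new.env(), class = "myclass")
assign("x", 10, envir = obj)

# Hypothetical `$` method, defined only to show that dispatch takes place.
`$.myclass` <- function(e, name) {
  message("dispatching `$` for member: ", name)
  get(name, envir = e, inherits = FALSE)
}

val <- obj$x    # goes through `$.myclass`, which prints the message
val
```

Even when no such method is defined, R still has to look for one, which is the overhead discussed above.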

When we manually remove the class from the B object, it performs on par with E.

# Really create a reference object without a class
B2 <- B_wrc$new()
class(B2) <- NULL
microbenchmark(
  B_wrc_noclass = B2$inc(),
  E_closure_noclass = E$inc(),
  unit = "us"
)
## Unit: microseconds
##               expr   min    lq median     uq   max neval
##      B_wrc_noclass 0.594 0.639  0.716 0.8245 16.91   100
##  E_closure_noclass 0.570 0.621  0.690 0.8485 34.30   100

Overhead from using self

With standard reference class objects, you can modify fields using the <<- operator, or by using the .self object. For example, compare the inc() methods of these two classes:

rc_no_self <- setRefClass("rc_no_self", 
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 1) .self$x <<- x,
    inc = function(n = 1) x <<- x + n
  )
)

rc_self <- setRefClass("rc_self", 
  fields = list(x = "numeric"),
  methods = list(
    initialize = function(x = 1) .self$x <<- x,
    inc = function(n = 1) .self$x <- x + n
  )
)

Winston’s reference classes are similar, except they use self instead of .self:

wrc_no_self <- createRefClass("wrc_no_self",
  members = list(
    x = 1,
    inc = function(n = 1) x <<- x + n
  )
)

wrc_self <- createRefClass("wrc_self",
  members = list(
    x = 1,
    inc = function(n = 1) self$x <- self$x + n
  )
)
rc_no_self_obj <- rc_no_self$new()
rc_self_obj <- rc_self$new()
wrc_no_self_obj <- wrc_no_self$new()
wrc_self_obj <- wrc_self$new()

invisible(gc())
microbenchmark(
  rc_no_self = rc_no_self_obj$inc(),
  rc_self = rc_self_obj$inc(),
  wrc_no_self = wrc_no_self_obj$inc(),
  wrc_self = wrc_self_obj$inc(),
  unit = "us"
)
## Unit: microseconds
##         expr    min     lq median     uq     max neval
##   rc_no_self 38.580 40.044 41.285 47.610 262.523   100
##      rc_self 61.047 64.189 66.593 74.271 151.730   100
##  wrc_no_self  2.819  3.567  4.181  4.717   9.865   100
##     wrc_self  8.509  9.752 10.835 11.648  62.110   100

Using .self or self adds some overhead, which makes sense when you consider how R searches for objects.

When the method refers to x without using .self, R looks for x in the method’s execution environment and doesn’t find it there, so it moves on to the enclosing (parent) environment, which is the object environment. It finds x there, and <<- assigns the new value in that environment.

When using .self, R searches for .self in the function’s execution environment but doesn’t find it there, so it looks in the parent environment (which also happens to be the object environment, as well as the environment that .self points to) and does find it there. Then it looks in the .self environment for x, and assigns the value.
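
The relationship between a method and its object can be checked directly for the closure-based objects. This sketch repeats the E_closure_noclass definition so that it runs on its own; it shows that the enclosing environment of inc() is the object itself, which is why a lookup that fails in the execution environment finds x one level up.

```r
E_closure_noclass <- function(x = 1) {
  inc <- function(n = 1) x <<- x + n
  environment()
}

E <- E_closure_noclass()

# The method's enclosing environment is the object environment.
identical(environment(E$inc), E)

# x lives in the object environment itself, not in one of its parents.
exists("x", envir = E, inherits = FALSE)

E$inc()
E$x
```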

Lists vs. environments, and S3 object access overhead

This compares member access time for lists vs. environments, with and without a class attribute. If the object has a class attribute, R will use method lookup for $, which adds overhead.

list_noclass <- list(x = 10)
list_class <- structure(list(x = 10), class = "foo")
env_noclass <- new.env()
env_noclass$x <- 10
env_class <- structure(new.env(), class = "foo")
env_class$x <- 10

invisible(gc())
microbenchmark(
  list_noclass = list_noclass$x,
  list_class = list_class$x,
  env_noclass = env_noclass$x,
  env_class = env_class$x,
  unit = "us"
)
## Unit: microseconds
##          expr   min     lq median    uq    max neval
##  list_noclass 0.200 0.2365 0.3095 0.363  6.841   100
##    list_class 1.504 1.6105 1.7200 1.841 36.184   100
##   env_noclass 0.201 0.2390 0.3175 0.367 10.274   100
##     env_class 1.551 1.6250 1.6990 1.772 24.421   100
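
The timings above show that lists and environments cost about the same to read from, but the two behave differently when modified. The sketch below illustrates this; the helper bump() is hypothetical. A list is copied when a function changes it, while an environment is changed in place.

```r
# Hypothetical helper: add 1 to member x of whatever is passed in.
bump <- function(obj) {
  obj$x <- obj$x + 1
  invisible(NULL)
}

l <- list(x = 10)
e <- new.env()
e$x <- 10

bump(l)   # modifies a local copy of the list only
bump(e)   # modifies the environment itself

l$x       # unchanged
e$x       # incremented
```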

Wrap-up

Standard reference class objects take more memory and are slower than other, simpler types of reference objects. They do provide additional features, such as type checking of fields and class inheritance, but these aren’t, in my opinion, enough to offset the performance penalty and especially the pain of dealing with S4 (which reference classes are built on). Type checking of fields is only a minor benefit, and it would be simple to add inheritance to Winston’s reference class implementation. It would also be simple to implement private and public members, though there would be a small performance penalty.
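
As a rough illustration of why inheritance is not hard to add to environment-based objects, here is a sketch using plain closures. It is not the testclasses API, and the names Counter, DoubleCounter, parent_inc and reset are all hypothetical. The child constructor builds a parent object, then overrides one method and adds another.

```r
# Hypothetical parent "class": a closure-based counter.
Counter <- function(x = 1) {
  self <- environment()
  inc <- function(n = 1) {
    x <<- x + n
    invisible(self)
  }
  self
}

# Hypothetical child "class": reuses the parent's object and methods.
DoubleCounter <- function(x = 1) {
  self <- Counter(x)                 # start from a parent object
  parent_inc <- self$inc             # keep a handle on the parent method
  self$inc <- function(n = 1) parent_inc(2 * n)   # override inc()
  self$reset <- function() {         # add a new method
    self$x <- 0
    invisible(self)
  }
  self
}

d <- DoubleCounter(1)
d$inc()
after_inc <- d$x     # 1 + 2 * 1
d$reset()
after_reset <- d$x
c(after_inc, after_reset)
```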

Another drawback of standard R reference class objects is that it’s not entirely clear how they work, as many advanced R developers can attest. The other types of reference objects used here are simple and can be understood completely, as long as one understands how environments work.

Appendix

sessionInfo()
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-apple-darwin13.1.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] testclasses_0.1      pryr_0.1             microbenchmark_1.3-0
## 
## loaded via a namespace (and not attached):
## [1] codetools_0.2-8  evaluate_0.5.3   formatR_0.10     knitr_1.5.33    
## [5] Rcpp_0.11.1      rmarkdown_0.1.95 stringr_0.6.2    tools_3.1.0     
## [9] yaml_2.1.11