This post aims to give some insight into object oriented programming in R for people who have experience with other, more standard object oriented programming languages like Python, Java, C++, PHP, Ruby, Swift, you name it. If you have no idea what object oriented programming is, this might not be for you. Be warned, though: what R calls a ‘class’ has little to do with the mainstream idea of classes.
Object oriented programming in R can be a confusing topic. There are many different ways to program what are traditionally referred to as objects in R. The ways built into R to create S3 and S4 ‘classes’ and Reference classes are described well in Advanced R, and there are also several third-party packages for other approaches, for example R6 classes.
If you have done object oriented programming before (in Java or Python, for example), you are probably better off not thinking of S3 and S4 ‘classes’ as classes as you know them. S3 and S4 are really just a way to implement polymorphism for free-standing (not object-bound) functions: they provide calling conventions for generic functions (like print and plot) that should do different things when called on different data structures.
Reference and R6 classes are more like traditional classes, but a little un-R-like. They are certainly well suited for special tasks, but I would suggest you stop looking for classes in R as a way to create your own data structures. Instead, I think it is best to embrace the functional programming style, which is easier to debug and well suited for data science. The most natural way to structure data in R is with lists and environments, which are highly customizable data structures for which you can write constructor functions to create class-like structures. Combined with the fact that functions are first class objects and the ability to dispatch generic S3 functions for common functionality, you will have all you need for most use cases.
If you really need inheritance and more complicated polymorphism, you probably want to look into S4, Reference or R6 classes, but only after you have understood the concepts of environments and generic S3 functions. You might also want to reconsider whether Python isn’t the better language for your task.
In the following, I introduce how to create customized list objects and class-like structures with constructor functions. I will then briefly show how generic S3 functions work, and then get to the powerful concept of environments and how you can create more traditional object-like behaviour with them. Finally, I will highlight some cool possibilities for creating promises and computed attributes for your environment objects.
The idea of an object is really just to bundle data and corresponding methods together. Lists in R are well suited to implement this, since they can contain different data types, even functions, which are first class objects that can be assigned or returned like any other. In fact, we can literally create objects of a new class in R by taking a list and simply setting the class attribute of the list to a new value.
# create an object of myClass...
my_object <- list(x=5, get_x=function() "x was 5")
class(my_object) <- "myClass"
# ...which has classy behaviour
class(my_object)
## [1] "myClass"
my_object$x
## [1] 5
my_object$get_x()
## [1] "x was 5"
It is that easy to create a new class. But it can also lead to very confusing and hard-to-debug code when not used with great care. S4 provides some protection against this, but it also introduces a lot of verbosity, so you can probably get by without it if you don’t have complicated inheritance structures. So just take care and you’ll be fine; this is also very true for environments. As Hadley Wickham said:
R doesn’t protect you from yourself: you can easily shoot yourself in the foot. As long as you don’t aim the gun at your foot and pull the trigger, you won’t have a problem. (Hadley Wickham, Advanced R)
OK, back to the topic. We have created an object of our own class, but we really want to define that class somehow so we don’t shoot ourselves in the foot. All we need to do for that is create a constructor function, which is just a normal function that returns a list.
# a constructor for myClass...
myClass <- function(x){
  structure(class = "myClass", list(
    # attributes
    x = x,
    # methods
    get_x = function() paste("x was", x)
  ))
}
# ...which creates classy objects
my_object <- myClass(7)
class(my_object)
## [1] "myClass"
my_object$get_x()
## [1] "x was 7"
To avoid shooting yourself in the foot, you should only ever set the class attribute in such constructor functions, which ensure that there is something like a class definition.
While this is a nice way to create custom list structures (a bit like structs in C), something obvious is missing. Methods created this way only see the value of x captured when the object was constructed; they don’t see later changes to the object’s attributes, which defeats the purpose in many situations.
# you may reassign attributes of myClass objects...
my_object$x <- 9
paste("x is", my_object$x)
## [1] "x is 9"
# ...but the get_x function will never know
my_object$get_x()
## [1] "x was 7"
There are two ways to fix this. The first is to give up the object$method(...) notation and simply write methods as functions that are called like method(object, ...). You should consider this option if all you want is something like a struct and a few methods that operate on those structs. This approach keeps things functional, which improves the readability and safety of your code, so think twice before you abandon it. A typical case where you do want to avoid it is when your objects get very large and would be slow to copy around.
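As a minimal sketch of the first option, here is a struct-like list with plain functions operating on it. The class name point and the functions new_point, get_x and set_x are hypothetical, chosen only for illustration. Note that set_x returns a modified copy rather than changing the object in place.

```r
# hypothetical constructor for a plain struct-like list
new_point <- function(x, y) {
  structure(class = "point", list(x = x, y = y))
}

# "methods" are ordinary functions that take the object as first argument
get_x <- function(point) point$x

set_x <- function(point, value) {
  point$x <- value  # modifies the function's local copy...
  point             # ...which is returned to the caller
}

p <- new_point(7, 1)
p2 <- set_x(p, 9)
get_x(p)   # the original is unchanged
## [1] 7
get_x(p2)
## [1] 9
```

Because the caller has to keep the returned value, nothing changes behind anyone's back, which is what makes this style easy to reason about.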
To sweeten the deal a little, R allows you to overload functions by using so-called S3 generics. You can keep your function names simple by defining different behaviours for one function depending on the class of the object it is called with. Simply append .myClass to the function name to redefine it for myClass objects.
# let's customize the summary, for example
summary.myClass <- function(object, ...){
  cat("myClass object with x =", object$x, "\n")
}
summary(my_object)
## myClass object with x = 9
Most of the classes that ship with base R, such as factor, data.frame or lm, are implemented using S3, and many built-in functions like summary are S3 generics. That’s why we could so easily overwrite summary. If a function is not already defined as an S3 generic, we can create our own generic function using the UseMethod function, which handles the method dispatch automatically.1
# we create the generic dispatcher and a default...
is_mine <- function(object) UseMethod("is_mine")
is_mine.default <- function(object) FALSE
# ...then we overwrite the behaviour for myClass objects...
is_mine.myClass <- function(object) TRUE
# ...now different functions are called on objects of different classes
is_mine(4)
## [1] FALSE
is_mine(my_object)
## [1] TRUE
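The class attribute can also hold several classes, and NextMethod hands over to the method for the next class in that vector. A small sketch, where mySubClass and describe are hypothetical names used only for illustration:

```r
# hypothetical constructor whose class vector lists a "parent" class second
mySubClass <- function(x){
  structure(class = c("mySubClass", "myClass"), list(x = x))
}

describe <- function(object, ...) UseMethod("describe")
describe.default <- function(object, ...) "some object"
describe.myClass <- function(object, ...){
  paste("a myClass object with x =", object$x)
}
describe.mySubClass <- function(object, ...){
  # NextMethod() dispatches to the method for the next class in the
  # class vector, here describe.myClass (a bit like super)
  paste("a mySubClass object, which is", NextMethod())
}

describe(mySubClass(3))
## [1] "a mySubClass object, which is a myClass object with x = 3"
describe(42)
## [1] "some object"
```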
The second way to fix the method-attribute-access problem is to make use of R’s lexical scoping and the deep assignment operator <<- (which behaves roughly like assign with inherits = TRUE). It doesn’t create a local variable; instead it searches for the given variable name by walking up the enclosing environments and assigns the new value where it finds it (or in the global environment if the name isn’t found anywhere).
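To see why the deep assignment operator matters, compare it with an ordinary assignment inside a closure. This is a hypothetical counter (make_counter is an illustrative name), not part of the myClass example:

```r
make_counter <- function(){
  count <- 0
  list(
    # `<-` creates a new local `count` inside the method,
    # so the enclosing `count` is never updated
    broken_increment = function(){
      count <- count + 1
      count
    },
    # `<<-` walks up to the enclosing (constructor) environment
    # and updates the `count` it finds there
    increment = function(){
      count <<- count + 1
      count
    }
  )
}

counter <- make_counter()
counter$broken_increment()
## [1] 1
counter$broken_increment()
## [1] 1
counter$increment()
## [1] 1
counter$increment()
## [1] 2
```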
# here's a funky constructor using lexical scoping...
myClass <- function(x){
# attributes
self.x <- x
structure(class = "myClass", list(
# methods
set_x = function(y) self.x <<- y,
get_x = function() self.x
))
}
# ...which creates classy objects...
my_object <- myClass(7)
my_object$get_x()
## [1] 7
# ...that can have their attributes reassigned...
my_object$set_x(9)
my_object$get_x()
## [1] 9
# ...but only with methods
my_object$x
## NULL
You might like this approach if you are into writing methods for everything like in Java, if your performance at work is measured in lines of code written, or if you think that getters and setters make everything so much safer (but you don’t want to embrace functional programming). Whatever the reason, I find it a little repulsive to write such verbose code; this is not Java but R after all, a language that is likable because of its brevity and simplicity. A better solution might be to just use environments directly.
So let’s reflect for a second on what happened in our last constructor: it created the variable self.x in the function’s own environment. Both methods are closures defined inside that environment, so the getter finds self.x there through lexical scoping, and the setter’s deep assignment operator walks up to that same environment and modifies the variable there.
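One way to check this reasoning is to inspect the closures directly: environment() returns the enclosing environment of a function. This sketch repeats the constructor so it runs on its own; shared_env is just an illustrative variable name.

```r
# the lexical-scoping constructor from above, repeated so this runs on its own
myClass <- function(x){
  self.x <- x
  structure(class = "myClass", list(
    set_x = function(y) self.x <<- y,
    get_x = function() self.x
  ))
}

my_object <- myClass(7)
# both methods are closures over the same constructor environment...
shared_env <- environment(my_object$get_x)
identical(shared_env, environment(my_object$set_x))
## [1] TRUE
# ...and that environment is where self.x actually lives
shared_env$self.x
## [1] 7
my_object$set_x(9)
shared_env$self.x
## [1] 9
```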
Environments are actually first class data structures in R. They behave pretty much like lists (but, unlike lists, they can use a hash table to look up their names). We can create a new environment with the new.env function and then assign and retrieve values just like list elements.
# we can create environments...
MyEnv <- new.env()
# ...and assign values to their variables...
MyEnv$a <- 5
MyEnv$b <- TRUE
# ...and retrieve them, just like with lists...
MyEnv$a
## [1] 5
MyEnv[["b"]]
## [1] TRUE
# ...and even loop through them easily
for( el in ls(MyEnv) ) print(MyEnv[[el]])
## [1] 5
## [1] TRUE
The main difference between environments and lists is that environments have reference semantics: assigning an environment to another variable or passing it to a function doesn’t create a copy but just creates another reference to the same environment. (The values stored in the environment still follow R’s usual copy-on-modify behaviour, of course.) You should really read the section on environments in Advanced R (http://adv-r.had.co.nz/Environments.html) to learn more.
# we can try to copy an environment...
MyEnv_copy <- MyEnv
# ...but it will just create another pointer
MyEnv$a <- 10
MyEnv_copy$a
## [1] 10
# you also can't use == to compare, but need the identical function
identical(MyEnv_copy, MyEnv)
## [1] TRUE
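The same holds when an environment is passed to a function. In this sketch the helper bump_a is hypothetical; it tries to change its argument, which only has an effect outside the function for the environment:

```r
# hypothetical helper that tries to modify its argument in place
bump_a <- function(container){
  container$a <- container$a + 1
  invisible(NULL)
}

as_list <- list(a = 1)
as_env <- new.env()
as_env$a <- 1

bump_a(as_list)
as_list$a  # the function only changed its local copy of the list
## [1] 1
bump_a(as_env)
as_env$a   # the function changed the one shared environment
## [1] 2
```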
Just like we did before with lists, we can also set the class attribute of environments. This gives us an easy way to create objects that behave like we know them from other languages: we simply return a reference to a function’s own environment from that function. Inside the constructor, the environment function gives us this reference to the current environment.
# with environments we get classes...
MyClass <- function(x){
x <- x-1
get_x <- function() x
structure(class="MyClass", environment())
}
# ...that are truly classy
MyObject <- MyClass(3)
MyObject$x
## [1] 2
MyObject$x <- 5
MyObject$get_x()
## [1] 5
This is a very elegant way to create object oriented constructs in R, especially when we have very large objects where assignment by reference is actually desired. Of course, we can also create S3 generic methods for such environment objects2, and it is good practice to decide for each method whether it should be an S3 generic or a class method.
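As a sketch of both kinds of S3 methods for the environment-based MyClass (the constructor is repeated so the block runs on its own): a method for the existing summary generic, and a hypothetical new generic increment whose method changes the object in place without returning it, thanks to reference semantics.

```r
# the environment-based constructor from above, repeated so this runs on its own
MyClass <- function(x){
  x <- x-1
  get_x <- function() x
  structure(class = "MyClass", environment())
}

# an S3 method for a generic that already exists
summary.MyClass <- function(object, ...){
  cat(sprintf("MyClass object with x = %s\n", object$x))
  invisible(object)
}

# a hypothetical new generic; its method modifies the shared environment
increment <- function(object, ...) UseMethod("increment")
increment.MyClass <- function(object, by = 1, ...){
  object$x <- object$x + by
  invisible(object)
}

MyObject <- MyClass(3)
summary(MyObject)
## MyClass object with x = 2
increment(MyObject)
MyObject$get_x()
## [1] 3
```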
Of course, this returns all the variables of the function environment. If you want to avoid that, you can select the ones you want at the end, add them to a new environment and return a reference to that environment. But I would actually advise you to simply avoid helper variables in your constructors and move that work into helper functions defined outside of the constructor.
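A minimal sketch of the select-and-return variant, assuming a hypothetical helper variable offset: the constructor copies only the public members into a fresh environment. The helper still exists in the enclosing environment of get_x, but it is no longer reachable via MyObject$.

```r
MyClass <- function(x){
  # a hypothetical helper variable we don't want to expose
  offset <- 1
  # build a fresh environment holding only the public members
  self <- new.env()
  self$x <- x - offset
  # get_x reads x from the public environment, so reassignments are seen
  self$get_x <- function() self$x
  structure(class = "MyClass", self)
}

MyObject <- MyClass(3)
ls(MyObject)
## [1] "get_x" "x"
MyObject$x <- 5
MyObject$get_x()
## [1] 5
```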
As a little bonus, you can also create promises and computed attributes in R, just like in Swift. They are a great tool for saving memory on properties that are rarely needed, and for replacing methods that take no arguments with attributes.
It’s easiest to load the pryr package, which gives you the %<d-% operator, a wrapper around the delayedAssign function. Then you can easily assign a variable as a promise, which will be lazily evaluated: its value is only computed once it is first requested. You should use this, for example, for attributes which take time to compute but are not needed when your object is initialized.
library(pryr)
# this variable is set instantly...
system.time(d %<d-% {Sys.sleep(1)})["elapsed"]
## elapsed
## 0
# ...even though it takes a second to compute
system.time(d)["elapsed"]
## elapsed
## 1.005
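Since %<d-% wraps delayedAssign, the same lazy attribute can be built with base R alone. A sketch of an environment object with a lazily computed total, where LazyObject, evaluations and total are hypothetical names; the evaluations counter only exists to show that the promise runs once:

```r
LazyObject <- function(n){
  self <- environment()
  evaluations <- 0
  # base R: bind `total` to a promise in this environment; the braces
  # are only evaluated the first time `total` is accessed
  delayedAssign("total", {
    self$evaluations <- self$evaluations + 1
    sum(seq_len(n))
  })
  structure(class = "LazyObject", self)
}

obj <- LazyObject(100)
obj$evaluations
## [1] 0
obj$total
## [1] 5050
obj$total
## [1] 5050
obj$evaluations
## [1] 1
```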
Similarly, we can use the %<a-% operator, which wraps the makeActiveBinding function. This creates a variable for which the right-hand side of the assignment is evaluated every time the variable is accessed. This is useful for turning methods that don’t take any arguments, but need to compute something new every time, into attributes.
# we can set a variable...
x %<a-% sample(0:99, 1)
# ...which will be recalculated every time
x
## [1] 20
x
## [1] 56
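Likewise, makeActiveBinding can be called directly to give an environment object a computed attribute. In this sketch the Rectangle constructor and its area attribute are hypothetical. Because the binding function takes no argument, assigning to rect$area would raise an error, so area is effectively read-only.

```r
Rectangle <- function(width, height){
  self <- environment()
  # base R: `area` is an active binding, so this function is called
  # every time `area` is read; width and height are found lexically
  makeActiveBinding("area", function() width * height, self)
  structure(class = "Rectangle", self)
}

rect <- Rectangle(2, 3)
rect$area
## [1] 6
rect$width <- 10
rect$area
## [1] 30
```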
Such lazily and actively evaluated attributes can be very handy to reduce the number of methods your objects have, which can improve usability, and lazy attributes can also save time by skipping work that is never needed.
1. This works by building a vector of candidate method names from the generic’s name and the class attribute (for example is_mine.myClass), falling back to the default method. The class attribute can actually also be a vector, which allows you to create some kind of inheritance structure in which you can call NextMethod to invoke the parent class method; think of it like a weird super.
2. We could also create S3 generic functions inside environments and use NextMethod to create some kind of inheritance.