Functional Object-Oriented Programming in R

Justin Taylor
09/23/2016

Basis

What is Object-Oriented Programming (OOP)?

  • A programming paradigm where data structures are modeled as objects with state (attributes) and behavior (methods).

Why learn OOP in R?

  • Write reusable, organized code
  • Handle more complex data
  • Improvements in troubleshooting
  • Understanding code in third-party packages

How is OOP implemented in R?

  • Generic-function calls specific method based on input
  • Three distinct systems: S3, S4, and Reference

Outline

  1. R Functional OO Philosophy
  2. S3 OO System
  3. S4 OO System

Packages

library(magrittr) # pipe
library(pryr)     # OO tools
library(methods)  # S4 functions
library(ggplot2)
library(foreach)  # functional construct
set.seed(112020)

Functional and Object-Oriented Programming Introduction

OOP Philosophy

  • Structure data to accomodate computation
  • Everything is an object, and methods are functions that define relationship between objects

car-display.jpg

http://www.civicx.com/attachments/37-2016-honda-civic-fd-1-jpg.5641/

OO Classes

  • Classes are blueprints that define attributes

fallingwaterbp

Fallingwater Floorplan (Arsenalbubs: Creative Commons CC0 1.0 Universal Public Domain Dedication), http

Inheritance

  • Child classes have attributes and methods of the parent
  • General to more specific
    • Vehicle \( \rightarrow \) Car \( \rightarrow \) Civic
    • Flora \( \rightarrow \) Tree \( \rightarrow \) Fir

sixchars_phylo.png

Functional Programming (FP)

  • Structure computation to accomodate data
  • No side-effects

\[ \huge{f(x_{1}, \dots, x_{n}) \rightarrow (y_{1}, \dots, y_{n})} \]

  • Inspired by lambda calculus

\[ \Huge{\lambda} \]

FP Example

matrices <- foreach(i = 1:10) %do% 
    {rnorm(9) %>% matrix(nrow = 3, ncol = 3)} 
matrices[[1]]
          [,1]       [,2]       [,3]
[1,] 0.1612602 -0.5468579  1.0151041
[2,] 0.3100241  0.4903638 -0.7219488
[3,] 0.3076704 -0.9811351  0.5357476
Map(function(x) (x^2), matrices) %>% 
  {Reduce("+", .)}
         [,1]     [,2]      [,3]
[1,] 5.994196 4.321331 14.644285
[2,] 8.236990 6.695512 10.584092
[3,] 6.282488 9.951355  5.978112

FP and OOP are Orthogonal

scala-lang

FP and OOP in R

  • Not purely functional
    • ex. Random number generation
.Random.seed %>% head
[1]        403        180 -369443614 2027902397 1122624055 1608839390


  • Provides utilities for both FP and OOP
?funprog
??methods

R Philosophy

  • Functional object-oriented programming

“The chief motivation for introducing classes and functional methods to S was the initial application: fitting, examining and modifying diverse kinds of statistical models for data”

(Chambers, 2014)


  • Inspired by natural language used by statisticians
    • “Plot” the data may mean different things for different input

\[ \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} \quad \begin{bmatrix} x_{11} & \dots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{n1} & \dots & x_{nn} \end{bmatrix} \]

Method Dispatch

  • Generic functions call a specific method based on the class of the input argument.

Method Dispatch Graph

Common

  • Python
  • Java
  • C++
object.method(args)

Generic-Function

  • R
function(object, args)

S3 OO System

S3

  • Flexible
  • No formal class definitions (instance-based)
  • Most commonly used on CRAN

Generic Functions and Methods

  • S3 methods written in the form generic.class.
mean
function (x, ...) 
UseMethod("mean")
<bytecode: 0x559a5fc25a80>
<environment: namespace:base>
mean.Date
function (x, ...) 
structure(mean(unclass(x), ...), class = "Date")
<bytecode: 0x559a5fc8aed8>
<environment: namespace:base>


mean(numeric) calls mean.default

x <- c(4, 2, 2, 2, 1)
mean(x)
[1] 2.2

mean(Date) calls mean.Date

x <- rep(Sys.Date(), 4)
mean(x)
[1] "2016-09-23"

Writing a Method for a Generic Function

x <- list(2, 3, 4)
x
[[1]]
[1] 2

[[2]]
[1] 3

[[3]]
[1] 4
mean.list <- function(x) {Reduce("+", x) / length(x)}
mean(x)
[1] 3

Method Dispatch

  • Dispatches on first element of class vector.
e <- rnorm(1000) %>% ecdf 
# draw 1000 samples from N(0, 1)
# get ECDF
class(e)
[1] "ecdf"     "stepfun"  "function"


plot(e) # calls plot.ecdf()

plot of chunk unnamed-chunk-14

Identifying Generic Functions

  • Look for UseMethod in the definition
plot
function (x, y, ...) 
UseMethod("plot")
<bytecode: 0x559a5ddd02c0>
<environment: namespace:graphics>
  • Use pryr::ftype
ftype(plot)
[1] "s3"      "generic"
  • Show methods for a generic
methods(plot) %>% head
[1] "plot,ANY-method"    "plot,color-method"  "plot.acf"          
[4] "plot.data.frame"    "plot.decomposed.ts" "plot.default"      

Identifying a S3 Object

  • pryr::otype

  • factor

factor(1, 2) %>% otype
[1] "S3"
  • data.frame
data.frame(a = c(1, 2)) %>% otype
[1] "S3"

Creating a S3 Object

At initialization:

stock <- structure(list(), 
                   class = "stock")
class(stock)
[1] "stock"


After initialization

stock <- list()
class(stock) <- "stock"
class(stock)
[1] "stock"

Constructor Functions

  • Should have same name as object and return object of class
  • Provide guidelines for creating an object
#' Constructor function for stock S3 object
#' 
#' @param ticker Character vector of length one for ticker
#' @param price Numeric vector of length one for price
#' @return stock object with ticker and price attributes
#' 
stock <- function(ticker, price) {
  stopifnot(is.character(ticker), 
            is.numeric(price))
  structure(list(
                 ticker = ticker,
                 price = price),
            class = "stock")
}

Initializing a S3 Object with Constructor Function

my_stock <- stock(ticker = "TICKERX", 
                  price = 42.00)
class(my_stock)
[1] "stock"
pryr::otype(my_stock)
[1] "S3"

Simulate 1-D Random Walk

walk <- sample(c(-1, 1), 1000, TRUE) %>% cumsum 
qplot(x = 1:length(walk), y = walk, geom = "line") + 
  theme_bw() +
  ylab("Close (USD)") + xlab("Day") +
  ggtitle("Simulated Close for Security X")

plot of chunk unnamed-chunk-30

Writing Functions and Methods

  • Generic function
random_walk <- function(x) { UseMethod("random_walk")}
  • Method for one-dimensional random walk
random_walk.stock <- function(x) {
  walk <- sample(c(-1, 1), 1000, TRUE) %>% cumsum + x$price
  walk
}
random_walk(my_stock) %>% head
[1] 41 40 39 38 39 38

Default Methods

generic.default is called when no method is defined for an object

random_walk.default <- function(n=1000) {
  sample(c(-1, 1), n, TRUE) %>% cumsum
}
random_walk(1000) %>% head
[1]  1  0  1  0 -1 -2

Attributes can be added

names(my_stock)
[1] "ticker" "price" 
  • Call random_walk function and set result to new attribute
my_stock$walk <- random_walk(my_stock)
head(my_stock$walk)
[1] 41 40 41 40 39 38
names(my_stock)
[1] "ticker" "price"  "walk"  

Writing a Plot Method

#' Plot 1-dimensional Brownian motion simulated for a given security
#'
#' @param x A list with a numeric vector of length n named `walk`
#' @return 1-dimensional time series plot with base R graphics

plot.stock <- function(x) {
  plot(x$walk, type = "l", 
       main = paste("Simulated Random Walk for ", 
                    x$ticker, " Close"),
       ylab = "Price (USD)",
       xlab = "Day")
}

Writing a Plot Method

plot(my_stock)

plot of chunk unnamed-chunk-40

Simple Inheritance

a <- structure(list(), class = c("turkey", "bird"))
move <- function(x) { UseMethod("move")}

NextMethod dispatches on 2nd element in class vector

move.bird <- function(x) "fly"
move.turkey <- function(x) { NextMethod()}
move.default <- function(x) "hop"
move(a)
[1] "fly"

S4 System

S4 System

History

  • First implemented in S in 1998
  • Implemented in R in 2000

Characteristics

  • Possible attributes are immutable
  • Extends method dispatch to > 1 argument
  • Class-based

Formal, more overhead

Example of an S4 Object

An S4 object must have a name and list of attributes (slots)

library(Biobase)
load("data/eset.Rda")
class(eset)
[1] "ExpressionSet"
attr(,"package")
[1] "Biobase"
pryr::otype(eset)
[1] "S4"
isS4(eset)
[1] TRUE

Defining a S4 Class

  • Use setClass to define a S4 class
setClass("Security", 
         representation(
           price = "numeric")
         )
  • Use new to initialize a S4 object
securityx <- new("Security", price = 42.00)

Accessing a S4 Object

  • getSlots will show attributes and types
getSlots("Security")
    price 
"numeric" 
  • Use @ to access attributes
securityx@price
[1] 42

Defining a S4 Class

The prototype argument allows for default attributes

setClass("Security", 
         representation(
           price = "numeric"),
         prototype(price = NA_real_)
         )

Initializing a S4 Class with Default Arguments

Initialize object with new

securityx <- new("Security")
securityx
An object of class "Security"
Slot "price":
[1] NA

Ensuring Class Validity

  • Write a function that takes an object as output
    • If the object is valid, return TRUE
    • Else, return character vector with errors
validate_security <- function(object) {
  errors <- vector("character", 0)
  if (object@price < 0) {
    errors <- c(errors, "Stock must have price on the interval [0, Inf)")
  }
  if (length(object@price) > 1) {
    errors <- c(errors, "Stock must only have one closing price")
  }
  if (length(errors) == 0) TRUE else errors
}

Ensuring Class Validity

validity argument is set to the function to validate input

setClass("Security", 
         representation(
           price = "numeric"),
         prototype(price = NA_real_),
         validity = validate_security
         )
new("Security", price = c(-42.00, 1.00)) # raises exception

S4 Inheritance

  • contains argument allows for inheritance
setClass("Stock",
         representation(
           ticker = "character"
         ),
         contains = "Security")
stockx <- new("Stock", price = 42.00, 
              ticker = "tickerx")
  • Child class inherits slots from parent.
getSlots("Stock")
     ticker       price 
"character"   "numeric" 

S4 Inheritance

A security is a tradeable asset.

setClass("Security", representation(
  price = "numeric",
  name = "character"))

A stock is a tradeable unit of ownership in a company.

setClass("Stock", representation(
  ticker = "character",
  walk = "numeric"), 
  contains = "Security")

An option is a contract to trade a unit of ownership in the future.

setClass("Option", representation(
  expiration_date = "Date",
  spot_price = "numeric",
  strike_prike = "numeric"),
  contains = "Stock")

S4 Generic Functions

  • setGeneric creates new S4 generic function
    • valueClass specifies expected output type
    • standardGeneric if function does not yet exist
setGeneric("random_walk", valueClass = "numeric",
           function(object) {
               standardGeneric("random_walk")
})
  • setMethod creates a new S4 method
setMethod("random_walk", signature("Stock"), 
          function(object) {
              sample(c(-1, 1), 1000, TRUE) %>% 
              cumsum + object@price
})
[1] "random_walk"

S4 Method Dispatch

stockx <- new("Stock", ticker = "TICKERX", walk = vector("numeric", 0),
              price = 42.0)
random_walk(stockx) %>% head
[1] 43 42 43 42 43 44

S4 Multiple Dispatch

  • Method is chosen based on arbitrary combinations of arguments.
setGeneric("plot_security",
           function(stock, ndays) {
             standardGeneric("plot_security")
           })
[1] "plot_security"
setMethod("plot_security", 
          signature("Stock", "numeric"), 
          function(stock, ndays) {
            p <- qplot(x = seq(1, ndays), 
                       y = stock@walk[1:ndays], 
                       geom = "line") +
                ylab("Close (USD)") + xlab("Day") + 
                theme_bw() +
                ggtitle(paste0(
                  "Simulated 1-D Random Walk for ", stock@ticker))
            invisible(p)
})
[1] "plot_security"

S4 Plotting Method

p <- plot_security(stockx, 500)
p

plot of chunk unnamed-chunk-69

Notes on Method Dispatch in S4

  • Use missing in the signature to call method when argument is not provided
  • Use ANY for default method
  • When signature does not match the provided arguments, pick method most similar to provided arguments

Exposing S4 Attributes

  • Discourage users from using @ to access attributes
stockx@walk <- random_walk(stockx)
  • Instead, write accessors to expose attributes
#' Get ticker slot for S4 Stock object
#'
#' @param stock Stock object
#' @return ticker Character vector of length one

ticker <- function(stock) {
  stopifnot(class(stock)[1] == "Stock")
  stock@ticker
}
ticker(stockx)
[1] "TICKERX"

Mixing Systems

  • S4 Methods can be exposed to S3 classes with setOldClass("s3class")
  • Google says no, R asks why not?
setOldClass("stock")
setMethod("plot_security", signature("stock"),
          function(stock) {
            plot(x = 1:length(stock$walk), 
                 y = stock$walk, 
                 type = "l")
          })
[1] "plot_security"
plot_security(my_stock)

plot of chunk unnamed-chunk-74

Reference Classes

  • Encapsulated OOP
  • Mutable state
  • Objects implicitly available to methods

java-logo

S3 vs. S4

S3

  • Not as much software needs to be written
  • Reliability is not as important
  • Scope of data or methods is narrower or not clearly defined
  • Flexibility desired

S4

  • Lots of software required
  • Reliability and code safety is important
  • Scope of data or methods is wide
  • Code that utilizes existing Bioconductor functions

References

R Language Definition Chapter 5: Object-Oriented Programming
https://cran.r-project.org/

Advanced R Hadley Wickham Chapman & Hall/CRC The R Series ISBN-13: 978-1466586963 http://adv-r.had.co.nz/

Object-Oriented Programming, Functional Programming and R
John M. Chambers
2014 Vol. 29 No. 2 167-180
DOI: 10.1214/13-STS452
Retrieved from https://arxiv.org/pdf/1409.3531v1.pdf on 9/19/2016

Session Info

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 23 (Twenty Three)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] Biobase_2.32.0      BiocGenerics_0.18.0 foreach_1.4.3      
[4] ggplot2_2.1.0       pryr_0.1.2          magrittr_1.5       
[7] knitr_1.14         

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      codetools_0.2-14 grid_3.3.1       plyr_1.8.4      
 [5] gtable_0.2.0     formatR_1.4      evaluate_0.9     scales_0.4.0    
 [9] stringi_1.1.1    labeling_0.3     iterators_1.0.8  tools_3.3.1     
[13] stringr_1.1.0    munsell_0.4.3    compiler_3.3.1   colorspace_1.2-6