R Quirks and Tips from Hadley Wickham’s Advanced R

This material is a very short extract from Advanced R by Hadley Wickham. It collects things that I find counterintuitive or simply useful. I suggest reading the book first, though.

Data Structures

  1. “is.vector() does not test if an object is a vector. Instead it returns TRUE only if the object is a vector with no attributes apart from names. Use is.atomic(x) || is.list(x) to test if an object is actually a vector.”

  2. “is.numeric() is a general test for the “numberliness” of a vector and returns TRUE for both integer and double vectors. It is not a specific test for double vectors, which are often called numeric.” (is.double() is the specific test.)

  3. How factors behave:

(f1 <- factor(letters))
##  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
levels(f1) <- rev(levels(f1))
f1
##  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
rev(factor(letters))
##  [1] z y x w v u t s r q p o n m l k j i h g f e d c b a
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
factor(letters, levels = rev(letters))
##  [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
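
Points 1 and 2 above can be checked with a minimal sketch. The attribute name "unit" is made up for illustration and does not come from the book:

```r
x <- 1:3
attr(x, "unit") <- "cm"        # hypothetical attribute, only here to trip is.vector()

is.vector(x)                   # FALSE: an attribute other than names is present
is.atomic(x) || is.list(x)     # TRUE: x is still a vector

is.numeric(1L)                 # TRUE: integers count as numeric
is.numeric(1.5)                # TRUE
is.double(1L)                  # FALSE: is.double() is the specific test for doubles
```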

Subsetting

  1. Real numbers are silently truncated to integers:
x <- 1:4
x[c(2.1, 2.9)]
## [1] 2 2
  2. Lookup tables with 1) match() and 2) rownames:
grades <- c(1, 2, 2, 3, 1)

(info <- data.frame(
  grade = 3:1,
  desc = c("Excellent", "Good", "Poor"),
  fail = c(FALSE, FALSE, TRUE)
))
##   grade      desc  fail
## 1     3 Excellent FALSE
## 2     2      Good FALSE
## 3     1      Poor  TRUE
id <- match(grades, info$grade)
info[id, ]
##     grade      desc  fail
## 3       1      Poor  TRUE
## 2       2      Good FALSE
## 2.1     2      Good FALSE
## 1       3 Excellent FALSE
## 3.1     1      Poor  TRUE
rownames(info) <- info$grade
info[as.character(grades), ]
##     grade      desc  fail
## 1       1      Poor  TRUE
## 2       2      Good FALSE
## 2.1     2      Good FALSE
## 3       3 Excellent FALSE
## 1.1     1      Poor  TRUE
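
When only a single column has to be looked up, a named character vector is often enough. A minimal sketch, assuming some illustrative sex codes that are not part of the grades example above:

```r
codes  <- c("m", "f", "u", "f", "f", "m", "m")   # illustrative data
lookup <- c(m = "Male", f = "Female", u = NA)    # names are the keys

# Character subsetting matches names, so each code picks up its label
labels <- unname(lookup[codes])
labels
## [1] "Male"   "Female" NA       "Female" "Female" "Male"   "Male"
```

unname() only drops the names that subsetting carries over; leave it out if you want to keep the codes as names.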

Vocabulary

Some functions and packages that are easy to forget but worth keeping in mind:

  1. Basics
    assign
    all.equal, identical
    cummax, cummin, cumprod, cumsum, diff
    rle
    missing, on.exit, invisible
    intersect, union, setdiff, setequal
    sweep
    rep_len, seq_len, combn
    expand.grid
    next - control flow
    replicate

  2. Data manipulation
    library(lubridate) - keeps you from hell when you work with dates
    agrep - approximate string matching
    chartr
    library(stringr)
    findInterval
    nlevels, reorder, relevel, interaction
    library(abind)

  3. Statistics
    logLik
    apropos("\\.test$")
    crossprod, tcrossprod
    eigen, qr, svd
    %*%, %o%, outer
    rcond

  4. Working with R
    recover
    options(error = )
    tryCatch, try

  5. Input and Output
    dput
    format
    count.fields
    capture.output
    readLines, writeLines
    dir
    basename, dirname, tools::file_ext
    file.path
    path.expand, normalizePath
    file.choose
    file.copy, file.create, file.remove, file.rename, dir.create
    file.exists, file.info
    tempdir, tempfile
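
A few of the functions listed above are easier to remember after seeing them once. The following is a small sketch with made-up inputs; the file path is hypothetical and no file is read or written:

```r
# rle(): run-length encoding of consecutive repeats
r <- rle(c("a", "a", "b", "c", "c", "c"))
r$lengths                                   # 2 1 3
r$values                                    # "a" "b" "c"

# identical() is exact, all.equal() tolerates floating point error
identical(0.1 + 0.2, 0.3)                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))           # TRUE

# findInterval(): index of the interval each value falls into
findInterval(c(1.5, 3.2, 0), c(0, 1, 2, 3)) # 2 4 1

# Path helpers work on strings only
p <- file.path("data", "raw", "input.csv")  # hypothetical path
basename(p)                                 # "input.csv"
dirname(p)                                  # "data/raw"
tools::file_ext(p)                          # "csv"
```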

OOP field guide

There are three OO systems (S3, S4 and Reference Classes), four if you count base types (described in Data Structures). is.object(x) returns FALSE if x is a pure base type. See also ?typeof.

1. S3 system

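  • Check whether it’s S3: is.object(x) & !isS4(x), or pryr::otype(x) for objects and pryr::ftype(f) for functions.
  • See all S3 methods of a generic: methods("generic"). See all generics that have a method for a class: methods(class = "ts"). View the source of a specific method, even an unexported one: getS3method("generic", "class").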

  • Creating objects:

foo <- structure(list(), class = "foo")
# is equivalent to
foo <- list()
class(foo) <- "foo"

class(foo)
## [1] "foo"
inherits(foo, "foo")
## [1] TRUE

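S3 has no formal class definitions, so a constructor function is the only place to enforce that objects are built correctly:

# Named new_foo so that it does not overwrite the object foo created above
new_foo <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  structure(list(x), class = "foo")
}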
  • Creating generics:
f <- function(x) UseMethod("f")
  • New methods for generics:
f.foo <- function(x) "class foo"
f(foo)
## [1] "class foo"
# Works for existing functions as well:
mean.foo <- function(x) "foo"
mean(foo) # R does not check that the method is compatible with its generic.
## [1] "foo"
  • Don’t call methods directly (e.g. mean.foo(x)) unless you have a very good reason. The only real one is a considerable performance improvement from skipping method dispatch.

  • “You can also call an S3 generic with a non-S3 object. Non-internal S3 generics will dispatch on the implicit class of base types. (Internal generics don’t do that for performance reasons.)”

  • See also: ?"internal generic", ?groupGeneric.
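
The dispatch on implicit classes described above can be sketched as follows. The generic kind() and its methods are hypothetical names chosen for this illustration:

```r
kind <- function(x) UseMethod("kind")        # hypothetical generic
kind.default <- function(x) "something else"
kind.integer <- function(x) "integer vector"
kind.numeric <- function(x) "numeric vector"

kind(1L)            # "integer vector"
kind(1.5)           # "numeric vector": there is no kind.double, so "numeric" is used
kind("a")           # "something else": falls through to the default method

# Calling a method directly skips dispatch, so nothing stops a mismatch
kind.integer("a")   # "integer vector"
```

The last call shows why direct method calls are risky: the method runs on whatever it is given.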

2. S4 system

  • “…it’s a good idea to include an explicit library(methods) whenever you’re using S4”.

  • How to recognise: str(), isS4(), pryr::otype().

  • “Use is() with one argument to list all classes that an object inherits from. Use is() with two arguments to test if an object inherits from a specific class”.

  • List S4 generics, classes and methods with getGenerics(), getClasses() and showMethods().
  • Documentation for an S4 class: class?mle or whichever you need.

Defining classes

  • An S4 class has three key properties: a name, a list of slots, and contains (the class it inherits from). validity and prototype are optional.

  • “In slots and contains you can use S4 classes, S3 classes registered with setOldClass(), or the implicit class of a base type. In slots you can also use the special class ANY which does not restrict the input”.
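
The two optional pieces, validity and prototype, can be sketched with a hypothetical class Interval. The class and its slots lo and hi are invented for this illustration and do not appear in the book:

```r
library(methods)

setClass("Interval",
  slots = list(lo = "numeric", hi = "numeric"),
  prototype = list(lo = 0, hi = 1),             # defaults for unspecified slots
  validity = function(object) {
    # Return TRUE if valid, otherwise a string describing the problem
    if (object@lo > object@hi) "lo must not exceed hi" else TRUE
  })

default_hi <- new("Interval")@hi                # 1, taken from the prototype
ok <- new("Interval", lo = 2, hi = 5)           # passes the validity check

# An invalid object is rejected by new() with the validity message
msg <- tryCatch(new("Interval", lo = 5, hi = 2),
                error = function(e) conditionMessage(e))
grepl("lo must not exceed hi", msg)             # TRUE
```

Note that assigning to a slot later with @<- does not rerun the validity function; call validObject() yourself if you modify slots by hand.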

A new class “Person” with slots name and age:

setClass("Person",
  slots = list(name = "character", age = "numeric"))

Class “Employee” inherits from “Person” and adds a new slot:

setClass("Employee",
  slots = list(boss = "Person"),
  contains = "Person")

Call new() to create an object (or use a constructor function if one exists):

alice <- new("Person", name = "Alice", age = 40)
john <- new("Employee", name = "John", age = 20, boss = alice)
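
Once the objects exist, slots are read with @ and inheritance is tested with is(). A self-contained sketch that repeats the two class definitions so it can be run on its own:

```r
library(methods)

setClass("Person",
  slots = list(name = "character", age = "numeric"))
setClass("Employee",
  slots = list(boss = "Person"),
  contains = "Person")

alice <- new("Person", name = "Alice", age = 40)
john  <- new("Employee", name = "John", age = 20, boss = alice)

john@age                   # 20: slots are accessed with @
slot(john, "boss")@name    # "Alice": slot() is the functional form of @
is(john, "Person")         # TRUE: Employee inherits from Person
is(john)                   # "Employee" "Person"

# new() checks slot types, so a character age is rejected
res <- tryCatch(new("Person", name = "Bob", age = "forty"),
                error = function(e) "rejected")
res                        # "rejected"
```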

Creating functions and methods

setGeneric("union")                           ## turn the existing function into a generic so it can be extended to data frames

setMethod("union",                            ## name of generic
  c(x = "data.frame", y = "data.frame"),      ## classes to associate the method with
  function(x, y) {                            ## the method itself
    unique(rbind(x, y))                       ## body of the function
  }
)
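
A short usage sketch for the data frame method, written out in full so it runs on its own. The data frames df1 and df2 are illustrative:

```r
library(methods)

setGeneric("union")   # base::union becomes the default method
setMethod("union",
  c(x = "data.frame", y = "data.frame"),
  function(x, y) {
    unique(rbind(x, y))
  }
)

df1 <- data.frame(id = 1:3)   # illustrative data
df2 <- data.frame(id = 2:5)

union(df1, df2)$id            # 1 2 3 4 5: duplicated rows are dropped
union(1:3, 2:5)               # 1 2 3 4 5: plain vectors still use the original function
```

Because setGeneric("union") keeps the existing function as the default method, code that calls union() on ordinary vectors keeps working.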

New generic from scratch:

setGeneric("myGeneric", function(x) {
  standardGeneric("myGeneric")               ## S4 equivalent to UseMethod()
})
## [1] "myGeneric"

Method dispatch:

  • special class ANY
  • showMethods() is like S3 methods()
  • selectMethod() - which method gets called?
  • callNextMethod() is like S3 NextMethod()
  • ?S4groupGeneric
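
A minimal sketch of dispatch with callNextMethod(), reusing the Person and Employee classes. The generic describe() is a hypothetical name introduced for this example:

```r
library(methods)

setClass("Person",
  slots = list(name = "character", age = "numeric"))
setClass("Employee",
  slots = list(boss = "Person"),
  contains = "Person")

setGeneric("describe", function(x) standardGeneric("describe"))  # hypothetical generic

setMethod("describe", "Person", function(x) {
  paste0(x@name, ", aged ", x@age)
})
setMethod("describe", "Employee", function(x) {
  # Reuse the Person method, then add the Employee-specific part
  paste0(callNextMethod(), ", reports to ", x@boss@name)
})

alice <- new("Person", name = "Alice", age = 40)
john  <- new("Employee", name = "John", age = 20, boss = alice)

describe(alice)                        # "Alice, aged 40"
describe(john)                         # "John, aged 20, reports to Alice"
existsMethod("describe", "Employee")   # TRUE: a method is defined for this exact class
```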

3. Reference Classes

Defining classes

Account <- setRefClass("Account",
  fields = list(balance = "numeric"))

Creating objects

a <- Account$new(balance = 100)
a$balance
## [1] 100
a$balance <- 200
a$balance
## [1] 200

RC objects are mutable:

b <- a
b$balance
## [1] 200
a$balance <- 0
b$balance
## [1] 0

Because b <- a copies only the reference, use the copy() method when you need an independent object: c <- a$copy().
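
The difference between a second reference and a real copy can be sketched like this. The variable names same and indep are illustrative:

```r
library(methods)

Account <- setRefClass("Account",
  fields = list(balance = "numeric"))

a     <- Account$new(balance = 100)
same  <- a            # a second name for the same object
indep <- a$copy()     # an independent copy

a$balance <- 0
same$balance          # 0: the change is visible through both names
indep$balance         # 100: the copy is unaffected
```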

Methods

Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  ## contains = ...,  The name of the parent RC class to inherit behaviour from.
  methods = list(
    withdraw = function(x) {
      balance <<- balance - x
    },
    deposit = function(x) {
      balance <<- balance + x
    }
  )
)

## To call an RC method:
a <- Account$new(balance = 100)
a$deposit(100)
a$balance
## [1] 200
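
Inheritance through contains and method chaining can be sketched as follows. This is an illustrative variation: the methods return invisible(.self) so calls can be chained, and the subclass NoOverdraft overrides withdraw and delegates to the parent with callSuper():

```r
library(methods)

Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    withdraw = function(x) {
      balance <<- balance - x
      invisible(.self)            # return the object to allow chaining
    },
    deposit = function(x) {
      balance <<- balance + x
      invisible(.self)
    }
  )
)

NoOverdraft <- setRefClass("NoOverdraft",
  contains = "Account",           # inherit fields and methods from Account
  methods = list(
    withdraw = function(x) {
      if (balance < x) stop("insufficient funds")
      callSuper(x)                # run the parent class's withdraw
    }
  )
)

acc <- NoOverdraft$new(balance = 100)
acc$deposit(50)$withdraw(30)      # chained calls modify acc in place
acc$balance                       # 120

msg <- tryCatch(acc$withdraw(1000),
                error = function(e) conditionMessage(e))
msg                               # "insufficient funds"
```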