Advanced R, Middling Vocab

This is my worksheet for exploring common R Language capabilities via common R vocabulary. We’re reviewing Chapter 4 of Hadley Wickham’s book Advanced R http://adv-r.had.co.nz/Vocabulary.html in STAT290, an R coding course at Stanford.

I really like the approach of this succinct chapter. It’s a great example of how to convey lean, sufficient information by knowing your intended audience, and one that other authors, seemingly more interested in inflating their page count, should note. Rather than regurgitate explanations and examples already available in R docs, help, and innumerable books, Hadley says “Hey, in order to code Advanced R, as a baseline here are some keywords (and a few libraries) that should ring familiar to your ears. If they don’t, go learn about them.” It’s a simple categorized R vocab list; a self-checklist.

My simple count tells me there are around 350 keywords in the list. In the spirit of Lean and Sufficient, I’m not going to regurgitate the full Chapter 4 vocab list here, which is readily available at the link above. Instead, in this worksheet I explore the R keywords from Hadley’s list that I didn’t know about, in order to fill the knowledge gaps on my path to Advanced R coding!

So this is not a worksheet for the basics of R, nor on any of the Advanced R topics which come later in Hadley’s book… for me it’s a worksheet on specific keywords in that Middling ground that I need to cover to pave the path forward…

Bold print: A lot of the examples explored are “canonical examples” (read: copy-paste) from the R docs, but often tweaked, trimmed, further commented and showing output in order to quickly connect the dots when reviewing this list.

assign

An assignment function that lets you specify the environment, and whether assignment can take place up the tree (inherits)

e1 <- new.env(parent = baseenv())  # this one has enclosure package:base.
e2 <- new.env(parent = e1)
assign("a", 3, envir = e1)
ls.str(e1)

## a :  num 3

ls.str(e2)
exists("a", envir = e2)            # this succeeds by inheritance from e1

## [1] TRUE

exists("a", envir = e2, inherits = FALSE)

## [1] FALSE

exists("+", envir = e2)            # this succeeds by inheritance from base (the parent of e1)

## [1] TRUE

assign("a", 4, envir = e2, inherits=TRUE) # assign in parent, e1
ls.str(e1)                         # updated via assignment in child env

## a :  num 4

get

Returns the object bound to a name, letting you specify the environment to search and whether to look up the tree (inherits)

get("%o%")             # returns outer product operator

## function (X, Y) 
## outer(X, Y)
## <bytecode: 0x00000000064818a8>
## <environment: namespace:base>

# test mget (multiple get, which passes a char vector)
# create e1 so we can look in contained environment:
e1 <- new.env(parent = baseenv())                  
# names are looked up one by one: a and b are not found, so their ifnotfound values are returned; c (the combine function) is found by inheritance from base
mget(letters[1:3], e1, ifnotfound = LETTERS[1:3], inherits = TRUE)

## $a
## [1] "A"
## 
## $b
## [1] "B"
## 
## $c
## function (..., recursive = FALSE)  .Primitive("c")
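When a missing name should not be an error, get0 offers a fallback value, much like the ifnotfound argument of mget. This is only a minimal sketch: the environment e3 and the name no_such_object_here are made up for illustration.

```r
# e3 and its contents are hypothetical, used only for illustration
e3 <- new.env(parent = baseenv())
assign("a", 3, envir = e3)

get("a", envir = e3)                                      # found directly in e3
get0("no_such_object_here", envir = e3, ifnotfound = NA)  # not found: returns the fallback
exists("c", envir = e3, inherits = FALSE)                 # c is not defined in e3 itself
is.function(get("c", envir = e3))                         # but get finds it by inheritance
```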

complete.cases

Returns a logical vector in which TRUE marks a complete case, i.e., a row (or position) with no missing values across all the supplied arguments. Args are vectors, matrices and/or data frames with the same number of cases (vector length or number of rows).

x <- 1:6
y <- data.frame(c(1:5, NA))
complete.cases(x,y)

## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
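A common use is filtering the rows of a single data frame. Here is a small sketch on a made-up data frame (df and its columns are my own illustration, not from the R docs):

```r
# df is a small made-up data frame for illustration
df <- data.frame(id    = 1:4,
                 score = c(10, NA, 30, 40),
                 grade = c("a", "b", NA, "d"))
ok <- complete.cases(df)   # one logical value per row
ok
df[ok, ]                   # keep only the rows with no missing values
```

Rows 2 and 3 each contain an NA, so only rows 1 and 4 survive the filter.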

is.finite

is.finite and is.infinite return a vector of the same length as x, indicating which elements are finite (not infinite and not missing) or infinite.

# also handy: is.infinite and is.nan (don't forget is.na of course)
paste(is.finite(NA), is.finite(NaN), is.finite(pi), is.nan(NaN), is.infinite(Inf), is.infinite(-Inf))

## [1] "FALSE FALSE TRUE TRUE TRUE TRUE"
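Because is.finite is FALSE for NA, NaN, Inf and -Inf alike, it works as a one-step filter before computing a summary. A quick sketch on made-up values:

```r
x <- c(1, NA, Inf, -Inf, NaN, 2)   # made-up values for illustration
is.finite(x)                       # TRUE only for 1 and 2
clean <- x[is.finite(x)]           # drops NA, NaN, Inf and -Inf in one step
mean(clean)                        # mean of 1 and 2
```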

rle

rle encodes runs of consecutive equal values in a vector as two vectors: lengths and values (inverse.rle decodes the result back into a vector)

x <- rev(rep(6:10, 1:5))
x

##  [1] 10 10 10 10 10  9  9  9  9  8  8  8  7  7  6

rle(x)

## Run Length Encoding
##   lengths: int [1:5] 5 4 3 2 1
##   values : int [1:5] 10 9 8 7 6
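To connect the dots on inverse.rle, here is a short round-trip sketch using the same x, plus one way (my own choice, not from the docs) to pull out the longest run:

```r
x <- rev(rep(6:10, 1:5))
r <- rle(x)
identical(inverse.rle(r), x)       # decoding restores the original vector
r$values[which.max(r$lengths)]     # value of the longest run
max(r$lengths)                     # length of the longest run
```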

invisible

Returns a (temporarily) invisible copy of an object. This is handy to have functions return values which can be assigned, but which do not print when they are not assigned.

f1 <- function(x) x
f2 <- function(x) invisible(x)
f1(1)  # prints

## [1] 1

f2(1)  # does not
y <- f2(2)
y      # but reappears in assigned variable!

## [1] 2
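Two more ways to see what is going on, as a small sketch: withVisible reports the visibility flag, and wrapping a call in parentheses forces it to print.

```r
f2 <- function(x) invisible(x)
withVisible(f2(1))$visible   # FALSE: the value is returned but flagged invisible
(f2(1))                      # parentheses force the value to print
```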

sweep

Returns an array of the same shape as the input array after sweeping out a summary statistic. MARGIN gives the dimension(s) of the array that STATS corresponds to, STATS is the statistic to sweep out (typically a vector with the same length as that dimension), and FUN is the function used to combine them, which by default is subtraction. It seems similar to the apply family of functions, but it takes the additional STATS parameter and always returns an array with the same dimensions as the input.

# a 4x3x2 array of 1:24
A <- array(1:24, dim = 4:2)  
# no recycle warnings in normal use: 5 is recycled and subtracted (the default FUN) from each element
sweep(A, 1, 5)

## , , 1
## 
##      [,1] [,2] [,3]
## [1,]   -4    0    4
## [2,]   -3    1    5
## [3,]   -2    2    6
## [4,]   -1    3    7
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]    8   12   16
## [2,]    9   13   17
## [3,]   10   14   18
## [4,]   11   15   19
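The case where STATS is a vector is probably the more typical one. A minimal sketch on a made-up matrix m, centering columns and then normalizing rows:

```r
m <- matrix(1:6, nrow = 2)                    # made-up 2x3 matrix
centered <- sweep(m, 2, colMeans(m))          # subtract each column's mean from that column
colMeans(centered)                            # every column mean is now 0
scaled <- sweep(m, 1, rowSums(m), FUN = "/")  # divide each row by its row sum
rowSums(scaled)                               # every row now sums to 1
```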

seq_along

A variation of seq in which the arg along.with supplies the length of the sequence. A quick way to generate sequence numbers for the elements of an object.

# using A from the previous example
A <- array(1:24, dim = 4:2)  
seq_along(A)

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24
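The main reason I reach for seq_along over 1:length(x) is the zero-length case. A small sketch, with empty as a made-up example vector:

```r
empty <- character(0)                  # a zero-length vector
seq_along(empty)                       # integer(0): a loop over this runs zero times
1:length(empty)                        # 1 0: a loop over this would wrongly run twice
for (i in seq_along(empty)) print(i)   # prints nothing
```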

combn

Generate all combinations of the elements of x taken m at a time. See also: choose and expand.grid

combn(letters[1:4], 2)     # 4 choose 2. Uses the default FUN of identity. choose(4,2) gives the number of combinations.

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] "a"  "a"  "a"  "b"  "b"  "c" 
## [2,] "b"  "c"  "d"  "c"  "d"  "d"

(m <- combn(10, 5, min))   # minimum value in each combination

##   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##  [71] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [106] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [141] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [176] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## [211] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4
## [246] 4 5 5 5 5 5 6
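A quick sanity check tying combn to choose and expand.grid. The variable names are my own; the point is just that combn yields one result per unordered combination, while expand.grid enumerates every ordered pairing:

```r
pairs  <- combn(letters[1:4], 2)
mins   <- combn(10, 5, min)
pairs2 <- expand.grid(letters[1:4], letters[1:4])

ncol(pairs) == choose(4, 2)      # one column per combination
length(mins) == choose(10, 5)    # one result per combination
nrow(pairs2)                     # 4 * 4 ordered pairs, including (a, a) etc.
```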

tabulate

Added by JJ. Counts the number of times each integer occurs in a vector of positive integers. Bin count can be set.

tabulate(c(2,3,5))

## [1] 0 1 1 0 1

tabulate(c(2,3,3,5), nbins = 10)

##  [1] 0 1 2 0 1 0 0 0 0 0

tabulate(c(-2,0,2,3,3,5))             # -2 and 0 are ignored

## [1] 0 1 2 0 1

tabulate(c(-2,0,2,3,3,5), nbins = 3)  # -2, 0  and 5 are ignored

## [1] 0 1 2

split

Divides the data in x into the groups defined by f. Unsplit does the reverse.

# data frame example
g <- airquality$Month
summary(g)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   5.000   6.000   7.000   6.993   8.000   9.000

l <- split(airquality, g)  # split airquality data frame into monthly data frames
summary(l)

##   Length Class      Mode
## 5 6      data.frame list
## 6 6      data.frame list
## 7 6      data.frame list
## 8 6      data.frame list
## 9 6      data.frame list
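Splitting a plain vector is just as handy, and it makes the unsplit round trip easy to see. A minimal sketch using the Temp column of the same airquality data (by_month and restored are names I made up):

```r
by_month <- split(airquality$Temp, airquality$Month)  # list of Temp vectors, one per month
sapply(by_month, mean)                                # mean temperature for each month
restored <- unsplit(by_month, airquality$Month)       # reassemble in the original order
all(restored == airquality$Temp)                      # check the round trip
```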

expand.grid

Returns a data frame containing one row for each combination of the supplied factors.

grid <- expand.grid(height = seq(60, 80, 5), weight = seq(100, 300, 50), sex = c("Male","Female"))
head(grid, n=8)

##   height weight  sex
## 1     60    100 Male
## 2     65    100 Male
## 3     70    100 Male
## 4     75    100 Male
## 5     80    100 Male
## 6     60    150 Male
## 7     65    150 Male
## 8     70    150 Male
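The number of rows is simply the product of the lengths of the inputs, and the first factor varies fastest (as the head output shows). A small check, with the inputs pulled out into named vectors for illustration:

```r
heights <- seq(60, 80, 5)       # 5 values
weights <- seq(100, 300, 50)    # 5 values
sexes   <- c("Male", "Female")  # 2 values
grid <- expand.grid(height = heights, weight = weights, sex = sexes)
nrow(grid) == length(heights) * length(weights) * length(sexes)   # 5 * 5 * 2 = 50 rows
```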

library(lubridate)

I didn’t have this library installed before I read this list, but now I do! No examples here, but I’ve learned that Lubridate provides tools that make it easier to parse and manipulate dates.

# run vignette("lubridate") to get useful examples

library(stringr)

My installed version of stringr (0.6.2) doesn’t have a nice vignette like lubridate, but does have a lot to offer. For example it processes factors and characters in the same way, and gives functions consistent names and arguments.

findInterval

For each element of x, finds the index of the interval in vec that contains it, where vec must be non-decreasing. Useful for binning values, e.g. when working with distribution functions.

x <- 2:12
v <- c(5, 8, 11) # break points: 0 = below 5, 1 = [5,8), 2 = [8,11), 3 = 11 and above. Use with tabulate() to plot a histogram...
cbind(x, findInterval(x, v))

##        x  
##  [1,]  2 0
##  [2,]  3 0
##  [3,]  4 0
##  [4,]  5 1
##  [5,]  6 1
##  [6,]  7 1
##  [7,]  8 2
##  [8,]  9 2
##  [9,] 10 2
## [10,] 11 3
## [11,] 12 3
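Combining findInterval with tabulate gives histogram-style bin counts. A minimal sketch; the bin labels are my own, and the +1 shift is needed because tabulate ignores 0:

```r
x <- 2:12
v <- c(5, 8, 11)
bins   <- findInterval(x, v)                           # 0 = below 5, 3 = 11 and above
counts <- tabulate(bins + 1, nbins = length(v) + 1)    # shift by 1 so bin 0 is counted
names(counts) <- c("<5", "[5,8)", "[8,11)", ">=11")    # hypothetical labels
counts
```

Here the counts come out as 3, 3, 3 and 2 for the four intervals.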

interaction

Computes a factor which represents the per-element interaction of the given factors. Shorter factors are recycled, so the longest factor’s length should be a multiple of each shorter factor’s length.

a <- gl(2, 4, 8)
b <- gl(2, 2, 8, labels = c("ctrl", "treat"))
s <- gl(2, 1, 4, labels = c("M", "F"))
interaction(a, b)

## [1] 1.ctrl  1.ctrl  1.treat 1.treat 2.ctrl  2.ctrl  2.treat 2.treat
## Levels: 1.ctrl 2.ctrl 1.treat 2.treat

interaction(a, b, s, sep = ":")

## [1] 1:ctrl:M  1:ctrl:F  1:treat:M 1:treat:F 2:ctrl:M  2:ctrl:F  2:treat:M
## [8] 2:treat:F
## 8 Levels: 1:ctrl:M 2:ctrl:M 1:treat:M 2:treat:M 1:ctrl:F ... 2:treat:F

gl

Added by JJ. Generate Factor Levels. See example above in the interaction topic…
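For quick reference, a few standalone calls showing how the first three arguments (number of levels, run length, total length) interact. The labels are made up:

```r
gl(2, 4)                                  # 2 levels, each repeated 4 times: 1 1 1 1 2 2 2 2
gl(2, 1, 6)                               # runs of length 1, recycled to length 6: 1 2 1 2 1 2
gl(3, 2, labels = c("lo", "mid", "hi"))   # labelled levels: lo lo mid mid hi hi
```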

aperm

Transpose an array by permuting its dimensions and optionally resizing it. The second arg, perm, is the subscript permutation vector, usually a permutation of the integers 1:n, where n is the number of dimensions of a. Like the MARGIN parameter in the sweep function, perm refers to dimensions by their index.

x  <- array(1:24, 4:2)    # default is by col, by row ...
xt <- aperm(x, c(2,1,3))  # transpose to by row, by col ...
xt

## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13   14   15   16
## [2,]   17   18   19   20
## [3,]   21   22   23   24
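A short check of what the permutation does to the indices, plus the two-dimensional case where aperm with the default perm behaves like t(). The matrix m is a made-up example:

```r
x  <- array(1:24, 4:2)
xt <- aperm(x, c(2, 1, 3))
dim(xt)                      # 3 4 2: the first two extents are swapped
xt[3, 4, 2] == x[4, 3, 2]    # element [i, j, k] of xt is element [j, i, k] of x
m <- matrix(1:6, nrow = 2)
all(aperm(m) == t(m))        # with perm omitted, the dimensions are reversed
```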

library(abind)

I didn’t have this library installed before I read this list, but now I do. Within the abind package, abind() is a function to combine multi-dimensional arrays. Using the along and rev.along args gives a lot of control over how the arrays are bound.

library(abind)
x <- matrix(1:12,3,4)
y <- x+100
dim(abind(x,y,along=0))     # binds on new dimension before first

## [1] 2 3 4

dim(abind(x,y,along=1))     # binds on first dimension

## [1] 6 4

dim(abind(x,y,along=1.5))   # inserts between dim 1 and 2

## [1] 3 2 4

# more: 
# dim(abind(x,y,along=2))
# dim(abind(x,y,along=3))
# dim(abind(x,y,rev.along=1)) # binds on last dimension
# dim(abind(x,y,rev.along=0)) # binds on new dimension after last

ftable

Creates “flat” contingency tables. Easy to rearrange the frequency dimensions.

# Start with a data frame.
x <- ftable(mtcars[c("cyl", "vs", "am", "gear")])
x

##           gear  3  4  5
## cyl vs am              
## 4   0  0        0  0  0
##        1        0  0  1
##     1  0        1  2  0
##        1        0  6  1
## 6   0  0        0  0  0
##        1        0  2  1
##     1  0        2  2  0
##        1        0  0  0
## 8   0  0       12  0  0
##        1        0  0  2
##     1  0        0  0  0
##        1        0  0  0

ftable(x, row.vars = c(2, 4))

##         cyl  4     6     8   
##         am   0  1  0  1  0  1
## vs gear                      
## 0  3         0  0  0  0 12  0
##    4         0  0  0  2  0  0
##    5         0  1  0  1  0  2
## 1  3         1  0  2  0  0  0
##    4         2  6  2  0  0  0
##    5         0  1  0  0  0  0

rstandard

influence.measures

hat

apropos("\\.test$")

apropos

RSiteSearch

library(foreign)

I didn’t have this library installed before I read this list, but now I do.

library(downloader)