Subsetting Basics

There are a number of operators that can be used to extract subsets of R objects.

[ always returns an object of the same class as the original (there is one exception); it can be used to select more than one element.

[[ is used to extract elements of a list or a data frame; it can only extract a single element, and the class of the returned object will not necessarily be a list or data frame.

$ is used to extract elements of a list or data frame by name; its semantics are similar to those of [[.

x <- c("a", "b", "c", "c", "d", "a")
x[1]
## [1] "a"
x[1:4]
## [1] "a" "b" "c" "c"
x[x > "a"]  ## character values compare alphabetically
## [1] "b" "c" "c" "d"
u <- x > "a"
u
## [1] FALSE  TRUE  TRUE  TRUE  TRUE FALSE
x[u]
## [1] "b" "c" "c" "d"

Subsetting Lists

x <- list(foo = 1:4, bar = 0.6)
x[1]
## $foo
## [1] 1 2 3 4
x[[1]]
## [1] 1 2 3 4
x$bar
## [1] 0.6
x[["bar"]]
## [1] 0.6
x["bar"]
## $bar
## [1] 0.6

To extract multiple elements of a list, you must use single brackets.

x <- list(foo = 1:4, bar = 0.6, baz = "hello")
x[c(1, 3)]
## $foo
## [1] 1 2 3 4
## 
## $baz
## [1] "hello"
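A sketch of why single brackets are needed here, using the same list: [ also accepts a vector of names, while [[ given c(1, 3) does not return two elements. Instead it indexes recursively (covered under nested elements below), so it returns the third value inside foo.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

## Single brackets with a vector of names also return a sub-list
names(x[c("foo", "baz")])
## [1] "foo" "baz"

## Double brackets with c(1, 3) index recursively: x[[1]][[3]]
x[[c(1, 3)]]
## [1] 3
```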

The [[ operator can be used with computed indices; $ can only be used with literal names.

x <- list(foo = 1:4, bar = 0.6, baz = "hello")
name <- "foo"
x[[name]] ## computed index for ‘foo’
## [1] 1 2 3 4
x$name ## element ‘name’ doesn’t exist!
## NULL
x$foo  ## element ‘foo’ does exist
## [1] 1 2 3 4
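Computed indices matter most when the name is held in a variable, as in a loop. The sketch below walks over the element names; the loop variable nm is an illustrative name, and x$nm would look for an element literally called "nm".

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

## Each value of nm is a computed index, so [[ is required
for (nm in names(x)) {
  print(class(x[[nm]]))
}
## [1] "integer"
## [1] "numeric"
## [1] "character"

x$nm  ## no element is literally named "nm"
## NULL
```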

Subsetting Nested Elements of a List

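The [[ operator can take an integer vector, in which case it indexes recursively: each number selects one level deeper in the nested list.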

x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))

x[[c(1, 3)]]
## [1] 14
x[[1]][[3]]
## [1] 14
x[[c(2,1)]]
## [1] 3.14
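The recursion extends to any depth. In this sketch the three-level list is made-up example data; the vector c(1, 1, 3) is equivalent to chaining three [[ calls.

```r
## Made-up three-level nested list
x <- list(a = list(b = list(10, 20, 30)))

## Level 1 -> a, level 2 -> b, level 3 -> third element
x[[c(1, 1, 3)]]
## [1] 30
x[[1]][[1]][[3]]
## [1] 30
```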

Subsetting a Matrix

Matrices can be subsetted in the usual way with (i, j) indices, where i selects the row and j selects the column.

x <- matrix(1:6, 2, 3)
x[1,2]
## [1] 3
x[2,1]
## [1] 2
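Why x[1, 2] is 3: matrix() fills its values column by column, so element (i, j) of a matrix with nrow(x) rows sits at linear position (j - 1) * nrow(x) + i. A short sketch checking this for one entry:

```r
x <- matrix(1:6, 2, 3)
x
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

## Same element via (i, j) and via its column-major linear position
i <- 1; j <- 2
x[i, j]
## [1] 3
x[(j - 1) * nrow(x) + i]
## [1] 3
```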

Either index can also be left empty to select an entire row or column.

x[1, ]
## [1] 1 3 5
x[, 2]
## [1] 3 4

By default, when a single element of a matrix is retrieved, it is returned as a vector of length 1 rather than a 1 × 1 matrix. This behavior can be turned off by setting drop = FALSE.

x <- matrix(1:6, 2, 3)
x[1, 2]
## [1] 3
x[1, 2, drop = FALSE]
##      [,1]
## [1,]    3

Similarly, subsetting a single column or a single row gives you a vector, not a matrix, unless drop = FALSE is set.

x <- matrix(1:6, 2, 3)

x[1, ]
## [1] 1 3 5
x[1, , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    3    5
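One way to see the difference is to inspect dim(): a dropped result is a plain vector with no dim attribute, while drop = FALSE keeps the matrix shape. A short sketch on the same matrix:

```r
x <- matrix(1:6, 2, 3)

dim(x[1, ])                  ## a plain vector has no dimensions
## NULL
dim(x[1, , drop = FALSE])    ## still a 1 x 3 matrix
## [1] 1 3
dim(x[, 2, drop = FALSE])    ## a single column kept as a 2 x 1 matrix
## [1] 2 1
```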

Removing NA Values

x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
x[!bad]
## [1] 1 2 4 5
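A common reason to subset out NA values first is that most summary functions propagate NA. A sketch with the same vector:

```r
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)

mean(x)          ## a single NA makes the result NA
## [1] NA
mean(x[!bad])    ## subset first, then summarize
## [1] 3
sum(bad)         ## number of missing values removed
## [1] 2
```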

What if there are multiple objects and you want the subset with no missing values in any of them? complete.cases() returns TRUE at each position where every object is non-missing.

x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
good
## [1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE
x[good]
## [1] 1 2 4 5
y[good]
## [1] "a" "b" "d" "f"
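complete.cases() also accepts a data frame, flagging the rows with no NA in any column. This sketch combines the same two vectors into a data frame (the column names x and y are chosen here for illustration) and keeps only the complete rows.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")
df <- data.frame(x = x, y = y)

## TRUE for rows where every column is non-missing
good <- complete.cases(df)
df[good, ]
##   x y
## 1 1 a
## 2 2 b
## 4 4 d
## 6 5 f
```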

Vectorized Operations

Many operations in R are vectorized: they act on every element of a vector at once, which makes code more efficient, concise, and easier to read.

x <- 1:4; y <- 6:9
x + y
## [1]  7  9 11 13
x > 2
## [1] FALSE FALSE  TRUE  TRUE
x >= 2
## [1] FALSE  TRUE  TRUE  TRUE
y == 8
## [1] FALSE FALSE  TRUE FALSE
x * y
## [1]  6 14 24 36
x / y
## [1] 0.1666667 0.2857143 0.3750000 0.4444444
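To see what vectorization saves, here is a sketch of the explicit loop that x + y replaces. It produces the same values; no timing claims are made here.

```r
x <- 1:4; y <- 6:9

## Explicit loop: allocate the result, then fill it element by element
z <- numeric(length(x))
for (i in seq_along(x)) {
  z[i] <- x[i] + y[i]
}
z
## [1]  7  9 11 13

## The vectorized form gives the same values in one expression
all.equal(z, x + y)
## [1] TRUE
```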
The same element-wise behavior applies to matrices.

x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)
x
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
y
##      [,1] [,2]
## [1,]   10   10
## [2,]   10   10
x * y  ## element-wise multiplication
##      [,1] [,2]
## [1,]   10   30
## [2,]   20   40
x/y
##      [,1] [,2]
## [1,]  0.1  0.3
## [2,]  0.2  0.4
x %*% y  ## true matrix multiplication
##      [,1] [,2]
## [1,]   40   40
## [2,]   60   60
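Each entry of x %*% y is the dot product of a row of x with a column of y. The sketch below checks one entry by hand and contrasts %*% with %%, which is a different operator (element-wise remainder) that is easy to mistype.

```r
x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)

## Entry (1, 1): row 1 of x times column 1 of y, summed
p <- x %*% y
p[1, 1]
## [1] 40
sum(x[1, ] * y[, 1])
## [1] 40

## %% is element-wise modulo, not matrix multiplication
x %% y
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```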