Chapter 4 Indexing

When working with data objects, most tasks require extracting, altering, or updating parts of them. For this purpose, the elements of data objects can be indexed by position, by name, or by a logical condition. Native data types (vectors, matrices, arrays) are indexed by <*object name*>[<*index*>].

4.1 Vectors

For vectors, indexing allows one to access and extract scalars and sub-vectors.

x <- c(5, 6, 7)
x[1]              # extract 1st entry --> result is a scalar
x[c(1,3)]         # extract 1st & 3rd entry --> result is a vector
x[4]              # a 4th entry does not exist --> result is NA
x[c(TRUE, FALSE, TRUE)] # logical index: TRUE entries are kept; the index should be as long as x (shorter ones are recycled)
y <- x <= 6       # generate a logical index vector from a comparison
z <- which(y)     # indices of the TRUE entries --> z is an integer vector
x[y]              # subvector
x[z]              # subvector
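
The introduction mentions indexing by name and altering parts of an object; the same bracket syntax covers both. The following is a minimal sketch, assuming a vector whose entries carry names (the vector `scores` and its names are made up for illustration):

```r
# Hypothetical example data: a named numeric vector
scores <- c(anna = 5, ben = 6, cleo = 7)

scores["ben"]                 # index by name --> named vector of length 1
scores[c("anna", "cleo")]     # several names --> sub-vector

scores[2] <- 60               # update by position
scores["cleo"] <- 70          # update by name
scores[scores > 50] <- 0      # update by logical condition (resets ben and cleo)
scores                        # anna is unchanged, ben and cleo are 0
```

Any index that works for extraction can also appear on the left-hand side of an assignment.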

Another helpful application of indexing is to sort and rearrange vectors:

x <- sample(x = 1:100, size = 10) # sample 10 values between 1 and 100 randomly
y <- order(x)     # indices of the entries in ascending order (y[1]: index of smallest entry, y[2]: index of 2nd smallest entry, ...)
y[2]              # index of 2nd smallest entry of x
x[y]              # ordered vector
sort(x)           # same result obtained by sort function
z <- rank(x)      # z contains the ranks of the entries of x
which(z == 2)     # index of 2nd smallest entry of x
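
Because order() returns positions rather than values, it can also rearrange a second vector that runs parallel to the first. A small sketch with fixed, made-up data (`city` and `temp` are illustrative names, not part of the example above):

```r
# Hypothetical example data: two parallel vectors
city <- c("Rome", "Oslo", "Bern")
temp <- c(30, 10, 20)

idx <- order(temp)                      # 2 3 1: positions of temp in ascending order
city[idx]                               # cities from coldest to warmest
city[order(temp, decreasing = TRUE)]    # ... and from warmest to coldest
rank(temp)                              # 3 1 2: rank of each entry
order(idx)                              # without ties, order(order(temp)) reproduces the ranks
```

The last line only matches rank() when there are no ties; with ties, rank() averages the tied ranks by default.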

4.2 Matrices

Matrices have two dimensions; thus, two indices are required to access an entry. The indices are separated by a comma, i.e., <*name matrix*>[<*index row*>, <*index column*>].

x <- cbind(1:6, 2:7, 3:8) # column-wise binding
x[1,1]                    # first row, first column (just one element)
x[1,]                     # first row (a vector)
x[,1]                     # first column (a vector)
x[1:2,]                   # first and second row (a matrix)
y <- rep(x = c(TRUE, FALSE, TRUE), times = 2) # logical index vector of length 6
x[y,]                     # rows 1, 3, 4 and 6; same as x[c(1,3,4,6),]
x[y,c(1,3)]               # mix row and column selections
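
In practice, the logical row index is often computed from the matrix itself. The following sketch filters rows by a condition on their entries (the name `keep` and the thresholds are chosen for illustration only):

```r
x <- cbind(1:6, 2:7, 3:8)     # same matrix as above
keep <- x[, 1] > 3            # logical vector: TRUE for rows whose first entry exceeds 3
x[keep, ]                     # rows 4, 5 and 6
x[keep, 2:3]                  # the same rows, restricted to columns 2 and 3
x[x[, 1] > 3 & x[, 3] < 8, ]  # conditions can be combined with & and |
```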

Matrix entries can also be accessed by index tuples, given as the rows of a two-column matrix (row index, column index), i.e., <*name matrix*>[<*index matrix*>]:

x <- matrix(1:16, ncol=4) # square matrix
y <- cbind(1:4, 1:4)      # indices of diagonal elements
x[y]                      # extract diagonal elements of x
diag(x)                   # ... or directly
z <- cbind(c(2,4,1), c(1,3,2))    # indices (row, column) of arbitrary elements
x[z]                      # extract x[2,1], x[4,3] and x[1,2]
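
Such index matrices do not have to be typed by hand. As a sketch, which() with arr.ind = TRUE returns the (row, column) positions of all entries that satisfy a condition, and the result can be used directly as an index (the names `idx` and `hits` are illustrative):

```r
x <- matrix(1:16, ncol = 4)          # same square matrix as above
idx <- which(x > 14, arr.ind = TRUE) # one row per match, columns "row" and "col"
idx
hits <- x[idx]                       # the matching entries: 15 16
x[idx] <- 0L                         # index matrices also work for assignment
```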

When parts of an object are extracted, R automatically tries to simplify the result as far as possible (unless you instruct otherwise).

x <- matrix(1:16, ncol=4) # square matrix
y <- x[1,]                # extract 1st row
dim(y)                    # NULL: y is not a matrix anymore
is.matrix(y)              # check: FALSE
is.numeric(y)             # but it's a numeric vector, thus ...
y[2]                      # ... works
y[,2]                     # ... does not work (error: incorrect number of dimensions)
# but you can instruct R otherwise
y <- x[1, , drop=FALSE]   # extract 1st row as a 1x4 matrix
dim(y)                    # so y has two dimensions
is.numeric(y)             # y is still numeric ...
is.matrix(y)              # ... and a matrix, thus ...
y[2]                      # ... works ...
y[,2]                     # ... works too
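
The automatic simplification matters most inside functions, where the number of selected rows is not known in advance. A minimal sketch, assuming a hypothetical helper `count_rows()` that is not part of base R:

```r
x <- matrix(1:16, ncol = 4)

# Hypothetical helper: count the rows whose first entry is at least `lower`
count_rows <- function(m, lower) {
  sel <- m[m[, 1] >= lower, , drop = FALSE]  # always a matrix, even for 0 or 1 rows
  nrow(sel)
}

count_rows(x, 2)   # 3 rows selected
count_rows(x, 4)   # 1 row selected; without drop = FALSE, nrow() would return NULL
count_rows(x, 5)   # 0 rows selected
```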

4.3 Arrays

Arrays generalize matrices to more than two dimensions. Indexing works analogously to matrices, with one index per dimension:

x <- array(1:12, dim=c(2,2,3))  # 3-dimensional array with 2*2*3 entries
y <- x[1,,]                     # first row of each layer -> 2x3 matrix
y <- x[,2,]                     # second column of each layer -> 2x3 matrix
y <- x[,,3]                     # last layer -> 2x2 matrix
dim(y)                          # y is a matrix
# or you can keep the array type
y <- x[,,3, drop=FALSE]
dim(y)                          # 2 2 1: y is still a 3-dimensional array
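
Single entries of an array can be addressed in several equivalent ways. The sketch below relies on the fact that R stores arrays in column-major order (the first index varies fastest); the positions `i`, `j`, `k` and the name `lin` are chosen for illustration only:

```r
x <- array(1:12, dim = c(2, 2, 3))

x[2, 1, 3]                         # one index per dimension --> single entry (10)

# Assumption for illustration: i, j, k are just example positions
i <- 2; j <- 1; k <- 3
lin <- i + (j - 1) * 2 + (k - 1) * 2 * 2   # position in column-major storage
x[lin]                             # same entry, addressed by a single index

idx <- rbind(c(1, 1, 1), c(2, 2, 3))   # one (row, column, layer) triple per line
x[idx]                             # entries 1 and 12
```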

4.4 Lists, data frames and tibbles

Lists, data frames, and tibbles usually assign names to their components. For data frames and tibbles, these components are always columns. To access a named component, the general syntax is <*object name*>$<*index name*>. To query the component names of an object, use names(). For lists, you can alternatively access a single component with double brackets, i.e., <*list name*>[[<*component index*>]] or <*list name*>[[<*component name*>]]. For multiple components, single brackets are used, i.e., <*list name*>[<*component indices*>] or <*list name*>[<*component names*>].

df1 <- data.frame(one = 1:3, two = c("nest","test","fest") )
names(df1)        # names of both columns
df1$one           # access column one

library(tibble)   # tibble() comes from the tibble package (part of the tidyverse)
tb1 <- tibble(one = 1:3, two = c("nest","test","fest")) 
tb1$two           # works too

l1 <- list(a = 1:3, b = "nest", c = TRUE)
l1$a              # this is equivalent to...
l1[[1]]           # ...this...
l1[["a"]]         # ...and this
l1[c(1,3)]        # for multiple components
l1[c("a","c")]        # for multiple components
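
The difference between single and double brackets is easiest to see from the type of the result. The following sketch uses only base R and repeats the objects defined above; it is meant as an illustration, not an exhaustive list of access patterns:

```r
l1 <- list(a = 1:3, b = "nest", c = TRUE)
class(l1["a"])       # "list": single brackets return a (sub)list
class(l1[["a"]])     # "integer": double brackets return the component itself
l1[["a"]][2]         # indices can be chained: 2nd entry of component a

df1 <- data.frame(one = 1:3, two = c("nest", "test", "fest"))
df1[["two"]]         # column as a vector, same as df1$two
df1["two"]           # a data frame with one column
df1[2, "two"]        # row 2 of column two
df1[df1$one >= 2, ]  # rows selected by a logical condition
```

Since data frames are lists of columns, the list syntax applies to them as well, and the matrix-style [<*index row*>, <*index column*>] form works in addition.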