Introduction
- In contrast to a vector, in which all elements must be of the same mode, R’s list structure can combine objects of different types.
- For those familiar with Python, an R list is similar to a Python dictionary or, for that matter, a Perl hash. C programmers may find it similar to a C struct. The list plays a central role in R, forming the basis for data frames, object-oriented programming, and so on.
Creating Lists
- A list is a vector, but not an ordinary atomic one; it is a recursive vector, meaning its components can themselves be vectors or lists.
- For our first look at lists, let’s consider an employee database. For each employee, we wish to store the name, salary, and a Boolean indicating union membership. Since we have three different modes here—character, numeric, and logical—it’s a perfect place for using lists. Our entire database might then be a list of lists, or some other kind of list such as a data frame, though we won’t pursue that here.
- We could create a list to represent our employee, Joe, this way:
j <- list(name="Joe", salary=55000, union=TRUE)
- We could print out j, either in full or by component:
j
## $name
## [1] "Joe"
##
## $salary
## [1] 55000
##
## $union
## [1] TRUE
Actually, the component names—called tags in the R literature—such as salary are optional. We could alternatively do this:
jalt <- list("Joe", 55000, T)
jalt
## [[1]]
## [1] "Joe"
##
## [[2]]
## [1] 55000
##
## [[3]]
## [1] TRUE
- However, it is generally considered clearer and less error-prone to use names instead of numeric indices.
- With the $ operator, names of list components can be abbreviated to whatever extent is possible without causing ambiguity:
j$sal
## [1] 55000
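The abbreviation rule can be sketched as follows. The list j is recreated so the block runs on its own; the list k and the variable names are hypothetical, introduced only for illustration. Partial matching applies to $, while double brackets match names exactly unless exact=FALSE is passed.

```r
j <- list(name = "Joe", salary = 55000, union = TRUE)

sal_dollar <- j$sal                       # "sal" partially matches "salary"
sal_exact  <- j[["sal"]]                  # [[ ]] matches exactly by default, so NULL
sal_loose  <- j[["sal", exact = FALSE]]   # opt in to partial matching

# An ambiguous abbreviation matches nothing and yields NULL.
k <- list(salary = 55000, sales = 12)     # hypothetical list
ambiguous <- k$sal
```

Because an ambiguous or misspelled abbreviation silently yields NULL rather than an error, spelling names out in full is usually the safer habit in scripts.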
Since lists are vectors, they can be created via vector():
z <- vector(mode="list")
z[["abc"]] <- 3
z
## $abc
## [1] 3
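When the number of components is known in advance, vector() can also preallocate the list. A minimal sketch, with the hypothetical name results standing in for whatever is being collected:

```r
# Preallocate a list with three empty (NULL) slots.
results <- vector(mode = "list", length = 3)
empty_at_start <- is.null(results[[1]])

# Fill each slot in turn; components may be of any type.
for (i in seq_along(results)) {
  results[[i]] <- i^2
}
```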
General List Operations
- Now that you’ve seen a simple example of creating a list, let’s look at how to access and work with lists.
List Indexing
You can access a list component in several different ways:
j$salary
## [1] 55000
j[["salary"]]
## [1] 55000
j[[2]]
## [1] 55000
We can refer to list components by their numerical indices, treating the list as a vector. However, note that in this case, we use double brackets instead of single ones.
So, there are three ways to access an individual component c of a list lst and return it in the data type of c:
- lst$c
- lst[["c"]]
- lst[[i]], where i is the index of c within lst
Each of these is useful in different contexts, as you will see in subsequent examples. But note the qualifying phrase, “return it in the data type of c.” An alternative to the second and third techniques listed is to use single brackets rather than double brackets:
- lst["c"]
- lst[i], where i is the index of c within lst
Both single-bracket and double-bracket indexing access list elements in vector-index fashion. But there is an important difference from ordinary (atomic) vector indexing. If single brackets [ ] are used, the result is another list—a sublist of the original. For instance, continuing the preceding example, we have this:
j[1:2]
## $name
## [1] "Joe"
##
## $salary
## [1] 55000
j[2]
## $salary
## [1] 55000
str(j[2])
## List of 1
## $ salary: num 55000
- The subsetting operation returned another list consisting of the first two components of the original list j. Note that the word returned makes sense here, since index brackets are functions.
- This is similar to other cases you’ve seen for operators that do not at first appear to be functions, such as +.
- By contrast, you can use double brackets [[ ]] for referencing only a single component, with the result having the type of that component.
j[[1:2]] # Error in j[[1:2]] : subscript out of bounds
j[[2]]
## [1] 55000
class(j[[2]])
## [1] "numeric"
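The practical consequence of the single versus double bracket distinction shows up as soon as you compute with the result. A small sketch, assuming the same employee list j (recreated here); the variable names raise_ok and raise_bad are hypothetical:

```r
j <- list(name = "Joe", salary = 55000, union = TRUE)

# Double brackets return the number itself, so arithmetic works.
raise_ok <- j[["salary"]] * 1.05

# Single brackets return a one-component list, so arithmetic fails.
# tryCatch() captures the error message instead of stopping the script.
raise_bad <- tryCatch(j["salary"] * 1.05,
                      error = function(e) conditionMessage(e))

sub_is_list <- is.list(j["salary"])       # TRUE: still a list
val_is_num  <- is.numeric(j[["salary"]])  # TRUE: the component itself
```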
Adding and Deleting List Elements
- The operations of adding and deleting list elements arise in a surprising number of contexts. This is especially true for data structures in which lists form the foundation, such as data frames and R classes.
- New components can be added after a list is created.
z <- list(a="abc",b=12)
z
## $a
## [1] "abc"
##
## $b
## [1] 12
z$c <- "sailing" # add a c component
# did c really get added?
z
## $a
## [1] "abc"
##
## $b
## [1] 12
##
## $c
## [1] "sailing"
- Adding components can also be done via a vector index:
z[[4]] <- 28
z[5:7] <- c(FALSE,TRUE,TRUE)
z
## $a
## [1] "abc"
##
## $b
## [1] 12
##
## $c
## [1] "sailing"
##
## [[4]]
## [1] 28
##
## [[5]]
## [1] FALSE
##
## [[6]]
## [1] TRUE
##
## [[7]]
## [1] TRUE
- You can delete a list component by setting it to NULL.
z$b <- NULL
z
## $a
## [1] "abc"
##
## $c
## [1] "sailing"
##
## [[3]]
## [1] 28
##
## [[4]]
## [1] FALSE
##
## [[5]]
## [1] TRUE
##
## [[6]]
## [1] TRUE
- Note that upon deleting z$b, the indices of the elements after it moved up by 1. For instance, the former z[[4]] became z[[3]].
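Since assigning NULL deletes a component, storing an actual NULL value takes a different form. The following sketch, using a shortened version of z, shows one way to do it: assign a one-element list through single brackets.

```r
z <- list(a = "abc", b = 12, c = "sailing")

z$b <- NULL                  # deletes component b
len_after_delete <- length(z)

# Assigning list(NULL) via single brackets keeps a component named b
# whose value is NULL, instead of deleting it.
z["b"] <- list(NULL)
len_after_store <- length(z)
b_is_null <- is.null(z[["b"]])
```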
- You can also concatenate lists.
c(list("Joe", 55000, T),list(5))
## [[1]]
## [1] "Joe"
##
## [[2]]
## [1] 55000
##
## [[3]]
## [1] TRUE
##
## [[4]]
## [1] 5
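It may help to contrast concatenation with nesting. In this sketch (all variable names are hypothetical), c() joins the components of both lists into one list, list() wraps the two lists as components, and append() behaves like c() but lets you choose the insertion point.

```r
first  <- list("Joe", 55000, TRUE)
second <- list(5)

flat   <- c(first, second)      # one list with four components
nested <- list(first, second)   # a list whose two components are lists

# after = 1 inserts the new component right after the first one.
middle <- append(first, list("new"), after = 1)
```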
Getting the Size of a List
- Since a list is a vector, you can obtain the number of components in a list via length().
length(j)
## [1] 3
Accessing List Components and Values
- If the components in a list do have tags, as is the case with name, salary, and union for j in the previous section, you can obtain them via names():
names(j)
## [1] "name" "salary" "union"
- To obtain the values, use unlist():
unlist(j)
##  name  salary  union
## "Joe" "55000" "TRUE"
class(unlist(j))
## [1] "character"
- The return value of unlist() is a vector—in this case, a vector of character strings. Note that the element names in this vector come from the components in the original list. On the other hand, if we were to start with numbers, we would get numbers.
z <- list(a=5,b=12,c=13)
y <- unlist(z)
class(y)
## [1] "numeric"
y
##  a  b  c
##  5 12 13
- So the output of unlist() in this case was a numeric vector. What about a mixed case?
w <- list(a=5,b="xyz")
wu <- unlist(w)
class(wu)
## [1] "character"
wu
##   a     b
## "5" "xyz"
Here, R chose the least common denominator: character strings. This sounds like some kind of precedence structure, and it is. As R’s help for unlist() states:
- Where possible the list components are coerced to a common mode during the unlisting, and so the result often ends up as a character vector. Vectors will be coerced to the highest type of the components in the hierarchy NULL < raw < logical < integer < real < complex < character < list < expression: pairlists are treated as lists.
But there is something else to deal with here. Though wu is a vector and not a list, R did give each of the elements a name. We can remove them by setting their names to NULL, as you saw in previous lessons:
names(wu) <- NULL
wu
## [1] "5" "xyz"
- We can also remove the elements’ names directly with unname(), as follows:
wun <- unname(wu)
wun
## [1] "5" "xyz"
- This also has the advantage of not destroying the names in wu, in case they are needed later. If they will not be needed later, we could simply assign back to wu instead of to wun in the preceding statement.
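If the names are never wanted in the first place, unlist() can skip them via its use.names argument. A short sketch with the same mixed list w (the name wv is hypothetical):

```r
w <- list(a = 5, b = "xyz")

# use.names = FALSE drops the component names during unlisting.
wv <- unlist(w, use.names = FALSE)

no_names <- is.null(names(wv))   # TRUE: the result carries no names
```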
Applying Functions to Lists
- Two functions are handy for applying functions to lists: lapply and sapply.
Using the lapply() and sapply() Functions
- The function lapply() (for list apply) works like the matrix apply() function, calling the specified function on each component of a list (or vector coerced to a list) and returning another list. Here’s an example:
lapply(list(1:3,25:29),median)
## [[1]]
## [1] 2
##
## [[2]]
## [1] 27
- R applied median() to 1:3 and to 25:29, returning a list consisting of 2 and 27. In some cases, such as the example here, the list returned by lapply() could be simplified to a vector or matrix. This is exactly what sapply() (for simplified [l]apply) does.
sapply(list(1:3,25:29),median)
## [1] 2 27
- You saw an example of matrix output in previous sections. There, we applied a vectorized, vector-valued function (a function whose return value is a vector, each of whose components is vectorized) to a vector input. Using sapply(), rather than applying the function directly, gave us the desired matrix form in the output.
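As a reminder of what matrix output from sapply() looks like, here is a minimal sketch. The function z12 is a hypothetical stand-in for any function that returns a fixed-length vector; it is not necessarily the one used in the earlier section.

```r
# Hypothetical vector-valued function: returns a value and its square.
z12 <- function(z) c(z, z^2)

as_list   <- lapply(1:4, z12)   # a list of four length-2 vectors
as_matrix <- sapply(1:4, z12)   # simplified to a 2 x 4 matrix

# Each column of the matrix holds the result for one input value.
third_col <- as_matrix[, 3]
```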
Example: Back to the Abalone Data
Let’s use the lapply() function in our abalone gender example. Recall that at one point in that example, we wished to know the indices of the observations that were male, female, and infant. For an easy demonstration, let’s use the same test case: a vector of genders.
g <- c("M","F","F","I","M","M","F")
- A more compact way of accomplishing our goal is as follows:
lapply(c("M","F","I"),function(gender) which(g==gender))
## [[1]]
## [1] 1 5 6
##
## [[2]]
## [1] 2 3 7
##
## [[3]]
## [1] 4
- The lapply() function expects its first argument to be a list. Here it was a vector, but lapply() will coerce that vector to a list form. Also, lapply() expects its second argument to be a function. This could be the name of a function, as you saw before, or the actual code, as we have here. Then lapply() calls that anonymous function on "M", then on "F", and then on "I". In that first case, the function calculates which(g=="M"), giving us the vector of indices in g of the males. After determining the indices for the females and infants, lapply() will return the three vectors in a list.
- Note that even though the object of our main attention is the vector g of genders, it is not the first argument in the lapply() call in the example. Instead, that argument is an innocuous-looking vector of the three possible gender encodings. By contrast, g is mentioned only briefly in the function, as the second actual argument. This is a common situation in R.
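The list returned in the example is unnamed, so the groups must be picked out by position. One possible refinement, sketched here with hypothetical variable names, is to attach the gender codes as names; base R's split() offers a similar result in a single call.

```r
g <- c("M", "F", "F", "I", "M", "M", "F")
codes <- c("M", "F", "I")

# Same lapply() call as above, with the codes attached as names.
idx <- lapply(codes, function(gender) which(g == gender))
names(idx) <- codes
females <- idx$F

# Alternative: split the positions 1..length(g) by the values of g.
# The components come back in sorted order of the codes: F, I, M.
idx2 <- split(seq_along(g), g)
```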
Recursive Lists
Lists can be recursive, meaning that you can have lists within lists. Here’s an example:
b <- list(u = 5, v = 12)
c <- list(w = 13)
a <- list(b,c)
a
## [[1]]
## [[1]]$u
## [1] 5
##
## [[1]]$v
## [1] 12
##
##
## [[2]]
## [[2]]$w
## [1] 13
length(a)
## [1] 2
- This code makes a into a two-component list, with each component itself also being a list. The concatenate function c() has an optional argument recursive, which controls whether flattening occurs when recursive lists are combined.
c(list(a=1,b=2,c=list(d=5,e=9)))
## $a
## [1] 1
##
## $b
## [1] 2
##
## $c
## $c$d
## [1] 5
##
## $c$e
## [1] 9
c(list(a=1,b=2,c=list(d=5,e=9)),recursive=T)
##   a   b c.d c.e
##   1   2   5   9
- In the first case, we accepted the default value of recursive, which is FALSE, and obtained a recursive list, with the c component of the main list itself being another list. In the second call, with recursive set to TRUE, c() descended into the nested list and returned a single flat vector (here an atomic numeric vector, not a list); only the names look recursive. (It’s odd that setting recursive to TRUE gives a nonrecursive result.)
- Recall that our first example of lists consisted of an employee database. I mentioned that since each employee was represented as a list, the entire database would be a list of lists. That is a concrete example of recursive lists.
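A list-of-lists database along these lines might be sketched as follows. Only Joe's record comes from the text; the second record and all variable names are hypothetical.

```r
# Each employee is a list; the database is a list of those lists.
db <- list(
  list(name = "Joe", salary = 55000, union = TRUE),
  list(name = "Ana", salary = 62000, union = FALSE)   # hypothetical record
)

# Reach into a nested component: second employee's salary.
second_salary <- db[[2]]$salary

# Pull one field out of every record.
salaries <- sapply(db, function(e) e$salary)

# Names of the union members only.
members <- Filter(function(e) e$union, db)
union_names <- sapply(members, function(e) e$name)
```

In practice a data frame is often the more convenient structure for records that all share the same fields, but the nested form shows how recursive lists are indexed.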