Lists
Julius Schmid
In contrast to a vector, in which all elements must be of the same
mode, R’s list structure can combine objects of different types.
Creating Lists
Technically, a list is a vector. Ordinary vectors—those of the type
we’ve been using so far in this book—are termed atomic vectors, since
their components cannot be broken down into smaller components. In
contrast, lists are referred to as recursive vectors.
For our first look at lists, let’s consider an employee database. For
each employee, we wish to store the name, salary, and a Boolean
indicating union membership. Since we have three different modes
here (character, numeric, and logical), it’s a perfect place for using
lists. Our entire database might then be a list of lists, or some other
kind of list such as a data frame, though we won’t pursue that here. We
could create a list to represent our employee, Marco, this way:
j <- list(name="Marco", salary=72000, union=TRUE)
We can print j either in full or by component. To print it in full,
just enter the list name j:
j
$name
[1] "Marco"
$salary
[1] 72000
$union
[1] TRUE
Now we return the list entries individually. We access them the same
way we would access a column of a data frame, with
list_name$entry.
- Return the name entry:
j$name
[1] "Marco"
- Return the salary entry:
j$salary
[1] 72000
- Return the union entry:
j$union
[1] TRUE
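A list does not have to be created in one call. As a minimal sketch, the same record can be built by starting from an empty list and assigning components one at a time; the variable name emp is a hypothetical chosen for illustration.

```r
# Start with an empty list and add components one at a time.
emp <- vector(mode = "list")   # an empty list, equivalent to list()
emp[["name"]] <- "Marco"
emp[["salary"]] <- 72000
emp[["union"]] <- TRUE

# str() prints a compact summary of the components and their types.
str(emp)
```

This form can be convenient when the components are computed at different points in a script.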
General List Operations
Now that you’ve seen a simple example of creating a list, let’s look
at how to access and work with lists.
List Indexing
You can access a list component in several different ways. The first
method is the one we have seen above.
j$salary
[1] 72000
The second method is the syntax list_name[["entry"]]:
j[["salary"]]
[1] 72000
The third way is to use the index of the entry that is to be
returned. In this case, salary is the second entry, hence we return
the entry with index 2:
j[[2]]
[1] 72000
We can refer to list components by their numerical indices, treating
the list as a vector. However, note that in this case, we use double
brackets instead of single ones.
In summary, there are three ways to access an individual component
of a list and return it in its own data type: list_name$entry,
list_name[["entry"]], and list_name[[index]].
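The difference between double and single brackets can be checked directly. The sketch below reuses the list j from above; the names sub and val are hypothetical. Single brackets return a sublist, while double brackets return the component itself.

```r
j <- list(name = "Marco", salary = 72000, union = TRUE)

sub <- j[2]      # single brackets: a list of length 1
val <- j[[2]]    # double brackets: the component itself

class(sub)       # "list"
class(val)       # "numeric"

# Single brackets can also select several components at once.
j[c("name", "union")]
```

Use single brackets when a list is wanted as the result, and double brackets or $ when the value itself is wanted.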
Adding and Deleting List Elements
The operations of adding and deleting list elements arise in a
surprising number of contexts. This is especially true for data
structures in which lists form the foundation, such as data frames and R
classes.
New components can be added after a list is created.
z <- list(first="character",second=5)
z
$first
[1] "character"
$second
[1] 5
We can add an entry by assigning a value to a component name that
is not yet in use:
z$third <- "soccer"
z
$first
[1] "character"
$second
[1] 5
$third
[1] "soccer"
The output confirms that a third component, named third, was added
to the list.
Adding components can also be done via a vector index:
z[[4]] <- 32
z[5:7] <- c(FALSE,TRUE,TRUE)
z
$first
[1] "character"
$second
[1] 5
$third
[1] "soccer"
[[4]]
[1] 32
[[5]]
[1] FALSE
[[6]]
[1] TRUE
[[7]]
[1] TRUE
You can also delete a list component by setting it to NULL.
z$second <- NULL
z
$first
[1] "character"
$third
[1] "soccer"
[[3]]
[1] 32
[[4]]
[1] FALSE
[[5]]
[1] TRUE
[[6]]
[1] TRUE
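Note that after a deletion, the components that followed the deleted one move up by one position. A small sketch, using a shortened version of z and a hypothetical name z2:

```r
z <- list(first = "character", second = 5, third = "soccer", 32)
z[[4]]              # 32 is the fourth component
z$second <- NULL    # delete the second component
length(z)           # the list now has 3 components
z[[3]]              # 32 has moved from position 4 to position 3

# Negative indices drop several components at once and return a new list.
z2 <- z[-c(1, 2)]
```

Code that refers to components by numeric index should therefore be rechecked after a deletion; referring to components by name avoids the problem.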
You can also concatenate lists.
c(list("Lucas", 83000, TRUE),list(5))
[[1]]
[1] "Lucas"
[[2]]
[1] 83000
[[3]]
[1] TRUE
[[4]]
[1] 5
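Concatenating with c() is different from wrapping lists in list(). The following sketch contrasts the two; the names x, y, flat, and nested are hypothetical.

```r
x <- list("Lucas", 83000)
y <- list(5)

flat <- c(x, y)         # one list with three components
nested <- list(x, y)    # a list whose two components are themselves lists

length(flat)            # 3
length(nested)          # 2
```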
Getting the Size of a List
Since a list is a vector, you can obtain the number of components in
a list via length().
length(j)
[1] 3
length(z)
[1] 6
Accessing List Components and Values
If the components of a list have tags, as is the case with name,
salary, and union for j, you can obtain them via names():
names(j)
[1] "name" "salary" "union"
To obtain the values, use unlist():
ulj <- unlist(j)
ulj
name salary union
"Marco" "72000" "TRUE"
The return value of unlist() is a vector, not a list. Applying
class() shows that here it is a vector of character strings:
class(ulj)
[1] "character"
On the other hand, if all the components are numbers, unlist()
returns a numeric vector:
z <- list(a=8,b=16,c=5)
y <- unlist(z)
y
 a  b  c
 8 16  5
Applying class() to the unlisted y confirms that it is numeric,
whereas the list z itself has class list:
class(y)
[1] "numeric"
class(z)
[1] "list"
So the output of unlist() in this case was a numeric vector. What
about a mixed case?
w <- list(a=52,b="random character")
wu <- unlist(w)
class(wu)
[1] "character"
Since the list contains one character component ("random
character"), unlist() converts all components to character strings.
Let us print out the unlisted wu:
wu
a b
"52" "random character"
Both 52 and random character are interpreted as character
strings.
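When the components have different modes, unlist() converts them to the most general mode present: logical values give way to numeric, and numeric values give way to character. A short sketch of that behavior (v1, v2, and v3 are hypothetical names):

```r
# Logical and numeric mixed: TRUE becomes 1.
v1 <- unlist(list(a = TRUE, b = 2.5))
class(v1)    # "numeric"

# Numeric and character mixed: everything becomes character.
v2 <- unlist(list(a = 52, b = "random character"))
class(v2)    # "character"

# use.names = FALSE drops the names from the result.
v3 <- unlist(list(a = 8, b = 16, c = 5), use.names = FALSE)
```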
Applying Functions to Lists
Two functions are handy for applying functions to lists: lapply() and
sapply().
Using the lapply() and sapply() Functions
The function lapply() (for list apply) works like the matrix apply()
function, calling the specified function on each component of a list (or
vector coerced to a list) and returning another list. Here’s an
example:
lapply(list(5:7,25:35),median)
[[1]]
[1] 6
[[2]]
[1] 30
R applied median() to 5:7 and to 25:35, returning a list consisting
of 6 and 30. In some cases, such as the example here, the list returned
by lapply() could be simplified to a vector or matrix. This is exactly
what sapply() (for simplified [l]apply) does.
sapply(list(5:7,25:35),median)
[1] 6 30
Using sapply() rather than lapply() gave us a numeric vector
instead of a list.
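What sapply() returns depends on the shape of the individual results. The sketch below, with hypothetical names x, m1, m2, and m3, shows the three typical outcomes: a vector, a matrix, and a list that could not be simplified.

```r
x <- list(a = 5:7, b = 25:35)

# One value per component: sapply() returns a named vector.
m1 <- sapply(x, median)

# Two values per component: sapply() returns a matrix,
# with one column per list component.
m2 <- sapply(x, range)

# Results of differing lengths cannot be simplified, so a list comes back.
m3 <- sapply(x, function(v) v[v > 6])
```

Because the return type can vary with the data, lapply() is the safer choice when later code expects a list in every case.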
Extended Example: Back to the Abalone Data
Let’s use the lapply() function in our abalone gender example. Recall
that at one point in that example, we wished to know the indices of the
observations that were male, female, and infant. For an easy
demonstration, let’s use the same test case: a vector of genders.
g <- c("M","I","F","I","M","F","F")
lapply(c("F","M","I"),function(gender) which(g==gender))
[[1]]
[1] 3 6 7
[[2]]
[1] 1 5
[[3]]
[1] 2 4
The lapply() function expects its first argument to be a list. Here
it was a vector, but lapply() will coerce that vector to a list form.
Also, lapply() expects its second argument to be a function. This could
be the name of a function, as you saw before, or the actual code, as we
have here. This is an anonymous function, which you’ll learn more about
in Section 7.13.
Then lapply() calls that anonymous function on "F", then on "M", and
then on "I". In that first case, the function calculates which(g=="F"),
giving us the vector of indices in g of the females. After determining
the indices for the males and infants, lapply() will return the three
vectors in a list.
Note that even though the object of our main attention is the vector
g of genders, it is not the first argument in the lapply() call in the
example. Instead, that argument is an innocuous-looking vector of the
three possible gender encodings. By contrast, g appears only inside
the anonymous function passed as the second argument. This is a common
situation in R. An even better way to do this will be presented in
Section 6.2.2.
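The list returned above has no tags, so the reader must remember which position belongs to which gender code. One small refinement, sketched here with the hypothetical names codes and idx, is to name the components after the codes:

```r
g <- c("M", "I", "F", "I", "M", "F", "F")
codes <- c("F", "M", "I")

idx <- lapply(codes, function(gender) which(g == gender))
names(idx) <- codes    # label each component with its gender code

idx$F                  # 3 6 7
idx[["M"]]             # 1 5
```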
Recursive Lists
Lists can be recursive, meaning that you can have lists within lists.
Here’s an example:
b <- list(u = 5, v = 12)
c <- list(w = 13)
a <- list(b,c)
The list b has two entries, the list c has one entry. a then combines
lists b and c in a new list of lists. Let us see what this looks like
when we return it:
a
[[1]]
[[1]]$u
[1] 5
[[1]]$v
[1] 12
[[2]]
[[2]]$w
[1] 13
The two lists b ([[1]]) and c ([[2]]) are returned, where b has two
components. Let us print the length of a:
length(a)
[1] 2
The length of a is 2 because the code makes a into a two-component
list, with each component itself also being a list. The concatenate
function c() has an optional argument recursive, which controls whether
flattening occurs when recursive lists are combined.
Now let us create a list p with 3 entries, where one component is a
list itself:
p <- c(list(a=4,b=1,c=list(d=25,e=7)))
p
$a
[1] 4
$b
[1] 1
$c
$c$d
[1] 25
$c$e
[1] 7
The length of p is 3, since p has three entries:
length(p)
[1] 3
However, if we set the optional parameter recursive to TRUE, the
result is flattened into a vector with 4 elements.
p2 <- c(list(a=4,b=1,c=list(d=25,e=7)),recursive=TRUE)
p2
a b c.d c.e
4 1 25 7
We confirm this by applying the length() function again:
length(p2)
[1] 4
In the first case, we accepted the default value of recursive, which
is FALSE, and obtained a recursive list, with the c component of the
main list itself being another list. In the second call, with recursive
set to TRUE, we got a single flat vector as a result; only the names
look recursive. (It’s odd that setting recursive to TRUE gives a
nonrecursive result.) Recall that our first example of lists consisted
of an employee database. We mentioned that since each employee was
represented as a list, the entire database would be a list of lists.
That is a concrete example of recursive lists.
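Components of a recursive list are reached by chaining the access operators. The sketch below first does this for p and then for a hypothetical two-employee database db; the second record and the names db and salaries are made up for illustration.

```r
p <- list(a = 4, b = 1, c = list(d = 25, e = 7))
p$c$d                # 25
p[["c"]][["e"]]      # 7

# A hypothetical employee database as a list of lists.
db <- list(
  list(name = "Marco", salary = 72000, union = TRUE),
  list(name = "Lucas", salary = 83000, union = TRUE)
)
db[[2]]$name                                    # "Lucas"

# Pull one field out of every record.
salaries <- sapply(db, function(e) e$salary)    # 72000 83000
```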
