A matrix is a vector with two additional attributes: the number of rows and the number of columns. Since matrices are vectors, they also have modes, such as numeric and character. (On the other hand, vectors are not onecolumn or one-row matrices.)
Matrices are special cases of a more general R type of object: arrays.Arrays can be multidimensional. For example, a three-dimensional array would consist of rows, columns, and layers, not just rows and columns as in the matrix case. Most of this lesson will concern matrices, but we will briefly discuss higher-dimensional arrays in the final section.
Much of R’s power comes from the various operations you can perform on matrices. We’ll cover these operations in this chapter, especially those analogous to vector subsetting and vectorization.
Matrix row and column subscripts begin with 1. For example, the upper-left corner of the matrix a is denoted a[1,1]. The internal storage of a matrix is in column-major order, meaning that first all of column 1 is stored, then all of column 2, and so on.
One way to create a matrix is by using the matrix() function:
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
Here, we concatenate what we intend as the first column, the numbers 1 and 2, with what we intend as the second column, 3 and 4. So, our data is (1,2,3,4).
Next, we specify the number of rows and columns. The fact that R uses column-major order then determines where these four numbers are put within the matrix.
Since we specified the matrix entries in the preceding example, and there were four of them, we did not need to specify both ncol and nrow; just nrow or ncol would have been enough. Having four elements in all, in two rows, implies two columns:
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
## [1] 3 4
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
Note that we do need to warn R ahead of time that y will be a matrix and give the number of rows and columns.
You can set the byrow argument in matrix() to true to indicate that the data is coming in row-major order. Here’s an example of using byrow:
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
Note that the matrix is still stored in column-major order. The byrow argument enabled only our input to come in row-major form.
We’ll look at some common operations performed with matrices. These include performing linear algebra operations, matrix indexing, and matrix filtering.
You can perform various linear algebra operations on matrices, such as matrix multiplication, matrix scalar multiplication, and matrix addition.
## [,1] [,2]
## [1,] 7 15
## [2,] 10 22
## [,1] [,2]
## [1,] 3 9
## [2,] 6 12
## [,1] [,2]
## [1,] 2 6
## [2,] 4 8
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 2 1 0
## [3,] 3 0 1
## [4,] 4 0 0
## [,1] [,2]
## [1,] 1 1
## [2,] 1 0
## [3,] 0 1
## [4,] 0 0
## [,1] [,2]
## [1,] 21 22
## [2,] 31 32
## [1] 22 32
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
## [,1] [,2]
## [1,] 1 8
## [2,] 2 5
## [3,] 1 12
## [,1] [,2]
## [1,] 4 2
## [2,] 5 3
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA 4 2
## [3,] NA 5 3
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
## [,1] [,2]
## [1,] 1 4
## [2,] 3 6
Image files are inherently matrices, since the pixels are arranged in rows and columns. If we have a grayscale image, for each pixel, we store the intensity— the brightness–of the image at that pixel. So, the intensity of a pixel in, say, row 28 and column 88 of the image is stored in row 28, column 88 of the matrix. For a color image, three matrices are stored, with intensities for red, green, and blue components, but we’ll stick to grayscale here.
For our example, let’s consider an image of the Mount Rushmore National Memorial in the United States(mtrush1.pgm). Make sure to download this image and save it in your working directory so that the coding will work.Let’s read it in, using the pixmap library.
## Pixmap image
## Type : pixmapGrey
## Size : 194x259
## Resolution : 1x1
## Bounding box : 0 0 259 194
We read in the file named mtrush1.pgm,returning an object of class pixmap, then we plot it.
Now, let’s see what this class consists of:
## Formal class 'pixmapGrey' [package "pixmap"] with 6 slots
## ..@ grey : num [1:194, 1:259] 0.278 0.263 0.239 0.212 0.192 ...
## ..@ channels: chr "grey"
## ..@ size : int [1:2] 194 259
## ..@ cellres : num [1:2] 1 1
## ..@ bbox : num [1:4] 0 0 259 194
## ..@ bbcent : logi FALSE
## [1] 0.7960784
## [,1] [,2]
## [1,] 1 2
## [2,] 2 3
## [3,] 3 4
## [,1] [,2]
## [1,] 2 3
## [2,] 3 4
## [1] FALSE TRUE TRUE
## [,1] [,2]
## [1,] 2 3
## [2,] 3 4
## [,1] [,2]
## [1,] 1 2
## [2,] 2 3
## [3,] 3 4
## [,1] [,2]
## [1,] 2 3
## [2,] 3 4
For performance purposes, it’s worth noting again that the computation of j here is a completely vectorized operation, since all of the following are true:
Also note that even though j was defined in terms of x and then was used to extract from x, it did not need to be that way. The filtering criterion can be based on a variable separate from the one to which the filtering will be applied. Here’s an example with the same x as above:
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
Here, the expression z %% 2 == 1 tests each element of z for being an odd number, thus yielding (TRUE,FALSE,TRUE). As a result, we extracted the first and third rows of x. Here is another example:
## [1] 3 6
We’re using the same principle here, but with a slightly more complex set of conditions for row extraction. (Column extraction, or more generally, extraction of any submatrix, is similar.)
First, the expression m[,1] > 1 compares each element of the first column of m to 1 and returns (FALSE,TRUE,TRUE). The second expression, m[,2] > 5, similarly returns (FALSE,FALSE,TRUE).
We then take the logical AND of (FALSE,TRUE,TRUE) and (FALSE,FALSE,TRUE), yielding (FALSE,FALSE,TRUE). Using the latter in the row indices of m, we get the third row of m.
Note that we needed to use &, the vector Boolean AND operator, rather than the scalar one that we would use in an if statement, &&.
The alert reader may have noticed an anomaly in the preceding example. Our filtering should have given us a submatrix of size 1 by 2, but instead it gave us a two-element vector. The elements were correct, but the data type was not. This would cause trouble if we were to then input it to some other matrix function. The solution is to use the drop argument, which tells R to retain the two-dimensional nature of our data.
Since matrices are vectors, you can also apply vector operations to them. Here’s an example:
## [1] 1 3 5 6
R informed us here that, from a vector-indexing point of view, elements 1, 3, 5, and 6 of m are larger than 2. For example, element 5 is the element in row 2, column 2 of m, which we see has the value 10, which is indeed greater than 2.
This example demonstrates R’s row() and col() functions, whose arguments are matrices. For example, for a matrix a, row(a[2,8]) will return the row number of that element of a, which is 2. Well, we knew row(a[2,8]) is in row 2, didn’t we? So why would this function be useful?
Let’s consider an example. When writing simulation code for multivariate normal distributions—for instance, using mvrnorm() from the MASS library—we need to specify a covariance matrix. The key point for our purposes here is that the matrix is symmetric; for example, the element in row 1, column 2 is equal to the element in row 2, column 1.
Suppose that we are working with an n-variate normal distribution. Our matrix will have n rows and n columns, and we wish each of the n variables to have variance 1, with correlation rho between pairs of variables. For n=3 and rho = 0.2, for example, the desired matrix is as follows:
Here is code to generate this kind of matrix:
makecov <- function(rho,n) {
m <- matrix(nrow=n,ncol=n)
m <- ifelse(row(m) == col(m),1,rho)
return(m)
}## [,1] [,2]
## [1,] 1 1
## [2,] 2 2
## [3,] 3 3
This is the general form of apply for matrices:
where the arguments are as follows:
For example, here we apply the R function mean() to each column of a matrix z:
## [1] 2 5
-In this case, we could have used the colMeans() function, but this provides a simple example of using apply(). A function you write yourself is just as legitimate for use in apply() as any R built-in function such as mean(). Here’s an example using our own function f:
## [,1] [,2] [,3]
## [1,] 0.5 1.000 1.50
## [2,] 0.5 0.625 0.75
Our f() function divides a two-element vector by the vector (2,8). (Recycling would be used if x had a length longer than 2.) The call to apply() asks R to call f() on each of the rows of z. The first such row is (1,4), so in the call to f(), the actual argument corresponding to the formal argument x is (1,4). Thus, R computes the value of (1,4)/(2,8), which in R’s element-wise vector arithmetic is (0.5,0.5). The computations for the other two rows are similar.
You may have been surprised that the size of the result here is 2 by 3 rather than 3 by 2. That first computation, (0.5,0.5), ends up at the first column in the output of apply(), not the first row. But this is the behavior of apply(). If the function to be applied returns a vector of k components, then the result of apply() will have k rows. You can use the matrix transpose function t() to change it if necessary, as follows:
## [,1] [,2]
## [1,] 0.5 0.500
## [2,] 1.0 0.625
## [3,] 1.5 0.750
If the function returns a scalar (which we know is just a one-element vector), the final result will be a vector, not a matrix. As you can see, the function to be applied needs to take at least one argument. The formal argument here will correspond to an actual argument of one row or column in the matrix, as described previously. In some cases, you will need additional arguments for this function, which you can place following the function name in your call to apply().
For instance, suppose we have a matrix of 1s and 0s and want to create a vector as follows: For each row of the matrix, the corresponding element of the vector will be either 1 or 0, depending on whether the majority of the first d elements in that row is 1 or 0. Here, d will be a parameter that we may wish to vary. We could do this:
copymaj <- function(rw,d) {
maj <- sum(rw[1:d]) / d
return(if(maj > 0.5) 1 else 0)
}
x<-matrix(c(1,1,1,0,0,1,0,1,1,1,0,1,1,1,1,1,0,0,1,0),nrow=4,ncol=5)
apply(x,1,copymaj,3)## [1] 1 1 0 1
## [1] 0 1 0 0
findols <- function(x) {
findol <- function(xrow) {
mdn <- median(xrow)
devs <- abs(xrow-mdn)
return(which.max(devs))
}
return(apply(x,1,findol))
}Our call will be as follows:
Recall how we reassign vectors to change their size:
## [1] 12 5 13 16 8 20
## [1] 12 5 13 20 16 8 20
## [1] 12 16 8 20
In the first case, x is originally of length 5, which we extend to 6 via concatenation and then reassignment. We didn’t literally change the length of x but instead created a new vector from x and then assigned x to that new vector.
Analogous operations can be used to change the size of a matrix. For instance, the rbind() (row bind) and cbind() (column bind) functions let you add rows or columns to a matrix.
## one
## [1,] 1 1 1 1
## [2,] 1 2 1 0
## [3,] 1 3 0 1
## [4,] 1 4 0 0
Note, too, that we could have relied on recycling:
## one
## [1,] 1 1 1 1 1
## [2,] 1 1 2 1 0
## [3,] 1 1 3 0 1
## [4,] 1 1 4 0 0
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
## [,1] [,2]
## [1,] 1 4
## [2,] 3 6
Finding the distances between vertices on a graph is a common example used in computer science courses and is used in statistics/data sciences too. This kind of problem arises in some clustering algorithms, for instance, and in genomics applications.
Here, we’ll look at the common example of finding distances between cities, as it is easier to describe than, say, finding distances between DNA strands.
Suppose we need a function that inputs a distance matrix, where the element in row i, column j gives the distance between city i and city j and outputs the minimum one-hop distance between cities and the pair of cities that achieves that minimum. Here’s the code for the solution:
# returns the minimum value of d[i,j], i != j, and the row/col attaining
# that minimum, for square symmetric matrix d; no special policy on ties
mind <- function(d) {
n <- nrow(d)
# add a column to identify row number for apply()
dd <- cbind(d,1:n)
wmins <- apply(dd[-n,],1,imin)
# wmins will be 2xn, 1st row being indices and 2nd being values
i <- which.min(wmins[2,])
j <- wmins[1,i]
return(c(d[i,j],i,j))
}
# finds the location, value of the minimum in a row x
imin <- function(x) {
lx <- length(x)
i <- x[lx] # original row number
j <- which.min(x[(i+1):(lx-1)])
k <- i+j
return(c(k,x[k]))
}And here’s an example of putting our new function to use
## [1] 6 3 4
The minimum value was 6, located in row 3, column 4. As you can see, a call to apply() plays a prominent role. Our task is fairly simple: We need to find the minimum nonzero element in the matrix. We find the minimum in each row—a single call to apply() accomplishes this for all the rows—and then find the smallest value
Among those minima. But as you’ll see, the code logic becomes rather intricate. One key point is that the matrix is symmetric, because the distance from city i to city j is the same as from j to i. So in finding the minimum value in the ith row, we need look at only elements i+1, i+2,…, n, where n is the number of rows and columns in the matrix. Note too that this means we can skip the last row of d in our call to apply() in line 7.
Since the matrix could be large—a thousand cities would mean a million entries in the matrix—we should exploit that symmetry and save work. That, however, presents a problem. In order to go through the basic computation, the function called by apply() needs to know the number of the row in the original matrix—knowledge that apply() does not provide to the function. So, in line 6, we augment the matrix with an extra column, consisting of the row numbers, so that the function called by apply() can take row numbers into account.
The function called by apply() is imin(), beginning in line 15, which finds the mininum in the row specified in the formal argument x. It returns not only the mininum in the given row but also the index at which that minimum occurs. When imin() is called on row 1 of our example matrix q above, the minimum value is 8, which occurs in index 4. For this latter purpose, the R function which.min(), used in line 18, is very handy.
Line 19 is noteworthy. Recall that due to the symmetry of the matrix, we skip the early part of each row, as is seen in the expression (i+1):(1x-1) in line 18. But that means that the call to which.min() in that line will return the minimum’s index relative to the range (i+1):(1x-1). In row 3 of our example matrix q, we would get an index of 1 instead of 4. Thus, we must adjust by adding i, which we do in line 19.
Finally, making proper use of the output of apply() here is a bit tricky. Think again of the example matrix q above. The call to apply() will return the matrix wmins:
## [,1] [,2] [,3] [,4]
## [1,] 4 3 4 5
## [2,] 8 15 6 33
minda <- function(d) {
smallest <- min(d)
ij <- which(d == smallest,arr.ind=TRUE)
return(c(smallest,ij))
}This works, but it does have some possible drawbacks. Here’s the key line in this new code:
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
## [4,] 4 8
As z is still a vector, we can query its length:
## [1] 8
But as a matrix, z is a bit more than a vector:
## [1] "matrix" "array"
## $dim
## [1] 4 2
## [1] 4 2
## [1] 4
## [1] 2
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
## [4,] 4 8
## [1] 2 6
## $dim
## [1] 4 2
## NULL
## int [1:4, 1:2] 1 2 3 4 5 6 7 8
## int [1:2] 2 6
## [,1] [,2]
## [1,] 2 6
## [1] 1 2
## [1] 7
## [1] 7
## NULL
## $dim
## [1] 3 1
## NULL
## a b
## [1,] 1 3
## [2,] 2 4
## [1] "a" "b"
## $dim
## [1] 3 2 2
## [1] 50
R’s print function for arrays displays the data layer by layer:
## , , 1
##
## [,1] [,2]
## [1,] 46 30
## [2,] 21 25
## [3,] 50 50
##
## , , 2
##
## [,1] [,2]
## [1,] 46 43
## [2,] 41 35
## [3,] 50 50