At the beginning of the chapter, I said that a matrix is just a
vector but with two additional attributes: the number of rows and the
number of columns. Here, we’ll take a closer look at the vector nature
of matrices. Consider this example:
z <- matrix(10:17,nrow=4)
z
[,1] [,2]
[1,] 10 14
[2,] 11 15
[3,] 12 16
[4,] 13 17
-> creates a matrix holding the values 10 through 17 in 4 rows and,
therefore, 2 columns
length(z)  # z is still a vector, so we can query its length
[1] 8
-> length of vector z
class(z)
[1] "matrix" "array"
dim(z)
[1] 4 2
-> the class of z is "matrix" "array", and its dimensions are 4 rows
by 2 columns -> 2-dimensional
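Because a matrix is a vector with a dim attribute, it can also be indexed with a single subscript. The following is a minimal sketch that rebuilds the z from above; the name v is introduced here only for illustration. It relies on the fact that R stores matrix elements column by column.

```r
z <- matrix(10:17, nrow = 4)

# A single subscript walks the underlying vector in column-major order,
# so element 5 is the first entry of column 2.
z[5]
z[1, 2]

# Stripping the dim attribute leaves the plain vector behind.
v <- as.vector(z)
v
is.matrix(v)

# Conversely, assigning a dim attribute to a vector turns it into a matrix.
dim(v) <- c(4, 2)
identical(v, z)
```

Both z[5] and z[1, 2] refer to the same stored value, which is why length(z) reports the total number of elements rather than the number of rows.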
Avoiding Unintended Dimension Reduction
In the world of statistics, dimension reduction is a good thing, with many
statistical procedures aimed to do it well. If we are working with, say,
10 variables and can reduce that number to 3 that still capture the
essence of our data, we’re happy. However, in R, something else might
merit the name dimension reduction that we may sometimes wish to avoid.
Say we have a four-row matrix and extract a row from it:
z
[,1] [,2]
[1,] 10 14
[2,] 11 15
[3,] 12 16
[4,] 13 17
r <- z[2,]
r
[1] 11 15
-> returns second row of z and assigns it to r in vector
format
This seems innocuous, but note the format in which R has displayed r.
It’s a vector format, not a matrix format. In other words, r is a vector
of length 2, rather than a 1-by-2 matrix. We can confirm this in a
couple of ways:
attributes(z)
$dim
[1] 4 2
attributes(r)
NULL
str(z)
int [1:4, 1:2] 10 11 12 13 14 15 16 17
str(r)
int [1:2] 11 15
-> attributes() shows the dim attribute of z and NULL for r, since r is
a plain vector -> str() shows the same structural difference between a
matrix and a regular vector
Fortunately, R has a way to suppress this dimension reduction: the
drop argument. Here’s an example, using the matrix z from above:
r <- z[2,, drop=FALSE]
r
[,1] [,2]
[1,] 11 15
-> with drop=FALSE, the second row is selected while
keeping the matrix format
dim(r)
[1] 1 2
class(r)
[1] "matrix" "array"
-> confirms the matrix format
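The dropped dimension tends to matter in code that expects a matrix. As a hedged sketch, consider a function that reports how many rows it was given; the helper name row_count is hypothetical and not part of the text above.

```r
z <- matrix(10:17, nrow = 4)

# Hypothetical helper: expects a matrix and reports its number of rows.
row_count <- function(m) nrow(m)

row_count(z[2, ])                # a plain vector has no dim, so nrow() gives NULL
row_count(z[2, , drop = FALSE])  # still a 1-by-2 matrix

# If a vector has already been produced, as.matrix() restores a dim
# attribute, but note that it yields a column (2-by-1), not a row.
dim(as.matrix(z[2, ]))
```

Passing drop = FALSE at the point of extraction is usually simpler than repairing the result afterwards, since as.matrix() cannot know that the vector was originally a row.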
Naming Matrix Rows and Columns
The natural way to refer to rows and columns in a matrix is via the
row and column numbers. However, you can also give names to these
entities. Here’s an example:
z
[,1] [,2]
[1,] 10 14
[2,] 11 15
[3,] 12 16
[4,] 13 17
colnames(z)
NULL
-> matrix has no column names
colnames(z) <- c("cat","dog")
-> assigning column names to z
z
cat dog
[1,] 10 14
[2,] 11 15
[3,] 12 16
[4,] 13 17
colnames(z)
[1] "cat" "dog"
-> confirms that the columns now have names, which can be used for
indexing
z[,"cat"]
[1] 10 11 12 13
-> accesses column of z by column name
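Rows can be named in the same way as columns. The sketch below rebuilds z and assigns row labels; the labels "a" through "d" are made up for illustration.

```r
z <- matrix(10:17, nrow = 4)
colnames(z) <- c("cat", "dog")

# Rows can be named the same way; these labels are invented for the example.
rownames(z) <- c("a", "b", "c", "d")

z["b", "dog"]   # a single element by row and column name
z["b", ]        # a named vector: the names survive the dimension drop

# Both sets of names live in one attribute, a list of length 2.
dimnames(z)
```

Name-based indexing keeps working if the rows or columns are later reordered, which is one reason to prefer it over position numbers in longer scripts.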
Higher-Dimensional Arrays
In a statistical context, a typical matrix in R has rows
corresponding to observations, say on various people, and columns
corresponding to variables, such as weight and blood pressure. The
matrix is then a two-dimensional data structure. But suppose we also
have data taken at different times, one data point per person per
variable per time. Time then becomes the third dimension, in addition to
rows and columns. In R, such data sets are called arrays. As a simple
example, consider students and test scores. Say each test consists of
two parts, so we record two scores for a student for each test. Now
suppose that we have two tests, and to keep the example small, assume we
have only three students. Here’s the data for the first test:
firsttest <- matrix(nrow=3,ncol=2)
firsttest[1,1] <- 468522
firsttest[2,1] <- 21230
firsttest[1,2] <- 30016523
firsttest[2,2] <- 252331
firsttest[3,1] <- 509851
firsttest[3,2] <- 500156
firsttest
[,1] [,2]
[1,] 468522 30016523
[2,] 21230 252331
[3,] 509851 500156
Student 1 had scores of 468522 and 30016523 on the first test, student 2
scored 21230 and 252331, and so on. Here are the scores for the same students on the
second test:
secondtest <- matrix(nrow=3,ncol=2)
secondtest[1,1] <- 4695
secondtest[2,1] <- 411458
secondtest[1,2] <- 4335847
secondtest[2,2] <- 35456
secondtest[3,1] <- 5078563
secondtest[3,2] <- 50512
secondtest
[,1] [,2]
[1,] 4695 4335847
[2,] 411458 35456
[3,] 5078563 50512
Now let’s put both tests into one data structure, which we’ll name
tests. We’ll arrange it to have two “layers”—one layer per test—with
three rows and two columns within each layer. We’ll store firsttest in
the first layer and secondtest in the second. In layer 1, there will be
three rows for the three students’ scores on the first test, with two
columns per row for the two portions of a test. We use R’s array
function to create the data structure:
tests <- array(data=c(firsttest,secondtest),dim=c(3,2,2))
tests
, , 1
[,1] [,2]
[1,] 468522 30016523
[2,] 21230 252331
[3,] 509851 500156
, , 2
[,1] [,2]
[1,] 4695 4335847
[2,] 411458 35456
[3,] 5078563 50512
-> combines firsttest and secondtest into one 3-dimensional array
In the argument dim=c(3,2,2), we are specifying two layers (this is
the second 2), each consisting of three rows and two columns. This then
becomes an attribute of the data structure:
attributes(tests)
$dim
[1] 3 2 2
-> shows attributes: 3 rows, two columns, 2 layers ->
3-dimensional
Each element of tests now has three subscripts, rather than two as in
the matrix case. The first subscript corresponds to the first element in
the $dim vector, the second subscript corresponds to the second element
in the vector, and so on. For instance, the score on the second portion
of test 1 for student 3 is retrieved as follows:
tests[3,2,1]
[1] 500156
-> accesses score of row 3, column 2, in layer 1
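Subscripts can also be left empty to pull out whole slices instead of single elements. A minimal sketch follows; it rebuilds the two matrices in one call each from the same values used above, which is an assumption of convenience and not how the text constructs them.

```r
# Same values as above, filled column by column.
firsttest  <- matrix(c(468522, 21230, 509851, 30016523, 252331, 500156), nrow = 3)
secondtest <- matrix(c(4695, 411458, 5078563, 4335847, 35456, 50512), nrow = 3)
tests <- array(data = c(firsttest, secondtest), dim = c(3, 2, 2))

# Leaving a subscript empty selects everything along that dimension.
tests[, , 1]   # the whole first layer, which is firsttest again
tests[3, , ]   # student 3: rows are test portions, columns are tests

# drop = FALSE works for arrays too, keeping all three dimensions.
dim(tests[, , 1, drop = FALSE])
```

As with matrices, a slice loses any dimension of extent 1 unless drop = FALSE is given, so tests[, , 1] comes back as an ordinary 3-by-2 matrix.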
tests
, , 1
[,1] [,2]
[1,] 468522 30016523
[2,] 21230 252331
[3,] 509851 500156
, , 2
[,1] [,2]
[1,] 4695 4335847
[2,] 411458 35456
[3,] 5078563 50512
-> returns both layers: shows three rows with two columns in each
layer
Just as we built our three-dimensional array by combining two
matrices, we can build four-dimensional arrays by combining two or more
three-dimensional arrays, and so on.
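As a rough sketch of that last step, two three-dimensional arrays can be combined in the same way the two matrices were. The arrays a and b and the name four are placeholders with arbitrary values, chosen only so the result is easy to check.

```r
# Two placeholder three-dimensional arrays (values are arbitrary).
a <- array(1:12,  dim = c(3, 2, 2))
b <- array(13:24, dim = c(3, 2, 2))

# Concatenate the underlying vectors and add a fourth extent.
four <- array(c(a, b), dim = c(3, 2, 2, 2))

dim(four)
four[3, 2, 1, 2]            # row 3, column 2, layer 1 of the second 3-D array
identical(four[, , , 2], b) # the last subscript picks out one original array
```

The pattern is the same at every level: c() flattens the pieces into one long vector, and the dim argument tells R how to fold it back up.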