Matrices

We have talked about vectors. A vector is a sequence of values of the same type. A natural generalization is a matrix: a two-dimensional arrangement of values of the same type, with the data elements arranged in a rectangular layout.

To create a matrix, use the matrix function:

m1 <- matrix(c(1, 2, 3, 4, 5, 6),
             nrow = 3,
             ncol = 2,
             byrow = TRUE)
m1
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
attributes(m1)
## $dim
## [1] 3 2
str(m1)
##  num [1:3, 1:2] 1 3 5 2 4 6

Notice the byrow argument: with byrow = TRUE, the values fill the matrix row by row. Let's see what happens if we set it to FALSE.

m2 <- matrix(c(1, 2, 3, 4, 5, 6),
             nrow = 2,
             ncol = 3,
             byrow = FALSE)
m2
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
attributes(m2)
## $dim
## [1] 2 3
str(m2)
##  num [1:2, 1:3] 1 2 3 4 5 6
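byrow defaults to FALSE, and ncol can be inferred from the length of the data. A minimal sketch showing that m2 can be built with fewer arguments:

```r
# Full form, as above
m2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = FALSE)
# Short form: byrow = FALSE is the default and ncol = 6 / 2 = 3 is inferred
m2_short <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
identical(m2, m2_short)   # TRUE
```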

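To get all the elements of a matrix as a vector, use c. The elements come out in column-major order (column by column), regardless of how the matrix was filled: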

c(m1)
## [1] 1 3 5 2 4 6
c(m2)
## [1] 1 2 3 4 5 6
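Since c always returns the elements column by column, one way to read a matrix row by row is to transpose it first with t (the transpose function, covered further down). A small sketch using m1 from above:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
c(m1)      # column-major order: 1 3 5 2 4 6
c(t(m1))   # row-major order via the transpose: 1 2 3 4 5 6
```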

To get the dimensions of a matrix, use dim:

dim(m1)
## [1] 3 2
dim(m2)
## [1] 2 3

You can obtain the number of rows and the number of columns individually as follows:

nrow(m1)
## [1] 3
ncol(m1)
## [1] 2

These functions are thin wrappers around dim, as their source code shows:

nrow
## function (x) 
## dim(x)[1L]
## <bytecode: 0xc0b2a0>
## <environment: namespace:base>
ncol
## function (x) 
## dim(x)[2L]
## <bytecode: 0x9fde30>
## <environment: namespace:base>

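Note that a matrix is a vector with one additional attribute, dim, which holds the number of rows and the number of columns. (In R 4.0.0 and later, class(m1) returns "matrix" "array" rather than just "matrix" as shown below.)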

m1
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
length(m1)
## [1] 6
class(m1)
## [1] "matrix"
attributes(m1)
## $dim
## [1] 3 2
cm1 <- c(m1)
cm1
## [1] 1 3 5 2 4 6
length(cm1)
## [1] 6
class(cm1)
## [1] "numeric"
attributes(cm1)
## NULL
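Because a matrix is just a vector carrying a dim attribute, you can also turn a vector into a matrix by setting that attribute directly with dim<-. A minimal sketch:

```r
v <- 1:6                  # plain integer vector, no attributes
mv <- v
dim(mv) <- c(2, 3)        # attach a dim attribute: 2 rows, 3 columns
is.matrix(v)              # FALSE
is.matrix(mv)             # TRUE
identical(mv, matrix(1:6, nrow = 2))   # TRUE: same values, same dim
```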

To transpose a matrix, use t:

t(m1)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

To access the element in the ith row and jth column of a matrix M, use the expression M[i, j]:

m1[1, 2]
## [1] 2
m1[3, 2]
## [1] 6

What happens when we do m1[5, 5]?
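Since m1 has only 3 rows and 2 columns, R stops with an error. The sketch below uses tryCatch to capture the error message instead of halting:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
# Catch the error and keep only its message text
msg <- tryCatch(m1[5, 5], error = function(e) conditionMessage(e))
msg   # "subscript out of bounds"
```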

To access the entire ith row, leave the column index empty:

m1[1, ]
## [1] 1 2
m2[2, ]
## [1] 2 4 6

To access the entire jth column, leave the row index empty:

m1[, 2]
## [1] 2 4 6
m2[, 1]
## [1] 1 2

The last four commands look fine, but note the type of data structure they return: even though we indexed a matrix, we got a vector back instead of a matrix. We can confirm this in a couple of ways.

r <- m1[1, ]
attributes(m1)
## $dim
## [1] 3 2
attributes(r)
## NULL
str(m1)
##  num [1:3, 1:2] 1 3 5 2 4 6
str(r)
##  num [1:2] 1 2

This tells us that m1 has rows and columns while r does not. That may be what you expect, but often you want a submatrix. If you index a matrix expecting a submatrix and one of the indices is a single number, you get a vector back instead, which can introduce bugs in your code. To prevent this, pass the drop argument:

r <- m1[1, , drop = FALSE]
attributes(m1)
## $dim
## [1] 3 2
attributes(r)
## $dim
## [1] 1 2
str(m1)
##  num [1:3, 1:2] 1 3 5 2 4 6
str(r)
##  num [1, 1:2] 1 2
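drop = FALSE works the same way for a single column and even for a single element. A minimal sketch:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
col2 <- m1[, 2, drop = FALSE]   # 3 x 1 matrix instead of a vector
dim(col2)                       # 3 1
one <- m1[2, 2, drop = FALSE]   # 1 x 1 matrix instead of a single number
dim(one)                        # 1 1
```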

We can pass drop as an argument because [ is itself a function in R. This is also why the following is legal R code:

"["(m1, 2, 2)
## [1] 4

If you have a vector that you want to treat as a matrix, use the as.matrix function. It returns a matrix with a single column:

vec1 <- c(1, 2, 3, 4)
mat1 <- as.matrix(vec1)
attributes(mat1)
## $dim
## [1] 4 1
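as.matrix turns a vector into a single column. If you need a single row instead, here is a sketch of two options: transpose the column matrix, or call matrix with nrow = 1.

```r
vec1 <- c(1, 2, 3, 4)
row_a <- t(as.matrix(vec1))       # 4 x 1 column, transposed to 1 x 4
row_b <- matrix(vec1, nrow = 1)   # built directly as 1 x 4
dim(row_a)                        # 1 4
identical(row_a, row_b)           # TRUE
```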

To access more than one row or column at a time, pass a vector of indices:

m1[c(1, 3), ]
##      [,1] [,2]
## [1,]    1    2
## [2,]    5    6
m2[, c(1, 3)]
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    6

What if we want to combine multiple matrices together?

m3 = matrix(c(90, 43, -783, 21, 23, 57), 
            nrow=3, 
            ncol=2) 

m4 = matrix(c(37, 12, 0), 
            nrow=3, 
            ncol=1) 

m3
##      [,1] [,2]
## [1,]   90   21
## [2,]   43   23
## [3,] -783   57
m4
##      [,1]
## [1,]   37
## [2,]   12
## [3,]    0
cbind(m3, m4)
##      [,1] [,2] [,3]
## [1,]   90   21   37
## [2,]   43   23   12
## [3,] -783   57    0
cbind(m4, m3)
##      [,1] [,2] [,3]
## [1,]   37   90   21
## [2,]   12   43   23
## [3,]    0 -783   57
m5 = matrix(c(90, 43),
            nrow=1,
            ncol=2)
m5
##      [,1] [,2]
## [1,]   90   43
m3
##      [,1] [,2]
## [1,]   90   21
## [2,]   43   23
## [3,] -783   57
rbind(m3, m5)
##      [,1] [,2]
## [1,]   90   21
## [2,]   43   23
## [3,] -783   57
## [4,]   90   43
rbind(m5, m3)
##      [,1] [,2]
## [1,]   90   43
## [2,]   90   21
## [3,]   43   23
## [4,] -783   57

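cbind and rbind also accept more than two arguments, and the arguments can be plain vectors as well as matrices. Here three vectors are combined into a 3 x 3 matrix: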

rbind(c(3,4,5), c(2,1,3), c(6,5,4))
##      [,1] [,2] [,3]
## [1,]    3    4    5
## [2,]    2    1    3
## [3,]    6    5    4
cbind(c(3,2,6), c(4,1,5), c(5,3,4))
##      [,1] [,2] [,3]
## [1,]    3    4    5
## [2,]    2    1    3
## [3,]    6    5    4
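cbind and rbind check that the pieces fit together. A sketch of what happens when two matrices with different numbers of columns are stacked with rbind; since the exact error wording may vary between R versions, the code only checks that an error occurs:

```r
a <- matrix(1:6, nrow = 2)   # 2 x 3
b <- matrix(1:4, nrow = 2)   # 2 x 2
res <- tryCatch(rbind(a, b), error = function(e) e)
inherits(res, "error")       # TRUE: 3 columns and 2 columns cannot be stacked
dim(cbind(a, b))             # 2 5: side by side works, both have 2 rows
```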

Matrix rows and columns can be named. The names can then be used as indices for accessing elements, rows and columns of the matrix.

m6 = matrix(c(90, 43, -783, 21, 23, 57, 23, 65, 78, 90, 32, 4, 3, 67, 0, 12, 32, 76, 32, 9), 
            nrow=5, 
            ncol=4)
m6
##      [,1] [,2] [,3] [,4]
## [1,]   90   57   32   12
## [2,]   43   23    4   32
## [3,] -783   65    3   76
## [4,]   21   78   67   32
## [5,]   23   90    0    9
rownames(m6)
## NULL
colnames(m6)
## NULL

What is this NULL? NULL is R's special object representing "nothing"; here it means the matrix has no row or column names yet. Let's provide names for the rows and columns.

rownames(m6) <- c("r1", "r2", "r3", "r4", "r5")
m6
##    [,1] [,2] [,3] [,4]
## r1   90   57   32   12
## r2   43   23    4   32
## r3 -783   65    3   76
## r4   21   78   67   32
## r5   23   90    0    9
colnames(m6) <- c("c1", "c2", "c3", "c4")
m6
##      c1 c2 c3 c4
## r1   90 57 32 12
## r2   43 23  4 32
## r3 -783 65  3 76
## r4   21 78 67 32
## r5   23 90  0  9
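With names in place, they can be used instead of numbers inside [ ]. A minimal sketch that rebuilds the named m6 from above:

```r
m6 <- matrix(c(90, 43, -783, 21, 23, 57, 23, 65, 78, 90,
               32, 4, 3, 67, 0, 12, 32, 76, 32, 9),
             nrow = 5, ncol = 4)
rownames(m6) <- c("r1", "r2", "r3", "r4", "r5")
colnames(m6) <- c("c1", "c2", "c3", "c4")
m6["r3", "c2"]        # single element: 65
m6["r2", ]            # whole row as a named vector: c1 = 43, c2 = 23, c3 = 4, c4 = 32
m6[, c("c1", "c4")]   # two columns, as a 5 x 2 matrix
```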

We can also remove names by assigning NULL.

rownames(m6) <- NULL
m6
##        c1 c2 c3 c4
## [1,]   90 57 32 12
## [2,]   43 23  4 32
## [3,] -783 65  3 76
## [4,]   21 78 67 32
## [5,]   23 90  0  9
colnames(m6) <- NULL
m6
##      [,1] [,2] [,3] [,4]
## [1,]   90   57   32   12
## [2,]   43   23    4   32
## [3,] -783   65    3   76
## [4,]   21   78   67   32
## [5,]   23   90    0    9

Row and column names can also be set (or changed) in a single step by assigning a list of two character vectors to dimnames:

dimnames(m6) <- list(c("r-1", "r-2", "r-3", "r-4", "r-5"),
                     c("c-1", "c-2", "c-3", "c-4"))
m6
##      c-1 c-2 c-3 c-4
## r-1   90  57  32  12
## r-2   43  23   4  32
## r-3 -783  65   3  76
## r-4   21  78  67  32
## r-5   23  90   0   9

Similarly, we can remove them in a single step.

dimnames(m6) <- list(NULL, NULL)
m6
##      [,1] [,2] [,3] [,4]
## [1,]   90   57   32   12
## [2,]   43   23    4   32
## [3,] -783   65    3   76
## [4,]   21   78   67   32
## [5,]   23   90    0    9
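Names can also be supplied when the matrix is created, through the dimnames argument of matrix. A minimal sketch:

```r
# dimnames takes a list: row names first, then column names
m7 <- matrix(1:4, nrow = 2,
             dimnames = list(c("a", "b"), c("x", "y")))
m7["b", "x"]   # 2
rownames(m7)   # "a" "b"
colnames(m7)   # "x" "y"
```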

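Let's do some matrix operations. The two matrices below are partly filled with random numbers from rnorm, so the values you get will differ from those shown.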

mA <- cbind(matrix(rnorm(20), 4), c(1, 2, 4, 6))
mA
##            [,1]       [,2]       [,3]       [,4]        [,5] [,6]
## [1,] -2.4131887 -0.9644539 -0.5149461 -1.2951568 -1.68376254    1
## [2,] -1.5674913  1.3081971  2.4307926  0.8757151 -1.27309222    2
## [3,]  0.6428604  1.0977363  0.5369375  1.8720059 -0.48050814    4
## [4,] -0.5079755 -0.3424555 -0.4709463 -1.1729805  0.04627785    6
mB <- cbind(matrix(rnorm(20), 4), c(8, 3, 6, 7))
mB
##            [,1]       [,2]       [,3]       [,4]       [,5] [,6]
## [1,]  1.5625534  2.4192403 -0.5176799  0.3967884 -1.6512427    8
## [2,]  1.6124261 -1.4256384  1.2699589  0.7929031  0.6220068    3
## [3,] -0.2891407 -0.4104651  1.5511732 -0.6885141 -0.4071293    6
## [4,] -0.4522428  0.1269211 -1.9408871  0.6740879  0.6477337    7

To add and subtract two matrices of the same dimensions, use the + and - operators. Both work element by element.

mA + mB
##             [,1]       [,2]      [,3]       [,4]       [,5] [,6]
## [1,] -0.85063522  1.4547864 -1.032626 -0.8983684 -3.3350053    9
## [2,]  0.04493474 -0.1174413  3.700751  1.6686182 -0.6510854    5
## [3,]  0.35371962  0.6872712  2.088111  1.1834918 -0.8876374   10
## [4,] -0.96021825 -0.2155344 -2.411833 -0.4988926  0.6940116   13
mA - mB
##             [,1]       [,2]         [,3]        [,4]        [,5] [,6]
## [1,] -3.97574211 -3.3836941  0.002733768 -1.69194519 -0.03251979   -7
## [2,] -3.17991740  2.7338354  1.160833702  0.08281202 -1.89509901   -1
## [3,]  0.93200109  1.5082014 -1.014235697  2.56052006 -0.07337888   -2
## [4,] -0.05573267 -0.4693766  1.469940870 -1.84706838 -0.60145588   -1

Element-wise multiplication and division are done using * and /.

mA * mB
##            [,1]        [,2]      [,3]       [,4]        [,5] [,6]
## [1,] -3.7707363 -2.33324567 0.2665773 -0.5139032  2.78030067    8
## [2,] -2.5274639 -1.86501591 3.0870066  0.6943572 -0.79187200    6
## [3,] -0.1858771 -0.45058242 0.8328831 -1.2889025  0.19562892   24
## [4,]  0.2297282 -0.04346483 0.9140536 -0.7906919  0.02997573   42
mA / mB
##            [,1]       [,2]      [,3]      [,4]       [,5]      [,6]
## [1,] -1.5443879 -0.3986598 0.9947192 -3.264100  1.0196941 0.1250000
## [2,] -0.9721322 -0.9176220 1.9140719  1.104442 -2.0467497 0.6666667
## [3,] -2.2233476 -2.6743719 0.3461493 -2.718907  1.1802348 0.6666667
## [4,]  1.1232362 -2.6981759 0.2426449 -1.740100  0.0714458 0.8571429

These operations also work with a single number (a scalar). In that case R recycles the scalar, applying it to every element of the matrix.

mA + 10
##           [,1]      [,2]      [,3]      [,4]      [,5] [,6]
## [1,]  7.586811  9.035546  9.485054  8.704843  8.316237   11
## [2,]  8.432509 11.308197 12.430793 10.875715  8.726908   12
## [3,] 10.642860 11.097736 10.536938 11.872006  9.519492   14
## [4,]  9.492025  9.657545  9.529054  8.827019 10.046278   16
10 + mA
##           [,1]      [,2]      [,3]      [,4]      [,5] [,6]
## [1,]  7.586811  9.035546  9.485054  8.704843  8.316237   11
## [2,]  8.432509 11.308197 12.430793 10.875715  8.726908   12
## [3,] 10.642860 11.097736 10.536938 11.872006  9.519492   14
## [4,]  9.492025  9.657545  9.529054  8.827019 10.046278   16
mA - 10
##           [,1]       [,2]       [,3]       [,4]       [,5] [,6]
## [1,] -12.41319 -10.964454 -10.514946 -11.295157 -11.683763   -9
## [2,] -11.56749  -8.691803  -7.569207  -9.124285 -11.273092   -8
## [3,]  -9.35714  -8.902264  -9.463062  -8.127994 -10.480508   -6
## [4,] -10.50798 -10.342455 -10.470946 -11.172981  -9.953722   -4
10 - mA
##          [,1]      [,2]      [,3]      [,4]      [,5] [,6]
## [1,] 12.41319 10.964454 10.514946 11.295157 11.683763    9
## [2,] 11.56749  8.691803  7.569207  9.124285 11.273092    8
## [3,]  9.35714  8.902264  9.463062  8.127994 10.480508    6
## [4,] 10.50798 10.342455 10.470946 11.172981  9.953722    4
mA / 10
##             [,1]        [,2]        [,3]        [,4]         [,5] [,6]
## [1,] -0.24131887 -0.09644539 -0.05149461 -0.12951568 -0.168376254  0.1
## [2,] -0.15674913  0.13081971  0.24307926  0.08757151 -0.127309222  0.2
## [3,]  0.06428604  0.10977363  0.05369375  0.18720059 -0.048050814  0.4
## [4,] -0.05079755 -0.03424555 -0.04709463 -0.11729805  0.004627785  0.6
10 / mA
##            [,1]       [,2]       [,3]      [,4]      [,5]      [,6]
## [1,]  -4.143895 -10.368562 -19.419507 -7.721073  -5.93908 10.000000
## [2,]  -6.379621   7.644108   4.113885 11.419239  -7.85489  5.000000
## [3,]  15.555478   9.109656  18.624141  5.341863 -20.81130  2.500000
## [4,] -19.685990 -29.200875 -21.233845 -8.525291 216.08608  1.666667
mA * 10
##            [,1]      [,2]      [,3]       [,4]        [,5] [,6]
## [1,] -24.131887 -9.644539 -5.149461 -12.951568 -16.8376254   10
## [2,] -15.674913 13.081971 24.307926   8.757151 -12.7309222   20
## [3,]   6.428604 10.977363  5.369375  18.720059  -4.8050814   40
## [4,]  -5.079755 -3.424555 -4.709463 -11.729805   0.4627785   60
10 * mA
##            [,1]      [,2]      [,3]       [,4]        [,5] [,6]
## [1,] -24.131887 -9.644539 -5.149461 -12.951568 -16.8376254   10
## [2,] -15.674913 13.081971 24.307926   8.757151 -12.7309222   20
## [3,]   6.428604 10.977363  5.369375  18.720059  -4.8050814   40
## [4,]  -5.079755 -3.424555 -4.709463 -11.729805   0.4627785   60
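Recycling is not limited to a single number. In this sketch, a matrix is combined with a shorter vector: R recycles the vector down the columns in column-major order, so a vector whose length equals the number of rows ends up applied row by row.

```r
m <- matrix(1:6, nrow = 2)   # columns (1, 2), (3, 4), (5, 6)
m + c(10, 20)                # 10 added to row 1, 20 added to row 2
# result: rows (11, 13, 15) and (22, 24, 26)
```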

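To perform matrix multiplication in the linear-algebra sense, use the %*% operator. The number of columns of the left operand must equal the number of rows of the right one: here mA is 4 x 6 and t(mA) is 6 x 4, so the product is 4 x 4.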

mA %*% t(mA)
##            [,1]      [,2]       [,3]      [,4]
## [1,] 12.5313078  4.278626 -0.5020329  9.239908
## [2,]  4.2786259 16.464802 11.9846362 10.117362
## [3,] -0.5020329 11.984636 21.6418906 20.826585
## [4,]  9.2399079 10.117362 20.8265848 37.975130
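The difference between * and %*% is easiest to see on small matrices with known values. In this sketch, * multiplies matching elements, while %*% computes row-by-column dot products:

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)   # rows (1, 3) and (2, 4)
B <- matrix(c(5, 6, 7, 8), nrow = 2)   # rows (5, 7) and (6, 8)
A * B     # element-wise: rows (5, 21) and (12, 32)
A %*% B   # matrix product: rows (23, 31) and (34, 46)
```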

To compute row and column means and sums conveniently, R provides rowMeans, colMeans, rowSums and colSums:

rowMeans(mA)
## [1] -0.9785847  0.6290202  1.2781720  0.5919867
colMeans(mA)
## [1] -0.96144877  0.27475599  0.49545943  0.06989594 -0.84777126  3.25000000
rowSums(mA)
## [1] -5.871508  3.774121  7.669032  3.551920
colSums(mA)
## [1] -3.8457951  1.0990240  1.9818377  0.2795838 -3.3910850 13.0000000
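These functions are faster equivalents of a more general pattern: apply runs a function over each row (MARGIN = 1) or each column (MARGIN = 2). A small deterministic sketch:

```r
m <- matrix(1:6, nrow = 2)   # rows (1, 3, 5) and (2, 4, 6)
rowMeans(m)                  # 3 4
apply(m, 1, mean)            # same values via apply
colSums(m)                   # 3 7 11
apply(m, 2, sum)             # same values (as integers, since the data are integers)
```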

We can also assign to individual elements and to submatrices. The following snippets show how.

y <- matrix(nrow = 3, ncol = 2)
y
##      [,1] [,2]
## [1,]   NA   NA
## [2,]   NA   NA
## [3,]   NA   NA
y[1, 1] <- 11
y[2, 1] <- 12
y[3, 1] <- 56
y[1, 2] <- 13
y[2, 2] <- 14
y[3, 2] <- 89
y
##      [,1] [,2]
## [1,]   11   13
## [2,]   12   14
## [3,]   56   89
y[c(1, 3), c(1, 2)] <- matrix(1:4, nrow = 2)
y
##      [,1] [,2]
## [1,]    1    3
## [2,]   12   14
## [3,]    2    4

Here is another example: assigning a 2 x 2 matrix to the lower-right 2 x 2 block of a 3 x 3 matrix of NAs.

x <- matrix(nrow=3,ncol=3)
y <- matrix(c(7, 8, 9, 10), nrow = 2)
x
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA   NA   NA
## [3,]   NA   NA   NA
y
##      [,1] [,2]
## [1,]    7    9
## [2,]    8   10
x[2:3, 2:3] <- y
x
##      [,1] [,2] [,3]
## [1,]   NA   NA   NA
## [2,]   NA    7    9
## [3,]   NA    8   10
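Assignment also works with a logical index. A minimal sketch that rebuilds x from above and replaces every remaining NA with 0:

```r
x <- matrix(nrow = 3, ncol = 3)
x[2:3, 2:3] <- matrix(c(7, 8, 9, 10), nrow = 2)
# is.na(x) is a 3 x 3 logical matrix; only the TRUE cells are overwritten
x[is.na(x)] <- 0
x   # rows (0, 0, 0), (0, 7, 9), (0, 8, 10)
```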

As with vectors, we can use negative indices, which exclude the given rows or columns. For example:

m1
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
m1[-2, ]
##      [,1] [,2]
## [1,]    1    2
## [2,]    5    6
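Negative indices can be used for rows and columns at the same time, and the usual dropping rule still applies. A sketch that removes row 2 and column 1 of m1:

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)
m1[-2, -1]                 # a plain vector: 2 6
m1[-2, -1, drop = FALSE]   # a 2 x 1 matrix
```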

List

Vectors and matrices always contain values of a single type. A list is like a vector whose elements can be objects of different types and lengths.

numbers <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
characters <- c("aa", "bb", "cc", "dd", "ee", "ff") 
logicals <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
l1 <- list(numbers, characters, logicals, 12, "abcd", TRUE)
l1
## [[1]]
##  [1] 0 1 2 3 4 5 6 7 8 9
## 
## [[2]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
## 
## [[3]]
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
## 
## [[4]]
## [1] 12
## 
## [[5]]
## [1] "abcd"
## 
## [[6]]
## [1] TRUE

With lists we can reference an individual element or retrieve a slice (a sublist). Notice the difference between [[ ]] and [ ]: [[ ]] returns the element itself, while [ ] returns a list containing the selected elements.

l1[[1]]
##  [1] 0 1 2 3 4 5 6 7 8 9
l1[[2]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
l1[[3]]
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
l1[[4]]
## [1] 12
l1[[5]]
## [1] "abcd"
l1[[6]]
## [1] TRUE
l1[1]
## [[1]]
##  [1] 0 1 2 3 4 5 6 7 8 9
l1[2]
## [[1]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
l1[3]
## [[1]]
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
l1[4]
## [[1]]
## [1] 12
l1[5]
## [[1]]
## [1] "abcd"
l1[6]
## [[1]]
## [1] TRUE
l1[c(1, 3, 5)]
## [[1]]
##  [1] 0 1 2 3 4 5 6 7 8 9
## 
## [[2]]
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
## 
## [[3]]
## [1] "abcd"
l1[c(2, 4, 6)]
## [[1]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
## 
## [[2]]
## [1] 12
## 
## [[3]]
## [1] TRUE
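A quick way to see the difference is to check the class of each result. This sketch uses a smaller, hypothetical list rather than l1:

```r
l1 <- list(c(0, 1, 2), "abcd", TRUE)
class(l1[[1]])   # "numeric": the element itself
class(l1[1])     # "list": a sublist holding one element
length(l1[1])    # 1
```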

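If we modify the list contents, notice what happens to the original vector: changing l1[[1]] leaves numbers untouched. R has copy-on-modify semantics, so once either object is changed, the list and the vector behave as independent copies.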

l1[[1]]
##  [1] 0 1 2 3 4 5 6 7 8 9
l1[[1]][1] <- 299792458
l1
## [[1]]
##  [1] 299792458         1         2         3         4         5         6
##  [8]         7         8         9
## 
## [[2]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
## 
## [[3]]
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
## 
## [[4]]
## [1] 12
## 
## [[5]]
## [1] "abcd"
## 
## [[6]]
## [1] TRUE
numbers
##  [1] 0 1 2 3 4 5 6 7 8 9
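A minimal, self-contained version of the same experiment:

```r
x <- c(1, 2, 3)
l <- list(x)
l[[1]][1] <- 100   # modifies the copy held by the list
x                  # still 1 2 3
l[[1]]             # 100 2 3
```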

Just like matrices, we can provide names to list elements. Notice how the list gets printed.

l2 <- list(primes = c(2, 3, 5, 7, 11, 13),
           composites = c(4, 6, 8, 9, 10, 12),
           complexes = c(3+1i, 4+6i, 8 + 9i))
l2
## $primes
## [1]  2  3  5  7 11 13
## 
## $composites
## [1]  4  6  8  9 10 12
## 
## $complexes
## [1] 3+1i 4+6i 8+9i

To access the elements, we can use the following approaches.

l2$primes
## [1]  2  3  5  7 11 13
l2["primes"]
## $primes
## [1]  2  3  5  7 11 13
l2[["primes"]]
## [1]  2  3  5  7 11 13
l2[1]
## $primes
## [1]  2  3  5  7 11 13
l2[[1]]
## [1]  2  3  5  7 11 13
l2[c("composites", "complexes")]
## $composites
## [1]  4  6  8  9 10 12
## 
## $complexes
## [1] 3+1i 4+6i 8+9i
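One subtlety: on lists, $ performs partial matching of names, while [[ ]] requires an exact match by default. A sketch:

```r
l2 <- list(primes = c(2, 3, 5, 7, 11, 13),
           composites = c(4, 6, 8, 9, 10, 12))
l2$prim        # partial match: returns the primes vector
l2[["prim"]]   # NULL: no element is named exactly "prim"
```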

What if we want to access the list elements directly by name, without the l2$ prefix?

# primes, complexes and composites are not defined
attach(l2)
primes #primes is defined now
## [1]  2  3  5  7 11 13
complexes #complexes is defined now
## [1] 3+1i 4+6i 8+9i
composites #composites is defined now
## [1]  4  6  8  9 10 12
detach(l2)
# primes, complexes and composites are again not defined.

attach adds the list to R's search path, which lets us access its elements directly by name. If you use attach, call detach as soon as you are done: while the list is attached, its elements can mask, or be masked by, other objects with the same names.
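A commonly used alternative that leaves the search path alone is with, which evaluates a single expression using the list elements as variables. A minimal sketch:

```r
l2 <- list(primes = c(2, 3, 5, 7, 11, 13),
           composites = c(4, 6, 8, 9, 10, 12))
total <- with(l2, sum(primes))   # primes is looked up inside l2
total                            # 41
```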

Data Frames

Data frames are one of the most commonly used data structures in R. A data frame is a list whose elements (the columns) are vectors of the same length. This makes it well suited to storing tables of data, like a spreadsheet.

firstNames <- c("John", "Oliver", "Mack", "Ron")
secondNames <- c("Green", "Twist", "Brown", "Weasley")
grades <- c("A", "B", "C", "A")
df <- data.frame(firstNames, secondNames, grades)
df
##   firstNames secondNames grades
## 1       John       Green      A
## 2     Oliver       Twist      B
## 3       Mack       Brown      C
## 4        Ron     Weasley      A

Notice that the column names are taken from the names of the variables passed to data.frame.
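Because every column must have the same number of rows, data.frame refuses vectors whose lengths cannot be reconciled, although it recycles a shorter vector whose length divides the number of rows. A sketch; the exact error wording may differ across R versions, so the code only checks that an error occurs:

```r
res <- tryCatch(data.frame(a = 1:3, b = 1:2), error = function(e) e)
inherits(res, "error")   # TRUE: 3 rows and 2 rows cannot be reconciled
ok <- data.frame(a = 1:4, b = 1:2)
ok$b                     # 1 2 1 2: b is recycled to fill 4 rows
```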

R ships with a built-in data frame called mtcars (part of the datasets package, which is loaded by default). We will use it for the rest of the lecture.

mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

That is a lot of output. To inspect only the first few rows of a data frame, use head:

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

To get information about the number of rows, row names, number of columns and column names of a data frame, we can do the following.

nrow(mtcars)
## [1] 32
rownames(mtcars)
##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"
ncol(mtcars)
## [1] 11
colnames(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"

To access elements, we can use the familiar matrix-like syntax, with either row and column numbers or row and column names:

mtcars[2, 3]
## [1] 160
mtcars["Mazda RX4 Wag", "disp"]
## [1] 160

To access a single column as a vector, use the [[ ]] operator with either the column's position or its name:

mtcars[[3]]
##  [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0
## [23] 304.0 350.0 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
mtcars[["disp"]]
##  [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0
## [23] 304.0 350.0 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0

Since a data frame is a type of list, we can also use the $ operator.

mtcars$disp
##  [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0
## [23] 304.0 350.0 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0

Matrix-style indexing with an empty row index also returns the column as a vector:

mtcars[, "disp"]
##  [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0
## [23] 304.0 350.0 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0

To get a column slice that is still a data frame, use single brackets [ ] with the column name:

mtcars["disp"]
##                      disp
## Mazda RX4           160.0
## Mazda RX4 Wag       160.0
## Datsun 710          108.0
## Hornet 4 Drive      258.0
## Hornet Sportabout   360.0
## Valiant             225.0
## Duster 360          360.0
## Merc 240D           146.7
## Merc 230            140.8
## Merc 280            167.6
## Merc 280C           167.6
## Merc 450SE          275.8
## Merc 450SL          275.8
## Merc 450SLC         275.8
## Cadillac Fleetwood  472.0
## Lincoln Continental 460.0
## Chrysler Imperial   440.0
## Fiat 128             78.7
## Honda Civic          75.7
## Toyota Corolla       71.1
## Toyota Corona       120.1
## Dodge Challenger    318.0
## AMC Javelin         304.0
## Camaro Z28          350.0
## Pontiac Firebird    400.0
## Fiat X1-9            79.0
## Porsche 914-2       120.3
## Lotus Europa         95.1
## Ford Pantera L      351.0
## Ferrari Dino        145.0
## Maserati Bora       301.0
## Volvo 142E          121.0

This is different from the [[ ]] form, which returns a plain vector:

mtcars[["disp"]]
##  [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0  78.7  75.7  71.1 120.1 318.0
## [23] 304.0 350.0 400.0  79.0 120.3  95.1 351.0 145.0 301.0 121.0
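The comma form mtcars[, "disp"] simplifies to a vector, just as matrix indexing does, and drop = FALSE prevents that. A sketch comparing the forms by their class:

```r
class(mtcars[["disp"]])                 # "numeric": the column vector
class(mtcars["disp"])                   # "data.frame": a one-column slice
class(mtcars[, "disp"])                 # "numeric": simplified, like a matrix column
class(mtcars[, "disp", drop = FALSE])   # "data.frame": simplification turned off
```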

A column slice is itself a data frame, and it preserves the row names. To get multiple columns, pass a vector of column names:

mtcars[c("mpg", "cyl", "disp")]
##                      mpg cyl  disp
## Mazda RX4           21.0   6 160.0
## Mazda RX4 Wag       21.0   6 160.0
## Datsun 710          22.8   4 108.0
## Hornet 4 Drive      21.4   6 258.0
## Hornet Sportabout   18.7   8 360.0
## Valiant             18.1   6 225.0
## Duster 360          14.3   8 360.0
## Merc 240D           24.4   4 146.7
## Merc 230            22.8   4 140.8
## Merc 280            19.2   6 167.6
## Merc 280C           17.8   6 167.6
## Merc 450SE          16.4   8 275.8
## Merc 450SL          17.3   8 275.8
## Merc 450SLC         15.2   8 275.8
## Cadillac Fleetwood  10.4   8 472.0
## Lincoln Continental 10.4   8 460.0
## Chrysler Imperial   14.7   8 440.0
## Fiat 128            32.4   4  78.7
## Honda Civic         30.4   4  75.7
## Toyota Corolla      33.9   4  71.1
## Toyota Corona       21.5   4 120.1
## Dodge Challenger    15.5   8 318.0
## AMC Javelin         15.2   8 304.0
## Camaro Z28          13.3   8 350.0
## Pontiac Firebird    19.2   8 400.0
## Fiat X1-9           27.3   4  79.0
## Porsche 914-2       26.0   4 120.3
## Lotus Europa        30.4   4  95.1
## Ford Pantera L      15.8   8 351.0
## Ferrari Dino        19.7   6 145.0
## Maserati Bora       15.0   8 301.0
## Volvo 142E          21.4   4 121.0

We can also use column positions instead of names, though positions are less readable and silently select different columns if the column order changes:

mtcars[c(1, 2, 3)]
##                      mpg cyl  disp
## Mazda RX4           21.0   6 160.0
## Mazda RX4 Wag       21.0   6 160.0
## Datsun 710          22.8   4 108.0
## Hornet 4 Drive      21.4   6 258.0
## Hornet Sportabout   18.7   8 360.0
## Valiant             18.1   6 225.0
## Duster 360          14.3   8 360.0
## Merc 240D           24.4   4 146.7
## Merc 230            22.8   4 140.8
## Merc 280            19.2   6 167.6
## Merc 280C           17.8   6 167.6
## Merc 450SE          16.4   8 275.8
## Merc 450SL          17.3   8 275.8
## Merc 450SLC         15.2   8 275.8
## Cadillac Fleetwood  10.4   8 472.0
## Lincoln Continental 10.4   8 460.0
## Chrysler Imperial   14.7   8 440.0
## Fiat 128            32.4   4  78.7
## Honda Civic         30.4   4  75.7
## Toyota Corolla      33.9   4  71.1
## Toyota Corona       21.5   4 120.1
## Dodge Challenger    15.5   8 318.0
## AMC Javelin         15.2   8 304.0
## Camaro Z28          13.3   8 350.0
## Pontiac Firebird    19.2   8 400.0
## Fiat X1-9           27.3   4  79.0
## Porsche 914-2       26.0   4 120.3
## Lotus Europa        30.4   4  95.1
## Ford Pantera L      15.8   8 351.0
## Ferrari Dino        19.7   6 145.0
## Maserati Bora       15.0   8 301.0
## Volvo 142E          21.4   4 121.0

This allows us to retrieve a specific set of columns. What if we want to retrieve a specific set of rows?

mtcars[1, ]
##           mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4
mtcars["Mazda RX4", ]
##           mpg cyl disp  hp drat   wt  qsec vs am gear carb
## Mazda RX4  21   6  160 110  3.9 2.62 16.46  0  1    4    4

As usual, we can pass a vector of row numbers or a vector of row names:

mtcars[c(1, 2, 3), ]
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
mtcars[c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"), ]
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

Now, we can generalize this to get a specific slice of the data frame, limiting both the rows and the columns.

mtcars[c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"), c("mpg", "cyl", "disp")]
##                mpg cyl disp
## Mazda RX4     21.0   6  160
## Mazda RX4 Wag 21.0   6  160
## Datsun 710    22.8   4  108

Often we need to select rows based on the values in a column. Say we want a data frame of all the cars whose mpg is greater than 20.0. We can do this in two steps:

  1. Get the logical index vector
  2. Index the data frame using this vector
indices <- mtcars$mpg > 20.0
indices
##  [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
## [23] FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE  TRUE
mtcars[indices, ]
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
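Conditions can be combined with & (and) and | (or), and rows and columns can be selected in one step. Base R's subset expresses the same filter more compactly; one difference is that subset treats rows where the condition is NA as not selected, whereas direct logical indexing returns rows of NAs for them. A sketch selecting the 4-cylinder cars with mpg above 20:

```r
idx <- mtcars$mpg > 20.0 & mtcars$cyl == 4
a <- mtcars[idx, c("mpg", "cyl")]
b <- subset(mtcars, mpg > 20.0 & cyl == 4, select = c(mpg, cyl))
nrow(a)   # 11
```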