We have talked about vectors, which are sequences of values of the same type. A natural generalization is the matrix: a two-dimensional arrangement of values of the same type, laid out in rows and columns.
To create a matrix, use the matrix function:
m1 <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2,
byrow = TRUE)
m1
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
attributes(m1)
## $dim
## [1] 3 2
str(m1)
## num [1:3, 1:2] 1 3 5 2 4 6
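Notice the byrow argument. With byrow = TRUE, the values fill the matrix row by row. Let’s see what happens if we set it to FALSE (the default), this time with 2 rows and 3 columns: the values now fill the matrix column by column.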
m2 <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 2,
ncol = 3,
byrow = FALSE)
m2
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
attributes(m2)
## $dim
## [1] 2 3
str(m2)
## num [1:2, 1:3] 1 2 3 4 5 6
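The example above changes the shape and the fill order at the same time. To isolate the effect of byrow, here is a minimal sketch that builds two matrices of the same shape from the same values. The names vals, by_row and by_col are illustrative and do not appear elsewhere in these notes.

```r
# Same data and same shape; only the fill order differs.
vals <- c(1, 2, 3, 4, 5, 6)
by_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)   # fill row by row
by_col <- matrix(vals, nrow = 2, ncol = 3, byrow = FALSE)  # fill column by column (the default)

by_row[1, ]  # first row: 1 2 3
by_col[1, ]  # first row: 1 3 5
```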
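To get all the elements of a matrix as a vector, use c. The elements come back column by column, which is the order in which R stores a matrix internally: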
c(m1)
## [1] 1 3 5 2 4 6
c(m2)
## [1] 1 2 3 4 5 6
To get the dimensions of a matrix, we can use dim.
dim(m1)
## [1] 3 2
dim(m2)
## [1] 2 3
You can obtain the number of rows and the number of columns individually as follows.
nrow(m1)
## [1] 3
ncol(m1)
## [1] 2
These functions piggyback on dim as you can see below.
nrow
## function (x)
## dim(x)[1L]
## <bytecode: 0xc0b2a0>
## <environment: namespace:base>
ncol
## function (x)
## dim(x)[2L]
## <bytecode: 0x9fde30>
## <environment: namespace:base>
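Note that a matrix is a valid vector with one additional attribute, dim, which holds the number of rows and the number of columns. (The class output below comes from an older version of R; R 4.0.0 and later report "matrix" "array".)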
m1
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
length(m1)
## [1] 6
class(m1)
## [1] "matrix"
attributes(m1)
## $dim
## [1] 3 2
cm1 <- c(m1)
cm1
## [1] 1 3 5 2 4 6
length(cm1)
## [1] 6
class(cm1)
## [1] "numeric"
attributes(cm1)
## NULL
Here is how we can apply the transpose operation.
t(m1)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
To access the element in the ith row and jth column of a matrix M, we use the expression M[i, j].
m1[1, 2]
## [1] 2
m1[3, 2]
## [1] 6
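What happens when we do m1[5, 5]? Since m1 has only 3 rows and 2 columns, R stops with a "subscript out of bounds" error.
To access the entire ith row, leave the column index empty: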
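The out-of-range case can be tried without halting a script. The sketch below wraps the lookup in tryCatch and keeps the error message as a string; the variable name msg is illustrative. In an English locale the message typically reads "subscript out of bounds".

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)

# m1 is 3 x 2, so row 5 and column 5 do not exist.
# tryCatch captures the error message instead of stopping the script.
msg <- tryCatch(m1[5, 5], error = function(e) conditionMessage(e))
msg
```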
m1[1, ]
## [1] 1 2
m2[2, ]
## [1] 2 4 6
To access the entire jth column, leave the row index empty:
m1[, 2]
## [1] 2 4 6
m2[, 1]
## [1] 1 2
The last four commands seem fine, but note the type of data structure they return. Even though we indexed a matrix, we got a vector back instead of a matrix. We can confirm this in a couple of ways.
r <- m1[1, ]
attributes(m1)
## $dim
## [1] 3 2
attributes(r)
## NULL
str(m1)
## num [1:3, 1:2] 1 3 5 2 4 6
str(r)
## num [1:2] 1 2
This tells us that m1 has rows and columns while r does not. This may be what you expect, but in many cases you want a submatrix. If you index a matrix expecting a submatrix and the index happens to select a single row or column, you silently get a vector back, which can introduce a bug in your code. To avoid this behavior, you can use the drop argument as follows.
r <- m1[1, , drop = FALSE]
attributes(m1)
## $dim
## [1] 3 2
attributes(r)
## $dim
## [1] 1 2
str(m1)
## num [1:3, 1:2] 1 3 5 2 4 6
str(r)
## num [1, 1:2] 1 2
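Here is a small sketch of why this matters in practice: code that asks for the number of rows of a selection behaves differently depending on drop. The names keep, dropped and kept are hypothetical.

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)

keep <- 2                            # a selection that happens to have length one
dropped <- m1[keep, ]                # dimensions are dropped: a plain vector
kept    <- m1[keep, , drop = FALSE]  # still a 1 x 2 matrix

is.matrix(dropped)  # FALSE
is.matrix(kept)     # TRUE
nrow(dropped)       # NULL, because a plain vector has no dim attribute
nrow(kept)          # 1
```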
We can pass drop as an argument because [ is just a function in R! This is what makes the following legal R code.
"["(m1, 2, 2)
## [1] 4
If you have a vector that you want to treat as a matrix, you can use the as.matrix function. The result is a matrix with a single column.
vec1 <- c(1, 2, 3, 4)
mat1 <- as.matrix(vec1)
attributes(mat1)
## $dim
## [1] 4 1
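Since as.matrix produces a single column, a different shape has to be requested explicitly. Two common ways are sketched below; mat2 and mat3 are illustrative names.

```r
vec1 <- c(1, 2, 3, 4)

# Option 1: build a new 2 x 2 matrix from the vector (filled column by column).
mat2 <- matrix(vec1, nrow = 2, ncol = 2)

# Option 2: set the dim attribute on a copy of the vector.
mat3 <- vec1
dim(mat3) <- c(2, 2)

mat2[1, 2]  # 3, the first element of the second column
dim(mat3)   # 2 2
```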
To access more than one row or column at a time, pass a vector of indices:
m1[c(1, 3), ]
## [,1] [,2]
## [1,] 1 2
## [2,] 5 6
m2[, c(1, 3)]
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
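What if we want to combine multiple matrices together? cbind joins matrices side by side (they must have the same number of rows), and rbind stacks them on top of each other (they must have the same number of columns).
m3 <- matrix(c(90, 43, -783, 21, 23, 57),
             nrow = 3,
             ncol = 2)
m4 <- matrix(c(37, 12, 0),
             nrow = 3,
             ncol = 1)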
m3
## [,1] [,2]
## [1,] 90 21
## [2,] 43 23
## [3,] -783 57
m4
## [,1]
## [1,] 37
## [2,] 12
## [3,] 0
cbind(m3, m4)
## [,1] [,2] [,3]
## [1,] 90 21 37
## [2,] 43 23 12
## [3,] -783 57 0
cbind(m4, m3)
## [,1] [,2] [,3]
## [1,] 37 90 21
## [2,] 12 43 23
## [3,] 0 -783 57
# Supply exactly as many values as the 1 x 2 matrix can hold.
m5 <- matrix(c(90, 43),
             nrow = 1,
             ncol = 2)
m5
## [,1] [,2]
## [1,] 90 43
m3
## [,1] [,2]
## [1,] 90 21
## [2,] 43 23
## [3,] -783 57
rbind(m3, m5)
## [,1] [,2]
## [1,] 90 21
## [2,] 43 23
## [3,] -783 57
## [4,] 90 43
rbind(m5, m3)
## [,1] [,2]
## [1,] 90 43
## [2,] 90 21
## [3,] 43 23
## [4,] -783 57
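We can also cbind or rbind more than two objects together. Plain vectors work as well: rbind treats each vector as a row, and cbind treats each vector as a column.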
rbind(c(3,4,5), c(2,1,3), c(6,5,4))
## [,1] [,2] [,3]
## [1,] 3 4 5
## [2,] 2 1 3
## [3,] 6 5 4
cbind(c(3,2,6), c(4,1,5), c(5,3,4))
## [,1] [,2] [,3]
## [1,] 3 4 5
## [2,] 2 1 3
## [3,] 6 5 4
Matrix rows and columns can be named. This allows us to use those names as indices for accessing elements, rows and columns of the matrix.
m6 <- matrix(c(90, 43, -783, 21, 23, 57, 23, 65, 78, 90, 32, 4, 3, 67, 0, 12, 32, 76, 32, 9),
             nrow = 5,
             ncol = 4)
m6
## [,1] [,2] [,3] [,4]
## [1,] 90 57 32 12
## [2,] 43 23 4 32
## [3,] -783 65 3 76
## [4,] 21 78 67 32
## [5,] 23 90 0 9
rownames(m6)
## NULL
colnames(m6)
## NULL
What is this NULL? It is R's null object, which has length 0 and stands for the absence of a value. Here it means that no names have been set. Let’s try to provide names to rows and columns.
rownames(m6) <- c("r1", "r2", "r3", "r4", "r5")
m6
## [,1] [,2] [,3] [,4]
## r1 90 57 32 12
## r2 43 23 4 32
## r3 -783 65 3 76
## r4 21 78 67 32
## r5 23 90 0 9
colnames(m6) <- c("c1", "c2", "c3", "c4")
m6
## c1 c2 c3 c4
## r1 90 57 32 12
## r2 43 23 4 32
## r3 -783 65 3 76
## r4 21 78 67 32
## r5 23 90 0 9
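Once names are set, they can be used in place of numeric indices. The following is a minimal sketch on a small hypothetical matrix called scores, so that it runs on its own.

```r
scores <- matrix(c(90, 43, 21, 23), nrow = 2, ncol = 2)  # columns: (90, 43) and (21, 23)
rownames(scores) <- c("r1", "r2")
colnames(scores) <- c("c1", "c2")

scores["r1", "c2"]  # single element: 21
scores["r2", ]      # entire row r2, returned as a named vector
scores[, "c1"]      # entire column c1, returned as a named vector
```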
We can also remove names by assigning NULL.
rownames(m6) <- NULL
m6
## c1 c2 c3 c4
## [1,] 90 57 32 12
## [2,] 43 23 4 32
## [3,] -783 65 3 76
## [4,] 21 78 67 32
## [5,] 23 90 0 9
colnames(m6) <- NULL
m6
## [,1] [,2] [,3] [,4]
## [1,] 90 57 32 12
## [2,] 43 23 4 32
## [3,] -783 65 3 76
## [4,] 21 78 67 32
## [5,] 23 90 0 9
The names can be changed at any time, and both row and column names can be assigned in a single step with dimnames.
dimnames(m6) <- list(c("r-1", "r-2", "r-3", "r-4", "r-5"),
c("c-1", "c-2", "c-3", "c-4"))
m6
## c-1 c-2 c-3 c-4
## r-1 90 57 32 12
## r-2 43 23 4 32
## r-3 -783 65 3 76
## r-4 21 78 67 32
## r-5 23 90 0 9
Similarly, we can remove them in a single step.
dimnames(m6) <- list(NULL, NULL)
m6
## [,1] [,2] [,3] [,4]
## [1,] 90 57 32 12
## [2,] 43 23 4 32
## [3,] -783 65 3 76
## [4,] 21 78 67 32
## [5,] 23 90 0 9
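Let’s do some matrix operations. The matrices below are filled with random draws from rnorm, so the values you see will differ from the ones shown here.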
mA <- cbind(matrix(rnorm(20), 4), c(1, 2, 4, 6))
mA
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -2.4131887 -0.9644539 -0.5149461 -1.2951568 -1.68376254 1
## [2,] -1.5674913 1.3081971 2.4307926 0.8757151 -1.27309222 2
## [3,] 0.6428604 1.0977363 0.5369375 1.8720059 -0.48050814 4
## [4,] -0.5079755 -0.3424555 -0.4709463 -1.1729805 0.04627785 6
mB <- cbind(matrix(rnorm(20), 4), c(8, 3, 6, 7))
mB
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.5625534 2.4192403 -0.5176799 0.3967884 -1.6512427 8
## [2,] 1.6124261 -1.4256384 1.2699589 0.7929031 0.6220068 3
## [3,] -0.2891407 -0.4104651 1.5511732 -0.6885141 -0.4071293 6
## [4,] -0.4522428 0.1269211 -1.9408871 0.6740879 0.6477337 7
To add and subtract two matrices, we use the + and - operators. Both matrices must have the same dimensions.
mA + mB
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.85063522 1.4547864 -1.032626 -0.8983684 -3.3350053 9
## [2,] 0.04493474 -0.1174413 3.700751 1.6686182 -0.6510854 5
## [3,] 0.35371962 0.6872712 2.088111 1.1834918 -0.8876374 10
## [4,] -0.96021825 -0.2155344 -2.411833 -0.4988926 0.6940116 13
mA - mB
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -3.97574211 -3.3836941 0.002733768 -1.69194519 -0.03251979 -7
## [2,] -3.17991740 2.7338354 1.160833702 0.08281202 -1.89509901 -1
## [3,] 0.93200109 1.5082014 -1.014235697 2.56052006 -0.07337888 -2
## [4,] -0.05573267 -0.4693766 1.469940870 -1.84706838 -0.60145588 -1
Element-wise multiplication and division are done using the * and / operators.
mA * mB
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -3.7707363 -2.33324567 0.2665773 -0.5139032 2.78030067 8
## [2,] -2.5274639 -1.86501591 3.0870066 0.6943572 -0.79187200 6
## [3,] -0.1858771 -0.45058242 0.8328831 -1.2889025 0.19562892 24
## [4,] 0.2297282 -0.04346483 0.9140536 -0.7906919 0.02997573 42
mA / mB
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -1.5443879 -0.3986598 0.9947192 -3.264100 1.0196941 0.1250000
## [2,] -0.9721322 -0.9176220 1.9140719 1.104442 -2.0467497 0.6666667
## [3,] -2.2233476 -2.6743719 0.3461493 -2.718907 1.1802348 0.6666667
## [4,] 1.1232362 -2.6981759 0.2426449 -1.740100 0.0714458 0.8571429
These operations can also be done with a constant. In that case R recycles the constant, applying it to every element of the matrix.
mA + 10
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 7.586811 9.035546 9.485054 8.704843 8.316237 11
## [2,] 8.432509 11.308197 12.430793 10.875715 8.726908 12
## [3,] 10.642860 11.097736 10.536938 11.872006 9.519492 14
## [4,] 9.492025 9.657545 9.529054 8.827019 10.046278 16
10 + mA
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 7.586811 9.035546 9.485054 8.704843 8.316237 11
## [2,] 8.432509 11.308197 12.430793 10.875715 8.726908 12
## [3,] 10.642860 11.097736 10.536938 11.872006 9.519492 14
## [4,] 9.492025 9.657545 9.529054 8.827019 10.046278 16
mA - 10
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -12.41319 -10.964454 -10.514946 -11.295157 -11.683763 -9
## [2,] -11.56749 -8.691803 -7.569207 -9.124285 -11.273092 -8
## [3,] -9.35714 -8.902264 -9.463062 -8.127994 -10.480508 -6
## [4,] -10.50798 -10.342455 -10.470946 -11.172981 -9.953722 -4
10 - mA
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 12.41319 10.964454 10.514946 11.295157 11.683763 9
## [2,] 11.56749 8.691803 7.569207 9.124285 11.273092 8
## [3,] 9.35714 8.902264 9.463062 8.127994 10.480508 6
## [4,] 10.50798 10.342455 10.470946 11.172981 9.953722 4
mA / 10
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.24131887 -0.09644539 -0.05149461 -0.12951568 -0.168376254 0.1
## [2,] -0.15674913 0.13081971 0.24307926 0.08757151 -0.127309222 0.2
## [3,] 0.06428604 0.10977363 0.05369375 0.18720059 -0.048050814 0.4
## [4,] -0.05079755 -0.03424555 -0.04709463 -0.11729805 0.004627785 0.6
10 / mA
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -4.143895 -10.368562 -19.419507 -7.721073 -5.93908 10.000000
## [2,] -6.379621 7.644108 4.113885 11.419239 -7.85489 5.000000
## [3,] 15.555478 9.109656 18.624141 5.341863 -20.81130 2.500000
## [4,] -19.685990 -29.200875 -21.233845 -8.525291 216.08608 1.666667
mA * 10
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -24.131887 -9.644539 -5.149461 -12.951568 -16.8376254 10
## [2,] -15.674913 13.081971 24.307926 8.757151 -12.7309222 20
## [3,] 6.428604 10.977363 5.369375 18.720059 -4.8050814 40
## [4,] -5.079755 -3.424555 -4.709463 -11.729805 0.4627785 60
10 * mA
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -24.131887 -9.644539 -5.149461 -12.951568 -16.8376254 10
## [2,] -15.674913 13.081971 24.307926 8.757151 -12.7309222 20
## [3,] 6.428604 10.977363 5.369375 18.720059 -4.8050814 40
## [4,] -5.079755 -3.424555 -4.709463 -11.729805 0.4627785 60
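Recycling is not limited to a single constant. As a sketch of the same rule, a vector with one value per row is recycled down each column; the names m and shifted are illustrative.

```r
m <- matrix(1:6, nrow = 2)  # columns are (1, 2), (3, 4), (5, 6)

# The length-2 vector is recycled down each column:
# 10 is added to row 1 and 20 is added to row 2.
shifted <- m + c(10, 20)

shifted[1, ]  # 11 13 15
shifted[2, ]  # 22 24 26
```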
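To perform mathematical matrix multiplication, we use the %*% operator. The number of columns of the left operand must equal the number of rows of the right operand, which is why we multiply mA by its transpose here.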
mA %*% t(mA)
## [,1] [,2] [,3] [,4]
## [1,] 12.5313078 4.278626 -0.5020329 9.239908
## [2,] 4.2786259 16.464802 11.9846362 10.117362
## [3,] -0.5020329 11.984636 21.6418906 20.826585
## [4,] 9.2399079 10.117362 20.8265848 37.975130
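Because * and %*% are easy to confuse, here is a small sketch with fixed values where both results can be checked by hand. The matrices a and b are hypothetical.

```r
a <- matrix(c(1, 2, 3, 4), nrow = 2)  # rows: (1, 3) and (2, 4)
b <- matrix(c(5, 6, 7, 8), nrow = 2)  # rows: (5, 7) and (6, 8)

elementwise <- a * b    # multiplies matching entries
product     <- a %*% b  # rows of a combined with columns of b

elementwise[1, 2]  # 3 * 7 = 21
product[1, 2]      # 1 * 7 + 3 * 8 = 31
```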
To calculate the row and column means and sums of matrices conveniently, R provides appropriately named functions.
rowMeans(mA)
## [1] -0.9785847 0.6290202 1.2781720 0.5919867
colMeans(mA)
## [1] -0.96144877 0.27475599 0.49545943 0.06989594 -0.84777126 3.25000000
rowSums(mA)
## [1] -5.871508 3.774121 7.669032 3.551920
colSums(mA)
## [1] -3.8457951 1.0990240 1.9818377 0.2795838 -3.3910850 13.0000000
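For summaries other than means and sums, the general-purpose apply function can be used in the same way. The sketch below, on a small hypothetical matrix m, shows that apply over rows reproduces rowMeans and that other functions such as max can be plugged in.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)  # rows: (1, 3, 5) and (2, 4, 6)

row_means <- rowMeans(m)        # 3 4
via_apply <- apply(m, 1, mean)  # MARGIN = 1 applies the function to each row
col_max   <- apply(m, 2, max)   # MARGIN = 2 applies it to each column: 2 4 6
```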
We can also assign to matrices and submatrices. The following snippets of code explain how.
y <- matrix(nrow = 3, ncol = 2)
y
## [,1] [,2]
## [1,] NA NA
## [2,] NA NA
## [3,] NA NA
y[1, 1] <- 11
y[2, 1] <- 12
y[3, 1] <- 56
y[1, 2] <- 13
y[2, 2] <- 14
y[3, 2] <- 89
y
## [,1] [,2]
## [1,] 11 13
## [2,] 12 14
## [3,] 56 89
y[c(1, 3), c(1, 2)] <- matrix(1:4, nrow = 2)
y
## [,1] [,2]
## [1,] 1 3
## [2,] 12 14
## [3,] 2 4
Here is another example, which fills the lower-right 2 x 2 block of a 3 x 3 matrix.
x <- matrix(nrow=3,ncol=3)
y <- matrix(c(7, 8, 9, 10), nrow = 2)
x
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA NA NA
## [3,] NA NA NA
y
## [,1] [,2]
## [1,] 7 9
## [2,] 8 10
x[2:3, 2:3] <- y
x
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] NA 7 9
## [3,] NA 8 10
Just as with vectors, we can use negative indices too. Negative indices exclude rows or columns. For example:
m1
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
m1[-2, ]
## [,1] [,2]
## [1,] 1 2
## [2,] 5 6
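Negative indices also accept vectors and work on columns. A short sketch follows; note that when only one row or column remains, the result collapses to a vector unless drop = FALSE is given. The result names are illustrative.

```r
m1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2, byrow = TRUE)

only_row2     <- m1[-c(1, 3), ]                # rows 1 and 3 excluded; a vector: 3 4
only_row2_mat <- m1[-c(1, 3), , drop = FALSE]  # same selection, kept as a 1 x 2 matrix
without_col1  <- m1[, -1]                      # column 1 excluded; a vector: 2 4 6
```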
Vectors and matrices always contain values of a single type. A list is like a vector whose elements can be objects of different types.
numbers <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
characters <- c("aa", "bb", "cc", "dd", "ee", "ff")
logicals <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
l1 <- list(numbers, characters, logicals, 12, "abcd", TRUE)
l1
## [[1]]
## [1] 0 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
##
## [[3]]
## [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE
##
## [[4]]
## [1] 12
##
## [[5]]
## [1] "abcd"
##
## [[6]]
## [1] TRUE
With lists we can reference an individual element or retrieve a slice (sublist). Notice the difference between [[]] and []: [[]] returns the element itself, while [] always returns a list, even when it contains only one element.
l1[[1]]
## [1] 0 1 2 3 4 5 6 7 8 9
l1[[2]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
l1[[3]]
## [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE
l1[[4]]
## [1] 12
l1[[5]]
## [1] "abcd"
l1[[6]]
## [1] TRUE
l1[1]
## [[1]]
## [1] 0 1 2 3 4 5 6 7 8 9
l1[2]
## [[1]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
l1[3]
## [[1]]
## [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE
l1[4]
## [[1]]
## [1] 12
l1[5]
## [[1]]
## [1] "abcd"
l1[6]
## [[1]]
## [1] TRUE
l1[c(1, 3, 5)]
## [[1]]
## [1] 0 1 2 3 4 5 6 7 8 9
##
## [[2]]
## [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE
##
## [[3]]
## [1] "abcd"
l1[c(2, 4, 6)]
## [[1]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
##
## [[2]]
## [1] 12
##
## [[3]]
## [1] TRUE
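The difference matters as soon as the result is passed to another function. A minimal sketch, using a small hypothetical list l:

```r
l <- list(c(0, 1, 2), "abcd", TRUE)

element <- l[[1]]  # the stored vector itself
slice   <- l[1]    # a list of length one wrapping that vector

class(element)  # "numeric"
class(slice)    # "list"
sum(element)    # 3; arithmetic works on the element
# sum(slice) would fail, because sum does not accept a list
```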
If we modify the list contents, note what happens: the list holds its own copy of the data, so changing l1[[1]] leaves the original numbers vector untouched.
l1[[1]]
## [1] 0 1 2 3 4 5 6 7 8 9
l1[[1]][1] <- 299792458
l1
## [[1]]
## [1] 299792458 1 2 3 4 5 6
## [8] 7 8 9
##
## [[2]]
## [1] "aa" "bb" "cc" "dd" "ee" "ff"
##
## [[3]]
## [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE
##
## [[4]]
## [1] 12
##
## [[5]]
## [1] "abcd"
##
## [[6]]
## [1] TRUE
numbers
## [1] 0 1 2 3 4 5 6 7 8 9
Just like matrices, we can provide names to list elements. Notice how the list gets printed.
l2 <- list(primes = c(2, 3, 5, 7, 11, 13),
composites = c(4, 6, 8, 9, 10, 12),
complexes = c(3+1i, 4+6i, 8 + 9i))
l2
## $primes
## [1] 2 3 5 7 11 13
##
## $composites
## [1] 4 6 8 9 10 12
##
## $complexes
## [1] 3+1i 4+6i 8+9i
To access the elements, we can use the following approaches.
l2$primes
## [1] 2 3 5 7 11 13
l2["primes"]
## $primes
## [1] 2 3 5 7 11 13
l2[["primes"]]
## [1] 2 3 5 7 11 13
l2[1]
## $primes
## [1] 2 3 5 7 11 13
l2[[1]]
## [1] 2 3 5 7 11 13
l2[c("composites", "complexes")]
## $composites
## [1] 4 6 8 9 10 12
##
## $complexes
## [1] 3+1i 4+6i 8+9i
What if we want to access the list variables directly?
# primes, complexes and composites are not defined
attach(l2)
primes #primes is defined now
## [1] 2 3 5 7 11 13
complexes #complexes is defined now
## [1] 3+1i 4+6i 8+9i
composites #composites is defined now
## [1] 4 6 8 9 10 12
detach(l2)
# primes, complexes and composites are again not defined.
This attaches the list to the search path, so that its members can be accessed directly by name. If you use attach, make sure you call detach as soon as possible, because attached names can mask, or be masked by, other variables.
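A scoped alternative that needs no cleanup is the base function with, which evaluates a single expression with the list members visible. This is a sketch of that alternative, not something the notes above rely on; the variable total is illustrative.

```r
l2 <- list(primes = c(2, 3, 5, 7, 11, 13),
           composites = c(4, 6, 8, 9, 10, 12))

# Inside with(), the names primes and composites resolve to the list elements.
# Nothing is added to the search path, so there is nothing to detach.
total <- with(l2, sum(primes) + sum(composites))
total  # 41 + 49 = 90
```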
Data frames are among the most useful data structures in R. A data frame is a list of vectors that all have the same length. It is useful for storing tables of data, like a spreadsheet.
firstNames <- c("John", "Oliver", "Mack", "Ron")
secondNames <- c("Green", "Twist", "Brown", "Weasely")
grades <- c("A", "B", "C", "A")
df <- data.frame(firstNames, secondNames, grades)
df
## firstNames secondNames grades
## 1 John Green A
## 2 Oliver Twist B
## 3 Mack Brown C
## 4 Ron Weasely A
Notice how, in the previous example, the column names are automatically taken to be the vector names.
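The "list of equal-length vectors" view can be checked directly. The sketch below uses a hypothetical data frame called students with explicitly named columns, and also shows that vectors of incompatible lengths are rejected.

```r
students <- data.frame(first = c("John", "Oliver", "Mack"),
                       grade = c("A", "B", "C"))

is.list(students)  # TRUE: a data frame is built on a list
length(students)   # 2: one list element per column
names(students)    # "first" "grade"
dim(students)      # 3 2: rows, then columns

# Columns of length 3 and 2 cannot be combined; try() captures the error.
bad <- try(data.frame(a = 1:3, b = 1:2), silent = TRUE)
inherits(bad, "try-error")  # TRUE
```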
R ships with a built-in data frame called mtcars. We will use it for the rest of the lecture.
mtcars
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
This is too much output. If we are just interested in inspecting the first few rows of the data frame, we can use head, which shows the first six rows by default.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
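The number of rows can be changed with the n argument, and tail does the same from the other end. A brief sketch, with illustrative variable names:

```r
first_three <- head(mtcars, n = 3)  # first 3 rows instead of the default 6
last_two    <- tail(mtcars, n = 2)  # last 2 rows

rownames(first_three)  # "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
rownames(last_two)     # "Maserati Bora" "Volvo 142E"
```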
To get information about the number of rows, row names, number of columns and column names of a data frame, we can do the following.
nrow(mtcars)
## [1] 32
rownames(mtcars)
## [1] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710"
## [4] "Hornet 4 Drive" "Hornet Sportabout" "Valiant"
## [7] "Duster 360" "Merc 240D" "Merc 230"
## [10] "Merc 280" "Merc 280C" "Merc 450SE"
## [13] "Merc 450SL" "Merc 450SLC" "Cadillac Fleetwood"
## [16] "Lincoln Continental" "Chrysler Imperial" "Fiat 128"
## [19] "Honda Civic" "Toyota Corolla" "Toyota Corona"
## [22] "Dodge Challenger" "AMC Javelin" "Camaro Z28"
## [25] "Pontiac Firebird" "Fiat X1-9" "Porsche 914-2"
## [28] "Lotus Europa" "Ford Pantera L" "Ferrari Dino"
## [31] "Maserati Bora" "Volvo 142E"
ncol(mtcars)
## [1] 11
colnames(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
To access elements, we can use the familiar matrix-like syntax.
mtcars[2, 3]
## [1] 160
mtcars["Mazda RX4 Wag", "disp"]
## [1] 160
To access a column as a vector, we can use the [[]] operator.
mtcars[[3]]
## [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
## [23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
mtcars[["disp"]]
## [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
## [23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
Since a data frame is a type of list, we can also use the $ operator.
mtcars$disp
## [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
## [23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
Another way to do the same is as follows.
mtcars[, "disp"]
## [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
## [23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
If we want a column slice of the data frame, that is, a data frame containing only the selected column, we can use the [] operator.
mtcars["disp"]
## disp
## Mazda RX4 160.0
## Mazda RX4 Wag 160.0
## Datsun 710 108.0
## Hornet 4 Drive 258.0
## Hornet Sportabout 360.0
## Valiant 225.0
## Duster 360 360.0
## Merc 240D 146.7
## Merc 230 140.8
## Merc 280 167.6
## Merc 280C 167.6
## Merc 450SE 275.8
## Merc 450SL 275.8
## Merc 450SLC 275.8
## Cadillac Fleetwood 472.0
## Lincoln Continental 460.0
## Chrysler Imperial 440.0
## Fiat 128 78.7
## Honda Civic 75.7
## Toyota Corolla 71.1
## Toyota Corona 120.1
## Dodge Challenger 318.0
## AMC Javelin 304.0
## Camaro Z28 350.0
## Pontiac Firebird 400.0
## Fiat X1-9 79.0
## Porsche 914-2 120.3
## Lotus Europa 95.1
## Ford Pantera L 351.0
## Ferrari Dino 145.0
## Maserati Bora 301.0
## Volvo 142E 121.0
This is different from the following, which returns a plain vector:
mtcars[["disp"]]
## [1] 160.0 160.0 108.0 258.0 360.0 225.0 360.0 146.7 140.8 167.6 167.6
## [12] 275.8 275.8 275.8 472.0 460.0 440.0 78.7 75.7 71.1 120.1 318.0
## [23] 304.0 350.0 400.0 79.0 120.3 95.1 351.0 145.0 301.0 121.0
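The distinction can be confirmed by inspecting the classes of the results. The sketch below also shows that the matrix-style form keeps the data frame when drop = FALSE is supplied; the variable names are illustrative.

```r
as_frame   <- mtcars["disp"]                  # one-column data frame
as_vector  <- mtcars[["disp"]]                # plain numeric vector
also_frame <- mtcars[, "disp", drop = FALSE]  # matrix-style indexing, kept as a data frame

class(as_frame)    # "data.frame"
class(as_vector)   # "numeric"
class(also_frame)  # "data.frame"
```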
A column slice is a valid data frame. It preserves the row names as well. To get multiple columns, we pass a vector of column names:
mtcars[c("mpg", "cyl", "disp")]
## mpg cyl disp
## Mazda RX4 21.0 6 160.0
## Mazda RX4 Wag 21.0 6 160.0
## Datsun 710 22.8 4 108.0
## Hornet 4 Drive 21.4 6 258.0
## Hornet Sportabout 18.7 8 360.0
## Valiant 18.1 6 225.0
## Duster 360 14.3 8 360.0
## Merc 240D 24.4 4 146.7
## Merc 230 22.8 4 140.8
## Merc 280 19.2 6 167.6
## Merc 280C 17.8 6 167.6
## Merc 450SE 16.4 8 275.8
## Merc 450SL 17.3 8 275.8
## Merc 450SLC 15.2 8 275.8
## Cadillac Fleetwood 10.4 8 472.0
## Lincoln Continental 10.4 8 460.0
## Chrysler Imperial 14.7 8 440.0
## Fiat 128 32.4 4 78.7
## Honda Civic 30.4 4 75.7
## Toyota Corolla 33.9 4 71.1
## Toyota Corona 21.5 4 120.1
## Dodge Challenger 15.5 8 318.0
## AMC Javelin 15.2 8 304.0
## Camaro Z28 13.3 8 350.0
## Pontiac Firebird 19.2 8 400.0
## Fiat X1-9 27.3 4 79.0
## Porsche 914-2 26.0 4 120.3
## Lotus Europa 30.4 4 95.1
## Ford Pantera L 15.8 8 351.0
## Ferrari Dino 19.7 6 145.0
## Maserati Bora 15.0 8 301.0
## Volvo 142E 21.4 4 121.0
Note that we can also use numbers, though they are less intuitive.
mtcars[c(1, 2, 3)]
## mpg cyl disp
## Mazda RX4 21.0 6 160.0
## Mazda RX4 Wag 21.0 6 160.0
## Datsun 710 22.8 4 108.0
## Hornet 4 Drive 21.4 6 258.0
## Hornet Sportabout 18.7 8 360.0
## Valiant 18.1 6 225.0
## Duster 360 14.3 8 360.0
## Merc 240D 24.4 4 146.7
## Merc 230 22.8 4 140.8
## Merc 280 19.2 6 167.6
## Merc 280C 17.8 6 167.6
## Merc 450SE 16.4 8 275.8
## Merc 450SL 17.3 8 275.8
## Merc 450SLC 15.2 8 275.8
## Cadillac Fleetwood 10.4 8 472.0
## Lincoln Continental 10.4 8 460.0
## Chrysler Imperial 14.7 8 440.0
## Fiat 128 32.4 4 78.7
## Honda Civic 30.4 4 75.7
## Toyota Corolla 33.9 4 71.1
## Toyota Corona 21.5 4 120.1
## Dodge Challenger 15.5 8 318.0
## AMC Javelin 15.2 8 304.0
## Camaro Z28 13.3 8 350.0
## Pontiac Firebird 19.2 8 400.0
## Fiat X1-9 27.3 4 79.0
## Porsche 914-2 26.0 4 120.3
## Lotus Europa 30.4 4 95.1
## Ford Pantera L 15.8 8 351.0
## Ferrari Dino 19.7 6 145.0
## Maserati Bora 15.0 8 301.0
## Volvo 142E 21.4 4 121.0
This allows us to retrieve a specific set of columns. What if we want to retrieve a specific set of rows?
mtcars[1, ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
mtcars["Mazda RX4", ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21 6 160 110 3.9 2.62 16.46 0 1 4 4
As usual, we can use a single name, a vector of names or a vector of numbers.
mtcars[c(1, 2, 3), ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
mtcars[c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"), ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Now, we can generalize this to get a specific slice of the data frame, limiting both the rows and the columns.
mtcars[c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"), c("mpg", "cyl", "disp")]
## mpg cyl disp
## Mazda RX4 21.0 6 160
## Mazda RX4 Wag 21.0 6 160
## Datsun 710 22.8 4 108
Often, we need to slice a data frame based on a condition satisfied by the values of a column. Let’s say we want a data frame of all the cars whose mpg is greater than 20.0. We can do this in two steps: first build a logical vector with one entry per row, then use it as the row index.
indices <- mtcars$mpg > 20.0
indices
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
## [23] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
mtcars[indices, ]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
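The two steps can be folded into one, and several conditions can be combined. The following sketch also shows the base function subset as an alternative spelling; the result names are illustrative.

```r
# One step: the logical vector is built inside the row index.
efficient <- mtcars[mtcars$mpg > 20.0, ]
nrow(efficient)  # 14

# Conditions can be combined with & (and) or | (or).
efficient_six <- mtcars[mtcars$mpg > 20.0 & mtcars$cyl == 6, ]
rownames(efficient_six)  # "Mazda RX4" "Mazda RX4 Wag" "Hornet 4 Drive"

# subset() evaluates the condition inside the data frame, so mtcars$ is not needed.
same_rows <- subset(mtcars, mpg > 20.0)
```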