In R, square brackets [] are used to subset objects such as vectors, matrices, and data frames.

Subsetting Vectors

  • Positive integer index: x[1] – returns the first element.
  • Negative integer index: x[-1] – excludes the first element.
  • Integer range: x[1:3] – returns elements 1 through 3.
  • Logical vector: x[x > 3] – returns elements greater than 3.
  • Character vector (for named vectors): x["name"] – returns the element associated with "name".
  • Empty index: x[] – returns the entire vector.
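
The vector indexing forms listed above can be tried on a small named vector. This is a minimal sketch; the vector x and its names are made up for illustration.

```r
# Hypothetical named vector used only for illustration
x <- c(a = 5, b = 2, c = 9, d = 4)

x[1]       # first element (a = 5)
x[-1]      # everything except the first element
x[1:3]     # elements 1 through 3
x[x > 3]   # elements greater than 3 (a, c, d)
x["c"]     # element named "c"
x[]        # the entire vector
```

Note that names are carried along with the selected elements, which makes the output easier to read.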

Subsetting Matrices and Data Frames

  • Positive integer indices: mat[1, 2] – returns the element at row 1, column 2.
  • Row and column indices: mat[1:3, 2:4] – returns a submatrix consisting of rows 1 to 3 and columns 2 to 4.
  • Entire row: mat[1, ] – returns the first row.
  • Entire column: mat[, 2] – returns the second column.
  • Logical vector for rows/columns: mat[mat[,1] > 0, ] – returns rows where the first column values are greater than 0.
  • Character vector for column names: mat[, "column_name"] – returns the column with the name "column_name".
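
As a rough sketch of the two-dimensional forms above, the following uses a small made-up matrix (the object mat and its column names w, x, y, z are assumptions for illustration). The last line uses the drop = FALSE argument of [, which is not covered above but is useful when a one-column result should stay a matrix.

```r
# Hypothetical 3 x 4 matrix with named columns, filled column by column
mat <- matrix(c(-1, 2, 3,  4, 5, 6,  7, 8, 9,  10, 11, 12),
              nrow = 3,
              dimnames = list(NULL, c("w", "x", "y", "z")))

mat[1, 2]               # element at row 1, column 2
mat[1:2, 2:3]           # submatrix of rows 1-2 and columns 2-3
mat[1, ]                # first row, returned as a vector
mat[, 2]                # second column, returned as a vector
mat[mat[, 1] > 0, ]     # rows whose first column is greater than 0
mat[, "y"]              # column selected by name
mat[, 2, drop = FALSE]  # second column kept as a 3 x 1 matrix
```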

The dplyr Functions select and filter

library(dslabs)
data(murders)
attach(murders)
library(dplyr)
select(murders,population)
##    population
## 1     4779736
## 2      710231
## 3     6392017
## 4     2915918
## 5    37253956
## 6     5029196
## 7     3574097
## 8      897934
## 9      601723
## 10   19687653
## 11    9920000
## 12    1360301
## 13    1567582
## 14   12830632
## 15    6483802
## 16    3046355
## 17    2853118
## 18    4339367
## 19    4533372
## 20    1328361
## 21    5773552
## 22    6547629
## 23    9883640
## 24    5303925
## 25    2967297
## 26    5988927
## 27     989415
## 28    1826341
## 29    2700551
## 30    1316470
## 31    8791894
## 32    2059179
## 33   19378102
## 34    9535483
## 35     672591
## 36   11536504
## 37    3751351
## 38    3831074
## 39   12702379
## 40    1052567
## 41    4625364
## 42     814180
## 43    6346105
## 44   25145561
## 45    2763885
## 46     625741
## 47    8001024
## 48    6724540
## 49    1852994
## 50    5686986
## 51     563626
filter(murders,total < 135)
##                   state abb        region population total
## 1                Alaska  AK          West     710231    19
## 2              Arkansas  AR         South    2915918    93
## 3              Colorado  CO          West    5029196    65
## 4           Connecticut  CT     Northeast    3574097    97
## 5              Delaware  DE         South     897934    38
## 6  District of Columbia  DC         South     601723    99
## 7                Hawaii  HI          West    1360301     7
## 8                 Idaho  ID          West    1567582    12
## 9                  Iowa  IA North Central    3046355    21
## 10               Kansas  KS North Central    2853118    63
## 11             Kentucky  KY         South    4339367   116
## 12                Maine  ME     Northeast    1328361    11
## 13        Massachusetts  MA     Northeast    6547629   118
## 14            Minnesota  MN North Central    5303925    53
## 15          Mississippi  MS         South    2967297   120
## 16              Montana  MT          West     989415    12
## 17             Nebraska  NE North Central    1826341    32
## 18               Nevada  NV          West    2700551    84
## 19        New Hampshire  NH     Northeast    1316470     5
## 20           New Mexico  NM          West    2059179    67
## 21         North Dakota  ND North Central     672591     4
## 22             Oklahoma  OK         South    3751351   111
## 23               Oregon  OR          West    3831074    36
## 24         Rhode Island  RI     Northeast    1052567    16
## 25         South Dakota  SD North Central     814180     8
## 26                 Utah  UT          West    2763885    22
## 27              Vermont  VT     Northeast     625741     2
## 28           Washington  WA          West    6724540    93
## 29        West Virginia  WV         South    1852994    27
## 30            Wisconsin  WI North Central    5686986    97
## 31              Wyoming  WY          West     563626     5
filter(murders,total < 135, region=="South")
##                  state abb region population total
## 1             Arkansas  AR  South    2915918    93
## 2             Delaware  DE  South     897934    38
## 3 District of Columbia  DC  South     601723    99
## 4             Kentucky  KY  South    4339367   116
## 5          Mississippi  MS  South    2967297   120
## 6             Oklahoma  OK  South    3751351   111
## 7        West Virginia  WV  South    1852994    27
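
For comparison, the same kinds of selections can be written with base R brackets. The sketch below uses a small made-up data frame (df and its values are hypothetical, not taken from murders). The results are roughly equivalent to the dplyr calls named in the comments; one difference to keep in mind is that filter drops rows where the condition is NA, while bracket subsetting keeps them as NA rows.

```r
# Hypothetical data frame standing in for murders (made-up numbers)
df <- data.frame(state  = c("A", "B", "C", "D"),
                 region = c("South", "West", "South", "West"),
                 total  = c(120, 19, 400, 65))

# Similar to select(df, total): keep one column as a data frame
df[, "total", drop = FALSE]

# Similar to filter(df, total < 135)
df[df$total < 135, ]

# Similar to filter(df, total < 135, region == "South"):
# multiple conditions are combined with &
df[df$total < 135 & df$region == "South", ]
```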

Sorting and Ordering

Now that we have mastered some basic R knowledge, let’s try to gain some insights into the safety of different states in the context of gun murders.

Say we want to rank the states from least to most gun murders. The function sort sorts a vector in increasing order. We can therefore see the sorted totals, with the largest number of gun murders last, by typing:

sort(total)
##  [1]    2    4    5    5    7    8   11   12   12   16   19   21   22   27   32
## [16]   36   38   53   63   65   67   84   93   93   97   97   99  111  116  118
## [31]  120  135  142  207  219  232  246  250  286  293  310  321  351  364  376
## [46]  413  457  517  669  805 1257

However, this does not give us information about which states have which murder totals. For example, we don’t know which state had 1257.

The function order takes a vector as input and returns the vector of indexes that sorts the input vector.

index <- order(total)
index 
##  [1] 46 35 30 51 12 42 20 13 27 40  2 16 45 49 28 38  8 24 17  6 32 29  4 48  7
## [26] 50  9 37 18 22 25  1 15 41 43  3 31 47 34 21 36 26 19 14 11 23 39 33 10 44
## [51]  5

The 46th entry of total is the smallest, so order(total) starts with 46. The next smallest is the 35th entry, so the second entry is 35, and so on.

total[index] == sort(total)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [46] TRUE TRUE TRUE TRUE TRUE TRUE
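
The relationship between sort and order is easier to see on a short vector. This is a minimal sketch with made-up values; x and idx are names chosen for illustration.

```r
# Hypothetical small vector for illustration
x <- c(31, 4, 15, 92, 65)

sort(x)           # 4 15 31 65 92
idx <- order(x)
idx               # 2 3 1 5 4: x[2] is the smallest, then x[3], and so on
x[idx]            # indexing x by idx gives the same values as sort(x)
identical(x[idx], sort(x))   # TRUE
```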

How does this help us order the states by murders?

First, remember that the entries of vectors you access with $ (or, after attach, directly by column name) follow the same order as the rows in the table. For example, these two vectors containing state names and abbreviations, respectively, are matched by their order:

state[1:6]
## [1] "Alabama"    "Alaska"     "Arizona"    "Arkansas"   "California"
## [6] "Colorado"
abb[1:6]
## [1] "AL" "AK" "AZ" "AR" "CA" "CO"

This means we can order the state names by their total murders. We first obtain the index that orders the vectors according to murder totals and then index the state names vector:

index <- order(total) 
abb[index] 
##  [1] "VT" "ND" "NH" "WY" "HI" "SD" "ME" "ID" "MT" "RI" "AK" "IA" "UT" "WV" "NE"
## [16] "OR" "DE" "MN" "KS" "CO" "NM" "NV" "AR" "WA" "CT" "WI" "DC" "OK" "KY" "MA"
## [31] "MS" "AL" "IN" "SC" "TN" "AZ" "NJ" "VA" "NC" "MD" "OH" "MO" "LA" "IL" "GA"
## [46] "MI" "PA" "NY" "FL" "TX" "CA"

The default sort order is increasing. To obtain the index in decreasing order, pass decreasing = TRUE to order:

index <- order(total, decreasing = TRUE)

If we are only interested in the entry with the largest value, we can use max for the value and which.max for the index of that value:

abb[which.max(total)]
## [1] "CA"
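
The difference between the value and its position can be sketched with two short made-up vectors (total_demo and abb_demo are hypothetical). The function which.min works the same way for the smallest value.

```r
# Hypothetical totals and labels (made-up values)
total_demo <- c(135, 19, 232, 93)
abb_demo   <- c("AA", "BB", "CC", "DD")

max(total_demo)                  # largest value: 232
i_max <- which.max(total_demo)   # position of the largest value: 3
abb_demo[i_max]                  # label at that position: "CC"

min(total_demo)                  # smallest value: 19
abb_demo[which.min(total_demo)]  # "BB"
```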

Exercise

Element-wise Arithmetic

In R, arithmetic operations on vectors occur element-wise. For a quick example, suppose we have height in inches:

inches <- c(69, 62, 66, 70, 70, 73, 67, 73, 67, 70)

and want to convert to centimeters. Notice what happens when we multiply inches by 2.54:

inches * 2.54
##  [1] 175.26 157.48 167.64 177.80 177.80 185.42 170.18 185.42 170.18 177.80

In the line above, we multiplied each element by 2.54. Similarly, if for each entry we want to compute how many inches taller or shorter than 69 inches, the average height for males, we can subtract it from every entry like this:

inches - 69
##  [1]  0 -7 -3  1  1  4 -2  4 -2  1

Element-wise arithmetic also applies to two vectors of the same length: the first elements are combined, then the second elements, and so on.

murder_rate <- murders$total / murders$population 
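
As a small sketch of arithmetic on two vectors of the same length, the following uses made-up totals and populations (total_demo, population_demo, and rate are hypothetical names). Multiplying by 1e5 is an optional extra step that expresses the rate per 100,000 people.

```r
# Hypothetical totals and populations (made-up values)
total_demo      <- c(10, 50, 200)
population_demo <- c(1e6, 2e6, 4e6)

# Element-by-element division: 10/1e6, 50/2e6, 200/4e6
rate <- total_demo / population_demo

# A scalar is applied to every element
rate * 1e5   # 1.0 2.5 5.0
```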

Exercise

Convert the following city temperatures from Fahrenheit to Celsius using \(C=\frac{5\times(F-32)}{9}\).

# Temperatures are given in Fahrenheit
temp_F <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp_F)

Conditional Indexing

Suppose we want to look up California’s murder rate. The function which tells us which entries of a logical vector are TRUE.

ind <- which(murders$state == "California")
murder_rate[ind]
## [1] 3.374138e-05

If instead of just one state we want to find out the murder rates for several states, say New York, Florida, and Texas, we can use the function match.

ind <- match(c("New York", "Florida", "Texas"), murders$state) 
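
A minimal sketch of how match behaves, using made-up names and rates (states_demo and rate_demo are hypothetical). The result follows the order of the first argument, and names that are not found give NA.

```r
# Hypothetical lookup table (made-up values)
states_demo <- c("Alpha", "Beta", "Gamma", "Delta")
rate_demo   <- c(0.5, 1.2, 3.4, 2.0)

ind <- match(c("Gamma", "Alpha", "Omega"), states_demo)
ind              # 3 1 NA: "Omega" is not in states_demo
rate_demo[ind]   # 3.4 0.5 NA
```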

If rather than an index we want a logical vector that tells us whether or not each element of a first vector is in a second, we can use the operator %in%.

c("Boston", "Dakota", "Washington") %in% murders$state
## [1] FALSE FALSE  TRUE
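
The three lookup tools can be compared side by side on a small made-up example (wanted and states_demo are hypothetical vectors). This sketch shows that %in% returns one logical per element of its left-hand side, which returns the positions of TRUE values, and match returns positions in the second vector or NA.

```r
# Hypothetical vectors for illustration
wanted      <- c("Boston", "Dakota", "Washington")
states_demo <- c("Oregon", "Washington", "Texas")

wanted %in% states_demo          # FALSE FALSE TRUE
which(states_demo %in% wanted)   # 2: position in states_demo that is wanted
match(wanted, states_demo)       # NA NA 2
```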