1 Basic Building Blocks

R can be used as an interactive calculator.

5+7
## [1] 12
5-7
## [1] -2
5*3
## [1] 15
5^2
## [1] 25

Any object that contains data is called a data structure and numeric vectors are the simplest type of data structure in R. The easiest way to create a vector is with the c() function.

c(5,6,7)
## [1] 5 6 7

You can combine vectors to make a new vector.

x<-c(4,5)
c(1,2,3,x)
## [1] 1 2 3 4 5

Numeric vectors can be used in arithmetic expressions.

x*2+100
## [1] 108 110

Common arithmetic operators are +, -, *, /, and ^ (where x^2 means ‘x squared’). To take the square root, use the sqrt() function, and to take the absolute value, use the abs() function.

sqrt(144)
## [1] 12
abs(4-5)
## [1] 1

When given two vectors of the same length, R simply performs the specified arithmetic operation element-by-element. If the vectors are of different lengths, R ‘recycles’ the shorter vector until it is the same length as the longer vector.

c(1,2,3)+c(0,1,2)
## [1] 1 3 5
c(1,2,3,4)+c(100,101)
## [1] 101 103 103 105
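
When the longer length is not a whole multiple of the shorter one, R still recycles the shorter vector but also issues a warning. The sketch below captures that warning so it can be inspected; the names res and msg are illustrative only, and the warning text shown assumes an English locale.

```r
# Lengths 3 and 2: the shorter vector is recycled as 10, 20, 10,
# and R warns that 3 is not a multiple of 2.
msg <- NULL   # illustrative name: will hold the warning text
res <- withCallingHandlers(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) {
    msg <<- conditionMessage(w)      # keep the warning text for inspection
    invokeRestart("muffleWarning")   # continue without printing the warning
  }
)
res
## [1] 11 22 13
msg
## [1] "longer object length is not a multiple of shorter object length"
```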

In many programming environments, the up arrow will cycle through previous commands.

2 Sequences of Numbers

The simplest way to create a sequence of numbers in R is by using the : operator.

1:20
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
pi:10 # sequence of real numbers
## [1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
10:pi # sequence of integers
## [1] 10  9  8  7  6  5  4
15:1 # counts backwards in steps of 1 because the first value is larger; rarely what we want, but good to know how it can happen
##  [1] 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1

If you have questions about a particular R function, you can access its documentation with a question mark followed by the function name: ?function_name_here. However, in the case of an operator like the colon used above, you must enclose the symbol in backticks like this: ?`:`.

?`:`
seq(1,20) # This gives us the same output as 1:20.
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

The seq() function also accepts a by argument, which tells R the increment to use. Here the sequence runs from 1 to 10 in steps of 0.5.

seq(1,10,by=.5)
##  [1]  1.0  1.5  2.0  2.5  3.0  3.5  4.0  4.5  5.0  5.5  6.0  6.5  7.0  7.5  8.0
## [16]  8.5  9.0  9.5 10.0

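If we don’t care about the increment and just want a sequence of 30 evenly spaced numbers between 5 and 10, we can use the length argument (a partial match for length.out) instead.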

seq(5,10,length=30)
##  [1]  5.000000  5.172414  5.344828  5.517241  5.689655  5.862069  6.034483
##  [8]  6.206897  6.379310  6.551724  6.724138  6.896552  7.068966  7.241379
## [15]  7.413793  7.586207  7.758621  7.931034  8.103448  8.275862  8.448276
## [22]  8.620690  8.793103  8.965517  9.137931  9.310345  9.482759  9.655172
## [29]  9.827586 10.000000
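
When only the number of points is given, seq() works out the increment itself as (to - from) / (length.out - 1). The short check below is a sketch of that relationship; s and step are illustrative names.

```r
s <- seq(5, 10, length.out = 30)    # 30 equally spaced points from 5 to 10
step <- (10 - 5) / (30 - 1)         # spacing implied by 30 points (illustrative name)
step
## [1] 0.1724138
all.equal(diff(s), rep(step, 29))   # every gap between neighbours equals step
## [1] TRUE
```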

Let’s create a vector of length 30.

x<-1:30
seq(along.with=x) # generates the integer sequence 1, 2, ..., length(x)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

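R also has dedicated built-in functions for this: seq_along() for a sequence along a vector and seq_len() for a sequence of a given length.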

seq_along(x)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
seq_len(length(x))
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
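
One practical reason to prefer seq_along() and seq_len() over 1:length(x) is the empty-vector case, where the colon operator counts backwards from 1 to 0. A small sketch, with empty as an illustrative name:

```r
empty <- numeric(0)       # a vector with no elements (illustrative name)
1:length(empty)           # counts down from 1 to 0, usually not what we want
## [1] 1 0
seq_along(empty)          # correctly returns an empty sequence
## integer(0)
seq_len(length(empty))    # same result
## integer(0)
```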

One more function related to creating sequences of numbers is rep(), which stands for ‘replicate’.

rep(0,times=5)
## [1] 0 0 0 0 0
rep(c(1,2,3),times=3)
## [1] 1 2 3 1 2 3 1 2 3
rep(c(1,2,3),each=3) # to create a vector which contains 3 ones, then 3 twos, then 3 threes.
## [1] 1 1 1 2 2 2 3 3 3
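
The times and each arguments can also be combined, and times can be a vector giving a separate count for each element. The following is a brief sketch of both variations; r1 and r2 are illustrative names.

```r
r1 <- rep(c(1, 2), each = 2, times = 2)   # each is applied first, then the result is repeated
r1
## [1] 1 1 2 2 1 1 2 2
r2 <- rep(c('a', 'b'), times = c(3, 1))   # a vector of times gives a per-element count
r2
## [1] "a" "a" "a" "b"
```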

3 Vectors

The simplest and most common data structure in R is the vector. Vectors come in two different flavors: atomic vectors and lists. An atomic vector contains exactly one data type, whereas a list may contain multiple data types.
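
To illustrate the difference, the sketch below puts the same three values into an atomic vector and into a list. The names v and l are illustrative only.

```r
v <- c(1, 'a', TRUE)      # atomic vector: everything is coerced to character
v
## [1] "1"    "a"    "TRUE"
typeof(v)
## [1] "character"
l <- list(1, 'a', TRUE)   # list: each element keeps its own type
sapply(l, typeof)
## [1] "double"    "character" "logical"
```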

3.1 Logical vectors

Logical vectors can contain the values TRUE, FALSE, and NA.

x<-c(.5,.67,4,5,6)
y<-x>1
y # y is a logical vector. The statement x>1 is a condition and y tells us whether each corresponding element of our numeric vector x satisfies this condition.
## [1] FALSE FALSE  TRUE  TRUE  TRUE

If we have two logical expressions, A and B, we can ask whether at least one is TRUE with A | B (logical ‘or’ a.k.a. ‘union’) or whether they are both TRUE with A & B (logical ‘and’ a.k.a. ‘intersection’). Lastly, !A is the negation of A and is TRUE when A is FALSE and vice versa.

(3>5) & (1==1)
## [1] FALSE
(3>5) | (1==1)
## [1] TRUE
(3<5) & (1==1)
## [1] TRUE
(!3>5) & (1==1)
## [1] TRUE
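
The same operators work element-by-element on logical vectors, and any() and all() summarise the result. A minimal sketch, where in_range is an illustrative name:

```r
x <- c(.5, .67, 4, 5, 6)
in_range <- x > 1 & x < 5    # element-wise 'and' of two conditions
in_range
## [1] FALSE FALSE  TRUE FALSE FALSE
any(in_range)                # is at least one element TRUE?
## [1] TRUE
all(in_range)                # are all elements TRUE?
## [1] FALSE
```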

3.2 Character vectors

a<-c('I','am','Joy')
a
## [1] "I"   "am"  "Joy"
length(a) # number of elements
## [1] 3

Let’s say we want to join the elements of a together into one continuous character string (i.e. a character vector of length 1). We can do this using the paste() function.

paste(a,collapse=' ') # collapse: an optional character string to separate the results
## [1] "I am Joy"

In this example, we used the paste() function to collapse the elements of a single character vector. paste() can also be used to join the elements of multiple character vectors.

paste('Happy','coding!',sep=' ')
## [1] "Happy coding!"
paste(1:3,c('X','Y','Z'),sep='') # sep = "" to leave no space between the joined elements.
## [1] "1X" "2Y" "3Z"
x<-LETTERS
paste(x,1:4,sep='-')
##  [1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4"
## [13] "M-1" "N-2" "O-3" "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4"
## [25] "Y-1" "Z-2"
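
Two related variations are worth knowing: paste0() is a shortcut for paste() with sep = '', and sep and collapse can be used in the same call. A short sketch, with p1 and p2 as illustrative names:

```r
p1 <- paste0('x', 1:3)                                         # same as paste('x', 1:3, sep = '')
p1
## [1] "x1" "x2" "x3"
p2 <- paste(1:3, c('X', 'Y', 'Z'), sep = '', collapse = '+')   # join element-wise, then collapse
p2
## [1] "1X+2Y+3Z"
```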

4 Missing values

In R, NA is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense). Any operation involving NA generally yields NA as the result.

x<-c(1,NA,2,3,NA)
x*100
## [1] 100  NA 200 300  NA
x+100
## [1] 101  NA 102 103  NA

The is.na() function tells us whether each element of a vector is NA.

y<-is.na(x)
y
## [1] FALSE  TRUE FALSE FALSE  TRUE

NA is not really a value, but just a placeholder for a quantity that is not available. For this reason, the comparison x == NA cannot be evaluated and returns a vector of all NAs; use is.na() to test for missing values instead.

x == NA
## [1] NA NA NA NA NA

Now that we have a vector, y, that has a TRUE for every NA and a FALSE for every numeric value, we can compute the total number of NAs in our data. R represents TRUE as the number 1 and FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs, which is the total number of NAs.

sum(y)
## [1] 2
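
The same idea gives the proportion of missing values, and many summary functions accept na.rm = TRUE to skip NAs. A minimal sketch using the same data:

```r
x <- c(1, NA, 2, 3, NA)
mean(is.na(x))          # proportion of missing values (2 out of 5)
## [1] 0.4
sum(x)                  # NA propagates through the sum
## [1] NA
sum(x, na.rm = TRUE)    # drop NAs first: 1 + 2 + 3
## [1] 6
```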

Let’s look at a second type of missing value – NaN, which stands for ‘not a number’.

x<-0/0
x
## [1] NaN
1/0 #  In R, Inf stands for infinity.
## [1] Inf

A NaN value is also treated as NA, but the converse isn’t true: is.na() returns TRUE for both NA and NaN, whereas is.nan() returns TRUE only for NaN.

m<-c(.5,.6,NA,0,NaN)
is.na(m)
## [1] FALSE FALSE  TRUE FALSE  TRUE
is.nan(m)
## [1] FALSE FALSE FALSE FALSE  TRUE

5 Subsetting Objects

5.1 Basics

There are a number of operators that can be used to extract subsets of R objects.

  1. [ always returns an object of the same class as the original and can be used to select more than one element (with one exception: subsetting a matrix drops dimensions by default, as shown in 5.3).

  2. [[ is used to extract elements of a list or a data frame; it extracts only a single element, and the returned object is not necessarily a list or a data frame.

  3. $ is used to extract elements of a list or a data frame by name.

x<-c('a','b','c','d')
x[1] # numeric index
## [1] "a"
x[1:3] # numeric index
## [1] "a" "b" "c"
x[x>'b'] # logical index
## [1] "c" "d"
y<-c(10,12,7,8,14)
y[y>10] # logical index
## [1] 12 14
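
Two further index forms fit the same pattern: which() turns a logical condition into numeric positions, and a negative numeric index drops elements instead of selecting them. A short sketch:

```r
y <- c(10, 12, 7, 8, 14)
which(y > 10)       # positions of the elements that satisfy the condition
## [1] 2 5
y[which(y > 10)]    # same result as y[y > 10]
## [1] 12 14
y[-1]               # a negative index drops that position
## [1] 12  7  8 14
```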

5.2 Subsetting Lists

For lists, [, [[ and $ can all be used. Let’s create a list.

a<-list(foo=1:4,bar=.5)
a[1] # always returns a list since 'a' is a list
## $foo
## [1] 1 2 3 4
a[[1]] 
## [1] 1 2 3 4
a$foo
## [1] 1 2 3 4
a['foo'] # always returns a list since 'a' is a list
## $foo
## [1] 1 2 3 4
a[['foo']]
## [1] 1 2 3 4

To extract multiple elements, we can only use [.

my_list<-list(a=c(2,3,4),b=.5,c='R')
my_list[c(2,3)]
## $b
## [1] 0.5
## 
## $c
## [1] "R"
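name<-'b'
my_list[[name]] # [[ accepts a computed index, so the value of name ('b') is looked up
## [1] 0.5
my_list$name # $ looks for an element literally called 'name', which doesn't exist
## NULL
my_list[[c(1,3)]] # to extract nested elements: the 3rd element of the 1st element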
## [1] 4
my_list[[1]][[3]]
## [1] 4
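
Checking the class of each result makes the difference between [ and [[ concrete. A minimal sketch using the same list:

```r
my_list <- list(a = c(2, 3, 4), b = .5, c = 'R')
class(my_list['a'])             # single bracket keeps the list wrapper
## [1] "list"
class(my_list[['a']])           # double bracket returns the element itself
## [1] "numeric"
length(my_list[c('a', 'b')])    # single bracket can select several elements
## [1] 2
```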

5.3 Subsetting Matrices

x<-matrix(1:6,nrow=2,ncol=3,byrow=TRUE)
x
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
x[1,2] # to extract element of 1st row and 2nd column
## [1] 2
x[1,] # to extract 1st row
## [1] 1 2 3
x[,2] # to extract 2nd column
## [1] 2 5
x[1,2,drop=FALSE] # to get the output as a matrix
##      [,1]
## [1,]    2
x[1,,drop=FALSE] # to get the output as a matrix
##      [,1] [,2] [,3]
## [1,]    1    2    3
x[,2,drop=FALSE] # to get the output as a matrix
##      [,1]
## [1,]    2
## [2,]    5
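
The effect of drop can be confirmed by looking at the dim attribute of each result. A short sketch with the same matrix:

```r
x <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
dim(x[1, ])                  # dimensions dropped: the result is a plain vector
## NULL
dim(x[1, , drop = FALSE])    # dimensions kept: a 1 x 3 matrix
## [1] 1 3
```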

5.4 Partial Matching

Partial matching of names is allowed with $ and, when exact=FALSE is given, with [[. It is useful when the object you’re working with has very long element names.

x<-list(aaarkk=.5,b=c('a','b'))
x$a
## [1] 0.5
x[['a']]
## NULL
x[['a',exact=FALSE]]
## [1] 0.5
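
Partial matching only succeeds when the prefix identifies exactly one element. The sketch below uses a hypothetical list z with two similar names to show what happens when the prefix is ambiguous.

```r
z <- list(aaarkk = .5, aab = 1)   # hypothetical list with two names starting with 'aa'
z$aaa                             # unique prefix, so partial matching succeeds
## [1] 0.5
z$aa                              # matches both names, so the result is NULL
## NULL
z[['aa', exact = FALSE]]          # same ambiguity with [[
## NULL
```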

5.5 Removing NA Values

x<-c(1,NA,2,NA)
y<-c('a',NA,'b',NA)
good<-complete.cases(x,y) # Return a logical vector indicating which cases have no missing values.
good
## [1]  TRUE FALSE  TRUE FALSE
x[good]
## [1] 1 2
y[good]
## [1] "a" "b"
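
complete.cases() also accepts a data frame (covered in section 7), in which case a row counts as complete only if none of its columns is NA. A minimal sketch with a hypothetical data frame d:

```r
d <- data.frame(x = c(1, NA, 3), y = c('a', 'b', NA))   # hypothetical example data
complete.cases(d)          # a row is complete only if no column is NA
## [1]  TRUE FALSE FALSE
d[complete.cases(d), ]     # keep the complete rows only
##   x y
## 1 1 a
nrow(na.omit(d))           # na.omit() drops incomplete rows in one step
## [1] 1
```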

6 Matrices

Matrices are vectors with a dimension attribute. Let’s create a matrix using the matrix() function.

x<-matrix(1:20,nrow=4,ncol=5,byrow=TRUE)
x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    2    3    4    5
## [2,]    6    7    8    9   10
## [3,]   11   12   13   14   15
## [4,]   16   17   18   19   20
attributes(x)
## $dim
## [1] 4 5

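Matrices can also be created directly from a vector by setting its dim attribute, or by binding vectors together column-wise with cbind() or row-wise with rbind().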

y<-c(1,2,3,4,5,6)
dim(y)<-c(2,3)
y
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
m<-c('a','b','c')
n<-c('d','e','f')
cbind(m,n)
##      m   n  
## [1,] "a" "d"
## [2,] "b" "e"
## [3,] "c" "f"
rbind(m,n)
##   [,1] [,2] [,3]
## m "a"  "b"  "c" 
## n "d"  "e"  "f"
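
Note that both matrix() and dim<- fill a matrix column by column unless byrow=TRUE is given. The sketch below compares the two fill orders; by_col and by_row are illustrative names.

```r
by_col <- matrix(1:6, nrow = 2)                 # default: fill column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill row by row instead
by_col[1, ]
## [1] 1 3 5
by_row[1, ]
## [1] 1 2 3
all(by_row == t(matrix(1:6, nrow = 3)))         # byrow = TRUE equals the transpose of the column-filled 3 x 2 matrix
## [1] TRUE
```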

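Matrices can only contain ONE class of data. When vectors of different classes are combined, R coerces them to a common class (here, character).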

vector1<-1:6
vector2<-c('a','b','c','d','e','f')
cbind(vector1,vector2)
##      vector1 vector2
## [1,] "1"     "a"    
## [2,] "2"     "b"    
## [3,] "3"     "c"    
## [4,] "4"     "d"    
## [5,] "5"     "e"    
## [6,] "6"     "f"
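
Because the numbers are now stored as character strings, they have to be converted back before any arithmetic. A brief sketch, where m is an illustrative name for the combined matrix:

```r
vector1 <- 1:6
vector2 <- c('a', 'b', 'c', 'd', 'e', 'f')
m <- cbind(vector1, vector2)       # illustrative name for the combined matrix
typeof(m)                          # the integers were coerced to character
## [1] "character"
as.numeric(m[, 'vector1']) * 2     # convert back before doing arithmetic
## [1]  2  4  6  8 10 12
```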

7 Data Frames

Data frames can contain data of different classes.

names<-c('Bill', 'Gina', 'Kelly', 'Sean')
my_data<-matrix(1:20,nrow=4,ncol=5)
df<-data.frame(names,my_data)
df
##   names X1 X2 X3 X4 X5
## 1  Bill  1  5  9 13 17
## 2  Gina  2  6 10 14 18
## 3 Kelly  3  7 11 15 19
## 4  Sean  4  8 12 16 20
# use the colnames() function to set the `colnames` attribute for our data frame.
colnames(df)<-c('patient','age','weight','bp','rating','test')
df
##   patient age weight bp rating test
## 1    Bill   1      5  9     13   17
## 2    Gina   2      6 10     14   18
## 3   Kelly   3      7 11     15   19
## 4    Sean   4      8 12     16   20
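
Unlike a matrix, each column of a data frame keeps its own class, and columns can be accessed with $ just like list elements. The sketch below builds a smaller, hypothetical data frame pd to show this; stringsAsFactors=FALSE is set explicitly so the result does not depend on the R version.

```r
pd <- data.frame(patient = c('Bill', 'Gina', 'Kelly', 'Sean'),
                 age = 1:4,
                 stringsAsFactors = FALSE)   # hypothetical example data
class(pd$patient)          # character column
## [1] "character"
class(pd$age)              # integer column in the same object
## [1] "integer"
pd$patient[pd$age > 2]     # use one column to filter another
## [1] "Kelly" "Sean"
```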

Use str() to compactly display the internal structure of an R object.

str(list(a=.5,b=c('X','Y','Z')))
## List of 2
##  $ a: num 0.5
##  $ b: chr [1:3] "X" "Y" "Z"
str(list(a=.5,b=c(1,2,3)))
## List of 2
##  $ a: num 0.5
##  $ b: num [1:3] 1 2 3