R can be used as an interactive calculator.
5+7
## [1] 12
5-7
## [1] -2
5*3
## [1] 15
5^2
## [1] 25
Any object that contains data is called a data structure and numeric vectors are the simplest type of data structure in R. The easiest way to create a vector is with the c() function.
c(5,6,7)
## [1] 5 6 7
You can combine vectors to make a new vector.
x<-c(4,5)
c(1,2,3,x)
## [1] 1 2 3 4 5
Numeric vectors can be used in arithmetic expressions.
x*2+100
## [1] 108 110
Common arithmetic operators are +, -, *, /, and ^ (where x^2 means ‘x squared’). To take the square root, use the sqrt() function; to take the absolute value, use the abs() function.
sqrt(144)
## [1] 12
abs(4-5)
## [1] 1
When given two vectors of the same length, R simply performs the specified arithmetic operation element-by-element. If the vectors are of different lengths, R ‘recycles’ the shorter vector until it is the same length as the longer vector.
c(1,2,3)+c(0,1,2)
## [1] 1 3 5
c(1,2,3,4)+c(100,101)
## [1] 101 103 103 105
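The recycling rule can be sketched with a second pair of vectors. The names long_vec, short_vec, recycled, and partial are hypothetical and chosen only for this illustration. When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is suppressed here to keep the output clean.

```r
# Recycling sketch: the shorter vector is reused from its start.
long_vec  <- c(1, 2, 3, 4)          # hypothetical example values
short_vec <- c(0, 10)
recycled  <- long_vec + short_vec   # same as c(1,2,3,4) + c(0,10,0,10)
print(recycled)

# Lengths 3 and 2: R recycles partially and would normally warn.
partial <- suppressWarnings(c(1, 2, 3) + c(0, 10))
print(partial)
```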
In many programming environments, the up arrow will cycle through previous commands.
The simplest way to create a sequence of numbers in R is by using the : operator.
1:20
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
pi:10 # sequence of real numbers
## [1] 3.141593 4.141593 5.141593 6.141593 7.141593 8.141593 9.141593
10:pi # sequence of integers
## [1] 10 9 8 7 6 5 4
15:1 # counts backwards in increments of 1; rarely what we want, but good to know it can happen
## [1] 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
If you have questions about a particular R function, you can access its documentation with a question mark followed by the function name: ?function_name_here. For an operator such as the colon used above, you must enclose the symbol in backticks, like this: ?`:`.
?`:`
For more control over a sequence than the : operator gives, use the seq() function.
seq(1,20) # This gives us the same output as 1:20.
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
seq() also accepts an extra argument, by, that tells R how much to increment the sequence at each step, here 0.5.
seq(1,10,by=.5)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
## [16] 8.5 9.0 9.5 10.0
If we don’t care about the increment and just want a sequence of 30 evenly spaced numbers between 5 and 10, we can specify the length instead.
seq(5,10,length=30)
## [1] 5.000000 5.172414 5.344828 5.517241 5.689655 5.862069 6.034483
## [8] 6.206897 6.379310 6.551724 6.724138 6.896552 7.068966 7.241379
## [15] 7.413793 7.586207 7.758621 7.931034 8.103448 8.275862 8.448276
## [22] 8.620690 8.793103 8.965517 9.137931 9.310345 9.482759 9.655172
## [29] 9.827586 10.000000
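As a rough check of how the spacing comes about, the sketch below uses the full argument name length.out (the call above relies on R's partial argument matching of length). The names grid_vals and step are hypothetical. With 30 points there are 29 gaps, so each step should be (10 - 5) / 29.

```r
# Sketch: length.out fixes the number of points; the step follows from it.
grid_vals <- seq(5, 10, length.out = 30)
step      <- diff(grid_vals)            # differences between neighbours
print(length(grid_vals))
print(all.equal(step, rep(5 / 29, 29))) # every gap equals (10 - 5) / 29
```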
Let’s create a vector of length 30.
x<-1:30
seq(along.with=x) # generates the integer sequence 1, 2, ..., length(along.with)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
R also has separate built-in functions for this purpose: seq_along() and seq_len().
seq_along(x)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
seq_len(length(x))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
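One situation where these functions behave differently from the : operator is an empty vector. The sketch below uses hypothetical names (empty, naive, safe) to show that 1:length(x) counts backwards from 1 to 0, while seq_along() returns an empty sequence.

```r
# Sketch: indexing an empty vector.
empty <- numeric(0)
naive <- 1:length(empty)   # 1:0, i.e. the two values 1 and 0
safe  <- seq_along(empty)  # an empty integer vector
print(naive)
print(length(safe))
```

This matters mostly in loops, where iterating over 1:length(x) would run twice on an empty input.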
One more function related to creating sequences of numbers is rep(), which stands for ‘replicate’.
rep(0,times=5)
## [1] 0 0 0 0 0
rep(c(1,2,3),times=3)
## [1] 1 2 3 1 2 3 1 2 3
rep(c(1,2,3),each=3) # to create a vector which contains 3 ones, then 3 twos, then 3 threes.
## [1] 1 1 1 2 2 2 3 3 3
The simplest and most common data structure in R is the vector. Vectors come in two different flavors: atomic vectors and lists. An atomic vector contains exactly one data type, whereas a list may contain multiple data types.
Logical vectors can contain the values TRUE, FALSE, and NA.
x<-c(.5,.67,4,5,6)
y<-x>1
y # y is a logical vector. The statement x>1 is a condition and y tells us whether each corresponding element of our numeric vector x satisfies this condition.
## [1] FALSE FALSE TRUE TRUE TRUE
If we have two logical expressions, A and B, we can ask whether at least one is TRUE with A | B (logical ‘or’ a.k.a. ‘union’) or whether they are both TRUE with A & B (logical ‘and’ a.k.a. ‘intersection’). Lastly, !A is the negation of A and is TRUE when A is FALSE and vice versa.
(3>5) & (1==1)
## [1] FALSE
(3>5) | (1==1)
## [1] TRUE
(3<5) & (1==1)
## [1] TRUE
(!3>5) & (1==1)
## [1] TRUE
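The same operators work element-by-element on logical vectors. The sketch below reuses the numeric values from the earlier example under the hypothetical name v and combines two conditions; any() and all() then summarize the result.

```r
# Sketch: combining two conditions element-wise.
v        <- c(.5, .67, 4, 5, 6)
in_range <- v > 1 & v < 6    # TRUE only where both conditions hold
print(in_range)
print(any(in_range))         # is at least one element TRUE?
print(all(in_range))         # are all elements TRUE?
```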
a<-c('I','am','Joy')
a
## [1] "I" "am" "Joy"
length(a) # check the number of elements
## [1] 3
Let’s say we want to join the elements of a into one continuous character string (i.e. a character vector of length 1). We can do this using the paste() function.
paste(a,collapse=' ') # collapse: an optional character string to separate the results
## [1] "I am Joy"
In this example, we used the paste() function to collapse the elements of a single character vector. paste() can also be used to join the elements of multiple character vectors.
paste('Happy','coding!',sep=' ')
## [1] "Happy coding!"
paste(1:3,c('X','Y','Z'),sep='') # sep = "" to leave no space between the joined elements.
## [1] "1X" "2Y" "3Z"
x<-LETTERS
paste(x,1:4,sep='-')
## [1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4"
## [13] "M-1" "N-2" "O-3" "P-4" "Q-1" "R-2" "S-3" "T-4" "U-1" "V-2" "W-3" "X-4"
## [25] "Y-1" "Z-2"
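The sep and collapse arguments can also be combined in one call: sep joins the vectors element by element, and collapse then joins the resulting pieces into a single string. A minimal sketch, with the hypothetical name joined:

```r
# Sketch: sep works across vectors, collapse works along the result.
joined <- paste(1:3, c('X', 'Y', 'Z'), sep = '', collapse = '-')
print(joined)
print(length(joined))   # a character vector of length 1
```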
In R, NA is used to represent any value that is ‘not available’ or ‘missing’ (in the statistical sense). Any operation involving NA generally yields NA as the result.
x<-c(1,NA,2,3,NA)
x*100
## [1] 100 NA 200 300 NA
x+100
## [1] 101 NA 102 103 NA
The is.na() function tells us whether each element of a vector is NA.
y<-is.na(x)
y
## [1] FALSE TRUE FALSE FALSE TRUE
NA is not really a value, but just a placeholder for a quantity that is not available. For this reason, the comparison x == NA yields a vector of all NAs rather than TRUEs and FALSEs.
x == NA
## [1] NA NA NA NA NA
Now that we have a vector, y, that has a TRUE for every NA and a FALSE for every numeric value, we can compute the total number of NAs in our data. R represents TRUE as the number 1 and FALSE as the number 0. Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs, which is the total number of NAs.
sum(y)
## [1] 2
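Because TRUE and FALSE behave as 1 and 0, the same idea gives the proportion of missing values via mean(). The sketch below also shows the na.rm argument of sum(), which drops NAs before summing. The names vals, total_raw, total_clean, and share_na are hypothetical.

```r
# Sketch: counting and skipping missing values.
vals        <- c(1, NA, 2, 3, NA)
total_raw   <- sum(vals)                 # NA, because NA propagates
total_clean <- sum(vals, na.rm = TRUE)   # sum of the non-missing values
share_na    <- mean(is.na(vals))         # proportion of NAs
print(total_raw)
print(total_clean)
print(share_na)
```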
Let’s look at a second type of missing value – NaN, which stands for ‘not a number’.
x<-0/0
x
## [1] NaN
1/0 # In R, Inf stands for infinity.
## [1] Inf
A NaN value is also NA but the converse isn’t true.
m<-c(.5,.6,NA,0,NaN)
is.na(m)
## [1] FALSE FALSE TRUE FALSE TRUE
is.nan(m)
## [1] FALSE FALSE FALSE FALSE TRUE
There are a number of operators that can be used to extract subsets of R objects.
[ always returns an object of the same class as the original and can be used to select more than one element (there is one exception).
[[ is used to extract elements of a list or a data frame. It can only extract a single element, and the returned object is not necessarily a list or a data frame.
$ is used to extract elements of a list or a data frame by name.
x<-c('a','b','c','d')
x[1] # numeric index
## [1] "a"
x[1:3] # numeric index
## [1] "a" "b" "c"
x[x>'b'] # logical index
## [1] "c" "d"
y<-c(10,12,7,8,14)
y[y>10] # logical index
## [1] 12 14
For lists, [, [[, and $ can all be used. Let’s create a list.
a<-list(foo=1:4,bar=.5)
a[1] # always returns a list since 'a' is a list
## $foo
## [1] 1 2 3 4
a[[1]]
## [1] 1 2 3 4
a$foo
## [1] 1 2 3 4
a['foo'] # always returns a list since 'a' is a list
## $foo
## [1] 1 2 3 4
a[['foo']]
## [1] 1 2 3 4
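The difference between [ and [[ on a list can be checked with class(). This is a minimal sketch with hypothetical names (lst, sub_list, element): the single-bracket result is still a list, while the double-bracket result is the stored vector itself and can be used directly in arithmetic.

```r
# Sketch: [ keeps the list wrapper, [[ returns the element itself.
lst      <- list(foo = 1:4, bar = 0.5)
sub_list <- lst[1]     # a list of length 1
element  <- lst[[1]]   # the integer vector 1:4
print(class(sub_list))
print(class(element))
print(sum(element))    # works on the vector, not on the list
```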
To extract multiple elements, we can only use [.
my_list<-list(a=c(2,3,4),b=.5,c='R')
my_list[c(2,3)]
## $b
## [1] 0.5
##
## $c
## [1] "R"
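name<-'b'
my_list[[name]] # [[ accepts a computed name stored in a variable
## [1] 0.5
my_list$name # $ looks for an element literally called 'name'
## NULL
my_list[[c(1,3)]] # to extract nested elements
## [1] 4
my_list[[1]][[3]]
## [1] 4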
x<-matrix(1:6,nrow=2,ncol=3,byrow=TRUE)
x
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
x[1,2] # to extract element of 1st row and 2nd column
## [1] 2
x[1,] # to extract 1st row
## [1] 1 2 3
x[,2] # to extract 2nd column
## [1] 2 5
x[1,2,drop=FALSE] # to get the output as a matrix
## [,1]
## [1,] 2
x[1,,drop=FALSE] # to get the output as a matrix
## [,1] [,2] [,3]
## [1,] 1 2 3
x[,2,drop=FALSE] # to get the output as a matrix
## [,1]
## [1,] 2
## [2,] 5
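The effect of drop can be made explicit by inspecting the dim attribute of each result. In this sketch the matrix is rebuilt under the hypothetical name mat; without drop=FALSE the row comes back as a plain vector with no dimensions, while drop=FALSE keeps a 1 x 3 matrix.

```r
# Sketch: drop = FALSE preserves the matrix dimensions.
mat      <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
row_vec  <- mat[1, ]                 # dimensions dropped
row_mat  <- mat[1, , drop = FALSE]   # still a matrix
print(is.null(dim(row_vec)))
print(dim(row_mat))
```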
Partial matching of names is allowed with $ and, when exact=FALSE is set, with [[. It is useful when the object you’re working with has very long element names.
x<-list(aaarkk=.5,b=c('a','b'))
x$a
## [1] 0.5
x[['a']]
## NULL
x[['a',exact=FALSE]]
## [1] 0.5
A common task is to remove missing values from several objects at once.
x<-c(1,NA,2,NA)
y<-c('a',NA,'b',NA)
good<-complete.cases(x,y) # Return a logical vector indicating which cases have no missing values.
good
## [1] TRUE FALSE TRUE FALSE
x[good]
## [1] 1 2
y[good]
## [1] "a" "b"
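complete.cases() also accepts a data frame, in which case a row counts as complete only if none of its columns is NA. The data frame below, with its column names score and label, is a hypothetical example made up for this sketch.

```r
# Sketch: keeping only the complete rows of a data frame.
records <- data.frame(score = c(1, NA, 2, NA),
                      label = c('a', NA, 'b', 'c'))
keep    <- complete.cases(records)   # row 4 is dropped: score is NA
cleaned <- records[keep, ]
print(keep)
print(cleaned)
```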
Matrices are vectors with a dimension attribute. Let’s create a matrix using the matrix() function.
x<-matrix(1:20,nrow=4,ncol=5,byrow=TRUE)
x
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10
## [3,] 11 12 13 14 15
## [4,] 16 17 18 19 20
attributes(x)
## $dim
## [1] 4 5
Matrices can also be created directly from a vector.
y<-c(1,2,3,4,5,6)
dim(y)<-c(2,3)
y
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
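The output above shows that R fills a matrix column by column. As a small sketch with hypothetical names (v, by_col, by_row), the same six values are arranged both ways so the first rows can be compared.

```r
# Sketch: column-wise (default) versus row-wise filling.
v      <- 1:6
by_col <- matrix(v, nrow = 2)                 # default: fill down the columns
by_row <- matrix(v, nrow = 2, byrow = TRUE)   # fill along the rows
print(by_col[1, ])
print(by_row[1, ])
```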
Matrices can also be created by column-binding or row-binding vectors with cbind() and rbind().
m<-c('a','b','c')
n<-c('d','e','f')
cbind(m,n)
## m n
## [1,] "a" "d"
## [2,] "b" "e"
## [3,] "c" "f"
rbind(m,n)
## [,1] [,2] [,3]
## m "a" "b" "c"
## n "d" "e" "f"
Matrices can only contain ONE class of data.
vector1<-1:6
vector2<-c('a','b','c','d','e','f')
cbind(vector1,vector2)
## vector1 vector2
## [1,] "1" "a"
## [2,] "2" "b"
## [3,] "3" "c"
## [4,] "4" "d"
## [5,] "5" "e"
## [6,] "6" "f"
Data frames can contain data of different classes.
names<-c('Bill', 'Gina', 'Kelly', 'Sean')
my_data<-matrix(1:20,nrow=4,ncol=5)
df<-data.frame(names,my_data)
df
## names X1 X2 X3 X4 X5
## 1 Bill 1 5 9 13 17
## 2 Gina 2 6 10 14 18
## 3 Kelly 3 7 11 15 19
## 4 Sean 4 8 12 16 20
# use the colnames() function to set the `colnames` attribute for our data frame.
colnames(df)<-c('patient','age','weight','bp','rating','test')
df
## patient age weight bp rating test
## 1 Bill 1 5 9 13 17
## 2 Gina 2 6 10 14 18
## 3 Kelly 3 7 11 15 19
## 4 Sean 4 8 12 16 20
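To confirm that a data frame keeps a separate class per column, sapply() can apply class() to each column. The small data frame patients below is a hypothetical stand-in for df; stringsAsFactors=FALSE is set explicitly because the default for character columns differs between R versions.

```r
# Sketch: each column of a data frame has its own class.
patients <- data.frame(patient = c('Bill', 'Gina'),
                       age     = c(1L, 2L),
                       stringsAsFactors = FALSE)
col_classes <- sapply(patients, class)   # named character vector
print(col_classes)
```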
Use str() to compactly display the internal structure of an R object.
str(list(a=.5,b=c('X','Y','Z')))
## List of 2
## $ a: num 0.5
## $ b: chr [1:3] "X" "Y" "Z"
str(list(a=.5,b=c(1,2,3)))
## List of 2
## $ a: num 0.5
## $ b: num [1:3] 1 2 3