In this first script, you will be introduced to the main functions of R.
Please read the following code instructions carefully, try to understand the code that follows each instruction, execute it and see what happens. Do not hesitate to insert your own code and execute it as well!
The simplest thing that R can do for you is to serve as a pocket calculator. R can perform all basic arithmetic operations:
1+2+3
## [1] 6
2*3
## [1] 6
3/4
## [1] 0.75
4-5
## [1] -1
2^4
## [1] 16
(2.5+7.5)*3
## [1] 30
pi # the variable pi is already defined in R by default
## [1] 3.141593
Whenever a name is followed by round brackets ( ), this name is a function: some sort of query, operation or algorithm is applied to the number or object inside the brackets. The simplest examples are mathematical functions:
log(7)
## [1] 1.94591
exp(8)
## [1] 2980.958
sin(9)
## [1] 0.4121185
cos(10)
## [1] -0.8390715
tan(12)
## [1] -0.6358599
sqrt(16) # square root
## [1] 4
round(pi, 2) # round pi to two decimal places
## [1] 3.14
12%%5 # remainder of the Euclidean division (modulo)
## [1] 2
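Many functions take more than one argument, and arguments can be passed by name as well as by position. The following sketch uses only base R functions; the numbers and the object names (rounded, log2_of_8, quotient) are arbitrary and chosen for illustration:

```r
# Arguments can be matched by position or by name
rounded   <- round(3.14159, digits = 3)  # same as round(3.14159, 3)
log2_of_8 <- log(8, base = 2)            # log() uses base e unless told otherwise
quotient  <- 17 %/% 5                    # integer division, the counterpart of %%
rounded
log2_of_8
quotient
```

Naming the arguments makes a call easier to read once a function has several of them.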
In R, the most important data types are logical (Boolean), numeric and character values.
A Boolean value is usually returned when you compare two variables. The value can then either be TRUE or FALSE.
1==1
## [1] TRUE
1==2
## [1] FALSE
1!=1
## [1] FALSE
1!=2
## [1] TRUE
Numeric values are essentially numbers. A number that can be written without a fractional component, i.e. without a decimal separator, is an “integer”. A number with a fractional component is usually referred to as a “float”, but in R it is just called “numeric”. By default, R also treats whole numbers as numeric unless you explicitly define them as integers. You can ask for the data type of an object by using the class function:
class(1)
## [1] "numeric"
class(as.integer(1))
## [1] "integer"
class(1.5)
## [1] "numeric"
class("1")
## [1] "character"
class(TRUE)
## [1] "logical"
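Besides as.integer, an integer can be written directly with the L suffix. A short sketch of how integer and numeric values relate; x_int and x_num are hypothetical names used only here:

```r
x_int <- 5L           # the L suffix creates an integer
x_num <- 5            # without the suffix, R stores a numeric value
class(x_int)          # "integer"
class(x_num)          # "numeric"
x_int == x_num        # TRUE: the values are equal although the classes differ
class(x_int + 0.5)    # "numeric": mixing integer and numeric yields numeric
```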
Character values are essentially text, i.e. everything that is written in quotation marks " " in your code. You cannot run mathematical operations on text, but you can modify text with functions designed for character values:
class("Hello")
## [1] "character"
# "Hello" + "World" # that does not work!
paste("Hello", "World", sep=" // ") # paste two strings into one, separated by " // "
## [1] "Hello // World"
nchar("Hello") # count the number of characters
## [1] 5
substr("World", 2, 4) # Only show letters 2 to 4
## [1] "orl"
Objects of a certain type can be converted to another type:
as.numeric("1")+2
## [1] 3
class(as.numeric("1"))
## [1] "numeric"
as.character(100)
## [1] "100"
class(as.character(100))
## [1] "character"
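Not every conversion can succeed. The sketch below shows what presumably matters most in practice: text that does not look like a number becomes NA (a missing value). The object names are made up for this example, and suppressWarnings is used only to keep the output quiet:

```r
ok        <- as.numeric("3.5")                      # 3.5
bad       <- suppressWarnings(as.numeric("three"))  # NA; R would normally print a warning
truncated <- as.integer(3.9)                        # 3: the fractional part is dropped, not rounded
from_lgl  <- as.numeric(TRUE)                       # 1 (FALSE would give 0)
is.na(bad)                                          # TRUE
```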
R works with named objects. The most important types of objects are variables, vectors, matrices and data frames, arrays, and lists.
In general, you create a new object and assign it a value with the <- sign. You can choose almost any name for your objects, but you should avoid special characters (such as #, -, + etc.). An object name can be a single letter or a longer, more descriptive name, so there is a trade-off between having concise code and remembering the meaning of an object.
The simplest object you can have is just one value. By giving a name to this value, the value becomes an object (we say we assign a value to the object).
a <- 12 # assign a value to a
the_number_of_tomatoes_in_my_fridge <- 13
As you can see, just assigning a value to a variable does not produce any output. However, if we now call “a”, R tells us what the value of “a” is:
a # call a
## [1] 12
the_number_of_tomatoes_in_my_fridge # call the_number_of_tomatoes_in_my_fridge
## [1] 13
You can apply comparison and logical operators to variables, i.e. you can check whether a certain statement is TRUE or FALSE (see the explanation of Boolean/logical values above).
a==10 # equal to
## [1] FALSE
a==12
## [1] TRUE
a!=10 # not equal to
## [1] TRUE
a<8 # less than
## [1] FALSE
a<=8 # less than or equal to
## [1] FALSE
a>8 # greater than
## [1] TRUE
a>=10 # greater than or equal to
## [1] TRUE
is.na(a) # is the value missing (NA)?
## [1] FALSE
is.na(NA)
## [1] TRUE
!is.na(a) # is not missing
## [1] TRUE
is.null(a) # is null
## [1] FALSE
is.null(NULL)
## [1] TRUE
!is.null(a) # is not null
## [1] TRUE
(a==10 | a==12) # OR statement
## [1] TRUE
(a==10 & a==12) # AND statement
## [1] FALSE
You can also store text in a variable:
b <- "Hello"
b
## [1] "Hello"
b2 <- paste("Hello", "World", sep=" ") # paste two strings into one, separated by a space
b2
## [1] "Hello World"
gsub("Hello", "Good Morning", b2) # replace a certain expression by another one
## [1] "Good Morning World"
toupper(b2) # convert to uppercase
## [1] "HELLO WORLD"
tolower(b2) # convert to lowercase
## [1] "hello world"
nchar(b2) # number of characters
## [1] 11
And you can, of course, also store decimal numbers in a variable:
c <- 3.141593
c
## [1] 3.141593
round(c, 4) # round to four decimal places
## [1] 3.1416
print(c, digits=6) # show 6 significant digits
## [1] 3.14159
A vector is a one-dimensional object that consists of several elements or values in a fixed order. If you want to join several values into one vector, you have to use the c function (short for combine or concatenate). You can apply a mathematical operation to all elements of a vector at once.
d <- c(1, 2, 10, 15, 4)
d
## [1] 1 2 10 15 4
d-1
## [1] 0 1 9 14 3
Variables and vectors are the same kind of object in R. In fact, a variable is just the simplest form of a vector, i.e. a vector of length 1. If you ask for the class of a vector, R will just return the data type of the elements in that vector. This is because you cannot store elements of different data types in the same vector (we need a “list” to do that, see below).
class(c)
## [1] "numeric"
class(d)
## [1] "numeric"
If the values to be put into a vector represent a consecutive sequence, you can use the : sign:
e <- 1:5
e
## [1] 1 2 3 4 5
e+2
## [1] 3 4 5 6 7
If there is a fixed interval between the values, you can use the seq function:
f <- seq(1, 15, 2)
f
## [1] 1 3 5 7 9 11 13 15
f*f
## [1] 1 9 25 49 81 121 169 225
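The arguments of seq can also be named, which makes the call easier to read, and the function can produce a given number of evenly spaced values instead of using a fixed step. A small sketch with made-up object names:

```r
by_two   <- seq(from = 1, to = 15, by = 2)         # same result as seq(1, 15, 2)
five_pts <- seq(from = 0, to = 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00
downward <- seq(10, 1, by = -3)                    # 10 7 4 1: a negative step counts down
by_two
five_pts
downward
```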
You can repeat a certain arrangement of values:
g <- seq(1, 15, 2)
g
## [1] 1 3 5 7 9 11 13 15
g2 <- rep(1:3, each=2)
g2
## [1] 1 1 2 2 3 3
g3 <- rep(1:3, 2)
g3
## [1] 1 2 3 1 2 3
Let us get some basic information about the objects we have created:
a
## [1] 12
d
## [1] 1 2 10 15 4
length(a) # number of elements
## [1] 1
length(d)
## [1] 5
class(a) # object type
## [1] "numeric"
class(b)
## [1] "character"
class(c)
## [1] "numeric"
str(d) # structure
## num [1:5] 1 2 10 15 4
By the way, if you are not sure what a function does, you can always use the help function:
help(length) # get help
## starting httpd help server ... done
The elements of a vector will always be of the same type, i.e. you cannot mix numbers and text!
h <- c("Hello", 2, 3) # The numbers get automatically converted to text!
h
## [1] "Hello" "2" "3"
class(h)
## [1] "character"
str(h)
## chr [1:3] "Hello" "2" "3"
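The same automatic conversion happens with logical values: R picks the most general type that can hold all elements. A brief sketch, with object names chosen only for this example:

```r
mixed_lgl_num <- c(TRUE, FALSE, 3)     # logical + numeric -> numeric
mixed_lgl_num                          # 1 0 3
class(mixed_lgl_num)                   # "numeric"
mixed_num_chr <- c(1.5, "a")           # numeric + character -> character
class(mixed_num_chr)                   # "character"
n_true <- sum(c(TRUE, TRUE, FALSE))    # 2: TRUE counts as 1, FALSE as 0
```

Summing a logical vector is a convenient way to count how often a condition is TRUE.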
You can run some basic calculations on our numerical vectors that will each return one value:
min(d)
## [1] 1
max(e)
## [1] 5
mean(f)
## [1] 8
sum(f)
## [1] 64
median(e)
## [1] 3
Or you can run some non-arithmetic functions on them:
sort(d)
## [1] 1 2 4 10 15
rank(d)
## [1] 1 2 4 5 3
unique(c(d, e))
## [1] 1 2 10 15 4 3 5
You can run calculations that return one value for each vector element, i.e. the calculation is performed on each vector element separately:
d
## [1] 1 2 10 15 4
log(d)
## [1] 0.0000000 0.6931472 2.3025851 2.7080502 1.3862944
d+1
## [1] 2 3 11 16 5
d^2
## [1] 1 4 100 225 16
You can test if a vector contains a certain element:
10%in%d
## [1] TRUE
e%in%d
## [1] TRUE TRUE FALSE TRUE FALSE
And you can perform operations between a vector and a variable, or between vectors:
d+a
## [1] 13 14 22 27 16
d+e
## [1] 2 4 13 19 9
d^e
## [1] 1 4 1000 50625 1024
e/d
## [1] 1.0000000 1.0000000 0.3000000 0.2666667 1.2500000
e/f # This produces a warning because the length of the longer vector is not a multiple of the length of the shorter one; the shorter vector is reused from its start.
## Warning in e/f: longer object length is not a multiple of shorter object
## length
## [1] 1.00000000 0.66666667 0.60000000 0.57142857 0.55555556 0.09090909 0.15384615
## [8] 0.20000000
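This reuse of the shorter vector is called recycling. When the longer length is a multiple of the shorter one, R recycles silently. A minimal sketch; the vectors long and short are hypothetical:

```r
long  <- 1:6
short <- c(10, 20)
recycled <- long + short   # short is reused three times, no warning
recycled                   # 11 22 13 24 15 26
doubled <- long * 2        # a single value is a vector of length 1, reused six times
doubled                    # 2 4 6 8 10 12
```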
By the way, if you assign a new value to a variable or vector that already exists, it will get automatically overwritten!
a
## [1] 12
a <- 1000
a
## [1] 1000
And you can of course create a new vector from existing vectors or variables:
a
## [1] 1000
d
## [1] 1 2 10 15 4
i <- c(a, d, 3, 4)
i
## [1] 1000 1 2 10 15 4 3 4
j <- a+d
You can index the elements of a vector by using numbers in square brackets to indicate their positions.
d[3] # the third element of d
## [1] 10
d[-3] # all but the third element of d
## [1] 1 2 15 4
d[1:4] # elements 1 to 4
## [1] 1 2 10 15
d[c(1, 3, 5)] # elements 1, 3 and 5
## [1] 1 10 4
d[-(2:4)] # all elements except for elements 2 to 4
## [1] 1 4
d[d>5] # all elements greater than 5
## [1] 10 15
d[d%in%e] # logical vector: only index the elements for which an expression is TRUE
## [1] 1 2 4
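A condition such as d>5 is itself a logical vector, and it can be stored and inspected before it is used as an index. The sketch below recreates d so that it runs on its own; keep and in_range are names made up for this example:

```r
d <- c(1, 2, 10, 15, 4)
keep <- d > 5                   # FALSE FALSE TRUE TRUE FALSE
d[keep]                         # 10 15
which(keep)                     # 3 4: the positions where the condition is TRUE
in_range <- d[d > 1 & d < 15]   # 2 10 4: conditions can be combined with & and |
d[2] <- 99                      # an index also works on the left-hand side of an assignment
d                               # 1 99 10 15 4
```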
Both matrices and data frames are two-dimensional objects; you could also call them tables. The main difference between the two object types is that all elements of a matrix must be of the same data type, whereas each column of a data frame can have its own data type.
Let’s first create a data frame:
test_dataframe <- data.frame(x=1:10, y=seq(10, 100, 10)) # create a data frame and populate it with some numbers
test_dataframe # let's have a look at the data frame
| x | y |
|---|---|
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
| 5 | 50 |
| 6 | 60 |
| 7 | 70 |
| 8 | 80 |
| 9 | 90 |
| 10 | 100 |
row.names(test_dataframe) <- c("one","two","three","four","five",
"six","seven","eight","nine","ten") # define the row names of the data frame
test_dataframe # let's have a look again: the data frame has row names now!
| | x | y |
|---|---|---|
| one | 1 | 10 |
| two | 2 | 20 |
| three | 3 | 30 |
| four | 4 | 40 |
| five | 5 | 50 |
| six | 6 | 60 |
| seven | 7 | 70 |
| eight | 8 | 80 |
| nine | 9 | 90 |
| ten | 10 | 100 |
Let’s now create a matrix:
test_matrix1 <- matrix(c(c(1:10), seq(10, 100, 10)), nrow=10, ncol=2) # nrow and ncol define the number of rows and columns in a matrix
test_matrix1
## [,1] [,2]
## [1,] 1 10
## [2,] 2 20
## [3,] 3 30
## [4,] 4 40
## [5,] 5 50
## [6,] 6 60
## [7,] 7 70
## [8,] 8 80
## [9,] 9 90
## [10,] 10 100
test_matrix2 <- matrix(c(1:12), nrow=4, ncol=3)
test_matrix2
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
t(test_matrix2) # transpose matrix
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
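To illustrate how the two object types treat data types, the sketch below puts numbers and text side by side, once in a matrix and once in a data frame. The objects m and df and their contents are invented for this example:

```r
m  <- cbind(1:3, c("a", "b", "c"))   # matrix: the numbers are converted to text
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
str(m)    # a character matrix
str(df)   # id stays integer; name is character (a factor before R 4.0)
```

This is why tables that mix numbers and text are usually stored as data frames.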
Data frames and matrices basically consist of vectors: each column can be regarded as a vector, and each row can be regarded as a vector, too. That means that you can perform all kinds of vector operations on the columns and rows of matrices and data frames. Just as with the elements of a vector, you can index the rows and columns of a matrix or data frame with square brackets, but as our objects are no longer one-dimensional but two-dimensional, we have to indicate the dimension in the brackets by inserting a comma.
Values before the comma refer to rows, and values after the comma refer to columns. We will proceed with the data frame here, but you could just as well use one of the matrices instead!
test_dataframe[1, ]
| | x | y |
|---|---|---|
| one | 1 | 10 |
test_dataframe[1:3, ]
| | x | y |
|---|---|---|
| one | 1 | 10 |
| two | 2 | 20 |
| three | 3 | 30 |
test_dataframe[c(1, 3, 5), ]
| | x | y |
|---|---|---|
| one | 1 | 10 |
| three | 3 | 30 |
| five | 5 | 50 |
test_dataframe[c(1:3), c(2)]
## [1] 10 20 30
Since the data frame has column names, you can also index the columns by their names. This is done with the $ sign, followed by the column name:
test_dataframe$x # $ works for data frames (and lists), but not for matrices!
## [1] 1 2 3 4 5 6 7 8 9 10
test_dataframe[ ,1]
## [1] 1 2 3 4 5 6 7 8 9 10
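A matrix cannot be indexed with the $ sign, but if it has column names, these can be used inside square brackets. The sketch below assumes two small made-up objects, m and df, to compare the options:

```r
m <- matrix(1:6, nrow = 3, ncol = 2)
colnames(m) <- c("x", "y")        # give the matrix column names
m[, "y"]                          # 4 5 6
df <- data.frame(x = 1:3, y = 4:6)
df[, "y"]                         # the same syntax works for data frames
df[["y"]]                         # double square brackets with a name
df$y                              # all three return 4 5 6
```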
Vector operations on columns/rows:
test_dataframe[5,]+1
| | x | y |
|---|---|---|
| five | 6 | 51 |
mean(test_dataframe[,2])
## [1] 55
test_dataframe[,3] <- test_dataframe[,1] + test_dataframe[,2] # create a third column that is equal to the sum of the previous two columns
test_dataframe
| | x | y | V3 |
|---|---|---|---|
| one | 1 | 10 | 11 |
| two | 2 | 20 | 22 |
| three | 3 | 30 | 33 |
| four | 4 | 40 | 44 |
| five | 5 | 50 | 55 |
| six | 6 | 60 | 66 |
| seven | 7 | 70 | 77 |
| eight | 8 | 80 | 88 |
| nine | 9 | 90 | 99 |
| ten | 10 | 100 | 110 |
You can of course combine row and column indices to index specific elements:
test_dataframe[5,2]
## [1] 50
test_dataframe[5:8,3]
## [1] 55 66 77 88
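A column added by position gets an automatic name such as V3. A new column can instead be given a meaningful name right away with the $ sign. A minimal sketch on a small hypothetical data frame called scores:

```r
scores <- data.frame(x = 1:3, y = c(10, 20, 30))
scores$total <- scores$x + scores$y     # new column with an explicit name
names(scores)                           # "x" "y" "total"
names(scores)[3] <- "sum_xy"            # rename the third column
large <- scores[scores$sum_xy > 15, ]   # keep only rows where the condition is TRUE
large
```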
Let us ask for some properties of the data frame:
length(test_dataframe) # number of columns
## [1] 3
nrow(test_dataframe) # number of rows
## [1] 10
dim(test_dataframe) # dimensions, i.e. number of rows followed by number of columns
## [1] 10 3
str(test_dataframe) # structure
## 'data.frame': 10 obs. of 3 variables:
## $ x : int 1 2 3 4 5 6 7 8 9 10
## $ y : num 10 20 30 40 50 60 70 80 90 100
## $ V3: num 11 22 33 44 55 66 77 88 99 110
How do you delete parts of a data frame?
test_dataframe[,1]<-NULL # delete a column
test_dataframe
| | y | V3 |
|---|---|---|
| one | 10 | 11 |
| two | 20 | 22 |
| three | 30 | 33 |
| four | 40 | 44 |
| five | 50 | 55 |
| six | 60 | 66 |
| seven | 70 | 77 |
| eight | 80 | 88 |
| nine | 90 | 99 |
| ten | 100 | 110 |
rm(test_dataframe) # delete a whole object
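Rows and columns can also be left out with negative indices, just like vector elements. Note that this returns a new object and leaves the original untouched unless you assign the result back. A sketch with a hypothetical data frame tmp_df:

```r
tmp_df <- data.frame(x = 1:5, y = seq(10, 50, 10))
without_row2 <- tmp_df[-2, ]                 # drop the second row
without_y    <- tmp_df[, -2, drop = FALSE]   # drop the second column, stay a data frame
tmp_df$y <- NULL                             # delete a column by its name
names(tmp_df)                                # "x"
rm(tmp_df)                                   # remove the whole object
exists("tmp_df")                             # FALSE
```

The argument drop = FALSE prevents R from simplifying a single remaining column to a plain vector.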
An array can be regarded as a three-dimensional table or a set of equally-sized matrices. You can imagine it as a “stack” of matrices, i.e. several matrices lying on top of each other.
test_array <- array(c(1:24), dim=c(3,4,2))
test_array
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] 13 16 19 22
## [2,] 14 17 20 23
## [3,] 15 18 21 24
As we now have three dimensions, an additional comma is required to index an element in the array:
test_array[,,1] # this is the first matrix
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
test_array[,,2] # this is the second matrix
## [,1] [,2] [,3] [,4]
## [1,] 13 16 19 22
## [2,] 14 17 20 23
## [3,] 15 18 21 24
test_array[,1,] # these are the first columns of both matrices
## [,1] [,2]
## [1,] 1 13
## [2,] 2 14
## [3,] 3 15
test_array[1,,] # these are the first rows of both matrices - transposed to columns
## [,1] [,2]
## [1,] 1 13
## [2,] 4 16
## [3,] 7 19
## [4,] 10 22
test_array[,1,1] # this is the first column of the first matrix
## [1] 1 2 3
test_array[1,,1] # this is the first row of the first matrix
## [1] 1 4 7 10
test_array[3,4,2] # this is the fourth element in the third row of the second matrix
## [1] 24
Again, we can retrieve some properties of an array with the following commands:
sum(test_array) # sum of all elements of the array
## [1] 300
dim(test_array) # dimensions, i.e. number of rows, columns and matrices
## [1] 3 4 2
length(test_array) # number of elements in the array
## [1] 24
sum(test_array[1,1,]) # sum of the first elements of the first rows of both matrices
## [1] 14
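The “stack of matrices” picture can be checked directly: an array built from two matrices returns exactly those matrices when indexed along the third dimension. A sketch under the assumption of two small matrices m1 and m2 created just for this purpose:

```r
m1 <- matrix(1:12, nrow = 3, ncol = 4)
m2 <- matrix(13:24, nrow = 3, ncol = 4)
stacked <- array(c(m1, m2), dim = c(3, 4, 2))   # stack the two matrices
identical(stacked[, , 1], m1)                   # TRUE
identical(stacked[, , 2], m2)                   # TRUE
dim(stacked[, , 1])                             # 3 4: one layer is an ordinary matrix
```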
So far, the object types we have discussed can only contain elements of the same type or of the same dimensions: Vectors can only contain elements of the same type (either logical, numeric or character values), matrices and data frames can only contain vectors of the same size, and arrays can only contain matrices of the same size.
Let’s assume we have the following three objects x, y and z and we want to join them somehow into one object:
x <- c("Good Morning", "Good Evening", "Good Night")
y <- data.frame(number=1:7, day=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), weekend=c(rep("no", 5), rep("yes", 2)))
z <- array(c(c(1:10), c((1:10)^2)), dim=c(2,5,2))
x
## [1] "Good Morning" "Good Evening" "Good Night"
y
| number | day | weekend |
|---|---|---|
| 1 | Monday | no |
| 2 | Tuesday | no |
| 3 | Wednesday | no |
| 4 | Thursday | no |
| 5 | Friday | no |
| 6 | Saturday | yes |
| 7 | Sunday | yes |
z
## , , 1
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 9 25 49 81
## [2,] 4 16 36 64 100
You can simply join these elements in a list. Similarly to a vector, a list is a one-dimensional object, but instead of values it contains objects (that can be of different types) in a defined order.
test_list <- list(x, y, z)
test_list
## [[1]]
## [1] "Good Morning" "Good Evening" "Good Night"
##
## [[2]]
## number day weekend
## 1 1 Monday no
## 2 2 Tuesday no
## 3 3 Wednesday no
## 4 4 Thursday no
## 5 5 Friday no
## 6 6 Saturday yes
## 7 7 Sunday yes
##
## [[3]]
## , , 1
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 9 25 49 81
## [2,] 4 16 36 64 100
length(test_list) # number of objects in test_list
## [1] 3
You can of course create a list from scratch:
test_list2 <- list(name="Tom", wife="Maria", no.children=4, child.ages=c(4,7,9,9))
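We can index list elements with either single or double square brackets. Single brackets return a list that contains the element, whereas double brackets return the element itself: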
test_list[1]
## [[1]]
## [1] "Good Morning" "Good Evening" "Good Night"
test_list[[1]]
## [1] "Good Morning" "Good Evening" "Good Night"
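The difference between the two kinds of brackets becomes visible when asking for the class and length of the result. A minimal sketch with a small made-up list called lst:

```r
lst <- list(c("a", "b"), 1:3)
class(lst[1])      # "list": single brackets return a shorter list
class(lst[[1]])    # "character": double brackets return the element itself
length(lst[1])     # 1 (a list holding one element)
length(lst[[1]])   # 2 (the character vector has two values)
```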
If we want to index several elements of a list, we have to use single square brackets:
test_list[c(1,2)]
## [[1]]
## [1] "Good Morning" "Good Evening" "Good Night"
##
## [[2]]
## number day weekend
## 1 1 Monday no
## 2 2 Tuesday no
## 3 3 Wednesday no
## 4 4 Thursday no
## 5 5 Friday no
## 6 6 Saturday yes
## 7 7 Sunday yes
However, if we want to index an element of a list element, we have to use double square brackets followed by single square brackets:
test_list[[1]][2]
## [1] "Good Evening"
test_list[[2]][3,]
| | number | day | weekend |
|---|---|---|---|
| 3 | 3 | Wednesday | no |
test_list[[3]][2,3,1]
## [1] 6
Similarly to data frames, we can name the elements of a list, and also index list elements by their names.
names(test_list) <- c("greetings", "weekend", "squared values")
test_list
## $greetings
## [1] "Good Morning" "Good Evening" "Good Night"
##
## $weekend
## number day weekend
## 1 1 Monday no
## 2 2 Tuesday no
## 3 3 Wednesday no
## 4 4 Thursday no
## 5 5 Friday no
## 6 6 Saturday yes
## 7 7 Sunday yes
##
## $`squared values`
## , , 1
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 3 5 7 9
## [2,] 2 4 6 8 10
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 9 25 49 81
## [2,] 4 16 36 64 100
test_list$weekend
| number | day | weekend |
|---|---|---|
| 1 | Monday | no |
| 2 | Tuesday | no |
| 3 | Wednesday | no |
| 4 | Thursday | no |
| 5 | Friday | no |
| 6 | Saturday | yes |
| 7 | Sunday | yes |
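A list that is created with names, like test_list2 above, can be indexed by name straight away. The sketch recreates that list so it runs on its own; the object mean_age is introduced only for illustration:

```r
test_list2 <- list(name = "Tom", wife = "Maria", no.children = 4, child.ages = c(4, 7, 9, 9))
test_list2$name                           # "Tom"
test_list2[["child.ages"]]                # 4 7 9 9
test_list2$child.ages[2]                  # 7: second value of the child.ages vector
mean_age <- mean(test_list2$child.ages)   # 7.25
names(test_list2)                         # the names of all list elements
```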
You made it to the end! Take a break! =)