In this first script, you will be introduced to the main functions of R. The learning objectives of this script are:

Know basic arithmetic operations and some of the key concepts of R
Have an understanding of the basic data types
Have an understanding of the basic object types

Please read the following code instructions carefully, try to understand the code that follows each instruction, execute it and see what happens. Do not hesitate to insert your own code and execute it as well!

1. Basic arithmetic operations

The simplest thing R can do for you is to serve as a pocket calculator. R can perform all the basic arithmetic operations:

1+2+3
## [1] 6
2*3
## [1] 6
3/4
## [1] 0.75
4-5
## [1] -1
2^4
## [1] 16
(2.5+7.5)*3
## [1] 30
pi             # the variable pi is already defined in R by default
## [1] 3.141593
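R applies the usual mathematical order of operations, which is why the parentheses in (2.5+7.5)*3 matter. Here is a small sketch; the object names are only illustrative:

```r
# ^ is evaluated first, then * and /, then + and -
no_parens   <- 2 + 3 * 4    # multiplication first: 2 + 12
with_parens <- (2 + 3) * 4  # parentheses first: 5 * 4
neg_square  <- -2^2         # ^ binds more tightly than the minus sign: -(2^2)
sq_of_neg   <- (-2)^2       # parentheses make the minus part of the base
c(no_parens, with_parens, neg_square, sq_of_neg)
## [1] 14 20 -4  4
```

When in doubt, add parentheses. They cost nothing and make the intended order explicit.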

Whenever a name is followed by parentheses, it is a function: some query, operation or algorithm is applied to the number or object inside the parentheses. The simplest examples are mathematical functions:

log(7)        # natural logarithm
## [1] 1.94591
exp(8)
## [1] 2980.958
sin(9)
## [1] 0.4121185
cos(10)
## [1] -0.8390715
tan(12)
## [1] -0.6358599
sqrt(16)      # square root
## [1] 4
round(pi, 2)  # round pi to two decimal places
## [1] 3.14

12%%5         # remainder from euclidean division
## [1] 2
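The %% operator has a companion, %/%, which returns the integer part of a division. Together the two reconstruct the original number, as this short check shows. The variable names are illustrative:

```r
quotient  <- 12 %/% 5      # integer division: 5 fits into 12 twice
remainder <- 12 %% 5       # what is left over: 12 - 2*5
quotient * 5 + remainder   # gives back the original 12
## [1] 12
7 %% 2                     # a remainder of 1 means the number is odd
## [1] 1
```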

2. Data types

In R, the most important data types are:

Boolean/logical values
Numeric values
Character values

2.1 Boolean/logical values

A Boolean (logical) value is typically returned when you compare two values. It can be either TRUE or FALSE.

1==1
## [1] TRUE
1==2
## [1] FALSE
1!=1
## [1] FALSE
1!=2
## [1] TRUE
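Logical values can also be used in arithmetic. R treats TRUE as 1 and FALSE as 0, which makes it easy to count how many comparisons are TRUE. A minimal sketch (n_true is just an illustrative name):

```r
TRUE + TRUE                                # TRUE counts as 1
## [1] 2
FALSE * 10                                 # FALSE counts as 0
## [1] 0
n_true <- (1 == 1) + (1 == 2) + (2 == 2)   # count how many comparisons are TRUE
n_true
## [1] 2
```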

2.2 Numeric values

Numeric values are essentially numbers. A number that can be written without a fractional component, i.e. without a decimal separator, is an “integer”. A number with a fractional component is often called a “float” in other languages, but in R it is simply called “numeric”. By default, R stores every number as numeric, even whole numbers such as 1, unless you explicitly create an integer (e.g. with as.integer()). You can ask for the data type of an object with the class() function:

class(1)
## [1] "numeric"
class(as.integer(1))
## [1] "integer"
class(1.5)
## [1] "numeric"
class("1")
## [1] "character"
class(TRUE)
## [1] "logical"
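Two details about numbers are worth knowing early. First, besides as.integer(), you can create an integer directly by appending L to a whole number. Second, numeric values are stored as binary floating-point numbers, so some decimal results are not exact and comparing them with == can be misleading. all.equal() compares with a small tolerance instead. A short sketch:

```r
class(1L)                          # the L suffix creates an integer
## [1] "integer"
0.1 + 0.2 == 0.3                   # FALSE because of a tiny rounding error
## [1] FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE: equal within a small tolerance
## [1] TRUE
```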

2.3 Character values

Character values are text: everything that is written in quotation marks (" ") in your code. You cannot perform mathematical operations on text, but you can modify it with functions designed for character values:

class("Hello")                      
## [1] "character"
# "Hello" + "World"                    # that does not work!
paste("Hello", "World", sep=" // ")  # paste two strings into one, separated by " // "
## [1] "Hello // World"
nchar("Hello")                       # count the number of characters
## [1] 5
substr("World", 2, 4)                # extract characters 2 to 4
## [1] "orl"
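A few more base functions for combining text: paste() separates its inputs with a single space unless you set sep, paste0() uses no separator at all, and sprintf() fills values into a text template (%s for text, %d for whole numbers). The variable names below are illustrative:

```r
greeting <- paste("Hello", "World")   # default separator: a single space
greeting
## [1] "Hello World"
joined <- paste0("Hello", "World")    # paste0() uses no separator
joined
## [1] "HelloWorld"
sentence <- sprintf("%s has %d characters", "Hello", nchar("Hello"))  # fill values into a template
sentence
## [1] "Hello has 5 characters"
```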

Objects of a certain type can be converted to another type:

as.numeric("1")+2
## [1] 3
class(as.numeric("1"))
## [1] "numeric"
as.character(100)
## [1] "100"
class(as.character(100))
## [1] "character"
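Conversions only succeed when the value makes sense in the target type. If R cannot turn a string into a number, it returns NA (a missing value) and issues a warning. Logical values convert to 1 and 0. In this sketch, suppressWarnings() only hides the expected warning:

```r
suppressWarnings(as.numeric("Hello"))  # text that is not a number becomes NA
## [1] NA
as.numeric(TRUE)                       # TRUE becomes 1, FALSE becomes 0
## [1] 1
as.logical("TRUE")                     # the text "TRUE" becomes a logical value
## [1] TRUE
```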

3. Object types

R works with named objects. The most important types of objects are:

variables
vectors
matrices and data frames (= tables)
arrays
lists

In general, you create a new object and assign it a value with the <- operator. You can choose almost any name for your objects, but a valid name may only contain letters, digits, dots (.) and underscores (_), must start with a letter (or a dot not followed by a digit), and should not contain special characters such as #, - or +. A name can be a single letter or something more descriptive, so there is a trade-off between concise code and remembering what an object means.
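If you are unsure whether a name is valid, the base function make.names() shows how R would turn it into a valid one. Keep in mind that names are also case-sensitive. The names in this sketch are hypothetical:

```r
make.names("my-value")    # the minus sign is replaced by a dot
## [1] "my.value"
make.names("1st_value")   # a leading digit gets an "X" prefix
## [1] "X1st_value"
tomatoes <- 13
Tomatoes <- 2             # a different object: names are case-sensitive
tomatoes == Tomatoes
## [1] FALSE
```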

3.1 Variables

The simplest object you can have is just one value. By giving a name to this value, the value becomes an object (we say we assign a value to the object).

a <- 12                                     # assign a value to a
the_number_of_tomatoes_in_my_fridge <- 13

As you can see, assigning a value to a variable does not produce any output. However, if we now call “a”, R prints its value:

a                                         # call a
## [1] 12
the_number_of_tomatoes_in_my_fridge       # call the_number_of_tomatoes_in_my_fridge
## [1] 13

You can apply comparison and logical operators to variables, i.e. check whether a certain statement is TRUE or FALSE (see the explanation of Boolean/logical values above).

a==10           # equal to
## [1] FALSE
a==12
## [1] TRUE
a!=10           # not equal to
## [1] TRUE
a<8             # less than
## [1] FALSE
a<=8            # less than or equal to
## [1] FALSE
a>8             # greater than
## [1] TRUE
a>=10           # greater than or equal to
## [1] TRUE
is.na(a)        # is the value missing (NA = "not available")?
## [1] FALSE
is.na(NA)
## [1] TRUE
!is.na(a)       # is not missing
## [1] TRUE
is.null(a)      # is null
## [1] FALSE
is.null(NULL)
## [1] TRUE
!is.null(a)     # is not null
## [1] TRUE
(a==10 | a==12) # OR statement
## [1] TRUE
(a==10 & a==12) # AND statement
## [1] FALSE

You can also store text in a variable:

b <- "Hello"
b
## [1] "Hello"
b2 <- paste("Hello", "World", sep=" ")  # paste two strings into one and separate the elements by a space
b2
## [1] "Hello World"
gsub("Hello", "Good Morning", b2)       # replace a certain expression by another one
## [1] "Good Morning World"
toupper(b2)                             # convert to uppercase
## [1] "HELLO WORLD"
tolower(b2)                             # convert to lowercase
## [1] "hello world"
nchar(b2)                               # number of characters
## [1] 11

And you can, of course, also store decimal numbers in a variable:

c <- 3.141593
c
## [1] 3.141593
round(c, 4)         # round to four decimal places
## [1] 3.1416
print(c, digits=6)  # print with 6 significant digits
## [1] 3.14159
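round() counts decimal places, whereas the digits argument of print() counts significant digits. The base function signif() rounds to significant digits directly, and the difference becomes visible with larger numbers (big_number is an illustrative name):

```r
big_number <- 1234.5678
round(big_number, 2)    # two decimal places
## [1] 1234.57
signif(big_number, 3)   # three significant digits
## [1] 1230
```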

3.2 Vectors

A vector is a one-dimensional object that consists of several elements (values) in a fixed order. To combine several values into one vector, use the c() function (short for “combine” or “concatenate”). You can apply a mathematical operation to all elements of a vector at once.

d <- c(1, 2, 10, 15, 4)
d
## [1]  1  2 10 15  4
d-1
## [1]  0  1  9 14  3

Variables and vectors are the same kind of object in R. In fact, a variable is just the simplest form of a vector: a vector of length 1. If you ask for the class of a vector, R returns the data type of its elements. This works because all elements of a vector must have the same data type (to combine elements of different types, we need a “list”, see below).

class(c)
## [1] "numeric"
class(d)
## [1] "numeric"
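You can confirm that a single value is already a vector of length 1 with a few base functions:

```r
is.vector(12)          # a single value is already a vector
## [1] TRUE
length(12)             # ... of length 1
## [1] 1
identical(12, c(12))   # wrapping it in c() changes nothing
## [1] TRUE
```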

If the values to be put into a vector form a sequence of consecutive integers, you can use the : operator:

e <- 1:5
e
## [1] 1 2 3 4 5
e+2
## [1] 3 4 5 6 7

If the values are separated by a fixed interval other than 1, use the seq() function. Its arguments are the start value, the end value and the step:

f <- seq(1, 15, 2)
f
## [1]  1  3  5  7  9 11 13 15
f*f
## [1]   1   9  25  49  81 121 169 225
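seq() is easier to read with named arguments. Instead of a step (by), you can also ask for a fixed number of values (length.out), and a negative step counts down:

```r
seq(from = 1, to = 15, by = 2)          # same as seq(1, 15, 2)
## [1]  1  3  5  7  9 11 13 15
seq(from = 0, to = 1, length.out = 5)   # fix the number of values instead of the step
## [1] 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)                     # a negative step counts down
## [1] 10  7  4  1
```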

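With rep() you can repeat a value or a pattern of values. The argument each repeats every element in turn, while a plain second argument (times) repeats the whole sequence:

g <- rep(5, times=3)
g
## [1] 5 5 5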
g2 <- rep(1:3, each=2)
g2
## [1] 1 1 2 2 3 3
g3 <- rep(1:3, 2)
g3
## [1] 1 2 3 1 2 3
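The two arguments of rep() can be combined, and times also accepts a separate repeat count for each element:

```r
rep(1:2, times = 2, each = 2)       # repeat each element, then the whole pattern
## [1] 1 1 2 2 1 1 2 2
rep(c("a", "b"), times = c(2, 3))   # "a" twice, "b" three times
## [1] "a" "a" "b" "b" "b"
```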

Let us get some basic information about the objects we have created:

a
## [1] 12
d
## [1]  1  2 10 15  4
length(a)     # number of elements
## [1] 1
length(d)
## [1] 5
class(a)      # object type
## [1] "numeric"
class(b)
## [1] "character"
class(c)
## [1] "numeric"
str(d)        # structure
##  num [1:5] 1 2 10 15 4

By the way, if you are not sure what a function does, you can always use the help function:

help(length)         # get help (?length is a shorthand for the same)
## starting httpd help server ... done

All elements of a vector always have the same type, i.e. you cannot mix numbers and text in one vector. If you try, R converts (coerces) the numbers to text!

h <- c("Hello", 2, 3)   # The numbers get automatically converted to text!
h
## [1] "Hello" "2"     "3"
class(h)
## [1] "character"
str(h)
##  chr [1:3] "Hello" "2" "3"
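The same rule applies to every combination of types. When types are mixed, R converts all elements to the most flexible type present, in the order logical, then numeric, then character:

```r
c(TRUE, FALSE, 5)    # logical + numeric: TRUE and FALSE become 1 and 0
## [1] 1 0 5
c(TRUE, "Hello")     # logical + character: everything becomes text
## [1] "TRUE"  "Hello"
```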

You can run some basic calculations on our numerical vectors that will each return one value:

min(d)
## [1] 1
max(e)
## [1] 5
mean(f)
## [1] 8
sum(f)
## [1] 64
median(e)
## [1] 3
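Two related details: range() returns the minimum and maximum in one call, and a single missing value (NA) makes most summary functions return NA unless you set na.rm = TRUE. The vector d is redefined here so the block runs on its own, and values is an illustrative name:

```r
d <- c(1, 2, 10, 15, 4)      # the same vector d as above
range(d)                     # smallest and largest value in one call
## [1]  1 15
values <- c(1, NA, 3)        # a vector with one missing value
mean(values)                 # a single NA makes the result NA
## [1] NA
mean(values, na.rm = TRUE)   # na.rm = TRUE removes missing values first
## [1] 2
```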

Or you can run some non-arithmetic functions on them:

sort(d)
## [1]  1  2  4 10 15
rank(d)          # rank of each element within the sorted vector
## [1] 1 2 4 5 3
unique(c(d, e))
## [1]  1  2 10 15  4  3  5
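rank() tells you where each element would end up after sorting. Its counterpart order() works the other way round: it returns the position of the smallest element first, then the next one, and so on. sort() can also sort in decreasing order. d is redefined so the block runs on its own:

```r
d <- c(1, 2, 10, 15, 4)       # the same vector d as above
order(d)                      # positions of the elements, from smallest to largest value
## [1] 1 2 5 3 4
sort(d, decreasing = TRUE)    # sort from largest to smallest
## [1] 15 10  4  2  1
```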

You can run calculations that return one value for each vector element, i.e. the calculation is performed on each vector element separately:

d
## [1]  1  2 10 15  4
log(d)
## [1] 0.0000000 0.6931472 2.3025851 2.7080502 1.3862944
d+1
## [1]  2  3 11 16  5
d^2
## [1]   1   4 100 225  16

You can test if a vector contains a certain element:

10%in%d
## [1] TRUE
e%in%d
## [1]  TRUE  TRUE FALSE  TRUE FALSE

And you can make operations between a vector and a variable, or between vectors:

d+a
## [1] 13 14 22 27 16
d+e
## [1]  2  4 13 19  9
d^e
## [1]     1     4  1000 50625  1024
e/d
## [1] 1.0000000 1.0000000 0.3000000 0.2666667 1.2500000
e/f           # e is shorter than f, so R recycles e; this produces a warning, not an error
## Warning in e/f: longer object length is not a multiple of shorter object length
## [1] 1.00000000 0.66666667 0.60000000 0.57142857 0.55555556 0.09090909 0.15384615
## [8] 0.20000000
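The warning for e/f comes from R's recycling rule. In an operation between two vectors of different lengths, R repeats the shorter one until it matches the longer one. When the longer length is an exact multiple of the shorter one, this happens silently, which is convenient but can also hide mistakes:

```r
c(1, 2, 3, 4) + c(10, 20)   # c(10, 20) is recycled to c(10, 20, 10, 20)
## [1] 11 22 13 24
```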

By the way, if you assign a new value to a variable or vector that already exists, the old value is overwritten without any warning!

a
## [1] 12
a <- 1000
a
## [1] 1000

And you can of course create a new vector from existing vectors or variables:

a
## [1] 1000
d
## [1]  1  2 10 15  4
i <- c(a, d, 3, 4)
i
## [1] 1000    1    2   10   15    4    3    4
j <- a+d

You can index the elements of a vector by writing their positions in square brackets.

d[3]           # the third element of d
## [1] 10
d[-3]          # all but the third element of d
## [1]  1  2 15  4
d[1:4]         # elements 1 to 4
## [1]  1  2 10 15
d[c(1, 3, 5)]  # elements 1, 3 and 5
## [1]  1 10  4
d[-(2:4)]      # all elements except for elements 2 to 4
## [1] 1 4
d[d>5]         # all elements greater than 5
## [1] 10 15
d[d%in%e]      # logical indexing: keep only the elements for which the condition is TRUE
## [1] 1 2 4
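Logical indexing works because a condition such as d > 5 returns one TRUE/FALSE per element, and the brackets keep the elements marked TRUE. which() turns such a logical vector into positions. d is redefined so the block runs on its own:

```r
d <- c(1, 2, 10, 15, 4)                 # the same vector d as above
d > 5                                   # one TRUE/FALSE per element
## [1] FALSE FALSE  TRUE  TRUE FALSE
which(d > 5)                            # positions of the TRUE values
## [1] 3 4
d[c(FALSE, FALSE, TRUE, TRUE, FALSE)]   # the same as d[d > 5]
## [1] 10 15
```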

3.3 Matrices and data frames

Both matrices and data frames are two-dimensional objects; you could also call them tables. The main difference is that all elements of a matrix must have the same data type, whereas each column of a data frame can have its own type (e.g. one numeric and one character column).
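The difference in types becomes visible when a data frame with mixed column types is converted to a matrix with as.matrix(). This sketch uses hypothetical object names and assumes R 4.0 or later, where data.frame() keeps text columns as character by default:

```r
# Hypothetical example: one integer column and one character column
demo_df <- data.frame(id = 1:2, name = c("Ann", "Bob"))
sapply(demo_df, class)           # each column keeps its own type: "integer" and "character"
demo_mat <- as.matrix(demo_df)   # a matrix can hold only one type ...
class(demo_mat[, "id"])          # ... so the numbers have been converted to text
## [1] "character"
```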

Let’s first create a data frame:

test_dataframe <- data.frame(x=1:10, y=seq(10, 100, 10))   # create a data frame and populate it with some numbers

test_dataframe            # let's have a look at the data frame
##     x   y
## 1   1  10
## 2   2  20
## 3   3  30
## 4   4  40
## 5   5  50
## 6   6  60
## 7   7  70
## 8   8  80
## 9   9  90
## 10 10 100


row.names(test_dataframe) <- c("one","two","three","four","five",
                             "six","seven","eight","nine","ten")      # define the row names of the data frame

test_dataframe            # let's have a look again: the data frame has row names now!
##        x   y
## one    1  10
## two    2  20
## three  3  30
## four   4  40
## five   5  50
## six    6  60
## seven  7  70
## eight  8  80
## nine   9  90
## ten   10 100

Let’s now create a matrix:

test_matrix1 <- matrix(c(c(1:10), seq(10, 100, 10)), nrow=10, ncol=2)        # nrow and ncol define the number of rows and columns in a matrix

test_matrix1
##       [,1] [,2]
##  [1,]    1   10
##  [2,]    2   20
##  [3,]    3   30
##  [4,]    4   40
##  [5,]    5   50
##  [6,]    6   60
##  [7,]    7   70
##  [8,]    8   80
##  [9,]    9   90
## [10,]   10  100

test_matrix2 <- matrix(c(1:12), nrow=4, ncol=3)

test_matrix2
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

t(test_matrix2)                                                       # transpose matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
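As test_matrix2 shows, matrix() fills in its values column by column by default. Setting byrow = TRUE fills them in row by row instead. The object names below are illustrative:

```r
by_col <- matrix(1:6, nrow = 2)                 # column by column (the default)
by_col
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # row by row
by_row
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
```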

Data frames and matrices are essentially built from vectors: each column can be regarded as a vector, and so can each row. That means that you can perform all kinds of vector operations on the columns and rows of matrices and data frames. As with the elements of a vector, you index the rows and columns of a matrix or data frame with square brackets, but because these objects are two-dimensional, you have to separate the two dimensions with a comma inside the brackets.

Values before the comma refer to rows and values after the comma refer to columns; leaving a position empty selects all rows or all columns. We will proceed with the data frame here, but you can just as well use one of the matrices instead!

test_dataframe[1, ]
##     x  y
## one 1 10

test_dataframe[1:3, ]
##       x  y
## one   1 10
## two   2 20
## three 3 30

test_dataframe[c(1, 3, 5), ]
##       x  y
## one   1 10
## three 3 30
## five  5 50

test_dataframe[c(1:3), c(2)]   # rows 1 to 3 of the second column
## [1] 10 20 30

Since the data frame has column names, you can also index the columns by their names. This is done with the $ sign, followed by the column name:

test_dataframe$x      # $ works for data frames (and lists), but not for matrices
##  [1]  1  2  3  4  5  6  7  8  9 10
test_dataframe[ ,1]   # the same column, indexed by its position
##  [1]  1  2  3  4  5  6  7  8  9 10
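A column name can also go inside the square brackets. Unlike $, this also works for matrices once they have column names. This sketch uses small stand-in objects with hypothetical names:

```r
# Small stand-ins for test_dataframe and test_matrix2 (hypothetical names)
demo_df <- data.frame(x = 1:3, y = c(10, 20, 30))
demo_df[, "y"]                          # a column name inside the brackets
## [1] 10 20 30
demo_mat <- matrix(1:6, nrow = 2)
colnames(demo_mat) <- c("a", "b", "c")  # matrices can have column names, too
demo_mat[, "b"]                         # bracket indexing by name works for matrices ...
## [1] 3 4
# demo_mat$b                            # ... but $ does not: "$ operator is invalid for atomic vectors"
```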

Vector operations on columns/rows:

test_dataframe[5,]+1
##      x  y
## five 6 51

mean(test_dataframe[,2])
## [1] 55
test_dataframe[,3] <- test_dataframe[,1] + test_dataframe[,2]     # create a third column that is equal to the sum of the previous two columns
test_dataframe
##        x   y  V3
## one    1  10  11
## two    2  20  22
## three  3  30  33
## four   4  40  44
## five   5  50  55
## six    6  60  66
## seven  7  70  77
## eight  8  80  88
## nine   9  90  99
## ten   10 100 110

You can of course combine row and column indices to index specific elements:

test_dataframe[5,2]
## [1] 50
test_dataframe[5:8,3]
## [1] 55 66 77 88

Let us ask for some properties of the data frame:

length(test_dataframe) # number of columns
## [1] 3
nrow(test_dataframe)   # number of rows
## [1] 10
dim(test_dataframe)    # dimensions, i.e. number of rows followed by number of columns
## [1] 10  3
str(test_dataframe)    # structure
## 'data.frame':    10 obs. of  3 variables:
##  $ x : int  1 2 3 4 5 6 7 8 9 10
##  $ y : num  10 20 30 40 50 60 70 80 90 100
##  $ V3: num  11 22 33 44 55 66 77 88 99 110

How do you delete parts of a data frame? Assigning NULL to a column removes it:

test_dataframe[,1]<-NULL  # delete the first column (x)
test_dataframe
##          y  V3
## one     10  11
## two     20  22
## three   30  33
## four    40  44
## five    50  55
## six     60  66
## seven   70  77
## eight   80  88
## nine    90  99
## ten    100 110

rm(test_dataframe)        # delete a whole object
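Rows are removed with negative indices, exactly as for vectors, and a logical condition keeps only the rows that satisfy it. The result has to be assigned if you want to keep the change. A sketch with a small hypothetical data frame:

```r
demo_df <- data.frame(x = 1:4, y = c(10, 20, 30, 40))
demo_df <- demo_df[-1, ]            # drop the first row and overwrite the object
kept <- demo_df[demo_df$y > 25, ]   # keep only the rows where y is greater than 25
kept
##   x  y
## 3 3 30
## 4 4 40
```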

3.4 Arrays

An array extends a matrix to more than two dimensions. A three-dimensional array, like the one below, can be regarded as a set of equally sized matrices: you can imagine it as a “stack” of matrices lying on top of each other.

test_array <- array(c(1:24), dim=c(3,4,2))
test_array
## , , 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24

Because there are now three dimensions, a second comma is required to index the array:

test_array[,,1]   # this is the first matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
test_array[,,2]   # this is the second matrix
##      [,1] [,2] [,3] [,4]
## [1,]   13   16   19   22
## [2,]   14   17   20   23
## [3,]   15   18   21   24
test_array[,1,]   # these are the first columns of both matrices
##      [,1] [,2]
## [1,]    1   13
## [2,]    2   14
## [3,]    3   15
test_array[1,,]   # these are the first rows of both matrices - transposed to columns
##      [,1] [,2]
## [1,]    1   13
## [2,]    4   16
## [3,]    7   19
## [4,]   10   22
test_array[,1,1]  # this is the first column of the first matrix
## [1] 1 2 3
test_array[1,,1]  # this is the first row of the first matrix
## [1]  1  4  7 10
test_array[3,4,2] # this is the element in row 3, column 4 of the second matrix
## [1] 24

Again, we can retrieve some properties of an array with the following commands:

sum(test_array)       # sum of all elements of the array
## [1] 300
dim(test_array)       # size of each dimension: rows, columns, matrices
## [1] 3 4 2
length(test_array)    # number of elements in the array
## [1] 24
sum(test_array[1,1,]) # sum of the element in row 1, column 1 across both matrices
## [1] 14
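Two more properties can be useful. length(dim()) gives the number of dimensions, and apply() runs a function along one dimension, here once per matrix (the third dimension). The array is recreated so the block runs on its own:

```r
test_array <- array(1:24, dim = c(3, 4, 2))   # the same array as above
length(dim(test_array))                       # the number of dimensions
## [1] 3
apply(test_array, 3, sum)                     # one sum per matrix
## [1]  78 222
```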

3.5 Lists

So far, the object types we have discussed can only contain elements of the same type or of the same dimensions: Vectors can only contain elements of the same type (either logical, numeric or character values), matrices and data frames can only contain vectors of the same size, and arrays can only contain matrices of the same size.

Let’s assume we have the following three objects x, y and z and we want to join them somehow into one object:

x <- c("Good Morning", "Good Evening", "Good Night")
y <- data.frame(number=1:7, day=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"), weekend=c(rep("no", 5), rep("yes", 2)))
z <- array(c(c(1:10), c((1:10)^2)), dim=c(2,5,2))
x
## [1] "Good Morning" "Good Evening" "Good Night"
y
##   number       day weekend
## 1      1    Monday      no
## 2      2   Tuesday      no
## 3      3 Wednesday      no
## 4      4  Thursday      no
## 5      5    Friday      no
## 6      6  Saturday     yes
## 7      7    Sunday     yes

z
## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    9   25   49   81
## [2,]    4   16   36   64  100

You can simply join these objects in a list. Like a vector, a list is a one-dimensional object, but instead of single values it contains objects (which can be of different types) in a defined order.

test_list <- list(x, y, z)
test_list
## [[1]]
## [1] "Good Morning" "Good Evening" "Good Night"  
## 
## [[2]]
##   number       day weekend
## 1      1    Monday      no
## 2      2   Tuesday      no
## 3      3 Wednesday      no
## 4      4  Thursday      no
## 5      5    Friday      no
## 6      6  Saturday     yes
## 7      7    Sunday     yes
## 
## [[3]]
## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    9   25   49   81
## [2,]    4   16   36   64  100
length(test_list)     # number of objects in test_list
## [1] 3

You can of course create a list from scratch:

test_list2 <- list(name="Tom", wife="Maria", no.children=4, child.ages=c(4,7,9,9))

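We can index list elements with either single or double square brackets. Single brackets return a sub-list that contains the selected element, whereas double brackets return the element itself: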

test_list[1]
## [[1]]
## [1] "Good Morning" "Good Evening" "Good Night"
test_list[[1]]
## [1] "Good Morning" "Good Evening" "Good Night"
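The difference is easiest to see with class(). In this sketch, demo_list is a hypothetical two-element list:

```r
demo_list <- list("a", 1:3)   # hypothetical two-element list
class(demo_list[1])           # single brackets return a (shorter) list
## [1] "list"
class(demo_list[[1]])         # double brackets return the element itself
## [1] "character"
```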

If we want to index several elements of a list at once, we have to use single square brackets:

test_list[c(1,2)]
## [[1]]
## [1] "Good Morning" "Good Evening" "Good Night"  
## 
## [[2]]
##   number       day weekend
## 1      1    Monday      no
## 2      2   Tuesday      no
## 3      3 Wednesday      no
## 4      4  Thursday      no
## 5      5    Friday      no
## 6      6  Saturday     yes
## 7      7    Sunday     yes

However, if we want to index an element inside a list element, we first extract the list element with double square brackets and then index it with single square brackets:

test_list[[1]][2]
## [1] "Good Evening"
test_list[[2]][3,]
##   number       day weekend
## 3      3 Wednesday      no

test_list[[3]][2,3,1]
## [1] 6

Similarly to data frames, we can name the elements of a list, and also index list elements by their names.

names(test_list) <- c("greetings", "weekend", "squared values")
test_list
## $greetings
## [1] "Good Morning" "Good Evening" "Good Night"  
## 
## $weekend
##   number       day weekend
## 1      1    Monday      no
## 2      2   Tuesday      no
## 3      3 Wednesday      no
## 4      4  Thursday      no
## 5      5    Friday      no
## 6      6  Saturday     yes
## 7      7    Sunday     yes
## 
## $`squared values`
## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    9   25   49   81
## [2,]    4   16   36   64  100
test_list$weekend
##   number       day weekend
## 1      1    Monday      no
## 2      2   Tuesday      no
## 3      3 Wednesday      no
## 4      4  Thursday      no
## 5      5    Friday      no
## 6      6  Saturday     yes
## 7      7    Sunday     yes
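test_list2, which was created above with named elements, can be used the same way without a names() call. The list is recreated here so the block runs on its own:

```r
test_list2 <- list(name="Tom", wife="Maria", no.children=4, child.ages=c(4,7,9,9))
test_list2$name                 # index an element by its name
## [1] "Tom"
test_list2[["no.children"]]     # double brackets with the name work too
## [1] 4
mean(test_list2$child.ages)     # list elements can be used like any other object
## [1] 7.25
```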

You made it to the end! Take a break! =)

Module 0: Introduction to R and R Studio
Script 01: R basics

Max Hofmann

07 June 2022

1. Basic arithmetic operations

2. Data types

2.1 Boolean/logical values

2.2 Numeric values

2.3 Character values

3. Object types

3.1 Variables

3.2 Vectors

3.3 Matrices and data frames

3.4 Arrays

3.5 Lists
