Chapter 1: The Building Blocks

This chapter covers the most basic concepts needed to use R functionally, that is, under the functional programming paradigm.

Chapter content:

1.1: Value assignment and pipe operators.

The <-, ->, = operators can be used to assign values to variables.

The %>% operator pipes a variable or a function result into another function. To use it, you need to install and load the magrittr package.

| Operator | Meaning | Syntax |
| --- | --- | --- |
| <- | assignment to the left | x<-5 |
| -> | assignment to the right | 5->x |
| = | assignment to the left (alternative) | x=5 |
| %>% | piping | x<-c(4) %>% print() |

In console:

install.packages("magrittr")

library(magrittr)

x<-5

5->x

x=5

x<-c(4) %>% print()
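As a small sketch of what piping does, the snippet below uses R's native pipe |> instead of %>%. This is an assumption made only so the example runs without installing anything: |> is not part of the text above and requires R 4.1.0 or later. For simple calls like these, %>% from magrittr behaves the same way. The variable names are illustrative.

```r
# Three equivalent ways to assign the value 5
a <- 5
5 -> b
c_val = 5

# Piping passes the left-hand side as the first argument of the
# function on the right-hand side: v |> f() is the same as f(v)
v <- c(4, 9, 16)
piped  <- v |> sqrt() |> sum()
nested <- sum(sqrt(v))

print(piped)
print(identical(piped, nested))
```

The piped form reads left to right in the order the operations happen, while the nested form has to be read from the inside out.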

1.2 Data types:

This section focuses on the basic data types that exist natively in R. Many other data types can be defined using object-oriented programming, and some of them are required by certain features or projects built on R, such as Bioconductor.

| Data type | Syntax | Result | Verification | Coercion |
| --- | --- | --- | --- | --- |
| character | x<-'a' | 'a' | is.character() | as.character() |
| numeric | x<-1 | 1 | is.numeric() | as.numeric() |
| integer | x<-1L | 1L | is.integer() | as.integer() |
| logical | x<-TRUE | TRUE | is.logical() | as.logical() |
| complex | x<-1+4i | 1+4i | is.complex() | as.complex() |
| double | x<-4.22 | 4.22 | is.double() | as.double() |
| factor | x<-factor(c('a','b')) | Factor w/ 2 levels "a","b": 1 2 | is.factor() | as.factor() |
| NA | x<-NA | NA | is.na() | (none; base R has no as.na()) |
| NaN | x<-NaN | NaN | is.nan() | (none; base R has no as.nan()) |
| NULL | x<-NULL | NULL | is.null() | as.null() |

c() is a function that combines its arguments into a single vector; the name stands for "combine".
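The following sketch illustrates the types, the coercion functions and c() described above. typeof() is a base R function not mentioned in the text; it is used here only to display the storage type of each value. The variable names are illustrative.

```r
# typeof() reports the underlying storage type of each value
x_chr <- "a"
x_dbl <- 4.22
x_int <- 1L
x_lgl <- TRUE
x_cpl <- 1 + 4i

types <- c(typeof(x_chr), typeof(x_dbl), typeof(x_int),
           typeof(x_lgl), typeof(x_cpl))
print(types)

# Coercion with the as.*() family
n_from_chr <- as.numeric("3.5")   # character -> numeric
i_from_dbl <- as.integer(4.22)    # the decimal part is dropped
l_from_num <- as.logical(0)       # 0 becomes FALSE

# c() combines values; mixed types are coerced to a common type
mixed <- c(1, "a", TRUE)
print(class(mixed))
```

Because a vector can hold only one type, mixing a number, a string and a logical in c() yields a character vector.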

1.2.1 Data type checking

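| Type | is.vector | is.character | is.numeric | is.integer | is.logical | is.complex | is.double | is.factor | is.na | is.nan | is.null |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| character | T | T | - | - | - | - | - | - | - | - | - |
| numeric | T | - | T | - | - | - | T | - | - | - | - |
| integer | T | - | T | T | - | - | - | - | - | - | - |
| logical | T | - | - | - | T | - | - | - | - | - | - |
| complex | T | - | - | - | - | T | - | - | - | - | - |
| double | T | - | T | - | - | - | T | - | - | - | - |
| factor | - | - | - | - | - | - | - | T | - | - | - |
| NA | T | - | - | - | T | - | - | - | T | - | - |
| NaN | T | - | T | - | - | - | T | - | T | T | - |
| NULL | - | - | - | - | - | - | - | - | logical(0) | logical(0) | T |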

Some data types (such as numeric, double, integer, NA and NaN) also pass verification functions other than their own. As can be seen, is.vector() does not report factors as vectors, and the NULL object is used to indicate empty variables.
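A few rows of the table above can be reproduced directly in the console. This is a minimal sketch; the variable names are illustrative.

```r
# A value can satisfy more than one is.*() check
checks_int <- c(is.numeric(1L), is.integer(1L), is.double(1L))
checks_nan <- c(is.numeric(NaN), is.double(NaN), is.na(NaN), is.nan(NaN))
checks_na  <- c(is.logical(NA), is.na(NA), is.nan(NA))

# Factors carry attributes (levels, class), so is.vector() is FALSE
f <- factor(c("a", "b"))
factor_is_vector <- is.vector(f)

# is.na(NULL) has nothing to test and returns an empty logical vector
na_of_null <- is.na(NULL)

print(checks_int)
print(checks_nan)
print(checks_na)
print(factor_is_vector)
print(length(na_of_null))
```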

1.3 Basic data structures

| Dimensions | Structure | Type | Syntax | Verification | Coercion |
| --- | --- | --- | --- | --- | --- |
| 1 | Atomic vector | Homogeneous | v <- c(1,2,3) | is.vector() | as.vector() |
| 2 | Matrix | Homogeneous | m <- matrix(c(1:9), nrow = 3, ncol = 3) | is.matrix() | as.matrix() |
| n | Array | Homogeneous | a <- array(c(1:18), dim = c(3,3,2)) | is.array() | as.array() |
| 1 | List | Heterogeneous | l <- list('a'='a', 'b'=c(1,2,3)) | is.list() | as.list() |
| 2 | Table (data.frame) | Heterogeneous | dt <- data.frame(x=c('a','b'), y=c(1,2)) | is.data.frame() | as.data.frame() |

Dimensions refers to the (i,j) coordinates in a matrix or table: i are the rows, j are the columns.

  • 1 dimension means that the vector only has the coordinate i.
  • 2 dimensions means having both.
  • n dimensions means that the structure can have more dimensions in addition to i,j.

Type refers to what type of data the structure can contain.

  • Homogeneous means that it can only contain a single data type.
  • Heterogeneous means that it can contain different data types.

NOTE: In R versions before 4.0.0, creating a data.frame, coercing to data.frame or importing data as a data.frame converts all character columns to factors by default; set the argument stringsAsFactors = FALSE if this is not desired. Since R 4.0.0 the default is stringsAsFactors = FALSE.
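The structures above can be built and inspected as follows. This is a minimal sketch: dim() is a base R function not mentioned in the text, used here to show the number of dimensions, and stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
v  <- c(1, 2, 3)
m  <- matrix(1:9, nrow = 3, ncol = 3)
a  <- array(1:18, dim = c(3, 3, 2))
l  <- list(a = "a", b = c(1, 2, 3))
df <- data.frame(x = c("a", "b"), y = c(1, 2),
                 stringsAsFactors = FALSE)

# dim() returns NULL for vectors and lists, and one length per
# dimension for matrices, arrays and data frames
print(dim(v))
print(dim(m))
print(dim(a))
print(dim(df))

# With stringsAsFactors = FALSE the character column stays character;
# with TRUE it is converted to a factor
df_fct <- data.frame(x = c("a", "b"), y = c(1, 2),
                     stringsAsFactors = TRUE)
print(class(df$x))
print(class(df_fct$x))
```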

1.4 Notation for accessing data structures.

The access notation may seem confusing at first; the script for this section shows in more detail how each form works.

| Syntax | Objects | Description |
| --- | --- | --- |
| x[c(i)] | Vectors and lists | Selects the elements of object x described in i. i can be a vector of 1 or n elements (inside c()) of type integer, character (literal names of the elements) or logical. Used on a vector it returns a vector; used on a list it returns a list. |
| x[[c(i)]] | Lists and data.frames | Returns the single element of x found at position i. i can be a vector of type integer or character. With length 1 it selects one element; with length n (inside c()) it indexes recursively, so x[[c(1,2)]] is the same as x[[1]][[2]]. |
| x$j | Lists and data.frames | Returns the element named j of object x. |
| x[c(i),c(j)] | Data.frames and matrices | Returns the values in row i and column j. i and j can be vectors of type integer or character (literal names of the elements), of length 1 or n if they are inside c(). |
| x[i,j,h] | Arrays | Selects the element of object x with coordinates i,j,h: i is the row, j the column and h the index of the matrix inside the array. |
| x[[c(i)]][c(i),c(j)] | Lists and data.frames | Combines the notation x[[i]] with x[i] or x[i,j]. In data.frames, x[[j]][i] works the same way as x[i,j] or x$j[i]. In lists, x[[i]][i,j] applies when element i of the list is a data.frame or matrix with coordinates i,j; likewise, x[[i]][i] applies when element i of the list is a one-dimensional vector. |

These notations can sometimes be combined with each other. An example that shows them all in action is a list made up of all the structures from section 1.3; it is addressed in the script of this section.
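As a short sketch of the notations in the table above, the snippet below applies each one to small illustrative objects (the object names and values are assumptions, not the ones used in the chapter script).

```r
l   <- list(a = "a", b = c(10, 20, 30))
df  <- data.frame(x = c("a", "b"), y = c(1, 2),
                  stringsAsFactors = FALSE)
arr <- array(1:18, dim = c(3, 3, 2))

sub_list <- l["b"]        # single brackets on a list return a list
elem     <- l[["b"]]      # double brackets return the element itself
same     <- l$b           # $ selects by a literally written name

cell_1 <- df[2, "y"]      # row 2, column y
cell_2 <- df[["y"]][2]    # column y first, then its 2nd element
cell_3 <- df$y[2]         # the same, using $

arr_val <- arr[2, 3, 2]   # row 2, column 3, second matrix

# Combined notation: structures stored inside a list
nested      <- list(tbl = df, vec = c(5, 6, 7))
nested_cell <- nested[["tbl"]][1, "x"]   # data frame inside a list
nested_elem <- nested[["vec"]][3]        # vector inside a list
```

Note the difference between l["b"], which is still a list of length 1, and l[["b"]], which is the numeric vector stored in it.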

1.5 Arithmetic operators

As the name implies, they perform arithmetic operations on single numbers or on vectors.

| Operator | Description | Example | Result |
| --- | --- | --- | --- |
| + | addition | 1+2 | 3 |
| - | subtraction | 1-2 | -1 |
| * | multiplication | 1*2 | 2 |
| / | division | 23/2 | 11.5 |
| ^ or ** | power | 2^3 | 8 |
| %% | modulo (remainder of division) | 23 %% 2 | 1 |
| %/% | integer division (integer part of quotient) | 23 %/% 2 | 11 |
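The two division-related operators are easiest to understand together. The sketch below, with illustrative values, applies them to a whole vector at once and checks that the quotient and the remainder rebuild the original numbers.

```r
x <- c(23, 10, 7)
y <- 2

quotient  <- x %/% y   # integer part of each division
remainder <- x %% y    # what is left over

print(quotient)
print(remainder)

# The two operators are complementary:
# quotient * divisor + remainder gives back the original values
rebuilt <- quotient * y + remainder
print(rebuilt)

# ^ and ** are interchangeable
print(2^3 == 2**3)
```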

1.6 Logic Operators

  • Logical operators are used to evaluate conditions.

  • Logical operators can be used in conjunction with data structure access notations to be able to subset them.

Logical operators, with x=c(1:10):

| Operator | Description | Syntax | Result |
| --- | --- | --- | --- |
| >, < | greater than, less than | x[x>5] | [1] 6 7 8 9 10 |
| >=, <= | greater than or equal to, less than or equal to | x[x<=5] | [1] 1 2 3 4 5 |
| == | equal to | x[x==5] | [1] 5 |
| != | not equal to | x[x!=5] | [1] 1 2 3 4 6 7 8 9 10 |
| !x | NOT (boolean) x | x[!(x==5)] | [1] 1 2 3 4 6 7 8 9 10 |
| x \| y | x OR (boolean) y | x[x > 8 \| x < 5] | [1] 1 2 3 4 9 10 |
| x & y | x AND (boolean) y | x[x > 5 & x <= 8] | [1] 6 7 8 |
| isTRUE(x) | Is x TRUE? | isTRUE(1>2) | [1] FALSE |
| x%in%y | elements of x in y | c(1,2)%in%x | [1] TRUE TRUE |

1.6.1 Table explanation

Let’s explore what happens in the “Syntax” column of the above table with the x>5 condition.

# We assign to 'x' a sequence of values from 1 to 10
x=c(1:10)
# Which values of 'x' meet the condition 'x>5'?
x
##  [1]  1  2  3  4  5  6  7  8  9 10
x>5
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
# To visualize them better, we apply the 'cbind' function
# ('column bind') or the 'rbind' function ('row bind')
# to bind 'x' and 'x>5' as columns or rows of a table.

# Note: The result of x>5 is coerced to integer,
# so TRUE = 1 and FALSE = 0
cbind(x,y=as.logical(x>5))
##        x y
##  [1,]  1 0
##  [2,]  2 0
##  [3,]  3 0
##  [4,]  4 0
##  [5,]  5 0
##  [6,]  6 1
##  [7,]  7 1
##  [8,]  8 1
##  [9,]  9 1
## [10,] 10 1
rbind(x,y=as.logical(x>5))
##   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x    1    2    3    4    5    6    7    8    9    10
## y    0    0    0    0    0    1    1    1    1     1
# Alternatively, what we saw in the previous sections can be applied:

# We create an empty matrix with 2 columns and 10 rows
z<-matrix(ncol=2,
          nrow = 10)

# We assign to the first column of the matrix z the values of 'x'
z[,1]<-x
# We assign to the second column of matrix z the values of 'x>5'
z[,2]<-x>5 

z
##       [,1] [,2]
##  [1,]    1    0
##  [2,]    2    0
##  [3,]    3    0
##  [4,]    4    0
##  [5,]    5    0
##  [6,]    6    1
##  [7,]    7    1
##  [8,]    8    1
##  [9,]    9    1
## [10,]   10    1
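
The same logical vector can be used in a few other ways. The sketch below combines two conditions and then uses which() and sum(), two base R functions not covered above, to get the positions and the number of matches. The data frame and its column names are illustrative.

```r
x <- 1:10
keep <- x > 5 & x %% 2 == 0   # greater than 5 AND even

print(keep)
print(x[keep])        # values that meet the condition
print(which(keep))    # positions that meet the condition
print(sum(keep))      # TRUE counts as 1, so this counts matches

# The same idea filters the rows of a data frame
df <- data.frame(id = c("a", "b", "c"), score = c(3, 8, 6))
high <- df[df$score > 5, ]
print(high)
```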

1.7 Vectorization and brief introduction to loops, conditional expressions and functions:

1.7.1 Vectorization

Going back to the arithmetic operators: when they are applied to a data structure, NO special process is needed for the operation to be carried out on each of the values in the object, as long as the values are numeric or can be coerced to numeric. These operations are known as "vectorized operations".

For the following example, we take the data structures from section 1.3 and assign them to a list named list.str.

list.str<-list('dt'=data.frame(x=c('a','b'),    #Data.frame
                               y=c(1,2),
                               stringsAsFactors = TRUE),
               # Since R 4.0.0 the stringsAsFactors argument of
               # data.frame() defaults to FALSE, so it is set explicitly
               'l'=list('a'='a',                #List
                        'b'=c(1,2,3)),
               'm'=matrix(c(1:9),               #Matrix
                          nrow = 3,
                          ncol = 3),
               'a'=array(c(1:18),               #Array
                         dim = c(3,3,2)),
               'v'=c(1,2,3))                    #Vector

# Coerce factors from column x of table dt, from list.str, 
# to numeric and multiply them by two

as.numeric(list.str$dt$x)*2
## [1] 2 4

In this example, it is enough to declare that the object list.str$dt$x is coerced to numeric with as.numeric() and that the result is multiplied by 2: both the as.numeric() function and the *2 operation are applied in a vectorized fashion to the values of list.str$dt$x. Note that as.numeric() applied to a factor returns its internal level codes (1 for 'a', 2 for 'b'), not its labels.
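Coercing a factor deserves some care when its labels are themselves numbers. The following sketch, with an illustrative factor, contrasts the level codes returned by as.numeric() with the labels recovered by going through as.character() first.

```r
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                 # internal level codes
values <- as.numeric(as.character(f))   # the labels, read as numbers

print(levels(f))
print(codes)
print(values)
print(values * 2)
```

Both conversions are vectorized, but only the second one gives back the numbers that were originally written.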

In contrast, many functions are not applied automatically to the components of an object, such as the columns of a table or the elements of a list. As an example, let's take the class() function applied to list.str.

#If we apply class() to list.str we get:
class(list.str)
## [1] "list"
#If we apply class() to list.str$dt we get:
class(list.str$dt)
## [1] "data.frame"
#If we apply class() to list.str$dt$x we get:
class(list.str$dt$x)
## [1] "factor"
#If we apply class() to list.str$dt$y we get:
class(list.str$dt$y)
## [1] "numeric"

As you can see, we have to call class() on list.str$dt$x and list.str$dt$y individually.

Is there a way to get the class of both columns of list.str$dt without calling class() on each one individually?

1.7.2 The for loop.

The previously mentioned problem can be solved with a for loop.

The for loop is used to iterate or repeat statements, or blocks of statements a specified number of times, and this number is defined by an index or counter.

It has the following structure:

# for (i in x){
#   block of instructions to iterate through each item 'i' in x
# }
  • As an example, we are going to build a loop that displays the results of the multiplication table (times table) for the number 2 on the console, based on the sequence x.
  • In this case, the multiplier is fixed and is equal to 2
# We define the variable that will determine the index for the loop
  
x=c(1:10) # sequence from 1 to 10.

# "for(i in x)" means: "for (each item 'i' in x)"
for (i in x){
  #instruction: show in console the result of multiplying x by 2
  
  x[i]<-x[i]*2 # for each 'i' in x, we are going to assign
               # each element 'i' in x multiplied by 2
  
  print(x[i])  # print each element 'i' of x in console
  
  #The function 'print()' shows in console what is inside the parentheses
}
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
## [1] 16
## [1] 18
## [1] 20

A little more sophisticated:

  • In this case, the multiplier will not be fixed.
  • Now we are going to build a loop that shows in console the result of multiplying x by m.
# We define a sequence in x
x=c(1:10) 
# We define the (empty) object y where the results of the loop will be stored.
y=c() 

#We define the multiplier m
m=2

# loop:
for (i in x){
  y[i]<-x[i]*m 
  
  print(paste(x[i],"x",m, "is",y[i])) 
  
  # The function 'paste()' concatenates the elements inside the parentheses.
}
## [1] "1 x 2 is 2"
## [1] "2 x 2 is 4"
## [1] "3 x 2 is 6"
## [1] "4 x 2 is 8"
## [1] "5 x 2 is 10"
## [1] "6 x 2 is 12"
## [1] "7 x 2 is 14"
## [1] "8 x 2 is 16"
## [1] "9 x 2 is 18"
## [1] "10 x 2 is 20"
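
In the loops above, for (i in x) works because x holds the numbers 1 to 10, so every value is also a valid position. When the values of a vector are not its positions, a common idiom is to iterate over seq_along(x). The sketch below uses it with illustrative values; seq_along() and numeric() are base R functions not introduced in the text.

```r
x <- c(5, 7, 9)
m <- 2

# Pre-allocate the result instead of growing it inside the loop
y <- numeric(length(x))

# seq_along(x) yields the positions 1, 2, 3, regardless of the
# values stored in x
for (i in seq_along(x)) {
  y[i] <- x[i] * m
}
print(y)

# The vectorized form gives the same result without a loop
print(identical(y, x * m))
```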

Applying the for loop to the problem

for (i in colnames(list.str$dt)){ # for (each i in table column names)
  
  print(paste("class of column",        # show in console:
              i,                        # class of column i
              "is",                     # is 'class(table[[i]])' 
              class(list.str$dt[[i]])))
}
## [1] "class of column x is factor"
## [1] "class of column y is numeric"
# The 'colnames()' function gets the column names of a table.

Note that we use the x[[i]] notation inside the class() function. This is because this notation allows selecting columns either by numerical index, or a string of characters defined in i at each iteration of the statement, which is not possible with the list.str$dt$i notation, because the x$j form requires j to be defined literally.

The x[,i] notation can also be used here.
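
For comparison, here is a sketch of the same task written with sapply(), a base R function that is not covered in this section and is shown only as one possible alternative to the loop. The data frame is a small stand-in for list.str$dt.

```r
df <- data.frame(x = c("a", "b"), y = c(1, 2),
                 stringsAsFactors = TRUE)

# Loop version, selecting each column by name with [[ ]]
for (col in colnames(df)) {
  print(paste("class of column", col, "is", class(df[[col]])))
}

# sapply() calls class() on every column and collects the results
col_classes <- sapply(df, class)
print(col_classes)
```

The result of sapply() is a named character vector, with one element per column.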

1.7.3 Conditional Expressions if, else and ifelse()

Conditional expressions are used to perform one action if a condition is TRUE, and another action (or no action) if the condition is FALSE.

Structures:

if: perform action if condition is true, perform nothing if condition is false.

# if("condition"){
#   "do something if condition == TRUE"
# }

if and else: perform action if condition is true, perform another action if condition is false.

# if("condition"){
#   "do something if condition == TRUE"
# } else {
#   "do something if condition == FALSE"
# }

ifelse(): The structure is similar to the previous one; however, with ifelse() it is not necessary to use curly brackets {}, and it is vectorized.

# ifelse(condition,
#        do something if condition == TRUE,
#        do something else if condition == FALSE)

To illustrate how these expressions work, we are going to build a couple of simple examples that evaluate whether the elements of a vector 'x' are odd or even.

Example 1: if and else
  • We are going to evaluate if x is even or odd.
  • Remember that if is not vectorized: if you try to evaluate a vector whose length is n > 1, it will produce an error (since R 4.2.0; earlier versions issue a warning and use only the first element).
# We assign a value to variable x
x=3

if(x%%2==0){                     #if (the remainder of x/2 is 0)
  
  print(paste(x, "is even"))      # print in console: x is even
} else{
  
    print(paste(x, "is odd"))  # else, print in console: x is odd
  
  }
## [1] "3 is odd"
Example 2: ifelse()
  • We are going to evaluate if x is even or odd.
  • This process is vectorized, so it is valid to evaluate a vector whose length is n > 1.
x<-c(1:10)

ifelse(x%%2==0,
       paste(x, "is even"),
       paste(x, "is odd"))
##  [1] "1 is odd"   "2 is even"  "3 is odd"   "4 is even"  "5 is odd"  
##  [6] "6 is even"  "7 is odd"   "8 is even"  "9 is odd"   "10 is even"
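
The contrast between the two forms can be sketched as follows. all() is a base R function not introduced above; it is used here as one way to reduce a vector condition to the single TRUE or FALSE that if expects. The values and names are illustrative.

```r
x <- c(1, 4, 7)

# if needs a single TRUE or FALSE, so summarise the vector first
if (all(x > 0)) {
  msg <- "all values are positive"
} else {
  msg <- "at least one value is not positive"
}
print(msg)

# To apply if/else element by element, loop over the positions
labels_loop <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] %% 2 == 0) {
    labels_loop[i] <- "even"
  } else {
    labels_loop[i] <- "odd"
  }
}

# ifelse() does the same in one vectorized call
labels_vec <- ifelse(x %% 2 == 0, "even", "odd")
print(identical(labels_loop, labels_vec))
```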

1.7.4 Brief introduction to functions

Up to this point we have already used some functions that come preloaded in R.

In programming, functions are used to incorporate sets of instructions that are either desired to be used repeatedly or that, due to their complexity, are best self-contained in a subprogram and called when necessary.

A function is a piece of code written to carry out a specific task; it may or may not accept arguments or parameters and may or may not return one or more values.

When should functions be created? In general, the rule of thumb is that if you have a repetitive task that requires multiple instructions, it’s best to create a function to do it.

There are 2 basic types of functions in R:

  • Preloaded Functions, which come with the various R packages.
    • These are the ones that are most frequently used, some examples of packages containing these functions are:

      • tidyverse: A set of packages for manipulating data, among them dplyr. The syntax of these packages is usually very pleasant for beginners and, for many R users, it is the standard for writing usable scripts or programs.
      • data.table: A very useful package for manipulating large datasets. Its syntax is not as simple as that of tidyverse; in this course we focus on simple use cases, without delving into more advanced features.
      • ggplot2: A package for generating graphs in R. It is very versatile and customizable, and it comfortably rivals Tableau and Python.
      • base: These are the functions that are preloaded by default in R. One major advantage of learning how to use the base functions and syntax is that you don’t have to load extra packages for every minor thing that might arise.
  • User Defined Functions (UDFs), which are functions that the user builds. These in turn are divided into 2:
    • Named, which are those that can be called explicitly by the assigned name.
    • Anonymous, which only have the arguments and the body of the function, these are generally used within other functions.

For now we will concentrate on UDFs.

The basic structure of a UDF is:

# functionName<-function(arguments){
#   body of the function
#   } 

# or

# function(arguments){
#   body of the function
#   } 
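
As a minimal sketch of both forms, the snippet below defines a named function and then passes an anonymous one to sapply(). The function names square and power are hypothetical and used only for illustration, and sapply() is a base R function not covered in this section.

```r
# Named function: stored under a name and called by that name
square <- function(x) {
  x^2
}
print(square(4))

# Anonymous function: defined in place and passed to another function
squares <- sapply(1:3, function(x) x^2)
print(squares)

# Arguments can have default values
power <- function(x, exponent = 2) {
  x^exponent
}
print(power(3))
print(power(3, exponent = 3))
```

When a function has no explicit return(), it returns the value of the last expression evaluated in its body.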

To illustrate how to build a function, we will wrap the for loop from section 1.7.2 in a function.

ColClass<-function(x){    # Function 'ColClass' is going to have one
                          # argument: 'x'
  
  r<-c()                  # Initialize empty variable r
  
  for (i in colnames(x)){ # for (each i in column names
                          # of the table defined by  
                          # argument 'x')
    
    r[i]<-paste("Class of column", # assign: to each element of r, for each i
                                    # in x:
                
               i,                     # "the class of column 'i'
               
               "is",                  # is 'class(table x[[i]])'"
               
               class(x[[i]]))
  }
  

  return(as.character(r))     # Return r
  
   # "return()" returns the result of the variable inside the parentheses.
}

# Assign to variable 'z' the result of the function 'ColClass' applied to 'list.str$dt'

z=ColClass(x=list.str$dt)
z
## [1] "Class of column x is factor"  "Class of column y is numeric"

We can go a little further, and include a validation or check step to the function.

This extra step will validate if the argument x is an object of class data.frame, if the validation is TRUE the for loop will be performed as before, otherwise it will return a message.

ColClassVal<-function(x){
  if(is.data.frame(x)){ # If object x is a data.frame:
    r<-c()
    for (i in colnames(x)){
      r[i]<-paste("Class of column", 
               i,                     
               "is",
               class(x[[i]]))
    }
    
  return(as.character(r))
    
  } else {              # Else, print a message in console.
    
    print("Object x is not a 'data.frame'")
  }
}

# Assign to variable 't' the result of the function 'ColClassVal' applied to 'list.str$dt'
t=ColClassVal(x=list.str$dt)
t
## [1] "Class of column x is factor"  "Class of column y is numeric"
# Assign to variable 'f' the result of the function 'ColClassVal' applied to 'list.str$v'
f=ColClassVal(x=list.str$v)
## [1] "Object x is not a 'data.frame'"
f
## [1] "Object x is not a 'data.frame'"
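
Printing a message is one way to handle invalid input. Another common option, sketched below, is to raise an error with stop() so that the caller cannot mistake the message for a valid result. The function col_class_strict is a hypothetical variant written for this illustration, and stop(), tryCatch() and conditionMessage() are base R functions not covered in this section.

```r
col_class_strict <- function(x) {
  # Hypothetical variant of ColClassVal(): fail loudly on bad input
  if (!is.data.frame(x)) {
    stop("Object x is not a 'data.frame'")
  }
  r <- character(0)
  for (i in colnames(x)) {
    r[i] <- paste("Class of column", i, "is", class(x[[i]]))
  }
  as.character(r)
}

df <- data.frame(x = c("a", "b"), y = c(1, 2), stringsAsFactors = TRUE)
print(col_class_strict(df))

# tryCatch() intercepts the error so the script can continue
result <- tryCatch(
  col_class_strict(c(1, 2, 3)),
  error = function(e) conditionMessage(e)
)
print(result)
```

With ColClassVal(), the variable f ends up holding the message text because print() returns its argument; with stop(), nothing is returned unless the error is handled explicitly.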

If one wants to get the manual for using some preloaded function, it is only necessary to put the following in the console:

# ?functionName

For example, let’s look at the help page for the print() function.

# ?print

If one wants to query the arguments of some preloaded function or UDF:

# args(functionName)

Let’s look at the arguments of the ColClassVal() function

args(ColClassVal)
## function (x) 
## NULL

To learn a little more about functions:

1.8 R workspace and document management:

This section is covered entirely in the script of the chapter; the table below lists the functions we are going to use and what they do.

| Function | Description |
| --- | --- |
| getwd() | Get the current working directory. |
| dir.create('dirx') | Create a directory named 'dirx'. |
| setwd('dirx') | Change working directory to 'dirx'. |
| ls() | List all objects in the 'Global environment'. |
| list.files() | List all files in the working directory. |
| list.dirs() | List subdirectories within the working directory. |
| file.create('doc.R') | Create file 'doc.R' in working directory. |
| file.exists('doc.R') | Evaluate if the file 'doc.R' exists in the working directory. |
| file.info('doc.R') | Show metadata of file 'doc.R'. |
| file.rename('doc.R','doc2.R') | Rename file 'doc.R' to 'doc2.R'. |
| file.copy('doc2.R','doc3.R') | Copy file 'doc2.R' to 'doc3.R'. |
| file.path('dirx1','dirx2') | Build the path 'dirx1/dirx2'. |
| args(file.path) | Display arguments of function 'file.path()'. |
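
The following sketch strings several of these functions together. As an assumption made for safety, it works inside the session's temporary directory (tempdir()) rather than the working directory, and it cleans up with unlink(); neither function appears in the table above.

```r
# Work inside a throwaway directory so nothing else is touched
base_dir <- file.path(tempdir(), "dirx")
dir.create(base_dir)

doc1 <- file.path(base_dir, "doc.R")
doc2 <- file.path(base_dir, "doc2.R")
doc3 <- file.path(base_dir, "doc3.R")

file.create(doc1)
print(file.exists(doc1))

file.rename(doc1, doc2)       # doc.R no longer exists after this
file.copy(doc2, doc3)         # doc2.R and doc3.R now both exist

files_now <- sort(list.files(base_dir))
print(files_now)

# Remove the directory and its contents when done
unlink(base_dir, recursive = TRUE)
```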

1.9 Review:

This section will look at a very useful package to reinforce what we have seen so far.

The swirl package allows the user to learn R, using R directly, through an interactive interface that is responsive to input from the user.

To install and run package:

> install.packages("swirl")

> library(swirl)

> swirl()

After the initial setup you will be prompted to select a course. The course that deals with this chapter is R Programming: The basics of programming in R, although you can select and explore any of the options swirl() gives you.

Then a list of lessons will appear; the lessons pertaining to this chapter are 1-9:

1: Basic Building Blocks

2: Workspace and Files

3: Sequences of Numbers

4: Vectors

5: Missing Values

6: Subsetting Vectors

7: Matrices and Data Frames

8: Logic

9: Functions

The rest will be covered in the following chapters.

When you finish the lessons, swirl will most likely show you the following message:

| Would you like to receive credit for completing this course on

| Coursera.org?

Selecting NO is suggested, but do as you wish.