In this chapter we are going to see the most basic concepts required to use R functionally, that is, under the functional programming paradigm.
The <-, -> and = operators assign values to variables.
The %>% operator pipes a variable or the result of a function into another function; to use it, you need to install and load the magrittr package.
| Operator | Meaning | Syntax |
|---|---|---|
| <- | assignment to the left | x<-5 |
| -> | assignment to the right | 5->x |
| = | assignment to the left (alternative) | x=5 |
| %>% | piping | x<-c(4) %>% print() |
install.packages("magrittr")
library(magrittr)
x<-5
5->x
x=5
x<-c(4) %>% print()
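As a quick check, the three assignment forms can be compared directly. The sketch below uses only base R; the variable names (a, b, d, avg) are illustrative and not part of the chapter's script.

```r
# Three ways of assigning the same value (variable names are illustrative)
a <- 5      # leftward assignment
5 -> b      # rightward assignment
d = 5       # leftward assignment with '='

# All three variables hold the same value
identical(a, b)
identical(a, d)

# Inside a function call, '=' names an argument instead of assigning a
# variable: here it binds the 'x' argument of mean()
avg <- mean(x = c(1, 2, 3))
avg
```

This is one reason many R users prefer <- for assignment: it keeps assignment visually distinct from argument naming.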
In this section, we will focus on the basic data types that exist natively in R. Many other data types can be defined using object-oriented programming, and some of them are required by projects built on R, such as Bioconductor.
| Data type | Syntax | Result | Verification | Coercion |
|---|---|---|---|---|
| character | x<-'a' | 'a' | is.character() | as.character() |
| numeric | x<-1 | 1 | is.numeric() | as.numeric() |
| integer | x<-1L | 1L | is.integer() | as.integer() |
| logical | x<-TRUE | TRUE | is.logical() | as.logical() |
| complex | x<-1+4i | 1+4i | is.complex() | as.complex() |
| double | x<-4.22 | 4.22 | is.double() | as.double() |
| factor | x<-factor(c('a','b')) | Factor w/ 2 levels "a","b": 1 2 | is.factor() | as.factor() |
| NA | x<-NA | NA | is.na() | none (assign NA directly) |
| NaN | x<-NaN | NaN | is.nan() | none (assign NaN directly) |
| NULL | x<-NULL | NULL | is.null() | as.null() |
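The verification and coercion functions in the table can be tried on a single value. The following is a minimal sketch with illustrative variable names (x_chr, x_num, x_int, x_bad); it assumes nothing beyond base R.

```r
# A number stored as text is a character value, not a numeric one
x_chr <- "4.22"
is.character(x_chr)
is.numeric(x_chr)

# as.numeric() converts the text to a double
x_num <- as.numeric(x_chr)
is.numeric(x_num)
typeof(x_num)

# as.integer() drops the decimal part
x_int <- as.integer(x_num)
x_int

# Text that cannot be read as a number becomes NA; R also emits a warning,
# which is silenced here with suppressWarnings()
x_bad <- suppressWarnings(as.numeric("a"))
is.na(x_bad)
```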
c() is a function that combines elements into a single vector; its name stands for "combine".
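A short sketch of how c() behaves may help here. The variable names are illustrative; the last call shows that, because atomic vectors are homogeneous, mixing types makes R coerce every element to a common type.

```r
# Combine three numbers into one numeric vector
v_num <- c(1, 2, 3)
length(v_num)

# c() also joins existing vectors end to end
v_all <- c(v_num, 4, 5)
v_all

# Mixing types forces a common type: here every element becomes character
v_mix <- c(1, "a", TRUE)
class(v_mix)
v_mix
```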
| Type | is.vector | is.character | is.numeric | is.integer | is.logical | is.complex | is.double | is.factor | is.na | is.nan | is.null |
|---|---|---|---|---|---|---|---|---|---|---|---|
| character | T | T |  |  |  |  |  |  |  |  |  |
| numeric | T |  | T |  |  |  | T |  |  |  |  |
| integer | T |  | T | T |  |  |  |  |  |  |  |
| logical | T |  |  |  | T |  |  |  |  |  |  |
| complex | T |  |  |  |  | T |  |  |  |  |  |
| double | T |  | T |  |  |  | T |  |  |  |  |
| factor |  |  |  |  |  |  |  | T |  |  |  |
| NA | T |  |  |  | T |  |  |  | T |  |  |
| NaN | T |  | T |  |  |  | T |  | T | T |  |
| NULL |  |  |  |  |  |  |  |  | logical(0) | logical(0) | T |
Some values (such as numeric, double and integer values, NA and NaN) also pass verification functions other than the one that corresponds to them. As can be seen, is.vector() returns FALSE for factors, and the NULL object is used to indicate empty variables.
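The overlaps described above can be reproduced in the console. This sketch collects a few of the checks in a named logical vector (the names are illustrative labels, not R functions) and then inspects NULL.

```r
f <- factor(c("a", "b"))

checks <- c(
  factor_is_vector = is.vector(f),    # FALSE: a factor carries attributes
  factor_is_factor = is.factor(f),    # TRUE
  NA_is_na         = is.na(NA),       # TRUE
  NA_is_nan        = is.nan(NA),      # FALSE: NA is missing, not "not a number"
  NaN_is_na        = is.na(NaN),      # TRUE: NaN also counts as missing
  NaN_is_nan       = is.nan(NaN),     # TRUE
  int_is_numeric   = is.numeric(1L),  # TRUE: integers are numeric
  num_is_integer   = is.integer(1),   # FALSE: 1 without 'L' is a double
  num_is_double    = is.double(1)     # TRUE
)
checks

# NULL is an empty object of length 0, so is.na() has nothing to test
empty <- NULL
is.null(empty)
length(empty)
is.na(empty)
```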
| Dimensions | Structure | Type | Syntax | Verification | Coercion |
|---|---|---|---|---|---|
| 1 | Atomic vector | Homogeneous | v <- c(1,2,3) | is.vector() | as.vector() |
| 2 | Matrix | Homogeneous | m <- matrix(c(1:9), nrow = 3, ncol = 3) | is.matrix() | as.matrix() |
| n | Array | Homogeneous | a <- array(c(1:18), dim = c(3,3,2)) | is.array() | as.array() |
| 1 | List | Heterogeneous | l <- list('a'='a', 'b'=c(1,2,3)) | is.list() | as.list() |
| 2 | Table (data.frame) | Heterogeneous | dt <- data.frame(x=c('a','b'), y=c(1,2)) | is.data.frame() | as.data.frame() |
Dimensions refers to the (i,j) coordinates in a matrix or table: i are the rows, j are the columns.
Type refers to what type of data the structure can contain.
NOTE: In R versions before 4.0.0, creating a data.frame, coercing to data.frame or importing data as data.frame coerces all character columns to factors by default; set the argument stringsAsFactors = FALSE if this is not desired. Since R 4.0.0, the default is stringsAsFactors = FALSE, so set it to TRUE when factors are wanted.
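Setting stringsAsFactors explicitly gives the same result in any R version. The sketch below builds the same table twice (the names dt_fac and dt_chr are illustrative) and compares the class of the character column.

```r
# Same data, with and without conversion of character columns to factors
dt_fac <- data.frame(x = c("a", "b"), y = c(1, 2), stringsAsFactors = TRUE)
dt_chr <- data.frame(x = c("a", "b"), y = c(1, 2), stringsAsFactors = FALSE)

class(dt_fac$x)   # "factor"
class(dt_chr$x)   # "character"

# A factor stores its distinct values as levels
levels(dt_fac$x)
```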
The access notation may seem confusing, but in the script you can see in more detail how each one works.
| Syntax | Objects | Description |
|---|---|---|
| x[c(i)] | Vectors and Lists | Selects the elements of object x specified by i. i can be a vector of 1 or n elements (inside c()) of type integer, character (literal names of the elements) or logical. Applied to a vector it returns a vector; applied to a list it returns a list. |
| x[[c(i)]] | Lists and Data.frames | Returns the single element of x found in position i. i can be a vector of type integer or character, of length 1, or of length n if it is inside c(); with length n it indexes recursively, so x[[c(1,2)]] is x[[1]][[2]]. |
| x$j | Lists and Data.frames | Returns the element named j of object x. |
| x[c(i),c(j)] | Data.frames and Matrices | Returns the object in row i and column j. i and j can be vectors of type integer or character (literal names of the elements), of length 1, or of length n if they are inside c(). |
| x[i,j,h] | Arrays | Selects the element of object x with coordinates i,j,h: i are the rows, j are the columns, and h is the index of the matrix inside the array. |
| x[[c(i)]][c(i),c(j)] | Lists and Data.frames | Combines the notation x[[i]] with x[i] or x[i,j]. In data.frames, x[[j]][i] works the same way as x[i,j] or x$j[i]. In lists, x[[i]][i,j] applies when element i is a data.frame or matrix with coordinates i,j; likewise, x[[i]][i] applies when element i is a one-dimensional vector. |
These notations can sometimes be combined with each other. An example that shows them all in action is a list made up of all the structures mentioned in section 1.3; this example is addressed in the script of this section.
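Before turning to the script, the notations can be tried on small objects. The following sketch rebuilds the structures from section 1.3 under illustrative names (v, l, m, arr, dt) and applies each access notation once; a named vector is used so that selection by name can be shown.

```r
v   <- c(a = 10, b = 20, c = 30)            # named atomic vector
l   <- list(a = "a", b = c(1, 2, 3))        # list
m   <- matrix(1:9, nrow = 3, ncol = 3)      # matrix, filled column by column
arr <- array(1:18, dim = c(3, 3, 2))        # array made of two 3x3 matrices
dt  <- data.frame(x = c("a", "b"), y = c(1, 2))

# x[i]: integer, character or logical index; a vector stays a vector
v[c(1, 3)]
v["b"]
v[c(TRUE, FALSE, TRUE)]

# On a list, x[i] returns a list, x[[i]] and x$j return the element itself
sub_list <- l["b"]
element  <- l[["b"]]
l$b

# Combined notation: second value of element 'b'
l[["b"]][2]

# x[i, j] on a matrix and a data.frame; x[i, j, h] on an array
m_val   <- m[2, 3]
arr_val <- arr[2, 3, 2]
dt_val  <- dt[2, "y"]

# Equivalent ways of reaching the same cell of a data.frame
dt[["y"]][2]
dt$y[2]
```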
Arithmetic operators, as the name implies, are used to perform arithmetic operations on single numbers or on vectors.
| operator | description | example | result |
|---|---|---|---|
| + | addition | 1+2 | 3.0 |
| - | subtraction | 1-2 | -1.0 |
| * | multiplication | 1*2 | 2.0 |
| / | division | 23/2 | 11.5 |
| ^ or ** | power | 2^3 | 8.0 |
| %% | modulo (remainder of division) | 23 %% 2 | 1.0 |
| %/% | integer division (integer part of quotient) | 23 %/% 2 | 11.0 |
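The same operators work element by element on vectors. This is a small sketch with illustrative names (x, q, r) that also checks how integer division and modulo fit together.

```r
x <- c(1, 2, 3, 4)

# The operation is applied to every element of the vector
x + 1
x * 2
x ^ 2

# Two vectors of equal length are combined position by position
x + c(10, 20, 30, 40)

# Integer division and modulo split a division into quotient and remainder
q <- 23 %/% 2
r <- 23 %% 2
q * 2 + r   # rebuilds the original number
```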
Logical operators are used to evaluate conditions.
Logical operators can be used in conjunction with data structure access notations to be able to subset them.
| operator | description | Syntax | result |
|---|---|---|---|
| >, < | greater than, less than | x[x>5] | [1] 6 7 8 9 10 |
| >=, <= | greater than or equal to, less than or equal to | x[x<=5] | [1] 1 2 3 4 5 |
| == | equal to | x[x==5] | [1] 5 |
| != | not equal to | x[x!=5] | [1] 1 2 3 4 6 7 8 9 10 |
| !x | NOT (boolean) x | x[!(x==5)] | [1] 1 2 3 4 6 7 8 9 10 |
| x \| y | x OR (boolean) y | x[x > 8 \| x < 5] | [1] 1 2 3 4 9 10 |
| x & y | x AND (boolean) y | x[x > 5 & x <= 8] | [1] 6 7 8 |
| isTRUE(x) | Is x TRUE? | isTRUE(1>2) | [1] FALSE |
| x%in%y | elements of x in y | c(1,2)%in%x | [1] TRUE TRUE |
Let’s explore what happens in the “Syntax” column of the above table
with the x>5 condition.
# We assign to 'x' a sequence of values from 1 to 10
x=c(1:10)
# Which values of 'x' meet the condition 'x>5'?
x
## [1] 1 2 3 4 5 6 7 8 9 10
x>5
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
# To visualize them better, we apply the 'cbind' ('column bind') or
# 'rbind' ('row bind') function to bind 'x' and 'x>5'
# as columns or rows of a table.
# Note: The result of x>5 is coerced to integers,
# so TRUE=1 and FALSE=0
cbind(x,y=as.logical(x>5))
## x y
## [1,] 1 0
## [2,] 2 0
## [3,] 3 0
## [4,] 4 0
## [5,] 5 0
## [6,] 6 1
## [7,] 7 1
## [8,] 8 1
## [9,] 9 1
## [10,] 10 1
rbind(x,y=as.logical(x>5))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## x 1 2 3 4 5 6 7 8 9 10
## y 0 0 0 0 0 1 1 1 1 1
# Alternatively, what we saw in the previous sections can be applied:
# We create an empty matrix with 2 columns and 10 rows
z<-matrix(ncol=2,
nrow = 10)
# We assign to the first column of the matrix z the values of 'x'
z[,1]<-x
# We assign to the second column of matrix z the values of 'x>5'
z[,2]<-x>5
z
## [,1] [,2]
## [1,] 1 0
## [2,] 2 0
## [3,] 3 0
## [4,] 4 0
## [5,] 5 0
## [6,] 6 1
## [7,] 7 1
## [8,] 8 1
## [9,] 9 1
## [10,] 10 1
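The same logical vectors can subset the rows of a table. The sketch below uses a hypothetical data.frame named scores, which is not part of the chapter's script, to combine logical operators with the x[i,j] notation.

```r
# Hypothetical table used only for this illustration
scores <- data.frame(id = c("a", "b", "c", "d"),
                     value = c(3, 8, 5, 10),
                     stringsAsFactors = FALSE)

# Rows where 'value' is greater than 5 (all columns are kept)
high <- scores[scores$value > 5, ]
high

# Two conditions joined with AND, returning only the 'id' column
mid_ids <- scores[scores$value > 4 & scores$value < 10, "id"]
mid_ids

# %in% keeps the elements that appear in a set of values
picked <- scores$id[scores$id %in% c("a", "d")]
picked

# Because TRUE counts as 1, sum() counts how many rows meet a condition
n_high <- sum(scores$value > 5)
n_high
```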
Going back to the arithmetic operators: when they are applied to a data structure, NO special process is needed for the operation to be carried out on each of the values in the object, as long as the values are numeric or can be coerced to numeric values. These operations are known as “vectorized operations”.
For the following example, we are going to take the table structures
from section 1.3 and assign them to a list named
list.str.
list.str<-list('dt'=data.frame(x=c('a','b'), #Data.frame
                               y=c(1,2),
                               # Since R 4.0.0, stringsAsFactors defaults to FALSE,
                               # so it is set to TRUE explicitly here
                               stringsAsFactors = TRUE),
               'l'=list('a'='a', #List
                        'b'=c(1,2,3)),
               'm'=matrix(c(1:9), #Matrix
                          nrow = 3,
                          ncol = 3),
               'a'=array(c(1:18), #Array
                         dim = c(3,3,2)),
               'v'=c(1,2,3)) #Vector
# Coerce the factor in column x of table dt, from list.str, to numeric
# (as.numeric() returns the integer level codes 1 and 2)
# and multiply the result by two
as.numeric(list.str$dt$x)*2
## [1] 2 4
In this example, it is enough to declare that the object list.str$dt$x is going to be coerced to numeric by means of as.numeric(), and that the result is going to be multiplied by 2. Both the as.numeric() function and the *2 operation are applied in a vectorized fashion to the values of list.str$dt$x.
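One caveat is worth a quick sketch: as.numeric() applied to a factor returns the internal level codes, not the labels. The factor f below is a hypothetical example whose labels look like numbers, which is where the difference matters.

```r
# A factor whose labels happen to be digits
f <- factor(c("10", "20", "10"))

# as.numeric() returns the level codes (position of each label in levels(f))
codes <- as.numeric(f)
codes

# Converting to character first recovers the numbers written in the labels
values <- as.numeric(as.character(f))
values

# Both conversions are vectorized, and so is the multiplication
values * 2
```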
In contrast, many functions are not vectorized over some types of objects, such as the columns of a table or the elements of a list. As an example, let’s take the class() function applied to list.str.
#If we apply class() to list.str we get:
class(list.str)
## [1] "list"
#If we apply class() to list.str$dt we get:
class(list.str$dt)
## [1] "data.frame"
#If we apply class() to list.str$dt$x we get:
class(list.str$dt$x)
## [1] "factor"
#If we apply class() to list.str$dt$y we get:
class(list.str$dt$y)
## [1] "numeric"
As you can see, we have to declare class() for
list.str$dt$x and list.str$dt$y
individually.
Is there a way for us to get the class of both columns of
list.str$dt without having to declare the
class() function individually?
for loop
The previously mentioned problem can be solved with a for loop.
The for loop is used to iterate or repeat statements, or
blocks of statements a specified number of times, and this number is
defined by an index or counter.
It has the following structure:
# for (i in x){
# block of instructions to iterate through each item 'i' in x
# }
# We define the variable that will determine the index for the loop
x=c(1:10) # sequence from 1 to 10.
# "for(i in x)" means: "for (each item 'i' in x)"
for (i in x){
  # Instruction: multiply each element of x by 2 and show the result in console
  x[i]<-x[i]*2 # replace element 'i' of x with its value multiplied by 2
  print(x[i])  # print element 'i' of x in console
  # The function 'print()' shows in console what is inside the parentheses
}
## [1] 2
## [1] 4
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
## [1] 16
## [1] 18
## [1] 20
A little more sophisticated:
# We define a sequence in x
x=c(1:10)
# We define the (empty) object y where the results of the loop will be stored.
y=c()
#We define the multiplier m
m=2
# loop:
for (i in x){
y[i]<-x[i]*m
print(paste(x[i],"x",m, "is",y[i]))
# The function 'paste()' concatenates the elements inside the parentheses.
}
## [1] "1 x 2 is 2"
## [1] "2 x 2 is 4"
## [1] "3 x 2 is 6"
## [1] "4 x 2 is 8"
## [1] "5 x 2 is 10"
## [1] "6 x 2 is 12"
## [1] "7 x 2 is 14"
## [1] "8 x 2 is 16"
## [1] "9 x 2 is 18"
## [1] "10 x 2 is 20"
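In the loops above, each value i of x can double as a position because x is the sequence 1 to 10, so values and positions coincide. For an arbitrary vector, a common alternative is to loop over the positions with seq_along(). The sketch below is one such variant, not the chapter's script; the variable names are illustrative.

```r
# Values no longer match their positions
x <- c(5, 10, 15)

# Create the result vector with its final length before the loop
y <- numeric(length(x))

# seq_along(x) generates the positions 1, 2, 3
for (i in seq_along(x)) {
  y[i] <- x[i] * 2
}
y

# The loop gives the same result as the vectorized operation
identical(y, x * 2)
```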
Applying the for loop to the problem:
for (i in colnames(list.str$dt)){ # for (each i in table column names)
print(paste("class of column", # show in console:
i, # class of column i
"is", # is 'class(table[[i]])'
class(list.str$dt[[i]])))
}
## [1] "class of column x is factor"
## [1] "class of column y is numeric"
# The 'colnames()' function gets the column names of a table.
Note that we use the x[[i]] notation inside the
class() function. This is because this notation allows
selecting columns either by numerical index, or a string of
characters defined in i at each iteration of the statement, which is not
possible with the list.str$dt$i notation, because the
x$j form requires j to be defined
literally.
The x[,i] notation can also be used here.
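The difference between the notations can be seen with a column name stored in a variable. This is a minimal sketch; dt and i are illustrative names, and the table mirrors the one from section 1.3.

```r
dt <- data.frame(x = c("a", "b"), y = c(1, 2), stringsAsFactors = FALSE)

# The column name is stored in a variable, as in each iteration of the loop
i <- "y"

by_double_bracket <- dt[[i]]   # uses the value of i: column "y"
by_row_col        <- dt[, i]   # same column through the x[,j] notation
by_dollar         <- dt$i      # looks for a column literally named "i"

by_double_bracket
by_row_col
is.null(by_dollar)
```

Since dt has no column named "i", the $ form returns NULL instead of the column.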
if, else and ifelse()
Conditional expressions are used to perform one action if a condition is TRUE, and another action (or no action) if the condition is FALSE.
Structures:
if: perform action if condition is true, perform nothing if condition is false.
# if("condition"){
# "do something if condition == TRUE"
# }
if and else: perform action if condition is true, perform another action if condition is false
# if("condition"){
# "do something if condition == TRUE"
# } else {
# "do something if condition == FALSE"
# }
ifelse(): The structure is similar to the previous one; however, with ifelse() it is not necessary to use curly brackets {}, and it is vectorized.
# ifelse(condition,
# do something if condition == TRUE,
# do something else if condition == FALSE)
To illustrate how these expressions work, we are going to build a couple of simple examples. We are going to evaluate whether the elements of a vector ‘x’ are odd or even.
if and else
# We assign a value to variable x
x=3
if(x%%2==0){ #if (the remainder of x/2 is 0)
print(paste(x, "is even")) # print in console: x is even
} else{
print(paste(x, "is odd")) # else, print in console: x is odd
}
## [1] "3 is odd"
ifelse()
x<-c(1:10)
ifelse(x%%2==0,
paste(x, "is even"),
paste(x, "is odd"))
## [1] "1 is odd" "2 is even" "3 is odd" "4 is even" "5 is odd"
## [6] "6 is even" "7 is odd" "8 is even" "9 is odd" "10 is even"
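To see what "vectorized" means here, the two approaches can be compared on the same vector. In this sketch (illustrative variable names), if and else need a loop to visit each element, while ifelse() evaluates the whole vector in one call.

```r
x <- c(1, 2, 3, 4)

# if / else test a single condition, so a loop visits each element
labels_loop <- character(length(x))
for (i in seq_along(x)) {
  if (x[i] %% 2 == 0) {
    labels_loop[i] <- "even"
  } else {
    labels_loop[i] <- "odd"
  }
}

# ifelse() evaluates the condition for every element at once
labels_vec <- ifelse(x %% 2 == 0, "even", "odd")

labels_loop
labels_vec
identical(labels_loop, labels_vec)
```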
Up to this point we have already used some functions that come preloaded in R.
In programming, functions are used to incorporate sets of instructions that are either desired to be used repeatedly or that, due to their complexity, are best self-contained in a subprogram and called when necessary.
A function is a piece of code written to carry out a specific task; it may or may not accept arguments or parameters and may or may not return one or more values.
When should functions be created? In general, the rule of thumb is that if you have a repetitive task that requires multiple instructions, it’s best to create a function to do it.
There are 2 basic types of functions in R: functions provided by R itself or by packages, and user-defined functions (UDFs).
Functions provided by packages are the ones most frequently used. Some examples of packages containing these functions are:
- tidyverse: A set of packages to manipulate data. Among them is dplyr, whose syntax is usually very pleasant for beginners and, for many R users, is the standard for creating usable scripts or programs.
- data.table: A very useful package to manipulate large datasets. Its syntax is not as simple as that of tidyverse; in this course we are going to focus on simple use cases, without delving into more advanced features.
- ggplot2: A package to generate graphs in R. It is very versatile and customizable, and it comfortably rivals Tableau and Python.
- base: These are the functions that are preloaded by default in R. One major advantage of learning how to use the base functions and syntax is that you don’t have to load extra packages for every minor thing that might arise.
For now we will concentrate on UDFs.
The basic structure of a UDF is:
# functionName<-function(arguments){
# body of the function
# }
# or
# function(arguments){
# body of the function
# }
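Both forms of the structure above can be sketched with a function that squares its argument. The name square is illustrative; the second form is an anonymous function, written directly where it is needed (here, inside sapply()).

```r
# Named function: stored in a variable and called by its name
square <- function(v) {
  v^2   # the value of the last expression is returned
}
named_result <- square(3)
named_result

# Anonymous function: defined in place, without a name
anon_result <- sapply(c(1, 2, 3), function(v) v^2)
anon_result
```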
To illustrate how to build a function, we will turn the for loop from section 1.7.2 into a function.
ColClass<-function(x){               # Function 'ColClass' has one argument: 'x'
  r<-c()                             # Initialize empty variable r
  for (i in colnames(x)){            # For each column name i of table 'x':
    r[i]<-paste("Class of column",   # store in element i of r the text
                i,                   # "Class of column 'i'
                "is",                # is 'class(x[[i]])'"
                class(x[[i]]))
  }
  return(as.character(r))            # Return r as a character vector
  # 'return()' returns the value of the expression inside the parentheses.
}
# Assign to variable 'z' the result of the function 'ColClass' applied to 'list.str$dt'
z=ColClass(x=list.str$dt)
z
## [1] "Class of column x is factor" "Class of column y is numeric"
We can go a little further and add a validation step to the function.
This extra step will validate if the argument x is an
object of class data.frame, if the validation is
TRUE the for loop will be performed as before,
otherwise it will return a message.
ColClassVal<-function(x){
if(is.data.frame(x)){ # If object x is a data.frame:
r<-c()
for (i in colnames(x)){
r[i]<-paste("Class of column",
i,
"is",
class(x[[i]]))
}
return(as.character(r))
} else { # Else, print a message in console.
print("Object x is not a 'data.frame'")
}
}
# Assign to variable 't' the result of the function 'ColClassVal' applied to 'list.str$dt'
t=ColClassVal(x=list.str$dt)
t
## [1] "Class of column x is factor" "Class of column y is numeric"
# Assign to variable 'f' the result of the function 'ColClassVal' applied to 'list.str$v'
f=ColClassVal(x=list.str$v)
## [1] "Object x is not a 'data.frame'"
f
## [1] "Object x is not a 'data.frame'"
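Because print() returns its argument, f ends up holding the message, which can be mistaken for a valid result. A possible alternative, sketched below, is to raise an error with stop(). ColClassStrict and dt_example are hypothetical names introduced for this illustration; this is a variant of ColClassVal(), not the chapter's function.

```r
# Hypothetical variant of ColClassVal() that stops with an error
# when 'x' is not a data.frame
ColClassStrict <- function(x) {
  if (!is.data.frame(x)) {
    stop("Object x is not a 'data.frame'")
  }
  r <- c()
  for (i in colnames(x)) {
    r[i] <- paste("Class of column", i, "is", class(x[[i]]))
  }
  as.character(r)
}

dt_example <- data.frame(x = c("a", "b"), y = c(1, 2), stringsAsFactors = TRUE)
ok <- ColClassStrict(dt_example)
ok

# tryCatch() captures the error message so that the script keeps running
msg <- tryCatch(ColClassStrict(c(1, 2, 3)),
                error = function(e) conditionMessage(e))
msg
```

With this design no value is assigned when validation fails, unless the error is handled explicitly as with tryCatch().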
To open the manual page of a preloaded function, type the following in the console:
# ?functionName
For example, let’s look at the help page for the print()
function.
# ?print
If one wants to query the arguments of some preloaded function or UDF:
# args(functionName)
Let’s look at the arguments of the ColClassVal()
function
args(ColClassVal)
## function (x)
## NULL
This section will be covered entirely in the script of the chapter. Below is a table with the functions we are going to use and what they do.
| Function | Description |
|---|---|
| getwd() | Get the current working directory. |
| dir.create('dirx') | Create a directory named 'dirx' inside the working directory. |
| setwd('dirx') | Change the working directory to 'dirx'. |
| ls() | List all objects in the global environment. |
| list.files() | List all files in the working directory. |
| list.dirs() | List subdirectories within the working directory. |
| file.create('doc.R') | Create file 'doc.R' in the working directory. |
| file.exists('doc.R') | Evaluate if the file 'doc.R' exists in the working directory. |
| file.info('doc.R') | Show metadata of file 'doc.R'. |
| file.rename('doc.R','doc2.R') | Rename file 'doc.R' to 'doc2.R'. |
| file.copy('doc2.R','doc3.R') | Copy file 'doc2.R' to 'doc3.R'. |
| file.path('dirx1','dirx2') | Build the path 'dirx1/dirx2'. |
| args(file.path) | Display arguments of function 'file.path()'. |
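As a preview of the script, the sketch below chains several of these functions. To avoid touching your working directory, it works inside R's temporary directory (tempdir()) and cleans up with unlink(); both are base R functions not listed in the table, and the variable names are illustrative.

```r
# Build paths inside the session's temporary directory
base_dir <- file.path(tempdir(), "dirx")
doc1 <- file.path(base_dir, "doc.R")
doc2 <- file.path(base_dir, "doc2.R")
doc3 <- file.path(base_dir, "doc3.R")

# Create the directory and an empty file inside it
dir.create(base_dir, showWarnings = FALSE)
file.create(doc1)
created <- file.exists(doc1)
created

# Rename the file, then copy it under a new name
file.rename(doc1, doc2)
file.copy(doc2, doc3)

# List what the directory contains now
files_found <- list.files(base_dir)
files_found

# Remove the directory and its contents
unlink(base_dir, recursive = TRUE)
```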
This section will look at a very useful package to reinforce what we have seen so far.
The swirl package allows the user to learn R, using R
directly, through an interactive interface that is responsive to
input from the user.
To install and run the package:
> install.packages("swirl")
> library(swirl)
> swirl()
After initial setup you will be prompted to select a course. The course that deals with this chapter is R Programming: The basics of programming in R, although you can select and explore any of the options swirl() gives you.
Then a list of lessons will appear. The lessons pertaining to this chapter are 1-9:
1: Basic Building Blocks
2: Workspace and Files
3: Sequences of Numbers
4: Vectors
5: Missing Values
6: Subsetting Vectors
7: Matrices and Data Frames
8: Logic
9: Functions
The rest will be covered in the following chapters.
When you finish the lessons, most likely swirl will show
you the following message:
| Would you like to receive credit for completing this course on
| Coursera.org?
Selecting NO is suggested, but do as you wish.