Control structures (statements) are constructs used to control the execution and flow of a program based on the conditions they contain. They let the program make a decision after evaluating a variable or an expression.
Basically, control structures allow you to put some “logic” into your R code, rather than always executing the same R code every time. Control structures allow you to respond to inputs or to features of the data and execute different R expressions accordingly. Commonly used control structures are:
if and else: testing a condition and acting on it
for: execute a loop a fixed number of times
while: execute a loop while a condition is true
repeat: execute an infinite loop (you must break out of it to stop)
break: break the execution of a loop
next: skip an iteration of a loop
Most control structures are not used in interactive sessions, but rather when writing functions or longer expressions. However, these constructs do not have to be used in functions, and it’s a good idea to become familiar with them before you start writing functions.
if-else
The if-else combination is probably the most commonly used control structure in R (and in many other programming languages). This structure allows you to test a condition and act on it depending on whether it is TRUE or FALSE. You can chain a series of tests by following the initial if with any number of else if clauses and, optionally, a final else. The general structure is:
if (condition1) {
  # do something
} else if (condition2) {
  # do something different from condition1
} else {
  # optional: do something different from all of the above
}
if-else Flow Chart
Here is an example of a valid if-else structure.
set.seed(123)                # make the random draw reproducible
x <- runif(1, 0, 100)        # pick one random number between 0 and 100
grade <- function(x) {       # avoid the name `ifelse`, which would mask base R's ifelse()
  if (x > 80) {              # condition 1
    print('A')               # result for condition 1
  } else if (x > 70) {       # condition 2: 70 < x <= 80
    print('B')               # result for condition 2
  } else {                   # everything else
    print('Fail')            # result for condition 3
  }
}
grade(x)
## [1] "Fail"
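The if statement above tests a single value. To grade a whole vector at once, base R's vectorized ifelse() function is a common alternative, which is also why naming our own function ifelse would be a bad idea: it would hide the base version. A minimal sketch, where grade_vec() is a hypothetical name and the cut-offs mirror the example above:

```r
# Hypothetical helper: grade every score in a numeric vector at once
grade_vec <- function(scores) {
  # base::ifelse() evaluates the test element-wise and picks the matching value
  ifelse(scores > 80, "A",
         ifelse(scores > 70, "B", "Fail"))
}

grades <- grade_vec(c(95, 75, 40))
print(grades)
```

Nesting ifelse() calls plays the role of the else if chain, applied to every element of scores.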
In machine learning tasks, we usually need to split the data set into a training set and a test set. The training set allows the algorithm to learn from the data. To evaluate our model, we use the test set to compute a performance measure. Base R does not have a dedicated function to split a data set in two, so we can write our own. Our function, split_data(), takes two arguments. The idea behind it is simple: we multiply the number of rows in the data set (i.e. the number of observations) by the training proportion, here 0.75. For instance, if we want a 75/25 split and our data set contains 100 rows, the function computes 0.75 * 100 = 75, so the first 75 rows become our training data and the remaining 25 rows become our test data.
We will use the airquality data set to test our user-defined function. The airquality data set has 153 rows, which we can check with the code below:
nrow(airquality)
## [1] 153
Now we can write the function. We use the generic argument name df instead of airquality because we want the function to work with any data frame, not only airquality:
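split_data <- function(df, train = TRUE) {
  n_rows <- nrow(df)                  # number of observations (avoid masking base length())
  total_row <- floor(n_rows * 0.75)   # number of rows in the training set
  split <- 1:total_row                # indices of the training rows
  if (train) {
    train_df <- df[split, ]           # first 75% of the rows
    return(train_df)
  } else {
    test_df <- df[-split, ]           # remaining 25% of the rows
    return(test_df)
  }
}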
train <- split_data(airquality, train = TRUE)  # split for training data
test <- split_data(airquality, train = FALSE)  # split for testing data
dim(train) # print dimensions of the training data
## [1] 114 6
dim(test)  # print dimensions of the testing data
## [1] 39 6
Of course, there are many packages that you can use to split your data into training and testing sets. Here, though, we just focus on how to create a function. We will learn more about splitting training and testing data in the chapter Data Manipulation.
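Because split_data() always takes the first rows, any ordering in the data (airquality is sorted by month and day) carries over into the split. A minimal sketch of a shuffled variant, assuming a hypothetical function name split_data_random(), a hypothetical prop argument for the training share, and a fixed seed argument so the train and test calls draw the same shuffle:

```r
# Hypothetical variant of split_data() that shuffles the rows before splitting
split_data_random <- function(df, train = TRUE, prop = 0.75, seed = 123) {
  set.seed(seed)                                # same seed -> same shuffle in both calls
  n_rows <- nrow(df)
  idx <- sample(n_rows, floor(n_rows * prop))   # random, non-repeating training row indices
  if (train) {
    df[idx, ]
  } else {
    df[-idx, ]                                  # every row not drawn for training
  }
}

train <- split_data_random(airquality, train = TRUE)
test <- split_data_random(airquality, train = FALSE)
dim(train)
dim(test)
```

Note that calling set.seed() inside a function also resets the global random number stream; that trade-off keeps this sketch short.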
for Loops
In my experience doing data analysis, I’ve found very few situations where a for loop was not sufficient when I needed to iterate over a list of elements or a range of numbers. A for loop can iterate over a vector, matrix, list, data frame, or almost any other object. On each pass, R assigns the next element of the sequence to the loop variable and evaluates the expressions in the loop body.
for Flow Chart
Let’s iterate over all the elements of a vector and print the current value.
fruit <- c('Apple', 'Orange', 'Papaya', 'Banana') # create fruit vector
for (i in fruit) {  # i takes each element of fruit in turn
  print(i)          # print the current element
}
## [1] "Apple"
## [1] "Orange"
## [1] "Papaya"
## [1] "Banana"
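When you need the position of each element as well as its value, a common pattern is to loop over indices with seq_along(). A short sketch, where labels is an illustrative name:

```r
fruit <- c('Apple', 'Orange', 'Papaya', 'Banana')
labels <- character(length(fruit))      # pre-allocate the result vector
for (i in seq_along(fruit)) {           # i takes the values 1, 2, 3, 4
  labels[i] <- paste(i, fruit[i])       # combine the position and the element
}
print(labels)
```

seq_along() is safer than 1:length(fruit): for an empty vector it yields no indices, whereas 1:length(x) gives c(1, 0) and runs the body twice.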
The next example computes the square of each integer from 1 to 3 and stores the results in a list:
squares <- list()                # create an empty list (avoid the name `list`, which masks base list())
for (i in seq(1, 3, by = 1)) {   # create a `for` statement
  squares[[i]] <- i * i          # store the square of i in slot i
}
print(squares)                   # output
## [[1]]
## [1] 1
##
## [[2]]
## [1] 4
##
## [[3]]
## [1] 9
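Growing a list inside a loop works, but when the number of iterations is known it is usually better to create the list at its final length first. A sketch of that pattern using vector("list", n); the names n and squares are illustrative:

```r
n <- 3
squares <- vector("list", n)   # a list with n empty (NULL) slots
for (i in seq_len(n)) {        # i takes the values 1, 2, 3
  squares[[i]] <- i * i        # fill slot i with the square of i
}
str(squares)
```

Pre-allocating avoids copying the list each time it grows, which matters for long loops.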
Imagine we have a data frame df and want to compute the average of each column. You could do it with copy-and-paste:
set.seed(123) # to ensure we generate the same data
df<- data.frame( # create dataframe
a = rnorm(10, 5, 1), # vector `a` with normal random numbers
b = rnorm(10, 5, 1), # vector `b` with normal random numbers
c = rnorm(10, 5, 1) # vector `c` with normal random numbers
)
mean(df$a) # calculate the average of `a`
## [1] 5.074626
mean(df$b) # calculate the average of `b`
## [1] 5.208622
mean(df$c) # calculate the average of `c`
## [1] 4.575441
But that breaks our rule of thumb: never copy and paste more than twice. Instead, we could use a for loop:
output <- vector("double", ncol(df)) # pre-allocate a numeric vector, one slot per column
for (i in seq_along(df)) {           # sequence: one index per column
  output[[i]] <- mean(df[[i]])       # body: store the mean of column i
}
output # the output result
## [1] 5.074626 5.208622 4.575441
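The pre-allocate, iterate, and fill pattern is common enough that base R provides it in the apply family. A sketch comparing the loop with vapply(), recreating the same df so the block runs on its own; numeric(1) declares that each call must return exactly one number:

```r
set.seed(123)
df <- data.frame(
  a = rnorm(10, 5, 1),
  b = rnorm(10, 5, 1),
  c = rnorm(10, 5, 1)
)

# for-loop version
output <- vector("double", ncol(df))
for (i in seq_along(df)) {
  output[[i]] <- mean(df[[i]])
}

# apply-family version: mean() is called once per column
output_vapply <- vapply(df, mean, numeric(1))
output_vapply   # a named vector: a, b, c
```

vapply() checks the type and length of every result, so it fails loudly instead of silently returning a list.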
A matrix has two dimensions: rows and columns. Therefore, to iterate over every element of a matrix, we nest two for loops: one for the rows and another for the columns.
mat <- matrix(data = seq(11, 20, by = 1), nrow = 5, ncol = 2) # create a 5 x 2 matrix
for (r in 1:nrow(mat)) {      # outer loop over the rows
  for (c in 1:ncol(mat)) {    # inner loop over the columns
    print(paste("Row", r, "and column", c, "have values of", mat[r, c]))
  }
}
## [1] "Row 1 and column 1 have values of 11"
## [1] "Row 1 and column 2 have values of 16"
## [1] "Row 2 and column 1 have values of 12"
## [1] "Row 2 and column 2 have values of 17"
## [1] "Row 3 and column 1 have values of 13"
## [1] "Row 3 and column 2 have values of 18"
## [1] "Row 4 and column 1 have values of 14"
## [1] "Row 4 and column 2 have values of 19"
## [1] "Row 5 and column 1 have values of 15"
## [1] "Row 5 and column 2 have values of 20"
mat # print the matrix to compare
## [,1] [,2]
## [1,] 11 16
## [2,] 12 17
## [3,] 13 18
## [4,] 14 19
## [5,] 15 20
Try doing the same thing for the data frame df we used above.
Here, we learn how to build nested loops, using the example of walking through a matrix:
# First we create a matrix full of 1s
my_matrix <- matrix(data = 1, nrow = 5, ncol = 5)
# Now we walk the matrix using the indexes and two nested for loops
# We put in each element its position number 1, 2, 3, etc. (row by row)
for (i in 1:nrow(my_matrix)) {
  for (j in 1:ncol(my_matrix)) {
    my_matrix[i, j] <- (i - 1) * ncol(my_matrix) + j
  }
}
my_matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10
## [3,] 11 12 13 14 15
## [4,] 16 17 18 19 20
## [5,] 21 22 23 24 25
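Nested loops make the indexing explicit, but in R the same matrix can often be built in one vectorized call. A sketch that checks the loop result against matrix() with byrow = TRUE, which fills the values row by row (by_row is an illustrative name):

```r
my_matrix <- matrix(data = 1, nrow = 5, ncol = 5)
for (i in 1:nrow(my_matrix)) {
  for (j in 1:ncol(my_matrix)) {
    my_matrix[i, j] <- (i - 1) * ncol(my_matrix) + j   # position number, row by row
  }
}

# Vectorized equivalent: fill 1..25 row by row instead of column by column
by_row <- matrix(1:25, nrow = 5, ncol = 5, byrow = TRUE)
all(my_matrix == by_row)
```

The loop version is still useful when each element depends on neighbouring elements or on a computation that has no vectorized form.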
Please create your own nested loops example!
Sometimes you want to use a for loop to modify an existing object. For example, remember our challenge from functions (Normalize). We wanted to rescale/normalize every column in a data frame:
set.seed(123) # to ensure we generate the same data
library(tidyverse) # load tidyverse library for `tibble`
df <- tibble( # dataframe using `tibble`
a = rnorm(10, 5, 1), # vector `a` with normal random numbers
b = rnorm(10, 5, 1), # vector `b` with normal random numbers
c = rnorm(10, 5, 1) # vector `c` with normal random numbers
)
rescale <- function(x) { # this function is the same as `normalize`
rng <- range(x, na.rm = TRUE)
(x - rng[1]) / (rng[2] - rng[1])
}
for (i in seq_along(df)) {
df[[i]] <- rescale(df[[i]])
}
df
## # A tibble: 10 x 3
## a b c
## <dbl> <dbl> <dbl>
## 1 0.236 0.850 0.210
## 2 0.347 0.620 0.499
## 3 0.948 0.631 0.225
## 4 0.448 0.553 0.326
## 5 0.468 0.376 0.361
## 6 1 1 0
## 7 0.579 0.657 0.859
## 8 0 0 0.626
## 9 0.194 0.711 0.187
## 10 0.275 0.398 1
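The same column-by-column modification can be written with lapply(). A sketch using a base R data.frame instead of a tibble, so it does not depend on tidyverse; assigning to df[] replaces the columns while keeping df a data frame:

```r
set.seed(123)
df <- data.frame(
  a = rnorm(10, 5, 1),
  b = rnorm(10, 5, 1),
  c = rnorm(10, 5, 1)
)

rescale <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# lapply() returns a list of rescaled columns; df[] <- puts them back into df
df[] <- lapply(df, rescale)
sapply(df, range)   # each column now spans 0 to 1
```

Writing df <- lapply(df, rescale) without the [] would turn df into a plain list.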
while Loops
Sometimes you don’t know in advance how many iterations a loop needs. This is common when doing simulations. For example, you might want to loop until you get three heads in a row. You can’t do that sort of iteration with a for loop. Instead, you can use a while loop. A while loop is simpler than a for loop because it has only two components: a condition and a body.
A while loop begins by testing a condition. If the condition is TRUE, it executes the loop body. Once the body has been executed, the condition is tested again, and so on, until the condition is FALSE, after which the loop exits. The syntax for a while loop is the following:
while (condition) {
  # body: executed repeatedly while condition is TRUE
}
while Flow Chart
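The “three heads in a row” example mentioned above can be sketched as follows. The helper flip() and the fixed seed are assumptions for illustration; the loop keeps running until the streak counter reaches 3, however many flips that takes:

```r
set.seed(1)                                 # assumption: fixed seed so the run is reproducible
flip <- function() sample(c("T", "H"), 1)   # hypothetical fair-coin helper

flips <- 0     # total number of flips so far
n_heads <- 0   # length of the current run of heads
while (n_heads < 3) {
  if (flip() == "H") {
    n_heads <- n_heads + 1   # extend the current streak
  } else {
    n_heads <- 0             # a tail resets the streak
  }
  flips <- flips + 1
}
flips   # number of flips needed to see three heads in a row
```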
Let’s go through a very simple example to understand the concept of a while loop. We create a loop that adds 2 to a stored variable after each run. To make sure the loop ends, we explicitly tell R to stop looping once the variable exceeds 12.
begin <- 0                                # create a variable with value 0
while (begin <= 12) {                     # loop while begin is at most 12
  cat('This is loop number', begin, '\n') # show which loop we are in
  begin <- begin + 2                      # add 2 to begin after each loop
  print(begin)
}
## This is loop number 0
## [1] 2
## This is loop number 2
## [1] 4
## This is loop number 4
## [1] 6
## This is loop number 6
## [1] 8
## This is loop number 8
## [1] 10
## This is loop number 10
## [1] 12
## This is loop number 12
## [1] 14
Try the same thing yourself, but this time create a sequence of the first ten prime numbers.
While loops can potentially result in infinite loops if not written properly. Sometimes there will be more than one condition in the test:
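z <- 5                        # a single number: && needs length-one conditions
while (z >= 3 && z <= 10) {   # both conditions must be TRUE to keep looping
  coin <- rbinom(1, 1, 0.5)   # simulate a fair coin flip: 1 or 0
  if (coin == 1) {
    z <- z + 1
  } else {
    z <- z - 1
  }
}
print(z)                      # z ends at either 2 or 11, depending on the coin flips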
Conditions combined with && are always evaluated from left to right, and evaluation stops as soon as the result is known. For example, in the above code, if z were less than 3, the second test (z <= 10) would not be evaluated.
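One way to see this short-circuit behaviour is to count how often the second test runs. In this sketch, second_test() is a hypothetical stand-in for z <= 10 that increments a counter each time it is called:

```r
calls <- 0                       # how many times the second test has run
second_test <- function() {      # hypothetical stand-in for `z <= 10`
  calls <<- calls + 1
  TRUE
}

z <- 2
res_and <- (z >= 3) && second_test()   # FALSE: the right side is skipped
res_or <- (z < 3) || second_test()     # TRUE: the right side is skipped
calls                                  # still 0

res_elementwise <- (z >= 3) & second_test()   # & always evaluates both sides
calls                                          # now 1
```

This is also why && and || suit if and while conditions, while the vectorized & and | suit element-wise comparisons.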
break Loops
A break statement stops a loop (for, while, or repeat) immediately and passes control to the first statement after the loop. Even though break statements are not commonly used in statistical or data analysis applications, they do have their uses.
break Flow Chart
Let’s iterate over the vector \(x\), which holds the consecutive numbers from 1 to 10. Inside the for loop, an if condition calls break when the current value is equal to 7. As we can see from the output, the loop prints 1 through 6 and terminates as soon as it encounters the break statement.
x <- 1:10
for (val in x) {
  if (val == 7) {
    break      # leave the loop entirely
  }
  print(val)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
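When loops are nested, break only leaves the innermost loop; the outer loop carries on with its next iteration. A small sketch that records which (i, j) pairs are visited (pairs is an illustrative name):

```r
pairs <- character(0)
for (i in 1:3) {
  for (j in 1:3) {
    if (j > i) {
      break                  # exits only the inner j loop
    }
    pairs <- c(pairs, paste0(i, "-", j))
  }
}
pairs
```

Only pairs with j <= i are recorded, because each break returns control to the outer loop rather than ending both loops.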
next Loops
A next statement is useful when we want to skip the current iteration of a loop immediately without terminating the loop, regardless of which iteration the loop is on.
next Flow Chart
Use the next statement inside a condition that checks whether the value is equal to 3. If it is, the rest of the current iteration is skipped (the value is not printed), but the loop continues with the next iteration.
x <- 1:10
for (val in x) {
  if (val == 3) {
    next       # skip the rest of this iteration
  }
  print(val)
}
## [1] 1
## [1] 2
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
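next also works in while loops, but there the loop variable has to be updated before next is reached; otherwise the skipped iteration never advances and the loop runs forever. A sketch that sums the odd numbers from 1 to 10 (odd_sum is an illustrative name):

```r
i <- 0
odd_sum <- 0
while (i < 10) {
  i <- i + 1            # increment first, so `next` cannot skip the update
  if (i %% 2 == 0) {
    next                # skip even numbers
  }
  odd_sum <- odd_sum + i
}
odd_sum   # 1 + 3 + 5 + 7 + 9
```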
repeat Loops
A repeat loop executes a block of code over and over. There is no built-in condition check to exit the loop, so we must explicitly put a condition inside the loop body and call break when it is met.
repeat Flow Chart
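A minimal repeat loop, as a sketch: it prints a counter and relies on the if condition with break to stop after five passes (count is an illustrative name):

```r
count <- 0
repeat {
  count <- count + 1
  print(count)
  if (count >= 5) {
    break        # the only way out of a repeat loop
  }
}
```

Without the break, this loop would never end, which is why the exit condition is usually written first when designing a repeat loop.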