Preamble

These notes give the reader an introduction to basic programming concepts using R. We shall discuss the following topics:

  • Usage of brackets
  • Creating functions
  • Controlling flow of a program

Dataset

We shall use the iris data for many of our examples. Let us look at the structure of the iris data:

Variable name   Description              Type
Sepal.Length    Sepal length of iris     Numeric
Sepal.Width     Sepal width of iris      Numeric
Petal.Length    Petal length of iris     Numeric
Petal.Width     Petal width of iris      Numeric
Species         Species of iris flower   Factor

Programming

We wish to perform various tasks using computers. To do so, we give the computer a recipe in a language it understands. This process is called programming. In these notes we shall look at some important tools in R programming.

Brackets

Brackets act as an important tool for performing tasks such as

  • Executing a function
  • Accessing elements of a data frame, matrix, etc.
  • Defining a function

Using parentheses

The '( )' operator is used to specify the input for a function in R. Every function call must use round brackets, even when no argument is passed. Example:

# Creating vector
Example=c(1,2,34)

# Finding sum
sum(Example)
## [1] 37
# Loading packages
library(dplyr)
library(kableExtra)  # provides kbl(), used below to print tables
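As a small additional sketch of how parentheses work, the lines below nest one call inside another. The vector name `marks` is made up for this illustration and is not part of the notes.

```r
# Hypothetical vector used only for this illustration
marks <- c(1, 2, 34)

# Parentheses call the function: mean() is applied to marks
avg <- mean(marks)

# Calls can be nested: the inner call runs first, then round() uses its result
rounded <- round(mean(marks), 1)

print(avg)
print(rounded)

# Without parentheses R refers to the function object itself, not a result
print(is.function(sum))
```

Typing a function name without the brackets does not execute it, which is why the last line reports that `sum` is a function rather than a number.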

Square brackets

Let us look at the first few rows of our data

# Consider iris data
data("iris")
kbl(head(iris))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa

Any data structure in R that holds more than one element uses '[ ]' to access one or more of its elements by index. Indexing in R starts at 1. For example, let us get the 3rd element of the 'Example' vector

Example[3]
## [1] 34

We can get multiple elements from a vector using a vector of indices. Let us obtain the 3rd, 5th, 9th and 55th values of Sepal.Length

iris$Sepal.Length[c(3,5,9,55)]
## [1] 4.7 5.0 4.4 6.5

In a data frame or matrix we can use indexing of the form '[m,n]' to access a particular element, where 'm' is the row number and 'n' is the column number.

# Obtaining 5th row 3rd column value
iris[5,3]
## [1] 1.4
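The same '[m,n]' form works on a matrix. The following is a minimal sketch using a small made-up matrix `m` (not part of the iris data); it also shows that leaving one index empty selects a whole row or column.

```r
# Hypothetical 2 x 3 matrix, filled column by column with 1 to 6
m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

# Element in row 2, column 3
print(m[2, 3])

# An empty index selects everything along that dimension
first_row <- m[1, ]   # whole first row
second_col <- m[, 2]  # whole second column

print(first_row)
print(second_col)
```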

We can also subset more than one row and column by using vectors to specify the indices

iris[1:15,c(2,3)]
##    Sepal.Width Petal.Length
## 1          3.5          1.4
## 2          3.0          1.4
## 3          3.2          1.3
## 4          3.1          1.5
## 5          3.6          1.4
## 6          3.9          1.7
## 7          3.4          1.4
## 8          3.4          1.5
## 9          2.9          1.4
## 10         3.1          1.5
## 11         3.7          1.5
## 12         3.4          1.6
## 13         3.0          1.4
## 14         3.0          1.1
## 15         4.0          1.2

We can use the "-" sign when specifying indices to leave out one or more rows or columns. Here we drop rows 2 to 150 and columns 2 and 3, which leaves only the first row

iris[-c(2:150),-c(2,3)]
##   Sepal.Length Petal.Width Species
## 1          5.1         0.2  setosa
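Negative indices behave the same way on a plain vector. A short sketch, assuming a made-up vector called `values`:

```r
# Hypothetical vector for illustration
values <- c(10, 20, 30, 40)

# Drop the 2nd element
without_second <- values[-2]

# Drop the 1st and 4th elements
without_ends <- values[-c(1, 4)]

print(without_second)
print(without_ends)
```

The original vector is not modified; each expression returns a new, shorter vector.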

Curly braces

We make use of '{}' when defining functions and when writing conditional statements and loops. Examples can be seen in the next topic, 'Functions'.

Functions

A function can be regarded as a collection of statements. One of the strengths of R is the ability to extend R by writing new functions.

The general form of a function is given by:

Function_name = function (arg1, arg2, ...)
{
Body of function: a collection of valid statements
}

Here arg1, arg2, etc. are the arguments of the function, that is, its inputs. The body of the function is where we give the instructions for the task we are interested in. An example:

greet=function(Name){
  cat("Hello ",Name)
}
greet('Data Scientist') 
## Hello  Data Scientist
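Functions can also take several arguments and hand a value back to the caller. The sketch below is only an illustration: the function `rect_area` and its argument names `len` and `wid` are invented here, and the default value for `wid` is an arbitrary choice.

```r
# Hypothetical function: area of a rectangle; wid defaults to 1
rect_area <- function(len, wid = 1) {
  area <- len * wid
  return(area)
}

# Both arguments supplied by position
area_full <- rect_area(4, 2.5)

# Only len supplied by name; wid falls back to its default
area_default <- rect_area(len = 3)

print(area_full)
print(area_default)
```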

Conditional statements and Flow control

Programming requires controlling the flow from one part of the program to another. Flow control occurs through the use of loops, conditional statements and branching, and stopping conditions that cause the program to stop executing one thing and execute something else or quit entirely. Conditional statements help in specifying which blocks of code to run in which situations.

if

An if statement checks a condition and executes the given instructions if the condition is met; it does nothing if the condition is not met.

if (test_expression) {
  statement
}
if(length(iris$Sepal.Length)>2){
  print("Sepal.Length has more than two elements")
}
## [1] "Sepal.Length has more than two elements"

if else

An if else statement runs one block of code if the test expression is TRUE and a different block, given after else, otherwise. It is used when an alternative action is needed for the case where the condition fails.

if (test_expression) {
  statement 1
} else {
  statement 2
}
if(length(iris$Sepal.Length)>115){
  print("Sepal.Length has more than 115 elements")
}else{
  print("false")
}
## [1] "Sepal.Length has more than 115 elements"
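An if else pair is often wrapped in a function so the same decision can be reused. The following is a minimal sketch; the helper name `parity` is made up for illustration.

```r
# Hypothetical helper: label a single whole number as even or odd
parity <- function(n) {
  if (n %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

print(parity(10))
print(parity(7))
```

Note that `else` sits on the same line as the closing brace of the if block, as in the template above.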

if else if

An if-else-if ladder chains one or more else if branches after an if. The conditions are tested in order and only the first branch whose condition is TRUE runs. It is used when several outcomes are possible and each needs a different response.

if (test_expression1) {
  statement1
} else if (test_expression2) {
  statement2
} else if (test_expression3) {
  statement3
} else {
  statement4
}

An example is shown below

n = length(iris$Sepal.Length)
if(n<10){
  print("small vector")
} else if(n<30){
  # reached only when n >= 10, so no lower bound is needed
  print("moderate vector")
} else {
  print("Sepal.Length of iris is a long vector")
}
## [1] "Sepal.Length of iris is a long vector"
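The ladder can be packaged as a function and tried on inputs of different lengths. This is only a sketch: the name `size_label` is hypothetical, and the cut-offs 10 and 30 are taken from the example above. Because each branch is tested only when the earlier ones have failed, a length of exactly 10 lands in the "moderate" branch.

```r
# Hypothetical helper using the cut-offs 10 and 30 from the example
size_label <- function(x) {
  n <- length(x)
  if (n < 10) {
    "small vector"
  } else if (n < 30) {
    "moderate vector"
  } else {
    "long vector"
  }
}

print(size_label(1:5))                # fewer than 10 elements
print(size_label(1:10))               # exactly 10 elements
print(size_label(iris$Sepal.Length))  # 150 elements
```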

ifelse

R has a convenient function, ifelse(), that applies such a test to every element of a vector at once. The syntax of the function is

ifelse(conditional expression,
       value to return where the condition is true,
       value to return where the condition is false)

Let us see an example. Here we shall create a new factor variable called PL_type in iris based on the values of Petal.Length (if it is less than 4.3, PL_type is "small"; otherwise it is "large")

iris$PL_type=ifelse(iris$Petal.Length<4.3,'small','large')
iris$PL_type=as.factor(iris$PL_type)

Let us look at the first few rows of the modified data

kbl(head(iris))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species PL_type
5.1 3.5 1.4 0.2 setosa small
4.9 3.0 1.4 0.2 setosa small
4.7 3.2 1.3 0.2 setosa small
4.6 3.1 1.5 0.2 setosa small
5.0 3.6 1.4 0.2 setosa small
5.4 3.9 1.7 0.4 setosa small
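To see the element-by-element behaviour of ifelse() without the rest of the data frame, here is a small sketch on a made-up vector `lengths_cm`; the 4.3 cut-off is the one used above.

```r
# Hypothetical petal lengths for illustration
lengths_cm <- c(1.4, 4.7, 6.0)

# One test, applied to every element; the result has the same length as the input
labels <- ifelse(lengths_cm < 4.3, "small", "large")

print(labels)
```

A plain if statement expects a single TRUE or FALSE, so it is not suited to labelling a whole column in one step; ifelse() is.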

We can nest further ifelse calls in the 3rd argument of the function, which gives the value to use where the conditional expression is false. Let us see an example. We shall create a new factor variable PL_type3 in iris based on the values of Petal.Length (if it is less than 1.6, PL_type3 is "small"; if it is greater than or equal to 1.6 and less than 5.1, it is "medium"; otherwise it is "large")

# The inner ifelse is reached only where Petal.Length >= 1.6
iris$PL_type3=ifelse(iris$Petal.Length<1.6,'small',ifelse(iris$Petal.Length<5.1,'medium','large'))
iris$PL_type3=as.factor(iris$PL_type3)

Let us see the first few lines of iris

kbl(head(iris))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species PL_type PL_type3
5.1 3.5 1.4 0.2 setosa small small
4.9 3.0 1.4 0.2 setosa small small
4.7 3.2 1.3 0.2 setosa small small
4.6 3.1 1.5 0.2 setosa small small
5.0 3.6 1.4 0.2 setosa small small
5.4 3.9 1.7 0.4 setosa small medium

for

A for loop is a repetition control structure that lets us run a block of code a specific number of times, once for each element of a vector.

for (value in vector) {
   statements
}
# Considering Species variable of iris and using index of elements
for(i in 1:5)
{
  print(iris$Species[i])
}
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica

for loops are flexible in that they are not limited to integers, or even numbers, in the input. We can pass character vectors, logical vectors, lists or expressions. In the example below the loop runs over a factor, which R coerces to a character vector.

# Considering Species variable of iris
for(species in iris$Species[c(47:53,108)])
  {
  print(species)
} 
## [1] "setosa"
## [1] "setosa"
## [1] "setosa"
## [1] "setosa"
## [1] "versicolor"
## [1] "versicolor"
## [1] "versicolor"
## [1] "virginica"
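A common use of a for loop is to build up a result step by step. The sketch below adds up a made-up vector `widths` by index, using `seq_along()` to generate the positions; the variable names are invented for illustration.

```r
# Hypothetical values for illustration
widths <- c(3.5, 3.0, 3.2)

# Running total, updated once per element
total <- 0
for (i in seq_along(widths)) {
  total <- total + widths[i]
}

print(total)

# The built-in sum() gives the same answer in one call
print(isTRUE(all.equal(total, sum(widths))))
```

`seq_along(widths)` is used instead of `1:length(widths)` because it yields an empty sequence for an empty vector, so the loop body is then skipped.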

repeat

The repeat loop executes the same code again and again until a break statement is reached, so the stop condition must be checked inside the loop.

repeat { 
   commands 
   if(condition) {
      break
   }
}
v <- "Repeating statement for "
cutoff <- 1

repeat {
   cat(v,cutoff," time \n")
   cutoff <- cutoff+1
   
   if(cutoff > 5) {
      break
   }
}
## Repeating statement for  1  time 
## Repeating statement for  2  time 
## Repeating statement for  3  time 
## Repeating statement for  4  time 
## Repeating statement for  5  time
print("Came out of loop to next statement")
## [1] "Came out of loop to next statement"
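Another small sketch of the same pattern: keep doubling a number until it passes a limit. The names `value` and `steps` and the limit of 100 are arbitrary choices for illustration.

```r
# Start from 1 and double until the value exceeds 100
value <- 1
steps <- 0

repeat {
  value <- value * 2
  steps <- steps + 1

  # Stop condition checked inside the loop body
  if (value > 100) {
    break
  }
}

print(value)
print(steps)
```

Because repeat has no condition of its own, the body always runs at least once before the check is made.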

while

The while loop executes the code again and again as long as the test expression is TRUE. The condition is checked before each pass, so the body may not run at all.

while (test_expression) {
   statement
}
x=20
while(x<50){
  print("Re do the exam")
  x= x+5
}
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
print(x)
## [1] 50
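The following sketch illustrates two points about while under made-up values: the body is skipped entirely when the condition is FALSE at the start, and the number of passes need not be known in advance. The variable names are hypothetical.

```r
# Hypothetical score that already meets the pass mark of 50
score <- 60
attempts <- 0

while (score < 50) {
  # Not reached, because the test is FALSE before the first pass
  attempts <- attempts + 1
  score <- score + 5
}
print(attempts)

# Count how many halvings bring 100 down to 1 or below
n <- 100
halvings <- 0
while (n > 1) {
  n <- n / 2
  halvings <- halvings + 1
}
print(halvings)
```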

break

When the break statement is encountered inside a loop, the loop is immediately terminated and program control resumes at the next statement following the loop.

var=1
repeat {
  # statements to be repeated go here
  var <- var + 1

  if(var > 5) {
    break
  }
}

We have already seen an example of the break statement in the repeat section above.
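break is not limited to repeat; it also ends a for or while loop early. Here is a minimal sketch that stops at the first value above a threshold. The vector `readings`, the threshold 5.5 and the variable `first_long` are made up for illustration.

```r
# Hypothetical measurements for illustration
readings <- c(4.9, 5.1, 5.8, 4.7, 6.1)

# Position of the first value above 5.5; stays NA if none is found
first_long <- NA
for (i in seq_along(readings)) {
  if (readings[i] > 5.5) {
    first_long <- i
    break  # leave the loop; later elements are not examined
  }
}

print(first_long)
```

After the break, control resumes at the first statement following the loop, here the `print()` call.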

Final Note

In these notes we have learned how to use brackets and how to control the flow of a program using conditional statements, loops and the break statement. We have discussed

  • Creating functions and usage of {}
  • Executing functions using ()
  • Accessing elements of a data frame, matrix, etc. using []
  • if, if else and if else if
  • ifelse
  • for
  • repeat
  • while
  • break