Preamble
These notes give the reader an introduction to basic programming concepts in R. We shall discuss the following topics:
- Usage of brackets
- Creating functions
- Controlling flow of a program
Dataset
We shall use the iris data for many of our examples. Let us look at the structure of the iris data.

| Variable name | Description | Type |
|---|---|---|
| Sepal.Length | Sepal Length of Iris | Numeric |
| Sepal.Width | Sepal Width of Iris | Numeric |
| Petal.Length | Petal Length of Iris | Numeric |
| Petal.Width | Petal Width of Iris | Numeric |
| Species | Species of Iris flower | Factor |
Programming
We wish to perform various tasks using computers. To do this, we give the computer a recipe in a language it understands. This process is called programming. In these notes we shall look at some important tools in R programming.
Brackets
Brackets act as an important tool for performing tasks such as
- Executing a function
- Accessing elements of a data frame, matrix, etc.
- Defining a function
Using parentheses
The '( )' operator is used to pass inputs (arguments) to a function in R. Every function call must use round brackets, even when no arguments are given. Example:
# Creating a vector
Example=c(1,2,34)
# Finding the sum
sum(Example)
## [1] 37
# Loading packages
library(dplyr)
library(kableExtra) # provides kbl(), used to print the tables below
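Arguments inside the round brackets can be passed by position or by name. The following is a minimal sketch of that idea using base R only; the vector `scores` is a made-up example and is not part of the iris data.

```r
# Hypothetical vector with one missing value, used only for illustration
scores <- c(4, 8, NA, 10)

# Positional argument only: the missing value makes the result NA
total_with_na <- sum(scores)

# Named argument na.rm = TRUE drops the missing value before summing
total_clean <- sum(scores, na.rm = TRUE)

# Both arguments passed by name
average_clean <- mean(x = scores, na.rm = TRUE)

print(total_with_na)   # NA
print(total_clean)     # 22
print(average_clean)
```

Naming arguments makes a call easier to read when a function accepts many inputs.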
Square brackets
Let us look at the first few rows of our data.
# Consider iris data
data("iris")
kbl(head(iris))

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
Any data structure in R that has more than one element uses '[ ]' to access one or more of its elements by index. Indexing in R starts at 1. For example, let us get the 3rd element of the 'Example' vector.
Example[3]
## [1] 34
We can get multiple elements from a vector using a vector of indices. Let us obtain the 3rd, 5th, 9th and 55th values of Sepal.Length.
iris$Sepal.Length[c(3,5,9,55)]
## [1] 4.7 5.0 4.4 6.5
In a data frame or matrix, we can use indexing of the form '[m,n]' to access a particular element, where 'm' is the row number and 'n' is the column number.
# Obtaining the value in the 5th row and 3rd column
iris[5,3]
## [1] 1.4
We can also subset more than one row and column by using vectors to specify the indices.
iris[1:15,c(2,3)]
## Sepal.Width Petal.Length
## 1 3.5 1.4
## 2 3.0 1.4
## 3 3.2 1.3
## 4 3.1 1.5
## 5 3.6 1.4
## 6 3.9 1.7
## 7 3.4 1.4
## 8 3.4 1.5
## 9 2.9 1.4
## 10 3.1 1.5
## 11 3.7 1.5
## 12 3.4 1.6
## 13 3.0 1.4
## 14 3.0 1.1
## 15 4.0 1.2
We can use the "-" sign when specifying indices to leave out one or more rows or columns.
iris[-c(2:150),-c(2,3)]
## Sepal.Length Petal.Width Species
## 1 5.1 0.2 setosa
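Besides numeric positions, square brackets also accept logical conditions and column names. The sketch below illustrates this with base R only; the variable names `long_sepals`, `widths` and `first_petals` and the cut-off 7.5 are chosen here purely for illustration.

```r
# Logical index: keep only the rows where the condition is TRUE
long_sepals <- iris[iris$Sepal.Length > 7.5, ]

# Columns can be selected by name instead of by position
widths <- iris[1:3, c("Sepal.Width", "Petal.Width")]

# [[ ]] extracts a single column as a plain vector
first_petals <- iris[["Petal.Length"]][1:3]

print(nrow(long_sepals))
print(widths)
print(first_petals)
```

Selecting columns by name is usually more robust than selecting by position, because it keeps working if the column order changes.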
Curly braces
We make use of '{ }' when defining functions and when writing conditional statements and loops. Examples can be seen in the next topic, 'Functions'.
Functions
A function can be regarded as a collection of statements. One of the strengths of R is the ability to extend R by writing new functions.
The general form of a function is given by:
Function_name = function (arg1, arg2, ...)
{
Body of function: a collection of valid statements
}
Here arg1, arg2, etc. are the arguments of the function, i.e. its inputs. The body of the function is where we give the instructions for the task we are interested in. An example is shown below.
greet=function(Name){
  # cat() separates its arguments with a space by default
  cat("Hello",Name)
}
greet('Data Scientist')
## Hello Data Scientist
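Functions can also return a value and give default values to their arguments. The following is a minimal sketch; `value_range` and its `digits` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: spread (max - min) of a numeric vector
# digits has a default value, so it can be left out of the call
value_range <- function(x, digits = 2) {
  spread <- max(x) - min(x)
  round(spread, digits)   # the last evaluated expression is returned
}

# Using the default for digits
value_range(iris$Petal.Length)

# Overriding the default by naming the argument
value_range(iris$Petal.Length, digits = 0)
```

Because the result is returned rather than printed with cat(), it can be stored in a variable and used in later calculations.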
Conditional statements and Flow control
Programming requires controlling the flow from one part of the program to another. Flow control occurs through loops, conditional statements and branching, and stopping conditions that cause the program to stop executing one thing and either execute something else or quit entirely. Conditional statements help specify which blocks of code to run under which conditions.
if
The if statement checks a condition and executes the given instructions if the condition is met. It does nothing if the condition is not met.
if (test_expression)
{
statement
}
if(length(iris$Sepal.Length)>2){
  print("Sepal.Length is having more than two elements")
}
## [1] "Sepal.Length is having more than two elements"
if else
An if else statement executes one block of code if the test expression is TRUE and a different block (the else block) otherwise. It is used when an alternative action is needed for the case where the condition is not met.
if (test_expression)
{
statement 1
}else{
statement 2
}
# Test the same variable that the message refers to
if(length(iris$Sepal.Length)>115){
  print("Sepal.Length is having more than 115 elements")
}else{
  print("false")
}
## [1] "Sepal.Length is having more than 115 elements"
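An if else statement is often wrapped in a function so that the same decision can be reused. The sketch below assumes a hypothetical helper `describe_width` and an arbitrary threshold of 3; neither comes from the original notes.

```r
# Hypothetical helper that labels a single measurement
# if expects one TRUE or FALSE, so width should be a single number
describe_width <- function(width, threshold = 3) {
  if (width > threshold) {
    label <- "wide"
  } else {
    label <- "narrow"
  }
  label
}

describe_width(iris$Sepal.Width[1])   # the first sepal width is 3.5
describe_width(2.4)
```

For a whole vector of measurements, the vectorised ifelse() function discussed later in these notes is the better fit.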
if else if
An if-else-if ladder chains one or more else if branches after an initial if. The conditions are tested in order, and only the block of the first condition that is TRUE is executed. It is used when several outcomes are possible and each needs a different response.
if(test_expression1)
{
statement1
} else if(test_expression2)
{
statement2
} else if(test_expression3)
{
statement3
} else
{
statement4
}
An example is shown below.
if(length(iris$Sepal.Length)<10){
  print("small vector")
}else if(length(iris$Sepal.Length)>=10 && length(iris$Sepal.Length)<30 )
{
  # >= 10 closes the gap at exactly 10; && is the scalar "and" suited to if()
  print("moderate vector")
} else
{
  print("Sepal.Length of iris is a long vector")
}
## [1] "Sepal.Length of iris is a long vector"
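Because the branches of a ladder are tested in order, a later branch can rely on the earlier conditions having failed. The following sketch wraps the ladder in a hypothetical function `size_label`; the cut-offs 10 and 30 mirror the example above.

```r
# Hypothetical helper: classifies a vector by its length
size_label <- function(x) {
  n <- length(x)
  if (n < 10) {
    "small vector"
  } else if (n < 30) {
    # reached only when n >= 10, so that check need not be repeated
    "moderate vector"
  } else {
    "long vector"
  }
}

size_label(1:5)
size_label(1:20)
size_label(iris$Sepal.Length)
```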
ifelse
R has a convenient function, ifelse(), that applies such a condition to every element of a vector at once. The syntax of the function is as follows.
ifelse(conditional expression,
task to be executed if the condition is true,
task to be executed if the condition is false)
Let us see an example. Here we shall create a new factor variable called PL_type in iris based on the values of Petal.Length: if the value is less than 4.3, PL_type is "small"; otherwise it is "large".
iris$PL_type=ifelse(iris$Petal.Length<4.3,'small','large')
iris$PL_type=as.factor(iris$PL_type)
Let us look at the first few rows of the modified data.
kbl(head(iris))

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | PL_type |
|---|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa | small |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa | small |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa | small |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa | small |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa | small |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa | small |
We can nest further ifelse() calls in the 3rd argument of the function, which specifies the value to use when the conditional expression is false. Let us see an example. We shall create a new factor variable PL_type3 in iris based on the values of Petal.Length: if the value is less than 1.6, PL_type3 is "small"; if it is greater than or equal to 1.6 and less than 5.1, it is "medium"; otherwise it is "large".
iris$PL_type3=ifelse(iris$Petal.Length<1.6,'small',ifelse(iris$Petal.Length>=1.6&iris$Petal.Length<5.1,'medium','large'))
# Convert the new column itself, leaving PL_type unchanged
iris$PL_type3=as.factor(iris$PL_type3)
Let us see the first few rows of iris.
kbl(head(iris))

| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | PL_type | PL_type3 |
|---|---|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa | small | small |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa | small | small |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa | small | small |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa | small | small |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa | small | small |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa | small | medium |
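When there are several categories, nested ifelse() calls become hard to read. As an alternative (not used in the original notes), base R's cut() function can assign the same three labels in one call. The sketch below uses a small made-up vector `petal`; the break points 1.6 and 5.1 are the ones described above.

```r
# Hypothetical sample of petal lengths, used only for illustration
petal <- c(1.4, 1.7, 4.5, 5.1, 6.0)

# right = FALSE makes each interval closed on the left: [lower, upper)
size <- cut(petal,
            breaks = c(-Inf, 1.6, 5.1, Inf),
            labels = c("small", "medium", "large"),
            right = FALSE)

print(size)
# values map to: small, medium, medium, large, large
```

cut() returns a factor directly, so no separate as.factor() step is needed.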
for
A for loop is a repetition control structure that lets us efficiently write code that needs to execute a specific number of times.
for (value in vector) {
statements
}
# Considering Species variable of iris and using index of elements
for(i in 1:5)
{
  print(iris$Species[i])
}
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica
## [1] setosa
## Levels: setosa versicolor virginica
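for loops are particularly flexible in that they are not limited to integers, or even numbers, in the input. We can pass character vectors, logical vectors, lists or expressions. A factor, as in the example below, is coerced to a character vector, so its values are printed as strings.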
# Considering Species variable of iris
for(species in iris$Species[c(47:53,108)])
{
  print(species)
}
## [1] "setosa"
## [1] "setosa"
## [1] "setosa"
## [1] "setosa"
## [1] "versicolor"
## [1] "versicolor"
## [1] "versicolor"
## [1] "virginica"
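A common use of a for loop is to fill a result vector one element at a time. The following sketch computes the mean petal length for each species; the names `species_names` and `mean_petal` are introduced here for illustration, and seq_along() is used to generate the loop indices.

```r
# One result slot per species, created before the loop starts
species_names <- levels(iris$Species)
mean_petal <- numeric(length(species_names))
names(mean_petal) <- species_names

# seq_along() gives 1, 2, ..., length(species_names)
for (i in seq_along(species_names)) {
  rows <- iris$Species == species_names[i]
  mean_petal[i] <- mean(iris$Petal.Length[rows])
}

print(mean_petal)
```

Creating the result vector before the loop avoids growing it on every iteration.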
repeat
The repeat loop executes the same code again and again until a break statement is reached, so the stop condition must be checked inside the loop.
repeat {
commands
if(condition) {
break
}
}
v <- "Repeating statement for "
cutoff <- 1
repeat {
  cat(v,cutoff," time \n")
  cutoff <- cutoff+1
  if(cutoff > 5) {
    break
  }
}
## Repeating statement for 1 time
## Repeating statement for 2 time
## Repeating statement for 3 time
## Repeating statement for 4 time
## Repeating statement for 5 time
print("Came out of loop to next statement")
## [1] "Came out of loop to next statement"
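repeat is handy when the number of iterations is not known in advance. As a minimal sketch with made-up numbers, the loop below doubles a value until it exceeds 100 and counts how many steps that takes.

```r
value <- 1
steps <- 0

repeat {
  value <- value * 2
  steps <- steps + 1
  # Without this check the loop would never end
  if (value > 100) {
    break
  }
}

print(steps)   # 7
print(value)   # 128
```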
while
The while loop executes the code repeatedly as long as the test condition is TRUE. The condition is checked before each iteration.
while (test_expression) {
statement
}
x=20
while(x<50){
  print("Re do the exam")
  x= x+5
}
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
## [1] "Re do the exam"
print(x)
## [1] 50
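Since the condition is tested before every pass, a while loop may run zero times. The sketch below, using made-up numbers, halves a quantity until it is no longer above 10 and counts the rounds needed.

```r
remaining <- 100
rounds <- 0

# The body runs only while remaining is still greater than 10
while (remaining > 10) {
  remaining <- remaining / 2
  rounds <- rounds + 1
}

print(rounds)      # 4
print(remaining)   # 6.25
```

If `remaining` had started at 10 or below, the body would have been skipped entirely and `rounds` would stay at 0.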
break
When the break statement is encountered inside a loop, the loop is immediately terminated and program control resumes at the next statement following the loop.
var=1
repeat {
statement
var <- var + 1
if(var > 5) {
break
}
}
We have already seen an example of the break statement in the repeat section above.
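break works inside for and while loops as well. As a sketch, the loop below searches for the first row of iris whose petal length exceeds 5 and stops as soon as it is found; the variable `first_long` and the cut-off 5 are chosen here for illustration.

```r
first_long <- NA   # stays NA if no row satisfies the condition

for (i in seq_along(iris$Petal.Length)) {
  if (iris$Petal.Length[i] > 5) {
    first_long <- i
    break   # no need to look at the remaining rows
  }
}

print(first_long)
```

The same row number can be obtained without a loop using `which(iris$Petal.Length > 5)[1]`, which is a useful check on the loop.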
next
The next statement in R is useful when we want to skip the current iteration of a loop without terminating the loop. On encountering next, R skips the remaining statements in the loop body and starts the next iteration.
for (value in vector) {
if (test_expression)
{
next
}
statement
}
v <- LETTERS[1:6]
for ( i in v) {
  if (i == "D") {
    next
  }
  print(i)
}
## [1] "A"
## [1] "B"
## [1] "C"
## [1] "E"
## [1] "F"
We used 'next' for the element 'D', which is why it was skipped.
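A typical use of next is to skip values that should not be processed, such as missing values. The following sketch adds up a made-up vector `readings` while ignoring its NA entries.

```r
# Hypothetical readings with missing values, used only for illustration
readings <- c(5.1, NA, 4.7, NA, 5.0)
total <- 0

for (r in readings) {
  if (is.na(r)) {
    next   # skip the missing value and move on to the next reading
  }
  total <- total + r
}

print(total)   # 14.8
```

The same result is given by `sum(readings, na.rm = TRUE)`; the loop is shown only to illustrate how next behaves.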
Final Note
In these notes we have learned how to use brackets and how to control the flow of a program using conditional statements, loops and control statements. We have discussed
- Creating functions and usage of {}
- Executing functions using ()
- Accessing elements of a data frame, matrix, etc. using []
- if else if
- for
- while
- break
- next