How many times have you had to repeat a task over and over, changing just one small input, to get the correct result? Have you ever wished for magical powers to do it without much effort? Well, you don’t need magical powers to do it in R!
A WHILE loop keeps repeating a set of commands as long as its condition is true.
i <- 0
cond <- TRUE
while(cond)
{
print(i)
i <- i+1
if(i>2)
{
cond <- FALSE
}
}
## [1] 0
## [1] 1
## [1] 2
(Note that ‘IF’ is not a loop. It only decides whether the code inside it runs on that pass.)
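The flag variable is not strictly required. As a minimal sketch, the same test can be written directly in the while condition, which should print the same values 0, 1 and 2:

```r
# Same loop as above, with the test placed directly in the condition
i <- 0
while (i <= 2) {
  print(i)
  i <- i + 1
}
```

The loop stops as soon as `i <= 2` evaluates to FALSE, so no separate IF block is needed to switch the flag.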
A FOR loop also repeats a set of commands. It is generally used when you know, before the loop starts, how many times the code needs to be executed.
i = 0
for(i in 1:3)
{
print(i)
}
## [1] 1
## [1] 2
## [1] 3
Did you notice that ‘i’ is not explicitly incremented in the FOR loop? The loop assigns each value of 1:3 to ‘i’ in turn.
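A FOR loop is not limited to counting from 1. As a small sketch, it can walk over the positions of any vector; the vector `radii` and the result vector `diameters` below are hypothetical names used only for illustration:

```r
# Hypothetical input values, used only for illustration
radii <- c(1.5, 2, 10)

# Pre-allocate the result, then fill it one position at a time
diameters <- numeric(length(radii))
for (k in seq_along(radii)) {
  diameters[k] <- 2 * radii[k]
}
print(diameters)
```

Here `seq_along(radii)` produces 1, 2, 3, so the number of iterations is known before the loop starts.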
Suppose you have to run a set of commands on one variable and then the same commands on another variable.
i = 0
for(i in 1:4)
{
print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
j = 0
for(j in 1:4)
{
print(j)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
So, we are still repeating the code.
“Functions” rescue us from this loop of repetition.
A function is a set of commands that is written once and can be executed many times without repeating the code. It can be reused in different R scripts as well. Most commands you have used in R are themselves functions written by someone else to make your life easier. This document concentrates only on writing functions.
You do not need to install any additional package to create and use functions (unless you call package-specific commands inside the function). Functions are part of base R.
Let’s start with a simple function that calculates the area of a circle.
Input: radius r
Output: area
Formula: area of circle with radius r = pi * r * r
areaC <- function(r)
{
area <- pi*r*r
return(area)
}
You may have noticed that the function is stored in a variable. We call the function by using that variable name.
If you are wondering why I have not passed the value of ‘pi’, it is because ‘pi’ is a built-in constant in R.
pi
## [1] 3.141593
Now that we have our function ready, let’s play with it.
areaC(2.22)
## [1] 15.48303
ans <- areaC(8438)
ans
## [1] 223680907
Did you notice how the parameter is passed and how the answer is stored in a new variable?
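Because arithmetic in R works element by element, the same function can also take a whole vector of radii in one call. A minimal sketch, repeating the definition of areaC so the snippet runs on its own (the variable `areas` is a hypothetical name):

```r
# Same definition as above, repeated so this snippet is self-contained
areaC <- function(r) {
  area <- pi * r * r
  return(area)
}

# One call, three radii: the result is a vector with three areas
areas <- areaC(c(1, 2, 3))
print(areas)
```

This often removes the need for a loop when the same calculation is applied to many inputs.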
Let’s make it a little more difficult! What if I want to pass multiple parameters and get multiple output values?
Input: length l and width w
Output: area; perimeter
Formula:
area = l * w
perimeter = 2 * (l + w)
PropR <- function(l,w)
{
areaR <- l*w
perimeter <- 2*(l+w)
return(c(areaR,perimeter))
}
PropR(2,4)
## [1] 8 12
RectangleProperties <- PropR(4.78,2.55)
RectangleProperties
## [1] 12.189 14.660
class(RectangleProperties)
## [1] "numeric"
Did you spot ‘c’ inside return()? We are still returning a single value, but it is a vector that combines multiple outputs into one!
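A plain vector does not say which number is the area and which is the perimeter. One possible alternative, sketched here under the hypothetical name PropRList, is to return a named list so that each output can be picked out by name:

```r
# Hypothetical variant of PropR that labels its outputs
PropRList <- function(l, w) {
  list(area = l * w, perimeter = 2 * (l + w))
}

props <- PropRList(4.78, 2.55)
print(props$area)
print(props$perimeter)
```

A list is also useful when the outputs have different types, for example a number and a text label, which c() would coerce to a single type.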
If you have an eye for detail, you may have spotted a mistake in PropR(2,4). In this function call, width is the first parameter and length is the second, but our function accepts length first and then width. It did not matter here because both formulas give the same result when l and w are swapped, but in general arguments are matched by position, so it is important that you follow the sequence.
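R also lets you name the arguments in the call, in which case their order no longer matters. A short sketch, repeating the definition of PropR so the snippet runs on its own (`byPosition` and `byName` are hypothetical variable names):

```r
# Same definition as above, repeated so this snippet is self-contained
PropR <- function(l, w) {
  areaR <- l * w
  perimeter <- 2 * (l + w)
  return(c(areaR, perimeter))
}

byPosition <- PropR(4, 2)          # length first, then width
byName     <- PropR(w = 2, l = 4)  # named, so the order is free
print(identical(byPosition, byName))
```

Naming arguments makes a call easier to read and protects against accidentally swapping two values.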
Let’s make it a little more complex. We will try a factorial function with exception handling for incorrect parameters.
The factorial of a whole number n (written ‘n!’) is the product of all the positive whole numbers less than or equal to n.
5! = 1 * 2 * 3 * 4 * 5
Input: n
Output: n!
Any guesses about the factorial of zero? (Hint: answer is in the code below.)
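fact <- function(n)
{
# Stop with an error for negative or fractional input
if(n<0 || n!=floor(n))
stop("Please enter a valid number for factorial")
# Base cases: 0! and 1! are both 1
if(n==0 || n==1)
return(1)
# Recursive case: n! = n * (n-1)!
return(n*fact(n-1))
}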
fact(5)
## [1] 120
This is a recursive function: it keeps calling itself with a smaller value of n until it reaches 0 or 1. Functions can include loops or calls to other functions inside them.
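To illustrate a loop inside a function, here is a sketch of a non-recursive factorial. The name factLoop is hypothetical, and the sketch assumes n is a non-negative whole number (it does no input checking):

```r
# Hypothetical loop-based factorial; assumes n is a non-negative whole number
factLoop <- function(n) {
  result <- 1
  # seq_len(0) is empty, so the loop is skipped and 0! stays 1
  for (k in seq_len(n)) {
    result <- result * k
  }
  return(result)
}

print(factLoop(5))
print(factLoop(0))
```

Base R already ships a factorial() function, so in practice a hand-written version like this is mainly a learning exercise.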
factorials <- c(fact(0))
for(i in 1:5)
factorials <- append(factorials, fact(i))
plot(c(0:5),factorials, col= 'blue', xlab = "Number n", ylab = "Factorial")
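One way to handle incorrect parameters is to raise an error with stop() and catch it with tryCatch() at the call site. The sketch below uses a hypothetical function checkedFact rather than the fact function above, and the choice to return NA on failure is an assumption made for illustration:

```r
# Hypothetical factorial that raises an error for invalid input
checkedFact <- function(n) {
  if (!is.numeric(n) || length(n) != 1 || n < 0 || n != floor(n)) {
    stop("n must be a single non-negative whole number")
  }
  if (n <= 1) {
    return(1)
  }
  n * checkedFact(n - 1)
}

# Catch the error, report it, and fall back to NA instead of halting
result <- tryCatch(
  checkedFact(-3),
  error = function(e) {
    message("Could not compute factorial: ", conditionMessage(e))
    NA
  }
)
print(result)
```

Raising an error keeps an invalid input from quietly turning into a value that later code might use by mistake.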
Now, let’s take a look at a real dataset of data breaches:
# To run this code install and load packages
# install.packages("tidyverse")
library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#install.packages("rvest")
library(rvest)
## Loading required package: xml2
##
## Attaching package: 'rvest'
## The following object is masked from 'package:purrr':
##
## pluck
## The following object is masked from 'package:readr':
##
## guess_encoding
url <- "https://en.wikipedia.org/wiki/List_of_data_breaches"
BreachData <- url %>%
# read_html() replaces the older html(), which newer rvest versions no longer provide
read_html() %>%
html_nodes(xpath='//*[@id="mw-content-text"]/div/table') %>%
html_table()
BreachData <- BreachData[[1]]
# This is an example of a function. It gives very basic information about a data frame. You can try expanding this function to explore the data frame further.
ExploreData <- function(data)
{
print("printing top 2 rows")
print(head(data,2))
print("Printing Summary")
print(summary(data))
}
ExploreData(BreachData)
## [1] "printing top 2 rows"
## Entity Year Records Organization type Method
## 1 21st Century Oncology 2016 2,200,000 healthcare hacked
## 2 Accendo Insurance Co. 2011 175,350 healthcare poor security
## Sources
## 1 [5][6]
## 2 [7][8]
## [1] "Printing Summary"
## Entity Year Records
## Length:287 Length:287 Length:287
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Organization type Method Sources
## Length:287 Length:287 Length:287
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
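The summary shows that every column, including Year and Records, was read as text. As a sketch of how the function could be expanded, the hypothetical ExploreDataMore below also reports dimensions, column classes and missing values. It runs on a small made-up data frame (sampleData) shaped like the first two rows printed above, not on the scraped table:

```r
# Hypothetical two-row sample mirroring the first rows printed above
sampleData <- data.frame(
  Entity  = c("21st Century Oncology", "Accendo Insurance Co."),
  Year    = c("2016", "2011"),
  Records = c("2,200,000", "175,350"),
  stringsAsFactors = FALSE
)

# Hypothetical extension of ExploreData
ExploreDataMore <- function(data) {
  print(dim(data))                  # number of rows and columns
  colClasses <- sapply(data, class) # class of each column
  print(colClasses)
  missing <- colSums(is.na(data))   # missing values per column
  print(missing)
  invisible(list(classes = colClasses, missing = missing))
}
info <- ExploreDataMore(sampleData)

# Strip the thousands separators so Records can be used as a number
recordsNumeric <- as.numeric(gsub(",", "", sampleData$Records))
print(recordsNumeric)
```

On the real table, some Records entries may not be plain numbers; as.numeric() would turn those into NA with a warning, so they would need separate handling.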
Now that you have a good understanding of functions, explore how you can use them in other scripts. (HINT: source(), packages)
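As a starting point for the source() hint, the sketch below writes a function definition to a file and then loads it as another script would. The file name circle_functions.R and the use of a temporary directory are assumptions made so the example cleans up after itself; in practice the file would sit alongside your other scripts:

```r
# Write a function definition to a script file (temporary location for the demo)
scriptPath <- file.path(tempdir(), "circle_functions.R")
writeLines(c(
  "areaC <- function(r)",
  "{",
  "  area <- pi*r*r",
  "  return(area)",
  "}"
), scriptPath)

# In any other script: source() runs the file, which defines areaC
source(scriptPath)
print(areaC(1))
```

When the collection of shared functions grows, bundling them into a package is the more structured alternative to sourcing files.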