Highlights:
setwd("C:/Users/nirma/Documents/GitHub/Practice")
Control structures in R allow us to control the flow of an R program. Below is a list of some common basic constructs and a brief description of what they do.
\(If\) combined with \(else\) allows programmers to test a logical condition and let R do something based on whether the condition is true or false. The \(else\) part is optional; we use it when we want R to do something else when the condition is not met. The construct has three kinds of clauses: \(if\), \(else if\), and \(else\). \(If\) comes at the beginning and \(else\) always comes at the end (if we decide to use one). We can have as many \(else if\) clauses as we need. There are a couple of different ways to formulate \(if-else\) statements. For example:
# Example 1
a <- 21 # Variable /a/ has a value of 21
if(a < 20){ #If a is less than 20
b <- 0 #then b is 0
} else{ # otherwise
b <- print("👁️You Hit the Bulls Eye!") # good job
}
[1] "<U+0001F441><U+FE0F>You Hit the Bulls Eye!"
# Example 2
c <- 15
d <- if(c <= 16){
print("Can you go up?")
} else{
print("🤩 You are my man!")
}
[1] "Can you go up?"
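If-else statements are not functions, though. They are evaluated once with the values that exist at that moment, so assigning a new value to \(a\) or \(c\) afterwards does not change \(b\) or \(d\) unless the statement is run again.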
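One way to reuse the same test for different inputs is to wrap the if-else statement in a function. The sketch below is only illustrative; the name check_score and the returned values are hypothetical and are not part of the original examples.

```r
# Hypothetical helper: wraps an if-else statement so it can be reused
check_score <- function(a) {
  if (a < 20) {
    0                          # returned when a is less than 20
  } else {
    "You Hit the Bulls Eye!"   # returned otherwise
  }
}

check_score(21)  # takes the else branch
check_score(5)   # takes the if branch
```

Each call re-evaluates the condition with the value passed in, which a bare if-else statement does not do on its own.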
The \(if...else\) statement can be extended further to accommodate other conditions by using \(else if\) in between. Remember that the statement still starts with \(if\) and ends with \(else\). We can have one or more \(else if\) expressions in a single statement. The \(if...else\) ladder takes the following general structure:
if(test_condition1){
action_1
} else if(test_condition2){
action_2
} else if(test_condition3){
action_3
} else if(test_condition4){
action_4
} else {
action_5
}
Let’s look at an example with one \(else if\) expression:
x <- 4 # the value of x is 4
if (x > 4){ # if x is higher than 4
print("Too high dear!") # then, print this
} else if(x < 4){ #if x is less than 4,
print("You are a loser!") # then, print this
} else # otherwise,
print("x is a square number!") #print this
[1] "x is a square number!"
We assigned the value 4 to x, so neither of the first two conditions was met and the outcome was “x is a square number!”. Now, let’s check a statement with more than one \(else if\) clause.
x <- c("This","is","where","salmons","come","to","breed",".")
if("tuna" %in% x){
print("Oh, yeah!! We are learning about tuna fish!")
} else if("Alaska" %in% x){
print("Alaska!! My god, it's cold in there!")
} else if("breed" %in% x){
print("Oh, well!! Elon Musk says human population is declining!")
} else if("Ukraine" %in% x){
print("I am writing this code when the Ukraine and Russia turf war is at an all-time high!")
} else
print("I am the God!!")
[1] "Oh, well!! Elon Musk says human population is declining!"
These constructs are very useful in exploratory data analysis and even data visualization, where we need to set conditions and calculate a series of values or draw plots/charts based on groups.
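As a rough illustration of grouping values with an else-if ladder, the sketch below labels each temperature in the built-in airquality data. The function name temp_group and the cut-offs of 70 and 85 degrees are assumptions chosen only for this example.

```r
# Hypothetical helper: label a temperature (degrees F) with an else-if ladder
temp_group <- function(temp) {
  if (temp < 70) {
    "cool"
  } else if (temp < 85) {
    "warm"
  } else {
    "hot"
  }
}

# Apply the ladder to every value of airquality$Temp and count each group
groups <- vapply(airquality$Temp, temp_group, character(1))
table(groups)
```

The resulting labels could then be used to split summaries or plots by group.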
The \(for-loop\) is the most common type of loop in R. It takes a loop index and assigns it successive values from a sequence or vector. Let’s take an example:
# A loop that runs from 1 through 10 and prints i + (i-1), from 1 + (1-1) = 1 all the way to 10 + (10-1) = 19.
for (i in 1:10){
print(i + (i-1))
}
[1] 1
[1] 3
[1] 5
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
[1] 17
[1] 19
# A loop that goes through letters 5 (that is: E) to 15 (i.e., O) and prints them
b <- LETTERS[5:15]
for(i in seq_along(b)){
print(b[i])
}
[1] "E"
[1] "F"
[1] "G"
[1] "H"
[1] "I"
[1] "J"
[1] "K"
[1] "L"
[1] "M"
[1] "N"
[1] "O"
# A loop that prints the elements of the vector 'C' in the order in which they are stored.
C <- c(1,9,3,2,4,5,7,8,6,4,5,3,4,1,0,0,0,5,3)
for(i in seq_along(C)){
print(C[i])
}
[1] 1
[1] 9
[1] 3
[1] 2
[1] 4
[1] 5
[1] 7
[1] 8
[1] 6
[1] 4
[1] 5
[1] 3
[1] 4
[1] 1
[1] 0
[1] 0
[1] 0
[1] 5
[1] 3
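The seq_along() call used above is generally a safer way to build the loop index than 1:length(x). A small sketch of the difference, using an empty vector as an assumed edge case:

```r
empty <- numeric(0)          # a vector with no elements

1:length(empty)              # 1:0 counts down, giving the two values 1 and 0
seq_along(empty)             # integer(0), so a loop over it never runs

runs <- 0
for (i in seq_along(empty)) {
  runs <- runs + 1           # never reached for an empty vector
}
runs
```

With 1:length(empty) the loop body would run twice with indices that do not exist in the vector.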
# A function that calculates the mean of a vector and, if the vector has a missing value, issues a warning and returns a message instead.
mean_func <- function(x){
my_mean <- mean(x)
if(any(is.na(x))){
warning("This variable has missing values!")
return("Fix the missing values")
} else{
return(my_mean)
}
}
mean_func(iris$Sepal.Width)
[1] 3.057333
mean_func(cars$dist)
[1] 42.98
mean_func(airquality$Ozone)
Warning in mean_func(airquality$Ozone): This variable has missing values!
[1] "Fix the missing values"
I know that the cars and iris datasets don’t have any missing values, at least in the variables used here, so I got the means of those variables. The airquality dataset, though, has some missing values in Ozone. My mean_func function detected them and produced the warning message and the result that I specified in the function.
It is conventional to enclose the body of a \(for-loop\) in curly braces. However, sometimes we can write loops without the braces. Please note that a \(for-loop\) without curly braces is more error prone, and its body is limited to a single expression, so it cannot hold several statements. Here’s an example of a simple \(for-loop\) without a pair of curly braces.
sentence <- c("Salmon","returns","to","its","birthplace","to","breed", "and", "die", ".")
# for loop without curly braces
for(words in sentence)
print(words)
[1] "Salmon"
[1] "returns"
[1] "to"
[1] "its"
[1] "birthplace"
[1] "to"
[1] "breed"
[1] "and"
[1] "die"
[1] "."
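The sketch below shows why the brace-free form is error prone: only the first expression after for() belongs to the loop, even if the next line looks like part of it. The counter names are made up for this illustration.

```r
first_line_runs  <- 0
second_line_runs <- 0

for (i in 1:3)
  first_line_runs <- first_line_runs + 1     # loop body: runs 3 times
  second_line_runs <- second_line_runs + 1   # NOT in the loop: runs once, afterwards

c(first_line_runs, second_line_runs)
```

Wrapping both lines in curly braces would make them run together on every iteration.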
Sometimes we may want to embed a \(for-loop\) within a \(for-loop\). We have to be careful while nesting, because more than 2 or 3 levels can be very hard to read and understand. Most of the time there are ways to avoid writing nested \(for-loops\), although sometimes they may be necessary. Example:
a <- matrix(1:15, 5,3) # A matrix of 5 rows and 3 columns
# Creating a nested loop
for(i in seq_len(nrow(a))){ # outer loop: go through the rows one at a time
for(j in seq_len(ncol(a))){ # inner loop: go through the columns of the current row
print(a[i,j]) # print the element in row i, column j
}
}
[1] 1
[1] 6
[1] 11
[1] 2
[1] 7
[1] 12
[1] 3
[1] 8
[1] 13
[1] 4
[1] 9
[1] 14
[1] 5
[1] 10
[1] 15
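One possible way to avoid the nested loop here is to let R's vectorized functions do the traversal. This is a sketch of one alternative, not the only one; it returns the values as a single vector instead of printing them one per line.

```r
a <- matrix(1:15, 5, 3)

# t(a) swaps rows and columns; as.vector() then reads the result column by
# column, which corresponds to reading 'a' row by row
row_wise <- as.vector(t(a))
row_wise
```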
\(While\) is another looping construct in R. It first tests the defined condition; if the condition is true, the body of the loop is executed and the condition is tested again, and this repeats until the condition becomes false. While loops can potentially result in infinite loops if not written properly.
my_number <- 0 # start at zero
while(my_number < 10){ # keep looping as long as my_number is less than 10
my_number <- my_number + 1 # add 1 to the number
print(my_number) # print the number
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
We can also test multiple conditions using the \(while-loop\). Like the one below:
b <- 8 #The assigned value of b
while(b >= 2 && b <= 10){ # keep looping while b is between 2 and 10
print(b) # print the current value
flip_a_coin <- rbinom(1, 1, 0.5) # then flip a fair coin once
if(flip_a_coin == 1){ # if the coin turns up heads (1)
b <- b + 2 # add 2 to the value of b
}else{ # otherwise
b <- b - 1 # subtract 1 from b
}
}
[1] 8
[1] 10
[1] 9
[1] 8
[1] 10
[1] 9
Here, I was not sure when the \(while-loop\) was going to end, because the coin flips are random. We can see that the value goes up and down based on the outcome of each flip.
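Because the stopping point is random, it can help to fix the random seed and add an upper limit on the number of iterations. The following is a sketch under those assumptions; the seed value and the max_steps cap are arbitrary choices that are not part of the original example.

```r
set.seed(123)        # arbitrary seed so the coin flips repeat between runs
b <- 8
steps <- 0
max_steps <- 1000    # assumed safety cap against a very long run

while (b >= 2 && b <= 10 && steps < max_steps) {
  flip_a_coin <- rbinom(1, 1, 0.5)   # 1 = heads, 0 = tails
  if (flip_a_coin == 1) {
    b <- b + 2
  } else {
    b <- b - 1
  }
  steps <- steps + 1
}

c(final_b = b, steps = steps)
```

The loop ends either when b leaves the range from 2 to 10 or when the cap is reached, whichever comes first.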
min_value <- 1
max_value <- 500
repeat{
estimates <- computeEstimate() # computeEstimate() is not defined in this document
if(abs(estimates - min_value) < max_value){
break
} else{
min_value <- estimates
}
}
I tried the above code, but it did not run. The complaint was that R could not find the function computeEstimate(), which is not defined anywhere in this code.
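For a version of the repeat pattern that does run, the sketch below supplies a stand-in for computeEstimate(). Everything specific here is an assumption made for illustration: the stand-in performs one Newton step toward the square root of 2, the tolerance is arbitrary, and an iteration cap guards against an endless loop.

```r
target <- 2                  # assumed example: estimate sqrt(2)
tolerance <- 1e-8            # stop when successive estimates differ by less than this
estimate <- 1                # starting guess

# Hypothetical stand-in for computeEstimate(): one Newton step toward sqrt(target)
computeEstimate <- function(current) {
  (current + target / current) / 2
}

iterations <- 0
repeat {
  new_estimate <- computeEstimate(estimate)
  iterations <- iterations + 1
  if (abs(new_estimate - estimate) < tolerance || iterations >= 100) {
    break                    # converged, or hit the safety cap
  }
  estimate <- new_estimate
}

new_estimate
```

Unlike while, repeat has no condition of its own, so the break statement is the only way out of the loop.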
The \(next\) statement is used in any type of looping construct when we want to skip an iteration. I am going to create a \(for-loop\) that runs for 15 iterations and use a next statement to bypass certain cases:
for(i in 1:15){
if(i <= 11){ # I want to skip the first 11 iterations
next # go straight to the 12th iteration
}
print(i-1) # and print i-1
}
[1] 11
[1] 12
[1] 13
[1] 14
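Another common use of next is to skip unwanted values rather than a fixed range of indices. A minimal sketch, using made-up data with missing entries:

```r
values <- c(4, NA, 7, NA, 10)   # made-up data with missing entries
total <- 0

for (v in values) {
  if (is.na(v)) {
    next                        # skip missing values and move to the next element
  }
  total <- total + v
}

total   # 4 + 7 + 10 = 21
```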
Creating a function that calculates means of all the columns in a data table.
column_wise_mean <- function(m){
number_of_column <- ncol(m) # number of columns in the data frame
calculated_means <- numeric(number_of_column) # numeric vector to store the calculated means
for(i in 1:number_of_column){ #for 1 through the number of columns
calculated_means[i] <- mean(m[,i]) #calculates mean by columns in m
}
calculated_means #returns the calculated_mean
}
# Checking the function on the airquality data set
column_wise_mean(airquality)
[1] NA NA 9.957516 77.882353 6.993464 15.803922
The result shows the means of the last four columns and returns NA for the first two columns. A closer look at these two columns shows that they contain multiple missing values.
Calculating column-wise standard deviations after removing NAs
column_wise_sd <- function(n, removeNA = TRUE){
number_of_column <- ncol(n)
calculated_sd <- numeric(number_of_column)
for(i in 1:number_of_column){
calculated_sd[i] <- sd(n[,i], na.rm = removeNA)
}
calculated_sd
}
# Checking the function on the airquality data set
column_wise_sd(airquality)
[1] 32.987885 90.058422 3.523001 9.465270 1.416522 8.864520
Once the missing values were omitted from the calculations, I was able to calculate the standard deviation for all the columns in the airquality dataset.
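The same column-wise summaries can also be obtained without writing the loop by hand. The sketch below uses the base functions colMeans() and sapply(); it is offered as an alternative to compare against the loops above, not as a replacement for them.

```r
# Column means and standard deviations, ignoring missing values
col_means <- colMeans(airquality, na.rm = TRUE)
col_sds   <- sapply(airquality, sd, na.rm = TRUE)

round(col_means, 2)
round(col_sds, 2)
```

Both results are named vectors, so each value is labeled with its column name.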
The … argument indicates a variable number of arguments that are usually passed on to other functions. It is often used when extending another function and we don’t want to copy the entire argument list of the original function. For example:
plot_line <- function(calculated_mean, calculated_sd, type = "l",...){
plot(calculated_mean, calculated_sd, type = type, ...)
}
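To see the forwarding without opening a plot window, here is a small sketch with a hypothetical wrapper around mean(). The name rounded_mean and its digits argument are invented for this example.

```r
# Hypothetical wrapper: rounds the mean and forwards any extra arguments to mean()
rounded_mean <- function(x, digits = 1, ...) {
  round(mean(x, ...), digits)
}

without_dots <- rounded_mean(c(1, 2, NA, 4))                # NA, na.rm was not passed
with_dots    <- rounded_mean(c(1, 2, NA, 4), na.rm = TRUE)  # na.rm travels through ...

without_dots
with_dots
```

The wrapper never mentions na.rm by name, yet the argument still reaches mean().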
# search() shows the order in which R looks for a symbol. If we accidentally create a function with the same name as an existing one, e.g., mean, the copy in the global environment is found first
search()
[1] ".GlobalEnv" "package:stats" "package:graphics"
[4] "package:grDevices" "package:utils" "package:datasets"
[7] "package:methods" "Autoloads" "package:base"
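The effect of this search order can be seen by deliberately masking a function. The sketch below defines a throwaway function named sd in the workspace (a hypothetical example, removed again at the end) and compares it with the version in the stats package.

```r
sd <- function(x) "my own sd"        # hypothetical function that masks stats::sd

masked   <- sd(c(1, 2, 3))           # the workspace copy is found first
from_pkg <- stats::sd(c(1, 2, 3))    # the :: operator reaches the package version

rm(sd)                               # remove the masking copy
restored <- sd(c(1, 2, 3))           # back to the function from the stats package

masked
from_pkg
restored
```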
The scope of a variable is the region of code in which it can be called and referenced. There are two basic approaches to scoping, lexical (also called static) and dynamic. R uses lexical scoping: when a function uses a variable that is not defined inside it, R looks for the value in the environment where the function was defined, and then in that environment’s parents up to the global environment and the attached packages. Dynamic scoping, on the other hand, looks the value up in the environment from which the function was called.
# Function within a function
raised_power <- function(x){ #takes the value of x
p_wer <- function(y){ #takes the value of y
y^x #raises y to the power of x
}
p_wer
}
# Creating other functions from raised_power
square_function <- raised_power(2)
cube_function <- raised_power(3)
quad_function <- raised_power(4)
# Testing my work with a base of 2
square_function(2)
[1] 4
cube_function(2)
[1] 8
quad_function(2)
[1] 16
# Testing my work with a base of 10
square_function(10)
[1] 100
cube_function(10)
[1] 1000
quad_function(10)
[1] 10000
Now, let’s check how the above functions work:
# The Square Function
ls(environment(square_function))
[1] "p_wer" "x"
get("x", environment(square_function))
[1] 2
# The Cube Function
ls(environment(cube_function))
[1] "p_wer" "x"
get("x", environment(cube_function))
[1] 3
# The Quad Function
ls(environment(quad_function))
[1] "p_wer" "x"
get("x", environment(quad_function))
[1] 4
Functions that come with R live in packages that are attached to the search path, so they are readily available in the user workspace. With lexical scoping, R looks up a free variable in the environment where the function was defined, while with dynamic scoping the value would be looked up in the environment from which the function is called, which is sometimes called the calling environment, or the parent frame. In the example below, b has two values: the first (41) is defined in the global environment, while the second (21) is defined inside func, i.e., in its local environment. Because g is defined in the global environment, it uses the global value of b even when it is called from inside func.
b <- 41 #the value of b is 41
func <- function(t){ # Creating a function named func which takes an argument t
b <- 21 # Then it assigns b the value 21
b^2 + g(t)#It, then squares b, and adds g of t
}
g <- function(t){# Defining the g function
t * b # it multiplies t with b
}
func(5)# 5 is the value of t
[1] 646
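The value of func(5) is 646. How? Here’s how: inside func, \(b = 21\), so \(b^2 = 441\). The function g is defined in the global environment, where \(b = 41\), so \(g(5) = 5 \times 41 = 205\). Together, \(441 + 205 = 646\).
Likewise, let’s hand calculate the value of func(1): \(21^2 + 1 \times 41 = 441 + 41 = 482\).
Let’s check if that’s the case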
func(1)
[1] 482
Exactly.
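For contrast, the sketch below moves the definition of the helper inside the outer function. The names func_inner and g_inner are hypothetical variants introduced only for this comparison. Because the helper is now defined where b is 21, lexical scoping makes it use 21 instead of the global 41.

```r
b <- 41

func_inner <- function(t) {
  b <- 21
  g_inner <- function(t) {   # hypothetical variant: defined inside func_inner
    t * b                    # b is looked up where g_inner was defined, so b = 21
  }
  b^2 + g_inner(t)
}

func_inner(5)   # 21^2 + 5 * 21 = 441 + 105 = 546
```

The only change from the earlier example is where the helper is defined, which is exactly what lexical scoping depends on.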
As data scientists, we all write regular functions that manipulate data or do some calculations. One particularly useful combination of scoping rules and functions is optimization.
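A minimal sketch of how scoping and optimization can fit together, assuming a simple least-squares objective: a constructor function (hypothetically named make_sse here) returns an objective function that remembers the data through its defining environment, and the base function optimize() then searches for the value that minimizes it.

```r
# Hypothetical constructor: returns an objective function that remembers 'data'
make_sse <- function(data) {
  function(mu) {
    sum((data - mu)^2)      # sum of squared errors around a candidate value mu
  }
}

sse  <- make_sse(cars$dist)                       # 'data' is stored in the closure
best <- optimize(sse, interval = range(cars$dist))

best$minimum                                      # should be close to mean(cars$dist)
```

The optimizer only ever sees a function of one argument; the data travel along in the function's environment.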
Dates and times are treated as special kinds of data in R. R has a dedicated representation for them:
- Dates are represented by the **Date** class
- Times are represented by the **POSIXct** or the **POSIXlt** class
- Dates are stored internally as the number of days since 1970-01-01
- Times are stored internally as the number of seconds since 1970-01-01
There are a number of functions that work on dates and times. For example:
Dates are represented by the Date class and can be coerced from a character string using the as.Date() function. Times can be coerced from a character string using the as.POSIXct or as.POSIXlt functions.
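The internal storage described above can be checked with unclass(). The values below are arbitrary examples chosen close to 1970-01-01 so that the numbers are easy to read, and the time zone is fixed to UTC as an assumption.

```r
d <- as.Date("1970-01-02")
class(d)            # "Date"
unclass(d)          # 1, i.e., one day after 1970-01-01

# A fixed time in UTC (an arbitrary example value)
p <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
as.numeric(p)       # 60, i.e., sixty seconds after 1970-01-01 00:00:00 UTC
```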
tme <- Sys.time()#Generates the system time
tme
[1] "2022-02-17 21:20:39 CST"
# Using as.POSIXlt
tme_1 <- as.POSIXlt(tme)
tme_1
[1] "2022-02-17 21:20:39 CST"
# Checking the names in tme_1
names(unclass(tme_1))
[1] "sec" "min" "hour" "mday" "mon" "year" "wday" "yday"
[9] "isdst" "zone" "gmtoff"
# Accessing only the second values
tme_1$sec
[1] 39.03534
# Accessing only the values in hour
tme_1$hour
[1] 21
A POSIXct object is not a list, so it does not have these components. Thus, when we try the above operations on a POSIXct object, we get either an error message or less useful information. For example:
present_time <- Sys.time()
present_time
[1] "2022-02-17 21:20:39 CST"
#Using the unclass function
unclass(present_time)
[1] 1645154439
# Accessing only the second values: present_time$sec (gave an error message)
# Accessing only the values in hour: present_time$hour (gave an error message)
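Components can still be pulled out of a POSIXct value by formatting it or by converting it to POSIXlt first. The sketch below uses a fixed timestamp with an assumed UTC time zone so that the results do not depend on the machine's settings.

```r
fixed_time <- as.POSIXct("2022-02-17 21:20:39", tz = "UTC")  # assumed time zone

hour_text   <- format(fixed_time, "%H")     # hour as a character string
hour_number <- as.POSIXlt(fixed_time)$hour  # hour as a number
sec_number  <- as.POSIXlt(fixed_time)$sec   # seconds as a number

hour_text
hour_number
sec_number
```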
The strptime() function converts dates and times written as character strings in different formats into the POSIXlt class. Example:
string_date <- c("May 5, 2021 14:04", "August 01, 2020 11:51")
new_date <- strptime(string_date, "%B %d, %Y %H:%M")
new_date
[1] "2021-05-05 14:04:00 CDT" "2020-08-01 11:51:00 CDT"
class(new_date)
[1] "POSIXlt" "POSIXt"
Operations on Dates and Times: We can use mathematical operations, i.e., addition or subtraction, or make comparisons, i.e., ==, <=, etc., on dates and times.
#Number of days between the two dates above
total_days <- new_date[1] - new_date[2]
total_days
Time difference of 277.0924 days
#Creating new dates
dat1 <- as.Date("2021-05-05")
dat1 <- as.POSIXlt(dat1)
dat2 <- strptime("26 July 2020 5:26:26", "%d %b %Y %H:%M:%S")
# Subtract
dat1 - dat2
Time difference of 282.565 days
These operators are helpful because they keep track of leap years, daylight saving time, and even time zones for us.
a <- as.Date("2020-09-16")
b <- as.Date("2021-05-02")
a-b
Time difference of -228 days
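The leap-year handling can be seen with a small check around the end of February. The dates are chosen only for illustration.

```r
gap_2020 <- as.Date("2020-03-01") - as.Date("2020-02-28")   # 2020 is a leap year
gap_2021 <- as.Date("2021-03-01") - as.Date("2021-02-28")   # 2021 is not

gap_2020   # 2 days, because 2020-02-29 exists
gap_2021   # 1 day
```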
#Calculating Time Difference between These Two Dates
a <- as.POSIXct("2020-09-16 01:00:00")
b <- as.POSIXct("2021-05-02 06:00:00", tz = "GMT")
b-a
Time difference of 228 days
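In the example above the first time uses the machine's local time zone, so the result depends on where the code is run. A sketch that removes this dependence, assuming both times are meant to be in UTC, and that uses difftime() to choose the unit explicitly:

```r
a_utc <- as.POSIXct("2020-09-16 01:00:00", tz = "UTC")
b_utc <- as.POSIXct("2021-05-02 06:00:00", tz = "UTC")

gap_hours <- difftime(b_utc, a_utc, units = "hours")
gap_hours   # 228 days and 5 hours = 5477 hours
```

With both time zones stated, the five-hour offset that was absorbed by the local time zone in the earlier result becomes visible.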