Functions

A function is a block of organized, reusable code that performs a specific task.

Functions allow you to break down your program into smaller, modular components, making it easier to understand, maintain, and debug.

Functions help in promoting code reusability and modularity.

They make your code more readable and easier to maintain by dividing it into logical sections.


Function Syntax

A function is defined with the function keyword, followed by its parameters, and is given a name by assigning it to a variable with <-.

Parameters are the inputs that the function accepts, and they are enclosed within parentheses.

The body of the function consists of the code that defines its behavior, enclosed within curly braces {}.

Functions in R can return a value explicitly with return(), although it’s not always necessary: without it, the value of the last evaluated expression is returned.

add_numbers <- function(x,y) {
  return(x + y)
}

#Calling the function
result <- add_numbers(5, 3)
print(result)
## [1] 8
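
To illustrate the point that return() is optional, here is a minimal sketch of the same function without it. The name add_numbers_implicit is made up for this illustration.

```r
# Hypothetical variant of add_numbers() without an explicit return():
# the value of the last evaluated expression is what the caller receives.
add_numbers_implicit <- function(x, y) {
  x + y
}

result_implicit <- add_numbers_implicit(5, 3)
print(result_implicit)
## [1] 8
```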

When to Use Functions

If we only have one dataset to analyze/plot, writing scripts is easy and simple.

If we have twelve files to analyze/plot and may have more in the future, writing scripts can become very complex. Writing functions allows us to repeat several operations with a single command.


Example: Fahrenheit to Celsius

Converts temperatures from Fahrenheit to Celsius:

fahrenheit_to_celsius <- function(temp_F){
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}

Function Name: fahrenheit_to_celsius
Argument(s): temp_F
Body: statements that are executed when the function runs


Call a Function

fahrenheit_to_celsius(32)
## [1] 0
one_tmp <- 212
fahrenheit_to_celsius(one_tmp)
## [1] 100

If you define your own functions using a vectorized operation/function, your newly defined function will also be vectorized.
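
As a small sketch of what vectorized means here, the same function can be called on a whole vector of temperatures. The input values and the names temps_F and temps_C are chosen only for illustration.

```r
# Same definition as above, repeated so the example runs on its own.
fahrenheit_to_celsius <- function(temp_F){
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}

# The arithmetic operators work element-wise, so a vector of inputs
# yields a vector of outputs without writing a loop.
temps_F <- c(32, 212, -40)
temps_C <- fahrenheit_to_celsius(temps_F)
print(temps_C)
## [1]   0 100 -40
```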


Celsius into Kelvin

celsius_to_kelvin <- function(temp_C){
  temp_K <- temp_C + 273.15
  return(temp_K)
}

celsius_to_kelvin(0)
## [1] 273.15

Composing Functions: Fahrenheit to Kelvin

We could use the formula. But we can also compose the two functions we have already created:

fahrenheit_to_kelvin <- function(temp_F){
  temp_C <- fahrenheit_to_celsius(temp_F)
  temp_K <- celsius_to_kelvin(temp_C)
  return(temp_K)
}

fahrenheit_to_kelvin(32.0)
## [1] 273.15
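
For comparison, here is a sketch of the two other routes mentioned above: nesting the calls directly, and writing the formula out in a single function. The name fahrenheit_to_kelvin_direct is hypothetical and exists only for this illustration.

```r
# Definitions repeated so the example runs on its own.
fahrenheit_to_celsius <- function(temp_F){
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}
celsius_to_kelvin <- function(temp_C){
  temp_K <- temp_C + 273.15
  return(temp_K)
}

# Nested call: the result of the inner function is the input of the outer one.
print(celsius_to_kelvin(fahrenheit_to_celsius(32.0)))
## [1] 273.15

# Hypothetical one-step version that applies the combined formula directly.
fahrenheit_to_kelvin_direct <- function(temp_F){
  (temp_F - 32) * 5 / 9 + 273.15
}
print(fahrenheit_to_kelvin_direct(212))
## [1] 373.15
```

Composing the two existing functions avoids repeating the conversion formulas in more than one place.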

Composing Functions

The previous example shows how large software programs are built:

* Define basic operations.
* Combine them into larger chunks.
* Real-world functions are longer than our example, but shouldn’t be longer than a few dozen lines, or the next person who reads the code won’t be able to understand it.


Naming Conventions

We name R objects with nouns.

Function names, in contrast, are usually verbs:

* convert_temperature()
* get_colors()


Variable Scope

A variable that is visible only within the function body is said to be local to that function.

Local variables disappear after a function call.

Variables created outside functions are global and are available within functions as well.

A global variable will not be changed if it is passed to a function as an argument.
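
The scope rules above can be sketched with a short example. All names here (offset, add_offset, doubled, input) are invented for the illustration.

```r
offset <- 10              # global variable

add_offset <- function(value) {
  doubled <- value * 2    # local: exists only while the function runs
  value <- value + 1      # changes the function's own copy of the argument
  doubled + offset        # a global variable can be read inside the function
}

input <- 5
result <- add_offset(input)
print(result)
## [1] 20
print(input)              # the global passed as an argument is unchanged
## [1] 5
print(exists("doubled"))  # the local variable is gone after the call
## [1] FALSE
```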


References

Software carpentry:


Testing, Error Handling


Testing

Once functions are defined, we need to start testing that those functions are working correctly.

Testing functions is a crucial aspect of software development to ensure that they produce the expected results under various conditions.

Proper testing helps identify bugs, errors, and edge cases, ensuring the reliability and robustness of your code.

Example: Define the function to normalize data around a specified midpoint

Context: you provide some data and a midpoint; the resulting normalized data will be the original data adjusted so that its mean is centered around the specified midpoint.

We will test the following function:

normalize <- function(data, midpoint){
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}

Test Case 1: Normalize data around midpoint 0

data1 <- c(1,2,3,4,5)
midpoint1 <- 0
result1 <- normalize(data1, midpoint1)
print(result1)
## [1] -2 -1  0  1  2

We have a dataset data1 with values [1, 2, 3, 4, 5] and we want to normalize it around a midpoint of 0.

The normalized data would be such that the mean of the new data is 0. This means that the values will be adjusted accordingly to center the distribution around 0.

Test Case 2: Normalize data around midpoint 10

data2 <- c(10,20,30,40,50)
midpoint2 <- 10
result2 <- normalize(data2, midpoint2)
print(result2)
## [1] -10   0  10  20  30

We have a dataset data2 with values [10, 20, 30, 40, 50] and we want to normalize it around a midpoint of 10.

The normalized data would be such that the mean of the new data is 10. This involves adjusting the values to center the distribution around 10.


More Tips for Testing

Write multiple test cases.

Write test cases for rare cases.

Put the test cases into a separate function.

Sometimes, when testing whether two numeric values are equal, a very small difference is detected due to rounding at very low decimal places. all.equal() can be used in those cases.
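
The tips above could be combined roughly as follows. This is only a sketch: test_normalize is a hypothetical helper, and the chosen cases are examples rather than a complete test suite.

```r
# normalize() as defined above, repeated so the example runs on its own.
normalize <- function(data, midpoint){
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}

# Hypothetical test function: stops with an error if any check fails.
test_normalize <- function() {
  # Typical case: the mean of the result equals the midpoint.
  stopifnot(isTRUE(all.equal(mean(normalize(c(1, 2, 3, 4, 5), 0)), 0)))
  # Rare case: a single value is mapped onto the midpoint itself.
  stopifnot(isTRUE(all.equal(normalize(7, 10), 10)))
  # Decimal values: compare with a tolerance instead of ==.
  stopifnot(isTRUE(all.equal(mean(normalize(c(0.1, 0.2, 0.3), 0.3)), 0.3)))
  invisible(TRUE)
}

test_normalize()

# Why all.equal(): rounding makes an exact comparison unreliable.
print(0.1 + 0.2 == 0.3)
## [1] FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))
## [1] TRUE
```

all.equal() returns a description of the difference rather than FALSE when the values differ, which is why it is wrapped in isTRUE() here.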


Error Handling

What if we have missing data (NA values) in the data argument we provide to normalize()?

data3 <- c(10,20,30,NA,50)
normalize(data3, 10)
## [1] NA NA NA NA NA

We may actually wish to not consider NA values in our normalize() function.

normalize <- function(data, midpoint) {  
    new_data <- (data - mean(data, na.rm = TRUE)) + midpoint
    return(new_data)
}

data3 <- c(10,20,30,NA,50)
normalize(data3, 10)
## [1] -7.5  2.5 12.5   NA 32.5

Input with the wrong class raises an error:

normalize(as.character(data3), 10)
Error in data - mean(data, na.rm = TRUE) :
non-numeric argument to binary operator

In addition: Warning message:
In mean.default(data, na.rm = TRUE) :
argument is not numeric or logical: returning NA

You may use the stopifnot(), warning(), and stop() functions to handle such cases.
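
One possible way to use these three functions is sketched below. The name normalize_checked, the particular checks, and the message texts are assumptions made for this illustration, not a prescribed design.

```r
# Hypothetical variant of normalize() with input checks.
normalize_checked <- function(data, midpoint) {
  # stopifnot() raises an error when any condition is not TRUE.
  stopifnot(is.numeric(data), is.numeric(midpoint), length(midpoint) == 1)
  # stop() raises an error with a custom message.
  if (all(is.na(data))) {
    stop("data contains no non-missing values")
  }
  # warning() reports a problem but lets the function continue.
  if (anyNA(data)) {
    warning("NA values in data are ignored when computing the mean")
  }
  new_data <- (data - mean(data, na.rm = TRUE)) + midpoint
  return(new_data)
}

# Wrong class: the error is raised by the check at the top of the function.
# try() captures it so that the rest of the script keeps running.
outcome <- try(normalize_checked(c("10", "20"), 10), silent = TRUE)
print(inherits(outcome, "try-error"))
## [1] TRUE
```

Checking the inputs at the top of the function makes the failure appear where the mistake was made, rather than deeper inside the computation.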


Defining Defaults and Documentation


Define Defaults

So far, we have passed arguments to functions in three ways:

Directly: dim(dat)
By name: read.csv(file = "data/inflammation-01.csv", header = FALSE)
Without naming them (order matters): dat <- read.csv("data/inflammation-01.csv", FALSE)

Let’s re-define our function:

normalize <- function(data, midpoint = 0) {
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}

The second argument is now written midpoint = 0, so midpoint defaults to 0 when it is not supplied.


Call Function with Defaults

normalize(data1, 0)
## [1] -2 -1  0  1  2
normalize(data1)
## [1] -2 -1  0  1  2
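
A default is only a fallback. As a brief sketch, the calls below override it by naming the argument; the midpoint of 10 is an arbitrary example value.

```r
# Definition and data repeated so the example runs on its own.
normalize <- function(data, midpoint = 0) {
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}
data1 <- c(1, 2, 3, 4, 5)

# Naming the argument overrides the default.
print(normalize(data1, midpoint = 10))
## [1]  8  9 10 11 12

# Named arguments can be given in any order.
print(normalize(midpoint = 10, data = data1))
## [1]  8  9 10 11 12
```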

Documentation

A common way to add documentation is to write comments directly in your code.
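
For example, normalize() could be documented with comments along these lines. The layout of the comments is just one possible convention.

```r
normalize <- function(data, midpoint = 0) {
  # Shift data so that its mean equals midpoint.
  # data:     numeric vector to be shifted
  # midpoint: value the result should be centered on (default 0)
  # returns:  numeric vector of the same length as data
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}
```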


Formal Documentation

Formal documentation for R functions is written in separate .Rd files using a markup language similar to LaTeX.

You see the result of this documentation when you look at the help file for a given function, e.g. ?read.csv.


Control Statements

Control statements are powerful constructs in programming that allow you to control the flow of execution in your code.

In R, there are several types of control statements.


Types of Control Statements

1. Conditional Statements:


2. Looping Statements:


3. Control Flow Statements:


For Loop

Using a for loop, we will be able to perform a certain operation or function over each element in a vector, list, etc.

for (n in 1:4){
  print(n^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16

Another example iterating through a vector using indices i.

my.vec <- c(1,3,34,22,16)
for (i in 1:length(my.vec)) {
  print(my.vec[i])
}
## [1] 1
## [1] 3
## [1] 34
## [1] 22
## [1] 16

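However, if my.vec is an empty vector, 1:length(my.vec) is 1:0, which counts down from 1 to 0, so the loop body still runs twice. Looping over seq(my.vec) avoids this.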

my.vec <- NULL
for (i in 1:length(my.vec)) {
  print(my.vec[i])
}
## NULL
## NULL
for (i in seq(my.vec)){
  print(my.vec[i])
}

seq() returns a sequence of values; given a vector such as my.vec, it returns the sequence of its indices.

my.vec <- c(1,3,34,22,16)
seq(my.vec)
## [1] 1 2 3 4 5
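
One caveat is worth a sketch: when the vector holds a single number, seq() treats that number as an upper bound instead of returning the index 1. seq_along() always returns the indices. The names empty.vec and single.vec are made up for this example.

```r
empty.vec <- NULL
single.vec <- c(7)   # illustrative one-element vector

# seq_along() always returns the indices of its argument.
print(seq_along(empty.vec))
## integer(0)
print(seq_along(single.vec))
## [1] 1

# seq() treats a single number as an upper bound instead.
print(seq(single.vec))
## [1] 1 2 3 4 5 6 7

count <- 0
for (i in seq_along(empty.vec)) {
  count <- count + 1   # never runs for an empty vector
}
print(count)
## [1] 0
```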

Using get() within a for loop to iterate over a set of objects.

get() takes a string as its input argument and returns the object of that name.

lm() is used to fit linear models, including multivariate ones. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although aov may provide a more convenient interface for these).

u <- c(1,1,2,2,3,4)
dim(u) <- c(3,2)
#or
u <- matrix(c(1,1,2,2,3,4), nrow = 3, ncol = 2)

v <- c(8,15,12,10,20,2)
dim(v) <- c(3,2)
#or
v <- matrix(c(8,15,12,10,20,2), nrow = 3, ncol = 2)

for (m in c("u", "v")){
  z <- get(m) # z is the matrix whose name is stored in m: first u, then v
  # Left side of ~: the dependent variable; right side of ~: the independent variable(s)
  print(lm(z[,2]~z[,1]))
}
## 
## Call:
## lm(formula = z[, 2] ~ z[, 1])
## 
## Coefficients:
## (Intercept)       z[, 1]  
##         1.0          1.5  
## 
## 
## Call:
## lm(formula = z[, 2] ~ z[, 1])
## 
## Coefficients:
## (Intercept)       z[, 1]  
##      -3.838        1.243

The first iteration uses matrix u; the second iteration uses matrix v.
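
As an alternative sketch, the same loop can be written without get() by keeping the matrices in a named list. The names matrices, slopes, and fit are assumptions made for this illustration.

```r
u <- matrix(c(1, 1, 2, 2, 3, 4), nrow = 3, ncol = 2)
v <- matrix(c(8, 15, 12, 10, 20, 2), nrow = 3, ncol = 2)

# Hypothetical alternative to get(): store the matrices in a named list
# and look each one up by its name.
matrices <- list(u = u, v = v)
slopes <- numeric(0)
for (name in names(matrices)) {
  z <- matrices[[name]]
  fit <- lm(z[, 2] ~ z[, 1])
  slopes[name] <- coef(fit)[2]   # the second coefficient is the slope
}
print(round(slopes, 3))
```

The slopes collected this way should match the coefficients printed by the get() version above.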


If Else Statement

The if else statement allows you to make a decision between different options.

Be careful about indentation.

The else if branch may not be needed in some situations.

my.score <- 83
if(my.score < 60){
  print("fail")
} else if (my.score > 100) {
  print("wrong input")
} else {
  print("pass")
}
## [1] "pass"

Some basic operations/functions commonly used:

Check eligibility for a loan based on age and income

age <- 25
income <- 50000

if (age >= 18 & income >= 40000){
  print("Congratulations! You are eligible for a loan.")
}else{
  print("Sorry, you are not eligible for a loan at this time.")
}
## [1] "Congratulations! You are eligible for a loan."

Check conditions using if-else statement

x <- 10
y <- 20
z <- NA

if (x == 10 && !is.na(z)){
  print("Condition 1: x equals 10 and z is not NA")
} else if (y >= 20) {
  print("Condition 2: y is greater than or equal to 20")
} else {
  print("None of these conditions is satisfied")
}
## [1] "Condition 2: y is greater than or equal to 20"
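
The !is.na(z) check in the example above matters because a condition that evaluates to NA makes if() fail. A minimal sketch, with the threshold 5 chosen arbitrarily:

```r
z <- NA

# A comparison involving NA yields NA, not TRUE or FALSE.
print(z > 5)
## [1] NA

# if() cannot decide on NA, so this fails; try() captures the error.
attempt <- try(if (z > 5) print("big"), silent = TRUE)
print(inherits(attempt, "try-error"))
## [1] TRUE

# Checking is.na() first avoids the problem: && stops at the first FALSE,
# so z > 5 is never evaluated here.
if (!is.na(z) && z > 5) {
  print("big")
} else {
  print("z is missing or not greater than 5")
}
## [1] "z is missing or not greater than 5"
```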

Small Assignment on Functions with If Else

I have the following pt.types dataframe of breast cancer patients, which shows whether each patient is positive or negative for specific receptor types; these determine the type of the patient.

I want a function “get.condition()” that uses an if else condition.

If I call get.condition() with a Pt.ID and the pt.types dataframe, it should return the type of that patient.

pt.types<- data.frame(Pt.ID = c("BCPt1", "BCPt2", "BCPt3", "BCPt4", "BCPt5"),
                      ER = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                      PR = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                      HER2 = c(FALSE, FALSE, TRUE, FALSE, TRUE))

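get.condition <- function(ID, dataframe){
  # Keep only the row of the requested patient
  patient <- subset(dataframe, Pt.ID == ID)
  # Count the receptors (all columns except Pt.ID) that are negative
  neg_count <- sum(patient[,-1] == FALSE)
  
  if(neg_count == 3){
    return("the patient is triple negative")
  } else if (neg_count == 2){
    return("the patient is double negative")
  } else if (neg_count == 1){
    return("the patient is single negative")
  } else {
    # neg_count is 0: no row matched the ID (a patient positive for
    # all three receptors would also end up here)
    return("Invalid ID")
  }
} 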

get.condition("BCPt5", pt.types)
## [1] "the patient is double negative"
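
In the spirit of the testing tips earlier, the function can be tried on every patient and on an ID that does not exist. This is a sketch only; the ID "BCPt9" and the name all.conditions are invented for the example.

```r
# Data and function repeated from above so the example runs on its own.
pt.types <- data.frame(Pt.ID = c("BCPt1", "BCPt2", "BCPt3", "BCPt4", "BCPt5"),
                       ER = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                       PR = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                       HER2 = c(FALSE, FALSE, TRUE, FALSE, TRUE))

get.condition <- function(ID, dataframe){
  patient <- subset(dataframe, Pt.ID == ID)
  neg_count <- sum(patient[,-1] == FALSE)
  if(neg_count == 3){
    return("the patient is triple negative")
  } else if (neg_count == 2){
    return("the patient is double negative")
  } else if (neg_count == 1){
    return("the patient is single negative")
  } else {
    return("Invalid ID")
  }
}

# Call the function once per patient; sapply() collects the results.
all.conditions <- sapply(pt.types$Pt.ID, get.condition, dataframe = pt.types)
print(unname(all.conditions))

# "BCPt9" is a made-up ID that is not in the dataframe.
print(get.condition("BCPt9", pt.types))
## [1] "Invalid ID"
```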

If Else & For Loop

The two statements can be combined in one task.

scores <- c(67,65,88,54)
for (score in scores){
  print(score)
  if(score > 60){
    print("pass")
  } else {
    print("fail")
  }
}
## [1] 67
## [1] "pass"
## [1] 65
## [1] "pass"
## [1] 88
## [1] "pass"
## [1] 54
## [1] "fail"

Manipulate strings for better output:

for (score in scores){
   if(score > 60){
     output.string <- paste(score, "pass", "\n")
   } else {
     output.string <- paste(score, "fail", "\n")
   }
   cat(output.string)
}
## 67 pass 
## 65 pass 
## 88 pass 
## 54 fail
#or

for (score in scores){
   if(score > 60){
     cat(score, "pass", "\n")
   } else {
     cat(score, "fail", "\n")
   }
}
## 67 pass 
## 65 pass 
## 88 pass 
## 54 fail

Small Assignment on the pt.types Dataframe

I would like to add another column, “Pt.type”, to the pt.types dataframe.

The values of “Pt.type” should be

Use if else and a for loop for this.

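# Create the new column first so that each row can be filled in turn
pt.types$Pt.type <- NA_character_

for (i in 1:nrow(pt.types)){
  # Count the negative receptors (columns 2 to 4) of patient i
  neg_count <- sum(pt.types[i, 2:4] == FALSE)
  
  if (neg_count == 3){
    pt.types$Pt.type[i] <- "triple"
  } else if (neg_count == 2){
    pt.types$Pt.type[i] <- "double"
  } else if (neg_count == 1){
    pt.types$Pt.type[i] <- "single"
  }else{
    pt.types$Pt.type[i] <- "Invalid"
  }
}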
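
The assignment asks for if else and a for loop. For comparison only, here is a sketch of a vectorized route to the same column, which can also serve as a check on the loop's result. The names neg_count and type.labels are assumptions for this illustration.

```r
# Data repeated from above so the example runs on its own.
pt.types <- data.frame(Pt.ID = c("BCPt1", "BCPt2", "BCPt3", "BCPt4", "BCPt5"),
                       ER = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                       PR = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                       HER2 = c(FALSE, FALSE, TRUE, FALSE, TRUE))

# Number of negative receptors per patient, computed for all rows at once.
neg_count <- rowSums(pt.types[, c("ER", "PR", "HER2")] == FALSE)

# Look up the label by position:
# 0 negatives -> "Invalid", 1 -> "single", 2 -> "double", 3 -> "triple".
type.labels <- c("Invalid", "single", "double", "triple")
pt.types$Pt.type <- type.labels[neg_count + 1]
print(pt.types$Pt.type)
## [1] "single" "triple" "single" "single" "double"
```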