Functions

A function is a block of organized, reusable code that performs a specific task.

Functions allow you to break down your program into smaller, modular components, making it easier to understand, maintain, and debug.

Functions promote code reusability and modularity.

They make your code more readable and easier to maintain by dividing it into logical sections.

Function Syntax

A function is created with the function keyword, followed by its parameters in parentheses and its body. It is assigned to a name with <-.

Parameters are the inputs that the function accepts, and they are enclosed within parentheses.

The body of the function consists of the code that defines its behavior, enclosed within curly braces {}.

Functions in R can return a value with return(), although this is not always necessary: a function automatically returns the value of the last expression it evaluates.

add_numbers <- function(x,y) {
  return(x + y)
}

#Calling the function
result <- add_numbers(5, 3)
print(result)

## [1] 8
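Because an R function returns the value of its last evaluated expression, the same function can be written without an explicit return(). This is a minimal sketch for comparison. The name add_numbers_implicit is hypothetical.

```r
# Explicit return(), as above
add_numbers <- function(x, y) {
  return(x + y)
}

# Implicit return: the value of the last expression (x + y) is returned
add_numbers_implicit <- function(x, y) {
  x + y
}

add_numbers(5, 3)           # 8
add_numbers_implicit(5, 3)  # 8
```

return() is still useful when you want to leave a function early, for example inside an if statement.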

When to Use Functions

If we only have one dataset to analyze or plot, writing a script is simple.

If we have twelve files to analyze or plot, and may have more in the future, copying and editing the script for each file quickly becomes tedious and error-prone. Writing functions allows us to repeat several operations with a single command.

Example: Fahrenheit to Celsius

The following function converts temperatures from Fahrenheit to Celsius:

fahrenheit_to_celsius <- function(temp_F){
  temp_C <- (temp_F - 32) * 5 / 9
  return(temp_C)
}

Function name: fahrenheit_to_celsius
Argument(s): temp_F
Body: the statements that are executed when the function runs

Call a Function

fahrenheit_to_celsius(32)

## [1] 0

one_tmp <- 212
fahrenheit_to_celsius(one_tmp)

## [1] 100

If you build a function only from vectorized operations or functions, your new function will also be vectorized: given a vector, it returns a vector of results.
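For example, fahrenheit_to_celsius() uses only arithmetic, which R applies element by element, so one call can convert a whole vector. This is a small sketch. The vector temps_F is an illustrative input, not data from the lesson.

```r
fahrenheit_to_celsius <- function(temp_F){
  temp_C <- (temp_F - 32) * 5 / 9  # arithmetic is vectorized in R
  return(temp_C)
}

# One call converts every element of the input vector
temps_F <- c(32, 212, -40)
fahrenheit_to_celsius(temps_F)  # 0 100 -40
```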

Celsius into Kelvin

celsius_to_kelvin <- function(temp_C){
  temp_K <- temp_C + 273.15
  return(temp_K)
}

celsius_to_kelvin(0)

## [1] 273.15

Composing Functions: Fahrenheit to Kelvin

We could write out the formula directly, but we can also compose the two functions we have already created:

fahrenheit_to_kelvin <- function(temp_F){
  temp_C <- fahrenheit_to_celsius(temp_F)
  temp_K <- celsius_to_kelvin(temp_C)
  return(temp_K)
}

fahrenheit_to_kelvin(32.0)

## [1] 273.15
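The same composition can also be written as a single nested call, without the intermediate variables. This is a sketch that redefines the two helper functions so the block runs on its own.

```r
fahrenheit_to_celsius <- function(temp_F){
  return((temp_F - 32) * 5 / 9)
}

celsius_to_kelvin <- function(temp_C){
  return(temp_C + 273.15)
}

# The inner call runs first; its result becomes the argument of the outer call
celsius_to_kelvin(fahrenheit_to_celsius(32.0))   # 273.15
celsius_to_kelvin(fahrenheit_to_celsius(212.0))  # 373.15
```

The version with temp_C and temp_K is easier to read and debug. The nested version is more compact.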

Composing Functions

The previous example shows how large software programs are built:
* Define basic operations.
* Combine them into larger chunks.
* Real-world functions are longer than our example, but they shouldn't be longer than a few dozen lines, or the next person who reads the code won't be able to understand it.

Naming Conventions

We name R objects (data) with nouns.

Function names, by contrast, are usually verbs:
* convert_temperature()
* get_colors()

Variable Scope

A variable that is visible only within the function body is said to be local to that function.

In the fahrenheit_to_celsius() function, temp_C is a local variable.

Local variables disappear after the function call finishes.

Variables created outside functions are global and are available within functions as well.

one_tmp <- 212

A global variable passed to a function as an argument is not changed by the function. If the function modifies the argument, it changes only its own local copy.
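Both rules can be checked directly. This is a minimal sketch. The helper double_it() is hypothetical and exists only to show that reassigning an argument inside a function leaves the global variable untouched.

```r
fahrenheit_to_celsius <- function(temp_F){
  temp_C <- (temp_F - 32) * 5 / 9  # temp_C is local to the function
  return(temp_C)
}

# Hypothetical helper for this example: it reassigns its own argument
double_it <- function(value){
  value <- value * 2  # changes only the function's local copy
  return(value)
}

one_tmp <- 212                  # global variable
fahrenheit_to_celsius(one_tmp)  # 100
exists("temp_C")                # FALSE: temp_C disappeared after the call

doubled <- double_it(one_tmp)   # 424
one_tmp                         # still 212: the global variable is unchanged
```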

References

Software Carpentry:

http://swcarpentry.github.io/r-novice-inflammation/02-func-R/index.html

Testing and Error Handling

Testing

Once functions are defined, we need to test that they work correctly.

Testing functions is a crucial aspect of software development to ensure that they produce the expected results under various conditions.

Proper testing helps identify bugs, errors, and edge cases, ensuring the reliability and robustness of your code.

Example: Define the function to normalize data around a specified midpoint

Context: you provide some data and a midpoint. The function shifts every value by the same amount so that the mean of the result equals the midpoint.

We will test the following function:

normalize <- function(data, midpoint){
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}

Test Case 1: Normalize data around midpoint 0

data1 <- c(1,2,3,4,5)
midpoint1 <- 0
result1 <- normalize(data1, midpoint1)
print(result1)

## [1] -2 -1  0  1  2

We have a dataset data1 with values [1, 2, 3, 4, 5] and we want to normalize it around a midpoint of 0.

After normalization, the mean of the new data is 0. Every value is shifted by the same amount (minus the original mean of 3), which centers the distribution on 0.

Test Case 2: Normalize data around midpoint 10

data2 <- c(10,20,30,40,50)
midpoint2 <- 10
result2 <- normalize(data2, midpoint2)
print(result2)

## [1] -10   0  10  20  30

We have a dataset data2 with values [10, 20, 30, 40, 50] and we want to normalize it around a midpoint of 10.

After normalization, the mean of the new data is 10. Every value is shifted by -20 (the midpoint 10 minus the original mean 30), which centers the distribution on 10.
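Instead of checking the printed results by eye, you can check the property described above (the mean of the result equals the midpoint) automatically with stopifnot(). That function stops with an error if any condition is not TRUE. This is a minimal sketch. The spread check is an extra property chosen for illustration.

```r
normalize <- function(data, midpoint){
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}

result1 <- normalize(c(1,2,3,4,5), 0)
result2 <- normalize(c(10,20,30,40,50), 10)

# Each check stops with an error if it is not TRUE
stopifnot(mean(result1) == 0)
stopifnot(mean(result2) == 10)
# Shifting the data should not change its spread;
# all.equal() allows for tiny rounding differences
stopifnot(isTRUE(all.equal(sd(result2), sd(c(10,20,30,40,50)))))
```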

More Tips for Testing

Write multiple test cases.

Write test cases for rare (edge) cases, for example:

Passing a logical vector to a function that expects numeric input
Passing a vector of length one when the input is usually longer

Put your test cases into a separate function, so they can be rerun easily after every change.
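Below is a sketch of such a function for normalize(). It covers the typical cases and the two rare cases listed above. The name test_normalize() is a hypothetical choice, not a required convention.

```r
normalize <- function(data, midpoint){
  new_data <- (data - mean(data)) + midpoint
  return(new_data)
}

# Hypothetical test function: bundles the test cases so they can be rerun
# after every change to normalize(). Returns TRUE (invisibly) if all pass.
test_normalize <- function(){
  # Typical cases
  stopifnot(isTRUE(all.equal(normalize(c(1,2,3,4,5), 0), c(-2,-1,0,1,2))))
  stopifnot(isTRUE(all.equal(normalize(c(10,20,30,40,50), 10), c(-10,0,10,20,30))))
  # Rare case: a vector of length one is shifted exactly onto the midpoint
  stopifnot(isTRUE(all.equal(normalize(5, 2), 2)))
  # Rare case: logical input is silently treated as 1 (TRUE) and 0 (FALSE)
  stopifnot(isTRUE(all.equal(normalize(c(TRUE, FALSE), 0), c(0.5, -0.5))))
  invisible(TRUE)
}

test_normalize()
```

The logical-input case passes, which shows that normalize() quietly accepts input it probably should reject. A test like this prompts you to decide whether the function needs an input check.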

When testing whether two numeric values are equal, == can detect a very small difference caused by floating-point rounding at very low decimal places. In those cases, use all.equal(), which compares values up to a small tolerance.
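The classic example is 0.1 + 0.2, which is not stored exactly as 0.3. This sketch compares the two approaches. Note that all.equal() returns TRUE or a character description of the difference, never FALSE, so it is usually wrapped in isTRUE().

```r
x <- 0.1 + 0.2
y <- 0.3

x == y                   # FALSE: the values differ in the last binary digits
all.equal(x, y)          # TRUE: equal within the default tolerance (about 1.5e-8)
isTRUE(all.equal(x, y))  # TRUE; use isTRUE() inside if() or stopifnot()

# When values really differ, all.equal() returns a character string
# describing the difference instead of FALSE
all.equal(1, 1.1)
```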

Error Handling

What if we have missing data (NA values) in the data argument we provide to normalize()?

data3 <- c(10,20,30,NA,50)
normalize(data3, 10)

## [1] NA NA NA NA NA

We may prefer that normalize() ignore NA values when computing the mean:

normalize <- function(data, midpoint) {
  new_data <- (data - mean(data, na.rm = TRUE)) + midpoint
  return(new_data)
}

data3 <- c(10,20,30,NA,50)
normalize(data3, 10)

## [1] -7.5  2.5 12.5   NA 32.5

What if the input has the wrong class?

normalize(as.character(data3), 10)
Error in data - mean(data, na.rm = TRUE) :
  non-numeric argument to binary operator
In addition: Warning message:
In mean.default(data, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

You can use the stopifnot(), warning(), and stop() functions to handle such cases.
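Here is a minimal sketch of how the three functions might be combined in normalize(). The specific checks and message wording are illustrative choices, not a fixed convention: stop() halts on non-numeric data, stopifnot() checks midpoint, and warning() reports that NA values are being ignored.

```r
normalize <- function(data, midpoint) {
  # stop(): halt with an informative error if data is not numeric
  if (!is.numeric(data)) {
    stop("`data` must be numeric, not ", class(data)[1])
  }
  # stopifnot(): a compact check that midpoint is a single number
  stopifnot(is.numeric(midpoint), length(midpoint) == 1)
  # warning(): continue, but tell the user that NA values are ignored
  if (anyNA(data)) {
    warning("`data` contains NA values; they are ignored when computing the mean")
  }
  new_data <- (data - mean(data, na.rm = TRUE)) + midpoint
  return(new_data)
}

data3 <- c(10,20,30,NA,50)
normalize(data3, 10)  # -7.5 2.5 12.5 NA 32.5, plus a warning about the NA
# normalize(as.character(data3), 10)
#   stops with the error: `data` must be numeric, not character
```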

Defining Defaults and Documentation

Define Defaults

So far, we have passed arguments to functions in three ways:

Directly: dim(dat)
By name: read.csv(file = "data/inflammation-01.csv", header = FALSE)
Without naming them (order matters): dat <- read.csv("data/inflammation-01.csv", FALSE)

Let’s redefine our function so that midpoint has a default value:

normalize <- function(data, midpoint = 0) {
  new_data <- (data - mean(data, na.rm = TRUE)) + midpoint
  return(new_data)
}

The second parameter is now written midpoint = 0, which gives it a default value.

If we call normalize() with two arguments, it works as it did before.
If we call normalize() with just one argument, midpoint is automatically assigned its default value of 0.

Call Function with Defaults

normalize(data1, 0)

## [1] -2 -1  0  1  2

normalize(data1)

## [1] -2 -1  0  1  2
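The ways of passing arguments listed earlier also apply to normalize(). When every argument is named, the order no longer matters. This short sketch redefines the function and data2 so it runs on its own.

```r
normalize <- function(data, midpoint = 0) {
  new_data <- (data - mean(data, na.rm = TRUE)) + midpoint
  return(new_data)
}
data2 <- c(10,20,30,40,50)

normalize(data2, 10)                    # by position: -10 0 10 20 30
normalize(midpoint = 10, data = data2)  # by name, in any order: same result
normalize(data2)                        # default midpoint = 0: -20 -10 0 10 20
```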

Documentation

A common way to add documentation is to write comments directly in your code.
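One possible layout for such comments is sketched below. It places a short header above the function that states what it does, its arguments, its return value, and an example. The exact layout is a suggestion, not a required format.

```r
# normalize: shift data so that its mean equals a chosen midpoint.
#
# Arguments:
#   data      a numeric vector; NA values are ignored when computing the mean
#   midpoint  the value the mean of the result should equal (default 0)
#
# Returns:
#   a numeric vector the same length as data
#
# Example:
#   normalize(c(1, 2, 3, 4, 5))  # -2 -1 0 1 2
normalize <- function(data, midpoint = 0) {
  new_data <- (data - mean(data, na.rm = TRUE)) + midpoint
  return(new_data)
}
```

The roxygen2 package builds on the same idea. It uses comment lines that start with #' and tags such as @param and @return, and it can generate the .Rd help files used for formal documentation from them.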

Formal Documentation

Formal documentation for R functions is written in separate .Rd files using a markup language similar to LaTeX.

You see the result of this documentation when you look at the help file for a given function, e.g. ?read.csv.

Control Statements

Control statements are powerful constructs that allow you to control the flow of execution in your code.

In R, there are several types of control statements that let you:

make decisions
iterate over sequences
perform different actions based on conditions

Types of Control Statements

1. Conditional Statements:

if-else: Used to make decisions based on conditions
switch: Allows you to select one of several code blocks to execute

2. Looping Statements:

for loop: Executes a sequence of statements multiple times.
while loop: Executes a block of code as long as a specified condition is true.
repeat loop: Executes a block of code repeatedly until a break statement is reached (typically when a specified condition is met).

3. Control Flow Statements:

break: Terminates the current loop.
next: Skips the rest of the current iteration of a loop.
return: Exits a function and returns a value.
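The for loop and if-else are covered in detail below. This sketch shows the remaining statements from the list in minimal form. The helper describe_status() and all values are hypothetical, chosen only to illustrate each construct.

```r
# switch(): pick one result based on a string;
# the unnamed last argument is the default
describe_status <- function(status) {  # hypothetical helper
  switch(status,
         positive = "receptor present",
         negative = "receptor absent",
         "unknown status")
}
describe_status("negative")  # "receptor absent"

# while: repeat as long as the condition is TRUE
x <- 1
while (x < 100) {
  x <- x * 2
}
x  # 128

# repeat: loops until break is reached
count <- 0
repeat {
  count <- count + 1
  if (count >= 3) break  # break exits the loop
}
count  # 3

# next: skip the rest of the current iteration
odd <- c()
for (i in 1:6) {
  if (i %% 2 == 0) next  # skip even numbers
  odd <- c(odd, i)
}
odd  # 1 3 5
```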

For Loop

Using a for loop, we will be able to perform a certain operation or function over each element in a vector, list, etc.

for (n in 1:4){
  print(n^2)
}

## [1] 1
## [1] 4
## [1] 9
## [1] 16

Another example, iterating through a vector by its index i:

my.vec <- c(1,3,34,22,16)
for (i in 1:length(my.vec)) {
  print(my.vec[i])
}

## [1] 1
## [1] 3
## [1] 34
## [1] 22
## [1] 16

However, if my.vec is empty, length(my.vec) is 0, so 1:length(my.vec) is 1:0, which is the vector c(1, 0). The loop therefore still runs twice:

my.vec <- NULL
for (i in 1:length(my.vec)) {
  print(my.vec[i])
}

## NULL
## NULL

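Using seq_along(my.vec) avoids this problem. It returns an empty sequence for an empty vector, so the loop body never runs:

for (i in seq_along(my.vec)){
  print(my.vec[i])
}

For a non-empty vector, seq_along() returns the indices 1, 2, ..., length(my.vec).

my.vec <- c(1,3,34,22,16)
seq_along(my.vec)

## [1] 1 2 3 4 5

seq(my.vec) gives the same result here, but be careful: for a numeric vector of length one, seq(x) returns 1:x rather than 1, so seq_along() is the safer choice.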

Using get() within a for loop to iterate over a set of objects.

get() takes a string as its input argument and returns the object with that name.

lm() is used to fit linear models, including multivariate ones. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although aov may provide a more convenient interface for these).

u <- c(1,1,2,2,3,4)
dim(u) <- c(3,2)
#or
u <- matrix(c(1,1,2,2,3,4), nrow = 3, ncol = 2)

v <- c(8,15,12,10,20,2)
dim(v) <- c(3,2)
#or
v <- matrix(c(8,15,12,10,20,2), nrow = 3, ncol = 2)

for (m in c("u", "v")){
  z <- get(m) # z is the object named by m: matrix u, then matrix v
  print(lm(z[,2]~z[,1])) #Left side of ~: Represents the dependent variable, Right side of ~: Represents the independent variables
}

## 
## Call:
## lm(formula = z[, 2] ~ z[, 1])
## 
## Coefficients:
## (Intercept)       z[, 1]  
##         1.0          1.5  
## 
## 
## Call:
## lm(formula = z[, 2] ~ z[, 1])
## 
## Coefficients:
## (Intercept)       z[, 1]  
##      -3.838        1.243

The first iteration fits the model to matrix u, and the second iteration fits it to matrix v.

If Else Statement

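The if-else statement allows you to make a decision between different options.

Be careful about layout: when the code runs at the top level, else must start on the same line as the closing brace } of the preceding block. Indentation itself does not change how R runs the code, but it makes the branches much easier to read.

The else if part is optional and may not be needed in some situations.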

my.score <- 83
if(my.score < 60){
  print("fail")
} else if (my.score > 100) {
  print("wrong input")
} else {
  print("pass")
}

## [1] "pass"

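Some commonly used comparison and logical operators and functions:

x == y (x equals y)
x <= y (x is less than or equal to y)
x >= y (x is greater than or equal to y)
x && y (both x and y are TRUE; x and y should be single values)
x || y (at least one of x and y is TRUE; x and y should be single values)
!x (negation: returns TRUE when x is FALSE)
is.na(b) (TRUE where b is missing)
is.numeric(b) (TRUE if b is a numeric vector)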
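These operators behave differently on vectors and on single values. This sketch contrasts the element-wise & and | with && and ||, using made-up vectors a, b, and vals.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b   # element-wise AND: TRUE FALSE FALSE
a | b   # element-wise OR:  TRUE TRUE TRUE
!a      # negation:         FALSE TRUE FALSE

# && and || expect single values and are typically used inside if()
score <- 83
score >= 60 && score <= 100  # TRUE

vals <- c(1, NA, 3)
is.na(vals)       # FALSE TRUE FALSE
is.numeric(vals)  # TRUE: an NA does not change the type of a numeric vector
```

In R 4.3.0 and later, && and || signal an error if either side has length greater than one. For vectors, use & and | instead.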

Check eligibility for a loan based on age and income

age <- 25
income <- 50000

if (age >= 18 && income >= 40000){
  print("Congratulations! You are eligible for a loan.")
}else{
  print("Sorry, you are not eligible for a loan at this time.")
}

## [1] "Congratulations! You are eligible for a loan."

Check conditions using if-else statement

x <- 10
y <- 20
z <- NA

if (x == 10 && !is.na(z)){
  print("Condition 1: x equals 10 and z is not NA")
} else if (y >= 20) {
  print("Condition 2: y is greater than or equal to 20")
} else {
  print("None of these conditions are satisfied")
}

## [1] "Condition 2: y is greater than or equal to 20"

Small Assignment on Functions with If Else

I have the following pt.types data frame of breast cancer patients. It shows whether each patient is positive (TRUE) or negative (FALSE) for three receptors (ER, PR, and HER2), which together determine the patient's type.

I want a function get.condition() that uses if-else conditions.

If I call get.condition() with a Pt.ID and the pt.types data frame, it should return:

“the patient is triple-negative breast cancer type” if all three columns are FALSE,
“the patient is double negative breast cancer type” if exactly two of the columns are FALSE,
“the patient is single negative breast cancer type” if exactly one of the columns is FALSE and the other two are TRUE

pt.types<- data.frame(Pt.ID = c("BCPt1", "BCPt2", "BCPt3", "BCPt4", "BCPt5"),
                      ER = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                      PR = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                      HER2 = c(FALSE, FALSE, TRUE, FALSE, TRUE))

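get.condition <- function(ID, dataframe){
  patient <- subset(dataframe, Pt.ID == ID)
  # An unknown ID gives zero rows
  if (nrow(patient) == 0){
    return("Invalid ID")
  }
  # Count how many of the three receptor columns are FALSE
  neg_count <- sum(patient[, c("ER", "PR", "HER2")] == FALSE)
  
  if(neg_count == 3){
    return("the patient is triple-negative breast cancer type")
  } else if (neg_count == 2){
    return("the patient is double negative breast cancer type")
  } else if (neg_count == 1){
    return("the patient is single negative breast cancer type")
  } else {
    return("the patient is positive for all three receptors")
  }
}

get.condition("BCPt5", pt.types)

## [1] "the patient is double negative breast cancer type"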

If Else & For Loop

An if-else statement can be placed inside a for loop to make a decision for each element:

scores <- c(67,65,88,54)
for (score in scores){
  print(score)
  if(score > 60){
    print("pass")
  } else {
    print("fail")
  }
}

## [1] 67
## [1] "pass"
## [1] 65
## [1] "pass"
## [1] 88
## [1] "pass"
## [1] 54
## [1] "fail"

Manipulate strings for better output:

for (score in scores){
   if(score > 60){
     output.string <- paste(score, "pass", "\n")
   } else {
     output.string <- paste(score, "fail", "\n")
   }
   cat(output.string)
}

## 67 pass 
## 65 pass 
## 88 pass 
## 54 fail

#or

for (score in scores){
   if(score > 60){
     cat(score, "pass", "\n")
   } else {
     cat(score, "fail", "\n")
   }
}

## 67 pass 
## 65 pass 
## 88 pass 
## 54 fail
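As an alternative to the loop, R's vectorized ifelse() function tests every element at once. This sketch produces the same pass/fail labels with the same threshold (score > 60). The variable name labels is illustrative.

```r
scores <- c(67,65,88,54)

# ifelse() tests every element and returns a vector of labels
labels <- ifelse(scores > 60, "pass", "fail")
labels  # "pass" "pass" "pass" "fail"

# paste() is vectorized too, so the whole report can be printed in one call
cat(paste(scores, labels), sep = "\n")
```

The loop version is easier to extend with more complex per-element logic. ifelse() is shorter when each element needs only a simple two-way choice.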

Small Assignment on the pt.types Dataframe

I would like to add another column, “Pt.type”, to the pt.types data frame.

The values of “Pt.type” should be:

if all three columns are FALSE, the value should be “triple”,
if two of the values are FALSE, the value should be “double”,
if two of the values are TRUE and one is FALSE, the value should be “single”.

Use if-else statements and a for loop for this.

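# Create the column first, so each row has a value to overwrite
pt.types$Pt.type <- NA_character_

for (i in 1:nrow(pt.types)){
  # Count how many of the three receptor columns are FALSE in row i
  neg_count <- sum(pt.types[i, c("ER", "PR", "HER2")] == FALSE)
  
  if (neg_count == 3){
    pt.types$Pt.type[i] <- "triple"
  } else if (neg_count == 2){
    pt.types$Pt.type[i] <- "double"
  } else if (neg_count == 1){
    pt.types$Pt.type[i] <- "single"
  } else {
    pt.types$Pt.type[i] <- "Invalid"  # no FALSE values: none of the three types
  }
}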
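The assignment asks for a loop. For comparison, here is a sketch of a vectorized alternative that is not part of the assignment. rowSums() counts the FALSE values in every row at once, and neg_count + 1 is then used as an index into a lookup vector of labels.

```r
pt.types <- data.frame(Pt.ID = c("BCPt1", "BCPt2", "BCPt3", "BCPt4", "BCPt5"),
                       ER = c(TRUE, FALSE, TRUE, TRUE, FALSE),
                       PR = c(TRUE, FALSE, FALSE, TRUE, FALSE),
                       HER2 = c(FALSE, FALSE, TRUE, FALSE, TRUE))

# Count the FALSE (negative) receptors in every row at once
neg_count <- rowSums(pt.types[, c("ER", "PR", "HER2")] == FALSE)

# Lookup by position: 0 -> "Invalid", 1 -> "single", 2 -> "double", 3 -> "triple"
pt.types$Pt.type <- c("Invalid", "single", "double", "triple")[neg_count + 1]
pt.types$Pt.type  # "single" "triple" "single" "single" "double"
```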

Module_5_Notes

Kayla Foht

2025-06-16

Directory