Fundamentals of Functions in R

Learning Objectives

  • Create custom R functions
  • Create conditional statements
  • Use warning and error messages to debug our code

Load necessary packages

library(tidyverse)
library(stringr)
library(lubridate)

Basics of functions

# Creating a function called calculate_sum with 2 required arguments
calculate_sum <- function(number1, number2) {
  return(number1 + number2)
}

# Defaulting to order of arguments in function
calculate_sum(5, 3)
## [1] 8
# Manually specifying arguments
calculate_sum(number1 = 5, number2 = 3)
## [1] 8
# Calculate sum of 13 and 1989
calculate_sum(13, 1989)
## [1] 2002
# Calculate sum of NA and 1989. The output is NA.
calculate_sum(NA, 1989)
## [1] NA

Best practices

Some key steps when creating an R function:

  • Select a descriptive, but concise name for the function.

  • Specify descriptively named inputs / arguments to the function

  • Write your code in the body of the function enclosed in curly brackets

It is considered best practice to name functions as verbs and name arguments using nouns.

For naming functions and arguments, consistency is key. Using snake_case 🐍 (i.e. words separated by underscores) or camelCase 🐪 is helpful if a name has multiple words in it.

Here are some poorly named functions:

# Poor naming of functions: not consistent in use 
# of camelCase vs. snake_case, connected them with 
# suffixes rather than prefixes

# Creating a function called fun1
fun1 <- function(x, y) {
  return(x + y)
}

# Creating a function called fun_2
fun_2 <- function(x, y) {
  return(x - y)
}

These functions are named more descriptively, which is helpful for your future self and potential users:

# Good naming of functions

# Creating a function called compute_sum
compute_sum <- function(number1, number2) {
  return(number1 + number2)
}

# Creating a function called compute_diff
compute_diff <- function(number1, number2) {
  return(number1 - number2)
}

When possible, do not rewrite base R functions (i.e. functions or objects already named in R), although this is less of a consideration when you are still learning more about R.

# Don't do this
# T is a base R function
#T <- FALSE

# c is a base R function for combining values into a list or vector
#c <- 10

#mean <- function(x) {
  #return(sum(x))
#}
# df is a base R function
#df <- data.frame(color = c("Red", "Blue", "Green"), number = 1:3)

# dt is as base R function
#dt <- data.frame(color = c("Red", "Blue", "Green"), number = 1:3)

Adding comments to a calculate_sum() function to communicate the purpose of different parts of the code.

# Creating a function called calculate_sum with 2 required arguments, 1 optional argument
# Function that adds 2 numbers and changes NAs to 0 if remove_na = TRUE
## number1: first numeric or NA value to be summed
## number2: second numeric or NA value to be summed
## remove_na: whether to change missing values to 0 (TRUE) or not (FALSE)

calculate_sum <- function(number1, number2, remove_na = TRUE) {
  # Change NA values to 0 when remove_na = TRUE
  if(remove_na) {
    calcSum <- ifelse(is.na(number1), 0, number1) + 
         ifelse(is.na(number2), 0, number2)
  } else {
    calcSum <- number1 + number2
    # Add number1 and number2
  }
  return(calcSum)
}

Function names depending on the purpose of the functions

# Function to check whether a string starts with a given prefix or not
has_prefix <- function(string, prefix) {
  str_sub(string, 1, nchar(prefix)) == prefix
}

# Funtion to drop the last element of a vector or list
drop_last_element <- function(x) {
  if (length(x) <= 1)
    return(NULL)
  return(x[-length(x)]) 
}

# Function to calculate the harmonic mean
calculate_harmonic_mean <- function(x) {
  return(1 / (mean(1 / x)))
}
x <- c(1,2,3,45,6)
drop_last_element(x)
## [1]  1  2  3 45
calculate_harmonic_mean(x)
## [1] 2.472527

The last of the three functions calculates a special type of mean of a vector of numbers which is equal to the reciprocal of the arithmetic mean of the reciprocals. This can be a useful measure of center when the typical rate of a variable is desired.

Creating a new function to calculate the geometric mean of a vector of numbers which is equal to the nth root of the product of n numbers. Hint: See ?prod, and know that we can use the ^ (called the carrot symbol) to raise a number to a power in R.

# Function to calculate the geometric mean of numeric values: Method 1
geometric_mean <- function(values) {
  return(prod(values)^(1 / length(values)))
}

geometric_mean(1:10)
## [1] 4.528729
# Function to calculate the geometric mean of numeric values: Method 2
geometric_mean <- function(values) {
  return(exp(mean(log(values))))
}

geometric_mean(1:10)
## [1] 4.528729

Conditional execution, warnings, and errors

An if statement facilitates conditional execution of code:

check_for_k <- function(k) {
  if (k == 2) {
  # code executed when condition is TRUE
  return(2)
} else {
  # code executed when condition is FALSE
  #warning("I only want a 2")
  return("Not a 2!")
}

}

check_for_k(6)
## [1] "Not a 2!"
check_for_k(2)
## [1] 2

Below we create a function, calculate_sqrt that calculates the square root of a non-negative number, but returns NaN if the number is negative along with displaying a warning message.

# Function to calculate square root of a non-negative number
calculate_sqrt <- function(val) {
if (val >= 0) {
  return(val^0.5)
} else {
  warning("val is negative. It must be >= 0!")
  return(NaN)
}
}

calculate_sqrt(-4)
## [1] NaN
## [1] 2
calculate_sqrt(9)
## [1] 3
## [1] NaN

Warnings, which are created using the warning() function, still proceed with execution of the code, but print a message to the Console. Errors, which are created with the stop() function, stop execution of the code and print a message to the Console.

# Function to calculate square root of a non-negative number
calculate_sqrt <- function(val) {
if (val >= 0) {
  return(val^0.5)
} else {
  stop("val is negative but must be >= 0!")
  return(NaN)
}
}

calculate_sqrt(4)
## [1] 2
## [1] 2
#calculate_sqrt(-9)
## [1] NaN

We can also chain multiple if statements together using else:

food_type <- function(food) {
if (food %in% c("apple", "orange", "banana")) {
  return("fruit")
} else if(food %in% c("broccoli", "asparagus")) {
  return("vegetable")
} else {
  return("other")
}}

food_type("Reese's Puffs")
## [1] "other"
## [1] "other"
food_type("apple")
## [1] "fruit"
## [1] "fruit"

Let’s improve our food type functions

food_type <- function(food) {
if (food %in% stringr::fruit) {
  return("fruit")
} else if(food %in% c("broccoli", "asparagus")) {
  return("vegetable")
} else {
  return("other")
}}

food_type("Reese's Puffs")
## [1] "other"
## [1] "other"
food_type("apple")
## [1] "fruit"
## [1] "fruit"

We can also allow the foods considered to be vegetables or fruits to be customisable as well.

food_type <- function(food = "Greek yorgut", fruits = stringr::fruit, vegetables = c("carrot", "broccoli", "spinach", "tomato", "onion", "lettuce", "cucumber", "bell pepper", "potato", "zucchini", "eggplant", "kale", "cauliflower", "green beans", "sweet potato")) {
if (food %in% fruits) {
  return("fruit")
} else if(food %in% vegetables) {
  return("vegetable")
} else {
  return("other")
}
}

food_type("Reese's Puffs")
## [1] "other"
## [1] "other"
food_type("apple")
## [1] "fruit"

Extending the food_type() function to detect the type of food regardless of case.

food_type <- function(food = "Greek yorgut", fruits = stringr::fruit, vegetables = c("carrot", "broccoli", "spinach", "tomato", "onion", "lettuce", "cucumber", "bell pepper", "potato", "zucchini", "eggplant", "kale", "cauliflower", "green beans", "sweet potato")) {
  food <- str_squish(str_to_lower(food))
  if (food %in% fruits) {
  return("fruit")
} else if(food %in% vegetables) {
  return("vegetable")
} else {
  return("other")
}
}

food_type("Reese's Puffs")
## [1] "other"
food_type("tomato")
## [1] "vegetable"
food_type("Green beans")
## [1] "vegetable"
food_type("apple")
## [1] "fruit"

Creating a function called is_whole_number(), which checks if a given value is a whole number (e.g., 0, 1, 2, …) or not.

# Function to check if a value is a whole number
is_whole_number <- function(x) {
  is.numeric(x) && x %% 1 == 0 && x >= 0
}
 
is_whole_number('j')
## [1] FALSE
is_whole_number(0)
## [1] TRUE
is_whole_number(-1)
## [1] FALSE
is_whole_number(pi)
## [1] FALSE
is_whole_number(1000000000)
## [1] TRUE

The ^ operator raises a number to a specified power in R. Let’s create a function that raises a number to a specified power, and returns a warning and an NaN value if the number is negative and the power is not a whole number.

# Create a function to raise a number to a specified power
raise_power <- function(number, exponent) {
  if (number < 0 && !is_whole_number(exponent)) {
    warning("Number is negative and exponent is not a whole number")
    return(NaN)
  } else {
    return(number^exponent)
  }
}
raise_power(-2,1/3)
## [1] NaN

Using the raise_power() function to calculate 4^(2), (-9)^(2), 3^(1/3), and (-3)^(1/3).

raise_power(4, 2)
## [1] 16
raise_power(number = -9, 2)
## [1] 81
raise_power(3, exponent = 1/3)
## [1] 1.44225
raise_power(-3, 1/3)
## [1] NaN

The ifelse() function

# A vector of numbers starting -3 to 3
(-3:3)
## [1] -3 -2 -1  0  1  2  3
(-3:3) < 0
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
# Condition that checks if a number in the vector is < 0 or not
# and assigns the arguments respectively
ifelse((-3:3) < 0, "negative", "nonnegative")
## [1] "negative"    "negative"    "negative"    "nonnegative" "nonnegative"
## [6] "nonnegative" "nonnegative"

The function calculate_sum(), created below, has two required arguments: number1 and number2, and one optional argument, remove_na. This function uses the ifelse() function, which has 3 main arguments itself. Look at the help file for ifelse() to see descriptions of its arguments and examples of its use.

# Creating a function called calculate_sum with 2 required arguments, 1 optional argument
calculate_sum <- function(number1, number2, remove_na = TRUE) {
  if(remove_na) {
    calcSum <- ifelse(is.na(number1), 0, number1) + 
         ifelse(is.na(number2), 0, number2)
  } else {
    calcSum <- number1 + number2
  }
  return(calcSum)
}

calculate_sum(13, 2006)
## [1] 2019
calculate_sum(NA, 2006)
## [1] 2006
calculate_sum(13, NA)
## [1] 13
calculate_sum(13, NA, remove_na = FALSE)
## [1] NA

Logical operators

Here are some useful tools for logical expressions:

  • || serves as an ‘or’ (not vectorized)

  • && serves as an ‘and’ (not vectorized)

  • == is used to check equality (vectorized)

Many R functions are vectorized, which means that inputs can be single values or vectors. Vectorized functions will operate on all elements of an input vector automatically. This facilitates computationally faster code which can be more concise and readable as well. Here are some examples of not-vectorized logical operators:

# The 'or' operator
(3 < 5) || (-1 > 0)
## [1] TRUE
# The 'and' operator
(3 < 5) && (-1 > 0)
## [1] FALSE

Here are some examples of vectorized logical operators:

# The 'equals' operator
3 == 5
## [1] FALSE
(4/2) == 2
## [1] TRUE
3 != 5
## [1] TRUE
# Vectorized examples
1:4
## [1] 1 2 3 4
1:4 == 3
## [1] FALSE FALSE  TRUE FALSE
(1:4 < 4) & (1:4 > 1)
## [1] FALSE  TRUE  TRUE FALSE
(1:4 < 4) | (1:4 > 1)
## [1] TRUE TRUE TRUE TRUE

Creating a function that takes a numeric vector as input, and returns a single boolean value indicating whether or not the vector has any negative values.

# Function that checks whether a function has a negative value
found_negative <- function(num_vector) {
  return(any(num_vector < 0))
}

negative_vector <- c(-1,-2,3,4)
negative_vector < 0
## [1]  TRUE  TRUE FALSE FALSE
found_negative(negative_vector)
## [1] TRUE

Does the code (1:4 < 4) & (1:4 > 1) produce the same result as (1:4 < 4) && (1:4 > 1)? Which code achieves what appears to be its intended purpose?

# code 1
(1:4 < 4) & (1:4 > 1)
## [1] FALSE  TRUE  TRUE FALSE
# code 2
(1:4 < 4) && (1:4 > 1)
## Error in (1:4 < 4) && (1:4 > 1): 'length = 4' in coercion to 'logical(1)'

For positive numbers, the Modulus is the remainder of the division of two numbers. For example, 10 divided by 3 is 1, so 10 modulus 3 is 1. The modulus operator in R is implemented using %% as below:

# 10 mod 3
10 %% 3
## [1] 1
## [1] 1
# 10 mod 2
10 %% 2
## [1] 0
## [1] 0
# 10 mod 5
10 %% 5
## [1] 0
## [1] 0
# 12 mod 5
12 %% 5
## [1] 2
## [1] 2

Creating a function to check if a number is even or odd that returns a vector containing the values “even”, “odd”, or “not an integer” depending on its input.

# Function to check whether an integer is odd or even
check_even_odd <- function(integer) {
  if(integer %% 2 == 0) {
    return("even")
  } else if(integer %% 2 == 1) {
    return("odd")
  } else {
    return("not an integer")
  }
}

# Testing out the function
check_even_odd(10)
## [1] "even"
check_even_odd(-10)
## [1] "even"
check_even_odd(pi)
## [1] "not an integer"

Checking vector of numbers variation of the above check_even_odd function

# Function to check whether an integer is odd or even
check_even_odd <- function(integer) {
  return(ifelse(integer %% 2 == 0, "even",
         ifelse(integer %% 2 == 1, "odd", "not an integer")))
}

# Testing out the function
# sampling random integers
set.seed(1989)
integers = sample(-1000:1000, size = 10)
integers
##  [1]  385  224  601  518 -786  562   92 -221  302 -676
# checking on vector of numbers
check_even_odd(integers)
##  [1] "odd"  "even" "odd"  "even" "even" "even" "even" "odd"  "even" "even"

Creating a function called print_greeting() that has a single argument, date, with a default value of today(), such that print_greeting() prints the following greeting when called:

# Function to print greeting
print_greeting <- function(date = today()) {
  day_of_week <- wday(date, abbr = FALSE, label = TRUE)
 message <- str_c("Hello! Today is ", day_of_week, ".") 
 return(message)
}
# Testing the function
print_greeting()
## [1] "Hello! Today is Sunday."

Extending the print_greeting() function to include the date as well like in the message below. Hint: use scales::ordinal(lubridate::day(today())) to display the date with the ordinal (th or nd) included. Then further extend the print_greeting() function to have an optional argument, user, that customizes who the greeting is for and has a default value of Sys.info()[["user"]]

# Function to print greeting
print_greeting <- function(date = today(), user = Sys.info()[["user"]]) {
  day_of_week <- wday(date, abbr = FALSE, label = TRUE)
  month_name <- month(date, label = TRUE, abbr = FALSE)
 message <- str_c("Hello ", user, "! Today is ", day_of_week, 
                  ", ", month_name, " ",
                  scales::ordinal(lubridate::day(date)),
                  ".") 
 return(message)
}
# Testing the function
print_greeting()
## [1] "Hello user! Today is Sunday, March 31st."
print_greeting(user = "Florence Nightingale")
## [1] "Hello Florence Nightingale! Today is Sunday, March 31st."

Further extending the print_greeting() function to have an optional argument, user, that customizes who the greeting is for and has a default value of Sys.info()[[“user”]] and a customized greeting depending on the time of the day.

# Function to print greeting
print_greeting <- function(date = today(), time = Sys.time(), user = Sys.info()[["user"]]) {
  day_of_week <- wday(date, abbr = FALSE, label = TRUE)
  month_name <- month(date, label = TRUE, abbr = FALSE)
  time_hour <- hour(time)
  
  
  if(time_hour < 12) {
    salutation <- "Good morning"
  } else if(time_hour < 17) {
    salutation <- "Good afternoon"
  } else if(time_hour < 21) {
    salutation <- "Good evening"
  } else {
    salutation <- "Hello"
  }
 message <- str_c(salutation, " ", user, "! Today is ", day_of_week, 
                  ", ", month_name, " ",
                  scales::ordinal(lubridate::day(date)),
                  ".") 
 return(message)
}
# Testing the function
print_greeting()
## [1] "Good morning user! Today is Sunday, March 31st."
print_greeting(user = "Florence Nightingale")
## [1] "Good morning Florence Nightingale! Today is Sunday, March 31st."
# Testing the above function
# Time in morning
morning_time <- Sys.time()
hour(morning_time) <- 11

# Time in afternoon
afternoon_time <- Sys.time()
hour(afternoon_time) <- 16

# Time in evening
evening_time <- Sys.time()
hour(evening_time) <- 18

# Time at night
night_time <- Sys.time()
hour(night_time) <- 23

# Testing out function at different times of day
print_greeting(time = morning_time, user = "Harry Styles")
## [1] "Good morning Harry Styles! Today is Sunday, March 31st."
## [1] "Good morning Harry Styles! Today is Friday, March 22nd."
print_greeting(time = afternoon_time, user = "Marie Curie")
## [1] "Good afternoon Marie Curie! Today is Sunday, March 31st."
## [1] "Good afternoon Marie Curie! Today is Friday, March 22nd."
print_greeting(time = evening_time, user = "Michael Jordan")
## [1] "Good evening Michael Jordan! Today is Sunday, March 31st."
## [1] "Good evening Michael Jordan! Today is Friday, March 22nd."
print_greeting(time = night_time, user = "Rey")
## [1] "Hello Rey! Today is Sunday, March 31st."
## [1] "Hello Rey! Today is Friday, March 22nd."