Fundamentals of Functions in R
Learning Objectives
- Create custom
R
functions - Create conditional statements
- Use warning and error messages to debug our code
Basics of functions
# Creating a function called calculate_sum with 2 required arguments
calculate_sum <- function(number1, number2) {
return(number1 + number2)
}
# Defaulting to order of arguments in function
calculate_sum(5, 3)
## [1] 8
## [1] 8
## [1] 2002
## [1] NA
Best practices
Some key steps when creating an R
function:
Select a descriptive, but concise name for the function.
Specify descriptively named inputs / arguments to the function
Write your code in the body of the function enclosed in curly brackets
It is considered best practice to name functions as verbs and name arguments using nouns.
For naming functions and arguments, consistency is key. Using snake_case 🐍 (i.e. words separated by underscores) or camelCase 🐪 is helpful if a name has multiple words in it.
Here are some poorly named functions:
These functions are named more descriptively, which is helpful for your future self and potential users:
When possible, do not rewrite base R
functions
(i.e. functions or objects already named in R
), although
this is less of a consideration when you are still learning more about
R
.
# Don't do this
# T is a base R function
#T <- FALSE
# c is a base R function for combining values into a list or vector
#c <- 10
#mean <- function(x) {
#return(sum(x))
#}
# df is a base R function
#df <- data.frame(color = c("Red", "Blue", "Green"), number = 1:3)
# dt is as base R function
#dt <- data.frame(color = c("Red", "Blue", "Green"), number = 1:3)
Adding comments to a calculate_sum()
function to
communicate the purpose of different parts of the code.
# Creating a function called calculate_sum with 2 required arguments, 1 optional argument
# Function that adds 2 numbers and changes NAs to 0 if remove_na = TRUE
## number1: first numeric or NA value to be summed
## number2: second numeric or NA value to be summed
## remove_na: whether to change missing values to 0 (TRUE) or not (FALSE)
calculate_sum <- function(number1, number2, remove_na = TRUE) {
# Change NA values to 0 when remove_na = TRUE
if(remove_na) {
calcSum <- ifelse(is.na(number1), 0, number1) +
ifelse(is.na(number2), 0, number2)
} else {
calcSum <- number1 + number2
# Add number1 and number2
}
return(calcSum)
}
Function names depending on the purpose of the functions
# Function to check whether a string starts with a given prefix or not
has_prefix <- function(string, prefix) {
str_sub(string, 1, nchar(prefix)) == prefix
}
# Funtion to drop the last element of a vector or list
drop_last_element <- function(x) {
if (length(x) <= 1)
return(NULL)
return(x[-length(x)])
}
# Function to calculate the harmonic mean
calculate_harmonic_mean <- function(x) {
return(1 / (mean(1 / x)))
}
x <- c(1,2,3,45,6)
drop_last_element(x)
## [1] 1 2 3 45
## [1] 2.472527
The last of the three functions calculates a special type of mean of a vector of numbers which is equal to the reciprocal of the arithmetic mean of the reciprocals. This can be a useful measure of center when the typical rate of a variable is desired.
Creating a new function to calculate the geometric mean
of a vector of numbers which is equal to the nth root of the product of
n numbers. Hint: See ?prod
, and know that we can use the ^
(called the carrot symbol) to raise a number to a power in
R
.
# Function to calculate the geometric mean of numeric values: Method 1
geometric_mean <- function(values) {
return(prod(values)^(1 / length(values)))
}
geometric_mean(1:10)
## [1] 4.528729
# Function to calculate the geometric mean of numeric values: Method 2
geometric_mean <- function(values) {
return(exp(mean(log(values))))
}
geometric_mean(1:10)
## [1] 4.528729
Conditional execution, warnings, and errors
An if statement facilitates conditional execution of code:
check_for_k <- function(k) {
if (k == 2) {
# code executed when condition is TRUE
return(2)
} else {
# code executed when condition is FALSE
#warning("I only want a 2")
return("Not a 2!")
}
}
check_for_k(6)
## [1] "Not a 2!"
## [1] 2
Below we create a function, calculate_sqrt
that
calculates the square root of a non-negative number, but returns
NaN
if the number is negative along with displaying a
warning message.
# Function to calculate square root of a non-negative number
calculate_sqrt <- function(val) {
if (val >= 0) {
return(val^0.5)
} else {
warning("val is negative. It must be >= 0!")
return(NaN)
}
}
calculate_sqrt(-4)
## [1] NaN
## [1] 3
Warnings, which are created using the warning()
function, still proceed with execution of the code, but print a message
to the Console. Errors, which are created with the stop()
function, stop execution of the code and print a message to the
Console.
# Function to calculate square root of a non-negative number
calculate_sqrt <- function(val) {
if (val >= 0) {
return(val^0.5)
} else {
stop("val is negative but must be >= 0!")
return(NaN)
}
}
calculate_sqrt(4)
## [1] 2
We can also chain multiple if
statements together using
else
:
food_type <- function(food) {
if (food %in% c("apple", "orange", "banana")) {
return("fruit")
} else if(food %in% c("broccoli", "asparagus")) {
return("vegetable")
} else {
return("other")
}}
food_type("Reese's Puffs")
## [1] "other"
## [1] "fruit"
Let’s improve our food type functions
food_type <- function(food) {
if (food %in% stringr::fruit) {
return("fruit")
} else if(food %in% c("broccoli", "asparagus")) {
return("vegetable")
} else {
return("other")
}}
food_type("Reese's Puffs")
## [1] "other"
## [1] "fruit"
We can also allow the foods considered to be vegetables or fruits to be customisable as well.
food_type <- function(food = "Greek yorgut", fruits = stringr::fruit, vegetables = c("carrot", "broccoli", "spinach", "tomato", "onion", "lettuce", "cucumber", "bell pepper", "potato", "zucchini", "eggplant", "kale", "cauliflower", "green beans", "sweet potato")) {
if (food %in% fruits) {
return("fruit")
} else if(food %in% vegetables) {
return("vegetable")
} else {
return("other")
}
}
food_type("Reese's Puffs")
## [1] "other"
## [1] "fruit"
Extending the food_type()
function to detect the type
of food regardless of case.
food_type <- function(food = "Greek yorgut", fruits = stringr::fruit, vegetables = c("carrot", "broccoli", "spinach", "tomato", "onion", "lettuce", "cucumber", "bell pepper", "potato", "zucchini", "eggplant", "kale", "cauliflower", "green beans", "sweet potato")) {
food <- str_squish(str_to_lower(food))
if (food %in% fruits) {
return("fruit")
} else if(food %in% vegetables) {
return("vegetable")
} else {
return("other")
}
}
food_type("Reese's Puffs")
## [1] "other"
## [1] "vegetable"
## [1] "vegetable"
## [1] "fruit"
Creating a function called is_whole_number()
, which
checks if a given value is a whole number (e.g., 0, 1, 2, …) or
not.
# Function to check if a value is a whole number
is_whole_number <- function(x) {
is.numeric(x) && x %% 1 == 0 && x >= 0
}
is_whole_number('j')
## [1] FALSE
## [1] TRUE
## [1] FALSE
## [1] FALSE
## [1] TRUE
The ^ operator raises a number to a specified power in
R
. Let’s create a function that raises a number to a
specified power, and returns a warning and an NaN value if the number is
negative and the power is not a whole number.
# Create a function to raise a number to a specified power
raise_power <- function(number, exponent) {
if (number < 0 && !is_whole_number(exponent)) {
warning("Number is negative and exponent is not a whole number")
return(NaN)
} else {
return(number^exponent)
}
}
raise_power(-2,1/3)
## [1] NaN
The ifelse()
function
## [1] -3 -2 -1 0 1 2 3
## [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE
# Condition that checks if a number in the vector is < 0 or not
# and assigns the arguments respectively
ifelse((-3:3) < 0, "negative", "nonnegative")
## [1] "negative" "negative" "negative" "nonnegative" "nonnegative"
## [6] "nonnegative" "nonnegative"
The function calculate_sum()
, created below, has two
required arguments: number1
and number2
, and
one optional argument, remove_na
. This function uses the
ifelse()
function, which has 3 main arguments itself. Look
at the help file for ifelse()
to see descriptions of its
arguments and examples of its use.
# Creating a function called calculate_sum with 2 required arguments, 1 optional argument
calculate_sum <- function(number1, number2, remove_na = TRUE) {
if(remove_na) {
calcSum <- ifelse(is.na(number1), 0, number1) +
ifelse(is.na(number2), 0, number2)
} else {
calcSum <- number1 + number2
}
return(calcSum)
}
calculate_sum(13, 2006)
## [1] 2019
## [1] 2006
## [1] 13
## [1] NA
Logical operators
Here are some useful tools for logical expressions:
|| serves as an ‘or’ (not vectorized)
&& serves as an ‘and’ (not vectorized)
== is used to check equality (vectorized)
Many R
functions are vectorized, which means that
inputs can be single values or vectors. Vectorized functions will
operate on all elements of an input vector automatically. This
facilitates computationally faster code which can be more concise and
readable as well. Here are some examples of not-vectorized logical
operators:
## [1] TRUE
## [1] FALSE
Here are some examples of vectorized logical operators:
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] 1 2 3 4
## [1] FALSE FALSE TRUE FALSE
## [1] FALSE TRUE TRUE FALSE
## [1] TRUE TRUE TRUE TRUE
Creating a function that takes a numeric vector as input, and returns a single boolean value indicating whether or not the vector has any negative values.
# Function that checks whether a function has a negative value
found_negative <- function(num_vector) {
return(any(num_vector < 0))
}
negative_vector <- c(-1,-2,3,4)
negative_vector < 0
## [1] TRUE TRUE FALSE FALSE
## [1] TRUE
Does the code (1:4 < 4) & (1:4 > 1)
produce
the same result as (1:4 < 4) && (1:4 > 1)
?
Which code achieves what appears to be its intended purpose?
## [1] FALSE TRUE TRUE FALSE
## Error in (1:4 < 4) && (1:4 > 1): 'length = 4' in coercion to 'logical(1)'
For positive numbers, the Modulus is the remainder of the division
of two numbers. For example, 10 divided by 3 is 1, so 10 modulus 3 is 1.
The modulus operator in R
is implemented using
%%
as below:
## [1] 1
## [1] 0
## [1] 0
## [1] 2
Creating a function to check if a number is even or odd that returns a vector containing the values “even”, “odd”, or “not an integer” depending on its input.
# Function to check whether an integer is odd or even
check_even_odd <- function(integer) {
if(integer %% 2 == 0) {
return("even")
} else if(integer %% 2 == 1) {
return("odd")
} else {
return("not an integer")
}
}
# Testing out the function
check_even_odd(10)
## [1] "even"
## [1] "even"
## [1] "not an integer"
Checking vector of numbers variation of the above
check_even_odd
function
# Function to check whether an integer is odd or even
check_even_odd <- function(integer) {
return(ifelse(integer %% 2 == 0, "even",
ifelse(integer %% 2 == 1, "odd", "not an integer")))
}
# Testing out the function
# sampling random integers
set.seed(1989)
integers = sample(-1000:1000, size = 10)
integers
## [1] 385 224 601 518 -786 562 92 -221 302 -676
## [1] "odd" "even" "odd" "even" "even" "even" "even" "odd" "even" "even"
Creating a function called print_greeting()
that has a
single argument, date, with a default value of today(),
such that print_greeting()
prints the following greeting
when called:
# Function to print greeting
print_greeting <- function(date = today()) {
day_of_week <- wday(date, abbr = FALSE, label = TRUE)
message <- str_c("Hello! Today is ", day_of_week, ".")
return(message)
}
# Testing the function
print_greeting()
## [1] "Hello! Today is Sunday."
Extending the print_greeting()
function to include the
date as well like in the message below. Hint: use
scales::ordinal(lubridate::day(today()))
to display the
date with the ordinal (th or nd) included. Then further extend the
print_greeting()
function to have an optional argument,
user
, that customizes who the greeting is for and has a
default value of Sys.info()[["user"]]
# Function to print greeting
print_greeting <- function(date = today(), user = Sys.info()[["user"]]) {
day_of_week <- wday(date, abbr = FALSE, label = TRUE)
month_name <- month(date, label = TRUE, abbr = FALSE)
message <- str_c("Hello ", user, "! Today is ", day_of_week,
", ", month_name, " ",
scales::ordinal(lubridate::day(date)),
".")
return(message)
}
# Testing the function
print_greeting()
## [1] "Hello user! Today is Sunday, March 31st."
## [1] "Hello Florence Nightingale! Today is Sunday, March 31st."
Further extending the print_greeting()
function to have
an optional argument, user
, that customizes who the
greeting is for and has a default value of Sys.info()[[“user”]] and a
customized greeting depending on the time of the day.
# Function to print greeting
print_greeting <- function(date = today(), time = Sys.time(), user = Sys.info()[["user"]]) {
day_of_week <- wday(date, abbr = FALSE, label = TRUE)
month_name <- month(date, label = TRUE, abbr = FALSE)
time_hour <- hour(time)
if(time_hour < 12) {
salutation <- "Good morning"
} else if(time_hour < 17) {
salutation <- "Good afternoon"
} else if(time_hour < 21) {
salutation <- "Good evening"
} else {
salutation <- "Hello"
}
message <- str_c(salutation, " ", user, "! Today is ", day_of_week,
", ", month_name, " ",
scales::ordinal(lubridate::day(date)),
".")
return(message)
}
# Testing the function
print_greeting()
## [1] "Good morning user! Today is Sunday, March 31st."
## [1] "Good morning Florence Nightingale! Today is Sunday, March 31st."
# Testing the above function
# Time in morning
morning_time <- Sys.time()
hour(morning_time) <- 11
# Time in afternoon
afternoon_time <- Sys.time()
hour(afternoon_time) <- 16
# Time in evening
evening_time <- Sys.time()
hour(evening_time) <- 18
# Time at night
night_time <- Sys.time()
hour(night_time) <- 23
# Testing out function at different times of day
print_greeting(time = morning_time, user = "Harry Styles")
## [1] "Good morning Harry Styles! Today is Sunday, March 31st."
## [1] "Good morning Harry Styles! Today is Friday, March 22nd."
print_greeting(time = afternoon_time, user = "Marie Curie")
## [1] "Good afternoon Marie Curie! Today is Sunday, March 31st."
## [1] "Good afternoon Marie Curie! Today is Friday, March 22nd."
print_greeting(time = evening_time, user = "Michael Jordan")
## [1] "Good evening Michael Jordan! Today is Sunday, March 31st."
## [1] "Good evening Michael Jordan! Today is Friday, March 22nd."
print_greeting(time = night_time, user = "Rey")
## [1] "Hello Rey! Today is Sunday, March 31st."