Functions


So far we have relied almost entirely on built-in functions. Sometimes, however, built-in functions cannot do exactly what we need. The last lecture included one such example.

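# make_datetime() comes from the lubridate package
make_datetime_100 <- function(year, month, day, time, tz = "UTC") {
  # time %/% 100 is the hour, time %% 100 is the minute, 0 is the second
  make_datetime(year, month, day, time %/% 100, time %% 100, 0, tz)
}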

In this example, the time is stored as a number such as 319 (meaning 3:19am). The function above separates it into an hour (time %/% 100) and a minute (time %% 100), then combines them with the date to build a date-time object.
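The integer arithmetic used inside make_datetime_100 can be checked on its own in base R. This is a minimal sketch; the variable names hhmm, hour, and minute are illustrative only and do not come from the lecture.

```r
# A time stored as a single number, e.g. 319 meaning 3:19am
hhmm <- 319

# Integer division by 100 keeps the hour part
hour <- hhmm %/% 100

# The remainder after dividing by 100 is the minute part
minute <- hhmm %% 100

hour
## [1] 3
minute
## [1] 19
```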

R Functions


Functions allow you to automate common tasks in a more powerful and general way than copying and pasting. We have already learned how to write a function in Python; the basics are much the same in R, but the syntax is slightly different.

function_Name <- function(<inputs separated by commas>){
  # function body, indented by two spaces
}

Example


For example,

add_one <- function(x) { 
  x+1 
}

add_one(3)
## [1] 4

By default, a function returns the value of the last expression it evaluates.

add_one <- function(x) { 
  x+1
  print(x)
}

add_one(3)
## [1] 3

The function above returns x because its last expression is print(x). In R, print() displays its argument and also returns it (invisibly), so x becomes the return value of add_one. This differs from Python, where print() returns None.
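One way to see this is to capture the result of print() in a variable. A small sketch, where shown is just an illustrative variable name:

```r
# print() displays its argument and also returns it (invisibly)
shown <- print("hello")
## [1] "hello"

# The captured value is the same object that was printed
shown
## [1] "hello"
identical(shown, "hello")
## [1] TRUE
```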

add_one <- function(x) { 
  print(x)
  return(x+1) 
  print(x)
}

y <- add_one(3)
## [1] 3
y
## [1] 4

As in Python, we use return() to specify the value a function returns. When a function reaches a return() call, it stops there, and nothing after it is executed. This is why 3 is printed only once above.
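A typical use of return() is to leave a function early in a special case. The sketch below is only an illustration: safe_sqrt is a hypothetical helper, and it uses an if statement, which is covered in the Conditional Execution part of these notes.

```r
# Hypothetical helper: returns NA early when the input is negative
safe_sqrt <- function(x) {
  if (x < 0) {
    return(NA)   # the function stops here for negative input
  }
  sqrt(x)        # otherwise the last value is returned as usual
}

safe_sqrt(16)
## [1] 4
safe_sqrt(-1)
## [1] NA
```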

Multiple arguments


my_func <- function(a=1,b=2,c=3,d=4,e=5){
  c(a,b,c,d,e)
}

my_func()
## [1] 1 2 3 4 5

When a function takes multiple arguments, simply separate them with commas. To give an argument a default value, write it after the argument name with =.
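Arguments with and without defaults can be mixed. In the sketch below, power is a hypothetical function in which base has no default and exponent defaults to 2:

```r
# Hypothetical example: base must be supplied, exponent defaults to 2
power <- function(base, exponent = 2) {
  base^exponent
}

power(3)        # exponent falls back to its default
## [1] 9
power(3, 3)     # the default is overridden
## [1] 27
```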

Calling a function


Calling a function with multiple arguments is similar to Python: arguments can be matched by position or by name. The difference is that R is more flexible and does not require named arguments to come after positional ones.

my_func <- function(a=1,b=2,c=3,d=4,e=5){
  c(a,b,c,d,e)
}

my_func(10, 20)
## [1] 10 20  3  4  5
my_func(10, 20, e = 50)
## [1] 10 20  3  4 50
my_func(, 10)   # This works but not recommended
## [1]  1 10  3  4  5
my_func(c = 30, 10, 20)
## [1] 10 20 30  4  5

In data analysis, it is common to omit the name of the first argument (usually the data set), since nearly every function takes one. For the other arguments, it is better to write out their names in the call.

ggplot(mpg) +   # For data set name, usually we don't use the argument name
  geom_bar(mapping = aes(x = as.factor(cyl), fill = as.factor(year)), col = "blue", position = "dodge")

Here every argument other than the data set mpg is written out with its name, which is the recommended style.

Interaction with environment


An important thing to know is how a function interacts with the environment outside its own frame (in Python, this is called the global frame). For example,

f <- function(x) {
  x + y
}

In many programming languages, this would be an error, because y is not defined inside the function. In R, this is valid code because R uses rules called lexical scoping to find the value associated with a name. Since y is not defined inside the function, R will look in the environment where the function was defined:

y <- 100
f(10)
## [1] 110
y <- 1000
f(10)
## [1] 1010

This behaviour seems like a recipe for bugs, and indeed you should avoid creating functions like this deliberately, but by and large it doesn’t cause too many problems (especially if you regularly restart R to get to a clean slate).

The advantage of this behaviour is that from a language standpoint it allows R to be very consistent. This power and flexibility is what makes tools like ggplot2 and dplyr possible.
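If you want a function whose result depends only on its inputs, one option is to pass the outside value in as an argument. A minimal sketch, where f2 is a hypothetical rewrite of the function f shown earlier:

```r
# y is now an explicit argument, so the result depends only on the inputs
f2 <- function(x, y) {
  x + y
}

y <- 1000         # a y defined outside the function is no longer used
f2(10, y = 100)
## [1] 110
```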

Conditional Execution


In R, using if to implement conditional execution looks like this:

if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}

A simple example


Here is a simple example:

my_func <- function(x) {
  if (x %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

my_func(4)
## [1] "even"
my_func(3)
## [1] "odd"

Conditions


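The condition must evaluate to a single TRUE or FALSE. If it is a vector of length greater than one, you get an error (in R 4.2.0 and later; earlier versions gave a warning and used only the first element). If it is NA, you also get an error. Watch out for these messages in your own code: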

if (c(TRUE, FALSE)) {}
#> Error in if (c(TRUE, FALSE)) {: the condition has length > 1

if (NA) {}
#> Error in if (NA) {: missing value where TRUE/FALSE needed
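A common way to avoid both errors is to reduce the condition to a single, non-missing value before it reaches if. The sketch below uses the base R functions all() and is.na(); the variable names are illustrative.

```r
x <- c(2, 4, NA)

# Collapse a logical vector to a single value before using it in if
if (all(x > 0, na.rm = TRUE)) {
  result <- "all known values are positive"
} else {
  result <- "some known value is not positive"
}
result
## [1] "all known values are positive"

# Check for NA explicitly so the condition is always TRUE or FALSE
value <- NA
if (is.na(value)) {
  status <- "missing"
} else {
  status <- "present"
}
status
## [1] "missing"
```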

Logical operators in a function: && and ||

When you need to combine logical conditions in if, use && and ||. These operators are not vectorised: they work on single TRUE/FALSE values, which is what if and other conditional execution expect. Avoid the vectorised operators & and |, which are meant for element-wise use, as in filter().

my_func2 <- function(x) {
  if (x %% 2 == 0 && x %% 3 == 0){
    "x is divisible by six"
  } else {
    "x is not divisible by six"
  }
}

my_func2(6)
## [1] "x is divisible by six"
my_func2(7)
## [1] "x is not divisible by six"
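To make the difference between the two families of operators concrete, the sketch below compares & with && on the same divisibility test. The function name is_six_multiple is hypothetical.

```r
x <- c(6, 7, 12)

# & works element by element and returns one value per element
x %% 2 == 0 & x %% 3 == 0
## [1]  TRUE FALSE  TRUE

# && works on single values and stops as soon as the answer is known
is_six_multiple <- function(n) {
  n %% 2 == 0 && n %% 3 == 0
}
is_six_multiple(12)
## [1] TRUE

# The right-hand side is never evaluated here, so stop() is not triggered
FALSE && stop("this is not evaluated")
## [1] FALSE
```

Because && stops early, a cheap check can be placed first to guard a later check that would otherwise fail.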

Lab Homework:


Write a function contain_abc(string) that returns TRUE if the given string contains the letter “a”, “b”, or “c”, and FALSE otherwise. Use grepl(letter, string) to detect whether a letter is in a string.

Submit your answer as a single PDF or HTML file knitted from an R Markdown file. Submit your R Markdown file as well.