So far we have relied almost entirely on built-in functions. Sometimes, however, built-in functions cannot do the specific thing we need. The last lecture included one such example.
library(lubridate)  # provides make_datetime()

make_datetime_100 <- function(year, month, day, time, tz = "UTC") {
  make_datetime(year, month, day, time %/% 100, time %% 100, 0, tz)
}
In this example, the time is stored as a number such as 319, meaning 3:19 am. The function splits that number into an hour (time %/% 100) and a minute (time %% 100), then builds a date-time object from all of the pieces.
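As a quick check of the arithmetic, the sketch below splits a few times by hand using only base R. The variable names and values are illustrative and not from the lecture.

```r
# Times stored as HMM or HHMM numbers (made-up values for illustration)
times <- c(319, 1545, 5)

hours <- times %/% 100    # integer division drops the last two digits
minutes <- times %% 100   # the remainder keeps the last two digits

hours
minutes
```

Here 319 splits into hour 3 and minute 19, and 5 (five minutes past midnight) splits into hour 0 and minute 5.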
Functions allow you to automate common tasks in a more powerful and general way than copying and pasting. We have already learned how to write functions in Python; the basics are much the same in R, but the syntax is slightly different.
function_Name <- function(<inputs separated by commas>){
  # function body, indented by two spaces
}
For example,
add_one <- function(x) {
  x + 1
}
add_one(3)
## [1] 4
By default, a function returns the value of the last expression it evaluates.
add_one <- function(x) {
  x + 1
  print(x)
}
add_one(3)
## [1] 3
The function above returns x, because the last expression it evaluates is print(x). Note that in R, print() returns its argument (invisibly), which is different from Python, where print() returns None.
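To see this return value directly, one can assign the result of print() to a variable. This is a small sketch; the name shown is made up for illustration.

```r
shown <- print("hello")     # prints the string as a side effect
identical(shown, "hello")   # print() handed its argument back
```

The assignment still prints, because printing is a side effect of calling print(); the returned value is simply stored in shown.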
add_one <- function(x) {
  print(x)
  return(x + 1)
  print(x)  # never reached
}
y <- add_one(3)
## [1] 3
y
## [1] 4
As in Python, we use return() to specify the value a function returns. When a function reaches a return() call, it stops there, and nothing after the return() is executed.
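An early return() is often used to handle a special case first. The function below is a hypothetical example, not part of the lecture code; it uses if, which is introduced later in these notes.

```r
# Hypothetical helper: describe the sign of a single number
describe_sign <- function(x) {
  if (x == 0) {
    return("zero")      # stop here; the code below is skipped
  }
  if (x > 0) {
    return("positive")
  }
  "negative"            # last expression, returned implicitly
}

describe_sign(0)
describe_sign(-2)
```

The final line of the function needs no return(), since the value of the last expression is returned by default.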
my_func <- function(a = 1, b = 2, c = 3, d = 4, e = 5) {
  c(a, b, c, d, e)
}
my_func()
## [1] 1 2 3 4 5
To define a function with multiple arguments, simply separate them with commas. To give an argument a default value, write it after the argument name, as in a = 1.
Calling a function with multiple arguments is similar to Python: we can match arguments by position or by name. The difference is that R is more flexible and does not require named arguments to come after positional ones.
my_func <- function(a = 1, b = 2, c = 3, d = 4, e = 5) {
  c(a, b, c, d, e)
}
my_func(10, 20)
## [1] 10 20 3 4 5
my_func(10, 20, e = 50)
## [1] 10 20 3 4 50
my_func(, 10) # This works but not recommended
## [1] 1 10 3 4 5
my_func(c = 30, 10, 20)
## [1] 10 20 30 4 5
Usually, in data analysis, it is recommended to omit the name of the first argument (the data set), since nearly every function takes it. For the other arguments, it is better to write out their names in the function call.
ggplot(mpg) + # For data set name, usually we don't use the argument name
  geom_bar(mapping = aes(x = as.factor(cyl), fill = as.factor(year)), col = "blue", position = "dodge")
Other than the data set mpg, it’s recommended that you write out the name of each argument.
An important thing to know is how functions interact with the environment outside the function frame (in Python it’s called the global frame). For example,
f <- function(x) {
  x + y
}
In many programming languages, this would be an error, because y is not defined inside the function. In R, this is valid code because R uses rules called lexical scoping to find the value associated with a name. Since y is not defined inside the function, R will look in the environment where the function was defined:
y <- 100
f(10)
## [1] 110
y <- 1000
f(10)
## [1] 1010
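The sketch below illustrates what "the environment where the function was defined" means: a variable of the same name inside a calling function is not used. All names here (offset, add_offset, caller) are hypothetical.

```r
offset <- 100

add_offset <- function(x) {
  x + offset    # offset is looked up where add_offset() was defined
}

caller <- function() {
  offset <- 1   # local to caller(); add_offset() does not see it
  add_offset(10)
}

caller()
```

Here caller() returns 110 rather than 11, because add_offset() finds the offset that sits alongside its own definition, not the one in the function that called it.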
This behaviour seems like a recipe for bugs, and indeed you should avoid creating functions like this deliberately, but by and large it doesn’t cause too many problems (especially if you regularly restart R to get to a clean slate).
The advantage of this behaviour is that, from a language standpoint, it allows R to be very consistent. This power and flexibility is what makes tools like ggplot2 and dplyr possible.
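One way to avoid depending on a variable outside the function, as cautioned above, is to pass the value in as an argument. A minimal sketch, using the hypothetical name f2:

```r
# Same computation as f(), but y is now an explicit argument
f2 <- function(x, y) {
  x + y
}

f2(10, y = 100)
f2(10, y = 1000)
```

The result now depends only on what is passed in, so changing a variable elsewhere in the session cannot silently change it.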
In R, using if to implement conditional execution looks like this:
if (condition) {
  # code executed when condition is TRUE
} else {
  # code executed when condition is FALSE
}
Here is a simple example:
my_func <- function(x) {
  if (x %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}
my_func(4)
## [1] "even"
my_func(3)
## [1] "odd"
The condition must evaluate to a single TRUE or FALSE. If it’s a vector of length greater than one, you’ll get an error in R 4.2.0 and later (earlier versions gave only a warning); if it’s NA, you’ll get an error. Watch out for these messages in your own code:
if (c(TRUE, FALSE)) {}
#> Error in if (c(TRUE, FALSE)) {: the condition has length > 1
if (NA) {}
#> Error in if (NA) {: missing value where TRUE/FALSE needed
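When a condition might be NA or longer than one element, it can be reduced to a single TRUE or FALSE first. The sketch below uses the base R functions isTRUE(), any(), and all(); the helper name is_positive and the sample values are hypothetical.

```r
# Hypothetical helper: TRUE only for a single, non-missing, positive number
is_positive <- function(x) {
  if (isTRUE(x > 0)) {   # isTRUE() is FALSE for NA and for longer vectors
    TRUE
  } else {
    FALSE
  }
}

is_positive(5)
is_positive(NA)

# any() and all() collapse a logical vector to one value
values <- c(2, 4, 7)
if (all(values > 0)) "all positive" else "not all positive"
if (any(values %% 2 == 1)) "has an odd number" else "all even"
```

Which reduction is appropriate depends on what the condition is supposed to mean, so this is a pattern to adapt rather than a rule.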
&& and ||
When you need to combine logical conditions, use && and || in your expression. These are non-vectorised operators, intended for if and other conditional execution. Never use the vectorised operators & and | in an if condition; those are for vectorised work, such as inside filter().
my_func2 <- function(x) {
  if (x %% 2 == 0 && x %% 3 == 0) {
    "x is divisible by six"
  } else {
    "x is not divisible by six"
  }
}
my_func2(6)
## [1] "x is divisible by six"
my_func2(7)
## [1] "x is not divisible by six"
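To illustrate the difference between the vectorised and non-vectorised operators discussed above, here is a small sketch with made-up values.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b   # element-wise: one result per position
a | b

# && and || work on single values and stop as soon as the answer is known
x <- 7
x > 0 && x < 10
x < 0 && stop("never evaluated")   # left side is FALSE, so stop() is skipped
```

Because && stops early, the last line returns FALSE without raising an error. This short-circuiting is one reason && and || suit if conditions.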
Write a function contain_abc(string) that returns TRUE if the given string contains the letter “a”, “b” or “c”, and FALSE otherwise. Use grepl(letter, string) to detect whether a letter is in a string or not.
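As a reminder of how grepl() behaves, here is a small sketch using a different letter; the strings are made up for illustration.

```r
grepl("z", "pizza")   # is "z" somewhere in "pizza"?
grepl("z", "apple")

# grepl() also accepts a vector of strings and returns one result per string
has_z <- grepl("z", c("pizza", "apple"))
has_z
```

The first argument is treated as a regular expression, which for a single letter simply matches that letter anywhere in the string.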
Submit your answer as a single PDF or HTML file knitted from an R Markdown file. Submit your R Markdown file as well.