If we can do anything in R, then we already know how to use functions. This tutorial will cover how to create our own functions.
We already know that when we use a function, we need to know what its arguments are. For example, if we want to use the mean() function, we can look at its documentation by typing ?mean.
The first function we write will be our own version of a function that calculates the mean of a numeric vector. We can’t just call it mean (if we did, our version would mask the base function in our environment), so we will call our function myMean.
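To see why the name matters, here is a small sketch of what happens when we reuse the name mean. The replacement function is hypothetical and deliberately wrong; it only illustrates that our definition is found first, while base::mean() still reaches the original.

```r
# Hypothetical (and deliberately wrong) function that reuses the name `mean`
mean <- function(x) {
  "this is not the base mean"
}

masked_result <- mean(c(1, 2, 3))       # calls our definition
base_result <- base::mean(c(1, 2, 3))   # still reaches the base version

# Remove our definition so that mean() refers to the base function again
rm(mean)
restored_result <- mean(c(1, 2, 3))
```

After rm(mean), calls to mean() behave as they did before, which is why choosing a distinct name such as myMean is the simpler option.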
Here is the basic structure of creating a function.
myMean <- function( ){
}
A function is created by using another function called… function(). The parentheses will contain the parameters we want our function to have, and the curly braces will contain the operation that will be performed on the parameters. First, we’ll need a vector of numbers, so we’ll call it the x parameter.
myMean <- function(x){
#do something to x
}
Now we’ll calculate the average of the numbers in x.
myMean <- function(x){
total_count_of_values <- length(x)
total_sum_of_values <- sum(x)
average_of_values <- total_sum_of_values/total_count_of_values
average_of_values
}
We’ve found how many numbers are in the x vector, we’ve added up all the values in that vector, and we’ve found the average. A function returns the value of the last expression it evaluates. You can also use the return() function to make this explicit, but it’s not necessary.
myMean <- function(x){
total_count_of_values <- length(x)
total_sum_of_values <- sum(x)
average_of_values <- total_sum_of_values/total_count_of_values
return(average_of_values)
}
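As a quick check that the two styles are equivalent, here is a minimal sketch with two shortened versions of the function. The names mean_implicit and mean_explicit are made up for this illustration.

```r
# Implicit return: the value of the last expression is the result
mean_implicit <- function(x) {
  sum(x) / length(x)
}

# Explicit return: the same result, stated with return()
mean_explicit <- function(x) {
  return(sum(x) / length(x))
}

values <- c(2, 4, 6)
implicit_result <- mean_implicit(values)  # 4
explicit_result <- mean_explicit(values)  # 4
```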
Let’s try our function.
my_vector <- c(1, 3, 5, 2, 6, 9, 0)
vector_mean <- myMean(x = my_vector)
vector_mean
## [1] 3.714286
Now let’s check our answer against the mean() function that is built into R.
mean(my_vector)
## [1] 3.714286
Conditional statements are helpful if you want to write some code that will do one thing in some circumstances and something else the rest of the time. R has conditional statements such as if() and if() else that perform these operations. These are very similar to the IF function used in Excel, and they take the same kind of arguments.
if(logical expression){
some instructions
}
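Here is a minimal sketch of that outline with concrete values. The reading and threshold are made-up numbers, not taken from any dataset.

```r
reading <- 0.07     # hypothetical ozone reading
threshold <- 0.065  # hypothetical cutoff

status <- "not checked"
if (reading > threshold) {
  # this block runs only when the condition is TRUE
  status <- "above threshold"
}
```

Because 0.07 is greater than 0.065, the block runs and status becomes "above threshold". With a reading at or below the cutoff, status would keep its original value.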
library(region5air)
data(chicago_air)
ozone <- chicago_air$ozone
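# seq_along() visits every position; length(ozone) alone would visit only the last one
for (i in seq_along(ozone)) {
# skip missing values, which would otherwise cause an error in the if() condition
if (!is.na(ozone[i]) && ozone[i] >= .02) {
ozone[i] <- 0
}
}
head(ozone)
## [1] 0 0 0 0 0 0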
if() else is used when you want one set of instructions to run if the logical expression is true and another if it is false. if() else uses two sets of braces, one before the “else” and one after.
if(logical expression){
some instructions on what to do if the logical expression is true
} else {
some other instructions on what to do if the logical expression is false
}
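A small sketch of the same outline with a concrete, made-up temperature value:

```r
temperature <- 28  # hypothetical reading in degrees Fahrenheit

if (temperature > 32) {
  label <- "above freezing"
} else {
  label <- "at or below freezing"
}
```

Only one of the two blocks runs. Since 28 is not greater than 32, the else block assigns "at or below freezing" to label.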
danger <- if(sum(table(chicago_air$ozone)) > 350){
max(chicago_air$ozone, na.rm = TRUE) + 5
} else {
sum(chicago_air$temp, na.rm = TRUE) - 10
}
danger
## [1] 14029
The ifelse() function is a vectorized shorthand for if() else. It tests every element of a vector at once, so it can be used when you are creating simple conditional statements.
ozone <- chicago_air$ozone
oz.viol <- ifelse(ozone > 0.065, "Potential Health Effects", "All Good")
table(oz.viol)
## oz.viol
##                 All Good Potential Health Effects
##                      330                        9
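To make the element-by-element behavior easier to see, here is a sketch on a short, made-up vector that includes a missing value. The labels "high" and "ok" are arbitrary.

```r
readings <- c(0.030, 0.070, NA, 0.065)  # hypothetical ozone readings

# Each element is tested separately; a missing reading stays missing
flags <- ifelse(readings > 0.065, "high", "ok")
# flags is now "ok", "high", NA, "ok"
```

Note that 0.065 is not strictly greater than 0.065, so the last reading is labeled "ok". The NA result also explains why table() reports fewer values than there are rows when the data contain missing readings.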
Like most programming languages, R has for and while loops. We’ll just review for loops and move on to apply() functions, which are more commonly used in R.
For loops are used to repeat an operation a set number of times. The basic outline is
for(i in sequence){
}
The sequence is typically a vector. The variable i takes on the values in that vector, one per pass through the loop. For instance, if the sequence were the vector c(1, 2, 3), then i would take on each of those values in turn.
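A minimal sketch of this behavior: the loop below runs three times, and on each pass i holds the next value from the vector. The variable name visited is made up for the illustration.

```r
visited <- c()
for (i in c(1, 2, 3)) {
  # i is 1 on the first pass, 2 on the second, and 3 on the third
  visited <- c(visited, i * 10)
}
# visited is now 10, 20, 30
```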
Here we use our myMean() function to find the average of three vectors.
myMean <- function(x){
total_count_of_values <- length(x)
total_sum_of_values <- sum(x)
average_of_values <- total_sum_of_values/total_count_of_values
return(average_of_values)
}
my_list <- list(c(1, 5, 9, 3), 1:10, c(23, 42))
my_averages <- c()
for(i in c(1, 2, 3)){
my_averages[i] <- myMean(my_list[[i]])
}
my_averages
## [1] 4.5 5.5 32.5
We use the i parameter as a variable to extract different members of my_list and add the average to different positions in the my_averages vector.
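Writing the positions out by hand as c(1, 2, 3) works for a list of three, but it has to be edited whenever the list changes. One common alternative, sketched here, is to let seq_along() generate the positions and to create the result vector at its full length up front. This sketch uses the built-in mean() so that it runs on its own.

```r
my_list <- list(c(1, 5, 9, 3), 1:10, c(23, 42))

# Create the result vector at its final length before the loop
my_averages <- numeric(length(my_list))

# seq_along(my_list) is 1, 2, 3 here and adapts if the list grows
for (i in seq_along(my_list)) {
  my_averages[i] <- mean(my_list[[i]])
}
# my_averages is now 4.5, 5.5, 32.5
```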
We could use the i parameter to specify a column in a data frame. Let’s use a for loop to calculate the means of a few columns in the chicago_air dataset from the region5air package.
head(chicago_air)
##         date ozone temp solar month weekday
## 1 2013-01-01 0.032   17  0.65     1       3
## 2 2013-01-02 0.020   15  0.61     1       4
## 3 2013-01-03 0.021   28  0.17     1       5
## 4 2013-01-04 0.028   18  0.62     1       6
## 5 2013-01-05 0.025   26  0.48     1       7
## 6 2013-01-06 0.026   36  0.47     1       1
chicago_avgs <- c()
for(i in c("ozone", "temp", "solar")){
chicago_avgs[i] <- myMean(chicago_air[, i])
}
chicago_avgs
##     ozone      temp     solar
##        NA        NA 0.8410411
It looks like there is a problem with myMean(). We didn’t account for NAs: sum() returns NA when any value in the vector is missing, so the average is NA as well. A properly written function would need to take that sort of thing into account, but we can also deal with NAs in the for loop.
chicago_avgs <- c()
for(i in c("ozone", "temp", "solar")){
numeric_series <- chicago_air[, i]
numeric_series <- numeric_series[!is.na(numeric_series)]
chicago_avgs[i] <- myMean(numeric_series)
}
chicago_avgs
##       ozone        temp       solar
##  0.03567257 54.83984375  0.84104110
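The other option mentioned above is to handle NAs inside the function itself. Here is one possible sketch, not the only way to do it. The name myMeanNA and its na.rm parameter are hypothetical, chosen to mirror the na.rm argument of the built-in mean().

```r
# Hypothetical variant of myMean() with an option to drop missing values
myMeanNA <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]  # keep only the non-missing values
  }
  sum(x) / length(x)
}

with_na <- c(2, NA, 4)
kept <- myMeanNA(with_na)                   # NA, because one value is missing
dropped <- myMeanNA(with_na, na.rm = TRUE)  # 3
```

Defaulting na.rm to FALSE keeps missing values visible unless the caller explicitly asks to ignore them.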
In R, loops are often written more concisely with the apply() family of functions. These are functions that have apply at the end of their name (such as lapply(), tapply(), and mapply()), and they apply a function to each member of a vector or list, or to each row or column of a matrix or data frame.
apply() takes a data frame (or matrix) as the first argument. The second argument specifies whether you want to apply a function to the rows (1) or columns (2), and the third argument is the function you want to apply to each row or column. Any additional arguments are passed on to the function being applied.
air <- chicago_air[, c("ozone", "temp", "solar")]
air_max <- apply(air,
MARGIN = 2, # we are applying the max() function to each column
FUN = max,
na.rm = TRUE # na.rm is being passed to the max() function
)
air_max
##  ozone   temp  solar
##  0.081 92.000  1.490
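To make the difference between the two MARGIN values concrete, here is a sketch on a small, made-up matrix that does not depend on the chicago_air data.

```r
# A 2 x 3 matrix of made-up values:
#      1 2 3
#      4 5 6
scores <- matrix(c(1, 2, 3,
                   4, 5, 6), nrow = 2, byrow = TRUE)

row_sums <- apply(scores, MARGIN = 1, FUN = sum)  # one result per row: 6, 15
col_sums <- apply(scores, MARGIN = 2, FUN = sum)  # one result per column: 5, 7, 9
```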
lapply() applies a function to each member of a list. Here we find the length of each vector in my_list.
lapply(my_list, length)
## [[1]]
## [1] 4
##
## [[2]]
## [1] 10
##
## [[3]]
## [1] 2
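lapply() always returns a list. When each result is a single value, the related sapply() function can simplify the result to a vector, which is often easier to work with. A short sketch, redefining my_list so that it runs on its own:

```r
my_list <- list(c(1, 5, 9, 3), 1:10, c(23, 42))

lengths_list <- lapply(my_list, length)    # a list with three elements
lengths_vector <- sapply(my_list, length)  # an integer vector: 4, 10, 2

# The same idea replaces the earlier for loop over my_list
averages <- sapply(my_list, mean)          # 4.5, 5.5, 32.5
```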