Writing Functions

a brief introduction

Andrew Ellis

2016-12-08

When should you write a function?

Have you ever done anything like this (the example is from Hadley Wickham’s book Advanced R Programming):

# Generate a sample dataset
create_dataframe <- function(na_value = -99) {
    set.seed(1014)
    df <- data.frame(replicate(6, sample(c(1:10, 
                                           na_value), 
                                         6, rep = TRUE)))
    names(df) <- letters[1:6]
    df
}
df <- create_dataframe()

Now, let’s replace all those -99 values with NAs:

df$a[df$a == -99] <- NA
df$b[df$b == -99] <- NA
df$c[df$c == -98] <- NA
df$d[df$d == -99] <- NA
df$e[df$e == -99] <- NA
df$f[df$g == -99] <- NA

Ok, this is painful, and error-prone. We might make copy-paste errors.

Can anyone spot an error?

Let’s do the right thing: start again, and write a function:

df <- create_dataframe()
df
##    a  b c   d   e f
## 1  1  6 1   5 -99 1
## 2 10  4 4 -99   9 3
## 3  7  9 5   4   1 4
## 4  2  9 3   8   6 8
## 5  1 10 5   9   8 6
## 6  6  2 1   3   8 5
fix_missing <- function(x) {
  x[x == -99] <- NA
  x
}

Now we can apply the function to each variable in the data frame:

For example:

df$d
## [1]   5 -99   4   8   9   3

contains one -99 value.

df$a <- fix_missing(df$a)
df$b <- fix_missing(df$b)
df$c <- fix_missing(df$c)
df$d <- fix_missing(df$d)
df$e <- fix_missing(df$e)
df$f <- fix_missing(df$e)

This is better, but still involves copying and pasting (should be avoided at all costs). In R, we would do something like this:

df <- create_dataframe()
df
##    a  b c   d   e f
## 1  1  6 1   5 -99 1
## 2 10  4 4 -99   9 3
## 3  7  9 5   4   1 4
## 4  2  9 3   8   6 8
## 5  1 10 5   9   8 6
## 6  6  2 1   3   8 5
df[] <- lapply(df, fix_missing)
df
##    a  b c  d  e f
## 1  1  6 1  5 NA 1
## 2 10  4 4 NA  9 3
## 3  7  9 5  4  1 4
## 4  2  9 3  8  6 8
## 5  1 10 5  9  8 6
## 6  6  2 1  3  8 5

But what if we have a new dataset with -98 as a missing value, instead of -99?

df <- create_dataframe(na_value = -98)
df
##    a  b c   d   e f
## 1  1  6 1   5 -98 1
## 2 10  4 4 -98   9 3
## 3  7  9 5   4   1 4
## 4  2  9 3   8   6 8
## 5  1 10 5   9   8 6
## 6  6  2 1   3   8 5

Let’s write a new function:

fix_missing <- function(x, na_value = -99) {
  x[x == na_value] <- NA
  x
}

Now we can pass in the correct value as an argument1 When using lapply, arguments to the function being applied are passed as argmuments to lapply(). :

df[] <- lapply(df, fix_missing, na_value = -98)
df
##    a  b c  d  e f
## 1  1  6 1  5 NA 1
## 2 10  4 4 NA  9 3
## 3  7  9 5  4  1 4
## 4  2  9 3  8  6 8
## 5  1 10 5  9  8 6
## 6  6  2 1  3  8 5

How to write a function

There are three key steps to creating a new function:

  1. Pick a name for the function. Here we called the function fix_missing because that’s a fairly accurate description of what is does.

  2. List the inputs, or arguments, to the function. Here we have two arguments: x and na_value

  3. Place the code in the body of the function. This is the bit inside the curly braces {}. The return value of the function will be the last expression evaluated inside the function. R only returns a single object; if you want more than one object to be returned, you need to place these inside a list.

fix_missing <- function(x, na_value = -99) {
  x[x == na_value] <- NA
  x
}

The first argument, x, does not have a default value, whereas the second argument, na_value has a default value of -99. Aguments with default values can be omitted when calling the function, argument without default values are required.

Exercise

Now, let’s do a little exercise:

Solutions

Basic power function

power_fun_1 <- function(x, k = 2) {
    x^k
}
power_fun_1(3)
## [1] 9

Power function that returns a list

power_fun_2 <- function(x, k = 2) { 
    list(result = x^k, 
         call = paste(as.character(x), "to the power of", 
                      as.character(k)))
}
power_fun_2(2, 3)
## $result
## [1] 8
## 
## $call
## [1] "2 to the power of 3"

Power function with a class attribute and plot method

power_fun_3 <- function(x, k = 2) { 
    out <- list(result = x^k,
                x = x,
                k = k,
         call = paste(as.character(x), "to the power of", 
                      as.character(k)))
    class(out) <- "some_class"
    out
}
plot.some_class <- function(z) {
    curve(x^z$k, -2*z$x, 2*z$x, lwd = 3, 
          col = "violet", ylab = z$call,
          xlab = "x")
    points(z$x, z$result, pch = 19, col = "blue", cex = 3)
}

Power function plot Power function plot

out <- power_fun_3(2, k = 3)
plot(out)

Session info

sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tint_0.0.3
## 
## loaded via a namespace (and not attached):
##  [1] backports_1.0.4    magrittr_1.5       rprojroot_1.1     
##  [4] tools_3.3.2        htmltools_0.3.5    parallel_3.3.2    
##  [7] yaml_2.1.14        Rcpp_0.12.8        stringi_1.1.2     
## [10] rmarkdown_1.2.9000 knitr_1.15.4       stringr_1.1.0     
## [13] digest_0.6.10      evaluate_0.10