The basics

Writing a function

The function template is a useful way to start writing a function:

my_fun <- function(arg1, arg2) {
  # body
}

my_fun is the variable that you want to assign your function to, and arg1 and arg2 are the arguments to the function. The template has two arguments, but you can specify any number of arguments, each separated by a comma. You then replace # body with the R code that your function will execute, referring to the inputs by the argument names you specified.

Let’s get to writing functions!

# Define ratio() function
ratio <- function(x, y) {
  x / y
}

# Call ratio() with arguments 3 and 4
ratio(3, 4)
## [1] 0.75

Arguments

How did you call your function ratio() in the previous exercise? Do you remember the two ways to specify the arguments? (If you have forgotten, it might be useful to review the video from Intermediate R.)

You probably either did ratio(3, 4), which relies on matching by position, or ratio(x = 3, y = 4), which relies on matching by name.

For functions you and others use often, it’s okay to use positional matching for the first one or two arguments. These are usually the data to be computed on. Good examples are the x argument to the summary functions (mean(), sd(), etc.) and the x and y arguments to plotting functions.

However, beyond the first couple of arguments you should always use matching by name. It makes your code much easier for you and others to read. This is particularly important if the argument is optional, because it has a default. When overriding a default value, it’s good practice to use the name.

Notice that when you call a function, you should put spaces around = and always put a space after a comma, not before (just like in regular English). Using whitespace makes it easier to skim the call for the important components.

# Original call: it works, but it is hard to read
mean(0.1,x=c(1:9, NA),TRUE)
## [1] 5
# Rewritten to follow best practices
mean(c(1:9, NA), trim = 0.1, na.rm = TRUE)
## [1] 5
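
As a quick check of the two matching styles, the sketch below redefines ratio() from the earlier exercise so that it runs on its own, then confirms that positional and named calls agree. The call with the names swapped is added here only for illustration.

```r
# ratio() as defined earlier in this chapter
ratio <- function(x, y) {
  x / y
}

by_position <- ratio(3, 4)          # matched by position: x = 3, y = 4
by_name     <- ratio(x = 3, y = 4)  # matched by name
swapped     <- ratio(y = 4, x = 3)  # named arguments can come in any order

# All three calls compute 3 / 4
identical(by_position, by_name)
identical(by_name, swapped)
```

Because named arguments are matched before positional ones, the order only matters for the arguments you leave unnamed.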

Testing your understanding of scoping

The next few questions are designed to test your understanding of scoping. For first timers, these concepts will be hard and you might not get them the first time through. We decided to include them in the first chapter because they are good to know now, so when something unexpected happens, you know what to come back to and review.

For now, take a careful look at the function featured in each question. Try to predict what the function will return without running it! Being able to reason about a function is an important skill for R programmers.

Consider the following:

y <- 10
f <- function(x) {
  x + y
}

What will f(10) return?

f(10)
## [1] 20

Consider the following:

y <- 10
f <- function(x) {
  y <- 5
  x + y
}

What will f(10) return?

f(10)
## [1] 15

This one’s a little different. Instead of predicting what the function will return, you need to predict the value of a variable. We run the following in a fresh session in the console:

f <- function(x) {
  y <- 5
  x + y
}
f(5)
## [1] 10
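
The variable to predict here is presumably y. The sketch below is one way to check it, assuming a session in which no y has been defined at the top level; the rm() call is only there to simulate that fresh state and is not part of the original exercise.

```r
# Simulate a fresh session: make sure no global y exists
# (used here only for illustration)
if (exists("y", envir = globalenv(), inherits = FALSE)) {
  rm("y", envir = globalenv())
}

f <- function(x) {
  y <- 5   # y is created inside the environment of this call to f()
  x + y
}

result <- f(5)

# The y assigned inside f() does not leak into the global environment
y_is_global <- exists("y", envir = globalenv(), inherits = FALSE)

result
y_is_global
```

Since y only ever existed inside the call to f(), typing y at the console afterwards gives an "object 'y' not found" error rather than 5.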

For loops

A safer way to create the sequence

Let’s take a look at the sequence component of our for loop:

i in 1:ncol(df)

Each time our for loop iterates, i takes the next value in 1:ncol(df). This is a pretty common model for a sequence: a sequence of consecutive integers designed to index over one dimension of our data.

What might surprise you is that this isn’t the best way to generate such a sequence, especially when you are using for loops inside your own functions. Let’s look at an example where df is an empty data frame:

df <- data.frame()
1:ncol(df)
## [1] 1 0

for (i in 1:ncol(df)) {
  print(median(df[[i]]))
}
# Error in .subset2(x, i, exact = exact) : subscript out of bounds

Our sequence is now the somewhat nonsensical 1, 0. You might think you wouldn’t be silly enough to use a for loop with an empty data frame, but once you start writing your own functions, there’s no telling what the input will be.

A better method is to use the seq_along() function. This function generates a sequence along the index of the object passed to it, but handles the empty case much better.

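df <- read.csv2("../data/df.csv")
names(df) <- c("a", "b", "c", "d")
# Note: apply() returns a matrix, not a data frame, so seq_along(df) below
# runs over every cell and the loop prints one value per cell. To keep df a
# data frame (one median per column), use df[] <- lapply(df, as.double).
df <- apply(df, 2, as.double)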

# Replace the 1:ncol(df) sequence
for (i in seq_along(df)) {
  print(median(df[[i]]))
}
## [1] -1.418444
## [1] 0.8004161
## [1] 0.3090237
## [1] -0.8101048
## [1] 0.4786248
## [1] -0.1399472
## [1] 0.8173948
## [1] -0.4621488
## [1] 0.4101153
## [1] 0.7700926
## [1] 0.5958999
## [1] -1.113583
## [1] 0.5307611
## [1] 1.192109
## [1] -0.5084309
## [1] 0.5434623
## [1] -0.6386574
## [1] -1.354054
## [1] 1.501762
## [1] -0.6814851
## [1] 1.481781
## [1] 0.543907
## [1] 1.438827
## [1] 1.508421
## [1] 0.0009905997
## [1] 0.09475556
## [1] -0.07377109
## [1] 0.7828129
## [1] -0.2461718
## [1] 0.9395533
## [1] -1.684828
## [1] 0.6245306
## [1] 0.8344307
## [1] -0.9508371
## [1] 1.124358
## [1] -0.9337636
## [1] 0.7013038
## [1] 0.4751808
## [1] 0.5888368
## [1] 2.027456
# Create an empty data frame
empty_df <- data.frame()

# Repeat for loop to verify there is no error
for (i in seq_along(empty_df)) {
  print(median(empty_df[[i]]))
}
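
To see the difference between the two sequences directly, the following sketch builds both for an empty data frame and counts how many times a loop over each would run. The names bad_seq, good_seq, n_bad and n_good are introduced here only for illustration.

```r
empty_df <- data.frame()

bad_seq  <- 1:ncol(empty_df)     # ncol() is 0, so this counts down: 1, 0
good_seq <- seq_along(empty_df)  # integer(0): nothing to iterate over

bad_seq
good_seq

# Count how many times each loop body would run
n_bad <- 0
for (i in bad_seq) n_bad <- n_bad + 1

n_good <- 0
for (i in good_seq) n_good <- n_good + 1

c(bad = n_bad, good = n_good)
```

The loop over 1:ncol(empty_df) runs twice on data that does not exist, while the loop over seq_along(empty_df) is skipped entirely.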

Keeping output

Our for loop does a good job displaying the column medians, but we might want to store these medians in a vector for future use.

Before you start the loop, you must always allocate sufficient space for the output, say in an object called output. This is very important for efficiency: if you grow the output at each iteration (e.g. using c()), your for loop will be very slow.

A general way of creating an empty vector of given length is the vector() function. It has two arguments: the type of the vector ("logical", "integer", "double", "character", etc.) and the length of the vector.

Then, at each iteration of the loop you must store the output in the corresponding entry of the output vector, i.e. assign the result to output[[i]]. (You might ask why we are using double brackets here when output is a vector. It’s primarily for generalizability: this subsetting will work whether output is a vector or a list.)

Let’s edit our loop to store the medians, rather than printing them to the console.

# Create new double vector: output
output <- vector(mode = "double", length = ncol(df))

# Alter the loop
for (i in seq_along(df)) {
  # Change code to store result in output
  output[[i]] <- median(df[[i]])
}

# Print output
output
##  [1] -1.4184440000  0.8004161000  0.3090237000 -0.8101048000  0.4786248000
##  [6] -0.1399472000  0.8173948000 -0.4621488000  0.4101153000  0.7700926000
## [11]  0.5958999000 -1.1135828000  0.5307611000  1.1921092000 -0.5084309000
## [16]  0.5434623000 -0.6386574000 -1.3540538000  1.5017618000 -0.6814851000
## [21]  1.4817806482  0.5439070205  1.4388273001  1.5084212767  0.0009905997
## [26]  0.0947555557 -0.0737710890  0.7828128865 -0.2461718196  0.9395532943
## [31] -1.6848277000  0.6245306000  0.8344307000 -0.9508371000  1.1243584000
## [36] -0.9337636000  0.7013038000  0.4751808000  0.5888368000  2.0274565000
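
The remark about double brackets can be illustrated with a small sketch in which the same loop fills both a preallocated double vector and a preallocated list. The data frame small_df and the names out_vec and out_list are made up for this example.

```r
# Hypothetical data frame, used only for this illustration
small_df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 60))

# Preallocate a double vector and a list of the same length
out_vec  <- vector(mode = "double", length = ncol(small_df))
out_list <- vector(mode = "list", length = ncol(small_df))

for (i in seq_along(small_df)) {
  out_vec[[i]]  <- median(small_df[[i]])  # one number per column
  out_list[[i]] <- range(small_df[[i]])   # two numbers per column
}

out_vec
out_list
```

The assignment with [[i]] reads the same in both cases, which is why it is a convenient habit even when the output is a plain vector.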

Why to use a function

1. Start with a snippet of code

We have a snippet of code that successfully rescales a column to be between 0 and 1:

(df$a - min(df$a, na.rm = TRUE)) /  
  (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

Our goal over the next few exercises is to turn this snippet, written to work on the a column in the data frame df, into a general purpose rescale01() function that we can apply to any vector.

The first step of turning a snippet into a function is to examine the snippet and decide how many inputs there are, then rewrite the snippet to refer to these inputs using temporary names. These inputs will become the arguments to our function, so choosing good names for them is important. (We’ll talk more about naming arguments in a later exercise.)

In this snippet, there is one input: the numeric vector to be rescaled (currently df$a). What would be a good name for this input? It’s quite common in R to refer to a vector of data simply as x (like in the mean function), so we will follow that convention here.

# Define example vector x
x <- c(seq(1,10,1), NA)

# Rewrite this snippet to refer to x
(x - min(x, na.rm = TRUE)) /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000        NA

2. Rewrite for clarity

Our next step is to examine our snippet and see if we can write it more clearly.

Take a close look at our rewritten snippet. Do you see any duplication?

(x - min(x, na.rm = TRUE)) /
  (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

One obviously duplicated statement is min(x, na.rm = TRUE). It makes more sense for us just to calculate it once, store the result, and then refer to it when needed. In fact, since we also need the maximum value of x, it would be even better to calculate the range once, then refer to the first and second elements when they are needed.

What should we call this intermediate variable? You’ll soon get the message that using good names is an important part of writing clear code! I suggest we call it rng (for “range”).

# Define example vector x
x <- c(1:10, NA)

# Define rng
rng <- range(x, na.rm = TRUE)

# Rewrite this snippet to refer to the elements of rng
(x - rng[1]) / (rng[2] - rng[1])
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000        NA

3. Finally turn it into a function!

What do you need to write a function? You need a name for the function, you need to know the arguments to the function, and you need code that forms the body of the function.

We now have all these pieces ready to put together. It’s time to write the function!

# Define example vector x
x <- c(1:10, NA) 

# Use the function template to create the rescale01 function
rescale01 <- function(x) {
  # Compute the range once, then reuse its two elements
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Test your function, call rescale01 using the vector x as the argument
rescale01(x)
##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000        NA
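
Once the snippet is a function, it can be reused on any numeric vector. The sketch below applies it to every column of a small data frame; rescale01() is redefined (using rng[1] for the minimum) so the block runs on its own, and the data frame scores is made up for this example.

```r
# rescale01() as built up above
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical data frame, used only for this illustration
scores <- data.frame(
  a = c(2, 4, 6),
  b = c(10, NA, 30)
)

# Replace every column with its rescaled version;
# assigning into scores[] keeps the data frame structure
scores[] <- lapply(scores, rescale01)
scores
```

Each column now runs from 0 to 1, and the missing value in b stays missing.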

How to write a function

1. Start with a simple problem

Let’s tackle a new problem. We want to write a function, both_na(), that counts the number of positions at which two vectors, x and y, both have a missing value.

How do we get started? Should we start writing our function?

both_na <- function(x, y) {
  # something goes here?
}

No! We should start by solving a simple example problem first. Let’s define an x and y where we know what the answer both_na(x, y) should be.

Let’s start with:

x <- c( 1, 2, NA, 3, NA)
y <- c(NA, 3, NA, 3,  4)

Then both_na(x, y) should return 1, since there is only one element that is missing in both x and y, the third element.

# Define example vectors x and y
x <- c( 1, 2, NA, 3, NA)
y <- c(NA, 3, NA, 3,  4)

# Count how many elements are missing in both x and y
sum(is.na(x) & is.na(y))
## [1] 1
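
With a working snippet in hand, one plausible next step is to wrap it in the function template. This is a minimal sketch that follows the same pattern as rescale01(), not necessarily the final version the exercise arrives at.

```r
# Count the positions at which both x and y are missing
both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}

# Same example vectors as above
x <- c( 1, 2, NA, 3, NA)
y <- c(NA, 3, NA, 3,  4)

both_na(x, y)
```

The call returns 1 for these vectors, matching the answer worked out by hand.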

2. Using a for loop to remove duplication

Imagine we have a data frame called df:

df <- data.frame(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

We want to compute the median of each column. You could do this with copy and paste (df is available in your workspace, so go ahead and try it in the console):

median(df[[1]])
median(df[[2]])
median(df[[3]])
median(df[[4]])

But that’s a lot of repetition! Let’s start by seeing how we could reduce the duplication by using a for loop.

# Initialize output vector
output <- numeric(ncol(df))

# Fill in the body of the for loop
for (i in seq_along(df)) {
    output[[i]] <- median(df[[i]])
}

# View the result
output
## (a numeric vector of four medians, one per column; the values differ from run to run because df is filled with rnorm() draws)

3. Creating a safe function

safely() is an adverb; it takes a verb and modifies it. That is, it takes a function as an argument and it returns a function as its output. The function that is returned is modified so it never throws an error (and never stops the rest of your computation!).

Instead, it always returns a list with two elements:

result is the original result. If there was an error, this will be NULL.

error is an error object. If the operation was successful, this will be NULL.

Let’s try to make the readLines() function safe.

library(purrr)
# Create safe_readLines() by passing readLines() to safely()
safe_readLines <- safely(readLines)

# Call safe_readLines() on "http://example.org"
example_lines <- safe_readLines("http://example.org")
example_lines
## $result
##  [1] "<!doctype html>"                                                                                      
##  [2] "<html>"                                                                                               
##  [3] "<head>"                                                                                               
##  [4] "    <title>Example Domain</title>"                                                                    
##  [5] ""                                                                                                     
##  [6] "    <meta charset=\"utf-8\" />"                                                                       
##  [7] "    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />"                        
##  [8] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />"                       
##  [9] "    <style type=\"text/css\">"                                                                        
## [10] "    body {"                                                                                           
## [11] "        background-color: #f0f0f2;"                                                                   
## [12] "        margin: 0;"                                                                                   
## [13] "        padding: 0;"                                                                                  
## [14] "        font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;"                
## [15] "        "                                                                                             
## [16] "    }"                                                                                                
## [17] "    div {"                                                                                            
## [18] "        width: 600px;"                                                                                
## [19] "        margin: 5em auto;"                                                                            
## [20] "        padding: 50px;"                                                                               
## [21] "        background-color: #fff;"                                                                      
## [22] "        border-radius: 1em;"                                                                          
## [23] "    }"                                                                                                
## [24] "    a:link, a:visited {"                                                                              
## [25] "        color: #38488f;"                                                                              
## [26] "        text-decoration: none;"                                                                       
## [27] "    }"                                                                                                
## [28] "    @media (max-width: 700px) {"                                                                      
## [29] "        body {"                                                                                       
## [30] "            background-color: #fff;"                                                                  
## [31] "        }"                                                                                            
## [32] "        div {"                                                                                        
## [33] "            width: auto;"                                                                             
## [34] "            margin: 0 auto;"                                                                          
## [35] "            border-radius: 0;"                                                                        
## [36] "            padding: 1em;"                                                                            
## [37] "        }"                                                                                            
## [38] "    }"                                                                                                
## [39] "    </style>    "                                                                                     
## [40] "</head>"                                                                                              
## [41] ""                                                                                                     
## [42] "<body>"                                                                                               
## [43] "<div>"                                                                                                
## [44] "    <h1>Example Domain</h1>"                                                                          
## [45] "    <p>This domain is established to be used for illustrative examples in documents. You may use this"
## [46] "    domain in examples without prior coordination or asking for permission.</p>"                      
## [47] "    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>"                   
## [48] "</div>"                                                                                               
## [49] "</body>"                                                                                              
## [50] "</html>"                                                                                              
## 
## $error
## NULL
# Call safe_readLines() on "http://asdfasdasdkfjlda"
nonsense_lines <- safe_readLines("http://asdfasdasdkfjlda")
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
nonsense_lines
## $result
## NULL
## 
## $error
## <simpleError in file(con, "r"): cannot open the connection>
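
The same behaviour can be reproduced without a network connection. The sketch below wraps log() instead of readLines(); the names safe_log, ok and bad are introduced here only for illustration.

```r
library(purrr)

# Wrap log() so that it never throws an error
safe_log <- safely(log)

ok  <- safe_log(10)   # succeeds
bad <- safe_log("a")  # log() of a character string is an error

ok$result             # the value of log(10)
is.null(ok$error)     # TRUE: nothing went wrong
is.null(bad$result)   # TRUE: no result is available
class(bad$error)      # the captured error object
```

In both cases the return value is a list with the elements result and error, and exactly one of them is NULL.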

One feature of safely() is that it plays nicely with the map() functions. Consider this list containing the two URLs from the last exercise, plus one additional URL to make things more interesting:

urls <- list(
  example = "http://example.org",
  rproj = "http://www.r-project.org",
  asdf = "http://asdfasdasdkfjlda"
)

We are interested in quickly downloading the HTML files at each URL. You might try:

map(urls, readLines)

But it results in an error, Error in file(con, "r") : cannot open the connection, and no output for any of the URLs. Go on, try it!

We can solve this problem by using our safe_readLines() instead.

# Define safe_readLines()
safe_readLines <- safely(readLines)

# Use the safe_readLines() function with map(): html
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
# Call str() on html
str(html)
## List of 3
##  $ example:List of 2
##   ..$ result: chr [1:50] "<!doctype html>" "<html>" "<head>" "    <title>Example Domain</title>" ...
##   ..$ error : NULL
##  $ rproj  :List of 2
##   ..$ result: chr [1:124] "<!DOCTYPE html>" "<html lang=\"en\">" "  <head>" "    <meta charset=\"utf-8\">" ...
##   ..$ error : NULL
##  $ asdf   :List of 2
##   ..$ result: NULL
##   ..$ error :List of 2
##   .. ..$ message: chr "cannot open the connection"
##   .. ..$ call   : language file(con, "r")
##   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# Extract the result from one of the successful elements
html[["example"]][["result"]]
##  [1] "<!doctype html>"                                                                                      
##  [2] "<html>"                                                                                               
##  [3] "<head>"                                                                                               
##  [4] "    <title>Example Domain</title>"                                                                    
##  [5] ""                                                                                                     
##  [6] "    <meta charset=\"utf-8\" />"                                                                       
##  [7] "    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />"                        
##  [8] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />"                       
##  [9] "    <style type=\"text/css\">"                                                                        
## [10] "    body {"                                                                                           
## [11] "        background-color: #f0f0f2;"                                                                   
## [12] "        margin: 0;"                                                                                   
## [13] "        padding: 0;"                                                                                  
## [14] "        font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;"                
## [15] "        "                                                                                             
## [16] "    }"                                                                                                
## [17] "    div {"                                                                                            
## [18] "        width: 600px;"                                                                                
## [19] "        margin: 5em auto;"                                                                            
## [20] "        padding: 50px;"                                                                               
## [21] "        background-color: #fff;"                                                                      
## [22] "        border-radius: 1em;"                                                                          
## [23] "    }"                                                                                                
## [24] "    a:link, a:visited {"                                                                              
## [25] "        color: #38488f;"                                                                              
## [26] "        text-decoration: none;"                                                                       
## [27] "    }"                                                                                                
## [28] "    @media (max-width: 700px) {"                                                                      
## [29] "        body {"                                                                                       
## [30] "            background-color: #fff;"                                                                  
## [31] "        }"                                                                                            
## [32] "        div {"                                                                                        
## [33] "            width: auto;"                                                                             
## [34] "            margin: 0 auto;"                                                                          
## [35] "            border-radius: 0;"                                                                        
## [36] "            padding: 1em;"                                                                            
## [37] "        }"                                                                                            
## [38] "    }"                                                                                                
## [39] "    </style>    "                                                                                     
## [40] "</head>"                                                                                              
## [41] ""                                                                                                     
## [42] "<body>"                                                                                               
## [43] "<div>"                                                                                                
## [44] "    <h1>Example Domain</h1>"                                                                          
## [45] "    <p>This domain is established to be used for illustrative examples in documents. You may use this"
## [46] "    domain in examples without prior coordination or asking for permission.</p>"                      
## [47] "    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>"                   
## [48] "</div>"                                                                                               
## [49] "</body>"                                                                                              
## [50] "</html>"
# Extract the error from the element that was unsuccessful
html[["asdf"]][["error"]]
## <simpleError in file(con, "r"): cannot open the connection>

We now have output that contains the HTML for each of the two URLs on which readLines() was successful and the error for the other. But the output isn’t that easy to work with, since the results and errors are buried in the inner-most level of the list.

purrr provides a function transpose() that reshapes a list so the inner-most level becomes the outer-most level. In other words, it turns a list-of-lists “inside-out”. Consider the following list:

nested_list <- list(
   x1 = list(a = 1, b = 2),
   x2 = list(a = 3, b = 4)
)

If I need to extract the a element in x1, I could do nested_list[["x1"]][["a"]]. However, if I transpose the list first, the order of subsetting reverses. That is, to extract the same element I could also do transpose(nested_list)[["a"]][["x1"]].

This is really handy for safe output, since we can grab all the results or all the errors really easily.

# Define safe_readLines() and html
safe_readLines <- safely(readLines)
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
# Examine the structure of transpose(html)
str(transpose(html))
## List of 2
##  $ result:List of 3
##   ..$ example: chr [1:50] "<!doctype html>" "<html>" "<head>" "    <title>Example Domain</title>" ...
##   ..$ rproj  : chr [1:124] "<!DOCTYPE html>" "<html lang=\"en\">" "  <head>" "    <meta charset=\"utf-8\">" ...
##   ..$ asdf   : NULL
##  $ error :List of 3
##   ..$ example: NULL
##   ..$ rproj  : NULL
##   ..$ asdf   :List of 2
##   .. ..$ message: chr "cannot open the connection"
##   .. ..$ call   : language file(con, "r")
##   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
# Extract the results: res
res <- transpose(html)[["result"]]

# Extract the errors: errs
errs <- transpose(html)[["error"]]
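
The reversed subsetting order can be checked on the small nested_list from above, which needs no network access. The name flipped is introduced here only for illustration.

```r
library(purrr)

nested_list <- list(
  x1 = list(a = 1, b = 2),
  x2 = list(a = 3, b = 4)
)

# Turn the list inside-out
flipped <- transpose(nested_list)

names(nested_list)  # outer names before transposing: x1, x2
names(flipped)      # outer names after transposing: a, b

# The same element, reached with the subsetting order reversed
identical(nested_list[["x1"]][["a"]], flipped[["a"]][["x1"]])
```

The outer and inner names swap places, which is exactly what makes transpose(html)[["result"]] collect every result in one list.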

Working with errors and results

What you do with the errors and results is up to you. Commonly, though, you’ll want to collect the results for the elements that succeeded and examine the inputs for those that failed.

# Initialize some objects
safe_readLines <- safely(readLines)
html <- map(urls, safe_readLines)
## Warning in file(con, "r"): InternetOpenUrl failed: 'The server name or
## address could not be resolved'
res <- transpose(html)[["result"]]
errs <- transpose(html)[["error"]]

# Create a logical vector is_ok
is_ok <- map_lgl(errs, is.null)

# Extract the successful results
res[is_ok]
## $example
##  [1] "<!doctype html>"                                                                                      
##  [2] "<html>"                                                                                               
##  [3] "<head>"                                                                                               
##  [4] "    <title>Example Domain</title>"                                                                    
##  [5] ""                                                                                                     
##  [6] "    <meta charset=\"utf-8\" />"                                                                       
##  [7] "    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />"                        
##  [8] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />"                       
##  [9] "    <style type=\"text/css\">"                                                                        
## [10] "    body {"                                                                                           
## [11] "        background-color: #f0f0f2;"                                                                   
## [12] "        margin: 0;"                                                                                   
## [13] "        padding: 0;"                                                                                  
## [14] "        font-family: \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;"                
## [15] "        "                                                                                             
## [16] "    }"                                                                                                
## [17] "    div {"                                                                                            
## [18] "        width: 600px;"                                                                                
## [19] "        margin: 5em auto;"                                                                            
## [20] "        padding: 50px;"                                                                               
## [21] "        background-color: #fff;"                                                                      
## [22] "        border-radius: 1em;"                                                                          
## [23] "    }"                                                                                                
## [24] "    a:link, a:visited {"                                                                              
## [25] "        color: #38488f;"                                                                              
## [26] "        text-decoration: none;"                                                                       
## [27] "    }"                                                                                                
## [28] "    @media (max-width: 700px) {"                                                                      
## [29] "        body {"                                                                                       
## [30] "            background-color: #fff;"                                                                  
## [31] "        }"                                                                                            
## [32] "        div {"                                                                                        
## [33] "            width: auto;"                                                                             
## [34] "            margin: 0 auto;"                                                                          
## [35] "            border-radius: 0;"                                                                        
## [36] "            padding: 1em;"                                                                            
## [37] "        }"                                                                                            
## [38] "    }"                                                                                                
## [39] "    </style>    "                                                                                     
## [40] "</head>"                                                                                              
## [41] ""                                                                                                     
## [42] "<body>"                                                                                               
## [43] "<div>"                                                                                                
## [44] "    <h1>Example Domain</h1>"                                                                          
## [45] "    <p>This domain is established to be used for illustrative examples in documents. You may use this"
## [46] "    domain in examples without prior coordination or asking for permission.</p>"                      
## [47] "    <p><a href=\"http://www.iana.org/domains/example\">More information...</a></p>"                   
## [48] "</div>"                                                                                               
## [49] "</body>"                                                                                              
## [50] "</html>"                                                                                              
## 
## $rproj
##   [1] "<!DOCTYPE html>"                                                                                                                                                                                                                                                                                                                                     
##   [2] "<html lang=\"en\">"                                                                                                                                                                                                                                                                                                                                  
##   [3] "  <head>"                                                                                                                                                                                                                                                                                                                                            
##   [4] "    <meta charset=\"utf-8\">"                                                                                                                                                                                                                                                                                                                        
##   [5] "    <meta http-equiv=\"X-UA-Compatible\" content=\"IE=edge\">"                                                                                                                                                                                                                                                                                       
##   [6] "    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\">"                                                                                                                                                                                                                                                                        
##   [7] "    <title>R: The R Project for Statistical Computing</title>"                                                                                                                                                                                                                                                                                       
##   [8] ""                                                                                                                                                                                                                                                                                                                                                    
##   [9] "    <link rel=\"icon\" type=\"image/png\" href=\"/favicon-32x32.png\" sizes=\"32x32\" />"                                                                                                                                                                                                                                                            
##  [10] "    <link rel=\"icon\" type=\"image/png\" href=\"/favicon-16x16.png\" sizes=\"16x16\" />"                                                                                                                                                                                                                                                            
##  [11] ""                                                                                                                                                                                                                                                                                                                                                    
##  [12] "    <!-- Bootstrap -->"                                                                                                                                                                                                                                                                                                                              
##  [13] "    <link href=\"/css/bootstrap.min.css\" rel=\"stylesheet\">"                                                                                                                                                                                                                                                                                       
##  [14] "    <link href=\"/css/R.css\" rel=\"stylesheet\">"                                                                                                                                                                                                                                                                                                   
##  [15] ""                                                                                                                                                                                                                                                                                                                                                    
##  [16] "    <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries -->"                                                                                                                                                                                                                                                          
##  [17] "    <!-- WARNING: Respond.js doesn't work if you view the page via file:// -->"                                                                                                                                                                                                                                                                      
##  [18] "    <!--[if lt IE 9]>"                                                                                                                                                                                                                                                                                                                               
##  [19] "      <script src=\"https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js\"></script>"                                                                                                                                                                                                                                                             
##  [20] "      <script src=\"https://oss.maxcdn.com/respond/1.4.2/respond.min.js\"></script>"                                                                                                                                                                                                                                                                 
##  [21] "    <![endif]-->"                                                                                                                                                                                                                                                                                                                                    
##  [22] "  </head>"                                                                                                                                                                                                                                                                                                                                           
##  [23] "  <body>"                                                                                                                                                                                                                                                                                                                                            
##  [24] "    <div class=\"container page\">"                                                                                                                                                                                                                                                                                                                  
##  [25] "      <div class=\"row\">"                                                                                                                                                                                                                                                                                                                           
##  [26] "        <div class=\"col-xs-12 col-sm-offset-1 col-sm-2 sidebar\" role=\"navigation\">"                                                                                                                                                                                                                                                              
##  [27] "<div class=\"row\">"                                                                                                                                                                                                                                                                                                                                 
##  [28] "<div class=\"col-xs-6 col-sm-12\">"                                                                                                                                                                                                                                                                                                                  
##  [29] "<p><a href=\"/\"><img src=\"/Rlogo.png\" width=\"100\" height=\"78\" alt = \"R\" /></a></p>"                                                                                                                                                                                                                                                         
##  [30] "<p><small><a href=\"/\">[Home]</a></small></p>"                                                                                                                                                                                                                                                                                                      
##  [31] "<h2 id=\"download\">Download</h2>"                                                                                                                                                                                                                                                                                                                   
##  [32] "<p><a href=\"http://cran.r-project.org/mirrors.html\">CRAN</a></p>"                                                                                                                                                                                                                                                                                  
##  [33] "<h2 id=\"r-project\">R Project</h2>"                                                                                                                                                                                                                                                                                                                 
##  [34] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [35] "<li><a href=\"/about.html\">About R</a></li>"                                                                                                                                                                                                                                                                                                        
##  [36] "<li><a href=\"/logo/\">Logo</a></li>"                                                                                                                                                                                                                                                                                                                
##  [37] "<li><a href=\"/contributors.html\">Contributors</a></li>"                                                                                                                                                                                                                                                                                            
##  [38] "<li><a href=\"/news.html\">What’s New?</a></li>"
##  [39] "<li><a href=\"/bugs.html\">Reporting Bugs</a></li>"                                                                                                                                                                                                                                                                                                  
##  [40] "<li><a href=\"/conferences/\">Conferences</a></li>"                                                                                                                                                                                                                                                                                                  
##  [41] "<li><a href=\"/search.html\">Search</a></li>"                                                                                                                                                                                                                                                                                                        
##  [42] "<li><a href=\"/mail.html\">Get Involved: Mailing Lists</a></li>"                                                                                                                                                                                                                                                                                     
##  [43] "<li><a href=\"http://developer.R-project.org\">Developer Pages</a></li>"                                                                                                                                                                                                                                                                             
##  [44] "<li><a href=\"https://developer.r-project.org/Blog/public/\">R Blog</a></li>"                                                                                                                                                                                                                                                                        
##  [45] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [46] "</div>"                                                                                                                                                                                                                                                                                                                                              
##  [47] "<div class=\"col-xs-6 col-sm-12\">"                                                                                                                                                                                                                                                                                                                  
##  [48] "<h2 id=\"r-foundation\">R Foundation</h2>"                                                                                                                                                                                                                                                                                                           
##  [49] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [50] "<li><a href=\"/foundation/\">Foundation</a></li>"                                                                                                                                                                                                                                                                                                    
##  [51] "<li><a href=\"/foundation/board.html\">Board</a></li>"                                                                                                                                                                                                                                                                                               
##  [52] "<li><a href=\"/foundation/members.html\">Members</a></li>"                                                                                                                                                                                                                                                                                           
##  [53] "<li><a href=\"/foundation/donors.html\">Donors</a></li>"                                                                                                                                                                                                                                                                                             
##  [54] "<li><a href=\"/foundation/donations.html\">Donate</a></li>"                                                                                                                                                                                                                                                                                          
##  [55] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [56] "<h2 id=\"help-with-r\">Help With R</h2>"                                                                                                                                                                                                                                                                                                             
##  [57] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [58] "<li><a href=\"/help.html\">Getting Help</a></li>"                                                                                                                                                                                                                                                                                                    
##  [59] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [60] "<h2 id=\"documentation\">Documentation</h2>"                                                                                                                                                                                                                                                                                                         
##  [61] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [62] "<li><a href=\"http://cran.r-project.org/manuals.html\">Manuals</a></li>"                                                                                                                                                                                                                                                                             
##  [63] "<li><a href=\"http://cran.r-project.org/faqs.html\">FAQs</a></li>"                                                                                                                                                                                                                                                                                   
##  [64] "<li><a href=\"http://journal.r-project.org\">The R Journal</a></li>"                                                                                                                                                                                                                                                                                 
##  [65] "<li><a href=\"/doc/bib/R-books.html\">Books</a></li>"                                                                                                                                                                                                                                                                                                
##  [66] "<li><a href=\"/certification.html\">Certification</a></li>"                                                                                                                                                                                                                                                                                          
##  [67] "<li><a href=\"/other-docs.html\">Other</a></li>"                                                                                                                                                                                                                                                                                                     
##  [68] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [69] "<h2 id=\"links\">Links</h2>"                                                                                                                                                                                                                                                                                                                         
##  [70] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [71] "<li><a href=\"http://www.bioconductor.org\">Bioconductor</a></li>"                                                                                                                                                                                                                                                                                   
##  [72] "<li><a href=\"/other-projects.html\">Related Projects</a></li>"                                                                                                                                                                                                                                                                                      
##  [73] "<li><a href=\"/gsoc.html\">GSoC</a></li>"                                                                                                                                                                                                                                                                                                            
##  [74] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [75] "</div>"                                                                                                                                                                                                                                                                                                                                              
##  [76] "</div>"                                                                                                                                                                                                                                                                                                                                              
##  [77] "        </div>"                                                                                                                                                                                                                                                                                                                                      
##  [78] "        <div class=\"col-xs-12 col-sm-7\">"                                                                                                                                                                                                                                                                                                          
##  [79] "        <h1>The R Project for Statistical Computing</h1>"                                                                                                                                                                                                                                                                                            
##  [80] "<h2 id=\"getting-started\">Getting Started</h2>"                                                                                                                                                                                                                                                                                                     
##  [81] "<p>R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. To <strong><a href=\"http://cran.r-project.org/mirrors.html\">download R</a></strong>, please choose your preferred <a href=\"http://cran.r-project.org/mirrors.html\">CRAN mirror</a>.</p>"
##  [82] "<p>If you have questions about R like how to download and install the software, or what the license terms are, please read our <a href=\"http://cran.R-project.org/faqs.html\">answers to frequently asked questions</a> before you send an email.</p>"                                                                                              
##  [83] "<h2 id=\"news\">News</h2>"                                                                                                                                                                                                                                                                                                                           
##  [84] "<ul>"                                                                                                                                                                                                                                                                                                                                                
##  [85] "<li><p><a href=\"https://cran.r-project.org/src/base/R-3\"><strong>R version 3.6.0 (Planting of a Tree)</strong></a> has been released on 2019-04-26.</p></li>"                                                                                                                                                                                      
##  [86] "<li><p>useR! 2020 will take place in St. Louis, Missouri, USA.</p></li>"                                                                                                                                                                                                                                                                            
##  [87] "<li><p><a href=\"https://cran.r-project.org/src/base/R-3\"><strong>R version 3.5.3 (Great Truth)</strong></a> has been released on 2019-03-11.</p></li>"                                                                                                                                                                                             
##  [88] "<li><p>The R Foundation Conference Committee has released a <a href=\"https://www.r-project.org/useR-2020_call.html\">call for proposals</a> to host useR! 2020 in North America.</p></li>"                                                                                                                                                          
##  [89] "<li><p>You can now support the R Foundation with a renewable subscription as a <a href=\"https://www.r-project.org/foundation/donations.html\">supporting member</a></p></li>"                                                                                                                                                                       
##  [90] "<li><p>The R Foundation has been awarded the Personality/Organization of the year 2018 award by the professional association of German market and social researchers.</p></li>"                                                                                                                                                                      
##  [91] "</ul>"                                                                                                                                                                                                                                                                                                                                               
##  [92] "<h2 id=\"news-via-twitter\">News via Twitter</h2>"                                                                                                                                                                                                                                                                                                   
##  [93] "<a class=\"twitter-timeline\""                                                                                                                                                                                                                                                                                                                       
##  [94] " href=\"https://twitter.com/_R_Foundation?ref_src=twsrc%5Etfw\""                                                                                                                                                                                                                                                                                     
##  [95] " data-width=\"400\""                                                                                                                                                                                                                                                                                                                                 
##  [96] " data-show-replies=\"false\""                                                                                                                                                                                                                                                                                                                        
##  [97] " data-chrome=\"noheader,nofooter,noborders\""                                                                                                                                                                                                                                                                                                        
##  [98] " data-dnt=\"true\""                                                                                                                                                                                                                                                                                                                                  
##  [99] " data-tweet-limit=\"3\">News from the R Foundation</a>"                                                                                                                                                                                                                                                                                              
## [100] "<script async"                                                                                                                                                                                                                                                                                                                                       
## [101] " src=\"https://platform.twitter.com/widgets.js\""                                                                                                                                                                                                                                                                                                    
## [102] " charset=\"utf-8\"></script>"                                                                                                                                                                                                                                                                                                                        
## [103] "<!--- (Boilerplate for release run-in)"                                                                                                                                                                                                                                                                                                              
## [104] "-   [**R version 3.1.3 (Smooth Sidewalk) prerelease versions**](http://cran.r-project.org/src/base-prerelease) will appear starting February 28. Final release is scheduled for 2015-03-09."                                                                                                                                                         
## [105] "-->"                                                                                                                                                                                                                                                                                                                                                 
## [106] "        </div>"                                                                                                                                                                                                                                                                                                                                      
## [107] "      </div>"                                                                                                                                                                                                                                                                                                                                        
## [108] "      <div class=\"raw footer\">"                                                                                                                                                                                                                                                                                                                    
## [109] "        &copy; The R Foundation. For queries about this web site, please contact"                                                                                                                                                                                                                                                                    
## [110] "\t<script type='text/javascript'>"                                                                                                                                                                                                                                                                                                                    
## [111] "<!--"                                                                                                                                                                                                                                                                                                                                                
## [112] "var s=\"=b!isfg>#nbjmup;xfcnbtufsAs.qspkfdu/psh#?uif!xfcnbtufs=0b?\";"                                                                                                                                                                                                                                                                               
## [113] "m=\"\"; for (i=0; i<s.length; i++) {if(s.charCodeAt(i) == 28){m+= '&';} else if (s.charCodeAt(i) == 23) {m+= '!';} else {m+=String.fromCharCode(s.charCodeAt(i)-1);}}document.write(m);//-->"                                                                                                                                                        
## [114] "\t</script>;"                                                                                                                                                                                                                                                                                                                                         
## [115] "        for queries about R itself, please consult the "                                                                                                                                                                                                                                                                                             
## [116] "        <a href=\"help.html\">Getting Help</a> section."                                                                                                                                                                                                                                                                                             
## [117] "      </div>"                                                                                                                                                                                                                                                                                                                                        
## [118] "    </div>"                                                                                                                                                                                                                                                                                                                                          
## [119] "    <!-- jQuery (necessary for Bootstrap's JavaScript plugins) -->"                                                                                                                                                                                                                                                                                  
## [120] "    <script src=\"https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js\"></script>"                                                                                                                                                                                                                                                     
## [121] "    <!-- Include all compiled plugins (below), or include individual files as needed -->"                                                                                                                                                                                                                                                            
## [122] "    <script src=\"/js/bootstrap.min.js\"></script>"                                                                                                                                                                                                                                                                                                  
## [123] "  </body>"                                                                                                                                                                                                                                                                                                                                           
## [124] "</html>"
# Find the URLs that were unsuccessful
urls[!is_ok]
## $asdf
## [1] "http://asdfasdasdkfjlda"

Use map with multiple arguments

Getting started

We’ll use random number generation as an example throughout the remaining exercises in this chapter. To get started, let’s imagine simulating 5 random numbers from a Normal distribution. You can do this in R with the rnorm() function. For example, to generate 5 random numbers from a Normal distribution with mean zero, we can do:

rnorm(n = 5)
## [1]  0.03647898 -0.53274166 -0.85936049  0.75121986 -0.65455199

Now, imagine you want to do this three times, but each time with a different sample size. You already know how! Let’s use the map() function to get it done.

# Create a list n containing the values: 5, 10, and 20
n <- list(5, 10, 20)

# Call map() on n with rnorm() to simulate three samples
map(n, rnorm)
## [[1]]
## [1] -1.2308981 -1.2824501 -2.4616473 -2.5225782  0.4291027
## 
## [[2]]
##  [1]  0.004489359 -0.668112556  0.693267685  0.303341987  0.592469129
##  [6]  1.186785700 -0.326621100  1.002250994 -0.275392821 -0.246677779
## 
## [[3]]
##  [1]  0.4377146  0.3949505  0.4266470  0.3264347 -0.6594162  1.3355906
##  [7] -0.8491024  0.8974620 -1.1003455  1.0645520 -1.8343825  0.3628863
## [13]  1.6404376  0.8388511 -0.2937274 -0.7006951  2.6686147 -0.1792945
## [19]  1.7024852  1.2825537

Mapping over two arguments

Ok, but now imagine we don’t just want to vary the sample size, we also want to vary the mean. The mean can be specified in rnorm() by the argument mean. Now there are two arguments to rnorm() we want to vary: n and mean.

The map2() function is designed exactly for this purpose; it allows iteration over two objects. The first two arguments to map2() are the objects to iterate over and the third argument .f is the function to apply.

Let’s use map2() to simulate three samples with different sample sizes and different means.

# Initialize n
n <- list(5, 10, 20)

# Create a list mu containing the values: 1, 5, and 10
mu <- list(1, 5, 10)

# Edit to call map2() on n and mu with rnorm() to simulate three samples
map2(n, mu, rnorm)
## [[1]]
## [1] 1.1172506 3.0518290 0.2673878 2.3679948 0.8162303
## 
## [[2]]
##  [1] 5.717227 4.939556 6.322356 5.549724 6.118284 4.775456 3.829074
##  [8] 5.844079 3.312547 5.940104
## 
## [[3]]
##  [1]  9.830461 12.888200 10.485034 11.083728 10.727070  9.659520  8.555704
##  [8] 10.374596 10.842726  9.894939 10.274485 10.515865  8.644311  8.620166
## [15]  9.733257  9.366410 11.286528  9.428655  8.701046 10.400603
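
To make concrete what map2() is doing, here is a minimal sketch that compares it with an explicit for loop. The seed value and the names via_map2 and via_loop are our own choices for illustration; resetting the seed before each run lets us compare the two results directly.

```r
library(purrr)

n  <- list(5, 10, 20)
mu <- list(1, 5, 10)

# Simulate with map2(): element i calls rnorm(n[[i]], mu[[i]])
set.seed(1)
via_map2 <- map2(n, mu, rnorm)

# Simulate the same thing with an explicit loop
set.seed(1)
via_loop <- vector("list", length(n))
for (i in seq_along(n)) {
  via_loop[[i]] <- rnorm(n[[i]], mu[[i]])
}

# Both approaches draw the same numbers in the same order
identical(via_map2, via_loop)
```

The loop needs bookkeeping (preallocating the output, indexing with i) that map2() handles for us.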

Mapping over more than two arguments

But wait, there’s another argument to rnorm() we might want to vary: sd, the standard deviation of the Normal distribution. You might think there is a map3() function, but there isn’t. Instead purrr provides a pmap() function that iterates over 2 or more arguments.

First, let’s take a look at pmap() for the situation we just solved: iterating over two arguments. Instead of providing each item to iterate over as arguments, pmap() takes a list of arguments as its input. For example, we could replicate our previous example, iterating over both n and mu with the following:

n <- list(5, 10, 20)
mu <- list(1, 5, 10)

pmap(list(n, mu), rnorm)
## [[1]]
## [1] -0.15567031  0.29572569 -0.05224814  2.91191865  0.96278727
## 
## [[2]]
##  [1] 4.186070 5.433151 5.231575 6.277179 3.758945 5.467113 5.468320
##  [8] 5.218611 6.348699 5.797739
## 
## [[3]]
##  [1]  8.652126 10.803269  8.525029  9.302949 11.028244 10.413505  9.336752
##  [8]  8.304407 10.569076 10.927294  9.022364 10.889384 10.670350  9.247592
## [15]  9.243549 10.936106 12.025676  9.521149 10.687484 10.672801

Notice how we had to put our two items to iterate over (n and mu) into a list.

Let’s expand this code to iterate over varying standard deviations too.

# Initialize n and mu
n <- list(5, 10, 20)
mu <- list(1, 5, 10)

# Create a sd list with the values: 0.1, 1 and 0.1
sd <- list(0.1, 1, 0.1)

# Edit this call to pmap() to iterate over the sd list as well
pmap(list(n, mu, sd), rnorm)
## [[1]]
## [1] 0.9785626 0.9872763 1.1432908 1.0363919 0.9023949
## 
## [[2]]
##  [1] 4.827397 5.184590 4.788293 5.136883 3.875741 6.116301 5.130771
##  [8] 6.779347 5.382926 6.131961
## 
## [[3]]
##  [1] 10.068491 10.009728 10.230593  9.961941  9.915285 10.071136 10.138339
##  [8]  9.917251  9.848322 10.056389 10.060127 10.040863  9.908300  9.867742
## [15]  9.959185  9.899381 10.078200 10.041810  9.929110 10.030736

Arguments matching

Compare the following two calls to pmap() (run them in the console and compare their output too!):

pmap(list(n, mu, sd), rnorm)
pmap(list(mu, n, sd), rnorm)

What’s the difference? By default, pmap() matches the elements of the list to the arguments of the function by position. In the first case, n is matched to the n argument of rnorm(), mu to the mean argument, and sd to the sd argument. In the second case, mu gets matched to the n argument of rnorm(), which is clearly not what we intended!

Instead of relying on this positional matching, a safer alternative is to provide names in our list. The name of each element should be the argument name we want to match it to.

Let’s fix up that second call.

sd <- list(0.1, 0.5, 0.1)
# Name the elements of the argument list
pmap(list(mean = mu, n = n, sd = sd), rnorm)
## [[1]]
## [1] 0.9926445 1.0421568 0.9184989 0.9744957 1.0698974
## 
## [[2]]
##  [1] 5.098869 5.395400 5.835664 5.108595 3.742881 4.921796 4.602659
##  [8] 5.744983 4.964532 5.065715
## 
## [[3]]
##  [1]  9.954841 10.071701 10.074790  9.837672 10.145948  9.935966  9.883195
##  [8]  9.963253  9.968462  9.883433 10.068102  9.958365 10.058214 10.108740
## [15] 10.016174 10.126987  9.870544  9.896215  9.870957 10.030063
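
One way to convince yourself of the difference is to look at the sample sizes that come back. The sketch below is our own check, not part of the exercise: wrong_sizes, named_a and named_b are illustrative names, and the seed is arbitrary.

```r
library(purrr)

n  <- list(5, 10, 20)
mu <- list(1, 5, 10)
sd <- list(0.1, 0.5, 0.1)

# Positional matching with mu first: mu is used as the sample size,
# so the samples have lengths 1, 5 and 10 instead of 5, 10 and 20
wrong_sizes <- lengths(pmap(list(mu, n, sd), rnorm))
wrong_sizes

# Named matching: the order of the list elements no longer matters
set.seed(123)
named_a <- pmap(list(n = n, mean = mu, sd = sd), rnorm)
set.seed(123)
named_b <- pmap(list(mean = mu, sd = sd, n = n), rnorm)
identical(named_a, named_b)
```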

Mapping over functions and their arguments

Sometimes it’s not the arguments to a function you want to iterate over, but a set of functions themselves. Imagine that instead of varying the parameters to rnorm() we want to simulate from different distributions, say, using rnorm(), runif(), and rexp(). How do we iterate over calling these functions?

In purrr, this is handled by the invoke_map() function. The first argument is a list of functions. In our example, something like:

funs <- list("rnorm", "runif", "rexp")

The second argument specifies the arguments to the functions. In the simplest case, all the functions take the same argument, and we can specify it directly, relying on … to pass it to each function. In this case, call each function with the argument n = 5:

invoke_map(funs, n = 5)

In more complicated cases, the functions may take different arguments, or we may want to pass different values to each function. In this case, we need to supply invoke_map() with a list, where each element specifies the arguments to the corresponding function.

Let’s use this approach to simulate three samples from the following three distributions: Normal(10, 1), Uniform(0, 5), and Exponential(5).

# Define list of functions
funs <- list("rnorm", "runif", "rexp")

# Parameter list for rnorm()
rnorm_params <- list(mean = 10)

# Add a min element with value 0 and max element with value 5
runif_params <- list(min = 0, max = 5)

# Add a rate element with value 5
rexp_params <- list(rate = 5)

# Define params for each function
params <- list(
  rnorm_params,
  runif_params,
  rexp_params
)

# Call invoke_map() on funs supplying params and setting n to 5
invoke_map(funs, params, n = 5)
## [[1]]
## [1]  9.464062 10.154056 10.795790 10.989272  8.936986
## 
## [[2]]
## [1] 1.6092647 2.3900880 3.6419941 0.6441171 1.7842332
## 
## [[3]]
## [1] 0.4326375 0.6391164 0.7327439 0.1762961 0.0897791
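
Recent purrr releases mark invoke_map() as deprecated, so the call above may produce a warning depending on your installed version. As a hedged alternative (not the approach used in the exercise), the same simulation can be sketched with map2() and base R's do.call(). The helper name call_with_n is hypothetical.

```r
library(purrr)

funs <- list("rnorm", "runif", "rexp")
params <- list(
  list(mean = 10),
  list(min = 0, max = 5),
  list(rate = 5)
)

# Hypothetical helper: call function f with n = 5 plus its own parameters
call_with_n <- function(f, args) {
  do.call(f, c(list(n = 5), args))
}

# Pair each function name with its parameter list
samples <- map2(funs, params, call_with_n)
str(samples)
```

do.call() accepts a function name as a string, which is why the list of names in funs can be reused unchanged.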

How to write robust functions

An error is better than a surprise

Recall our both_na() function from Chapter 2, that finds the number of entries where vectors x and y both have missing values:

both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}

We had an example where the behavior was a little surprising:

x <- c(NA, NA, NA)
y <- c( 1, NA, NA, NA)
both_na(x, y)
## Warning in is.na(x) & is.na(y): longer object length is not a multiple of
## shorter object length
## [1] 3

The function works and returns 3, but we certainly didn’t design this function with the idea that people could pass in different length arguments.

Using stopifnot() is a quick way to have your function stop if a condition isn’t met. stopifnot() takes logical expressions as arguments, and if any of them is not TRUE, an error occurs.

# Define troublesome x and y
x <- c(NA, NA, NA)
y <- c( 1, NA, NA, NA)

both_na <- function(x, y) {
  # Add stopifnot() to check length of x and y
  stopifnot(length(x) == length(y))  
  
  sum(is.na(x) & is.na(y))
}

# Call both_na() on x and y
# both_na(x, y)
# Error in both_na(x, y) : length(x) == length(y) is not TRUE

An informative error is even better

Using stop() instead of stopifnot() allows you to specify a more informative error message. Recall the general pattern for using stop() is:

if (condition) {
  stop("Error", call. = FALSE)
}

Writing good error messages is an important part of writing a good function! We recommend your error tells the user what should be true, not what is false. For example, here a good error would be “x and y must have the same length”, rather than the bad error “x and y don’t have the same length”.

Let’s use this pattern to write a better check for the length of x and y.

# Define troublesome x and y
x <- c(NA, NA, NA)
y <- c( 1, NA, NA, NA)

both_na <- function(x, y) {
  # Replace condition with logical 
  if (length(x) != length(y)) {
    # Replace "Error" with better message
    stop("x and y must have the same length", call. = FALSE)
  }  
  
  sum(is.na(x) & is.na(y))
}

# Call both_na() 
both_na(x, y)
  
# Error: x and y must have the same length
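
Because an uncaught error stops a script, it can be handy to capture the message while experimenting. The following is a small sketch using base R's tryCatch(); the variable name msg is ours, and the function is repeated so the example runs on its own.

```r
both_na <- function(x, y) {
  if (length(x) != length(y)) {
    stop("x and y must have the same length", call. = FALSE)
  }
  sum(is.na(x) & is.na(y))
}

# Capture the error message instead of halting the script
msg <- tryCatch(
  both_na(c(NA, NA, NA), c(1, NA, NA, NA)),
  error = function(e) conditionMessage(e)
)
msg

# Equal-length inputs still work: only position 1 is NA in both vectors
both_na(c(NA, 1, NA), c(NA, NA, 2))
```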

sapply is another common culprit

sapply() is another common offender returning unstable types. The type of output returned from sapply() depends on the type of input.

Consider the following data frame and two calls to sapply():

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

A <- sapply(df[1:4], class) 
B <- sapply(df[3:4], class)

What type of objects will A and B be?

A will be a list; B will be a character matrix.

Using purrr solves the problem

This unpredictable behaviour is a sign that you shouldn’t rely on sapply() inside your own functions.

So, what do you do? Use alternate functions that are type consistent! And you already know a whole set: the map() functions in purrr.

In this example, when we call class() on the columns of the data frame we are expecting character output, so our function of choice should be map_chr():

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

A <- map_chr(df[1:4], class) 
B <- map_chr(df[3:4], class)
# Error: Result 3 must be a single string, not a character vector of length 2

Except that gives us errors. This is a good thing! It alerts us that our assumption (that class() would return purely character output) is wrong.

Let’s look at a couple of solutions. First, we could use map() instead of map_chr(). Our result will always be a list, no matter the input.

# sapply calls
A <- sapply(df[1:4], class) 
B <- sapply(df[3:4], class)
C <- sapply(df[1:2], class) 

# Demonstrate type inconsistency
str(A)
## List of 4
##  $ a: chr "integer"
##  $ b: chr "numeric"
##  $ y: chr [1:2] "POSIXct" "POSIXt"
##  $ z: chr [1:2] "ordered" "factor"
str(B)
##  chr [1:2, 1:2] "POSIXct" "POSIXt" "ordered" "factor"
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "y" "z"
str(C)
##  Named chr [1:2] "integer" "numeric"
##  - attr(*, "names")= chr [1:2] "a" "b"
# Use map() to define X, Y and Z
X <- map(df[1:4], class) 
Y <- map(df[3:4], class)
Z <- map(df[1:2], class) 

# Use str() to check type consistency
str(X)
## List of 4
##  $ a: chr "integer"
##  $ b: chr "numeric"
##  $ y: chr [1:2] "POSIXct" "POSIXt"
##  $ z: chr [1:2] "ordered" "factor"
str(Y)
## List of 2
##  $ y: chr [1:2] "POSIXct" "POSIXt"
##  $ z: chr [1:2] "ordered" "factor"
str(Z)
## List of 2
##  $ a: chr "integer"
##  $ b: chr "numeric"

A type consistent solution

If we wrap our solution into a function, we can be confident that this function will always return a list because we’ve used a type consistent function, map():

col_classes <- function(df) {
  map(df, class)
}

But what if you wanted this function to always return a character vector?

One option would be to decide what should happen if class() returns something longer than length 1. For example, we might simply take the first element of the vector returned by class().

col_classes <- function(df) {
  # Assign list output to class_list
  class_list <- map(df, class)
  
  # Use map_chr() to extract first element in class_list
  map_chr(class_list, 1)
}

# Check that our new function is type consistent
df %>% col_classes() %>% str()
##  Named chr [1:4] "integer" "numeric" "POSIXct" "ordered"
##  - attr(*, "names")= chr [1:4] "a" "b" "y" "z"
df[3:4] %>% col_classes() %>% str()
##  Named chr [1:2] "POSIXct" "ordered"
##  - attr(*, "names")= chr [1:2] "y" "z"
df[1:2] %>% col_classes() %>% str()
##  Named chr [1:2] "integer" "numeric"
##  - attr(*, "names")= chr [1:2] "a" "b"
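
If purrr isn't available, base R's vapply() offers a similar guarantee, because you declare the expected type and length of each result up front. This is a sketch of the same "take the first class" idea; col_classes_base is a hypothetical name, not something defined elsewhere in these notes.

```r
# Hypothetical base R variant: vapply() errors unless every result
# is a character vector of length 1
col_classes_base <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

df <- data.frame(
  a = 1L,
  b = 1.5,
  y = Sys.time(),
  z = ordered(1)
)

classes <- col_classes_base(df)
str(classes)
```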

Or fail early if something goes wrong

Another option would be to simply fail. We could rely on map_chr()’s type consistency to fail for us:

col_classes <- function(df) {
  map_chr(df, class)
}

df %>% col_classes() %>% str()

Or, check the condition ourselves and return an informative error message. We’ll implement this approach in this exercise.

As you write more functions, you’ll find you often come across this tension between implementing a function that does something sensible when something surprising happens, or simply fails when something surprising happens. Our recommendation is to fail when you are writing functions that you’ll use behind the scenes for programming and to do something sensible when writing functions for users to use interactively.

(And by the way, flatten_chr() is yet another useful function in purrr. It takes a list and removes its hierarchy. The suffix _chr indicates that this is another type consistent function, and will either return a character vector or throw an error.)

col_classes <- function(df) {
  class_list <- map(df, class)
  
  # Add a check that no element of class_list has length > 1
  if (any(map_dbl(class_list, length) > 1)) {
    stop("Some columns have more than one class", call. = FALSE)
  }
  
  # Use flatten_chr() to return a character vector
  flatten_chr(class_list)
}
# Check that our new function is type consistent
df %>% col_classes() %>% str() 
# Error: Some columns have more than one class
# Check that our new function is type consistent
df[3:4] %>% col_classes() %>% str() 
# Error: Some columns have more than one class
# Check that our new function is type consistent
df[1:2] %>% col_classes() %>% str()
##  chr [1:2] "integer" "numeric"

Programming with NSE functions

Let’s take a look at a function that uses the non-standard evaluation (NSE) function filter() from the dplyr package:

big_x <- function(df, threshold) {
  dplyr::filter(df, x > threshold)
}

This big_x() function attempts to return all rows in df where the x column exceeds a certain threshold. Let’s get a feel for how it might be used.

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
diamonds_sub <- sample_n(diamonds, size = 1000)

# Use big_x() to find rows in diamonds_sub where x > 7
big_x(diamonds_sub, threshold = 7)
## # A tibble: 121 x 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  1.51 Very Good J     I1       63.3    57  4600  7.31  7.23  4.6 
##  2  2    Good      J     VS2      58      62 13542  8.17  8.25  4.76
##  3  2.06 Good      I     VS2      60.2    64 15046  8.34  8.14  4.97
##  4  1.35 Ideal     D     VS2      61.3    57 10602  7.09  7.13  4.36
##  5  1.51 Very Good E     VS1      63.2    58 14904  7.24  7.16  4.55
##  6  2.29 Fair      J     VS2      57      66 13701  8.8   8.6   4.98
##  7  1.52 Fair      I     VVS2     64.9    56  7303  7.28  7.21  4.7 
##  8  1.71 Premium   H     VS1      62.1    59 10457  7.63  7.55  4.71
##  9  1.71 Good      D     SI1      64.1    59 14426  7.51  7.46  4.8 
## 10  2.01 Very Good H     VS2      60.7    56 18561  8.08  8.17  4.93
## # ... with 111 more rows

When things go wrong

Now, let’s see how this function might fail. There are two instances in which the non-standard evaluation of filter() could cause surprising results:

  1. The x column doesn’t exist in df.
  2. There is a threshold column in df.

Let’s illustrate these failures. In each case we’ll use big_x() in the same way as the previous exercise, so we should expect the same output. However, not only do we get unexpected outputs, there is no indication (i.e. error message) that lets us know something might have gone wrong.

# Create a threshold column with value 100
diamonds_sub$threshold <- 100

# Use big_x() to find rows in diamonds_sub where x > 7
big_x(diamonds_sub, 7)
## # A tibble: 0 x 11
## # ... with 11 variables: carat <dbl>, cut <ord>, color <ord>,
## #   clarity <ord>, depth <dbl>, table <dbl>, price <int>, x <dbl>,
## #   y <dbl>, z <dbl>, threshold <dbl>

What to do?

To avoid the problems caused by non-standard evaluation functions, you could avoid using them. In our example, we could achieve the same results by using standard subsetting (i.e. []) instead of filter(). For more insight into dealing with NSE and how to write your own non-standard evaluation functions, we recommend reading Hadley’s vignette on the topic. Also, programming with the NSE functions in dplyr will be easier in a future version.
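
As a rough sketch of the standard-subsetting route, here is one way big_x() might look without filter(). The name big_x_base and the toy data frame are ours. Note one assumption: we wrap the comparison in which() so that rows where x is NA are dropped, as filter() would do; plain logical subsetting would keep them as NA rows.

```r
# Hypothetical base R version of big_x(): threshold always refers to
# the function argument, never to a column of df
big_x_base <- function(df, threshold) {
  if (!"x" %in% names(df)) {
    stop("df must contain variable called x", call. = FALSE)
  }
  keep <- which(df[["x"]] > threshold)
  df[keep, , drop = FALSE]
}

# Toy data with a threshold column that would confuse filter()
toy <- data.frame(x = c(6.5, 7.2, 8.1, NA), threshold = 100)
result_base <- big_x_base(toy, threshold = 7)
result_base
```

Here the threshold column is ignored, so the two rows with x above 7 are returned.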

If you do need to use non-standard evaluation functions, it’s up to you to provide protection against the problem cases. That means you need to know what the problem cases are, to check for them, and to fail explicitly.

To see what that might look like, let’s rewrite big_x() to fail for our problem cases.

big_x <- function(df, threshold) {
  # Write a check for x not being in df
  if (!"x" %in% colnames(df)) { 
    stop("df must contain variable called x", call. = FALSE)
  }
  
  # Write a check for threshold being in df
  if ("threshold" %in% colnames(df)) {
    stop("df must not contain variable called threshold", call. = FALSE)
  }
  
  dplyr::filter(df, x > threshold)
}
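
More recent dplyr versions also provide the .data and .env pronouns, which let you say explicitly whether a name refers to a column or to a variable in the calling environment. The sketch below is an alternative to the manual checks above, assuming a dplyr version that supports these pronouns; big_x_pronoun and the toy data are hypothetical.

```r
library(dplyr)

big_x_pronoun <- function(df, threshold) {
  # .data$x must be a column of df; .env$threshold is the function argument
  dplyr::filter(df, .data$x > .env$threshold)
}

# A threshold column no longer changes the result
toy <- data.frame(x = c(6.5, 7.2, 8.1), threshold = 100)
result_pronoun <- big_x_pronoun(toy, threshold = 7)
result_pronoun

# A missing x column now raises an error instead of failing silently
missing_x <- tryCatch(
  big_x_pronoun(data.frame(y = 1:3), threshold = 7),
  error = function(e) "error"
)
missing_x
```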